Friday, April 29, 2016

Processing Large Cumulative Files with Camel and Filtering Duplicates

     It is not uncommon in the enterprise world for applications to consume data from files that run to millions of records and often contain duplicates. The requirement is typically to filter the data and insert it into a database or a queue for further processing. Camel has a component called Idempotent Consumer for exactly this purpose. Combined with the File component, let us examine how efficient it can be.

     We are going to use a file with 1 million orders (111112 of the records are duplicates) enclosed by an orders tag, as below. We will read this file, split it, and then insert the order records into the Order table and the product records into the Product table.
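     The file looks roughly like this (the element names here are assumptions for illustration; the real sample is in the GitHub link at the end):

      <orders>
          <order>
              <orderId>1</orderId>
              <productId>100</productId>
              <quantity>2</quantity>
          </order>
          <order>
              <orderId>2</orderId>
              <productId>101</productId>
              <quantity>1</quantity>
          </order>
          <!-- ...a million order elements in total, including duplicates... -->
      </orders>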


     Now, obviously, loading this file completely into memory to process it is not always a feasible approach, so Camel provides us with the Splitter EIP along with its streaming capability. What this does is read the file chunk by chunk, based on the token provided as the delimiter, and stream the results (the advantage being that no XML model is loaded; we can receive each chunk as a plain string).


     The code snippet reads the XML file, splits it at the order elements, unmarshals each tokenized XML fragment into an Order object, and aggregates 1000 orders before sending them to the next endpoint. Observe that two completion strategies are provided to the aggregator: a completion size and a completion timeout. The timeout ensures that the aggregator does not keep waiting infinitely for the configured 1000 records to arrive (thereby making the program hang forever if they never come, or are not present) and completes once the provided milliseconds elapse.
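     A minimal sketch of such a route, assuming Camel 2.x (the endpoint URIs, the JAXB context package, the 3000 ms timeout and the aggregation strategy are illustrative assumptions, not the exact code from the post):

      import java.util.ArrayList;
      import java.util.List;

      import org.apache.camel.Exchange;
      import org.apache.camel.builder.RouteBuilder;
      import org.apache.camel.processor.aggregate.AggregationStrategy;

      public class LargeOrderFileRoute extends RouteBuilder {
          @Override
          public void configure() {
              from("file:data/orders?noop=true")
                  // stream the file token by token, never loading it fully into memory
                  .split().tokenizeXML("order").streaming()
                  // unmarshal each <order> chunk into an Order object (package name assumed)
                  .unmarshal().jaxb("com.example.orders")
                  // batch 1000 orders; the timeout completes a partial batch so the
                  // route does not hang when fewer than 1000 records remain
                  .aggregate(constant(true), new ListAggregationStrategy())
                      .completionSize(1000)
                      .completionTimeout(3000)
                  .to("direct:persistOrders");
          }

          // collects the incoming split chunks into a single List body
          static class ListAggregationStrategy implements AggregationStrategy {
              public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
                  if (oldExchange == null) {
                      List<Object> batch = new ArrayList<Object>();
                      batch.add(newExchange.getIn().getBody());
                      newExchange.getIn().setBody(batch);
                      return newExchange;
                  }
                  List<Object> batch = oldExchange.getIn().getBody(List.class);
                  batch.add(newExchange.getIn().getBody());
                  return oldExchange;
              }
          }
      }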

     So the above code reads a file, splits the XML into tokens, and aggregates them by 1000 before sending them to the DB endpoint. What about the duplicate filtering we talked about? Would you believe that adding four lines of code now enables us to do so? That is exactly how powerful Camel can get.

     Just add the value that needs to be unique (it could be a node in the XML, or the value of a node) to a header, add an idempotent repository component, and provide the header to the repository as the key to use as the basis for duplicate filtering. And just like that, we have code that reads from a large file, filters duplicate orders, and sends them to a different endpoint for further processing.
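     Sketched against the route above, the extra lines look like this (the header name, the XPath expression and the cache size are again illustrative assumptions):

      // inside configure(), extending the previous sketch; requires
      // import org.apache.camel.processor.idempotent.MemoryIdempotentRepository;
      from("file:data/orders?noop=true")
          .split().tokenizeXML("order").streaming()
          // the extra lines: pull the unique value out of each chunk into a header,
          // then let the idempotent consumer drop chunks whose key was seen before
          .setHeader("orderId").xpath("/order/orderId/text()", String.class)
          .idempotentConsumer(header("orderId"),
                  MemoryIdempotentRepository.memoryIdempotentRepository(1200000))
          .unmarshal().jaxb("com.example.orders")
          .aggregate(constant(true), new ListAggregationStrategy())
              .completionSize(1000)
              .completionTimeout(3000)
          .to("direct:persistOrders");

     Note that the memory-based repository is sized above the record count here; if the filtering must survive a restart, a file- or JDBC-backed idempotent repository can be plugged in instead.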
     I was able to achieve about 10582 records inserted into the DB per second, and I have not even started looking into reducing the memory footprint. Hit me with your comments.

For the complete code and the processing details, refer to my GitHub link here.

Wednesday, April 27, 2016

Colocated Symmetrical Live and Backup Cluster on JBoss EAP - With Parameterization

            To demonstrate the HA failover mode with parameters, this article will use two nodes; the configuration can, however, be extended to as many nodes as needed. Traditionally, for the collocated failover setup, the full-ha profile in domain.xml is replicated (after the hornetq-server is itself copied within the profile to create a backup); this, however, becomes a tedious approach when there are, say, 4 or 5 live and backup cluster combinations. Follow the steps below and life should be a little easier.

Step 1 : 
Download the JBoss EAP 6.4 distribution.
Make two copies of the unzipped folder, named master and node.
Step 2 :
Open domain.xml under the master/domain/configuration directory.
Delete the default, ha, and full profiles.
Navigate to the messaging subsystem under the full-ha profile.
Copy the hornetq-server section and paste it right after the first hornetq-server section.
Provide a name to the second hornetq-server to differentiate it from the first; it can be any name, let us choose backup as the name. A skeleton of the result is sketched below.
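         The messaging subsystem should now contain two server entries (a sketch; the subsystem namespace version may differ in your installation):

          <subsystem xmlns="urn:jboss:domain:messaging:1.4">
              <hornetq-server>
                  <!-- original (live) server configuration -->
              </hornetq-server>
              <hornetq-server name="backup">
                  <!-- copied configuration, edited in the next steps -->
              </hornetq-server>
          </subsystem>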


Step 3:  Make changes to the default hornetq-server section as below
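         A minimal sketch of the edited live server, assuming a replication-based (non-shared-store) setup; only the elements touched here are shown, the rest of the stock full-ha configuration stays as it is:

          <hornetq-server>
              <persistence-enabled>true</persistence-enabled>
              <shared-store>false</shared-store>
              <backup>false</backup>
              <!-- the group name is supplied at start-up via a parameter -->
              <backup-group-name>${groupa}</backup-group-name>
              <check-for-live-server>true</check-for-live-server>
              <connectors>
                  <netty-connector name="netty" socket-binding="messaging"/>
                  <!-- unique server-id, to distinguish it from the backup server -->
                  <in-vm-connector name="in-vm" server-id="1"/>
              </connectors>
              <acceptors>
                  <netty-acceptor name="netty" socket-binding="messaging"/>
                  <in-vm-acceptor name="in-vm" server-id="1"/>
              </acceptors>
          </hornetq-server>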
The backup-group-name element shows the parameterization: the group name of the default (live) server is now provided by the parameter ${groupa} when the servers are started. Also change the server-id of the in-vm connector and acceptor to a unique number; this is because we have to differentiate between the live and backup servers (the backup is configured in the next step).


Step 4:  Change the configuration of the backup server created by copying the original hornetq-server, as below
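         Again a sketch showing only the relevant elements (same assumptions as above):

          <hornetq-server name="backup">
              <persistence-enabled>true</persistence-enabled>
              <shared-store>false</shared-store>
              <!-- this server acts as the backup for the live server of group ${groupb} -->
              <backup>true</backup>
              <backup-group-name>${groupb}</backup-group-name>
              <allow-failback>true</allow-failback>
              <connectors>
                  <!-- messaging2 avoids a port clash with the live server (see Step 5) -->
                  <netty-connector name="netty" socket-binding="messaging2"/>
                  <in-vm-connector name="in-vm" server-id="2"/>
              </connectors>
              <acceptors>
                  <netty-acceptor name="netty" socket-binding="messaging2"/>
                  <in-vm-acceptor name="in-vm" server-id="2"/>
              </acceptors>
          </hornetq-server>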


                The backup element is changed to true to appropriately represent that this server will act as a backup. The backup group name is now represented by the parameter ${groupb}. Also, the server-id is incremented to 2 to make it different from the live server running in the same JVM. Also observe the change of the socket-binding to messaging2: this caters to the fact that the backup server belonging to ${groupb} will come up when its master on a different host goes down, and during such times this resolves the port conflict that would otherwise arise with the ${groupa} live server running on the current host. (Jump to Step 5 to see the actual configuration of the socket-binding.)

Step 5 :
            Move to the full-ha-sockets section of the socket-binding-group and create a new socket-binding named messaging2 with port 5446 (you can opt for any number that is different from the messaging socket-binding and does not conflict with the other EAP ports; this one is tested), as below.
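         A sketch of the addition (the other bindings of the stock full-ha-sockets group are unchanged):

          <socket-binding-group name="full-ha-sockets" default-interface="public">
              <!-- existing bindings unchanged -->
              <socket-binding name="messaging" port="5445"/>
              <!-- new binding for the colocated backup server -->
              <socket-binding name="messaging2" port="5446"/>
          </socket-binding-group>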

Step 6 :
            Move to the server-groups section and remove the server-group referring to the “full” profile. Change the remaining server group's name to “hornetqparamcluster”; this can be anything, or you can choose to leave the name as is, just remember the name.
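         The remaining group would then look roughly like this (the jvm settings are whatever your stock domain.xml carries):

          <server-groups>
              <server-group name="hornetqparamcluster" profile="full-ha">
                  <jvm name="default"/>
                  <socket-binding-group ref="full-ha-sockets"/>
              </server-group>
          </server-groups>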

Step 7:
Create a copy of the host-slave.xml file under the master/../configuration directory and name it host-slave-node.xml.

Step 8 :
Run the script ./add-user.sh under the master/bin directory (this will be the domain controller node) and create a Management Realm user called admin.
Run the script once again and create a user called myhornetqcluster, answering the interactive prompts as shown below.
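From memory, the interaction looks roughly like this (the prompts vary slightly between versions; the secret value printed is the Base64 of the password you chose):

          What type of user do you wish to add?
           a) Management User (mgmt-users.properties)
           b) Application User (application-users.properties)
          (a): a
          Username : myhornetqcluster
          Password : ********
          Re-enter Password : ********
          Is this new user going to be used for one AS process to connect to another AS process?
          yes/no? yes
          To represent the user add the following to the server-identities definition
          <secret value="xxxxxx" />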



         Copy the text <secret value="xxxxxx"/> 

Step 9 : 
  •  Open the host-slave.xml file under the master/../configuration directory and add the text below under the management security realm node 
         <server-identities>
               <secret value="xxxxxx"/>
         </server-identities>
         where xxxxxx is the text copied in Step 8.
  • Move to the domain-controller section, remove the <local/> node and uncomment the <remote> node, and add the attribute username to the node with the value “myhornetqcluster” created in Step 8.
  • Repeat the above two steps for the file host-slave-node.xml
  • Move to the servers section of the file host-slave.xml and change the group of the server to hornetqparamcluster.
  • Add a system-properties section to provide values for the ${groupa} and ${groupb} parameters, and set the port offset to 1000. Remove the second server configuration.
  • Repeat the above step by editing host-slave-node.xml, but reverse the values of the groupa and groupb parameters and set the offset to 2000 (remember we are running all the nodes on a single machine). The resulting sections are sketched below.
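         A sketch of the relevant parts of host-slave.xml after these edits (the property values are placeholders; in host-slave-node.xml, swap the two values and use port-offset 2000):

          <domain-controller>
              <remote host="${jboss.domain.master.address}" port="${jboss.domain.master.port:9999}"
                      username="myhornetqcluster" security-realm="ManagementRealm"/>
          </domain-controller>
          <!-- other sections unchanged -->
          <servers>
              <server name="server-one" group="hornetqparamcluster">
                  <system-properties>
                      <property name="groupa" value="cluster-grp-a"/>
                      <property name="groupb" value="cluster-grp-b"/>
                  </system-properties>
                  <socket-bindings port-offset="1000"/>
              </server>
          </servers>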

Step 10
  • Start the domain controller by running domain.sh under /master/bin as below

            ./domain.sh --domain-config=domain.xml --host-config=host-master.xml -b=192.168.56.101 
           -bmanagement=192.168.56.101
  • Start the HornetQ live server group by running domain.sh under /master/bin in a separate window

         ./domain.sh --domain-config=domain.xml --host-config=host-slave.xml 
        -b=192.168.56.101 -bmanagement=192.168.56.101 
        -Djboss.domain.master.address=192.168.56.101 -Djboss.management.native.port=9993
  • Start the second HornetQ live server group by running domain.sh under /node/bin in a separate window (note the different native management port below; both host controllers run on the same machine, so they cannot share port 9993)

         ./domain.sh --domain-config=domain.xml --host-config=host-slave-node.xml 
        -b=192.168.56.101 -bmanagement=192.168.56.101 
        -Djboss.domain.master.address=192.168.56.101 -Djboss.management.native.port=9994


        Change the IP addresses to the address of your server.
          You should now have the HornetQ servers in a colocated symmetrical live and backup cluster on JBoss EAP.