Monday, September 10, 2012

How to Run Dynamic Cloud Tests with 800 Tomcats, Amazon EC2, Jenkins and LiveRebel



I was brainstorming in the shower the other day, and I thought "Eureka!" - I need to bootstrap and test my Java app on a dynamic cluster with 800 Tomcat servers right now! Then, breakfast.

Obviously, every now and then you need to build a dynamic cluster of 800 Tomcat machines and then run some tests. Oh, wait, you don’t? Well, let's say you do. Provisioning your machines on the cloud for testing is a great way to "exercise" your app and work on:
  • Warming up: Bootstrap a clean slate, install the software, run your tests
  • Checking your Processes: Smoke testing for deploying the app to production
  • Ensuring success: Checking load handling before launching the application to real clientele
  • Leaving nothing behind: After you've got all green lights, shut it all down and watch it disappear

At ZeroTurnaround, we need this for testing LiveRebel with larger deployments. LiveRebel is a tool to manage JEE production and QA deployments and online updates, so it is crucial that we support large clusters of machines. Testing such environments is not an easy task, but luckily in 2012 you no longer have to buy 800 machines: you provision them in the cloud for a limited time. In this article I will walk you through setting up a dynamic Tomcat cluster and running some simple tests on it. (Quick note: When I started writing this article, we had only tested this out with 100 Tomcat machines, but since then we have grown to support 800 instances with LiveRebel and the other tools.)

Technical Requirements

Let me define a bunch of non-functional requirements that I've thought up. The end result should have 800 Tomcat nodes, each configured with LiveRebel. A load balancer should sit in front of the nodes and provide a single URL for tests, and we'll use the LiveRebel Command Center to manage our deployments.

Naturally, this is all easier said than done. The hurdles that we will need to overcome to achieve this are:
  • Provisioning all the nodes - starting/stopping Amazon EC2 instances
  • Installing required software - Java, Tomcat, LiveRebel
  • Configuring Tomcat cluster - (think jvmRoute param in conf/server.xml)
  • Configuring a load balancer (Apache) with all the provisioned IP addresses
  • Automation - one-click provision/start/stop/terminate on the cluster using Jenkins

Tools

We chose Amazon AWS as our cloud provider, mainly because we've become familiar with it over the last couple of years. For provisioning we use Knife and for configuration management we like Chef. For automation, we went with Jenkins (I love Jenkins), and we have two jobs: one to start the cluster and one to stop it. Tests are not automated at the moment. Before going further you need a Chef server running on some machine (it does not have to be your own workstation) and Knife installed and configured on your Jenkins machine.

Architecture

Loadbalancer

First we have to create/launch a loadbalancer instance. Software to configure:

  • Install Apache
  • Install/enable Apache loadbalancer module
  • Update Apache configuration
  • Install LiveRebel Command Center and start it (could be a separate machine but we’ll use this instance for 2 services)

The load balancer should check in with chef-server and provide its own IP address. LiveRebel Command Center should be running and accepting incoming requests on the default port (9001).

LiveRebel node(s)

As soon as the load balancer is ready we will create/launch nodes. Node instances need to:

  • Install a Tomcat instance
  • Figure out the IP of the load balancer
  • Download lr-agent-installer.jar from the LiveRebel CC
  • Run it (java -jar lr-agent-installer.jar)
  • Start Tomcat
Again, when everything is done the node will check in with chef-server and provide its IP address.

After all nodes are ready we must update the Apache load balancer configuration and provide all the IP addresses of the nodes. This is because of the architecture of the load balancer. It needs to know the IP addresses of the machines it balances.
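To make that last step concrete, here is a minimal sketch of rendering the balancer section from a list of node IPs. It is not taken from our cookbooks: the helper name, the cluster name, port 8080 and the node1..nodeN routes are all assumptions for illustration, and in a real recipe the IP list would presumably come from a search on chef-server for nodes with the lr-node role.

```ruby
# Hypothetical helper: renders an Apache mod_proxy_balancer section
# for a list of node IP addresses. The port, cluster name and route
# names are assumptions, not values from the original cookbooks.
def balancer_config(ips, port = 8080, cluster = "lrcluster")
  members = ips.each_with_index.map do |ip, i|
    # route is meant to match the jvmRoute set in that node's conf/server.xml
    "  BalancerMember http://#{ip}:#{port} route=node#{i + 1}"
  end
  [
    "<Proxy balancer://#{cluster}>",
    *members,
    "</Proxy>",
    "ProxyPass / balancer://#{cluster}/"
  ].join("\n")
end

puts balancer_config(["10.0.0.11", "10.0.0.12"])
```

Every time nodes are added or removed, the section has to be rendered again and Apache reloaded, which is why this step comes last.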

Code (The Fun Part)

As we are using Chef, the natural approach is to create several cookbooks and a couple of roles that will help us with configuration. There are four cookbooks in total, one for each application: Apache, lrcc, Tomcat and Java. You can get familiar with them on GitHub. The code is provided for reference only, because it will not run as is: some download links are missing. It was also tested only on Ubuntu, so if you are using some other distribution, you may need to tune it.
We are going to use the Knife command line tool to start and bootstrap our instances. Don’t forget to install and configure the Knife-EC2 rubygem. The first step is to create the load balancer. Provided you have configured the Knife EC2 plugin and prepared the right AMI to launch (or use the default provided by Ubuntu), it is relatively easy. Just run (with the right parameters):

knife ec2 server create --run-list "role[lr-loadbalancer]" --node-name "loadbalancer"

When the process finishes successfully you can go to https://your-server-address:9001/ and check that LiveRebel is running. It should be, but you will have to register or provide a license file. If you already have a license file you can automate the registration step by copying the license into the LiveRebel home folder in your cookbook. Another thing to check is whether the load balancer has registered with chef-server.
The next step is creating the lr-nodes. Your 800 nodes can be created using a similar Knife EC2 command run in a loop:

for i in {1..800} ; do
    knife ec2 server create --run-list "role[lr-node]" --node-name "lr-node-$i"
done
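
The loop above creates the nodes one after another. As a rough sketch of how the same commands could be run several at a time, here is a small Ruby worker pool. The function names, the number of workers and the idea of passing in a runner are assumptions made for this example; it is not the script our Jenkins job uses.

```ruby
require "thread"

# Builds the knife command for node number i (same flags as the loop above).
def create_command(i)
  ["knife", "ec2", "server", "create",
   "--run-list", "role[lr-node]", "--node-name", "lr-node-#{i}"]
end

# Runs the create command for nodes 1..count on `workers` threads.
# `runner` is any callable that takes the argument array and returns
# true on success. Returns the sorted numbers of the nodes that failed.
def provision(count, workers, runner)
  queue = Queue.new
  (1..count).each { |i| queue << i }
  failed = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      # Non-blocking pop raises when the queue is empty; that ends the worker.
      while (i = (queue.pop(true) rescue nil))
        failed << i unless runner.call(create_command(i))
      end
    end
  end
  threads.each(&:join)
  Array.new(failed.size) { failed.pop }.sort
end

# Dry run: print the commands instead of executing them.
provision(3, 2, lambda { |cmd| puts cmd.join(" "); true })
```

For a real run the runner could be lambda { |cmd| system(*cmd) }, which hands the arguments to knife without going through a shell. The returned list tells you which nodes to retry.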

Almost everything is ready! All we need now is to create the Jenkins jobs. The first one - we’ll name it lr-cluster-create - should run the 2 commands above and start the cluster. The other one, lr-cluster-delete, tears it down with these commands:

ids=`knife ec2 server list | grep -E "lr-node|loadbalancer" | awk '{print $1}' | tr '\n' ' '`
knife ec2 server delete $ids --yes
knife node bulk delete "lr-node-.*" --yes
knife client bulk delete "lr-node-.*" --yes
knife node delete "loadbalancer" --yes
knife client delete "loadbalancer" --yes

Conclusions

At this point, you should be well on your way towards bootstrapping a clean environment, installing, running your tests, checking load handling, and then you can shut it all down once you've seen everything working to your satisfaction.

Your two Jenkins jobs are now able to spawn a dynamic Tomcat cluster. You can even parameterize your job and supply a number of nodes that you are interested in for a really dynamic cluster.
One thing to note is that Amazon keeps charging for the EBS storage of stopped EC2 instances, so just stopping the cluster is not very cost-effective. Terminating it will save you some money, especially if you like bigger clusters.
Another thing is provisioning. Parallel provisioning for the 800 nodes takes roughly 30 minutes. Starting a new instance from AMI takes some time, but most of it goes to bootstrapping the clean environment with the Chef installation and downloading packages and archives.
Once you have the cluster started you still need to run tests. We test deploying to and updating the whole cluster with LiveRebel. You could test your own web application and see how it handles the load.
The next step for us is to automate the test suite and have these large-scale tests executed regularly. This will give us valuable feedback about releases in progress and their scalability.
I hope this article has helped you get started with dynamic Tomcat clusters and I’m more than happy to go into more detail about any step here if you have questions - just contact me.

Tuesday, March 27, 2012

GeekOut 2012 14-15 June

GeekOut is back, now twice as long, informative, interesting and exciting! :D Registration has opened - become an early geek and get the lowest price. Don't forget to check the programme out.

Thursday, December 22, 2011

Stanford AI Class Finished

The Stanford AI class is over. It was very interesting and I learned a lot of new things. Unfortunately I sucked at the first 2 questions of the final exam and my score is not as high as I would like it to be. But there will be new interesting courses in spring - I will take my "revenge" there. So far I have signed up for the Machine Learning, Game Theory and Design and Analysis of Algorithms courses and I hope I will have enough time for that. :)



Monday, November 14, 2011

Java: Save InputStream Into File

Imagine we need to save an InputStream into a file. That can happen when requesting some URL, or when just copying a file from one place to another on the hard disk. Google provides a lot of answers to the query "java save inputstream to file". I have checked the first result page, and almost everywhere one and the same solution is provided, which includes the following loop:

int read = 0;
byte[] bytes = new byte[1024];
 
while ((read = inputStream.read(bytes)) != -1) {
 out.write(bytes, 0, read);
}

Seriously, guys, don't you think there is something wrong here? Even in C++ people do not copy streams by shuffling bytes by hand anymore! There should be a much better way :) (I am not considering any additional libraries that require extra jar files.)

import java.io.FileOutputStream;
import sun.misc.IOUtils; // internal JDK class, not part of the public API

// Read the whole stream into memory, then write it out in one call.
try (FileOutputStream out = new FileOutputStream("tmp.txt")) {
    out.write(IOUtils.readFully(inputStream, -1, false));
}

Wednesday, August 24, 2011

Top 10 Jenkins Must-Have Plugins

We at ZeroTurnaround have been using Jenkins for a long time already, and have finally decided to write a small review of the plugins and features we use.

Friday, June 10, 2011

Ruby Pack/Unpack

Having a party in ZeroTurnaround's new office in Tartu. There is a mat on the floor near the entrance door that says:

01010111011001010110110001100011011011110110110101100101

Using Ruby, we can quickly figure out what that actually means:

["01010111011001010110110001100011011011110110110101100101"].pack('B*') #==> Welcome
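
Going the other way works with unpack. A small sketch (the variable names are mine) that turns the text back into the bit string from the mat and checks the round trip:

```ruby
# unpack('B*') returns a one-element array holding the bits, most significant bit first.
bits = "Welcome".unpack("B*").first
puts bits

# One 8-bit group per character, which makes the string easier to read.
puts bits.scan(/.{8}/).join(" ")

# pack('B*') reverses the conversion.
puts [bits].pack("B*")
```

The first line printed is the same 56-digit string as on the mat, and the last line is Welcome again.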

Ruby string pack unpack detailed usage.

Thursday, June 9, 2011

GeekOut: The First Java Conference In Estonia

Today I attended the first Java conference in Estonia: GeekOut. It was also my first conference ever, as I had never participated in such an event before. Jumping ahead, I can say that the event was a success. It was well organized, informative and interesting, with great food and beer (although I do not drink alcohol :) ).

The day started with a small introduction from Jevgeni Kabanov, the founder and CTO of ZeroTurnaround, the organizers of GeekOut. After that the main part began.

The first talk was about Java 7 by Martijn Verburg. It was about the new features that are coming in JDK 7. Project Coin announced:

  • Strings in switch
  • Automatic Resource Management (like the C# using construct)
  • Numeric literals with underscores. (Like in ruby: 3_456_789)
  • Improved Type Inference for Generic Instance Creation
    We can skip generics on the right side:
    Map<String, List<String>> anagrams = new HashMap<>();
  • New file handling mechanism, which "is finally done right, I hope" (M.V.)
  • Nonblocking I/O for sockets and files, new Files and Paths classes that provide some util functions to work with filelike objects, that are quicker than in JDK 6, multicatch and some others...

The next talk was "Riding the data tsunami and coming out on top" by Alex Snaps. He introduced Terracotta - an open source solution for application scalability, availability and performance.

After a coffee break Joonas Lehtinen told us about Vaadin - a web framework for Java that allows creating rich internet applications fast and without a line of JavaScript. An interesting fact is that Vaadin is actually "10 years old, but 21 months young" - the framework has existed for a long time, but became popular only recently. You can write everything in Java, with no JavaScript debugging and no HTML, and if you are writing in some other JVM language, then you don't need Java either! Scala, Groovy, JRuby ... are all good. There was also a nice coding session, where Joonas wrote a nice web application in 15 minutes! I liked it, very impressive.

John Davies talked about application integration. Banking realities: FpML, 100-100000 messages/sec, latency is critical (10 ms), XML is evil (no time for <>), and messages are processed by network cards rather than the CPU, which shows how critical time is. 30PB of cache?! We also learned why mapping through a common message format is bad when you have a huge amount of data: you convert to the common format and back from it when you could convert directly. What we actually need to know is how to convert each field of the message between a POJO and the particular format. Using XPath on objects... and so on.

After lunch Peter Neubauer covered NoSQL databases, in particular the graph database Neo4j. It was an introductory presentation, showing the concepts of the database, with examples of searching and storing data, and also the conditions in which graph databases can and should be used instead of SQL databases.

Jevgeni Kabanov showed the problems one may face when updating a web application. For instance, an out of memory error arises when the previous version of the application is not garbage collected: any leak, no matter how small, holds a reference to the classloader, and the classloader holds a reference to the previous version of the application.

The last talk, presented once again by Martijn Verburg, was a relaxing one. Very funny, but at the same time serious. You have to figure out for yourself which qualities of a Diabolic Programmer are good, which are not, and which are good only to some extent.

The closing of the GeekOut conference was in a cafe in the same building. Free pizza and some drinks - a nice ending to a good day. Next time I will definitely attend GeekOut again!

The Slides:

PS. I will list all the slides of the presentations here as soon as they appear.

Wednesday, May 11, 2011

Google Code Jam 2011 Qualification Round

This Saturday the Google Code Jam 2011 Qualification Round took place. I didn't have much time to spend on the problems, but still solved a couple and advanced to Round 1.

The ones that I solved are Magicka and Candy Splitting.

Magicka

First I tried to solve it with some cunning string substitution involving regular expressions. But in the end the simple simulation did the trick. I actually was a little bit disappointed after the official solutions were announced, because I thought that there should be some trick in here.

#!/usr/bin/ruby

lines = ARGF.readlines

t = lines.first.to_i
(1..t).each do |test_id|
  arr = lines[test_id].split " "
  c = arr[0].to_i
  combines = arr[1..c].inject({}){|memo, e| memo[e[0..1]]=e[2]; memo[e[1]+e[0]]=e[2]; memo }
  d = arr[c+1].to_i
  opposed = arr[c+2..c+1+d]
  n = arr[c+2+d].to_i
  elems = arr[c+3+d] #.split("").collect {|i| i.to_sym }

  result = []
  elems.split("").each do |elem|
    result << elem
    if result.length > 1
      last2 = result[-2..-1].join
      if combines.key?(last2)
        result.pop(2)
        result = result << combines[last2]
      end
      opposed.each do |o|
        if result.include?(o[0]) && result.include?(o[1])
          result = []
          break
        end
      end
    end
  end

  puts "Case ##{test_id}: [#{result.join(", ")}]"
end

Candy Splitting

This one was a little bit trickier, and the author's solution is much simpler and clearer than mine. I made it using old school brute force, just like Goro (strength is my strength :) )

#!/usr/bin/ruby

def s( candies, pile1, pile2 )
 #p "c:#{candies}, 1:#{pile1}, 2:#{pile2}"
 if candies.empty?
  if !pile1.empty? && !pile2.empty? && pile1.inject(0){|memo, i| memo ^ i} == pile2.inject(0){|memo, i| memo ^ i}
   return pile1
  else
   return false
  end
 end
 x = candies.shift
 p1 = pile1.clone
 return s( candies, p1 << x, pile2) || s( candies, pile1, pile2 << x)
end

lines = ARGF.readlines

t = lines.first.to_i
(1..t).each do |test_id|
 n_candies = lines[test_id*2-1]
 candies = lines[test_id*2].split(" ").collect{|i| i.to_i }

 r = s( candies.sort.reverse, [], [] )
 r = !r ? "NO" : r.inject(:+)

 puts "Case ##{test_id}: #{r}"
end
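
For comparison, here is my own sketch of the simpler approach, written from memory rather than copied from the official analysis. Adding without carrying is the same as XOR, so the two piles can only look equal if the XOR of all candies is 0. When it is, any split works, so the best one is to give away only the smallest candy. The function name is my own.

```ruby
# Returns the largest sum that can be kept, or "NO" when no valid split exists.
def candy_split(candies)
  total_xor = candies.inject(0) { |memo, c| memo ^ c }
  # A non-zero XOR means the two piles can never look equal.
  return "NO" unless total_xor.zero?
  # Any split works, so keep everything except the smallest candy.
  candies.inject(:+) - candies.min
end

puts candy_split([1, 2, 3, 4, 5])
puts candy_split([3, 5, 6])
```

This needs only one pass over the candies instead of a recursive search.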