Wednesday 22 February 2017

Suddenly

I decided to start writing again, but I moved to github pages.
Follow me on https://dracoater.github.io

Saturday 14 December 2013

Testing Chef Cookbooks. Part 2.5. Speeding Up ChefSpec Run

Disclaimer: as of Jan 4 2014, this is already implemented inside ChefSpec (>= 3.1.2), so you don't have to do anything. The post just describes the problem and solution with more details.

Last time we were speaking about testing Chef recipes, I introduced to you ChefSpec as a very good tool for running unit tests on your cookbooks. But lately I encountered a problem with it. I have over 800 unit tests (aka examples in RSpec world) and now it takes about 20 minutes to run. 20 minutes!!! That's a extremely long time for this kind of task. So I decided to delve, what exactly is responsible for taking so much time.

My examples look like that (many recipes have similar example groups for windows and mac_os_x):

describe "example::default" do
  context 'ubuntu' do
    subject { ChefSpec::ChefRunner.new( :platform => 'ubuntu', :version => '12.04' ).converge described_recipe }
    let( :node ) { subject.node }

    it { should do_something }
    it 'does some other thing' do
      should do_another_thing
    end
    it { should do_much_more }
  end
end

I put some printouts inside describe, context, subject and let blocks, as well as read RSpec documentation about let and subject. Turned out, that subject and let blocks are called for every test, i.e. they are cached when accessed inside 1 test (it block), but not across tests inside test group (in our case ubuntu context). So for these tests subject is actually calculated 3 times. That is not a problem for ordinary RSpec tests, where subject most of the time is an object returned by constructor, e.g. User.new. But in ChefSpec case we have a converge operation as Subject under Test (SuT), which is more costly and takes more time to calculate. Another difference is that, opposing to ordinary RSpec tests we do not change the SuT in ChefSpec, but just make sure that it has right resources with right actions. So running converge for every example is a huge overhead.

How can we fix that? Well, obviously we should somehow save the value across the examples. I tried different approaches, some of them worked partially, some didn't at all. The simplest thing was to use before :all block.

describe "example::default" do
  context 'ubuntu' do
    before :all { @chef_run = ChefSpec::ChefRunner.new( :platform => 'ubuntu', :version => '12.04' ).converge described_recipe }
    subject { @chef_run }
    [...]
  end
end

It does not require any more than small change in spec files, but the drawback of this approach is no mocking is supported in before :all block. So if you have to mock for example file existence, it would not work:

describe "example::default" do
  context 'ubuntu' do
    before :all do
      ::File.stub( :exists? ).with( '/some/path/' ).and_return false
      @chef_run = ChefSpec::ChefRunner.new( :platform => 'ubuntu', :version => '12.04' ).converge described_recipe
    end
    subject { @chef_run }
    [...]
  end
end

RSpec allows to extend modules with your own methods and the idea was to write method similar to let, but which will cache the results across examples too. Create a spec_helper.rb file somewhere in your Chef project and add the following lines there:

module SpecHelper
  @@cache = {}
  FINALIZER = lambda {|id| @@cache.delete id }

  def shared( name, &block )
    location = ancestors.first.metadata[:example_group][:location]
    define_method( name ) do
      unless @@cache.has_key? Thread.current.object_id
        ObjectSpace.define_finalizer Thread.current, FINALIZER
      end
      @@cache[Thread.current.object_id] ||= {}
      @@cache[Thread.current.object_id][location + name.to_s] ||= instance_eval( &block )
    end
  end

  def shared!( name, &block )
    shared name, &block
    before { __send__ name }
  end
end

RSpec.configure do |config|
  config.extend SpecHelper
end

Values from @@cache are never deleted, and you can use same names with this block, so I also use location of the usage, which looks like that: "./cookbooks/my_cookbook/spec/default_spec.rb:3". Now change subject into shared( :subject ) in your specs:

describe "example::default" do
  context 'ubuntu' do
    shared( :subject ) { ChefSpec::ChefRunner.new( :platform => 'ubuntu', :version => '12.04' ).converge described_recipe }
    [...]
  end
end

And when running the tests you will now have to include the spec_helper.rb too:

rspec --include ./relative/path/spec_helper.rb cookbooks/*/spec/*_spec.rb

If you use the rake task I introduced in previous post, add the following line to it.

desc 'Runs specs with chefspec.'
RSpec::Core::RakeTask.new :spec, [:cookbook, :recipe, :output_file] do |t, args|
  [...]
  t.rspec_opts += ' --require ./relative/path/spec_helper.rb'
  [...]
end

And that's all! Now tests run in 2 minutes. 10 times faster!

Wednesday 2 October 2013

Extracting Files From tar.gz With Ruby

I always thought that it should be a trivial task. There are even some stackoverflow answers on that topic, but there is actually a catch that none of the answers talks about. Originally tar did not support paths longer than 100 chars. GNU tar is better and they implemented support for longer paths, but it was made through a hack called ././@LongLink. Shortly speaking, if you stumble upon an entry in tar archive which path equals to above mentioned ././@LongLink, that means that the following entry path is longer than 100 chars and is truncated. The full path of the following entry is actually the value of the current entry. So when extracting files from tar we also must have in mind this possibility.
require 'rubygems/package'
require 'zlib'

TAR_LONGLINK = '././@LongLink'
tar_gz_archive = '/path/to/archive.tar.gz'
destination = '/where/extract/to'

Gem::Package::TarReader.new( Zlib::GzipReader.open tar_gz_archive ) do |tar|
  dest = nil
  tar.each do |entry|
    if entry.full_name == TAR_LONGLINK
      dest = File.join destination, entry.read.strip
      next
    end
    dest ||= File.join destination, entry.full_name
    if entry.directory?
      FileUtils.rm_rf dest unless File.directory? dest
      FileUtils.mkdir_p dest, :mode => entry.header.mode, :verbose => false
    elsif entry.file?
      FileUtils.rm_rf dest unless File.file? dest
      File.open dest, "wb" do |f|
        f.print entry.read
      end
      FileUtils.chmod entry.header.mode, dest, :verbose => false
    elsif entry.header.typeflag == '2' #Symlink!
      File.symlink entry.header.linkname, dest
    end
    dest = nil
  end
end

Monday 23 September 2013

Testing Chef Cookbooks. Part 2. Chefspec.

So now you have less errors and typos in your cookbooks, thanks to foodcritic. But you are still far from confident that your cookbook will not fail to run on some node. Next step for acquiring it is unit tests (aka specs in ruby world).

Ruby has already a great spec library useful for unit testing every kind of project - it's called rspec. Many specialized unit test libraries are based on it and so is chefspec - the gem to write unit tests for your cookbooks.

Chefspec makes it easy to write unit tests for Chef recipes, get feedback fast on changes in cookbooks. So first let's install it.

sudo gem install rake chefspec --no-ri --no-rdoc

This will also add create_specs command to knife, which creates specs for particular existing cookbook:

knife cookbook create_specs my_cookbook

After this you will get a separate *_spec.rb file in my_cookbook/specs/ for every recipe file. Chefspec readme has very good examples teaching how to write tests. A couple of things I personally do different is I use subject and should instead of let(:chef_run) and expect(chef_run).to, because it allows to omit subject in some cases: (Read why RSpec developers actually recommend using expect_to syntax)

#Chefspec recommendations
describe "example::default" do
  let( :chef_run ){ ChefSpec::ChefRunner.new.converge described_recipe }
  it { expect(chef_run).to do_something }
  it 'does some other thing' do
    expect(chef_run).to do_another_thing
  end
end

#My typical specs
describe "example::default" do
  subject { ChefSpec::ChefRunner.new.converge described_recipe }
  it { should do_something }
  it 'does some other thing' do
    should do_another_thing
  end
end

We can also integrate it with Jenkins by making rspec output results in JUnit xml format that Jenkins understands. We need another gem for that:
sudo gem install rake rspec_junit_formatter --no-ri --no-rdoc
Now we can run rspec with the following parameters and it will output test results into test-results.xml:
rspec my_cookbook --format RspecJunitFormatter --out test-results.xml
Rspec also supports rake, so it may be more convenient to use it to run specs on your cookbooks:
desc 'Runs specs with chefspec.'
RSpec::Core::RakeTask.new :spec, [:cookbook, :recipe, :output_file] do |t, args|
 args.with_defaults( :cookbook => '*', :recipe => '*', :output_file => nil )
 t.verbose = false
 t.fail_on_error = false
 t.rspec_opts = args.output_file.nil? ? '--format d' : "--format RspecJunitFormatter --out #{args.output_file}"
 t.ruby_opts = '-W0' #it supports ruby options too
 t.pattern = "cookbooks/#{args.cookbook}/spec/#{args.recipe}_spec.rb"
end

Tuesday 17 September 2013

Testing Chef Cookbooks. Part 1. Foodcritic.

This post have been in draft for almost a year. At last I have made myself to turn back to my blog and continue writing.

A couple of years ago we have adopted automating our server configuration through chef recipes. In the beginning we didn't have many cookbooks and all the recipes were more or less simple. But as time passed new applications had to be installed and configured, including some not so simple scenarios. Turned out that "hit and miss" method wasn't too good. We came across a lot of errors, such as:
  • typo: simple as that. Although knife makes a syntax check on uploading cookbooks to server, but many times it was a wrong path or something like that, which could be revealed only after we tried to provision a node.
  • missed dependencies: we ran our scripts on ec2 instances, and to save time on bootstrapping the new instance, we didn't do it every time we changed something in our scripts, but only the first time. When we finally got the recipe working on this node, sometimes it turned out that the recipe will not be working on another clean node, because we forgot some dependencies, that were somehow already installed on our test node.
  • interfering applications: some recipes were used for configuring several applications. Although they were very similar, they were not identical, and sometimes changing the recipe for installing one application broke installation of the other one.

At last we came to the idea, that infrastructure code as any other code should be tested. Currently we have built a pipeline using Jenkins that first runs syntax check, then coding conventions tests, then unit tests and finally integration tests on cookbooks. And only if all the tests are passing, Jenkins runs knife cookbook upload, publishing the cookbooks on chef-server.

In my following posts I will share with you how we established this testing architecture for our chef recipes. There will 3 parts in the tutorial, we will start from the easiest checks that only make sure that your ruby code can be parsed and follows some code conventions.

First of all make sure you are running ruby 1.9.2 or newer. (We use Ubuntu on our linux servers, so all the code I provide is tested in Ubuntu 12.04 LTS.)
sudo aptitude install ruby1.9.3
Now we need rake for building our project and foodcritic for testing.
sudo gem install rake foodcritic --no-ri --no-rdoc
Chef cookbook repository has a Rakefile in it. Now when we have rake installed, we can try and run rake inside the folder Rakefile is in.
rake
This should run syntax check on your recipes, templates and files inside cookbooks. Next we can try to run foodcritic tests on your cookbooks. Type
foodcritic cookbooks
and hit enter (assuming that "cookbooks" is the folder your cookbooks are in). If it founds some warnings, it will print them out, otherwise an empty string will be printed.

Of course you may not agree with some of the rules and would like to ignore them. This could be achieved by providing additional options to foodcritic command. You can also write your own rules and add them into checks. For example, there was a FC001 rule, which stated "Use strings in preference to symbols to access node attributes". I actually preferred vice versa using symbols to strings. So I created a new rule:
rule "JT001", "Use symbols in preference to strings to access node attributes" do
 tags %w{style attributes jt}
 recipe do |ast|
  attribute_access( ast, :type => :string )
 end
end
and saved it into foodcritic-rules.rb file. Then it was easy to disable the existing FC001 rule and enable mine with:
foodcritic cookbooks --include foodcritic-rules.rb --tags ~FC001
There are also some 3rd party rules available, so you have something to start with.

Now we will join rake and foodcritic together by creating a rake task that runs foodcritic tests. Add a new rake task similar to:
desc "Runs foodcritic linter"
task :foodcritic do
  if Gem::Version.new("1.9.2") <= Gem::Version.new(RUBY_VERSION.dup)
    sh "foodcritic cookbooks --epic-fail correctness"
  else
    puts "WARN: foodcritic run is skipped as Ruby #{RUBY_VERSION} is < 1.9.2."
  end
end
task :default => 'foodcritic'
Now when you run rake it should run both syntax and code conventions tests.

Next post will be about unit tests using chefspec.

Monday 10 September 2012

How to Run Dynamic Cloud Tests with 800 Tomcats, Amazon EC2, Jenkins and LiveRebel



I was brainstorming in the shower the other day, and I thought "Eureka!" - I need to bootstrap and test my Java app on a dynamic cluster with 800 Tomcat servers right now! Then, breakfast.

Obviously, every now and then you need to build a dynamic cluster of 800 Tomcat machines and then run some tests. Oh, wait, you don’t? Well, lets say you do. Provisioning your machines on the cloud for testing is a great way to "exercise" your app and work on:
  • Warming up: Bootstrap a clean slate, install the software, run your tests
  • Checking your Processes: Smoke testing for deploying the app to production
  • Ensuring success: Checking load handling before launching the application to real clientelle
  • Leaving nothing behind: After you've got all green lights, shut it all down and watch it disappear

At ZeroTurnaround, we need this for testing LiveRebel with larger deployments. LiveRebel is a tool to manage JEE production and QA deployments and online updates. It is crucial that we support large clusters of machines. Testing such environments is not an easy task but luckily in 2012 it is not about buying 800 machines but only provisioning them in the cloud for some limited time. In this article I will walk you through setting up a dynamic Tomcat cluster and running some simple tests on them. (Quick note: When I started writing this article, we had only tested this out with 100 Tomcat machines, but since then we grew to be able to support 800 instances with LiveRebel and the other tools).

Technical Requirements

Let me define a bunch of non-functional requirements that I've thought up. The end result should have 800 Tomcat nodes, each configured with LiveRebel. A load balancer should sit in front of the nodes and provide a single URL for tests, and we'll use the LiveRebel Command Center to manage our deployments.

Naturally, this is all easier said than done. The hurdles that we will need to overcome to achieve this are:
  • Provisioning all the nodes - starting/stopping Amazon EC2 instances
  • Installing required software - Java, Tomcat, LiveRebel
  • Configuring Tomcat cluster - (think jvmRoute param in conf/server.xml)
  • Configuring a load balancer (Apache) with all the provisioned IP addresses
  • Automation - one-click provision/start/stop/terminate on the cluster using Jenkins

Tools

We chose Amazon AWS as our cloud provider, namely because we've become familiar with them over the last couple years. For provisioning we use Knife and for configuration management we like Chef. For automation, we went with Jenkins (I love Jenkins), and we have two jobs. One to start the cluster and one to stop the cluster. Tests are not automated at the moment. Before going further you have to have a Chef server running on some machine (it should not be necessarily your own workstation) and Knife installed and configured on your Jenkins machine.

Architecture

Loadbalancer

First we have to create/launch a loadbalancer instance. Software to configure:

  • Install Apache
  • Install/enable Apache loadbalancer module
  • Update Apache configuration
  • Install LiveRebel Command Center and start it (could be a separate machine but we’ll use this instance for 2 services)

The load balancer should check in with chef-server and provide his own IP address. LiveRebel Command Center should be running and accepting incoming requests on default port (9001).

LiveRebel node(s)

As soon as the load balancer is ready we will create/launch nodes. Node instances need to:

  • Install a Tomcat instance
  • Figure out the IP of the load balancer
  • Download lr-agent-installer.jar from the LiveRebel CC
  • Run it (java -jar lr-agent-installer.jar)
  • Start Tomcat
Again, when everything is done the node will check in with chef-server and provide its IP address.

After all nodes are ready we must update the Apache load balancer configuration and provide all the IP addresses of the nodes. This is because of the architecture of the load balancer. It needs to know the IP addresses of the machines it balances.

Code (The Fun Part)

As we are using Chef, the natural way to act is to create several cookbooks and a couple of roles, that will help us with configuration. There are four cookbooks in total, one for each application: Apache, lrcc, Tomcat and Java. You can get familiar with them on Github. The code is provided more just for information, because it will not run as is. There are some download links missing. Another thing is that it was tested only on Ubuntu, so if you are using some other distribution, you may need to tune it up.
We are going to use Knife command line tool to start and bootstrap our instances. Don’t forget to install and configure Knife-EC2 rubygem. First step is to create the load balancer. Provided you have configured the Knife EC2 plugin and prepared the right AMI to launch (or use the default provided by Ubuntu) it is relatively easy, just run (with right parameters):

knife ec2 server create --run-list role[lr-loadbalancer] --node-name ‘loadbalancer’

When the process finishes successfully you can go to https://your-server-address:9001/ and check if the LiveRebel is running. It should be, but you will have to register or provide a license file. If you already have a license file you can automate the registration step by copying the license into LiveRebel home folder in your cookbook. Another thing to check is - if the load balancer has registered with chef-server.
Next step - creating lr-nodes. Your 800 nodes can be created using a similar Knife EC2 command run in loop:

for i in {1..800} ; do
    knife ec2 server create --run-list role[lr-node] --node-name ‘lr-node-$i’
done

Everything is almost ready! All we need now is to create Jenkins jobs. The first one - we’ll name it lr-cluster-create - should run these 2 commands and start the cluster. And the other one lr-cluster-delete - stops it with these commands:

ids=`knife ec2 server list | grep "lr-node|loadbalancer" | awk {'print $1'} | tr 'n' ' '`
knife ec2 server delete $ids --yes
knife node bulk delete "lr-node-.*" --yes
knife client bulk delete "lr-node-.*" --yes
knife node delete "loadbalancer" --yes
knife client delete "loadbalancer" --yes

Conclusions

At this point, you should be well on your way towards bootstrapping a clean environment, installing, running your tests, checking load handling, and then you can shut it all down once you've seen everything working to your satisfaction.

Your two Jenkins jobs are now able to spawn a dynamic Tomcat cluster. You can even parameterize your job and supply a number of nodes that you are interested in for a really dynamic cluster.
One thing to note is that as in EC2, Amazon charges for EBS snapshots, so it is not very cost-effective to just stop the cluster. Termination here will save you some money, especially if you like bigger clusters.
Another thing is provisioning. Parallel provisioning for the 800 nodes takes roughly 30 minutes. Starting a new instance from AMI takes some time, but most of it goes to bootstrapping the clean environment with the Chef installation and downloading packages and archives.
Once you have the cluster started you still need to run tests. We test deploying, updating the whole cluster with LiveRebel. You could be testing your own WEB application and see how it handles the load.
The next steps for us is to automate the test suite and have these large scales tests executed regularly. This will give us valuable feedback about releases in progress and their scalability.
I hope this article has helped you get started with dynamic Tomcat clusters and I’m more than happy to go into more detail about any step here if you have questions - just contact me.

Tuesday 27 March 2012

GeekOut 2012 14-15 June

The GeekOut is back, now twice as long, informative, interesting and exiting! :D Registration has opened - become an early geek and get the lowest price. Don't forget to check the programme out.

Thursday 22 December 2011

Stanford AI Class Finished

Stanford AI class is over. I was very interesting and I learned a lot new things. Unfortunately I sucked at the first 2 questions of final exam and my score is not so high as I would like it to be. But there will be new interesting courses in spring - I will take my "revenge" there. So far I have signed up to Machine Learning, Game Theory and Design and Analysis of Algorithms courses and I hope I will have enough time for that. :)