Wednesday 2 October 2013

Extracting Files From tar.gz With Ruby

I always thought that it should be a trivial task. There are even some stackoverflow answers on that topic, but there is actually a catch that none of the answers talks about. Originally tar did not support paths longer than 100 chars. GNU tar is better and they implemented support for longer paths, but it was made through a hack called ././@LongLink. Shortly speaking, if you stumble upon an entry in tar archive which path equals to above mentioned ././@LongLink, that means that the following entry path is longer than 100 chars and is truncated. The full path of the following entry is actually the value of the current entry. So when extracting files from tar we also must have in mind this possibility.
require 'rubygems/package'
require 'zlib'

TAR_LONGLINK = '././@LongLink'
tar_gz_archive = '/path/to/archive.tar.gz'
destination = '/where/extract/to'

Gem::Package::TarReader.new( Zlib::GzipReader.open tar_gz_archive ) do |tar|
  dest = nil
  tar.each do |entry|
    if entry.full_name == TAR_LONGLINK
      dest = File.join destination, entry.read.strip
      next
    end
    dest ||= File.join destination, entry.full_name
    if entry.directory?
      FileUtils.rm_rf dest unless File.directory? dest
      FileUtils.mkdir_p dest, :mode => entry.header.mode, :verbose => false
    elsif entry.file?
      FileUtils.rm_rf dest unless File.file? dest
      File.open dest, "wb" do |f|
        f.print entry.read
      end
      FileUtils.chmod entry.header.mode, dest, :verbose => false
    elsif entry.header.typeflag == '2' #Symlink!
      File.symlink entry.header.linkname, dest
    end
    dest = nil
  end
end