Wednesday, 2 October 2013

Extracting Files From tar.gz With Ruby

I always thought this should be a trivial task. There are even some Stack Overflow answers on the topic, but there is a catch that none of them mentions. Originally, tar did not support paths longer than 100 characters. GNU tar added support for longer paths, but through a hack called ././@LongLink. In short, if you come across an entry in a tar archive whose path equals ././@LongLink, the path of the following entry is longer than 100 characters and has been truncated. The full path of that following entry is stored as the contents of the current ././@LongLink entry. So when extracting files from a tar archive, we have to keep this possibility in mind.
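require 'rubygems/package'
require 'zlib'
require 'fileutils'

TAR_LONGLINK = '././@LongLink'
tar_gz_archive = '/path/to/archive.tar.gz'
destination = '/where/extract/to'

Gem::Package::TarReader.new( Zlib::GzipReader.open tar_gz_archive ) do |tar|
  dest = nil
  tar.each do |entry|
    # A ././@LongLink entry carries the full path of the NEXT entry as its data.
    if entry.full_name == TAR_LONGLINK
      dest = File.join destination, entry.read.strip
      next
    end
    # Use the long path if the previous entry provided one.
    dest ||= File.join destination, entry.full_name
    if entry.directory?
      FileUtils.rm_rf dest unless File.directory? dest
      FileUtils.mkdir_p dest, :mode => entry.header.mode, :verbose => false
    elsif entry.file?
      FileUtils.rm_rf dest unless File.file? dest
      File.open dest, "wb" do |f|
        f.print entry.read
      end
      FileUtils.chmod entry.header.mode, dest, :verbose => false
    elsif entry.header.typeflag == '2' #Symlink!
      File.symlink entry.header.linkname, dest
    end
    # Reset so the next entry does not reuse this path.
    dest = nil
  end
end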
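
To see the ././@LongLink rule on its own, without a real archive, here is a minimal sketch that only resolves entry paths. FakeEntry, LONGLINK_NAME and resolve_entry_paths are hypothetical names made up for this illustration and are not part of RubyGems. The sketch assumes the long path arrives NUL-terminated in the entry data.

```ruby
# Hypothetical stand-in for a tar entry: just a name and its payload.
FakeEntry = Struct.new(:full_name, :data)

LONGLINK_NAME = '././@LongLink'

# Returns the real path of every regular entry, taking it from the payload
# of a preceding ././@LongLink entry whenever there is one.
def resolve_entry_paths(entries)
  pending_long_name = nil
  paths = []
  entries.each do |entry|
    if entry.full_name == LONGLINK_NAME
      # Assumption: the payload is the full path followed by a NUL byte.
      pending_long_name = entry.data.delete("\0")
      next
    end
    paths << (pending_long_name || entry.full_name)
    # A long name applies to exactly one following entry.
    pending_long_name = nil
  end
  paths
end

long_path = File.join('dir', 'x' * 120, 'file.txt')
entries = [
  FakeEntry.new('short.txt', 'hello'),
  FakeEntry.new(LONGLINK_NAME, long_path + "\0"),
  FakeEntry.new(long_path[0, 100], 'payload'), # truncated name in the header
]
puts resolve_entry_paths(entries)
```

The important part is resetting the pending name after each regular entry; otherwise one long path would leak into every entry that follows it.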