Download large csv file from sftp server in chunks ruby


I want to download and process a CSV file that is on an SFTP server, line by line. If I use download! or sftp.file.open, the whole file is buffered in memory, which I want to avoid.

Here is my source code:

sftp = Net::SFTP.start(@sftp_details['server_ip'], @sftp_details['server_username'], :password => decoded_pswd)
  if sftp
    begin
      sftp.dir.foreach(@sftp_details['server_folder_path']) do |entry|
        print_memory_usage do
          print_time_spent do
            if entry.file? && entry.name.end_with?("csv")
              batch_size_cnt = 0
              entities = []
              sftp.file.open("#{@sftp_details['server_folder_path']}/#{entry.name}") do |file|
                header = file.gets
                header = header.force_encoding(header.encoding).encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
                csv_data = ''
                while line = file.gets
                  batch_size_cnt += 1
                  csv_data.concat(line.force_encoding(line.encoding).encode('UTF-8', invalid: :replace, undef: :replace, replace: ''))
                  if batch_size_cnt == 1000 || file.eof?
                    CSV.parse(csv_data, headers: header, write_headers: true) do |row|
                      row.delete(nil)
                      entities << row.to_hash
                    end
                    csv_data, batch_size_cnt = '', 0
                    entities.delete_if(&:blank?)
                    # DO PROCESSING PART
                    entities = []
                  end
                end if header
              end
              sftp.rename("#{@sftp_details['server_folder_path']}/#{entry.name}", "#{@sftp_details['processed_file_path']}/#{entry.name}")
            end
          end
        end
      end
    end
  end

Can someone please help? Thanks

Answer by tukan

You need some kind of buffer so that you can read the file in chunks and write the chunks out as they arrive. I think it would also be wise to separate downloading from parsing in your script and focus on one thing at a time:

Your original line:

   ...
   sftp.file.open("#{@sftp_details['server_folder_path']}/#{entry.name}") do |file|
   ...

If you check the source of the download! method (don't forget the bang!), you will see that it accepts an IO object as the local target, so you can use a StringIO (from the 'stringio' library). Below is a stub that you can easily adjust. The default read buffer, which is 32,000 bytes, is usually sufficient. You can change it if you want (see the example).

Replace it with one of the following (this works only with single files):

The StringIO usage:

   ...
  io = StringIO.new
  sftp.download!("#{@sftp_details['server_folder_path']}/#{entry.name}", io, :read_size => 16000)
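
Keep in mind that a StringIO is an in-memory buffer: every chunk written to it is appended to one string, so once the download finishes the whole file still sits in memory. A minimal sketch of that behaviour, where the chunks array is made up and only stands in for the data download! would write:

```ruby
require 'stringio'

# Made-up chunks standing in for what download! would write; not real SFTP data.
chunks = ["id,name\n1,", "a\n2,b\n"]

io = StringIO.new
# Each write appends to the same in-memory string, so memory grows with the file.
chunks.each { |chunk| io.write(chunk) }

# Rewind before reading the accumulated content back line by line.
io.rewind
lines = io.each_line.map(&:chomp)
```

This is fine for small files, but for a large CSV a target on disk keeps memory usage flat.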

Or you can just download the file to local disk:

  ...
  # The block form closes the local file even if the download raises.
  File.open("/your_local_path/#{entry.name}", 'wb') do |file|
    sftp.download!("#{@sftp_details['server_folder_path']}/#{entry.name}", file, :read_size => 16000)
  end
  ...

From the docs, you can use the :read_size option:

:read_size - the maximum number of bytes to read at a time from the source. Increasing this value might improve throughput. It defaults to 32,000 bytes.
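
Putting the two steps together, here is a rough sketch rather than a tested recipe: download the file to a temporary local path first, then stream it from disk in batches of rows. The helper name each_remote_csv_batch and its batch_size parameter are made up for illustration, sftp is assumed to be a Net::SFTP session (anything that responds to download! will do), and the encoding clean-up from the question is left out.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper, not part of Net::SFTP: downloads one remote CSV to a
# temporary directory, then yields arrays of row hashes, batch_size rows at most.
def each_remote_csv_batch(sftp, remote_path, batch_size: 1000, read_size: 32_000)
  Dir.mktmpdir do |dir|
    local_path = File.join(dir, File.basename(remote_path))

    # Step 1: download only. The data goes to disk, reading at most
    # read_size bytes at a time from the remote file.
    sftp.download!(remote_path, local_path, { read_size: read_size })

    # Step 2: parse only. CSV.foreach without a block returns an enumerator
    # that reads the file row by row; each_slice groups the rows into batches.
    CSV.foreach(local_path, headers: true).each_slice(batch_size) do |rows|
      yield rows.map(&:to_h)
    end
  end
  # The temporary directory and the downloaded copy are removed here.
end
```

Inside the foreach loop from the question it would be called roughly as each_remote_csv_batch(sftp, "#{@sftp_details['server_folder_path']}/#{entry.name}") { |rows| ... }, so that only one batch of rows is held in memory at a time.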