I want to download and process a CSV file that is on an SFTP server, line by line.
If I use download! or sftp.file.open, the whole file is buffered in memory, which I want to avoid.
Here is my source code:
sftp = Net::SFTP.start(@sftp_details['server_ip'], @sftp_details['server_username'], :password => decoded_pswd)
if sftp
  begin
    sftp.dir.foreach(@sftp_details['server_folder_path']) do |entry|
      print_memory_usage do
        print_time_spent do
          if entry.file? && entry.name.end_with?("csv")
            batch_size_cnt = 0
            sftp.file.open("#{@sftp_details['server_folder_path']}/#{entry.name}") do |file|
              header = file.gets
              header = header.force_encoding(header.encoding).encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
              csv_data = ''
              while line = file.gets
                batch_size_cnt += 1
                csv_data.concat(line.force_encoding(line.encoding).encode('UTF-8', invalid: :replace, undef: :replace, replace: ''))
                if batch_size_cnt == 1000 || file.eof?
                  CSV.parse(csv_data, {headers: header, write_headers: true}) do |row|
                    row.delete(nil)
                    entities << row.to_hash
                  end
                  csv_data, batch_size_cnt = '', 0
                  courses.delete_if(&:blank?)
                  # DO PROCESSING PART
                  entities = []
                end
              end if header
            end
            sftp.rename("#{@sftp_details['server_folder_path']}/#{entry.name}", "#{@sftp_details['processed_file_path']}/#{entry.name}")
          end
        end
      end
    end
  end
end
Can someone please help? Thanks
You need some kind of buffer so that you can read the file in chunks and then write them together. I think it would be wise to split your script into parsing and downloading, and to focus on one thing at a time.
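One way to keep parsing separate from downloading is to put the batching logic in a method that only needs an object responding to `gets`. The following is a minimal sketch, not the asker's code: the helper names `each_csv_batch` and `clean_utf8` are hypothetical, the batch size of 1000 is taken from the question, and it assumes the file is UTF-8 and that no quoted field contains a line break.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: drops invalid byte sequences so the line is valid UTF-8.
# Assumes the remote file is meant to be UTF-8.
def clean_utf8(str)
  str.dup.force_encoding('UTF-8').scrub('')
end

# Hypothetical helper: reads `io` line by line and yields arrays of row hashes,
# at most `batch_size` rows per array. `io` only needs to respond to `gets`,
# so a StringIO, a local File, or an SFTP file handle should all work.
# Assumes each CSV record fits on one line (no embedded newlines in fields).
def each_csv_batch(io, batch_size: 1000)
  header_line = io.gets
  return if header_line.nil?

  headers = CSV.parse_line(clean_utf8(header_line))
  batch = []
  while (line = io.gets)
    values = CSV.parse_line(clean_utf8(line))
    next if values.nil? || values.empty? # skip blank lines

    batch << headers.zip(values).to_h
    if batch.size >= batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty? # flush the final partial batch
end
```

With the code from the question, this could be called as `sftp.file.open(path) { |file| each_csv_batch(file) { |rows| ... } }`, so only one batch of rows is held at a time on the parsing side.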
Your original line is the `sftp.file.open(...)` call.
If you check the source of the `download!` method (don't forget the bang!), you will see that it can write to a `StringIO`, which gives you a stub you can easily adjust. The default read buffer of 32 kB is usually sufficient, but you can change it with the `:read_size` option. Replace the `sftp.file.open` call with a `download!` call (this works only with single files).
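As a sketch of that replacement, assuming `sftp` is the open `Net::SFTP` session from the question (the helper name `download_to_string` is hypothetical): when the local target is `nil`, `download!` returns the file contents as a string, and `:read_size` sets how many bytes are requested per read.

```ruby
# Hypothetical helper wrapping Net::SFTP's download!.
# `sftp` is assumed to be an open Net::SFTP session (e.g. from Net::SFTP.start).
# Passing nil as the local target makes download! return the contents as a String,
# so the whole file ends up in memory once the call returns.
def download_to_string(sftp, remote_path, read_size: 32_000)
  sftp.download!(remote_path, nil, read_size: read_size)
end
```

For example, `download_to_string(sftp, "#{@sftp_details['server_folder_path']}/#{entry.name}", read_size: 16_000)` would fetch one file using a smaller read size.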
For the `StringIO` usage, pass a `StringIO` object as the local target of `download!`. Or you can just download the file to a local path.
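Both variants can be sketched as follows. The helper names `each_remote_line` and `download_and_read` are hypothetical, and `sftp` is again assumed to be an open `Net::SFTP` session. The first helper collects the download in a `StringIO`; the second saves it to a local file and then streams that file from disk.

```ruby
require 'stringio'

# Hypothetical helper: downloads `remote_path` into a StringIO and yields each line.
# The StringIO still holds the full file contents in memory.
def each_remote_line(sftp, remote_path, read_size: 32_000)
  io = StringIO.new
  sftp.download!(remote_path, io, read_size: read_size)
  io.rewind # make sure reading starts at the beginning of the buffer
  io.each_line { |line| yield line }
end

# Hypothetical helper: downloads `remote_path` to `local_path` on disk,
# then reads the local copy one line at a time with File.foreach.
def download_and_read(sftp, remote_path, local_path)
  sftp.download!(remote_path, local_path)
  File.foreach(local_path) { |line| yield line }
end
```

If memory is the main concern, the second variant is probably the better fit, since `File.foreach` reads the local copy line by line instead of keeping it in a buffer.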
From the docs, you can use the `:read_size` option, which sets the maximum number of bytes to read at a time from the source.