I am trying to export a partitioned table. By default, the export generates a number of files based on the number of partitions. Is there a property I can set to merge the files, and what are the performance considerations of making this change?
I found a few properties for merging small files, but all of them seem to work only within a partition:
set hive.merge.tezfiles=true;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=128000000;
set hive.merge.smallfiles.avgsize=128000000;
I also don't have the option of writing separate concatenation code to append the files at the end.
If I understood your question correctly, you could run
select * from table_name
and export the result into a file. The output will contain all the data, with the partition values in separate columns. More on beeline output in the official doc.
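As a rough sketch of that approach, the query can be run through beeline with its output redirected into a single local file. The JDBC URL, host, and output file name below are placeholders I made up for illustration; only table_name comes from the question. Adjust the connection string and authentication options to your cluster.

```shell
# Hypothetical connection details; replace with your HiveServer2 host and database.
JDBC_URL="jdbc:hive2://hiveserver.example.com:10000/default"
OUT_FILE="table_name.csv"   # hypothetical output file name

# csv2 emits plain comma-separated rows; --silent=true suppresses log noise;
# --showHeader=true keeps the column names (including partition columns) as the first line.
beeline -u "$JDBC_URL" \
  --outputformat=csv2 \
  --silent=true \
  --showHeader=true \
  -e "SELECT * FROM table_name" > "$OUT_FILE"
```

Keep in mind that this streams every row through a single client connection, so it is likely to be noticeably slower than a parallel export for large tables.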
I don't think merging the files across all partitions is a good approach, as it may lead to data corruption.