Why does the Tez engine add a reduce phase to even the simplest INSERT statement, and how can it be removed through configuration?


This is the Hive SQL statement:

insert into my_orc_table_25 select * from my_orc_table limit 5;

And these are the schemas:

CREATE TABLE my_orc_table (
    id INT,
    name STRING
)
STORED AS ORC;

CREATE TABLE my_orc_table_25 as select id,name from my_orc_table limit 25;

[Image: Explain plan]

My environment: Hive 3.1.0, Tez 0.10.2

I tried changing the following configuration settings, but the reduce phase is still in the plan:

set hive.compute.query.using.stats=false;
set hive.stats.fetch.column.stats=false;
set hive.stats.fetch.partition.stats=false;
set hive.groupby.skewindata=false;
set hive.exec.dynamic.partition=false;

There are 2 answers

OneCricketeer

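A reduce stage is not required just to write data to HDFS; a map-only job can write its output files directly.

In this query the reducer most likely comes from the limit 5 clause: each mapper can only apply the limit to its own input split, so Hive sends the rows through a single reducer to enforce the limit globally.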
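
One way to see what the reducer is actually for is to compare the plan of the original insert with the plan of the same insert without the LIMIT clause. This is only a sketch using the tables from the question; the assumption being tested (it is not confirmed by the posted plan) is that the LIMIT is what forces rows through a reducer.

```sql
-- Plan for the original statement from the question.
EXPLAIN
INSERT INTO my_orc_table_25
SELECT * FROM my_orc_table LIMIT 5;

-- Same insert without LIMIT, for comparison.
EXPLAIN
INSERT INTO my_orc_table_25
SELECT * FROM my_orc_table;
```

If a Reducer vertex still appears in the second plan, the operators listed under it (for example a Group By used for statistics) show what else is adding the stage.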

Raid

From the explain plan, I can see that the reducer is computing statistics and merging files.

You can try disabling both with the settings below:

set hive.stats.autogather=false;
set hive.merge.tezfiles=false;
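
To confirm whether these settings change anything, they can be applied in the same session and the plan checked again. The sketch below assumes the tables from the question. The hive.stats.column.autogather line is an addition not mentioned in the answer: in Hive 3.x, column statistics are controlled by their own property, so it may also need to be turned off.

```sql
-- Settings suggested in the answer.
SET hive.stats.autogather=false;
SET hive.merge.tezfiles=false;

-- Assumption: column statistics are gathered separately in Hive 3.x.
SET hive.stats.column.autogather=false;

-- Re-check the plan; look for any remaining Reducer vertex.
EXPLAIN
INSERT INTO my_orc_table_25
SELECT * FROM my_orc_table LIMIT 5;

-- With autogather off, statistics can be refreshed later on demand.
ANALYZE TABLE my_orc_table_25 COMPUTE STATISTICS;
```

Note the trade-off: with automatic statistics disabled, the optimizer works from stale or missing statistics until ANALYZE TABLE is run.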