Why should the number of buckets in Hive be equal to the number of reducers?

Asked by Ramprakash on 03 August 2017 at 06:42

In Hive, why should the number of buckets be equal to the number of reducers?

There are 2 answers

Archit Agarwal On 29 January 2019 at 16:14

The number of reducers launched while inserting into a bucketed table is always a divisor of the number of buckets in that table. Hive selects the largest divisor that does not exceed the configured maximum (hive.exec.reducers.max) and launches that many reducers.

Example:

Number of buckets in the table: 5956
hive.exec.reducers.max=1009
5956 = 4 * 1489 (1489 is prime), so the divisors of 5956 are 1, 2, 4, 1489, 2978 and 5956
Number of launched reducers: 4

The realistic choices are 4 or 1489 reducers. Since at most 1009 reducers can be launched, 1489 is ruled out and only 4 reducers run, which can be painfully slow for a large table.

Raising the cap, for example with hive.exec.reducers.max=2000, lets Hive launch 1489 reducers instead.
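The selection rule described above can be sketched in a few lines of Python. This is only a model of the behavior described in this answer, not Hive's actual source code; the function name `pick_reducer_count` is made up for illustration.

```python
def pick_reducer_count(num_buckets: int, max_reducers: int) -> int:
    """Return the largest divisor of num_buckets that is <= max_reducers.

    Hypothetical helper that mirrors the rule described in the answer;
    it is not taken from the Hive code base.
    """
    if num_buckets < 1 or max_reducers < 1:
        raise ValueError("both arguments must be positive")
    # Walk downwards from the cap; the first divisor found is the largest.
    for candidate in range(min(num_buckets, max_reducers), 0, -1):
        if num_buckets % candidate == 0:
            return candidate
    return 1  # never reached, because 1 divides every positive integer


print(pick_reducer_count(5956, 1009))  # 4
print(pick_reducer_count(5956, 2000))  # 1489
```

Under this model, picking a bucket count with many small divisors (for example a power of two) gives more usable reducer counts below the cap than a count such as 5956.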

**Cloudkollektiv** · Accepted Answer · 2017-09-18T09:56:51+00:00

Because, all else being equal, this is the most efficient way for MapReduce to do the work: with one reducer per bucket, each reducer writes exactly one bucket file and the load is divided evenly among the reducers.

In Hive 0.x and 1.x you have to set hive.enforce.bucketing = true yourself. With that setting, Hive determines the number of reducers automatically from the number of buckets in your table. In later versions of Hive (2.x) bucketing is enforced by default, so you no longer need to set it.

Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
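To see why matching reducers to buckets divides the work evenly, here is a small Python sketch. It assumes a simplified model in which bucket `b` is handled by reducer `b % num_reducers`; that mapping and the name `assign_buckets` are illustrative assumptions, not Hive's real partitioner.

```python
from collections import defaultdict
from typing import Dict, List


def assign_buckets(num_buckets: int, num_reducers: int) -> Dict[int, List[int]]:
    """Map each reducer id to the bucket ids it would write.

    Illustrative model only: bucket b goes to reducer b % num_reducers.
    """
    owned: Dict[int, List[int]] = defaultdict(list)
    for bucket_id in range(num_buckets):
        owned[bucket_id % num_reducers].append(bucket_id)
    return dict(owned)


# One reducer per bucket: every reducer writes a single bucket file.
print(assign_buckets(8, 8))
# Reducer count divides the bucket count: every reducer gets the same share.
print(assign_buckets(8, 4))
# Reducer count does not divide the bucket count: the shares are uneven.
print({r: len(b) for r, b in assign_buckets(8, 3).items()})
```

In this model, an even split is only possible when the reducer count divides the bucket count, and the split is finest when the two numbers are equal.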
