In Hive, why should the number of buckets be equal to the number of reducers?
Asked by Ramprakash
There are 2 answers
Archit Agarwal
The number of reducers launched when inserting into a bucketed table is always a divisor of the number of buckets in that table. Hive selects the largest divisor that does not exceed the configured maximum (hive.exec.reducers.max) and launches that many reducers.
Example:
Number of buckets in the table: 5956
hive.exec.reducers.max=1009
Factorization: 5956 = 4 * 1489, so the divisors are 1, 2, 4, 1489, 2978 and 5956
Number of launched reducers: 4
Of these divisors, 1489 would be the natural choice, but it exceeds the limit of 1009 reducers. The largest divisor within the limit is 4, so only 4 reducers run, which can take an extremely long time for a large table.
Setting hive.exec.reducers.max=2000 raises the limit above 1489, so 1489 reducers are launched.
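
The divisor selection described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this answer, not Hive's actual code; the function name reducers_for_bucketed_insert is made up for the example.

```python
def reducers_for_bucketed_insert(num_buckets: int, max_reducers: int) -> int:
    """Return the largest divisor of num_buckets that is <= max_reducers.

    Hypothetical helper: it mirrors the rule described in the answer,
    it is not part of Hive.
    """
    if num_buckets < 1 or max_reducers < 1:
        raise ValueError("num_buckets and max_reducers must be positive")
    # Walk downwards from the cap; 1 always divides, so the loop always returns.
    for candidate in range(min(num_buckets, max_reducers), 0, -1):
        if num_buckets % candidate == 0:
            return candidate


# The two settings from the example above.
print(reducers_for_bucketed_insert(5956, 1009))  # 4
print(reducers_for_bucketed_insert(5956, 2000))  # 1489
```

A bucket count with many small factors (for example a power of two) leaves far more divisors to choose from than 5956 does, so the chosen reducer count stays close to the configured maximum.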
Because this is the most efficient arrangement for MapReduce, all else being equal: the work of writing the buckets is divided evenly among the reducers.
In Hive 0.x and 1.x you have to set hive.enforce.bucketing = true. With this setting, Hive determines the number of reducers automatically from the number of buckets in your table. In later versions of Hive (2.x), bucketing is enforced by default.
Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
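
To see why matching the two numbers divides the work evenly, here is a toy model in Python. It assumes, purely for illustration, that bucket b is written by reducer b % num_reducers; this mapping and the function name assign_buckets_to_reducers are assumptions for the sketch, not Hive's implementation.

```python
from collections import defaultdict


def assign_buckets_to_reducers(num_buckets: int, num_reducers: int) -> dict:
    """Toy model: map each bucket to the reducer that writes it.

    Assumption for illustration only: bucket b goes to reducer
    b % num_reducers, and num_reducers must divide num_buckets.
    """
    if num_reducers < 1 or num_buckets % num_reducers != 0:
        raise ValueError("num_reducers must be a positive divisor of num_buckets")
    assignment = defaultdict(list)
    for bucket in range(num_buckets):
        assignment[bucket % num_reducers].append(bucket)
    return dict(assignment)


# One reducer per bucket: every reducer writes exactly one bucket file.
one_to_one = assign_buckets_to_reducers(8, 8)
# Half as many reducers: every reducer has to write two bucket files.
shared = assign_buckets_to_reducers(8, 4)
print(one_to_one)
print(shared)
```

In this model, the fewer reducers there are relative to buckets, the more bucket files each reducer has to produce, which is why a reducer count equal to the bucket count gives the most parallelism.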