I've seen some good explanations of creating a table with partitions which are CLUSTERED BY and SORTED BY. How does this compare with creating a table with partitions, then populating the table (with INSERT OVERWRITE for instance) using CLUSTER BY? Is the CLUSTER BY a persistent sort within the table?
Hive difference between PARTITIONED BY, CLUSTERED BY and SORTED BY with BUCKETS and insert overwrite with PARTITIONED and CLUSTER BY?
2.4k views. Asked by chuckfinley.
There is 1 answer
Even if INSERT OVERWRITE + CLUSTER BY did produce a table with persistently sorted data, there is no way to tell Hive that the data is already sorted other than creating the table with CLUSTERED BY. You can benefit from sorted data (a sort-merge join, for example) only when Hive knows about it and can therefore optimize the query. The data is not necessarily written to disk in the same order in which it was produced or passed to the writer, unless you specified that the table is clustered (sorted). Ordinary (heap) tables are, in theory, not sorted. The writer process does not necessarily write data in the same order as its input, because writes are buffered (deferred) and parallel.
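To make the difference concrete, here is a minimal HiveQL sketch of the two approaches side by side. All table and column names (`events`, `events_plain`, `staging_events`, `user_id`, `payload`, `dt`), the bucket count and the storage format are made up for illustration, so treat it as a rough outline rather than a recipe.

```sql
-- Hypothetical names throughout: events, events_plain, staging_events,
-- user_id, payload, dt. The bucket count (32) and ORC are arbitrary choices.

-- Approach 1: bucketing and sort order are declared in the table metadata,
-- so Hive knows about them when it plans later queries.
CREATE TABLE events (
  user_id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) SORTED BY (user_id ASC) INTO 32 BUCKETS
STORED AS ORC;

-- Assumption: older Hive releases may need these settings so that inserts
-- honor the declared layout; newer releases enforce bucketing on their own.
-- SET hive.enforce.bucketing = true;
-- SET hive.enforce.sorting = true;

INSERT OVERWRITE TABLE events PARTITION (dt = '2020-01-01')
SELECT user_id, payload
FROM staging_events
WHERE dt = '2020-01-01';

-- Approach 2: a plain partitioned table. CLUSTER BY (shorthand for
-- DISTRIBUTE BY plus SORT BY on the same columns) only shapes this one
-- query; nothing about the ordering is recorded in the table metadata.
CREATE TABLE events_plain (
  user_id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC;

INSERT OVERWRITE TABLE events_plain PARTITION (dt = '2020-01-01')
SELECT user_id, payload
FROM staging_events
WHERE dt = '2020-01-01'
CLUSTER BY user_id;

-- Compare what the metastore records for each table.
DESCRIBE FORMATTED events;
DESCRIBE FORMATTED events_plain;
```

In the `DESCRIBE FORMATTED` output, the bucket and sort columns should be listed for `events` and empty for `events_plain`, which is the metadata the optimizer relies on when deciding whether something like a sort-merge join is possible.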