If my raw data is in CSV format and I would like to store it in the Bronze layer as Delta tables, I would end up with four layers: Raw + Bronze + Silver + Gold. Which approach should I consider?
How to handle CSV files in the Bronze layer without the extra layer
319 views · Asked by Su1tan
1 answer
This is a bit of an open question, but I would normally recommend retaining the "raw" CSV data: storing it is usually cheap relative to the value of being able to re-process it if problems arise, or for data audit and traceability purposes.
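As a sketch of what the Bronze ingestion step could look like, assuming PySpark on a cluster with Delta Lake support (the paths, table name, and helper function below are illustrative, not from the question):

```python
# Hypothetical sketch: load raw CSV files into a Bronze-layer Delta table.
# The path convention and names here are assumptions for illustration.

def bronze_table_path(lake_root: str, table_name: str) -> str:
    """Build the storage path for a Bronze-layer table (illustrative convention)."""
    return f"{lake_root}/bronze/{table_name}"


def ingest_csv_to_bronze(spark, raw_csv_dir: str, lake_root: str, table_name: str):
    """Read raw CSV files and append them to a Delta table in the Bronze layer."""
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(raw_csv_dir)
    )
    (
        df.write
        .format("delta")
        .mode("append")
        .save(bronze_table_path(lake_root, table_name))
    )


if __name__ == "__main__":
    # Requires a session with Delta Lake configured (e.g. on Databricks).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    ingest_csv_to_bronze(spark, "/mnt/raw/sales_csv", "/mnt/lake", "sales")
```

With this pattern the CSV files themselves stay in the Raw zone and the Bronze layer holds only Delta tables, so Raw is an archive rather than a full extra processing layer.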
I would normally compress the raw files after processing, perhaps bundling them into a tar archive, and move them to colder, cheaper storage.
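That archival step can be sketched with only the Python standard library; the directory names are assumptions, and moving the resulting tarball to cold storage (e.g. an archive-tier container) would be an additional, platform-specific step:

```python
import tarfile
from pathlib import Path


def archive_raw_csvs(raw_dir: str, archive_dir: str) -> Path:
    """Bundle processed CSV files into a gzip-compressed tarball in an
    archive directory, then delete the originals. The archive directory
    would typically map to colder/cheaper storage."""
    raw = Path(raw_dir)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    tar_path = archive / f"{raw.name}.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        for csv_file in sorted(raw.glob("*.csv")):
            tar.add(csv_file, arcname=csv_file.name)

    # Remove the originals only after the tarball has been written.
    for csv_file in raw.glob("*.csv"):
        csv_file.unlink()
    return tar_path
```

Deleting the originals only after the tarball is fully written keeps the step safe to re-run if it fails partway through.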