Is the number of input splits equal to the number of mappers?

I am processing one file with MapReduce. The file size is 1 GB and the default block size in HDFS is 64 MB. For this example, how many input splits are there, and how many mappers?
1.7k views · Asked by koti developer

There are 2 answers

Answer from Soma Sekhar Kuruva:
Number of blocks = number of mappers. If there is only one file of 1 GB and the block size is 64 MB, the number of chunks (blocks) = 1024 MB / 64 MB = 16, so the number of mappers = 16. By default you get only one reducer; if you want to run more reducers, you can set job.setNumReduceTasks(n).
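A minimal driver sketch of where setNumReduceTasks() fits (the class name, paths, and the reducer count of 4 are illustrative assumptions, not from the answer). Note that the mapper count is never set here: it follows from the input splits.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split count example");
        job.setJarByClass(SplitCountDriver.class);

        // The number of mappers is derived from the input splits
        // (16 for a 1 GB file with 64 MB blocks). Only the reducer
        // count is chosen explicitly; the default is 1.
        job.setNumReduceTasks(4); // illustrative value

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```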
The second answer:

Your input split is different from the block size. A block is a physical representation that contains the actual data, whereas an input split is just a logical representation that records only the split length and the split location. So if your file size is 1 GB (1024 MB / 64 MB = 16 splits), you will have 16 mappers running.
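For context (not claimed by the answer): in FileInputFormat the split size is computed as max(minSplitSize, min(maxSplitSize, blockSize)), so splits can be tuned independently of the block size. A hedged sketch, with an illustrative helper name:

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitTuning {
    // Cap each split at 32 MB: the 1 GB file then yields 32 splits
    // (and 32 mappers) instead of 16, with the HDFS block size unchanged.
    public static void capSplitSize(Job job) {
        FileInputFormat.setMaxInputSplitSize(job, 32L * 1024 * 1024);
        FileInputFormat.setMinInputSplitSize(job, 1L);
    }
}
```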
However, the number of mappers also depends on other factors. If isSplitable() in the InputFormat class returns false, then your file is not splittable, and you will have only one mapper running even though the file spans many blocks. job.setNumReduceTasks() controls the number of reducers; if it is not set, the number of reducers is 1 by default. Otherwise, I think the number of input splits depends on the input file size.
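A minimal sketch of that non-splittable case, assuming the standard TextInputFormat as the base class (the subclass name is hypothetical):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Produces one split covering the whole file, so exactly one mapper
// processes it no matter how many HDFS blocks the file spans.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split, even when file size > block size
    }
}
```

Wiring it in with job.setInputFormatClass(WholeFileTextInputFormat.class) would then force a single map task over the 1 GB file.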