I cannot understand the difference between multi-threading and partitioning in Spring batch. The implementation is of course different: In partitioning you need to prepare the partitions then process it. I want to know what is the difference and which one is more efficient way to process when the bottleneck is the item-processor.
Spring batch difference between Multithreading vs partitioning
10.7k views Asked by mettok At
1
There are 1 answers
Related Questions in MULTITHREADING
- How can I outsource worker processes within a for loop?
- OpenMP & oneTbb difference
- Receiving Notifications for Individual Task Completion OmniThreadLibrary Parallel.ForEach
- C++ error: no matching member function for call to 'enqueue' futures.emplace_back(TP.enqueue(sum_plus_one, x, &M));
- How can I create a thread in Haskell that will restart if it gets killed due to any reason?
- Qt: running callback in the main thread from the worker thread
- Using `static` on a AVX2 counter function increases performance ~10x in MT environment without any change in Compiler optimizations
- Heap sort with multithreading
- windows multithreading CreateMutex
- The problem of "fine-grained locks and two-phase locking algorithm"
- OpenMP multi-threading not working if OpenMPI set to use one or two MPI processor
- WPF Windows Initializing is locking the separated thread in .Net 8
- TCP Client Losing Connection When Writing Data
- vc++ thread constructor throwing compiler error c2672
- ASP.NET Core 6 Web API : best way to pause before resending email
Related Questions in SPRING
- HTTPS configuration in Spring Boot, server returning timeout
- Multi Tenancy in Spring - Partitioned Data Approach
- How to create beans of the same class for multiple template parameters in Spring
- org.telegram.telegrambots.meta.exceptions.TelegramApiException: Bot token and username can't be empty
- Springboot: How to get an entity optional property and check null?
- How do I propagate the current SecurityContext to my @RabbitListener in Spring Boot?
- Spring's XML based bean configuration for Object Mapper's Case Insensitive property
- Failed to configure a DataSource: 'url' attribute is not specified and no embedded datasource could be configured. I'm using Postgresql
- springboot class org.hibernate.mapping.Bag cannot be cast to class org.hibernate.mapping.SimpleValue
- Issue while deploying JDK 17 and Spring 6 application in Tomcat 10.1.20
- Spring JPA Data Auditing - How to design it?
- Springframework test: Async not started
- Error: Cannot invoke "jakarta.servlet.http.HttpSession.getAttribute(String)" because "session" is null
- How does spring-retry determine which methods to retry when @Retryable is placed at the class level?
- problem with edge server registration in Eureka
Related Questions in SPRING-BATCH
- How to customize the Spring Batch default configuration without breaking autoconfiguration
- Is there any catch in Reading and Updating to the same Oracle table in Single Threaded Spring batch job?
- Starting first job causes - Dispatcher has no subscribers for channel exception when declared multiple DirectChannels
- Looping Over a Group of Steps in Spring Batch
- Already value [datasource.ConnectionHolder@6b144bd8] for key [datasource.DriverManagerDataSource@56739ee9] bound to thread [main]
- How to Use ControllerAdvice for Exception Handling in Spring Batch Application?
- Writing in multiple related tables in spring batch
- Restarting a FAILED job is not processing the failed chunk data again : Continuation
- Spring Batch Not persisting Data Properly in MYSQL Database
- Strange Spring Batch exception, something I did wrong?
- Parameter 0 of method jobBean in com.nextuple.BatchProcessing.config.BatchConfig required a bean named 'dataSource' that could not be found
- is spring batch supports parquet file reading and writing
- Spring Batch 5 - Don't want it to automatically create tables in database
- Transaction flow for distributed transaction in Spring Batch
- How to solve JobBuilder deprecated in SpringBatch 5.1.1
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
TL;DR;
Neither approach is intended to help when the bottleneck is in the processor. You will see some gains by having multiple items going through a processor at the same time, but both of the options you point out get their full benefits when used in processes that are I/O bound. The
AsyncItemProcessor/AsyncItemWritermay be a better option.Overview of Spring Batch Scalability
There are five options for scaling Spring Batch jobs:
AsyncItemProcessor/AsyncItemWriterEach has it's own benefits and disadvantages. Let's walk through each:
Multithreaded step
A multithreaded step takes a single step and executes each chunk within that step on a separate thread. This means that the same instances of each of the batch components (readers, writers, etc) are shared across the threads. This can increase performance by adding some parallelism to the step at the cost of restartability in most cases. You sacrifice restartability because in most cases, the ability to restart is based on the state maintained within the reader/writer/etc. With multiple threads updating that state, it becomes invalid and useless for restart. Because of this, you typically need to turn save state off on individual components and set the restartable flag to false on the job.
Parallel steps
Parallel steps are achieved via a split. It allows you to execute multiple, independent steps in parallel via threads. This does not sacrifice restartability, but does not help improve the performance of a single step or piece of business logic.
Partitioning
Partitioning is the dividing of data, in advance, into smaller chunks (called partitions) by a master step and then having slaves work independently on the partitions. In Spring Batch, both the master and each slave, is an independent step so you can get the benefits of parallelism within a single step without sacrificing restartability. Partitioning also provides the ability to scale beyond a single JVM in that the slaves do not have to be local (you can use various communication mechanisms to communicate with remote slaves).
An important note about partitioning is that the only communication between the master and slave is a description of the data and not the data itself. For example, the master may tell slave1 to process records 1-100, slave2 to process records 101-200, etc. The master does not send the actual data, only the information required for the slave to obtain the data it is supposed to process. Because of this, the data must be local to the slave processes and the master can be located anywhere.
Remote chunking
Remote chunking allows you to scale the process and optionally the write logic across JVMs. In this use case, the master reads the data and then sends it over the wire to the slaves where it is processed and then either written locally to the slave or returned to the master for writing local to the master.
The important difference between partitioning and remote chunking is that instead of a description going over the wire, remote chunking sends the actual data over the wire. So instead of a single packet saying process records 1-100, remote chunking is going to send the actual records 1-100. This can have a large impact on the I/O profile of a step, but if the processor is enough of a bottleneck, this can be useful.
AsyncItemProcessor/AsyncItemWriterThe final option for scaling Spring Batch processes is the
AsyncItemProcessor/AsycnItemWritercombination. In this case, theAsyncItemProcessorwraps yourItemProcessorimplementation and executes the call to your implementation in a separate thread. TheAsyncItemProcessorthen returns aFuturethat is passed to theAsyncItemWriterwhere it is unwrapped and passed to the delegateItemWriterimplementation.Because of the nature of how data flows through this option, certain listener scenarios are not supported (since we don't know the outcome of the
ItemProcessorcall until inside theItemWriter) but overall, it can provide a useful tool for parallelizing just theItemProcessorlogic in a single JVM without sacrificing restartability.