eSKape eSKape - 10 days ago 6
Java Question

Clarification on Step Chaining

Opinions on Step chaining in Spring Batch differ, and the right choice depends on the use case, so I want to know which approach is generally considered the most sensible:

Chaining of Steps, i.e. a Job has a flow of Steps, where every Step has a Reader, Processor & Writer. Data between Steps is exchanged using the Job ExecutionContext.

OR

Chaining of ItemProcessors, i.e. a Job has only one Step but a chain of ItemProcessors.

The 1st possibility seems more reasonable to me, as the name 'Job' implies that several Steps are needed to finish it. The downside in many use cases is that there can be redundant or sometimes 'empty' reading & writing at the start and end of a Step.
The 2nd one is the most common solution, but I think this 'one Step' solution isn't quite what batch processing is intended for.

What's your opinion on this?

Answer

ItemProcessors' usefulness is pretty limited; they're best for cases where you want to transform each item that you read in. You can use them to filter out lines you don't want, but in some cases (for example, when your reader executes a SQL query) that becomes wasteful fast. It's a lot more efficient to avoid reading those lines in the first place.
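
To make the cost difference concrete, here is a rough sketch in plain Java, not Spring Batch code. The class `FilterPlacementSketch`, its `StepResult` holder, and both methods are hypothetical names invented for this illustration; the `Predicate` stands in for either a filtering ItemProcessor or a `WHERE` clause in the reader's query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only: models where the filtering happens,
// not how Spring Batch itself is implemented.
final class FilterPlacementSketch {

    /** Outcome of one simulated step: rows pulled into the job and rows written. */
    static final class StepResult {
        final int rowsRead;
        final List<String> written;

        StepResult(int rowsRead, List<String> written) {
            this.rowsRead = rowsRead;
            this.written = written;
        }
    }

    // Filtering in the processor: every row is read, then unwanted rows are dropped.
    static StepResult filterInProcessor(List<String> table, Predicate<String> keep) {
        List<String> written = new ArrayList<>();
        int rowsRead = 0;
        for (String row : table) {
            rowsRead++; // the row has already crossed from the source into the job
            if (keep.test(row)) {
                written.add(row);
            }
        }
        return new StepResult(rowsRead, written);
    }

    // Filtering in the reader: the predicate plays the role of a WHERE clause,
    // so only matching rows ever reach the job.
    static StepResult filterInReader(List<String> table, Predicate<String> where) {
        List<String> matching = new ArrayList<>();
        for (String row : table) {
            if (where.test(row)) { // evaluated by the "database", not by the job
                matching.add(row);
            }
        }
        return new StepResult(matching.size(), matching);
    }
}
```

Both variants write the same rows; the difference is only in how many rows the job has to pull in first, which is what matters when the source is a large table.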

It's nice to have a hook in the process to be able to drop in ItemProcessors, but I wouldn't overuse it. Most non-trivial jobs seem to have multiple steps, and the framework provides support for steps with error-handling, chunking, partitioning, etc. ItemProcessors, by comparison, are extremely lightweight, and the framework doesn't provide any support for them beyond providing a place for them in the workflow.
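
For reference, the "one step, many processors" option from the question boils down to function composition, which is part of why it is so lightweight. The sketch below is plain Java under stated assumptions: `SketchProcessor` is a hypothetical, simplified stand-in for Spring Batch's `ItemProcessor` (the real interface declares `throws Exception`; returning `null` to filter an item is the same convention), and `ProcessorChainSketch` is an invented name. In a real job you would probably reach for Spring Batch's own `CompositeItemProcessor` instead of hand-rolling this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Spring Batch's ItemProcessor.
// Returning null means "filter this item out".
interface SketchProcessor<I, O> {
    O process(I item);
}

final class ProcessorChainSketch {

    // Composes two processors; if the first one filters the item (returns null),
    // the second one is never called.
    static <A, B, C> SketchProcessor<A, C> chain(SketchProcessor<A, B> first,
                                                 SketchProcessor<B, C> second) {
        return item -> {
            B intermediate = first.process(item);
            return intermediate == null ? null : second.process(intermediate);
        };
    }

    // Stand-in for a single step: process each input item and collect
    // whatever survives, i.e. what would be handed to the writer.
    static <I, O> List<O> runStep(List<I> input, SketchProcessor<I, O> processor) {
        List<O> written = new ArrayList<>();
        for (I item : input) {
            O result = processor.process(item);
            if (result != null) {
                written.add(result);
            }
        }
        return written;
    }
}
```

Note what the chain does not give you: there is no restart point, skip/retry policy, or separate transaction boundary between the individual processors, whereas each Step gets those from the framework.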

(The statement "Data between Steps is exchanged using the Job ExecutionContext" seems questionable. I've used it to hold things like counts of the number of lines read or written. It's not a good place to put anything much bigger than that.)
