Have you ever heard someone say, "Man, I wish we had a toy like that when I was a kid! That would have been awesome!" For me, that’s usually when I’m wrapping the latest entry in Nerf’s arsenal for Christmas or someone’s birthday, or any time my kids and I are running through the house trying to shoot each other in the backside with Nerf darts.
But how many times have you heard someone say something similar on a software project? For the past year, I’ve had that exact thought many times while working with Spring Batch. I can’t count the number of times I’ve recalled past projects and wished I could have used Spring Batch on them. Everything I’ve worked on in the past, from processing admin files to setting up new companies within a system to loading a complete supplier catalog into a procurement system, would have benefited greatly from Spring Batch. On my current project, we’ve been converting a large set of COBOL batch programs into Spring Batch jobs. The project covers a wide range of tasks, from calculating and assessing fees to processing large vendor feeds and running reconciliation reports.
Over the last several years, numerous frameworks have been created to simplify web application development, web services, object-relational mapping and so on. In comparison, open source frameworks for batch processing have been almost nonexistent, even though a considerable amount of time has always been devoted to developing in-house batch processing applications. Spring Batch was designed to fill that gap, and it uses the Spring Framework as its foundation.
In my previous article, I made the case for using Mockito to unit test Spring Batch components. Since I didn’t provide any background on Spring Batch or its features in that article, now seems like a good time for a short overview and a simple hands-on primer to get you started.
So what is Spring Batch? Since the definition was articulated so well, here’s a direct quote from the project’s website:
"Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch builds upon the productivity, POJO-based development approach, and general ease of use capabilities people have come to know from the Spring Framework, while making it easy for developers to access and leverage more advanced enterprise services when necessary." –http://static.springsource.org/spring-batch/
Spring Batch is designed as a three-layered architecture consisting of the Batch Application, Batch Core and Batch Infrastructure layers. The Batch Application layer contains all of the batch jobs and custom code written by developers implementing job processes with Spring Batch. The Batch Core layer contains the runtime classes needed to launch, control and record statistics about batch jobs. The Batch Infrastructure layer contains the common readers, writers and services used both by application developers creating jobs and by the core framework itself.
Spring Batch uses a simple naming convention that should sound familiar to anyone who has worked with batch processes. A Job is the main component of Spring Batch and encompasses an entire batch process, which is typically made up of a series of Steps. Each Job is also associated with JobInstances and JobExecutions. A JobInstance represents a logical job run, for example running the "EndOfDay" job for 2012/07/01. A JobExecution represents a single technical attempt to execute that run, for instance the first attempt at running the "EndOfDay" job for 2012/07/01. If that attempt fails and the job is restarted, the same JobInstance gets a second JobExecution.
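To make the JobInstance/JobExecution distinction concrete, here is a minimal Java sketch. It does not use Spring Batch at all: SimpleJobRepository and its methods are hypothetical stand-ins, and the job name plus run date play the role of the identifying job parameters. One logical run (JobInstance) can accumulate several execution attempts (JobExecutions).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the JobInstance / JobExecution relationship.
// This is NOT Spring Batch's JobRepository API.
class SimpleJobRepository {
    // One entry per JobInstance (job name + identifying parameter),
    // holding the status of each JobExecution (attempt) in order.
    private final Map<String, List<String>> executionsByInstance = new LinkedHashMap<>();

    /** Records one execution attempt and returns its 1-based attempt number. */
    int recordExecution(String jobName, String runDate, String status) {
        String instanceKey = jobName + "|" + runDate;
        List<String> attempts =
                executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>());
        attempts.add(status);
        return attempts.size();
    }

    /** Number of distinct logical runs (JobInstances). */
    int instanceCount() {
        return executionsByInstance.size();
    }

    /** Statuses of every attempt (JobExecution) for one JobInstance. */
    List<String> executionsOf(String jobName, String runDate) {
        List<String> attempts = executionsByInstance.get(jobName + "|" + runDate);
        return attempts == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(attempts);
    }
}
```

Recording a FAILED attempt and then a COMPLETED attempt for "EndOfDay" on 2012/07/01 leaves one JobInstance with two JobExecutions. The real JobRepository also persists this metadata and enforces restart rules, such as refusing to start a new execution of an instance that has already completed.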
A Step is an independent phase of a batch Job that contains all of the information necessary to define and control that particular part of the job’s execution. The Step is also where transaction boundaries are defined, so keep that in mind when designing how your batch process will execute. A Step may contain a single Tasklet, which is used for simple processing such as validating job parameters when launching a job, setting up resources or cleaning them up afterwards. The Tasklet interface has one "execute" method that is called repeatedly until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure. A more common kind of Step, one that applies business rules, uses a "chunk-oriented" implementation that wraps an ItemReader, an optional ItemProcessor and an ItemWriter. The chunk-oriented approach reads and processes data in chunks, for example reading and processing 100 items at a time from a file to load them into a database. The chunk size also serves as the commit interval: each chunk is written and committed in its own transaction.
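The Tasklet contract can be sketched as follows. RepeatStatus and Tasklet here are simplified stand-ins (the real Tasklet.execute also receives a StepContribution and a ChunkContext), and CleanupTasklet and TaskletStepRunner are hypothetical names chosen for illustration. The runner keeps calling execute() while it returns CONTINUABLE and treats any exception as a step failure.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Spring Batch's RepeatStatus and Tasklet.
// The real Tasklet.execute takes (StepContribution, ChunkContext); omitted here.
enum RepeatStatus { CONTINUABLE, FINISHED }

interface Tasklet {
    RepeatStatus execute() throws Exception;
}

// Hypothetical cleanup tasklet: removes one temp file name per call.
class CleanupTasklet implements Tasklet {
    private final List<String> pending;
    final List<String> deleted = new ArrayList<>();

    CleanupTasklet(List<String> tempFiles) {
        this.pending = new ArrayList<>(tempFiles);
    }

    @Override
    public RepeatStatus execute() {
        if (!pending.isEmpty()) {
            deleted.add(pending.remove(0)); // pretend to delete the file
        }
        // Ask to be called again only while work remains.
        return pending.isEmpty() ? RepeatStatus.FINISHED : RepeatStatus.CONTINUABLE;
    }
}

class TaskletStepRunner {
    /** Calls execute() until it returns FINISHED; returns the number of calls made. */
    static int run(Tasklet tasklet) {
        int calls = 0;
        RepeatStatus status;
        do {
            try {
                status = tasklet.execute();
            } catch (Exception e) {
                // Any exception marks the step as failed.
                throw new IllegalStateException("Step failed", e);
            }
            calls++;
        } while (status == RepeatStatus.CONTINUABLE);
        return calls;
    }
}
```

With three temp files, the runner calls execute() three times; with none, it calls it once and the tasklet immediately reports FINISHED.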
The ItemReader interface has one "read" method that is called repeatedly; each call returns one item from the source, and the method returns null once all input data has been exhausted. The items returned by the ItemReader are collected into a list (the chunk), to which the business rules are then applied. Spring Batch ships with many ItemReader implementations, such as FlatFileItemReader, JdbcCursorItemReader, JdbcPagingItemReader, JpaPagingItemReader and StoredProcedureItemReader. Because Spring Batch is extensible, you can also implement your own custom ItemReader if your requirements fall outside the scope of the pre-built implementations.
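A custom ItemReader only has to honor one rule: return the next item on each call and null once the input is exhausted. The sketch below assumes a simplified ItemReader interface (the real read() declares several checked exceptions) and a hypothetical LineItemReader that skips blank lines; Spring Batch’s own ListItemReader covers the plain in-memory case.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for Spring Batch's ItemReader.
interface ItemReader<T> {
    /** Returns the next item, or null once the input is exhausted. */
    T read();
}

// Hypothetical custom reader over raw lines, skipping blank ones.
class LineItemReader implements ItemReader<String> {
    private final Iterator<String> lines;

    LineItemReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public String read() {
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (!line.isEmpty()) {
                return line;
            }
        }
        return null; // signals end of input, and keeps doing so on later calls
    }
}

class ReaderDemo {
    /** Drains a reader the way a step would: call read() until it returns null. */
    static <T> List<T> readAll(ItemReader<T> reader) {
        List<T> items = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            items.add(item);
        }
        return items;
    }
}
```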
The ItemProcessor interface has one "process" method that transforms items by applying business rules. Given an input item (one of the items produced by the ItemReader), the processor applies the business rules and returns either the modified item or a new item for continued processing. If the item should not be processed any further, the ItemProcessor returns null, effectively filtering the item out. You can also chain processors together to apply complex business rules, with the output of one processor becoming the input of the next processor in the chain. The ItemProcessor implementation is where the developer will do the bulk of the work, as this is where most of the business logic is applied. The output of the ItemProcessor is collected into a list that is then passed to the ItemWriter for output.
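Filtering and chaining can be sketched with a simplified ItemProcessor interface. ChainedProcessor and the FeeRules business rules below are hypothetical; Spring Batch itself provides CompositeItemProcessor for chaining. Returning null from a processor in the chain drops the item before it reaches the writer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Spring Batch's ItemProcessor (the real process() may throw Exception).
interface ItemProcessor<I, O> {
    /** Returns the transformed item, or null to filter the item out. */
    O process(I item);
}

// Chains two processors: the output of the first becomes the input of the second.
// A null from the first processor filters the item, so the second is never called.
class ChainedProcessor<I, M, O> implements ItemProcessor<I, O> {
    private final ItemProcessor<I, M> first;
    private final ItemProcessor<M, O> second;

    ChainedProcessor(ItemProcessor<I, M> first, ItemProcessor<M, O> second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public O process(I item) {
        M intermediate = first.process(item);
        return intermediate == null ? null : second.process(intermediate);
    }
}

class FeeRules {
    // Hypothetical rule 1: parse "vendor,amountInCents" into the amount.
    static final ItemProcessor<String, Long> PARSE_AMOUNT =
            line -> Long.parseLong(line.split(",")[1].trim());

    // Hypothetical rule 2: drop zero amounts, otherwise add a 10% fee (integer cents).
    static final ItemProcessor<Long, Long> ADD_FEE =
            cents -> cents == 0L ? null : Long.valueOf(cents + cents / 10);

    /** Applies the processor to each item, keeping only non-null results for the writer. */
    static <I, O> List<O> processAll(List<I> items, ItemProcessor<I, O> processor) {
        List<O> output = new ArrayList<>();
        for (I item : items) {
            O result = processor.process(item);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }
}
```

Running "acme,1000", "globex,0" and "initech,250" through the chain yields 1100 and 275; the zero-amount line is filtered out and never reaches the writer.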
The ItemWriter interface has one "write" method that is called once per chunk and is given the list of processed items to output. Spring Batch provides many ItemWriter implementations, such as FlatFileItemWriter, JdbcBatchItemWriter and JpaItemWriter, to name a few. Once again, you can also implement your own ItemWriter if the provided implementations don’t fit your needs; one example would be a PDF writer for generating reports.
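Putting the reader, processor and writer together, the chunk-oriented loop looks roughly like this. It is a single-threaded sketch under simplifying assumptions: java.util.function.Supplier and Function stand in for ItemReader and ItemProcessor, ItemWriter is a simplified List-based interface, and ChunkOrientedStep is a hypothetical name. Transactions are only described in comments; nothing is actually committed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified stand-in for Spring Batch's ItemWriter (List-based, as in Spring Batch 4.x and earlier).
interface ItemWriter<T> {
    /** Called once per chunk with the items that survived processing. */
    void write(List<T> items);
}

class ChunkOrientedStep {
    /**
     * Hypothetical sketch of one chunk-oriented step.
     * reader: returns the next item, or null when the input is exhausted (ItemReader role).
     * processor: returns a transformed item, or null to filter it out (ItemProcessor role).
     * Returns the number of chunks, i.e. the number of commits a real step would make.
     */
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          ItemWriter<O> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int chunks = 0;
        while (true) {
            // 1. Read up to chunkSize items.
            List<I> inputs = new ArrayList<>();
            I item;
            while (inputs.size() < chunkSize && (item = reader.get()) != null) {
                inputs.add(item);
            }
            if (inputs.isEmpty()) {
                return chunks; // nothing left to read
            }
            // 2. Process each item; null results are filtered out.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O out = processor.apply(input);
                if (out != null) {
                    outputs.add(out);
                }
            }
            // 3. Write the chunk in one call. In a real step, this chunk's work would be
            //    committed in one transaction at this point. (This sketch skips the
            //    write call when every item in the chunk was filtered.)
            if (!outputs.isEmpty()) {
                writer.write(outputs);
            }
            chunks++;
        }
    }
}
```

With seven input items (1 to 7), a chunk size of 3 and a processor that filters even numbers and multiplies the rest by 10, the writer is called three times, with [10, 30], [50] and [70].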
In addition to the Steps configured within a Job, there are many points during a Job’s execution where you can intercept it and perform additional processing through listener interfaces provided with Spring Batch. Some of these listeners, their methods and the corresponding annotations are:
- JobExecutionListener (beforeJob, afterJob): @BeforeJob, @AfterJob
- StepExecutionListener (beforeStep, afterStep): @BeforeStep, @AfterStep
- ChunkListener (beforeChunk, afterChunk): @BeforeChunk, @AfterChunk
- ItemReadListener (beforeRead, afterRead, onReadError): @BeforeRead, @AfterRead, @OnReadError
- ItemProcessListener (beforeProcess, afterProcess, onProcessError): @BeforeProcess, @AfterProcess, @OnProcessError
- ItemWriteListener (beforeWrite, afterWrite, onWriteError): @BeforeWrite, @AfterWrite, @OnWriteError
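As an example of the listener pattern, the sketch below wraps a write call with ItemWriteListener-style callbacks. The interface is a simplified stand-in modeled on the List-based ItemWriteListener from Spring Batch 4.x and earlier; ListenerAwareWriter and AuditListener are hypothetical. In a real job you would register the listener on the Step (or annotate methods with @BeforeWrite and friends), and the framework would make these calls for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified stand-in modeled on Spring Batch's (List-based) ItemWriteListener.
interface ItemWriteListener<S> {
    default void beforeWrite(List<S> items) {}
    default void afterWrite(List<S> items) {}
    default void onWriteError(Exception exception, List<S> items) {}
}

class ListenerAwareWriter {
    /** Invokes the listener callbacks around a single write of one chunk. */
    static <S> void write(List<S> items, Consumer<List<S>> writer, ItemWriteListener<S> listener) {
        listener.beforeWrite(items);
        try {
            writer.accept(items);
        } catch (RuntimeException e) {
            listener.onWriteError(e, items);
            throw e; // the failure still propagates, so the chunk fails
        }
        listener.afterWrite(items);
    }
}

// Hypothetical listener that records which callbacks it received.
class AuditListener implements ItemWriteListener<String> {
    final List<String> events = new ArrayList<>();

    @Override
    public void beforeWrite(List<String> items) {
        events.add("before:" + items.size());
    }

    @Override
    public void afterWrite(List<String> items) {
        events.add("after:" + items.size());
    }

    @Override
    public void onWriteError(Exception exception, List<String> items) {
        events.add("error:" + exception.getMessage());
    }
}
```

A successful write of two items records "before:2" and "after:2"; a failing write records "before" and "error" but never "after".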
As the basic features covered here show, Spring Batch removes much of the hassle of solving the technical problems surrounding enterprise batch processing. Here are some additional features that are beyond the scope of this overview:
- Repeat Operations: an abstraction for grouping repeated operations together and moving the iteration logic into the framework.
- Retry Operations: an abstraction for automatic retries.
- Execution contexts at both the Job and Step level for sharing information between components.
- Late binding of environment properties, job parameters and execution context values into a Step when it starts.
- Persistence of Job metadata for management and reporting purposes, recording statistics for every component of the Job.
- Configurable exception handling strategies allowing fault tolerance and record skipping.
- Concurrent execution of chunks.
- Remote chunking of steps.
- Partitioning: steps execute concurrently and optionally in separate processes.
- OSGi support for deploying the Spring Batch framework as a set of OSGi services. Individual jobs or groups of jobs can be deployed as additional bundles that depend on the core.
- Non-sequential models for Job control configuration (branching and decision support for Step flow).
- Spring Batch Admin, a separate web application that provides an interface for launching, monitoring and viewing Job executions.
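The retry and skip features above can be illustrated with a small, framework-free sketch. FaultTolerantSketch, withRetry, processWithSkips and TooManySkipsException are hypothetical names, not Spring Batch’s RetryTemplate or skip-policy API, and the sketch retries one item at a time; a real fault-tolerant step also has to deal with rolling back the transaction around the failing chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class FaultTolerantSketch {
    /** Hypothetical exception thrown when more items fail than the skip limit allows. */
    static class TooManySkipsException extends RuntimeException {
        TooManySkipsException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Retry: calls action up to maxAttempts times, rethrowing the last failure. */
    static <T, R> R withRetry(Function<T, R> action, T item, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.apply(item);
            } catch (RuntimeException e) {
                lastFailure = e; // assume the failure may be transient and try again
            }
        }
        throw lastFailure;
    }

    /** Skip: items that still fail after retrying are recorded and skipped, up to skipLimit. */
    static <T, R> List<R> processWithSkips(List<T> items, Function<T, R> action,
                                           int maxAttempts, int skipLimit, List<T> skipped) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        List<R> results = new ArrayList<>();
        for (T item : items) {
            try {
                results.add(withRetry(action, item, maxAttempts));
            } catch (RuntimeException e) {
                if (skipped.size() >= skipLimit) {
                    throw new TooManySkipsException("Skip limit of " + skipLimit + " exceeded", e);
                }
                skipped.add(item);
            }
        }
        return results;
    }
}
```

In a run over "ok", "flaky" and "bad" with three attempts and a skip limit of 1, an item that fails only on its first call succeeds on the retry, while an item that fails every time is skipped; a second permanently failing item would exceed the limit and fail the whole run.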
Benefits of Using Spring Batch
- Because Spring Batch is built on the Spring Framework, you also get all of Spring’s benefits, such as dependency injection and bean management based on simple POJOs.
- For developers already accustomed to building Spring-based applications, the ramp-up time for learning Spring Batch is very short.
- Most of the technical problems involved in creating batch applications have already been solved, so developers can spend more of their time on the business needs.
- By leveraging the additional functionality of Spring Integration, you can further increase scalability by distributing processing.
Part Two in this series will discuss what it takes to get Spring Batch up and running. Stay tuned!
– Jonny Hackett,