Data dumping through REST API using Spring Batch

Most cloud services provide an API to fetch their data. However, the data is returned as paginated results, because returning the complete data set in one response would overshoot the response payload size. To discover the complete list of books, e-courses or cloud machine details, we need to call the API page by page until the end. In this scenario, we can use Spring Batch to fetch the data page by page and dump it into a file.
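Stripped of Spring Batch, page-wise fetching is just a loop: request a page, keep its records, and advance the offset until a page comes back short. The following is a minimal plain-Java sketch that assumes an offset/limit style API. `PageSource`, `Paginator` and `fetchAll` are hypothetical names, and in a real run `fetchPage` would issue the HTTP request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over a paginated API:
// returns up to `limit` records starting at `offset`.
interface PageSource {
    List<String> fetchPage(int offset, int limit);
}

final class Paginator {
    private Paginator() {}

    // Calls the source page by page until a page comes back shorter than `limit`,
    // which (by assumption) means the end of the data has been reached.
    static List<String> fetchAll(PageSource source, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<String> page = source.fetchPage(offset, limit);
            all.addAll(page);
            if (page.size() < limit) {
                return all; // short or empty page: no more data
            }
            offset += page.size();
        }
    }
}
```

Treating a short page as the end saves one extra request compared with waiting for an empty page. However, it relies on the API filling every page completely except the last one. APIs that return an explicit next-page marker or a total count let you stop more reliably.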

In this blog, we will use one of Coursera's free-to-use APIs to take a dump of its e-courses. Coursera is one of the popular MOOC sites, and it exposes its e-courses through a REST API. For a basic introduction to Spring Batch and getting-started docs, please refer to the previous blog.
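With an offset-based catalog API, successive page requests differ only in their query parameters. The sketch below builds those requests. It assumes the catalog endpoint `https://api.coursera.org/api/courses.v1` with `start` and `limit` query parameters, so verify the endpoint and parameter names against Coursera's current API documentation before relying on them. `CourseraUrls`, `pageUrl` and `pageRequest` are hypothetical helpers.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for paging through Coursera's course catalog.
final class CourseraUrls {
    // Assumed endpoint and query parameter names (start, limit); verify against Coursera's docs.
    static final String COURSES_ENDPOINT = "https://api.coursera.org/api/courses.v1";

    private CourseraUrls() {}

    // URL of the page that begins at offset `start` and holds at most `limit` courses.
    static URI pageUrl(int start, int limit) {
        if (start < 0 || limit <= 0) {
            throw new IllegalArgumentException("start must be >= 0 and limit must be > 0");
        }
        return URI.create(COURSES_ENDPOINT + "?start=" + start + "&limit=" + limit);
    }

    // Builds a GET request for that page; sending it (e.g. with java.net.http.HttpClient)
    // is left to the caller.
    static HttpRequest pageRequest(int start, int limit) {
        return HttpRequest.newBuilder(pageUrl(start, limit))
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

For a page size of 100, the requests then use `start` values of 0, 100, 200 and so on until the end of the data.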

In Spring Batch, we can use a tasklet, which gives us a free hand to kick-start the task and repeat it according to our own logic. A tasklet is a single task executed inside a step. A traditional chunk-oriented step has a reader, a processor and a writer, which works well for file transformation or loading, but fitting our paginated GET-and-dump scenario into that model would be a bit cumbersome. A tasklet lets us place the GET API request inside execute() and repeat it until we reach the end of the data: returning RepeatStatus.CONTINUABLE makes Spring Batch call execute() again, while RepeatStatus.FINISHED ends the repetition.

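import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class CourseGetTasklet implements Tasklet, StepExecutionListener {

    @Override
    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
            throws Exception {
        // Task logic happens here, e.g. fetch one page of data and write it out.
        // Returning RepeatStatus.CONTINUABLE makes Spring Batch execute the tasklet again;
        // RepeatStatus.FINISHED ends the repetition.
        return RepeatStatus.FINISHED;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // Executed once, before the tasklet starts.
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Executed once, after the tasklet completes; returning the current status keeps it unchanged.
        return stepExecution.getExitStatus();
    }
}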
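To fill in the skeleton, a tasklet that dumps one API page per execute() call could look like the following minimal sketch. `PagedDumpTasklet`, `CoursePageClient` and the one-record-per-line output format are hypothetical. The client stands in for the actual GET request, and the sketch treats a page shorter than the requested size as the end of the data. It assumes Spring Batch (4.x or 5.x) is on the classpath.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Hypothetical client that returns one page of course records (e.g. backed by an HTTP GET).
interface CoursePageClient {
    List<String> fetchPage(int start, int limit) throws IOException;
}

// Hypothetical tasklet: each execute() call fetches one page and appends it to the dump file.
class PagedDumpTasklet implements Tasklet, StepExecutionListener {
    private final CoursePageClient client;
    private final Path output;
    private final int limit;
    private BufferedWriter writer;
    private int start;

    PagedDumpTasklet(CoursePageClient client, Path output, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.client = client;
        this.output = output;
        this.limit = limit;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        try {
            writer = Files.newBufferedWriter(output); // truncates any previous dump
            start = 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext)
            throws Exception {
        List<String> page = client.fetchPage(start, limit);
        for (String record : page) {
            writer.write(record);
            writer.newLine();
        }
        start += page.size();
        // A short (or empty) page is taken as the end of the data.
        return page.size() < limit ? RepeatStatus.FINISHED : RepeatStatus.CONTINUABLE;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        try {
            if (writer != null) {
                writer.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null; // null leaves the step's exit status unchanged
    }
}
```

The offset and the open writer can live in fields because the same tasklet instance is invoked for every repetition within the step, while beforeStep() and afterStep() open and close the file around the whole loop.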

Let's set up the Spring Batch application using annotations. Create a class whose main method calls SpringApplication.run. The class is annotated with @EnableBatchProcessing, which enables Spring Batch features and provides a base configuration for setting up batch jobs in an @Configuration class, roughly equivalent to using the <batch:*> XML namespace. @Configuration marks the class as a Spring configuration class, and @EnableAutoConfiguration lets Spring Boot automatically configure beans based on the dependencies found on the classpath.

JobBuilderFactory creates the job, and the RunIdIncrementer adds an incrementing run.id job parameter so that each launch starts a new job instance. StepBuilderFactory builds the step that runs the tasklet (CourseGetTasklet).

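import java.util.Date;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Uses the Spring Batch 4.x builder factories (deprecated since Spring Batch 5.0).
@Configuration
@EnableAutoConfiguration
@EnableBatchProcessing
public class SampleBatchApplication {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public Job job() throws Exception {
        // RunIdIncrementer gives every launch a new run.id, hence a new job instance.
        // JobExecutionListener is an interface, so a no-op JobExecutionListenerSupport is
        // registered here; replace it with your own JobExecutionListener implementation.
        return this.jobs.get("job").incrementer(new RunIdIncrementer())
                .listener(new JobExecutionListenerSupport()).start(step1()).build();
    }

    @Bean
    protected Step step1() throws Exception {
        // The step name is suffixed with the current epoch time in milliseconds.
        String epochStr = String.valueOf(new Date().getTime());
        return this.steps.get("step1v" + epochStr)
                .tasklet(new CourseGetTasklet()).throttleLimit(1).build();
    }

    public static void main(String[] args) throws Exception {
        // System.exit is common for Batch applications since the exit code can be used to
        // drive a workflow
        System.exit(SpringApplication.exit(SpringApplication.run(
                SampleBatchApplication.class, args)));
    }
}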
