Preface:
What is Spring Batch? In short, batch processing executes jobs, and a single job consists of at least one step.
Some time ago my company asked me to add a new feature to an existing system. The system had to fetch data from an external database. That external database is accessible through a REST API, but we could not use it "on the fly" because the calls took too long. So we decided to replicate the data from the external database into the database our system uses. In other words, we had to download what we needed and store it in our own database. Afterwards we could use the data on the fly without any performance overhead.
The application should do the following:
- Download data from the external database via the REST API and save it as CSV files.
- Each data set is independent, so the steps in the job must also be independent (with one exception).
- Create as many MongoDB collections as there are CSV files.
- Convert each row of a CSV file into a MongoDB document.
- Save the documents in the related collections.
- Delete the CSV files afterwards.
- Persist the details of the batch, job, and step executions.
Technology stack:
- Java 8
- Maven 3.5.2
- Spring Boot 1.5.10
- Spring Web 4.3.14
- MongoDB 3.4
pom.xml:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-web</artifactId>
<version>4.3.14.RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.5</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.5</version>
</dependency>
The Spring Batch and MongoDB starters create a Spring Boot application. I added a separate dependency on spring-web to have a REST client; spring-web uses httpcomponents under the hood, and commons-csv together with commons-io is used for handling the CSV files.
Operations
According to all the documentation Spring provides about Spring Batch, a step should look like this:
- Collect information about the CSV headers
- Download the data in CSV format
- Convert each row into a MongoDB document
- Insert into the database in chunks
- Repeat the steps above until everything has been downloaded and processed
Let's look at the code. application.properties contains the MongoDB details along with some other properties:
spring.data.mongodb.host= localhost
spring.data.mongodb.port= 27017
spring.data.mongodb.database= db-replica
spring.data.mongodb.repositories.enabled= true
mongodb.bulk.batchSize= 1000
# [s,S] stands for seconds, [m,M] stands for minutes, [h,H] stands for hours. Format: 1m = 1 minute, 12H = 12 hours
request.connection.timeout= 2h
# How many CSV rows are loaded into program memory at once. Too many could cause an Out Of Memory exception.
csv.chunkSize= 50000
# Where the CSV files are saved on the server
csv.directory= ./csv
# Whether to log while waiting for a response
log.while.waiting.for.response= true
# How often (in seconds) a message appears in the log.
# 5 means that every 5 seconds a message is written to the log until the request completes.
log.wait.interval= 5
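Note that request.connection.timeout uses a custom duration format rather than a plain number of milliseconds. The article does not show how it is parsed; here is a minimal sketch of how such a value could be converted to milliseconds (the DurationParser class and its toMillis method are my own illustration, not part of the original project):

```java
// Hypothetical helper that parses the timeout format described in
// application.properties ("2h", "30m", "12H", "90s") into milliseconds.
public class DurationParser {

    public static long toMillis(String value) {
        String trimmed = value.trim();
        // The unit is the last character; case does not matter per the property comment.
        char unit = Character.toLowerCase(trimmed.charAt(trimmed.length() - 1));
        long amount = Long.parseLong(trimmed.substring(0, trimmed.length() - 1));
        switch (unit) {
            case 's': return amount * 1000L;
            case 'm': return amount * 60_000L;
            case 'h': return amount * 3_600_000L;
            default:  throw new IllegalArgumentException("Unknown time unit: " + unit);
        }
    }
}
```

The parsed value would then typically be passed to the HTTP client's request configuration as the connection timeout.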
A Tasklet defines the same abstraction for every step, because every step processes the data in the same way but has to store it into a different collection:
abstract class AbstractStep implements Tasklet {
protected final Logger logger;
private static final String ERROR_COLLECTION_NAME = "csvErrors";
@Value("${palantir.endpoint}")
private String basicUrl;
@Value("${mongodb.bulk.batchSize}")
private int batchSize = 1000;
@Value("${csv.chunkSize}")
private int chunkSize = 50000;
@Value("${palantir.branch}")
private String branchName;
@Value("${csv.directory}")
private String csvDirectory = "./csv";
private MongoTemplate mongoTemplate;
private final PalantirClient palantirClient;
private final String dataSetName;
private final String collectionName;
private final String dataSetRid;
private String[] headers;
private final String filePath;
private final Collection<CreationDetails> creationDetails;
protected AbstractStep(PalantirClient palantirClient, MongoTemplate mongoTemplate,
String dataSetName, String collectionName, String dataSetRid) {
this.palantirClient = palantirClient;
this.mongoTemplate = mongoTemplate;
this.dataSetName = dataSetName;
this.collectionName = collectionName;
this.dataSetRid = dataSetRid;
this.filePath = String.format("%s/%s.csv", csvDirectory, dataSetName);
this.creationDetails = new HashSet<>();
this.logger = LoggerFactory.getLogger(this.getClass());
}
@Override
public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
throws IOException, JSONException {
TimeLog timeLog = new TimeLog(getClass().getSimpleName());
timeLog.logTime("dropTemporaryCollection");
dropTemporaryCollection();
timeLog.logTime("extractColumns");
extractColumns();
timeLog.logTime("downloadCsv");
downloadCsv();
timeLog.logTime("mapAndSaveToDB");
mapAndSaveToDB();
timeLog.logTime("createIndexes");
createIndexes();
timeLog.logTime("renameTempCollection");
renameTempCollection();
timeLog.logTime("upsertCreationDetails");
upsertCreationDetails();
timeLog.logTime("deleteFile");
deleteFile();
timeLog.done();
return RepeatStatus.FINISHED;
}
...
protected abstract void createIndexes();
protected abstract QueryColumnBuilder getQueryColumnBuilder(String branchName);
protected abstract CreationDetails createCreationDetail(DBObject dbObject, Date created);
}
AbstractStep implements the Tasklet interface from the Spring Batch module. While storing the data, a temporary collection is created first for safety reasons. Only after all the data has been processed successfully are the proper indexes created and the temporary collection renamed to the target collection. TimeLog is a small helper for measuring and logging how long each phase takes. AbstractStep also contains some domain-specific properties that I will not explain here.
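The TimeLog helper used in the execute() method above is not shown in the article. A minimal, self-contained sketch of what such a phase-timing helper might look like (everything beyond the logTime/done calls seen above is my assumption):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the TimeLog helper used in AbstractStep:
// logTime(name) closes the previous phase and starts a new one,
// done() closes the last phase. Durations are kept for inspection.
public class TimeLog {
    private final String owner;
    private final Map<String, Long> durationsMs = new LinkedHashMap<>();
    private String currentPhase;
    private long phaseStart;

    public TimeLog(String owner) {
        this.owner = owner;
    }

    public void logTime(String phase) {
        finishCurrent();
        currentPhase = phase;
        phaseStart = System.nanoTime();
    }

    public void done() {
        finishCurrent();
    }

    public Map<String, Long> getDurationsMs() {
        return durationsMs;
    }

    private void finishCurrent() {
        if (currentPhase != null) {
            long elapsed = (System.nanoTime() - phaseStart) / 1_000_000;
            durationsMs.put(currentPhase, elapsed);
            System.out.printf("[%s] %s took %d ms%n", owner, currentPhase, elapsed);
            currentPhase = null;
        }
    }
}
```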
@Component
public class OperationalRouteStep extends AbstractStep {
private static final String COLLECTION_NAME = "operationalRoutes";
private static final String DATA_SET_NAME = "operational_routes";
private static final String DATA_SET_RID = "4ef1e435-cb2a-450e-ba18-e42263057379";
@Autowired
public OperationalRouteStep(MongoTemplate mongoTemplate, PalantirClient palantirClient) {
super(palantirClient, mongoTemplate, DATA_SET_NAME, COLLECTION_NAME, DATA_SET_RID);
}
@Override
protected void createIndexes() {
DBCollection tempDBCollection = getTempDBCollection();
String indexName = "shipment_version_instance_id_1";
logger.debug("Creating index [{}]", indexName);
Index index = new Index().on("SHIPMENT_VERSION_INSTANCE_ID", Sort.Direction.DESC)
.background();
tempDBCollection.createIndex(index.getIndexKeys(), indexName, false);
}
@Override
protected QueryColumnBuilder getQueryColumnBuilder(String branchName) {
return QueryColumnBuilder.queryForOperationalRoutesColumns(branchName);
}
@Override
protected CreationDetails createCreationDetail(DBObject dbObject, Date creationDate) {
return new CreationDetails(
"SHIPMENT_VERSION_INSTANCE_ID",
String.valueOf(dbObject.get("SHIPMENT_VERSION_INSTANCE_ID")),
creationDate
);
}
}
The OperationalRouteStep class is responsible for copying the data from the data set named operational_routes into the MongoDB collection named operationalRoutes.
@Component
public class EquipmentCargoStep extends AbstractStep {
private static final String COLLECTION_NAME = "equipmentCargo";
private static final String DATA_SET_NAME = "equipment_cargo";
private static final String DATA_SET_RID = "0fc9d55a-142e-4385-883d-db1c1a5ef2b4";
@Autowired
public EquipmentCargoStep(MongoTemplate mongoTemplate, PalantirClient palantirClient) {
super(palantirClient, mongoTemplate, DATA_SET_NAME, COLLECTION_NAME, DATA_SET_RID);
}
@Override
protected void createIndexes() {
DBCollection tempDBCollection = getTempDBCollection();
String indexName = "fk_shipment_version_1";
logger.debug("Creating index [{}]", indexName);
Index index = new Index().on("FK_SHIPMENT_VERSION", Sort.Direction.DESC)
.background();
tempDBCollection.createIndex(index.getIndexKeys(), indexName, false);
indexName = "equipment_assignment_instance_id_1";
logger.debug("Creating index [{}]", indexName);
index = new Index().on("EQUIPMENT_ASSIGNMENT_INSTANCE_ID", Sort.Direction.DESC)
.background();
tempDBCollection.createIndex(index.getIndexKeys(), indexName, false);
}
@Override
protected QueryColumnBuilder getQueryColumnBuilder(String branchName) {
return QueryColumnBuilder.queryForEquipmentCargoColumns(branchName);
}
@Override
protected CreationDetails createCreationDetail(DBObject dbObject, Date creationDate) {
return new CreationDetails(
"EQUIPMENT_ASSIGNMENT_INSTANCE_ID",
String.valueOf(dbObject.get("EQUIPMENT_ASSIGNMENT_INSTANCE_ID")),
creationDate
);
}
}
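The body of mapAndSaveToDB is not shown in the article. Below is a simplified, self-contained sketch of the chunked CSV-to-document mapping it performs, using a naive comma split and a callback in place of Commons CSV and MongoTemplate bulk writes (the ChunkedCsvLoader class and all names in it are illustrative, not the project's real code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of the chunked load performed by mapAndSaveToDB:
// every CSV row becomes a header->value map (standing in for a MongoDB
// document), and each full chunk is handed to bulkInsert (standing in
// for a MongoTemplate bulk write), so at most chunkSize rows are held
// in memory at once.
public class ChunkedCsvLoader {

    public static int load(String csvBody, String[] headers, int chunkSize,
                           Consumer<List<Map<String, String>>> bulkInsert) {
        List<Map<String, String>> chunk = new ArrayList<>(chunkSize);
        int total = 0;
        for (String line : csvBody.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // Naive split; the real code uses Commons CSV, which also
            // handles quoting and embedded commas.
            String[] values = line.split(",", -1);
            Map<String, String> document = new LinkedHashMap<>();
            for (int i = 0; i < headers.length && i < values.length; i++) {
                document.put(headers[i], values[i]);
            }
            chunk.add(document);
            if (chunk.size() >= chunkSize) {
                bulkInsert.accept(chunk);
                total += chunk.size();
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) {
            bulkInsert.accept(chunk); // flush the last partial chunk
            total += chunk.size();
        }
        return total;
    }
}
```

With csv.chunkSize set to 50000, as in application.properties, at most 50000 rows would be buffered before each bulk insert.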
The batch configuration is done in the MainBatchConfigurer class.
@Configuration
public class MainBatchConfigurer implements BatchConfigurer {
@Autowired
private ExecutionContextDao mongoExecutionContextDao;
@Autowired
private JobExecutionDao mongoJobExecutionDao;
@Autowired
private JobInstanceDao mongoJobInstanceDao;
@Autowired
private StepExecutionDao mongoStepExecutionDao;
@Override
public JobRepository getJobRepository() {
return new SimpleJobRepository(
mongoJobInstanceDao,
mongoJobExecutionDao,
mongoStepExecutionDao,
mongoExecutionContextDao
);
}
@Override
public PlatformTransactionManager getTransactionManager() {
return new ResourcelessTransactionManager();
}
@Override
public SimpleJobLauncher getJobLauncher() throws Exception {
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(getJobRepository());
jobLauncher.afterPropertiesSet();
return jobLauncher;
}
@Override
public JobExplorer getJobExplorer() {
return new SimpleJobExplorer(
mongoJobInstanceDao,
mongoJobExecutionDao,
mongoStepExecutionDao,
mongoExecutionContextDao
);
}
private JobOperator jobOperator() throws Exception {
SimpleJobOperator jobOperator = new SimpleJobOperator();
jobOperator.setJobLauncher(getJobLauncher());
jobOperator.setJobExplorer(getJobExplorer());
jobOperator.setJobRepository(getJobRepository());
jobOperator.setJobRegistry(jobRegistry());
return jobOperator;
}
private JobRegistry jobRegistry() {
return new MapJobRegistry();
}
private JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor() {
JobRegistryBeanPostProcessor postProcessor = new JobRegistryBeanPostProcessor();
postProcessor.setJobRegistry(jobRegistry());
return postProcessor;
}
}
The job itself is wired together in JobConfigurer:
@Component
public class JobConfigurer {
private static final String JOB_NAME = "SYNC-DBS";
private JobBuilderFactory jobBuilderFactory;
private StepBuilderFactory stepBuilderFactory;
private final Tasklet equipmentCargoStep;
private final Tasklet haulageEquipmentStep;
private final Tasklet haulageInfoStep;
private final Tasklet operationalRouteStep;
private final Tasklet trackingBookingStep;
private final Tasklet cargoConditioningStep;
@Autowired
public JobConfigurer(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory,
Tasklet equipmentCargoStep, Tasklet haulageEquipmentStep,
Tasklet haulageInfoStep, Tasklet operationalRouteStep,
Tasklet trackingBookingStep, Tasklet cargoConditioningStep) {
this.jobBuilderFactory = jobBuilderFactory;
this.stepBuilderFactory = stepBuilderFactory;
this.equipmentCargoStep = equipmentCargoStep;
this.haulageEquipmentStep = haulageEquipmentStep;
this.haulageInfoStep = haulageInfoStep;
this.operationalRouteStep = operationalRouteStep;
this.trackingBookingStep = trackingBookingStep;
this.cargoConditioningStep = cargoConditioningStep;
}
public Job synchroniseDatabasesJob() {
return jobBuilderFactory.get(JOB_NAME)
.incrementer(parametersIncrementer())
.preventRestart()
// '*' means that we do not care whether it fails or succeeds; the steps are independent and should be executed that way
.start(trackingBookingStep()).on("*").to(operationRouteStep())
.from(operationRouteStep()).on("*").to(equipmentCargoStep())
.from(equipmentCargoStep()).on("*").to(cargoConditioningStep())
.from(cargoConditioningStep()).on("*").to(haulageInfoStep())
// only when the plan is COMPLETED can we download the equipment for the plan
.from(haulageInfoStep()).on("COMPLETED").to(haulageEquipmentStep())
.from(haulageEquipmentStep()).on("*").end()
.end()
.build();
}
private JobParametersIncrementer parametersIncrementer() {
return jobParameters -> {
if (jobParameters == null || jobParameters.isEmpty()) {
return new JobParametersBuilder()
.addDate(JobParameter.RUN_ID.key(), new Date(System.currentTimeMillis()))
.toJobParameters();
}
Date id = jobParameters.getDate(JobParameter.RUN_ID.key(), new Date(System.currentTimeMillis()));
return new JobParametersBuilder()
.addDate(JobParameter.RUN_ID.key(), id)
.toJobParameters();
};
}
private Step trackingBookingStep() {
return this.stepBuilderFactory.get(trackingBookingStep.getClass().getSimpleName())
.tasklet(trackingBookingStep)
.build();
}
private Step operationRouteStep() {
return this.stepBuilderFactory.get(operationalRouteStep.getClass().getSimpleName())
.tasklet(operationalRouteStep)
.build();
}
private Step cargoConditioningStep() {
return this.stepBuilderFactory.get(cargoConditioningStep.getClass().getSimpleName())
.tasklet(cargoConditioningStep)
.build();
}
private Step equipmentCargoStep() {
return this.stepBuilderFactory.get(equipmentCargoStep.getClass().getSimpleName())
.tasklet(equipmentCargoStep)
.build();
}
private Step haulageInfoStep() {
return this.stepBuilderFactory.get(haulageInfoStep.getClass().getSimpleName())
.tasklet(haulageInfoStep)
.build();
}
private Step haulageEquipmentStep() {
return this.stepBuilderFactory.get(haulageEquipmentStep.getClass().getSimpleName())
.tasklet(haulageEquipmentStep)
.build();
}
}
In JobConfigurer, since each step is independent, it does not matter whether the previous step succeeded or failed; the next step should be executed either way.
The job is launched from the application's main class:
@EnableBatchProcessing
@SpringBootApplication(exclude = { DataSourceAutoConfiguration.class })
public class Application implements CommandLineRunner {
private static final Logger logger = LoggerFactory.getLogger(Application.class);
@Autowired
private MainBatchConfigurer batchConfigurer;
@Autowired
private JobConfigurer jobConfigurer;
@Value("${csv.directory}")
private String csvDirectory = "./csv";
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
@Override
public void run(String... strings) {
TimeLog timeLog = new TimeLog(getClass().getSimpleName());
try {
long timestamp = System.currentTimeMillis();
logger.debug("Start GCSS batch with timestamp '{}'.", timestamp);
SimpleJobLauncher jobLauncher = batchConfigurer.getJobLauncher();
JobParameters jobParameters = new JobParametersBuilder()
.addDate(JobParameter.RUN_ID.key(), new Date(timestamp))
.toJobParameters();
timeLog.logTime("Job execution");
JobExecution jobExecution = jobLauncher.run(jobConfigurer.synchroniseDatabasesJob(), jobParameters);
logger.debug("Job execution finished, timestamp = {}, job execution status = {}.", timestamp, jobExecution.getStatus());
} catch (Exception e) {
logger.error("Error when launching job.", e);
} finally {
try {
timeLog.logTime("Delete " + csvDirectory);
FileUtils.forceDelete(new File(csvDirectory));
} catch (IOException e) {
logger.error("Error when deleting {} directory.", csvDirectory, e);
}
}
timeLog.done();
System.exit(0);
}
}
Summary:
The above is a simple example of building a Spring Batch application that converts CSV files into MongoDB collections.
Original article: https://dzone.com/articles/spring-batch-typical-use-case