Preface:
What is Spring Batch? In short, batch processing executes jobs, and a single job consists of at least one step.
Some time ago my company asked me to add a new feature to an existing system. The system had to fetch data from an external database. That external database is accessible through a REST API, but we could not use it "on the fly" because the calls took too long. So we decided to replicate the data from the external database into the database our system uses. In other words, we had to download what we needed and store it in our own database. Afterwards we could use the data on the fly without any performance overhead.
The application should do the following:
- Download data from the external database via the REST API and save it as CSV files.
- Each data set is independent, so the steps in the job must also be independent (with one exception).
- Create as many MongoDB collections as there are CSV files.
- Convert each row of a CSV file into a MongoDB document.
- Save the documents in the related collections.
- Delete the CSV files afterwards.
- Persist the details of the batch, job, and step executions.
Technology stack:
- Java 8
- Maven 3.5.2
- Spring Boot 1.5.10
- Spring Web 4.3.14
- MongoDB 3.4
pom.xml:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-web</artifactId>
<version>4.3.14.RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.5</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.5</version>
</dependency>
The Spring Batch and MongoDB starters create a Spring Boot application. I added a separate dependency on spring-web to have a REST client; spring-web uses httpcomponents under the hood, and commons-csv together with commons-io is used for handling the CSV files.
Operations
According to all the documentation Spring provides about Spring Batch, a step should look like this:
- Collect information about the CSV headers
- Download the data in CSV format
- Convert each row into a MongoDB document
- Insert into the database in chunks
- Repeat the steps above until everything has been downloaded and processed
Let's look at the code. application.properties contains the MongoDB details along with some other properties:
spring.data.mongodb.host= localhost
spring.data.mongodb.port= 27017
spring.data.mongodb.database= db-replica
spring.data.mongodb.repositories.enabled= true
mongodb.bulk.batchSize= 1000
# [s,S] stands for seconds, [m,M] stands for minutes, [h,H] stands for hours. Format: 1m = 1 minute, 12H = 12 hours
request.connection.timeout= 2h
# How many CSV rows are loaded into program memory at once. Too many could cause an Out Of Memory exception.
csv.chunkSize= 50000
# Where the CSV files are saved on the server
csv.directory= ./csv
# Whether to log while waiting for a response
log.while.waiting.for.response= true
# How often (in seconds) a message appears in the log.
# 5 means that every 5 seconds a message is written to the log until the request completes.
log.wait.interval= 5
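Note that request.connection.timeout uses a custom duration format rather than a plain number of milliseconds. The article does not show how it is parsed; here is a minimal sketch of how such a value could be converted to milliseconds (the DurationParser class and its toMillis method are my own illustration, not part of the original project):

```java
// Hypothetical helper that parses the timeout format described in
// application.properties ("2h", "30m", "12H", "90s") into milliseconds.
public class DurationParser {

    public static long toMillis(String value) {
        String trimmed = value.trim();
        // The unit is the last character; case does not matter per the property comment.
        char unit = Character.toLowerCase(trimmed.charAt(trimmed.length() - 1));
        long amount = Long.parseLong(trimmed.substring(0, trimmed.length() - 1));
        switch (unit) {
            case 's': return amount * 1000L;
            case 'm': return amount * 60_000L;
            case 'h': return amount * 3_600_000L;
            default:  throw new IllegalArgumentException("Unknown time unit: " + unit);
        }
    }
}
```

The parsed value would then typically be passed to the HTTP client's request configuration as the connection timeout.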
A Tasklet defines the same abstraction for every step, because every step processes the data in the same way but has to store it into a different collection:
abstract class AbstractStep implements Tasklet {
protected final Logger logger;
private static final String ERROR_COLLECTION_NAME = "csvErrors";
@Value("${palantir.endpoint}")
private String basicUrl;
@Value("${mongodb.bulk.batchSize}")
private int batchSize = 1000;
@Value("${csv.chunkSize}")
private int chunkSize = 50000;
@Value("${palantir.branch}")
private String branchName;
@Value("${csv.directory}")
private String csvDirectory = "./csv";
private MongoTemplate mongoTemplate;
private final PalantirClient palantirClient;
private final String dataSetName;
private final String collectionName;
private final String dataSetRid;
private String[] headers;
private final String filePath;
private final Collection<CreationDetails> creationDetails;
protected AbstractStep(PalantirClient palantirClient, MongoTemplate mongoTemplate,
String dataSetName, String collectionName, String dataSetRid) {
this.palantirClient = palantirClient;
this.mongoTemplate = mongoTemplate;
this.dataSetName = dataSetName;
this.collectionName = collectionName;
this.dataSetRid = dataSetRid;
this.filePath = String.format("%s/%s.csv", csvDirectory, dataSetName);
this.creationDetails = new HashSet<>();
this.logger = LoggerFactory.getLogger(this.getClass());
}
@Override
public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
throws IOException, JSONException {
TimeLog timeLog = new TimeLog(getClass().getSimpleName());
timeLog.logTime("dropTemporaryCollection");
dropTemporaryCollection();
timeLog.logTime("extractColumns");
extractColumns();
timeLog.logTime("downloadCsv");
downloadCsv();
timeLog.logTime("mapAndSaveToDB");
mapAndSaveToDB();
timeLog.logTime("createIndexes");
createIndexes();
timeLog.logTime("renameTempCollection");
renameTempCollection();
timeLog.logTime("upsertCreationDetails");
upsertCreationDetails();
timeLog.logTime("deleteFile");
deleteFile();
timeLog.done();
return RepeatStatus.FINISHED;
}
...
protected abstract void createIndexes();
protected abstract QueryColumnBuilder getQueryColumnBuilder(String branchName);
protected abstract CreationDetails createCreationDetail(DBObject dbObject, Date created);
}
AbstractStep implements the Tasklet interface from the Spring Batch module. While storing the data, a temporary collection is created first for safety reasons. Only after all the data has been processed successfully are the proper indexes created and the temporary collection renamed to the target collection. TimeLog is a small helper for measuring and logging how long each phase takes. AbstractStep also contains some domain-specific properties that I will not explain here.
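The TimeLog helper used in the execute() method above is not shown in the article. A minimal, self-contained sketch of what such a phase-timing helper might look like (everything beyond the logTime/done calls seen above is my assumption):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the TimeLog helper used in AbstractStep:
// logTime(name) closes the previous phase and starts a new one,
// done() closes the last phase. Durations are kept for inspection.
public class TimeLog {
    private final String owner;
    private final Map<String, Long> durationsMs = new LinkedHashMap<>();
    private String currentPhase;
    private long phaseStart;

    public TimeLog(String owner) {
        this.owner = owner;
    }

    public void logTime(String phase) {
        finishCurrent();
        currentPhase = phase;
        phaseStart = System.nanoTime();
    }

    public void done() {
        finishCurrent();
    }

    public Map<String, Long> getDurationsMs() {
        return durationsMs;
    }

    private void finishCurrent() {
        if (currentPhase != null) {
            long elapsed = (System.nanoTime() - phaseStart) / 1_000_000;
            durationsMs.put(currentPhase, elapsed);
            System.out.printf("[%s] %s took %d ms%n", owner, currentPhase, elapsed);
            currentPhase = null;
        }
    }
}
```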
@Component
public class OperationalRouteStep extends AbstractStep {
private static final String COLLECTION_NAME = "operationalRoutes";
private static final String DATA_SET_NAME = "operational_routes";
private static final String DATA_SET_RID = "4ef1e435-cb2a-450e-ba18-e42263057379";
@Autowired
public OperationalRouteStep(MongoTemplate mongoTemplate, PalantirClient palantirClient) {
super(palantirClient, mongoTemplate, DATA_SET_NAME, COLLECTION_NAME, DATA_SET_RID);
}
@Override
protected void createIndexes() {
DBCollection tempDBCollection = getTempDBCollection();
String indexName = "shipment_version_instance_id_1";
logger.debug("Creating index [{}]", indexName);
Index index = new Index().on("SHIPMENT_VERSION_INSTANCE_ID", Sort.Direction.DESC)
.background();
tempDBCollection.createIndex(index.getIndexKeys(), indexName, false);
}
@Override
protected QueryColumnBuilder getQueryColumnBuilder(String branchName) {
return QueryColumnBuilder.queryForOperationalRoutesColumns(branchName);
}
@Override
protected CreationDetails createCreationDetail(DBObject dbObject, Date creationDate) {
return new CreationDetails(
"SHIPMENT_VERSION_INSTANCE_ID",
String.valueOf(dbObject.get("SHIPMENT_VERSION_INSTANCE_ID")),
creationDate
);
}
}
The OperationalRouteStep class is responsible for copying the data from the data set named operational_routes into the MongoDB collection named operationalRoutes.
@Component
public class EquipmentCargoStep extends AbstractStep {
private static final String COLLECTION_NAME = "equipmentCargo";
private static final String DATA_SET_NAME = "equipment_cargo";
private static final String DATA_SET_RID = "0fc9d55a-142e-4385-883d-db1c1a5ef2b4";
@Autowired
public EquipmentCargoStep(MongoTemplate mongoTemplate, PalantirClient palantirClient) {
super(palantirClient, mongoTemplate, DATA_SET_NAME, COLLECTION_NAME, DATA_SET_RID);
}
@Override
protected void createIndexes() {
DBCollection tempDBCollection = getTempDBCollection();
String indexName = "fk_shipment_version_1";
logger.debug("Creating index [{}]", indexName);
Index index = new Index().on("FK_SHIPMENT_VERSION", Sort.Direction.DESC)
.background();
tempDBCollection.createIndex(index.getIndexKeys(), indexName, false);
indexName = "equipment_assignment_instance_id_1";
logger.debug("Creating index [{}]", indexName);
index = new Index().on("EQUIPMENT_ASSIGNMENT_INSTANCE_ID", Sort.Direction.DESC)
.background();
tempDBCollection.createIndex(index.getIndexKeys(), indexName, false);
}
@Override
protected QueryColumnBuilder getQueryColumnBuilder(String branchName) {
return QueryColumnBuilder.queryForEquipmentCargoColumns(branchName);
}
@Override
protected CreationDetails createCreationDetail(DBObject dbObject, Date creationDate) {
return new CreationDetails(
"EQUIPMENT_ASSIGNMENT_INSTANCE_ID",
String.valueOf(dbObject.get("EQUIPMENT_ASSIGNMENT_INSTANCE_ID")),
creationDate
);
}
}
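The body of mapAndSaveToDB is not shown in the article. Below is a simplified, self-contained sketch of the chunked CSV-to-document mapping it performs, using a naive comma split and a callback in place of Commons CSV and MongoTemplate bulk writes (the ChunkedCsvLoader class and all names in it are illustrative, not the project's real code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of the chunked load performed by mapAndSaveToDB:
// every CSV row becomes a header->value map (standing in for a MongoDB
// document), and each full chunk is handed to bulkInsert (standing in
// for a MongoTemplate bulk write), so at most chunkSize rows are held
// in memory at once.
public class ChunkedCsvLoader {

    public static int load(String csvBody, String[] headers, int chunkSize,
                           Consumer<List<Map<String, String>>> bulkInsert) {
        List<Map<String, String>> chunk = new ArrayList<>(chunkSize);
        int total = 0;
        for (String line : csvBody.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // Naive split; the real code uses Commons CSV, which also
            // handles quoting and embedded commas.
            String[] values = line.split(",", -1);
            Map<String, String> document = new LinkedHashMap<>();
            for (int i = 0; i < headers.length && i < values.length; i++) {
                document.put(headers[i], values[i]);
            }
            chunk.add(document);
            if (chunk.size() >= chunkSize) {
                bulkInsert.accept(chunk);
                total += chunk.size();
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) {
            bulkInsert.accept(chunk); // flush the last partial chunk
            total += chunk.size();
        }
        return total;
    }
}
```

With csv.chunkSize set to 50000, as in application.properties, at most 50000 rows would be buffered before each bulk insert.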
The batch configuration is done in the MainBatchConfigurer class.
@Configuration
public class MainBatchConfigurer implements BatchConfigurer {
@Autowired
private ExecutionContextDao mongoExecutionContextDao;
@Autowired
private JobExecutionDao mongoJobExecutionDao;
@Autowired
private JobInstanceDao mongoJobInstanceDao;
@Autowired
private StepExecutionDao mongoStepExecutionDao;
@Override
public JobRepository getJobRepository() {
return new SimpleJobRepository(
mongoJobInstanceDao,
mongoJobExecutionDao,
mongoStepExecutionDao,
mongoExecutionContextDao
);
}
@Override
public PlatformTransactionManager getTransactionManager() {
return new ResourcelessTransactionManager();
}
@Override
public SimpleJobLauncher getJobLauncher() throws Exception {
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(getJobRepository());
jobLauncher.afterPropertiesSet();
return jobLauncher;
}
@Override
public JobExplorer getJobExplorer() {
return new SimpleJobExplorer(
mongoJobInstanceDao,
mongoJobExecutionDao,
mongoStepExecutionDao,
mongoExecutionContextDao
);
}
private JobOperator jobOperator() throws Exception {
SimpleJobOperator jobOperator = new SimpleJobOperator();
jobOperator.setJobLauncher(getJobLauncher());
jobOperator.setJobExplorer(getJobExplorer());
jobOperator.setJobRepository(getJobRepository());
jobOperator.setJobRegistry(jobRegistry());
return jobOperator;
}
private JobRegistry jobRegistry() {
return new MapJobRegistry();
}
private JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor() {
JobRegistryBeanPostProcessor postProcessor = new JobRegistryBeanPostProcessor();
postProcessor.setJobRegistry(jobRegistry());
return postProcessor;
}
}
The job itself is wired together in JobConfigurer:
@Component
public class JobConfigurer {
private static final String JOB_NAME = "SYNC-DBS";
private JobBuilderFactory jobBuilderFactory;
private StepBuilderFactory stepBuilderFactory;
private final Tasklet equipmentCargoStep;
private final Tasklet haulageEquipmentStep;
private final Tasklet haulageInfoStep;
private final Tasklet operationalRouteStep;
private final Tasklet trackingBookingStep;
private final Tasklet cargoConditioningStep;
@Autowired
public JobConfigurer(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory,
Tasklet equipmentCargoStep, Tasklet haulageEquipmentStep,
Tasklet haulageInfoStep, Tasklet operationalRouteStep,
Tasklet trackingBookingStep, Tasklet cargoConditioningStep) {
this.jobBuilderFactory = jobBuilderFactory;
this.stepBuilderFactory = stepBuilderFactory;
this.equipmentCargoStep = equipmentCargoStep;
this.haulageEquipmentStep = haulageEquipmentStep;
this.haulageInfoStep = haulageInfoStep;
this.operationalRouteStep = operationalRouteStep;
this.trackingBookingStep = trackingBookingStep;
this.cargoConditioningStep = cargoConditioningStep;
}
public Job synchroniseDatabasesJob() {
return jobBuilderFactory.get(JOB_NAME)
.incrementer(parametersIncrementer())
.preventRestart()
// '*' means that we do not care whether it fails or succeeds; the steps are independent and should be executed that way
.start(trackingBookingStep()).on("*").to(operationRouteStep())
.from(operationRouteStep()).on("*").to(equipmentCargoStep())
.from(equipmentCargoStep()).on("*").to(cargoConditioningStep())
.from(cargoConditioningStep()).on("*").to(haulageInfoStep())
// only when the plan is COMPLETED can we download the equipment for the plan
.from(haulageInfoStep()).on("COMPLETED").to(haulageEquipmentStep())
.from(haulageEquipmentStep()).on("*").end()
.end()
.build();
}
private JobParametersIncrementer parametersIncrementer() {
return jobParameters -> {
if (jobParameters == null || jobParameters.isEmpty()) {
return new JobParametersBuilder()
.addDate(JobParameter.RUN_ID.key(), new Date(System.currentTimeMillis()))
.toJobParameters();
}
Date id = jobParameters.getDate(JobParameter.RUN_ID.key(), new Date(System.currentTimeMillis()));
return new JobParametersBuilder()
.addDate(JobParameter.RUN_ID.key(), id)
.toJobParameters();
};
}
private Step trackingBookingStep() {
return this.stepBuilderFactory.get(trackingBookingStep.getClass().getSimpleName())
.tasklet(trackingBookingStep)
.build();
}
private Step operationRouteStep() {
return this.stepBuilderFactory.get(operationalRouteStep.getClass().getSimpleName())
.tasklet(operationalRouteStep)
.build();
}
private Step cargoConditioningStep() {
return this.stepBuilderFactory.get(cargoConditioningStep.getClass().getSimpleName())
.tasklet(cargoConditioningStep)
.build();
}
private Step equipmentCargoStep() {
return this.stepBuilderFactory.get(equipmentCargoStep.getClass().getSimpleName())
.tasklet(equipmentCargoStep)
.build();
}
private Step haulageInfoStep() {
return this.stepBuilderFactory.get(haulageInfoStep.getClass().getSimpleName())
.tasklet(haulageInfoStep)
.build();
}
private Step haulageEquipmentStep() {
return this.stepBuilderFactory.get(haulageEquipmentStep.getClass().getSimpleName())
.tasklet(haulageEquipmentStep)
.build();
}
}
In JobConfigurer, since each step is independent, it does not matter whether the previous step succeeded or failed; the next step should be executed either way.
The job is launched from the application's main class:
@EnableBatchProcessing
@SpringBootApplication(exclude = { DataSourceAutoConfiguration.class })
public class Application implements CommandLineRunner {
private static final Logger logger = LoggerFactory.getLogger(Application.class);
@Autowired
private MainBatchConfigurer batchConfigurer;
@Autowired
private JobConfigurer jobConfigurer;
@Value("${csv.directory}")
private String csvDirectory = "./csv";
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
@Override
public void run(String... strings) {
TimeLog timeLog = new TimeLog(getClass().getSimpleName());
try {
long timestamp = System.currentTimeMillis();
logger.debug("Start GCSS batch with timestamp '{}'.", timestamp);
SimpleJobLauncher jobLauncher = batchConfigurer.getJobLauncher();
JobParameters jobParameters = new JobParametersBuilder()
.addDate(JobParameter.RUN_ID.key(), new Date(timestamp))
.toJobParameters();
timeLog.logTime("Job execution");
JobExecution jobExecution = jobLauncher.run(jobConfigurer.synchroniseDatabasesJob(), jobParameters);
logger.debug("Job execution finished, timestamp = {}, job execution status = {}.", timestamp, jobExecution.getStatus());
} catch (Exception e) {
logger.error("Error when launching job.", e);
} finally {
try {
timeLog.logTime("Delete " + csvDirectory);
FileUtils.forceDelete(new File(csvDirectory));
} catch (IOException e) {
logger.error("Error when deleting {} directory.", csvDirectory, e);
}
}
timeLog.done();
System.exit(0);
}
}
Summary:
The above is a simple example of building a Spring Batch application that converts CSV files into MongoDB collections.
Original article: https://dzone.com/articles/spring-batch-typical-use-case