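Launching a JobLaunchRequest for each new file in AWS S3 with Spring Batch Integration

Guilherme Bernardi

I'm following the docs: Spring Batch Integration combined with Spring Integration AWS, for polling AWS S3.

But in some situations the batch execution per file is not working.

The AWS S3 polling works correctly, so when I put a new file in the bucket, or start the application while the bucket already contains files, the application syncs them to the local directory: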

    @Bean
    public S3SessionFactory s3SessionFactory(AmazonS3 pAmazonS3) {
        return new S3SessionFactory(pAmazonS3);
    }

    @Bean
    public S3InboundFileSynchronizer s3InboundFileSynchronizer(S3SessionFactory pS3SessionFactory) {
        S3InboundFileSynchronizer synchronizer = new S3InboundFileSynchronizer(pS3SessionFactory);
        synchronizer.setPreserveTimestamp(true);
        synchronizer.setDeleteRemoteFiles(false);
        synchronizer.setRemoteDirectory("remote-bucket");
        //synchronizer.setFilter(new S3PersistentAcceptOnceFileListFilter(new SimpleMetadataStore(), "simpleMetadataStore"));
        return synchronizer;
    }

    @Bean
    @InboundChannelAdapter(value = IN_CHANNEL_NAME, poller = @Poller(fixedDelay = "30"))
    public S3InboundFileSynchronizingMessageSource s3InboundFileSynchronizingMessageSource(
            S3InboundFileSynchronizer pS3InboundFileSynchronizer) {
        S3InboundFileSynchronizingMessageSource messageSource = new S3InboundFileSynchronizingMessageSource(pS3InboundFileSynchronizer);
        messageSource.setAutoCreateLocalDirectory(true);
        messageSource.setLocalDirectory(new FileSystemResource("files").getFile());
        //messageSource.setLocalFilter(new FileSystemPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(), "fsSimpleMetadataStore"));
        return messageSource;
    }

    @Bean("s3filesChannel")
    public PollableChannel s3FilesChannel() {
        return new QueueChannel();
    }

I followed the tutorial and created the FileMessageToJobRequest class. I won't put its code here because it is the same as in the docs.
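For readers without the docs at hand, here is a sketch of what such a transformer usually looks like, modeled on the reference documentation example. The poster's actual class is not shown, so details may differ (the log output further down shows a relative path, for instance, while this sketch uses the absolute path).

```java
import java.io.File;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.integration.launch.JobLaunchRequest;
import org.springframework.integration.annotation.Transformer;
import org.springframework.messaging.Message;

// Sketch only: turns a Message<File> into the JobLaunchRequest that
// JobLaunchingGateway requires as its payload.
public class FileMessageToJobRequest {

    private Job job;
    private String fileParameterName;

    public void setJob(Job job) {
        this.job = job;
    }

    public void setFileParameterName(String fileParameterName) {
        this.fileParameterName = fileParameterName;
    }

    @Transformer
    public JobLaunchRequest toRequest(Message<File> message) {
        // The file path becomes a job parameter, so each distinct file
        // yields a distinct set of job parameters.
        JobParametersBuilder builder = new JobParametersBuilder();
        builder.addString(fileParameterName, message.getPayload().getAbsolutePath());
        return new JobLaunchRequest(job, builder.toJobParameters());
    }
}
```

Any message that reaches the gateway without passing through this transformer still carries a java.io.File payload.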

So I created the IntegrationFlow and FileMessageToJobRequest beans:

    @Bean
    public IntegrationFlow integrationFlow(
            S3InboundFileSynchronizingMessageSource pS3InboundFileSynchronizingMessageSource) {
        return IntegrationFlows.from(pS3InboundFileSynchronizingMessageSource, 
                         c -> c.poller(Pollers.fixedRate(1000).maxMessagesPerPoll(1)))
                .transform(fileMessageToJobRequest())
                .handle(jobLaunchingGateway())
                .log(LoggingHandler.Level.WARN, "headers.id + ': ' + payload")
                .get();
    }

    @Bean
    public FileMessageToJobRequest fileMessageToJobRequest() {
        FileMessageToJobRequest fileMessageToJobRequest = new FileMessageToJobRequest();
        fileMessageToJobRequest.setFileParameterName("input.file.name");
        fileMessageToJobRequest.setJob(delimitedFileJob);
        return fileMessageToJobRequest;
    }

I think the problem is in the JobLaunchingGateway.

If I create it like this:

    @Bean
    public JobLaunchingGateway jobLaunchingGateway() {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        simpleJobLauncher.setTaskExecutor(new SyncTaskExecutor());
        JobLaunchingGateway jobLaunchingGateway = new JobLaunchingGateway(simpleJobLauncher);

        return jobLaunchingGateway;
    }

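Case 1 (the bucket is empty when the application starts):

I upload a new file to AWS S3;
The polling works and the file appears in the local directory;
But the transformer/job is not fired.

Case 2 (the bucket already has one file when the application starts):

The job is launched: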

2021-01-12 13:32:34.451  INFO 1955 --- [ask-scheduler-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=arquivoDelimitadoJob]] launched with the following parameters: [{input.file.name=files/FILE1.csv}]
2021-01-12 13:32:34.524  INFO 1955 --- [ask-scheduler-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [delimitedFileJob]

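If I add a second file to S3, the job is not launched, just as in case 1.

Case 3 (the bucket has more than one file):

The files are synchronized correctly to the local directory
But the job is executed only once, for the last file.

So, following the docs, I changed my gateway to: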

    @Bean
    @ServiceActivator(inputChannel = IN_CHANNEL_NAME, poller = @Poller(fixedRate="1000"))
    public JobLaunchingGateway jobLaunchingGateway() {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        simpleJobLauncher.setTaskExecutor(new SyncTaskExecutor());

        //JobLaunchingGateway jobLaunchingGateway = new JobLaunchingGateway(jobLauncher());
        JobLaunchingGateway jobLaunchingGateway = new JobLaunchingGateway(simpleJobLauncher);
        //jobLaunchingGateway.setOutputChannel(replyChannel());
        jobLaunchingGateway.setOutputChannel(s3FilesChannel());
        return jobLaunchingGateway;
    }

With this new gateway implementation, if I put a new file in S3 the application reacts, but the message is not transformed and I get this error:

Caused by: java.lang.IllegalArgumentException: The payload must be of type JobLaunchRequest. Object of class [java.io.File] must be an instance of class org.springframework.batch.integration.launch.JobLaunchRequest

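And if there are two files in the bucket when the application starts (FILE1.csv and FILE2.csv), the job runs correctly for FILE1.csv but gives the error above for FILE2.csv.

What is the correct way to implement something like this?

Just to be clear, I want to receive thousands of CSV files in this bucket and read and process them with Spring Batch, but I also need to get every new file from S3 as soon as possible.

Thanks in advance.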

Artem Bilan

The JobLaunchingGateway indeed accepts only a JobLaunchRequest as its payload.

Since you have @InboundChannelAdapter(value = IN_CHANNEL_NAME, poller = @Poller(fixedDelay = "30")) on the S3InboundFileSynchronizingMessageSource bean definition, it is wrong to then put @ServiceActivator(inputChannel = IN_CHANNEL_NAME on the JobLaunchingGateway without a FileMessageToJobRequest transformer in between.

Your integrationFlow looks OK to me, but then you really need to remove that @InboundChannelAdapter from the S3InboundFileSynchronizingMessageSource bean and rely entirely on the c.poller() configuration.
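A minimal sketch of that first option, assuming Spring Integration 5.x (where IntegrationFlows is available), that FileMessageToJobRequest lives in the same package, and that the JobLaunchingGateway bean is the first variant from the question, without @ServiceActivator. The class name SinglePollerFlowConfig is made up for illustration.

```java
import java.io.File;

import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.aws.inbound.S3InboundFileSynchronizer;
import org.springframework.integration.aws.inbound.S3InboundFileSynchronizingMessageSource;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.handler.LoggingHandler;

@Configuration
public class SinglePollerFlowConfig {

    // No @InboundChannelAdapter here: the flow below is the only poller of this source.
    @Bean
    public S3InboundFileSynchronizingMessageSource s3InboundFileSynchronizingMessageSource(
            S3InboundFileSynchronizer synchronizer) {
        S3InboundFileSynchronizingMessageSource messageSource =
                new S3InboundFileSynchronizingMessageSource(synchronizer);
        messageSource.setAutoCreateLocalDirectory(true);
        messageSource.setLocalDirectory(new File("files"));
        return messageSource;
    }

    // Every polled File passes through the transformer before it reaches the gateway.
    @Bean
    public IntegrationFlow integrationFlow(
            S3InboundFileSynchronizingMessageSource messageSource,
            FileMessageToJobRequest fileMessageToJobRequest,
            JobLaunchingGateway jobLaunchingGateway) {
        return IntegrationFlows
                .from(messageSource, c -> c.poller(Pollers.fixedRate(1000).maxMessagesPerPoll(1)))
                .transform(fileMessageToJobRequest)
                .handle(jobLaunchingGateway)
                .log(LoggingHandler.Level.WARN)
                .get();
    }
}
```

With a single poller, each synchronized file is emitted once and always takes the transformer path.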

Another way is to keep that @InboundChannelAdapter, but then start the IntegrationFlow from IN_CHANNEL_NAME rather than from a MessageSource.
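A sketch of this second option, again assuming Spring Integration 5.x. The value of IN_CHANNEL_NAME is never shown in the question, so the constant below is a placeholder, and the class name ChannelDrivenFlowConfig is made up. The @InboundChannelAdapter on the message source stays as it is, and the @ServiceActivator on the gateway has to go.

```java
import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.handler.LoggingHandler;

@Configuration
public class ChannelDrivenFlowConfig {

    // Placeholder value (assumption): use the same constant the
    // @InboundChannelAdapter on the message source refers to.
    static final String IN_CHANNEL_NAME = "s3FilesIn";

    // The flow consumes what the annotated adapter sends to the channel,
    // so the source is still polled by exactly one poller.
    @Bean
    public IntegrationFlow integrationFlow(
            FileMessageToJobRequest fileMessageToJobRequest,
            JobLaunchingGateway jobLaunchingGateway) {
        return IntegrationFlows
                .from(IN_CHANNEL_NAME)
                .transform(fileMessageToJobRequest)
                .handle(jobLaunchingGateway)
                .log(LoggingHandler.Level.WARN)
                .get();
    }
}
```

This assumes the channel is subscribable, such as a DirectChannel. If IN_CHANNEL_NAME instead names a QueueChannel, the first endpoint of the flow would also need its own poller.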

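Since you have several pollers against the same S3 source, and both of them work off the same local directory, it is no surprise that you see so many unexpected situations.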
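The effect of two pollers sharing one source can be illustrated with a toy model in plain Java, with no Spring involved. All names here are made up, and the strict alternation between pollers is an assumption; real pollers interleave according to their timing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class CompetingPollersSketch {

    // Index 0: files taken by the flow's poller (path with the transformer).
    // Index 1: files taken by the annotated adapter's poller (raw File path).
    static List<List<String>> distribute(List<String> files) {
        // Stand-in for a source that hands out each local file only once.
        Queue<String> source = new ArrayDeque<>(files);
        List<String> viaFlow = new ArrayList<>();
        List<String> viaAdapter = new ArrayList<>();

        boolean flowTurn = true;
        String file;
        while ((file = source.poll()) != null) {
            // Whichever poller fires next takes the file; the other never sees it.
            (flowTurn ? viaFlow : viaAdapter).add(file);
            flowTurn = !flowTurn;
        }
        return List.of(viaFlow, viaAdapter);
    }

    public static void main(String[] args) {
        List<List<String>> result = distribute(List.of("FILE1.csv", "FILE2.csv"));
        System.out.println("transformed, job launched: " + result.get(0));
        System.out.println("raw File, no transformer:  " + result.get(1));
    }
}
```

In this model one file becomes a JobLaunchRequest and the other arrives downstream as a plain File, which matches the pattern reported in the question for FILE1.csv and FILE2.csv.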

Edited on 2021-01-27
