What is an efficient way to insert one million rows from one PostgreSQL server into another PostgreSQL server using Java?

Posted by hwak in Java

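I have two PostgreSQL servers. I need to copy table rows from the first server's format and convert them to the other server's format (different column names).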

I use a Java application with Spring Boot and a JPA repository that implements a findAll method as a stream with a fetch size of 1000.

    @Query("select c from ExternalFormatEntity c")
    @QueryHints(@javax.persistence.QueryHint(name = "org.hibernate.fetchSize",
            value = Constants.DEFAULT_FETCH_SIZE))
    Stream<ExternalFormatEntity> findAllEntities();

After reading, I convert the rows and insert them in batches of 1000.

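    // Stream the source rows, convert them, and save them in batches.
    try (Stream<ExternalFormatEntity> allExtEntitiesStream = extFormatService.getAllEntities()) {
        LinkedList<CanonicalFormatEntity> canonicalEntityList = new LinkedList<>();
        allExtEntitiesStream.forEach(extEntity -> {
            if (Objects.nonNull(extEntity)) {
                canonicalEntityList.add(SomeConverter.convert(extEntity));
            }
            if (canonicalEntityList.size() >= DEFAULT_BATCH_SIZE) {
                // Hand a copy of the full batch to a new thread and start collecting the next one.
                List<CanonicalFormatEntity> copyList = new LinkedList<>(canonicalEntityList);
                canonicalEntityList.clear();
                Thread thread = new Thread(() -> {
                    canonicalEntityRepository.saveAll(copyList);
                    canonicalEntityRepository.flush();
                    copyList.clear();
                });
                thread.start();
            }
        });
        // Save the rows left over after the last full batch.
        if (!canonicalEntityList.isEmpty()) {
            canonicalEntityRepository.saveAll(canonicalEntityList);
            canonicalEntityRepository.flush();
        }
    }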
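
The loop above starts a new thread for every batch. A minimal sketch of the same batching with a fixed-size thread pool follows, using only the JDK. The class name `BoundedBatchWriter`, the method `writeInBatches`, and the `saver` callback are hypothetical names for illustration; in this setup the callback would presumably wrap `canonicalEntityRepository.saveAll(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical helper: groups a stream into batches and saves them on a fixed-size pool.
class BoundedBatchWriter {

    // Returns the number of batches handed to `saver` (including a final partial batch).
    static <T> int writeInBatches(Stream<T> rows, int batchSize, int threads,
                                  Consumer<List<T>> saver) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        List<T> batch = new ArrayList<>(batchSize);
        try {
            for (T row : (Iterable<T>) rows::iterator) {
                batch.add(row);
                if (batch.size() >= batchSize) {
                    List<T> full = batch;              // hand the full batch to a worker
                    batch = new ArrayList<>(batchSize); // and start a fresh one
                    pending.add(pool.submit(() -> saver.accept(full)));
                }
            }
            if (!batch.isEmpty()) {
                List<T> rest = batch;                  // rows left after the last full batch
                pending.add(pool.submit(() -> saver.accept(rest)));
            }
            for (Future<?> f : pending) {
                f.get();                               // wait for completion, rethrow failures
            }
        } finally {
            pool.shutdown();
        }
        return pending.size();
    }
}
```

A call could look like `BoundedBatchWriter.writeInBatches(stream.map(SomeConverter::convert), 1000, 4, canonicalEntityRepository::saveAll)`. Note that the pool's queue is unbounded, so if reading is much faster than writing, waiting batches accumulate in memory.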

I think this operation, which currently takes about 1 hour for 1 million records, could be faster. Can I speed it up, and if so, how?

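First, I tried exporting the table records from the first database to a CSV file, saving it on the other server, and loading it with the Postgres COPY API, but the total time was still unacceptable because of the extra hard disk operations.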

Maybe Postgres has streaming writes or something similar? I could not find an answer in the official PostgreSQL documentation.
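
On the streaming question: PostgreSQL's `COPY ... TO STDOUT` and `COPY ... FROM STDIN` work over the connection, and the PostgreSQL JDBC driver exposes them through `CopyManager`, so the intermediate file may not be needed. Below is a minimal sketch of the plumbing only, using JDK pipes. The names `CopyPipe`, `Source`, and `Sink` are hypothetical, and the driver calls appear only in comments, with made-up table and column names.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: connects a writer and a reader through an in-memory pipe.
class CopyPipe {

    // With the PostgreSQL JDBC driver, a source could be (assumed names):
    //   out -> sourceCopyManager.copyOut(
    //       "COPY (SELECT ext_id, ext_name FROM ext_table) TO STDOUT WITH (FORMAT csv)", out)
    interface Source { void writeTo(OutputStream out) throws Exception; }

    // ...and a sink could be (assumed names):
    //   in -> targetCopyManager.copyIn(
    //       "COPY canonical_table (id, name) FROM STDIN WITH (FORMAT csv)", in)
    interface Sink { long readFrom(InputStream in) throws Exception; }

    // Runs the source on a background thread and the sink on the calling thread.
    static long transfer(Source source, Sink sink) throws Exception {
        PipedInputStream in = new PipedInputStream(64 * 1024);
        PipedOutputStream out = new PipedOutputStream(in);
        ExecutorService producer = Executors.newSingleThreadExecutor();
        try {
            Future<?> writing = producer.submit(() -> {
                try (OutputStream o = out) {   // closing signals end of data to the sink
                    source.writeTo(o);
                }
                return null;
            });
            long rows;
            try (InputStream i = in) {
                rows = sink.readFrom(i);
            }
            writing.get();                     // surface any failure from the source side
            return rows;
        } finally {
            producer.shutdownNow();
        }
    }
}
```

Because COPY maps columns by position, the different column names can be handled by the `SELECT` list on the source side and the column list on the target side. If the source fails midway, the sink only sees an early end of data, so running the target COPY inside a transaction that is rolled back on error would be prudent.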

In my case, the following solution helped:

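1. Export the external table to a CSV file with zip compression (example from a StackOverflow answer: https://stackoverflow.com/a/3981807/3744622)
2. Copy the small zip file to the /tmp folder on the Postgres server: scp root@ext_server:/path/to/file root@target_server:/tmp/
3. Import the table from the compressed CSV file (example from a StackOverflow answer: https://stackoverflow.com/a/46228247/3744622)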
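
The compress and decompress parts of these steps can also be done from Java. Here is a minimal sketch using only the JDK, with gzip swapped in for zip as an assumption because it compresses a single stream without an archive structure. `GzipCsv` and its methods are hypothetical names, and the linked answers may use different tooling.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for writing and reading gzip-compressed CSV files.
class GzipCsv {

    // Step 1: write already-formatted CSV lines into a gzip-compressed file.
    static void write(Path file, List<String> csvLines) throws IOException {
        try (BufferedWriter w = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
            for (String line : csvLines) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    // Step 3: open the compressed file as a plain-text reader. A reader like this
    // could be passed to the JDBC driver's CopyManager.copyIn(sql, reader).
    static BufferedReader open(Path file) throws IOException {
        return new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8));
    }

    // Convenience for checking a file: decompress and collect all lines.
    static List<String> readAll(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = open(file)) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

CSV text usually compresses well, which is what makes the transfer in step 2 cheap.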