Google Cloud Dataflow + Java 8 vs Java 11：相同的管道，不同的CPU使用率

c f 发表于 Dev

碳纤维

我有一个Beam 2.25.0管道，该管道可以获取一些数据，生成一堆更多的数据（执行扇出操作），对新数据进行分区，并在生成的数据上并行运行计算。我为工作指定的机器是n1-highmem-4，我最多指定40个工人。

在Java 8下运行良好：所有提供给该工作的工作人员都得到了充分利用（> 90％的CPU）。吞吐量为40个元素/秒。

当我重新编译并重新运行管道以使用Java 11时，为该作业提供了相同数量的工作程序，但它们仅达到30％的CPU利用率，而吞吐量则低于18 / s。

In order for me to get the job to reach the same throughput numbers, I have to specify the --numberOfWorkerHarnessThreads=4 flag, and even then, throughput is still not 40/s like when I run the pipeline under Java 8.

What could be the difference between using Java 8 vs Java 11 for the pipeline? And why wouldn't the pipeline running under Java 11 automatically utilize the workers the same way as under Java 8?

I also tried recompiling and using Beam 2.26.0 for the Java 11 pipeline execution, but it had the same throughput.

Iñigo

There is one bug in Beam that makes the pipeline to default to only use 1 harness thread for Batch in Java 11. Specifying numberOfWorkerHarnessThreads=4 makes the pipeline to use 4 harness would make it to use 4 thread.

您可以看到工作人员确实使用了大约25％的Cpu，（由于您使用的是四核计算机，因为它是n1-highmem-4，从帖子中看起来），这意味着100％/ 4核= 25％。

从Jira来看，它应该在2.26.0中修复，但也许被延迟到2.27.0

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2021-01-27

我来说两句

0 条评论

登录后参与评论

上一篇：Google Apps脚本上的附件

如何使用Google Cloud Dataflow解决java.lang.NoSuchMethodError：com.google.api.services.dataflow.model.Environment.setSdkPipelineOptions？

TOP 榜单

文章

Google Cloud Dataflow + Java 8 vs Java 11：相同的管道，不同的CPU使用率

Google Cloud Dataflow + Java 8 vs Java 11：相同的管道，不同的CPU使用率

蓝屏死机没有修复解决方案

计算数据帧中每行的NA

UITableView的项目向下滚动后更改颜色，然后快速备份

Node.js中未捕获的异常错误，发生调用

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

Linux的官方Adobe Flash存储库是否已过时？

验证REST API参数

ggplot：对齐多个分面图-所有大小不同的分面

Mac OS X更新后的GRUB 2问题

通过 Git 在运行 Jenkins 作业时获取 ClassNotFoundException

带有错误“ where”条件的查询如何返回结果？

用日期数据透视表和日期顺序查询

VB.net将2条特定行导出到DataGridView

如何从视图一次更新多行（ASP.NET - Core）

Java Eclipse中的错误13，如何解决？

尝试反复更改屏幕上按钮的位置 - kotlin android studio

离子动态工具栏背景色

应用发明者仅从列表中选择一个随机项一次

当我尝试下载 StanfordNLP en 模型时，出现错误

python中的boto3文件上传

在同一Pushwoosh应用程序上Pushwoosh多个捆绑ID