How to submit multiple Flink jobs using a single Flink application

ardhani

Say I have a Flink application that filters, transforms, and processes a stream.

How can I break this application into two jobs and communicate between them without using an intermediate store?

Refer to the image below for the dataflow.

Reason for the use case:

Event size: 2 KB, lite (transformed) event size: 200 B, TPS: 1M

To use the Java heap effectively and hold more events at any given time, the transformation step is required: shrinking a 2 KB event to a 200 B lite event lets the same heap hold roughly 10x as many events. Doing all three stages on a single TaskManager has the disadvantage of also keeping the raw ingested events in memory, where nearly 80% of the events are not required.

Running these jobs on different TaskManagers would give great flexibility in scaling the processing function.

I need help achieving this; any suggestion is welcome. I am also trying to understand how multiple jobs can be submitted via a single Flink application.

[dataflow diagram]

David Anderson

Several points:

Application mode, introduced in Flink 1.11, allows a single main() method to submit multiple jobs, but there's no mechanism for direct communication between these jobs. Flink's approach to fault tolerance via snapshotting doesn't extend to managing state in more than one job.
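To illustrate the submission part only, here is a minimal sketch of a main() that submits two independent jobs with executeAsync(); the pipelines, host names, and ports are hypothetical placeholders, and this assumes Flink 1.11+ application mode (note that with high availability enabled, application mode supports only single-execute applications):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiJobApplication {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job 1: filter + transform (placeholder pipeline).
        env.socketTextStream("ingest-host", 9000)      // hypothetical source
           .filter(line -> !line.isEmpty())
           .map(String::toUpperCase)
           .print();
        // executeAsync() submits the job without blocking, so main() can
        // go on to build and submit the next one.
        env.executeAsync("filter-transform");

        // Job 2: process (placeholder pipeline). executeAsync() cleared the
        // environment's transformations, so this becomes a separate job graph.
        env.socketTextStream("ingest-host", 9001)      // hypothetical source
           .map(line -> "processed: " + line)
           .print();
        env.executeAsync("process");
    }
}
```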

You could, hypothetically, connect the jobs with a socket sink and socket source. But you'll give up fault tolerance if you do this.
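A rough sketch of that hypothetical hand-off, assuming a relay at "relay-host:9999" (both writeToSocket() and socketTextStream() connect as clients, so something must be listening between the two jobs); all host names and ports are made up, and none of this is fault tolerant:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SocketHandoffSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job 1: the filter/transform stage pushes lite events to a socket.
        // writeToSocket() connects as a client, so a relay server at
        // "relay-host:9999" (hypothetical) must already be listening.
        env.socketTextStream("ingest-host", 9000)      // hypothetical raw source
           .filter(line -> !line.isEmpty())
           .writeToSocket("relay-host", 9999, new SimpleStringSchema());
        env.executeAsync("filter-transform");

        // Job 2: the processing stage reads the lite events back from the
        // relay. socketTextStream() is also a client, and neither end takes
        // part in checkpointing, so events in flight are lost on failure.
        env.socketTextStream("relay-host", 9999)
           .map(line -> "processed: " + line)
           .print();
        env.executeAsync("process");
    }
}
```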

You can achieve something similar to what you've asked for by configuring a slot sharing group that forces the final stage(s) of the pipeline into their own slot(s). However, this is almost certainly a bad idea, as it will force ser/de that might otherwise be unnecessary, and also result in poorer resource utilization. But it will separate those pipeline stages into another JVM.
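For reference, a sketch of that slot sharing group configuration; the group name and pipeline are made up:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Ingest + transform stay in the implicit "default" slot sharing group.
        DataStream<String> liteEvents = env
                .socketTextStream("ingest-host", 9000)  // hypothetical source
                .filter(line -> !line.isEmpty())
                .map(String::toUpperCase);

        // Putting the final stage in its own group ("processing" is a made-up
        // name) keeps it out of the upstream slots. This breaks chaining and
        // forces ser/de across the network boundary; it only lands in a
        // different JVM if TaskManagers are configured with a single slot.
        liteEvents
                .map(line -> "processed: " + line)
                .slotSharingGroup("processing")
                .print();

        env.execute("slot-sharing-example");
    }
}
```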

If the goal is to have separately deployable and independently scalable components, you can get that by using remote functions with the Stateful Functions API.

To maximize performance (and minimize garbage collection) with the sort of ETL job you've shown, you're probably better off if you take advantage of operator chaining and object reuse, and keep everything in a single job.
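Concretely, operator chaining is on by default, and object reuse is a one-line configuration change; a minimal sketch, with a placeholder pipeline:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SingleJobEtl {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Object reuse skips the defensive copies Flink otherwise makes
        // between chained operators. It is safe only if user functions do
        // not cache or mutate the records they receive.
        env.getConfig().enableObjectReuse();

        // Chaining is on by default: filter -> map -> sink run inside one
        // task, handing records over by method call instead of ser/de.
        env.socketTextStream("ingest-host", 9000)      // hypothetical source
           .filter(line -> !line.isEmpty())
           .map(String::toUpperCase)
           .print();

        env.execute("single-job-etl");
    }
}
```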

