How to submit multiple Flink jobs using a single Flink application

ardhani

Say I have a Flink application that filters, transforms, and processes a stream.

How can I break this application into two jobs and communicate between them without using an intermediate store?

Refer to the image below for the dataflow.

Reason for the use case:

Event size: 2 KB, lite (transformed) event size: 200 B, TPS: 1M

To use the Java heap effectively and hold more events at any given time, the transformation step is required: shrinking a 2 KB event to a 200 B lite event lets the same heap hold roughly 10x as many events. Doing all three stages on a single TaskManager has the disadvantage of also keeping the raw ingested events in memory, where nearly 80% of the events are not required.

Running these jobs on different TaskManagers would give great flexibility in scaling the processing function.

I need help achieving this; any suggestion is welcome. I am also trying to understand how multiple jobs can be submitted via a single Flink application.

[dataflow diagram]

David Anderson

Several points:

Application mode, introduced in Flink 1.11, allows a single main() method to submit multiple jobs, but there's no mechanism for direct communication between these jobs. Flink's approach to fault tolerance via snapshotting doesn't extend to managing state in more than one job.
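To illustrate the submission part only, here is a minimal sketch of a main() that submits two independent jobs with executeAsync(); the pipelines, host names, and ports are hypothetical placeholders, and this assumes Flink 1.11+ application mode (note that with high availability enabled, application mode supports only single-execute applications):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiJobApplication {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job 1: filter + transform (placeholder pipeline).
        env.socketTextStream("ingest-host", 9000)      // hypothetical source
           .filter(line -> !line.isEmpty())
           .map(String::toUpperCase)
           .print();
        // executeAsync() submits the job without blocking, so main() can
        // go on to build and submit the next one.
        env.executeAsync("filter-transform");

        // Job 2: process (placeholder pipeline). executeAsync() cleared the
        // environment's transformations, so this becomes a separate job graph.
        env.socketTextStream("ingest-host", 9001)      // hypothetical source
           .map(line -> "processed: " + line)
           .print();
        env.executeAsync("process");
    }
}
```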

You could, hypothetically, connect the jobs with a socket sink and socket source. But you'll give up fault tolerance if you do this.
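A rough sketch of that hypothetical hand-off, assuming a relay at "relay-host:9999" (both writeToSocket() and socketTextStream() connect as clients, so something must be listening between the two jobs); all host names and ports are made up, and none of this is fault tolerant:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SocketHandoffSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job 1: the filter/transform stage pushes lite events to a socket.
        // writeToSocket() connects as a client, so a relay server at
        // "relay-host:9999" (hypothetical) must already be listening.
        env.socketTextStream("ingest-host", 9000)      // hypothetical raw source
           .filter(line -> !line.isEmpty())
           .writeToSocket("relay-host", 9999, new SimpleStringSchema());
        env.executeAsync("filter-transform");

        // Job 2: the processing stage reads the lite events back from the
        // relay. socketTextStream() is also a client, and neither end takes
        // part in checkpointing, so events in flight are lost on failure.
        env.socketTextStream("relay-host", 9999)
           .map(line -> "processed: " + line)
           .print();
        env.executeAsync("process");
    }
}
```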

You can achieve something similar to what you've asked for by configuring a slot sharing group that forces the final stage(s) of the pipeline into their own slot(s). However, this is almost certainly a bad idea, as it will force ser/de that might otherwise be unnecessary, and also result in poorer resource utilization. But it will separate those pipeline stages into another JVM.
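For reference, a sketch of that slot sharing group configuration; the group name and pipeline are made up:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Ingest + transform stay in the implicit "default" slot sharing group.
        DataStream<String> liteEvents = env
                .socketTextStream("ingest-host", 9000)  // hypothetical source
                .filter(line -> !line.isEmpty())
                .map(String::toUpperCase);

        // Putting the final stage in its own group ("processing" is a made-up
        // name) keeps it out of the upstream slots. This breaks chaining and
        // forces ser/de across the network boundary; it only lands in a
        // different JVM if TaskManagers are configured with a single slot.
        liteEvents
                .map(line -> "processed: " + line)
                .slotSharingGroup("processing")
                .print();

        env.execute("slot-sharing-example");
    }
}
```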

If the goal is to have separately deployable and independently scalable components, you can get that by using remote functions with the Stateful Functions API.

To maximize performance (and minimize garbage collection) with the sort of ETL job you've shown, you're probably better off if you take advantage of operator chaining and object reuse, and keep everything in a single job.
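Concretely, operator chaining is on by default, and object reuse is a one-line configuration change; a minimal sketch, with a placeholder pipeline:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SingleJobEtl {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Object reuse skips the defensive copies Flink otherwise makes
        // between chained operators. It is safe only if user functions do
        // not cache or mutate the records they receive.
        env.getConfig().enableObjectReuse();

        // Chaining is on by default: filter -> map -> sink run inside one
        // task, handing records over by method call instead of ser/de.
        env.socketTextStream("ingest-host", 9000)      // hypothetical source
           .filter(line -> !line.isEmpty())
           .map(String::toUpperCase)
           .print();

        env.execute("single-job-etl");
    }
}
```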

