How to use Apache Flink with lookup data?

user432024

.

Hi,

using Apache Flink 1.8. I have a stream of records coming in from Kafka as JSON and filtering them and that all works fine.

Now, I would like to enrich the data from Kafka with a look up value from a database table.

Is that just a case of creating 2 streams, loading the table in the 2nd stream and then joining the data?

The database table does get updated but not frequently and I would like to avoid looking up the DB on every record that comes through the stream.

zcarioca

Flink has state, which you could take advantage of here. I've done something similar, where I took a daily query from my lookup table (in my case it was a bulk webservice call) and through the results into a kafka topic. This kafka topic was being consumed by the same service flink job as that needed the data for lookups. Both topics were keyed by the same value, but I used the lookup topic to store data into a keyed state, and when processing the other topic, I'd pull the data back out of state.

I had some additional logic to check if there was NO state yet for a given key. If that was the case, I'd make an async request to the webservice. You may not need to do that however.

The caveat here is that I had memory for state management, and my lookup table was only about 30-million records, about 100 gigs spread across 45 slots on 15 nodes.

[In answer to question in comments] Sorry, but my answer was too long, so had to edit my post:

I had a python job that loaded the data via a bulk REST call (yours could just do a data lookup). It then transformed the data into the correct format and dumped it into Kafka. Then my flink flow had two sources, one was the 'real data' topic, the other was the 'lookup data' topic. Data coming from the lookup data topic was stored in state (I used a ValueState because each key mapped to a single possible value, but there are other state types. I also had a 24 hour expiration time for each entry, but that was my usecase.

The trick is that the same operation that stores the value in state from the lookup topic, has to be the operation that pulls the value back out of state from the 'real' topic. This is because flink state (even keyed states) are tied to the operator that created them.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related