Alternative to Apache Beam's PubsubIO read withIdAttribute in the standard Pub/Sub client libraries

Xitrum

In the Beam SDK, PubsubIO read provides an option to deduplicate messages by using a message ID: https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html#withIdAttribute-java.lang.String-

When I looked at the Pub/Sub client libraries (for Java and Python), I did not see a similar option for using the message ID to deduplicate messages.

So my questions are:

  1. Do the Pub/Sub client libraries (Python and Java) have similar functionality? Perhaps I missed it because it is named differently.
  2. If they don't, how do you handle this situation? I'm curious how others solve it, as inspiration, because I'm thinking about using a cache in my client application that stores the most recent message IDs for deduplication.

Thank you.

guillaume blaquiere

The Pub/Sub client libraries do not offer the same feature. Cloud Dataflow, which runs Beam pipelines, keeps a cache of the most recent message IDs (I don't know how many or for how long, but only for a few minutes). It is a Beam feature.
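The cache idea from the question can be sketched in the client application along these lines. This is only a minimal illustration, not something the Pub/Sub client library provides: `RecentIdCache`, `make_callback`, the 600-second window and the size limit are all assumptions made for the example. The cache lives in one process, so it does not deduplicate across several subscriber instances or across restarts.

```python
import threading
import time
from collections import OrderedDict


class RecentIdCache:
    """Hypothetical helper: remembers message IDs for ttl_seconds (not a Pub/Sub API)."""

    def __init__(self, ttl_seconds=600.0, max_size=100_000, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_size = max_size
        self._clock = clock
        self._seen = OrderedDict()  # message ID -> time first seen, oldest first
        self._lock = threading.Lock()  # subscriber callbacks may run on several threads

    def _evict(self, now):
        # Drop the oldest entries while they are expired or the cache is over its size bound.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self._ttl and len(self._seen) <= self._max_size:
                break
            self._seen.popitem(last=False)

    def first_time(self, message_id):
        """Record message_id and return True unless it was seen within the TTL window."""
        with self._lock:
            now = self._clock()
            self._evict(now)
            if message_id in self._seen:
                return False
            self._seen[message_id] = now
            self._evict(now)
            return True

    def forget(self, message_id):
        """Remove message_id so that a redelivery is processed again."""
        with self._lock:
            self._seen.pop(message_id, None)


def make_callback(cache, handle):
    """Wrap handle(data) so that recently seen message IDs are acked and skipped."""

    def callback(message):
        if not cache.first_time(message.message_id):
            message.ack()  # duplicate: acknowledge without processing again
            return
        try:
            handle(message.data)
        except Exception:
            cache.forget(message.message_id)  # let the redelivery be processed
            message.nack()
            return
        message.ack()

    return callback
```

With the Python client, the returned function could be passed as the `callback` argument of `SubscriberClient.subscribe`. Note that the Pub/Sub message ID only identifies redeliveries of the same published message; to deduplicate on an identifier set by the publisher, as `withIdAttribute` does, a value from `message.attributes` would be used as the key instead.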

Because Pub/Sub guarantees only at-least-once delivery, it is recommended to make your processing idempotent.

In general, accommodating more-than-once delivery requires your subscriber to be idempotent when processing messages.
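As a rough illustration of what idempotent processing can look like, the sketch below applies each message as an absolute, versioned update keyed by an entity ID, so that handling the same message twice leaves the same state. The JSON payload layout and the `order_id`, `status` and `version` fields are hypothetical, and the dict stands in for whatever database the subscriber writes to.

```python
import json


def apply_event(store, data):
    """Apply one message payload to store; return True if the state changed.

    Assumes (hypothetically) a JSON payload with order_id, status and version.
    The update sets absolute state and is guarded by the version number, so a
    redelivered or out-of-date message is a no-op.
    """
    event = json.loads(data.decode("utf-8"))
    current = store.get(event["order_id"])
    if current is not None and current["version"] >= event["version"]:
        return False  # already applied this version or a newer one
    store[event["order_id"]] = {
        "status": event["status"],
        "version": event["version"],
    }
    return True


store = {}
payload = json.dumps({"order_id": "o-1", "status": "paid", "version": 2}).encode("utf-8")
first = apply_event(store, payload)   # changes the state
second = apply_event(store, payload)  # duplicate delivery: no change
```

The design choice here is to make the write itself safe to repeat rather than to remember which messages were seen, so it keeps working across several subscriber instances and restarts as long as they share the same store.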
