I'm using Apache Beam to set up a pipeline consisting of two main steps: a parallel transform followed by a BigQuery write. The pipeline setup looks like this:
myPCollection = (org.apache.beam.sdk.values.PCollection<myCollectionObjectType>) myInputPCollection
    .apply("do a parallel transform",
        ParDo.of(new MyTransformClassName.MyTransformFn()));

myPCollection
    .apply("Load BigQuery data for PCollection",
        BigQueryIO.<myCollectionObjectType>write()
            .to(new MyDataLoadClass.MyFactTableDestination(myDestination))
            .withFormatFunction(new MyDataLoadClass.MySerializationFn()));
I've looked at this question:
Apache Beam: Skipping steps in an already-constructed pipeline
which suggests that I may be able to dynamically change which output the data is passed to after the parallel transform in step 1.
How do I do this? I don't know how to choose whether or not to pass myPCollection from step 1 to step 2. I need to skip step 2 when the object produced in step 1 is null.
You simply don't emit the element from your MyTransformClassName.MyTransformFn when you don't want it to reach the next step, for example something like this:
class MyTransformClassName.MyTransformFn extends DoFn<..., myCollectionObjectType> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    ...
    result = ...;
    if (result != null) {
      c.output(result); // only output something that's not null
    }
  }
}
This way nulls don't reach the next step.
See the ParDo
section of the guide for more details: https://beam.apache.org/documentation/programming-guide/#pardo
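To illustrate the idea outside of Beam, here is a minimal plain-Java sketch of the same emit-or-skip logic: each element is added to the output only when it is non-null, mirroring the `if (result != null) c.output(result)` guard in the DoFn above. The class and method names are hypothetical, not part of the Beam API.

```java
import java.util.ArrayList;
import java.util.List;

public class NullFilterSketch {
    // Hypothetical stand-in for the DoFn's per-element logic:
    // an element is "emitted" (added to the output) only when it
    // is non-null, so nulls never reach the downstream step.
    static <T> List<T> emitNonNull(List<T> input) {
        List<T> output = new ArrayList<>();
        for (T element : input) {
            if (element != null) {
                output.add(element); // only emit non-null results
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> in = new ArrayList<>();
        in.add("a");
        in.add(null);
        in.add("b");
        System.out.println(emitNonNull(in)); // prints [a, b]
    }
}
```

In the real pipeline the filtering happens inside `processElement`, so no separate filtering transform is needed; the downstream BigQueryIO write only ever sees the elements that were actually output.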