Saulo Ricci - 3 months ago
Java Question

Querying a relational database through Google DataFlow Transformer

I would like to implement a Transformer in my Dataflow pipeline that queries a relational database based on the data in each element being processed. I know every attribute of a user-defined transformer must be serializable, but to query the database using JDBC I need to create a Connection, which is naturally a non-serializable object.

Is it still possible to do that in the Dataflow pipeline context?


Yes, it is possible. You can make your Connection field transient so that it is not serialized with the DoFn, and create the connection once per bundle in the startBundle method. Once all the elements in the bundle have been processed, close the connection in the finishBundle method.

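import com.google.cloud.dataflow.sdk.transforms.DoFn; // Dataflow SDK 1.x
import java.sql.Connection;
import java.sql.DriverManager;

class MyDoFn extends DoFn<X, Y> {
  // transient: skipped when the DoFn is serialized and sent to workers
  private transient Connection jdbc;

  // Called once per bundle, before any element is processed
  @Override
  public void startBundle(Context c) throws Exception {
    jdbc = DriverManager.getConnection(url); // url: placeholder for your JDBC URL
  }

  @Override
  public void processElement(ProcessContext c) throws Exception {
    // query the database using jdbc and c.element()
  }

  // Called once per bundle, after the last element
  @Override
  public void finishBundle(Context c) throws Exception {
    jdbc.close();
  }
}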
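
To see why the transient field is enough, here is a rough sketch that uses only the JDK and no Dataflow classes. FakeConnection, LookupFn, TransientDemo, and roundTrip are hypothetical names made up for illustration: FakeConnection stands in for a real JDBC Connection, and roundTrip only approximates what a runner does when it ships a function to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for java.sql.Connection: deliberately NOT Serializable.
class FakeConnection {
  private final String url;
  private boolean open = true;

  FakeConnection(String url) {
    this.url = url;
  }

  String query(String key) {
    if (!open) {
      throw new IllegalStateException("connection to " + url + " is closed");
    }
    return "row-for-" + key;
  }

  void close() {
    open = false;
  }
}

// Hypothetical stand-in for a DoFn: serializable, with a transient connection.
class LookupFn implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String url;              // plain configuration, serialized normally
  private transient FakeConnection conn; // skipped by Java serialization

  LookupFn(String url) {
    this.url = url;
  }

  // Mirrors startBundle: open the connection where the work actually runs.
  void startBundle() {
    conn = new FakeConnection(url);
  }

  // Mirrors processElement: use the connection for one element.
  String processElement(String key) {
    return conn.query(key);
  }

  // Mirrors finishBundle: release the connection.
  void finishBundle() {
    conn.close();
    conn = null;
  }

  boolean isConnected() {
    return conn != null;
  }
}

class TransientDemo {
  // Serializes and deserializes the function in memory.
  static LookupFn roundTrip(LookupFn fn) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(fn); // succeeds because conn is transient
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (LookupFn) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    LookupFn original = new LookupFn("jdbc:placeholder");
    original.startBundle();                         // connection open on the "submitter"
    LookupFn copy = roundTrip(original);
    System.out.println(copy.isConnected());         // false: the connection did not travel
    copy.startBundle();                             // re-created on the "worker"
    System.out.println(copy.processElement("42"));  // row-for-42
    copy.finishBundle();
  }
}
```

The copy arrives without a connection even though the original had one open, which is why the connection has to be created in startBundle rather than in the constructor. If you remove the transient keyword, writeObject should fail with a NotSerializableException, which is the problem described in the question.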