
[BUG] Launching dataflow with Spanner #108

@Andrei-Strenkovskii


Expected Behavior

Pipeline with Spanner is launched to Dataflow and executed successfully

Current Behavior

I get an error when launching any Dataflow job that has Spanner as the source or target database:

2025-09-03 13:49:57.908 GET
Exception in thread "main" ; see consecutive INFO logs for details.
2025-09-03 13:49:57.928 GET
Error: Template launch failed: exit status 1
2025-09-03 13:50:35.685 GET
Error occurred in the launcher container: Template launch failed. See console logs.

Context

To build the Docker image, I had to add one dependency to pom.xml:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>3.7.0</version>
    </dependency>

Without it, mvn clean package fails with an error. Could you look into that as well?

I am trying to sync data from a Spanner table to BigQuery.

Building the Docker image:
mvn clean package -DskipTests -Dimage=us-east4-docker.pkg.dev/pp-ocean-staging/sxope-emr-files-loader/maven-dataflow-template:latest
Creating flex template:

gcloud dataflow flex-template build gs://sxp-ocean-test/beam_test/maven-flex-dataflow.json \
  --image "us-east4-docker.pkg.dev/pp-ocean-staging/sxope-emr-files-loader/maven-dataflow-template:latest" \
  --sdk-language "JAVA"

Running flex template:

gcloud dataflow flex-template run maven-dataflow-test \
  --region=us-east4 \
  --template-file-gcs-location=gs://sxp-ocean-test/beam_test/maven-flex-dataflow.json \
  --parameters=config="$(cat spanner-to-bigquery.json)"

Config example:

{
  "sources": [
    {
      "name": "spanner",
      "module": "spanner",
      "parameters": {
        "projectId": "pp-data-staging",
        "instanceId": "sxope-shared-instance",
        "databaseId": "sxope-member-datapoints-stagin",
        "query": "select emr_member_id from emr_members"
      }
    }
  ],
  "sinks": [
    {
      "name": "bigquery",
      "module": "bigquery",
      "input": "spanner",
      "parameters": {
        "table": "pp-import-staging:sb_temp.emr_data_ocean_11039",
        "createDisposition": "CREATE_IF_NEEDED",
        "writeDisposition": "WRITE_TRUNCATE"
      }
    }
  ]
}
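
To rule out the config itself as the cause, a local check along these lines may help. This is only a sketch in Python, not part of the template: the function name `find_config_problems` is hypothetical, the embedded config is a trimmed copy of the one above, and the rule that a sink refers to a source through `input` (a string) or `inputs` (a list) is an assumption based on the two configs in this report, not on the template's documented schema.

```python
import json

# Trimmed copy of the config from this report; parameters omitted for brevity.
CONFIG_TEXT = """
{
  "sources": [
    {"name": "spanner", "module": "spanner", "parameters": {}}
  ],
  "sinks": [
    {"name": "bigquery", "module": "bigquery", "input": "spanner", "parameters": {}}
  ]
}
"""


def find_config_problems(config_text):
    """Return a list of human-readable problems found in a pipeline config."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        # An empty or truncated string (e.g. a failed shell substitution) ends up here.
        return [f"config is not valid JSON: {exc}"]

    problems = []
    source_names = {s.get("name") for s in config.get("sources", [])}
    for sink in config.get("sinks", []):
        # Assumption: a sink names its upstream step via "input" (string)
        # or "inputs" (list); both spellings appear in this report.
        inputs = sink.get("inputs", [])
        if "input" in sink:
            inputs = [sink["input"]]
        if not inputs:
            problems.append(f"sink {sink.get('name')!r} declares no input")
        for name in inputs:
            if name not in source_names:
                problems.append(
                    f"sink {sink.get('name')!r} reads from unknown step {name!r}"
                )
    return problems


print(find_config_problems(CONFIG_TEXT))
```

For the trimmed config above this prints an empty list, which would suggest the JSON is well formed and the sink points at a declared source. It says nothing about whether the launcher actually received that text.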

Dataflow doesn't render anything for the job, and the launch fails.


I also tried to launch the example config from the README:

sources:
  - name: bigquery
    module: bigquery
    parameters:
      query: |-
        SELECT
          *
        FROM
          `myproject.mydataset.mytable`
sinks:
  - name: spanner
    module: spanner
    inputs:
      - bigquery
    parameters:
      projectId: myproject
      instanceId: myinstance
      databaseId: mydatabase
      table: mytable

But that didn't work for me either.
Can you check whether it works for you?
