{
  "id": "spanner",
  "title": "Prepare Spanner for RDI",
  "url": "https://redis.io/docs/latest/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner/",
  "summary": "Prepare Google Cloud Spanner databases to work with RDI",
  "tags": [
    "docs",
    "integrate",
    "rs",
    "rdi"
  ],
  "last_updated": "2026-04-01T08:10:08-05:00",
  "page_type": "content",
  "content_hash": "8a537e32bb8d50df5695aad096dd1d9c2d75fc5d9dad7cc1e229bdf3c1d0fd7a",
  "sections": [
    {
      "id": "overview",
      "title": "Overview",
      "role": "overview",
      "text": "Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI.\nRDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot\nphase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the\ndatabase. In the streaming phase, RDI uses [Spanner's Change Streams](https://cloud.google.com/spanner/docs/change-streams) to capture changes related to\nthe monitored schemas and tables.\n\n\nSpanner is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does not support Spanner as a source database.\n\n\nThe following checklist summarizes the steps to prepare a Spanner\ndatabase for RDI, with links to the sections that explain the steps in\nfull detail. You may find it helpful to track your progress with the\nchecklist as you complete each step.\n\n[code example]"
    },
    {
      "id": "1-prepare-for-snapshot",
      "title": "1. Prepare for snapshot",
      "role": "content",
      "text": "During the snapshot phase, RDI executes multiple transactions to capture data at an exact point \nin time that remains consistent across all queries. This is achieved using a Spanner feature called \n[Timestamp bounds with exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness). \n\nThis feature relies on the \n[version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period), \nwhich is set to one hour by default. Depending on the database tier, the volume of data to be \ningested into RDI, and the load on the database, this setting may need to be increased. You can \nupdate it using [this method](https://cloud.google.com/spanner/docs/use-pitr#set-period)."
    },
    {
      "id": "2-prepare-for-streaming",
      "title": "2. Prepare for streaming",
      "role": "content",
      "text": "To enable streaming, you must create a change stream in Spanner at the database level. Use the \noption `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and updated \nrow values.\n\nBe sure to specify only the tables you want to ingest from and, optionally, the specific columns \nyou're interested in. Here's an example using Google SQL syntax:\n\n[code example]\n\nRefer to the [official documentation](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql) \nfor more details, including additional configuration options and dialect-specific syntax."
    },
    {
      "id": "3-create-a-service-account",
      "title": "3. Create a service account",
      "role": "content",
      "text": "To allow RDI to access the Spanner instance, you'll need to create a service account with the\nappropriate permissions. By default, RDI uses Google Cloud Workload Identity authentication. In this case RDI will assume the [service account is assigned to the GKE cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_on_clusters_and_node_pools). Alternatively, you can provide the\nservice account credentials as a Kubernetes secret (see step 4 for details).\n\n[code example]\n\n1. <a id=\"create-the-service-account\"></a>\n   Create the service account\n\n    [code example]\n\n1. <a id=\"grant-required-roles\"></a>\n   Grant required roles:\n\n    **Database Reader** (read access to Spanner data):\n\n    [code example]\n\n    **Database User** (query execution and metadata access):\n\n    [code example]\n\n    **Viewer** (viewing instance and database configuration):\n\n    [code example]\n\n1. <a id=\"download-the-service-account-key\"></a>\n   Download the service account key:\n\n    Save the credentials locally so they can be used later by RDI:\n\n    [code example]"
    },
    {
      "id": "authentication-methods",
      "title": "Authentication methods",
      "role": "security",
      "text": "RDI supports two authentication methods for accessing Spanner:\n\n1. **Workload Identity (default)**: The service account is assigned to the GKE cluster, and RDI\n   automatically uses the cluster's identity to authenticate. This is the recommended approach\n   as it's more secure and doesn't require managing credential files.\n\n2. **Service account credentials file**: You provide the service account key file as a Kubernetes\n   secret. This method requires setting `use_credentials_file: true` in your RDI configuration."
    },
    {
      "id": "4-set-up-secrets-for-kubernetes-deployment-optional",
      "title": "4. Set up secrets for Kubernetes deployment (optional)",
      "role": "content",
      "text": "Before deploying the RDI pipeline, you need to configure the necessary secrets for the target\ndatabase. Instructions for setting up the target database secrets are available in the\n[RDI deployment guide]().\n\n**Optional**: If you prefer to use a service account credentials file instead of Workload Identity\nauthentication, you'll need to create a Spanner-specific secret named `source-db-credentials`.\nThis secret should contain the service account key file generated during the Spanner setup phase.\nUse the command below to create it:\n\n[code example]\n\nBe sure to adjust the file path (`~/spanner-reader-account.json`) if your service account key is\nstored elsewhere.\n\n\nIf you create the `source-db-credentials` secret, you must also set `use_credentials_file: true`\nin your RDI configuration to use the credentials file instead of Workload Identity authentication."
    },
    {
      "id": "5-configure-rdi-for-spanner",
      "title": "5. Configure RDI for Spanner",
      "role": "content",
      "text": "When configuring your RDI pipeline for Spanner, use the following example configuration in your \n`config.yaml` file:\n\n[code example]\n\nMake sure to replace the relevant connection details with your own for both the Spanner and target \nRedis databases."
    },
    {
      "id": "6-additional-kubernetes-configuration",
      "title": "6. Additional Kubernetes configuration",
      "role": "content",
      "text": "In your `rdi-values.yaml` file for Kubernetes deployment, make sure to configure the `dataPlane` \nsection like this:\n\n[code example]"
    },
    {
      "id": "7-configuration-is-complete",
      "title": "7. Configuration is complete",
      "role": "content",
      "text": "Once you have followed the steps above, your Google Spanner database is ready for RDI to use."
    }
  ],
  "examples": [
    {
      "id": "overview-ex0",
      "language": "checklist {id=\"spannerlist\"}",
      "code": "- [ ] [Prepare for snapshot](#1-prepare-for-snapshot)\n- [ ] [Prepare for streaming](#2-prepare-for-streaming)\n- [ ] [Create a service account](#3-create-a-service-account)\n- [ ] [Set up secrets for Kubernetes deployment (optional)](#4-set-up-secrets-for-kubernetes-deployment-optional)\n- [ ] [Configure RDI for Spanner](#5-configure-rdi-for-spanner)\n- [ ] [Additional Kubernetes configuration](#6-additional-kubernetes-configuration)",
      "section_id": "overview"
    },
    {
      "id": "2-prepare-for-streaming-ex0",
      "language": "sql",
      "code": "CREATE CHANGE STREAM change_stream_table1_and_table2\n  FOR table1, table2\n  OPTIONS (\n    value_capture_type = 'NEW_ROW_AND_OLD_VALUES'\n  );",
      "section_id": "2-prepare-for-streaming"
    },
    {
      "id": "3-create-a-service-account-ex0",
      "language": "checklist {id=\"spanner-service-account\" nointeractive=\"true\" }",
      "code": "- [ ] [Create the service account](#create-the-service-account)\n- [ ] [Grant required roles](#grant-required-roles)\n- [ ] [Download the service account key](#download-the-service-account-key)",
      "section_id": "3-create-a-service-account"
    },
    {
      "id": "3-create-a-service-account-ex1",
      "language": "bash",
      "code": "gcloud iam service-accounts create spanner-reader-account \\\n        --display-name=\"Spanner Reader Service Account\" \\\n        --description=\"Service account for reading from Spanner databases\" \\\n        --project=YOUR_PROJECT_ID",
      "section_id": "3-create-a-service-account"
    },
    {
      "id": "3-create-a-service-account-ex2",
      "language": "bash",
      "code": "gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \\\n        --member=\"serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\" \\\n        --role=\"roles/spanner.databaseReader\"",
      "section_id": "3-create-a-service-account"
    },
    {
      "id": "3-create-a-service-account-ex3",
      "language": "bash",
      "code": "gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \\\n        --member=\"serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\" \\\n        --role=\"roles/spanner.databaseUser\"",
      "section_id": "3-create-a-service-account"
    },
    {
      "id": "3-create-a-service-account-ex4",
      "language": "bash",
      "code": "gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \\\n        --member=\"serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\" \\\n        --role=\"roles/spanner.viewer\"",
      "section_id": "3-create-a-service-account"
    },
    {
      "id": "3-create-a-service-account-ex5",
      "language": "bash",
      "code": "gcloud iam service-accounts keys create ~/spanner-reader-account.json \\\n        --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \\\n        --project=YOUR_PROJECT_ID",
      "section_id": "3-create-a-service-account"
    },
    {
      "id": "4-set-up-secrets-for-kubernetes-deployment-optional-ex0",
      "language": "bash",
      "code": "kubectl create secret generic source-db-credentials --namespace=rdi \\\n--from-file=gcp-service-account.json=~/spanner-reader-account.json \\\n--save-config --dry-run=client -o yaml | kubectl apply -f -",
      "section_id": "4-set-up-secrets-for-kubernetes-deployment-optional"
    },
    {
      "id": "5-configure-rdi-for-spanner-ex0",
      "language": "yaml",
      "code": "sources:\n  source:\n    type: flink\n    connection:\n      type: spanner\n      project_id: your-project-id\n      instance_id: your-spanner-instance\n      database_id: your-spanner-database\n      # use_credentials_file: false  # Default: uses Workload Identity. Set to true to use service account credentials file instead\n      change_streams:\n        change_stream_all:\n          {}\n          # retention_hours: 24\n    # schemas:\n    #  - DEFAULT\n    # tables:\n    #   products: {}\n    #   orders: {}\n    #   order_items: {}\n    # logging:\n    #   level: debug\n    # advanced:\n    #   source:\n    #     spanner.change.stream.retention.hours: 24\n    #     spanner.fetch.timeout.milliseconds: 20000\n    #     spanner.dialect: POSTGRESQL\n    #   flink:\n    #     jobmanager.rpc.port: 7123\n    #     jobmanager.memory.process.size: 1024m\n    #     taskmanager.numberOfTaskSlots: 3\n    #     taskmanager.rpc.port: 7122\n    #     taskmanager.memory.process.size: 2g\n    #     blob.server.port: 7124\n    #     rest.port: 8082\n    #     parallelism.default: 4\n    #     restart-strategy.type: fixed-delay\n    #     restart-strategy.fixed-delay.attempts: 3\ntargets:\n  target:\n    connection:\n      type: redis\n      host: ${HOST_IP}\n      port: 12000\n      user: ${TARGET_DB_USERNAME}\n      password: ${TARGET_DB_PASSWORD}\nprocessors:\n  target_data_type: hash",
      "section_id": "5-configure-rdi-for-spanner"
    },
    {
      "id": "6-additional-kubernetes-configuration-ex0",
      "language": "yaml",
      "code": "operator:\n  dataPlane:\n    flinkCollector:\n      enabled: true\n      jobManager:\n        ingress:\n          enabled: true\n          className: traefik # Replace with your ingress controller\n          hosts:\n            - hostname # Replace with your desired ingress hostname",
      "section_id": "6-additional-kubernetes-configuration"
    }
  ]
}