Prepare Snowflake for RDI
Prepare Snowflake databases to work with RDI
This guide describes the steps required to prepare a Snowflake database as a source for Redis Data Integration (RDI) pipelines.
During both the snapshot and
Change data capture (CDC)
phases, RDI uses Snowflake Streams to read data from the monitored
tables. For the initial snapshot, RDI creates the stream with SHOW_INITIAL_ROWS = TRUE so it can read the current
table contents before continuing with ongoing CDC. RDI automatically creates and manages the required streams.
Setup
The following checklist shows the steps to prepare a Snowflake database for RDI, with links to the sections that explain the steps in full detail. You may find it helpful to track your progress with the checklist as you complete each step.
- [ ] [Set up Snowflake permissions](#1-set-up-snowflake-permissions) - [ ] [Configure authentication](#2-configure-authentication) - [ ] [Set up secrets for Kubernetes deployment](#3-set-up-secrets-for-kubernetes-deployment) - [ ] [Configure RDI for Snowflake](#4-configure-rdi-for-snowflake)
1. Set up Snowflake permissions
The following are the minimum runtime permissions for the RDI role to read the source tables and create the Snowflake objects RDI uses for CDC:
USAGE,OPERATEon the warehouse used for RDI readsUSAGEon the source database and source schemaSELECTon the source tablesUSAGEon the CDC schema used by RDICREATE STREAM,CREATE TABLEon the CDC schema used by RDI
If you configure cdcDatabase and cdcSchema, grant the CDC permissions there. Otherwise, grant them in the source
schema. If your Snowflake setup requires it, also grant any additional cross-database privileges needed for the CDC
schema to reference the source tables.
RDI manages the Snowflake streams it uses for snapshot and CDC. The collector creates the stream in the configured CDC
schema and later issues CREATE OR REPLACE STREAM statements to keep the stream aligned with the expected offset, so
the RDI role must be able to create and own those stream objects in the CDC schema.
There is one stricter bootstrap requirement for the first stream created on a source table: if Snowflake change tracking is not already enabled on that table, only the table owner can create that initial stream. If the source tables are not owned by the RDI role, ask a Snowflake administrator or table owner to enable change tracking first:
ALTER TABLE MYDB.PUBLIC.customers SET CHANGE_TRACKING = TRUE;
ALTER TABLE MYDB.PUBLIC.orders SET CHANGE_TRACKING = TRUE;
Grant the required permissions to your RDI user:
-- Grant usage on the warehouse
GRANT USAGE, OPERATE ON WAREHOUSE COMPUTE_WH TO ROLE rdi_role;
-- Grant usage on the source database and schema
GRANT USAGE ON DATABASE MYDB TO ROLE rdi_role;
GRANT USAGE ON SCHEMA MYDB.PUBLIC TO ROLE rdi_role;
-- Grant SELECT on tables to capture
GRANT SELECT ON TABLE MYDB.PUBLIC.customers TO ROLE rdi_role;
GRANT SELECT ON TABLE MYDB.PUBLIC.orders TO ROLE rdi_role;
-- Grant permissions on the schema RDI uses for CDC objects
GRANT USAGE ON SCHEMA MYDB.RDI_CDC TO ROLE rdi_role;
GRANT CREATE STREAM, CREATE TABLE ON SCHEMA MYDB.RDI_CDC TO ROLE rdi_role;
-- Assign the role to your RDI user
GRANT ROLE rdi_role TO USER rdi_user;
If you use centralized grant management, you can also add future grants in the CDC schema so newly created tables and streams automatically receive the desired privileges. These grants are optional and are not part of the minimum runtime permissions:
GRANT SELECT ON FUTURE TABLES IN SCHEMA MYDB.RDI_CDC TO ROLE rdi_role;
GRANT SELECT ON FUTURE STREAMS IN SCHEMA MYDB.RDI_CDC TO ROLE rdi_role;
2. Configure authentication
RDI supports two authentication methods for Snowflake. You must configure one of these methods.
Password authentication
Use standard username and password credentials. Store these securely using Kubernetes secrets (see step 3).
Private key authentication
For enhanced security, use key-pair authentication:
-
Generate a private key:
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt -
Generate the public key:
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub -
Register the public key with your Snowflake user:
ALTER USER rdi_user SET RSA_PUBLIC_KEY='<public_key_content>';
3. Set up secrets for Kubernetes deployment
Before deploying the RDI pipeline, configure the necessary secrets.
Password authentication
kubectl create secret generic source-db \
--namespace=rdi \
--from-literal=SOURCE_DB_USERNAME=your_username \
--from-literal=SOURCE_DB_PASSWORD=your_password
Private key authentication
Create a secret with the private key file:
kubectl create secret generic source-db-ssl \
--namespace=rdi \
--from-file=client.key=/path/to/rsa_key.p8
Also create the source-db secret with the username:
kubectl create secret generic source-db \
--namespace=rdi \
--from-literal=SOURCE_DB_USERNAME=your_username
4. Configure RDI for Snowflake
Use the following example configuration in your config.yaml file:
sources:
snowflake:
type: riotx
connection:
type: snowflake
url: "jdbc:snowflake://myaccount.snowflakecomputing.com/"
user: "${SOURCE_DB_USERNAME}"
password: "${SOURCE_DB_PASSWORD}" # Omit for key-pair auth
database: "MYDB"
warehouse: "COMPUTE_WH"
# role: "RDI_ROLE" # Optional: Snowflake role
# cdcDatabase: "CDC_DB" # Optional: Separate database for CDC streams
# cdcSchema: "CDC_SCHEMA" # Optional: Separate schema for CDC streams
schemas:
- PUBLIC
tables:
PUBLIC.customers: {}
PUBLIC.orders: {}
advanced:
riotx:
poll: "30s"
snapshot: "INITIAL" # Or "NEVER" to skip initial snapshot
# streamPrefix: "data:" # Optional: Redis stream prefix
# streamLimit: 100000 # Optional: Max stream length
# keyColumns: # Recommended: stable key columns
# - "id"
# clearOffset: false # Optional: Clear offset on start
targets:
target:
connection:
type: redis
host: ${TARGET_DB_HOST}
port: ${TARGET_DB_PORT}
user: ${TARGET_DB_USERNAME}
password: ${TARGET_DB_PASSWORD}
processors:
target_data_type: json
database and one or more source-level schemas. In the tables section, specify each
table as SCHEMA.table. Even when you configure only one schema, explicit SCHEMA.table names are recommended for
clarity.Snowflake connection properties
| Property | Type | Required | Description |
|---|---|---|---|
type |
string | Yes | Must be "snowflake" |
url |
string | Yes | JDBC URL: jdbc:snowflake://<account>.snowflakecomputing.com/ |
user |
string | Yes | Snowflake user |
password |
string | No* | Snowflake password |
database |
string | Yes | Snowflake database name |
warehouse |
string | Yes | Snowflake warehouse name |
role |
string | No | Snowflake role name |
cdcDatabase |
string | No | Database for CDC streams (if different from source) |
cdcSchema |
string | No | Schema for CDC streams (if different from source) |
- Either
passwordor private key authentication is required. See Configure authentication for details.
Snowflake source properties
| Property | Type | Required | Description |
|---|---|---|---|
schemas |
array | Yes | Schema names to capture from |
tables |
object | Yes | Tables to capture, keyed as SCHEMA.table |
Advanced configuration options
Configure under sources.<name>.advanced.riotx:
| Property | Type | Default | Description |
|---|---|---|---|
poll |
string | "30s" |
Polling interval for stream changes |
snapshot |
string | "INITIAL" |
Snapshot mode: INITIAL or NEVER |
streamPrefix |
string | "data:" |
Prefix for the Redis stream written by RDI |
streamLimit |
integer | - | Maximum stream length (XTRIM MAXLEN) |
keyColumns |
array | - | Stable source columns to use as message keys |
clearOffset |
boolean | false |
Clear existing offset on start |
count |
integer | 0 |
Limit records per poll (0 = unlimited) |
For reliable update and delete handling, define keyColumns with a stable business key or surrogate key when possible.
Troubleshooting
Connection issues
Error: "Failed to connect to Snowflake"
- Verify the account URL is correct (format:
<account>.snowflakecomputing.com) - Check network connectivity to Snowflake
- Verify the warehouse is running and accessible
- Check firewall rules allow outbound HTTPS (port 443)
Error: "Authentication failed"
- For password auth: verify username and password are correct
- For key-pair auth: verify the private key matches the public key registered in Snowflake
- Ensure the user has appropriate permissions
Error: "Warehouse not found"
- Verify the warehouse name is correct
- Ensure the user has USAGE permission on the warehouse
CDC issues
No data appearing in Redis
-
Verify Snowflake Streams exist in the CDC schema:
SHOW STREAMS IN SCHEMA my_cdc_database.my_cdc_schema; -
Check the polling interval configuration
-
Verify Redis connection is working
-
Check the collector logs:
kubectl get deployments -n rdi | grep riotx-collector kubectl logs -n rdi deployment/<riotx-collector-deployment>
Stale or missing changes
- Snowflake Streams depend on Snowflake change tracking and retention settings
- If the collector was offline longer than the available retention window, changes may be lost
- Consider using
clearOffset: trueto restart from current state
Performance tuning
High Snowflake warehouse usage
- Increase
pollinterval (e.g.,"60s"or"120s") - Use a dedicated warehouse for CDC operations
- Each poll first calls Snowflake's
SYSTEM$STREAM_HAS_DATAfunction to check whether the stream has new data. This check does not start the warehouse; warehouse compute starts only when RDI reads rows from the stream.
Redis memory concerns
- Set
streamLimitto cap stream length - Use
countto limit records per poll batch
Initial snapshot too slow
- Use
snapshot: "NEVER"to skip initial snapshot - Pre-load data using other methods if needed
Enable debug logging
Enable debug logging in the source configuration:
sources:
snowflake:
type: riotx
logging:
level: debug
# ... rest of configuration
View collector logs:
kubectl get deployments -n rdi | grep riotx-collector
kubectl logs -n rdi deployment/<riotx-collector-deployment> -f
5. Configuration is complete
Once you have followed the steps above, your Snowflake database is ready for RDI to use.