{
  "id": "classic-vs-flink",
  "title": "Differences between the classic and Flink processors",
  "url": "https://redis.io/docs/latest/integrate/redis-data-integration/architecture/classic-vs-flink/",
  "summary": "Compare the classic and Flink stream processor implementations.",
  "tags": [
    "docs",
    "integrate",
    "rs",
    "rdi"
  ],
  "last_updated": "2026-05-12T09:07:59-04:00",
  "page_type": "content",
  "content_hash": "c869d106968e931bf873e7cf82ec4442674c26243bdde281aad2427e15156d55",
  "sections": [
    {
      "id": "overview",
      "title": "Overview",
      "role": "overview",
      "text": "RDI ships with two stream processor implementations. Both consume the same\nsource streams, share the same job-level configuration model, and write to\nthe same Redis target, but they differ in architecture, supported features,\nconfiguration, observability, error handling, and performance.\n\nThis page summarizes those differences. See\n[Which processor should I use?](https://redis.io/docs/latest/integrate/redis-data-integration/faq#which-processor-should-i-use)\nin the FAQ for the recommendation, and\n[Migrate from the classic processor to the Flink processor](https://redis.io/docs/latest/integrate/redis-data-integration/installation/migration-classic-to-flink)\nfor a step-by-step migration guide."
    },
    {
      "id": "at-a-glance",
      "title": "At a glance",
      "role": "content",
      "text": "| Aspect | Classic processor | Flink processor |\n|---|---|---|\n| Implementation | Python | Java on top of [Apache Flink](https://flink.apache.org/) |\n| Deployment targets | VM and Kubernetes | Kubernetes only |\n| Scaling | Single replica | Horizontal: TaskManager replicas × task slots per TaskManager |\n| Fault tolerance | Source-stream consumer-group replay | Source-stream consumer-group replay plus Flink checkpointing |\n| Supported `data_type` outputs | `hash`, `json`, `set`, `sorted_set`, `stream`, `string` | `hash`, `json` |\n| Metrics endpoint | `rdi-metrics-exporter` service | Flink JobManager `/metrics` (no metrics exporter) |\n| Metric naming | `rdi_*` (e.g., `rdi_incoming_entries`) | `flink_*` (e.g., `flink_jobmanager_job_operator_coordinator_stream_type_rdiRecords`) |\n| End-to-end latency | Bounded by the per-batch read-process-write cycle | Records flow through pipelined operator chains without a per-batch barrier |\n| Snapshot throughput | Limited by single shared reader and writer | Parallelized across all task slots |\n| Expression and `redis.lookup` result caching | Not supported | Optional, opt-in per transformation |"
    },
    {
      "id": "architecture-and-deployment",
      "title": "Architecture and deployment",
      "role": "content",
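      "example": "A hedged Helm values sketch for scaling the Flink processor horizontally. Nesting `advanced.resources.taskManager.replicas` under `processors:` is an assumption consistent with the `processors.advanced.*` tuning described in the configuration section; see the linked Helm settings page for the authoritative layout:\n\n```yaml\nprocessors:\n  type: flink\n  advanced:\n    resources:\n      taskManager:\n        replicas: 3   # default parallelism = replicas x task slots per TaskManager\n```",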
      "text": "The classic processor runs as a single pod managed by the operator\nand can be deployed on either VMs or Kubernetes through the RDI Helm\nchart.\n\nThe Flink processor runs as an Apache Flink application cluster: one\nJobManager pod plus one or more TaskManager pods. Source,\ntransformation, and sink operators run as parallel subtasks across\nall task slots in the cluster. The Flink processor scales\nhorizontally by changing the number of TaskManager replicas\n(`advanced.resources.taskManager.replicas`); with adaptive\nparallelism, the default parallelism is the product of TaskManager\nreplicas and task slots per TaskManager. The Flink processor\ncurrently runs on Kubernetes only; VM support is planned for a future\nrelease.\n\nBoth processors retain at-least-once delivery semantics; the Flink\nprocessor adds Flink checkpointing on top of the shared\nconsumer-group replay mechanism.\n\nSee\n[Configure the Flink processor](https://redis.io/docs/latest/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor)\nfor the Helm settings."
    },
    {
      "id": "configuration",
      "title": "Configuration",
      "role": "configuration",
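      "example": "A minimal `config.yaml` fragment showing the implementation switch; only `processors.type` is shown, since the `connections`, `sources`, `targets`, and `jobs` sections are identical for both processors:\n\n```yaml\nprocessors:\n  type: flink   # classic (default) or flink\n```",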
      "text": "The two processors share the same `config.yaml` envelope and the same\n`connections`, `sources`, `targets`, and `jobs` sections. The only\ndifferences are inside the `processors:` block; the implementation is\nselected via `processors.type` (`classic` or `flink`; the default is\n`classic`). Properties\nthat apply to only one implementation are annotated with\n**Classic processor only.** or **Flink processor only.** in the\n[pipeline configuration reference](https://redis.io/docs/latest/integrate/redis-data-integration/data-pipelines/pipeline-config#processors),\nand are silently ignored by the other implementation. The Flink\nprocessor exposes additional fine-grained tuning under\n`processors.advanced.*`."
    },
    {
      "id": "supported-output-formats",
      "title": "Supported output formats",
      "role": "compatibility",
      "text": "The classic processor supports all `data_type` values: `hash`, `json`,\n`set`, `sorted_set`, `stream`, and `string`. The Flink processor\ncurrently supports only `hash` and `json`. Pipelines that use any other\noutput type must either stay on the classic processor or have the\naffected jobs rewritten to emit `hash` or `json`. Support for the\nremaining output types is planned for a future release."
    },
    {
      "id": "transformation-extensions",
      "title": "Transformation extensions",
      "role": "content",
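      "example": "A sketch of where the Flink-only `cache:` block attaches inside a job file. The surrounding `add_field` structure follows standard RDI job syntax; the properties inside `cache:` are deliberately left empty here because they are documented only in the linked reference:\n\n```yaml\ntransform:\n  - uses: add_field\n    with:\n      fields:\n        - field: full_name\n          language: jmespath\n          expression: concat([fname, ' ', lname])\n          cache: {}   # Flink processor only; see the reference for the properties\n```",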
      "text": "The two processors support the same set of transformation blocks\n(`filter`, `map`, `add_field`, `remove_field`, `rename_field`,\n`redis.lookup`) and the same expression languages (JMESPath and SQL).\nPipelines written for one processor generally execute on the other\nwithout changes.\n\nThe Flink processor adds three optional, performance-oriented\nextensions that are not available with the classic processor:\n\n-   **Expression result caching** through a per-expression `cache:`\n    block on `filter`, `map`, `add_field`, and `redis.lookup` arguments.\n-   **`redis.lookup` result caching** through a `lookup_cache:` block.\n-   **`redis.lookup` batching**, which groups lookups into a single\n    Redis pipeline. Batching is enabled by default with sensible\n    defaults; the optional `batch:` block lets you override them.\n\nSee\n[Caching expression results](https://redis.io/docs/latest/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results)\nfor examples and\n[`redis.lookup`](https://redis.io/docs/latest/integrate/redis-data-integration/reference/data-transformation/lookup)\nfor the full property list."
    },
    {
      "id": "metrics",
      "title": "Metrics",
      "role": "content",
      "text": "The two processors expose different Prometheus metric sets and use\ndifferent naming schemes, so dashboards and alerts cannot be reused\nas-is between them. The classic processor exposes its metrics through\nthe `rdi-metrics-exporter` service. The Flink processor emits metrics\ndirectly from the JobManager and TaskManager pods through Flink's\nnative Prometheus reporter; no metrics exporter is deployed.\n\nSee\n[Observability — Flink processor metrics](https://redis.io/docs/latest/integrate/redis-data-integration/observability#flink-processor-metrics)\nfor the customer-facing list of metrics."
    },
    {
      "id": "error-handling-and-dlq",
      "title": "Error handling and DLQ",
      "role": "errors",
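      "example": "The shared error-handling properties from this section as a top-level `config.yaml` fragment; the value `1000` is an illustrative placeholder, not a documented default:\n\n```yaml\nerror_handling: dlq      # route failed records to dlq:{stream_name}; use ignore to drop them\ndlq_max_messages: 1000   # cap on retained DLQ entries (placeholder value)\n```",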
      "text": "Both processors implement a dead-letter queue (DLQ) at\n`dlq:{stream_name}` and honor the same top-level `error_handling`\n(`dlq` or `ignore`) and `dlq_max_messages` properties. The Flink\nprocessor surfaces a few corner cases as DLQ entries that the classic\nprocessor logs and skips (for example, missing parent\nkeys in nested writes and exceptions thrown by `when` expressions on\n`redis.lookup`). The DLQ entry field set and value encoding also\ndiffer: the classic processor uses Python-stringified values,\nwhile the Flink processor uses JSON."
    },
    {
      "id": "performance",
      "title": "Performance",
      "role": "performance",
      "text": "The Flink processor delivers significantly higher throughput during\nthe initial snapshot and lower end-to-end latency in steady state.\nThe classic processor uses a sequential read-process-write batching\ncycle, so each record waits for its batch to complete before being\nwritten to the target. The Flink processor pipelines records through\noperator chains without a per-batch barrier, and parallelizes work\nacross all task slots, which both lowers per-record latency and\nraises throughput.\n\nThe Flink processor has a larger baseline memory footprint (JVM plus\nFlink runtime overhead per TaskManager) but, for most pipelines, the\nperformance gains and the additional features (horizontal scaling, caching)\noutweigh that cost."
    }
  ],
  "examples": []
}
