Metrics stream engine for monitoring v2

The new metrics stream engine for monitoring Redis Enterprise Software.


The new metrics stream engine is generally available as of Redis Enterprise Software version 8.0.

The new metrics stream engine:

  • Exposes the v2 Prometheus scraping endpoint at https://<IP>:8070/v2.

  • Exports all time-series metrics through Prometheus to external monitoring tools such as Grafana, Datadog, New Relic, and Dynatrace.

  • Enables real-time monitoring, including during maintenance operations, which provides full visibility into performance during events such as shard failovers and scaling operations.
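As a rough illustration of what a scrape of the v2 endpoint listed above involves, the following Python sketch fetches the endpoint and parses the Prometheus text format into samples. The helper names (fetch_metrics, parse_samples) and the choice to skip certificate verification are assumptions made for illustration and are not part of Redis Enterprise. The parser is deliberately simple and does not handle escaped quotes inside label values.

```python
import re
import ssl
import urllib.request

# Matches sample lines such as: metric_name{label="value"} 1.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def fetch_metrics(host, port=8070, path="/v2", timeout=30):
    """Return the raw text served by the v2 scraping endpoint."""
    # Assumption: the cluster presents a self-signed certificate, so
    # verification is disabled here. Prefer a trusted CA in production.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = f"https://{host}:{port}{path}"
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples
```

With a reachable cluster, parse_samples(fetch_metrics("<IP>")) would return one tuple per sample line exposed by the endpoint.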

Integrate with external monitoring tools

To integrate Redis Enterprise metrics into your monitoring environment, see the integration guides for Prometheus and Grafana.

Filter Libraries and tools by "observability" for additional tools and guides.

Prometheus metrics v2

For a list of all available v2 metrics, see Prometheus metrics v2.

The v2 scraping endpoint also exposes metrics for node_exporter version 1.8.1. For more information, see the Prometheus node_exporter GitHub repository.

Transition from Prometheus v1 to Prometheus v2

If you are already using the existing scraping endpoint for integration, do the following to transition from v1 metrics to v2 metrics:

  1. Change the metrics_path in your Prometheus configuration file from / to /v2 to use the new scraping endpoint.

    Here's an example of the updated scraping configuration in prometheus.yml:

    scrape_configs:
      # Scrape Redis Enterprise
      - job_name: redis-enterprise
        scrape_interval: 30s
        scrape_timeout: 30s
        metrics_path: /v2
        scheme: https
        tls_config:
          insecure_skip_verify: true
        static_configs:
          - targets: ["<cluster_name>:8070"]
    
  2. Use the metrics tables in this guide to transition from v1 metrics to equivalent v2 PromQL.

You can scrape both the existing and new endpoints at the same time, which lets you prepare dashboards in advance and transition smoothly.
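To picture the side-by-side setup, the sketch below models scrape jobs as Python dictionaries and derives a second job that points at /v2 while leaving the v1 job untouched. The helper add_v2_job and the job name redis-enterprise-v2 are hypothetical. In practice you would add the second job to prometheus.yml directly.

```python
import copy


def add_v2_job(scrape_configs, v1_job_name, v2_job_name):
    """Return a new list that keeps the v1 job and adds a /v2 copy of it."""
    result = copy.deepcopy(scrape_configs)
    for job in scrape_configs:
        if job.get("job_name") == v1_job_name:
            v2_job = copy.deepcopy(job)
            v2_job["job_name"] = v2_job_name
            v2_job["metrics_path"] = "/v2"
            result.append(v2_job)
            return result
    raise ValueError(f"no scrape job named {v1_job_name!r}")


# Hypothetical v1 job that mirrors the example configuration above,
# except that it still uses the v1 metrics path.
v1_jobs = [{
    "job_name": "redis-enterprise",
    "scrape_interval": "30s",
    "scrape_timeout": "30s",
    "metrics_path": "/",
    "scheme": "https",
    "tls_config": {"insecure_skip_verify": True},
    "static_configs": [{"targets": ["<cluster_name>:8070"]}],
}]

both_jobs = add_v2_job(v1_jobs, "redis-enterprise", "redis-enterprise-v2")
for job in both_jobs:
    print(job["job_name"], job["metrics_path"])
```

Because Prometheus attaches the job name to every scraped series as the job label, giving the two jobs distinct names keeps v1 and v2 series distinguishable while both are collected.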

Best practices for monitoring

Follow these best practices when monitoring your Redis Enterprise Software cluster using the metrics stream engine.

Monitor host-level metrics

For cluster health, resources, and node stability, monitor these metrics:

Group | Metric | Why monitor | Unit
CPU utilization | node_cpu_user, node_cpu_system | Detect CPU saturation from Redis or the OS that results in higher latency and queueing. | Seconds (counter)
Memory (freeable) | node_memory_MemTotal_bytes, node_memory_MemFree_bytes, node_memory_Buffers_bytes, node_memory_Cached_bytes | Detect memory pressure early. Low free memory or cache can precede swapping or out-of-memory errors. | Bytes (gauge)
Swap usage | node_ephemeral_storage_free | Monitor memory and disk pressure in your setup. Sustained pressure leads to latency spikes. | Bytes (gauge)
Network traffic | node_ingress_bytes, node_egress_bytes | Ensure the network interface is not saturated. Protects replication and client responsiveness. | Bytes (counter)
Disk space | node_filesystem_avail_bytes, node_filesystem_size_bytes | Prevent persistence and logging outages from low disk space. | Bytes (gauge)
Cluster state | has_quorum{…} | Monitor whether quorum is maintained (1) or lost (0). | Boolean
Cluster state | node_metrics_up | Monitor whether the node is connected and reporting to the cluster. | Gauge
Licensing | license_shards_limit | Track shard capacity limits by type (RAM or flash). | Count
Certificates | node_cert_expires_in_seconds | Avoid downtime from expired node certificates. | Seconds (gauge)
Services – CPU | namedprocess_namegroup_cpu_seconds_total | Identify abnormal CPU usage by platform services that can starve Redis, such as alert_mgr, redis_mgr, dmc_proxy. | Seconds (counter)
Services – memory | namedprocess_namegroup_memory_bytes | Detect memory leaks or outliers in platform services, such as alert_mgr, redis_mgr, dmc_proxy. | Bytes (gauge)
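The host-level metrics in the table above are mostly raw counters and gauges, so dashboards and alerts usually derive ratios and rates from them. The Python sketch below shows that arithmetic on made-up sample values. The function names are hypothetical, and treating free, buffer, and cache memory together as "freeable" is an assumption based on how the table groups those metrics.

```python
def counter_rate(prev_value, curr_value, interval_seconds):
    """Per-second rate of a counter between two scrapes."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # A counter that went down was reset (for example, after a restart),
        # so treat the current value as the increase since the reset.
        delta = curr_value
    return delta / interval_seconds


def freeable_memory_ratio(mem_total, mem_free, buffers, cached):
    """Fraction of total memory that is free or reclaimable (assumed sum)."""
    return (mem_free + buffers + cached) / mem_total


def disk_free_ratio(avail_bytes, size_bytes):
    """Fraction of the filesystem that is still available."""
    return avail_bytes / size_bytes


def days_until_cert_expiry(expires_in_seconds):
    """Convert node_cert_expires_in_seconds into days."""
    return expires_in_seconds / 86400


# Made-up sample values, for illustration only.
cpu_user_rate = counter_rate(1200.0, 1215.0, 30)        # node_cpu_user
mem_ratio = freeable_memory_ratio(100, 10, 5, 25)       # node_memory_*_bytes
disk_ratio = disk_free_ratio(20, 200)                   # node_filesystem_*_bytes
cert_days = days_until_cert_expiry(2592000)             # node_cert_expires_in_seconds
print(cpu_user_rate, mem_ratio, disk_ratio, cert_days)
```

In a Prometheus setup, the same calculations would normally be written as PromQL expressions. The reset handling in counter_rate is similar in spirit to what rate-style functions do for counters.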

Monitor database-level metrics

For database performance, availability, and efficiency, monitor the following metrics:

Group | Metric | Why monitor | Unit
Memory | redis_server_used_memory | Track actual data memory to prevent out-of-memory errors and evictions. | Bytes
Memory | allocator_allocate | Monitor bytes allocated by allocator (includes internal fragmentation). | Bytes
Memory | allocator_active | Monitor bytes in active pages (includes external fragmentation). Use delta/ratio versus allocated to infer defraggable memory. | Bytes
Memory | active_defrag_running | Monitor if defragmentation is active and the intended CPU %. High values can affect performance. | % (gauge)
Latency | endpoint_read_requests_latency_histogram, endpoint_write_requests_latency_histogram, endpoint_other_requests_latency_histogram | Monitor server-side command latency. | Microseconds
High availability | redis_server_master_repl_offset | Compute replica throughput and lag using deltas over time. | Bytes (counter)
High availability | redis_server_master_link_status | Monitor replica link status (up or down) for early warning of high availability risk. | Status
Active-Active | database_syncer_dst_lag, database_syncer_lag_ms | Detect cross-region synchronization delays that impact consistency and SLAs. | Milliseconds (gauge)
Active-Active | database_syncer_state | Monitor operational state for troubleshooting synchronization issues. | Gauge
Traffic – requests | endpoint_read_requests, endpoint_write_requests, endpoint_other_requests | Monitor workload mix and spikes that drive capacity and latency. Total equals the sum of all three. | Counter
Traffic – responses | endpoint_read_responses, endpoint_write_responses, endpoint_other_responses | Validate service responsiveness and symmetry with requests. | Counter
Traffic – bytes | endpoint_ingress, endpoint_egress | Monitor size trends and watch for sudden growth that impacts egress costs or bandwidth. | Bytes (counter)
Egress queue | endpoint_egress_pending, endpoint_egress_pending_discarded | Monitor back-pressure and drops that indicate network or client issues. | Bytes (counter)
Connections | endpoint_client_connection | Monitor accepted connections over time and match against client rollouts or spikes. | Counter
Connections | endpoint_client_connection_expired | Monitor connections closed due to TTL expiry, which can indicate idle policy or client issues. | Counter
Connections | endpoint_longest_pipeline_histogram | Monitor long pipelines that can amplify latency bursts and detect misbehaving clients. | Histogram (count)
Connections | endpoint_client_connections, endpoint_client_disconnections, endpoint_proxy_disconnections | Monitor connection churn and identify who closed the socket (client versus proxy). Current connections ≈ connections − disconnections. | Counter
Cache efficiency | total_keys, total_volatile_keys | Monitor key inventory and TTL coverage to inform eviction strategy. | Counter
Cache efficiency | total_evicted_keys, total_expired_keys | Monitor eviction and expiry rates. Frequent evictions indicate memory pressure or poor sizing. | Counter
Cache efficiency | cache_hits, cache_hit_rate | Monitor hit rate, which drives read latency and cost. Cache hit rate equals cache_hits/(cache_hits+cache_misses). | Count / Ratio (%)
Cache efficiency | endpoint_client_tracking_on_requests, endpoint_client_tracking_off_requests, endpoint_disposed_commands_after_client_caching | Track client-side caching usage and misuse. | Counter
Big / complex keys | redis_server_<data_type>_<size_or_items>_<bucket> | Monitor oversized keys and cardinality that cause fragmentation, slow replication, and CPU spikes. Track to prevent incidents. Examples: strings_sizes_over_512M, zsets_items_over_8M | Gauge
Security – clients | endpoint_client_expiration_refresh, endpoint_client_establishment_failures | Monitor unstable clients or problems with authentication or setup. | Counter
Security – LDAP | endpoint_successful_ldap_authentication, endpoint_failed_ldap_authentication, endpoint_disconnected_ldap_client | Monitor authentication health and detect brute-force attacks or misconfigurations. | Counter
Security – cert-based | endpoint_successful_cba_authentication, endpoint_failed_cba_authentication, endpoint_disconnected_cba_client | Monitor certificate authentication status and failures. | Counter
Security – password | endpoint_disconnected_user_password_client | Monitor password-authentication client disconnects and correlate with policy changes. | Counter
Security – ACL | acl_access_denied_auth, acl_access_denied_cmd, acl_access_denied_key, acl_access_denied_channel | Monitor unauthorized access attempts and incorrectly scoped ACLs. | Counter
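Several rows in the database-level table describe derived values: total requests, approximate current connections, cache hit rate, and replication throughput from offset deltas. The Python sketch below works through that arithmetic with made-up numbers. The function names are hypothetical, and counting both client and proxy disconnections as "disconnections" is an assumption about how the approximation is meant to be applied.

```python
def total_requests(read_requests, write_requests, other_requests):
    """Total requests: the sum of read, write, and other requests."""
    return read_requests + write_requests + other_requests


def approx_current_connections(connections, client_disconnections,
                               proxy_disconnections):
    """Current connections ~= connections - disconnections.

    Assumption: disconnections covers sockets closed by either side.
    """
    return connections - (client_disconnections + proxy_disconnections)


def cache_hit_rate(cache_hits, cache_misses):
    """cache_hits / (cache_hits + cache_misses), or 0.0 with no lookups."""
    lookups = cache_hits + cache_misses
    return cache_hits / lookups if lookups else 0.0


def replication_throughput(prev_offset, curr_offset, interval_seconds):
    """Bytes per second replicated between two offset samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (curr_offset - prev_offset) / interval_seconds


# Made-up sample values, for illustration only.
requests = total_requests(600, 300, 100)
connections = approx_current_connections(1000, 700, 50)
hit_rate = cache_hit_rate(900, 100)
throughput = replication_throughput(1_000_000, 1_600_000, 60)
print(requests, connections, hit_rate, throughput)
```

The zero-lookup guard in cache_hit_rate avoids a division by zero on an idle database. Whether an idle database should report 0.0 or no value at all is a dashboard design choice.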