Metrics stream engine for monitoring v2
The new metrics stream engine for monitoring Redis Enterprise Software.
Redis Enterprise Software | Redis Enterprise for Kubernetes
The new metrics stream engine is generally available as of Redis Enterprise Software version 8.0.
The new metrics stream engine:
- Exposes the v2 Prometheus scraping endpoint at `https://<IP>:8070/v2`.
- Exports all time-series metrics to external monitoring tools such as Grafana, Datadog, New Relic, and Dynatrace using Prometheus.
- Enables real-time monitoring, including during maintenance operations, which provides full visibility into performance during events such as shard failovers and scaling operations.
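To spot-check what the v2 endpoint returns without going through Prometheus, you can fetch and parse it directly. The following Python sketch assumes the endpoint serves the standard Prometheus text exposition format and that the cluster uses a self-signed certificate, so certificate verification is disabled (suitable for testing only). The function names `fetch_metrics` and `parse_metrics` are hypothetical, and the parser handles only simple sample lines, not every corner of the format.

```python
import re
import ssl
import urllib.request

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<ts>-?\d+))?$"
)
# One label pair inside the braces; escape sequences are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples from exposition-format text."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # a line shape this sketch does not handle
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(host, port=8070, path="/v2", timeout=30):
    """Download the raw scrape output from https://host:port/path."""
    context = ssl.create_default_context()
    # Assumption: self-signed cluster certificate. Disable verification
    # only for testing; prefer trusting the cluster CA in production.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = f"https://{host}:{port}{path}"
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics("<cluster_name>"))` would return every sample, which you can then filter by name, such as `has_quorum` or `node_metrics_up`.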
Integrate with external monitoring tools
To integrate Redis Enterprise metrics into your monitoring environment, see the integration guides for Prometheus and Grafana.
Filter Libraries and tools by "observability" for additional tools and guides.
Prometheus metrics v2
For a list of all available v2 metrics, see Prometheus metrics v2.
The v2 scraping endpoint also exposes metrics for node_exporter version 1.8.1. For more information, see the Prometheus node_exporter GitHub repository.
Transition from Prometheus v1 to Prometheus v2
If you already use the v1 scraping endpoint for integration, do the following to transition from v1 metrics to v2 metrics:
- Change the `metrics_path` in your Prometheus configuration file from `/` to `/v2` to use the new scraping endpoint.
  Here's an example of the updated scraping configuration in `prometheus.yml`:
  ```yaml
  scrape_configs:
    # Scrape Redis Enterprise
    - job_name: redis-enterprise
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /v2
      scheme: https
      tls_config:
        insecure_skip_verify: true
      static_configs:
        - targets: ["<cluster_name>:8070"]
  ```
- Use the metrics tables in this guide to transition from v1 metrics to equivalent v2 PromQL.
You can scrape both the v1 and v2 endpoints simultaneously, which lets you prepare dashboards in advance and transition smoothly.
Best practices for monitoring
Follow these best practices when monitoring your Redis Enterprise Software cluster using the metrics stream engine.
Monitor host-level metrics
For cluster health, resources, and node stability, monitor these metrics:
| Group | Metric | Why monitor | Unit | 
|---|---|---|---|
| CPU utilization | node_cpu_user, node_cpu_system | Detect CPU saturation from Redis or the OS that results in higher latency and queueing. | Seconds (counter) |
| Memory (freeable) | node_memory_MemTotal_bytes, node_memory_MemFree_bytes, node_memory_Buffers_bytes, node_memory_Cached_bytes | Detect memory pressure early. Low free memory or cache can precede swapping or out-of-memory errors. | Bytes (gauge) |
| Swap usage | node_ephemeral_storage_free | Monitor memory and disk pressure in your setup. Sustained pressure leads to latency spikes. | Bytes (gauge) |
| Network traffic | node_ingress_bytes, node_egress_bytes | Ensure the network interface is not saturated, which protects replication and client responsiveness. | Bytes (counter) |
| Disk space | node_filesystem_avail_bytes, node_filesystem_size_bytes | Prevent persistence and logging outages caused by low disk space. | Bytes (gauge) |
| Cluster state | has_quorum{…} | Monitor whether quorum is maintained (1) or lost (0). | Boolean |
| Cluster state | node_metrics_up | Monitor whether the node is connected and reporting to the cluster. | Gauge |
| Licensing | license_shards_limit | Track shard capacity limits by type (RAM or flash). | Count |
| Certificates | node_cert_expires_in_seconds | Avoid downtime from expired node certificates. | Seconds (gauge) |
| Services – CPU | namedprocess_namegroup_cpu_seconds_total | Identify abnormal CPU usage by platform services that can starve Redis, such as alert_mgr, redis_mgr, and dmc_proxy. | Seconds (counter) |
| Services – memory | namedprocess_namegroup_memory_bytes | Detect memory leaks or outliers in platform services, such as alert_mgr, redis_mgr, and dmc_proxy. | Bytes (gauge) |
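Several of the host-level metrics are counters or raw byte values, so they usually need a small calculation before they are useful for alerting. The following Python sketch shows that arithmetic on values you have already scraped. It assumes node_cpu_user and node_cpu_system are cumulative CPU seconds summed across a known number of cores (check the labels in your own scrape), and it treats free + buffers + cached memory as an approximation of freeable memory. All function names are hypothetical; `counter_rate` is similar in spirit to PromQL's `rate()` but does no extrapolation.

```python
def counter_rate(v1, v2, t1, t2):
    """Per-second increase of a counter between scrapes at t1 and t2 (seconds).

    A drop in value is treated as a counter reset, so only v2 is counted.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)


def cpu_busy_fraction(user1, sys1, user2, sys2, t1, t2, cores):
    """Fraction of total CPU capacity spent in user + system mode.

    Assumption: the CPU counters are cumulative seconds summed over `cores`.
    """
    busy_seconds_per_second = counter_rate(user1 + sys1, user2 + sys2, t1, t2)
    return busy_seconds_per_second / cores


def freeable_memory_fraction(mem_total, mem_free, buffers, cached):
    """Approximate share of RAM that is free or reclaimable."""
    return (mem_free + buffers + cached) / mem_total


def disk_free_fraction(avail_bytes, size_bytes):
    """Share of the filesystem still available (node_filesystem_* metrics)."""
    return avail_bytes / size_bytes


def days_until_cert_expiry(expires_in_seconds):
    """Convert node_cert_expires_in_seconds into days."""
    return expires_in_seconds / 86400
```

You might, for example, alert when `freeable_memory_fraction` drops below a threshold you choose or when `days_until_cert_expiry` falls under your renewal window; the specific thresholds depend on your environment.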
Monitor database-level metrics
For database performance, availability, and efficiency, monitor the following metrics:
| Group | Metric | Why monitor | Unit | 
|---|---|---|---|
| Memory | redis_server_used_memory | Track actual data memory to prevent out-of-memory errors and evictions. | Bytes | 
| Memory | allocator_allocate | Monitor bytes allocated by the allocator (includes internal fragmentation). | Bytes |
| Memory | allocator_active | Monitor bytes in active pages (includes external fragmentation). Use the difference or ratio between active and allocated bytes to infer defraggable memory. | Bytes |
| Memory | active_defrag_running | Monitor whether defragmentation is active and its intended CPU percentage. High values can affect performance. | % (gauge) |
| Latency | endpoint_read_requests_latency_histogram, endpoint_write_requests_latency_histogram, endpoint_other_requests_latency_histogram | Monitor server-side command latency. | Microseconds |
| High availability | redis_server_master_repl_offset | Compute replica throughput and lag using deltas over time. | Bytes (counter) |
| High availability | redis_server_master_link_status | Monitor replica link status (up or down) for early warning of high availability risk. | Status |
| Active-Active | database_syncer_dst_lag, database_syncer_lag_ms | Detect cross-region synchronization delays that impact consistency and SLAs. | Milliseconds (gauge) |
| Active-Active | database_syncer_state | Monitor operational state for troubleshooting synchronization issues. | Gauge |
| Traffic – requests | endpoint_read_requests, endpoint_write_requests, endpoint_other_requests | Monitor workload mix and spikes that drive capacity and latency. Total equals the sum of all three. | Counter |
| Traffic – responses | endpoint_read_responses, endpoint_write_responses, endpoint_other_responses | Validate service responsiveness and symmetry with requests. | Counter |
| Traffic – bytes | endpoint_ingress, endpoint_egress | Monitor size trends and watch for sudden growth that impacts egress costs or bandwidth. | Bytes (counter) |
| Egress queue | endpoint_egress_pending, endpoint_egress_pending_discarded | Monitor back-pressure and drops that indicate network or client issues. | Bytes (counter) |
| Connections | endpoint_client_connection | Monitor accepted connections over time and match against client rollouts or spikes. | Counter | 
| Connections | endpoint_client_connection_expired | Monitor connections closed due to TTL expiry, which can indicate idle policy or client issues. | Counter | 
| Connections | endpoint_longest_pipeline_histogram | Monitor long pipelines that can amplify latency bursts and detect misbehaving clients. | Histogram (count) | 
| Connections | endpoint_client_connections, endpoint_client_disconnections, endpoint_proxy_disconnections | Monitor connection churn and identify who closed the socket (client versus proxy). Current connections ≈ connections − disconnections. | Counter |
| Cache efficiency | total_keys, total_volatile_keys | Monitor key inventory and TTL coverage to inform eviction strategy. | Counter |
| Cache efficiency | total_evicted_keys, total_expired_keys | Monitor eviction and expiry rates. Frequent evictions indicate memory pressure or poor sizing. | Counter |
| Cache efficiency | cache_hits, cache_hit_rate | Monitor hit rate, which drives read latency and cost. Cache hit rate equals cache_hits/(cache_hits+cache_misses). | Count / Ratio (%) |
| Cache efficiency | endpoint_client_tracking_on_requests, endpoint_client_tracking_off_requests, endpoint_disposed_commands_after_client_caching | Track client-side caching usage and misuse. | Counter |
| Big / complex keys | redis_server_<data_type>_<size_or_items>_<bucket> | Monitor oversized keys and cardinality that cause fragmentation, slow replication, and CPU spikes. Track to prevent incidents. Examples: strings_sizes_over_512M, zsets_items_over_8M. | Gauge |
| Security – clients | endpoint_client_expiration_refresh, endpoint_client_establishment_failures | Monitor unstable clients or problems with authentication or setup. | Counter |
| Security – LDAP | endpoint_successful_ldap_authentication, endpoint_failed_ldap_authentication, endpoint_disconnected_ldap_client | Monitor authentication health and detect brute-force attacks or misconfigurations. | Counter |
| Security – cert-based | endpoint_successful_cba_authentication, endpoint_failed_cba_authentication, endpoint_disconnected_cba_client | Monitor certificate authentication status and failures. | Counter |
| Security – password | endpoint_disconnected_user_password_client | Monitor password-authentication client disconnects and correlate with policy changes. | Counter |
| Security – ACL | acl_access_denied_auth, acl_access_denied_cmd, acl_access_denied_key, acl_access_denied_channel | Monitor unauthorized access attempts and incorrectly scoped ACLs. | Counter |
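Several rows in the database table describe derived values: total requests, approximate current connections, cache hit rate, defraggable memory, and replication throughput and lag. The following Python sketch spells out that arithmetic so you can check it against a dashboard. It is a sketch under stated assumptions, not Redis code: all function names are hypothetical; `estimated_current_connections` counts both client and proxy disconnections as "disconnections"; `replication_lag_bytes` assumes you can read the replication offset from both the master and the replica shard; and `histogram_quantile` assumes classic cumulative `(upper_bound, count)` buckets, as used by PromQL's `histogram_quantile()`. This guide does not confirm that bucket layout for the v2 latency histograms.

```python
import math


def counter_rate(v1, v2, t1, t2):
    """Per-second increase of a counter between two scrapes; a drop means a reset."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)


def fragmentation(allocated, active):
    """Return (active/allocated ratio, active - allocated bytes) as a defrag hint."""
    return active / allocated, active - allocated


def cache_hit_rate_percent(hits, misses):
    """cache_hits / (cache_hits + cache_misses), expressed as a percentage."""
    lookups = hits + misses
    return 100.0 * hits / lookups if lookups else 0.0


def total_requests(read, write, other):
    """Total requests = read + write + other requests."""
    return read + write + other


def estimated_current_connections(connections, client_disc, proxy_disc):
    """Connections minus client- and proxy-initiated disconnections (approximate)."""
    return connections - (client_disc + proxy_disc)


def replication_lag_bytes(master_offset, replica_offset):
    """Bytes the replica is behind (assumes both shards report an offset)."""
    return max(0, master_offset - replica_offset)


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Finds the bucket that contains the target rank and interpolates
    linearly inside it. The last bucket must have an upper bound of +inf.
    """
    if not 0 < q < 1:
        raise ValueError("q must be between 0 and 1 (exclusive)")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into the +inf bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

For replication throughput, you could apply `counter_rate` to two samples of redis_server_master_repl_offset. In Prometheus itself you would normally express these calculations in PromQL; the Python version is mainly useful for checking the math offline.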