Redis troubleshooting pocket guide

Symptoms

Use this guide when you see latency issues or other problems, or as a routine health check.

Changes

Configuration changes to the software or to the system, as well as changes in the workload or dataset size, may provoke latency.

Identify issues on Redis hosts

Check that disk space is not excessively consumed using "df -h". Check that the size of the log directory has not grown unexpectedly using "du -sh /var/opt/redislabs/log/", then proceed to check other possible causes.
Check that RAM and CPU are not excessively consumed. It is recommended that RAM and CPU utilization stay below 80%. The host resources must be exclusively available for Redis software.
Verify using "free" that swap is either not configured or not in use.
It is recommended to keep the host clock in sync with a time server. Verify using "timedatectl", "ntpq -p", or "chronyc sources".
Check the output of "env" and remove the https_proxy and http_proxy variables if they exist, for example "unset https_proxy".
Review system logs, including the syslog or journal, for any error messages, warnings, or critical events.
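
The disk, memory, and swap checks above can be sketched as a few shell helpers. This is a minimal sketch, not an official tool: the function names are illustrative, the 80% threshold is taken from the guidance above, and the memory calculation assumes a kernel that reports MemAvailable in /proc/meminfo.

```shell
# Illustrative helpers for the host checks above (names are hypothetical).
THRESHOLD=80   # percent, from the 80% guidance in this guide

# Succeeds when the integer percentage $1 is above the threshold ($2, default 80).
over_threshold() {
    [ "$1" -gt "${2:-$THRESHOLD}" ]
}

# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# filesystem whose usage is above the threshold ($1, default 80).
full_filesystems() {
    awk -v limit="${1:-$THRESHOLD}" \
        'NR > 1 { pct = $5; sub(/%/, "", pct); if (pct + 0 > limit) print $6, $5 }'
}

# Reads /proc/meminfo-style input on stdin and prints used memory as an
# integer percentage (assumes the MemAvailable field is present).
mem_used_pct() {
    awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
         END { if (t > 0) printf "%d\n", (t - a) * 100 / t }'
}

# Reads /proc/meminfo-style input on stdin and prints swap in use, in kB.
# Prints 0 when swap is not configured or not used.
swap_used_kb() {
    awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }'
}
```

For example, "df -P | full_filesystems" lists only the filesystems above 80%, "over_threshold "$(mem_used_pct < /proc/meminfo)"" succeeds when memory use is above 80%, and "swap_used_kb < /proc/meminfo" should print 0 on a healthy host.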

Identify potential issues caused by security hardening

Temporarily disable any security or hardening software and check whether the problem goes away. Examples include SELinux, Cylance, McAfee, and Dynatrace.
The Linux user "redislabs" must have read/write access to the /tmp folder. Verify using "su - redislabs -s /bin/bash -c 'touch /tmp/test'".
A restrictive umask can cause issues. If the umask differs from the default 022, it might prevent normal operation. Consult your sysadmin and revert to the default umask.
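
The umask and /tmp checks above could be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the write test creates and removes a temporary probe file instead of leaving /tmp/test behind.

```shell
# Illustrative helpers for the hardening checks above (names are hypothetical).

# Succeeds when the umask value in $1 is the default 022 (with or without
# the extra leading zero that some shells print).
umask_is_default() {
    case "$1" in
        022|0022) return 0 ;;
        *)        return 1 ;;
    esac
}

# Succeeds when the current user can create a file in directory $1.
# The probe file is removed again after a successful write.
can_write_dir() {
    probe="$1/.rw_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}
```

To evaluate both as the "redislabs" user, the same "su - redislabs -s /bin/bash -c '...'" form shown above can be used, for example with 'umask' as the command and its output passed to umask_is_default.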

Identify Redis cluster issues

Execute "supervisorctl status" and verify that all processes are in a RUNNING state.
Execute "rlcheck" and verify that no errors appear.
Execute "rladmin status issue_only" and verify that no issues appear.
Execute "rladmin status shards" and verify that the used memory of shards participating in the same database is balanced and that no shard exceeds 25 GB.
Execute "rladmin cluster running_actions" and verify that no tasks appear.
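
On a node with many processes, the first check is easier to read when only the problem lines are shown. The helper below is a minimal sketch with a hypothetical name; it assumes the usual "supervisorctl status" layout in which the second column is the process state.

```shell
# Reads `supervisorctl status` output on stdin and prints only the lines
# whose state column (second field) is not RUNNING.
not_running() {
    awk 'NF >= 2 && $2 != "RUNNING"'
}
```

"supervisorctl status | not_running" prints nothing on a healthy node; any line it does print names a process to investigate before moving on to "rlcheck" and the "rladmin" checks.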

Troubleshooting connectivity

Check whether the Redis endpoint can be resolved on the client machine using "dig <endpoint>". If the resolution fails, check whether the endpoint can be resolved on one of the cluster nodes using "dig @localhost <endpoint>". If that resolution succeeds, the problem is with the organizational DNS.
To identify any issue with the client app, check connectivity from the client machine to the database using redis-cli: "redis-cli -h <endpoint> -p <port> -a <password> info" or, for TLS, "redis-cli -h <endpoint> -p <port> -a <password> --tls --insecure --cert <cert file> --key <key file> ping". If that fails, check connectivity to the database using redis-cli from one of the cluster nodes. If that also fails, the issue is with the network. Consult your sysadmin.
Verify that the client connects using the database name and not an IP address.
Verify that the database is configured with an eviction policy and key expiration to avoid out-of-memory (OOM) conditions.
Verify that access to the database is not blocked by a firewall on the client side or the Redis side using "iptables -L", "ufw status", or "firewall-cmd --list-all".
Additional details can be found in the related document about testing client connections.
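
The DNS decision flow at the start of this section can be sketched as a small script. It is an illustration under assumptions, not part of the Redis tooling: the function names are hypothetical, and instead of logging in to a cluster node to run "dig @localhost", it queries the node's DNS service directly, which assumes the node's address is reachable on the DNS port from where the script runs.

```shell
# Succeeds when endpoint $1 resolves. If $2 is given, that DNS server
# (for example a cluster node address) is queried instead of the default.
# Lines starting with ";" are dig diagnostics (such as timeouts), not answers.
resolves() {
    if [ -n "$2" ]; then
        answer=$(dig +short "@$2" "$1" | grep -v '^;')
    else
        answer=$(dig +short "$1" | grep -v '^;')
    fi
    [ -n "$answer" ]
}

# $1 = database endpoint, $2 = address of one cluster node.
diagnose_dns() {
    endpoint=$1
    node=$2
    if resolves "$endpoint"; then
        echo "client resolver OK"
    elif resolves "$endpoint" "$node"; then
        echo "cluster resolves, organizational DNS does not"
    else
        echo "cluster does not resolve the endpoint"
    fi
}
```

Run as "diagnose_dns <endpoint> <node address>". The second outcome corresponds to the organizational DNS problem described above; the third points at the cluster side or at the network path to it.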

Troubleshooting latency

Server-side

Ensure that the memory used in the database does not reach the configured database max memory limit. More details can be found in the document about database memory limits.
Try to correlate the time of the latency with a surge in any of the following metrics:
- number of connections
- used memory
- evicted keys, expired keys
Check the output of "slowlog get <number of entries to display>" for slow commands such as KEYS or HGETALL. Use alternative commands where possible: SCAN, SSCAN, HSCAN, ZSCAN.
Keys with large memory footprints can cause latency. To identify these keys, compare the key names that appear in the output of "slowlog get" with the big keys reported by the following commands: "redis-cli -h <endpoint> -p <port> -a <password> --memkeys" and "redis-cli -h <endpoint> -p <port> -a <password> --bigkeys".
Additional diagnostic steps can be found at the following links:
https://redis.io/docs/latest/operate/oss_and_stack/management/optimization/latency/
https://redis.io/docs/latest/operate/rs/clusters/logging/redis-slow-log/
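
For the memory limit check at the top of this list, the ratio of used memory to the limit can be computed from the "info memory" output. The sketch below uses a hypothetical function name and assumes the output contains used_memory and maxmemory fields, as open-source Redis reports them; if your deployment does not expose maxmemory this way, rely on the database metrics instead.

```shell
# Reads `info memory` output on stdin and prints used memory as an integer
# percentage of maxmemory. Prints "no limit reported" when maxmemory is
# 0 or missing. redis-cli output may use CRLF line endings, hence the tr.
memory_pct() {
    tr -d '\r' | awk -F: '
        $1 == "used_memory" { used = $2 }
        $1 == "maxmemory"   { max = $2 }
        END {
            if (max > 0) printf "%d\n", used * 100 / max
            else print "no limit reported"
        }'
}
```

For example: "redis-cli -h <endpoint> -p <port> -a <password> info memory | memory_pct". A value that approaches 100 during the latency window suggests the database is running into its memory limit.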

Client-side

Check that there is no memory or CPU pressure on the client host.
Check that the client uses a connection pool instead of frequently opening and closing connections.
Check that the client does not erroneously open multiple connections, which can put pressure on the client or the server.
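
One way to spot connection churn or runaway connections from the client host is to count sockets towards the database port by TCP state. The helper below is a rough sketch with a hypothetical name; it assumes the column layout of "ss -tan", where the first field is the state and the fifth is the peer address and port.

```shell
# Reads `ss -tan` output on stdin and prints how many sockets are in state $1
# (for example ESTAB or TIME-WAIT) with peer port $2.
count_conns() {
    awk -v state="$1" -v port="$2" \
        '$1 == state && $5 ~ (":" port "$") { n++ } END { print n + 0 }'
}
```

"ss -tan | count_conns ESTAB <port>" shows how many connections are currently open to the database, and "ss -tan | count_conns TIME-WAIT <port>" shows recently closed ones. A large or steadily growing TIME-WAIT count hints that the client opens and closes connections frequently rather than reusing a pool.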
