1. Operator Overview

1.1 Why Use the Operator Pattern to Deploy a Redis Cluster

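In the Kubernetes ecosystem, the Operator pattern introduces custom resources (defined through CRDs) and custom controllers to encode application-specific operational knowledge in software, which makes it possible to automate the management of complex stateful applications. Compared with a Helm chart, which usually focuses on initial deployment and configuration, an Operator provides deeper lifecycle management, including automatic scaling, version upgrades, failure recovery, and backups.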

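For a distributed stateful service such as Redis Cluster, an Operator can:

  1. Simplify deployment and management: a single YAML manifest (a Custom Resource) defines and deploys the whole cluster.
  2. Automate operations: the Operator continuously monitors cluster state and handles node failures, data synchronization, and similar issues automatically.
  3. Provide high availability: the Redis service fails over and recovers automatically when a node fails, which keeps the business running.
  4. Offer declarative configuration: you declare the desired state, and the Operator makes it happen.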

[Figure: Operator pattern architecture diagram]

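1.2 Redis Operator Overview

This walkthrough uses the Redis Operator developed by Opstree. It is a Redis operator written in Golang that deploys and manages Redis on Kubernetes in both standalone mode and cluster mode, on cloud or bare-metal environments. It creates Redis setups according to best practices and ships with a built-in Redis Exporter for monitoring.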

[Figure: Redis Operator architecture diagram]

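The Redis Operator supports the following features:

  • ✅ Redis deployment in cluster mode and standalone mode
  • ✅ Built-in Prometheus exporter (redis-exporter)
  • ✅ Dynamic storage provisioning (based on a PVC template)
  • ✅ Resource requests and limits (CPU, memory, and so on)
  • ✅ Redis instances with or without a password
  • ✅ Node scheduling (nodeSelector) and affinity configuration
  • ✅ Priority Class settings to control Pod priority
  • ✅ SecurityContext settings for kernel parameters and permission management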

2. Deploying the Redis Operator

# Add the Opstree Helm repository
helm repo add ot-helm https://ot-container-kit.github.io/helm-charts/

# Update the local Helm repository index
helm repo update

# Install the Redis Operator into the 'ot-operator' namespace (created if it does not exist)
$ helm install redis-operator ot-helm/redis-operator -n ot-operator --create-namespace
NAME: redis-operator
LAST DEPLOYED: Tue Jul 22 11:47:11 2025
NAMESPACE: ot-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
...

After installation, verify that the Operator Pod is running:

root@k8s-master01:~# kubectl get pod -n ot-operator
NAME                                  READY   STATUS    RESTARTS   AGE
pod/redis-operator-57f979468d-qkn8p   1/1     Running   0          43s
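
When this check is scripted (for example in a CI job), the table printed by kubectl can be evaluated automatically. The following is a minimal sketch; `pods_all_ready` is a hypothetical helper name, not part of kubectl or the operator, and it assumes the default column layout (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pod` table output on stdin and
# succeeds only when every listed Pod is Running with all containers ready.
pods_all_ready() {
  awk '
    $1 == "NAME" { next }                 # skip the header row
    NF == 0      { next }                 # skip blank lines
    {
      split($2, ready, "/")               # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
      seen = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) } # fail on empty input as well
  '
}

# Usage against a live cluster:
#   kubectl get pod -n ot-operator | pods_all_ready && echo "operator is ready"
```

The helper only inspects text, so it works the same way for the Redis Pods created in the later sections.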

3. Usage

The Redis Operator can create the following four kinds of resources:

  • Redis
  • Redis Cluster
  • Redis Replication
  • Redis Sentinel

3.1 Standalone Redis

A standalone Redis setup is a single Redis Pod running a single Redis process.

3.1.1 Deploying Standalone Redis

Install with Helm:

$  helm install redis ot-helm/redis --namespace redis-server --create-namespace
NAME: redis
LAST DEPLOYED: Tue Jul 22 14:36:22 2025
NAMESPACE: redis-server
STATUS: deployed
REVISION: 1
TEST SUITE: None

Verify the standalone Redis setup with kubectl:

root@k8s-master01:~/redis# kubectl get all -n redis-server
NAME          READY   STATUS    RESTARTS   AGE
pod/redis-0   1/1     Running   0          6s

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/redis              ClusterIP   10.244.22.26   <none>        6379/TCP   6s
service/redis-additional   ClusterIP   10.244.9.163   <none>        6379/TCP   6s
service/redis-headless     ClusterIP   None           <none>        6379/TCP   6s

NAME                     READY   AGE
statefulset.apps/redis   1/1     6s

Alternatively, install from a YAML manifest.

Create the manifest file standalone.yaml:
root@k8s-master01:~/redis# cat standalone.yaml
---
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: Redis
metadata:
  name: redis-standalone
spec:
  kubernetesConfig:
    image: quay.io/opstree/redis:v7.0.15
    imagePullPolicy: IfNotPresent
  storage:
    volumeClaimTemplate:
      spec:
        # storageClassName: standard
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
        storageClassName: nfs-client
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000

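Deploy:

kubectl apply -f standalone.yaml -n redis-server

Check the deployment. The sample output below was captured from the Helm release named redis; resources created from this manifest are named after the CR (redis-standalone):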

root@k8s-master01:~/redis# kubectl get all -n redis-server
NAME          READY   STATUS    RESTARTS   AGE
pod/redis-0   1/1     Running   0          6s

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/redis              ClusterIP   10.244.22.26   <none>        6379/TCP   6s
service/redis-additional   ClusterIP   10.244.9.163   <none>        6379/TCP   6s
service/redis-headless     ClusterIP   None           <none>        6379/TCP   6s

NAME                     READY   AGE
statefulset.apps/redis   1/1     6s

3.1.2 Configuration Parameters

The Helm values of the redis chart are listed below.

Key Type Default Description
TLS.ca string "ca.key"
TLS.cert string "tls.crt"
TLS.key string "tls.key"
TLS.secret.secretName string ""
acl.secret.secretName string ""
affinity object {}
env list []
externalConfig.data string "tcp-keepalive 400\nslowlog-max-len 158\nstream-node-max-bytes 2048\n"
externalConfig.enabled bool false
externalService.enabled bool false
externalService.port int 6379
externalService.serviceType string "NodePort"
initContainer.args list []
initContainer.command list []
initContainer.enabled bool false
initContainer.env list []
initContainer.image string ""
initContainer.imagePullPolicy string "IfNotPresent"
initContainer.resources object {}
labels object {}
nodeSelector object {}
podSecurityContext.fsGroup int 1000
podSecurityContext.runAsUser int 1000
priorityClassName string ""
redisExporter.enabled bool false
redisExporter.env list []
redisExporter.image string "quay.io/opstree/redis-exporter"
redisExporter.imagePullPolicy string "IfNotPresent"
redisExporter.resources object {}
redisExporter.tag string "v1.44.0"
redisStandalone.ignoreAnnotations list []
redisStandalone.image string "quay.io/opstree/redis"
redisStandalone.imagePullPolicy string "IfNotPresent"
redisStandalone.imagePullSecrets list []
redisStandalone.minReadySeconds int 0
redisStandalone.name string ""
redisStandalone.recreateStatefulSetOnUpdateInvalid bool false Some fields of statefulset are immutable, such as volumeClaimTemplates. When set to true, the operator will delete the statefulset and recreate it. Default is false.
redisStandalone.redisSecret.secretKey string ""
redisStandalone.redisSecret.secretName string ""
redisStandalone.resources object {}
redisStandalone.serviceType string "ClusterIP"
redisStandalone.tag string "v7.0.15"
securityContext object {}
serviceAccountName string ""
serviceMonitor.enabled bool false
serviceMonitor.interval string "30s"
serviceMonitor.namespace string "monitoring"
serviceMonitor.scrapeTimeout string "10s"
sidecars.env list []
sidecars.image string ""
sidecars.imagePullPolicy string "IfNotPresent"
sidecars.name string ""
sidecars.resources.limits.cpu string "100m"
sidecars.resources.limits.memory string "128Mi"
sidecars.resources.requests.cpu string "50m"
sidecars.resources.requests.memory string "64Mi"
storageSpec.volumeClaimTemplate.spec.accessModes[0] string "ReadWriteOnce"
storageSpec.volumeClaimTemplate.spec.resources.requests.storage string "1Gi"
tolerations list []

3.2 Redis Cluster

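Redis Cluster is essentially a data sharding strategy: it automatically distributes data across multiple Redis nodes. It is an advanced Redis feature that provides distributed storage and avoids a single point of failure.

When a leader node fails, its follower Pod is automatically promoted to leader; when the failed node comes back online, it rejoins as a follower.

  • A sharded cluster that contains only leader nodes needs at least 3 nodes.
  • A cluster that also contains followers needs at least 6 Redis Pods / processes (typically 3 leaders and 3 followers).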
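
Redis Cluster divides the key space into 16384 hash slots and gives each leader a contiguous range of them. The sketch below illustrates an even split; `slot_ranges` is a hypothetical helper, and the rounding scheme is an assumption chosen for illustration rather than code taken from the operator or redis-cli.

```shell
# Hypothetical helper: prints an even split of the 16384 hash slots
# across N leaders, one "leader-<index> <first>-<last>" line per leader.
slot_ranges() {
  awk -v n="$1" 'BEGIN {
    total = 16384
    per = total / n                       # fractional slots per leader
    cursor = 0
    first = 0
    for (i = 0; i < n; i++) {
      last = int(cursor + per - 1 + 0.5)  # round to the nearest slot
      if (i == n - 1) last = total - 1    # last leader takes the remainder
      printf "leader-%d %d-%d\n", i, first, last
      first = last + 1
      cursor += per
    }
  }'
}

slot_ranges 3
```

For three leaders this prints 0-5460, 5461-10922, and 10923-16383, the same ranges that appear in the `cluster nodes` output in section 3.2.1.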


3.2.1 Deploying Redis Cluster

Install with Helm:

$ helm install redis-cluster ot-helm/redis-cluster \
  --set redisCluster.clusterSize=3 --namespace ot-operators
...
Release "redis-cluster" does not exist. Installing it now.
NAME:          redis-cluster
LAST DEPLOYED: Sun May  2 16:11:38 2021
NAMESPACE:     ot-operators
STATUS:        deployed
REVISION:      1
TEST SUITE:    None

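Verify the Pods. The commands from here on use the redis-server namespace, where the cluster in this walkthrough was installed:

root@k8s-master01:~/redis# kubectl get pod -n redis-server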
NAME                       READY   STATUS    RESTARTS   AGE
redis-cluster-follower-0   1/1     Running   0          37s
redis-cluster-follower-1   1/1     Running   0          34s
redis-cluster-follower-2   1/1     Running   0          31s
redis-cluster-leader-0     1/1     Running   0          82s
redis-cluster-leader-1     1/1     Running   0          61s
redis-cluster-leader-2     1/1     Running   0          58s

Check the cluster state with redis-cli:

root@k8s-master01:~/redis# kubectl exec -it redis-cluster-leader-0 -n redis-server -- redis-cli -a Opstree@1234 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
6e7492c5979f60305003a159e6dbcdfec0a74926 10.244.96.156:6379@16379,redis-cluster-follower-0 slave e988ec0596613ded3f6ac8f241851596f4414814 0 1753167523762 1 connected
e988ec0596613ded3f6ac8f241851596f4414814 10.244.96.155:6379@16379,redis-cluster-leader-0 myself,master - 0 1753167524000 1 connected 0-5460
b9dc458eafb1a8e7aff36df09ebad220d3ca528d 10.244.95.218:6379@16379,redis-cluster-follower-1 slave 0b4d5d5b62e5d3cbb4dcf5d668844b5c7d657682 0 1753167524767 2 connected
ae32c2492600c36453725c2e5ce2d5d24e99f68a 10.244.66.37:6379@16379,redis-cluster-follower-2 slave 1d1e17f99792b1e6087540dd0de433ad385b4aa5 0 1753167523000 3 connected
1d1e17f99792b1e6087540dd0de433ad385b4aa5 10.244.66.36:6379@16379,redis-cluster-leader-2 master - 0 1753167523000 3 connected 10923-16383
0b4d5d5b62e5d3cbb4dcf5d668844b5c7d657682 10.244.95.205:6379@16379,redis-cluster-leader-1 master - 0 1753167524566 2 connected 5461-10922
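
The raw `cluster nodes` output is dense. As a minimal sketch, it can be reduced to one line per master with its slot ranges. `cluster_masters` is a hypothetical helper name; it assumes the field layout shown above, where the second field carries the hostname after a comma.

```shell
# Hypothetical helper: reads `redis-cli cluster nodes` output on stdin and
# prints "<hostname> <slot ranges>" for every connected master.
cluster_masters() {
  awk '
    $3 ~ /master/ && $8 == "connected" {
      n = split($2, addr, ",")            # "ip:port@cport,hostname"
      name = (n > 1) ? addr[2] : addr[1]  # fall back to the address
      slots = ""
      for (i = 9; i <= NF; i++)           # fields 9+ are slot ranges
        slots = slots (slots ? "," : "") $i
      print name, slots
    }
  '
}

# Usage against a live cluster (add -a <password> if one is configured):
#   kubectl exec redis-cluster-leader-0 -n redis-server -- \
#     redis-cli cluster nodes | cluster_masters
```

Warning and AUTH lines printed by redis-cli do not match the filter, so they are dropped from the summary.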

Check the Services:

root@k8s-master01:~/redis# kubectl get svc -n redis-server
NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
redis-cluster-follower              ClusterIP   10.244.49.246   <none>        6379/TCP   81m
redis-cluster-follower-additional   ClusterIP   10.244.2.131    <none>        6379/TCP   81m
redis-cluster-follower-headless     ClusterIP   None            <none>        6379/TCP   81m
redis-cluster-leader                ClusterIP   10.244.37.225   <none>        6379/TCP   82m
redis-cluster-leader-additional     ClusterIP   10.244.22.8     <none>        6379/TCP   82m
redis-cluster-leader-headless       ClusterIP   None            <none>        6379/TCP   82m
redis-cluster-master                ClusterIP   10.244.38.178   <none>        6379/TCP   82m

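Key Services:

  • redis-cluster-leader: the main entry point, pointing to the leader Pods. Clients should connect to this Service.
  • redis-cluster-follower: points to the follower Pods, for read-heavy workloads (requires client-side configuration).

3.2.2 Configuring Redis Credentials

The AUTH error in the output above ("called without any password configured") shows that the cluster was installed without a password, so the password passed with -a was ignored. To set a password, proceed as follows.

1) First, create a Secret (save it as redis-secret.yaml and apply it with kubectl apply -f redis-secret.yaml):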

# redis-secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret # this name is referenced from the RedisCluster CR
  namespace: redis-server # a dedicated namespace for the Redis cluster is recommended
data:
  password: UEBzc3cwcmQ= # Base64 encoding of "P@ssw0rd"
type: Opaque
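
The value under `data` must be Base64-encoded. A minimal sketch for producing and checking it follows; the two function names are hypothetical helpers, and `printf` is used instead of `echo` so that no trailing newline ends up inside the password.

```shell
# Hypothetical helpers for Secret values.
encode_secret_value() {
  # printf '%s' writes the value without the newline that echo would append
  printf '%s' "$1" | base64
}

decode_secret_value() {
  printf '%s' "$1" | base64 --decode
}

encode_secret_value 'P@ssw0rd'        # prints UEBzc3cwcmQ=
```

Decoding the stored value with `decode_secret_value 'UEBzc3cwcmQ='` is a quick way to confirm that the Secret holds the intended password.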

2) Export the current values of the redis-cluster release:

$ helm get values -n redis-server redis-cluster --all > values.yaml

3) Edit values.yaml:

$ vim values.yaml
redisCluster:
  redisSecret:
    secretKey: "password"        # key name inside the Secret
    secretName: "redis-secret"   # name of the Secret created above

4) Upgrade the Helm release:

root@k8s-master01:~/redis# helm upgrade -n redis-server redis-cluster ot-helm/redis-cluster -f values.yaml
Release "redis-cluster" has been upgraded. Happy Helming!
NAME: redis-cluster
LAST DEPLOYED: Tue Jul 22 15:11:40 2025
NAMESPACE: redis-server
STATUS: deployed
REVISION: 2
TEST SUITE: None

A quick note on helm upgrade. The command has the following form:
Usage:  helm upgrade [RELEASE] [CHART] [flags]
RELEASE is the release name.
CHART is the chart name.

The chart name can be looked up with:
$ helm search repo ot-helm | grep redis-cluster
ot-helm/redis-cluster           0.17.0          0.17.0          Provides easy redis setup definitions for Kuber...

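5) Check the cluster state with the new password to verify that the change took effect:

$ kubectl exec -it redis-cluster-leader-0 -n redis-server -- redis-cli -a 'P@ssw0rd' cluster nodes
The output no longer contains an AUTH error, so the password was updated successfully: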
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
b9dc458eafb1a8e7aff36df09ebad220d3ca528d 10.244.95.220:6379@16379,redis-cluster-follower-1 master - 0 1753168734000 5 connected 5461-10922
1d1e17f99792b1e6087540dd0de433ad385b4aa5 10.244.66.38:6379@16379,redis-cluster-leader-2 slave ae32c2492600c36453725c2e5ce2d5d24e99f68a 0 1753168734000 4 connected
6e7492c5979f60305003a159e6dbcdfec0a74926 10.244.96.158:6379@16379,redis-cluster-follower-0 master - 0 1753168735135 6 connected 0-5460
0b4d5d5b62e5d3cbb4dcf5d668844b5c7d657682 10.244.95.219:6379@16379,redis-cluster-leader-1 slave b9dc458eafb1a8e7aff36df09ebad220d3ca528d 0 1753168734000 5 connected
e988ec0596613ded3f6ac8f241851596f4414814 10.244.96.157:6379@16379,redis-cluster-leader-0 myself,slave 6e7492c5979f60305003a159e6dbcdfec0a74926 0 1753168734000 6 connected
ae32c2492600c36453725c2e5ce2d5d24e99f68a 10.244.66.39:6379@16379,redis-cluster-follower-2 master - 0 1753168734131 4 connected 10923-16383

3.2.3 Redis Cluster Configuration Parameters

The Helm values of the redis-cluster chart are listed below.

Key Type Default Description
TLS.ca string "ca.key"
TLS.cert string "tls.crt"
TLS.key string "tls.key"
TLS.secret.secretName string ""
acl.secret.secretName string ""
env list []
externalConfig.data string "tcp-keepalive 400\nslowlog-max-len 158\nstream-node-max-bytes 2048\n"
externalConfig.enabled bool false
externalService.enabled bool false
externalService.port int 6379
externalService.serviceType string "LoadBalancer"
initContainer.args list []
initContainer.command list []
initContainer.enabled bool false
initContainer.env list []
initContainer.image string ""
initContainer.imagePullPolicy string "IfNotPresent"
initContainer.resources object {}
labels object {}
podSecurityContext.fsGroup int 1000
podSecurityContext.runAsUser int 1000
priorityClassName string ""
redisCluster.clusterSize int 3
redisCluster.clusterVersion string "v7"
redisCluster.follower.affinity string nil
redisCluster.follower.nodeSelector string nil
redisCluster.follower.pdb.enabled bool false
redisCluster.follower.pdb.maxUnavailable int 1
redisCluster.follower.pdb.minAvailable int 1
redisCluster.follower.replicas int 3
redisCluster.follower.securityContext object {}
redisCluster.follower.serviceType string "ClusterIP"
redisCluster.follower.tolerations list []
redisCluster.image string "quay.io/opstree/redis"
redisCluster.imagePullPolicy string "IfNotPresent"
redisCluster.imagePullSecrets object {}
redisCluster.leader.affinity object {}
redisCluster.leader.nodeSelector string nil
redisCluster.leader.pdb.enabled bool false
redisCluster.leader.pdb.maxUnavailable int 1
redisCluster.leader.pdb.minAvailable int 1
redisCluster.leader.replicas int 3
redisCluster.leader.securityContext object {}
redisCluster.leader.serviceType string "ClusterIP"
redisCluster.leader.tolerations list []
redisCluster.minReadySeconds int 0
redisCluster.name string ""
redisCluster.persistenceEnabled bool true
redisCluster.recreateStatefulSetOnUpdateInvalid bool false Some fields of statefulset are immutable, such as volumeClaimTemplates. When set to true, the operator will delete the statefulset and recreate it. Default is false.
redisCluster.redisSecret.secretKey string ""
redisCluster.redisSecret.secretName string ""
redisCluster.resources object {}
redisCluster.tag string "v7.0.15"
redisExporter.enabled bool false
redisExporter.env list []
redisExporter.image string "quay.io/opstree/redis-exporter"
redisExporter.imagePullPolicy string "IfNotPresent"
redisExporter.resources object {}
redisExporter.tag string "v1.44.0"
serviceAccountName string ""
serviceMonitor.enabled bool false
serviceMonitor.interval string "30s"
serviceMonitor.namespace string "monitoring"
serviceMonitor.scrapeTimeout string "10s"
sidecars.env object {}
sidecars.image string ""
sidecars.imagePullPolicy string "IfNotPresent"
sidecars.name string ""
sidecars.resources.limits.cpu string "100m"
sidecars.resources.limits.memory string "128Mi"
sidecars.resources.requests.cpu string "50m"
sidecars.resources.requests.memory string "64Mi"
storageSpec.nodeConfVolume bool true
storageSpec.nodeConfVolumeClaimTemplate.spec.accessModes[0] string "ReadWriteOnce"
storageSpec.nodeConfVolumeClaimTemplate.spec.resources.requests.storage string "1Gi"
storageSpec.volumeClaimTemplate.spec.accessModes[0] string "ReadWriteOnce"
storageSpec.volumeClaimTemplate.spec.resources.requests.storage string "1Gi"

3.3 Redis Replication

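3.3.1 Deploying Redis Replication

Redis Replication is the process of synchronizing data from one Redis master (leader node) to one or more replicas (follower nodes).

The master accepts write requests and propagates data changes to the replicas. Each replica receives the changes from the master and applies them locally, forming a copy of the master's data set.

Redis Replication is asynchronous: the master does not wait for replicas to apply a change before continuing. Replicas instead catch up with the master as fast as network bandwidth and hardware performance allow.

Deploy with Helm:

$ helm install redis-replication ot-helm/redis-replication \
  --set redisReplication.clusterSize=3 --namespace ot-operators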
...
NAME: redis-replication
LAST DEPLOYED: Tue Mar 21 22:47:44 2023
NAMESPACE: ot-operators
STATUS: deployed
REVISION: 1
TEST SUITE: None

Verify Redis Replication by checking the Pod status:

root@k8s-master01:~/redis# kubectl get all -n redis-server
NAME                      READY   STATUS    RESTARTS   AGE
pod/redis-replication-0   1/1     Running   0          26s
pod/redis-replication-1   1/1     Running   0          23s
pod/redis-replication-2   1/1     Running   0          20s

NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/redis-replication              ClusterIP   10.244.4.100    <none>        6379/TCP   26s
service/redis-replication-additional   ClusterIP   10.244.18.241   <none>        6379/TCP   26s
service/redis-replication-headless     ClusterIP   None            <none>        6379/TCP   26s
service/redis-replication-master       ClusterIP   10.244.53.248   <none>        6379/TCP   26s
service/redis-replication-replica      ClusterIP   10.244.2.65     <none>        6379/TCP   26s

NAME                                 READY   AGE
statefulset.apps/redis-replication   3/3     26s
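
3.3.2 Redis Replication Configuration Parameters

The configuration parameters are listed below. Note that the keys shown (redisSentinel.*, redisSentinelConfig.*, and the default port 26379) are those of the redis-sentinel chart.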

Key Type Default Description
TLS.ca string "ca.key"
TLS.cert string "tls.crt"
TLS.key string "tls.key"
TLS.secret.secretName string ""
affinity object {}
env list []
externalConfig.data string "tcp-keepalive 400\nslowlog-max-len 158\nstream-node-max-bytes 2048\n"
externalConfig.enabled bool false
externalService.enabled bool false
externalService.port int 26379
externalService.serviceType string "NodePort"
initContainer.args list []
initContainer.command list []
initContainer.enabled bool false
initContainer.env list []
initContainer.image string ""
initContainer.imagePullPolicy string "IfNotPresent"
initContainer.resources object {}
labels object {}
livenessProbe.failureThreshold int 3
livenessProbe.initialDelaySeconds int 1
livenessProbe.periodSeconds int 10
livenessProbe.successThreshold int 1
livenessProbe.timeoutSeconds int 1
nodeSelector object {}
pdb.enabled bool false
pdb.maxUnavailable string nil
pdb.minAvailable int 1
podSecurityContext.fsGroup int 1000
podSecurityContext.runAsUser int 1000
priorityClassName string ""
readinessProbe.failureThreshold int 3
readinessProbe.initialDelaySeconds int 1
readinessProbe.periodSeconds int 10
readinessProbe.successThreshold int 1
readinessProbe.timeoutSeconds int 1
redisExporter.enabled bool false
redisExporter.env list []
redisExporter.image string "quay.io/opstree/redis-exporter"
redisExporter.imagePullPolicy string "IfNotPresent"
redisExporter.resources object {}
redisExporter.tag string "v1.44.0"
redisSentinel.clusterSize int 3
redisSentinel.ignoreAnnotations list []
redisSentinel.image string "quay.io/opstree/redis-sentinel"
redisSentinel.imagePullPolicy string "IfNotPresent"
redisSentinel.imagePullSecrets list []
redisSentinel.minReadySeconds int 0
redisSentinel.name string ""
redisSentinel.recreateStatefulSetOnUpdateInvalid bool false Some fields of statefulset are immutable, such as volumeClaimTemplates. When set to true, the operator will delete the statefulset and recreate it. Default is false.
redisSentinel.redisSecret.secretKey string ""
redisSentinel.redisSecret.secretName string ""
redisSentinel.resources object {}
redisSentinel.serviceType string "ClusterIP"
redisSentinel.tag string "v7.0.15"
redisSentinelConfig.downAfterMilliseconds string ""
redisSentinelConfig.failoverTimeout string ""
redisSentinelConfig.masterGroupName string ""
redisSentinelConfig.parallelSyncs string ""
redisSentinelConfig.quorum string ""
redisSentinelConfig.redisPort string ""
redisSentinelConfig.redisReplicationName string "redis-replication"
redisSentinelConfig.redisReplicationPassword.secretKey string ""
redisSentinelConfig.redisReplicationPassword.secretName string ""
redisSentinelConfig.resolveHostnames string "no"
redisSentinelConfig.announceHostnames string "no"
securityContext object {}
serviceAccountName string ""
serviceMonitor.enabled bool false
serviceMonitor.interval string "30s"
serviceMonitor.namespace string "monitoring"
serviceMonitor.scrapeTimeout string "10s"
sidecars.env list []
sidecars.image string ""
sidecars.imagePullPolicy string "IfNotPresent"
sidecars.name string ""
sidecars.resources.limits.cpu string "100m"
sidecars.resources.limits.memory string "128Mi"
sidecars.resources.requests.cpu string "50m"
sidecars.resources.requests.memory string "64Mi"
tolerations list []

3.4 Redis Sentinel

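3.4.1 Overview

Redis Sentinel is the high-availability component of Redis. Its main functions are:

  • Automatic failover: when the master fails, Sentinel automatically promotes a replica to master.
  • Monitoring: continuously checks whether the Redis master and replica nodes are healthy.
  • Notification: sends notifications when a failure or failover occurs.
  • Configuration management: updates the configuration so that replicas connect to the new master.

Sentinel is a monitoring system that runs as separate processes. Sentinel instances communicate with one another and with the Redis nodes to provide high availability.

3.4.2 Deployment

1) Deploying with Helm

$ helm install redis-sentinel ot-helm/redis-sentinel \
  --set redisSentinel.clusterSize=3  --namespace ot-operators \
  --set redisSentinelConfig.redisReplicationName="redis-replication"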
...
NAME: redis-sentinel
LAST DEPLOYED: Tue Mar 21 23:11:57 2023
NAMESPACE: ot-operators
STATUS: deployed
REVISION: 1
TEST SUITE: None

Parameter explanation:

  • redis-sentinel: the release name
  • clusterSize=3: deploys 3 Sentinel instances, enough to form a quorum for elections
  • redisSentinelConfig.redisReplicationName="redis-replication": a key parameter; it must be the name of an existing RedisReplication resource, which Sentinel then monitors

If the operator cannot find a master Pod in the referenced RedisReplication, its log may report an error such as the following ("no real master pod found"):

{"level":"error","ts":"2025-07-28T03:55:46Z","msg":"","controller":"redissentinel","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisSentinel","RedisSentinel":{"name":"redis-sentinel","namespace":"juicefs"},"namespace":"juicefs","name":"redis-sentinel","reconcileID":"81a58023-46ae-4084-8aeb-088ffdc088ee","error":"no real master pod found","stacktrace":"github.com/OT-CONTAINER-KIT/redis-operator/internal/k8sutils.getRedisReplicationMasterPod\n\t/workspace/internal/k8sutils/redis-sentinel.go:363\ngithub.com/OT-CONTAINER-KIT/redis-operator/internal/k8sutils.getRedisReplicationMasterIP\n\t/workspace/internal/k8sutils/redis-sentinel.go:374\ngithub.com/OT-CONTAINER-KIT/redis-operator/internal/k8sutils.IsRedisReplicationReady\n\t/workspace/internal/k8sutils/redis-replication.go:242\ngithub.com/OT-CONTAINER-KIT/redis-operator/internal/controller/redissentinel.(*RedisSentinelReconciler).reconcileReplication\n\t/workspace/internal/controller/redissentinel/redissentinel_controller.go:93\ngithub.com/OT-CONTAINER-KIT/redis-operator/internal/controller/redissentinel.(*RedisSentinelReconciler).Reconcile\n\t/workspace/internal/controller/redissentinel/redissentinel_controller.go:60\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227"}

Check the Pod status:

$ kubectl get pod -n ot-operators
NAME                        READY   STATUS    RESTARTS   AGE
redis-replication-0         1/1     Running   0          107s
redis-replication-1         1/1     Running   0          105s
redis-replication-2         1/1     Running   0          101s
redis-sentinel-sentinel-0   1/1     Running   0          67s
redis-sentinel-sentinel-1   1/1     Running   0          59s
redis-sentinel-sentinel-2   1/1     Running   0          51s

Check the Service status:

root@k8s-master01:~/redis# kubectl get svc -n redis-server
NAME                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
redis-replication                    ClusterIP   10.244.45.240   <none>        6379/TCP    113s
redis-replication-additional         ClusterIP   10.244.7.159    <none>        6379/TCP    113s
redis-replication-headless           ClusterIP   None            <none>        6379/TCP    113s
redis-replication-master             ClusterIP   10.244.51.62    <none>        6379/TCP    113s
redis-replication-replica            ClusterIP   10.244.9.249    <none>        6379/TCP    113s
redis-sentinel-sentinel              ClusterIP   10.244.6.134    <none>        26379/TCP   44s
redis-sentinel-sentinel-additional   ClusterIP   10.244.12.179   <none>        26379/TCP   43s
redis-sentinel-sentinel-headless     ClusterIP   None            <none>        26379/TCP   44s

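Notes:

  • Clients obtain the current master's IP and port from Sentinel, so clients are configured with the Sentinel address rather than the address of a Redis node.
  • When the master fails, Sentinel automatically selects a new master, and clients learn the new master's address from Sentinel.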
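
As a rough sketch of that lookup from the shell: `SENTINEL get-master-addr-by-name` replies with two lines, the IP and then the port. The helper below (`master_addr_from_reply`, a hypothetical name) joins them into host:port. The master group name myMaster is taken from the INFO Sentinel output in section 3.4.3; the Service DNS name in the usage comment assumes the command runs inside the cluster in the same namespace.

```shell
# Hypothetical helper: joins the two-line reply of
# SENTINEL get-master-addr-by-name (IP, then port) into "ip:port".
master_addr_from_reply() {
  awk 'NR == 1 { ip = $1 } NR == 2 { print ip ":" $1 }'
}

# Usage (add -a <password> if Sentinel requires authentication):
#   redis-cli -h redis-sentinel-sentinel -p 26379 \
#     SENTINEL get-master-addr-by-name myMaster | master_addr_from_reply
```

Client libraries with Sentinel support perform the same query internally and repeat it after a failover.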

2) Deploying with YAML

The following is a minimal Redis Sentinel YAML example:

apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisSentinel
metadata:
  name: redis-sentinel
spec:
  clusterSize: 3
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000
  redisSentinelConfig:
    redisReplicationName: redis-replication  # must be the name of an existing RedisReplication
  kubernetesConfig:
    image: quay.io/opstree/redis-sentinel:v7.0.15
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 101m
        memory: 128Mi
      limits:
        cpu: 101m
        memory: 128Mi

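Apply the YAML file:

kubectl apply -f sentinel.yaml

Important notes:

  • The redisReplicationName field must match the name of an existing RedisReplication resource.
  • RedisSentinel monitors the Redis master and replica nodes created by RedisReplication.
  • RedisReplication must therefore be deployed before Sentinel.

3.4.3 Verification

Prerequisite: passwords are configured for both redis-replication and redis-sentinel.

With the redis-cli client installed on the local machine, a test connection to the Sentinel Service succeeds: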
root@k8s-master01:~/redis# redis-cli -h 10.244.6.134 -p 26379 -a P@ssw0rd
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.244.6.134:26379>

INFO Sentinel shows one master, two replicas, and three Sentinels:
10.244.6.134:26379> INFO Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=myMaster,status=ok,address=10.244.95.228:6379,slaves=2,sentinels=3



Common commands:
SENTINEL masters   # list all monitored masters
SENTINEL get-master-addr-by-name <master-name>  # get the address of a master
SENTINEL slaves <master-name>  # list the replicas of a master
INFO Sentinel      # show the state of Sentinel itself

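3.5 Redis Failure Simulation Test

This test uses the Redis Replication plus Redis Sentinel setup. It simulates a master failure to check whether a replica is automatically promoted to master and whether traffic is redirected to the new master.

The replication plus Sentinel environment is prepared as follows: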
root@k8s-master01:~# kubectl get pod,svc -n juicefs | grep redis
pod/redis-operator-865fcc9887-smj5b   1/1     Running   0          45m
pod/redis-replication-0               1/1     Running   0          25m
pod/redis-replication-1               1/1     Running   0          25m
pod/redis-replication-2               1/1     Running   0          25m
pod/redis-sentinel-sentinel-0         1/1     Running   0          2m29s
pod/redis-sentinel-sentinel-1         1/1     Running   0          2m27s
pod/redis-sentinel-sentinel-2         1/1     Running   0          2m26s
service/redis-replication                    ClusterIP   10.244.18.194   <none>        6379/TCP         44m
service/redis-replication-additional         ClusterIP   10.244.11.114   <none>        6379/TCP         44m
service/redis-replication-headless           ClusterIP   None            <none>        6379/TCP         44m
service/redis-replication-master             ClusterIP   10.244.16.33    <none>        6379/TCP         44m
service/redis-replication-replica            ClusterIP   10.244.9.234    <none>        6379/TCP         44m
service/redis-sentinel-sentinel              ClusterIP   10.244.16.240   <none>        26379/TCP        2m24s
service/redis-sentinel-sentinel-additional   ClusterIP   10.244.1.80     <none>        26379/TCP        2m24s
service/redis-sentinel-sentinel-headless     ClusterIP   None            <none>        26379/TCP        2m25s

Here the redis-replication-master Service points to the master node of the replication group.

Its manifest shows that it selects Pods carrying the redis-role: master label:
root@k8s-master01:~# kubectl get svc -n juicefs redis-replication-master -o yaml
...
spec:
  selector:
    app: redis-replication
    app.kubernetes.io/component: middleware
    app.kubernetes.io/instance: redis-replication
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis-replication
    app.kubernetes.io/version: 0.16.7
    helm.sh/chart: redis-replication-0.16.7
    redis-role: master
    redis_setup_type: replication
    role: replication
  sessionAffinity: None
  type: ClusterIP
....

Test

The labels show that redis-replication-0 is currently the master:
root@k8s-master01:~# kubectl get pod -n juicefs  --show-labels | grep master
redis-replication-0               1/1     Running   0          29m     app.kubernetes.io/component=middleware,app.kubernetes.io/instance=redis-replication,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=redis-replication,app.kubernetes.io/version=0.16.7,app=redis-replication,apps.kubernetes.io/pod-index=0,controller-revision-hash=redis-replication-5487f6cd66,helm.sh/chart=redis-replication-0.16.7,redis-role=master,redis_setup_type=replication,role=replication,statefulset.kubernetes.io/pod-name=redis-replication-0

Now take down redis-replication-0 by shutting down the node it runs on:
root@k8s-node-01:~# init 0
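
One way to see which Pod holds the master role afterwards is to filter the label column again. The sketch below assumes that the redis-role label is kept up to date, as the Service selector above relies on; `master_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl get pod --show-labels` output on
# stdin and prints the names of Pods labelled redis-role=master.
master_pods() {
  awk '
    $1 == "NAME" { next }                              # skip the header row
    $NF ~ /(^|,)redis-role=master(,|$)/ { print $1 }   # labels are the last column
  '
}

# Usage against a live cluster:
#   kubectl get pod -n juicefs --show-labels | master_pods
```

Running it before and after the shutdown shows whether the label has moved to another Pod.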


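4. Monitoring

4.1 Redis Exporter

The Redis Operator has a built-in Redis Exporter, which exposes metrics of the Redis setup in Prometheus format.

[Figure: monitoring architecture]

For Redis resources deployed with Helm, enabling the Redis Exporter only requires changing the following values in values.yaml: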

redisExporter:
  enabled: true
  image: quay.io/opstree/redis-exporter:1.0
  imagePullPolicy: Always
  
or
redisExporter:
  enabled: true
  image: quay.io/opstree/redis-exporter
  imagePullPolicy: IfNotPresent
  tag: v1.44.0

Then upgrade the release:

root@k8s-master01:~/redis# helm upgrade -n redis-server redis-replication ot-helm/redis-replication -f values.yaml
Release "redis-replication" has been upgraded. Happy Helming!
NAME: redis-replication
LAST DEPLOYED: Tue Jul 22 17:10:48 2025
NAMESPACE: redis-server
STATUS: deployed
REVISION: 2
TEST SUITE: None

Check the Redis Exporter deployment.

Each Pod now reports 2/2 ready containers, because redis-exporter runs as a sidecar inside the Pod:
root@k8s-master01:~/redis# kubectl get pod -n redis-server
NAME                  READY   STATUS    RESTARTS   AGE
redis-replication-0   2/2     Running   0          9m21s
redis-replication-1   2/2     Running   0          9m29s
redis-replication-2   2/2     Running   0          9m37s

Inspect the Pod:
root@k8s-master01:~/redis# kubectl get pod -n redis-server redis-replication-0 -o yaml
...
status:
  containerStatuses:
  - containerID: containerd://bf31f0fddf0a0b3d43efd6f5fae1c86821d1412d23323c724ebf4e10cb0d4527
    image: quay.io/opstree/redis-exporter:v1.44.0
    imageID: quay.io/opstree/redis-exporter@sha256:a63d2b6e946f8b82467ec5f853c24ab994b7cab5eacc3c3cbaa50e49bd27f235
    lastState: {}
    name: redis-exporter
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2025-07-22T09:11:12Z"
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-ps47p
      readOnly: true
      recursiveReadOnly: Disabled
...


The Service also exposes an additional port; 9121 is the metrics port to scrape:
root@k8s-master01:~/redis# kubectl get svc -n redis-server redis-replication -o yaml
...
spec:
  ports:
  ...
  - name: redis-exporter
    port: 9121
    protocol: TCP
    targetPort: 9121
  ...

Prometheus can now be configured to scrape these metrics.
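
Before wiring up Prometheus, the endpoint can be spot-checked by hand. The sketch below extracts a single metric from the Prometheus text format; `metric_value` is a hypothetical helper, and the usage comment assumes the exporter serves its metrics under /metrics and exposes the `redis_up` gauge.

```shell
# Hypothetical helper: prints the value of the first sample of metric $1
# found in Prometheus text exposition format read from stdin.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                         # skip HELP and TYPE comments
    {
      metric = $1
      sub(/\{.*/, "", metric)             # drop the label set, if any
      if (metric == name) { print $NF; exit }
    }
  '
}

# Usage against a live cluster:
#   kubectl port-forward -n redis-server svc/redis-replication 9121:9121 &
#   curl -s http://localhost:9121/metrics | metric_value redis_up
```

A value of 1 for `redis_up` would indicate that the exporter can reach its Redis instance.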

4.2 ServiceMonitor

Once the Redis Exporter is enabled, the chart can also create a ServiceMonitor resource. If prometheus-operator is installed in the environment, the metrics are then picked up with little extra work.

Edit values.yaml:
serviceMonitor:
  enabled: true  # set to true
  extraLabels: {}
  interval: 30s
  namespace: monitoring
  scrapeTimeout: 10s

Upgrade the release:

root@k8s-master01:~/redis# helm upgrade -n redis-server redis-replication ot-helm/redis-replication --install -f values.yaml
Release "redis-replication" has been upgraded. Happy Helming!
NAME: redis-replication
LAST DEPLOYED: Tue Jul 22 17:53:35 2025
NAMESPACE: redis-server
STATUS: deployed
REVISION: 4
TEST SUITE: None

View the ServiceMonitor that was created:

root@k8s-master01:~/redis# kubectl get serviceMonitor -n redis-server
NAME                                      AGE
redis-replication-prometheus-monitoring   22s
root@k8s-master01:~/redis# kubectl get serviceMonitor -n redis-server -o yaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    annotations:
      meta.helm.sh/release-name: redis-replication
      meta.helm.sh/release-namespace: redis-server
    creationTimestamp: "2025-07-22T09:53:36Z"
    generation: 1
    labels:
      app.kubernetes.io/component: middleware
      app.kubernetes.io/instance: redis-replication
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: redis-replication
      app.kubernetes.io/version: 0.16.7
      helm.sh/chart: redis-replication-0.16.7
    name: redis-replication-prometheus-monitoring
    namespace: redis-server
    resourceVersion: "3076926"
    uid: d13aeee2-c748-479b-be9e-79f49b3882fd
  spec:
    endpoints:
    - interval: 30s
      port: redis-exporter
      scrapeTimeout: 10s
    namespaceSelector:
      matchNames:
      - redis-server
    selector:
      matchLabels:
        app: redis-replication
        redis_setup_type: replication
        role: replication
kind: List
metadata:
  resourceVersion: ""