Building Ceph Cloud-Native Storage with Rook
# Cloud-Native Storage Overview
# Data Persistence in Kubernetes
Kubernetes decouples Pods from storage in a loosely coupled way. There are three approaches:
- volume: the Pod spec must know the details of the backend storage
- PV/PVC: the administrator defines a PersistentVolume (PV); users consume it through a PersistentVolumeClaim (PVC)
- StorageClass: static plus dynamic provisioning; a PVC declares the space it needs, and the PV and its binding to the backend driver are created automatically (see the PVC sketch below)
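As a minimal sketch of the StorageClass path (the names demo-pvc and my-storageclass are illustrative assumptions, not from this cluster), a PVC only declares the size and class it wants; the provisioner creates the matching PV behind the scenes:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc                       # illustrative name
spec:
  accessModes:
    - ReadWriteOnce                    # mounted read-write by a single node
  resources:
    requests:
      storage: 1Gi                     # requested capacity; a matching PV is created automatically
  storageClassName: my-storageclass    # assumed StorageClass name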
Ordinary (ephemeral) volumes
ConfigMap and Secret carry data a container depends on at startup; emptyDir and hostPath hold temporary data; NFS, CephFS, GlusterFS and cloud volumes hold persistent data.
These volumes are not standalone resource objects; they share the Pod's lifecycle (see the emptyDir sketch after this list).
- configmap
- secret
- emptyDir
- hostPath
- nfs
- cephfs
- GlusterFS
- Cloud
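A minimal sketch of an ephemeral volume (the Pod and volume names are illustrative): an emptyDir is created when the Pod starts and is removed together with the Pod:
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo          # illustrative name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache    # temporary scratch space inside the container
  volumes:
    - name: cache
      emptyDir: {}             # deleted when the Pod is deleted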
# Demystifying Rook
# Ceph Deployment Methods
Common ways to deploy Ceph include:
- ceph-deploy
- cephadm
- Manual deployment
- Rook
# What Is Rook
Rook turns a distributed storage system into a self-managing, self-scaling, self-healing storage service. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring and resource management.
Definition: Rook is an open-source, cloud-native storage orchestrator built for and running on Kubernetes.
# Storage Types Supported by Rook
- Ceph
- NFS
- Cassandra
- CockroachDB
- Yugabyte DB
- EdgeFS
# How Rook Integrates with Kubernetes
Rook initializes and manages the Ceph cluster:
- mon (monitor) cluster
- mgr (manager) cluster
- osd cluster
- pool management
- object storage
- file storage
Rook also provides the drivers needed to access the storage:
- Flex driver (legacy, not recommended)
- CSI driver
- RBD block storage
- CephFS file storage
- S3/Swift-style object storage
# Rook Architecture
All components run on top of the Kubernetes cluster:
- mon
- rgw
- mds
- mgr
- osd
- Agent
- csi-rbdplugin
- csi-cephfsplugin
Management is abstracted; the details of the following objects are hidden:
- pool
- volumes
- filesystems
- buckets
# Rook Quick Start
# Prerequisites for Deploying Rook
Cluster environment:
OS version: CentOS Linux release 8.3.2011
Kubernetes version: v1.20.2
Node | K8s role | Components | Disk (50 GB)
---|---|---|---
node1 | master | ceph-mon, ceph-mgr, ceph-osd, csi-cephfsplugin, csi-rbdplugin | /dev/vdb
node2 | node | ceph-mon, ceph-mgr, ceph-osd, csi-cephfsplugin, csi-rbdplugin | /dev/vdb
node3 | node | ceph-mon, ceph-osd, csi-cephfsplugin, csi-rbdplugin | /dev/vdb
node4 | node | ceph-osd, csi-cephfsplugin, csi-rbdplugin | /dev/vdb
node5 | node | ceph-osd, csi-cephfsplugin, csi-rbdplugin | /dev/vdb
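Rook's OSD prepare jobs will only consume raw, unformatted devices, so before deploying it is worth confirming that /dev/vdb on each node has no filesystem or partition table (a quick check; the device name matches the table above):
[root@node1 ~]# lsblk -f /dev/vdb      # the FSTYPE column should be empty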
# Get the Source Code
$ git clone --single-branch --branch v1.8.3 https://github.com/rook/rook.git
$ cd rook/deploy/examples
$ kubectl create -f crds.yaml -f common.yaml -f operator.yaml
$ kubectl create -f cluster.yaml
# Pull the Ceph Image
[root@node1 ~]# docker pull rook/ceph:v1.8.3
v1.8.3: Pulling from rook/ceph
Digest: sha256:d3b03079b6e055f5a436611fb06f3eaa3991fe222ff066765491d66e4cd5f401
Status: Image is up to date for rook/ceph:v1.8.3
docker.io/rook/ceph:v1.8.3
# Pull the CSI Images
Find the images that need to be pulled:
[root@master ~]# for i in `kubectl get pods -n rook-ceph -o jsonpath='{.items[*].spec.containers[*].image}'`;do echo ${i} | grep gcr.io;done
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
k8s.gcr.io/sig-storage/csi-attacher:v3.4.0
k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
k8s.gcr.io/sig-storage/csi-provisioner:v3.1.0
k8s.gcr.io/sig-storage/csi-attacher:v3.4.0
k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
k8s.gcr.io/sig-storage/csi-provisioner:v3.1.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
k8s.gcr.io/sig-storage/csi-provisioner:v3.1.0
k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
k8s.gcr.io/sig-storage/csi-attacher:v3.4.0
k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
k8s.gcr.io/sig-storage/csi-provisioner:v3.1.0
k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
k8s.gcr.io/sig-storage/csi-attacher:v3.4.0
k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
Then write a script to pull them from a mirror registry:
#!/bin/bash
# CSI sidecar images required by Rook (taken from the output above)
image_list=(
csi-provisioner:v3.1.0
csi-node-driver-registrar:v2.4.0
csi-attacher:v3.4.0
csi-snapshotter:v4.2.0
csi-resizer:v1.3.0
)
# Mirror registry; registry.aliyuncs.com/google_containers also works
aliyuncs="registry.aliyuncs.com/it00021hot"
google_gcr="k8s.gcr.io/sig-storage"
for image in ${image_list[*]}
do
  # Pull from the mirror, retag to the k8s.gcr.io name the manifests expect, then drop the mirror tag
  docker image pull ${aliyuncs}/${image}
  docker image tag ${aliyuncs}/${image} ${google_gcr}/${image}
  docker image rm ${aliyuncs}/${image}
  echo "${aliyuncs}/${image} => ${google_gcr}/${image} downloaded"
done
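After the script finishes, a quick check that the retagged images are present locally (assuming Docker is the container runtime, as above):
[root@master ~]# docker images | grep 'k8s.gcr.io/sig-storage'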
# Check the Status
[root@master ~]# kubectl get pods -o wide -n rook-ceph
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-6cmc9 3/3 Running 0 26m 192.168.0.121 node2 <none> <none>
csi-cephfsplugin-kpk9h 3/3 Running 0 27m 192.168.0.133 node4 <none> <none>
csi-cephfsplugin-n87jf 3/3 Running 0 26m 192.168.0.42 node3 <none> <none>
csi-cephfsplugin-provisioner-57fdf9cc88-6c57f 6/6 Running 0 27m 10.244.104.10 node2 <none> <none>
csi-cephfsplugin-provisioner-57fdf9cc88-dqhf8 6/6 Running 0 27m 10.244.135.11 node3 <none> <none>
csi-cephfsplugin-zck7h 3/3 Running 0 27m 192.168.0.177 node1 <none> <none>
csi-rbdplugin-btjpc 3/3 Running 0 26m 192.168.0.42 node3 <none> <none>
csi-rbdplugin-jb864 3/3 Running 0 26m 192.168.0.133 node4 <none> <none>
csi-rbdplugin-lpj4t 3/3 Running 0 27m 192.168.0.177 node1 <none> <none>
csi-rbdplugin-provisioner-865f84ff48-bwwvw 6/6 Running 0 27m 10.244.166.143 node1 <none> <none>
csi-rbdplugin-provisioner-865f84ff48-ls2jv 6/6 Running 0 27m 10.244.3.75 node4 <none> <none>
csi-rbdplugin-w9wb5 3/3 Running 0 27m 192.168.0.121 node2 <none> <none>
rook-ceph-crashcollector-node1-7bdf8cf7d-8fvfw 1/1 Running 0 60m 10.244.166.138 node1 <none> <none>
rook-ceph-crashcollector-node2-5d87c47d67-qqgzw 1/1 Running 0 62m 10.244.104.4 node2 <none> <none>
rook-ceph-crashcollector-node3-5999b78949-rqm6d 1/1 Running 0 51m 10.244.135.8 node3 <none> <none>
rook-ceph-crashcollector-node4-69d99c8b7c-v6rbq 1/1 Running 0 61m 10.244.3.69 node4 <none> <none>
rook-ceph-mgr-a-9bfd675fc-pblwn 1/1 Running 0 62m 10.244.166.134 node1 <none> <none>
rook-ceph-mon-a-6985594f74-55hdt 1/1 Running 0 70m 10.244.3.68 node4 <none> <none>
rook-ceph-mon-b-d7b47d9c6-ptv5n 1/1 Running 0 65m 10.244.104.3 node2 <none> <none>
rook-ceph-mon-c-58dfcdfd8f-87svf 1/1 Running 0 62m 10.244.166.133 node1 <none> <none>
rook-ceph-operator-6b5c6b5dcf-dwfhp 1/1 Running 0 87m 10.244.104.1 node2 <none> <none>
rook-ceph-osd-0-65d46bb8d8-p6cdm 1/1 Running 0 55m 10.244.3.73 node4 <none> <none>
rook-ceph-osd-1-7887c44f8d-8qdmr 1/1 Running 0 55m 10.244.104.8 node2 <none> <none>
rook-ceph-osd-2-55c6df6466-slxrf 1/1 Running 0 53m 10.244.166.140 node1 <none> <none>
rook-ceph-osd-3-6648dc584-mjtbv 1/1 Running 0 51m 10.244.135.7 node3 <none> <none>
rook-ceph-osd-prepare-node1-bx4bw 0/1 Completed 0 49m 10.244.166.142 node1 <none> <none>
rook-ceph-osd-prepare-node2-kcbrv 0/1 Completed 0 49m 10.244.104.9 node2 <none> <none>
rook-ceph-osd-prepare-node3-9t666 0/1 Completed 0 49m 10.244.135.10 node3 <none> <none>
rook-ceph-osd-prepare-node4-rcrfc 0/1 Completed 0 49m 10.244.3.74 node4 <none> <none>
# Add the Master to the OSDs
The master node carries a NoSchedule taint by default, so no workloads are scheduled onto it. Remove the taint so the master can also join the cluster as an OSD node.
[root@master ~]# kubectl taint node master node-role.kubernetes.io/master:NoSchedule-
node/master untainted
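To confirm, the taint should no longer be listed on the node (a quick check):
[root@master ~]# kubectl describe node master | grep -i taint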
# Deploy the Rook Ceph Toolbox
[root@master examples]# kubectl create -f toolbox.yaml
Use the toolbox Pod to check the Ceph cluster status:
[root@master ~]# kubectl exec -it pod/rook-ceph-tools-7bbd566fd9-smchn -n rook-ceph -- bash
[rook@rook-ceph-tools-7bbd566fd9-smchn /]$ ceph -s
cluster:
id: 9790e7f4-cfa3-44de-a411-5926d3034cb7
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,c,b (age 106m)
mgr: a(active, since 104m)
osd: 5 osds: 5 up (since 29m), 5 in (since 29m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 28 MiB used, 250 GiB / 250 GiB avail
pgs: 1 active+clean
[rook@rook-ceph-tools-7bbd566fd9-smchn /]$ ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 250 GiB 250 GiB 28 MiB 28 MiB 0.01
TOTAL 250 GiB 250 GiB 28 MiB 28 MiB 0.01
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 0 0 B 0 79 GiB
# Deploy the Ceph Dashboard
[root@master examples]# kubectl apply -f dashboard-external-https.yaml
Get the dashboard admin password:
[root@master examples]# kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o \
> jsonpath="{['data']['password']}" | base64 -d
drX3{d2D#dU,Z,_#z6kn
Access the dashboard through the NodePort service (https://<node-IP>:32349 in this example):
[root@master examples]# kubectl get svc -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-cephfsplugin-metrics ClusterIP 10.103.81.126 <none> 8080/TCP,8081/TCP 110m
csi-rbdplugin-metrics ClusterIP 10.105.224.132 <none> 8080/TCP,8081/TCP 110m
rook-ceph-mgr ClusterIP 10.104.54.196 <none> 9283/TCP 98m
rook-ceph-mgr-dashboard ClusterIP 10.109.104.68 <none> 8443/TCP 98m
rook-ceph-mgr-dashboard-external-https   NodePort    10.104.174.92    <none>   8443:32349/TCP      32m   # this is the dashboard service
rook-ceph-mon-a ClusterIP 10.100.122.20 <none> 6789/TCP,3300/TCP 108m
rook-ceph-mon-b ClusterIP 10.97.200.54 <none> 6789/TCP,3300/TCP 103m
rook-ceph-mon-c ClusterIP 10.101.245.217 <none> 6789/TCP,3300/TCP 100m
# Ceph Cluster Management
# Ceph Resource Objects
Ceph consists of the following components:
- mon (monitor): cluster state and membership management
- mgr (manager): monitoring and management interface
- mds: CephFS metadata management
- rgw: object storage gateway
- osd: data storage
# Accessing Ceph from the K8s Cluster
View the Ceph configuration files inside the toolbox container:
[rook@rook-ceph-tools-7bbd566fd9-smchn /]$ cat /etc/ceph/ceph.conf
[global]
mon_host = 10.101.245.217:6789,10.100.122.20:6789,10.97.200.54:6789
[client.admin]
keyring = /etc/ceph/keyring
[rook@rook-ceph-tools-7bbd566fd9-smchn /]$ cat /etc/ceph/keyring
[client.admin]
key = AQAZOPph2QMBMxAAqepptucmmM6eM5gG5ZbKHg==
Copy the configuration from the container onto the local node:
[root@master ~]# mkdir /etc/ceph
[root@master ~]# vim /etc/ceph/ceph.conf
[global]
mon_host = 10.101.245.217:6789,10.100.122.20:6789,10.97.200.54:6789
[client.admin]
keyring = /etc/ceph/keyring
[root@master ~]# vim /etc/ceph/keyring
[client.admin]
key = AQAZOPph2QMBMxAAqepptucmmM6eM5gG5ZbKHg==
Install the Ceph client:
[root@master ~]# cat > /etc/yum.repos.d/ceph.repo << EOF
> [ceph]
> name=ceph
> baseurl=https://mirrors.aliyun.com/ceph/rpm-pacific/el8/x86_64/
> gpgcheck=0
> EOF
[root@master ~]# yum install -y ceph-common
[root@master ~]# ceph -s
cluster:
id: 9790e7f4-cfa3-44de-a411-5926d3034cb7
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,c,b (age 106m)
mgr: a(active, since 104m)
osd: 5 osds: 5 up (since 29m), 5 in (since 29m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 28 MiB used, 250 GiB / 250 GiB avail
pgs: 1 active+clean
# Accessing RBD Block Storage
1. Create a pool
[root@master ~]# ceph osd pool create rook 16 16
pool 'rook' created
[root@master ~]# ceph osd lspools
1 device_health_metrics
2 rook
2. Create an RBD image in the pool
[root@master ~]# rbd create -p rook --image rook-rbd.img --size 10G
[root@master ~]# rbd ls -p rook
rook-rbd.img
[root@master ~]# rbd info rook/rook-rbd.img
rbd image 'rook-rbd.img':
size 10 GiB in 2560 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 4a802aa59f38
block_name_prefix: rbd_data.4a802aa59f38
format: 2
features: layering
op_features:
flags:
create_timestamp: Wed Feb 2 10:21:34 2022
access_timestamp: Wed Feb 2 10:21:34 2022
modify_timestamp: Wed Feb 2 10:21:34 2022
3. Map and mount the image on the client
[root@master ~]# rbd map rook/rook-rbd.img
[root@master ~]# rbd showmapped
[root@master ~]# mkfs.xfs /dev/rbd0
[root@master ~]# mount /dev/rbd0 /media/
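When the device is no longer needed, the reverse path is a simple unmount followed by an unmap (a sketch of the cleanup, assuming the mapping above came up as /dev/rbd0):
[root@master ~]# umount /media
[root@master ~]# rbd unmap /dev/rbd0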
# RBD Services Provided by Rook
Rook can provide the following three types of storage:
- Block: Create block storage to be consumed by a pod
- Object: Create an object store that is accessible inside or outside the Kubernetes cluster
- Shared File System: Create a file system to be shared across multiple pods
# Using Rook-Ceph Block Storage
There are two ways to create volumes: 1. with the CSI driver, 2. with the Flex driver.
1. This example uses the CSI driver, which is the preferred driver for K8s 1.13 and newer.
2. The Flex driver is needed for K8s 1.12 or earlier. To create volumes from a Flex-based storage class, make sure the Flex driver is enabled alongside Ceph CSI by setting ROOK_ENABLE_FLEX_DRIVER to true in the operator deployment file operator.yaml.
Note: with the CSI driver, this example requires at least one OSD per node, with OSDs on three different nodes, because failureDomain is set to host and replicated.size is set to 3.
Using the CSI driver, a sample application is created to consume the block storage provided by Rook: the classic WordPress and MySQL applications, both of which use block volumes provided by Rook.
Before block storage can be provided, a StorageClass and a storage pool must be created first. Kubernetes needs these two resources to interact with Rook and provision persistent volumes (PVs).
To provide RBD block device services in a Kubernetes cluster, the steps are (a PVC/Pod sketch for the last two steps appears at the end of this section):
- Create the rbd-provisioner
- Create a pool
- Create the corresponding StorageClass
- Create a PVC from that StorageClass to dynamically provision a PV
- Create a Pod that uses the PV (consume the storage)
After the Ceph cluster has been created through Rook, Rook itself provides the rbd-provisioner service, so there is no need to deploy a separate provisioner.
1. The CSI driver is used here. (Steps 2 and 3 below can also be completed in one shot with one of the following commands.)
CSI driver:
# kubectl create -f /rook/cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml
Flex driver:
# kubectl create -f cluster/examples/kubernetes/ceph/flex/storageclass.yaml
2. Create the pool. First create a working directory of your own:
[root@master ~]# cd /root/rook/cluster/examples/kubernetes/ceph
[root@master ~]# mkdir my_yaml
[root@master ~]# cat > pool.yaml << EOF
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool        # the operator watches this CR and creates the pool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
EOF
[root@master ~]# kubectl apply -f pool.yaml
This creates a storage pool named replicapool.
After the apply completes, the corresponding pool can be seen in the Ceph dashboard.
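Since the Ceph client was configured on the master earlier, the pool can also be verified from the command line (a quick check):
[root@master ~]# ceph osd lspools        # replicapool should now be listed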
3. Create the corresponding StorageClass
[root@master ~]# cat > storageclass.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block                 # referencing this StorageClass in a PVC enables dynamic PV creation
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph   # must be the same namespace as the rook-ceph cluster
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node   # added to pair with the namespace below, matching the upstream Rook example
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4       # filesystem type for the volume; defaults to ext4 if unset
allowVolumeExpansion: true
reclaimPolicy: Delete                   # defaults to Delete if omitted; other options are Retain and Recycle
EOF
[root@master ~]# kubectl apply -f storageclass.yaml   # creates a StorageClass named rook-ceph-block
[root@master ~]# kubectl get storageclass
If you deployed the Rook operator in a namespace other than "rook-ceph", change the prefix of the provisioner to match the namespace you use. For example, if the Rook operator runs in the namespace "my-namespace", the provisioner value should be "my-namespace.rbd.csi.ceph.com".
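As a sketch of the remaining two steps from the list above (the names rbd-pvc, rbd-demo and the mount path are illustrative assumptions, not taken from the Rook examples), a PVC bound to the rook-ceph-block StorageClass and a Pod that mounts the resulting volume could look like this:
[root@master ~]# cat > rbd-demo.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc                      # illustrative name
spec:
  accessModes:
    - ReadWriteOnce                  # an RBD block volume is mounted by a single node at a time
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block  # the StorageClass created above
---
apiVersion: v1
kind: Pod
metadata:
  name: rbd-demo                     # illustrative name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data           # the RBD-backed volume appears here
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: rbd-pvc
EOF
[root@master ~]# kubectl apply -f rbd-demo.yaml
[root@master ~]# kubectl get pvc rbd-pvc        # should reach the Bound state once the PV is provisioned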