This was my first time deploying Ceph myself. Before this, I had only created users and allocated space on an existing cluster; I had little hands-on production and operations experience, at most a data migration from Ceph to Alibaba Cloud OSS.
Because the environment is an intranet, it is essentially isolated from the internet apart from uploading a few packages, and the working environment there is uncomfortable. I expected to finish quickly, but it ended up taking almost a full afternoon, and I hit a few pitfalls along the way; fortunately everything was resolved before the end of the workday on Friday.
This article is a brief record of the process. If you run into the same problems, feel free to message me directly.
♦️
Preparation
In fact, quite a lot of preparation was needed. The new hosts were provided by another department, none of the basic initialization had been done, and we were only given an ordinary (non-root) user.
Resources required:
| Host | Components |
| --- | --- |
| 192.168.20.2 | ceph-mon, ceph-mgr |
| 192.168.20.3 | ceph-mon, ceph-mgr, ceph-osd |
| 192.168.20.4 | ceph-mon, ceph-osd |
Besides host resources, we need installation packages. Because the intranet cannot reach external yum repositories, the required packages must be downloaded in advance and served from a local yum repository on the servers. Two sets of packages are involved: ceph-deploy, needed on the installation node, and the Ceph packages themselves.
I downloaded the rpm packages from the Alibaba Cloud mirror. Besides the rpms, you also need the corresponding repodata directory, which records the dependency metadata between packages; it is required when building your own local repository.
I downloaded everything with Git Bash on my laptop. The following two commands prepare the packages:
# ceph rpm packages
for i in `curl http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/ | awk -F '"' '{print $4}' | grep 14.2.22 | grep rpm`;do curl -O http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/$i ;done
# ceph repodata (same path plus /repodata/; the grep filter may need adjusting to the actual listing)
for i in `curl http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/repodata/ | awk -F '"' '{print $4}' | grep -E 'xml|sqlite'`;do curl -O http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/repodata/$i ;done
In addition, ceph-deploy lives under a different path (the noarch directory) and needs its rpm and repodata files prepared the same way.
I used ceph-deploy 2.0.1. You can also check which versions the server currently sees with yum list | grep ceph-deploy
for i in `curl http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ | awk -F '"' '{print $4}' | grep ceph-deploy | grep rpm`;do curl -O http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/$i ;done
Building the local yum repository
Many earlier write-ups proxy the packages through nginx. Here I simply use local files as the yum source; it has to be configured on every host, but I preferred not to bring nginx into it.
vim /etc/yum.repos.d/ceph.repo
[ceph]
name=ceph
baseurl=file:///data/mirrors/ceph
gpgcheck=0
priority=1
Place the rpm packages and the repodata folder you just downloaded under that directory (/data/mirrors/ceph). After configuring the repo file, run:
yum clean all
yum makecache
yum repolist
If those commands run without errors, check yum list | grep ceph to see whether the expected packages appear; if they do, the repo is fine. All three machines need the same setup, which is a bit tedious.
In addition, on the host where ceph-deploy will run, add the ceph-deploy yum entry in the same way.
At this point, the yum repositories are basically ready.
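If yum repolist looks wrong, the usual culprit is the mirror directory layout: yum needs both the rpm files and the repodata/repomd.xml index. The snippet below is a small sanity-check sketch, run here against a throwaway directory standing in for /data/mirrors/ceph:

```shell
# Throwaway demo directory standing in for /data/mirrors/ceph
MIRROR=$(mktemp -d)
touch "$MIRROR/ceph-14.2.22-0.el7.x86_64.rpm"
mkdir -p "$MIRROR/repodata"
touch "$MIRROR/repodata/repomd.xml"

# The actual check: count rpms and confirm the repodata index exists
RPM_COUNT=$(ls "$MIRROR"/*.rpm 2>/dev/null | wc -l)
echo "rpm packages: $RPM_COUNT"
[ -f "$MIRROR/repodata/repomd.xml" ] && echo "repodata index present"
```

On the real host, replace the mktemp setup with MIRROR=/data/mirrors/ceph and run only the check.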
Other preparation:
Turn off the firewall. In the past this step was usually unnecessary, because internal images and cloud hosts mostly ship with it disabled, so I did not pay attention at first; it bit me later.
systemctl stop firewalld
systemctl disable firewalld
Turn off SELinux
# Usually already disabled; if not, do both of the steps below
# Takes effect after reboot, so an unexpected host restart will not re-enable it
sed -i 's/enforcing/disabled/' /etc/selinux/config
# Takes effect immediately
setenforce 0
Set the hostname and add hosts entries
# Set the hostname (run the matching command on each node)
hostnamectl set-hostname ceph1
# Add hosts entries
cat >> /etc/hosts << EOF
192.168.20.2 ceph1
192.168.20.3 ceph2
192.168.20.4 ceph3
EOF
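A quick way to confirm the entries landed is to loop over the short names. The demo below checks a throwaway file standing in for /etc/hosts; on the real nodes, point HOSTS_FILE at /etc/hosts itself:

```shell
# Throwaway file standing in for /etc/hosts
HOSTS_FILE=$(mktemp)
cat >> "$HOSTS_FILE" << 'EOF'
192.168.20.2 ceph1
192.168.20.3 ceph2
192.168.20.4 ceph3
EOF

# Every node must be able to resolve all three short names
MISSING=0
for h in ceph1 ceph2 ceph3; do
  grep -qw "$h" "$HOSTS_FILE" || MISSING=$((MISSING + 1))
done
echo "missing entries: $MISSING"
```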
Make sure time synchronization is turned on
This was another pitfall I hit later: it kept the mon service in an abnormal state. Because the intranet has no internal time service, the master node acts as the time server and the other two as clients. Synchronization was configured from the start but did not actually take effect, so I recommend deliberately changing the clock and then checking whether the time re-synchronizes, to confirm the configuration really works.
# Either ntp or chrony works; chrony is used here
# Install the chrony service
yum install chrony -y
# Edit the config file on the master node
vim /etc/chrony.conf
server 192.168.20.2 iburst
allow 192.168.20.0/24
local stratum 10
# ceph2 / ceph3 config
server 192.168.20.2 iburst
# Restart the chronyd service on all three nodes
systemctl restart chronyd
systemctl enable chronyd
# Other useful commands
chronyc -a makestep
chronyc sourcestats
chronyc sources -v
After running chronyc sources -v, check whether the returned source line starts with ^*. If it starts with ^? instead, time synchronization is broken and needs investigation.
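That manual check can also be scripted: a selected source line in the chronyc sources output begins with ^*. A small sketch using a canned sample line (on a real node, pipe in the output of chronyc sources instead):

```shell
# Canned sample of one chronyc sources line; ^* marks the selected source
SAMPLE='^* ceph1    3   6   377    23  +1024us[+2048us] +/-   12ms'
if printf '%s\n' "$SAMPLE" | grep -q '^\^\*'; then
  SYNC_STATUS="synced"
else
  SYNC_STATUS="not synced"
fi
echo "time sync: $SYNC_STATUS"
```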
At this point the basic preparation is complete; the actual deployment process follows.
♦️
Cluster deployment
Install the related packages with yum
On ceph1:
yum install python-setuptools ceph-deploy -y
# Check the ceph-deploy version afterwards: ceph-deploy --version
On ceph1, ceph2, and ceph3:
yum install -y ceph ceph-mon ceph-osd ceph-mds ceph-radosgw ceph-mgr
Once the packages are installed, initialize the cluster first
mkdir /data/my-cluster
cd /data/my-cluster
ceph-deploy new ceph1
Make sure the command completes with no exceptions or errors.
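ceph-deploy new writes a ceph.conf into the working directory. If the hosts have more than one network interface, it is worth adding the network setting before the mon init; the fragment below is an assumed addition using the subnet from the host table above (public_network is a standard ceph.conf option):

```
[global]
public_network = 192.168.20.0/24
```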
# Initialize the mon service
ceph-deploy mon create-initial
# Copy the config files to the corresponding nodes
ceph-deploy admin ceph2 ceph3
# For a highly available monitor setup, ceph2 can also join the mon cluster
ceph-deploy mon add ceph2
# Copy the ceph keyring; after this, ceph -s can be used on this node
cp ceph.client.admin.keyring /etc/ceph/
Confirm the following before moving on; these are the preconditions for continuing the deployment:
1. ceph -s reports the cluster as HEALTH_OK
2. The mon process exists (ps -ef | grep ceph-mon)
3. The mon service is running
The three checks amount to the same thing: if health is OK, you can skip the other two.
If the cluster is not HEALTH_OK, the cause is usually one of two problems: a time synchronization (clock skew) error, or the insecure-mode warning.
# Error reported when time synchronization is broken
clock skew detected on mon.node2
# Disable the insecure mode behind the warning "mon is allowing insecure global_id reclaim"
ceph config set mon auth_allow_insecure_global_id_reclaim false
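The HEALTH_OK check can also be scripted. A sketch against a canned ceph -s fragment (on the node, substitute the real command output):

```shell
# Canned fragment of `ceph -s` output; on a real node use: ceph -s 2>/dev/null
STATUS_OUTPUT='  cluster:
    health: HEALTH_OK'
case "$STATUS_OUTPUT" in
  *HEALTH_OK*) MON_HEALTH="ok" ;;
  *)           MON_HEALTH="needs attention" ;;
esac
echo "cluster health: $MON_HEALTH"
```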
Once mon is confirmed healthy, continue with the mgr deployment.
ceph-deploy mgr create ceph1
# For a highly available mgr, ceph2 and ceph3 can be added as well
ceph-deploy mgr create ceph2
After deployment, again confirm that the status in ceph -s is normal and that the related process and service are running:
ps -ef | grep mgr
The problem I hit here: initialization reported no errors, but the mgr process never came up. The service startup log showed that the mgr directory under /var was owned by root, while the service runs as the ceph user, so I had to chown it to fix the permissions.
After restarting the service, the process started normally.
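The diagnosis boils down to: every file under a service's data directory must be owned by the user the service runs as. The sketch below demonstrates the check with a throwaway directory and the current user; on the real host the directory was the mgr data dir (assumed from the default layout to be under /var/lib/ceph) and the expected owner is ceph:

```shell
# Throwaway dir standing in for the mgr data directory
DATA_DIR=$(mktemp -d)
touch "$DATA_DIR/keyring"

# On a real node WANT_USER would be "ceph"
WANT_USER=$(id -un)
NOT_OWNED=$(find "$DATA_DIR" ! -user "$WANT_USER" | wc -l)
echo "files not owned by $WANT_USER: $NOT_OWNED"
# Real-host fix was along the lines of:
#   chown -R ceph:ceph /var/lib/ceph/mgr && systemctl restart ceph-mgr@ceph1
```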
After mgr is done, continue with the osd initialization. Prepare the disks in advance for this step; ideally they have never been formatted, otherwise a cleanup (zap) step is needed first:
ceph-deploy disk zap ceph1 /dev/sda3
# Then add the osds
ceph-deploy osd create --data /dev/sda3 ceph2
ceph-deploy osd create --data /dev/sda3 ceph3
After adding them, again confirm that the processes and services are normal.
Some useful ceph commands
# Check cluster status
ceph -s
# Check osd status
ceph osd status
# List all ceph services
systemctl status ceph\*.service ceph\*.target
# Start the daemons of all services
systemctl start ceph.target
# Stop the daemons of all services
systemctl stop ceph.target
# Start service daemons by service type
systemctl start ceph-osd.target
systemctl start ceph-mon.target
systemctl start ceph-mds.target
If you hit problems during deployment and want to roll back and start over, use the commands below. During this deployment I rolled back twice; the third attempt finally succeeded.
# Note: this step also removes the yum packages that were installed
ceph-deploy purge ceph1 ceph2 ceph3
ceph-deploy forgetkeys
That wraps up the basic deployment, but once it goes into real use, the ongoing maintenance work has only just begun. I will keep updating with specifics later.