Ceph 14.2.22 cluster deployment and pitfall guide (pure intranet environment: no yum source, no time synchronization)

Table of contents

Preparation

Resources required:

Local yum source construction

Other preparations:

Turn off selinux

Modify host name and add host

Make sure time synchronization is turned on

yum installs related packages

Some ceph commands


This was my first time deploying Ceph myself. Before this, I had only created users and allocated space on existing clusters, never done a deployment of my own, and I had relatively little hands-on production and operations experience; at most, I had once migrated data from Ceph to Alibaba Cloud OSS.

Because it is an intranet environment, apart from uploading a few packages, the hosts are essentially isolated from the external network, and the terminal I had to work in was also quite uncomfortable. I thought it would be done quickly, but it actually took almost a whole afternoon. I hit a few pitfalls along the way, but fortunately everything was solved before the end of the day on Friday.

This article is a brief record of the process. If you run into the same problems, feel free to send me a private message.

♦️

Preparation


In fact, a lot of preparation work was needed. The new hosts were provided by another department, so some basic initialization had not been done, and we were only given an ordinary (non-root) user.

Resources required:

Host            Components to deploy
192.168.20.2    ceph-mon, ceph-mgr
192.168.20.3    ceph-mon, ceph-mgr, ceph-osd
192.168.20.4    ceph-mon, ceph-osd

In addition to host resources, there are the deployment packages. Because the intranet environment cannot reach external yum sources, you need to download the corresponding packages yourself and then build your own yum source on the servers. The packages involved fall into two parts: ceph-deploy, which is only needed on the installation node, and the rest of the ceph packages.

I downloaded these rpm packages from the Alibaba Cloud mirror. Besides the rpm packages themselves, you also need the corresponding repodata, which records the dependencies between the rpm packages and is required if you build your own local source.

I downloaded everything with Git Bash on my laptop. The following two commands prepare the corresponding packages:

# ceph rpm packages
for i in `curl http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/ | awk -F '"' '{print $4}' | grep '14.2.22' | grep rpm`; do curl -O http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/$i; done
# ceph repodata (the metadata lives under repodata/)
for i in `curl http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/repodata/ | awk -F '"' '{print $4}' | grep -E 'xml|sqlite'`; do curl -O http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/repodata/$i; done
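If fetching the repodata directory proves awkward, an alternative is to generate the metadata yourself on the intranet host instead. A minimal sketch, assuming the createrepo package can be installed offline and the rpms end up under /data/mirrors/ceph:

# hypothetical alternative: generate the repo metadata locally instead of downloading it
yum install createrepo -y
createrepo /data/mirrors/ceph    # writes a repodata/ folder next to the rpms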

In addition, ceph-deploy lives under a different path on the mirror and needs to be prepared separately, as follows.

You again need both the rpm package and the repodata. I used ceph-deploy 2.0.1. Once the source is configured, you can also check which packages are available with yum list | grep ceph-deploy.

# ceph-deploy rpm package (ceph-deploy is a noarch package, so it lives under noarch/ rather than x86_64/)
for i in `curl http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ | awk -F '"' '{print $4}' | grep ceph-deploy | grep rpm`; do curl -O http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/$i; done
# ceph-deploy repodata
for i in `curl http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/repodata/ | awk -F '"' '{print $4}' | grep -E 'xml|sqlite'`; do curl -O http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/repodata/$i; done

Local yum source construction

I saw that many earlier articles serve the packages through an nginx proxy. Here I simply use local files as the yum source; it does have to be set up on each host, but I did not want to bother with nginx.

vim /etc/yum.repos.d/ceph.repo

[ceph]
name=ceph
baseurl=file:///data/mirrors/ceph
gpgcheck=0
priority=1

Place the rpm packages and the repodata folder you just downloaded under /data/mirrors/ceph. After configuring, execute:


yum clean all
yum makecache
yum repolist

If no errors are reported, check yum list | grep ceph to see whether the corresponding packages show up; if they do, the source is fine. All three machines need the same setup, which is a little tedious.

In addition, on the host where ceph-deploy will run, you also need to add the yum source for ceph-deploy. The steps are the same.
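For reference, a sketch of that second repo file, assuming the ceph-deploy rpms and repodata were placed under /data/mirrors/ceph-deploy:

vim /etc/yum.repos.d/ceph-deploy.repo

[ceph-deploy]
name=ceph-deploy
baseurl=file:///data/mirrors/ceph-deploy
gpgcheck=0
priority=1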

At this point, the yum source is basically established.

Other preparations:

Turn off the firewall. In the past this step was usually unnecessary, because internal images and cloud hosts generally came with it already disabled, so I did not pay attention at first, and ran into a pitfall later because of it.


systemctl stop firewalld
systemctl disable firewalld
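If your environment does not allow disabling firewalld outright, opening just the ports Ceph needs should also work. A sketch (mon listens on 3300 and 6789; osd and mgr daemons use the 6800-7300 range):

firewall-cmd --permanent --add-port=3300/tcp --add-port=6789/tcp
firewall-cmd --permanent --add-port=6800-7300/tcp
firewall-cmd --reload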

Turn off selinux

# usually already disabled; if not, do both steps below
# takes effect after reboot, protects against unexpected host restarts
sed -i 's/enforcing/disabled/' /etc/selinux/config
# takes effect immediately
setenforce 0

Modify host name and add host


# set the hostname (run the matching command on each host)
hostnamectl set-hostname ceph1
# add host entries
cat >> /etc/hosts << EOF
192.168.20.2 ceph1
192.168.20.3 ceph2
192.168.20.4 ceph3
EOF
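One related prerequisite worth mentioning: ceph-deploy drives the other nodes over ssh, so passwordless login from the deploy node is needed. A sketch, assuming you deploy as root:

ssh-keygen -t rsa      # accept the defaults
ssh-copy-id root@ceph2
ssh-copy-id root@ceph3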

Make sure time synchronization is turned on

This was another pitfall I hit later: it kept the mon service in an abnormal state. Because it is an intranet environment with no internal time service, the master node acts as the time server and the other two as clients. Although synchronization was configured at the beginning, it did not take effect. I recommend deliberately changing the time on one node and then observing whether the clocks converge, to verify the configuration actually works.

# either ntp or chrony works; install the chrony service
yum install chrony -y
# edit the config file on the master node
vim /etc/chrony.conf
server 192.168.20.2 iburst
allow 192.168.20.0/24
local stratum 10
# on ceph2 and ceph3, point at the master
server 192.168.20.2 iburst
# restart chronyd on all three hosts
systemctl restart chronyd
systemctl enable chronyd
# other useful commands
chronyc -a makestep
chronyc sourcestats
chronyc sources -v

After running chronyc sources -v, check whether the source line starts with ^*. If it starts with ^?, time synchronization is broken and needs investigation.
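One way to verify the synchronization actually works, as suggested above: step the clock and compare the three hosts (the ssh calls assume the passwordless login set up earlier):

chronyc -a makestep                     # force an immediate correction
date; ssh ceph2 date; ssh ceph3 date    # the outputs should agree to within a second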

At this point, the basic preparation work is completed, and the complete deployment process will begin later.

♦️

Cluster deployment

yum installs related packages

Execute on ceph1:

yum install python-setuptools ceph-deploy -y
# check the ceph-deploy version
ceph-deploy --version

Execute on ceph1, ceph2, and ceph3:

yum install -y ceph ceph-mon ceph-osd ceph-mds ceph-radosgw ceph-mgr
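It is worth confirming that every node ended up with the same build before continuing; for example:

ceph --version    # expect something like: ceph version 14.2.22 (...) nautilus (stable)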

After the relevant packages are installed, initialize the cluster first:

mkdir /data/my-cluster
cd /data/my-cluster
ceph-deploy new ceph1
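ceph-deploy new generates ceph.conf and the initial keyrings in the current directory. Before initializing mon, it can help to pin the public network there; a sketch, assuming the 192.168.20.0/24 subnet from the host list above:

# optional: declare the cluster's public network in the generated ceph.conf
echo "public_network = 192.168.20.0/24" >> /data/my-cluster/ceph.conf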

Make sure there are no exceptions or errors during execution:

# initialize the mon service
ceph-deploy mon create-initial
# copy the config files to the corresponding nodes
ceph-deploy admin ceph2 ceph3
# to deploy highly available monitors, add ceph2 to the mon cluster as well
ceph-deploy mon add ceph2
# copy the ceph keyring; after this you can use ceph -s
cp ceph.client.admin.keyring /etc/ceph/

Confirm a few things here; these are the conditions for the deployment to keep moving forward:

1. ceph -s shows the cluster in HEALTH_OK state

2. The mon process exists

3. The mon service is running

The three checks say essentially the same thing; if the cluster is HEALTH_OK, you do not need the other two.
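Concretely, the three checks might look like this (the unit name assumes the host is ceph1):

ceph -s | grep health              # expect HEALTH_OK
ps -ef | grep ceph-mon             # the mon daemon should be running
systemctl status ceph-mon@ceph1    # the service should be active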

If it is not HEALTH_OK, there are most likely two kinds of problems: a time synchronization error, or the insecure global_id reclaim warning:

# time synchronization error
clock skew detected on mon.node2
# "mon is allowing insecure global_id reclaim" warning: disable the insecure mode
ceph config set mon auth_allow_insecure_global_id_reclaim false

Once mon is confirmed healthy, continue with the mgr deployment.


ceph-deploy mgr create ceph1
# to deploy a highly available mgr, add ceph2 and ceph3 as well
ceph-deploy mgr create ceph2

After deployment, again confirm that the status in ceph -s is normal and that the related processes and services are running.

ps -ef | grep mgr    # check whether the mgr process exists

The problem I encountered here: the initialization itself went fine, but the mgr process never came up. Checking the service startup log showed that the mgr directory under /var was owned by root, while the service starts as the ceph user, so I had to chown it to fix the permissions.

Then after restarting the service, the process started normally.
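Roughly what the fix looked like; the exact path assumes the default layout, so check your own startup log for the actual directory:

# hand ownership of the mgr data directory back to the ceph user
chown -R ceph:ceph /var/lib/ceph/mgr
systemctl restart ceph-mgr.target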

After mgr is done, continue with the osd initialization. Before this step you need to prepare the disks in advance; ideally they have not been formatted, otherwise you need an extra cleanup (zap) step.

ceph-deploy disk zap ceph1 /dev/sda3
# then add the osds
ceph-deploy osd create --data /dev/sda3 ceph2
ceph-deploy osd create --data /dev/sda3 ceph3

After adding them, again confirm that the processes and services are normal.
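A quick way to confirm both osds joined:

ceph osd tree    # both osds should show as "up"
ceph -s          # expect something like "2 osds: 2 up, 2 in"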

Some ceph commands


# check cluster status
ceph -s
# check osd status
ceph osd status
# list all ceph services
systemctl status ceph\*.service ceph\*.target
# start the daemons for all services
systemctl start ceph.target
# stop the daemons for all services
systemctl stop ceph.target
# start daemons by service type
systemctl start ceph-osd.target
systemctl start ceph-mon.target
systemctl start ceph-mds.target

If you run into problems during deployment and want to roll back, use the commands below. During this deployment I rolled back twice; the third attempt finally succeeded.

# note: this step also removes the yum packages that were installed
ceph-deploy purge ceph1 ceph2 ceph3
ceph-deploy forgetkeys
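Besides purge, ceph-deploy also has purgedata, which additionally wipes the data under /var/lib/ceph and /etc/ceph. A full teardown might look like:

ceph-deploy purge ceph1 ceph2 ceph3
ceph-deploy purgedata ceph1 ceph2 ceph3
ceph-deploy forgetkeys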

At this point the basic deployment is done, but if the cluster goes into production use, the maintenance work has only just begun. I will keep updating with specifics later.
