[Reprint] Common Linux system failures: why disk space is not released after a file is deleted


https://os.51cto.com/art/201912/608713.htm

 

Normally, disk space is released as soon as a file is deleted, but there are exceptions, for example when the file is locked by a process or a process is still writing data to it. To understand this problem, we need to know how Linux stores files and what storage structures it uses.

Author: Anonymous Source: Computer and Network Security | 2019-12-31 15:52

 

1. Symptom

The operations monitoring system sent an alert reporting that a server's disk was full. After logging in to the server, we found that the root partition had no free space left, as shown in Figure 1.

Figure 1: Checking the server's disk space
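The figure itself is not reproduced here, and the text does not say which command the author ran. As a rough sketch, a check of this kind is usually done with the standard `df` utility; the variable name `root_use` is only an illustration.

```shell
# Show usage of the filesystem that holds / in human-readable units.
df -h /

# Extract only the "use%" column for / (POSIX output format, one data line).
root_use=$(df -P / | awk 'NR==2 {print $5}')
echo "root partition usage: $root_use"
```

A value at or near 100% here corresponds to the "disk full" alert described above.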

First, a word about the deletion policy on this server. Linux has no Recycle Bin, so on production servers every file to be deleted is first moved to the /tmp directory, and the data in /tmp is then cleared periodically. The policy itself is fine, but inspection showed that this server has no separate partition for /tmp, so the data under /tmp actually takes up space on the root partition. With the problem located, the next step is to delete some large files under /tmp to free space. We checked the three largest data files under /tmp, as shown in Figure 2.

Figure 2: The three largest data files in /tmp
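The command behind Figure 2 is not given in the text. One possible way to obtain such a list is sketched below; it assumes GNU `find` (for `-printf`), and the helper name `largest_files` is hypothetical.

```shell
# Print the largest regular files under a directory as "SIZE_IN_BYTES<TAB>PATH",
# biggest first. Assumes GNU find; -xdev keeps the search on one filesystem.
largest_files() {
    dir=$1
    count=${2:-3}
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n "$count"
}

# Top three files under /tmp.
largest_files /tmp 3
```

Sorting on the byte count rather than on human-readable sizes avoids mis-ordering values such as 9K and 66G.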

The command output shows a 66 GB file named access_log in the /tmp directory. It is presumably an access log generated by Apache, and judging by its size, the Apache logs have not been cleaned up for a long time. This file is almost certainly what filled the root partition. After confirming that the file could be deleted, we removed it:

  [root@localhost ~]# rm /tmp/access_log

Next, check whether the space on the root partition has been released, as shown in Figure 3.

Figure 3: Checking whether the disk space was released

The output shows that the space on the root partition still has not been released. What is going on?

2. Solution approach

Normally, disk space is released as soon as a file is deleted, but there are exceptions, for example when the file is locked by a process or a process is still writing data to it. To understand this problem, we need to know how Linux stores files and what storage structures it uses.

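A file is stored in the file system in two parts: the data and a pointer to it. The pointer lives in the file system's metadata, while the data itself is stored on disk. When a file is deleted, its pointer is cleared from the metadata, and the space occupied by the data can then be overwritten with new content. The space was not released after access_log was deleted because the httpd process was still writing to the file. Although access_log had been deleted, the process still held the file, so the pointer was not cleared from the metadata. Because the pointer still existed, the kernel considered the file not yet deleted, so it is no surprise that df reported no space freed.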
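The behavior described above can be reproduced on a small scale. The following sketch assumes a Linux system with /proc mounted and uses a throwaway temporary file rather than the article's access_log. It holds the file open on descriptor 3, deletes it, and shows that the data stays reachable through the descriptor until the descriptor is closed.

```shell
# Create a scratch file and keep it open for appending on descriptor 3.
demo=$(mktemp)
exec 3>>"$demo"
echo "first line" >&3

# Remove the directory entry; this shell still holds the file open.
rm "$demo"

# The name is gone...
[ -e "$demo" ] || echo "path no longer exists"

# ...but the kernel keeps the data until the last descriptor is closed.
# On Linux the /proc link target carries a " (deleted)" suffix.
link_target=$(readlink /proc/self/fd/3)
echo "fd 3 -> $link_target"

# Writes still succeed, so the deleted file keeps consuming disk space.
echo "second line" >&3
held_bytes=$(wc -c < /proc/self/fd/3)
echo "bytes still held: $held_bytes"

# Closing the descriptor finally lets the file system reclaim the blocks.
exec 3>&-
```

In the article's scenario, httpd plays the role of this shell: as long as it keeps its descriptor open, the blocks of the deleted log stay allocated.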

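3. Troubleshooting

Now that we have an approach, the next step is to check whether a process is still writing data to access_log. This calls for the Linux lsof command, which can list deleted files that are still held by applications. The command and its output are shown in Figure 4.

Figure 4: Listing deleted files that are still held by applications

The output shows that /tmp/access_log is held by the httpd process, which is still writing log data to it. Column 7 shows that the log file is about 70 GB, while the root partition is only 100 GB in total, so this file is the culprit that exhausted the root partition. The "deleted" status in the last column means the log file has been deleted, but because the process is still writing to it, the space has not been released.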
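Figure 4 is not reproduced here, so the exact lsof invocation is unknown. As a sketch, the option `+L1` asks lsof for open files whose link count is below 1, that is, files that have been unlinked. Where lsof is not installed, similar information can be read from /proc on Linux. The helper name `list_deleted_open_files` is hypothetical.

```shell
# Option 1 (assumes lsof is installed): open files that have been unlinked.
if command -v lsof >/dev/null 2>&1; then
    lsof +L1 2>/dev/null || echo "lsof found nothing or could not run"
fi

# Option 2 (Linux /proc only): print "PID FD TARGET" for every descriptor
# whose link target ends in " (deleted)". Run as root to see all processes.
list_deleted_open_files() {
    for fd_path in /proc/[0-9]*/fd/*; do
        # Skip descriptors we cannot read or that vanished in the meantime.
        target=$(readlink "$fd_path" 2>/dev/null) || continue
        case $target in
            *" (deleted)")
                pid=${fd_path#/proc/}
                pid=${pid%%/*}
                echo "$pid ${fd_path##*/} $target"
                ;;
        esac
    done
}

list_deleted_open_files
```

The PID and descriptor number reported by either option identify which process must be restarted, or which descriptor must be truncated, to get the space back.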

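4. Solving the problem

At this point the problem is essentially diagnosed. There are many ways to solve this kind of problem. The simplest is to stop or restart the httpd process; rebooting the operating system also works, but neither is the best approach. When a process keeps writing logs to a file, the best way to release the disk space the file occupies is to empty the file while the service stays online, which can be done with the following command: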

  [root@localhost ~]# echo " " >/tmp/access_log

This releases the disk space immediately and lets the process keep writing to the log file. The method is often used to clean up log files generated by online Web services such as Apache, Tomcat, and Nginx.
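One caveat is worth spelling out. Redirecting to a path truncates the file only while the file still exists under that name. Once the file has been removed with rm, the same redirection creates a new, unrelated file. On Linux, a file that is deleted but still open can instead be truncated through the holder's descriptor under /proc. The sketch below demonstrates both cases on a scratch file. It uses `: >` rather than `echo " " >` so that no stray byte is written, and descriptor 3 of the current shell stands in for the writing process; for another process the path would be /proc/PID/fd/N, where PID and N are placeholders for values taken from the lsof output.

```shell
# Scratch log held open in append mode, standing in for a busy access log.
log=$(mktemp)
exec 3>>"$log"
echo "old entry that takes up space" >&3

# Case 1: the file still exists by name. ":" is the shell's no-op builtin,
# so the redirection empties the file and writes nothing into it.
: > "$log"
size_after_truncate=$(wc -c < "$log")

# The writer keeps working; in append mode its next write lands at offset 0.
echo "new entry" >&3
size_after_write=$(wc -c < "$log")

# Case 2: the file was already removed with rm. Truncate it through the
# open descriptor instead of the old path.
rm "$log"
: > /proc/self/fd/3
size_deleted=$(wc -c < /proc/self/fd/3)

echo "$size_after_truncate $size_after_write $size_deleted"
exec 3>&-
```

The append mode matters: a writer that opened the log without it would continue at its old offset after truncation and leave a sparse file with a large apparent size.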


Origin www.cnblogs.com/jinanxiaolaohu/p/12127500.html