Incorrect key file for table '/tmp/#sql_1df6_8.MYI'; try to repair it

遇到MySQL硬盘的问题,解决方法来自:http://blog.itpub.net/26355921/viewspace-1273862/
摘抄如下:
 突然接到开发打来电话说MySQL数据库执行SQL报错,让排查一下原因。

登陆MySQL服务器看了一眼错误日志:
        140904 12:06:20 [ERROR] /usr/local/mysql5.5/bin/mysqld: Incorrect key file for table '/tmp/#sql_5608_1.MYI'; try to repair it
        140904 12:06:20 [ERROR] Got an error from unknown thread, /tmp/mysql5.5_install/tmp/mysql-5.5.30/storage/myisam/mi_write.c:223


看到/tmp下的问题,先看一眼tmpdir:


mysql> show variables like 'tmpdir';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| tmpdir        | /tmp  |
+---------------+-------+
1 row in set (0.00 sec)


        说/tmp下有表需要修复,通过tmpdir可知,是临时排序表所致。在网上查了一下,说产生这个问题是由于tmpdir目录下空间不足。
查看一下磁盘空间:
[root@localhost data]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   15G   31G  33% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


/tmp属于/ 根目录下,空间还有31G,觉得不可能。


        查看慢日志,找到一条很可疑的语句,拿50万的数据来排序,还有的语句拿200多万的数据来排序,而且从时间上也与开发说的时间很吻合:
# Time: 140904 16:13:50
# User@Host: staff_reader[staff_reader] @  [10.35.15.68]
# Query_time: 133.653564  Lock_time: 0.000186 Rows_sent: 5384178  Rows_examined: 20886933
use hotel_inventory;
SET timestamp=1409818430;
SELECT col1,col2,col3....,col7
  FROM  table1 nolock WHERE
 available_time >= '2014-09-03' AND available_time <=  DATE_ADD('2014-09-03', interval 91 day)
 order by hotel_id,room_type_id,available_time;


看一下该语句访问的表的具体情况:
 
表行数为20886994:
mysql> select count(*) from table1;
+----------+
| count(*) |
+----------+
| 20886994 |
+----------+


表大小为7G: 
 [root@localhost]# du -sch *|grep sum_inv_room.ibd
6.9G    sum_inv_room.ibd


该机器上有4个实例:
[[email protected] hotel_inventory]# ps -ef|grep mysqld_safe
root     21185     1  0  2013 ?        00:00:00 /bin/sh /usr/local/mysql5.5/bin/mysqld_safe --defaults-file=/data/mysql6025/etc/my6025.cnf
root     22525     1  0  2013 ?        00:00:00 /bin/sh /usr/local/mysql5.5/bin/mysqld_safe --defaults-file=/data/mysql6027/etc/my6027.cnf
root     24275     1  0  2013 ?        00:00:00 /bin/sh /usr/local/mysql5.5/bin/mysqld_safe --defaults-file=/data/mysql6024/etc/my6024.cnf
root     37864     1  0  2013 ?        00:00:00 /bin/sh /usr/local/mysql5.5/bin/mysqld_safe --defaults-file=/data/mysql6026/etc/my6026.cnf
root     60463  5879  0 16:18 pts/6    00:00:00 grep mysqld_safe


总大小有30G左右,都是同时抽取数据:
[[email protected] hotel_inventory]# du -sch /data/mysql602*/data/hotel_inventory/* |grep sum_inv_room.ibd
6.9G    /data/mysql6024/data/hotel_inventory/sum_inv_room.ibd
7.1G    /data/mysql6025/data/hotel_inventory/sum_inv_room.ibd
7.9G    /data/mysql6026/data/hotel_inventory/sum_inv_room.ibd
6.5G    /data/mysql6027/data/hotel_inventory/sum_inv_room.ibd


        既然看到这些现象,那么我肯定就是这条语句同时执行所致。这时候给开发反馈,而开发说[staff_reader] @  [10.35.15.68] 这个用户不是他们那边的。没办法了,继续找。我就手动那上面那条语句去执行,发现根本没产生多少临时文件。这时候可以断定不是这个语句造成的了。
所以这个时候只能等问题再次重现。看了一下磁盘空间,正常情况下,/ 根目录可用空间是31G,然后我就写了个脚本一直监测可用空间是否是31G,如果不是,就给我QQ邮箱发邮件。


过了一会,果然收到邮件了,这时候上服务器看磁盘空间,在持续增长:




查看磁盘空间情况:
[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   30G   17G  65% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   33G   14G  71% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   35G   11G  77% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


[[email protected] tmp]# du -sch *
88K     mha4mysql-node-0.53
2.1G    #sql_5608_0.MYD
4.0K    #sql_5608_0.MYI
3.6G    #sql_5608_1.MYD
4.0K    #sql_5608_1.MYI
1.9G    #sql_5b44_0.MYD
4.0K    #sql_5b44_0.MYI
4.9G    #sql_5b44_1.MYD
4.0K    #sql_5b44_1.MYI
2.0G    #sql_621a_0.MYD
4.0K    #sql_621a_0.MYI
749M    #sql_621a_1.MYD
4.0K    #sql_621a_1.MYI
2.2G    #sql_9753_0.MYD
4.0K    #sql_9753_0.MYI
4.2G    #sql_9753_1.MYD
4.0K    #sql_9753_1.MYI
22G     total


[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   37G  9.5G  80% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   41G  5.2G  89% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   46G   29M 100% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data
[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   46G   22M 100% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data




[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   46G     0 100% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


空间满的同时,看到/data/mysql6025/log/mysqld.err中又出现错误:
140904 16:03:59 [ERROR] /usr/local/mysql5.5/bin/mysqld: Incorrect key file for table '/tmp/#sql_5608_1.MYI'; try to repair it
140904 16:03:59 [ERROR] Got an error from unknown thread, /tmp/mysql5.5_install/tmp/mysql-5.5.30/storage/myisam/mi_write.c:223


通过慢日志来看,又出现了上面那条很可疑的语句。可是之前已经排除了其的可能性,那就只能继续找,发现一条很简短的语句,看起来没什么,但是执行时间很长:
SELECT MIN(sum_inv_room_id), MAX(sum_inv_room_id) FROM (select s.*, 20140909 as dw_insert_dt_wid from sum_inv_room s WHERE  (1 = 1) ) AS t1;


出错的这段时间,基本上就这两条SQL在跑。




报错之后,磁盘空间逐渐恢复:


........................................................


[[email protected] tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   38G  8.6G  82% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


[[email protected] log]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              49G   15G   31G  33% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             985M   44M  891M   5% /boot
/dev/sda4             219G  6.5G  201G   4% /opt
/dev/sdb1             1.4T  176G  1.2T  14% /data


最后空间使用率恢复正常。




于是手动拿这个语句执行了一下:
SELECT MIN(col1), MAX(col1) FROM (select s.*, 20140909 as dw_insert_dt_wid from table1 s WHERE  (1 = 1) ) AS t1;


果然看到磁盘空间立马飙升,找到原因所在。单条语句执行,看到其产生的临时文件最大到了10多G。一共31G的空间,4个实例,都产生10多G,当然扛不住。


(至于具体为什么这样的语句会产生这么大的临时文件,本人对其中的原理不是太理解,先做此记录。)


经过与开发沟通,开发说那个是定时抽取数据。而且机器上的4个实例是同时开始的。


所以给出以下解决办法:
1.将MySQL中参数tmpdir设置到大的目录下去,但是这个需要参数是read_only变量,需要重启数据库
2.将这4个语句串行执行
3.修改抽取数据的SQL


最后开发选择了第2种方法,将定点抽取数据的SQL串行执行,问题暂时得到解决,至少半个多月都没再出现这个问题了。


猜你喜欢

转载自stevenfeng.iteye.com/blog/2187544