EXPLAIN SELECT t1.TRANSACTION_ID, t1.CLIENT_KEY, t1.RING_NAME, t1.CMS_RING_NAME, t1.NICK_NAME, T1.RING_TYPE, t1.TTS_CONTENT, T1.AUDIO_TYPE, T1.TTS_GENDER, T1.MSISDN, t1.TTS_BG_MUSIC_ID, T1.FILE_PATH, T1.CMS_FILE_PATH, T1.LISTEN_FILE_PATH, T1.CONTENT_ID, T1.COPYRIGHT_ID, T1.REAL_COPYRIGHT_ID, T1.STATE RING_STATE, T1.CREATE_TIME RING_CREATE_TIME, T2.ID, T2.PID, T2.TYPE STEP_TYPE, T2.PROVINCE_CODE, T2.LAST_TIME, T2.STATE STEP_STATE, T2.CREATE_TIME STEP_CREATE_TIME, T3.CMS_CALLBACK_ID FROM AB_RING T1, ab_ring_step T2, AB_DEPLOY T3 WHERE T1.TRANSACTION_ID = T2.TRANSACTION_ID AND (T2.PID = T3.STEP_ID OR T2.ID = T3.STEP_ID) and t1.conteNT_ID='600912988981418430' ORDER BY T2.CREATE_TIME DESC LIMIT 0, 10
查看执行计划
T3出现了全表扫描,会导致查询变慢,在测试数据中,各个表都只有几万条数据,耗时还不是很明显,但正式环境是上百万的数据量,这样的执行计划必然会导致慢查询。而T3表全表扫描的原因,很明显是这句话造成的
T2.PID = T3.STEP_ID OR T2.ID = T3.STEP_ID
在业务上,这句话的意思是将t2表和t3表关联,t2表如果是子流程,则将这个子流程的主流程id与t3表关联,如果t2表的数据是主流程,则直接将这个主流程与t3表关联,主流程对应的pid为‘0000000000000000000000’
根据常规优化,把or换成union all,于是sql变成了这样
EXPLAIN SELECT t1.TRANSACTION_ID, t1.CLIENT_KEY, t1.RING_NAME, t1.CMS_RING_NAME, t1.NICK_NAME, T1.RING_TYPE, t1.TTS_CONTENT, T1.AUDIO_TYPE, T1.TTS_GENDER, T1.MSISDN, t1.TTS_BG_MUSIC_ID, T1.FILE_PATH, T1.CMS_FILE_PATH, T1.LISTEN_FILE_PATH, T1.CONTENT_ID, T1.COPYRIGHT_ID, T1.REAL_COPYRIGHT_ID, T1.STATE RING_STATE, T1.CREATE_TIME RING_CREATE_TIME, T2.ID, T2.PID, T2.TYPE STEP_TYPE, T2.PROVINCE_CODE, T2.LAST_TIME, T2.STATE STEP_STATE, T2.CREATE_TIME STEP_CREATE_TIME, T3.CMS_CALLBACK_ID FROM AB_RING T1, ab_ring_step T2, AB_DEPLOY T3 WHERE T1.TRANSACTION_ID = T2.TRANSACTION_ID AND T2.PID = T3.STEP_ID AND t1.conteNT_ID='600912988981418430' UNION ALL SELECT t1.TRANSACTION_ID, t1.CLIENT_KEY, t1.RING_NAME, t1.CMS_RING_NAME, t1.NICK_NAME, T1.RING_TYPE, t1.TTS_CONTENT, T1.AUDIO_TYPE, T1.TTS_GENDER, T1.MSISDN, t1.TTS_BG_MUSIC_ID, T1.FILE_PATH, T1.CMS_FILE_PATH, T1.LISTEN_FILE_PATH, T1.CONTENT_ID, T1.COPYRIGHT_ID, T1.REAL_COPYRIGHT_ID, T1.STATE RING_STATE, T1.CREATE_TIME RING_CREATE_TIME, T2.ID, T2.PID, T2.TYPE STEP_TYPE, T2.PROVINCE_CODE, T2.LAST_TIME, T2.STATE STEP_STATE, T2.CREATE_TIME STEP_CREATE_TIME, T3.CMS_CALLBACK_ID FROM AB_RING T1, ab_ring_step T2, AB_DEPLOY T3 WHERE T1.TRANSACTION_ID = T2.TRANSACTION_ID AND T2.ID = T3.STEP_ID AND t1.conteNT_ID='600912988981418430' ORDER BY STEP_CREATE_TIME DESC limit 0,10
再来看看执行计划
看执行计划的最后一行,因为有order by,两个结果集合并之后需要再排序一次。排序也是相当耗时的,甚至超过全表扫描,所以这样的执行计划也是不行的。
因为T3表的step_id是有索引的,如果是 驱动表.xx = T3.step_id,这样的写法肯定是没有问题的。所以,用其他办法去掉OR查询条件,于是有了如下的sql
EXPLAIN SELECT t1.TRANSACTION_ID, t1.CLIENT_KEY, t1.RING_NAME, t1.CMS_RING_NAME, t1.NICK_NAME, T1.RING_TYPE, t1.TTS_CONTENT, T1.AUDIO_TYPE, T1.TTS_GENDER, T1.MSISDN, t1.TTS_BG_MUSIC_ID, T1.FILE_PATH, T1.CMS_FILE_PATH, T1.LISTEN_FILE_PATH, T1.CONTENT_ID, T1.COPYRIGHT_ID, T1.REAL_COPYRIGHT_ID, T1.STATE RING_STATE, T1.CREATE_TIME RING_CREATE_TIME, T2.ID, T2.PID, T2.TYPE STEP_TYPE, T2.PROVINCE_CODE, T2.LAST_TIME, T2.STATE STEP_STATE, T2.CREATE_TIME STEP_CREATE_TIME, T3.CMS_CALLBACK_ID FROM AB_RING T1, (select *, case pid when pid='0000000000000000000000' then ID else pid end xpid from ab_ring_step) T2 , AB_DEPLOY T3 WHERE T1.TRANSACTION_ID = T2.TRANSACTION_ID AND T2.xPID = T3.STEP_ID AND T1.CONTENT_ID = '600912988981418430' ORDER BY T2.CREATE_TIME DESC limit 0,10
再来看看执行计划
这个执行计划中,T3表方面没有什么问题了。但仍然有全表扫描,ab_ring_step表进行了全表扫描。接下来,我们优化ab_ring_step全表扫描的问题。
ab_ring_step全表扫描,会不会是因为把T2作为了一个子查询,数据放到临时表里面去了,临时表是没有索引的,所以进行了全表扫描。然而事实证明不是,为了简化这个问题,我们把sql改写为
EXPLAIN select * from ab_ring t1,ab_ring_step t2 where t2.transaction_id = t1.transaction_id and t1.content_id='600912988981418430'
再来看看执行计划
这回sql清晰明了,但ab_ring_step表仍然是全表扫描。有意思了,ab_ring_step表的transaction_id可是有索引的
连接条件为transaction_id的等值条件,而被驱动表ab_ring_step的transaction_id字段也是有索引,它居然走了全表扫描!
不相信,把sql反过来试试
EXPLAIN select * from ab_ring t1,ab_ring_step t2 where t1.transaction_id = t2.transaction_id and t2.pid='D0A956EB2D614A47BF087FD989517ACE'
执行计划
看上去没什么问题,连接条件如果在被驱动表上有索引,确实是走的索引,没毛病。
那为什么当ab_ring_step表被驱动的时候没有走索引呢?
检查N久,终于发现
两个字段的字符集不一样!
所以解决的办法就是把它们的字符集弄成一样。
再来看一看
执行计划正常了。
在看看这个sql
EXPLAIN SELECT t1.TRANSACTION_ID, t1.CLIENT_KEY, t1.RING_NAME, t1.CMS_RING_NAME, t1.NICK_NAME, T1.RING_TYPE, t1.TTS_CONTENT, T1.AUDIO_TYPE, T1.TTS_GENDER, T1.MSISDN, t1.TTS_BG_MUSIC_ID, T1.FILE_PATH, T1.CMS_FILE_PATH, T1.LISTEN_FILE_PATH, T1.CONTENT_ID, T1.COPYRIGHT_ID, T1.REAL_COPYRIGHT_ID, T1.STATE RING_STATE, T1.CREATE_TIME RING_CREATE_TIME, T2.ID, T2.PID, T2.TYPE STEP_TYPE, T2.PROVINCE_CODE, T2.LAST_TIME, T2.STATE STEP_STATE, T2.CREATE_TIME STEP_CREATE_TIME, T3.CMS_CALLBACK_ID FROM AB_RING T1, (select *, case pid when pid='0000000000000000000000' then ID else pid end xpid from ab_ring_step) T2 , AB_DEPLOY T3 WHERE T1.TRANSACTION_ID = T2.TRANSACTION_ID AND T2.xPID = T3.STEP_ID AND T1.CONTENT_ID = '600912988981418430' ORDER BY T2.CREATE_TIME DESC limit 0,10
执行计划
一切正常了。
那么为什么字符集不一样会导致索引失效呢?
参考这篇博客http://www.tuicool.com/articles/A7nM3yI
结论:
1.尽量避免使用or查询
2.数据库中字段的字符集一定要统一