记一次sql优化,or查询优化与连接索引失效

优化一个sql,这个sql并不复杂,但执行了错误的执行计划,原始sql如下

EXPLAIN
SELECT
	t1.TRANSACTION_ID,
	t1.CLIENT_KEY,
	t1.RING_NAME,
	t1.CMS_RING_NAME,
	t1.NICK_NAME,
	T1.RING_TYPE,
	t1.TTS_CONTENT,
	T1.AUDIO_TYPE,
	T1.TTS_GENDER,
	T1.MSISDN,
	t1.TTS_BG_MUSIC_ID,
	T1.FILE_PATH,
	T1.CMS_FILE_PATH,
	T1.LISTEN_FILE_PATH,
	T1.CONTENT_ID,
	T1.COPYRIGHT_ID,
	T1.REAL_COPYRIGHT_ID,
	T1.STATE RING_STATE,
	T1.CREATE_TIME RING_CREATE_TIME,
	T2.ID,
	T2.PID,
	T2.TYPE STEP_TYPE,
	T2.PROVINCE_CODE,
	T2.LAST_TIME,
	T2.STATE STEP_STATE,
	T2.CREATE_TIME STEP_CREATE_TIME,
	T3.CMS_CALLBACK_ID
FROM
	AB_RING T1,
	ab_ring_step T2,
	AB_DEPLOY T3
WHERE
	T1.TRANSACTION_ID = T2.TRANSACTION_ID 
AND 
	(T2.PID = T3.STEP_ID OR T2.ID = T3.STEP_ID)
and t1.conteNT_ID='600912988981418430'
ORDER BY
	T2.CREATE_TIME DESC
LIMIT 0,
 10


查看执行计划


T3出现了全表扫描,会导致查询变慢,在测试数据中,各个表都只有几万条数据,耗时还不是很明显,但正式环境是上百万的数据量,这样的执行计划必然会导致慢查询。而T3表全表扫描的原因,很明显是这句话造成的
T2.PID = T3.STEP_ID OR T2.ID = T3.STEP_ID

在业务上,这句话的意思是将t2表和t3表关联,t2表如果是子流程,则将这个子流程的主流程id与t3表关联,如果t2表的数据是主流程,则直接将这个主流程与t3表关联,主流程对应的pid为‘0000000000000000000000’

根据常规优化,把or换成union all,于是sql变成了这样
EXPLAIN
SELECT
	t1.TRANSACTION_ID,
	t1.CLIENT_KEY,
	t1.RING_NAME,
	t1.CMS_RING_NAME,
	t1.NICK_NAME,
	T1.RING_TYPE,
	t1.TTS_CONTENT,
	T1.AUDIO_TYPE,
	T1.TTS_GENDER,
	T1.MSISDN,
	t1.TTS_BG_MUSIC_ID,
	T1.FILE_PATH,
	T1.CMS_FILE_PATH,
	T1.LISTEN_FILE_PATH,
	T1.CONTENT_ID,
	T1.COPYRIGHT_ID,
	T1.REAL_COPYRIGHT_ID,
	T1.STATE RING_STATE,
	T1.CREATE_TIME RING_CREATE_TIME,
	T2.ID,
	T2.PID,
	T2.TYPE STEP_TYPE,
	T2.PROVINCE_CODE,
	T2.LAST_TIME,
	T2.STATE STEP_STATE,
	T2.CREATE_TIME STEP_CREATE_TIME,
	T3.CMS_CALLBACK_ID
FROM
	AB_RING T1,
	ab_ring_step T2,
	AB_DEPLOY T3
WHERE
	T1.TRANSACTION_ID = T2.TRANSACTION_ID 
AND 
	T2.PID = T3.STEP_ID
AND 
  t1.conteNT_ID='600912988981418430'
UNION ALL
SELECT
	t1.TRANSACTION_ID,
	t1.CLIENT_KEY,
	t1.RING_NAME,
	t1.CMS_RING_NAME,
	t1.NICK_NAME,
	T1.RING_TYPE,
	t1.TTS_CONTENT,
	T1.AUDIO_TYPE,
	T1.TTS_GENDER,
	T1.MSISDN,
	t1.TTS_BG_MUSIC_ID,
	T1.FILE_PATH,
	T1.CMS_FILE_PATH,
	T1.LISTEN_FILE_PATH,
	T1.CONTENT_ID,
	T1.COPYRIGHT_ID,
	T1.REAL_COPYRIGHT_ID,
	T1.STATE RING_STATE,
	T1.CREATE_TIME RING_CREATE_TIME,
	T2.ID,
	T2.PID,
	T2.TYPE STEP_TYPE,
	T2.PROVINCE_CODE,
	T2.LAST_TIME,
	T2.STATE STEP_STATE,
	T2.CREATE_TIME STEP_CREATE_TIME,
	T3.CMS_CALLBACK_ID
FROM
	AB_RING T1,
	ab_ring_step T2,
	AB_DEPLOY T3
WHERE
	T1.TRANSACTION_ID = T2.TRANSACTION_ID 
AND 
	T2.ID = T3.STEP_ID
AND 
  t1.conteNT_ID='600912988981418430'
ORDER BY
	STEP_CREATE_TIME DESC
limit 0,10


再来看看执行计划


看执行计划的最后一行,因为有order by,两个结果集合并之后需要再排序一次。排序也是相当耗时的,甚至超过全表扫描,所以这样的执行计划也是不行的。

因为T3表的step_id是有索引的,如果是 驱动表.xx = T3.step_id,这样的写法肯定是没有问题的。所以,用其他办法去掉OR查询条件,于是有了如下的sql
EXPLAIN
SELECT
	t1.TRANSACTION_ID,
	t1.CLIENT_KEY,
	t1.RING_NAME,
	t1.CMS_RING_NAME,
	t1.NICK_NAME,
	T1.RING_TYPE,
	t1.TTS_CONTENT,
	T1.AUDIO_TYPE,
	T1.TTS_GENDER,
	T1.MSISDN,
	t1.TTS_BG_MUSIC_ID,
	T1.FILE_PATH,
	T1.CMS_FILE_PATH,
	T1.LISTEN_FILE_PATH,
	T1.CONTENT_ID,
	T1.COPYRIGHT_ID,
	T1.REAL_COPYRIGHT_ID,
	T1.STATE RING_STATE,
	T1.CREATE_TIME RING_CREATE_TIME,
	T2.ID,
	T2.PID,
	T2.TYPE STEP_TYPE,
	T2.PROVINCE_CODE,
	T2.LAST_TIME,
	T2.STATE STEP_STATE,
	T2.CREATE_TIME STEP_CREATE_TIME,
	T3.CMS_CALLBACK_ID
FROM
	AB_RING T1,
	(select *,
case pid
when pid='0000000000000000000000'
then ID
else pid
end xpid
 from ab_ring_step) T2  ,
	AB_DEPLOY T3
WHERE
	T1.TRANSACTION_ID = T2.TRANSACTION_ID
AND 
	T2.xPID = T3.STEP_ID
AND T1.CONTENT_ID = '600912988981418430'
ORDER BY
	T2.CREATE_TIME DESC
limit 0,10


再来看看执行计划


这个执行计划中,T3表方面没有什么问题了。但仍然有全表扫描,ab_ring_step表进行了全表扫描。接下来,我们优化ab_ring_step全表扫描的问题。

ab_ring_step全表扫描,会不会是因为把T2作为了一个子查询,数据放到临时表里面去了,临时表是没有索引的,所以进行了全表扫描。然而事实证明不是,为了简化这个问题,我们把sql改写为
EXPLAIN
select * from ab_ring t1,ab_ring_step t2 
where t2.transaction_id = t1.transaction_id 
and t1.content_id='600912988981418430'


再来看看执行计划


这回sql清晰明了,但ab_ring_step表仍然是全表扫描。有意思了,ab_ring_step表的transaction_id可是有索引的

连接条件为transaction_id的等值条件,而被驱动表ab_ring_step的transaction_id字段也是有索引,它居然走了全表扫描!

不相信,把sql反过来试试
EXPLAIN
select * from ab_ring t1,ab_ring_step t2 
where t1.transaction_id = t2.transaction_id
and t2.pid='D0A956EB2D614A47BF087FD989517ACE'


执行计划


看上去没什么问题,连接条件如果在被驱动表上有索引,确实是走的索引,没毛病。
那为什么当ab_ring_step表被驱动的时候没有走索引呢?

检查N久,终于发现




两个字段的字符集不一样!
所以解决的办法就是把它们的字符集弄成一样。

再来看一看



执行计划正常了。

在看看这个sql
EXPLAIN
SELECT
	t1.TRANSACTION_ID,
	t1.CLIENT_KEY,
	t1.RING_NAME,
	t1.CMS_RING_NAME,
	t1.NICK_NAME,
	T1.RING_TYPE,
	t1.TTS_CONTENT,
	T1.AUDIO_TYPE,
	T1.TTS_GENDER,
	T1.MSISDN,
	t1.TTS_BG_MUSIC_ID,
	T1.FILE_PATH,
	T1.CMS_FILE_PATH,
	T1.LISTEN_FILE_PATH,
	T1.CONTENT_ID,
	T1.COPYRIGHT_ID,
	T1.REAL_COPYRIGHT_ID,
	T1.STATE RING_STATE,
	T1.CREATE_TIME RING_CREATE_TIME,
	T2.ID,
	T2.PID,
	T2.TYPE STEP_TYPE,
	T2.PROVINCE_CODE,
	T2.LAST_TIME,
	T2.STATE STEP_STATE,
	T2.CREATE_TIME STEP_CREATE_TIME,
	T3.CMS_CALLBACK_ID
FROM
	AB_RING T1,
	(select *,
case pid
when pid='0000000000000000000000'
then ID
else pid
end xpid
 from ab_ring_step) T2  ,
	AB_DEPLOY T3
WHERE
	T1.TRANSACTION_ID = T2.TRANSACTION_ID
AND 
	T2.xPID = T3.STEP_ID
AND T1.CONTENT_ID = '600912988981418430'
ORDER BY
	T2.CREATE_TIME DESC
limit 0,10


执行计划


一切正常了。

那么为什么字符集不一样会导致索引失效呢?
参考这篇博客http://www.tuicool.com/articles/A7nM3yI


结论:
1.尽量避免使用or查询
2.数据库中字段的字符集一定要统一

猜你喜欢

转载自wsdtq123.iteye.com/blog/2376533