多表联合查询导致的数据重复问题

运营让我查询一个活动中参与的物品数,然后需要前60个评论+赞的人。

最开始我用子查询实现,巨慢无比:
select id,like_count,favorite_count from message_message where id in(select message_id from message_message_tags where messagetag_id=62101) and (category=9 or category=1) order by (like_count+favorite_count) desc 


原因是in子句的数据量大的话,性能就会非常低下。详见:

修改成多表查询:
 select t1.sender_id,t1.id,t1.like_count,t1.favorite_count,t1.add_datetime from message_message t1, message_message2tag t2 where t1.id = t2.message_id and t2.tag_id=120560 and t1.category in (9,1) and t1.add_datetime>'2012-06-29' and t1.add_datetime < '2012-07-03' order by (like_count+favorite_count) desc  limit 60 


性能解决了,但发现有重复记录:
+----------+------------+----------------+-----------+---------------------+
| id       | like_count | favorite_count | sender_id | add_datetime        |
+----------+------------+----------------+-----------+---------------------+
| 33850491 |          2 |            220 |    471956 | 2012-06-29 20:42:47 |
| 33778187 |         14 |            167 |    122216 | 2012-06-29 12:24:15 |
| 33779339 |          8 |            165 |    122216 | 2012-06-29 12:33:57 |
| 34068745 |          0 |            112 |      3054 | 2012-07-01 15:00:26 |
| 33782239 |          4 |             94 |    116633 | 2012-06-29 12:58:23 |
| 33815718 |          2 |             92 |      4287 | 2012-06-29 17:10:13 |
| 33905850 |          3 |             87 |     85131 | 2012-06-30 10:36:39 |
| 33827272 |          1 |             75 |    171462 | 2012-06-29 18:21:31 |
| 33778798 |          7 |             69 |    122216 | 2012-06-29 12:29:38 |
| 33780283 |          8 |             64 |    122216 | 2012-06-29 12:42:15 |
| 33816638 |          0 |             59 |      4287 | 2012-06-29 17:16:16 |

122216一个会员发了多条记录,看来需要做distinct,直接把distinct加在字段前面是没有效果,可以通过结合子查询来和group来做,最终版:

select * from (select t1.sender_id,t1.id,t1.like_count,t1.favorite_count,t1.add_datetime from message_message t1, message_message2tag t2 where t1.id = t2.message_id and t2.tag_id=120560 and t1.category in (9,1) and t1.add_datetime>'2012-06-29' and t1.add_datetime < '2012-07-03' order by (like_count+favorite_count) desc limit 60) as t group by t.sender_id order by (like_count+favorite_count) desc;



这个人提到说通过
 select *, count(distinct name) from table group by name 
来实现
http://hi.baidu.com/liveinyc/blog/item/facac543a2dc260b9213c658.html



猜你喜欢

转载自san-yun.iteye.com/blog/1576595