MongoDB中的一个分组统计的查询(Group ... Distinct)

数据很简单,如下:

> db.t2.find();
{ "country" : "china", "province" : "sh", "userid" : "a" }
{  "country" : "china", "province" : "sh", "userid" : "b" }
{  "country" : "china", "province" : "sh", "userid" : "a" }
{  "country" : "china", "province" : "sh", "userid" : "c" }
{  "country" : "china", "province" : "bj", "userid" : "da" }
{  "country" : "china", "province" : "bj", "userid" : "fa" }


需求是统计出每个country/province下的userid的数量(同一个userid只统计一次)

过程如下。

首先试着这样来统计:

> db.t2.aggregate([ { $group: {"_id": { "country" : "$country", "prov": "$province"} , "number":{$sum:1}} } ])

但是这样的结果是错误的:
{ "_id" : { "country" : "china", "prov" : "bj" }, "number" : 2 }
{ "_id" : { "country" : "china", "prov" : "sh" }, "number" : 4 }

原因是,这样来统计不能区分userid相同的情况 (上面的数据中sh有两个 userid = a)
为了解决这个问题,首先执行一个group,其id 是 country, province, userid三个field:

> db.t2.aggregate([ { $group: {"_id": { "country" : "$country", "province": "$province" , "uid" : "$userid" } } } ])
结果为

{ "_id" : { "country" : "china", "province" : "bj", "uid" : "fa" } }
{ "_id" : { "country" : "china", "province" : "bj", "uid" : "da" } }
{ "_id" : { "country" : "china", "province" : "sh", "uid" : "c" } }
{ "_id" : { "country" : "china", "province" : "sh", "uid" : "b" } }
{ "_id" : { "country" : "china", "province" : "sh", "uid" : "a" } }
可以看出,这步的目的是把相同的userid只剩下一个。

然后第二步,再第一步的结果之上再执行统计:

>db.t2.aggregate([ 
        { $group: {"_id": { "country" : "$country", "province": "$province" , "uid" : "$userid" } } } ,
{ $group: {"_id": { "country" : "$_id.country", "province": "$_id.province"  }, count : { $sum : 1 }  } }
])

这回就对了:
{ "_id" : { "country" : "china", "province" : "sh" }, "count" : 3 }
{ "_id" : { "country" : "china", "province" : "bj" }, "count" : 2 }

为了让结果好看点,加入一个$project操作符,把_id分开:
>db.t2.aggregate([ { $group: {"_id": { "country" : "$country", "province": "$province" , "uid" : "$userid" } } } ,
{ $group: {"_id": { "country" : "$_id.country", "province": "$_id.province"  }, count: { $sum : 1 }  } },
{ $project : {"_id": 0, "country" : "$_id.country", "province" : "$_id.province", "count" : 1}}
])
{ "count" : 3, "country" : "china", "province" : "sh" }
{ "count" : 2, "country" : "china", "province" : "bj" }

--------------------- 

原文:https://blog.csdn.net/lff0305/article/details/50034735 
 

猜你喜欢

转载自blog.csdn.net/zhuchunyan_aijia/article/details/86656787
今日推荐