Elasticsearch学习总结(五)

ES的聚合

   我们还有一个需求需要完成:允许管理者在职员目录中进行一些分析。 Elasticsearch有一个功能叫做聚合(aggregations),它允许你在数据上生成复杂的分析统计。它很像SQL中的GROUP BY但是功能更强大。

   举个例子,让我们找到所有职员中最大的共同点(兴趣爱好)是什么:

   GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

暂时先忽略语法只看查询结果:

{
   "took": 229,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 1,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "first_name": "Lily",
               "last_name": "Smith",
               "age": 29,
               "about": "I like to go shopping!",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "3",
            "_score": 1,
            "_source": {
               "first_name": "Tom",
               "last_name": "Smith",
               "age": 18,
               "about": "I like to play basketball!",
               "interests": [
                  "music"
               ]
            }
         }
      ]
   },
   "aggregations": {
      "all_interests": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "music",
               "doc_count": 3
            }
         ]
      }
   }
}

     我们可以看到两个职员对音乐有兴趣,一个喜欢林学,一个喜欢运动。这些数据并没有被预先计算好,它们是实时的从匹配查询语句的文档中动态计算生成的。如果我们想知道所有名为"Tom"的人最大的共同点(兴趣爱好),我们只需要增加合适的语句既可:

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "first_name": "Tom"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}

all_interests聚合已经变成只包含和查询语句相匹配的文档了:

{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "3",
            "_score": 0.30685282,
            "_source": {
               "first_name": "Tom",
               "last_name": "Smith",
               "age": 18,
               "about": "I like to play basketball!",
               "interests": [
                  "music"
               ]
            }
         }
      ]
   },
   "aggregations": {
      "all_interests": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "music",
               "doc_count": 1
            }
         ]
      }
   }
}

 聚合也允许分级汇总。例如,让我们统计每种兴趣下职员的平均年龄:

GET /megacorp/employee/_search
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}

"aggregations": {
      "all_interests": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "music",
               "doc_count": 3,
               "avg_age": {
                  "value": 26.333333333333332,
                  "value_as_string": "26.333333333333332"
               }
            }
         ]
      }
   }

    该聚合结果比之前的聚合结果要更加丰富。我们依然得到了兴趣以及数量(指具有该兴趣的员工人数)的列表,但是现在每个兴趣额外拥有avg_age字段来显示具有该兴趣员工的平均年龄。

即使你还不理解语法,但你也可以大概感觉到通过这个特性可以完成相当复杂的聚合工作,你可以处理任何类型的数据。

猜你喜欢

转载自mrchaixs.iteye.com/blog/2234101