【弄nèng - Elasticsearch】DSL Basics (Part 10): Common Aggregations

Common Aggregations

1. Missing Aggregation

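Counts the documents that have no value for a given field (the field is absent or null).

The following example returns the number of documents without a price.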

POST twitter/tweet/_search
{
    "size": 0,
    "aggs" : {
        "products_without_a_price" : {
            "missing" : { "field" : "price" }
        }
    }
}

Result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "products_without_a_price": {
      "doc_count": 4
    }
  }
}
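The missing aggregation's rule can be mimicked locally. The Python sketch below is not Elasticsearch code. It assumes a document counts as missing when the field is absent, null, or an empty list, and it ignores mappings that define a null_value. The six documents are hypothetical, shaped to match the result above: six hits, only two of which have a price.

```python
def missing_count(docs, field):
    """Count documents that have no value for `field`.

    A document counts as missing when the field is absent, explicitly None,
    or an empty list (assumption: no null_value is configured in the mapping).
    """
    return sum(1 for doc in docs if doc.get(field) in (None, []))


# Hypothetical documents: 6 in total, only 2 of them carry a price.
docs = [
    {"price": 2.2},
    {"price": 2.5},
    {},
    {},
    {"price": None},
    {"price": []},
]

print(missing_count(docs, "price"))  # 4
```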

2. Range Aggregation

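Range aggregation: groups documents into buckets by value ranges. Note that each range includes its from value and excludes its to value.

The following example returns three buckets: *-2.2 (below 2.2), 2.2-2.5 (from 2.2 up to but not including 2.5), and 2.5-* (2.5 and above).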

POST twitter/tweet/_search
{
    "size": 0,
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 2.2 },
                    { "from" : 2.2, "to" : 2.5 },
                    { "from" : 2.5 }
                ]
            }
        }
    }
}

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {
          "key": "*-2.2",
          "to": 2.2,
          "doc_count": 0
        },
        {
          "key": "2.2-2.5",
          "from": 2.2,
          "to": 2.5,
          "doc_count": 1
        },
        {
          "key": "2.5-*",
          "from": 2.5,
          "doc_count": 1
        }
      ]
    }
  }
}
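The from-inclusive / to-exclusive rule can be checked with a minimal Python sketch. It assumes the indexed prices are 2.2 and 2.5; the stats result in section 5 reports exactly two prices, with min 2.2 and max 2.5. The function name range_buckets is hypothetical.

```python
def range_buckets(values, ranges):
    """Count values per range: `from` is inclusive, `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        # Key format mirrors the result above, with * for an open end
        key = f"{'*' if lo is None else lo}-{'*' if hi is None else hi}"
        count = sum(
            1
            for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        buckets.append({"key": key, "doc_count": count})
    return buckets


prices = [2.2, 2.5]  # assumption: the two prices implied by the stats output
ranges = [{"to": 2.2}, {"from": 2.2, "to": 2.5}, {"from": 2.5}]
for b in range_buckets(prices, ranges):
    print(b["key"], b["doc_count"])
# *-2.2 0
# 2.2-2.5 1
# 2.5-* 1
```

The value 2.5 lands in 2.5-* and not in 2.2-2.5, because the to bound is exclusive.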

3. Histogram Aggregation

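Histogram aggregation: splits numeric values into fixed-width buckets based on an interval. It works much like date_histogram.

  • interval: the width of each bucket
  • min_doc_count: the minimum number of documents a bucket needs in order to be returned. It defaults to 0, so empty buckets are returned; set it to 1 to hide them.
  • extended_bounds: widens the range of buckets that is returned. By default Elasticsearch only returns buckets between the minimum and maximum values in your data. For example, if the 2.2 bucket has no data and the 2.3 bucket does, the buckets start at 2.3 unless extended_bounds is set. The extra buckets it adds are empty, so extended_bounds only has a visible effect when min_doc_count is 0.

The following example builds price buckets at an interval of 0.1.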

POST twitter/tweet/_search
{
    "size": 0,
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 0.1,
                "min_doc_count" : 1, // optional
                "extended_bounds" : { // optional
                    "min" : 2.2,
                    "max" : 2.5
                }
            }
        }
    }
}

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "prices": {
      "buckets": [
        {
          "key": 2.2,
          "doc_count": 1
        },
        {
          "key": 2.5,
          "doc_count": 1
        }
      ]
    }
  }
}
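Elasticsearch documents the bucket key for a value as floor((value - offset) / interval) * interval + offset. The hypothetical Python sketch below applies that rule with offset assumed to be 0, together with the min_doc_count and extended_bounds behaviour described above. It uses Decimal to sidestep binary floating-point rounding at bucket boundaries, and it assumes the prices are 2.2 and 2.5.

```python
from decimal import Decimal, ROUND_FLOOR


def bucket_index(value, interval):
    """floor(value / interval), computed in Decimal to avoid float rounding."""
    return (Decimal(str(value)) / Decimal(str(interval))).to_integral_value(
        rounding=ROUND_FLOOR
    )


def histogram(values, interval, min_doc_count=0, extended_bounds=None):
    counts = {}
    for v in values:
        idx = bucket_index(v, interval)
        counts[idx] = counts.get(idx, 0) + 1
    if not counts:
        return []  # simplification: no data means no buckets here
    first, last = min(counts), max(counts)
    if extended_bounds is not None:
        # extended_bounds can only widen the bucket range, never narrow it
        first = min(first, bucket_index(extended_bounds["min"], interval))
        last = max(last, bucket_index(extended_bounds["max"], interval))
    step = Decimal(str(interval))
    buckets = []
    idx = first
    while idx <= last:
        count = counts.get(idx, 0)
        # Empty buckets survive only when min_doc_count is 0
        if count >= min_doc_count:
            buckets.append({"key": float(idx * step), "doc_count": count})
        idx += 1
    return buckets


prices = [2.2, 2.5]
print(histogram(prices, 0.1, min_doc_count=1,
                extended_bounds={"min": 2.2, "max": 2.5}))
# [{'key': 2.2, 'doc_count': 1}, {'key': 2.5, 'doc_count': 1}]
```

Calling it with min_doc_count=0 and extended_bounds {"min": 2.0, "max": 2.5} instead yields six buckets, 2.0 through 2.5, where only 2.2 and 2.5 are non-empty.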

3.1 Sorting

Sort by a sub-aggregation

POST twitter/tweet/_search
{
    "size": 0,
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 0.1,
                "order" : { "avg1" : "asc" } 
            },
            "aggs" : {
                "avg1" : { 
                  "avg" : {
                    "field": "price"
                  } 
                } 
            }
        }
    }
}

Sort by key

POST twitter/tweet/_search
{
    "size": 0,
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 0.1,
                "order" : { "_key" : "desc" }
            }
        }
    }
}

To sort by document count, use _count in place of _key.
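The three order variants above can be mimicked with a plain sort. In this hypothetical Python sketch, each bucket dict carries its key, its doc_count, and the value of a single-value sub-aggregation named avg1. The bucket values are made up, and Elasticsearch's tie-breaking is not modelled.

```python
def order_buckets(buckets, order):
    """Sort buckets according to a one-entry order dict.

    order looks like {"_key": "desc"}, {"_count": "asc"} or {"avg1": "asc"},
    where "avg1" names a single-value sub-aggregation stored on each bucket.
    """
    field, direction = next(iter(order.items()))
    if field == "_key":
        def sort_key(b):
            return b["key"]
    elif field == "_count":
        def sort_key(b):
            return b["doc_count"]
    else:
        def sort_key(b):
            return b[field]["value"]
    return sorted(buckets, key=sort_key, reverse=(direction == "desc"))


# Hypothetical histogram buckets with an avg1 sub-aggregation
buckets = [
    {"key": 2.2, "doc_count": 3, "avg1": {"value": 2.25}},
    {"key": 2.3, "doc_count": 1, "avg1": {"value": 2.31}},
    {"key": 2.5, "doc_count": 2, "avg1": {"value": 2.5}},
]
print([b["key"] for b in order_buckets(buckets, {"_key": "desc"})])    # [2.5, 2.3, 2.2]
print([b["key"] for b in order_buckets(buckets, {"_count": "desc"})])  # [2.2, 2.5, 2.3]
print([b["key"] for b in order_buckets(buckets, {"avg1": "asc"})])     # [2.2, 2.3, 2.5]
```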

4. Terms Aggregation

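A multi-bucket, value-source-based aggregation whose buckets are built dynamically: one bucket per unique value.

The following example aggregates on name.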

POST twitter/tweet/_search
{
    "size": 0,
    "aggs" : {
        "nameAgg" : {
            "terms" : { "field" : "name" }
        }
    }
}

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "nameAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "kimchy1",
          "doc_count": 2
        },
        {
          "key": "SIMA",
          "doc_count": 1
        },
        {
          "key": "SIMA1",
          "doc_count": 1
        },
        {
          "key": "SIMA5",
          "doc_count": 1
        },
        {
          "key": "SIMA6",
          "doc_count": 1
        }
      ]
    }
  }
}
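A rough Python equivalent of the default terms behaviour is shown below. It assumes a single shard, so the counts are exact. It produces one bucket per distinct value and at most size buckets (Elasticsearch defaults size to 10). Buckets are ordered by doc_count descending, with ties broken by key ascending, which matches the order of the result above. The list of names is reconstructed from that result, and the function name terms_agg is hypothetical.

```python
from collections import Counter


def terms_agg(values, size=10):
    """One bucket per distinct value, ordered by doc_count desc, then key asc."""
    ordered = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        # Documents whose terms did not make it into the top `size` buckets
        "sum_other_doc_count": sum(count for _, count in ordered[size:]),
        "buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]],
    }


names = ["SIMA6", "kimchy1", "SIMA", "SIMA1", "kimchy1", "SIMA5"]
for bucket in terms_agg(names)["buckets"]:
    print(bucket["key"], bucket["doc_count"])
# kimchy1 2
# SIMA 1
# SIMA1 1
# SIMA5 1
# SIMA6 1
```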

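5. Avg, Min, Max, Stats, Sum Aggregation

  • avg, sum, min, max: each returns a single value
  • stats: returns count, min, max, avg and sum in one request
  • extended_stats: adds four results to those of stats: the sum of squares, the variance, the standard deviation, and std_deviation_bounds (avg plus/minus two standard deviations)
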
POST twitter/tweet/_search
{
    "size": 0,
    "aggs" : {
        "avg_price" : { 
          "avg" : { 
            "field" : "price" 
          } 
        },
        "sum_price" : { 
          "sum" : { 
            "field" : "price" 
          } 
        },
        "min_price" : { 
          "min" : { 
            "field" : "price" 
          } 
        },
        "max_price" : { 
          "max" : { 
            "field" : "price" 
          } 
        },
        "stats_price" : { 
          "stats" : { 
            "field" : "price" 
          } 
        },
        "extended_stats_price" : { 
          "extended_stats" : { 
            "field" : "price" 
          } 
        }
        
    }
}

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "max_price": {
      "value": 2.5
    },
    "min_price": {
      "value": 2.2
    },
    "extended_stats_price": {
      "count": 2,
      "min": 2.2,
      "max": 2.5,
      "avg": 2.35,
      "sum": 4.7,
      "sum_of_squares": 11.09,
      "variance": 0.022499999999999076,
      "std_deviation": 0.1499999999999969,
      "std_deviation_bounds": {
        "upper": 2.649999999999994,
        "lower": 2.050000000000006
      }
    },
    "avg_price": {
      "value": 2.35
    },
    "stats_price": {
      "count": 2,
      "min": 2.2,
      "max": 2.5,
      "avg": 2.35,
      "sum": 4.7
    },
    "sum_price": {
      "value": 4.7
    }
  }
}
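The figures above can be checked by hand. This Python sketch assumes the two prices are 2.2 and 2.5 (count 2, min 2.2, max 2.5). It uses the population variance because that is what the output shows: 0.0225, rather than the sample variance of 0.045. The bounds are avg ± sigma standard deviations with sigma = 2; Elasticsearch exposes that multiplier as the sigma parameter of extended_stats. The function name extended_stats here is hypothetical.

```python
import math


def extended_stats(values, sigma=2.0):
    """Reproduce the stats / extended_stats fields for a list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Population variance: E[x^2] - E[x]^2 (divides by count, not count - 1);
    # clamp at 0 in case float rounding makes it slightly negative
    variance = max(sum_of_squares / count - avg * avg, 0.0)
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std,
                                 "lower": avg - sigma * std},
    }


result = extended_stats([2.2, 2.5])
print(round(result["variance"], 6), round(result["std_deviation"], 6))  # 0.0225 0.15
```

The tiny float noise in the Elasticsearch output (0.022499999999999076) has the same source as in this sketch: binary floating point.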

6. Percentiles Aggregation

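Percentiles aggregation: computes percentiles of a numeric field. Elasticsearch estimates them with the TDigest algorithm, so the values are approximate. See the official documentation and my separate blog post for details.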

6.1 Basic Aggregation

The following example returns the price distribution at the default percentiles (1, 5, 25, 50, 75, 95, 99).

POST schools/classes/_search
{
    "size": 0,
    "aggs" : {
        "per_price" : {
            "percentiles" : {
                "field" : "price"
            }
        }
    }
}

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "per_price": {
      "values": {
        "1.0": 1.2,
        "5.0": 1.2,
        "25.0": 1.45,
        "50.0": 2.2,
        "75.0": 2.2,
        "95.0": 2.3,
        "99.0": 2.4
      }
    }
  }
}

Most prices lie between 1.2 and 2.2; only the top 5% or so reach 2.3 to 2.4.

6.2 Specifying the Percentiles to Return

POST schools/classes/_search
{
    "size": 0,
    "aggs" : {
        "per_price" : {
            "percentiles" : {
                "field" : "price",
                "percents" : [95, 99, 99.9] 
            }
        }
    }
}

Result

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "per_price": {
      "values": {
        "95.0": 2.27,
        "99.0": 2.2939999999999996,
        "99.9": 2.2994
      }
    }
  }
}
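Elasticsearch estimates percentiles (TDigest by default), so the sketch below is only a mental model. It computes exact percentiles by linear interpolation between the closest ranks. The seven prices are hypothetical, picked so that the sketch reproduces the 6.2 numbers (2.27, 2.294, 2.2994). On larger data sets the estimates and the exact values can differ.

```python
import math


def percentiles(values, percents=(1, 5, 25, 50, 75, 95, 99)):
    """Exact percentiles via linear interpolation between closest ranks."""
    xs = sorted(values)
    last = len(xs) - 1
    result = {}
    for p in percents:
        pos = p / 100 * last          # fractional rank in [0, last]
        lo = math.floor(pos)
        hi = min(lo + 1, last)
        result[float(p)] = xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    return result


# Hypothetical prices, chosen to be consistent with the 6.2 result above
prices = [1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 2.3]
for p, v in percentiles(prices, [95, 99, 99.9]).items():
    print(p, round(v, 4))
# 95.0 2.27
# 99.0 2.294
# 99.9 2.2994
```

The default percents tuple mirrors the Elasticsearch defaults, which is why the 6.1 result lists exactly those seven keys.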

6.3 Script

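Instead of a field, the values can come from a script. The request below is one script that yields the result shown: it divides each price by a timeUnit parameter of 1000, so every value in the result is the corresponding price percentile divided by 1000.

POST schools/classes/_search
{
    "size": 0,
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "script" : {
                    "lang": "painless",
                    "source": "doc['price'].value / params.timeUnit",
                    "params" : {
                        "timeUnit" : 1000
                    }
                }
            }
        }
    }
}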

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "load_time_outlier": {
      "values": {
        "1.0": 0.0012,
        "5.0": 0.0012,
        "25.0": 0.0017000000000000001,
        "50.0": 0.0022,
        "75.0": 0.0022,
        "95.0": 0.00227,
        "99.0": 0.002294
      }
    }
  }
}

7. Global Aggregation and Filter Aggregation

See my separate blog post.

8. Top Hits Aggregation

See my separate blog post.

9. Percentile_ranks aggregation

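Returns the percentile rank of each given value, that is, the percentage of documents whose value is at or below it.

The following example computes the share of documents whose postDate is at or below 1574874999000 and at or below 1574928999000, respectively. This is the inverse of the percentiles aggregation in section 6.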

POST twitter/tweet/_search
{
  "size" : 0,
  "aggs": {
    "agg_rank": {
      "percentile_ranks": {
        "field": "postDate",
        "values": [
          1574874999000,
          1574928999000
        ]
      }
    }
  }
}

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "agg_rank": {
      "values": {
        "1.574874999E12": 49.99942403091834,
        "1.574928999E12": 100
      }
    }
  }
}

Interpretation: about 50% (49.99%) of the documents have a postDate at or below 1574874999000, and 100% have a postDate at or below 1574928999000.
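Percentile ranks answer the reverse question: given a value, what percentage of values lie at or below it. The Python sketch below computes that exactly on hypothetical postDate values (epoch milliseconds). Elasticsearch estimates ranks with the same TDigest structure it uses for percentiles and interpolates between values, so its figures, such as 49.99942403091834 above, need not be round numbers. The function name percentile_ranks is hypothetical.

```python
def percentile_ranks(values, targets):
    """Exact percentage of values less than or equal to each target."""
    n = len(values)
    return {t: 100.0 * sum(1 for v in values if v <= t) / n for t in targets}


# Hypothetical postDate values (epoch milliseconds)
post_dates = [1574874000000, 1574874999000, 1574900000000, 1574928999000]
print(percentile_ranks(post_dates, [1574874999000, 1574928999000]))
# {1574874999000: 50.0, 1574928999000: 100.0}
```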

10. Nested aggregation

See my separate blog post.


Recommended Projects

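IT-CLOUD: an IT service management platform that integrates basic services, middleware services, monitoring and alerting services, and more.
IT-CLOUD-ACTIVITI6: source code for the Activiti tutorials. The blog posts are in my CSDN Activiti series.
IT-CLOUD-ELASTICSEARCH: source code for the Elasticsearch tutorials. The blog posts are in my CSDN Elasticsearch series.

These open-source projects are updated continuously. If you like them, please give them a Star~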


Reposted from blog.csdn.net/yy756127197/article/details/103315174