Using Elasticsearch: Composite Aggregation

Composite aggregation

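Composite ([kəmˈpɑːzət]) aggregation is a bucket aggregation.

It builds composite buckets from one or more sources and lets you paginate through the results of a multi-level aggregation. In effect it streams all buckets of an aggregation, much as scroll does for documents.

The composite aggregation is currently not compatible with pipeline aggregations.

A composite aggregation builds combinations from the values extracted from each document, and each combination is treated as one composite bucket.

For example, take the following document: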

{
  "keyword": ["foo", "bar"],
  "number": [23, 65, 76]
}

Using keyword and number as sources, a composite aggregation produces the following composite buckets:

{ "keyword": "foo", "number": 23 }
{ "keyword": "foo", "number": 65 }
{ "keyword": "foo", "number": 76 }
{ "keyword": "bar", "number": 23 }
{ "keyword": "bar", "number": 65 }
{ "keyword": "bar", "number": 76 }
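
The combinations listed here are simply the Cartesian product of the values each source extracts from the document. A minimal Python sketch of that idea follows; the function name composite_keys is only illustrative and is not part of any Elasticsearch client, and the code mimics the fan-out rather than how Elasticsearch is implemented.

```python
from itertools import product


def composite_keys(doc, sources):
    """Return every combination of the document's values for the given sources.

    Illustrative only: this mimics how one document fans out into several
    composite buckets; it is not how Elasticsearch computes them.
    """
    value_lists = []
    for name in sources:
        value = doc[name]
        # A field may hold a single value or a list of values.
        value_lists.append(value if isinstance(value, list) else [value])
    return [dict(zip(sources, combo)) for combo in product(*value_lists)]


doc = {"keyword": ["foo", "bar"], "number": [23, 65, 76]}
for key in composite_keys(doc, ["keyword", "number"]):
    print(key)
```

Two keyword values times three number values give six composite buckets, in the same order as the list above.
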
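
The composite aggregation accepts the following parameters:

  • sources: the list of value sources. Each source must have a unique name.

  • missing_bucket: set on an individual source. Defaults to false, in which case documents that have no value for that source are skipped, so if no document has a value the buckets array is []. When set to true, those documents are kept and the key for that source is null, while the other sources are output normally.

  • size: the maximum number of composite buckets returned per page. Defaults to 10.

  • after: the starting point of the current page, that is, the key of the last bucket of the previous page.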
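
To see how these parameters fit together in a request, here is a small Python sketch that assembles a search body as a plain dictionary. The helper composite_body and the aggregation name "my_composite" are placeholders invented for this example; only the JSON shape it prints follows the requests used in this article.

```python
import json


def composite_body(sources, size=10, after=None):
    """Build a search body containing one composite aggregation.

    `sources` is a list of (name, type, options) tuples, for example
    ("terms_DestCountry", "terms", {"field": "DestCountry"}).
    The aggregation name "my_composite" is a placeholder for this sketch.
    """
    names = [name for name, _, _ in sources]
    if len(names) != len(set(names)):
        raise ValueError("each source name must be unique")
    composite = {
        "size": size,
        "sources": [{name: {kind: options}} for name, kind, options in sources],
    }
    if after is not None:
        # Key of the last bucket on the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"my_composite": {"composite": composite}}}


body = composite_body(
    [
        ("terms_OriginCountry", "terms", {"field": "OriginCountry"}),
        ("terms_DestCountry", "terms", {"field": "DestCountry", "missing_bucket": True}),
    ],
    size=5,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a GET <index>/_search line in the Kibana console.
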

Sources

Four aggregation types can be used as a source: terms, histogram, date_histogram, and geotile_grid.

Using a terms aggregation as a source

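GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "terms_FlightTimeMin": {
              "terms": {
                "field": "FlightTimeMin"
              }
            }
          }
        ]
      }
    }
  }
}

With a single terms source, the buckets are the same groups a plain terms aggregation would produce; the difference is that they come back sorted by key and can be paged through.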

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "runtime_mappings": {
    "FlightTimeMinChanged": {
      "type": "double",
      "script": {
        "source": """
          emit(doc['FlightTimeMin'].value / 10)
        """
      }
    }
  },
  "aggs": {
    "composite_FlightTimeMinChanged": {
      "composite": {
        "sources": [
          {
            "terms_FlightTimeMinChanged": {
              "terms": {
                "field": "FlightTimeMinChanged"
              }
            }
          }
        ]
      }
    }
  }
}

As this request shows, runtime fields can also be used to create composite buckets.

Using a histogram aggregation as a source

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "histogram_FlightTimeMin": {
              "histogram": {
                "field": "FlightTimeMin",
                "interval": 10
              }
            }
          }
        ]
      }
    }
  }
}
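
With a histogram source, each value is rounded down to the start of its interval, and that rounded value becomes the bucket key. The sketch below reproduces the rounding in Python for an interval of 10; the helper name histogram_key and the sample values are made up for illustration.

```python
import math


def histogram_key(value, interval):
    """Round a value down to the start of its histogram interval."""
    return math.floor(value / interval) * interval


# Hypothetical flight times in minutes, not taken from the sample data set.
for minutes in [3.5, 16.2, 32.9, 40.0]:
    print(minutes, "->", histogram_key(minutes, 10))
```

Every flight time from 10 up to, but not including, 20 minutes therefore lands in the bucket whose key is 10.
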

Using a date_histogram aggregation as a source

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd"
              }
            }
          }
        ]
      }
    }
  }
}

Combining several sources

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd"
              }
            }
          },
          {
            "terms_FlightTimeMin": {
              "terms": {
                "field": "FlightTimeMin"
              }
            }
          }
        ]
      }
    }
  }
}

Specifying a sort order for each source

Buckets are sorted by the first source, then by the second, and so on.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd",
                "order": "desc"
              }
            }
          },
          {
            "terms_FlightTimeMin": {
              "terms": {
                "field": "FlightTimeMin",
                "order": "asc"
              }
            }
          }
        ]
      }
    }
  }
}
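
The ordering rule can be pictured as a multi-key sort: the first source is the primary key and later sources break ties. Below is a small Python sketch with made-up keys, sorting dates in descending order and flight times in ascending order, as the request above does.

```python
# Hypothetical composite keys: (date, flight time in minutes).
keys = [
    ("2022-08-27", 45.0),
    ("2022-08-28", 32.9),
    ("2022-08-27", 12.5),
    ("2022-08-28", 13.0),
]

# Python's sort is stable, so sort by the secondary key first
# and by the primary key afterwards.
ordered = sorted(keys, key=lambda k: k[1])                   # second source: asc
ordered = sorted(ordered, key=lambda k: k[0], reverse=True)  # first source: desc

for key in ordered:
    print(key)
```

All keys for 2022-08-28 come first, and within each day the flight times increase.
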

Composite aggregation compared with sub-aggregations

First, use a composite aggregation with two terms sources, on the OriginCountry and DestCountry fields.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_OriginCountry_DestCountry": {
      "composite": {
        "sources": [
          {
            "terms_OriginCountry": {
              "terms": {
                "field": "OriginCountry"
              }
            }
          },
          {
            "terms_DestCountry": {
              "terms": {
                "field": "DestCountry"
              }
            }
          }
        ]
      }
    }
  }
}

The aggregation result looks like this:

"aggregations" : {
  "composite_OriginCountry_DestCountry" : {
    "after_key" : {
      "terms_OriginCountry" : "AE",
      "terms_DestCountry" : "CA"
    },
    "buckets" : [
      {
        "key" : {
          "terms_OriginCountry" : "AE",
          "terms_DestCountry" : "AE"
        },
        "doc_count" : 9
      },
      {
        "key" : {
          "terms_OriginCountry" : "AE",
          "terms_DestCountry" : "AR"
        },
        "doc_count" : 10
      },
      ......

For comparison, run the same grouping with a terms aggregation and a nested terms sub-aggregation.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "terms_OriginCountry": {
      "terms": {
        "field": "OriginCountry"
      },
      "aggs": {
        "terms_DestCountry": {
          "terms": {
            "field": "DestCountry"
          }
        }
      }
    }
  }
}

The aggregation result looks like this:

"aggregations" : {
  "terms_OriginCountry" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 4114,
    "buckets" : [
      {
        "key" : "IT",
        "doc_count" : 2278,
        "terms_DestCountry" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 513,
          "buckets" : [
            {
              "key" : "IT",
              "doc_count" : 459
            },
            {
              "key" : "US",
              "doc_count" : 328
            },
            {
              "key" : "CN",
              "doc_count" : 195
            },
            {
              "key" : "CA",
              "doc_count" : 192
            },
            ......
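
The two responses carry the same kind of information in different shapes: the composite response is a flat list of (OriginCountry, DestCountry) keys, while the sub-aggregation response is nested. As a rough illustration, the Python sketch below flattens a nested terms response into composite-style rows. The function name flatten_nested is hypothetical, and the sample input reuses the first two inner buckets from the output above.

```python
def flatten_nested(agg, outer_name, inner_name):
    """Turn nested terms buckets into flat composite-style rows."""
    rows = []
    for outer in agg["buckets"]:
        for inner in outer[inner_name]["buckets"]:
            rows.append({
                "key": {outer_name: outer["key"], inner_name: inner["key"]},
                "doc_count": inner["doc_count"],
            })
    return rows


nested = {
    "buckets": [
        {
            "key": "IT",
            "doc_count": 2278,
            "terms_DestCountry": {
                "buckets": [
                    {"key": "IT", "doc_count": 459},
                    {"key": "US", "doc_count": 328},
                ]
            },
        }
    ]
}

for row in flatten_nested(nested, "terms_OriginCountry", "terms_DestCountry"):
    print(row)
```

Keep in mind that a nested terms aggregation only returns the top buckets at each level (10 by default, ordered by document count), so a list flattened this way is generally incomplete, whereas a composite aggregation can page through every combination in key order.
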

The missing_bucket parameter

In the second source we point at a field that does not exist, FlightTimeMin2. Switching missing_bucket between false and true shows what the parameter does.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd",
                "order": "desc"
              }
            }
          },
          {
            "terms_FlightTimeMin": {
              "terms": {
                "field": "FlightTimeMin2",
                "order": "asc",
                "missing_bucket": false
              }
            }
          }
        ]
      }
    }
  }
}
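
The effect of missing_bucket can be mimicked in plain Python. In the sketch below, documents lacking a value for a source are dropped when missing_bucket is false and are given a null (None) key when it is true. The function and the two sample documents are invented for illustration; only single-valued fields are handled, and for simplicity the flag is applied to all sources at once, whereas Elasticsearch sets it per source.

```python
def bucket_keys(docs, sources, missing_bucket=False):
    """Return one composite key per document, honouring missing_bucket.

    `sources` maps a source name to the document field it reads.
    """
    keys = []
    for doc in docs:
        key = {name: doc.get(field) for name, field in sources.items()}
        if not missing_bucket and None in key.values():
            continue  # the document has no value for some source: skip it
        keys.append(key)
    return keys


docs = [
    {"timestamp": "2022-08-28", "FlightTimeMin": 13.0},
    {"timestamp": "2022-08-28"},  # no FlightTimeMin value
]
sources = {"date": "timestamp", "minutes": "FlightTimeMin"}

print(bucket_keys(docs, sources, missing_bucket=False))
print(bucket_keys(docs, sources, missing_bucket=True))
```

With the flag off, only the first document yields a key; with it on, the second document is kept with a null value for the missing source.
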

The after parameter

The after_key returned with the previous page holds the key of that page's last bucket.

"after_key" : {
  "date_histogram_timestamp" : "2022-08-28",
  "terms_FlightTimeMin" : 32.9625244140625
}

Next, set the after parameter to the after_key returned by the previous page, so that the request returns the page that follows it.

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "size": 5,
        "after": {
          "date_histogram_timestamp" : "2022-08-28",
          "terms_FlightTimeMin" : 32.9625244140625
        },
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd"
              }
            }
          },
          {
            "terms_FlightTimeMin": {
              "terms": {
                "field": "FlightTimeMin",
                "missing_bucket": true
              }
            }
          }
        ]
      }
    }
  }
}
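
Paging through every bucket amounts to repeating the request with after set to the previous response's after_key until no bucket (or no after_key) comes back. The Python sketch below shows that loop. It assumes a caller-supplied search(body) function that sends the body to Elasticsearch and returns the parsed JSON response; that function, the helper name iter_composite_buckets, and the fake search used in the demo are all assumptions made for illustration.

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation, page by page."""
    body = copy.deepcopy(body)  # do not modify the caller's request
    composite = body["aggs"][agg_name]["composite"]
    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets = agg.get("buckets", [])
        if not buckets:
            return
        yield from buckets
        after_key = agg.get("after_key")
        if after_key is None:
            return
        composite["after"] = after_key  # start the next page after this key


def fake_search(body):
    """Stand-in for a real Elasticsearch call, serving four fixed buckets."""
    data = [{"key": {"n": i}, "doc_count": 1} for i in range(4)]
    comp = body["aggs"]["pages"]["composite"]
    start = comp["after"]["n"] + 1 if "after" in comp else 0
    page = data[start:start + comp["size"]]
    agg = {"buckets": page}
    if page:
        agg["after_key"] = page[-1]["key"]
    return {"aggregations": {"pages": agg}}


request = {"size": 0, "aggs": {"pages": {"composite": {
    "size": 3,
    "sources": [{"n": {"terms": {"field": "n"}}}],
}}}}
for bucket in iter_composite_buckets(fake_search, request, "pages"):
    print(bucket)
```

In real use, fake_search would be replaced by a thin wrapper around whichever client sends the request; the loop itself only relies on the after and after_key fields described above.
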

Nesting sub-aggregations inside a composite aggregation

GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_timestamp_FlightTimeMin": {
      "composite": {
        "size": 2,
        "after": {
          "date_histogram_timestamp" : "2022-08-28",
          "terms_FlightTimeMin" : 13.010112762451172
        },
        "sources": [
          {
            "date_histogram_timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd"
              }
            }
          },
          {
            "terms_FlightTimeMin": {
              "terms": {
                "field": "FlightTimeMin",
                "missing_bucket": true
              }
            }
          }
        ]
      },
      "aggs": {
        "stats_FlightTimeMin": {
          "stats": {
            "field": "FlightTimeMin"
          }
        }
      }
    }
  }
}

The aggregation output is as follows:

"aggregations" : {
  "composite_timestamp_FlightTimeMin" : {
    "after_key" : {
      "date_histogram_timestamp" : "2022-08-28",
      "terms_FlightTimeMin" : 17.2014217376709
    },
    "buckets" : [
      {
        "key" : {
          "date_histogram_timestamp" : "2022-08-28",
          "terms_FlightTimeMin" : 16.21676254272461
        },
        "doc_count" : 1,
        "stats_FlightTimeMin" : {
          "count" : 1,
          "min" : 16.21676254272461,
          "max" : 16.21676254272461,
          "avg" : 16.21676254272461,
          "sum" : 16.21676254272461
        }
      },
      {
        "key" : {
          "date_histogram_timestamp" : "2022-08-28",
          "terms_FlightTimeMin" : 17.2014217376709
        },
        "doc_count" : 1,
        "stats_FlightTimeMin" : {
          "count" : 1,
          "min" : 17.2014217376709,
          "max" : 17.2014217376709,
          "avg" : 17.2014217376709,
          "sum" : 17.2014217376709
        }
      }
    ]
  }
}
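
Each composite bucket carries its sub-aggregation results under the sub-aggregation's name. As a small illustration, the Python sketch below walks a response shaped like the one above and collects one row per bucket. The sample dictionary is abbreviated from that output, and the function name bucket_rows is hypothetical.

```python
def bucket_rows(response, agg_name, stats_name):
    """Collect (date, flight time, doc_count, avg) from each composite bucket."""
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        key = bucket["key"]
        rows.append((
            key["date_histogram_timestamp"],
            key["terms_FlightTimeMin"],
            bucket["doc_count"],
            bucket[stats_name]["avg"],
        ))
    return rows


# Abbreviated copy of the response shown above (only count and avg are kept).
response = {"aggregations": {"composite_timestamp_FlightTimeMin": {
    "after_key": {"date_histogram_timestamp": "2022-08-28",
                  "terms_FlightTimeMin": 17.2014217376709},
    "buckets": [
        {"key": {"date_histogram_timestamp": "2022-08-28",
                 "terms_FlightTimeMin": 16.21676254272461},
         "doc_count": 1,
         "stats_FlightTimeMin": {"count": 1, "avg": 16.21676254272461}},
        {"key": {"date_histogram_timestamp": "2022-08-28",
                 "terms_FlightTimeMin": 17.2014217376709},
         "doc_count": 1,
         "stats_FlightTimeMin": {"count": 1, "avg": 17.2014217376709}},
    ],
}}}

for row in bucket_rows(response, "composite_timestamp_FlightTimeMin", "stats_FlightTimeMin"):
    print(row)
```
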

Reposted from blog.csdn.net/qq_34561892/article/details/129479453