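Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. To get in touch: 351605040 https://blog.csdn.net/Arvinzr/article/details/79231456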
For an everyday GROUP BY, we all know that AggregationBuilder does the job:
AggregationBuilder aggregationBuilder =
    AggregationBuilders.terms("nameAgg").field("name.keyword").size(Integer.MAX_VALUE) //1
        .subAggregation(AggregationBuilders.terms("jobAgg").field("job.keyword").size(Integer.MAX_VALUE) //2
            .subAggregation(AggregationBuilders.avg("ageAgg").field("age")) //3
            .subAggregation(AggregationBuilders.count("totalNum").field("name.keyword"))); //4
searchSourceBuilder.aggregation(aggregationBuilder);
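The code above works as follows.
(1) First, group by name. The argument to terms(...) is the aggregation name, which can be anything you like; field is the name of the field to aggregate on. The .keyword suffix is there because, without it, the aggregation fails with an error saying that fielddata is not set to true on the field: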
{
  "error": {
    "root_cause": [{
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
    }],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [{
      "shard": 0,
      "index": "school",
      "node": "H7VIRoOwS8mws78T-0Ce-Q",
      "reason": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }]
  },
  "status": 400
}
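This happens because, when the mapping template stores a string field in Elasticsearch, the field ends up with two types: a text type, which is analyzed, and a keyword type, which is not. Adding .keyword targets the non-analyzed sub-field, so the aggregation works. On ES 2.x the non-analyzed sub-field may be called .raw instead, so check your mapping. The size parameter defaults to 10, which means only the top 10 buckets are returned; since I want every group, I pass the maximum value.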
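(2) Next, within each name group, group by job. This is a sub-aggregation of nameAgg, and everything that follows is a sub-aggregation of what precedes it.
(3) After grouping, compute the average age of each group. Since age is a long, no .keyword is needed. From here on, pay close attention to the parentheses: be clear about which aggregation each .subAggregation call is attached to, because attaching it to the wrong one gives different results.
(4) This step is not actually needed, because Elasticsearch automatically reports the doc_count of each bucket once the grouping is done.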
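To make steps (1) to (4) concrete, here is a minimal plain-Java sketch of what the nested aggregation computes, with no Elasticsearch involved. The `Student` class, the sample rows, and the `GroupBySketch` name are all hypothetical; the sketch only mirrors the name, then job, then (average age, document count) shape described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for one document in the "school" index.
final class Student {
    final String name;
    final String job;
    final long age;

    Student(String name, String job, long age) {
        this.name = name;
        this.job = job;
        this.age = age;
    }
}

class GroupBySketch {
    // Group by name, then by job. The statistics object carries both the
    // average age (like ageAgg) and the number of rows (like doc_count).
    static Map<String, Map<String, LongSummaryStatistics>> groupByNameThenJob(List<Student> students) {
        return students.stream().collect(
                Collectors.groupingBy(s -> s.name,
                        Collectors.groupingBy(s -> s.job,
                                Collectors.summarizingLong(s -> s.age))));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("Tom", "teacher", 30),
                new Student("Tom", "teacher", 40),
                new Student("Tom", "driver", 25),
                new Student("Amy", "teacher", 28));

        groupByNameThenJob(students).forEach((name, jobs) ->
                jobs.forEach((job, stats) ->
                        System.out.println(name + " / " + job
                                + " avgAge=" + stats.getAverage()
                                + " count=" + stats.getCount())));
    }
}
```

Because the count comes for free with each group here, the sketch also shows why the separate count aggregation in step (4) is redundant.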
Sometimes, though, you need something like a subquery (nested query) that computes over the aggregation results:
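SumBucketPipelineAggregationBuilder pipelineAggregationBuilder =
    PipelineAggregatorBuilders.sumBucket("countTotalNum", "jobAgg>totalNum"); // the second argument is the buckets path
// If a pipeline aggregation is needed, attach it to the grouping above with
// .subAggregation(pipelineAggregationBuilder). With this path it belongs on nameAgg,
// as a sibling of jobAgg.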
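As a rough illustration of what `sumBucket` computes for the path `jobAgg>totalNum`: for each name bucket, it adds up the `totalNum` value of every `jobAgg` bucket inside it. The plain-Java sketch below mimics that on an in-memory map; the class name and the sample numbers are hypothetical, and no Elasticsearch API is called.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SumBucketSketch {
    // For each outer (name) bucket, add up one metric across its inner (job) buckets.
    static Map<String, Long> sumAcrossJobBuckets(Map<String, Map<String, Long>> totalNumByNameAndJob) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Long>> nameBucket : totalNumByNameAndJob.entrySet()) {
            long sum = 0;
            for (long totalNum : nameBucket.getValue().values()) {
                sum += totalNum;
            }
            sums.put(nameBucket.getKey(), sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, Long> tomJobs = new LinkedHashMap<>();
        tomJobs.put("teacher", 2L);
        tomJobs.put("driver", 1L);

        Map<String, Map<String, Long>> buckets = new LinkedHashMap<>();
        buckets.put("Tom", tomJobs);

        System.out.println(sumAcrossJobBuckets(buckets));
    }
}
```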
Next, put the query and the aggregation into a search request:
String query = searchSourceBuilder.toString();
Search search = new Search.Builder(query).addIndex("school").addType("student").build();
SearchResult result = client.execute(search);
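Since `Search.Builder` takes the request body as a plain JSON string, it can help to see that body written out. The sketch below assembles a hand-written query DSL equivalent of the aggregation built earlier. It is not the literal output of `searchSourceBuilder.toString()`, which may differ in key names and whitespace, and the `AggRequestSketch` class is hypothetical.

```java
class AggRequestSketch {
    // Hand-written query DSL: group by name, then by job, with avg(age)
    // and a value_count inside each job bucket.
    static String aggregationBody() {
        String size = String.valueOf(Integer.MAX_VALUE);
        return String.join("\n",
                "{",
                "  \"aggs\": {",
                "    \"nameAgg\": {",
                "      \"terms\": { \"field\": \"name.keyword\", \"size\": " + size + " },",
                "      \"aggs\": {",
                "        \"jobAgg\": {",
                "          \"terms\": { \"field\": \"job.keyword\", \"size\": " + size + " },",
                "          \"aggs\": {",
                "            \"ageAgg\": { \"avg\": { \"field\": \"age\" } },",
                "            \"totalNum\": { \"value_count\": { \"field\": \"name.keyword\" } }",
                "          }",
                "        }",
                "      }",
                "    }",
                "  }",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(aggregationBody());
    }
}
```

Under the same assumption, a string like this could be handed to `new Search.Builder(...)` in place of `query`, which is handy for checking parenthesis placement against the builder version.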
What remains is reading the results, and the main question is how to get at the aggregations:
// First take the outermost aggregation and get its buckets
List<TermsAggregation.Entry> nameAgg =
        result.getAggregations().getTermsAggregation("nameAgg").getBuckets();
// Loop over each name bucket, take the aggregation inside it, then get that aggregation's buckets
for (TermsAggregation.Entry entry : nameAgg) {
    List<TermsAggregation.Entry> jobAgg = entry.getTermsAggregation("jobAgg").getBuckets();
    // Loop over each job bucket
    for (TermsAggregation.Entry jobEntry : jobAgg) {
        // Average age of this group; getAvg() returns a Double, so it cannot be assigned to a long
        Double avgAge = jobEntry.getAvgAggregation("ageAgg").getAvg();
        // doc_count is already available here, so the count aggregation (step 4) can be omitted
        long count = jobEntry.getCount();
        // ... other operations ...
    }
    // ... other operations ...
}