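Background: A large number of companies in China use Elasticsearch, including well-known names such as Alibaba, JD.com, Didi, Toutiao, Xiaomi, and vivo. Beyond search, Elasticsearch is also widely used, together with Kibana and Logstash as part of the Elastic Stack, for near-real-time big data analytics such as log analysis and metrics monitoring.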
This section: how sorting works in Elasticsearch.
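By default, Elasticsearch sorts search results by relevance (_score), so the most relevant documents come first. Sorting is used constantly in day-to-day business queries, so in this section we look at what the sort parameter means and how to use it.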
1. Sorting by _score (the default)
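To sort by relevance, relevance has to be expressed as a number. In Elasticsearch, the relevance score is a floating-point number returned in the _score field of each hit, and results are sorted by _score in descending order by default.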
http://localhost:9201/student/_search
Query request, for example to fetch the document whose id is 1:
{
"query" : {
"bool" : {
"filter" : {
"term" : {
"id" : 1
}
}
}
}
}
The result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0,
"hits": [
{
"_index": "student",
"_type": "_doc",
"_id": "1",
"_score": 0, // relevance score; meaningless here
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-05-28 14:19:05",
"name": "test1",
"id": "1",
"age": 1
}
}
]
}
}
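This relevance score has no real business meaning in production. Because the query uses a filter, we only want the documents whose id is 1 and make no attempt to rank them by relevance. If several documents match, they are returned in no particular order, and every one of them gets a score of zero.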
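If we would rather not get this meaningless zero score, we can replace the bool query with a constant_score query, which gives every matching document the same score: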
{
"query" : {
"constant_score" : { // constant_score replaces the bool query above
"filter" : {
"term" : {
"id" : 1
}
}
}
}
}
The final result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "student",
"_type": "_doc",
"_id": "1",
"_score": 1, // constant score, 1 by default
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-05-28 14:19:05",
"name": "test1",
"id": "1",
"age": 1
}
}
]
}
}
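With the same filter condition, every returned document now has a constant _score of 1 (the constant can be changed with the boost parameter of constant_score).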
2. Sorting by a single field
In real business scenarios, results are usually sorted by a single business field, such as a number or a date.
Request: for example, to sort students by creation time in descending order, use the sort parameter:
{
"query": {
"bool": {
"filter": {
"term": {
"id": 1
}
}
}
},
"sort": {
"createTime": {
"order": "desc"
}
}
}
The response:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": null, // returned as null
"hits": [
{
"_index": "student",
"_type": "_doc",
"_id": "1",
"_score": null, // returned as null
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-05-28 14:19:05",
"name": "test1",
"id": "1",
"age": 1
},
"sort": [ // new element
1653747545000 // value of the sort field
]
}
]
}
}
Note that _score is now null, which means _score was not used for sorting.
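Each hit now contains a new sort element holding the values used for sorting. In this example we sort on createTime, which is indexed internally as the number of milliseconds since the epoch (January 1, 1970 00:00:00 UTC), and that number is what the sort element returns. The long value 1653747545000 corresponds to 2022-05-28 14:19:05 UTC; a date string without a time zone, like the createTime value in _source, is interpreted as UTC.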
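As a quick check of that conversion, here is a minimal Python sketch, assuming (as Elasticsearch does for date strings without a time zone) that createTime is interpreted as UTC. The helper names to_epoch_millis and from_epoch_millis are hypothetical and not part of any Elasticsearch client.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the mapping's "yyyy-MM-dd HH:mm:ss"


def to_epoch_millis(value: str) -> int:
    """Parse a zone-less date string as UTC and return milliseconds since the epoch."""
    dt = datetime.strptime(value, DATE_FORMAT).replace(tzinfo=timezone.utc)
    # Integer division by a 1 ms timedelta keeps the result exact (no float rounding).
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> str:
    """Format epoch milliseconds back into a readable UTC date string."""
    dt = EPOCH + timedelta(milliseconds=millis)
    return dt.strftime(DATE_FORMAT) + " UTC"


print(to_epoch_millis("2022-05-28 14:19:05"))  # 1653747545000
print(from_epoch_millis(1653747545000))        # 2022-05-28 14:19:05 UTC
```

If the source strings were meant as local time (for example UTC+8), the mapping format would need a time zone, otherwise the stored value will be 8 hours off from what the writer intended.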
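Also, both _score and max_score are null. Computing _score has a performance cost, and scores are mainly useful for ordering by relevance; when results are sorted by another field they are not needed, so Elasticsearch does not track them. If your use case really does need _score, add the track_scores parameter to the request and set it to true: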
{
"query": {
"bool": {
"filter": {
"term": {
"id": 1
}
}
}
},
"track_scores": true, // set track_scores to true
"sort": {
"createTime": {
"order": "desc"
}
}
}
Field values are sorted in ascending order by default, whereas _score is sorted in descending order by default.
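To make the default directions concrete, the following Python sketch expands a sort spec into its explicit long form. normalize_sort is a hypothetical helper written for illustration; it only encodes the rule above (desc for _score, asc for everything else) and does not validate fields the way Elasticsearch does.

```python
from __future__ import annotations

from typing import Any


def normalize_sort(sort: Any) -> list[dict[str, dict[str, Any]]]:
    """Expand a sort spec into explicit {field: {"order": ...}} entries (hypothetical helper)."""
    # A single string or object is treated like a one-element sort array.
    items = sort if isinstance(sort, list) else [sort]
    normalized = []
    for item in items:
        if isinstance(item, str):  # "createTime"
            field, options = item, {}
        else:  # {"createTime": "desc"} or {"createTime": {"order": "desc"}}
            field, options = next(iter(item.items()))
            if isinstance(options, str):
                options = {"order": options}
        # Default direction: descending for _score, ascending for any other field.
        default_order = "desc" if field == "_score" else "asc"
        normalized.append({field: {**options, "order": options.get("order", default_order)}})
    return normalized


print(normalize_sort([{"createTime": "desc"}, "_score", "name"]))
# [{'createTime': {'order': 'desc'}}, {'_score': {'order': 'desc'}}, {'name': {'order': 'asc'}}]
```

Writing order explicitly in every sort clause, as the requests in this article do, avoids relying on these defaults at all.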
3. Sorting by multiple fields
Suppose we want to combine createTime and _score: matching results are sorted by date first, then by relevance.
{
"query": {
"bool": {
"must": {
"match": {
"love": "I like to collect rock albums"
}
},
"filter": {
"term": {
"id": 1
}
}
}
},
"sort": [
{
"createTime": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
}
]
}
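The order of the sort criteria matters. Results are sorted by the first criterion; the second criterion is applied only to results whose first sort value is exactly the same, and so on.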
A multi-level sort does not have to include _score. Depending on your business needs, you can sort on any combination of fields.
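The tie-breaking rule can be simulated locally. This Python sketch, using made-up hits, mimics a sort of createTime desc followed by _score desc with a tuple key; it illustrates the ordering rule only, not how Elasticsearch implements sorting internally. Note that Python's sorted is stable, whereas Elasticsearch does not guarantee any particular order for hits whose sort values are all equal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    id: str
    create_time: int  # epoch milliseconds, as in the "sort" values above
    score: float


def sort_by_date_then_score(hits: list[Hit]) -> list[Hit]:
    # Mirrors: "sort": [{"createTime": {"order": "desc"}}, {"_score": {"order": "desc"}}]
    # Tuples compare element by element, so score only decides the order
    # when two hits have exactly the same create_time.
    return sorted(hits, key=lambda h: (-h.create_time, -h.score))


hits = [Hit("a", 1000, 0.5), Hit("b", 2000, 0.1), Hit("c", 1000, 0.9)]
print([h.id for h in sort_by_date_then_score(hits)])  # ['b', 'c', 'a']
```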
4. Sorting on multi-valued fields
Here a single field holds several values, and those values have no inherent order. When a field has multiple values, which one should be used for sorting?
For numbers and dates, you can reduce a multi-valued field to a single value using the min, max, avg, or sum sort mode.
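The following Python sketch illustrates what each mode does: every document's values are reduced to one number, and documents are then ordered by that number. REDUCERS and sort_by_mode are hypothetical names, the values are made up, and the sketch assumes every document has at least one value. In Elasticsearch, if mode is omitted, ascending sorts use min and descending sorts use max.

```python
from __future__ import annotations

from typing import Callable

# How each sort mode collapses a field's values into a single sort key.
REDUCERS: dict[str, Callable[[list[float]], float]] = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
}


def sort_by_mode(docs: dict[str, list[float]], mode: str, order: str = "asc") -> list[str]:
    """Return document ids ordered by the reduced value of their multi-valued field."""
    reduce_values = REDUCERS[mode]
    return sorted(docs, key=lambda doc_id: reduce_values(docs[doc_id]), reverse=(order == "desc"))


docs = {"1": [3, 10], "2": [5], "3": [1, 20]}  # made-up values
print(sort_by_mode(docs, "min"))          # ['3', '1', '2']
print(sort_by_mode(docs, "max", "desc"))  # ['3', '1', '2']
```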
For example, to sort on the earliest date in each document's createTime field:
{
"query": {
"bool": {
"must": {
"match": {
"love": "I like to collect rock albums"
}
},
"filter": {
"term": {
"id": 1
}
}
}
},
"sort": {
"createTime": {
"order": "asc",
"mode": "min"
}
}
}
The result:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "student",
"_type": "_doc",
"_id": "1",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-05-28 14:19:05",
"name": "test1",
"id": "1",
"age": 1
},
"sort": [
1653747545000
]
}
]
}
}
This scenario is relatively rare in production; whether to use it depends on your business requirements.
5. String sorting and multi-fields
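In some business scenarios we need to sort on the value of a string field. In a relational database this is a simple ORDER BY, but in Elasticsearch an analyzed string field is indexed as a set of individual terms, so sorting on it directly rarely gives the expected result. How does Elasticsearch handle this?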
To sort on a string field, the field must be indexed as not_analyzed, so the whole string is a single term. But we still need an analyzed field to run full-text queries.
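To see why an analyzed field is a poor sort key, here is a rough Python simulation. It assumes a simplified analyzer (lowercase, split on non-word characters) as a stand-in for the standard analyzer, and that an ascending sort over the analyzed terms picks the smallest term, as the min mode would. In Elasticsearch 5.x and later, sorting on a text field is rejected unless fielddata is enabled; this sketch only models the resulting order, and the titles are made up.

```python
from __future__ import annotations

import re


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer: lowercase, split on non-word characters.
    return [term for term in re.split(r"\W+", text.lower()) if term]


def analyzed_sort_key(text: str) -> str:
    # An ascending sort over the analyzed terms effectively uses the smallest term ("min").
    return min(analyze(text))


def raw_sort_key(text: str) -> str:
    # A not_analyzed / keyword field indexes the whole string as a single term.
    return text


titles = ["Zebra and apples", "Banana split"]  # made-up values
print(sorted(titles, key=analyzed_sort_key))  # ['Zebra and apples', 'Banana split']
print(sorted(titles, key=raw_sort_key))       # ['Banana split', 'Zebra and apples']
```

With the analyzed terms, "Zebra and apples" sorts first because its smallest term is "and"; with the whole string as the key, the order is the intuitive one.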
A simple solution is to store the same string in two fields: one analyzed for search, and one not_analyzed for sorting.
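However, storing the same string twice in _source wastes space. What we really want is to pass in a single field but index it in two different ways. All core field types (strings, numbers, Booleans, dates) accept a fields parameter for exactly this purpose.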
So when creating the mapping, we can configure the field as follows:
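// before 5.x (string type)
"love": {
"type": "string",
"fields": {
"raw": { // sub-field
"type": "string",
"index": "not_analyzed" // not analyzed, indexed as one term
}
}
}
// 5.x and later, including 7.x (text + keyword)
"love": {
"type": "text", // analyzed full-text field
"fields": {
"raw": {
"type": "keyword" // not analyzed, used for sorting
}
}
}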
The love field is the same as before: an analyzed full-text field. The new love.raw sub-field is not_analyzed (a keyword field).
Now, once our data has been reindexed, we can use the love field for search and the love.raw field for sorting.
Example request:
{
"sort": "love.raw"
}
If the sub-field has not been defined in the mapping, the request fails with the following error:
{
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "No mapping found for [love.raw] in order to sort on",
"index_uuid": "PJE50ZroS4OiTMObGhkw7Q",
"index": "student"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "student",
"node": "ufFZIzzWQkaNgoJXsUn3Sg",
"reason": {
"type": "query_shard_exception",
"reason": "No mapping found for [love.raw] in order to sort on",
"index_uuid": "PJE50ZroS4OiTMObGhkw7Q",
"index": "student"
}
}
]
},
"status": 400
}
In that case, recreate the index with a mapping like the following and reindex the data:
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"love": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"createTime": {
"format": "yyyy-MM-dd HH:mm:ss",
"type": "date"
}
}
}
}
The final result:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 20,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "student",
"_type": "_doc",
"_id": "1",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:16",
"name": "test9",
"id": "1",
"age": 1
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "3",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:17",
"name": "test9",
"id": "3",
"age": 3
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "5",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:17",
"name": "test9",
"id": "5",
"age": 5
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "7",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:18",
"name": "test9",
"id": "7",
"age": 7
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "9",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:18",
"name": "test9",
"id": "9",
"age": 9
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "11",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:19",
"name": "test9",
"id": "11",
"age": 11
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "13",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:19",
"name": "test9",
"id": "13",
"age": 13
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "15",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:19",
"name": "test9",
"id": "15",
"age": 15
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "17",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:20",
"name": "test9",
"id": "17",
"age": 17
},
"sort": [
"I like to collect rock albums"
]
},
{
"_index": "student",
"_type": "_doc",
"_id": "19",
"_score": null,
"_source": {
"love": "I like to collect rock albums",
"createTime": "2022-06-03 17:37:20",
"name": "test9",
"id": "19",
"age": 19
},
"sort": [
"I like to collect rock albums"
]
}
]
}
}