Using the Elasticsearch 7 Query DSL

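In Elasticsearch, searches are expressed as JSON-based queries. A query is built from two kinds of clauses:
Leaf query clauses: these look for a particular value in a particular field, for example match, term, or range.
Compound query clauses: these combine leaf query clauses and other compound queries to retrieve the required information.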

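First, prepare some test data. (Mapping types are deprecated in Elasticsearch 7, so requests that name the user type trigger a deprecation warning; the typeless form is PUT /yule/_doc/1.)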

PUT /yule/user/1
{
  "name":"zhangliangying",
  "address":"bei jing hai dian qu qing he zhen",
  "age":35,
  "birthday":"1984-08-13",
  "interests":"xi huan hejiu,duanlian,changge"
}

PUT /yule/user/2
{
  "name":"liuyifei",
  "address":"shang hai huang pu qu qing he zhen",
  "age":36,
  "birthday":"1983-06-13",
  "interests":"xi huan yingshi,duanlian,changge"
}

PUT /yule/user/3
{
  "name":"liruotong",
  "address":"hong kong zhang jia zai",
  "age":38,
  "birthday":"1981-02-23",
  "interests":"xi huan dianshi,duanlian,lvyou"
}

PUT /yule/user/4
{
  "name":"yangmi",
  "address":"hang zhou xi hu qu",
  "age":34,
  "birthday":"1985-02-16",
  "interests":"xi huan youxi,duanlian,lvyou"
}

Match All Query

This is the most basic query: it returns all documents and gives each one a score of 1.0. For example:

GET /yule/user/_search
{
    "query": {
        "match_all": {}
        },
        "_source":{
          "includes": ["name" ,"age"]
        }
    
}

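/_search searches the documents under /yule/user.

query is the keyword that introduces the query; similarly, aggs introduces aggregations.

match_all matches every document; match_none matches none.

_source restricts which fields are returned; here name and age are requested, and fields that are not needed are filtered out.

Result: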

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangliangying",
          "interests" : "xi huan hejiu,duanlian,changge",
          "age" : 35
        }
      },
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "liuyifei",
          "interests" : "xi huan yingshi,duanlian,changge",
          "age" : 36
        }
      },
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "yangmi",
          "interests" : "xi huan youxi,duanlian,lvyou",
          "age" : 34
        }
      },
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "liruotong",
          "interests" : "xi huan dianshi,duanlian,lvyou",
          "age" : 38
        }
      }
    ]
  }
}

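took: the time, in milliseconds, that the whole search request took to execute.

timed_out: whether the query timed out.

Note that results are still returned when timed_out is true. They are whatever Elasticsearch had gathered when the timeout expired, so they may be incomplete.

Also, after you receive timed_out: true, the connection has been closed but the query itself has not ended; it keeps running in the background.

_shards: information about the shards involved in the query: how many succeeded, how many failed, and so on.

hits: information about the matching documents. total is the total number of matching documents, and max_score is the highest _score among them.

The hits array inside hits holds the matching documents, by default the first ten. Each entry carries the document's _index, _type, _id, _score, and _source.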
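
As a small illustration of the response layout described above, the sketch below pulls the total and the _source documents out of a response that has already been parsed into a Python dict. The summarize_hits helper and the trimmed response sample are assumptions made for this example; they are not part of any Elasticsearch client.

```python
# Hypothetical helper: summarize a parsed _search response.
def summarize_hits(response):
    hits = response["hits"]
    total = hits["total"]
    # ES 7 reports the total as {"value": N, "relation": ...};
    # older versions returned a bare integer.
    total_value = total["value"] if isinstance(total, dict) else total
    return {
        "timed_out": response["timed_out"],
        "total": total_value,
        "max_score": hits["max_score"],
        "sources": [hit["_source"] for hit in hits["hits"]],
    }


# Trimmed sample modelled on the match_all response (two of the four hits).
response = {
    "took": 2,
    "timed_out": False,
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "yule", "_id": "1", "_score": 1.0,
             "_source": {"name": "zhangliangying", "age": 35}},
            {"_index": "yule", "_id": "2", "_score": 1.0,
             "_source": {"name": "liuyifei", "age": 36}},
        ],
    },
}

summary = summarize_hits(response)
print(summary["total"], [doc["name"] for doc in summary["sources"]])
```

Note that total counts every matching document, while the hits array only holds the page that was actually returned.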

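Paginated Queries

size: the number of results to return at once, i.e. the number of documents in hits. Defaults to 10.

from: the offset of the first result to return. Defaults to 0.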

GET /yule/user/_search
{
  "size":2,
  "from":1,
    "query": {
        "match_all": {}
        },
        "_source":{
          "includes": ["name" ,"age"]
        }
    
}
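
Because from is a document offset rather than a page number, application code usually has to convert one into the other. The following is a minimal sketch of that conversion; page_params is a hypothetical helper name, and pages are assumed to be numbered from 1.

```python
# Hypothetical helper: turn a 1-based page number into from/size.
def page_params(page, page_size=10):
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


# Request body for the second page, two documents per page.
body = {"query": {"match_all": {}}, **page_params(page=2, page_size=2)}
print(body)
```

By default Elasticsearch rejects requests where from + size exceeds 10000 (the index.max_result_window setting), so this style of paging suits shallow result sets.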

Match Query

This query matches text or a phrase against the value of a field. For example:

get /yule/user/_search
{
  "query":{
    "match":{
      "interests":"lvyou"
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "yangmi",
          "address" : "hang zhou xi hu qu",
          "age" : 34,
          "birthday" : "1985-02-16",
          "interests" : "xi huan youxi,duanlian,lvyou"
        }
      },
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "3",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "liruotong",
          "address" : "hong kong zhang jia zai",
          "age" : 38,
          "birthday" : "1981-02-23",
          "interests" : "xi huan dianshi,duanlian,lvyou"
        }
      }
    ]
  }
}

multi_match

Runs the same match query against several fields.

get /yule/user/_search
{
  "query":{
    "multi_match":{
      "query":"lvyou",
      "fields":["interests","name"]
    }
  }
}

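For more complex requests, query_string lets you combine several conditions in a single query string:

GET /yule/user/_search
{
  "query":{
    "query_string":{
      "query":"age:34 OR (name:liuyifei)"
    }
  }
}

The related simple_query_string keyword replaces the AND and OR of query_string with symbols such as + and |.

Common query_string patterns:

{"query":{"query_string":{"query":"name:obama"}}}

The name field contains obama.

{"query":{"query_string":{"query":"nam\\*:obama"}}}

Some field whose name starts with nam contains obama.

{"query":{"query_string":{"query":"NOT _exists_:name"}}}

Documents whose name field is null or missing. (The older _missing_:name form is no longer supported in Elasticsearch 7.)

{"query":{"query_string":{"query":"_exists_:name"}}}

Documents whose name field has a non-null value.

{"query":{"query_string":{"query":"name:(obama OR xidada)"}}}

Documents whose name field contains obama or xidada.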

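term is used for exact matching. The value can be a number, a date, a boolean, or a string that is not analyzed (a keyword field in Elasticsearch 7; a not_analyzed string in older versions).

term means an exact match: the search value is not run through an analyzer, so the document must contain the whole value as a single indexed term.

Because term does not analyze its input, the value is compared as-is. To match any of several values, use terms: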

get /yule/user/_search
{
  "query":{
    "terms":{
      "interests":["duanlian","lvyou"]
    }
  }
}

Result:

{
  "took": 37,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "yule",
        "_type": "user",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "yangmi",
          "address": "hang zhou xi hu qu",
          "age": 34,
          "birthday": "1985-02-16",
          "interests": "xi huan youxi,duanlian,lvyou"
        }
      },
      {
        "_index": "yule",
        "_type": "user",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "liuyifei",
          "address": "shang hai huang pu qu qing he zhen",
          "age": 36,
          "birthday": "1983-06-13",
          "interests": "xi huan yingshi,duanlian,changge"
        }
      },
      {
        "_index": "yule",
        "_type": "user",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "zhangliangying",
          "address": "bei jing hai dian qu qing he zhen",
          "age": 35,
          "birthday": "1984-08-13",
          "interests": "xi huan hejiu,duanlian,changge"
        }
      },
      {
        "_index": "yule",
        "_type": "user",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "liruotong",
          "address": "hong kong zhang jia zai",
          "age": 38,
          "birthday": "1981-02-23",
          "interests": "xi huan dianshi,duanlian,lvyou"
        }
      }
    ]
  }
}

A term versus match comparison:

GET /yule/user/_search
{
  "query":{
    "term":{
      "interests":"xi youxi"
    }
  }
}

and

GET /yule/user/_search
{
  "query":{
    "term":{
      "interests":"xi huan youxi"
    }
  }
}

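Neither query matches any document: the interests field is analyzed into separate terms, and no single term equals the whole search string.

Whereas: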

GET /yule/user/_search
{
  "query":{
    "match":{
      "interests":"xi duanlian"
    }
  }
}

does return results:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.36464313,
    "hits" : [
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "4",
        "_score" : 0.36464313,
        "_source" : {
          "name" : "yangmi",
          "address" : "hang zhou xi hu qu",
          "age" : 34,
          "birthday" : "1985-02-16",
          "interests" : "xi huan youxi,duanlian,lvyou"
        }
      }
    ]
  }
}
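
The difference between term and match in the examples above comes down to analysis. The sketch below imitates it in plain Python, assuming interests is a text field using the default standard analyzer. rough_analyze is only a crude stand-in for that analyzer (lowercase, split on anything that is not a letter or digit), and term_matches and match_matches are hypothetical names.

```python
import re


def rough_analyze(text):
    """Crude stand-in for the standard analyzer: lowercase and split."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


# Terms that would be indexed for document 4's interests field.
indexed_terms = set(rough_analyze("xi huan youxi,duanlian,lvyou"))


def term_matches(value):
    # term: the value is looked up unchanged, as one single term.
    return value in indexed_terms


def match_matches(text):
    # match (default operator OR): the query text is analyzed first,
    # then any one of its tokens may match.
    return any(tok in indexed_terms for tok in rough_analyze(text))


print(term_matches("xi huan youxi"))  # False: no single term equals the phrase
print(term_matches("youxi"))          # True: one indexed term matches exactly
print(match_matches("xi duanlian"))   # True: both tokens are indexed terms
```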

range

range finds numbers or dates that fall within a given interval.

get /yule/user/_search
{
  "query":{
    "range":{
      "age":{
        "gte":34,
        "lt": 36
      }
    }
  }
}

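There are four operators: gt (greater than), gte (greater than or equal to), lt (less than), and lte (less than or equal to).

When a range query uses dates, pay attention to the date format. Two formats are mainly supported:

Timestamps, with millisecond granularity

Date strings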
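
To make the two date formats concrete, here is a hedged sketch that builds the same lower bound on birthday both ways. It assumes birthday is mapped as a date field with the default format, which accepts both ISO-style date strings and epoch milliseconds; to_epoch_millis is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


start = datetime(1984, 1, 1, tzinfo=timezone.utc)

# The same bound expressed as a millisecond timestamp and as a date string.
range_by_millis = {"range": {"birthday": {"gte": to_epoch_millis(start)}}}
range_by_string = {"range": {"birthday": {"gte": "1984-01-01"}}}

print(range_by_millis)
print(range_by_string)
```

Passing a naive datetime (one without tzinfo) raises a TypeError here, which is deliberate: a timestamp is only well defined once the time zone is known.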

Wildcard Queries:

Use the wildcard keyword.
It uses standard shell-style wildcards: ? matches any single character, and * matches zero or more characters.

GET /yule/user/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*yi*"
      }
    }
  }
}

Result:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "yule",
        "_type": "user",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "liuyifei",
          "address": "shang hai huang pu qu qing he zhen",
          "age": 36,
          "birthday": "1983-06-13",
          "interests": "xi huan yingshi,duanlian,changge"
        }
      },
      {
        "_index": "yule",
        "_type": "user",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "zhangliangying",
          "address": "bei jing hai dian qu qing he zhen",
          "age": 35,
          "birthday": "1984-08-13",
          "interests": "xi huan hejiu,duanlian,changge"
        }
      }
    ]
  }
}
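
The ? and * semantics can be tried out locally with Python's fnmatch module, as in the sketch below. This only approximates the wildcard query: fnmatch also gives special meaning to [...] character classes, which Elasticsearch wildcards do not have, and wildcard_filter is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# The name values from the test data.
names = ["zhangliangying", "liuyifei", "liruotong", "yangmi"]


def wildcard_filter(pattern, values):
    """Keep the values that match a shell-style pattern, case-sensitively."""
    return [value for value in values if fnmatchcase(value, pattern)]


print(wildcard_filter("*yi*", names))    # ['zhangliangying', 'liuyifei']
print(wildcard_filter("yang??", names))  # ['yangmi']
```

The first call agrees with the two hits in the result above. A pattern that starts with * has to be checked against every indexed term, so leading wildcards can be slow on large indices.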

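Compound Queries

We often need to combine many conditions to produce the final result. Elasticsearch provides the bool query for this.

Queries and filters have been merged (in ES 1.x they were separate, and there was a dedicated filtered query type).

In later versions (2.x, 5.x, 6.x, and newer), any query clause can be used as a query in query context and as a filter in filter context.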

post /yule/user/_search
{
  "query": {

        "bool": {
          "must": [
            {
              "match": {
                "interests": "duanlian"
              }
            }
          ],
          "must_not": [
            {
              "match": {
                "name": "liuyifei"
              }
            }
          ],
          "should": [
            {
              "term": {
                "address": "bei"
              }
            },
            {
              "term": {
                "address": "an"
              }
            }
          ],
          "filter": [
            {
              "range": {
                "age": {
                  "gte": 34,
                  "lt": 37
                }
              }
            }
          ]
    
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2054763,
    "hits" : [
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.2054763,
        "_source" : {
          "name" : "zhangliangying",
          "address" : "bei jing hai dian qu qing he zhen",
          "age" : 35,
          "birthday" : "1984-08-13",
          "interests" : "xi huan hejiu,duanlian,changge"
        }
      },
      {
        "_index" : "yule",
        "_type" : "user",
        "_id" : "4",
        "_score" : 0.105360515,
        "_source" : {
          "name" : "yangmi",
          "address" : "hang zhou xi hu qu",
          "age" : 34,
          "birthday" : "1985-02-16",
          "interests" : "xi huan youxi,duanlian,lvyou"
        }
      }
    ]
  }
}

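Four keywords define how the clauses are combined:

must: like AND in SQL; the clause must match.

must_not: like NOT in SQL; the clause must not match.

filter: like must, but it does not contribute to the relevance score (_score). Log searches rarely need relevance ranking, so prefer filter for them.

should: each matching clause raises _score, and a non-matching clause does no harm. When the bool query also has must or filter clauses, should affects only the _score of the results, not which documents are returned.

A closer look at should:

To get the OR of a traditional database in Elasticsearch, you use should inside bool. For A OR B, the query ought to return only documents that match at least one of A and B. The query above, however, also returns documents that match neither A nor B.

The official Elasticsearch documentation describes should as follows: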

should

The clause (query) should appear in the matching document. If the bool query is in a query 
context and has a must or filter clause then a document will match the bool query even if 
none of the should queries match. In this case these clauses are only used to influence the 
score. If the bool query is a filter context or has neither must or filter then at least 
one of the should queries must match a document for it to match the bool query. This
 behavior may be explicitly controlled by settings the minimum_should_match parameter.

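In other words: if a bool query contains filter or must clauses in addition to should, a document can match without satisfying any of the should clauses; those clauses then simply add nothing to its _score. So the query above returns the same hits total with or without the should clauses; only the scores differ.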
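
The rule just described can be modelled in a few lines of Python. This is a simplified sketch rather than Elasticsearch's implementation: bool_matches is a hypothetical function, each argument lists whether the corresponding clauses matched one document, and the default for minimum_should_match follows the Elasticsearch 7 rule (1 when there are should clauses but no must or filter clauses, otherwise 0).

```python
def bool_matches(must=(), must_not=(), should=(), filter_=(),
                 minimum_should_match=None):
    """Decide whether one document matches a bool query (simplified model)."""
    if minimum_should_match is None:
        # should becomes optional as soon as must or filter is present.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    return (
        all(must)
        and all(filter_)
        and not any(must_not)
        and sum(should) >= minimum_should_match
    )


# Document 4 (yangmi): passes must and filter, matches neither should clause.
doc4 = dict(must=[True], must_not=[False], should=[False, False], filter_=[True])

print(bool_matches(**doc4))                          # True: should is optional
print(bool_matches(**doc4, minimum_should_match=1))  # False: one should required
print(bool_matches(should=[False, False]))           # False: should-only query
```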

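There are two ways to get the OR behavior of a traditional database:

1. Move the should clauses into a nested bool query inside must, alongside filter.

2. Use the minimum_should_match parameter from the official documentation:

minimum_should_match sets the minimum number of should clauses that must match. With minimum_should_match set to 1, at least one of the should clauses has to match. The query looks like this: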

post /yule/user/_search
{
  "query": {

        "bool": {
          "must": [
            {
              "match": {
                "interests": "duanlian"
              }
            }
          ],
          "must_not": [
            {
              "match": {
                "name": "liuyifei"
              }
            }
          ],
          "should": [
            {
              "term": {
                "address": "bei"
              }
            },
            {
              "term": {
                "address": "an"
              }
            }
          ],
          "minimum_should_match" : 1,
          "filter": [
            {
              "range": {
                "age": {
                  "gte": 34,
                  "lt": 37
                }
              }
            }
          ]
    
    }
  }
}

Against the test data above, this request should return only document 1 (zhangliangying), whose address contains bei. Document 4 (yangmi) matches neither should clause, so it is no longer returned.

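Values accepted by minimum_should_match:

1. minimum_should_match:"3"

A fixed number, regardless of how many optional clauses there are.

2. minimum_should_match:"-2"

The total number of optional clauses minus this number is required.

3. minimum_should_match:"75%"

The minimum share of optional clauses that must match. With five optional clauses, 5 * 75% = 3.75, rounded down to 3, so at least three of the five must match.

4. minimum_should_match:"-25%"

Like the previous form but computed differently: this share of the clauses may be missing. With five clauses, 5 * 25% = 1.25, rounded down to 1, so at least 5 - 1 = 4 must match.

5. minimum_should_match:"3<90%"

If the number of optional clauses is less than or equal to the number before <, all of them are required; if it is greater, the specification after < applies. In this example, with 1 to 3 clauses all of them are required, and with 4 or more only 90% need to match.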
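
The five forms above can be checked with a short calculator. This is a sketch of the documented arithmetic only: required_should_matches is a hypothetical name, it handles a single N<spec condition (not several combined), and it does not model what Elasticsearch does when the result exceeds the number of optional clauses.

```python
def required_should_matches(spec, optional_count):
    """How many of optional_count should clauses a spec requires."""
    spec = str(spec).strip()
    if "<" in spec:
        # "3<90%": at or below the threshold, every clause is required.
        threshold, rest = spec.split("<", 1)
        if optional_count <= int(threshold):
            return optional_count
        return required_should_matches(rest, optional_count)
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Round down first; a negative percentage is then subtracted.
        share = optional_count * abs(percent) // 100
        result = optional_count - share if percent < 0 else share
    else:
        number = int(spec)
        result = optional_count + number if number < 0 else number
    return max(0, result)


for spec in ["3", "-2", "75%", "-25%", "3<90%"]:
    print(spec, "->", required_should_matches(spec, 5))
```

With five optional clauses this prints 3, 3, 3, 4, and 4, in line with the worked figures above (for "3<90%", five clauses exceed the threshold, so 90% of 5 rounds down to 4).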

A combined example:

GET /applog-2019-11-11/logs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "properties.point_id": "22632"
          }
        },
        {
          "terms": {
            "app_id": [
              "587f69909c9749ac8f98562527e948a2",
              "78cbafbbb0a94edb85126bf7350c198e",
              "962725006cdb4a45a3c84734f29b6ede"
            ]
          }
        },
        {
          "term": {
            "account_id": {
              "value": "127321"
            }
          }
        },
        {
          "term": {
            "app_version": {
              "value": "3.68.11101.98"
            }
          }
        }
      ],
      "filter": [
        {
          "range": {
            "time": {
              "gte": "2019-11-11 10:10:00",
              "lte": "2019-11-11 10:13:00",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}

liuhmmjj
