Understanding Elasticsearch Query Statements

Reposted from https://blog.csdn.net/hololens/article/details/78932628

{"query": {"match_all": {}}}

The query part defines what we are searching for, and match_all names the type of query to run: it matches every document in the index.

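Besides the query parameter, other parameters can shape the search results. The size parameter sets how many results are returned:

{"query": {"match_all": {}}, "size": 1}

If size is not specified, it defaults to 10.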

POST  /testxufeifei/_search
{
   "query":{"match_all":{}},
   "from":2,
   "size":2
}

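The from parameter (zero-based; 2 here) specifies the offset of the first hit to return, and the size parameter specifies how many hits to return starting from that offset. This is useful for paginating search results. If from is not specified, it defaults to 0.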
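The relationship between a page number and the from/size pair can be sketched as follows. This is only an illustration: page_request is a hypothetical helper name, and it assumes zero-based page numbers. It builds the request body and does not contact a cluster.

```python
import json

def page_request(page, page_size):
    """Build a match_all search body for a zero-based page number."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": page * page_size,  # number of hits to skip
        "size": page_size,         # number of hits to return
    }

# Second page (index 1) with two hits per page gives from=2, size=2,
# the same body as the request shown above.
body = page_request(page=1, page_size=2)
print(json.dumps(body))
```

The printed body can be sent as the payload of POST /testxufeifei/_search.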

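The following example uses match_all, sorts the results by account balance in descending order, and returns the top 10 documents (the default size):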

{"query":{"match_all":{}},"sort":{"balance":{"order":"desc"}}}

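First, let's look at the fields returned for each document. By default, the full JSON document is returned as part of every search hit. This JSON content is called the source (the _source field of each entry in hits). If we do not need the whole source document, we can list the fields we want returned in the request body.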

The following example returns two fields, age and balance (inside _source):

POST /testxufeifei/_search
{"query": {"match_all": {}}, "_source": ["balance", "age"], "size": 2}

Note that this example only reduces the fields inside _source. The _source field is still returned, but it contains only the balance and age fields.
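The effect of a _source field list on a single hit can be imitated locally. The sketch below is an approximation under stated assumptions: filter_source is a hypothetical helper, and it handles only exact top-level field names, whereas Elasticsearch also accepts wildcard patterns and nested paths.

```python
def filter_source(source, fields):
    """Keep only the requested top-level fields, similar to a "_source" list."""
    return {name: source[name] for name in fields if name in source}

# A trimmed copy of one account document from this tutorial.
hit_source = {"account_number": 13, "balance": 32838, "age": 28, "state": "VA"}

# The hit still carries a _source object, just with fewer fields.
print(filter_source(hit_source, ["balance", "age"]))
```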

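Now let's turn to the query part. We have already seen how match_all matches every document. Next is a new query called the match query, which can be thought of as the basic fielded search query (a search against one specific field or a set of fields).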

The following example returns the document whose account_number is 13:

POST /testxufeifei/_search
{"query": {"match": {"account_number": 13}}}

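The following example returns all accounts whose address field contains the word "Street" (case-insensitive). Only the address and city fields are shown for each record:

{"query": {"match": {"address": "Street"}}, "_source": ["address", "city"]}

The result is: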

{"hits": { 
"total": 3,
"max_score": 0.70273256,
"hits": [ 
  { 
    "_index": "testxufeifei",
     "_type": "doc",
     "_id": "2",
     "_score": 0.70273256,
     "_source": { 
         "address": "671 Bristol Street",
         "city": "Dante"
      }
 }
,
 { 
      "_index": "testxufeifei",
      "_type": "example",
      "_id": "7",
      "_score": 0.5,
      "_source": { 
            "address": "702 Quentin Street",
            "city": "Veguita"
       }
      }
,
{ 
     "_index": "testxufeifei",
     "_type": "doc",
     "_id": "3",
     "_score": 0.15342641,
     "_source": { 
     "address": "789 Madison Street",
      "city": "Nogal"
   }
}
]
}}

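The following example returns all accounts whose address field contains "671" or "702". Multiple terms in a match query are combined with OR:

{"query": {"match": {"address": "671 702"}}, "_source": ["address", "city", "firstname"]}

The returned data is: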

{"hits": [ 
{ 
    "_index": "testxufeifei",
    "_type": "doc",
    "_id": "2",
    "_score": 0.19551794,
    "_source": { 
       "address": "671 Bristol Street",
       "firstname": "Hattie",
       "city": "Dante"
     }
}
,
{ 
    "_index": "testxufeifei",
    "_type": "example",
    "_id": "7",
    "_score": 0.12713557,
    "_source": { 
      "address": "702 Quentin Street",
      "firstname": "Dillard",
      "city": "Veguita"
}
}
]}
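The OR behaviour of the match query above can be imitated on plain strings. This is a rough sketch, not how Elasticsearch is implemented: tokenize is a simplified stand-in for an analyzer (lowercase, split on whitespace), and match is a hypothetical local function. The operator argument mirrors the operator option of the real match query, which switches the combination from OR to AND.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query_text, operator="or"):
    """Return True if the field contains any (or, with "and", all) query terms."""
    doc_terms = set(tokenize(field_value))
    found = [term in doc_terms for term in tokenize(query_text)]
    return all(found) if operator == "and" else any(found)

addresses = ["671 Bristol Street", "702 Quentin Street", "789 Madison Street"]

# "671 702" matches documents containing either term.
print([a for a in addresses if match(a, "671 702")])
# Lowercasing at both index and query time makes "street" match "Street".
print([a for a in addresses if match(a, "street")])
# With operator="and", no single address contains both numbers.
print([a for a in addresses if match(a, "671 702", operator="and")])
```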

The following example uses match_phrase, a variant of match, and returns all accounts whose address contains the phrase "171 Putnam":

{"query": {"match_phrase": {"address": "171 Putnam"}}, "_source": ["address", "city", "firstname"]}

The query result is:

{"hits": { 
"total": 2,
"max_score": 1.4054651,
"hits": [ 
    { 
      "_index": "testxufeifei",
      "_type": "doc",
      "_id": "6",
      "_score": 1.4054651,
      "_source": { 
      "address": "171 Putnam Avenue",
      "firstname": "Virginia",
      "city": "Nicholson"
     }
  }
,
  { 
     "_index": "testxufeifei",
     "_type": "doc",
     "_id": "5",
     "_score": 0.30685282,
     "_source": { 
       "address": "171 Putnam Avenue",
       "firstname": "Virginia",
       "city": "Nicholson"
     }
   }
 ]
}}
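What distinguishes match_phrase from match is that the terms must appear next to each other and in the same order. A minimal local sketch of that idea follows. The names tokenize and match_phrase are hypothetical local functions, and the whitespace tokenizer is an assumption standing in for a real analyzer with term positions.

```python
def tokenize(text):
    """Simplified analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match_phrase(field_value, phrase):
    """Return True if the phrase terms occur consecutively and in order."""
    doc = tokenize(field_value)
    want = tokenize(phrase)
    if not want:
        return False
    # Slide a window of len(want) over the document terms.
    return any(doc[i:i + len(want)] == want
               for i in range(len(doc) - len(want) + 1))

print(match_phrase("171 Putnam Avenue", "171 Putnam"))        # terms adjacent, in order
print(match_phrase("171 Putnam Avenue", "Putnam 171"))        # wrong order
print(match_phrase("171 North Putnam Avenue", "171 Putnam"))  # terms not adjacent
```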

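Now let's introduce the bool query. The bool query lets us compose smaller queries into larger ones using boolean logic.

The following example combines two match queries and returns all accounts whose address contains both "Street" and "702". This behaves like AND, since both conditions must hold:

{"query": {"bool": {"must": [{"match": {"address": "Street"}}, {"match": {"address": "702"}}]}}}

The bool must clause lists conditions that must all be true for a document to match.

The returned data is: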

{"hits": [ 
{ 
    "_index": "testxufeifei",
    "_type": "example",
    "_id": "7",
    "_score": 0.70710677,
    "_source": { 
      "account_number": 32,
      "balance": 48086,
     "firstname": "Dillard",
     "lastname": "Mcpherson",
     "age": 34,
     "gender": "F",
     "address": "702 Quentin Street",
     "employer": "Quailcom",
     "email": "[email protected]",
     "city": "Veguita",
     "state": "IN"
}
}
]}

In contrast, the following example combines two match queries and returns all accounts whose address contains "Street" or "702":

{"query": {"bool": {"should": [{"match": {"address": "Street"}}, {"match": {"address": "702"}}]}}}

The returned data is:

{"hits": [ 
{ 
    "_index": "testxufeifei",
    "_type": "example",
    "_id": "7",
    "_score": 0.70710677,
    "_source": { 
       "account_number": 32,
       "balance": 48086,
       "firstname": "Dillard",
       "lastname": "Mcpherson",
       "age": 34,
       "gender": "F",
       "address": "702 Quentin Street",
       "employer": "Quailcom",
       "email": "[email protected]",
       "city": "Veguita",
       "state": "IN"
  }
}
,
{ 
   "_index": "testxufeifei",
   "_type": "doc",
   "_id": "2",
   "_score": 0.19551794,
   "_source": { 
      "account_number": 6,
      "balance": 5686,
      "firstname": "Hattie",
      "lastname": "Bond",
      "age": 36,
      "gender": "M",
      "address": "671 Bristol Street",
      "employer": "Netagy",
      "email": "[email protected]",
      "city": "Dante",
      "state": "TN"
   }
}
,
{ 
    "_index": "testxufeifei",
   "_type": "doc",
   "_id": "3",
   "_score": 0.02250402,
   "_source": { 
     "account_number": 13,
     "balance": 32838,
     "firstname": "Nanette",
     "lastname": "Bates",
     "age": 28,
     "gender": "F",
     "address": "789 Madison Street",
    "employer": "Quility",
    "email": "[email protected]",
    "city": "Nogal",
    "state": "VA"
}
}
]
}

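In the example above, the bool should clause lists conditions of which at least one must be true for a document to match.

The following example combines two match queries and returns all accounts whose address contains neither "Avenue" nor "Street":

{"query": {"bool": {"must_not": [{"match": {"address": "Avenue"}}, {"match": {"address": "Street"}}]}}}

The returned data is: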

{"hits": [ 
  { 
    "_index": "testxufeifei",
    "_type": "doc",
    "_id": "4",
    "_score": 1,
    "_source": { 
      "account_number": 18,
      "balance": 4180,
      "firstname": "Dale",
      "lastname": "Adams",
      "age": 33,
      "gender": "M",
      "address": "467 Hutchinson Court",
      "employer": "Boink",
      "email": "[email protected]",
      "city": "Orick",
      "state": "MD"
    } 
}
,
{ 
   "_index": "testxufeifei",
   "_type": "doc",
   "_id": "1",
   "_score": 1,
   "_source": { 
      "account_number": 1,
      "balance": 39225,
      "firstname": "Amber",
      "lastname": "Duke",
      "age": 32,
      "gender": "M",
      "address": "880 Holmes Lane",
      "employer": "Pyrami",
      "email": "[email protected]",
      "city": "Brogan",
      "state": "IL"
}
}
]}

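In the example above, the bool must_not clause lists conditions none of which may be true for a document to match.

We can use must, should, and must_not clauses together in a single bool query. We can also nest bool queries inside any of these clauses to express arbitrarily complex boolean logic.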
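The way the three clause types combine can be sketched as a small local evaluator. It is a simplification under stated assumptions: term_in, matches_bool, and search are hypothetical helpers, scoring is ignored, the filter clause is left out, and should clauses are treated as required only when no must clause is present.

```python
def term_in(field, term):
    """Clause factory: true when `term` is one of the words in doc[field]."""
    return lambda doc: term.lower() in str(doc[field]).lower().split()

def matches_bool(doc, must=(), should=(), must_not=()):
    """Evaluate a simplified bool query against one document."""
    if not all(clause(doc) for clause in must):
        return False  # every must clause has to match
    if any(clause(doc) for clause in must_not):
        return False  # no must_not clause may match
    if should and not must:
        # Without a must clause, at least one should clause has to match.
        return any(clause(doc) for clause in should)
    return True

accounts = [
    {"address": "702 Quentin Street", "age": 34, "state": "IN"},
    {"address": "671 Bristol Street", "age": 36, "state": "TN"},
    {"address": "467 Hutchinson Court", "age": 33, "state": "MD"},
    {"address": "171 Putnam Avenue", "age": 39, "state": "PA"},
]

def search(**clauses):
    """Return the addresses of all accounts accepted by the bool clauses."""
    return [doc["address"] for doc in accounts if matches_bool(doc, **clauses)]

street, n702 = term_in("address", "Street"), term_in("address", "702")
print(search(must=[street, n702]))    # both conditions hold
print(search(should=[street, n702]))  # at least one condition holds
print(search(must_not=[term_in("address", "Avenue"), street]))
print(search(must=[term_in("age", "39")], must_not=[term_in("state", "ID")]))
```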

The following example returns all accounts whose age is 39 and whose state is not ID:

{"query": {"bool": {"must": [{"match": {"age": "39"}}], "must_not": [{"match": {"state": "ID"}}]}}}

The returned data is:

{"hits": [ 
   { 
     "_index": "testxufeifei",
     "_type": "doc",
     "_id": "6",
     "_score": 1.4054651,
     "_source": { 
       "account_number": 25,
       "balance": 40540,
       "firstname": "Virginia",
       "lastname": "Ayala",
       "age": 39,
       "gender": "F",
       "address": "171 Putnam Avenue",
       "employer": "Filodyne",
       "email": "[email protected]",
       "city": "Nicholson",
       "state": "PA"
    }
}
,
{ 
    "_index": "testxufeifei",
    "_type": "doc",
    "_id": "5",
    "_score": 0.30685282,
    "_source": { 
        "account_number": 25,
        "balance": 40540,
        "firstname": "Virginia",
        "lastname": "Ayala",
        "age": 39,
        "gender": "F",
        "address": "171 Putnam Avenue",
        "employer": "Filodyne",
        "email": "[email protected]",
        "city": "Nicholson",
        "state": "PA"
    }
  }
 ]
}

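Executing Filters

In the previous sections we skipped over a small detail called the document score (the _score field in the search results). The score is a relative numeric measure of how well a document matches the search query we specified. The higher the score, the more relevant the document; the lower the score, the less relevant.

Queries do not always need to produce scores, though, particularly when they are only used to filter the document set. Elasticsearch detects these situations and automatically optimizes query execution so that it does not compute useless scores.

The bool query introduced earlier also supports filter clauses, which let us restrict the documents matched by the other clauses without changing how scores are computed. As an example, let's introduce the range query, which filters documents by a range of values. It is generally used for numeric or date filtering.

The following example uses a bool query to return all accounts with a balance between 30000 and 40000, inclusive. In other words, we want accounts with a balance greater than or equal to 30000 and less than or equal to 40000.

{"query": {"bool": {"must": {"match_all": {}}, "filter": {"range": {"balance": {"gte": 30000, "lte": 40000}}}}}}

The returned data is: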

{"hits": [ 
{ 
   "_index": "testxufeifei",
   "_type": "doc",
   "_id": "1",
   "_score": 1,
   "_source": { 
   "account_number": 1,
   "balance": 39225,
   "firstname": "Amber",
   "lastname": "Duke",
   "age": 32,
   "gender": "M",
   "address": "880 Holmes Lane",
   "employer": "Pyrami",
   "email": "[email protected]",
   "city": "Brogan",
   "state": "IL"
   }
}
,
{ 
   "_index": "testxufeifei",
   "_type": "doc",
   "_id": "3",
   "_score": 1,
   "_source": { 
      "account_number": 13,
      "balance": 32838,
      "firstname": "Nanette",
      "lastname": "Bates",
      "age": 28,
      "gender": "F",
      "address": "789 Madison Street",
      "employer": "Quility",
      "email": "[email protected]",
      "city": "Nogal",
      "state": "VA"
   }
   }
 ]
}

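Looking closer at the example above, the bool query contains a match_all query (the query part) and a range query (the filter part). Any other query can be substituted into the query and filter parts. Here the range query is a natural fit for a filter, because every document that falls in the range matches equally: no document is more relevant than another.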
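The filtered query above can be assembled programmatically, and the inclusive gte/lte bounds can be checked locally. In this sketch, balance_range_query and in_range are hypothetical helper names, and the balances are the ones that appear in this tutorial's sample documents.

```python
import json

def balance_range_query(low, high):
    """Build a bool query: match_all in must, an inclusive range in filter."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {"balance": {"gte": low, "lte": high}}},
            }
        }
    }

def in_range(value, gte, lte):
    """Inclusive range test, mirroring gte (>=) and lte (<=)."""
    return gte <= value <= lte

balances = [39225, 5686, 32838, 4180, 48086, 40540]

print(json.dumps(balance_range_query(30000, 40000)))
# Only the balances inside [30000, 40000] pass the filter.
print([b for b in balances if in_range(b, 30000, 40000)])
```

Because the range sits in the filter clause, it only decides which documents are kept. The two hits in the response above both have a _score of 1, which comes from match_all.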

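Executing Aggregations

Aggregations let you group your data and compute statistics over it. The simplest way to think about them is as a rough equivalent of SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, a single search can return both the hits and the aggregation results in one response. You can run a query and several aggregations through one concise API and get all the results back at once, avoiding extra network round trips, which makes this a powerful and efficient feature.

To start, the following example groups all accounts by state and returns the top 10 states (the default), sorted by count in descending order (also the default):

{"size": 0, "aggs": {"group_by_state": {"terms": {"field": "state"}}}}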

The returned data looks like this:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_state": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "pa", "doc_count": 2},
        {"key": "il", "doc_count": 1},
        {"key": "in", "doc_count": 1},
        {"key": "md", "doc_count": 1},
        {"key": "tn", "doc_count": 1},
        {"key": "va", "doc_count": 1}
      ]
    }
  }
}

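The aggregation above is similar to the following SQL:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

Note that we set size=0 so that no search hits are shown, because here we only care about the aggregation results.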
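The grouping and ordering performed by this terms aggregation can be sketched with a counter. terms_aggregation is a hypothetical local function, the state values are the lowercased keys seen in the response above, and breaking ties by key is an assumption made here to get a deterministic order.

```python
from collections import Counter

def terms_aggregation(docs, field, size=10):
    """Count documents per field value, largest bucket first, top `size` only."""
    counts = Counter(doc[field] for doc in docs)
    # Sort by descending count, then by key so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]

docs = [{"state": s} for s in ["il", "tn", "va", "md", "pa", "pa", "in"]]

for bucket in terms_aggregation(docs, "state"):
    print(bucket)
```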

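Building on the previous aggregation, the following example computes the average account balance for each state (again for the top 10 states, sorted by count in descending order by default):

{"size": 0, "aggs": {"group_by_state": {"terms": {"field": "state"}, "aggs": {"average_balance": {"avg": {"field": "balance"}}}}}}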

The query result is:

"aggregations": {
  "group_by_state": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {"key": "pa", "doc_count": 2, "average_balance": {"value": 40540}},
      {"key": "il", "doc_count": 1, "average_balance": {"value": 39225}},
      {"key": "in", "doc_count": 1, "average_balance": {"value": 48086}},
      {"key": "md", "doc_count": 1, "average_balance": {"value": 4180}},
      {"key": "tn", "doc_count": 1, "average_balance": {"value": 5686}},
      {"key": "va", "doc_count": 1, "average_balance": {"value": 32838}}
    ]
  }
}
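Nesting an avg aggregation inside a terms aggregation amounts to computing one average per bucket. A minimal local sketch follows. average_by is a hypothetical helper, and the documents are reduced to the state and balance values that appear in this tutorial.

```python
from collections import defaultdict

def average_by(docs, group_field, value_field):
    """Group docs by one field and average another field within each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(values),
         "average_balance": {"value": sum(values) / len(values)}}
        for key, values in groups.items()
    ]
    # Largest bucket first; ties are broken by key (an assumption of this sketch).
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets

docs = [
    {"state": "il", "balance": 39225}, {"state": "tn", "balance": 5686},
    {"state": "va", "balance": 32838}, {"state": "md", "balance": 4180},
    {"state": "pa", "balance": 40540}, {"state": "pa", "balance": 40540},
    {"state": "in", "balance": 48086},
]

for bucket in average_by(docs, "state", "balance"):
    print(bucket)
```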

The following example shows how to group by age range (20-29, 30-39, and 40-49), then by gender, and finally compute the average account balance for each age range and gender:

{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {"field": "age", "ranges": [{"from": 20, "to": 30}, {"from": 30, "to": 40}, {"from": 40, "to": 50}]},
      "aggs": {
        "group_by_gender": {
          "terms": {"field": "gender"},
          "aggs": {"average_balance": {"avg": {"field": "balance"}}}
        }
      }
    }
  }
}

The query result is:

"aggregations": {
  "group_by_age": {
    "buckets": [
      {
        "key": "20.0-30.0",
        "from": 20,
        "from_as_string": "20.0",
        "to": 30,
        "to_as_string": "30.0",
        "doc_count": 1,
        "group_by_gender": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {"key": "f", "doc_count": 1, "average_balance": {"value": 32838}}
          ]
        }
      },
      {
        "key": "30.0-40.0",
        "from": 30,
        "from_as_string": "30.0",
        "to": 40,
        "to_as_string": "40.0",
        "doc_count": 6,
        "group_by_gender": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {"key": "f", "doc_count": 3, "average_balance": {"value": 43055.333333333336}},
            {"key": "m", "doc_count": 3, "average_balance": {"value": 16363.666666666666}}
          ]
        }
      },
      {
        "key": "40.0-50.0",
        "from": 40,
        "from_as_string": "40.0",
        "to": 50,
        "to_as_string": "50.0",
        "doc_count": 0,
        "group_by_gender": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": []
        }
      }
    ]
  }
}
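The three-level aggregation above (age range, then gender, then average balance) can be reproduced locally to check the numbers. This is a sketch under stated assumptions: age_gender_report is a hypothetical helper, the seven accounts carry only the age, gender, and balance values shown in this tutorial, and the bucket keys are simplified to "20-30" rather than "20.0-30.0". In a range aggregation, from is inclusive and to is exclusive.

```python
accounts = [
    {"age": 32, "gender": "m", "balance": 39225},
    {"age": 36, "gender": "m", "balance": 5686},
    {"age": 28, "gender": "f", "balance": 32838},
    {"age": 33, "gender": "m", "balance": 4180},
    {"age": 39, "gender": "f", "balance": 40540},
    {"age": 39, "gender": "f", "balance": 40540},
    {"age": 34, "gender": "f", "balance": 48086},
]

def age_gender_report(docs, ranges):
    """Bucket by [from, to) age range, then by gender, then average balance."""
    report = {}
    for low, high in ranges:
        members = [d for d in docs if low <= d["age"] < high]  # to is exclusive
        by_gender = {}
        for gender in sorted({d["gender"] for d in members}):
            balances = [d["balance"] for d in members if d["gender"] == gender]
            by_gender[gender] = {
                "doc_count": len(balances),
                "average_balance": sum(balances) / len(balances),
            }
        report[f"{low}-{high}"] = by_gender
    return report

report = age_gender_report(accounts, [(20, 30), (30, 40), (40, 50)])
for key, genders in report.items():
    print(key, genders)
```

The 30-40 bucket holds three "f" and three "m" accounts, and the 40-50 bucket is empty, which agrees with the response shown above.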

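Elasticsearch is both a simple and a complex product. So far we have covered the basics: what it is, how it works internally, and how to operate it through its REST API. Hopefully this tutorial has given you a better understanding of Elasticsearch and, more importantly, encourages you to experiment with the rest of its features!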