Installing Kibana
The main goal here is to use Kibana's Dev Tools console for convenient access to ES.
We install everything directly with docker-compose, bringing up a pseudo-cluster of two elasticsearch nodes along with Kibana:
```yaml
version: '3.7'
networks:
  esnet:
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - http.cors.enabled=true
      - http.cors.allow-origin=*
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet
  elasticsearch2:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch2
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata2:/usr/share/elasticsearch/data
    networks:
      - esnet
  kibana:
    image: docker.elastic.co/kibana/kibana:6.5.1
    environment:
      - SERVER_NAME=kibana
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - XPACK_MONITORING_ENABLED=true
    ports:
      - 5601:5601
    networks:
      - esnet
volumes:
  esdata1:
    driver: local
  esdata2:
    driver: local
```
index API
Create an index

```
PUT employee
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

An index can be customized at creation time; for the full list of options, see the official documentation on index settings.
Delete an index

```
DELETE employee
```
Insert a document
If no custom index configuration is needed, the following API creates the index automatically and inserts the data:

```
PUT /employee/_doc/1
{
  "name" : "zhangsan",
  "age" : 28,
  "signature" : "I like watching movies",
  "hobby" : ["book","music"]
}
```

This PUT request adds an employee record to ES. All data in ES is stored in indices. Starting with 6.x, an index can hold only one type, and the recommended name for it is "_doc", because the type concept is deprecated in 7.x: in the old-style APIs, the type position may only be "_doc" in 7.x.
Query API
Get a single document

```
GET /employee/_doc/1
```

Check whether a document exists

```
HEAD employee/_doc/1
```

Simple search
By default the _search API returns all documents in the index:

```
GET employee/_doc/_search
```

You can also add query conditions with the q parameter:

```
GET employee/_doc/_search?q=name:zhangsan
```
Query DSL search
Match-all query

```
GET /employee/_doc/_search
{
  "query": {
    "match_all": {}
  }
}
```
Query with a condition
Employees in the employee index whose name is zhangsan:

```
GET /employee/_doc/_search
{
  "query": {
    "match": {
      "name": "zhangsan"
    }
  }
}
```
Query with a condition and a filter
Employees in the employee index whose age is greater than 27 and whose name is zhangsan:

```
GET /employee/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "zhangsan"
        }
      },
      "filter": {
        "range": {
          "age": {
            "gt": 27
          }
        }
      }
    }
  }
}
```

- must holds the conditions that have to match
- filter holds the filter conditions; range is a range filter
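The bool logic above can be mimicked on the three sample employees with a short Python sketch (a simplified illustration of what the must clause and the range filter select, not how ES evaluates queries internally):

```python
# Toy dataset mirroring the sample employees used in this article.
employees = [
    {"name": "zhangsan", "age": 28},
    {"name": "lisi", "age": 27},
    {"name": "wangwu", "age": 26},
]

def bool_query(docs, must_name, age_gt):
    """Keep docs whose name matches (must) and whose age passes the range filter."""
    return [d for d in docs if d["name"] == must_name and d["age"] > age_gt]

hits = bool_query(employees, must_name="zhangsan", age_gt=27)
print([d["name"] for d in hits])  # only zhangsan (age 28) survives both clauses
```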
Full-text search
Find employees whose signature contains dislike watching movies:

```
GET /employee/_doc/_search
{
  "query": {
    "match": {
      "signature": "dislike watching movies"
    }
  }
}
```

Here is the hits portion of the result:
```json
"hits" : {
  "total" : 3,
  "max_score" : 1.1064433,
  "hits" : [
    {
      "_index" : "employee",
      "_type" : "_doc",
      "_id" : "2",
      "_score" : 1.1064433,
      "_source" : {
        "name" : "lisi",
        "age" : 27,
        "signature" : "I dislike watching movies,I like reading",
        "hobby" : [
          "movie",
          "music"
        ]
      }
    },
    {
      "_index" : "employee",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 0.29748765,
      "_source" : {
        "name" : "zhangsan",
        "age" : 28,
        "signature" : "I like watching movies",
        "hobby" : [
          "book",
          "music"
        ]
      }
    },
    {
      "_index" : "employee",
      "_type" : "_doc",
      "_id" : "3",
      "_score" : 0.27407023,
      "_source" : {
        "name" : "wangwu",
        "age" : 26,
        "signature" : "I also like watching movies",
        "hobby" : [
          "book",
          "game"
        ]
      }
    }
  ]
}
```
Not every returned signature contains the full phrase dislike watching movies, but each one contains at least one of the words, and each employee has a different _score. lisi, whose signature contains all three words, scores highest, and the three employees are ordered by _score from high to low. This _score is the document's relevance score.
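The ordering can be reproduced in spirit with a naive term-overlap score. This is a toy stand-in for the TF/IDF-style scoring ES 6.x actually uses (BM25), which also weighs term rarity and field length:

```python
signatures = {
    "zhangsan": "I like watching movies",
    "lisi": "I dislike watching movies,I like reading",
    "wangwu": "I also like watching movies",
}
query_terms = {"dislike", "watching", "movies"}

def naive_score(text):
    # Count how many query terms appear in the text. Real ES scoring also
    # boosts rare terms, which is why "dislike" (the rarest term here)
    # lifts lisi's _score far above the others.
    tokens = set(text.replace(",", " ").lower().split())
    return len(query_terms & tokens)

ranked = sorted(signatures, key=lambda n: naive_score(signatures[n]), reverse=True)
print(ranked)  # ['lisi', 'zhangsan', 'wangwu']
```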
Exact phrase match

```
GET /employee/_doc/_search
{
  "query": {
    "match_phrase": {
      "signature": "dislike watching movies"
    }
  }
}
```

The condition is the same as before; only the query type changes from match to match_phrase. Again, the hits portion of the result:
```json
"hits" : {
  "total" : 1,
  "max_score" : 1.1064433,
  "hits" : [
    {
      "_index" : "employee",
      "_type" : "_doc",
      "_id" : "2",
      "_score" : 1.1064433,
      "_source" : {
        "name" : "lisi",
        "age" : 27,
        "signature" : "I dislike watching movies,I like reading",
        "hobby" : [
          "movie",
          "music"
        ]
      }
    }
  ]
}
```
There is only one result, and its signature is guaranteed to contain the exact phrase dislike watching movies.
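The difference between match and match_phrase can be sketched like this (a simplified model: match needs at least one query term anywhere, match_phrase needs the terms as a consecutive, ordered sequence):

```python
def tokenize(text):
    return text.replace(",", " ").lower().split()

def match(text, query):
    # match: at least one query term appears anywhere in the text
    return bool(set(tokenize(query)) & set(tokenize(text)))

def match_phrase(text, query):
    # match_phrase: the query terms appear consecutively, in order
    t, q = tokenize(text), tokenize(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

sig = "I dislike watching movies,I like reading"
print(match(sig, "dislike watching movies"))         # True
print(match_phrase(sig, "dislike watching movies"))  # True
print(match_phrase("I like watching movies", "dislike watching movies"))  # False
```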
Highlighted search
The API is highlight; note that it sits at the same level as query.

```
GET /employee/_doc/_search
{
  "query": {
    "match": {
      "signature": "dislike watching movies"
    }
  },
  "highlight": {
    "fields": {
      "signature": {}
    }
  }
}
```

The query result:
```json
"hits" : {
  "total" : 3,
  "max_score" : 1.1064433,
  "hits" : [
    {
      "_index" : "employee",
      "_type" : "_doc",
      "_id" : "2",
      "_score" : 1.1064433,
      "_source" : {
        "name" : "lisi",
        "age" : 27,
        "signature" : "I dislike watching movies,I like reading",
        "hobby" : [
          "movie",
          "music"
        ]
      },
      "highlight" : {
        "signature" : [
          "I <em>dislike</em> <em>watching</em> <em>movies</em>,I like reading"
        ]
      }
    },
    {
      "_index" : "employee",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 0.29748765,
      "_source" : {
        "name" : "zhangsan",
        "age" : 28,
        "signature" : "I like watching movies",
        "hobby" : [
          "book",
          "music"
        ]
      },
      "highlight" : {
        "signature" : [
          "I like <em>watching</em> <em>movies</em>"
        ]
      }
    },
    {
      "_index" : "employee",
      "_type" : "_doc",
      "_id" : "3",
      "_score" : 0.27407023,
      "_source" : {
        "name" : "wangwu",
        "age" : 26,
        "signature" : "I also like watching movies",
        "hobby" : [
          "book",
          "game"
        ]
      },
      "highlight" : {
        "signature" : [
          "I also like <em>watching</em> <em>movies</em>"
        ]
      }
    }
  ]
}
```
Each hit now carries an extra highlight field, in which the words of the target field that match the query are wrapped in <em> tags.
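The shape of that output can be approximated with a regex that wraps each matched term (just an illustration of the output format; ES does this server-side on the analyzed tokens):

```python
import re

def highlight(text, terms):
    # Wrap every whole-word occurrence of a query term in <em> tags,
    # case-insensitively -- roughly what the highlight API returns.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(r"<em>\1</em>", text)

print(highlight("I like watching movies", ["dislike", "watching", "movies"]))
# I like <em>watching</em> <em>movies</em>
```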
Aggregation analysis
Let's find the most popular hobby:

```
GET /employee/_doc/_search
{
  "aggs": {
    "all_hobby": {
      "terms": {
        "field": "hobby"
      }
    }
  }
}
```

- aggs marks the start of the aggregation API
- all_hobby is an arbitrary name for this aggregation
- terms is the bucketing aggregation; it groups documents by each value of the given field and counts how many documents contain each value
- field specifies the field to analyze
但是这里执行报错了:
root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [hobby] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
As the message says, the fielddata property of text fields is disabled by default and has to be enabled explicitly:

```
PUT /employee/_mapping/_doc
{
  "properties": {
    "hobby": {
      "type": "text",
      "fielddata": true
    }
  }
}
```
With fielddata set to true on hobby, ES builds an in-memory fielddata structure for the field by un-inverting the inverted index, somewhat like an index in a database, to support analysis and statistics. This extra structure consumes heap memory, so avoid enabling it on fields with large amounts of data. Alternatively, use the keyword sub-field for analysis and statistics, like this:
```
GET /employee/_doc/_search
{
  "aggs": {
    "all_hobby": {
      "terms": {
        "field": "hobby.keyword"
      }
    }
  }
}
```
Running the aggregation again, the buckets in the result look like this:
```json
"buckets" : [
  {
    "key" : "book",
    "doc_count" : 2
  },
  {
    "key" : "music",
    "doc_count" : 2
  },
  {
    "key" : "game",
    "doc_count" : 1
  },
  {
    "key" : "movie",
    "doc_count" : 1
  }
]
```
As you can see, book and music are tied for the most popular hobby, each appearing in two documents.
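What the terms aggregation computes here is essentially a frequency count over every hobby value, which a few lines of Python can mirror (a conceptual sketch of the bucketing, not the ES execution model):

```python
from collections import Counter

# hobby arrays of the three sample employees
hobbies = [
    ["book", "music"],   # zhangsan
    ["movie", "music"],  # lisi
    ["book", "game"],    # wangwu
]

# Each document contributes once per hobby value it contains;
# the buckets are the per-value document counts, highest first.
buckets = Counter(h for doc in hobbies for h in doc).most_common()
print(buckets)  # book and music lead with 2 documents each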
Aggregations can also be combined with a query; the aggregation then runs over the query results, like this:
```
GET /employee/_doc/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gt": 26
          }
        }
      }
    }
  },
  "aggs": {
    "all_hobby": {
      "terms": {
        "field": "hobby.keyword"
      }
    }
  }
}
```
Nested aggregations
For example, computing the average age of the employees in each hobby bucket:
```
GET /employee/_doc/_search
{
  "aggs": {
    "all_hobby": {
      "terms": {
        "field": "hobby.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
```
The result:
```json
"buckets" : [
  {
    "key" : "book",
    "doc_count" : 2,
    "avg_age" : {
      "value" : 27.0
    }
  },
  {
    "key" : "music",
    "doc_count" : 2,
    "avg_age" : {
      "value" : 27.5
    }
  },
  {
    "key" : "game",
    "doc_count" : 1,
    "avg_age" : {
      "value" : 26.0
    }
  },
  {
    "key" : "movie",
    "doc_count" : 1,
    "avg_age" : {
      "value" : 27.0
    }
  }
]
```
The result looks a bit dense, so take the first bucket:
```json
{
  "key" : "book",
  "doc_count" : 2,
  "avg_age" : {
    "value" : 27.0
  }
}
```
- key is the value being bucketed, taken from the field's terms
- doc_count is the bucket's document count, i.e. how many employees contain the value in key
- avg_age is the name of the inner aggregation, as defined in the query
- value is the average age, computed against the enclosing bucket; here it is the average age of the two employees whose hobby contains book
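The nested aggregation boils down to grouping followed by averaging, which can be sketched as (a toy reproduction on the sample data, not the ES execution model):

```python
from collections import defaultdict

employees = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]

# Outer terms aggregation: bucket documents by each hobby value.
ages_by_hobby = defaultdict(list)
for e in employees:
    for h in e["hobby"]:
        ages_by_hobby[h].append(e["age"])

# Inner avg aggregation: average the ages within each bucket.
avg_age = {h: sum(ages) / len(ages) for h, ages in ages_by_hobby.items()}
print(avg_age)  # {'book': 27.0, 'music': 27.5, 'movie': 27.0, 'game': 26.0}
```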