Elasticsearch 6.5.1 Study Notes (2): Basic APIs

Installing Kibana

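Kibana is installed here mainly for its Dev Tools console, which makes it convenient to send requests to ES.
It is installed directly with docker-compose, together with a pseudo-cluster made up of two Elasticsearch nodes: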

version: '3.7'
networks:
  esnet:
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - http.cors.enabled=true
      - http.cors.allow-origin=*
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet
  elasticsearch2:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch2
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata2:/usr/share/elasticsearch/data
    networks:
      - esnet
  kibana: 
    image: docker.elastic.co/kibana/kibana:6.5.1
    environment: 
      - SERVER_NAME=kibana
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - XPACK_MONITORING_ENABLED=true
    ports: 
      - 5601:5601
    networks: 
      - esnet
volumes:
  esdata1:
    driver: local
  esdata2:
    driver: local

Index API

Creating an index

PUT employee
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

The index can be customized when it is created; for the available options, see the index settings page in the official documentation.

Deleting an index

DELETE employee

Inserting a document

If the index needs no custom settings, the following API creates it automatically and inserts the data:

PUT /employee/_doc/1
{
  "name" : "zhangsan",
  "age" : 28,
  "signature":"I like watching movies",
  "hobby" : ["book","music"]
}

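This PUT request adds one employee record to ES. All data in ES is stored in an index. Starting with 6.x, an index can contain only one type, and the recommended type name is "_doc": the concept of types is deprecated in 7.x, where the position that the type occupied in the older APIs can only be "_doc".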
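
Outside the Kibana console, the same request is an ordinary HTTP call. The following is a minimal sketch using only the Python standard library; it assumes ES is reachable at localhost:9200 (the port published by the docker-compose file above), and build_index_request is a hypothetical helper name. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: Elasticsearch listens on localhost:9200, the port published
# by the docker-compose file in this post.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) the PUT /<index>/_doc/<id> request."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request(
    "employee",
    1,
    {
        "name": "zhangsan",
        "age": 28,
        "signature": "I like watching movies",
        "hobby": ["book", "music"],
    },
)
print(request.get_method(), request.full_url)
# To actually send it against a running node: urllib.request.urlopen(request)
```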

Query API

Retrieving a single document

GET /employee/_doc/1

Checking whether a document exists

HEAD employee/_doc/1

Simple search

Without any condition, the _search API matches every document in the index (only the first 10 hits are returned by default):

GET employee/_doc/_search

A query condition can also be added with the q parameter:

GET employee/_doc/_search?q=name:zhangsan
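
When the q parameter is sent from code rather than from the console, its value should be URL-encoded. A small sketch with the Python standard library follows; the base address localhost:9200 and the helper name build_search_url are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_search_url(index, field, value, base="http://localhost:9200"):
    """Return the URL of a q-parameter search on one field."""
    # urlencode percent-encodes the ':' between field and value.
    query_string = urlencode({"q": f"{field}:{value}"})
    return f"{base}/{index}/_doc/_search?{query_string}"


url = build_search_url("employee", "name", "zhangsan")
print(url)
```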

Searching with the query DSL

Match-all query

GET /employee/_doc/_search
{
  "query":{
    "match_all":{   }
  }
}

Query with a condition

Find the employees in the employee index whose name is zhangsan:

GET /employee/_doc/_search
{
  "query":{
    "match":{
      "name" : "zhangsan"
    }
  }
}

Query with a condition and a filter

Find the employees in the employee index whose age is greater than 27 and whose name is zhangsan:

GET /employee/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "zhangsan"
        }
      },
      "filter": {
        "range": {
          "age": {
            "gt": 27
          }
        }
      }
    }
  }
}
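  1. must holds the conditions that a document has to match; they also contribute to _score
  2. filter holds the filter conditions, which only include or exclude documents and do not affect _score; range is a range filter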
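
The way must and filter combine can be sketched locally, without ES. This is only an in-memory illustration over the three sample employees that appear in the results later in this post; the function name search is hypothetical and no scoring is modelled.

```python
# The three sample employees used throughout this post.
EMPLOYEES = [
    {"name": "zhangsan", "age": 28},
    {"name": "lisi", "age": 27},
    {"name": "wangwu", "age": 26},
]


def search(docs, must_name, min_age_exclusive):
    """Keep documents that satisfy both the must clause and the filter."""
    hits = []
    for doc in docs:
        matches_must = doc["name"] == must_name          # "must": match on name
        passes_filter = doc["age"] > min_age_exclusive   # "filter": range with gt
        if matches_must and passes_filter:
            hits.append(doc)
    return hits


print(search(EMPLOYEES, "zhangsan", 27))
print(search(EMPLOYEES, "lisi", 27))  # lisi is exactly 27, so gt 27 excludes him
```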

Full-text search

Find the employees whose signature contains dislike watching movies:

GET /employee/_doc/_search
{
  "query":{
    "match":{
      "signature":"dislike watching movies"
    }
  }
}

Here is the hits part of the result:

"hits" : {
    "total" : 3,
    "max_score" : 1.1064433,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1064433,
        "_source" : {
          "name" : "lisi",
          "age" : 27,
          "signature" : "I dislike watching movies,I like reading",
          "hobby" : [
            "movie",
            "music"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.29748765,
        "_source" : {
          "name" : "zhangsan",
          "age" : 28,
          "signature" : "I like watching movies",
          "hobby" : [
            "book",
            "music"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.27407023,
        "_source" : {
          "name" : "wangwu",
          "age" : 26,
          "signature" : "I also like watching movies",
          "hobby" : [
            "book",
            "game"
          ]
        }
      }
    ]
  }

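Not all three signatures contain the whole text dislike watching movies, but every returned signature contains at least one of the words. The _score values also differ: lisi, whose signature contains all three words, scores highest, and the three employees are listed in descending order of _score. _score is the relevance score of the document.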
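
The "at least one word" behaviour of match can be illustrated with a rough local sketch. The tokenizer below (lowercase, split on non-alphanumeric characters) is only an approximation of what the analyzer does, and the sketch counts shared words instead of reproducing the real relevance scoring.

```python
import re

# Signatures taken from the search results shown in this post.
SIGNATURES = {
    "zhangsan": "I like watching movies",
    "lisi": "I dislike watching movies,I like reading",
    "wangwu": "I also like watching movies",
}


def tokenize(text):
    """Rough stand-in for analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query, docs):
    """Return {name: number of query words found} for docs sharing >= 1 word."""
    query_terms = set(tokenize(query))
    results = {}
    for name, text in docs.items():
        shared = query_terms & set(tokenize(text))
        if shared:
            results[name] = len(shared)
    return results


hits = match("dislike watching movies", SIGNATURES)
print(hits)
```

All three documents match because each shares at least one word with the query; only lisi shares all three.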

Exact phrase matching

GET /employee/_doc/_search
{
  "query":{
    "match_phrase":{
      "signature":"dislike watching movies"
    }
  }
}

The search text is the same as in the previous query; only the query type changes from match to match_phrase. Here again is the hits part of the result:

"hits" : {
    "total" : 1,
    "max_score" : 1.1064433,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1064433,
        "_source" : {
          "name" : "lisi",
          "age" : 27,
          "signature" : "I dislike watching movies,I like reading",
          "hobby" : [
            "movie",
            "music"
          ]
        }
      }
    ]
  }

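Only one result is returned, and its signature field is guaranteed to contain the phrase dislike watching movies, with the words adjacent and in that order.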
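
Phrase matching can be pictured as looking for the query tokens as one consecutive run in the field's tokens. The sketch below is a simplified local illustration under that assumption; the tokenizer is an approximation and contains_phrase is a hypothetical helper.

```python
import re

# Signatures taken from the search results shown in this post.
SIGNATURES = {
    "zhangsan": "I like watching movies",
    "lisi": "I dislike watching movies,I like reading",
    "wangwu": "I also like watching movies",
}


def tokenize(text):
    """Rough stand-in for analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the phrase tokens appear consecutively and in order in text."""
    tokens = tokenize(text)
    wanted = tokenize(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


phrase_hits = [
    name
    for name, text in SIGNATURES.items()
    if contains_phrase(text, "dislike watching movies")
]
print(phrase_hits)
```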

Highlighted search

The API is highlight; note that it sits at the same level as query.

GET /employee/_doc/_search
{
  "query":{
    "match":{
      "signature":"dislike watching movies"
    }
  },
  "highlight":{
    "fields":{
      "signature":{}
    }
  }
}

The query result is:

"hits" : {
    "total" : 3,
    "max_score" : 1.1064433,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1064433,
        "_source" : {
          "name" : "lisi",
          "age" : 27,
          "signature" : "I dislike watching movies,I like reading",
          "hobby" : [
            "movie",
            "music"
          ]
        },
        "highlight" : {
          "signature" : [
            "I <em>dislike</em> <em>watching</em> <em>movies</em>,I like reading"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.29748765,
        "_source" : {
          "name" : "zhangsan",
          "age" : 28,
          "signature" : "I like watching movies",
          "hobby" : [
            "book",
            "music"
          ]
        },
        "highlight" : {
          "signature" : [
            "I like <em>watching</em> <em>movies</em>"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.27407023,
        "_source" : {
          "name" : "wangwu",
          "age" : 26,
          "signature" : "I also like watching movies",
          "hobby" : [
            "book",
            "game"
          ]
        },
        "highlight" : {
          "signature" : [
            "I also like <em>watching</em> <em>movies</em>"
          ]
        }
      }
    ]
  }

Every hit now carries an extra highlight field, in which the words of the target field that match the query are wrapped in <em> tags.
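
The effect of highlighting can be imitated locally with a regular expression. This is only a sketch of the visible result, not of how ES implements highlighting; the function name highlight and the word-boundary matching are assumptions for illustration.

```python
import re


def highlight(text, query):
    """Wrap every whole word of text that appears in query in <em> tags."""
    terms = [re.escape(term) for term in re.findall(r"[a-z0-9]+", query.lower())]
    if not terms:
        return text
    # \b keeps "like" from matching inside "dislike".
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<em>\1</em>", text)


print(highlight("I dislike watching movies,I like reading", "dislike watching movies"))
print(highlight("I also like watching movies", "dislike watching movies"))
```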

Aggregations

As an example, find the most popular hobby:

GET /employee/_doc/_search
{
  "aggs":{
    "all_hobby":{
      "terms":{
        "field":"hobby"
      }
    }
  }
}
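  1. aggs marks the start of the aggregation section
  2. all_hobby is the name of this aggregation; any name can be used
  3. terms is the aggregation that groups by term: it creates one bucket per distinct term of the specified field and counts how many documents contain each term
  4. field specifies the field to aggregate on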
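
Independently of ES, the counting that a terms aggregation performs can be sketched in a few lines. The data is the hobby field of the three sample employees from this post; the ordering (count descending, then key ascending) is an assumption chosen for the sketch.

```python
from collections import Counter

# hobby values of the three sample employees in this post.
HOBBIES = {
    "zhangsan": ["book", "music"],
    "lisi": ["movie", "music"],
    "wangwu": ["book", "game"],
}


def terms_aggregation(docs):
    """Count, per term, how many documents contain it (one bucket per term)."""
    counts = Counter()
    for hobbies in docs.values():
        counts.update(set(hobbies))  # a document counts once per term
    # Assumed ordering: highest count first, ties broken by key.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


buckets = terms_aggregation(HOBBIES)
print(buckets)
```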

However, running the aggregation request above fails with an error:

"root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [hobby] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],

As the message says, fielddata is disabled by default on text fields and has to be enabled manually:

PUT /employee/_mapping/_doc
{
  "properties":{
    "hobby":{
      "type":"text",
      "fielddata":true
    }
  }
}

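Once fielddata is set to true on hobby, ES builds an in-memory structure for hobby by uninverting the inverted index, that is, a mapping from each document to its terms, which is what aggregations, sorting and similar features need. This structure takes up memory, so it is best not to enable it on fields that hold a lot of data. Alternatively, aggregate on a keyword field, such as the hobby.keyword sub-field that dynamic mapping creates for string fields by default, like this: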

GET /employee/_doc/_search
{
  "aggs":{
    "all_hobby":{
      "terms":{
        "field":"hobby.keyword"
      }
    }
  }
}

Running the aggregation again gives the following buckets:

"buckets" : [
        {
          "key" : "book",
          "doc_count" : 2
        },
        {
          "key" : "music",
          "doc_count" : 2
        },
        {
          "key" : "game",
          "doc_count" : 1
        },
        {
          "key" : "movie",
          "doc_count" : 1
        }
      ]

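The most popular hobbies are book and music, tied at 2 documents each.
Aggregations can also be combined with query, in which case they are computed over the documents matched by the query, like this: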

GET /employee/_doc/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gt": 26
          }
        }
      }
    }
  },
  "aggs": {
    "all_hobby": {
      "terms": {
        "field": "hobby.keyword"
      }
    }
  }
}
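
What this combination computes can be checked by hand with a local sketch: filter first, then count terms over the remaining documents. The data is the three sample employees from this post; the function name and the tie-breaking by key are assumptions for illustration.

```python
from collections import Counter

# The three sample employees used throughout this post.
EMPLOYEES = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]


def hobby_counts_for_age_over(docs, min_age_exclusive):
    """Apply the range filter, then count hobbies over the matched docs only."""
    matched = [doc for doc in docs if doc["age"] > min_age_exclusive]
    counts = Counter()
    for doc in matched:
        counts.update(set(doc["hobby"]))
    # Assumed ordering: highest count first, ties broken by key.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


filtered_buckets = hobby_counts_for_age_over(EMPLOYEES, 26)
print(filtered_buckets)
```

With age greater than 26, wangwu drops out, so game disappears and book falls to one document.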

Nested aggregations

For example, compute the average age of the employees for each hobby:

GET /employee/_doc/_search
{
  "aggs":{
    "all_hobby":{
      "terms":{
        "field":"hobby.keyword"
      },
      "aggs":{
        "avg_age":{
          "avg":{
            "field":"age"
          }
        }
      }
    }
  }
}

The result is:

"buckets" : [
        {
          "key" : "book",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 27.0
          }
        },
        {
          "key" : "music",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 27.5
          }
        },
        {
          "key" : "game",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 26.0
          }
        },
        {
          "key" : "movie",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 27.0
          }
        }
      ]

The result looks a little complex, so take the first bucket:

 {
          "key" : "book",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 27.0
          }
        }
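  1. key is the term being analyzed, taken from the values of the field
  2. doc_count is the count for that term, that is, how many employees' documents contain the key
  3. avg_age is the name of the inner aggregation, chosen freely in the request
  4. value is the average age, computed over the bucket of the outer aggregation; here it is the average age of the two employees whose hobby contains book.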
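
The two levels can be reproduced locally to check the numbers: group by hobby first, then average the ages inside each group. This is a sketch over the three sample employees from this post; avg_age_by_hobby is a hypothetical helper, not part of any ES client.

```python
from collections import defaultdict

# The three sample employees used throughout this post.
EMPLOYEES = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]


def avg_age_by_hobby(docs):
    """Outer level: one bucket per hobby. Inner level: average age in the bucket."""
    ages = defaultdict(list)
    for doc in docs:
        for hobby in set(doc["hobby"]):
            ages[hobby].append(doc["age"])
    return {
        hobby: {"doc_count": len(values), "avg_age": sum(values) / len(values)}
        for hobby, values in ages.items()
    }


result = avg_age_by_hobby(EMPLOYEES)
print(result["book"])
```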

Reposted from blog.csdn.net/qq_22606825/article/details/84571042