ES (Part 4): Using ES (Basic Queries and Aggregation Queries)

Basic operations

Index operations

1. Create an index

curl -XPUT localhost:9200/index01

Document management

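1. Create a document (this uses the index/type/id path; custom types such as article are deprecated from Elasticsearch 7.0, where the path becomes index01/_doc/1)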

curl -XPUT -H "Content-Type: application/json" -d '{"id":1,"title":"es简介"}' http://localhost:9200/index01/article/1

2. Get a document

curl -XGET http://192.168.168.101:9200/index01/article/1

3. Delete a document

curl -XDELETE http://192.168.168.101:9200/index01/article/1

Query operations

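Lucene-style queries

These use the Lucene query string syntax (accepted, for example, by the q URL parameter of _search and by the query_string query):

Field exists: _exists_:execution_completed_time
Field has a value: __type:company_extended_business
Any of several values: weibo_type:18 OR weibo_type:24 OR weibo_type:25
Exclude a value: NOT company_id:442966
Range, where [ ] includes the bounds and { } excludes them: first_consume_time:["2019-01-03 00:00:00" TO "2019-01-03 00:00:00"]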

Basic queries

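Specify the request header

Since Elasticsearch 6.0, requests that send a JSON body must declare its content type, so add this option (short form -H) to curl commands that send a body:

--header "Content-Type: application/json"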

Prepare test data

curl -XPUT -H "Content-Type: application/json" -d '{"id":1,"title":"es简介","content":"es好用好用真好用"}' http://192.168.168.101:9200/index01/article/1
curl -XPUT -H "Content-Type: application/json" -d '{"id":2,"title":"java编程思想","content":"这就是个工具书"}' http://192.168.168.101:9200/index01/article/2
curl -XPUT -H "Content-Type: application/json" -d '{"id":3,"title":"大数据简介","content":"你知道什么是大数据吗，就是大数据"}' http://192.168.168.101:9200/index01/article/3

term query

curl -XGET -H "Content-Type: application/json" http://192.168.168.101:9200/index01/_search -d '{"query":{"term":{"title":"你好"}}}'

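Use term when a field is matched against a single value; use terms only when it should match any of several values, in which case the JSON must supply an array.

match analyzes (tokenizes) the search text and then looks up each resulting token, while term looks up the keyword exactly as given, without analysis. In general, use match for full-text (fuzzy) search and term for exact matching.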
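To make the difference concrete, here is a minimal Python sketch of the two lookups. It is not how Elasticsearch is implemented: `simple_analyzer` is a hypothetical stand-in for a real analyzer (it only lowercases and splits on non-word characters), and the index is modeled as a plain set of tokens for a single document.

```python
import re


def simple_analyzer(text: str) -> list:
    """Hypothetical analyzer: lowercase the text and split it on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def term_query(indexed_tokens: set, value: str) -> bool:
    """term: the search value is NOT analyzed and must equal an indexed token exactly."""
    return value in indexed_tokens


def match_query(indexed_tokens: set, text: str) -> bool:
    """match (default OR operator): analyze the search text, match if any token is indexed."""
    return any(tok in indexed_tokens for tok in simple_analyzer(text))


# At index time the title is analyzed, so "Java" is stored as the token "java".
doc_tokens = set(simple_analyzer("Java Programming Ideas"))

print(term_query(doc_tokens, "Java"))         # False: no token "Java" was indexed
print(term_query(doc_tokens, "java"))         # True: exact token match
print(match_query(doc_tokens, "JAVA books"))  # True: "JAVA" is analyzed to "java"
```

The same reasoning explains why a term query on an analyzed text field often finds nothing: the stored tokens differ from the raw input.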

terms query

{
    "query": {
        "terms": {
            "tag": ["search", "nosql", "hello"]
        }
    }
}

match query

{"query":{"match":{"title":"你好"}}}

{
   "query": {
     "match": {
       "__type": "info"
     }
   },
   "sort": [
     {
       "campaign_end_time": {
         "order": "desc"
       }
     }
   ]
}

match_all

match_all matches every document; it takes no field, only an optional boost.

{"query":{"match_all":{}}}

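multi_match

Multi-field match query: runs the same match query across several fields. field^N multiplies that field's score by N, and type selects how the per-field scores are combined (best_fields is the default; others include most_fields, cross_fields, phrase, and phrase_prefix).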

{
  "query": {
    "multi_match": {
      "query": "运动 上衣",
      "fields": [
        "brandName^100",
        "brandName.brandName_pinyin^100",
        "brandName.brandName_keyword^100",
        "sortName^80",
        "sortName.sortName_pinyin^80",
        "productName^60",
        "productKeyword^20"
      ],
      "type": "best_fields",
      "operator": "AND"
    }
  }
}

Bool query

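The bool query combines four kinds of clauses: must (must match and contributes to the score), filter (must match, but is not scored and can be cached), should (optional matches that raise the score, or a required number of matches when minimum_should_match is set), and must_not (must not match; runs in filter context, so it does not affect scoring).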

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "_type": {
                            "value": "age"
                        }
                    }
                },
                {
                    "term": {
                        "account_grade": {
                            "value": "23"
                        }
                    }
                }
            ]
        }
    }
}

{
    "query": {
        "bool": {
            "must": {
                "term": {"user": "lucy"}
            },
            "filter": {
                "term": {"tag": "teach"}
            },
            "should": [
                {"term": {"tag": "wow"}},
                {"term": {"tag": "elasticsearch"}}
            ],
            "minimum_should_match": 1,
            "boost": 1.0
        }
    }
}
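The matching rules of the four clauses can be sketched in Python as below. This is a simplified model under stated assumptions: documents are plain dicts, `term` and `bool_matches` are hypothetical helpers rather than any client API, and relevance scoring (what must and should contribute to `_score`) is left out entirely.

```python
from typing import Callable, Dict, List, Optional, Sequence

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def term(field: str, value: object) -> Clause:
    """Exact-value check, like a term query on a keyword field (lists model multi-valued fields)."""
    def check(doc: Doc) -> bool:
        stored = doc.get(field)
        if isinstance(stored, list):
            return value in stored
        return stored == value
    return check


def bool_matches(doc: Doc, must: Sequence[Clause] = (), filters: Sequence[Clause] = (),
                 should: Sequence[Clause] = (), must_not: Sequence[Clause] = (),
                 minimum_should_match: Optional[int] = None) -> bool:
    """Decide whether a document matches; scoring is deliberately not modeled."""
    if not all(c(doc) for c in must):
        return False
    if not all(c(doc) for c in filters):
        return False
    if any(c(doc) for c in must_not):
        return False
    if minimum_should_match is None:
        # Without must/filter, at least one should clause must match; otherwise should is optional.
        minimum_should_match = 0 if (must or filters) else (1 if should else 0)
    return sum(1 for c in should if c(doc)) >= minimum_should_match


docs: List[Doc] = [
    {"user": "lucy", "tag": ["teach", "wow"]},
    {"user": "lucy", "tag": ["teach"]},
    {"user": "tom", "tag": ["teach", "elasticsearch"]},
]

# Mirrors the second bool example above.
hits = [i for i, d in enumerate(docs) if bool_matches(
    d,
    must=[term("user", "lucy")],
    filters=[term("tag", "teach")],
    should=[term("tag", "wow"), term("tag", "elasticsearch")],
    minimum_should_match=1,
)]
print(hits)  # [0]
```

Document 1 passes must and filter but matches no should clause, so minimum_should_match of 1 rejects it.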

Filter query

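Query vs. filter: in query context, Elasticsearch checks whether each document matches and also computes a relevance score (_score) used for ranking. In filter context it only answers yes or no for each document, computes no score, and can cache the set of matching documents so that a repeated filter is answered from the cache.

Filters are faster because they skip scoring and their results can be cached.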

{
    "query": {
      "bool": {
        "must": [
          {"match_all": {}}
        ],
        "filter": {
          "range": {
            "create_admin_id": {
              "gte": 10,
              "lte": 20
            }
          }
        }
      }
    }
}
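The caching idea can be illustrated with a small Python model. It is only a sketch: Elasticsearch actually caches per-segment bitsets and only for filters it sees used often, whereas the hypothetical `RangeFilterCache` below caches every range filter by its parameters, so a repeated filter is answered without re-scanning the documents.

```python
from typing import Dict, FrozenSet, List, Tuple


class RangeFilterCache:
    """Hypothetical model of filter caching: remember which doc IDs a range filter matched."""

    def __init__(self, docs: List[dict]):
        self.docs = docs
        self.cache: Dict[Tuple[str, float, float], FrozenSet[int]] = {}
        self.evaluations = 0  # how many times a filter was actually run over the documents

    def matching_ids(self, field: str, gte: float, lte: float) -> FrozenSet[int]:
        key = (field, gte, lte)
        if key not in self.cache:
            self.evaluations += 1
            ids = set()
            for doc_id, doc in enumerate(self.docs):
                value = doc.get(field)
                # Yes/no check only: no relevance score is computed.
                if value is not None and gte <= value <= lte:
                    ids.add(doc_id)
            self.cache[key] = frozenset(ids)
        return self.cache[key]


cache = RangeFilterCache([{"create_admin_id": v} for v in (5, 10, 15, 25)] + [{}])
first = cache.matching_ids("create_admin_id", 10, 20)   # evaluated
second = cache.matching_ids("create_admin_id", 10, 20)  # served from the cache
print(sorted(first), cache.evaluations)  # [1, 2] 1
```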

range query

{
    "query": {
        "range": {
            "age": {
                "gte": 20,
                "lte": 30
            }
        }
    }
}

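Wildcard query

? matches exactly one character and * matches zero or more characters. The pattern is compared with the indexed terms without being analyzed, and patterns that start with a wildcard can be slow.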

{
    "query": {
        "wildcard": {
            "title": "cr?me"
        }
    }
}
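A rough Python model of how the ? and * wildcards match a single indexed term. Assumption: backslash escaping and other details of the real implementation are ignored, and `wildcard_to_regex` and `wildcard_match` are illustrative helpers.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate ? and * into regex syntax; every other character is matched literally."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")   # exactly one character
        elif ch == "*":
            parts.append(".*")  # zero or more characters
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def wildcard_match(term: str, pattern: str) -> bool:
    """The pattern has to cover the whole term, not just a substring of it."""
    return re.fullmatch(wildcard_to_regex(pattern), term, flags=re.DOTALL) is not None


for t in ["crime", "creme", "crème", "cream", "crimes"]:
    print(t, wildcard_match(t, "cr?me"))
```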

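Regular expression query (regexp)

Lucene regular expressions are anchored: the pattern must match the entire indexed term.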

{
    "query": {
        "regexp": {
            "title": {
                "value": "cr.m[ae]",
                "boost": 10.0
            }
        }
    }
}
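Because of that anchoring, the example pattern only matches terms it covers completely. The Python snippet below illustrates this with `re.fullmatch`; Python's regex dialect is not identical to Lucene's, although the two agree for this simple pattern, and the term list is made-up sample data.

```python
import re

pattern = "cr.m[ae]"
terms = ["crime", "crama", "crome", "incrime", "crimes"]

# Lucene-style: the pattern must match the entire term.
anchored = [t for t in terms if re.fullmatch(pattern, t)]
# For contrast, an unanchored search also accepts terms that merely contain a match.
unanchored = [t for t in terms if re.search(pattern, t)]

print(anchored)    # ['crime', 'crama', 'crome']
print(unanchored)  # ['crime', 'crama', 'crome', 'incrime', 'crimes']
```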

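Phrase prefix query (match_phrase_prefix)

match_phrase_prefix matches the words as a phrase and treats the last word as a prefix; slop sets how many position moves between the words are tolerated.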

{
    "query": {
        "match_phrase_prefix": {
            "title": {
                "query": "crime punish",
                "slop": 1
            }
        }
    }
}

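query_string

Parses a Lucene query string: +term means must, -term means must not, field:term targets a field, and ^N boosts a clause.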

{
    "query": {
        "query_string": {
            "query": "title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)"
        }
    }
}

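Aggregation queries

Aggregations let you group data and compute statistics over it; you can think of them as SQL's GROUP BY together with aggregate functions such as COUNT and AVG.

Metric aggregations vs. bucket aggregations

Metrics: compute a statistic such as avg or max over the documents selected by the query; the result is usually a single number.

Buckets: split the selected documents into several smaller sets (buckets) by some criterion; metric aggregations nested inside are then computed separately on each bucket.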
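The GROUP BY analogy can be made concrete with a small Python sketch over in-memory dicts (the field names gender and fees and the sample rows are made up for illustration): a bucket step groups documents by a key, then a metric step computes a number per bucket, which is what a terms aggregation with a nested avg does.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def bucket_then_metric(docs: List[dict], group_field: str, metric_field: str) -> Dict[object, dict]:
    """Roughly: SELECT group_field, COUNT(*), AVG(metric_field) FROM docs GROUP BY group_field."""
    buckets: Dict[object, List[dict]] = defaultdict(list)
    for doc in docs:
        if group_field in doc:  # documents without the key fall into no bucket
            buckets[doc[group_field]].append(doc)

    result = {}
    for key, members in buckets.items():
        values = [m[metric_field] for m in members if metric_field in m]
        result[key] = {
            "doc_count": len(members),                # bucket size
            "avg": mean(values) if values else None,  # metric computed per bucket
        }
    return result


sample = [
    {"gender": "F", "fees": 10},
    {"gender": "M", "fees": 30},
    {"gender": "F", "fees": 20},
]
print(bucket_then_metric(sample, "gender", "fees"))
```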

max/min/avg/sum/stats

{
    "aggs": {
        "group_sum": {
            "sum": {
                "field": "money"
            }
        }
    }
}

{
    "aggs": {
        "avg_fees": {
            "avg": {
                "field": "fees"
            }
        }
    }
}

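terms aggregation

terms groups documents by the distinct values of a field. field is the field to group by; size is how many buckets to return; shard_size is how many buckets each shard returns; order sets the sort order. include and exclude filter the bucket values (with regular expressions or arrays of exact values), and missing sets the value used for documents that lack the field.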

{
    "aggs": {
        "group_by_type": {
            "terms": {
                "field": "_type"
            }
        }
    }
}

{
    "size": 0, 
    "aggs": {
      "terms":{
        "terms": {
          "field": "__type",
          "size": 10
        }
      }
    }
}

{
    "size": 0, 
    "aggs": {
      "terms":{
        "terms": {
          "field": "__type",
          "size": 10,
          "order": {
            "_count": "asc"
          }
        }
      }
    }
}

{
    "size": 0, 
    "aggs": {
      "agg_terms": {
        "terms": {
          "field": "cost",
          "order": {
            "_count": "asc"
          }
        },
        "aggs": {
          "max_balance": {
            "max": {
              "field": "cost"
            }
          }
        }
      }
    }
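}

include and exclude accept regular expressions only on string (keyword) fields; on a numeric field such as cost, pass arrays of exact values instead. When both are set, exclude takes precedence.

{
    "size": 0,
    "aggs": {
      "agg_terms": {
        "terms": {
          "field": "__type",
          "include": ".*",
          "exclude": "info"
        }
      }
    }
}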
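To see how size, order, include, and exclude interact, here is a single-shard Python model. It is a sketch under simplifying assumptions: `terms_agg` is a hypothetical helper, Python regexes stand in for Lucene's, and shard_size is not modeled because there is only one shard, so counts are exact here (across several shards Elasticsearch's term counts can be approximate).

```python
import re
from collections import Counter
from typing import Iterable, List, Optional


def terms_agg(values: Iterable[str], size: int = 10, order: str = "desc",
              include: Optional[str] = None, exclude: Optional[str] = None) -> List[dict]:
    """Single-shard model of a terms aggregation ordered by _count."""
    counts = Counter(values)
    keys = [
        k for k in counts
        # Regexes must match the whole value; exclude wins over include.
        if (include is None or re.fullmatch(include, k))
        and not (exclude is not None and re.fullmatch(exclude, k))
    ]
    # Sort by count (asc or desc), breaking ties by key so the output is deterministic.
    sign = 1 if order == "asc" else -1
    keys.sort(key=lambda k: (sign * counts[k], k))
    return [{"key": k, "doc_count": counts[k]} for k in keys[:size]]


types = ["info", "error", "info", "warn", "info", "error"]
print(terms_agg(types, size=2))
print(terms_agg(types, order="asc"))
print(terms_agg(types, include=".*", exclude="info"))
```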

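cardinality (distinct count)

Returns the number of distinct values of a field. The count is approximate (based on the HyperLogLog++ algorithm).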

{
    "size": 0, 
    "aggs": {
      "count_type": {
        "cardinality": {
          "field": "__type"
        }
      }
    }
}

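percentiles

percentiles sorts the values of a field (or script) in ascending order and, for each requested percentage, returns the value below which that percentage of the matching documents falls. By default it returns the values at the 1, 5, 25, 50, 75, 95, and 99 percentiles. The results are approximate (computed with the TDigest algorithm).
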
{
    "size": 0, 
    "aggs": {
      "age_percents":{
        "percentiles": {
          "field": "age",
          "percents": [
            1,
            5,
            25,
            50,
            75,
            95,
            99
          ]
        }
      }
       
    }
}
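For intuition, the following Python sketch computes exact percentiles with linear interpolation, using the same default percents. This is a swapped-in exact method, not Elasticsearch's: the real aggregation approximates with TDigest, so its numbers can differ slightly. The ages are sample data.

```python
from typing import Dict, Sequence

DEFAULT_PERCENTS = (1, 5, 25, 50, 75, 95, 99)


def percentile(values: Sequence[float], p: float) -> float:
    """Exact percentile with linear interpolation between the two closest ranks."""
    s = sorted(values)
    if not s:
        raise ValueError("no values")
    rank = (len(s) - 1) * p / 100.0
    lo = int(rank)                # floor, since rank >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def percentiles(values: Sequence[float], percents=DEFAULT_PERCENTS) -> Dict[float, float]:
    """Value at each requested percent, like the keys of the aggregation's response."""
    return {p: percentile(values, p) for p in percents}


ages = [20, 22, 25, 30, 35, 40, 60]
print(percentiles(ages))
```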


percentile_ranks can also be nested under a bucket aggregation, for example once per gender:

{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "gender"
      },
      "aggs": {
        "balances": {
          "percentile_ranks": {
            "field": "balance",
            "values": [
              20000,
              40000
            ]
          }
        }
      }
    }
  }
}
percentile_ranks

For each given value, returns the percentage of documents whose field value is less than or equal to it.

{
    "size": 0, 
    "aggs": {
      "tests": {
        "percentile_ranks": {
          "field": "age",
          "values": [
            10,
            15
          ]
        }
      }
    }
}
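The idea can be checked with an exact Python computation. This is a simplification: Elasticsearch again uses TDigest, so its ranks are approximate; `percentile_ranks` here is a hypothetical helper and the ages are made-up sample data.

```python
from typing import Dict, Sequence


def percentile_ranks(values: Sequence[float], targets: Sequence[float]) -> Dict[float, float]:
    """For each target, the percentage of values that are <= target (exact, no interpolation)."""
    if not values:
        raise ValueError("no values")
    n = len(values)
    return {t: 100.0 * sum(1 for v in values if v <= t) / n for t in targets}


ages = [8, 10, 12, 15, 15, 20, 30, 40]
print(percentile_ranks(ages, [10, 15]))  # {10: 25.0, 15: 62.5}
```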

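filter aggregation

filter aggregates only the documents that satisfy a filter: from the documents hit by the query it keeps those matching the filter condition and runs the nested aggregations on them (filter first, then aggregate).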

{
    "size": 0, 
    "aggs": {
      "agg_filter":{
        "filter": {
          "match":{"gender":"F"}
        },
        "aggs": {
          "avgs": {
            "avg": {
              "field": "age"
            }
          }
        }
      }
    }
}

filters aggregation

Applies several named filters at once; each filter produces its own bucket.

{
    "size": 0,
    "aggs": {
      "message": {
        "filters": {
          "filters": {
            "errors": {
              "exists": {
                "field": "__type"
              }
            },
            "warnings": {
              "term": {
                "__type": "info"
              }
            }
          }
        }
      }
    }
}

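range aggregation

Each range includes its from value and excludes its to value; leaving out from or to makes that side open-ended.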

{
    "aggs": {
      "agg_range": {
        "range": {
          "field": "cost",
          "ranges": [
            {
              "from": 50,
              "to": 70
            },
            {
              "from": 100
            }
          ]
        },
        "aggs": {
          "bmax": {
            "max": {
              "field": "cost"
            }
          }
        }
      }
    }
}
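The bucket boundaries of the example above can be modeled in Python as follows. It is a sketch: `range_agg` is a hypothetical helper, the cost values are sample data, and a bucket with no documents reports max as None, mirroring the null Elasticsearch returns for an empty max.

```python
import math
from typing import Dict, List, Sequence


def range_agg(values: Sequence[float], ranges: List[Dict[str, float]]) -> List[dict]:
    """Model of a range aggregation with a nested max: from is inclusive, to is exclusive."""
    buckets = []
    for r in ranges:
        lo = r.get("from", -math.inf)  # a missing bound leaves that side open
        hi = r.get("to", math.inf)
        members = [v for v in values if lo <= v < hi]
        buckets.append({
            "from": r.get("from"),
            "to": r.get("to"),
            "doc_count": len(members),
            "max": max(members) if members else None,
        })
    return buckets


costs = [40, 50, 65, 70, 100, 150]
for b in range_agg(costs, [{"from": 50, "to": 70}, {"from": 100}]):
    print(b)
```

Note that 70 lands in neither bucket: it is excluded from the first range and below the start of the second.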

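date_range aggregation

Like range, but for dates: the bounds accept date math such as now-10d/d (ten days ago, rounded down to the start of the day), and format controls how dates in the request and in the bucket keys are written.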

{
     "aggs": {
       "date_aggrs": {
         "date_range": {
           "field": "accepted_time",
           "format": "MM-yyyy",
           "ranges": [
             {
               "from": "now-10d/d",
               "to": "now"
             }
           ]
         }
       }
     }
}

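date_histogram

Date histogram aggregation: groups documents into time buckets by day, month, year, and so on. Calendar intervals are year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), and second (1s); a fixed custom interval can be used instead. Since Elasticsearch 7.2 the interval parameter is deprecated in favor of calendar_interval and fixed_interval. In the example, min_doc_count set to 0 also returns buckets that contain no documents, and extended_bounds forces the buckets to span the given range.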

{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "accepted_time",
        "interval": "quarter",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2014-01-01",
          "max": "2014-12-31"
        }
      }
    }
  }
}
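The effect of min_doc_count: 0 and extended_bounds can be sketched in Python with quarterly buckets. This is a simplified model: the helper names are hypothetical, dates are treated as plain calendar dates with no time zone, and the sample dates are made up.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def quarter_start(d: date) -> date:
    """First day of the calendar quarter containing d."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def next_quarter(q: date) -> date:
    """First day of the quarter after the one starting at q."""
    month = q.month + 3
    return date(q.year + (month - 1) // 12, (month - 1) % 12 + 1, 1)


def quarterly_histogram(dates: Iterable[date], bounds_min: date,
                        bounds_max: date) -> List[Tuple[str, int]]:
    """Quarterly buckets with min_doc_count 0: empty quarters inside the bounds are kept."""
    counts = Counter(quarter_start(d) for d in dates)
    # extended_bounds only widens the range; data outside the bounds still gets buckets.
    start = min([quarter_start(bounds_min)] + list(counts))
    end = max([quarter_start(bounds_max)] + list(counts))
    buckets = []
    q = start
    while q <= end:
        buckets.append((q.isoformat(), counts.get(q, 0)))
        q = next_quarter(q)
    return buckets


accepted = [date(2014, 2, 10), date(2014, 3, 1), date(2014, 8, 15)]
print(quarterly_histogram(accepted, date(2014, 1, 1), date(2014, 12, 31)))
# [('2014-01-01', 2), ('2014-04-01', 0), ('2014-07-01', 1), ('2014-10-01', 0)]
```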

missing aggregation

Creates a single bucket holding the documents that have no value for the field.

{
  "aggs": {
    "account_missing": {
      "missing": {
        "field": "__type"
      }
    }
  }
}

Logstash operations

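Start Logstash

logstash -e 'input{stdin{}}output{stdout{codec=>rubydebug}}'

This reads events from standard input and prints each one to standard output with the rubydebug codec.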

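IK analyzer

curl -XPOST -H "Content-Type: application/json" http://192.168.168.101:9200/_analyze -d '{"analyzer":"ik","text":"JAVA编程思想"}'
http://192.168.168.101:9200/index01/_analyze?analyzer=ik&text=中华人民共和国

Newer releases of the IK plugin name the analyzers ik_smart and ik_max_word instead of ik, and passing analyzer and text as URL parameters (second line) works only on older Elasticsearch versions.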

IK analyzer: index a test document
curl -XPUT -H "Content-Type: application/json" -d '{"id":1,"kw":"我们都爱中华人民共和国"}' http://192.168.168.101:9200/haha1/haha/1

Mapping

View a mapping
curl -XGET http://192.168.168.101:9200/jtdb_item/tb_item/_mapping