ElasticSearch Basics
1 Introducing ES
1.1 What ES is
- ElasticSearch is a search server built on Lucene
- It is a distributed, highly scalable, near-real-time search and data-analytics engine
- It is accessed through a RESTful web interface
- Elasticsearch is written in Java, released as open source under the Apache license, and is a popular enterprise search engine
Elasticsearch official site
Typical use cases:
- Search: querying huge amounts of data
- Log analysis
- Real-time data analytics
1.2 How ES stores and queries data
1.2.1 Inverted index
Inverted index: documents are tokenized, and the resulting mapping from each term to the ids of the documents containing it is the inverted (reverse) index.
e.g. using Tang poetry as an example:
Forward index: from《静夜思》--> the line "窗前明月光" --> the character "前"
Inverted index: the character "前" --> the line "窗前明月光" -->《静夜思》
1 The inverted index is built by tokenizing each verse into individual terms; going from a term back to the verse and the poem is the reverse lookup.
2 "床前明月光" --> tokenization splits a piece of text into separate terms according to a set of rules; each piece is called a term.
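As a concrete illustration, here is a minimal, self-contained sketch of the data structure (this is not how Lucene stores it internally, and the "analyzer" here is just whitespace splitting over pre-segmented text):
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class InvertedIndexDemo {
    public static void main(String[] args) {
        // toy corpus: document id -> already-segmented text (ES would run a real analyzer such as IK)
        Map<Integer, String> docs = Map.of(
                1, "床前 明月 光",
                2, "疑是 地上 霜");

        // inverted index: term -> sorted ids of the documents that contain it
        Map<String, TreeSet<Integer>> index = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String term : text.split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        });

        // reverse lookup: from a term straight to the documents that contain it
        System.out.println(index.get("明月")); // [1]
    }
}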
1.2.2 How ES stores and queries data
index: the counterpart of a MySQL database
mapping: the counterpart of a MySQL table schema
document: the counterpart of the rows in a MySQL table
Problems with querying a relational database directly:
1. Poor performance: a LIKE query with a leading wildcard cannot use an index and triggers a full table scan
2. Weak matching: with the whole string "华为手机" as the condition, the relevant rows are not found
ES instead builds an inverted index over the tokenized title field:
- Querying with the keyword "手机"
- In the inverted index the terms are sorted into a tree-like structure, which speeds up term lookup
- Querying with the keyword "华为手机" (see the sketch after this list)
  - 华为: 1, 3
  - 手机: 1, 2, 3
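A small sketch of how the two query styles use those posting lists; the analysis step is hard-coded here just to show union (match-style) versus exact-term lookup (term-style):
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TermVsMatchDemo {
    public static void main(String[] args) {
        // posting lists from the example above: term -> ids of documents whose title contains it
        Map<String, Set<Integer>> index = Map.of(
                "华为", Set.of(1, 3),
                "手机", Set.of(1, 2, 3));

        // match-style: the query string is analyzed into terms and the posting lists are unioned
        Set<Integer> matchHits = new TreeSet<>();
        for (String term : List.of("华为", "手机")) { // "华为手机" after analysis
            matchHits.addAll(index.getOrDefault(term, Set.of()));
        }
        System.out.println(matchHits); // [1, 2, 3]

        // term-style: the query string is NOT analyzed, so the whole string is looked up as one term
        System.out.println(index.getOrDefault("华为手机", Set.of())); // [] -> no hits
    }
}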
2 Core concepts of ES
Index (index)
Where ElasticSearch stores data; comparable to a database in a relational database.
Mapping (mapping)
The mapping defines the type of each field, the analyzer each field uses, and so on. It is comparable to a table schema in a relational database.
Document (document)
The smallest unit of data in Elasticsearch, usually shown as JSON. A document corresponds to a row in a relational database.
Inverted index
An inverted index is the list of all distinct terms that appear in the documents; for each term it stores the list of ids of the documents containing it.
Type (type)
A type is like a kind of table, e.g. a user table or a role table. In Elasticsearch 7.x the default type is _doc.
- In ES 5.x an index could hold multiple types.
- In ES 6.x an index can hold only one type.
- From ES 7.x onward the type concept is being removed; current operations no longer use it and default to _doc.
3 Operating ES with scripts
3.1 Introduction to the RESTful style
- REST (Representational State Transfer) is a set of architectural constraints and principles; an application or design that satisfies them is called RESTful. In practice it is a convention for defining interfaces.
- Based on HTTP
- Payloads are defined in XML or JSON
- Every URI represents one resource
- Clients use the four verbs GET, POST, PUT, and DELETE to operate on server-side resources:
  - GET: retrieve a resource
  - POST: create a resource (can also be used to update one)
  - PUT: update a resource
  - DELETE: delete a resource
3.2 Index operations
Create:
PUT http://ip:port/index_name
Query:
GET http://ip:port/index_name   # get info for a single index
GET http://ip:port/index_name1,index_name2...   # get info for several indices
GET http://ip:port/_all   # get info for all indices
Delete an index:
DELETE http://ip:port/index_name
Close / open an index:
POST http://ip:port/index_name/_close
POST http://ip:port/index_name/_open
3.3 ES data types
1 Simple types
- String
  - text: analyzed (tokenized); does not support aggregations
  - keyword: not analyzed; the whole value is stored as a single term; supports aggregations
  - (aggregation here plays the role of SQL aggregate functions such as sum)
- Numeric
- Boolean: boolean
- Binary: binary
- Range types
  - integer_range
  - float_range
  - long_range
  - double_range
  - date_range
- Date: date
(a small mapping example combining several of these types follows)
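For illustration, a hedged mapping that combines several of these types in one index (the index and field names are invented for this example):
PUT type_demo
{
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "brandName":  { "type": "keyword" },
      "price":      { "type": "double" },
      "onSale":     { "type": "boolean" },
      "createTime": { "type": "date" },
      "stockRange": { "type": "integer_range" }
    }
  }
}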
3.4 Mapping operations
Using the Kibana Dev Tools console
1 Create the index first, then add the mapping
# create the index
PUT person
GET person
#add the mapping
PUT /person/_mapping
{
"properties":{
"name":{
"type":"text"
},
"age":{
"type":"integer"
}
}
}
2 Create the index and add the mapping in one step
#create the index and add the mapping
PUT /person1
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
}
}
}
}
GET person1/_mapping
Add fields
#add fields; the same PUT _mapping request can also add new fields to an existing mapping
PUT /person1/_mapping
{
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
}
}
}
3.5 Document operations
1 Add a document with an explicit id
POST /person1/_doc/2
{
"name":"张三",
"age":18,
"address":"北京"
}
GET /person1/_doc/1
2 Add a document without specifying an id
#add a document without an id; a unique random id is generated
POST /person1/_doc/
{
"name":"张三",
"age":18,
"address":"北京"
}
#query all documents
GET /person1/_search
#delete the document with the given id
DELETE /person1/_doc/1
3.6 Analyzers
The IK analyzer has two modes: ik_max_word and ik_smart
1 ik_max_word: splits the text at the finest granularity
#mode 1: ik_max_word
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "乒乓球明年总冠军"
}
The ik_max_word analyzer produces the following tokens:
{
"tokens" : [
{
"token" : "乒乓球",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "乒乓",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "球",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "明年",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "总冠军",
"start_offset" : 5,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "冠军",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 5
}
]
}
2 ik_smart: splits the text at the coarsest granularity
#mode 2: ik_smart
GET /_analyze
{
"analyzer": "ik_smart",
"text": "乒乓球明年总冠军"
}
The ik_smart analyzer produces the following tokens:
{
"tokens" : [
{
"token" : "乒乓球",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "明年",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "总冠军",
"start_offset" : 5,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 2
}
]
}
- Term query: term
  - A term query does not analyze the query string; a document matches only when an indexed term exactly equals the query string.
- Full-text query: match
  - A match query analyzes the query string first, looks up each resulting term, and returns the union of the hits.
1 Create an index, add a mapping, and set the field analyzer to the IK analyzer
PUT person2
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"address": {
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
2 Add documents
POST /person2/_doc/1
{
"name":"张三",
"age":18,
"address":"北京海淀区"
}
POST /person2/_doc/2
{
"name":"李四",
"age":18,
"address":"北京朝阳区"
}
POST /person2/_doc/3
{
"name":"王五",
"age":18,
"address":"北京昌平区"
}
3 Inspect the mapping
GET person2
4 Check how the text is tokenized
GET _analyze
{
"analyzer": "ik_max_word",
"text": "北京海淀"
}
5 Term query: term
GET /person2/_search
{
"query": {
"term": {
"address": {
"value": "北京"
}
}
}
}
6 Full-text query (match): the query string is analyzed into terms, each term is looked up, and the union of the hits is returned
GET /person2/_search
{
"query": {
"match": {
"address":"北京昌平"
}
}
}
4 Advanced ES operations
4.1 bulk batch operations - script
# batch operations
# 1. delete document 5
# 2. create document 8
# 3. update document 2, setting name to "2号"
POST _bulk
{"delete":{"_index":"person1","_id":"5"}}
{"create":{"_index":"person1","_id":"8"}}
{"name":"八号","age":18,"address":"北京"}
{"update":{"_index":"person1","_id":"2"}}
{"doc":{"name":"2号"}}
5 ES queries
5.1 matchAll - script
matchAll: query everything
# by default ES returns 10 hits; use from and size for paging
# inspect the details of the result
GET goods/_search
{
"query": {
"match_all": {
}
},
"from": 0,
"size": 100
}
5.2 termQuery
- text: analyzed (tokenized); does not support aggregations
- keyword: not analyzed; the whole value is one term; supports aggregations
term query: the query string is not analyzed
GET goods/_search
{
"query": {
"term": {
"title": {
"value": "华为"
}
}
}
}
A term query against a text field only matches when the query string equals one of the individual terms the field was split into, because text fields are analyzed.
When querying the categoryName field, note that categoryName is a keyword field: it is not analyzed, so the whole value is stored as a single term.
GET goods/_search
{
"query": {
"term": {
"categoryName": {
"value": "华为手机"
}
}
}
}
5.3 matchQuery
match query:
- analyzes the query string
- then matches each resulting term against the indexed terms
- takes the union of the hits by default (OR)
# match query
GET goods/_search
{
"query": {
"match": {
"title": "华为手机"
}
},
"size": 500
}
- the default match behaviour: OR (union)
- match with AND: intersection
Summary:
- A term query looks up the exact term in the inverted index and is unaware of analyzers; it suits keyword, numeric, and date fields
- A match query is analyzer-aware and understands how the field was tokenized
5.4 Fuzzy-style queries - script
5.4.1 wildcard query
wildcard query: matches indexed terms using the wildcards ? (any single character) and * (zero or more characters); the pattern itself is not analyzed
# wildcard query: pattern matching against indexed terms
GET goods/_search
{
"query": {
"wildcard": {
"title": {
"value": "华*"
}
}
}
}
5.4.2 Regexp query
\w matches any word character including the underscore, i.e. [A-Za-z0-9_]; in the JSON body the backslash has to be escaped as \\
+ means one or more occurrences
(.)* means any sequence of characters
The cost of a regexp query depends on how efficient the regular expression is
GET goods/_search
{
"query": {
"regexp": {
"title": "\\w+(.)*"
}
}
}
5.4.3 Prefix query
Works best on keyword fields
# prefix query; works best on keyword fields
GET goods/_search
{
"query": {
"prefix": {
"brandName": {
"value": "三"
}
}
}
}
5.5 Range query & sorting
# range query
GET goods/_search
{
"query": {
"range": {
"price": {
"gte": 2000,
"lte": 3000
}
}
},
"sort": [
{
"price": {
"order": "desc"
}
}
]
}
5.6 queryString query
query_string multi-field query:
- analyzes the query string
- then matches the resulting terms against the indexed terms
- defaults to OR (union)
- can search several fields at once
query_string: recognizes the operators (OR, AND) written inside the query string
# queryString
GET goods/_search
{
"query": {
"query_string": {
"fields": ["title","categoryName","brandName"],
"query": "华为 AND 手机"
}
}
}
simple_query_string: does not interpret AND/OR in the query string as operators; here it simply searches for 华为, and, 手机 as separate terms.
GET goods/_search
{
"query": {
"simple_query_string": {
"fields": ["title","categoryName","brandName"],
"query": "华为 AND 手机"
}
}
}
query_string with the default_operator parameter:
GET goods/_search
{
"query": {
"query_string": {
"fields": ["title","brandName","categoryName"],
"query": "华为手机 "
, "default_operator": "AND"
}
}
}
simple_query_string with the default_operator parameter:
GET goods/_search
{
"query": {
"simple_query_string": {
"fields": ["title","brandName","categoryName"],
"query": "华为手机 "
, "default_operator": "OR"
}
}
}
- OR/AND written inside the query string control whether the terms must co-occur: with OR a single matching term is enough, with AND both terms must appear.
- default_operator sets the operator used between the analyzed terms when none is written in the query: OR produces the union of the hits, AND the intersection.
5.7 Bool query - script
boolQuery: combines several query clauses. The clause types are:
- must (and): the clause must match
- must_not (not): the clause must not match
- should (or): the clause may match
- filter: the clause must match; faster than must because no score is computed
**Score:** how well a document matches the conditions; the better the match, the higher the score
# boolQuery
# when must and filter are used together, max_score (the score) is still shown
# must takes an array by default
GET goods/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"brandName": {
"value": "华为"
}
}
}
],
"filter":[
{
"term": {
"title": "手机"
}
},
{
"range":{
"price": {
"gte": 2000,
"lte": 3000
}
}
}
]
}
}
}
#filter used on its own; filter can hold a single clause or several clauses (array form)
GET goods/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"brandName": {
"value": "华为"
}
}
}
]
}
}
}
5.8 Aggregation queries - script
- Metric aggregations: the counterpart of MySQL aggregate functions such as max, min, avg, and sum.
- Bucket aggregations: the counterpart of MySQL GROUP BY. Do not group on text fields; the request will fail.
# aggregation query
# metric aggregation (aggregate function)
GET goods/_search
{
"query": {
"match": {
"title": "手机"
}
},
"aggs": {
"max_price": {
"max": {
"field": "price"
}
}
}
}
# bucket aggregation (grouping)
GET goods/_search
{
"query": {
"match": {
"title": "手机"
}
},
"aggs": {
"goods_brands": {
"terms": {
"field": "brandName",
"size": 100
}
}
}
}
5.9 Highlight query - script
The three elements of highlighting:
- the field to highlight
- the prefix tag
- the suffix tag
GET goods/_search
{
"query": {
"match": {
"title": "电视"
}
},
"highlight": {
"fields": {
"title": {
"pre_tags": "<font color='red'>",
"post_tags": "</font>"
}
}
}
}
5.10 Reindexing & index aliases
# list aliases; an index is addressed by its own name by default, which is not shown as an alias
GET goods/_alias/
# result
{
"goods" : {
"aliases" : {
}
}
}
1 Create the student_index_v1 index
# -------reindex-----------
# create student_index_v1; index names must be all lowercase
PUT student_index_v1
{
"mappings": {
"properties": {
"birthday":{
"type": "date"
}
}
}
}
#inspect the structure of student_index_v1
GET student_index_v1
#add data
PUT student_index_v1/_doc/1
{
"birthday":"1999-11-11"
}
#query the data
GET student_index_v1/_search
#add data; this value cannot be parsed as a date, which is what motivates the reindex below
PUT student_index_v1/_doc/1
{
"birthday":"1999年11月11日"
}
2 Reindex: copy the data from student_index_v1 into student_index_v2
# the business requirement changed: the birthday field now needs to be of type text
# 1. create the new index student_index_v2
# 2. copy the data from student_index_v1 into student_index_v2
# create the new index student_index_v2
PUT student_index_v2
{
"mappings": {
"properties": {
"birthday":{
"type": "text"
}
}
}
}
# copy the data from student_index_v1 into student_index_v2
# _reindex copies the data
POST _reindex
{
"source": {
"index": "student_index_v1"
},
"dest": {
"index": "student_index_v2"
}
}
GET student_index_v2/_search
PUT student_index_v2/_doc/2
{
"birthday":"1999年11月11日"
}
3 Create an index alias:
**Note:** DELETE student_index_v1 deletes the student_index_v1 index itself, it does not remove an alias.
# Question: the Java code still talks to ES using the old index name student_index_v1. Options:
# 1. change the code (not recommended)
# 2. use an index alias (recommended)
# Steps:
# 0. delete student_index_v1 first
# 1. give student_index_v2 the alias student_index_v1
# delete student_index_v1 first
# DELETE student_index_v1 -- this deletes the student_index_v1 index itself
# an index is addressed by its own name by default; that name is not an alias and cannot be removed
# give student_index_v1 the alias student_index_v11 (to test alias removal below)
POST student_index_v1/_alias/student_index_v11
#test the alias-remove command
POST /_aliases
{
"actions": [
{
"remove": {
"index": "student_index_v1", "alias": "student_index_v11"}}
]
}
# give student_index_v2 the alias student_index_v1
POST student_index_v2/_alias/student_index_v1
#list aliases
GET goods/_alias/
GET student_index_v1/_search
GET student_index_v2/_search
6 The ElasticSearch Java API
6.1 Integrating ES with Spring Boot (version 7.4.0)
1 Create a Spring Boot project
2 Add the ElasticSearch dependencies
<!-- ES client dependencies -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.4.0</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>7.4.0</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>7.4.0</version>
</dependency>
3 The ElasticSearchConfig configuration class
@Configuration
@ConfigurationProperties(prefix="elasticsearch")
public class ElasticSearchConfig {
private String host;
private int port;
public String getHost() {
return host;
}
public void setHost(String host) {
this.host = host;
}
public int getPort() {
return port;
}
public void setPort(int port) {
this.port = port;
}
@Bean
public RestHighLevelClient client(){
return new RestHighLevelClient(RestClient.builder(
new HttpHost(host,port,"http")
));
}
}
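The @ConfigurationProperties(prefix="elasticsearch") binding above expects matching entries in application.yml; a minimal example (the address is a placeholder, consistent with the cluster settings shown later in 8.3):
elasticsearch:
  host: 192.168.200.130
  port: 9200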
If injecting RestHighLevelClient with @Autowired shows an error in the IDE, it is because the configuration class and the test class are in different base packages.
@SpringBootTest
class ElasticsearchDay01ApplicationTests {
@Autowired
RestHighLevelClient client;
/**
* smoke test
*/
@Test
void contextLoads() {
System.out.println(client);
}
}
6.2 Creating an index
1 Create an index
/**
* Create an index
* @throws IOException
*/
@Test
public void addIndex() throws IOException {
//1. get the indices client from the RestHighLevelClient
IndicesClient indices = client.indices();
//2. build the request and execute it to get a response
//2.1 set the index name
CreateIndexRequest createIndexRequest=new CreateIndexRequest("cf");
CreateIndexResponse createIndexResponse = indices.create(createIndexRequest, RequestOptions.DEFAULT);
//3. check the result from the response
System.out.println(createIndexResponse.isAcknowledged());
}
2 Create an index and add a mapping
/**
* Create an index and add a mapping
*/
@Test
public void addIndexAndMapping() throws IOException {
//1. get the indices client from the RestHighLevelClient
IndicesClient indices = client.indices();
//2. build the request and execute it to get a response
CreateIndexRequest createIndexRequest = new CreateIndexRequest("cf");
//2.1 set the mappings
String mapping = "{\n" +
" \"properties\" : {\n" +
" \"address\" : {\n" +
" \"type\" : \"text\",\n" +
" \"analyzer\" : \"ik_max_word\"\n" +
" },\n" +
" \"age\" : {\n" +
" \"type\" : \"long\"\n" +
" },\n" +
" \"name\" : {\n" +
" \"type\" : \"keyword\"\n" +
" }\n" +
" }\n" +
" }";
createIndexRequest.mapping(mapping,XContentType.JSON);
CreateIndexResponse createIndexResponse = indices.create(createIndexRequest, RequestOptions.DEFAULT);
//3. check the result from the response
System.out.println(createIndexResponse.isAcknowledged());
}
6.3 Querying, deleting, and checking an index
Query an index:
/**
* Query an index
*/
@Test
public void queryIndex() throws IOException {
IndicesClient indices = client.indices();
GetIndexRequest getRequest=new GetIndexRequest("cf");
GetIndexResponse response = indices.get(getRequest, RequestOptions.DEFAULT);
Map<String, MappingMetaData> mappings = response.getMappings();
//iterate over the mappings (foreach)
for (String key : mappings.keySet()) {
System.out.println(key+"==="+mappings.get(key).getSourceAsMap());
}
}
Delete an index
/**
* Delete an index
*/
@Test
public void deleteIndex() throws IOException {
IndicesClient indices = client.indices();
DeleteIndexRequest deleteRequest=new DeleteIndexRequest("cf2");
AcknowledgedResponse delete = indices.delete(deleteRequest, RequestOptions.DEFAULT);
System.out.println(delete.isAcknowledged());
}
Check whether an index exists
/**
* Check whether an index exists
*/
@Test
public void existIndex() throws IOException {
IndicesClient indices = client.indices();
GetIndexRequest getIndexRequest=new GetIndexRequest("cf2");
boolean exists = indices.exists(getIndexRequest, RequestOptions.DEFAULT);
System.out.println(exists);
}
6.4 Adding documents
1 Add a document, using a Map as the source
@Test
public void addDoc1() throws IOException {
Map<String, Object> map=new HashMap<>();
map.put("name","张三");
map.put("age","18");
map.put("address","北京二环");
IndexRequest request=new IndexRequest("cf").id("1").source(map);
IndexResponse response = client.index(request, RequestOptions.DEFAULT);
System.out.println(response.getId());
}
2 Add a document, using an object as the source
@Test
public void addDoc2() throws IOException {
Person person=new Person();
person.setId("2");
person.setName("李四");
person.setAge(20);
person.setAddress("北京三环");
String data = JSON.toJSONString(person);
IndexRequest request=new IndexRequest("cf").id(person.getId()).source(data,XContentType.JSON);
IndexResponse response = client.index(request, RequestOptions.DEFAULT);
System.out.println(response.getId());
}
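addDoc2 above assumes a plain Person POJO (not shown in these notes) plus a fastjson-style JSON helper; the Person class below is only an illustrative sketch whose fields match the setters used in the examples:
public class Person {
    private String id;
    private String name;
    private Integer age;
    private String address;
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
}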
6.5 Updating, querying, and deleting documents
1 Update a document: when indexing, an existing id updates the document, a new id creates it
/**
* Update a document: when indexing, an existing id updates the document, a new id creates it
*/
@Test
public void UpdateDoc() throws IOException {
Person person=new Person();
person.setId("2");
person.setName("李四");
person.setAge(20);
person.setAddress("北京三环车王");
String data = JSON.toJSONString(person);
IndexRequest request=new IndexRequest("cf").id(person.getId()).source(data,XContentType.JSON);
IndexResponse response = client.index(request, RequestOptions.DEFAULT);
System.out.println(response.getId());
}
2 Get a document by id
/**
* Get a document by id
*/
@Test
public void getDoc() throws IOException {
//set the index and document id to query
GetRequest indexRequest=new GetRequest("cf","2");
GetResponse response = client.get(indexRequest, RequestOptions.DEFAULT);
System.out.println(response.getSourceAsString());
}
3 Delete a document by id
/**
* Delete a document by id
*/
@Test
public void delDoc() throws IOException {
//set the index and document id to delete
DeleteRequest deleteRequest=new DeleteRequest("cf","1");
DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);
System.out.println(response.getId());
}
6.6 Advanced ES operations with the Java API
Prerequisites:
1 Create a Goods entity class, add the 通用Mapper (common mapper) dependency, and define a GoodsMapper interface that extends Mapper (a hedged sketch follows).
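The Goods entity and GoodsMapper themselves are not shown in these notes; the sketch below is reconstructed from the fields and calls the later examples rely on (getSpecStr, setSpec, getId, setTitle, brandName, categoryName, price), so every name in it is an assumption:
import java.util.Map;
import lombok.Data;
import tk.mybatis.mapper.common.Mapper;

// hypothetical Goods entity; field names are inferred from the query examples below
@Data
public class Goods {
    private Long id;
    private String title;
    private Double price;
    private String brandName;
    private String categoryName;
    private String specStr;            // raw spec JSON string as stored in MySQL
    private Map<String, String> spec;  // parsed specs that get written into the ES document
}

// hypothetical mapper in the 通用Mapper (tk.mybatis) style; selectAll() comes from the base Mapper interface
interface GoodsMapper extends Mapper<Goods> {
}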
Bulk insert test code
/**
* Bulk insert test code
*/
@Test
public void importData() throws IOException {
//1. load all rows from MySQL
List<Goods> goodsList = goodsMapper.selectAll();
//sanity check: how many rows were loaded
System.out.println(goodsList.size());
//2. bulk import
BulkRequest bulkRequest = new BulkRequest();
//2.1 loop over goodsList and create an IndexRequest per row
for (Goods goods : goodsList) {
//2.2 the spec column holds a JSON string of the specs, e.g. specStr:{}
String specStr = goods.getSpecStr();
//parse the JSON string into a Map
Map map = JSON.parseObject(specStr, Map.class);
//set the parsed spec map on the entity
goods.setSpec(map);
//serialize the goods object to a JSON string
String data = JSON.toJSONString(goods);// object --> JSON string
IndexRequest indexRequest = new IndexRequest("goods");
indexRequest.id(goods.getId()+"").source(data, XContentType.JSON);
bulkRequest.add(indexRequest);
}
BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);
System.out.println(response.status());
}
matchAll: query everything
/**
* Query everything
* 1. matchAll
* 2. Map each hit to a Goods object and collect them into a List
* 3. Paging; 10 hits are returned by default
*/
@Test
public void testMatchAll() throws IOException {
//build the search request and set the index to query
SearchRequest searchRequest = new SearchRequest("goods");
//create the search source builder
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
//----------------------------------------
//build the query: matchAll
QueryBuilder query = QueryBuilders.matchAllQuery();
//----------------------------------------
//attach the query to the source builder
sourceBuilder.query(query);
//attach the source builder to the request
searchRequest.source(sourceBuilder);
//-----------------------------------
//paging information
sourceBuilder.from(1);
sourceBuilder.size(50);
//-----------------------------------
//execute the search and get the response
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
//get the hits object
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数"+value);
List<Goods> goodsList = new ArrayList<>();
//the hits array
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
//get the source of the hit as a JSON string
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
}
termQuery: term query; the query string is not analyzed
/**
* termQuery: term query; the query string is not analyzed
*/
@Test
public void testTermQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
//------------------------------------------
QueryBuilder query = QueryBuilders.termQuery("title","华为");//term query: exact term lookup
sourceBulider.query(query);
//------------------------------------------------
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
}
matchQuery: full-text query (the query string is analyzed)
/**
* matchQuery: full-text query; the query string is analyzed before matching
*/
@Test
public void testMatchQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
//------------------------------------------
MatchQueryBuilder query = QueryBuilders.matchQuery("title", "华为手机");//match query: the query string is analyzed
query.operator(Operator.AND);//AND: every analyzed term must match (intersection)
sourceBulider.query(query);
//------------------------------------------------
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
}
/**
* wildcardQuery / prefixQuery: term-level pattern and prefix queries
*/
@Test
public void testWildcardQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
//------------------------------------------
//1 wildcard query: pattern matching, ? matches a single character, * matches any number of characters
// WildcardQueryBuilder query = QueryBuilders.wildcardQuery("title", "华*");
//2 prefix query; works best on keyword fields
PrefixQueryBuilder query = QueryBuilders.prefixQuery("brandName", "三");
sourceBulider.query(query);
//------------------------------------------------
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
}
rangeQuery: range query
/**
* 1. Range query: rangeQuery
* 2. Sorting
*/
@Test
public void testRangeQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
//------------------------------------------
RangeQueryBuilder query = QueryBuilders.rangeQuery("price");
query.gte(2000);
query.lte(3000);
sourceBulider.query(query);
//sort
sourceBulider.sort("price", SortOrder.DESC);
//------------------------------------------------
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
}
queryString: multi-field query
/**
* 1. Multi-field query: queryString
*/
@Test
public void testQueryString() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
//------------------------------------------
QueryStringQueryBuilder query = QueryBuilders.queryStringQuery("华为手机").field("title").field("brandName").field("categoryName").defaultOperator(Operator.AND);
sourceBulider.query(query);
//------------------------------------------------
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
}
boolQuery: bool query
/**
* Bool query: boolQuery
* 1. brandName must be: 华为
* 2. title must contain: 手机
* 3. price must be between 2000 and 3000
*/
@Test
public void testBoolQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
//------------------------------------------
//1. build the boolQuery
BoolQueryBuilder query = QueryBuilders.boolQuery();
//2.1 brandName must be 华为
TermQueryBuilder termQuery = QueryBuilders.termQuery("brandName", "华为");
query.must(termQuery);
//2.2 title must contain 手机
MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("title", "手机");
query.filter(matchQuery);
//2.3 price between 2000 and 3000
QueryBuilder rangeQuery = QueryBuilders.rangeQuery("price");
((RangeQueryBuilder) rangeQuery).gte(2000);
((RangeQueryBuilder) rangeQuery).lte(3000);
query.filter(rangeQuery);
//3. attach the combined boolQuery to the source builder
//------------------------------------------------
sourceBulider.query(query);
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
}
Aggregation query: bucket aggregation (grouping)
/**
* Aggregation query: bucket aggregation (grouping)
* 1. Query documents whose title contains 手机
* 2. Get the list of brands
*/
@Test
public void testAggQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
//------------------------aggregation conditions----------------------------------------
// 1. query documents whose title contains 手机
MatchQueryBuilder query = QueryBuilders.matchQuery("title", "手机");
sourceBulider.query(query);
// 2. get the list of brands
/*
Parameters:
1. a custom name, used later to read the aggregation result
2. the field to group by
*/
AggregationBuilder agg = AggregationBuilders.terms("goods_brands").field("brandName").size(100);
sourceBulider.aggregation(agg);
//------------------------------------------------------------------------
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
//--------------read the aggregation result--------------------------
// get the aggregation result
Aggregations aggregations = searchResponse.getAggregations();
Map<String, Aggregation> aggregationMap = aggregations.asMap();
//System.out.println(aggregationMap);
Terms goods_brands = (Terms) aggregationMap.get("goods_brands");
List<? extends Terms.Bucket> buckets = goods_brands.getBuckets();
List brands = new ArrayList();
for (Terms.Bucket bucket : buckets) {
Object key = bucket.getKey();
brands.add(key);
}
for (Object brand : brands) {
System.out.println(brand);
}
//--------------------------------------------------------
}
Highlight query
/**
*
* Highlight query:
* 1. Configure highlighting
* * the field to highlight
* * the prefix tag
* * the suffix tag
* 2. Replace the original field value with the highlighted fragment
*/
@Test
public void testHighLightQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("goods");
SearchSourceBuilder sourceBulider = new SearchSourceBuilder();
// 1. query documents whose title contains 手机
MatchQueryBuilder query = QueryBuilders.matchQuery("title", "手机");
sourceBulider.query(query);
//---------------highlight setup: start--------------------------------
//configure highlighting
HighlightBuilder highlighter = new HighlightBuilder();
//set the three elements: field, prefix tag, suffix tag
highlighter.field("title");
highlighter.preTags("<font color='red'>");
highlighter.postTags("</font>");
sourceBulider.highlighter(highlighter);
//------------------highlight setup: end----------------------------------
// 2. get the list of brands
/*
Parameters:
1. a custom name, used later to read the aggregation result
2. the field to group by
*/
AggregationBuilder agg = AggregationBuilders.terms("goods_brands").field("brandName").size(100);
sourceBulider.aggregation(agg);
searchRequest.source(sourceBulider);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits searchHits = searchResponse.getHits();
//total number of hits
long value = searchHits.getTotalHits().value;
System.out.println("总记录数:"+value);
List<Goods> goodsList = new ArrayList<>();
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
//convert to a Java object
Goods goods = JSON.parseObject(sourceAsString, Goods.class);
//----------------replace with the highlighted fragment: start--------------------------
// get the highlight result and use it to replace the title on goods
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
HighlightField HighlightField = highlightFields.get("title");
Text[] fragments = HighlightField.fragments();
//replace the original title
goods.setTitle(fragments[0].toString());
//-------------------replace with the highlighted fragment: end---------------------------
goodsList.add(goods);
}
for (Goods goods : goodsList) {
System.out.println(goods);
}
// get the aggregation result
Aggregations aggregations = searchResponse.getAggregations();
Map<String, Aggregation> aggregationMap = aggregations.asMap();
//System.out.println(aggregationMap);
Terms goods_brands = (Terms) aggregationMap.get("goods_brands");
List<? extends Terms.Bucket> buckets = goods_brands.getBuckets();
List brands = new ArrayList();
for (Terms.Bucket bucket : buckets) {
Object key = bucket.getKey();
brands.add(key);
}
for (Object brand : brands) {
System.out.println(brand);
}
}
7 Integrating ES with Spring Boot (version 5.6.8)
1 Create a Spring Boot project
2 Add the ElasticSearch dependency
ElasticSearch and Spring Boot integration (version 5.6.8)
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
3 Add the elasticsearch settings to the configuration file
spring:
data:
elasticsearch:
cluster-name: my-application
cluster-nodes: 192.168.174.90:9300
4 Create an interface extending ElasticsearchRepository
/**
* Create an interface extending ElasticsearchRepository<entity class, id type>
*/
public interface UserInfoRepository extends ElasticsearchRepository<UserInfo,Long> {
}
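Besides the inherited save/find methods, Spring Data can also derive queries from method names; purely as an illustration, the same repository could declare methods like these (the names assume the UserInfo fields defined in the next step):
public interface UserInfoRepository extends ElasticsearchRepository<UserInfo, Long> {
    // derived queries: Spring Data builds the ES query from the method name
    List<UserInfo> findByCategory(String category);
    List<UserInfo> findByTitleAndBrand(String title, String brand);
}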
5 Create an entity class
//note: the indexName value must not contain uppercase letters
@Data
@Document(indexName = "userinfo",type = "doc")
@Table(name = "userinfo")
public class UserInfo {
@Id
Long id;
@Field(type = FieldType.Text,analyzer = "ik_max_word")
String title; //title
@Field(type = FieldType.Keyword)
String category;// category
@Field(type = FieldType.Keyword)
String brand; // brand
@Field(type = FieldType.Double)
Double price; // price
@Field(type = FieldType.Keyword)
String images; // image URL
}
6 Create a mapper interface for the entity extending the common Mapper
public interface UserInfoMapper extends Mapper<UserInfo> {
}
7 Tests
@RunWith(SpringRunner.class)
@SpringBootTest
public class EsTest {
@Autowired
private ElasticsearchTemplate template;
@Autowired
private UserInfoRepository userInfoRepository;
@Autowired
private UserInfoMapper userInfoMapper;
......
}
7.1 Create the index
/**
* Create the userinfo index (lowercase only)
*/
@Test
public void createIndex() {
template.createIndex(UserInfo.class);
}
7.2 Add a document
/**
* Add a document
*/
@Test
public void testAdd() {
UserInfo userInfo = new UserInfo();
userInfo.setId(1L);
userInfo.setTitle("我是程序员");
userInfo.setBrand("小米");
userInfo.setCategory("手机");
userInfo.setPrice(12d);
userInfoRepository.save(userInfo);
}
7.3 Update a document
/**
* If the id exists this is an update, otherwise an insert
*/
@Test
public void testUpdate() {
UserInfo userInfo = new UserInfo();
userInfo.setId(1L);
userInfo.setTitle("我是程序员777");
userInfo.setBrand("小米");
userInfo.setCategory("手机");
userInfo.setPrice(12d);
userInfoRepository.save(userInfo);
}
7.4 Add documents in bulk
/**
* Add documents in bulk
*/
@Test
public void testAddList() {
UserInfo userInfo = new UserInfo();
userInfo.setId(2L);
userInfo.setTitle("我是前端工程师");
userInfo.setBrand("小米");
userInfo.setCategory("手机");
userInfo.setPrice(12d);
UserInfo userInfo2 = new UserInfo();
userInfo2.setId(3L);
userInfo2.setTitle("我是java工程师");
userInfo2.setBrand("小米");
userInfo2.setCategory("手机");
userInfo2.setPrice(12d);
List<UserInfo> list = new ArrayList<>();
list.add(userInfo);
list.add(userInfo2);
userInfoRepository.saveAll(list);
}
7.5 Delete a document
/**
* Delete a document
*/
@Test
public void testDelete() {
userInfoRepository.deleteById(1L);
}
7.6 Get by id
/**
* Get a document by id
*/
@Test
public void findById() {
Optional<UserInfo> optional = userInfoRepository.findById(2L);
UserInfo userInfo = optional.get();
System.out.println(userInfo);
}
7.7 Find all, sorted by id descending
/**
* Find all documents, sorted by id descending
*/
@Test
public void find() {
Iterable<UserInfo> iterable = userInfoRepository.findAll(Sort.by(Sort.Direction.DESC, "id"));
// iterable.forEach(userInfo -> System.out.println(userInfo));
iterable.forEach(System.out::println);
}
7.8 Load data from the database and bulk-import it into ES
/**
* Load data from the database and bulk-import it into the elasticsearch index
*/
@Test
public void fromDBImportData() {
//load the data from the database
List<UserInfo> userInfos = userInfoMapper.selectAll();
//save it all into the elasticsearch index
userInfoRepository.saveAll(userInfos);
}
8 ES clusters
8.1 Cluster overview
Cluster vs. distributed:
- Cluster: several machines doing the same job.
- Distributed: several machines doing different jobs.
What a cluster provides:
- high availability
- sharing the request load
What a distributed setup provides:
- sharing the storage and computation load; speed-up and decoupling
Cluster and distributed architectures usually coexist.
8.2 Cluster concepts
ES clusters:
- ElasticSearch is distributed by design
- ElasticSearch's design hides the complexity of distribution
Cluster-related concepts:
- Cluster: a group of nodes sharing the same cluster name
- Node: a single Elasticsearch instance in the cluster
- Index: where ES stores data; comparable to a database in a relational database
- Shard: an index can be split into parts, called shards; in a cluster the shards of one index can be spread across different nodes
- Primary shard: the original shard, as opposed to its replicas
- Replica shard: each primary shard can have one or more replicas holding the same data
8.3 Accessing the cluster from the Java API
1 Configuration file application.yml
elasticsearch:
host: 192.168.200.130
port: 9200
host1: 192.168.200.130
port1: 9201
host2: 192.168.200.130
port2: 9202
host3: 192.168.200.130
port3: 9203
2 Configuration class ElasticSearchConfig
private String host1;
private int port1;
private String host2;
private int port2;
private String host3;
private int port3;
//get/set ...
@Bean("clusterClient")
public RestHighLevelClient clusterClient(){
return new RestHighLevelClient(RestClient.builder(
new HttpHost(host1,port1,"http"),
new HttpHost(host2,port2,"http"),
new HttpHost(host3,port3,"http")
));
}
3 Test class
@Resource(name="clusterClient")
RestHighLevelClient clusterClient;
/**
* Test the cluster connection
* @throws IOException
*/
@Test
public void testCluster() throws IOException {
//set the index and document id to query
GetRequest indexRequest=new GetRequest("cluster_test","1");
GetResponse response = clusterClient.get(indexRequest, RequestOptions.DEFAULT);
System.out.println(response.getSourceAsString());
}
8.4 Shard configuration
- When an index is created without explicit shard settings, the default is 1 primary shard and 1 replica shard
- Shards can be configured through settings when the index is created
#shard configuration
#"number_of_shards": 3  -> number of primary shards
#"number_of_replicas": 1 -> number of replica copies per primary shard
# 3 primary shards + 3 replica shards = 6 shards
PUT cluster_test1
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"name":{
"type": "text"
}
}
}
}
1 Three nodes running normally (shards numbered 0, 1, 2)
2 Node 3 goes down
3 The shards of the failed node are rebalanced onto the other nodes
4 When node 3 comes back, shards are rebalanced onto it again (not necessarily the same shards as before)
Example: 3 nodes, 5 shards
Shards and self-balancing
- When a node goes down, its shards are rebalanced onto the remaining nodes
Note: once the number of primary shards is set, it cannot be changed.
Recommended shard configuration:
- each shard should ideally hold 10-30 GB
- recommended number of shards = number of nodes * 1 to 3
e.g. for 1000 GB of data, how many shards and nodes?
- at 20-odd GB per shard that is about 40 shards
- with shards = nodes * 2, 40 / 2 = 20, i.e. about 20 nodes
8.5 Routing
Routing: the process by which ES computes which shard a document is stored in.
Routing formula: shard_index = hash(id) % number_of_primary_shards (by default the document id is used as the routing value)
Querying the document with id 5: if hash(5) = 17, then 17 % 3 = 2, so ES looks up the result on shard 2.
Summary: because both indexing and lookup rely on this routing formula, the number of primary shards cannot be changed once the index is created.
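A toy sketch of that routing formula (the real hash ES uses is Murmur3; this only illustrates why the primary shard count cannot change after documents are stored):
public class RoutingDemo {
    // toy stand-in for ES's routing hash, for illustration only
    static int routeToShard(String id, int numberOfPrimaryShards) {
        return Math.floorMod(id.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        String docId = "5";
        // the document is written to and later read from the same computed shard
        System.out.println("stored on shard " + routeToShard(docId, 3));
        // with a different shard count the same id routes elsewhere,
        // so documents indexed under the old count could no longer be found by id
        System.out.println("would be looked up on shard " + routeToShard(docId, 5));
    }
}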
8.6 Split-brain
Normal state of an ElasticSearch cluster:
- A healthy ES cluster has exactly one master node, which manages the whole cluster: it creates and deletes indices, tracks which nodes are part of the cluster, and decides which shards are allocated to which nodes.
- All nodes in the cluster elect the same node as master.
Split-brain:
- Split-brain occurs when nodes disagree during master election, so the cluster ends up with more than one master and effectively splits, leaving the cluster in an abnormal state.
Causes of split-brain:
1 Network: network latency
2 Node load
- When the master node also acts as a data node, heavy data traffic can make the master stop responding (it appears dead).
3 JVM garbage collection
- When the master node's JVM heap is too small, large GC pauses can make the ES process unresponsive.
Avoiding split-brain:
1 Network:
- increase the discovery.zen.ping.timeout setting; the default is 3s
2 Node load: separate the roles.
- Master-eligible nodes:
  - node.master: true
  - node.data: false
- Data nodes:
  - node.master: false
  - node.data: true
3 JVM garbage collection:
- in config/jvm.options set -Xms and -Xmx to half of the server's memory
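As a consolidated sketch of the settings above (file locations follow the standard ES layout; the heap size is a placeholder for "half of the server's memory"):
# config/elasticsearch.yml on a dedicated master-eligible node
node.master: true
node.data: false

# config/jvm.options on a machine with 8 GB of RAM: give half to the ES heap
-Xms4g
-Xmx4g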