ElasticSearch: first day of learning

Introduction to ElasticSearch

What is ElasticSearch
Elasticsearch, abbreviated as ES, is an open source, highly scalable, distributed full-text search engine that can store and retrieve data in near real-time. It scales out to hundreds of servers and can handle petabytes of data. ES is written in Java and uses Lucene at its core for all indexing and search functions, but its purpose is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes simple.
ElasticSearch vs. Solr

  • Solr uses Zookeeper for distributed management, while Elasticsearch has its own built-in distributed coordination;
  • Solr supports more data formats, while Elasticsearch only supports JSON;
  • Solr officially provides more functions, while Elasticsearch itself focuses on core functions, and most advanced functions are provided by third-party plug-ins;
  • Solr performs better than Elasticsearch in traditional search applications, but is significantly less efficient than Elasticsearch in real-time search applications.

ElasticSearch installation and startup

ElasticSearch is developed in Java, and the version of ES used here requires JDK 1.8 or higher. Before installing ElasticSearch, make sure JDK 1.8+ is installed and the JDK environment variables are configured correctly; otherwise ElasticSearch will fail to start.

Download

The official address of ElasticSearch: https://www.elastic.co/products/elasticsearch

Installation

Installing the Windows version of ElasticSearch is very simple. As with the Windows version of Tomcat, installation is complete once the archive is unzipped.
Then modify the elasticsearch configuration file config/elasticsearch.yml and add the following two settings:

http.cors.enabled: true
http.cors.allow-origin: "*"

These settings allow cross-origin (CORS) requests to elasticsearch. If you do not plan to install elasticsearch-head, you can skip this change and start ES directly.

Start ES service

Double-click elasticsearch.bat in the bin directory under ElasticSearch to start it; log information is printed to the console.
Note: 9300 is the TCP transport port, used for communication between cluster nodes and by the TCP client; 9200 is the RESTful interface over HTTP.
Visit the ElasticSearch server in a browser (http://localhost:9200). If JSON information about the node is returned, the service started successfully.
At this point ES is installed and running. Unlike Solr, ES does not come with a graphical page, so you have to install a graphical interface plug-in separately.
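
The browser check above can also be scripted. The following is a minimal sketch, assuming a node on localhost:9200; the function names are hypothetical, and the request is only built, not sent, so the snippet runs without a server. The fields read from the reply (name, cluster_name, version.number) are the ones the root endpoint normally returns.

```python
import json
from urllib.request import Request


def build_root_request(host="localhost", port=9200):
    """Build (but do not send) a GET request for the ES root endpoint."""
    return Request(f"http://{host}:{port}/", method="GET")


def summarize_root_response(raw_body):
    """Pick the node name, cluster name and version out of the root JSON."""
    info = json.loads(raw_body)
    return {
        "name": info.get("name"),
        "cluster_name": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
    }


req = build_root_request()
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req).read()
```

Passing the raw response body to summarize_root_response gives a quick view of which node and cluster answered.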

Install the ES graphical interface plug-in

There are two ways to install plug-ins: online installation and local installation. This document installs the head plugin locally. For Elasticsearch 5.x and above, head requires node and grunt.
Download the head plugin: https://github.com/mobz/elasticsearch-head
Unzip the elasticsearch-head-master archive to any directory, as long as it is separate from the Elasticsearch installation directory.
Download nodejs: https://nodejs.org/en/download/
Double-click the installer and click through the steps.
After installation, run node -v in the cmd console to check the version number.
Install grunt as a global command. Grunt is a project build tool based on Node.js.
Enter the following command in the cmd console:

npm install -g grunt-cli

Change into the elasticsearch-head-master directory, then install the dependencies and start head from the command prompt:

npm install
grunt server

Open http://localhost:9100 in the browser.
If head cannot connect to the ES service, edit config/elasticsearch.yml in the ElasticSearch directory and add the following two settings:

http.cors.enabled: true
http.cors.allow-origin: "*"

Then restart the ElasticSearch service.

ElasticSearch related concepts (terms)

Elasticsearch is document oriented, which means it can store entire objects or documents. It does not only store them, however; it also indexes the content of each document so that it can be searched. In Elasticsearch, you index, search, sort, and filter documents (rather than rows and columns). Elasticsearch compares to a traditional relational database as follows:

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

Elasticsearch core concepts

Index

An index is a collection of documents with similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and an index for order data. An index is identified by a name (must be all lowercase letters), and when we want to index, search, update, and delete documents corresponding to this index, we must use this name. In a cluster, any number of indexes can be defined.

Field

A field is equivalent to a column of a database table; fields classify and identify a document's data by attribute.

Mapping

A mapping defines how data is processed and the rules that apply to it, such as a field's data type, default value, analyzer, and whether it is indexed. All of these can be set in the mapping. Processing data according to well-chosen rules can greatly improve performance, so it is worth defining a mapping and thinking about how to define it for better performance.

Document

A document is a basic unit of information that can be indexed. For example, you can have a document for a certain customer, a document for a certain product, and a document for a certain order. Documents are expressed in JSON (JavaScript Object Notation), a ubiquitous data interchange format on the Internet.
In an index/type, you can store as many documents as you want. Note that although a document physically lives in an index, it must be indexed into, and assigned a type within, that index.
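
To make the rows-to-documents and columns-to-fields analogy concrete, here is a small sketch that turns a relational-style row into a JSON document. The column names and values are hypothetical and only illustrate the shape of a document.

```python
import json


def row_to_document(columns, row):
    """Pair column names with row values: each column becomes a field."""
    return dict(zip(columns, row))


columns = ["id", "name", "city"]   # hypothetical table columns
row = (1, "Alice", "Berlin")       # hypothetical table row

doc = row_to_document(columns, row)
print(json.dumps(doc))             # the JSON text that would be sent to ES
```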

Near real-time (NRT)

Elasticsearch is a near real-time search platform. This means that there is a slight delay (usually within 1 second) between indexing a document and the document becoming searchable.

Cluster

A cluster is made up of one or more nodes, which together hold all of the data and jointly provide indexing and search functions. A cluster is identified by a unique name, which is "elasticsearch" by default. This name is important because a node can only join a cluster by specifying that cluster's name.

Node

A node is a server in the cluster. As part of the cluster, it stores data and takes part in the cluster's indexing and search functions. Like a cluster, a node is identified by a name, which is assigned when the node starts. In older versions of Elasticsearch, the default was the name of a random Marvel comic character. The name matters for administration, because it is how you determine which servers in the network correspond to which nodes in the Elasticsearch cluster.

A node can join a specified cluster by configuring the cluster name. By default, each node is set to join a cluster called "elasticsearch", which means that if you start several nodes in your network and they can discover each other, they will automatically form and join a cluster called "elasticsearch".

In a cluster, you can have as many nodes as you want. Moreover, if there are currently no Elasticsearch nodes running on your network, starting a node at this time will create and join a cluster called "elasticsearch" by default.

Shards and replicas

An index can store an amount of data that exceeds the hardware limits of a single node. For example, an index with 1 billion documents that occupies 1TB of disk space may not fit on any single node, or a single node may respond too slowly when it handles search requests alone. To solve this problem, Elasticsearch can divide an index into multiple parts, called shards. When you create an index, you can specify the number of shards you want. Each shard is itself a fully functional and independent "index" that can be placed on any node in the cluster. Sharding is important for two reasons:
1) Allows you to split/expand your content capacity horizontally.
2) Allows you to perform distributed and parallel operations on shards (potentially on multiple nodes) to improve performance/throughput.

As for how a shard is distributed and how its documents are aggregated back to search requests, it is completely managed by Elasticsearch, which is transparent to you as a user.

In a network/cloud environment, failure can happen at any time: a shard or node may go offline or disappear for any reason. A failover mechanism is therefore very useful and highly recommended. For this purpose, Elasticsearch allows you to create one or more copies of a shard. These copies are called replica shards, or simply replicas.

Replication is important for two main reasons:
1) It provides high availability if a shard or node fails. For this reason, a replica shard is never placed on the same node as its original/primary shard.
2) It scales search volume/throughput, because searches can run in parallel on all replicas.

In short, each index can be divided into multiple shards. An index can also be replicated zero times (no replicas) or more. Once replicated, each index has primary shards (the originals that serve as the source of replication) and replica shards (copies of the primary shards). The number of shards and replicas can be specified when the index is created. After creation, you can change the number of replicas dynamically at any time, but you cannot change the number of primary shards.

In versions before 7.0, each index in Elasticsearch gets 5 primary shards and 1 replica by default, which means that if your cluster has at least two nodes, your index will have 5 primary shards and another 5 replica shards (1 full copy), for a total of 10 shards per index. Since 7.0, the default is 1 primary shard and 1 replica.
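
The shard arithmetic above (5 primaries with 1 replica give 10 shards) can be written out as a short sketch. The helper names are hypothetical; number_of_shards and number_of_replicas are the index settings that the create-index request accepts.

```python
import json


def total_shards(primaries, replicas):
    """Every primary shard is copied `replicas` times."""
    return primaries * (1 + replicas)


def index_settings_body(primaries, replicas):
    """JSON body that sets shard and replica counts when an index is created."""
    return json.dumps({
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    })


print(total_shards(5, 1))
print(index_settings_body(5, 1))
```

Only number_of_replicas can be changed later; the primary shard count is fixed when the index is created, which is why it is worth choosing deliberately.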

ElasticSearch client operation

In actual development, there are three main ways to act as a client of elasticsearch:

  • The first is the elasticsearch-head plugin
  • The second is to call the RESTful interface provided by elasticsearch directly
  • The third is to use the client API provided by elasticsearch

Use Postman tool to test

Create index and mapping

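I used ES 7.x. According to the official ES documentation, mapping types are deprecated from version 7 on: the create-index body no longer takes a type name by default, and the default type is _doc.
For ES 7 or higher, the request body sent with the PUT that creates the index is: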

{
    "mappings": {
        "properties": {
            "id": {
                "type": "long",
                "store": true
            },
            "title": {
                "type": "text",
                "store": true,
                "analyzer": "standard"
            },
            "content": {
                "type": "text",
                "store": true,
                "analyzer": "standard"
            }
        }
    }
}

For versions below 7, the mapping is nested under a type name (here article). The "index" option is written as a boolean, which is the form the text type accepts in ES 5.x and 6.x:

{
    "mappings": {
        "article": {
            "properties": {
                "id": {
                    "type": "long",
                    "store": true,
                    "index": true
                },
                "title": {
                    "type": "text",
                    "store": true,
                    "index": true,
                    "analyzer": "standard"
                },
                "content": {
                    "type": "text",
                    "store": true,
                    "index": true,
                    "analyzer": "standard"
                }
            }
        }
    }
}


After the request succeeds in Postman, the index information can be viewed in the elasticsearch-head tool.
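
The same create-index call can be prepared outside Postman. This is a minimal sketch, assuming a local node at 127.0.0.1:9200 and the index name blog1 used later in this article; build_create_index_request and text_field are hypothetical helpers. Passing doc_type wraps the mapping under a type name for versions below 7, while omitting it produces the typeless 7.x body. The request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:9200"  # assumption: local single-node ES


def text_field():
    """Field definition shared by title and content in the mapping."""
    return {"type": "text", "store": True, "analyzer": "standard"}


def build_create_index_request(index, properties, doc_type=None):
    """Build a PUT request that creates an index together with its mapping."""
    mappings = {"properties": properties}
    if doc_type is not None:
        # Versions below 7 nest the properties under the type name.
        mappings = {doc_type: mappings}
    body = json.dumps({"mappings": mappings}).encode("utf-8")
    return Request(f"{BASE_URL}/{index}", data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


properties = {
    "id": {"type": "long", "store": True},
    "title": text_field(),
    "content": text_field(),
}
req = build_create_index_request("blog1", properties)
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req)
```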

Set Mapping after index creation

The mapping can be set when the index is created, as in the previous step, or the index can be created first and the mapping added afterwards.
Here the index is created with a plain PUT and no mapping, and the mapping is then set in a second request.
Step one, create the index:

PUT http://127.0.0.1:9200/blog2

Step two, set the mapping for the type hello (this typed URL is the pre-7 form; ES 7.x expects PUT http://127.0.0.1:9200/blog2/_mapping with "properties" at the top level of the body):

PUT http://127.0.0.1:9200/blog2/hello/_mapping

{
    "hello": {
        "properties": {
            "id": {
                "type": "long",
                "store": true
            },
            "title": {
                "type": "text",
                "store": true,
                "index": true,
                "analyzer": "standard"
            },
            "content": {
                "type": "text",
                "store": true,
                "index": true,
                "analyzer": "standard"
            }
        }
    }
}

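
The two-step sequence (create the index, then set its mapping) can be sketched as an ordered list of requests. The helper names are hypothetical, the base URL assumes a local node, and the typed URL follows the pre-7 form used in this section. Nothing is sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:9200"  # assumption: local single-node ES


def json_request(method, path, payload=None):
    """Build a request with an optional JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(BASE_URL + path, data=data, method=method,
                   headers={"Content-Type": "application/json"})


def two_step_mapping(index, doc_type, properties):
    """Return [create index, put mapping] in the order they must be sent."""
    create = json_request("PUT", f"/{index}")
    put_mapping = json_request("PUT", f"/{index}/{doc_type}/_mapping",
                               {doc_type: {"properties": properties}})
    return [create, put_mapping]


steps = two_step_mapping("blog2", "hello", {"id": {"type": "long", "store": True}})
for step in steps:
    print(step.get_method(), step.full_url)
```

The order matters: the mapping request fails if the index does not exist yet.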

Delete index

Request url

DELETE localhost:9200/blog1


Document operation

Create document

Request url:

POST localhost:9200/blog1/article/1

Request body:

{
    "id": 1,
    "title": "ElasticSearch是一个基于Lucene的搜索服务器",
    "content": "它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口。Elasticsearch是用Java开发的,并作为Apache许可条款下的开放源码发布,是当前流行的企业级搜索引擎。设计用于云计算中,能够达到实时搜索,稳定,可靠,快速,安装使用方便。"
}

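
As a sketch of the same call from code, the request below indexes a document under an explicit id. It assumes a local node and the index/type/id path used above (in ES 7.x the type segment would be _doc); build_index_document_request is a hypothetical helper and the content field is shortened to its first sentence. ensure_ascii=False keeps the Chinese text readable in the body, which is then encoded as UTF-8.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local single-node ES


def build_index_document_request(index, doc_type, doc_id, document):
    """Build a POST that stores `document` under the given id."""
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", data=body,
                   method="POST",
                   headers={"Content-Type": "application/json"})


article = {
    "id": 1,
    "title": "ElasticSearch是一个基于Lucene的搜索服务器",
    "content": "它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口。",  # shortened
}
req = build_index_document_request("blog1", "article", 1, article)
print(req.get_method(), req.full_url)
```

Sending the same request again with the same id replaces the stored document, which is how the modify step in the next section works.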

Modify document

Request url:

POST localhost:9200/blog1/article/1

Request body:

{
    "id": 1,
    "title": "【修改】ElasticSearch是一个基于Lucene的搜索服务器",
    "content": "【修改】它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口。Elasticsearch是用Java开发的,并作为Apache许可条款下的开放源码发布,是当前流行的企业级搜索引擎。设计用于云计算中,能够达到实时搜索,稳定,可靠,快速,安装使用方便。"
}


Delete document

Request url:

DELETE localhost:9200/blog1/article/1


Query document: query by id

Request url:

GET localhost:9200/blog1/article/1

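
A get-by-id reply wraps the stored document in metadata. The sketch below, with a hypothetical helper name, reads the found flag and returns the _source object, or None when the id does not exist. The sample replies are hand-written illustrations of the reply shape, not captured output.

```python
import json


def extract_source(raw_body):
    """Return the stored document from a get-by-id reply, or None if absent."""
    reply = json.loads(raw_body)
    if not reply.get("found", False):
        return None
    return reply["_source"]


# Hand-written sample replies (illustrative only).
hit = '{"_index": "blog1", "_type": "article", "_id": "1", "_version": 1, "found": true, "_source": {"id": 1, "title": "t"}}'
miss = '{"_index": "blog1", "_type": "article", "_id": "2", "found": false}'

print(extract_source(hit))
print(extract_source(miss))
```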

Query document: query_string query

Request url:

POST localhost:9200/blog1/article/_search

Request body:

{
    "query": {
        "query_string": {
            "default_field": "title",
            "query": "搜索服务器"
        }
    }
}

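
The search body and the reply can be handled with two small helpers. This is a sketch with hypothetical names: query_string_body builds the request body shown above, and parse_hits pulls the matched documents out of a search reply. The total count is a plain number before ES 7 and an object with a value key from 7 on, so both shapes are handled. The sample reply in the usage lines is hand-written.

```python
import json


def query_string_body(field, text):
    """Request body for a query_string search on one default field."""
    return {"query": {"query_string": {"default_field": field, "query": text}}}


def parse_hits(raw_body):
    """Return (total, documents) from a search reply."""
    hits = json.loads(raw_body)["hits"]
    total = hits["total"]
    if isinstance(total, dict):      # ES 7.x: {"value": n, "relation": "eq"}
        total = total["value"]
    return total, [hit["_source"] for hit in hits["hits"]]


body = query_string_body("title", "搜索服务器")
print(json.dumps(body, ensure_ascii=False))

sample = '{"hits": {"total": {"value": 1, "relation": "eq"}, "hits": [{"_id": "1", "_source": {"id": 1}}]}}'
print(parse_hits(sample))
```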

Query document: term query

Request url:

POST localhost:9200/blog1/article/_search

Request body:

{
    "query": {
        "term": {
            "title": "搜索"
        }
    }
}

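
A term query is not analyzed: it looks for the exact term among the tokens stored at index time. With the standard analyzer, Chinese text is split into single characters, so a two-character term such as "搜索" is not expected to match, while a query_string query for the same text is analyzed first and does match. The sketch below approximates that behaviour in plain Python; the regular expression is a simplification of the standard analyzer (one token per Han character, lowercased runs of ASCII letters and digits), not its real implementation.

```python
import re

# Simplified stand-in for the standard analyzer: one token per Han
# character, plus lowercased runs of ASCII letters and digits.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def approx_standard_tokens(text):
    """Approximate the tokens the standard analyzer would index."""
    return [token.lower() for token in _TOKEN.findall(text)]


def term_matches(indexed_text, term):
    """A term query matches only if the exact term is one of the tokens."""
    return term in approx_standard_tokens(indexed_text)


def analyzed_query_matches(indexed_text, query):
    """An analyzed query matches if any of its tokens was indexed."""
    indexed = set(approx_standard_tokens(indexed_text))
    return any(token in indexed for token in approx_standard_tokens(query))


title = "ElasticSearch是一个基于Lucene的搜索服务器"
print(approx_standard_tokens(title))
print(term_matches(title, "搜索"), term_matches(title, "搜"))
print(analyzed_query_matches(title, "搜索服务器"))
```

Matching whole Chinese words with a term query requires an analyzer that segments words rather than single characters.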


Origin blog.csdn.net/qq_39095899/article/details/108339000