This is the 27th featured post shared by Mason (bysocket.com).
ES (Elasticsearch) is a distributed search engine. If "engine" sounds too obscure, think of it as a data store, similar to MySQL. It provides the following features:
- Near real-time search
- Full-text search, structured search, statistical analysis
So where does the data stored in ES come from?
The answer is data synchronization. The following approaches are recommended:
- Data Transmission, a service provided by Alibaba Cloud that supports data exchange between data sources such as RDBMS (relational databases), NoSQL, and OLAP. [Alibaba Cloud]
  https://help.aliyun.com/product/26590.html
- "Exploration and practice of synchronizing hundreds of millions of orders at Youzan" [from the team a friend of mine worked on]
  https://mp.weixin.qq.com/s/33KACMxXkgzZyIL9m6q4YA
Back to the evolution of ES
First, the low-traffic stage
At that time I was at a start-up company. Every sync was a full sync, run as a scheduled task in the early morning. Alternatively, data was synchronized by calling the ES CRUD operations directly.
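A nightly full sync can be sketched as turning every source row into a pair of lines for the Elasticsearch `_bulk` endpoint. This is only a minimal sketch: the index name `orders`, the `id` field, and the sample rows are assumptions for illustration, and actually sending the body over HTTP is left out.

```python
import json


def build_bulk_body(index_name, rows, id_field="id"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each row becomes two lines: an action line and the document source.
    """
    lines = []
    for row in rows:
        # Action line: index (create or overwrite) the document under its ID.
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(row, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows, as they might be read from the source database.
rows = [{"id": 1, "title": "first order"}, {"id": 2, "title": "second order"}]
body = build_bulk_body("orders", rows)
```

Because the `index` action overwrites documents with the same ID, re-running the full sync every night stays idempotent under these assumptions.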
A pseudo-cluster on a single machine is enough to run this. The full-text search approach was:
- Use phrase matching and set a minimum-match weight
- Obtain the phrases by tokenizing the input with the IK tokenizer
- Implement filtering with Filter
- Implement paging and sorting with Pageable
For details, see the ES series on my blog and GitHub.
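One way to picture the search approach above is as a single query body in the Elasticsearch query DSL. The sketch below is an interpretation, not the code from the blog series: it assumes the phrases have already been produced by the IK tokenizer, reads "minimum-match weight" as `minimum_should_match`, and uses the hypothetical field names `content`, `status`, and `created_at`.

```python
def build_search_body(phrases, filters, page, size, sort_field, min_match=1):
    """Build a bool query: phrase matching + filtering + paging + sorting."""
    # One match_phrase clause per phrase produced by the tokenizer.
    should = [{"match_phrase": {"content": phrase}} for phrase in phrases]
    return {
        "query": {
            "bool": {
                "should": should,
                # At least this many phrase clauses must match.
                "minimum_should_match": min_match,
                # Filter clauses narrow the result set without affecting scores.
                "filter": [{"term": {field: value}} for field, value in filters.items()],
            }
        },
        # Paging: zero-based page number translated to an offset.
        "from": page * size,
        "size": size,
        "sort": [{sort_field: {"order": "desc"}}],
    }


search_body = build_search_body(
    phrases=["elastic", "search"],
    filters={"status": "published"},
    page=2,
    size=10,
    sort_field="created_at",
)
```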
Second, traffic gradually grows
The volume is now estimated to be on the order of millions to tens of millions of records to synchronize and query.
A single-machine pseudo-cluster is no longer enough, and volume at this level is handled on the operations side:
- An Elasticsearch cluster is a group of running Elasticsearch instances (nodes)
- Scale horizontally by adding more nodes to the cluster
How to scale horizontally
The number of primary shards is fixed when the index is created. Read operations can be handled by either a primary shard or a replica shard, so the more replica shards there are, the higher the search throughput. Naturally, more hardware resources are needed to support that throughput: adding replicas on the same number of nodes does not improve performance, because each shard gets a smaller share of the resources. The number of replica shards can be adjusted dynamically, so the cluster scales on demand. For example, to raise the number of replicas from the default of 1 to 2:
PUT /blogs/_settings
{
  "number_of_replicas" : 2
}
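To see what this setting changes, it can help to count shard copies. The sketch below assumes a hypothetical index with 3 primary shards (the source does not state the primary count) and simply applies the rule that every primary has `number_of_replicas` copies.

```python
def total_shards(primaries, replicas):
    """Total shard copies in an index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def replica_settings(replicas):
    """Request body for PUT /<index>/_settings when changing the replica count."""
    return {"number_of_replicas": replicas}


PRIMARIES = 3  # assumption for illustration only

before = total_shards(PRIMARIES, 1)  # default of 1 replica per primary
after = total_shards(PRIMARIES, 2)   # after raising replicas to 2
settings_body = replica_settings(2)
```

Under that assumption the index grows from 6 to 9 shard copies, which only raises throughput if there are enough nodes to spread them across.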
At this stage, one cluster basically handles search for every business: orders, products, and so on.
Third, order traffic suddenly surges
Then a problem surfaced:
- Slow queries on a large index in cluster A affect the other, smaller indexes in cluster A.
For example, the orders index has now grown very large and its queries are slow, which affects other businesses. That should not happen, so what can be done?
The answer is physical isolation into multiple clusters:
- Split into multiple clusters: isolate the orders cluster from the products cluster
- Multi-machine support
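On the application side, physical isolation can be as simple as a lookup from business to cluster address. The sketch below is illustrative only: the host names are hypothetical placeholders, and a real deployment would likely load this mapping from configuration.

```python
# Hypothetical cluster addresses, one isolated cluster per business.
CLUSTERS = {
    "orders": "http://es-orders.example.internal:9200",
    "products": "http://es-products.example.internal:9200",
}


def cluster_for(business):
    """Return the base URL of the cluster that serves the given business."""
    try:
        return CLUSTERS[business]
    except KeyError:
        raise ValueError(f"no cluster configured for business {business!r}") from None


orders_url = cluster_for("orders")
```

With this split, a slow query against the orders index can only consume resources of the orders cluster.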
This is usually when the next question comes up: how do you upgrade a single business that has become a hotspot?
Take a business called project, whose data is all stored in the project index. As the volume of projects keeps growing, to the billion level and then the trillion level, queries against one large index become a bottleneck. How can this be optimized?
Solution: hot/cold separation; splitting
Splitting a large index is not hard. As with shard routing, the routing rule can be defined according to the specific business.
Here, we can define 1000 indexes, named project_1, project_2, project_3 ...
Then put a simple proxy layer on top of the ES cluster. Its core business routing rule can be:
project_id: the project's auto-increment ID
index_id: the ID of the index that the project maps to
index_id = project_id % 1000
The layers are then:
- ES proxy layer: maps the logical whole index to the actual sub-indexes
- ES index configuration management: maps each business to its indexes
- ES cluster
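The routing rule inside the proxy layer can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `physical_index` is hypothetical, and because a modulo of 1000 yields values from 0 to 999, the sub-indexes in this sketch run from project_0 to project_999.

```python
INDEX_COUNT = 1000  # number of sub-indexes, as in the rule above


def physical_index(logical_index, project_id, index_count=INDEX_COUNT):
    """Map a logical index and a project ID to the sub-index that holds it."""
    if project_id < 0:
        raise ValueError("project_id must be non-negative")
    # index_id = project_id % 1000
    index_id = project_id % index_count
    return f"{logical_index}_{index_id}"


target = physical_index("project", 12345)
```

All reads and writes for one project always land on the same sub-index, so a query that carries a project_id touches only a fraction of the total data.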
Hot/cold separation is similar: data in an intermediate state is the hottest, so it gets its own index in its own cluster. Data that has reached a final state is periodically deleted from it. The index then holds little data and can support a very large volume of search queries. Why not?
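The periodic cleanup of final-state data could be expressed as a delete-by-query request against the hot index. The sketch below only builds the request; the index name `orders_hot`, the `status` field, and the list of final states are assumptions for illustration.

```python
# Hypothetical final states; intermediate states stay in the hot index.
FINAL_STATES = ["completed", "cancelled"]


def purge_request(hot_index, final_states=FINAL_STATES, status_field="status"):
    """Build a _delete_by_query request that removes final-state documents."""
    path = f"/{hot_index}/_delete_by_query"
    # A terms query matches documents whose status is any of the final states.
    body = {"query": {"terms": {status_field: list(final_states)}}}
    return "POST", path, body


method, path, body = purge_request("orders_hot")
```

Run on a schedule, a request like this keeps the hot index small, while the full history remains available in the regular index.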
- Finish -