A preliminary understanding of Elasticsearch

Official website: https://www.elastic.co/cn/downloads/elasticsearch

Elasticsearch, often abbreviated as ES, is an open source, highly scalable, distributed full-text search engine. It can store and retrieve data in near real time, and it scales out to hundreds of servers to handle petabyte-scale (big data era) workloads. ES is developed in Java and uses Lucene as its core to implement all indexing and search functions, but its purpose is to hide the complexity of Lucene behind a simple RESTful API and so make full-text search simple.
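To make "indexing and search" a little more concrete, here is a minimal sketch of an inverted index, the data structure at the heart of full-text search libraries such as Lucene. It is a toy illustration only: the function names and sample documents are invented for this example, and real engines add analysis, scoring, and on-disk storage on top of this idea.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "Lucene is a search library written in Java",
    3: "Kibana visualizes data stored in Elasticsearch",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))  # [1]
print(search(index, "elasticsearch"))  # [1, 3]
```

Looking up a term is a dictionary access rather than a scan over every document, which is why this structure suits full-text search.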

According to DB-Engines, a widely cited database ranking site, Elasticsearch overtook Solr in January 2016 to become the top-ranked search engine.

Who is using Elasticsearch:

1. Wikipedia (similar to Baidu Encyclopedia): full-text search, highlighting, and search suggestions.
2. The Guardian (a foreign news website, similar to Sohu News): combines user behavior logs (clicks, views, favorites, comments) with social network data (public opinion about a given news story) for data analysis, so that the author of each article can see how the public responded to it (good, bad, popular, garbage, contempt, worship).
3. Stack Overflow (a foreign Q&A forum for programming problems): you post IT issues and program errors, and others discuss and answer them. It uses full-text search to find related questions and answers; when a program reports an error, you paste the error message in to see whether a matching answer already exists.
4. GitHub (open source code hosting): searches hundreds of billions of lines of code.
5. E-commerce websites: product search.
6. Log data analysis: Logstash collects the logs and ES performs complex analysis on them. This is the ELK stack: Elasticsearch + Logstash + Kibana.
7. Commodity price monitoring websites: the user sets a price threshold for a product and receives a notification when the price falls below it. For example, you subscribe to toothpaste monitoring and ask to be notified when the Colgate family set drops below 50 yuan, so that you can buy it then.
8. BI (Business Intelligence) systems: for example, a large shopping mall group uses BI to analyze the trend in consumer spending in a given area over the past three years and the composition of its user base, and produces several reports. If the reports show that annual spending in the ** area has grown 100% over the past three years and that 85% of the users are senior white-collar workers, the group opens a new shopping mall there. ES performs the data analysis and mining, and Kibana performs the data visualization.
9. In China: site search (e-commerce, recruitment, portals, etc.), IT system search (OA, CRM, ERP, etc.), and data analysis (a popular use case for ES).
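As a rough illustration of the price monitoring scenario in the list above, the sketch below builds an Elasticsearch query body that combines a full-text `match` on the product name with a `range` filter on the price. The field names `name` and `price`, the helper names, and the sample values are assumptions made for this example; the article does not describe how any real monitoring site models its data.

```python
import json


def price_alert_query(product_name, threshold):
    """Build a query body: match the product name, keep only prices below the threshold."""
    return {
        "query": {
            "bool": {
                # Full-text match on a hypothetical `name` field.
                "must": [{"match": {"name": product_name}}],
                # Exact filter on a hypothetical numeric `price` field.
                "filter": [{"range": {"price": {"lt": threshold}}}],
            }
        }
    }


def should_notify(current_price, threshold):
    """Return True when the price has dropped below the user's threshold."""
    return current_price < threshold


query = price_alert_query("Colgate toothpaste family set", 50)
print(json.dumps(query, indent=2))
print(should_notify(49, 50))  # True
```

A monitoring job could run such a query periodically for each subscription and send a notification for every hit.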

Introduction to Elasticsearch

  • Elasticsearch is a real-time distributed search and analytics engine. It makes it possible for you to process big data faster than ever before.
  • It's used for full-text search, structured search, analytics, and a mix of the three:
  • Wikipedia uses Elasticsearch to provide full-text search and keyword highlighting, as well as search suggestion features such as search-as-you-type and did-you-mean error correction.
  • The Guardian uses Elasticsearch to combine user logs and social network data to provide their editors with real-time feedback on public response to newly published articles.
  • Stack Overflow combines full-text search with geolocation queries, as well as more-like-this functionality to find relevant questions and answers.
  • GitHub uses Elasticsearch to search 130 billion lines of code.
  • But Elasticsearch isn't just for large enterprises; it's also allowing startups like DataDog and Klout to turn initial ideas into scalable solutions.
  • Elasticsearch can run on your laptop or on hundreds of servers to process petabytes of data.
  • Elasticsearch is an open source search engine based on Apache Lucene™. Whether in the open source or proprietary fields, Lucene can be considered to be the most advanced, best-performing, and most comprehensive search engine library to date.
  • However, Lucene is just a library. To use it, you must use Java as the development language and integrate it directly into your application. To make matters worse, Lucene is very complex, and you need in-depth knowledge of information retrieval to understand how it works.
  • Elasticsearch is also developed in Java and uses Lucene as its core to implement all indexing and search functions, but its purpose is to hide the complexity of Lucene through a simple RESTful API, thereby making full-text search simple.
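To give a feel for what "a simple RESTful API" looks like in practice, here is a minimal sketch that builds two HTTP requests: one that stores a JSON document and one that runs a full-text `match` query. The requests are only constructed, not sent, so the code runs without a cluster. The base URL `http://localhost:9200`, the index name `articles`, and the field `title` are assumptions for illustration; the `_doc` and `_search` endpoints follow the pattern used by recent Elasticsearch versions.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node cluster


def index_request(index, doc_id, document):
    """Build (but do not send) a request that stores one JSON document."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def match_request(index, field, text):
    """Build (but do not send) a full-text `match` query against one field."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index and field names.
put = index_request("articles", 1, {"title": "Full-text search made simple"})
req = match_request("articles", "title", "full-text search")
print(put.get_method(), put.full_url)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With a node actually running, passing either request to `urllib.request.urlopen` would send it and return the JSON response. No Java code or direct Lucene integration is involved on the client side.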

People often compare Solr and ES, so I have summarized the comparison here:

  • Solr is faster when simply searching existing data
  • When indexes are being built in real time, Solr suffers from I/O blocking and its query performance drops; Elasticsearch has a clear advantage here.
  • As the amount of data grows, Solr's search efficiency drops, while Elasticsearch shows no obvious change.

  • After migrating our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance.

Summary

1. ES works essentially out of the box (just unpack it and run), which is very simple. Installing Solr is a little more complicated.
2. Solr relies on ZooKeeper for distributed management, while Elasticsearch has distributed coordination built in.
3. Solr supports more data formats, such as JSON, XML, and CSV, while Elasticsearch only supports JSON.
4. Solr officially provides more features, while Elasticsearch itself focuses on core features and leaves most advanced features to third-party plug-ins. For example, the graphical interface relies on Kibana.
5. Solr queries are fast, but updating the index (inserts and deletes) is slow, so it suits query-heavy applications such as e-commerce.

  • ES builds indexes quickly (at the cost of slower queries on static data), so real-time queries are fast; it is used for search at sites such as Facebook and Sina.
  • Solr is a powerful solution for traditional search applications, but Elasticsearch is better suited to emerging real-time search applications.

6. Solr is more mature and has a larger, more established community of users, developers, and contributors, while Elasticsearch has fewer developers and maintainers, releases updates very quickly, and costs more to learn and use.
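Point 3 above implies that data held in another format has to be converted to JSON before Elasticsearch can index it. The sketch below shows one way this might be done for CSV: each row becomes a JSON document, laid out in the newline-delimited format that the `_bulk` endpoint accepts (an action line followed by a source line). The helper name, index name, and sample data are invented for this example.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index):
    """Convert CSV rows into a newline-delimited JSON body for the `_bulk` endpoint."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(row))  # document source line
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


# Hypothetical sample data.
sample = "name,price\nColgate family set,49\nToothbrush,12\n"
body = csv_to_bulk(sample, "products")
print(body)
```

Note that `csv.DictReader` yields every value as a string, so a real pipeline would probably convert numeric columns such as `price` before indexing.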

Origin blog.csdn.net/qq_43649799/article/details/123021097