Solr vs. ElasticSearch [closed]

Translated from: Solr vs. ElasticSearch [closed]

What are the core architectural differences between these technologies?

Also, what use cases are generally more appropriate for each?


Post #1

Reference: https://stackoom.com/question/gqrx/Solr-vs-ElasticSearch-关闭


Post #2

Update

Now that the question scope has been corrected, I might add something in this regard as well:

There are many comparisons between Apache Solr and ElasticSearch available, so I'll reference those I found most useful myself, i.e. those covering the most important aspects:

  • Bob Yoplait already linked kimchy's answer to "ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?", which summarizes the reasons why he went ahead and created ElasticSearch, which in his opinion provides a much superior distributed model and ease of use in comparison to Solr.

  • Ryan Sonnek's "Realtime Search: Solr vs Elasticsearch" provides an insightful analysis/comparison and explains why he switched from Solr to ElasticSearch, despite already being a happy Solr user. He summarizes this as follows:

    Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you. [emphasis mine]

  • The Wikipedia article on ElasticSearch quotes a comparison from the reputed German iX magazine, listing advantages and disadvantages, which pretty much summarize what has been said above already:

    Advantages:

    • ElasticSearch is distributed. No separate project required. Replicas are near real-time too, which is called "Push replication".
    • ElasticSearch fully supports the near real-time search of Apache Lucene.
    • Handling multitenancy is not a special configuration, whereas with Solr a more advanced setup is necessary.
    • ElasticSearch introduces the concept of the Gateway, which makes full backups easier.

    Disadvantages:

    • Only one main developer [not applicable anymore according to the current elasticsearch GitHub organization, besides having a pretty active committer base in the first place]
    • No autowarming feature [not applicable anymore according to the new Index Warmup API]
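
The percolation feature praised in Ryan Sonnek's quote above inverts the usual search flow: queries are stored first, and each incoming document is then checked against them. The following is only a toy sketch of that idea in plain Python. The class `ToyPercolator`, its method names, and the "all terms must be present" matching rule are assumptions made for illustration and are not Elasticsearch's API.

```python
class ToyPercolator:
    """Stores queries first, then reports which ones match an incoming document."""

    def __init__(self):
        # Maps a query ID to the set of terms that must all appear in a document.
        self.queries = {}

    def register(self, query_id, required_terms):
        # A "query" in this sketch is just a set of lowercase required terms.
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document):
        # Naive whitespace tokenization; a real engine would run an analyzer.
        tokens = set(document.lower().split())
        return sorted(
            query_id
            for query_id, terms in self.queries.items()
            if terms <= tokens
        )


percolator = ToyPercolator()
percolator.register("solr-alerts", ["solr"])
percolator.register("es-scaling", ["elasticsearch", "cluster"])
print(percolator.percolate("Our Elasticsearch cluster grew to forty nodes"))
```

This pattern suits alerting and saved-search notifications, where the set of queries is known ahead of time and documents arrive as a stream.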

Initial Answer

They are completely different technologies addressing completely different use cases, and thus cannot be compared in any meaningful way:

  • Apache Solr - Apache Solr offers Lucene's capabilities in an easy to use, fast search server with additional features like faceting, scalability and much more

  • Amazon ElastiCache - Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.

    • Please note that Amazon ElastiCache is protocol-compliant with Memcached, a widely adopted memory object caching system, so code, applications, and popular tools that you use today with existing Memcached environments will work seamlessly with the service (see Memcached for details).

[emphasis mine]

Maybe this has been confused with the following two related technologies one way or another:

  • ElasticSearch - It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.

  • Amazon CloudSearch - Amazon CloudSearch is a fully-managed search service in the cloud that allows customers to easily integrate fast and highly scalable search functionality into their applications.

The Solr and ElasticSearch offerings sound strikingly similar at first sight, and both use the same backend search engine, namely Apache Lucene.

While Solr is older, quite versatile and mature and widely used accordingly, ElasticSearch has been developed specifically to address Solr shortcomings with scalability requirements in modern cloud environments, which are hard(er) to address with Solr.

As such it would probably be most useful to compare ElasticSearch with the recently introduced Amazon CloudSearch (see the introductory post "Start Searching in One Hour for Less Than $100 / Month"), because both claim to cover the same use cases in principle.


Post #3

I see some of the above answers are now a bit out of date. From my perspective, and I work with both Solr (Cloud and non-Cloud) and ElasticSearch on a daily basis, here are some interesting differences:

  • Community: Solr has a bigger, more mature user, dev, and contributor community. ES has a smaller but active community of users and a growing community of contributors.
  • Maturity: Solr is more mature, but ES has grown rapidly and I consider it stable.
  • Performance: hard to judge. I/we have not done direct performance benchmarks. A person at LinkedIn did compare Solr vs. ES vs. Sensei once, but the initial results should be ignored because they used a non-expert setup for both Solr and ES.
  • Design: People love Solr. The Java API is somewhat verbose, but people like how it's put together. Solr code is unfortunately not always very pretty. Also, ES has sharding, real-time replication, document and routing built-in. While some of this exists in Solr, too, it feels a bit like an afterthought.
  • Support: there are companies providing tech and consulting support for both Solr and ElasticSearch. I think the only company that provides support for both is Sematext (disclosure: I'm Sematext founder).
  • Scalability: both can be scaled to very large clusters. ES is easier to scale than pre-4.0 versions of Solr, but with Solr 4.0 that's no longer the case.
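
To make the "sharding and routing built-in" point from the Design item more concrete, here is a minimal sketch of hash-based document routing, where a routing key (by default the document ID) is hashed and reduced modulo the number of primary shards. The function name `pick_shard` is hypothetical, and `zlib.crc32` is a stand-in hash chosen only for this sketch; it is not the hash function either engine uses.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a shard number."""
    if num_primary_shards < 1:
        raise ValueError("need at least one primary shard")
    # crc32 is a stand-in hash for illustration, not the engine's own hash.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Route a batch of hypothetical document IDs across five primary shards.
shards = [pick_shard(f"doc-{i}", 5) for i in range(1000)]
```

Because the shard number depends on the shard count, changing the number of primary shards under a scheme like this would send existing documents to different shards, which is one reason such a count tends to be fixed when an index is created.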

For more thorough coverage of the Solr vs. ElasticSearch topic, have a look at https://sematext.com/blog/solr-vs-elasticsearch-part-1-overview/ . This is the first post in a series of posts from Sematext doing a direct and neutral Solr vs. ElasticSearch comparison. Disclosure: I work at Sematext.


Post #4

While all of the above links have merit, and have benefited me greatly in the past, as a linguist "exposed" to various Lucene search engines for the last 15 years, I have to say that elasticsearch development is very fast in Python. That being said, some of the code felt non-intuitive to me. So, I reached out to one component of the ELK stack, Kibana, from an open source perspective, and found that I could generate the somewhat cryptic code of elasticsearch very easily in Kibana. Also, I could pull Chrome Sense es queries into Kibana as well. If you use Kibana to evaluate es, it will further speed up your evaluation. What took hours to run on other platforms was up and running in JSON in Sense on top of elasticsearch (RESTful interface) in a few minutes at worst (largest data sets); in seconds at best. The documentation for elasticsearch, while 700+ pages, didn't answer questions I had that normally would be resolved in SOLR or other Lucene documentation, which obviously took more time to analyze. Also, you may want to take a look at aggregations in elasticsearch, which have taken faceting to a new level.

Bigger picture: if you're doing data science, text analytics, or computational linguistics, elasticsearch has some ranking algorithms that seem to innovate well in the information retrieval area. If you're using any TF/IDF (Term Frequency/Inverse Document Frequency) algorithms, elasticsearch extends this 1960s algorithm to a new level, even using BM25 (Best Match 25) and other relevancy ranking algorithms. So, if you are scoring or ranking words, phrases or sentences, elasticsearch does this scoring on the fly, without the large overhead of other data analytics approaches that take hours, which is another elasticsearch time saving. With es, combining some of the strengths of bucketing from aggregations with the real-time JSON data relevancy scoring and ranking, you could find a winning combination, depending on either your agile (stories) or architectural (use cases) approach.
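
For readers who have not seen BM25 written out, the following is a minimal sketch of Okapi BM25 scoring over a tiny in-memory corpus. The function name `bm25_scores` and the sample documents are made up for illustration, the defaults `k1=1.2` and `b=0.75` are the commonly cited ones, and whitespace tokenization stands in for a real analyzer. It is meant to show the shape of the formula, not how elasticsearch implements it internally.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            df = doc_freq.get(term, 0)
            if df == 0:
                continue  # Term appears nowhere in the corpus.
            # IDF variant that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


docs = [
    "solr is a search server".split(),
    "elasticsearch is a distributed search engine".split(),
    "memcached is an in memory cache".split(),
]
scores = bm25_scores(["distributed", "search"], docs)
```

Compared with plain TF/IDF, the term-frequency part saturates (repeating a word many times yields diminishing returns) and document length is normalized against the corpus average.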

Note: I did see a similar discussion on aggregations above, but not on aggregations and relevancy scoring; my apologies for any overlap. Disclosure: I don't work for Elastic and won't be able to benefit in the near future from their excellent work due to a different architectural path, unless I do some charity work with elasticsearch, which wouldn't be a bad idea.


Post #5

I have used Elasticsearch for 3 years and Solr for about a month. I feel an Elasticsearch cluster is quite easy to install compared to a Solr installation. Elasticsearch has a pool of help documents with great explanations. One of the use cases I was stuck on was Histogram Aggregation, which was available in ES but not found in Solr.
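
As a rough illustration of what a histogram aggregation computes, the sketch below groups numeric values into fixed-width buckets keyed by each bucket's lower bound. The function name `histogram_buckets` and the sample `prices` are hypothetical, and unlike a search engine this version works on a plain Python list and does not fill in empty buckets.

```python
import math
from collections import Counter


def histogram_buckets(values, interval):
    """Count values per fixed-width bucket, keyed by each bucket's lower bound."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Each value is rounded down to the nearest multiple of the interval.
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return dict(sorted(counts.items()))


prices = [3, 12, 18, 25, 27, 29, 41]
buckets = histogram_buckets(prices, 10)
print(buckets)  # {0: 1, 10: 2, 20: 3, 40: 1}
```

In a search engine the same bucketing would run over a numeric field of the matching documents, so the counts change with the query.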


Post #6

If you are already using SOLR, stick with it. If you are starting out, go for Elasticsearch.

Most major issues have been fixed in SOLR, and it is quite mature.


Reposted from blog.csdn.net/CHCH998/article/details/105467510