Elasticsearch Search Optimization: Cluster- and Index-Level Parameter Tuning

I. Background

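Six servers, each with 32 GB of RAM, 8 CPU cores, and a 200 GB disk.
Six nodes (Elasticsearch 5.6.3), each configured with node.master: true and node.data: true and given 16 GB of memory; the rest of each machine's memory is shared with other programs deployed on it.
Cluster totals: 106 indices, 563 shards, 317,749,447 documents, 834.63 GB.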
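To put the cluster totals in perspective, the short sketch below derives the average shard size and the number of shards per node from the figures quoted above. The function names are hypothetical, and the calculation assumes the 834.63 GB total is spread evenly over all 563 shards, which a real cluster will only approximate.

```python
def average_shard_size_gb(total_store_gb: float, shard_count: int) -> float:
    """Mean on-disk size per shard, in GB."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return total_store_gb / shard_count


def shards_per_node(shard_count: int, node_count: int) -> float:
    """Mean number of shards hosted by each node."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return shard_count / node_count


# Figures taken from the cluster summary above.
avg_gb = average_shard_size_gb(834.63, 563)
per_node = shards_per_node(563, 6)

print(f"average shard size: {avg_gb:.2f} GB")
print(f"shards per node:    {per_node:.1f}")
```

With these numbers the average shard is small and each node carries many shards, which is worth keeping in mind when reading the shard-related tips.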

II. Optimizations

1. Give each node its own server: run exactly one node per server and nothing unrelated to Elasticsearch, and leave the remaining memory to the operating system's file cache, which Lucene depends on heavily for search speed.
2. Enable the search slow log by applying the following settings to every index:

curl -XPUT 'http://localhost:9210/_all/_settings?preserve_existing=false' -H 'Content-Type: application/json' -d '
{
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.search.slowlog.threshold.query.debug": "2s",
    "index.search.slowlog.threshold.query.trace": "500ms",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.search.slowlog.threshold.fetch.info": "800ms",
    "index.search.slowlog.threshold.fetch.debug": "500ms",
    "index.search.slowlog.threshold.fetch.trace": "200ms"
}'
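As a rough illustration of how these thresholds work together, the sketch below maps a query-phase duration to the most severe slow-log level it would reach. The function name and the table are hypothetical and only mirror the query thresholds above; the strict "greater than" comparison is an assumption, and whether an entry is actually written also depends on the logger configuration, which this sketch ignores.

```python
from typing import Mapping, Optional

# Query-phase thresholds from the settings above, in milliseconds.
QUERY_THRESHOLDS_MS = {"warn": 10_000, "info": 5_000, "debug": 2_000, "trace": 500}


def slowlog_level(took_ms: float, thresholds: Mapping[str, float]) -> Optional[str]:
    """Return the most severe level whose threshold took_ms exceeds, or None."""
    # Check from most to least severe so the first hit wins.
    for level in ("warn", "info", "debug", "trace"):
        limit = thresholds.get(level)
        if limit is not None and took_ms > limit:
            return level
    return None


for took in (120, 750, 3_000, 12_000):
    print(took, "ms ->", slowlog_level(took, QUERY_THRESHOLDS_MS))
```

A query that takes 750 ms would therefore only appear at trace level, while one that takes 12 s would be logged as a warning.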

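3. Set the index refresh interval to 30s, provided the business can tolerate that delay before new documents become searchable, by applying the following to every index:

curl -XPUT 'http://localhost:9210/_all/_settings?preserve_existing=false' -H 'Content-Type: application/json' -d '
{ "index.refresh_interval": "30s"}'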

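4. Lock the process memory. Disabling swapping on every node keeps the cluster stable, and swapping should be avoided at all costs: it can stretch garbage collections from milliseconds to minutes and can make a node respond slowly or even drop out of the cluster.
Set the following in the Elasticsearch configuration file, elasticsearch.yml: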

bootstrap.memory_lock: true

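5. Index-level setting: the translog size that triggers a flush. The default is 512mb; here it is set to 10% of the JVM heap (1.6gb for the 16 GB given to each node). Apply the following to every index:

curl -XPUT 'http://localhost:9210/_all/_settings?preserve_existing=false' -H 'Content-Type: application/json' -d '
{"index.translog.flush_threshold_size": "1.6gb"}'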
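The 10% rule above is simple arithmetic, sketched here for other heap sizes. The helper name is hypothetical, and the choice to emit whole megabytes rather than a fractional gigabyte value is a convenience of this sketch, not something the article prescribes.

```python
def flush_threshold_for_heap(heap_gb: float, fraction: float = 0.10) -> str:
    """Format fraction * heap as a byte-size string in whole megabytes."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    size_mb = round(heap_gb * 1024 * fraction)
    return f"{size_mb}mb"


# 16 GB heap, as in the cluster described in this article.
threshold = flush_threshold_for_heap(16)
settings_body = {"index.translog.flush_threshold_size": threshold}
print(settings_body)
```

For the 16 GB heap used here this gives 1638mb, which matches the 1.6gb value in the command above.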

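6. Keep each shard under 30 GB and a single cluster under roughly 300 nodes, and plan the cluster around business needs.
7. Index-level setting: the maximum number of threads used for merging. Setting it to 1 is easier on the disk; in particular, on spinning disks (HDD) rather than SSDs, a single merge thread is the better choice.

curl -XPUT 'http://localhost:9210/_all/_settings?preserve_existing=false' -H 'Content-Type: application/json' -d '
{"index.merge.scheduler.max_thread_count": 1}'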
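For comparison with the value of 1 set above, the sketch below reproduces the default that the Elasticsearch documentation gives for this setting, `Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`. The Python function name is hypothetical.

```python
def default_merge_thread_count(available_processors: int) -> int:
    """Documented default for index.merge.scheduler.max_thread_count."""
    if available_processors < 1:
        raise ValueError("available_processors must be at least 1")
    # Integer division mirrors the Java integer arithmetic in the formula.
    return max(1, min(4, available_processors // 2))


# The servers in this article have 8 CPU cores.
print(default_merge_thread_count(8))
```

On the 8-core servers described here the default would be 4 merge threads per shard, so lowering it to 1 is a substantial reduction in concurrent merge I/O.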

8. Cap the number of shards of any one index that may be placed on a single node:

curl -XPUT 'http://localhost:9210/_all/_settings?preserve_existing=false' -H 'Content-Type: application/json' -d '
{"index.routing.allocation.total_shards_per_node" : 1}'
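A limit of 1 is strict, so it helps to check that an index's shard copies can still fit on the available nodes. The sketch below does that count; the function names are hypothetical, and the check is only a necessary condition, since it ignores the other allocation rules (disk watermarks, awareness, and so on) that Elasticsearch also applies.

```python
import math


def can_allocate_all(primaries: int, replicas: int, data_nodes: int,
                     total_shards_per_node: int) -> bool:
    """True if every shard copy of one index fits under the per-node cap."""
    copies = primaries * (1 + replicas)
    return copies <= data_nodes * total_shards_per_node


def min_shards_per_node(primaries: int, replicas: int, data_nodes: int) -> int:
    """Smallest cap that still lets all copies of the index be assigned."""
    copies = primaries * (1 + replicas)
    return math.ceil(copies / data_nodes)


# Six data nodes, as in this article; index shapes are illustrative.
print(can_allocate_all(primaries=3, replicas=1, data_nodes=6, total_shards_per_node=1))
print(can_allocate_all(primaries=5, replicas=1, data_nodes=6, total_shards_per_node=1))
print(min_shards_per_node(primaries=5, replicas=1, data_nodes=6))
```

With six nodes and a cap of 1, an index with 5 primaries and 1 replica (10 copies) would leave shards unassigned, and even an index that fits exactly has no headroom if a node goes down.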

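9. Adjust the translog durability policy
Since Elasticsearch 2.0, to avoid losing data, the translog is flushed to disk after every index, bulk, delete, or update request before the request returns 200 OK. This improves data safety at some cost in performance. If you can accept the possibility of losing the most recent writes in a crash and prefer performance, set the following parameter (it can also go in an index template):

curl -XPUT 'http://localhost:9210/_all/_settings?preserve_existing=false' -H 'Content-Type: application/json' -d '
{"index.translog.durability": "async"}'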

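10. Temporarily disable shard allocation
From time to time Elasticsearch rebalances shards across the cluster, which can slow searches. In production, rebalancing can be switched off when needed by setting cluster.routing.rebalance.enable to none. The related setting cluster.routing.allocation.enable, used in the command below, goes further and stops all shard allocation.
Typical use cases include:
(1) temporarily restarting a node or removing one from the cluster;
(2) upgrading the cluster node by node. When a node is shut down, the allocation process tries to copy that node's shards to other nodes, which wastes a lot of I/O. Disabling allocation before shutting the node down avoids this. Set the value back to all once the node has rejoined the cluster.

curl -XPUT 'http://localhost:9210/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "transient": {  "cluster.routing.allocation.enable": "none" }}'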
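A node restart needs two settings updates, one before and one after. The sketch below builds both request bodies; the helper name is hypothetical, while the four accepted values are the ones Elasticsearch documents for `cluster.routing.allocation.enable`. It only produces JSON and does not contact a cluster.

```python
import json

VALID_ALLOCATION_MODES = {"all", "primaries", "new_primaries", "none"}
VALID_SCOPES = {"transient", "persistent"}


def allocation_settings_body(mode: str, scope: str = "transient") -> str:
    """Return the JSON body for a PUT to _cluster/settings."""
    if mode not in VALID_ALLOCATION_MODES:
        raise ValueError(f"unknown allocation mode: {mode!r}")
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    return json.dumps({scope: {"cluster.routing.allocation.enable": mode}})


before_restart = allocation_settings_body("none")  # send before stopping the node
after_restart = allocation_settings_body("all")    # send once the node has rejoined

print(before_restart)
print(after_restart)
```

Forgetting the second request leaves the cluster unable to allocate any shards, including those of newly created indices.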

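11. (6.x and later only) Enable adaptive replica selection
With adaptive replica selection enabled, each request is routed to the shard copy expected to respond fastest. When multiple copies of the data exist, Elasticsearch can use a set of criteria called adaptive replica selection to pick the best copy based on the response time, service time, and queue size of the node holding each copy. This can improve query throughput and reduce latency for search-heavy applications. The setting is off by default in 6.x; to turn it on:

curl -XPUT 'http://localhost:9210/_cluster/settings' -H 'Content-Type: application/json' -d '
{"transient": { "cluster.routing.use_adaptive_replica_selection": true }}'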

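12. Disable automatic index creation

curl -XPUT 'http://localhost:9210/_cluster/settings' -H 'Content-Type: application/json' -d '
{"persistent": {"action.auto_create_index": false }}'

Alternatively, set a whitelist so that only indices whose names match the given patterns are created automatically: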

{"persistent": { "action.auto_create_index": "logstash-*,.kibana*"}}
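To make the whitelist behaviour concrete, here is a simplified sketch of how such a value could be evaluated against an index name. It is an approximation only: the function name is hypothetical, Python's `fnmatch` stands in for Elasticsearch's own wildcard matching, and the `+`/`-` prefixes that the real setting also accepts are not modelled.

```python
from fnmatch import fnmatchcase
from typing import Union


def is_auto_create_allowed(index_name: str, setting: Union[bool, str]) -> bool:
    """Approximate whether action.auto_create_index permits index_name."""
    if isinstance(setting, bool):
        # true allows every index, false allows none.
        return setting
    patterns = [p.strip() for p in setting.split(",") if p.strip()]
    return any(fnmatchcase(index_name, pattern) for pattern in patterns)


WHITELIST = "logstash-*,.kibana*"

for name in ("logstash-2019.12.01", ".kibana", "orders"):
    print(name, "->", is_auto_create_allowed(name, WHITELIST))
```

Under this whitelist a write to a mistyped or unexpected index name such as `orders` is rejected instead of silently creating a new index.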

13. (6.x and later) Set the translog sync interval

curl -XPUT 'http://localhost:9210/_all/_settings?preserve_existing=false' -H 'Content-Type: application/json' -d '
{"index.translog.sync_interval": "60s"}'

All of the cluster-level settings above:

{
    "persistent": {
        "cluster.routing.allocation.enable": "none",
        "action.auto_create_index": false
    }
}
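Because `cluster.routing.allocation.enable` appears in this article under both `transient` and `persistent`, it is worth recalling how the two interact: a transient value takes precedence over a persistent one, which in turn takes precedence over elasticsearch.yml. The sketch below models that lookup order; the function name and the sample values are hypothetical.

```python
from typing import Any, Mapping


def effective_setting(key: str,
                      transient: Mapping[str, Any],
                      persistent: Mapping[str, Any],
                      config_file: Mapping[str, Any],
                      default: Any = None) -> Any:
    """Resolve a cluster setting: transient, then persistent, then config file."""
    for source in (transient, persistent, config_file):
        if key in source:
            return source[key]
    return default


KEY = "cluster.routing.allocation.enable"
persistent = {KEY: "none"}  # as in the persistent block above

# Allocation re-enabled only as a transient setting...
before_restart = effective_setting(KEY, {KEY: "all"}, persistent, {}, default="all")
# ...is lost on a full cluster restart, so the persistent value applies again.
after_restart = effective_setting(KEY, {}, persistent, {}, default="all")

print(before_restart, after_restart)
```

If allocation was disabled persistently, it should therefore be re-enabled persistently as well, or the cluster will come back from a full restart with allocation switched off.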

All of the index-level settings above:

{
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.search.slowlog.threshold.query.debug": "2s",
    "index.search.slowlog.threshold.query.trace": "500ms",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.search.slowlog.threshold.fetch.info": "800ms",
    "index.search.slowlog.threshold.fetch.debug": "500ms",
    "index.search.slowlog.threshold.fetch.trace": "200ms",
    "index.refresh_interval": "30s",
    "index.translog.flush_threshold_size": "1.6gb",
    "index.merge.scheduler.max_thread_count": 1,
    "index.routing.allocation.total_shards_per_node": 1,
    "index.translog.durability": "async"
}
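The same body can be sent from a script instead of curl. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, the host and port are the ones used in this article's curl commands, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_settings_request(base_url: str, index: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request to the index settings endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_settings?preserve_existing=false"
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_settings_request(
    "http://localhost:9210",
    "_all",
    {"index.refresh_interval": "30s", "index.translog.durability": "async"},
)
print(request.get_method(), request.full_url)

# To apply it against a live cluster:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```

Targeting a specific index name or pattern instead of `_all` limits the change to the indices that actually need it.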

III. Notes

1. persistent: the setting survives a full cluster restart.
2. transient: the setting is lost on a full cluster restart.
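3. Dynamic field mapping is a separate, mapping-level option: new fields are added automatically unless the mapping sets "dynamic": "false" (the default is true), so set it explicitly if fields must not be created automatically.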
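4. Elasticsearch has three kinds of cache: the node query cache (filter context; invalidated when segments merge), the shard request cache (invalidated when the shard refreshes), and the fielddata cache (invalidated when segments merge).
Elasticsearch tuning is a very complex topic and the above is only a drop in the ocean; consult further material for deeper optimization.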

Author: TALK_IS_CHEAP