Installation Tuning

vm.max_map_count

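The max_map_count file limits the number of VMAs (virtual memory areas) a single process may own. The system default is 65530; for Elasticsearch, raise it to 262144. To make the change permanent, add vm.max_map_count=262144 to /etc/sysctl.conf and reload the settings with sysctl -p.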

This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.

While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation.

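The kernel documentation quoted above states a default of 65536; the value actually observed on most systems is 65530, as noted earlier.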
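To check whether a host already meets the limit discussed above, a small script can read the sysctl file directly. This is a minimal sketch: the function name `max_map_count_ok` is made up for illustration, and it assumes a Linux host where the value is exposed at /proc/sys/vm/max_map_count.

```python
from pathlib import Path

# Minimum value recommended for Elasticsearch, as stated above.
REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(path="/proc/sys/vm/max_map_count",
                     required=REQUIRED_MAX_MAP_COUNT):
    """Return (current_value, is_sufficient) for the given sysctl file.

    Hypothetical helper: the default path assumes a Linux host.
    """
    current = int(Path(path).read_text().strip())
    return current, current >= required
```

Calling `max_map_count_ok()` on an untuned host would typically report the value as insufficient.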

Tuning for Indexing Speed

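Use bulk requests. Find the best bulk size by benchmarking on a single node with a single shard, doubling the batch size each round until indexing speed stops improving.
Send bulk requests from multiple threads.
Watch for TOO_MANY_REQUESTS (429) response codes (EsRejectedExecutionException with the Java client): they mean the indexing bottleneck has been reached. When they occur, back off exponentially, waiting a randomized time before each retry.
Increase index.refresh_interval. The default of 1s forces ES to create a new segment every second.
For an initial data load, disable refresh (index.refresh_interval: -1) and set the replica count to 0 (index.number_of_replicas: 0).
Disable swapping (bootstrap.memory_lock: true).
Leave memory for the operating system's filesystem cache: at least half of the machine's memory.
Use auto-generated IDs. With an explicit ID, ES must check whether a document with that ID already exists in the same shard; with an auto-generated ID the check is skipped.
Better hardware: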
- indexing is I/O bound
- SSD
- local storage, not remote filesystems
- RAID 0 array: fast, but if any one disk fails the whole array becomes unavailable; use replicas to offset that risk
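The advice above about 429 responses can be sketched as a retry loop with randomized exponential backoff. This is only an illustration under stated assumptions: `send` is a hypothetical callable that performs one bulk request and returns the HTTP status code, and the base delay, cap, and retry count are arbitrary values, not recommendations from Elasticsearch.

```python
import random
import time


def backoff_delays(max_retries=5, base=0.5, cap=30.0, rng=random.random):
    """Yield randomized exponential backoff delays, in seconds.

    The upper bound doubles on each attempt (capped at `cap`), and the
    actual delay is a random fraction of that bound.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def send_with_backoff(send, max_retries=5, sleep=time.sleep, **kwargs):
    """Call send() until it stops returning 429 or retries run out.

    `send` is a hypothetical callable that issues one bulk request and
    returns its HTTP status code. Returns the last status seen.
    """
    status = send()
    for delay in backoff_delays(max_retries, **kwargs):
        if status != 429:
            break
        sleep(delay)      # wait a random time before retrying
        status = send()
    return status
```

Injecting `sleep` and `rng` keeps the loop easy to exercise without real waiting; in production code the defaults would be used.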

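Increase the indexing buffer size, especially for write-heavy, read-light workloads.

    indices.memory.index_buffer_size: a percentage or an absolute size; defaults to 10% of the heap

At most 512 MB per shard is useful; beyond that, indexing performance improves very little.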
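A rough way to relate the 10% default to the 512 MB per-shard figure is plain arithmetic. The sketch below assumes the node-level buffer is divided evenly among the shards that are actively indexing, which is a simplification; the function name and parameters are invented for illustration.

```python
def index_buffer_per_shard_mb(heap_mb, buffer_percent=10, active_shards=1):
    """Approximate indexing buffer (MB) available per actively indexing shard.

    Assumes an even split of the node-wide buffer, which is a simplification.
    """
    return heap_mb * buffer_percent / 100 / active_shards
```

For example, with an 8 GB heap and two heavily indexing shards, the default setting leaves each shard somewhat below the 512 MB mark, so raising the percentage could still help in that scenario.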

Tuning for Search Speed

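Reserve at least half of the memory for the filesystem cache; ES relies on it heavily.
Better hardware: CPU, disks, network cards.
Optimize the document model (mapping):
- avoid joins: compared with regular queries, nested is several times slower and parent-child is hundreds of times slower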

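Pre-index data: use the patterns in your queries to optimize how data is written.

Queries that always match a fixed suffix -> store the suffix in a separate field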
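As an illustration of the suffix idea, the document can be enriched on the client before indexing so that the suffix becomes an exact-match field. This is a sketch under assumptions: the helper name, the `_suffix` field naming, and the choice of "." as separator are all hypothetical.

```python
def with_suffix_field(doc, field, sep="."):
    """Return a copy of doc with the text after the last `sep` stored separately.

    The new field is named `<field>_suffix` (a made-up convention).
    """
    enriched = dict(doc)
    # rpartition yields an empty separator when `sep` is absent from the value.
    _head, found, tail = doc[field].rpartition(sep)
    enriched[field + "_suffix"] = tail if found else ""
    return enriched
```

A query that previously needed a wildcard on the original field could then use a plain term match on the suffix field instead.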

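Change the way you query
In the example below, documents were originally bucketed with a range aggregation on the price field. The alternative is to store the range in its own field at index time and use a terms aggregation instead.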

PUT index/type/1
{
  "designation": "spoon",
  "price": 13
}
GET index/_search
{
  "size":0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 10 },
          { "from": 10, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}

PUT index
{
  "mappings": {
    "type": {
      "properties": {
        "price_range": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT index/type/1
{
  "designation": "spoon",
  "price": 13,
  "price_range": "10-100"
}
GET index/_search
{
  "aggs": {
    "price_ranges": {
      "terms": {
        "field": "price_range"
      }
    }
  }
}
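The price_range value has to be computed by the client at index time. A minimal sketch of that step follows; only the "10-100" label appears in the example above, so the other two labels and the function names are assumptions made for illustration. The boundaries follow the range aggregation's convention of an inclusive lower bound and an exclusive upper bound.

```python
def price_range(price):
    """Map a price to a bucket label for the price_range keyword field.

    Lower bounds are inclusive and upper bounds exclusive. Only "10-100"
    comes from the example above; the other labels are assumed.
    """
    if price < 10:
        return "-10"
    if price < 100:
        return "10-100"
    return "100-"


def with_price_range(doc):
    """Return a copy of doc with the precomputed price_range field added."""
    enriched = dict(doc)
    enriched["price_range"] = price_range(doc["price"])
    return enriched
```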

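Mapping optimizations
- Numeric values do not have to be stored as numeric types. Identifiers such as ISBNs, which are unique, may be better mapped as keyword than as integer or long.
Avoid scripts. When they are unavoidable, prefer the painless or expressions engines.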

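When searching on dates, round them whenever possible so the query cache can be used. Queries that use now are typically not cacheable, because the matched range changes constantly; rounding the date makes better use of the query cache.

"now-1h" -> "now-1h/m", rounded to the minute.

Or split: divide the range into a large cacheable part and smaller non-cacheable parts.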

"should": [
    {
      "range": {
        "my_date": {
          "gte": "now-1h",
          "lte": "now-1h/m"
        }
      }
    },
    {
      "range": {
        "my_date": {
          "gt": "now-1h/m",
          "lt": "now/m"
        }
      }
    },
    {
      "range": {
        "my_date": {
          "gte": "now/m",
          "lte": "now"
        }
      }
    }
]
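The split above can be pictured with ordinary date arithmetic. The sketch below is not Elasticsearch's date-math implementation; it only shows, with invented helper names, how the last hour divides into two small edge intervals and one large minute-aligned middle interval whose boundaries stay the same for every request issued within the same minute.

```python
from datetime import datetime, timedelta


def floor_minute(dt):
    """Round a datetime down to the start of its minute."""
    return dt.replace(second=0, microsecond=0)


def ceil_minute(dt):
    """Round a datetime up to the next minute boundary (no-op if aligned)."""
    floored = floor_minute(dt)
    return floored if floored == dt else floored + timedelta(minutes=1)


def split_last_hour(now):
    """Split [now - 1h, now] into head, cacheable middle, and tail intervals."""
    start = now - timedelta(hours=1)
    head_end = ceil_minute(start)    # first minute boundary after the start
    tail_start = floor_minute(now)   # last minute boundary before now
    return [(start, head_end), (head_end, tail_start), (tail_start, now)]
```

Only the middle interval is stable across requests, which is why it is the part that benefits from the query cache.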

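Force-merge read-only indices into a single segment, especially time-based indices.
POST /twitter/_forcemerge parameters:
- max_num_segments: the number of segments to merge down to. By default the call only checks whether a merge is needed and runs it if so; set it to 1 to merge fully.
- only_expunge_deletes: only merge away documents that are marked as deleted. Defaults to false.
Warm up global ordinals
- a data structure used to run terms aggregations on keyword fields.
- loaded lazily into memory, because ES does not know in advance which fields will be aggregated on.
- to load them eagerly, set "eager_global_ordinals": true in the field mapping.
Warm up the filesystem cache (use with caution) by configuring index.store.preload.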
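For the force-merge call described earlier in this section, the request path with its two parameters can be assembled as follows. This is a small sketch with an invented helper name; it only builds the path and query string and leaves sending the POST request to whatever HTTP client is in use.

```python
from urllib.parse import urlencode


def forcemerge_path(index, max_num_segments=None, only_expunge_deletes=False):
    """Build the path for POST /<index>/_forcemerge with optional parameters."""
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    query = urlencode(params)
    path = f"/{index}/_forcemerge"
    return f"{path}?{query}" if query else path
```

For a read-only time-based index, passing `max_num_segments=1` corresponds to the single-segment merge recommended above.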

Tuning for Disk Usage

index -> makes a field searchable
Since ES 5.0 the setting is "index": false, no longer "no"
doc values, fielddata -> aggregations and sorting

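text (strings)
"norms": false
Normalization factors are stored by default for scoring. If you only need matching and not scoring, you can disable writing norms to the index.
"index_options": "freqs"
By default, frequencies (for scoring) and positions (for phrase queries) are stored in the index.
If you do not need phrase queries, you can skip indexing positions and keep only freqs.

If you care about neither scoring nor phrase queries, both settings can be applied together.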
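The two text-field settings above can be combined into a field mapping as needed. The following sketch builds that mapping fragment as a Python dictionary; the function and its flags are hypothetical, and it simply mirrors the rules stated above rather than any official client API.

```python
import json


def text_field_mapping(scoring=True, phrase_queries=True):
    """Build a text field mapping that drops index data the queries never use."""
    mapping = {"type": "text"}
    if not scoring:
        mapping["norms"] = False              # no normalization factors
    if not phrase_queries:
        mapping["index_options"] = "freqs"    # frequencies only, no positions
    return mapping


def as_json(mapping):
    """Serialize the mapping fragment for use in a request body."""
    return json.dumps(mapping, sort_keys=True)
```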

Do not use the default dynamic string mappings: they index every string as two fields (text and keyword), which is wasteful if you need only one.
PUT index
{
  "mappings": {
    "type": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

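Disable _all
If you do not need to search across all fields at once, disable it.
Index setting: use best_compression (applies to _source and stored fields) to save disk space.
index.codec: the default codec uses LZ4; best_compression switches to DEFLATE for a higher compression ratio, at the cost of slower stored-field performance.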

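Use the smallest numeric type that fits, to save disk space:
byte, short, integer or long
The same applies to floating points: use scaled_float, or prefer the smaller type in the order half_float -> float -> double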
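Choosing the smallest integer type comes down to checking the value range against each type's limits. The sketch below does that for the four integer types listed above, using their standard signed 8, 16, 32, and 64-bit ranges; the helper name is made up for illustration.

```python
# (type name, minimum, maximum) for the signed integer types, smallest first.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the smallest integer type that can hold [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit long")
```

It is usually wise to leave headroom for future values rather than size the type to the current data exactly.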
