ElasticSearch Study Notes (7) -- Reindexing

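Reindexing does not copy the settings of the source index. Set up the destination index before running _reindex, including its mappings, number of shards, number of replicas, and so on.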
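Because the settings are not copied, a typical workflow is to create the destination index explicitly and only then call _reindex. The Python sketch below (standard library only) builds the create-index body and sends both requests. The cluster URL http://localhost:9200, the lack of authentication, the helper names and any mapping you pass in are assumptions for illustration. It uses a mapping type named doc, matching the 6.x responses later in these notes; 7.x and later expect mappings without a type name.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def es_request(method, path, body=None):
    """Send one JSON request to Elasticsearch and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def dest_index_body(shards, replicas, properties, doc_type="doc"):
    """Settings and mappings for the destination index (6.x style, with a type name)."""
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {doc_type: {"properties": properties}},
    }


def create_then_reindex(source, dest, index_body):
    """Hypothetical helper: create the destination index first, then copy documents."""
    es_request("PUT", "/" + dest, index_body)
    return es_request(
        "POST", "/_reindex", {"source": {"index": source}, "dest": {"index": dest}}
    )
```

For example, `create_then_reindex("test", "test-copy", dest_index_body(5, 1, {"date": {"type": "date"}}))` would create test-copy with 5 shards, 1 replica and a date field before copying; the mapping here is made up for illustration.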

A first example

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy"
  }
}

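_reindex works from a snapshot of the source index. To control how version conflicts are handled, set the version_type property on the destination. It accepts two options, "internal" and "external". With "internal" (the default), Elasticsearch blindly writes the documents into the destination, overwriting any document that has the same type and id. With "external", Elasticsearch preserves the version from the source, creates any documents that are missing, and updates only those documents whose version in the destination is older than in the source.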
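A simplified Python model of the two version_type modes, processing one document at a time with versions as plain integers. This is a hypothetical illustration, not Elasticsearch's implementation; the function name and the `{id: (doc, version)}` layout of the destination are made up.

```python
def reindex_one(dest, doc_id, doc, source_version, version_type="internal"):
    """Write one source document into dest ({id: (doc, version)}).

    Returns "created", "updated" or "version_conflict". Simplified model only.
    """
    current = dest.get(doc_id)
    if version_type == "internal":
        # Overwrite whatever is there; the destination keeps its own version counter.
        new_version = 1 if current is None else current[1] + 1
        dest[doc_id] = (doc, new_version)
        return "created" if current is None else "updated"
    if version_type == "external":
        if current is None:
            # Missing documents are created with the version from the source.
            dest[doc_id] = (doc, source_version)
            return "created"
        if current[1] < source_version:
            # Only a destination copy with an older version is replaced.
            dest[doc_id] = (doc, source_version)
            return "updated"
        # Same or newer version already in the destination: conflict, nothing written.
        return "version_conflict"
    raise ValueError("version_type must be 'internal' or 'external'")
```

With `dest = {"1": ({"a": 1}, 5)}`, calling `reindex_one(dest, "1", {"a": 2}, 3, "external")` reports a conflict and leaves the destination untouched, whereas the same call with "internal" overwrites the document and bumps its version to 6.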

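If you add the op_type property to the destination and set it to "create", _reindex will only create documents that do not yet exist in the destination index. Every document that already exists causes a version conflict. By default, version conflicts abort the _reindex process and are reported under failures; setting conflicts to "proceed" makes _reindex only count them and carry on. The two behave differently:

Request: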

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy",
    "op_type": "create"
  }
}

The response:

{
  "took": 2,
  "timed_out": false,
  "total": 2,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 2,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "test-copy",
      "type": "doc",
      "id": "2",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[doc][2]: version conflict, document already exists (current version [1])",
        "index_uuid": "8b78uPjKRmuH_2cqSiPKIA",
        "shard": "2",
        "index": "test-copy"
      },
      "status": 409
    },
    {
      "index": "test-copy",
      "type": "doc",
      "id": "1",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[doc][1]: version conflict, document already exists (current version [1])",
        "index_uuid": "8b78uPjKRmuH_2cqSiPKIA",
        "shard": "3",
        "index": "test-copy"
      },
      "status": 409
    }
  ]
}

Request:

POST _reindex
{
  "conflicts": "proceed", 
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy",
    "op_type": "create"
  }
}

Response:

{
  "took": 5,
  "timed_out": false,
  "total": 3,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 3,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
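The difference between the two runs can be captured in a simplified Python model: with "op_type": "create", every id that already exists in the destination is a version conflict; by default conflicts are reported as failures and the process stops, while "conflicts": "proceed" only counts them. The default batch size of 1000 comes from the 6.x documentation; stopping exactly at the end of the batch that produced the conflicts is an assumption of this sketch, and the function name is hypothetical.

```python
def simulate_create_reindex(source_ids, dest_ids, batch_size=1000, conflicts="abort"):
    """Simplified model of _reindex with "op_type": "create" (not Elasticsearch code).

    source_ids: ids in the order the source scroll returns them.
    dest_ids: ids that already exist in the destination index.
    """
    if conflicts not in ("abort", "proceed"):
        raise ValueError("conflicts must be 'abort' or 'proceed'")
    existing = set(dest_ids)
    result = {"total": len(source_ids), "created": 0, "batches": 0,
              "version_conflicts": 0, "failures": []}
    for start in range(0, len(source_ids), batch_size):
        result["batches"] += 1
        for doc_id in source_ids[start:start + batch_size]:
            if doc_id in existing:
                # "create" never overwrites: an existing id is a 409 version conflict
                result["version_conflicts"] += 1
                if conflicts == "abort":
                    result["failures"].append({"id": doc_id, "status": 409})
            else:
                existing.add(doc_id)
                result["created"] += 1
        if result["failures"]:
            # default ("abort"): stop after the batch that produced conflicts
            break
    return result
```

With two source ids that both already exist, the model reports two conflicts and two failures, like the first response; with "proceed" the failures list stays empty and only version_conflicts grows, like the second.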

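You can specify multiple source indices, e.g. "index": ["source_index_1", "source_index_2"]. You can also limit the number of documents copied from the source index with size, use query and sort in source to select and order the documents, and copy only some fields by specifying _source: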

POST _reindex
{
  "size": 1,
  "source": {
    "index": "test",
    "sort": {
      "date": "desc"
    },
    "query": {
      "match": {
        "test": "data"
      }
    },
    "_source": ["field1", "field2"]
  },
  "dest": {...}
}
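Note that a top-level size limits the total number of documents copied, whereas a size placed inside source sets the scroll batch size (1000 by default in 6.x). The Python sketch below assembles such a body; the helper name filtered_reindex_body and the destination index name test-filtered are hypothetical, and the resulting JSON still has to be sent with an HTTP client.

```python
import json


def filtered_reindex_body(indices, dest, limit=None, batch_size=None,
                          query=None, sort=None, fields=None):
    """Build a _reindex body that copies a filtered subset of one or more indices."""
    source = {"index": list(indices)}  # one or several source indices
    if batch_size is not None:
        source["size"] = batch_size  # documents per scroll batch
    if query is not None:
        source["query"] = query  # only matching documents are copied
    if sort is not None:
        source["sort"] = sort  # mostly useful together with a limit
    if fields is not None:
        source["_source"] = list(fields)  # copy only these fields
    body = {"source": source, "dest": {"index": dest}}
    if limit is not None:
        body["size"] = limit  # top-level size: total documents to copy (6.x)
    return body


body = filtered_reindex_body(
    ["test"], "test-filtered", limit=1,
    query={"match": {"test": "data"}}, sort={"date": "desc"},
    fields=["field1", "field2"],
)
print(json.dumps(body, indent=2))
```

Keeping the limit and the batch size as separate parameters makes it harder to confuse the two meanings of size.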

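_reindex supports using a script to modify documents.

Suppose the source documents have a field named "flag" that you want to rename to "tag" in the destination documents. You can run: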

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2"
  },
  "script": {
    "source": "ctx._source.tag = ctx._source.remove(\"flag\")"
  }
}
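What the script does to each document can be mirrored in Python. Painless's remove, like Java's Map.remove, returns the removed value, or null when the key is absent, so documents without a flag field end up with "tag": null. The function below is only an illustrative model of that behaviour, not how Elasticsearch executes the script; its name and defaults are hypothetical.

```python
def rename_field(source, old="flag", new="tag"):
    """Model of the Painless script: ctx._source.tag = ctx._source.remove("flag")."""
    doc = dict(source)  # work on a copy; the real script edits ctx._source in place
    # dict.pop(key, None) mirrors Map.remove: the old value, or None (null) if absent
    doc[new] = doc.pop(old, None)
    return doc
```

If documents without flag should be left untouched, the script could guard the rename with ctx._source.containsKey("flag"); that is a suggested variation, not part of the original example.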

Reindexing from a remote Elasticsearch cluster

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

Remote hosts must be explicitly whitelisted in elasticsearch.yml with the reindex.remote.whitelist setting, for example:

reindex.remote.whitelist: ["first-host:9200", "second-host:9200"]
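According to the 6.x documentation, whitelist entries are host:port pairs, may contain * wildcards (such as localhost:*), and the scheme of the remote URL is ignored. The Python function below is a simplified, hypothetical check of whether a remote host URL would pass the whitelist; it treats * as "any characters" and requires an explicit port, which may not cover every edge case of Elasticsearch's own matching.

```python
import re
from urllib.parse import urlsplit


def whitelisted(remote_host, whitelist):
    """Return True if remote_host (e.g. "http://otherhost:9200") matches a whitelist entry."""
    parts = urlsplit(remote_host)
    if parts.hostname is None or parts.port is None:
        raise ValueError("expected scheme://host:port, got %r" % remote_host)
    host_port = "%s:%d" % (parts.hostname, parts.port)  # the scheme is dropped
    for entry in whitelist:
        # Escape everything except '*', which matches any run of characters.
        pattern = ".*".join(re.escape(piece) for piece in entry.split("*"))
        if re.fullmatch(pattern, host_port):
            return True
    return False
```

For instance, with the whitelist above, http://otherhost:9200 from the earlier remote example would be rejected until otherhost:9200 is added.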

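Reindexing from a remote cluster uses an on-heap buffer with a maximum size of 100 MB. If the source index contains very large documents, use a smaller batch size. The batch size is set with size inside source (1000 by default); it is not the top-level size shown earlier, which limits the total number of documents copied.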
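A rough way to pick that batch size is to divide the buffer by an estimate of the largest documents, leaving some headroom. The 100 MB figure comes from the text above; the per-document size estimate and the safety factor of 2 are assumptions to replace with numbers from your own data, so treat the result as a starting point rather than a rule. The function name is hypothetical.

```python
BUFFER_BYTES = 100 * 1024 * 1024  # remote reindex on-heap buffer limit ("100mb")


def remote_batch_size(max_doc_bytes, safety_factor=2, default=1000):
    """Largest batch that should fit in the buffer, capped at the default batch size."""
    if max_doc_bytes <= 0:
        raise ValueError("max_doc_bytes must be positive")
    fits = BUFFER_BYTES // (max_doc_bytes * safety_factor)
    # At least one document per batch, never more than the default of 1000.
    return max(1, min(default, fits))
```

With documents of up to 1 MB, this suggests batches of 50; small documents simply keep the default of 1000.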

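You can set socket_timeout (the socket read timeout) and connect_timeout (the connection timeout) for the remote connection. If they are not set, both default to 30 seconds.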

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "source"
  },
  "dest": {
    "index": "dest"
  }
}
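To avoid hard-coding the user name and password of the remote cluster in the request, the body can be assembled from environment variables. In the Python sketch below, the variable names ES_REMOTE_USER and ES_REMOTE_PASSWORD and the helper name are hypothetical; the timeouts are only added when given, so the 30-second defaults apply otherwise.

```python
import os


def remote_reindex_body(host, index, dest, batch_size=None,
                        socket_timeout=None, connect_timeout=None, env=os.environ):
    """Build a reindex-from-remote body; credentials come from env, not from code."""
    remote = {"host": host}
    user = env.get("ES_REMOTE_USER")  # hypothetical variable name
    password = env.get("ES_REMOTE_PASSWORD")  # hypothetical variable name
    if user and password:
        remote["username"] = user
        remote["password"] = password
    if socket_timeout is not None:
        remote["socket_timeout"] = socket_timeout  # e.g. "1m"
    if connect_timeout is not None:
        remote["connect_timeout"] = connect_timeout  # e.g. "10s"
    source = {"remote": remote, "index": index}
    if batch_size is not None:
        source["size"] = batch_size  # smaller batches for large remote documents
    return {"source": source, "dest": {"index": dest}}
```

Passing env explicitly (for example a plain dict) makes the function easy to test without touching the real environment.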

For more features, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/docs-reindex.html


Reposted from my.oschina.net/k2XI0fj/blog/1649873