Upgrading Elasticsearch 2.x to 6.x: completing the data migration


The official documentation for this procedure: https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html

The official rules for version upgrades: https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html#rolling-upgrades

Rolling upgrades can be performed between minor versions. Elasticsearch 6.x supports rolling upgrades from Elasticsearch 5.6. Upgrading from earlier 5.x versions requires a full cluster restart. You must reindex to upgrade from versions prior to 5.x.

To upgrade an Elasticsearch 5.x cluster that contains indices created in 2.x, you must reindex or delete them before upgrading to 6.x. For more information, see Reindex in place.

To upgrade an Elasticsearch cluster running 2.x, you have two options:

  • Perform a full cluster restart upgrade to 5.6, reindex the 2.x indices, then perform a rolling upgrade to 6.x. If your Elasticsearch 2.x cluster contains indices that were created before 2.x, you must either delete or reindex them before upgrading to 5.6. For more information about upgrading from 2.x to 5.6, see Upgrading Elasticsearch in the Elasticsearch 5.6 Reference.
  • Create a new 6.x cluster and reindex from remote to import indices directly from the 2.x cluster.

To upgrade an Elasticsearch 1.x cluster, you have two options:

  • Perform a full cluster restart upgrade to Elasticsearch 2.4.x and reindex or delete the 1.x indices. Then, perform a full cluster restart upgrade to 5.6 and reindex or delete the 2.x indices. Finally, perform a rolling upgrade to 6.x. For more information about upgrading from 1.x to 2.4, see Upgrading Elasticsearch in the Elasticsearch 2.4 Reference. For more information about upgrading from 2.4 to 5.6, see Upgrading Elasticsearch in the Elasticsearch 5.6 Reference.
  • Create a new 6.x cluster and reindex from remote to import indices directly from the 1.x cluster.

The old ES cluster is version 2.4.6, with three nodes:

127.0.0.1:9201,127.0.0.1:9301
127.0.0.1:9202,127.0.0.1:9302
127.0.0.1:9203,127.0.0.1:9303

The new ES cluster is version 6.4.2, with three nodes:

127.0.0.1:19201,127.0.0.1:19301
127.0.0.1:19202,127.0.0.1:19302
127.0.0.1:19203,127.0.0.1:19303

First, start a single node of the new version, with the following configuration file:

# Required for this setup: allows multiple node instances on one machine (disabled by default)
node.max_local_storage_nodes: 32

# Address of the old cluster we will reindex from
reindex.remote.whitelist: 127.0.0.1:9201

# Use zone attributes to control shard allocation and request distribution
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone-1,zone-2,zone-3
node.attr.zone: zone-1

# Cluster name; must be identical on every node in the cluster
cluster.name: version6

# Node name; must be unique within the cluster
node.name: node-6-1

# This node is master-eligible
node.master: true
# This node is a data node
node.data: true

### ES nodes fall into three roles: client, master, and data.
### A client (coordinating) node has both node.master and node.data set to false; it routes requests and aggregates results, much like nginx forwarding.
### Master-eligible nodes manage cluster state, which can be demanding, so running master and data on the same node is not recommended on busy clusters.
### Data nodes hold the shards and do part of the computation.

# Data directory
path.data: F:\\es\\data\\6.4.2\\node-6-1
# Log directory
path.logs: F:\\es\\logs\\6.4.2\\node-6-1

# IP this node exposes; 127.0.0.1 is fine for a local setup
network.host: 127.0.0.1
# HTTP port exposed to clients
http.port: 19201
# Internal transport port, used for unicast discovery
transport.tcp.port: 19301
# Disable multicast: a 1.x-only parameter; adding it on 5.x/6.x causes a startup error
# discovery.zen.ping.multicast.enabled: false
# Unicast discovery addresses. Replace hostnames with 127.0.0.1, or map several hostnames to 127.0.0.1 in the local hosts file (my approach; my setup had one client and three masters, hence four addresses)
# discovery.zen.ping.unicast.hosts: ["127.0.0.1:19301","127.0.0.1:19302","127.0.0.1:19303"]
discovery.zen.ping.unicast.hosts: ["127.0.0.1:19301"]

# CORS settings
http.cors.enabled: true
http.cors.allow-origin: "*"

# Minimum number of master-eligible nodes that must agree in a master election
# Computed as (number of master-eligible nodes / 2) + 1
# e.g. with 5 nodes where node.master is true: (5/2)+1 = 3
# discovery.zen.minimum_master_nodes: 2
discovery.zen.minimum_master_nodes: 1
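Once the node is up, a quick sanity check before reindexing is worthwhile: confirm the new node answers on its HTTP port and that the old cluster is reachable from this machine (ports as configured above):

# New node: version banner and cluster health
curl -X GET "127.0.0.1:19201"
curl -X GET "127.0.0.1:19201/_cat/health?v"

# Old cluster: must be reachable, and its address must match reindex.remote.whitelist
curl -X GET "127.0.0.1:9201"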

Then run the reindex. The official example:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://oldhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

Before running the reindex, you can create the destination index first and set the relevant parameters to speed up initialization of the import; the index imported here is test1:
PS: This is only one lever for speed. Throughput can also be improved by raising the number of documents written per batch and by parallelizing the import with slices, but both need to be tuned against the actual cluster; there is no universal value. A sketch follows the index-creation command below.

# refresh_interval controls how often newly indexed documents are made searchable (the default is 1s). Setting it to -1 disables periodic refreshes and speeds up the import.
# index.refresh_interval: -1
# number_of_replicas controls the replica count. Setting it to 0 turns replicas off during the initial import, which also speeds things up.
# index.number_of_replicas: 0

curl -X PUT -H "Content-Type:application/json" "127.0.0.1:19201/test1" -d '{
    "settings":{
        "refresh_interval":-1,
        "number_of_replicas":0
    }
}'
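As a sketch of the batch tuning mentioned in the PS above: in the reindex API, the size field inside source sets how many documents each scroll batch fetches (the default is 1000). Two caveats for the remote case: reindex from a remote cluster buffers each batch in an on-heap buffer capped at 100mb by default, so very large documents call for a smaller batch, and remote reindex does not support slices at all; slicing only works within a single cluster. The 5000 below is illustrative, not a recommendation:

curl -X POST -H "Content-Type:application/json" "127.0.0.1:19201/_reindex" -d '{
  "source": {
    "remote": {
      "host": "http://127.0.0.1:9201"
    },
    "index": "test1",
    "size": 5000
  },
  "dest": {
    "index": "test1"
  }
}'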

Then run the reindex (the migration behavior can also be controlled, e.g. skipping documents that already exist and only creating new ones; see the variant after this command):

curl -X POST "127.0.0.1:19201/_reindex" -d '{
  "source": {
    "remote": {
      "host": "http://127.0.0.1:9201"
    },
    "index": "test1"
  },
  "dest": {
    "index": "test1"
  }
}'
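The parenthetical above maps to the reindex API's conflict handling: "op_type": "create" on the destination makes the copy create only documents that do not exist yet, and "conflicts": "proceed" counts version conflicts instead of aborting on the first one. A sketch:

curl -X POST -H "Content-Type:application/json" "127.0.0.1:19201/_reindex" -d '{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://127.0.0.1:9201"
    },
    "index": "test1"
  },
  "dest": {
    "index": "test1",
    "op_type": "create"
  }
}'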

We changed those settings above to speed up the import; now set them back (note that the stock default for refresh_interval is 1s; 30s is simply the value used here):

curl -X PUT -H "Content-Type:application/json" "127.0.0.1:19201/test1/_settings" -d '{
    "refresh_interval": "30s",
    "number_of_replicas":1
}'
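With the settings restored, it is worth verifying that the two sides hold the same number of documents (the _refresh makes everything just written visible to the count):

curl -X POST "127.0.0.1:19201/test1/_refresh"
curl -X GET "127.0.0.1:19201/test1/_count"
curl -X GET "127.0.0.1:9201/test1/_count"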

Next, join the remaining nodes to the cluster one at a time. To join a node, set discovery.zen.ping.unicast.hosts in its configuration file to the nodes already in the cluster plus the node itself.

Then adjust the cluster settings. The first node was started with a minimum master election count of 1, which leaves a three-node cluster prone to split brain, so once all nodes have joined, raise it cluster-wide:

curl -X PUT -H "Content-Type:application/json" "127.0.0.1:19201/_cluster/settings" -d '{
  "persistent": {
    "discovery.zen.minimum_master_nodes":2
  }
}'
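To confirm the change took effect, read the settings back; the value should appear under the persistent section:

curl -X GET "127.0.0.1:19201/_cluster/settings"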

In real production use, indices are large and the migration may run for a long time; waiting on the request can end in a timeout. Instead, run the reindex as a background task, which only requires the wait_for_completion parameter:

curl -X POST "127.0.0.1:19201/_reindex?wait_for_completion=false" -d '{
  "source": {
    "remote": {
      "host": "http://127.0.0.1:9201"
    },
    "index": "test1"
  },
  "dest": {
    "index": "test1"
  }
}'

The request then returns the task ID immediately:

{
    "task": "q270bwrnQAe3K4SBu0GW8w:5012"
}

We can then query the task ID to check the status of the request:

# GET _tasks/TASK_ID 

curl -X GET "127.0.0.1:19201/_tasks/q270bwrnQAe3K4SBu0GW8w:5012"

This shows the execution state of the request:

{
    "completed": true,
    "task": {
        "node": "q270bwrnQAe3K4SBu0GW8w",
        "id": 5012,
        "type": "transport",
        "action": "indices:data/write/reindex",
        "status": {
            "total": 20753,
            "updated": 20753,
            "created": 0,
            "deleted": 0,
            "batches": 21,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
                "bulk": 0,
                "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
        },
        "description": "reindex from [host=127.0.0.1 port=9201 query={\n  \"match_all\" : {\n    \"boost\" : 1.0\n  }\n}][test1] to [test1]",
        "start_time_in_millis": 1541498674939,
        "running_time_in_nanos": 9599382168,
        "cancellable": true,
        "headers": {}
    },
    "response": {
        "took": 9595,
        "timed_out": false,
        "total": 20753,
        "updated": 20753,
        "created": 0,
        "deleted": 0,
        "batches": 21,
        "version_conflicts": 0,
        "noops": 0,
        "retries": {
            "bulk": 0,
            "search": 0
        },
        "throttled_millis": 0,
        "requests_per_second": -1,
        "throttled_until_millis": 0,
        "failures": []
    }
}
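Two related housekeeping calls from the tasks API: a running reindex can be cancelled (note "cancellable": true in the status above), and when a task started with wait_for_completion=false completes, its result is stored in the .tasks index and should be deleted to reclaim the space:

# Cancel a running reindex task
curl -X POST "127.0.0.1:19201/_tasks/q270bwrnQAe3K4SBu0GW8w:5012/_cancel"

# Delete the stored result of a completed task
curl -X DELETE "127.0.0.1:19201/.tasks/task/q270bwrnQAe3K4SBu0GW8w:5012"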

PS: Before the upgrade and migration, you can put a proxy in front of the old ES cluster and have clients connect to the proxy address; then migrate the data and, once the migration is complete, switch the proxy to point at the new cluster. To some degree this achieves an upgrade and migration without downtime. A minimal sketch follows.
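A minimal sketch of that proxy idea, assuming (hypothetically) nginx in front with clients connecting to port 9200; the upstream block is the only thing that changes at cutover:

# Hypothetical nginx config; clients connect to 127.0.0.1:9200
upstream es_backend {
    # Old 2.4.6 cluster; at cutover, switch this to 127.0.0.1:19201
    server 127.0.0.1:9201;
}

server {
    listen 9200;
    location / {
        proxy_pass http://es_backend;
    }
}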
