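Backup, Migration, and Reindexing

Backups and reindexing are routine in ES operations. A mapping was designed with the wrong field type, an analyzer was swapped and historical data must be re-analyzed, or a cluster migration has to move data: all of these come down to backup, export, reindex, and alias switching.

A principle first

In production, ES should preferably not be the only copy of your data.

If ES is a search view synchronized from MySQL, Kafka, log files, or similar sources, the most reliable recovery path is usually to replay from the source. If ES holds the only copy, snapshot backups are mandatory, and restores must be rehearsed regularly.

What is elasticdump good for?

elasticdump is a widely used import/export tool, suited to small migrations, ad hoc backups, and copying data into development or test environments.

Install: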

bash

npm install elasticdump -g

Export the mapping:

bash

elasticdump \
  --input=http://127.0.0.1:9200/articles \
  --output=articles_mapping.json \
  --type=mapping

Export the data:

bash

elasticdump \
  --input=http://127.0.0.1:9200/articles \
  --output=articles_data.json \
  --type=data

Import the mapping:

bash

elasticdump \
  --input=articles_mapping.json \
  --output=http://127.0.0.1:9200/articles_new \
  --type=mapping

Import the data:

bash

elasticdump \
  --input=articles_data.json \
  --output=http://127.0.0.1:9200/articles_new \
  --type=data

If ES has authentication enabled:

bash

elasticdump \
  --input=http://elastic:password@127.0.0.1:9200/articles \
  --output=articles_data.json \
  --type=data

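Putting the password directly in the command writes it to the shell history. In production, watch for credential leaks and prefer environment variables, short-lived accounts, or controlled scripts.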
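One way to keep the password out of the shell history is to have a small wrapper read it from the environment and build the elasticdump argument list itself. The following is a minimal sketch, not a hardened solution: the variable names `ES_USER` and `ES_PASSWORD` and both helper functions are assumptions made for illustration. The credentials still end up in the process arguments, and whether elasticdump decodes percent-encoded credentials is worth verifying against its documentation.

```python
import os
from urllib.parse import quote, urlsplit, urlunsplit


def es_url_with_auth(base_url: str, index: str) -> str:
    """Build an index URL whose credentials come from the environment.

    ES_USER and ES_PASSWORD are hypothetical variable names.
    """
    user = os.environ["ES_USER"]
    password = os.environ["ES_PASSWORD"]
    parts = urlsplit(base_url)
    # Percent-encode so characters such as '@' or ':' in the password
    # cannot break the URL structure.
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, f"/{index}", "", ""))


def elasticdump_args(base_url: str, index: str, output: str) -> list[str]:
    """Argument list suitable for subprocess.run; no shell is involved."""
    return [
        "elasticdump",
        f"--input={es_url_with_auth(base_url, index)}",
        f"--output={output}",
        "--type=data",
    ]
```

Passing the list to `subprocess.run(elasticdump_args(...), check=True)` avoids a shell entirely, so nothing is written to the history file.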

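Snapshots are a better fit for production

For production-grade backups, prefer ES's built-in snapshot/restore. It can store index snapshots in repositories such as a shared file system, S3, or HDFS (depending on the ES version, some repository types require a plugin).

Example of registering a repository. For the fs type, the location must be a shared path mounted on every node and listed under path.repo in elasticsearch.yml: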

http

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/data/es_backup"
  }
}

Create a snapshot:

http

PUT /_snapshot/my_backup/snapshot_2025_05_16?wait_for_completion=true
{
  "indices": "articles,logs-*",
  "ignore_unavailable": true,
  "include_global_state": false
}
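Scheduled backups usually derive the snapshot name from the date rather than typing it by hand. The sketch below builds the same path and body as the request above; `snapshot_request` is a hypothetical helper, and the naming scheme simply mirrors the `snapshot_2025_05_16` example.

```python
from datetime import date


def snapshot_request(repo: str, day: date, indices: list[str]) -> tuple[str, dict]:
    """Return the request path and JSON body for a dated snapshot."""
    name = f"snapshot_{day:%Y_%m_%d}"
    path = f"/_snapshot/{repo}/{name}?wait_for_completion=true"
    body = {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": False,
    }
    return path, body
```

The returned pair can be sent as a PUT request with whichever HTTP client the backup job already uses.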

List snapshots:

http

GET /_snapshot/my_backup/_all

Restore:

http

POST /_snapshot/my_backup/snapshot_2025_05_16/_restore
{
  "indices": "articles",
  "rename_pattern": "articles",
  "rename_replacement": "articles_restore"
}
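The rename_pattern in the restore request is a regular expression, and to my understanding it is applied as an unanchored replace on each restored index name. The sketch below approximates that behavior with Python's `re.sub` to show why an unanchored pattern can rename more than intended. It is a model, not the ES implementation: ES uses Java regex syntax, where group references in the replacement are written `$1` rather than `\1`.

```python
import re


def renamed_index(index: str, rename_pattern: str, rename_replacement: str) -> str:
    """Approximate the restore rename with an unanchored regex replace."""
    return re.sub(rename_pattern, rename_replacement, index)


# The pattern from the restore example renames the index as expected.
print(renamed_index("articles", "articles", "articles_restore"))

# The same pattern also matches inside a longer index name.
print(renamed_index("articles_v1", "articles", "articles_restore"))

# Anchoring the pattern limits the rename to the exact name.
print(renamed_index("articles_v1", "^articles$", "articles_restore"))
```

If a snapshot contains both articles and articles_v1, an anchored pattern such as `^articles$` is the safer choice.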

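Snapshots suit large data volumes and long-term backups; elasticdump is more of a flexible small tool.

Why reindex?

Common reasons:

A field type was designed incorrectly, for example a keyword that should be a date.
A text field needs a different analyzer.
The number of shards needs to change.
Historical dirty data needs to be cleaned up.
Several old indices need to be migrated to a new structure.
The index needs an alias and lifecycle management.

The type of an existing field generally cannot be changed in place, so reindexing is a routine operation in ES.

The basic reindexing workflow

Assume the old index is articles_v1 and the new one is articles_v2.

Step 1: create the new index and define its mapping: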

http

PUT /articles_v2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0,
    "refresh_interval": -1
  },
  "mappings": {
    "properties": {
      "articleId": { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "publishTime": { "type": "date" },
      "updateTime": { "type": "date" }
    }
  }
}

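Replicas and refresh are disabled temporarily to speed up the import; restore both once the import finishes. The ik_max_word and ik_smart analyzers require the IK analysis plugin to be installed.

Step 2: record the start time.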

text

start_time = 2025-05-16T10:00:00Z

If the application keeps writing during the rebuild, you will later catch up on incremental changes based on updateTime.
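The recorded start time can be produced by a script instead of being noted by hand. The sketch below formats a UTC timestamp in the same form as the example value and subtracts a small safety margin. The margin and the function name are assumptions for illustration: an overlapping window is harmless when the later catch-up overwrites documents by _id.

```python
from datetime import datetime, timedelta, timezone


def reindex_start_time(now: datetime,
                       safety_margin: timedelta = timedelta(minutes=5)) -> str:
    """Return an ISO 8601 UTC timestamp slightly before `now`."""
    start = now.astimezone(timezone.utc) - safety_margin
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


# Record this value before starting the full reindex.
print(reindex_start_time(datetime.now(timezone.utc)))
```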

Step 3: run the full reindex asynchronously:

http

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "articles_v1"
  },
  "dest": {
    "index": "articles_v2"
  }
}

The response contains a task id:

json

{ "task": "nodeId:12345" }

Check progress:

http

GET /_tasks/nodeId:12345
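The task response includes a status object with counters, and progress can be estimated by comparing the created, updated, and deleted counts with total. The sketch below does that arithmetic on an already-parsed response; the sample dictionary is made-up data shaped like a tasks API response, not output from a real cluster.

```python
def reindex_progress(task_response: dict) -> float:
    """Fraction of source documents processed so far (0.0 to 1.0)."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        return 0.0
    done = (status.get("created", 0)
            + status.get("updated", 0)
            + status.get("deleted", 0))
    return done / total


# Illustrative response fragment with invented numbers.
sample = {
    "completed": False,
    "task": {"status": {"total": 1000, "created": 200, "updated": 50, "deleted": 0}},
}
print(f"{reindex_progress(sample):.0%}")
```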

Step 4: sync the incremental changes:

http

POST /_reindex
{
  "source": {
    "index": "articles_v1",
    "query": {
      "range": {
        "updateTime": {
          "gte": "2025-05-16T10:00:00Z"
        }
      }
    }
  },
  "dest": {
    "index": "articles_v2",
    "op_type": "index"
  }
}

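If _id values stay the same, this operation overwrites the older versions already in the new index, which makes it convenient for eventual-consistency catch-up. It does not propagate deletes: documents deleted from articles_v1 after the full copy have to be removed from articles_v2 separately.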
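The reason the catch-up pass is safe to repeat can be shown with a toy model. The sketch below treats the destination index as a dictionary keyed by _id and applies a batch the way op_type index does, replacing any entry with the same _id. It is only a model of the semantics under that assumption and does not talk to ES.

```python
def apply_reindex(dest: dict[str, dict], docs: list[dict]) -> dict[str, dict]:
    """Model of op_type=index: each document replaces any entry with the same _id."""
    for doc in docs:
        dest[doc["_id"]] = doc["_source"]
    return dest


# Destination after the full copy, holding a stale version of document 1.
dest = {"1": {"title": "old"}}
changed = [
    {"_id": "1", "_source": {"title": "new"}},
    {"_id": "2", "_source": {"title": "added"}},
]
apply_reindex(dest, changed)
after_first_run = dict(dest)
apply_reindex(dest, changed)  # running the same batch again changes nothing
print(dest == after_first_run)
```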

Step 5: restore the index settings:

http

PUT /articles_v2/_settings
{
  "index": {
    "number_of_replicas": 1,
    "refresh_interval": "1s"
  }
}

Refresh manually:

http

POST /articles_v2/_refresh

Step 6: switch atomically with an alias:

http

POST /_aliases
{
  "actions": [
    { "remove": { "index": "articles_v1", "alias": "articles" } },
    { "add": { "index": "articles_v2", "alias": "articles" } }
  ]
}

As long as the application only accesses the alias articles, no code change is needed and the switch is atomic.
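When the switch is part of a release script, the request body can be generated so that the remove and add actions always travel in one request. A minimal sketch, with `alias_swap_body` as a hypothetical helper name:

```python
def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_body("articles", "articles_v1", "articles_v2")
```

Keeping both actions in the same body is what makes the switch atomic; sending them as two separate requests would leave a window in which the alias points nowhere.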

Separate read and write aliases are safer

A more complete approach separates the read alias from the write alias:

http

POST /_aliases
{
  "actions": [
    { "add": { "index": "articles_v1", "alias": "articles_read" } },
    { "add": { "index": "articles_v1", "alias": "articles_write", "is_write_index": true } }
  ]
}

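After the rebuild, point both the read alias and the write alias at the new index. For systems with continuous writes, this is much safer than hard-coding index names in the application.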
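The final switch of both aliases can be expressed as one actions list. The sketch below assumes the alias names articles_read and articles_write from the example and removes the old index from both aliases in the same request that adds the new one, so there is never more than one write index; the helper name is hypothetical.

```python
def read_write_switch_body(old_index: str, new_index: str,
                           read_alias: str = "articles_read",
                           write_alias: str = "articles_write") -> dict:
    """Body for POST /_aliases that moves the read and write aliases together."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": read_alias}},
            {"remove": {"index": old_index, "alias": write_alias}},
            {"add": {"index": new_index, "alias": read_alias}},
            {"add": {"index": new_index, "alias": write_alias,
                     "is_write_index": True}},
        ]
    }


switch = read_write_switch_body("articles_v1", "articles_v2")
```

Some teams move the read alias first and the write alias later; that is a deployment choice this sketch does not cover.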

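What to watch for when reindexing

For a large source index, run asynchronously with wait_for_completion=false.
Use the tasks API to watch progress and failure reasons.
With continuous writes, a reliable updateTime or version field is required to catch up on incremental changes.
Keep _id values identical between the old and new index during the rebuild so that overwrites are idempotent.
Replicas and refresh can be disabled on the new index before the import, and must be restored afterwards.
Before switching, compare document counts and spot-check query results.
Do not run a large reindex during peak traffic.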
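The document-count check from the list above is easy to automate. The sketch below compares two already-fetched responses from the count API (`GET /<index>/_count`, which returns a `count` field); the tolerance parameter is an assumption for systems where writes continue during the comparison.

```python
def counts_match(old_count_response: dict, new_count_response: dict,
                 tolerance: int = 0) -> bool:
    """True when the two indices differ by at most `tolerance` documents."""
    difference = abs(old_count_response["count"] - new_count_response["count"])
    return difference <= tolerance


# Invented numbers for illustration.
print(counts_match({"count": 120000}, {"count": 119998}, tolerance=5))
```

Matching counts do not prove the content is identical, which is why the spot-check queries remain necessary.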

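Hot/cold data and ILM

For time-series data such as logs, behavior events, and market-data snapshots, newer data is usually accessed more often and older data less.

ES can manage the lifecycle through ILM:

hot: new data is written here, on high-performance disks.
warm/cold: older data moves to lower-cost nodes.
delete: data past the retention period is deleted automatically.

Example: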

http

PUT /_ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

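ILM suits log-style indices and is not necessarily a fit for core business search indices. Whether a business index should roll over by time depends on its query and update patterns.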
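To reason about which phase an index should be in at a given age, the min_age thresholds can be evaluated with a few lines of code. The sketch below is a simplified model with illustrative thresholds: it handles only whole-day values such as 7d, and real ILM measures age from rollover when rollover is used and moves indices on a periodic check rather than instantly.

```python
def parse_days(min_age: str) -> int:
    """Parse values such as '7d'; only whole days are handled in this sketch."""
    if not min_age.endswith("d"):
        raise ValueError(f"unsupported min_age: {min_age!r}")
    return int(min_age[:-1])


def phase_for_age(age_days: int, min_ages: dict[str, str]) -> str:
    """Return the phase with the largest min_age that `age_days` has reached."""
    reached = [(parse_days(value), name)
               for name, value in min_ages.items()
               if parse_days(value) <= age_days]
    if not reached:
        raise ValueError("no phase applies yet")
    return max(reached)[1]


# Illustrative thresholds, not taken from a real policy.
ages = {"hot": "0d", "warm": "7d", "cold": "30d", "delete": "90d"}
print(phase_for_age(45, ages))
```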

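Basic direction for security and authentication

ES should never be exposed unprotected on the public internet. At a minimum:

Do not expose port 9200 to the public internet.
Enable authentication and authorization.
Use least-privilege accounts.
Enable TLS in important environments.
Rotate passwords regularly.
Do not hard-code long-lived, high-privilege passwords in backup scripts.

Once authentication is enabled, clients, elasticdump, and sync jobs all need a username and password or an API key configured.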
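For clients that talk to ES over plain HTTP, an API key is sent in the Authorization header as the Base64 encoding of the key id and secret joined by a colon. The sketch below builds that header; the id and secret values are placeholders, and in practice they would come from a secret store rather than source code.

```python
import base64


def api_key_header(key_id: str, api_key: str) -> dict[str, str]:
    """Build the Authorization header for an ES API key."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


# Placeholder values for illustration only.
headers = api_key_header("my-key-id", "my-key-secret")
```

An API key can be scoped to the indices and privileges a job needs, which fits the least-privilege point above better than sharing the elastic superuser password.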

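Summary

The most common ES operations tasks are backup, rebuild, migration, and alias switching.

A few rules of thumb:

Use elasticdump for small imports and exports; prefer snapshots for production backups.
When a field type or analyzer cannot be changed in place, create a new index and reindex.
Use an alias for an atomic switch when reindexing.
With continuous writes, you must design an incremental catch-up plan.
Restore import-tuning settings once the import is done.

Practice these workflows until they are routine, and evolving ES will no longer feel like major surgery every time.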
