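How do you write common Query DSL queries?

Elasticsearch queries are written mainly in Query DSL, which is essentially a JSON-based query language. The nesting looks heavy at first, but a handful of core queries cover most business scenarios.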

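First, tell query from filter

An Elasticsearch clause runs in one of two contexts:

query context: computes the relevance score _score.
filter context: only decides whether a document matches, computes no score, and its results are easier to cache.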

For example, full-text search on a title needs relevance scoring, so use query:

json

{ "match": { "title": "Elasticsearch query" } }

Filtering by status, time range, or tag needs no scoring, so use filter:

json

{ "term": { "status": "published" } }

A common production pattern: put full-text search in must and exact conditions in filter.
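As a minimal sketch of that pattern, the Python helper below assembles such a request body as a plain dict. The function name build_search_body and the field names title and status are illustrative assumptions; no Elasticsearch client is involved, and the dict would simply be sent as the JSON body of a _search request.

```python
def build_search_body(text, status):
    """Put the full-text condition in must (scored) and the exact condition in filter (unscored)."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no match only, no scoring.
                "filter": [{"term": {"status": status}}],
            }
        }
    }


body = build_search_body("Elasticsearch", "published")
```

Keeping the two kinds of conditions in separate lists also makes it easy to add more hard conditions later without touching relevance.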

term: exact match

term does not analyze the query value, so it suits keyword, numeric, date, and boolean fields.

http

GET /articles/_search
{
  "query": {
    "term": {
      "status": "published"
    }
  }
}

For multiple exact values, use terms:

http

GET /articles/_search
{
  "query": {
    "terms": {
      "tags": ["MySQL", "Elasticsearch"]
    }
  }
}

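On a text field, a term query matches the terms stored in the inverted index, not the original text. For example, once title has been analyzed into tokens, a term query for the full title may find nothing.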
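The toy model below illustrates why this happens. It is not how Elasticsearch is implemented: the analyzer here just lowercases and splits on whitespace, and the index is a plain dict, all of which are simplifying assumptions.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (real analyzers do much more)."""
    return text.lower().split()


def build_index(docs):
    """Map each indexed term to the set of document IDs that contain it."""
    index = {}
    for doc_id, title in docs.items():
        for token in analyze(title):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """term: look the value up as-is, without analyzing it."""
    return index.get(value, set())


def match_query(index, text):
    """match: analyze the query text, then OR the per-term lookups."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1: "Elasticsearch Write Flow", 2: "MySQL Index Basics"}
index = build_index(docs)

full_title_hits = term_query(index, "Elasticsearch Write Flow")  # no such single term was indexed
single_term_hits = term_query(index, "elasticsearch")            # equals an indexed token
match_hits = match_query(index, "Elasticsearch Write Flow")      # analyzed before lookup
```

In this model the full title is never stored as one term, so the first lookup comes back empty while the other two find document 1.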

match: full-text search

match analyzes the query text first and then looks up the resulting terms in the inverted index, so it suits text fields.

http

GET /articles/_search
{
  "query": {
    "match": {
      "title": "Elasticsearch write flow"
    }
  }
}

By default, multiple terms are combined with OR. You can switch to AND:

json

{
  "match": {
    "title": {
      "query": "Elasticsearch write flow",
      "operator": "and"
    }
  }
}

AND is stricter and recalls fewer documents; OR is looser and recalls more. Real-world search usually has to balance recall against precision.
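To make the recall difference concrete, here is a small simulation over three hypothetical titles. It assumes a whitespace-and-lowercase analyzer and ignores scoring entirely; it only shows which documents each operator lets through.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(docs, text, operator="or"):
    """Return IDs of documents whose title satisfies the match query under the given operator."""
    query_terms = set(analyze(text))
    hits = []
    for doc_id, title in docs.items():
        doc_terms = set(analyze(title))
        if operator == "and":
            ok = query_terms <= doc_terms       # every query term must be present
        else:
            ok = bool(query_terms & doc_terms)  # any one query term is enough
        if ok:
            hits.append(doc_id)
    return hits


docs = {
    1: "Elasticsearch write flow",
    2: "Elasticsearch query basics",
    3: "MySQL write amplification",
}

or_hits = match(docs, "Elasticsearch write flow", "or")
and_hits = match(docs, "Elasticsearch write flow", "and")
```

With OR, documents 2 and 3 are recalled because each shares a single term with the query; with AND, only document 1 survives.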

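match_phrase: phrase matching

match_phrase requires the analyzed terms to appear in the same order and, by default, adjacent to each other, which suits phrase search.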

http

GET /articles/_search
{
  "query": {
    "match_phrase": {
      "title": "write flow"
    }
  }
}

Use slop to allow a few terms in between:

json

{
  "match_phrase": {
    "title": {
      "query": "Elasticsearch flow",
      "slop": 2
    }
  }
}

range: range queries

range is commonly used on dates and numbers:

http

GET /articles/_search
{
  "query": {
    "range": {
      "publishTime": {
        "gte": "2025-01-01 00:00:00",
        "lt": "2026-01-01 00:00:00"
      }
    }
  }
}

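Parameters:

gt: greater than.
gte: greater than or equal to.
lt: less than.
lte: less than or equal to.

Date fields should use the date type. Do not store dates as keyword and then run range on them, unless the format strictly guarantees that lexicographic order equals chronological order. Also note that values like "2025-01-01 00:00:00" only parse if the field mapping (or the format parameter of the range query) declares a matching pattern such as yyyy-MM-dd HH:mm:ss; the default date format expects ISO 8601 style values such as 2025-01-01T00:00:00.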
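The lexicographic-order caveat is easy to check outside Elasticsearch. The sketch below compares plain string sorting with real date sorting for two hypothetical sets of values, one zero-padded and one not.

```python
from datetime import datetime

padded = ["2025-10-01 00:00:00", "2025-09-01 00:00:00"]
unpadded = ["2025-10-1", "2025-9-1"]


def chronological(values, fmt):
    """Sort date strings by the moment they represent rather than by their characters."""
    return sorted(values, key=lambda v: datetime.strptime(v, fmt))


# Zero-padded, fixed-width strings: string order and time order agree.
padded_same = sorted(padded) == chronological(padded, "%Y-%m-%d %H:%M:%S")

# Unpadded strings: "2025-10-1" sorts before "2025-9-1" as text, which is wrong in time.
unpadded_same = sorted(unpadded) == chronological(unpadded, "%Y-%m-%d")
```

A range over keyword values compares them the same way the plain sorted call does, which is why only strictly fixed-width formats are safe there.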

exists: whether a field exists

Find documents that have a category field:

json

{ "exists": { "field": "category" } }

To find documents without a category field, put exists under must_not:

json

{
  "bool": {
    "must_not": [
      { "exists": { "field": "category" } }
    ]
  }
}

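Note: a field whose value is null or an empty array is generally not indexed, so exists may treat it as missing.

bool: compound queries

bool is the most commonly used compound query. It has four clauses:

must: the clause must match, and it contributes to the score.
filter: the clause must match, but it does not affect the score.
should: the clause should match; commonly used for OR logic or to boost the score.
must_not: the clause must not match, and it does not affect the score.

A typical business query: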

http

GET /articles/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Elasticsearch" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "publishTime": { "gte": "2025-01-01 00:00:00" } } }
      ],
      "must_not": [
        { "terms": { "articleId": ["a1", "a2"] } }
      ],
      "should": [
        { "term": { "tags": "search" } },
        { "term": { "tags": "database" } }
      ]
    }
  }
}

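Here must drives full-text relevance, filter enforces the hard business conditions, and matching should clauses raise the score.

A should pitfall

When a bool query contains only should clauses and no must or filter, at least one should clause must match by default.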

json

{
  "bool": {
    "should": [
      { "term": { "tags": "MySQL" } },
      { "term": { "tags": "ES" } }
    ]
  }
}

This is similar to:

sql

tags = 'MySQL' OR tags = 'ES'

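But once the bool query also contains a must or filter clause, should becomes an optional score booster by default. If you still want at least one should clause to match, set minimum_should_match explicitly: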

json

{
  "bool": {
    "filter": [
      { "term": { "status": "published" } }
    ],
    "should": [
      { "term": { "tags": "MySQL" } },
      { "term": { "tags": "ES" } }
    ],
    "minimum_should_match": 1
  }
}

This rule matters: many "why is my OR condition ignored" problems trace back to it.
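The default can be modelled in a few lines. The evaluator below is a toy that handles only term clauses and integer minimum_should_match values (the real parameter also accepts percentages and other forms); it exists purely to show how the default flips from 1 to 0 once must or filter is present.

```python
def term_matches(doc, clause):
    """Evaluate a {"term": {field: value}} clause; a list field matches if any element equals value."""
    field, value = next(iter(clause["term"].items()))
    actual = doc.get(field)
    if isinstance(actual, list):
        return value in actual
    return actual == value


def bool_matches(doc, bool_query):
    """Toy model of bool matching (no scoring), including the minimum_should_match default."""
    required = bool_query.get("must", []) + bool_query.get("filter", [])
    should = bool_query.get("should", [])
    must_not = bool_query.get("must_not", [])
    # Default: 1 when there are should clauses and no must/filter, otherwise 0.
    default_msm = 1 if should and not required else 0
    msm = bool_query.get("minimum_should_match", default_msm)
    if not all(term_matches(doc, c) for c in required):
        return False
    if any(term_matches(doc, c) for c in must_not):
        return False
    return sum(term_matches(doc, c) for c in should) >= msm


should = [{"term": {"tags": "MySQL"}}, {"term": {"tags": "ES"}}]
published = [{"term": {"status": "published"}}]
doc = {"status": "published", "tags": ["Java"]}  # hypothetical document with neither tag

only_should = bool_matches(doc, {"should": should})
filter_plus_should = bool_matches(doc, {"filter": published, "should": should})
explicit_msm = bool_matches(doc, {"filter": published, "should": should, "minimum_should_match": 1})
```

The same document is rejected by the should-only query, accepted once a filter is added, and rejected again when minimum_should_match is set to 1.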

Sorting

Sort by publish time, newest first:

http

GET /articles/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "publishTime": { "order": "desc" } },
    { "_id": { "order": "desc" } }
  ]
}

To sort by relevance first and then by time:

json

"sort": [
  { "_score": { "order": "desc" } },
  { "publishTime": { "order": "desc" } },
  { "_id": { "order": "desc" } }
]

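Sort fields should generally be keyword, date, or numeric types; do not sort directly on text. One caveat on the _id tiebreaker used above: the Elasticsearch documentation advises against sorting on _id directly and recommends duplicating the ID into a field with doc_values, such as a keyword articleId, when a unique tiebreaker is needed.

Pagination: from/size and search_after

Basic pagination: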

http

GET /articles/_search
{
  "from": 0,
  "size": 10,
  "query": { "match_all": {} }
}

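from/size suits shallow pagination. Deep pagination makes every shard collect from + size candidates and hand them to the coordinating node for sorting and merging, which is expensive.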
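A rough back-of-the-envelope calculation shows how quickly this grows. The shard count of 5 below is a made-up example, and the numbers are upper bounds on candidates rather than measured costs. This growth is also why Elasticsearch by default rejects requests where from + size exceeds index.max_result_window (10,000).

```python
def candidates_per_request(from_, size, shards):
    """Upper bound on hits each shard collects, and on hits the coordinating node merges."""
    per_shard = from_ + size
    return per_shard, per_shard * shards


# Page 1 versus page 1000 at 10 hits per page, on a hypothetical 5-shard index.
shallow = candidates_per_request(0, 10, 5)
deep = candidates_per_request(9990, 10, 5)
```

Both requests return 10 hits to the user, but the deep one may have to sort up to 50,000 candidates to find them.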

For deep pagination, prefer search_after:

http

GET /articles/_search
{
  "size": 10,
  "query": { "match_all": {} },
  "sort": [
    { "publishTime": "desc" },
    { "_id": "desc" }
  ],
  "search_after": ["2025-05-16 10:00:00", "article_100"]
}

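search_after requires a stable sort order, which in practice means the sort includes a unique tiebreaker field. Each next page passes in the sort values of the last hit on the previous page.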
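In client code, moving to the next page usually comes down to copying the sort array of the last hit into the next request. The helper below sketches that step with plain dicts; next_page_body is a hypothetical name, the hit and its sort values are invented, and articleId is assumed to be a keyword field holding a unique ID.

```python
import copy


def next_page_body(body, last_hit):
    """Return a new request body that continues after last_hit, leaving the original untouched."""
    next_body = copy.deepcopy(body)
    next_body.pop("from", None)  # search_after replaces offset-based paging
    # When a request specifies sort, each hit in the response carries a "sort" array.
    next_body["search_after"] = last_hit["sort"]
    return next_body


first_page = {
    "size": 10,
    "query": {"match_all": {}},
    "sort": [{"publishTime": "desc"}, {"articleId": "desc"}],
}

# Hypothetical last hit of the first page.
last_hit = {"_id": "article_100", "sort": [1747389600000, "article_100"]}

second_page = next_page_body(first_page, last_hit)
```

The query and sort stay identical between pages; only search_after changes.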

Highlighting

Highlight the matched keywords in search results:

http

GET /articles/_search
{
  "query": {
    "match": { "title": "Elasticsearch" }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

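Highlighting costs extra resources, so use it carefully on large fields and large result sets.

Aggregations

Count the articles under each tag: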

http

GET /articles/_search
{
  "size": 0,
  "aggs": {
    "tag_count": {
      "terms": {
        "field": "tags",
        "size": 20
      }
    }
  }
}

Bucket by time:

http

GET /articles/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "publishTime",
        "calendar_interval": "day"
      }
    }
  }
}

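Aggregation fields generally need doc_values; keyword, numeric, and date fields have them by default and aggregate well. text fields are generally not aggregated directly.

function_score: custom ranking scores

Sometimes ranking depends on more than text relevance, and you also want to factor in recency, tag weights, popularity, and so on. function_score handles this: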

http

GET /articles/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            { "match": { "title": "Elasticsearch" } }
          ],
          "filter": [
            { "term": { "status": "published" } }
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "publishTime": {
              "origin": "now",
              "scale": "7d",
              "decay": 0.5
            }
          }
        },
        {
          "filter": { "term": { "tags": "featured" } },
          "weight": 2
        }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}

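If you write a script_score, keep the script execution cost in mind. Whatever field weights and decay functions can handle is better kept out of complex scripts.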
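To get a feel for what the function_score example computes, the sketch below reproduces the arithmetic in Python. It follows the gauss decay formula as described in the Elasticsearch documentation, measures distance in days, and assumes the tag function simply contributes its weight when its filter matches; treat it as an approximation for intuition, not a reimplementation of the scorer.

```python
import math


def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay: 1.0 at the origin, exactly `decay` at a distance of `scale`."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, abs(distance) - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def final_score(query_score, age_days, is_featured):
    """Combine scores the way score_mode=sum and boost_mode=sum do in the example above."""
    function_scores = [gauss_decay(age_days, scale=7.0, decay=0.5)]
    if is_featured:
        function_scores.append(2.0)    # weight-only function whose filter matched
    combined = sum(function_scores)    # score_mode: sum
    return query_score + combined      # boost_mode: sum


fresh_featured = final_score(1.0, age_days=0, is_featured=True)
week_old_plain = final_score(1.0, age_days=7, is_featured=False)
```

With a 7-day scale and a decay of 0.5, an article published a week ago gets half the recency bonus of one published right now.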

SQL to DSL

Elasticsearch provides a SQL translate API that converts SQL into DSL:

http

POST /_sql/translate
{
  "query": "select * from articles where status = 'published' order by publishTime desc"
}

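This feature is useful as a learning aid, but it should not be relied on completely. The DSL produced from complex SQL often needs manual adjustment, especially for nested fields, full-text search, and relevance sorting.

How do you troubleshoot query problems?

Work through the checks in this order:

Check the mapping: is the field text or keyword? Is the date field actually a date?
Use _analyze to inspect the tokens the analyzer produces.
Use term for keyword fields and match for text fields.
Check whether the should clauses in bool need minimum_should_match.
Check the format and time zone of time ranges.
Check whether the sort field is sortable.
Use the profile API to analyze slow queries.

profile example: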

http

GET /articles/_search
{
  "profile": true,
  "query": {
    "match": { "title": "Elasticsearch" }
  }
}

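Summary

Query DSL looks complicated, but the common patterns are stable:

Use term/terms/range/exists for exact filtering, preferably inside filter.
Use match/match_phrase for full-text search, in query context.
Combine multiple conditions with bool.
Whether should must match depends on minimum_should_match, so set it deliberately.
Sort and aggregate on keyword, date, and numeric fields.
Use search_after for deep pagination.

Once these basic queries are second nature, most Elasticsearch business queries can be written clearly and reliably.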
