What exactly does the standard token filter do in Elasticsearch?

25-02-24 15

This article explains what the standard token filter does in Elasticsearch. It also covers: Debezium Postgres and ElasticSearch (storing complex objects in ElasticSearch), the ElasticSearch n-gram token filter not finding partial words, what tokens actually are in Elasticsearch, and how filters are applied in Elasticsearch.

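Contents:

What exactly does the standard token filter do in Elasticsearch?
Debezium Postgres and ElasticSearch - storing complex objects in ElasticSearch
ElasticSearch n-gram token filter not finding partial words
What exactly are tokens in Elasticsearch?
How are filters applied in Elasticsearch?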

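What exactly does the standard token filter do in Elasticsearch?

There are no examples in the documentation, and I just want to know what it produces from a given input.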

Answer 1

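Prior to Elasticsearch 0.16 (Lucene 3.1), the standard token filter was "normalizing tokens extracted with the standard tokenizer". Specifically, it removed 's from the end of words and dots from acronyms. So Apple's C.E.O would become Apple CEO after passing through the standard filter. Starting with Elasticsearch 0.16 (Lucene 3.1), the standard token filter does not do anything (at least at the moment). It just passes tokens on to the next filter in the chain.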
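
To make the before/after behaviour concrete, here is a minimal Python sketch of the two behaviours described above. It is only an illustration, not Lucene's implementation: the function names and the regular expression used to spot acronyms are assumptions made for this example.

```python
import re

# Assumption for this sketch: an "acronym" is single letters separated by dots,
# e.g. "C.E.O" or "C.E.O.". Lucene's real token typing is more involved.
_ACRONYM = re.compile(r"^(?:[A-Za-z]\.)+[A-Za-z]?$")


def legacy_standard_filter(tokens):
    """Rough emulation of the pre-0.16 behaviour described in the answer."""
    result = []
    for token in tokens:
        if token.endswith(("'s", "'S")):
            token = token[:-2]              # drop the trailing 's
        elif _ACRONYM.match(token):
            token = token.replace(".", "")  # drop the dots from the acronym
        result.append(token)
    return result


def current_standard_filter(tokens):
    """From 0.16 on, the filter just passes tokens through unchanged."""
    return list(tokens)


tokens = ["Apple's", "C.E.O"]
print(legacy_standard_filter(tokens))
print(current_standard_filter(tokens))
```

The first call yields the normalized tokens (Apple, CEO), while the second returns the input untouched, which matches the "does nothing" behaviour of later versions.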

Debezium Postgres and ElasticSearch - storing complex objects in ElasticSearch

You need to use the outbox pattern, see https://debezium.io/documentation/reference/1.2/configuration/outbox-event-router.html

Alternatively, you can use aggregate objects, see https://github.com/debezium/debezium-examples/tree/master/jpa-aggregations and https://github.com/debezium/debezium-examples/tree/master/kstreams-fk-join
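
As a rough illustration of the outbox idea, the sketch below writes a business row and an outbox row carrying the whole nested object as JSON in one transaction. It uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere. The orders table, the order structure and the event name are hypothetical, and the outbox column names (id, aggregatetype, aggregateid, type, payload) are what I understand the outbox event router to expect by default, so verify them against the linked documentation.

```python
import json
import sqlite3
import uuid


def save_order_with_outbox(conn, order):
    """Store the order and its outbox event atomically (hypothetical schema)."""
    payload = json.dumps(order)  # the whole nested object travels as one JSON document
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO orders (id, customer) VALUES (?, ?)",
            (order["id"], order["customer"]["name"]),
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), "order", str(order["id"]), "OrderCreated", payload),
        )


conn = sqlite3.connect(":memory:")  # stand-in for the Postgres database
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregatetype TEXT, "
    "aggregateid TEXT, type TEXT, payload TEXT)"
)

order = {
    "id": 1,
    "customer": {"name": "kimchy"},
    "lines": [{"sku": "A-1", "quantity": 2}],
}
save_order_with_outbox(conn, order)
```

The point of the design is that a change-data-capture connector only has to watch the outbox table, and each captured row already contains the complete object that should end up as one Elasticsearch document.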

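ElasticSearch n-gram token filter not finding partial words

I have been playing around with ElasticSearch for a new project of mine. I have set the default analyzers to use the ngram token filter. This is my elasticsearch.yml file: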

index:
  analysis:
    analyzer:
      default_index:
        tokenizer: standard
        filter: [standard, stop, mynGram]
      default_search:
        tokenizer: standard
        filter: [standard, stop]
    filter:
      mynGram:
        type: nGram
        min_gram: 1
        max_gram: 10

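I created a new index and added the following document to it:

$ curl -XPUT http://localhost:9200/test/newtype/3 -d '{"text": "one two three four five six"}'
{"ok":true,"_index":"test","_type":"newtype","_id":"3"}

However, when I search with the query text:hree or text:ive, or any other partial term, ElasticSearch does not return this document. It only returns the document when I search for an exact term (such as text:two).

I have also tried changing the config file so that default_search uses the ngram token filter as well, but the result was the same. What am I doing wrong here, and how do I correct it?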

Answer 1

Not sure about the default_* settings. But applying a mapping that specifies index_analyzer and search_analyzer works:

curl -XDELETE localhost:9200/twitter

curl -XPOST localhost:9200/twitter -d '{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "mynGram": {"type": "nGram", "min_gram": 2, "max_gram": 10}
      },
      "analyzer": {
        "a1": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "mynGram"]
        }
      }
    }
  }
}'

curl -XPUT localhost:9200/twitter/tweet/_mapping -d '{
  "tweet": {
    "index_analyzer": "a1",
    "search_analyzer": "standard",
    "date_formats": ["yyyy-MM-dd", "dd-MM-yyyy"],
    "properties": {
      "user": {"type": "string", "analyzer": "standard"},
      "message": {"type": "string"}
    }
  }
}'

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search"
}'

curl -XGET 'localhost:9200/twitter/_search?q=ear'
curl -XGET 'localhost:9200/twitter/_search?q=sea'
curl -XGET localhost:9200/twitter/_mapping
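
To see why the searches for ear and sea can match, it helps to look at what an n-gram filter with min_gram 2 and max_gram 10 emits at index time. The following is a small Python sketch of that idea, not Elasticsearch's own code; the ngrams helper is a name made up for this example, and whitespace splitting stands in for the standard tokenizer.

```python
def ngrams(token, min_gram=2, max_gram=10):
    """Return every substring of token whose length is within [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer grams from this start position would run past the token
            grams.append(token[start:end])
    return grams


# Index side: lowercase, split into words, then expand each word into n-grams,
# roughly what the "a1" analyzer above is configured to do.
indexed = set()
for word in "trying out Elastic Search".lower().split():
    indexed.update(ngrams(word))

# Search side: the query term is used as a single token and is not n-grammed.
for term in ("ear", "sea", "xyz"):
    print(term, term in indexed)
```

Both ear and sea are substrings of search, so they are among the indexed terms, while xyz is not. Leaving the n-gram filter out of the search analyzer means the query term is looked up as a whole rather than being broken into many short grams.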

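What exactly are tokens in Elasticsearch?

I googled my question but could not find an answer. I am quite new to Elasticsearch, and I think I have not yet understood the concept of tokens.

I have built a mapping with a custom name_analyzer that uses the filters lowercase, unique and asciifolding, the latter with preserve_original=true.

I have a field search_combo_name whose content looks like this, for example: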

André,André Mustermann,andre.mustermann@gmail.com,Mustermann

When I use Kibana to analyze the string above with my name_analyzer, I get the following result:

{
  "tokens" : [
    {
      "token" : "andre", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 0
    },
    {
      "token" : "andré"
    },
    {
      "token" : "mustermann", "start_offset" : 13, "end_offset" : 23, "position" : 1
    },
    {
      "token" : "andre.mustermann", "start_offset" : 25, "end_offset" : 41, "position" : 2
    },
    {
      "token" : "gmail.com", "start_offset" : 42, "end_offset" : 51, "position" : 3
    }
  ]
}

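This is the result I expected, but what are these tokens used for? When I search with bool must/should or with match, Elasticsearch searches the content of the field rather than the tokens, right?

Answer

Those tokens are what gets indexed, and they are what you can then search on.

All queries run against these tokens (i.e. not directly against the original content), which is why it is important to set the proper field type and analyzer (in the case of text fields) when indexing your data into Elasticsearch.

Failing to do so can lead to poor relevance as well as poor performance, i.e. bad or imprecise query results, or queries that take too long to execute. This is a very broad topic, but if you can describe your use case in more detail, we can help better.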
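
The idea that queries run against tokens rather than the raw field content can be sketched with a toy inverted index in Python. This is only an illustration under simplifying assumptions: analyze is a hand-written stand-in for the name_analyzer from the question (lowercase, ASCII folding that keeps the original, unique), the split pattern is a crude substitute for the real tokenizer, and match is not Elasticsearch's match query.

```python
import re
import unicodedata
from collections import defaultdict


def fold(text):
    """Strip accents, a rough stand-in for the asciifolding filter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Lowercase, fold while preserving the original, and keep tokens unique."""
    tokens = []
    for raw in re.split(r"[\s,@]+", text):
        if not raw:
            continue
        lowered = raw.lower()
        for candidate in (fold(lowered), lowered):
            if candidate not in tokens:
                tokens.append(candidate)
    return tokens


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def match(index, query):
    """Analyze the query the same way, then look its tokens up in the index."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits


docs = {1: "André,André Mustermann,andre.mustermann@gmail.com,Mustermann"}
index = build_index(docs)

print(sorted(index))              # the indexed tokens
print(match(index, "ANDRE"))      # found via the folded, lowercased token
print(match(index, "muster"))     # not found: "muster" is not a token
```

Note the last lookup: muster is clearly part of the stored text, yet it finds nothing, because only whole tokens are in the index. That is the practical consequence of searching tokens rather than content, and the reason the choice of analyzer matters.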

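How are filters applied in Elasticsearch?

In ES, is the filter applied before the query?

Say, for instance, that I am doing a really slow fuzzy search, but only over a very small date range. As an example, see the following (PHP):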

$res = $client->search(array('index' => 'main', 'body' => array(
    'query' => array(
        'bool' => array(
            'should' => array(
                array('wildcard' => array('title' => '*123*')),
            )
        )
    ),
    'filter' => array(
        'and' => array(
            array('range' => array('created' => array('gte' => date('c', time()-3600), 'lte' => date('c', time()+3600))))
        )
    ),
    'sort' => array()
)));

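Will the filter be applied before the slow search is attempted?

Logic would dictate that the filter runs first and the query second, but I wanted to make sure.

Answer 1

If you use the filtered query, the filter is applied before documents are scored.

In general this speeds things up considerably. However, a fuzzy query will still use the input to build a larger query, regardless of the filter.

When you use filter directly on the search object, the query runs first without taking the filter into account, and documents are then filtered out of the matches, while facets remain unfiltered.

So you should almost always use the filtered query, at least when you are not using facets.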
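
For comparison with the PHP request in the question, here is a sketch of the same search restructured so that the range filter sits inside a filtered query instead of at the top level of the body. It only builds the request body as a Python dict and sends nothing. It assumes an older Elasticsearch version in which the filtered query still exists (later versions dropped it in favour of a bool query with a filter clause, so check the documentation for your version), and build_filtered_search is a helper name invented for this example.

```python
import json
from datetime import datetime, timedelta, timezone


def build_filtered_search(title_pattern, now=None, window=timedelta(hours=1)):
    """Build a search body whose date range is applied as a filter before scoring."""
    now = now or datetime.now(timezone.utc)
    return {
        "query": {
            "filtered": {
                # The slow part: only evaluated against documents that pass the filter.
                "query": {
                    "bool": {"should": [{"wildcard": {"title": title_pattern}}]}
                },
                # The cheap part: restrict the candidates to a narrow time window.
                "filter": {
                    "range": {
                        "created": {
                            "gte": (now - window).isoformat(),
                            "lte": (now + window).isoformat(),
                        }
                    }
                },
            }
        }
    }


body = build_filtered_search(
    "*123*", now=datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
)
print(json.dumps(body, indent=2))
```

The resulting dict could be passed as the body of a search request by whichever client is in use. The only structural difference from the question's request is where the range condition lives.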
