How to get prefix-based autosuggest working with a custom tokenizer in Elasticsearch
I am currently using n-grams for an autosuggest feature.
I have the following filter and analyzer:
"nGram_filter": {
  "type": "nGram",
  "min_gram": 3,
  "max_gram": 10,
  "token_chars": [
    "letter",
    "digit",
    "punctuation",
    "symbol"
  ]
}
"nGram_analyzer": {
  "type": "custom",
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    "asciifolding",
    "nGram_filter"
  ]
}
Now, when I index the sample data test_table_for analyzers and search for the strings test, table, or analyzers, I can retrieve the record above. I understand that the tokens are created with the filters I specified, so this works as expected.
But I need to add another feature: I also need to enable prefix matching.
For example, when I search for test_table (10 characters) I get results, since the maximum n-gram length is 10; but when I try test_table_for, it returns zero results, because no such token exists for the record test_table_for analyzers.
How can I add prefix-based matching on top of the existing n-gram analyzer? That is, I should still get results when up to 10 matching characters are typed (which currently works), and I should also get suggestions whenever the search string matches a record from its beginning.
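The behavior described in the question can be sketched in plain Python. This is only an approximation of the whitespace tokenizer plus the lowercase and 3-10 n-gram filters, not Elasticsearch itself; the function name is illustrative:

```python
def ngram_tokens(text, min_gram=3, max_gram=10):
    """Approximate whitespace tokenizer + lowercase + nGram_filter (3-10)."""
    tokens = []
    for word in text.lower().split():
        for n in range(min_gram, max_gram + 1):
            for i in range(len(word) - n + 1):
                tokens.append(word[i:i + n])
    return tokens

tokens = ngram_tokens("test_table_for analyzers")
print("test" in tokens)             # the 4-gram "test" is indexed
print("test_table" in tokens)       # 10 characters: still within max_gram
print("test_table_for" in tokens)   # 14 characters: never indexed
```

This makes the failure mode visible: no substring longer than max_gram (10) is ever produced, so the 14-character query test_table_for cannot match any indexed token.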
Solution
This is not possible with a single analyzer. You will have to create another field on which you generate edge_ngram tokens, and use that field for the prefix search. Below is an index mapping that also includes your current analyzer.
Index mapping
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 30
        },
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "prefixanalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        },
        "ngramanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        }
      }
    },
    "index.max_ngram_diff": 30
  },
  "mappings": {
    "properties": {
      "title_prefix": {
        "type": "text",
        "analyzer": "prefixanalyzer",
        "search_analyzer": "standard"
      },
      "title": {
        "type": "text",
        "analyzer": "ngramanalyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
Now you can use the analyze API to confirm the prefix tokens:
{
  "analyzer": "prefixanalyzer",
  "text": "test_table_for analyzers"
}
And your token test_table_for is indeed present, as shown below:
{
  "tokens": [
    {"token": "t", "start_offset": 0, "end_offset": 14, "type": "<ALPHANUM>", "position": 0},
    {"token": "te"}, {"token": "tes"}, {"token": "test"}, {"token": "test_"},
    {"token": "test_t"}, {"token": "test_ta"}, {"token": "test_tab"},
    {"token": "test_tabl"}, {"token": "test_table"}, {"token": "test_table_"},
    {"token": "test_table_f"}, {"token": "test_table_fo"}, {"token": "test_table_for"},
    {"token": "a", "start_offset": 15, "end_offset": 24, "type": "<ALPHANUM>", "position": 1},
    {"token": "an"}, {"token": "ana"}, {"token": "anal"}, {"token": "analy"},
    {"token": "analyz"}, {"token": "analyze"}, {"token": "analyzer"}, {"token": "analyzers"}
  ]
}
(Repeated fields are trimmed for brevity; every gram of a word shares that word's offsets, type, and position.)
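The edge_ngram output above can be approximated in plain Python as well (again only an illustrative sketch, not the Elasticsearch implementation; for this input a whitespace split stands in for the standard tokenizer, which keeps underscore-joined words together):

```python
def edge_ngram_tokens(text, min_gram=1, max_gram=30):
    """Approximate standard tokenizer + lowercase + edge_ngram (1-30)."""
    tokens = []
    for word in text.lower().split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])  # only leading prefixes, not all substrings
    return tokens

tokens = edge_ngram_tokens("test_table_for analyzers")
print("test_table_for" in tokens)  # the full 14-character prefix is now indexed
```

Because edge_ngram only emits leading prefixes, a max_gram of 30 stays cheap while covering the whole word, which is exactly what prefix search needs.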
Now you can use a multi_match query, which will give you the expected search results, as shown below:
Search query
{
  "query": {
    "multi_match": {
      "query": "test_table_for",
      "fields": [
        "title",
        "title_prefix"
      ]
    }
  }
}
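To see why this query now hits, here is a rough, self-contained sketch (an approximation, not Elasticsearch itself): the standard search analyzer emits the single token test_table_for, which is absent from the n-gram field's index but present in the edge_ngram field's, so the multi_match finds the document via title_prefix:

```python
def ngram_tokens(word, lo=3, hi=10):
    # all substrings of length lo..hi (nGram_filter approximation)
    return {word[i:i + n] for n in range(lo, hi + 1) for i in range(len(word) - n + 1)}

def edge_tokens(word, lo=1, hi=30):
    # all leading prefixes of length lo..hi (edge_ngram approximation)
    return {word[:n] for n in range(lo, min(hi, len(word)) + 1)}

doc = "test_table_for"
query = "test_table_for"            # one token after the standard search analyzer
print(query in ngram_tokens(doc))   # False: 14 chars exceeds max_gram of 10
print(query in edge_tokens(doc))    # True: matched through title_prefix
```

The multi_match simply takes the best match across both fields, so shorter queries (up to 10 characters) can still match via the title field's n-grams.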
Search results
"hits": [
  {
    "_index": "so_63981157",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.45920232,
    "_source": {
      "title_prefix": "test_table_for analyzers",
      "title": "test_table_for analyzers"
    }
  }
]