Elasticsearch: using a minimum_should_match percentage relative to the total number of terms in a field

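I am trying to avoid false positives when querying for a phrase that is contained inside other, longer phrases.

My hope was that the minimum_should_match parameter would let me set the minimum number of matching terms relative to the total number of terms in the field.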

{
   "match": {
       "notices.title": {
           "query": "Juan Pedro",
           "minimum_should_match": "-1"
       }
   }
}

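The expected result is a match for A.title = "Dr. Juan Pedro" but not for B.title = "Dr. Juan Pedro Pan". As you can see, counted against the total number of terms in the field, the query covers all but one term of A (-1), while for B it covers all but two (-2).

I have read the documentation and I know the parameter is the minimum computed against the total number of clauses in the query, but I was hoping there is a way to make it refer to the field instead.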
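To make the problem concrete, here is a rough Python sketch of how the integer and percentage forms of minimum_should_match resolve against the number of query clauses. It is a simplified model, not Elasticsearch's actual code: `required_matches`, `analyze`, and `matches` are hypothetical helper names, combination forms such as `3<90%` are left out, and the tokenizer is only an approximation of a real analyzer.

```python
import re


def required_matches(total_clauses: int, spec: str) -> int:
    """Simplified model: resolve an integer or percentage spec
    against the number of optional clauses in the QUERY."""
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            # Positive percentage: that share of the clauses, rounded down.
            needed = total_clauses * pct // 100
        else:
            # Negative percentage: that share (rounded down) may be missing.
            needed = total_clauses - (total_clauses * abs(pct) // 100)
    else:
        n = int(spec)
        # Negative integer: that many clauses may be missing.
        needed = n if n >= 0 else total_clauses + n
    return max(0, min(needed, total_clauses))


def analyze(text: str) -> list[str]:
    # Assumption: lowercase word tokens, a stand-in for a real analyzer.
    return re.findall(r"\w+", text.lower())


def matches(title: str, query: str, spec: str) -> bool:
    query_terms = analyze(query)
    title_terms = set(analyze(title))
    hits = sum(1 for term in query_terms if term in title_terms)
    return hits >= required_matches(len(query_terms), spec)


# "Dr. Juan abc" is a sample title taken from the answer further down.
for title in ("Dr. Juan Pedro", "Dr. Juan Pedro Pan", "Dr. Juan abc"):
    print(title, "->", matches(title, "Juan Pedro", "-1"))
```

Under these assumptions all three titles match, because "-1" against a two-term query only requires one term. The length of the field never enters the calculation, which is why B still shows up.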

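Any ideas? Thanks!

Update

Following the solution described by @PrernaGupta, and to avoid having to build a variable number of match clauses in the query, I ended up using match_phrase. I then take the number of tokens in the query string plus 1 and compare it against the title.length field I created. This seems to be working. Let me know if you think it could produce any other errors that I am not seeing.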

"bool": {
    "must": [
         {
           "match_phrase": {
                 "notices.title": {
                     "query": "Juan Pedro"
                  }
            }
          },{
            "term": {
                "notices.title.length": 3
             }
          }
     ]
}
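The `3` in the query above is the number of tokens in "Juan Pedro" plus one. A minimal sketch of deriving it on the client side could look like the following; `build_title_query` and its `extra_tokens` parameter are hypothetical names, and the whitespace split is an assumption chosen to mirror a token_count sub-field that uses the whitespace analyzer.

```python
import json


def build_title_query(phrase: str, extra_tokens: int = 1) -> dict:
    """Build the bool query body: the phrase must match and the
    title must contain exactly len(phrase tokens) + extra_tokens tokens."""
    # Assumption: count tokens the same way the whitespace analyzer would.
    token_count = len(phrase.split())
    return {
        "bool": {
            "must": [
                {"match_phrase": {"notices.title": {"query": phrase}}},
                {"term": {"notices.title.length": token_count + extra_tokens}},
            ]
        }
    }


query = build_title_query("Juan Pedro")
print(json.dumps(query, indent=2))
```

One thing to watch for: an exact `term` on the length also excludes a title that is just "Juan Pedro" (two tokens). If such titles should match as well, a `range` query with an upper bound on notices.title.length may be closer to the intent.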

Thanks again!

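Solution

You could probably use must and must_not.

With "minimum_should_match": "-1" and a two-term query, one term is allowed to be missing, so a document matches when either "Juan" or "Pedro" matches.

You can use the token_count field data type to enforce a condition on the total number of terms in the field, which minimum_should_match cannot express.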

Mapping:

{
    "mappings": {
        "properties": {
            "notices": {
                "properties": {
                    "title": {
                        "type": "text",
                        "fields": {
                            "length": {
                                "type": "token_count",
                                "analyzer": "whitespace"
                            }
                        }
                    }
                }
            }
        }
    }
}

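Index data:

{"notices.title": "Dr. Juan Pedro"}
{"notices.title": "Dr. Juan Pedro Pan"}
{"notices.title": "Dr. Juan abc"}

In the search query, you can edit the notices.title.length value so that it holds the required total number of terms, including "Juan" and "Pedro".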
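As a sanity check on the idea, here is a small local simulation in Python of what combining a phrase match with a token-count filter does to the three sample titles. This is not Elasticsearch output: `whitespace_token_count`, `analyze`, `contains_phrase`, and `search` are hypothetical helpers, and the tokenization only approximates what the real analyzers would produce.

```python
import re

titles = ["Dr. Juan Pedro", "Dr. Juan Pedro Pan", "Dr. Juan abc"]


def whitespace_token_count(text: str) -> int:
    # Mirrors a token_count sub-field with the whitespace analyzer.
    return len(text.split())


def analyze(text: str) -> list[str]:
    # Assumption: lowercase word tokens as a stand-in for the text analyzer.
    return re.findall(r"\w+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    # True when the phrase tokens appear consecutively and in order.
    tokens, wanted = analyze(text), analyze(phrase)
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


def search(phrase: str, length: int) -> list[str]:
    # Both conditions must hold, like the two clauses under bool/must.
    return [
        title for title in titles
        if contains_phrase(title, phrase)
        and whitespace_token_count(title) == length
    ]


print(search("Juan Pedro", 3))  # ['Dr. Juan Pedro']
```

With a length of 3 only "Dr. Juan Pedro" survives: "Dr. Juan Pedro Pan" contains the phrase but has four tokens, and "Dr. Juan abc" has three tokens but does not contain the phrase.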