ElasticSearch搜索相关性及打分的相关原理

news2024/10/7 4:25:34

文章目录

  • 一、相关性和打分简介
  • 二、TF-IDF得分计算公式
  • 三、BM25(Best Matching 25)
  • 四、使用explain查看TF-IDF
  • 五、通过Boosting控制相关度

一、相关性和打分简介

手机

举个例子来说明:

  1. 假设有一个电商网站,用户在搜索框中输入了关键词"手机",然后触发了搜索操作。Elasticsearch会根据用户的查询,在索引中找到所有包含"手机"的文档,并按照相关性对这些文档进行打分。

  2. 相关性评分的目的是确定搜索结果的质量和排序。相关性评分越高,表示搜索结果与用户查询的匹配程度越好。

  3. 例如,对于一个包含"手机"关键词的文档,如果它在标题、描述和其他字段中多次出现"手机",那么它的相关性评分可能会很高。而对于一个只在描述中出现一次"手机"的文档,它的相关性评分可能会较低。

  4. 在Elasticsearch 5.0版本之前,Elasticsearch使用的是TF-IDF(Term Frequency-Inverse Document Frequency)算法来进行相关性判断和打分。

  5. TF-IDF算法是一种经典的信息检索算法,它考虑了词频(Term Frequency)和逆文档频率(Inverse Document Frequency)
    词频表示词在文档中的出现次数
    逆文档频率表示词在整个文档集合中的普遍程度。

  6. 根据TF-IDF算法,搜索词在文档中出现的次数越多,这个词对文档的相关性贡献越大。但是,如果这个词在整个文档集合中出现的次数越多,它对文档的相关性贡献越小,因为它在整个集合中普遍存在,不足以区分文档的重要性。

  7. 从Elasticsearch 5.0版本开始,默认使用了BM25(Best Matching 25)算法来进行相关性判断和打分。BM25算法在计算相关性得分时,考虑了文档的长度和搜索词的位置等因素。

  8. BM25算法相对于TF-IDF算法更为先进和准确,它在实际应用中表现更好。所以,对于Elasticsearch 5.0及以后的版本,推荐使用BM25算法来进行相关性判断和打分。

二、TF-IDF得分计算公式

TF-IDF(Term Frequency-Inverse Document Frequency)得分计算公式是一种用于衡量词语在文档集合中重要性的指标。其计算公式如下:

TF-IDF = TF * IDF

其中,TF(Term Frequency)表示词语在文档中的频率,可以通过以下公式计算:

TF = (词语在文档中出现的次数) / (文档中总词语数)

IDF(Inverse Document Frequency)表示逆文档频率,可以通过以下公式计算:

IDF = log((文档集合中文档总数) / (包含词语的文档数))

TF-IDF得分越高,表示词语在文档集合中越重要。它的核心思想是:当一个词语在某个文档中频繁出现(高TF值),同时在其他文档中较少出现(高IDF值)时,该词语对于该文档的重要性较高。

TF-IDF常用于信息检索、文本分类、关键词提取等自然语言处理任务中。

三、BM25(Best Matching 25)

BM25(Best Matching 25)算法可以被视为对TF-IDF算法的改进和扩展。BM25算法是一种用于信息检索的评分算法,相较于TF-IDF算法,它在一些方面进行了改进,以提高检索结果的质量。
bm2.5

主要的改进包括以下几个方面:

  1. 考虑文档长度:TF-IDF算法仅仅考虑了词频,对文档长度没有考虑。而BM25算法引入了文档长度因子,可以更好地处理文档长度不同的情况。

  2. 调整词频饱和度:TF-IDF算法中,词频的值会随着出现次数的增加而线性增长,容易导致过度强调高频词语。而BM25算法使用了对数函数来调整词频的饱和度,使得高频词语的权重增长趋于饱和。

  3. 引入文档频率饱和度:BM25算法引入了文档频率的饱和度因子,用于调整文档频率的影响。这可以避免过度强调出现在大多数文档中的常见词语。

  4. 综上所述,BM25算法在TF-IDF算法的基础上进行了改进,更加准确地评估了词语在文档集合中的重要性,提高了信息检索的效果。

四、使用explain查看TF-IDF

通过使用Elasticsearch的Explain API,你可以查看特定查询的TF-IDF得分。以下是一个示例,展示如何使用Explain API来计算TF-IDF得分:

  1. 首先,在Kibana的Dev Tools中执行以下命令,创建一个新的索引并索引一些文档数据:
PUT my_index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}

POST my_index/_doc
{
  "text": "This is the first document."
}

POST my_index/_doc
{
  "text": "This document is the second document."
}

POST my_index/_doc
{
  "text": "And this is the third one."
}

POST my_index/_doc
{
  "text": "Is this the first document?"
}

执行如下:
在这里插入图片描述
2. 然后,使用Explain API来查看TF-IDF得分。在Kibana的Dev Tools中执行以下命令:

GET my_index/_search
{
  "explain": true,
  "query": {
    "match": {
      "text": "this is the first document"
    }
  }
}

以上命令将返回查询结果,并在每个匹配的文档中提供一个_explanation字段,其中包含有关得分计算的详细信息,包括TF-IDF得分。
explain
请注意,Explain API返回的_explanation字段中包含了与得分相关的详细信息,如词频、文档频率、字段长度等。你可以通过解析_explanation字段来提取和分析TF-IDF得分。
3. explain内容如下:

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.4186639,
    "hits" : [
      {
        "_shard" : "[my_index][0]",
        "_node" : "NkTFrxzGQuOf4zelToSeKQ",
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "08FDOokBF8lAViln2DRj",
        "_score" : 1.4186639,
        "_source" : {
          "text" : "This is the first document."
        },
        "_explanation" : {
          "value" : 1.4186639,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.10943023,
              "description" : "weight(text:this in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10943023,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10943023,
              "description" : "weight(text:is in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10943023,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10943023,
              "description" : "weight(text:the in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10943023,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.7199211,
              "description" : "weight(text:first in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.7199211,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.6931472,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 2,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.3704521,
              "description" : "weight(text:document in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.3704521,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.35667494,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 3,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[my_index][0]",
        "_node" : "NkTFrxzGQuOf4zelToSeKQ",
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1sFEOokBF8lAVilnADS4",
        "_score" : 1.4186639,
        "_source" : {
          "text" : "Is this the first document?"
        },
        "_explanation" : {
          "value" : 1.4186639,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.10943023,
              "description" : "weight(text:this in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10943023,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10943023,
              "description" : "weight(text:is in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10943023,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10943023,
              "description" : "weight(text:the in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10943023,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.7199211,
              "description" : "weight(text:first in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.7199211,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.6931472,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 2,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.3704521,
              "description" : "weight(text:document in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.3704521,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.35667494,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 3,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.472103,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[my_index][0]",
        "_node" : "NkTFrxzGQuOf4zelToSeKQ",
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1MFDOokBF8lAViln6TSq",
        "_score" : 0.78294927,
        "_source" : {
          "text" : "This document is the second document."
        },
        "_explanation" : {
          "value" : 0.78294927,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.10158265,
              "description" : "weight(text:this in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10158265,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.43824703,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10158265,
              "description" : "weight(text:is in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10158265,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.43824703,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10158265,
              "description" : "weight(text:the in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10158265,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.43824703,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.47820133,
              "description" : "weight(text:document in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.47820133,
                  "description" : "score(freq=2.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.35667494,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 3,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.6094183,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 2.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[my_index][0]",
        "_node" : "NkTFrxzGQuOf4zelToSeKQ",
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1cFDOokBF8lAViln9jS-",
        "_score" : 0.30474794,
        "_source" : {
          "text" : "And this is the third one."
        },
        "_explanation" : {
          "value" : 0.30474794,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.10158265,
              "description" : "weight(text:this in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10158265,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.43824703,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10158265,
              "description" : "weight(text:is in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10158265,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.43824703,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.10158265,
              "description" : "weight(text:the in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.10158265,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.105360515,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 4,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 4,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.43824703,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

五、通过Boosting控制相关度

在搜索引擎和信息检索领域中,Boosting可以用来提升或降低文档的相关度得分,以便更好地满足用户的查询需求。

Boosting可以在不同的层次上进行,包括查询级别的Boosting和文档级别的Boosting。

1、查询级别的Boosting:在查询级别上,可以使用Boosting来提高或降低特定查询的相关度得分。这种Boosting通常通过设置查询子句的权重或使用特定的查询类型来实现。例如,可以使用更高的权重来突出某些关键词,或使用更复杂的查询类型(如布尔查询、模糊查询)来调整相关度得分。

2、文档级别的Boosting:在文档级别上,可以使用Boosting来提高或降低特定文档的相关度得分。这种Boosting通常通过为文档定义一个额外的因子或属性来实现。例如,可以为某些文档设置更高的权重,以便它们在搜索结果中排名更靠前,或者可以为某些文档设置更低的权重,以便它们在搜索结果中排名更靠后。

通过使用Boosting技术,可以根据具体需求和优先级来调整搜索结果的相关度得分,从而提供更好的用户体验和更精确的搜索结果。

以下是一个使用boost参数的示例:

1、首先,使用Kibana的Dev Tools或者其他方式创建一个索引,并添加一些文档。例如,创建一个名为my_index的索引,并添加一些包含title和content字段的文档:

PUT my_index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" }
    }
  }
}

POST my_index/_doc
{
  "title": "Elasticsearch is powerful",
  "content": "Elasticsearch is a highly scalable and distributed search engine"
}

POST my_index/_doc
{
  "title": "Kibana is a visualization tool",
  "content": "Kibana is used to explore and visualize data stored in Elasticsearch"
}

2、然后,执行一个带有boost参数的查询来调整字段的相关度得分。例如,执行一个查询,将title字段的相关度得分提高两倍:

GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Elasticsearch",
        "boost": 2
      }
    }
  }
}

在上述查询中,我们使用了match查询来搜索包含关键词"Elasticsearch"的文档。
通过在match查询中设置boost参数为2,我们将title字段的相关度得分提高了两倍。
这意味着包含关键词"Elasticsearch"的文档在结果中会更加相关。

boost

  • 当boost大于1时,提高字段的相关度得分,使其在结果排序中具有更高的权重。
  • 当0 < boost < 1时,降低字段的相关度得分,使其在结果排序中具有较低的权重。
  • 当boost小于0时,会为字段贡献负分,可能导致字段在结果排序中的得分变为负数。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/768498.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

前端 mock 数据的几种方式

目录 接口demo Better-mock just mock koa webpack Charles 总结 具体需求开发前&#xff0c;后端往往只提供接口文档&#xff0c;对于前端&#xff0c;最简单的方式就是把想要的数据写死在代码里进行开发&#xff0c;但这样的坏处就是和后端联调前还需要再把写死的数据…

计算机视觉和滤帧技术

01 什么是计算机视觉 进入主题之前&#xff0c;先聊聊计算机视觉。 计算机视觉是指利用摄像头和电脑识别、跟踪和测量目标&#xff0c;并进行图像处理&#xff0c;使其适合人眼观察或仪器检测。作为一个科学学科&#xff0c;计算机视觉模拟生物视觉&#xff0c;旨在实现图像和视…

DAY48:动态规划(十一)爬楼梯(进阶版)+零钱兑换(理解DP数组“装满“含义)

文章目录 70.爬楼梯&#xff08;改版题目&#xff09;思路遍历顺序 完整版总结面试情况 322.零钱兑换&#xff08;DP数组含义的进一步理解&#xff09;思路DP数组含义递推公式遍历顺序初始化最开始的写法debug逻辑错误&#xff1a;背包不一定装满 修改完整版递归逻辑分析背包是…

极值理论 EVT、POT超阈值、GARCH 模型分析股票指数VaR、条件CVaR:多元化投资组合预测风险测度分析...

全文链接&#xff1a;http://tecdat.cn/?p24182 本文用 R 编程语言极值理论 (EVT) 以确定 10 只股票指数的风险价值&#xff08;和条件 VaR&#xff09;&#xff08;点击文末“阅读原文”获取完整代码数据&#xff09;。 使用 Anderson-Darling 检验对 10 只股票的组合数据进行…

EFCore—context在其他程序集时如何进行数据迁移

场景 解决方案&#xff1a; 代码示例&#xff1a; 场景 一般来说&#xff0c;如果efcore进行数据迁移的步骤如下 安装nuget包创建实体类创建config创建dbcontext 然后执行如下命令就可以成功迁移了 Add-Migration Init Update-Database 一执行&#xff0c;报错 Unable t…

(四)「消息队列」之 RabbitMQ 路由(使用 .NET 客户端)

0、引言 先决条件 本教程假设 RabbitMQ 已安装并且正在 本地主机 的标准端口&#xff08;5672&#xff09;上运行。如果您使用了不同的主机、端口或凭证&#xff0c;则要求调整连接设置。 获取帮助 如果您在阅读本教程时遇到问题&#xff0c;可以通过邮件列表或者 RabbitMQ 社区…

k8s之Pod容器的探针

目录 一、Pod 容器的探针二、探针的探测方式2.1 存活探针的使用2.1.1 exec方式2.1.2 httpGet方式2.1.3 tcpSocket方式 2.2 就绪探针的使用2.3 启动探针的使用2.4 7.Pod 容器的启动和退出动作 三、总结 一、Pod 容器的探针 探针是由 kubelet 对容器执行的定期诊断&#xff08;p…

M1安装ParallelsDesktop-18 重启镜像后无网络

M1安装ParallelsDesktop-18 重启镜像后无网络 重新关联镜像 sudo -b /Applications/Parallels\ Desktop.app/Contents/MacOS/prl_client_app

手机定屏死机问题操作指南

和你一起终身学习&#xff0c;这里是程序员Android 经典好文推荐&#xff0c;通过阅读本文&#xff0c;您将收获以下知识点: 一、定屏死机问题抓取 Log 要求二、 复现定屏死机问题后做什么三、检查adb是否可连的方法四、连接adb 抓取以下Log五、如果adb不可连&#xff0c;执行下…

一、数制及其转换

目录 常用进制介绍 十进制 二进制 八进制 十六进制 进制转换 二进制转十进制 十进制转二进制 二进制转十六进制 十六进制转二进制 二进制转八进制 八进制转二进制 十六进制转八进制 八进制转十六进制 常用进制介绍 十进制 介绍&#xff1a;十进制是日常生活中最…

基于linux下的高并发服务器开发(第二章)- 2.9 waitpid 函数

#include <sys/types.h> #include <sys/wait.h>pid_t waitpid(pid_t pid, int *wstatus, int options); 功能&#xff1a;回收指定进程号的子进程&#xff0c;可以设置是否阻塞。 参数&#xff1a; - pid: pid > 0…

Versal ACAP在线升级之Boot Image格式

1、简介 Xilinx FPGA、SOC器件和自适应计算加速平台&#xff08;ACAPs&#xff09;通常由多个硬件和软件二进制文件组成&#xff0c;用于启动这些设备后按照预期设计进行工作。这些二进制文件可以包括FPGA比特流、固件镜像、bootloader引导程序、操作系统和用户选择的应…

打造有效的产品帮助中心的关键步骤

在当今竞争激烈的市场中&#xff0c;为产品提供优质的客户支持是至关重要的。而一个有效的产品帮助中心可以成为解决客户问题和提供相关信息的中心。本文将介绍打造一个有效的产品帮助中心的关键步骤&#xff0c;帮助你提升客户体验并增强产品的用户满意度。 1. 选择合适的管理…

记一次ruoyi中使用Quartz实现定时任务

一、首先了解一下Quartz Quartz是OpenSymphony开源组织在Job scheduling领域又一个开源项目&#xff0c;它可以与J2EE与J2SE应用程序相结合也可以单独使用。Quartz可以用来创建简单或为运行十个&#xff0c;百个&#xff0c;甚至是好几万个Jobs这样复杂的程序。Jobs可以做成标…

13matlab数据分析多项式的求值(matlab程序)

1.简述 统计分析常用函数 求最大值 max 和 sum 积 prod 平均值&#xff1a;mean 累加和&#xff1a;cumsum 标准差&#xff1a;std 方差&#xff1a;var 相关系数&#xff1a;corrcoef 排序&#xff1a;sort 四则运算 1.多项式的加减运算就是所对应的系数向量的加减运算&#…

ToT: 利用大语言模型解决需要深思熟虑的问题(下)

ToT 摘要介绍利用大语言模型进行有意识的问题解决1. 思维分解2. 思维产生 G(p,s,k)3. 状态评估V(p,S)4. 搜索算法 实验24游戏1). 任务设置2). 基准3). ToT设置4).结果5). 错误分析 创意写作1). 任务设置2).基准3).ToT设置4).结果 交叉词 相关工作规划和决策自我反省程序引导的L…

常见问题-wp

指定顺序展示富集分析的term 调整热图的label角度 h1ggheatmap(dat[cg1,],cluster_rows T, #是否对行聚类cluster_cols T, #是否对列聚类tree_height_rows 0.28, #行聚类树高度tree_height_cols 0.1, #列聚类树高度annotation_cols group_list, #为列添加分组annotation_c…

软件检测报告对软件产品起的作用和编写原则分析

软件检测报告是一项对软件进行全面测试和评估的结果总结&#xff0c;通过对软件的功能、性能、安全性等方面的测试&#xff0c;以及通过分析软件的可靠性和稳定性&#xff0c;来评估软件的质量和合规性。 一、软件检测报告对软件产品起到的作用 1、提供一个全面的评估和分析软…

认识主被动无人机遥感数据、预处理无人机遥感数据、定量估算农林植被关键性状、期刊论文插图精细制作与Appdesigner应用开发

目录 第一章、认识主被动无人机遥感数据 第二章、预处理无人机遥感数据 第三章、定量估算农林植被关键性状 第四章、期刊论文插图精细制作与Appdesigner应用开发 更多推荐 遥感技术作为一种空间大数据手段&#xff0c;能够从多时、多维、多地等角度&#xff0c;获取大量的…

NAT 地址转换路由器配置命令(华为路由器)

#AR1路由器配置 # acl 2000 rule permit source any # interface GigabitEthernet0/0/1 nat outbound 2000 ip address 1.1.1.1 24 # interface GigabitEthernet0/0/0 ip address 172.16.1.1 255.255.255.0 # ip route-static 0.0.0.0 0.0.0.0 1.1.1.2 ip route-static …