【论文阅读 CIKM2014】Extending Faceted Search to the General Web

news2025/1/13 14:12:45

文章目录

    • Foreword
    • Motivation
    • Method
      • Query facet generation:
      • Facet feedback
    • Evaluation

Foreword

  • This paper is from CIKM 2014, so we only consider the insights
  • I have read this paper last month and today i share this blog
  • There are many papers that have not been shared. More papers can be found in: ShiyuNee/Awesome-Conversation-Clarifying-Questions-for-Information-Retrieval: Papers about Conversation and Clarifying Questions (github.com)

Motivation

Extend faceted search into open-domain web setting, which we call Faceted We Search

Method

two major components in a FWS(Faceted Web Search) system:

  • query facet generation
  • facet feedback

Query facet generation:

Facet generation is typically performed in advance for an entire corpus, an approach which is challenging when extended to the general web

  • So, we use query facet generation to generate facets for a query

  • Extracting Candidates: as before(Extracting Query Facets from Search Results)

  • Refining Candidates: re-cluster the query facets or their facet term`s into higher quality query facets.

    • topic modeling(unsupervised):
      • assumption: candidate facets are generated by a mixture of hidden topics(query facets). After training, the topics are returned as query facets, by using top terms in each topic.
      • apply both pLSA and LDA
      • only use term co-occurence information
    • QDMiner / QDM(unsupervised):
      • applies a variation of the Quality Threshold clustering algorithm to cluster the candidate facets with bias towards important ones. Then it ranks/selects the facet clusters and the terms in those clusters based on TF/IDF-like scores.
      • consider more information than just term co-occurence, but it’s not easy to add new features
    • QF-I and QF-J (supervised): based on a graphical model
      • learns how likely it is that a term in the candidate facets should be selected in the query facets, and how likely two terms are to be grouped together into a same query facet, using a rich set of features.
      • based on the likelihood scores, QF-I selects the terms and clusters the selected terms into query facets, while QF-J
        repeats the procedure, trying to performance joint inference.

Facet feedback

Re-rank the search results.

  • Boolean Filtering Model:

    • condition can be AND, OR , A + O, etc.
      • S ( D , Q ) S(D,Q) S(D,Q) is the score returned by the original retrieval model

    在这里插入图片描述

  • Soft Ranking Model(better):

    • expand the original query with feedback terms, using a linear combination as follows:

      在这里插入图片描述

      • S ( D , Q ) S(D,Q) S(D,Q) is the score from the original retrieval model

      • S E ( D , F u ) S_E(D,F^u) SE(D,Fu)​ is the expansion part which captures the relevance between the document D and feedback facet F u F_u Fu, using expansion model E.

        • use original retrieval model to get document scores when the feedback terms are used as query.

          在这里插入图片描述

          or

          在这里插入图片描述

    • the original retrieval model:

      • incorporates word unigrams, adjacent word bigrams, and adjacent word proximity.

      在这里插入图片描述

Evaluation

Intrinsic evaluation does not necessarily reflect the utility of the generated facets in assisting search

  • some annotator-selected facets may be of little value for the search task, and some good facet terms may be missed by annotators.

We propose an extrinsic evaluation method directly measures the utility based on a FWS task

Intrinsic Evaluation:

“gold standard” query facets are constructed by human annotators and used as the ground truth to be compared with facets generated by different systems.

  • Conventional clustering metrics: Purity and Normalized Mutual InformationNMI
  • Newly designed metrics for facet generation: w P R E α , β wPRE_{\alpha,\beta} wPREα,β, some variations of n D C G nDCG nDCG

The facet annotation is usually done by first pooling facets generated by the different systems. Then annotators are asked to group or re-group terms in the pool into preferred query facets, and to give ratings for each of them regarding how useful or important the facet is.

Extrinsic Evaluation:

Propose an extrinsic evaluation method which evaluated a system based on an interactive search task that incorporates FWS

  • simulate the user feedback process based on an interaction model, using oracle feedback terms and facet terms collected from annotators.
  • Both the oracle feedback and annotator feedback incrementally select all feedback terms that a user may select, which will
    then be used in simulation based on the user model to determine which subset of the oracle or annotator feedback terms are selected by a user and how much time is spent giving that feedback.
  • Finally, the systems are evaluated by the re-ranking performance together with the estimated time cost.

Oracle and Annotator Feedback:

  • Oracle feedback: presents an ideal case of facet feedback, in which only effective terms
    • is cheap to obtain for any facet system
    • but may be quite different from what actual users may select in a real interaction.
      • So, also collect feedback terms from annotators
  • Annotator feedback

User Model:

Describes how a user selects feedback terms from facets, based on which we can estimate the time cost for the user.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/162790.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

Docker网络原理详解

文章目录理解Docker0Docker 是如何处理容器网络访问的&#xff1f;Docker0网络模型图容器互联--Link自定义网络网络连通理解Docker0 查看本机IP ip addr1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000link/loopback 00…

application.properties的作用

springboot这个配置文件可以配置哪些东西 官方配置过多了解原理 这个properties文件其实是可以删掉的&#xff0c;官方是不推荐使用这个文件的&#xff0c;可以将其换成安排application.yaml。名字不能变&#xff0c;因为SpringBoot使用的是一个全局的配置文件 application.…

linux系统中使用QT实现CAN通信的方法

大家好&#xff0c;今天主要和大家分享一下&#xff0c;如何使用QT中的CAN Bus的具体实现方法。 目录 第一&#xff1a;CAN Bus的基本简介 第二&#xff1a;CAN通信应用实例 第三&#xff1a;程序的运行效果 第一&#xff1a;CAN Bus的基本简介 从QT5.8开始&#xff0c;提供…

C语言-柔性数组与几道动态内存相关的经典笔试题(12.2)

目录 思维导图&#xff1a; 1.柔性数组 1.1柔性数组的特点 1.2柔性数组的使用 1.3柔性数组的优势 2.几道经典笔试题 2.1题目1 2.2题目2 2.3题目3 2.4题目4 写在最后&#xff1a; 思维导图&#xff1a; 1.柔性数组 1.1柔性数组的特点 例&#xff1a; #include <…

javaEE 初阶 — java对于的操作文件

文章目录1. File 类概述2. 代码示例2.1 示例1&#xff1a;以绝对路径为例&#xff0c;演示获取文件路径2.2 示例2&#xff1a;以相对路径为例&#xff0c;演示获取文件路径2.3 示例3&#xff1a;测试文件是否存在、测试是不是文件、测试是不是目录2.4 示例4&#xff1a;创建文件…

27.函数指针变量的定义, 调用函数的方法,函数指针数组

函数指针变量的定义 返回值类型&#xff08;*函数指针变量名&#xff09;&#xff08;形参列表&#xff09;; int( *p )( int , int );//定义了一个函数指针变量p&#xff0c;p指向的函数必须有一个整型的返回值&#xff0c;有两个整型参数。 int max(int x, int y) { } int m…

AMR-IE:一种利用抽象语义表示(AMR)辅助图编码解码的联合信息抽取模型

Abstract Meaning Representation Guided Graph Encoding and Decoding for Joint Information Extraction 论文&#xff1a;2210.05958.pdf (arxiv.org) 代码&#xff1a;zhangzx-uiuc/AMR-IE: The code repository for AMR guided joint information extraction model (NAAC…

【学习笔记】【Pytorch】七、卷积层

【学习笔记】【Pytorch】七、卷积层学习地址主要内容一、卷积操作示例二、Tensor&#xff08;张量&#xff09;是什么&#xff1f;三、functional.conv2d函数的使用1.使用说明2.代码实现四、torch.Tensor与torch.tensor区别五、nn.Conv2d类的使用1.使用说明2.代码实现六、卷积公…

C/C++ noexcept NRVO

为什么需要noexcept为了说明为什么需要noexcept&#xff0c;我们还是从一个例子出发&#xff0c;我们定义MyClass类&#xff0c;并且我们先不对MyClass类的移动构造函数使用noexceptclass MyClass { public:MyClass(){}MyClass(const MyClass& lValue){std::cout << …

使用语雀绘制 Java 中六大 UML 类图

目录 下载语雀 泛化关系&#xff08;Generalization&#xff09; 实现关系&#xff08;Realization&#xff09; 关联关系&#xff08;Association&#xff09; 依赖关系&#xff08;Dependency&#xff09; 聚合关系&#xff08;Aggregation&#xff09; 组合关系&…

【Python学习】列表和元组

前言 前四天每天更新了小白看的基础教程 今天开始就更新一下&#xff0c;深入一点的知识点吧 还是老话&#xff1a;刚接触python的宝子可以点击文章末尾名片进行交流学习的哦 什么是列表和元组 列表是动态的&#xff0c;长度大小不固定&#xff0c;可以随意地增加、删减或…

【软件测试】软件测试基础2

1. 软件测试的生命周期 软件测试的生命周期&#xff1a; 需求分析→测试计划→ 测试设计、测试开发→ 测试执行→ 测试评估 ● 需求分析&#xff1a;站在用户的角度&#xff1a;查看需求逻辑是否正确&#xff0c;是否符合用户的需求和行为习惯&#xff1b;站在开发人员的角度&…

Nexus使用

环境 apache-maven-3.5.4nexus-3.38.1-01 Android开发中常用的maven代理地址 阿里云&#xff1a;http://maven.aliyun.com/nexus/content/groups/public/google&#xff1a;https://dl.google.com/dl/android/maven2/jcenter&#xff1a;https://jcenter.bintray.com/mavenC…

Leetcode:236. 二叉树的最近公共祖先(C++)

目录 问题描述&#xff1a; 实现代码与解析&#xff1a; 原理思路: 问题描述&#xff1a; 给定一个二叉树, 找到该树中两个指定节点的最近公共祖先。 百度百科中最近公共祖先的定义为&#xff1a;“对于有根树 T 的两个节点 p、q&#xff0c;最近公共祖先表示为一个节点 x&…

Xmanager7远程登录ubuntu20.04

Xmanager7远程登录ubuntu20.04 本文不介绍Xmanager7的下载和安装方法&#xff0c;详细内容可以参考【实用软件】Xmanager 7.0安装教程 - 哔哩哔哩 (bilibili.com)。关于Xmanager7远程登录的教材参考于 (149条消息) Xmanager远程桌面教程_周先森爱吃素的博客-CSDN博客_xmanage…

代码随想录第60天|84.柱状图中最大的矩形

84.柱状图中的最大的图形 总体思路&#xff1a;找到左右两个方向第一个小于该柱子高度的下标&#xff0c;用右下标-左下标-1得到该柱子高度对应的宽度w,再用宽度w*高度h得到面积&#xff0c;返回面积最大值 双指针法&#xff08;超时&#xff09; for循环判断左右第一个小于…

【NI Multisim 14.0虚拟仪器设计——虚拟仪器的引入】

目录 序言 前言 &#x1f349;知识点 一、虚拟仪器的引入 &#x1f34a;1.工具栏 &#x1f34a; 2.基本操作 ①仪器的选用与连接 ②仪器参数的设置 序言 NI Multisim最突出的特点之一就是用户界面友好。它可以使电路设计者方便、快捷地使用虚拟元器件和仪器、仪表进行电…

Linux云服务器下的gitee提交代码方法

目录 创建一个gitee仓库 gitee提交代码三板斧 1. git add 提交的文件 2. git commit -m "提交日志" 3. git push 可能存在的问题 .gitignore介绍 如何删除文件 创建一个gitee仓库 gitee提交代码三板斧 1. git add 提交的文件 作用&#xff1a;添加我…

Electron + React 应用打包全流程

&#xff08;第一次用 Typora 写博&#xff0c;希望效果不错~&#xff09; 这几天有个创意编程比赛&#xff0c;要写一个电脑端应用。我准备用 React.js Electron 做&#xff08;因为熟悉~&#xff09;&#xff0c;编程部分一路风雨无阻&#xff0c;到了打包却出现了问题。El…

多轮对话(一):概述(意图识别+槽填充)

一、对话系统 基于流水线的面向任务的对话系统包含了四个关键部分&#xff1a; 语言理解。它被称为自然语言理解&#xff08;NLU&#xff09;&#xff0c;它把用户话语解析为预定义的语义槽。对话状态跟踪器。它管理每一轮的输入与对话历史&#xff0c;输出当前对话状态。对话…