Table of Contents
- Qulac
- qulac.json:
- qulac_hist012_dict.tar.gz:
- MIMICS
- ClariQ
- ConvAI3 Data Challenge
- Stage1: initial dataset
- Stage2: human-in-the-loop
- ClariQ Dataset
- File Format
- train.tsv and dev.tsv
- test.tsv
- question_bank.tsv
- dev_synthetic.pkl.tar.gz & train_synthetic.pkl.tar.gz
- single_turn_train_eval.pkl & multi_turn_***_eval.pkl.tar.gz
- top10k_docs_dict.pkl.tar.gz
- train.qrel & dev.qrel
Qulac
aliannejadi/qulac: Qulac: A dataset on asking Questions for Lack of Clarity in open-domain information-seeking conversations. (github.com)
qulac.json:
qulac.json contains the topics, facets, questions, and answers. This is the main file of Qulac. However, it may not be very straightforward to use this file for experiments directly; that is why the repository provides some auxiliary data files, described below. In the qulac.json file you will find these fields:

- topic_id: the ID of the topic in the TREC Web Track.
- facet_id: the ID of the facet in the TREC Web Track.
- topic_facet_id: an ID corresponding to a topic and facet pair, in the format %d-%d. For example, 21-1 corresponds to the first facet (facet_id=1) of the 21st topic in the TREC Web Track data.
- topic_facet_question_id: an ID corresponding to a topic, facet, and question triplet, in the format %d-%d-%d. For example, 21-1-5 corresponds to the fifth question of the first facet of the 21st topic. Each row of the data is identified by this ID.
- topic: the TREC topic (query).
- topic_type: a str value indicating the type of a topic. Possible values are faceted and ambiguous.
- facet_type: a str value indicating the type of a facet. Possible values are inf (i.e., informational) and nav (i.e., navigational).
- topic_desc: a full description of the topic as it appears in the TREC Web Track data.
- facet_desc: a full description of the facet (information need) as it appears in the TREC Web Track data.
- question: a clarifying question that the system can pose to the user for the current topic and facet.
- answer: an answer to the clarifying question, assuming that the user is in the context of the current row (i.e., the user's initial query is topic, their information need is facet, and question has been posed to the user).
topic_id | facet_id | topic_facet_id | topic_facet_question_id | topic | topic_type | facet_type | topic_desc | facet_desc | question | answer |
---|---|---|---|---|---|---|---|---|---|---|
193 | 2 | 193-2 | 193-2-5 | dog clean up bags | faceted | inf | Can I order dog clean-up bags online? | Are there biodegradable products for the dispo… | are you looking for a way to dispose your dog … | im looking for dog waste bags that are biodegr… |
144 | 2 | 144-2 | 144-2-5 | trombone for sale | ambiguous | inf | information on where I could buy a new or used… | good places to sell a used trombone | are you looking for a place to sell a used tro… | yes |
78 | 3 | 78-3 | 78-3-7 | dieting | ambiguous | inf | Find “reasonable” dieting advice, that is no… | Find crash diet plans that promise quick weigh… | do you want to know if dieting is safe | i would like to know more on quick and safe di… |
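A minimal sketch of loading this file, assuming the JSON follows pandas' default column-oriented layout (the local path is also assumed):

```python
import pandas as pd

# Each row is one (topic, facet, question, answer) record of Qulac.
qulac = pd.read_json('qulac.json')

# Look up the first example from the table above by its unique row ID.
row = qulac[qulac['topic_facet_question_id'] == '193-2-5']
print(row[['topic', 'facet_desc', 'question', 'answer']])
```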
qulac_hist012_dict.tar.gz:
qulac_hist012_dict.tar.gz can be used for experiments involving multi-turn conversations. As mentioned in [1], the conversations are artificially generated from the data available in qulac.json. Hence, the structure of the dict is as follows (after decompression):
{ <record_id>:
{
'history_id': <the ID of conversation history (context)>,
'history_list': [
{ 'question': <question1 string>,
'answer': <answer1 string> },
{ 'question': <question2 string>,
'answer': <answer2 string> },
        ...
      ],
'query': <query (topic) string>,
'question': <current question string>,
'answer': <current answer string>
}
....
}
- Record ID: topic_id-facet_id-past_question_id_1-past_question_id_2-current_question_id-answer_flag
- answer_flag indicates whether the record refers to results obtained with (=1) or without (=0) the final answer.
'18-2-1-2-10-1': {
'history_id': '18-2-1-2',
'history_list': [{'answer': 'no i just want to find spreadsheets and templates',
'question': 'are you interested in a service for wedding budgeting'},
{'answer': 'yes i want to find some spreadsheets to help me budget',
'question': 'are you looking for advice on wedding budgeting'}],
'query': 'wedding budget calculator',
'question': 'what is your projected budget for your wedding',
'answer': 'i need to find a spreadsheet to figure it out'},
'25-1-3-8-1' : {
'history_id': '25-1-3',
'history_list': [{'answer': 'no i am looking for information on the greek mathematician euclid',
'question': 'do you need directions to euclid ave'}],
'query': 'euclid',
'question': 'do you want to know related people',
'answer': 'no i only want to know about one particular person'}
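A hedged sketch of working with this dict; the decompressed file name is an assumption:

```python
import pickle

# File name after extracting qulac_hist012_dict.tar.gz (assumed).
with open('qulac_hist012_dict.pkl', 'rb') as f:
    hist = pickle.load(f)

rec = hist['18-2-1-2-10-1']

# history_id encodes topic_id, facet_id, and the past question IDs.
topic_id, facet_id, *past_question_ids = rec['history_id'].split('-')

for turn in rec['history_list']:      # past (question, answer) turns
    print('Q:', turn['question'])
    print('A:', turn['answer'])
print('Q:', rec['question'])          # current question
print('A:', rec['answer'])            # current answer
```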
MIMICS
microsoft/MIMICS: MIMICS: A Large-Scale Data Collection for Search Clarification (github.com)
Each clarification in MIMICS consists of a clarifying question and up to five candidate answers:
query | headaches |
---|---|
question | What do you want to know about this medical condition? |
candidate answers (options) | symptom, treatment, causes, diagnosis, diet |
MIMICS contains three datasets:

- MIMICS-Click includes over 400k unique queries, their associated clarification panes, and the corresponding aggregated user interaction signals (i.e., clicks). Example rows:

['#HASH#value excel', 'What version of Excel are you looking for?', '2010', '2013', '2016', '', '', 'medium', '0', '0.0', '0.0', '0.0', '0.0', '0.0']
['%2f', 'What language are you looking for?', 'javascript', 'python', '', '', '', 'medium', '0', '0.0', '0.0', '0.0', '0.0', '0.0']
['.net', 'Select one to refine your search', 'powershell .net', 'iis .net', 'windows .net', 'sql .net', 'exchange .net', 'high', '0', '0.0', '0.0', '0.0', '0.0', '0.0']
['.net 3.5 framework', 'Select one to refine your search', 'windows', 'powershell', 'xml', 'azure', 'json', 'high', '3', '0.8571428571428572', '0.0', '0.0', '0.14285714285714285', '0.0']
- MIMICS-ClickExplore is an exploration dataset that includes aggregated user interaction signals for over 60k unique queries, each with multiple clarification panes.

Column(s) | Description |
---|---|
query (string) | The query text. |
question (string) | The clarifying question. |
option_1, …, option_5 (string) | Up to five candidate answers. |
impression_level (string) | A three-level impression label (i.e., low, medium, or high). |
engagement_level (integer) | A label in [0, 10] representing total user engagement. |
option_cctr_1, …, option_cctr_5 (real) | The conditional click probability on each candidate answer. |

Example rows (the same query can appear with multiple panes):

['0 degrees', 'Select one to refine your search', 'celsius', 'kelvin', 'fahrenheit', '', '', 'medium', '0', '0.0', '0.0', '0.0', '0.0', '0.0']
['0 degrees', 'Select one to refine your search', 'fahrenheit', 'celsius', 'kelvin', '', '', 'medium', '4', '1.0', '0.0', '0.0', '0.0', '0.0']
['0 degrees', 'Select one to refine your search', 'boots for 0 degrees', 'gloves for 0 degrees', '', '', '', 'medium', '0', '0.0', '0.0', '0.0', '0.0', '0.0']
- MIMICS-Manual includes over 2k unique real search queries. Each query-clarification pair in this dataset has been manually labeled by at least three trained annotators. It contains graded quality labels for the clarifying question, the candidate answer set, and the landing result page for each candidate answer.

Column(s) | Description |
---|---|
query (string) | The query text. |
question (string) | The clarifying question. |
option_1, …, option_5 (string) | Up to five candidate answers. |
question_label (integer) | A three-level quality label for the clarifying question. |
options_overall_label (integer) | A three-level quality label for the candidate answer set. |
option_label_1, …, option_label_5 (integer) | A three-level quality label for the landing result page of each candidate answer. |

Example rows:
['multiple system atrophy', 'What do you want to know about this medical condition?', 'symptom', 'treatment', 'causes', 'diagnosis', 'diet', '2', '2', '2', '2', '2', '2', '2']
['team fortress 2', 'What would you like to know about this game?', 'team fortress 2 steam', 'team fortress 2 mods', 'team fortress 2 gameplay', 'team fortress 2 cheats', '', '1', '2', '2', '2', '2', '2', '']
['google chrome exe', 'Select one to refine your search', '64 bit', '32 bit', '', '', '', '', '2', '2', '2', '', '', '']
['google chrome exe', 'Select one to refine your search', '32 bit', '64 bit', '', '', '', '', '2', '2', '2', '', '', '']
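All three datasets are distributed as tab-separated files, so they load with standard tooling. A minimal sketch (the file name follows the repository layout and should be treated as an assumption):

```python
import pandas as pd

click = pd.read_csv('MIMICS-Click.tsv', sep='\t')

# Keep only clarification panes that received some user engagement.
engaged = click[click['engagement_level'] > 0]
print(engaged[['query', 'question', 'option_1', 'engagement_level']].head())
```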
ClariQ
ConvAI3 Data Challenge
ClariQ is part of this challenge.
The challenge ran in two stages:
- Stage1: participants were provided with a static dataset consisting mainly of initial user requests, clarifying questions, and user answers
- Stage2: human-in-the-loop evaluation
Stage1: initial dataset
The dataset consists of:
- User Requests: initial user requests in conversational form, each with a label from 1 to 4 reflecting how much clarification is needed
  - 1: does not need any clarification
  - 4: needs clarification (a must)
- Clarifying questions: a set of possible clarifying questions
- User Answers: each question is supplied with a user answer
Stage2: human-in-the-loop
This stage enables the top-performing teams of the first stage to evaluate their models with the help of human evaluators. The performance of a system is evaluated in two aspects:
- how much the conversation helps a user find the information they are looking for
- how natural and realistic the conversation appears to a human evaluator
ClariQ Dataset
aliannejadi/ClariQ: ClariQ: SCAI Workshop data challenge on conversational search clarification. (github.com)
Feature | Value |
---|---|
# train (dev) topics | 187 (50) |
# faceted topics | 141 |
# ambiguous topics | 57 |
# single topics | 39 |
# facets | 891 |
# total questions | 3,929 |
# single-turn conversations | 11,489 |
# multi-turn conversations | ~ 1 million |
# documents | ~ 2 million |
File Format
train.tsv and dev.tsv
Both files have the same format and contain topics, facets, questions, answers, and clarification need labels. Each row has the following fields:
- topic_id: the ID of the topic (initial_request).
- initial_request: the query (text) that initiates the conversation.
- topic_desc: a full description of the topic as it appears in the TREC Web Track data.
- clarification_need: a label from 1 to 4, indicating how much the topic needs to be clarified.
- facet_id: the ID of the facet.
- facet_desc: a full description of the facet (information need) as it appears in the TREC Web Track data.
- question_id: the ID of the question as it appears in question_bank.tsv.
- question: a clarifying question that the system can pose to the user for the current topic and facet.
- answer: an answer to the clarifying question, assuming that the user is in the context of the current row (i.e., the user's initial query is initial_request, their information need is facet_desc, and question has been posed to the user).
topic_id | initial_request | topic_desc | clarification_need | facet_id | facet_desc | question_id | question | answer |
---|---|---|---|---|---|---|---|---|
14 | I’m interested in dinosaurs | I want to find information about and pictures of dinosaurs. | 4 | F0159 | Go to the Discovery Channel’s dinosaur site, which has pictures of dinosaurs and games. | Q00173 | are you interested in coloring books | no i just want to find the discovery channels website |
14 | I’m interested in dinosaurs | I want to find information about and pictures of dinosaurs. | 4 | F0159 | Go to the Discovery Channel’s dinosaur site, which has pictures of dinosaurs and games. | Q03021 | which dinosaurs are you interested in | im not asking for that i just want to go to the discovery channel dinosaur page |
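A minimal sketch of reading train.tsv (path assumed), e.g. to collect the candidate clarifying questions recorded for one topic:

```python
import pandas as pd

train = pd.read_csv('train.tsv', sep='\t')

# All clarifying questions asked for topic 14 ("I'm interested in dinosaurs").
questions = train.loc[train['topic_id'] == 14, 'question'].unique()
print(len(questions), questions[:3])
```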
test.tsv
contains only the list of test topics and their IDs.
topic_id | initial_request |
---|---|
201 | I would like to know more about raspberry pi |
202 | Give me information on uss carl vinson. |
question_bank.tsv
Contains all the questions in the collection. The TSV file has two columns: question_id and question (text).
question_id | question |
---|---|
Q00001 | |
Q02318 | what kind of medium do you want this information to be in |
Q02319 | what kind of penguin are you looking for |
Q02320 | what kind of pictures are you looking for |
Note: selecting Q00001 means selecting no question.
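A sketch of loading the question bank into an ID-to-text dict, assuming the TSV starts with a header row:

```python
import csv

with open('question_bank.tsv', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f, delimiter='\t')
    bank = {row['question_id']: row['question'] for row in reader}

print(bank['Q02319'])         # what kind of penguin are you looking for
print(repr(bank['Q00001']))   # empty text: the "no question" option
```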
dev_synthetic.pkl.tar.gz & train_synthetic.pkl.tar.gz
These files contain dict
s of synthetically built multi-turn conversations (up to three turns).
{<record_id>: {'topic_id': <int>,
'facet_id': <str>,
'initial_request': <str>,
'question': <str>,
'answer': <str>,
'conversation_context': [{'question': <str>,
'answer': <str>},
{'question': <str>,
'answer': <str>}],
'context_id': <int>},
...
}
where:

- <record_id> is an int indicating the ID of the current conversation record.
  - While in the dev set there exist multiple <record_id> values per <context_id>, in the test file there is only one.
- 'topic_id', 'facet_id', and 'initial_request' indicate the topic, facet, and initial request of the current conversation, according to the single-turn dataset.
- 'question': the current clarifying question that is being posed to the user.
- 'answer': the user's answer to the clarifying question.
- 'conversation_context' identifies the context of the current conversation. A context consists of the previous turns in the conversation. As we see, it is a list of 'question' and 'answer' items. This list tells us which questions have been asked in the conversation so far, and what the answers to them have been.
- 'context_id' is the ID of the conversation context. Basically, participants should predict the next utterance for each context_id.
2288: {'topic_id': 8,
'facet_id': 'F0969',
'initial_request': 'I want to know about appraisals.',
'question': 'are you looking for a type of appraiser',
'answer': 'yes jewelry',
'conversation_context': [],
'context_id': 969},
1570812: {'topic_id': 293,
'facet_id': 'F0729',
'initial_request': 'Tell me about the educational advantages of social networking sites.',
'question': 'which social networking sites would you like information on',
'answer': 'i don have a specific one in mind just overall educational benefits to social media sites',
'conversation_context': [{'question': 'what level of schooling are you interested in gaining the advantages to social networking sites',
'answer': 'all levels'},
{'question': 'what type of educational advantages are you seeking from social networking',
'answer': 'i just want to know if there are any'}],
'context_id': 976573}
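A sketch for iterating these records once the pickle is decompressed (the resulting file name is assumed), e.g. to find a conversation whose context has two past turns, like the second example above:

```python
import pickle

# File name after extracting dev_synthetic.pkl.tar.gz (assumed).
with open('dev_synthetic.pkl', 'rb') as f:
    synthetic = pickle.load(f)

for record_id, rec in synthetic.items():
    if len(rec['conversation_context']) == 2:  # two past turns + current turn
        print(record_id, rec['context_id'], rec['initial_request'])
        break
```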
single_turn_train_eval.pkl & multi_turn_***_eval.pkl.tar.gz
These files are dicts of pre-computed document relevance results after asking each question:
{ <evaluation_metric>:
    { <context_id>:
        { <question_id>:
            { 'no_answer': <float>,
              'with_answer': <float> },
          ... ,
          'MAX':
            { 'no_answer': <float>,
              'with_answer': <float> },
          'MIN':
            { 'no_answer': <float>,
              'with_answer': <float> }
        }
    }
  ...
}
- MAX and MIN refer to the maximum and minimum performance that the retrieval model achieves by asking the "best" and "worst" questions among the candidate questions.
top10k_docs_dict.pkl.tar.gz
A dict that maps each topic_id to a list of document IDs; it is useful for obtaining the top 10,000 documents of a topic as an initial ranking.
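Sketch (the decompressed file name, and integer topic IDs as keys, are assumptions):

```python
import pickle

with open('top10k_docs_dict.pkl', 'rb') as f:
    top10k = pickle.load(f)

initial_ranking = top10k[14]   # ranked ClueWeb document IDs for topic 14
print(len(initial_ranking), initial_ranking[:3])
```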
train.qrel & dev.qrel
These files contain the relevance assessments of the ClueWeb09 and ClueWeb12 collections for every facet in the train and dev sets, respectively. Each line has the format:
<facet_id> 0 <document_id> <relevance_score>
F0001 0 clueweb09-en0038-74-08250 1
F0001 0 clueweb09-enwp01-17-11113 1
F0002 0 clueweb09-en0001-02-21241 1
F0002 0 clueweb09-en0006-52-11056 1
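These files follow the standard TREC qrels layout, so they can be fed directly to tools such as trec_eval, or parsed by hand:

```python
from collections import defaultdict

# facet_id -> {document_id: relevance_score}
qrels = defaultdict(dict)
with open('train.qrel') as f:
    for line in f:
        facet_id, _, doc_id, relevance = line.split()
        qrels[facet_id][doc_id] = int(relevance)

print(len(qrels['F0001']))   # number of judged documents for facet F0001
```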