The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

Xing Han Lu (1, 2, †), Siva Reddy (1, 2, 3), Harm de Vries (2)
(1) Mila Quebec AI Institute, McGill University
(2) ServiceNow Research
(3) Facebook CIFAR AI Chair
statcan.dialogue.dataset@mila.quebec

Abstract

We introduce the StatCan Dialogue Dataset¹, consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on an on-going conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations, as we observe a significant drop in performance across both tasks when we move from the validation to the test set. In addition, we find that response generation models struggle to decide when to return a table. Considering that the tasks pose significant challenges to existing models, we encourage the community to develop models for our task, which can be directly used to help knowledge workers find relevant tables for live chat users.

† Work done as visiting researcher at ServiceNow Research
¹ Website: mcgill-nlp.github.io/statcan-dialogue-dataset

U1: Hi, I’m looking to obtain quarterly data in regards to GDP grow (Canada), BC Housing STarts, Canada Oil Price/BBL
A1: Hello, my name is Kelly C. Give me one moment as I search [...]
A1: For GDP growth rates, please consult the following link: [...]
A1: What do you mean by BC Housing Starts?

U2: I’m required to research all of the housing starts for BC on a quarterly basis [...]
U2: Housing starts are the number of new residential construction projects that have begun during any particular month [...]
A2: I would have monthly data regarding new building permits being issued [...]
A2: Building permits, by type of structure and type of work: https[...]
A2: I’ll have a look for oil prices. One moment.

U3: Do you also have data to Canada’as oil Price/BBL ("WTI")? [...]
A3: Are you looking for the retail prices of oil?
A3: If so, I found some data for smaller geographies.
A3: Monthly average retail prices for gasoline and fuel oil, by geography (https[...])
A3: [...] Would those geographies be enough?
A3: Or are you looking for Canada only?

U4: [...] I would need something that pertains more to all of canada
A4: What about this? Monthly average retail prices for food and other selected products (https[...])

Table 1: An example of the StatCan Dialogue Dataset in which a user (U) talks to a StatCan agent (A) to find a number of data tables. Text in bold indicates the title of a table retrieved by the agent.

1 Introduction

One of the longstanding goals in Natural Language Processing (NLP) is to develop conversational agents that assist people with concrete tasks, such as finding information in large collections of documents or booking restaurants and hotels. To aid the development of such virtual assistants, the research community is in need of benchmarks that reflect the intents and linguistic phenomena found in real-world applications. However, developing such real-world conversational datasets is challenging in the current research landscape. On the one hand, academic labs often struggle to come up with natural use cases of task-oriented dialogue agents and collect conversations with a large number of real users. Many labs have designed artificial tasks and collected conversations from crowd workers with simulated intents (Budzianowski et al., 2018; Adlakha et al., 2022; Lee et al., 2022), often leading to datasets that do not capture the linguistic challenges of production settings (de Vries et al., 2020). On the other hand, industry labs might have access to users with genuine intents (e.g., through Siri or Alexa) but rarely release such conversational
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2799–2829
May 2-6, 2023 ©2023 Association for Computational Linguistics
Dataset         | Intent    | Dialogue | Query             | Result(s)             | Source(s)       | Lang.
Our work        | Genuine   | ✓        | Question, Request | Table link, Dial. act | StatCan         | En, Fr
NQ (2019)       | Mixed     | ✗        | Question          | Span excerpt          | Google, Wiki.   | En
DuReader (2018) | Mixed     | ✗        | Question          | Span excerpt          | Baidu           | Zh
OTT-QA (2020)   | Simulated | ✗        | Question          | Table/Span excerpt    | Wikipedia       | En
TAPAS-NQ (2021) | Mixed     | ✗        | Question          | Table excerpt         | Google, Wiki.   | En
CoQA (2019)     | Simulated | ✓        | Question          | Span excerpt*         | Multiple        | En
QuAC (2018)     | Simulated | ✓        | Question          | Dial. act             | Wiki.           | En
ATIS (1990)     | Genuine   | ✓        | Request           | SQL query, Command    | TI Corp.        | En
SGD-X (2022)    | Simulated | ✓        | Request           | API call, Dial. act   | Dial. Simulator | En

Table 2: Comparison with related datasets (see Section 2). (*) CoQA uses rationales to support extracted answers.

datasets due to their commercial value and user privacy concerns. Hence, we argue that the research community would benefit from a task-oriented dialogue environment where findings can be validated with real users, and, to that effect, present a unique dataset in collaboration with Statistics Canada.

Statistics Canada (StatCan) is a national statistics agency commissioned with collecting key information on Canada’s economy, society, and environment. Statistics Canada conducts hundreds of surveys on virtually all aspects of Canadian life and publishes the resulting data tables on statcan.gc.ca. This website currently features 5K+ of such complex and often large data tables. Canadian citizens, and other interested individuals, come to this website to find the statistics they are looking for. The StatCan website offers a chat functionality (available in English and French) to help users in case they cannot find the appropriate information.

Sourcing from these live chats, we present the StatCan Dialogue Dataset, a collection of 20K+ English and French conversations between visitors of statcan.gc.ca and agents working at Statistics Canada. Before releasing this dataset, StatCan ran several procedures to remove Personally Identifiable Information (PII). While we observe a wide variety of user intents, ranging from table manipulation to navigation instructions, a large number of visitors use the chat functionality to find data tables on the StatCan website. Specifically, we observe 6.6K instances where an agent returns a link to a data table across 4.4K conversations. In Table 1, we provide an example conversation in which an online user is looking for specific data tables.

In this work, we develop two novel tasks centered on helping users find specific tables. First, we introduce the table retrieval task, which requires a model to predict the table returned by the agent given the messages sent so far. Second, we introduce the response generation task, which requires a model to predict the agent’s response given the dialogue history. For each task, we investigate its difficulty by establishing strong baselines and evaluating them on various metrics.

We stress that both tasks are immediately useful in a real-world setting. The table retrieval task can help agents find relevant tables faster while the response generation task may lead to a virtual agent that can return relevant tables through an online conversation. We hope that this tight connection with a real-world scenario will bring the research community more insight into the challenges of developing practical dialogue agents and lead to faster transfer of research ideas and findings.

2 Related Work

This section presents various directions related to our work. See Table 2 for a comparative summary.

Open-domain QA. This is the task of answering questions using a large and diverse collection of text documents. One of the first large-scale evaluations in open-domain QA was presented at TREC-8 (Voorhees, 2001). Since then, many studies have released large-scale open-domain QA datasets: WikiQA (Yang et al., 2015) and MS MARCO (Bajaj et al., 2018) source questions from the Bing search engine, Natural Questions (NQ) (Kwiatkowski et al., 2019) from Google search, and DuReader (He et al., 2018) sources questions in Chinese from Baidu. The questions come from real users and the answers are collected from the search results through crowd workers. Although those datasets have questions with genuine intent and the answer must be retrieved from a collection of documents, our dataset emphasizes the retrieval of tables (in a conversational setting) rather than free-form documents.

Table retrieval and QA. Following works on tabular pre-training (Yin et al., 2020), table-to-text generation (Parikh et al., 2020) and weak supervision for semantic parsing (Herzig et al., 2020),
Chen et al. (2020) and Herzig et al. (2021) respectively propose OTT-QA and TAPAS-NQ, two novel approaches that extend open-domain QA to retrieving tables instead of documents. The former collects both the questions and answers from crowd workers and the latter extends Natural Questions by using tables from the article where the answer was taken. In both cases, the tables being retrieved are sourced from Wikipedia articles. Although our data also incorporate tabular retrieval, the tables are sourced from statcan.gc.ca, they can be significantly larger (as discussed in Appendix A.2.1), and they are being retrieved in an interactive and conversational setting.

Conversational QA. Several works extended question answering to the conversational setting. CoQA (Reddy et al., 2019) and QuAC (Choi et al., 2018) introduced datasets in which multiple rounds of questions are asked about a reference passage taken from a document (such as a Wikipedia article). Subsequent works extended this setup to an open-domain setting where the reference passage is not known beforehand (Qu et al., 2020; Anantha et al., 2021; Adlakha et al., 2022). Saeidi et al. (2018) proposed a conversational QA task about regulatory texts. The aforementioned datasets are all structured in the same way: at every turn, the first speaker will ask a question, and the other speaker will give an answer. In contrast, the queries in our conversations are not restricted to questions, and the answers can be either a table, metadata, or a dialogue act.

Task-oriented Dialogue. Our work is related to work on task-oriented dialogue where users converse with virtual agents to accomplish specific goals, such as booking a restaurant or resolving a customer issue. While early work has collected a dataset in a genuine information seeking setup (Hemphill et al., 1990), many recent datasets have been collected through a simulated setup (Budzianowski et al., 2018; Rastogi et al., 2020; Feng et al., 2020, 2021; Chen et al., 2021; Lee et al., 2022). Task-oriented models usually track the dialogue state by predicting dialogue acts that are specified through intents and slot-value pairs, e.g., findRestaurants(cuisine=Italian). While our dataset does not provide turn-based annotations, the released conversations come with an annotated goal, i.e., which data table the user was looking for. Like other goal-oriented dialogue tasks, this annotation enables us to automatically evaluate the dialogue models through a task completion metric.

Chit-chat Dialogue. The goal for chit-chat systems is to engage in an open-ended conversation with an end-user (Lowe et al., 2015; Dinan et al., 2018). Unlike our dataset, such conversations do not intend to assist the user with a specific task.

Dataset           | Train | Valid | Test  | All
# Conv.           | 2573  | 545   | 557   | 3675
# Turns           | 11382 | 2339  | 2600  | 16321
# Messages        | 36147 | 7385  | 8340  | 51872
# Queries         | 3782  | 799   | 870   | 5451
# Tokens / Msg    | 32.83 | 33.51 | 29.32 | 32.36
# Turns / Conv.   | 4.42  | 4.29  | 4.67  | 4.44
# Msg / Conv.     | 14.05 | 13.55 | 14.97 | 14.11
# Queries / Conv. | 1.48  | 1.47  | 1.57  | 1.49
# Tables          | 778   | 349   | 388   | 959
# New tables      | 0     | 41    | 145   | 181
# Dims / Table    | 3.5   | 3.5   | 3.6   | 3.6
# Mbrs / Table    | 185.5 | 210.8 | 175.6 | 172.1
# Notes / Table   | 21.1  | 22.7  | 23    | 20.4

Table 3: Statistics of English conversations and tables in the retrieval and generation tasks. New tables are calculated with respect to training set (see Table 11).

[Figure: histogram with x-axis "messages", y-axis "percent", and one series per split (train, valid, test)]

Figure 1: Histogram of messages by conversation in both tasks (French split in Figure 7).

3 Dataset

The StatCan Dialogue Dataset consists of conversations collected from the live chat between March 1, 2019 and March 8, 2021. Although a variety of user intents can be found in the broader dataset of over 25K conversations, we focus on a single intent by selecting all conversations where the agent returns a data table. We use this subset to develop and test models for the two tasks that we introduce in Section 4. In Section 3.1, we provide basic statistics about this subset of the data and present a dialogue analysis for a small number of conversations in Section 3.2. In Section 3.3, we turn our attention to the
data tables and explain what kind of information is available for them. Finally, we explain how the dataset is split into a train, validation, and test set in Section 3.4. For technical specifications, a dataset card is provided in Appendix B.

User intents. The live chat was designed to fulfill specific user intents. The main intent of the chat functionality is to help users find specific data tables. For example, in Table 1, the agent helps the user find tables about building permits, gasoline prices, and retail prices for food. However, users might also be interested in obtaining meta-information or receiving help in manipulating a table or with the user interface. In some cases, the user will make out-of-domain requests. Those auxiliary intents are described in Appendix A.1 since the focus of this work is on the main intent.

Messages and turns. Each conversation is broken down into turns, each of which is a pair of user-agent responses. Each response can have multiple messages sent sequentially (e.g., in Table 1, the first agent response contains 3 back-to-back messages).

3.1 Basic statistics

In total, 25397 conversations will be made available. Based on our main intent, we focus on a subset of 4468 (3675 in English and 793 in French). Out of a total of 5907 tables available in both English and French, the agents returned 959 unique tables in English and 285 in French. The number of messages by conversation varies between 2 and 78 with a median of 12 for the English split (see Figure 1 for the distribution). Based on Table 3, there are on average 4.4 turns but 14.11 messages per conversation, with over 30 tokens for each message. This indicates that the speakers will express multiple sequential thoughts before the addressees respond. For the French split, we analyzed the basic statistics in Appendix A.5.

Frequently requested tables. In total, 6 tables make up 13.4% of tables retrieved, covering subjects like inflation and household spending. Supplementary details can be found in Appendix A.3.1.

Merged Acts (Example)                                   | %
Answer (You can obtain on our...)                       | 50
Request (...please help me retrieve data...)            | 31
Time Mgmt (Please hold)                                 | 28
Inform (Please take note that...)                       | 63
Info Seeking Ques. (Do you have any other...?)          | 25
Promise (...please contact the Education Ministry...)   | 18
Auto Feedback (Sure)                                    | 25
Offer (...how may I help you?)                          | 11
Instruct (Select at least one...)                       | 18
Clarif. Ques. (Which of these lines would direct...)    | 16
Greeting (Hi)                                           | 28
Self Introduction (My name is...)                       | 17
Thanking (Thanks a lot!)                                | 47
Accept Thanking (you’re welcome)                        | 15

Table 4: Frequency of merged speech acts occurring in 100 turns in conversations from the validation set.

Title: Production and value of maple products
Date range: 1924-01-01 to 2020-01-01
Dimensions: Geography, Maple products
Subject: Agriculture
Survey: Maple Products
Frequency: Annual

Table 5: Basic information (including title) of table in Section 3.3. Full version in Table 20. This can be accessed at doi.org/10.25318/3210035401-eng.

3.2 Dialogue Analysis

We categorize 100 turns (306 messages) from 24 conversations in the English validation set according to the speech acts defined by Bunt et al. (2010, 2020), which is also known as ISO standard 24617-2. We follow their taxonomy but merge some fine-grained acts with their broader concepts (e.g., correction, agreement, disagreement with inform). We present the speech act frequencies and examples in Table 4. See the Appendix for more information on how we merged the original acts and supplementary examples (Table 15 and Table 22, respectively).

We notice that answers appear twice as frequently as information seeking questions because an interlocutor may provide an answer to both clarification questions and requests. Additionally, inform acts appear 63% of the time because agents need to expand upon their answer and users tend to clarify their initial requests by informing the other. Although less frequent, auto feedback and time management are still relevant because interlocutors cannot rely on visual feedback like nodding or typing. Naturally, time management often co-occurs with promises because the agent tends to put the user on hold while promising to fulfill their request.

3.3 Table specifications

To explain the specifications, we examine a sample table with title Production and value of maple products (shown in Table 5). The table has two dimensions, which are groups of member items;
for geography, the members are the provinces producing maple syrup (Quebec, Ontario, etc.), and for Maple products the members are the production types (maple sugar, syrup, taffy and butter). A member item generalizes the concept of rows and columns as they are interchangeable via pivoting. Sometimes, details about those members are provided as footnotes at the end of the page.

Basic Information. This is the core metadata and consists of the title, dimensions, subject, survey and update frequency (member items are excluded). Supplementary details are in Appendix A.2.2.

Hierarchical relation. The metadata can be viewed hierarchically. As shown in Figure 2, each subject encompasses different surveys, each survey can be used to generate one or more tables, and so on. A member item that can be nested under another member item is called Level.

[Figure: tree diagram with the levels Subject (Agriculture), Survey (e.g., Maple Products), Table (e.g., Production and value of maple products), Dimension (Geography, Maple products), Member items (e.g., Canada), and Level (e.g., Ontario, Quebec)]

Figure 2: Diagram of the hierarchical relationship between metadata components, discussed in Section 3.3.

3.4 Dataset splits

We group the conversations into a train (70%), a validation (15%) and a test (15%) set. The test set was specifically selected to be the most recent conversations by date (covering Sept 8, 2020 to Mar 8, 2021), whereas the training and validation set were randomly selected from the remaining data (covering Mar 1, 2019 to Sept 8, 2020). This lets us test a model’s capability to adapt to temporal shifts in the data (such as new data releases and novel events). This is useful to understand a model’s capability to generalize beyond the training distribution, but it is also a better reflection of real-world applications of a model (which will be used for future data). The same splits are used for all tasks.

4 Tasks

Based on the conversational and tabular data, we propose two tasks: (i) a table retrieval task, which requires a model to use a partial conversation to predict the table an agent will return, and (ii) a response generation task, which requires a model to use a partial conversation to generate the most probable response by the agent. The conversations in the tasks are available in both English and French.

4.1 Retrieval task

For this task, we truncate every conversation right before a link to a relevant table is shared by the agent. As a result, the product ID (PID) corresponding to that link becomes the objective of the retrieval task, as shown in Table 6. When the agent shares multiple non-repeating PIDs within a conversation, each unique occurrence is treated as a separate sample.

Recall@k. To evaluate models for retrieval, we compute the recall at k (R@k) score for k ∈ {1, 10, 20}, which corresponds to the rate where the correct table is among the k tables retrieved by the model (usually ranked by a relevance score). We choose k = 1 for real-time automatic retrieval and k ∈ {10, 20} for scenarios where humans or automatic rerankers would like to use the retriever to query tables and select the best option.

4.2 Response generation task

In the first task, only the messages leading to a table retrieval are considered. For this task, each message sent by an agent is considered as a target and everything before is the source. Thus, the goal of this task is to use the source text to generate a response that matches the target (see Table 7). Since dialogue responses are challenging to evaluate, we report a wide variety of metrics for this task.

ROUGE-L and METEOR. ROUGE-L (Lin, 2004; Lin and Och, 2004) is a common text evaluation metric which naturally takes into account sentence level structure by identifying the longest overlapping word sequence between two sentences. METEOR (Banerjee and Lavie, 2005) is a word-level precision and recall scoring method that encompasses different ways to represent a word, including stems and synonyms.

BERTScore and MoverScore. Various methods were developed to leverage contextual embeddings
Source text (on-going conversation)
[...]
A: What do you mean by BC Housing Starts?
U: I’m required to research all of the housing starts for BC on a quarterly basis[...]
U: Housing starts are the number of new residential construction projects that have begun during any particular month
A: [...] I would have monthly data regarding new building permits being issued.

Retrieval target (StatCan table)
Table 34-10-0066 (Building permits, by type of structure and type of work)

Table 6: Source and targets of the retrieval task, based on Table 1. Given the on-going conversation, the goal is to retrieve a StatCan table.

Source text (on-going conversation)
U: Hi, I’m looking to obtain quarterly data in regards to GDP grow (Canada), BC Housing STarts, Canada Oil Price/BBL
A: Hello, my name is Kelly C. Give me one moment as I search [...]
A: For GDP growth rates, please consult [...]
U: I’m required to research all of the housing starts for BC on a quarterly basis [...]

Generation target (next response by agent)
A: I would have monthly data regarding new building permits being issued. [...]

Table 7: Source and targets of the response generation task, based on Table 1. Given the on-going conversation, the goal is to generate the agent’s response.
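
The source/target pairs illustrated in Tables 6 and 7 could be built along the following lines. This is only a minimal sketch, not the released preprocessing code: the Message class, its field names, and both helper functions are hypothetical, and it assumes each agent message carries at most one product ID (PID).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    # Hypothetical container; the released dataset may use a different schema.
    speaker: str                # "U" for user, "A" for agent
    text: str
    pid: Optional[str] = None   # product ID if this agent message links to a table


def retrieval_samples(conv: List[Message]) -> List[Tuple[List[Message], str]]:
    """One (source, target PID) pair per unique PID shared by the agent."""
    samples, seen = [], set()
    for i, msg in enumerate(conv):
        if msg.speaker == "A" and msg.pid is not None and msg.pid not in seen:
            seen.add(msg.pid)
            # Truncate right before the message that shares the table link.
            samples.append((conv[:i], msg.pid))
    return samples


def generation_samples(conv: List[Message]) -> List[Tuple[List[Message], str]]:
    """One (source, target text) pair per agent message."""
    return [(conv[:i], m.text) for i, m in enumerate(conv) if m.speaker == "A"]


conversation = [
    Message("U", "Hi, I'm looking to obtain quarterly data [...]"),
    Message("A", "What do you mean by BC Housing Starts?"),
    Message("U", "I'm required to research all of the housing starts for BC [...]"),
    Message("A", "I would have monthly data regarding new building permits being issued [...]"),
    Message("A", "Building permits, by type of structure and type of work: https[...]",
            pid="34-10-0066"),
]

retrieval = retrieval_samples(conversation)
generation = generation_samples(conversation)
print(len(retrieval), retrieval[0][1])  # number of retrieval samples and the target PID
print(len(generation))                  # number of generation samples
```

A repeated PID later in the same conversation is skipped here, which is one reading of "each unique occurrence is treated as a separate sample".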
from BERT (Devlin et al., 2019) to evaluate similarity between two sentences. BERTScore (Zhang et al., 2019) computes the cosine similarity at the token level, whereas MoverScore (Zhao et al., 2019) computes the earth mover distance (Rubner et al., 2000) at the word or sentence level, thus capturing the cost of transforming the distribution of the generated responses into the distribution of the original responses.

Title accuracy. In addition to the general metrics for text generation, we also explored this task-specific metric. We define it as the proportion of generated messages that contain the title of a table shared in the reference messages. Consequently, this metric only includes turns where a table is shared by an agent. To compute this, we (i) find the product ID in the reference message, (ii) look up the title, (iii) check if that title appears exactly in the generated and reference text.

5 Models

To help understand the performance of finetuned models on our tasks, this section presents an overview of the methods for the English splits, whereas the implementation details are covered in Appendix A.4. Similar architectures were used for French (described in Appendix A.5).

5.1 Retrieval

BM25. We use Robertson and Zaragoza (2009)’s algorithm to retrieve the metadata of a table (passage) similar to a given query by weighting the idf-scaled term frequency of query words with respect to the passages.

DPR. Proposed by Karpukhin et al. (2020), Dense Passage Retrieval (DPR) is a pair of transformer models that separately encode a query and a passage, and the dot product of the resulting vectors will have a higher score if the passage is relevant to the query. We finetune this model to retrieve the metadata of a table (passage) given the on-going conversation (query).

TAPAS and TAPAS-NQ. Herzig et al. (2020) introduced a model that learned to encode flattened table cells in a self-supervised manner during pre-training. We finetuned it to retrieve the truncated content of a table given an on-going conversation. Subsequently, Herzig et al. (2021) finetuned TAPAS to perform open-domain table retrieval on 12K question-answer-table triplets extracted from NQ; we further finetune this variant in the same way and report the results as TAPAS-NQ.

Exploring table representation. In the simplest scenario, only the title is given to BM25 and DPR. Moreover, we evaluate variants that encode the basic information, member items, footnotes, or a combination of them. For TAPAS and TAPAS-NQ, we also finetuned a variant that retrieves the title, dimensions and member items, since the original TAPAS could attend to titles and column names.

5.2 Response generation

T5. We finetuned the large variant of T5 (Raffel et al., 2020) (named No aug. in Table 10) to autoregressively decode the target (agent reply) after first encoding the source (on-going conversation).

Augmenting T5 with top-k title(s). For every partial conversation, we use DPR (basic+member) to retrieve the top-k tables (where k ∈ {1, 5}), and append their titles to the partial conversation. This
Metadata          | R@1  | R@10 | R@20
Basic             | 14.7 | 45.0 | 55.0
Basic + member    | 15.7 | 46.2 | 56.3
Basic + footnotes | 13.9 | 44.4 | 54.2
Member            | 10.7 | 35.0 | 46.3
Title             | 13.9 | 43.8 | 53.4

Table 8: Retrieval results of DPR for the English test split with varying table representations. Overview of metadata in Section 3.3.

Metrics    | No aug. | Top-1 Title | Top-5 Titles
METEOR     | 23.35   | 24.07       | 24.41
ROUGE-L    | 30.65   | 30.76       | 30.88
MoverScore | 59.82   | 60.23       | 60.31
BERTScore  | 86.04   | 86.11       | 86.17
Title Acc. | 6.96    | 7.99        | 10.82

Table 10: Response generation results for the English test split.
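
The R@k columns of Table 8 and the Title Acc. row of Table 10 follow the definitions given in Section 4. The sketch below shows one way they might be computed on toy inputs; it is not the authors' evaluation code. The function names are hypothetical, the PIDs "T1" to "T3" are made up, and the title check reflects one reading of step (iii), namely that the title must appear verbatim in both the generated and the reference message.

```python
from typing import Dict, Optional, Sequence


def recall_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of samples whose gold PID is among the top-k ranked PIDs."""
    hits = sum(1 for preds, g in zip(ranked, gold) if g in preds[:k])
    return hits / len(gold)


def title_accuracy(generated: Sequence[str],
                   references: Sequence[str],
                   reference_pids: Sequence[Optional[str]],
                   titles: Dict[str, str]) -> float:
    """Share of table-sharing turns whose generated text contains the table title."""
    hits = total = 0
    for gen, ref, pid in zip(generated, references, reference_pids):
        if pid is None:          # only turns where the agent shares a table count
            continue
        total += 1
        title = titles[pid]      # step (ii): look up the title for the PID
        if title in gen and title in ref:
            hits += 1
    return hits / total if total else 0.0


# Toy retrieval example with made-up PIDs.
ranked = [["T2", "T1", "T3"], ["T3", "T2", "T1"], ["T1", "T3", "T2"]]
gold = ["T1", "T1", "T1"]
r1 = recall_at_k(ranked, gold, 1)
r2 = recall_at_k(ranked, gold, 2)

# Toy generation example using the table from Table 6.
titles = {"34-10-0066": "Building permits, by type of structure and type of work"}
generated = [
    "Building permits, by type of structure and type of work: https[...]",
    "One moment please.",
    "Hello",
]
references = [
    "Building permits, by type of structure and type of work: https[...]",
    "Building permits, by type of structure and type of work (https[...])",
    "Hi",
]
pids = ["34-10-0066", "34-10-0066", None]
acc = title_accuracy(generated, references, pids, titles)
print(r1, r2, acc)
```

An exact substring match is strict by design: a generated title that differs by a single character counts as a miss.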
allows T5 to decide between using one of the suggested titles and generating something else (e.g., clarification question). This is similar to the agents’ behavior, as they tend to return a title with the URL when sharing a relevant table. Furthermore, supervising T5 to ignore or return a title is equivalent to an implicit binary classification.

Model    | R@1  | R@10 | R@20
BM25     | 0.3  | 2.3  | 3.8
DPR      | 14.3 | 45.1 | 54.2
TAPAS    | 6.1  | 22.1 | 31.5
TAPAS-NQ | 7.4  | 30.0 | 39.3

Table 9: Retrieval results for the English test split when encoding title and member items. DPR and TAPAS were run 3 times and averaged.

6 Results and Discussions

Based on our baselines and data, we report the results and analyze the challenges that our dataset and tasks pose for existing models. For the English splits, the main retrieval results are reported in Table 9 and Table 8, and main generation results are in Table 10. Full results can be found in Appendix A, respectively in Table 16 and Table 33, and relevant statistical tests in Appendix A.7.

Impact of table representation. In Table 8, we observe that the metadata representation affects the retrieval recall. Although DPR can achieve respectable results when it only retrieves the title, including basic information (defined in Section 3.3) yields slight improvements, and further adding member items results in a significant difference from only using title (p = 0.014). However, only using member items results in a drastic decrease in recall (p = 0.00086), indicating the importance of the title. Moreover, footnotes do not yield any improvement, which may be because they often exceed the maximum context span (see Table 20). Thus, concisely but meaningfully representing metadata will be crucial to achieve a good recall on the retrieval task.

Transfer to table retrieval task. Our experiments allow us to analyze the effectiveness of open-domain QA fine-tuning (NQ) and tabular pre-training when transferring to our table retrieval task. We observe in Table 9 that DPR outperforms TAPAS and TAPAS-NQ by respectively 23.0% and 15.1% in test recall@10. Moreover, TAPAS-NQ achieves a better performance when it only retrieves the title and member items instead of the full table (p = 0.016), likely due to repetitions and truncation due to context size limits. Although both DPR and TAPAS-NQ were trained on NQ, the latter was trained on a small subset (12K vs 320K) that contains tables. Our experiments indicate that TAPAS transfers poorly from one task (NQ-Tables) to another (StatCan).

Response generation. We compare the performance of fine-tuned T5-large models with and without DPR-augmented table titles. In Table 10, we notice that retrieval-augmented models show modest improvements on the 4 non-task specific metrics. However, the top-5 augmented model achieves an absolute improvement of 3.86% in title accuracy, indicating that the information provided by DPR does help T5 in generating the desired title. It is nevertheless surprising that T5 without augmentation achieves a score of 6.96%, suggesting that the T5 model is capable of storing the titles seen at training time, and, to a limited extent, is able to recall and return them at test time. Finally, we point out that the title accuracy is still 5.91% lower than top-1 recall of the DPR retriever (Table 8), indicating that T5 fails to learn when to return a table (despite the agent retrieving a table in 23.4% of all turns). In the case of top-5 titles, T5 struggles to decide which table title to return.

Qualitative analysis of generated responses. We examine various conversations to understand what types of responses are generated by T5. We find that it can generate simple speech acts like greetings, but can struggle with context-specific speech acts such as clarification questions. Moreover, it can
[Figure: bar chart of Recall@5 by table appearance (Unseen, Infreq., Frequent) for the Valid and Test splits]

Figure 3: Results for DPR B+M for tables appearing in the training set frequently (10+ times), infrequently (1-10 times), and unseen at train time.

[Figure: bar chart of Recall@10 by model (TAPAS, DPR Title, DPR B+M) for the Valid and Test splits]

Figure 4: Validation and test recall for a selected set of retrieval models. We observe a significant drop in performance. B+M denotes Basic + member.
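
The breakdown in Figure 3 groups evaluation samples by how often their target table appears in the training set. A rough sketch of such a grouping is given below; it is an illustration under stated assumptions rather than the authors' analysis code. The function names and toy PIDs are hypothetical, and because the caption's ranges overlap at 10, the sketch assumes that exactly 10 appearances counts as frequent.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def appearance_bucket(train_count: int, frequent_at: int = 10) -> str:
    """Map a training-set count to a bucket (threshold of 10 is an assumption)."""
    if train_count == 0:
        return "unseen"
    return "frequent" if train_count >= frequent_at else "infrequent"


def recall_by_appearance(train_targets: Sequence[str],
                         eval_targets: Sequence[str],
                         ranked: Sequence[Sequence[str]],
                         k: int = 5) -> Dict[str, float]:
    """Recall@k computed separately for each appearance bucket."""
    counts = Counter(train_targets)
    hits, totals = defaultdict(int), defaultdict(int)
    for gold, preds in zip(eval_targets, ranked):
        bucket = appearance_bucket(counts[gold])
        totals[bucket] += 1
        hits[bucket] += int(gold in preds[:k])
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Toy data with made-up PIDs: "A" is frequent, "B" infrequent, "C" unseen.
train_targets = ["A"] * 12 + ["B"] * 2
eval_targets = ["A", "A", "B", "C"]
ranked = [["A", "B"], ["B", "A"], ["B", "C"], ["A", "B"]]
per_bucket = recall_by_appearance(train_targets, eval_targets, ranked, k=1)
print(per_bucket)
```

A retriever that can only rank tables it has seen will score zero on the unseen bucket regardless of k, which is why that bucket is worth reporting separately.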

reliably reply with the title of a common table, struggles for uncommon ones, and is sometimes capable of generating unseen titles with the help of DPR. The full analysis is in Appendix A.6.

Temporal drifts. As explained in Section 3.4, we use a temporal split to test the model’s ability to generalize to future conversations. We observe a significant drop in recall (13%-28.3%) in Figure 4 when we compare the validation and test set performance, even when the models are trained with varying metadata representations. Similarly, T5 achieves low scores on the test split for the response generation task (Table 10). This large gap suggests that trained models struggle to generalize to future conversations. First, we found that this is likely caused by the number of new tables that appear in the test split (145) compared to validation (41), as shown in Table 3. Moreover, the subjects of the conversations have significantly changed: users started to care more about businesses, health and IT, and less about demography, income and pensions. This is likely motivated by real-world events affecting the users, which are more difficult to implicitly capture from simulated environments, but desirable in order to understand a model’s robustness to temporal shift and for real-world applications. In the Appendix, Figure 8 displays the differences [...] spurious correlation with hidden factors). Thus, future models should aim to close the gap between unseen and frequent tables and within the temporal spectrum of infrequent tables.

French results. In both tasks, we see a drop across all metrics for all models, likely due to the smaller dataset size. Some observations remain valid: temporal drift, poor BM25 performance, and augmentations benefit mT5 for certain metrics. However, others differ: adding member items hurts test results and mT5 performs poorly on title accuracy. Modeling details and results can be found in Appendix A.5.

7 Conclusion

In this paper, we introduce the StatCan Dialogue Dataset, a novel corpus consisting of 20K+ English and French conversations between online visitors of statcan.gc.ca and operators of Statistics Canada. Based on this dataset, we propose two tasks centered on helping users find specific data tables: the table retrieval task and the response generation task. For the table retrieval task, we experiment with various DPR and TAPAS variants, finding that DPR strongly outperforms its TAPAS counterpart, as well as the BM25 baseline. For the response
between the training and test splits for all subjects. generation task, we investigate fine-tuned T5-large
models and explore variants where the input is aug-
Generalizing to unseen tables As shown in Fig- mented with table titles from DPR. We find that
ure 3, DPR performs well for tables appearing fre- retrieval-augmented T5 models more frequently re-
quently in the validation split, but poorly in the test turn the correct tables, although its title accuracy is
split, which could be caused by temporal drift. As still lower than the corresponding recall of the DPR
expected, tables that were not seen during training retriever. This result suggests that the generation
resulted in poor recall@5 in either splits. More- models struggle to decide when to return a table.
over, the difference in recall between valid and test We also find that retrieval and generation have dif-
for infrequent tables could be caused by many po- ficulty generalizing to future conversations, as our
tential reasons (learning bias, temporal overfitting, temporal test split revealed a big performance gap
between the validation and test set. All in all, we believe that our tasks pose significant challenges to currently available models and encourage the research community to further explore this dataset and build conversational models that help users of Statistics Canada.

Limitations

Tasks and models limitations The tables in the retrieval task are sourced from statcan.gc.ca, which means that the content is primarily about Canadian demographics² and is professionally edited by StatCan employees. Moreover, the generation task is specifically designed to model responses with high fidelity based on retrieved tables, so this task should not be directly used in an unintended or non-research setting (e.g., deploying a virtual assistant), as it poses risks of hallucination that could negatively impact stakeholders. Furthermore, those limitations can be reflected in the models we trained, so we will share those limitations in the model cards (Mitchell et al., 2019) on release.

² More information can be found here: https://www.statcan.gc.ca/en/subjects-start/population_and_demography

Environment impact We acknowledge the models in Section 5 used hardware with significant energy consumption. We purposefully chose models of reasonable sizes that can be reproduced on one GPU. Additionally, our hardware is powered by renewable energy.

Artifacts and computational experiments We trained models using libraries based on their intended use and we will release the relevant artifacts following the original licenses. The computational details of the experiments are described in Appendix A.4.

Ethics Statement

Privacy and data access As discussed in Section 1, significant efforts were made to remove Personally Identifiable Information (PII). However, we do not rule out the possibility that certain details could have been missed in that process. Thus, any user that wishes to use the data will need to authenticate and accept the terms of use through an institutional data repository; the terms will require the user to report any instance of a PII leak, which will be removed with a dataset update. Additionally, we request that any derivatives or modifications be published in the same data repository with the original terms of use and licenses preserved or extended.

Risk of toxicity in online discourse StatCan agents are trained to work with online users in a professional manner. Moreover, since the users access statcan.gc.ca anonymously and virtually, it is more likely to observe toxic online disinhibition (Lapidot-Lefler and Barak, 2012), which could translate to toxicity in users’ utterances. Thus, we request dataset users to report any instance of toxicity in conversations, which will be reviewed in the same manner as PII leaks.

Acknowledgement

We thank our collaborators at Statistics Canada for providing us with the dataset, guiding us through the technical aspects of the tables, and sharing valuable feedback on the project. We thank Sivan Milton for the helpful discussions on dialogue acts and analysis.

References

Vaibhav Adlakha, Shehzaad Dhuliawala, Kaheer Suleman, Harm de Vries, and Siva Reddy. 2022. TopiOCQA: Open-domain Conversational Question Answering with Topic Switching. Transactions of the Association for Computational Linguistics.

Raviteja Anantha, Svitlana Vakulenko, Zhucheng Tu, Shayne Longpre, Stephen Pulman, and Srinivas Chappidi. 2021. Open-domain question answering goes conversational via question rewriting. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 520–534, Online. Association for Computational Linguistics.

Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2018. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv:1611.09268 [cs].

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan. Association for Computational Linguistics.

Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gašić. 2018. MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5016–5026, Brussels, Belgium. Association for Computational Linguistics.

Harry Bunt, Jan Alexandersson, Jean Carletta, Jae-Woong Choe, Alex Chengyu Fang, Koiti Hasida, Kiyong Lee, Volha Petukhova, Andrei Popescu-Belis, Laurent Romary, Claudia Soria, and David Traum. 2010. Towards an ISO Standard for Dialogue Act Annotation. In Seventh conference on International Language Resources and Evaluation (LREC’10).

Harry Bunt, Volha Petukhova, Emer Gilmartin, Catherine Pelachaud, Alex Fang, Simon Keizer, and Laurent Prevot. 2020. The ISO standard for dialogue act annotation. In 12th Edition of its Language Resources and Evaluation Conference (LREC 2020), pages 549–558. European Language Resources Association (ELRA).

Derek Chen, Howard Chen, Yi Yang, Alexander Lin, and Zhou Yu. 2021. Action-based conversations dataset: A corpus for building more in-depth task-oriented dialogue systems. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3002–3017, Online. Association for Computational Linguistics.

Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training Deep Nets with Sublinear Memory Cost. arXiv:1604.06174 [cs].

Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Yang Wang, and William W. Cohen. 2020. Open Question Answering over Tables and Text. In International Conference on Learning Representations.

Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question Answering in Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2174–2184, Brussels, Belgium. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Martin d’Hoffschmidt, Wacim Belblidia, Quentin Heinrich, Tom Brendlé, and Maxime Vidal. 2020. FQuAD: French Question Answering Dataset. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1193–1208, Online. Association for Computational Linguistics.

Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2018. Wizard of Wikipedia: Knowledge-Powered Conversational Agents. In International Conference on Learning Representations.

Etalab Lab-IA. 2021. DPR CamemBERT trained on French Q&A.

Song Feng, Siva Sankalp Patel, Hui Wan, and Sachindra Joshi. 2021. MultiDoc2Dial: Modeling dialogues grounded in multiple documents. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6162–6176, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Song Feng, Hui Wan, Chulaka Gunasekara, Siva Patel, Sachindra Joshi, and Luis Lastras. 2020. doc2dial: A goal-oriented document-grounded dialogue dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8118–8128, Online. Association for Computational Linguistics.

Timnit Gebru, Jamie H. Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna M. Wallach, Hal Daumé, and Kate Crawford. 2021. Datasheets for datasets. Communications of the ACM, 64:86–92.

Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, and Haifeng Wang. 2018. DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications. In Proceedings of the Workshop on Machine Reading for Question Answering, pages 37–46, Melbourne, Australia. Association for Computational Linguistics.

Charles T. Hemphill, John J. Godfrey, and George R. Doddington. 1990. The ATIS spoken language systems pilot corpus. In Proceedings of the workshop on Speech and Natural Language - HLT ’90, pages 96–101, Hidden Valley, Pennsylvania. Association for Computational Linguistics.

Jonathan Herzig, Thomas Müller, Syrine Krichene, and Julian Eisenschlos. 2021. Open Domain Question Answering over Tables via Dense Retrieval. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 512–519, Online. Association for Computational Linguistics.

Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, and Julian Eisenschlos. 2020. TaPas: Weakly Supervised Table Parsing via Pre-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics,

pages 4320–4333, Online. Association for Computational Linguistics.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431, Valencia, Spain. Association for Computational Linguistics.

Ali Kabbadj. 2021. French-SQuAD : French Machine Reading for Question Answering. Original-date: 2019-09-22T17:54:45Z.

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online. Association for Computational Linguistics.

Rachel Keraron, Guillaume Lancrenon, Mathilde Bras, Frédéric Allary, Gilles Moyse, Thomas Scialom, Edmundo-Pavel Soriano-Morales, and Jacopo Staiano. 2020. Project PIAF: Building a native French question-answering dataset. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5481–5490, Marseille, France. European Language Resources Association.

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural Questions: A Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics, 7:453–466.

Noam Lapidot-Lefler and Azy Barak. 2012. Effects of anonymity, invisibility, and lack of eye-contact on toxic online disinhibition. Computers in Human Behavior, 28(2):434–443.

Harrison Lee, Raghav Gupta, Abhinav Rastogi, Yuan Cao, Bin Zhang, and Yonghui Wu. 2022. SGD-X: A benchmark for robust generalization in schema-guided dialogue systems. In Proceedings of the AAAI Conference on Artificial Intelligence.

Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, et al. 2021. Datasets: A community library for natural language processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 175–184.

Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL ’04, pages 605–es, Barcelona, Spain. Association for Computational Linguistics.

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. In International Conference on Learning Representations.

Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 285–294, Prague, Czech Republic. Association for Computational Linguistics.

Marco Lui and Timothy Baldwin. 2011. Cross-domain Feature Selection for Language Identification. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 553–561, Chiang Mai, Thailand. Asian Federation of Natural Language Processing.

Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric de la Clergerie, Djamé Seddah, and Benoît Sagot. 2020. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7203–7219, Online. Association for Computational Linguistics.

Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pages 220–229, New York, NY, USA. Association for Computing Machinery.

Ankur Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, and Dipanjan Das. 2020. ToTTo: A controlled table-to-text generation dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1173–1186, Online. Association for Computational Linguistics.

Chen Qu, Liu Yang, Cen Chen, Minghui Qiu, W. Bruce Croft, and Mohit Iyyer. 2020. Open-Retrieval Conversational Question Answering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, pages 539–548, New York, NY, USA. Association for Computing Machinery.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text

Transformer. Journal of Machine Learning Research, 21(140):1–67.

Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, and Pranav Khaitan. 2020. Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8689–8696.

Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A Conversational Question Answering Challenge. Transactions of the Association for Computational Linguistics, 7:249–266.

Radim Rehurek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50.

Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends® in Information Retrieval, 3(4):333–389.

Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. 2000. The Earth Mover’s Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2):99–121.

Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, and Sebastian Riedel. 2018. Interpretation of natural language rules in conversational machine reading. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2087–2097, Brussels, Belgium. Association for Computational Linguistics.

Noam Shazeer and Mitchell Stern. 2018. Adafactor: Adaptive learning rates with sublinear memory cost. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4596–4604. PMLR.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.

Ellen M. Voorhees. 2001. The TREC question answering track. Natural Language Engineering, 7(4):361–378.

Harm de Vries, Dzmitry Bahdanau, and Christopher Manning. 2020. Towards Ecologically Valid Research on Language User Interfaces. arXiv:2007.14435 [cs].

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2021. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Online. Association for Computational Linguistics.

Yi Yang, Wen-tau Yih, and Christopher Meek. 2015. WikiQA: A Challenge Dataset for Open-Domain Question Answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2013–2018, Lisbon, Portugal. Association for Computational Linguistics.

Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8413–8426, Online. Association for Computational Linguistics.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating Text Generation with BERT. In International Conference on Learning Representations.

Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, and Steffen Eger. 2019. MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 563–578, Hong Kong, China. Association for Computational Linguistics.

A Appendices

A.1 Complete User Intents

To provide insight into what kind of help is offered by StatCan’s live chat, we qualitatively analyze the conversations and highlight examples of the main user intents below.

Finding a table The main intent of the chat functionality is to help users find specific data tables. For example, one user was looking for the population numbers in certain regions of Montreal for 2012-2016. This intent is the focus of our work.
Obtaining meta-information of table Instead of finding data tables, some users are interested in meta-information of a specific table. For example, one visitor wanted to know when the next Census would be released. Another user was interested in understanding the definition of Workforce Availability (WFA) and Labour Market Availability (LMA).

Manipulating a data table Some users would like to obtain the data tables in a different format or representation. For example, one user was looking at a specific data table and asked if they could see annual instead of monthly values.

Help with user interface Some users are looking for help with the user interface. For example, one user wanted to download a specific data table but they were unable to find the download link.

Out of domain requests We find many conversations that are outside of the scope of StatCan’s live chat. For example, one user asked what documentation needs to be provided to ship a specific product to a foreign country.

The first intent is covered in Table 1, and subsequent intents are in Table 18. While we believe all intents are interesting directions for dialogue research, we focus on the table retrieval intent because (i) there are many conversations available for it and (ii) there is a clear measure of task success, i.e., whether the correct table is retrieved. Throughout the rest of this paper, we work with conversations where the agent returns a table URL.

A.2 Supplementary Table Description

A.2.1 Formatting and size

Full tables are stored in a long format, where each column corresponds to a dimension, except the last one, which corresponds to the value. The number of rows corresponds to the Cartesian product of the dimensions’ members. This means that for $n_i$ members in dimension $i$ across $d$ dimensions, we have a total of $\prod_{i=1}^{d} n_i$ rows in the full table. When a table has many member items, the full table can become extremely large. Moreover, tables shown on statcan.gc.ca are usually a pivoted and filtered view of the full table, which means certain members will become columns, others will become rows, and many are simply omitted.

A.2.2 Detailed Specifications

This section provides supplementary details for Section 3.3.

Product ID (PID) Unique 8 to 10-digit identifier given to each published data table. Although other types of tables might be shared by the agent, the PID will always be given for the official data tables; as a result, any table that does not have a PID in the URL is not considered for this task. The first two digits (1-2) represent the code of the subject associated with the table (this can be found in the basic information), then digits 3-4 represent the product type, which in our case are tables and are commonly coded as “10”. Digits 5-8 are a unique identifier representing that table for the given subject. Digits 9-10 indicate the view of the table and will commonly be “01”, which represents the default view; they are needed when constructing the URL but they are otherwise optional, and are omitted in the released dataset.

Member items Labels for individual tables, which can be represented either as a column or as a row index through a pivot operation. Each table will have member items different from other tables.

Dimensions Non-overlapping sets of member items; each member item must belong to a dimension. For a table with d dimensions, each data value is associated with a single d-tuple in the d-fold Cartesian product of the dimensions, but not every tuple in the product will have an associated data value (if it was not recorded or if the tuple is invalid).

Basic information Metadata consisting of the title, the date range, the frequency, the dimensions, the subject category, and the survey³ from which the data was sourced.

³ The list of surveys and link to detailed information can be found at https://www.statcan.gc.ca/en/survey/list

Footnotes Unstructured comments often included with a table if supplementary details need to be given; usually, those notes are associated with a specific member item that requires more explanation.

Full table For all tables with a PID, the full tables (as a CSV file), their complete metadata, and their basic information are made available as part of the data release and can be used in our proposed tasks. A sample table can be found in Section 3.3.
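As an illustration of the Product ID layout and of the long-format row count described in this appendix, the following Python sketch splits a PID into its documented parts and computes the size of the full Cartesian product of a table's dimensions. The names `ProductID`, `parse_pid` and `full_table_rows` are hypothetical helpers introduced here for illustration; they are not part of the released dataset or of any StatCan tooling. The sketch also assumes that a PID is written either with 8 digits (view omitted) or with 10 digits (view included), and the PID values used below are illustrative only.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProductID:
    """Parts of a PID, following the digit layout described in A.2.2."""
    subject_code: str       # digits 1-2: subject associated with the table
    product_type: str       # digits 3-4: product type ("10" for tables)
    table_number: str       # digits 5-8: identifier within the subject
    view: Optional[str]     # digits 9-10: view ("01" is the default), or None


def parse_pid(pid: str) -> ProductID:
    """Split an 8- or 10-digit PID string into its documented parts.

    Assumption: only lengths 8 and 10 are accepted, since the view
    occupies exactly two digits when it is present.
    """
    digits = pid.strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (8, 10):
        raise ValueError(f"expected an 8- or 10-digit PID, got {pid!r}")
    view = digits[8:10] if len(digits) == 10 else None
    return ProductID(digits[0:2], digits[2:4], digits[4:8], view)


def full_table_rows(members_per_dimension: Sequence[int]) -> int:
    """Number of rows in the full Cartesian product of all dimensions.

    This is an upper bound on the rows that carry a value, because not
    every tuple in the product has an associated data value.
    """
    return prod(members_per_dimension)
```

Under these assumptions, `parse_pid("3610043401")` yields subject code 36, product type 10, table number 0434 and view 01, while `parse_pid("36100434")` yields the same parts with no view. A table with 3, 4 and 5 members across three dimensions gives `full_table_rows([3, 4, 5]) == 60`, which shows how quickly the full table grows as member items are added.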

A.2.3 Table updates and archives

Tables on StatCan will sometimes be updated regularly, whereas at other times they may only be released once. When they stop being updated and the information becomes outdated, they will be marked as archived. In some cases, a new version of an archived table may be created with substantial changes (such as new columns). As shown in Table 14, tables are released without a predetermined schedule, but the number of releases has on average been increasing since 2000, with major peaks in 2017, which was caused by many health-related tables, and 2021, which was caused by the release of many labour, science, and income tables, as shown in Figure 6. Moreover, Figure 5 shows that a majority of tables released between 2019 and 2021 are still up-to-date (current), whereas most of the tables before then have been archived.

A.3 Supplementary Statistics

A.3.1 Frequently requested tables

The most frequent tables are summarized in Table 21. Whereas 2 of them are sourced from the consumer price index survey (commonly used to track inflation), the 4 other tables cover more general and broad subjects like income, demography, business performance and crime. Each of those 4 tables is sourced from a different survey. One table is updated monthly, another semi-annually, and the rest are updated annually. The oldest table was updated in 2019, which is when the conversations started being recorded.

A.3.2 Table Frequency Statistics

From Tables 3 and 12, we can calculate that a table is returned on average 5.68 times (with standard deviation of 12.86) in English conversations and 4.25 times (standard deviation of 9.73) in French conversations. Thus, the most requested tables are disproportionately represented compared to less popular tables, and there’s a very high variance in the number of times a table is used. In fact, there are 294 tables that appear only once in either split (i.e., 28.9%).

A.3.3 Fine-grained Conversation Statistics

In Table 13, we observe that the number of messages and turns will vary significantly around the mean, with over 68% of conversations lasting between 2 and 7 turns. In extreme cases, a conversation can last up to 28 turns. Moreover, we also notice that, although most messages will have around 32 tokens, the longest message can have up to 1374 tokens; in those scenarios, we will see agents write a large body of text, and sometimes also copy and paste large amounts of text (for example, from a database of templates) when responding to the user. Although those are usually sent in multiple consecutive messages within a turn, they may decide to send everything all at once.

A.4 Implementation Details

This section provides the details for implementing the models in Section 5.

Implementing transformer models All models based on the transformer architecture (Vaswani et al., 2017) were implemented using HuggingFace’s library (Wolf et al., 2020).

BM25 To facilitate reproducibility, we implemented the model in Gensim (Rehurek and Sojka, 2010).

DPR and TAPAS-NQ We used the base variant of DPR and the large variant of TAPAS-NQ. We use the DPR checkpoints that were trained on 320K questions from Natural Questions (Kwiatkowski et al., 2019) (NQ). During training, the networks were optimized with AdamW (Loshchilov and Hutter, 2017) at a learning rate of $10^{-5}$ and zero weight decay. Based on the original work, the networks were trained for 30 epochs, with a batch size of 64 queries, positive passages, and hard negative passages (the latter are retrieved with BM25). In-batch negative sampling was used to increase the number of negative examples. To ensure reproducibility, the networks were trained on a single 32GB GPU and used gradient checkpointing (Chen et al., 2016) to reduce memory usage. The conversation length was 512 tokens, and the metadata token lengths were 128 for title, 256 for basic information (defined in Section 3.3), and 512 for the rest.

T5 We used an Adafactor optimizer (Shazeer and Stern, 2018) with a learning rate of 0.001. We used batch sizes of 16 with 8 steps of gradient accumulation and gradient checkpointing to reproduce the batch size of 128 samples in the original implementation. The models were trained on a single 32GB GPU for 10 epochs. The source and target lengths were respectively 512 and 256 tokens, where the source was truncated from the right to ensure that the latest messages remained after truncation. We
used a beam size of 4 and length penalty of 0.6 following the original implementation.

Training time. For the English split, each variant of DPR can be trained in 68 minutes on an A100 GPU. Each large variant of TAPAS-NQ takes 15h to train on a V100 GPU. Each variant of T5 can be trained in 16h on a V100 GPU. All DPR results can be reproduced in 8h, TAPAS in 90h, and T5 in 96h, and proportionally less time would be needed for the French split.

A.5 Modeling the French subsets

Basic statistics. The number of messages by conversation varies between 2 and 59 with a median of 11 for the French split (see Figure 7 for the distribution). Based on Table 12, there are on average 3.9 turns but 12.3 messages. On average, there are over 30 tokens for each message (using the T5 tokenizer).

Language splitting. In order to determine the language of each conversation, we used two popular language identification libraries: langid.py (Lui and Baldwin, 2011) and a fastText network fine-tuned for language detection (Joulin et al., 2017). After applying the models on every conversation, we only retain the conversations with matching language labels (both English or both French).

Training and evaluation. The training procedure and evaluation on the French subsets follow exactly the tasks specified in Section 4.

Modeling response generation. Instead of T5, we used the multilingual T5 model by Xue et al. (2021) as it naturally handles text in French.

Modeling retrieval. We used a variant of DPR derived from CamemBERT (Martin et al., 2020) and trained on three French Q&A datasets (Keraron et al., 2020; d'Hoffschmidt et al., 2020; Kabbadj, 2021) by Etalab Lab-IA (2021).

Retrieval results. In Table 17, we observe that, unlike the English split, adding member items to the basic information or to the title improves validation results but not test results, which likely indicates overfitting. However, we notice a high variance between the runs, which makes it difficult to determine whether member items are helpful. Both overfitting and high variance are likely caused by the smaller size of the training set. Moreover, BM25 performs extremely poorly on any metadata view, which can also be linked to the dataset size.

Generation results. In Table 33, we notice a significant decrease across all metrics, with the title accuracy being consistently 0%. This is likely because the French split is significantly smaller, yet remains as complex as the English split, which becomes challenging for mT5 to model. In the case of title accuracy, we found 55 instances in the French test split where the title is in the target text (i.e., returned by an agent). However, in 54 cases, the augmented mT5 returned a generic reply (e.g., "Veuillez patienter pendant que j'effectue une recherche.", that is, "Please wait while I perform a search.") instead of the expected title, which indicates that mT5 is incapable of determining when it is relevant to return a title and cannot generate non-templated responses.

A.6 Responses generated by T5

In this section, we select a few conversations from the validation set and examine the messages generated by T5 and T5 augmented with DPR-retrieved titles (T5+D).

Common and uncommon responses. In Table 23, we notice that both T5 and T5+D are capable of generating common speech acts like "Thank you" and "Please wait...", but struggle when faced with an unfamiliar situation (having to ask for clarification for a user that has been accidentally disconnected).

Common table. In Table 24, among the retrieved tables (Ri) titles, the first one was partially correct. Both T5 and T5+D extended the title and also output the desired ID, matching the expected agent's response, which is one of the most popular tables in the training set (see Table 21).

Multiple tables, date selection. In Table 28, we notice that T5 only returns one of the two tables that the agent returned. On the other hand, the correct tables were retrieved by DPR, but T5+D failed to select the ones with the correct dates (it selected June 2019 instead of December 2019), although the selected tables were otherwise relevant.

Verbosity of explanations. T5+D additionally provided a paragraph of explanation while linking to relevant resources, both on the StatCan website (non-tabular) and external resources. This is because T5+D memorized this information during training, and simply replaced the tables with the updated dates (Table 27).
Uncommon tables. In the conversations shown in Tables 25, 26, 29, the retrieved table appears 10 times in the training set, which is significantly less common than the table retrieved in Table 24. For Table 25, the table returned by the agent was not retrieved by DPR, leading to T5+D returning the first title retrieved. However, in Table 26, the correct title was retrieved by DPR (title #4), yet T5+D failed to use that correct title in the generated message. As for Table 29, the agent gave a hint by stating "As a standard product, we have tables about employment by industry", which was correctly acknowledged by DPR as the second retrieved title perfectly matches the PID of the table in the agent's response. However, T5+D fails again at selecting the correct title, instead opting to return a generic response ("Please hold while I find the information"), and T5 hallucinates a PID that is different from the title it generated (both of which are wrong).

Tables unseen during training. Among the tables that do not appear in the training set (see Table 11 for more information), there are seven that appear 3 or more times in the validation or test sets (Table 32). In Table 30, we see a conversation where DPR retrieves the correct title, which is correctly returned by T5+D, whereas T5 fails to return it. On the other hand, when DPR also correctly retrieves the title in Table 31, T5+D fails to return it, as it was likely misled by the agent saying "Unfortunately,...".

A.7 Statistics Tests

To back the claims in Section 6, we performed multiple single-tailed Welch t-tests, using the mean and corrected standard deviation from Table 16. The null hypothesis of each test is that the mean of experiment A equals the mean of experiment B, estimated across 3 runs. Unless otherwise specified, we use R@1 on the test split.

1. Claim: Adding basic information and member items to title results in a significant difference for DPR. With A being the model using only title, and B using basic + member, our p-value is 0.014.

2. Claim: For DPR, using only member items results in a drastic decrease. With A being the model using only member and B using title + member, our p-value is 0.00086.

3. Claim: TAPAS-NQ performs better with title and member items compared to the full table. With A being the model using the full table, and B using title + member, our p-value is 0.016.

4. Claim: In Figure 4, the validation recall@10 scores are higher than those of the test split for TAPAS, DPR Title and DPR Basic + member. With A being the validation scores and B the test scores, the p-values are respectively 0.00197, 2.18 × 10^-5, and 0.00014.
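A single-tailed Welch t-test of the kind used in this section can be computed from summary statistics alone. The sketch below is not the authors' script: it is a minimal standard-library illustration that takes the DPR test R@1 means and standard deviations for Title and Basic + member from Table 16, assumes 3 runs per experiment, and integrates the Student t density numerically instead of calling a statistics package. The function names welch_t and t_upper_tail are hypothetical.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic for B > A and Welch-Satterthwaite degrees of freedom."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_b - mean_a) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for a Student t variable, via Simpson's rule on [0, |t|]."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Table 16, DPR, test R@1: Title (A) vs. Basic + member (B), assuming 3 runs each.
t_stat, df = welch_t(13.9, 0.59, 3, 15.7, 0.70, 3)
p_value = t_upper_tail(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df:.2f}, one-tailed p = {p_value:.4f}")
```

The printed p-value can be compared against the value reported for the first claim. Small discrepancies are expected because the table entries are rounded.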
B Dataset Card

This section presents a dataset card that follows the format proposed by Lhoest et al. (2021), which was inspired by Mitchell et al. (2019) and Gebru et al. (2021).

Summary. The StatCan Dialogue Dataset consists of over 20K conversations between agents working at Statistics Canada (StatCan) and users who are visiting StatCan's website and need support via the official live chat system.

Tasks. A subset of 19K conversation turns is used to build two tasks:

1. Automatic retrieval of relevant tables based on an ongoing conversation. For each partial conversation, the task is to return the ID of the most likely table returned by an agent. This is evaluated using the recall@k metric.

2. Automatic generation of appropriate agent responses at each turn. For each partial conversation, the task is to return the most likely response by an agent, including a link to a relevant table. This is evaluated using four metrics described in Section 4.2.

Leaderboard. The leaderboard and submission instructions can be found on the project webpage. Each submission will be accompanied by a tag indicating if:

• It was self-reported;
• The submissions were externally evaluated;
• The inference was reproduced following provided instructions;
• The complete training process was independently reproduced.

Languages. The conversations were held in Canadian English (en-CA) and Canadian French (fr-CA).

B.1 Dataset Structure

B.1.1 Data Instances

Conversation. A full example of a conversation can be found in Table 19. Instances for each user intent can be found in Table 18, and two conversations with annotated dialogue acts can be found in Table 22. For our case study in A.6, we show partial conversations in Tables 24, 25, 26, 27, 28, 29, 30, 31.

Tables. The complete metadata of a table can be found in Table 20, which can be accessed at doi.org/10.25318/3210035401-eng. Table 21 shows the basic information for the most popular tables.

B.1.2 Data Fields

Full dataset. A CSV file with the following fields is provided:

• conversation: The partial conversation (before a table is returned) in JSON format.
• conversation_index: A unique index that serves to identify the conversation outside of this task.
• conversation_processed: The conversation converted into a readable text format, with extra information (such as timestamp) removed, the URLs replaced with a special tag, and separation tags (</s>) added.

Retrieval task. CSV files with the following fields are provided for each split:

• conversation
• conversation_index
• conversation_processed
• target_pid: The product ID of the table that is returned by the agent
• language: The language reported by the live chat system, which may not always be accurate due to mislabeling.
• ft_detected_lang: The language predicted by fastText.
• ft_detected_lang: The score output by fastText.
• lid_detected_lang: The language predicted by langid.py.
• lid_detected_prob: The score output by langid.py.

Metadata. The metadata that was used during retrieval is provided as a CSV file with the following fields (one for each of the 5907 tables):

• pid: The product ID of the table
• title: The title of the table
• basic_info: The basic information in the textual format
• member_info: The member items enumerated as text
• footnote_info: The footnotes enumerated as text
• full_info: The basic information, member items and footnotes all in a single entry
• x_and_y: A combination of two items above, for example x=title and y=footnotes.
• *_fr: All of the above are also available in French, indicated by the suffix _fr.

Generation task. CSV files with the following fields are provided for each split:

• source: Equivalent to conversation.
• source_processed: Equivalent to conversation_processed.
• target: The message written by the agent following the conversation.
• target_processed: The message written by the agent following the conversation, with URLs replaced with a special tag.
• conversation_index

An augmented variant of each CSV file for the conversation task is provided with all of the above as well as the following fields:

• source_augmented: The same content as conversation, appended with the titles of the top-5 tables retrieved by the best DPR variant trained on the basic information.
• target_augmented_1: The same content as source_augmented, but using only the first table instead of top-5.

Conversation JSON formatting. A conversation follows the following JSON format:

    [
      ...
      {'timestamp': '13.03.2019 17:03:22',
       'actor': 'user',
       'actor_name': '<NAME>',
       'text': "I'll take a look at that",
       'urls': []},
      {'timestamp': '13.03.2019 17:04:12',
       'actor': 'operator',
       'actor_name': 'Kelly C',
       'text': 'Building permits...',
       'urls': ['https:...']},
      ...
    ]

B.1.3 Data Split

The retrieval splits have the following number of samples:

• Train: 3782 (en); 869 (fr)
• Validation: 799 (en); 201 (fr)
• Test: 870 (en); 141 (fr)

They correspond to the number of queries in Table 3 and Table 12 because each query results in a table being retrieved, which can happen more than once in a conversation. In such cases, the partial conversations will be truncated at different turns in the conversation.

The generation splits have the following number of samples:

• Train: 21582 (en); 3977 (fr)
• Validation: 4464 (en); 861 (fr)
• Test: 4850 (en); 884 (fr)

There are fewer samples than the number of messages in Table 3 and Table 12 because the goal of the task is only to predict the messages that will be written by the agent. Just like the retrieval task, the partial conversations will be truncated at different turns in the conversation.

B.2 Dataset Creation

B.2.1 Curation Rationale

Section 1 extensively motivates the curation of the dataset. To summarize, we enumerate the major points:

• Data from real users: We wanted a dataset that captures the linguistic challenges that exist in the real world.
• Task-oriented dialogue: We wanted tasks with the specific goal of helping live chat users in their search of statistics.
• Real-world applications: Our model can be directly applied to other statistics offices that want to set up a chat system, and our dataset will be useful for any organization that has (1) a chat system, and (2) a database of tables.
• Multilingual dialogue: We wanted to build models that can handle languages beyond English, which is why we also offer French versions of our tasks.

B.2.2 Source Data

Conversations. The data was retrieved from the live chat system on statcan.gc.ca, which was anonymized by the development team at StatCan. The conversations happened between March 2019 and March 2021.

Tables. The tables are publicly available and were downloaded following the instructions in the Web Data Service user guide: statcan.gc.ca/en/developers/wds/user-guide. The tables were released by Statistics Canada between 2000 and 2021. The data are either collected directly by Statistics Canada (e.g., through a census or a survey) or were compiled from existing sources (such as private sector organizations and government agencies) into official statistics. Existing sources include:

• Administrative data: Collected by government or the private sector as part of ongoing operations, and include records of birth and death, taxes, border control, and satellite data.
• Microdata Linkage: Existing information is linked to create new data. The existing information may not always be available publicly (for privacy purposes), thus linkage could add new information that was previously unavailable, while protecting the confidentiality of the public.
• Open data: Machine-readable and freely available data sourced from various channels (e.g., OpenStreetMap).
• Web scraping: Data from the internet that were scraped by Statistics Canada (this excludes personal information and "any information that will not be used to produce statistical output").

B.2.3 Annotations

The dataset does not contain any additional annotations beyond the ones collected through the live chat system and included in the table metadata.

B.2.4 Personal and Sensitive Information

Personal and sensitive information was removed programmatically from the conversations, and officially published tables only contain aggregated information that preserves the confidentiality of the participants. Although the removal process is highly advanced, there is a non-zero chance that some information can be used to reconstruct the profile of a user. For this reason, access to the data will require researchers to sign up and agree to the terms of use, and any derivative must be shared on the same platform and include the same terms.

B.3 Considerations for Using the Data

B.3.1 Social Impact of Dataset

The purpose of this dataset is to develop and evaluate models that can assist knowledge workers in finding relevant tables from a data source. By providing a specialized retrieval system capable of returning more relevant results than general-purpose systems, the productivity of the knowledge workers can be increased. For public agencies and statistics offices, this would benefit many live chat users interested in statistics related to a certain community.

B.3.2 Discussion of Biases

As discussed in Section 7, there are always risks of toxicity in online discourses, which means that the live users may exhibit negative biases in their messages. However, the StatCan agents are tasked to communicate with online users in a professional manner. Thus, researchers should not use this dataset to build models that generate messages written by live users, and any model trained on the dataset should not be used in scenarios where biases can negatively impact stakeholders.

B.3.3 Other Known Limitations

Section 7 describes other known limitations.

B.4 Additional Information

B.4.1 Dataset Curators

The dataset was curated by the authors of this paper based on the original data collected and processed by StatCan developers and agents.

B.4.2 Licensing Information

The conversations use a custom license, which needs to be accepted by researchers interested in accessing the conversations. The tables are released under the Statistics Canada Open Licence: statcan.gc.ca/en/reference/licence.
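As an illustration of the conversation format listed under B.1.2, the sketch below decodes a conversation and parses its timestamps. It is a minimal example under stated assumptions, not the released loading code: it assumes the conversation field holds valid JSON with double quotes, that timestamps follow the day.month.year pattern shown in the example, and the helper name parse_conversation is hypothetical. The sample messages reuse the abridged values from the dataset card.

```python
import json
from datetime import datetime

# Sample built from the abridged example in B.1.2 (assumed to be stored as valid JSON).
raw = json.dumps([
    {"timestamp": "13.03.2019 17:03:22", "actor": "user",
     "actor_name": "<NAME>", "text": "I'll take a look at that", "urls": []},
    {"timestamp": "13.03.2019 17:04:12", "actor": "operator",
     "actor_name": "Kelly C", "text": "Building permits...", "urls": ["https:..."]},
])

def parse_conversation(raw_json):
    """Decode a conversation and attach a parsed datetime to every message."""
    messages = json.loads(raw_json)
    for message in messages:
        # Assumed format: day.month.year hour:minute:second.
        message["time"] = datetime.strptime(message["timestamp"], "%d.%m.%Y %H:%M:%S")
    return messages

messages = parse_conversation(raw)
actors = [m["actor"] for m in messages]
delay_seconds = (messages[1]["time"] - messages[0]["time"]).total_seconds()
print(actors, delay_seconds)
```

Keeping the parsed datetime next to the raw string leaves the original field untouched, which is convenient when the same records are later serialized back to text.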

[Figure: bar chart of the number of tables released (y-axis) per year from 2000 to 2020 (x-axis), colored by status (Archived, Current).]

Figure 5: The release year of all tables available on statcan.gc.ca

[Figure: bar chart of the number of tables released (y-axis) per year from 2000 to 2020 (x-axis), colored by subject: Other; Income, pensions, spending and...; Health; Labour; Business and consumer services...; Science and technology; Agriculture and food; Economic accounts.]

Figure 6: The release year of all tables by subject. Only the top 8 subjects are shown for readability.

[Figure: histogram of the number of messages per conversation (x-axis: messages; y-axis: percent), colored by split (train, valid, test).]

Figure 7: Histogram of messages by conversation in the French task splits. English split in Figure 1.

[Figure: horizontal bar chart of the percentage of tables (x-axis: percent, 0 to 15) per subject category, colored by split (train, valid, test). Categories, from top to bottom: Labour; Health; Income, pensions, spending and wealth; Business performance and ownership; Population and demography; Economic accounts; Prices and price indexes; Crime and justice; Retail and wholesale; Agriculture; Education, training and learning; Construction; Environment; Government; International trade; Information and communications technology; Transportation; Energy; Travel and tourism; Business and consumer services and culture; Science and technology; Aboriginal peoples; Housing; Manufacturing; Society and community; Immigration and ethnocultural diversity; Families and households; Languages; Children and youth.]

Figure 8: Distribution of subject categories for tables in the retrieval task, colored by the split.

(a) Differences
            Train   Valid   Test   Overall
Train           0     470    535         0
Valid          41       0    199         0
Test          145     238      0         0
Overall       181     610    571         0

(b) Overlaps
            Train   Valid   Test   Overall
Train         778     308    243       778
Valid         308     349    150       349
Test          243     150    388       388
Overall       778     349    388       959

(c) Differences (French)
            Train   Valid   Test   Overall
Train           0     154    184         0
Valid          18       0     64         0
Test           44      60      0         0
Overall        61     197    201         0

(d) Overlaps (French)
            Train   Valid   Test   Overall
Train         224      70     40       224
Valid          70      88     24        88
Test           40      24     84        84
Overall       224      88     84       285

Table 11: Number of tables (a,c) differing and (b,d) overlapping between each split (subset used for both tasks). The difference is computed as row - column. Summarized results in Table 3.

Dataset              Train    Valid    Test     All
# Conv.                562      122     109     793
# Turns               2147      472     439    3058
# Messages            6807     1492    1389    9688
# Queries              869      201     141    1211
# Tokens / Msg       47.25    49.86   40.14   46.63
# Turns / Conv.       3.82     3.87    4.03    3.86
# Msg / Conv.        12.11    12.23   12.74   12.22
# Queries / Conv.     1.55     1.65    1.31    1.53
# Tables               224       88      84     285
# New tables             0       18      44      61
# Dims / Table         3.4      3.5     3.5     3.4
# Mbrs / Table       205.8    249.9   164.1     190
# Notes / Table       20.3     26.3    23.4      21

Table 12: Statistics for the retrieval and generation tasks for the French split. See Table 3 for the English split.

Dataset              Train    Valid    Test     All
English
Max Tokens / Msg      1243      857     912    1243
Max Turns / Conv.       28       18      21      28
Max Msg / Conv.         78       58      73      78
Max Qs / Conv.          15       11      11      15
Std Tokens / Msg         1        1       1       1
Std Turns / Conv.     2.58     2.36    2.62    2.56
Std Msg / Conv.       8.53     7.55    8.98    8.47
Std Qs / Conv.        1.27     1.24    1.18    1.25
French
Max Tokens / Msg      1374     1373     352    1374
Max Turns / Conv.       15       10      15      15
Max Msg / Conv.         59       38      47      59
Max Qs / Conv.          15       15       5      15
Std Tokens / Msg         1        2       2       1
Std Turns / Conv.     2.21     2.08     2.1    2.17
Std Msg / Conv.       7.02     6.74    6.61    6.91
Std Qs / Conv.        1.84     2.18    0.77    1.79

Table 13: More statistics (max and standard deviation) at the conversation level (top: English, bottom: French), following Table 3.

Year Released   # Archived   # Up-to-date
2000                   145              0
2001                    39              0
2002                    14              5
2003                     8              0
2004                    31              2
2005                     1              0
2006                    18              0
2007                   142             11
2008                    85              7
2009                    72              5
2010                   243              8
2011                    45              3
2012                   248              8
2013                    49             40
2014                   142            145
2015                   247            264
2016                    52            134
2017                   710            136
2018                   148            122
2019                    27            203
2020                    43            640
2021                    52           1633

Table 14: Release of StatCan tables over the years


Merged Acts Count Original Speech Acts
answer 50 answer, confirm, disconfirm
request 31 request, acceptOffer, declineOffer
timeManagement 28 stalling, pausing
inform 63 inform, agreement, disagreement, correction
infoSeekingQuestions 25 propositionalQuestion, checkQuestion, setQuestion, choiceQuestion, directQuestion
promise 18 promise, acceptRequest, declineRequest
autoFeedback 25 positiveAutoFeedback, negativeAutoFeedback
offer 11 offer
instruct 18 instruct
clarificationQuestion 16 clarificationQuestion
greeting 28 initialGreeting, returnGreeting
selfIntroduction 17 initialSelfIntroduction, returnSelfIntroduction
thanking 47 thanking
acceptThanking 15 acceptThanking

Table 15: Merged and original speech acts occurring in 100 turns in conversations from the validation set. This
table is summarized in Table 4.

Valid Test
Model Encoded R@1 R@10 R@20 R@1 R@10 R@20
DPR Basic 37.8 ± 1.82 73.1 ± 1.02 80.8 ± 0.51 14.7 ± 1.13 45.0 ± 0.24 55.0 ± 1.67
Basic + footnote 40.0 ± 0.64 73.3 ± 0.85 81.3 ± 1.19 13.9 ± 0.37 44.4 ± 2.73 54.2 ± 2.65
Basic + member 40.0 ± 0.44 74.5 ± 1.38 81.6 ± 0.55 15.7 ± 0.7 46.2 ± 0.44 56.3 ± 0.58
Member 28.0 ± 0.47 62.1 ± 1.07 71.4 ± 1.06 10.7 ± 0.88 35.0 ± 0.57 46.3 ± 0.83
Title 37.8 ± 0.21 70.9 ± 0.67 78.2 ± 1.18 13.9 ± 0.59 43.8 ± 1.22 53.4 ± 1.42
Title + footnote 39.5 ± 0.69 73.2 ± 0.81 80.8 ± 0.47 13.3 ± 0.7 43.6 ± 0.46 53.6 ± 1.5
Title + member 38.6 ± 1.19 72.1 ± 0.92 79.4 ± 0.72 14.3 ± 0.59 45.1 ± 0.44 54.2 ± 1.07
TAPAS Table 16.9 ± 4.26 45.9 ± 5.46 55.5 ± 3.13 3.9 ± 0.65 17.2 ± 2.83 24.6 ± 3.45
Title 21.5 ± 0.33 54.7 ± 3.39 63.7 ± 1.38 4.7 ± 0.53 20.2 ± 0.85 28.7 ± 1.98
Title + member 24.6 ± 3.6 54.9 ± 3.34 63.7 ± 2.74 6.1 ± 0.98 22.1 ± 2.66 31.5 ± 3.02
TAPAS-NQ Table 25.3 ± 3.46 60.0 ± 2.43 70.0 ± 2.89 5.3 ± 0.8 22.8 ± 2.32 32.0 ± 2.48
Title 26.5 ± 1.85 64.4 ± 1.95 71.6 ± 2.32 6.3 ± 0.64 27.4 ± 1.84 36.1 ± 2.53
Title + member 29.8 ± 2.74 62.9 ± 1.35 71.5 ± 0.63 7.4 ± 0.73 30.0 ± 1.46 39.3 ± 0.86
BM25 Basic 1.0 7.5 10.3 1.0 7.2 9.8
Basic + footnote 0.9 2.4 3.1 0.9 2.8 3.9
Basic + member 0.1 1.3 2.6 0.2 2.4 3.8
Full 0.3 1.3 1.9 0.2 1.1 2.0
Title 2.0 10.3 13.8 2.5 10.9 14.0
Title + footnote 0.8 2.3 3.1 0.7 2.9 4.0
Title + member 0.0 1.4 2.4 0.3 2.3 3.8

Table 16: Full retrieval results for the English splits. The values reported are in recall % at k. DPR and TAPAS were
run 3 times and averaged (standard deviation given after ±). Selected results in Table 8.

Valid Test
Model Encoded R@1 R@10 R@20 R@1 R@10 R@20
DPR Basic 19.7 ± 1.52 57.4 ± 2.55 67.0 ± 5.01 12.5 ± 1.78 40.0 ± 1.08 48.5 ± 4.27
Basic + footnote 23.2 ± 3.8 55.4 ± 1.88 66.8 ± 1.04 13.9 ± 1.48 35.9 ± 0.82 49.9 ± 3.91
Basic + member 22.7 ± 4.35 57.9 ± 2.45 67.7 ± 3.48 12.8 ± 2.13 38.1 ± 2.17 47.0 ± 1.78
Title 23.7 ± 2.92 58.5 ± 2.24 69.8 ± 2.35 14.7 ± 2.28 35.7 ± 3.2 48.2 ± 2.84
Title + footnote 23.2 ± 2.74 54.7 ± 3.59 64.7 ± 2.28 14.4 ± 1.48 36.6 ± 2.69 45.4 ± 2.13
Title + member 24.2 ± 1.52 55.4 ± 4.51 65.3 ± 2.01 14.4 ± 2.28 39.5 ± 0.82 48.0 ± 2.05
BM25 Basic 0.0 0.0 0.0 0.0 0.0 0.0
Basic + footnote 0.0 0.0 0.0 0.0 0.0 0.0
Basic + member 0.5 1.0 1.0 0.0 0.7 0.7
Full 0.0 0.0 0.5 0.0 0.0 0.0
Title 0.0 0.5 0.5 0.0 0.0 0.7
Title + footnote 0.0 0.0 0.0 0.0 0.0 0.0
Title + member 0.0 0.5 1.0 0.0 0.7 0.7

Table 17: Full retrieval results for the French splits. The values reported are in recall % at k ∈ {1, 10, 20}. DPR
was run 3 times and averaged (standard deviation given after ±).
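The recall % at k values in Tables 16 and 17 can be read as the share of queries whose target table appears among the top k retrieved tables. The sketch below is an illustrative implementation under that reading, assuming one target PID per query as in the target_pid field; it is not the authors' evaluation script. The function name recall_at_k and the example rankings are hypothetical, and the third PID is made up for the example.

```python
def recall_at_k(ranked_pids, target_pids, k):
    """Percentage of queries whose target PID is among the top-k retrieved PIDs."""
    if len(ranked_pids) != len(target_pids):
        raise ValueError("one ranking is required per query")
    hits = sum(
        1 for ranking, target in zip(ranked_pids, target_pids)
        if target in ranking[:k]
    )
    return 100.0 * hits / len(target_pids)

# Two hypothetical queries that both target PID 18100004.
rankings = [
    [18100004, 11100222, 14100287],  # target ranked first
    [11100222, 14100287, 18100004],  # target ranked third
]
targets = [18100004, 18100004]

r_at_1 = recall_at_k(rankings, targets, k=1)
r_at_3 = recall_at_k(rankings, targets, k=3)
print(r_at_1, r_at_3)
```

With a single target per query, this quantity coincides with the hit rate at k, which is why it can only increase as k grows.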

Obtaining meta-information of table
U: Hi I have a question on the date of new census
A: Hello my name is Graham, how may I help you?
U: Hi Graham
U: I am currently looking at : Census of Canada and National Household Survey
U: I see that the last census were made in 2011 and 2016
U: should I expect a new census in the coming month in 2020 ?
A: no - the next Census is going to be taking place in May 2021
A: it’s every 5 years
U: Ok perfect thank you
A: you’re welcome! have a good day
Help with user interface
U: Hello Graham - is there way I can see total annual values instead of monthly?
U: for - https://www150.statcan.gc.ca/t1/tbl1/en/cv!recreate.action?...
A: please hold - I will see if a table like this with an annual frequency is available
U: wonderful thank you
A: unfortunately no, the only active table on construction investment is the monthly one I sent you - there is an inactive
quarterly table Investment in Building Construction (https://www150.statcan.gc.ca/n1/en/surveys/5014)
A: it is also possible to download the monthly data in .CSV form, add together the months in a spreadsheet to get
annual totals
A: or if you’d prefer, you can make a custom request to our analysts (a fee may apply).Please send us your request,
clearly indicating the variables, geographic areas, time frame and data frequency (i.e., annually, monthly or quarterly)
needed at infostats@canada.ca (mailto:infostats@canada.ca) and refer to Case number: 986005 so we can access the
transcript of this live chat. Your request must include your name, address, postal code, and phone number (please do
not share your personal information on the live chat.)An agent will get in touch with you to provide an estimate and
proceed with the payment, if applicable.You may also submit your request by telephone at 1-800-263-1136.
U: thats fine! thank you very much :)
U: very helpful
U: :)!
U: have a good day!
A: glad I could help! you too
Help with user interface
U: Hello, Do you know where I can download the dataset for General Social Survey Cycle 32?
A: Please wait while I try to find this information for you.
U: Thank you
U: (I’m referring to this one: https://www23.statcan.gc.ca/imdb/p2SV.pl?... )
A: General Social Survey: Caregiving and Care Receiving, Public Use Microdata File (https://www150.statcan.gc.ca/...)
U: I’m sorry, but im having trouble finding the download link for cycle 32 (2018)
A: you have to click on ’More information’ and that will bring you to the order form
U: ohhhh, I see
U: thank you for letting me know
U: thank you for your help!
A: you’re welcome - happy to help!
Out of domain requests
U: I’m wondering how to obtain a copy of an autopsy that’s been done for a family member. Doctor office says they
don’t get the autopsy report.
A: Statistics Canada does not issue certificates for birth, death or marriage. Civil registration is a provincial jurisdiction.
Therefore, you must contact the government ministry, registry or vital statistics office of the province or territory
in which the birth, death or marriage occurred. To obtain the contact information for your provincial or territorial
government, please call Service Canada at 1-800-O-Canada (1-800-622-6232) or visit the following page: Vital
Statistics contacts (https://www.statcan.gc.ca/eng/health...) .
A: also: Coroner’s report (https://spvm.qc.ca/en/...)

Table 18: Examples of user intents described in Section A.1.

U: Hi, I’m looking to obtain quarterly data in regards to GDP grow (Canada), BC Housing STarts, Canada Oil Price/BBL
A: Hello, my name is Kelly C. Give me one moment as I search

U: thanks!
A: For GDP growth rates, please consult the following link: Gross domestic product (GDP) at basic prices, by industry,
monthly, growth rates
A: I’ll continue searching for the rest. One moment
A: What do you mean by BC Housing Starts?

U: I’m required to research all of the housing starts for BC on a quarterly basis
A: Define "starts".

U: unit basis
U: Housing starts are the number of new residential construction projects that have begun during any particular month
A: Perfect. Give me one moment.
A: I’m still searching. Sorry about the wait.

U: no worries
A: I would have monthly data regarding new building permits being issued.

U: I’ll take a look at that

A: Building permits, by type of structure and type of work (https://www150.statcan.gc.ca/t1/...)
A: Do play with the variables to get what you are looking for.

U: thank you
A: To view/ manipulate the variables available in a data table: • Click on "Add/Remove data" • Select at least one variable in
each tab and click on “Apply” in order to view your customized table To download the data: • Click on “Download” •
Select one of the three output formats then click on the hyperlink and save the table The following video may also be
helpful to you: How to use the data tables (https://www.statcan.gc.ca/eng/sc/video/howto) .
A: I’ll have a look for oil prices. One moment.

U: Do you also have data to Canada’as oil Price/BBL ("WTI")?

U: okay thanks!
A: Are you looking for the retail prices of oil?
A: If so, I found some data for smaller geographies.
A: Monthly average retail prices for gasoline and fuel oil, by geography (https://www150.statcan.gc.ca/t1/...)

U: yes
A: Would those geographies be enough?
A: Or are you looking for Canada only?

U: hmm
U: I would need something that pertains more to all of canada
A: What about this? Monthly average retail prices for food and other selected products (https://www150.statcan.gc.ca/t1/...)
A: Is there anything else I can do for you?

U: I thinks that’s all

U: I
U: i’ll try and use this thanks!
A: No problem. Have a good one!

Table 19: Full Example (Sample #42) taken from the conversations dataset. URLs were updated to link to default
view (rather than filtered). See Table 1 for truncated conversation.

Title: Production and value of maple products
Date range: 1924-01-01 to 2020-01-01
Dimensions: Geography, Maple products
Subject: Agriculture
Survey: Maple Products
Frequency: Annual
Geography:
ID: 1, Parent: None, Name: Canada
ID: 2, Parent: 1, Name: Nova Scotia
ID: 3, Parent: 1, Name: New Brunswick
ID: 4, Parent: 1, Name: Quebec
ID: 5, Parent: 1, Name: Ontario
Maple products:
ID: 9, Parent: None, Name: Maple products expressed as syrup, total
ID: 10, Parent: None, Name: Gross value of maple products
ID: 1, Parent: None, Name: Production of maple sugar
ID: 2, Parent: None, Name: Value of maple sugar
ID: 3, Parent: None, Name: Production of maple syrup
ID: 4, Parent: None, Name: Value of maple syrup
ID: 5, Parent: None, Name: Production of maple taffy
ID: 6, Parent: None, Name: Value of maple taffy
ID: 7, Parent: None, Name: Production of maple butter
ID: 8, Parent: None, Name: Value of maple butter
ID: 0, Note: Conversion factors: 1 gallon of syrup equals 10.0 pounds of maple sugar. One gallon of syrup weighs
13.24760 pounds. One gallon of syrup equals 10.4 pounds of taffy. Maple taffy is reported by Quebec and Nova
Scotia only and commenced reporting in 1965 and 1983 respectively. The conversion of maple taffy to syrup varies
with the density of syrup that year.
ID: 0, Note: Commercial production and value figures exclude inventory.
ID: 2, Note: Estimates produced by Nova Scotia horticulture industry specialists.
ID: 3, Note: Beginning in 1986, survey data for New Brunswick and Ontario are weighted using the number of taps
made on maple trees in the spring of the year of the most recently completed census of agriculture.
ID: 4, Note: Estimates produced by l’Institut de la statistique du Québec, Groupe AGÉCO and Fédération des
producteurs acéricoles du Québec.
ID: 5, Note: Beginning in 1986, survey data for New Brunswick and Ontario are weighted using the number of taps
made on maple trees in the spring of the year of the most recently completed census of agriculture.
ID: 9, Note: “These products are represented by North American Product Classification System (NAPCS) Canada
2017 category 115136 – Maple syrup and other maple products. Data collected for maple butter, maple taffy and
maple sugar are converted into a maple syrup equivalent and are included in this category.”
ID: 10, Note: “These products are represented by North American Product Classification System (NAPCS) Canada
2017 category 115136 – Maple syrup and other maple products. Data collected for maple butter, maple taffy and
maple sugar are converted into a maple syrup equivalent and are included in this category.”
ID: 0, Note: “These products are represented by North American Product Classification System (NAPCS) Canada
2017 category 115136 – Maple syrup and other maple products. Data collected for maple butter, maple taffy and
maple sugar are converted into a maple syrup equivalent and are included in this category.”

Table 20: Full metadata of table in Section 3.3. Single lines were added to delimit the scope: Title, basic information,
item names, and footnotes. The double line was added to delimit the truncation limit of the DPR and TAPAS model
(512 tokens). The basic information is presented in Table 5.
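
The truncation described in the caption can be illustrated with a minimal sketch. It assumes the metadata is flattened in the order title, basic information, item names, footnotes, and it approximates tokens with whitespace-separated words. DPR and TAPAS use subword tokenizers, so the real cut-off point will differ. The function names and the max_tokens parameter are hypothetical and do not come from the released code.

```python
def flatten_metadata(title, basic_info, item_names, footnotes):
    """Join the four metadata scopes in order: title, basic info, items, footnotes."""
    parts = [title]
    parts += [f"{key}: {value}" for key, value in basic_info.items()]
    parts += item_names
    parts += footnotes
    return " ".join(parts)


def truncate(text, max_tokens=512):
    """Keep the first max_tokens whitespace tokens (a stand-in for subword tokens)."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])


# Values copied from Table 20; only a few fields are used to keep the example short.
metadata = flatten_metadata(
    title="Production and value of maple products",
    basic_info={"Subject": "Agriculture", "Survey": "Maple Products", "Frequency": "Annual"},
    item_names=["Canada", "Nova Scotia", "New Brunswick", "Quebec", "Ontario"],
    footnotes=["Commercial production and value figures exclude inventory."],
)
short = truncate(metadata, max_tokens=12)
```

Under this ordering assumption, the footnotes sit at the end of the flattened string and are therefore the first content to be cut when a table exceeds the budget.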

PID 18100004 (count = 170)
Title Consumer Price Index, monthly, not seasonally adjusted
Dates 1914-01-01 to 2021-08-01
Dim. Geography, Products and product groups
Subject Prices and price indexes
Survey Consumer Price Index
Freq. Monthly

PID 11100222 (count = 125)
Title Household spending, Canada, regions and provinces
Dates 2010-01-01 to 2019-01-01
Dim. Geography, Statistic, Household expenditures, summary-level categories
Subject Income, pensions, spending and wealth
Survey Survey of Household Spending
Freq. Annual

PID 17100005 (count = 123)
Title Population estimates on July 1st, by age and sex
Dates 1971-01-01 to 2021-01-01
Dim. Geography, Sex, Age group
Subject Population and demography
Survey Annual Demographic Estimates Canada, Provinces and Territories
Freq. Annual

PID 33100214 (count = 117)
Title Canadian Business Counts, with employees, June 2019
Dates 2019-01-01 to 2019-01-01
Dim. Geography, Employment size, North American Industry Classification System (NAICS)
Subject Business performance and ownership
Survey Business Register
Freq. Semi-annual

PID 35100177 (count = 104)
Title Incident-based crime statistics, by detailed violations, Canada, provinces, territories and Census Metropolitan Areas
Dates 1998-01-01 to 2020-01-01
Dim. Geography, Violations, Statistics
Subject Crime and justice
Survey Uniform Crime Reporting Survey
Freq. Annual

PID 18100005 (count = 91)
Title Consumer Price Index, annual average, not seasonally adjusted
Dates 1914-01-01 to 2020-01-01
Dim. Geography, Products and product groups
Subject Prices and price indexes
Survey Consumer Price Index
Freq. Annual

Table 21: Most frequently retrieved tables across all splits in the retrieval task. The basic information is
provided for each table. Analysis presented in Section A.3.1. Sample examples are given in Table 4.

U: hi, i was wondering if you vae any statistics on video game sales, ot high school drop out rates?
infoSeekingQuestions; request;
A: Hello my name is Sylvain, how may I help you? offer;
U: hello, I am looking for high school drop out r ates and teen depression rates, but cant seem to find anything
that goes back more than a year request;
A: Data for High School dropouts is compiled by the Provincial Education Ministry. To obtain this infor please
contact the Education Ministry of your province inform; promise;
A: Data for mood disorder (depression) is available on our website from the Data Table 13-10-0096-18
(https://www150.statcan.gc.ca/...) answer;
A: To view/ manipulate the variables available in a data table: • Click on "Add/Remove data" • Select at
least one variable in each tab and click on “Apply” in order to view your customized table To download
the data: • Click on “Download” • Select one of the three output formats then click on the hyperlink and save
the table The following video may also be helpful to you: How to use the data tables
(https://www.statcan.gc.ca/eng/sc/video/...). inform; request;

U: Hi team, can you please help me retrieve data of public deposits of chartered banks?
infoSeekingQuestions; request;
A: Please hold timeManagement;
U: Surely. autoFeedback;
A: You can obtain this info on our website from our Data Table 10-10-0116-01
(https://www150.statcan.gc.ca/t1/...) answer;
A: To view/ manipulate the variables available in a data table: • Click on "Add/Remove data" • Select at
least one variable in each tab and click on “Apply" in order to view your customized table To download
the data: • Click on “Download" • Select one of the three output formats then click on the hyperlink and save
the table The following video may also be helpful to you: How to use the data tables
(https://www.statcan.gc.ca/eng/sc/video/...) . inform; instruct;
A: Please take note that this is a non-Statistics Canada Table. This info is provide from The Bank of Canada
inform;
U: Thanks a lot! So, how should I get down to Public deposits? or rather which of these lines would direct
me to Public deposits? clarificationQuestion;
A: Under the category Chartered bank deposits. For more info please contact The Bank of Canada on
their website by using their Online Request Form (https://www.bankofcanada.ca/...) answer; inform;

Table 22: Samples of the dialogue analysis in Section 3.2, which are summarized in Table 15. The speech
acts are in blue at the end of each message. The remaining annotations are in the supplementary materials.
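
The PIDs listed in Table 21 and the hyphenated table numbers quoted by agents appear to be two spellings of the same identifier: for example, PID 33100270 in Table 32 corresponds to "Table: 33-10-0270-01" in Table 31. The sketch below converts between the two forms. The mapping is inferred from the examples in this appendix rather than from an official specification, the default "01" suffix is an assumption, and both function names are hypothetical.

```python
def pid_to_table_number(pid, view="01"):
    """Format an 8-digit PID as NN-NN-NNNN-VV, e.g. 33100270 -> 33-10-0270-01."""
    digits = f"{int(pid):08d}"
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}-{view}"


def table_number_to_pid(table_number):
    """Drop the hyphens and the 2-digit suffix to recover the 8-digit PID."""
    return int(table_number.replace("-", "")[:8])


# Both table numbers generated in Table 23 resolve to the same PID.
same_pid = table_number_to_pid("18-10-0004-13") == table_number_to_pid("18-10-0004-01")
```

Under this reading, the two table numbers produced by T5 and T5+D in Table 23 (18-10-0004-13 and 18-10-0004-01) both point to PID 18100004, the most frequently retrieved table.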

U: Thank you for your information, but I have done that already. Your previous website had a "manipu-
late" function to calculate the percentage change. I don’t see the button anymore, and wondered if you
have other schedule prepared as that function is gone.
A: Hello, my name is Jimmy N., how may I help you?
T5: Hello, my name is Jimmy N., how may I help you?
T5+D: Hello, my name is Jimmy N., how may I help you?
U: I was talking with Sylvain, will you be able to see the log?
A: Yes, please hold while I verify.
T5: Yes, please hold while I verify.
T5+D: Yes, please hold while I verify.
A: Just to clarify, you’re looking for all-items ?
T5: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price Index
by product group, monthly, percentage change, not seasonally adjusted, Canada, provinces, Whitehorse,
Yellowknife and Iqaluit ( )
T5+D: You can obtain this information at the following hyperlink : Table: 18-10-0004-01 Consumer Price Index by
product group, monthly, [...] and Iqaluit ( )

Table 23: Conversation #4890. Each message is separated by a horizontal line. The original conversation, in
bold, only contains messages by the user (U) and agent (A). The generated responses by T5 and T5 with top-5
DPR-retrieved titles (T5+D) predict the corresponding agent’s message (in bold).

[...]
U: Ideally, I want the percentage change for these items
R1 : Consumer Price Index, monthly, not seasonally adjusted
R2 : Monthly adjusted price index, provisional
R3 : Consumer Price Index, monthly, seasonally adjusted
R4 : Historical (real-time) releases of the industrial product price index, by major product group, monthly, inactive
R5 : Consumer Price Index (CPI) statistics, alternative measures, unadjusted and seasonally adjusted, Bank of
Canada
A: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price
Index by product group, monthly, percentage change, not seasonally adjusted, Canada, provinces,
Whitehorse, Yellowknife and Iqaluit (https://www150.statcan.gc.ca/t1/tbl1/...)
T5: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price Index [...]
and Iqaluit ( )
T5+D: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price Index [...]
and Iqaluit ( )

Table 24: The second part of conversation #4890, immediately after Table 23. Ri indicates the i-th table retrieved
by DPR, which is only seen by T5+D.
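
As a rough illustration of how T5+D differs from T5, the sketch below appends the top-k retrieved titles to the conversation history before it is handed to the generator. The speaker prefixes, the "R1:" labels, the space separator and the function name are assumptions made for illustration; the source does not specify the exact input format.

```python
def build_generator_input(messages, retrieved_titles=None, k=5):
    """Serialize (speaker, text) turns and optionally append the top-k titles."""
    history = " ".join(f"{speaker}: {text}" for speaker, text in messages)
    if not retrieved_titles:
        return history  # plain T5: conversation only
    titles = " ".join(
        f"R{i}: {title}" for i, title in enumerate(retrieved_titles[:k], start=1)
    )
    return f"{history} {titles}"  # T5+D: conversation plus retrieved titles


# Turn and titles taken from Table 24.
turns = [("U", "Ideally, I want the percentage change for these items")]
titles = [
    "Consumer Price Index, monthly, not seasonally adjusted",
    "Monthly adjusted price index, provisional",
]
t5_input = build_generator_input(turns)
t5d_input = build_generator_input(turns, titles, k=1)
```

Setting k to 1 or 5 corresponds to the "Top-1 Title" and "Top-5 Titles" settings reported in Table 33.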

U: This is pretty cool!

A: Hello, my name is Martine, how may I help you?
U: I have a question - how would I be able to access data for 2019 on the amount of money the federal government
spent (money out) to Canadians through social programs like EI / OAS etc...
A: Letme check..
U: Thank you
Ri : Federal extramural expenditures on science and technology, type of science, performing sector, type of
payment and geography
Ri : Historical: Current transfers to other levels of government, 1968 System of National Accounts (SNA), annual,
1926 - 1986
Ri : Private non-profit organization outsourced research and development expenditures
Ri : Capital expenditures, sources of funding, by geography
Ri : Historical: Government transfer payments to persons, 1968 System of National Accounts (SNA), annual,
1926 - 1986
A: Revenue, expenditure and budgetary balance - General governments (x 1,000,000)
(https://www150.statcan.gc.ca/t1/...3610047701)
T5: Data Tables (https://www150.statcan.gc.ca/n1/en/type/data?text=federal+debt)
T5+D: Federal extramural expenditures on science and technology, type of science, performing sector, type of
payment and geography ( )

Table 25: Conversation #8960.

U: How can I find information relating to Kelowna population estimates, demography, population projection
estimates, population by age and sex
A: hello how may i help you today?
U: How can I find information relating to Kelowna population estimates, demography, population projection
estimates, population by age and sex
A: let me see what i can find on the subject for the projections information our tables only provide it to the
provincial level it does not go lowe then that <extra_id_0> ( <extra_id_0> )
R1 : Estimates and projections of population aged 0 to 29, by age group, Canada, provinces and territories
R2 : Projected population, by projection scenario, age and sex, as of July 1
R3 : Population estimates on July 1st, by age and sex
R4 : Components of projected population growth, by projection scenario
R5 : Population estimates, quarterly
A: https://www150.statcan.gc.ca/t1/tbl1/en/cv.action?pid=1710005801
T5: https://www150.statcan.gc.ca/n1/daily-quotidien/190619/dq190619f-cansim-eng.htm
T5+D: https://www150.statcan.gc.ca/n1/pub/71-607-x/71-607-x2018005-eng.htm

Table 26: Conversation #21533.

U: Hi there, do you know approximately how many craft distillers are operating in Canada?
A: Hello my name is Sylvain, how may I help you?
U: I was wondering if you had any information about craft distillers in Canada. Specifically, how many craft distillers
are operating in Canada
A: The industries are classified according to the North American Industry Classification System (NAICS). To find
the NAICS code for an industry, please refer to this page (http://www23.statcan.gc.ca/imdb/...1181553) . You may start your
search by selecting one of the 20 sectors, until a six-digit NAICS is obtained. [...] You will find the number of business locations
by NAICS code in the following data tables: Table 33-10-0105-01 Canadian Business Counts, with employees,
December 2018 (https://www150.statcan.gc.ca/t1/...3310010501) [...] I also suggest consulting the financial data of
small and medium enterprise in Canada; these organizations have annual earning revenues ranges of $30,000 to
$20 million. This information is available by industry, province, territory and legal status of companies (incorporated
or unincorporated). This data is not available on our website; however, it is freely available on Innovation, Science and
Economic Development Canada’s website, using their Financial Performance Data (http://www.ic.gc.ca/...)[...]

Table 27: Conversation #1628

U: Hi. I want to know how many small, medium and big companies are there in canada. Thank you
A: Hello, my name is Olivier C. Please wait while i find the information you’re looking for.
R1 : Canadian Business Counts, without employees, December 2019
R2 : Canadian Business Counts, with employees, December 2019
R3 : Canadian Business Counts, without employees, June 2019
R4 : Canadian Business Counts, with employees, June 2019
R5 : Canadian Business Counts, without employees, December 2017
A: Thank you for waiting, you will find the information about the number of business per business size
if you click the following hyperlinks : Canadian Business Counts, with employees, December 2019
(https://www150.statcan.gc.ca/t1/...pid=3310022201) Canadian Business Counts, without employees,
December 2019 (https://www150.statcan.gc.ca/t1/...pid=3310022301)
T5: Thank you for waiting. You will find the information you’re looking for if you click the following hyperlink :
Canadian Business Counts, with employees, June 2019 ( )
T5+D: Thank you for waiting. The industries are classified according to the North American Industry
Classification System (NAICS). To find the NAICS code for an industry, please refer to this page
(http://www23.statcan.gc.ca/imdb/...1181553). You may start your search by selecting one of the 20 sectors,
until a six-digit NAICS is obtained. You will find the number of business locations by NAICS code in the
following data tables: Table 33-10-0214-01 Canadian Business Counts, with employees, June 2019 ( ) Table
33-10-0215-01 Canadian Business Counts, without employees, June 2019 ( ) I also suggest consulting the
financial data of small and medium enterprise in Canada; these organizations have annual earning revenues
ranges of $30,000 to $20 million. This information is available by industry, province, territory and legal
status of companies (incorporated or unincorporated). This data is not available on our website; however, it is
freely available on Innovation, Science and Economic Development Canada’s website, using their Financial
Performance Data (http://www.ic

Table 28: Conversation #8500.

U: Do you have any data showing labour in manufacturing companies and how it is allocated between direct and
indirect labour?
A: Hello, my name is Jimmy N. Could you please specify what you mean by direct and indirect labour.
U: Direct = labour for building product Indirect = administrative, supervision, etc.
A: The data you are requesting is not available as a standard product. However, it may be available through a
custom order (a fee may apply). [...] Would you like to make this request?
A: As a standard product, we have tables about employment by industry.
U: I would just like to look at the standard product. Can you tell me where I can find it, or what it is called?
R1 : Labour force characteristics by industry, monthly, seasonally adjusted, last 5 months
R2 : Employment by industry, monthly, seasonally adjusted and unadjusted, and trend-cycle, last 5 months
R3 : Employment by industry, annual
R4 : Employment by industry, three-month moving average, unadjusted for seasonality, provinces and economic
regions, inactive
R5 : Usual hours worked by industry, annual
A: You can obtain this information at the following hyperlinks : Table: 14-10-0355-02 Employment by
industry, monthly, seasonally adjusted (x 1,000)Table: 14-10-0023-01 Labour force characteristics
by industry, annual (x 1,000)Employed labour force who worked full year, full time and reported
employment income in 2015, 2016 Census (https://www150.statcan.gc.ca/t1/...1410035502)
T5: Table: 14-10-0296-01 Labour force characteristics by industry, annual (x 1,000) ( )
T5+D: Please hold while I find the information.

Table 29: Conversation #19568.

U: I am on webpage https://www150.statcan.gc.ca/n1/daily-quotidien/201224/dq201224beng.htm where can I
find the same data set for previous years, for example 2019 2018 2017 etc
A: Hello, can you tell me on which table you are :
https://www150.statcan.gc.ca/n1/daily-quotidien/201224/dq201224b-cansim-eng.htm
U: Provisional death counts and excess mortality, January to October 2020 Adjusted number of deaths, expected
number of deaths and estimates of excess mortality, by week
A: That is the report, but there is 5 different tables:
https://www150.statcan.gc.ca/n1/daily-quotidien/201224/dq201224b-cansim-eng.htm Let me verify this for you. One moment please. To
view/ manipulate the variables available in a data table: • Click on "Add/Remove data" • [...] Select at least
one variable in each tab and click on “Apply” in order to view your customized table Data are available for
2016 up to 2020.
U: The reference period stops at January 2020 I can’t seem to get data for past years
A: Click on the schedule icon and select the year you want. (2016 - 2020)
U: I did and all years are ’greyed out’ meaning they do not populate
A: Let me verify this. One moment please.
U: screenshot.lzsc
R1 : Weekly death counts, by age group and sex
R2 : Previous releases and revisions to weekly death counts
R3 : Adjusted number of deaths, expected number of deaths and estimates of excess mortality, by week, age group
and sex
R4 : Death counts, by week, inactive
R5 : Labour force characteristics by sex and detailed age group, monthly, unadjusted for seasonality
A: Have you already consulter the table that replaces table# 13-10-0785-01 [...]
https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1310081001 - On this table you can go
back to 2016.
T5: You can manipulate the table by following these steps: • Click on "Add/Remove data" • Click on the “ + ”
icon next to a checkbox to expand the list of variables (if applicable) • Select at least one variable in each tab
and click on “Apply” in order to view your customized table To download the data: • Click on “Download” •
Select one of the three output formats then click on the hyperlink and save the table The following video may
also be helpful to you: How to use the data tables (https://www.statcan.gc.ca/eng/sc/video/howto).
T5+D: Please consult the following data table(s): Weekly death counts, by age group and sex ( )

Table 30: Conversation #15376.

U: Hello
A: Hello, my name is Jimmy N., how may I help you?
U: [...] We saw this article from the Vancouver Sun which mentioned StatsCan data on business closures. [...] is
there any way for us to collect localized data for the Township, or at least the Fraser Valley
A: Unfortunately, the information is available on a national and provincial/territorial level : And for some Census
metropolitan areas
Ri : Experimental estimates for business openings and closures for Canada, provinces and territories, census
metropolitan areas, seasonally adjusted
Ri : Quarterly estimates of business entry and exit
Ri : Of businesses where at least one employee was laid off, percentage of workforce laid off and rehired due to
COVID-19, by business characteristics
Ri : Canadian Business Counts, with employees, census metropolitan areas and census subdivisions, December
2020
Ri : Manufacturing sales, by industry for 12 Selected Census Metropolitan Areas
A: Table: 33-10-0270-01 - Experimental estimates for business openings and closures
for Canada, provinces and territories, census metropolitan areas, seasonally adjusted
(https://www150.statcan.gc.ca/t1/...3310027001)
T5: However, it may be available through a custom order (a fee may apply). To obtain a free cost estimate
for this service, please send us your request, clearly indicating the variables, geographic areas, time frame
and data frequency needed, making sure to include your complete contact information (name of orga-
nization, address, postal code, and phone number) at the following email address: infostats@canada.ca
(mailto:infostats@canada.ca) or by telephone at 1-800-263-1136. Our agents are available Monday to Friday
(except holidays) from 7:30 am to 7:30 pm (Eastern Time). An agent will get in touch with you to verify the
information, provide an estimate and proceed with the payment, if applicable.
T5+D: However, it may be available through a custom order (a fee may apply). To obtain a free [...] An agent will get
in touch with you to verify the information, provide an estimate and proceed with the payment, if applicable.
Please do not share any personal information through the live chat. Would you like to make this request?

Table 31: Conversation #11869.

PID | Title | Train | Valid | Test | Total
13100810 | Selected grouped causes of death, by week | 0 | 0 | 6 | 6
33100270 | Experimental estimates for business openings and closures for Canada, provinces and territories, census metropolitan areas, seasonally adjusted | 0 | 1 | 3 | 4
13100763 | Health characteristics of children and youth aged 1 to 17 years, Canadian Health Survey on Children and Youth 2019 | 0 | 0 | 4 | 4
35100060 | Number of homicide victims and persons accused of homicide, by Indigenous identity, age group and gender | 0 | 1 | 2 | 3
32100042 | Estimated areas, yield, production of corn for grain and soybeans, using genetically modified seed, Quebec and Ontario, in metric and imperial units | 0 | 1 | 2 | 3
14100084 | Labour force characteristics of immigrants by sex and age group, three-month moving average, unadjusted for seasonality | 0 | 0 | 3 | 3
13100143 | Deaths, by cause, Chapter V: Mental and behavioural disorders (F00 to F99) | 0 | 0 | 3 | 3

Table 32: List of tables that do not appear in the training set, but appear at least 3 times in the validation or test sets.
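
The selection behind Table 32 can be sketched as a simple count over the per-split target PIDs. This is a minimal illustration, not the authors' script: the function name, the min_count parameter and the toy split lists are hypothetical, although the counts for the two unseen PIDs are taken from the table.

```python
from collections import Counter


def unseen_in_train(train_pids, valid_pids, test_pids, min_count=3):
    """Return {pid: (valid, test, total)} for PIDs absent from train
    whose combined valid + test count is at least min_count."""
    seen = set(train_pids)
    valid, test = Counter(valid_pids), Counter(test_pids)
    result = {}
    for pid in set(valid) | set(test):
        total = valid[pid] + test[pid]
        if pid not in seen and total >= min_count:
            result[pid] = (valid[pid], test[pid], total)
    return result


# Toy splits: 13100810 and 33100270 reuse the counts from Table 32;
# 18100004 stands in for a table that also occurs in training.
train = [18100004, 18100004]
valid = [33100270, 18100004]
test = [13100810] * 6 + [33100270] * 3
rows = unseen_in_train(train, valid, test)
```

Each entry in rows gives the Valid, Test and Total columns of the table; the Train column is zero by construction.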

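Language | Split | Augmentation | METEOR | ROUGE-L | BERTScore | MoverScore | Title Acc.
English | Test | None | 23.35 | 30.65 | 86.04 | 59.82 | 6.96
English | Test | Top-1 Title | 24.07 | 30.76 | 86.11 | 60.23 | 7.99
English | Test | Top-5 Titles | 24.41 | 30.88 | 86.17 | 60.31 | 10.82
English | Valid | None | 31.95 | 43.31 | 88.31 | 65.50 | 30.39
English | Valid | Top-1 Title | 33.11 | 44.22 | 88.60 | 66.14 | 33.66
English | Valid | Top-5 Titles | 33.38 | 44.20 | 88.55 | 65.94 | 38.89
French | Test | None | 5.98 | 7.21 | 61.74 | 51.88 | 0.00
French | Test | Top-1 Title | 9.71 | 11.14 | 65.86 | 54.81 | 0.00
French | Test | Top-5 Titles | 8.96 | 10.75 | 64.76 | 54.38 | 0.00
French | Valid | None | 6.84 | 8.85 | 60.97 | 52.21 | 0.00
French | Valid | Top-1 Title | 8.09 | 9.03 | 64.22 | 53.62 | 0.00
French | Valid | Top-5 Titles | 7.37 | 8.75 | 62.55 | 52.97 | 0.00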

Table 33: Full response generation results. Selected results in Table 10.
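
To give a sense of what the ROUGE-L column measures, the sketch below computes a simplified ROUGE-L F1 score from the longest common subsequence (LCS) of whitespace tokens. It is an illustration only: the reported scores presumably come from a standard evaluation package with its own tokenization and are scaled to 0-100, and the function names here are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, hypothesis):
    """Harmonic mean of LCS-based precision and recall, in [0, 1]."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Agent message and generated response from Table 23 (identical strings).
exact = rouge_l_f1("Yes, please hold while I verify.", "Yes, please hold while I verify.")
```

A generated response that copies the agent's message word for word, as in the greeting turns of Table 23, obtains the maximum score, while a response with no tokens in common scores zero.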

From Everand
IGNOU PGDCA All in One Previous Years Unsolved Papers
Manish Soni
No ratings yet
Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
From Everand
Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
EMC Education Services
No ratings yet
Architecting Big Data & Analytics Solutions - Integrated with IoT & Cloud
From Everand
Architecting Big Data & Analytics Solutions - Integrated with IoT & Cloud
Dr Mehmet Yildiz
4.5/5 (2)
Be Data Curious!: Be Data Curious!, #1
From Everand
Be Data Curious!: Be Data Curious!, #1
Nick Jewell
No ratings yet
Big Data for Enterprise Architects
From Everand
Big Data for Enterprise Architects
Dr Mehmet Yildiz
4.5/5 (3)
Low-Code/No-Code: Citizen Developers and the Surprising Future of Business Applications
From Everand
Low-Code/No-Code: Citizen Developers and the Surprising Future of Business Applications
Phil Simon
2.5/5 (2)
Learn Hadoop in 24 Hours
From Everand
Learn Hadoop in 24 Hours
Alex Nordeen
No ratings yet
Supercharge Power BI: Power BI is Better When You Learn To Write DAX
From Everand
Supercharge Power BI: Power BI is Better When You Learn To Write DAX
Matt Allington
5/5 (5)
Building and Operating Data Hubs: Using a practical Framework as Toolset
From Everand
Building and Operating Data Hubs: Using a practical Framework as Toolset
Georg Graner
No ratings yet
Building Dashboards with Microsoft Dynamics GP 2013 and Excel 2013
From Everand
Building Dashboards with Microsoft Dynamics GP 2013 and Excel 2013
Mark Polino
No ratings yet
Digital Twins: How Engineers Can Adopt Them To Enhance Performances
From Everand
Digital Twins: How Engineers Can Adopt Them To Enhance Performances
Isrin Ismail
No ratings yet
Exam AZ-800: Administering Windows Server Hybrid Core Infrastructure Preparation
From Everand
Exam AZ-800: Administering Windows Server Hybrid Core Infrastructure Preparation
Georgio Daccache
No ratings yet
