Retrieving Data Tables Through Conversations
Table 2: Comparison with related datasets (see Section 2). (*) CoQA uses rationales to support extracted answers.
datasets due to their commercial value and user privacy concerns. Hence, we argue that the research community would benefit from a task-oriented dialogue environment where findings can be validated with real users, and, to that effect, present a unique dataset in collaboration with Statistics Canada.

Statistics Canada (StatCan) is a national statistics agency commissioned with collecting key information on Canada’s economy, society, and environment. Statistics Canada conducts hundreds of surveys on virtually all aspects of Canadian life and publishes the resulting data tables on statcan.gc.ca. This website currently features 5K+ such complex and often large data tables. Canadian citizens, and other interested individuals, come to this website to find the statistics they are looking for. The StatCan website offers a chat functionality (available in English and French) to help users in case they cannot find the appropriate information.

Sourcing from these live chats, we present the StatCan Dialogue Dataset, a collection of 20K+ English and French conversations between visitors of statcan.gc.ca and agents working at Statistics Canada. Before releasing this dataset, StatCan has run several procedures to remove Personally Identifiable Information (PII). While we observe a wide variety of user intents, ranging from table manipulation to navigation instructions, a large number of visitors use the chat functionality to find data tables on the StatCan website. Specifically, we observe 6.6K instances where an agent returns a link to a data table across 4.4K conversations. In Table 1, we provide an example conversation in which an online user is looking for specific data tables.

In this work, we develop two novel tasks centered on helping users find specific tables. First, we introduce the table retrieval task, which requires a model to predict the table returned by the agent given the messages sent so far. Second, we introduce the response generation task, which requires a model to predict the agent’s response given the dialogue history. For both tasks, we investigate their difficulty by establishing strong baselines and evaluating them on various metrics.

We stress that both tasks are immediately useful in a real-world setting. The table retrieval task can help agents find relevant tables faster while the response generation task may lead to a virtual agent that can return relevant tables through an online conversation. We hope that this tight connection with a real-world scenario will bring the research community more insight into the challenges of developing practical dialogue agents and lead to faster transfer of research ideas and findings.

2 Related Work

This section presents various directions related to our work. See Table 2 for a comparative summary.

Open-domain QA This is the task of answering questions using a large and diverse collection of text documents. One of the first large-scale evaluations in open-domain QA was presented at TREC-8 (Voorhees, 2001). Since then, many studies have released large-scale open-domain QA datasets: WikiQA (Yang et al., 2015) and MS MARCO (Bajaj et al., 2018) source questions from the Bing search engine, Natural Questions (NQ) (Kwiatkowski et al., 2019) from Google search, and DuReader (He et al., 2018) sources questions in Chinese from Baidu. The questions come from real users and the answers are collected from the search results through crowd workers. Although those datasets have questions with genuine intent and the answer must be retrieved from a collection of documents, our dataset emphasizes the retrieval of tables (in a conversational setting) rather than free-form documents.

Table retrieval and QA Following works on tabular pre-training (Yin et al., 2020), table-to-text generation (Parikh et al., 2020) and weak supervision for semantic parsing (Herzig et al., 2020),
Chen et al. (2020) and Herzig et al. (2021) respectively propose OTT-QA and TAPAS-NQ, two novel approaches that extend open-domain QA to retrieving tables instead of documents. The former collects both the questions and answers from crowd workers and the latter extends Natural Questions by using tables from the article where the answer was taken. In both cases, the tables being retrieved are sourced from Wikipedia articles. Although our data also incorporate tabular retrieval, the tables are sourced from statcan.gc.ca, they can be significantly larger (as discussed in Appendix A.2.1), and they are being retrieved in an interactive and conversational setting.

Dataset              Train    Valid     Test      All
# Conv.               2573      545      557     3675
# Turns              11382     2339     2600    16321
# Messages           36147     7385     8340    51872
# Queries             3782      799      870     5451
# Tokens / Msg       32.83    33.51    29.32    32.36
# Turns / Conv.       4.42     4.29     4.67     4.44
# Msg / Conv.        14.05    13.55    14.97    14.11
# Queries / Conv.     1.48     1.47     1.57     1.49
# Tables               778      349      388      959
# New tables             0       41      145      181
# Dims / Table         3.5      3.5      3.6      3.6
# Mbrs / Table       185.5    210.8    175.6    172.1
# Notes / Table       21.1     22.7       23     20.4

Table 3: Statistics of English conversations and tables in the retrieval and generation tasks. New tables are calculated with respect to training set (see Table 11).

Figure 1: Histogram of messages by conversation in both tasks (French split in Figure 7). Axes: messages (x), percent (y); one series per split (train, valid, test).

Conversational QA Several works extended question answering to the conversational setting. CoQA (Reddy et al., 2019) and QuAC (Choi et al., 2018) introduced datasets in which multiple rounds of questions are asked about a reference passage taken from a document (such as a Wikipedia article). Subsequent works extended this setup to an open-domain setting where the reference passage is not known beforehand (Qu et al., 2020; Anantha et al., 2021; Adlakha et al., 2022). Saeidi et al. (2018) proposed a conversational QA task about regulatory texts. The aforementioned datasets are all structured in the same way: at every turn, the first speaker will ask a question, and the other speaker will give an answer. In contrast, the queries in our conversations are not restricted to questions, and the answers can be either a table, metadata, or a dialogue act.

Task-oriented Dialogue Our work is related to work on task-oriented dialogue where users converse with virtual agents to accomplish specific goals, such as booking a restaurant or resolving a customer issue. While early work has collected a dataset in a genuine information seeking setup (Hemphill et al., 1990), many recent datasets have been collected through a simulated setup (Budzianowski et al., 2018; Rastogi et al., 2020; Feng et al., 2020, 2021; Chen et al., 2021; Lee et al., 2022). Task-oriented models usually track the dialogue state by predicting dialogue acts that are specified through intents and slot-value pairs, e.g., findRestaurants(cuisine=Italian). While our dataset does not provide turn-based annotations, the released conversations come with an annotated goal, i.e., which data table the user was looking for. Like other goal-oriented dialogue tasks, this annotation enables us to automatically evaluate the dialogue models through a task completion metric.

Chit-chat Dialogue The goal for chit-chat systems is to engage in an open-ended conversation with an end-user (Lowe et al., 2015; Dinan et al., 2018). Unlike our dataset, such conversations do not intend to assist the user with a specific task.

3 Dataset

The StatCan Dialogue Dataset consists of conversations collected from the live chat between March 1, 2019 and March 8, 2021. Although a variety of user intents can be found in the broader dataset of over 25K conversations, we focus on a single intent by selecting all conversations where the agent returns a data table. We use this subset to develop and test models for the two tasks that we introduce in Section 4. In Section 3.1, we provide basic statistics about this subset of the data and present a dialogue analysis for a small number of conversations in Section 3.2. In Section 3.3, we turn our attention to the
data tables and explain what kind of information is available for them. Finally, we explain how the dataset is split into a train, validation, and test set in Section 3.4. For technical specifications, a dataset card is provided in Appendix B.

Merged Acts (Example)                                     %
Answer (You can obtain on our...)                        50
Request (...please help me retrieve data...)             31
Time Mgmt (Please hold)                                  28
Inform (Please take note that...)                        63
Info Seeking Ques. (Do you have any other...?)           25
Promise (...please contact the Education Ministry...)    18
Auto Feedback (Sure)                                     25
Offer (...how may I help you?)                           11
Instruct (Select at least one...)                        18
Clarif. Ques. (Which of these lines would direct...)     16
Greeting (Hi)                                            28
Self Introduction (My name is...)                        17
Thanking (Thanks a lot!)                                 47
Accept Thanking (you’re welcome)                         15

Table 4: Frequency of merged speech acts occurring in 100 turns in conversations from the validation set.

Title: Production and value of maple products
Date range: 1924-01-01 to 2020-01-01
Dimensions: Geography, Maple products
Subject: Agriculture
Survey: Maple Products
Frequency: Annual

Table 5: Basic information (including title) of table in Section 3.3. Full version in Table 20. This can be accessed at doi.org/10.25318/3210035401-eng.

User intents The live chat was designed to fulfill specific user intents. The main intent of the chat functionality is to help users find specific data tables. For example, in Table 1, the agent helps the user find tables about building permits, gasoline prices, and retail prices for food. However, users might also be interested in obtaining meta-information or in receiving help with manipulating a table or with the user interface. In some cases, the user will make out-of-domain requests. Those auxiliary intents are described in Appendix A.1 since the focus of this work is on the main intent.

Messages and turns Each conversation is broken down into turns, each of which is a pair of user-agent responses. Each response can have multiple messages sent sequentially (e.g., in Table 1, the first agent response contains 3 back-to-back messages).

3.1 Basic statistics

In total, 25397 conversations will be made available. Based on our main intent, we focus on a subset of 4468 (3675 in English and 793 in French). Out of a total of 5907 tables available in both English and French, the agents returned 959 unique tables in English and 285 in French. The number of messages by conversation varies between 2 and 78 with a median of 12 for the English split (see Figure 1 for the distribution). Based on Table 3, there are on average 4.4 turns but 14.11 messages per conversation, with over 30 tokens for each message. This indicates that the speakers will express multiple sequential thoughts before the addressees respond. For the French split, we analyzed the basic statistics in Appendix A.5.

Frequently requested tables In total, 6 tables make up 13.4% of tables retrieved, covering subjects like inflation and household spending. Supplementary details can be found in Appendix A.3.1.

3.2 Dialogue Analysis

We categorize 100 turns (306 messages) from 24 conversations in the English validation set according to the speech acts defined by Bunt et al. (2010, 2020), a taxonomy also known as ISO standard 24617-2. We follow their taxonomy but merge some fine-grained acts with their broader concepts (e.g., correction, agreement, disagreement with inform). We present the speech act frequencies and examples in Table 4. See the Appendix for more information on how we merged the original acts and supplementary examples (Table 15 and Table 22, respectively).

We notice that answers appear twice as frequently as information seeking questions because an interlocutor may provide an answer to both clarification questions and requests. Additionally, inform acts appear 63% of the time because agents need to expand upon their answer and users tend to clarify their initial requests by informing the other. Although less frequent, auto feedback and time management are still relevant because interlocutors cannot rely on visual feedback like nodding or typing. Naturally, time management often co-occurs with promises because the agent tends to put the user on hold while promising to fulfill their request.

3.3 Table specifications

To explain the specifications, we examine a sample table with title Production and value of maple products (shown in Table 5). The table has two dimensions, which are groups of member items;
for geography, the members are the provinces producing maple syrup (Quebec, Ontario, etc.), and for Maple products the members are the production types (maple sugar, syrup, taffy and butter). A member item generalizes the concept of rows and columns as they are interchangeable via pivoting. Sometimes, details about those members are provided as footnotes at the end of the page.

[Figure 2 diagram: Subject (Agriculture) > Survey (Maple Products) > Table (Production and value of maple products) > Dimension (Geography, Maple products) > Member items (e.g., Canada) > Level (Ontario, Quebec).]

Figure 2: Diagram of the hierarchical relationship between metadata components, discussed in Section 3.3.

Basic Information This is the core metadata and consists of the title, dimensions, subject, survey and update frequency (member items are excluded). Supplementary details are in Appendix A.2.2.

Hierarchical relation The metadata can be viewed hierarchically. As shown in Figure 2, each subject encompasses different surveys, each survey can be used to generate one or more tables, and so on. A member item that can be nested under another member item is called Level.

3.4 Dataset splits

We group the conversations into a train (70%), a validation (15%) and a test (15%) set. The test set was specifically selected to be the most recent conversations by date (covering Sept 8, 2020 to Mar 8, 2021), whereas the training and validation sets were randomly selected from the remaining data (covering Mar 1, 2019 to Sept 8, 2020). This lets us test a model’s capability to adapt to temporal shifts in the data (such as new data releases and novel events). This is useful to understand a model’s capability to generalize beyond the training distribution, but it is also a better reflection of real-world applications of a model (which will be used for future data). The same splits are used for all tasks.

4 Tasks

Based on the conversational and tabular data, we propose two tasks: (i) a table retrieval task, which requires a model to use a partial conversation to predict the table an agent will return, and (ii) a response generation task, which requires a model to use a partial conversation to generate the most probable response by the agent. The conversations in the tasks are available in both English and French.

4.1 Retrieval task

For this task, we truncate every conversation right before a link to a relevant table is shared by the agent. As a result, the product ID (PID) corresponding to that link becomes the objective of the retrieval task, as shown in Table 6. When the agent shares multiple non-repeating PIDs within a conversation, each unique occurrence is treated as a separate sample.

Recall@k To evaluate models for retrieval, we compute the recall at k (R@k) score for k ∈ {1, 10, 20}, which corresponds to the rate at which the correct table is among the k tables retrieved by the model (usually ranked by a relevance score). We choose k = 1 for real-time automatic retrieval and k ∈ {10, 20} for scenarios where humans or automatic rerankers would like to use the retriever to query tables and select the best option.

4.2 Response generation task

In the first task, only the messages leading to a table retrieval are considered. For this task, each message sent by an agent is considered as a target and everything before is the source. Thus, the goal of this task is to use the source text to generate a response that matches the target (see Table 7). Since dialogue responses are challenging to evaluate, we report a wide variety of metrics for this task.

ROUGE-L and METEOR ROUGE-L (Lin, 2004; Lin and Och, 2004) is a common text evaluation metric which naturally takes into account sentence level structure by identifying the longest overlapping word sequence between two sentences. METEOR (Banerjee and Lavie, 2005) is a word-level precision and recall scoring method that encompasses different ways to represent a word, including stems and synonyms.

BERTScore and MoverScore Various methods were developed to leverage contextual embeddings
from BERT (Devlin et al., 2019) to evaluate similarity between two sentences. BERTScore (Zhang et al., 2019) computes the cosine similarity at the token level, whereas MoverScore (Zhao et al., 2019) computes the earth mover distance (Rubner et al., 2000) at the word or sentence level, thus capturing the cost of transforming the distribution of the generated responses into the distribution of the original responses.

Source text (on-going conversation)
[...]
A: What do you mean by BC Housing Starts?
U: I’m required to research all of the housing starts for BC on a quarterly basis[...]
U: Housing starts are the number of new residential construction projects that have begun during any particular month
A: [...] I would have monthly data regarding new building permits being issued.

Retrieval target (StatCan table)
Table 34-10-0066 (Building permits, by type of structure and type of work)

Table 6: Source and targets of the retrieval task, based on Table 1. Given the on-going conversation, the goal is to retrieve a StatCan table.

Source text (on-going conversation)
U: Hi, I’m looking to obtain quarterly data in regards to GDP grow (Canada), BC Housing STarts, Canada Oil Price/BBL
A: Hello, my name is Kelly C. Give me one moment as I search [...]
A: For GDP growth rates, please consult [...]
U: I’m required to research all of the housing starts for BC on a quarterly basis [...]

Generation target (next response by agent)
A: I would have monthly data regarding new building permits being issued. [...]

Table 7: Source and targets of the response generation task, based on Table 1. Given the on-going conversation, the goal is to generate the agent’s response.

Title accuracy In addition to the general metrics for text generation, we also explored this task-specific metric. We define it as the proportion of generated messages that contain the title of a table shared in the reference messages. Consequently, this metric only includes turns where a table is shared by an agent. To compute this, we (i) find the product ID in the reference message, (ii) look up the title, (iii) check if that title appears exactly in the generated and reference text.

5 Models

To help understand the performance of finetuned models on our tasks, this section presents an overview of the methods for the English splits, whereas the implementation details are covered in Appendix A.4. Similar architectures were used for French (described in Appendix A.5).

5.1 Retrieval

BM25 We use Robertson and Zaragoza (2009)’s algorithm to retrieve the metadata of a table (passage) similar to a given query by weighting the idf-scaled term frequency of query words with respect to the passages.

DPR Proposed by Karpukhin et al. (2020), Dense Passage Retrieval (DPR) is a pair of transformer models that separately encode a query and a passage, and the dot product of the resulting vectors will have a higher score if the passage is relevant to the query. We finetune this model to retrieve the metadata of a table (passage) given the on-going conversation (query).

TAPAS and TAPAS-NQ Herzig et al. (2020) introduced a model that learned to encode flattened table cells in a self-supervised manner during pre-training. We finetuned it to retrieve the truncated content of a table given an on-going conversation. Subsequently, Herzig et al. (2021) finetuned TAPAS to perform open-domain table retrieval on 12K question-answer-table triplets extracted from NQ; we further finetune this variant in the same way and report the results as TAPAS-NQ.

Exploring table representation In the simplest scenario, only the title is given to BM25 and DPR. Moreover, we evaluate variants that encode the basic information, member items, footnotes, or a combination of them. For TAPAS and TAPAS-NQ, we also finetuned a variant that retrieves the title, dimensions and member items, since the original TAPAS could attend to titles and column names.

5.2 Response generation

T5 We finetuned the large variant of T5 (Raffel et al., 2020) (named No aug. in Table 10) to auto-regressively decode the target (agent reply) after first encoding the source (on-going conversation).

Augmenting T5 with top-k title(s) For every partial conversation, we use DPR (basic+member) to retrieve the top-k tables (where k ∈ {1, 5}), and append their titles to the partial conversation. This
allows T5 to decide between using one of the suggested titles and generating something else (e.g., clarification question). This is similar to the agents’ behavior, as they tend to return a title with the URL when sharing a relevant table. Furthermore, supervising T5 to ignore or return a title is equivalent to an implicit binary classification.

Metadata              R@1    R@10    R@20
Basic                14.7    45.0    55.0
Basic + member       15.7    46.2    56.3
Basic + footnotes    13.9    44.4    54.2
Member               10.7    35.0    46.3
Title                13.9    43.8    53.4

Table 8: Retrieval results of DPR for the English test split with varying table representations. Overview of metadata in Section 3.3.

Model         R@1    R@10    R@20
BM25          0.3     2.3     3.8
DPR          14.3    45.1    54.2
TAPAS         6.1    22.1    31.5
TAPAS-NQ      7.4    30.0    39.3

Table 9: Retrieval results for the English test split when encoding title and member items. DPR and TAPAS were run 3 times and averaged.

Metrics       No aug.   Top-1 Title   Top-5 Titles
METEOR          23.35         24.07          24.41
ROUGE-L         30.65         30.76          30.88
MoverScore      59.82         60.23          60.31
BERTScore       86.04         86.11          86.17
Title Acc.       6.96          7.99          10.82

Table 10: Response generation results for the English test split.

6 Results and Discussions

Based on our baselines and data, we report the results and analyze the challenges that our dataset and tasks pose for existing models. For the English splits, the main retrieval results are reported in Table 9 and Table 8, and main generation results are in Table 10. Full results can be found in Appendix A, respectively in Table 16 and Table 33, and relevant statistical tests in Appendix A.7.

Impact of table representation In Table 8, we observe that the metadata representation affects the retrieval recall. Although DPR can achieve respectable results when it only retrieves the title, including basic information (defined in Section 3.3) yields slight improvements, and further adding member items results in a significant difference from only using the title (p = 0.014). However, only using member items results in a drastic decrease in recall (p = 0.00086), indicating the importance of the title. Moreover, footnotes do not yield any improvement, which may be because they often exceed the maximum context span (see Table 20). Thus, concisely but meaningfully representing metadata will be crucial to achieve a good recall on the retrieval task.

Transfer to table retrieval task Our experiments allow us to analyze the effectiveness of open-domain QA fine-tuning (NQ) and tabular pre-training when transferring to our table retrieval task. We observe in Table 9 that DPR outperforms TAPAS and TAPAS-NQ by respectively 23.0% and 15.1% (absolute) in test recall@10. Moreover, TAPAS-NQ achieves a better performance when it only retrieves the title and member items instead of the full table (p = 0.016), likely because of repetitions and truncation caused by context size limits. Although both DPR and TAPAS-NQ were trained on NQ, the latter was trained on a small subset (12K vs 320K) that contains tables. Our experiments indicate that TAPAS transfers poorly from one task (NQ-Tables) to another (StatCan).

Response generation We compare the performance of fine-tuned T5-large models with and without DPR-augmented table titles. In Table 10, we notice that retrieval-augmented models show modest improvements on the 4 non-task specific metrics. However, the top-5 augmented model achieves an absolute improvement of 3.86% in title accuracy, indicating that the information provided by DPR does help T5 in generating the desired title. It is nevertheless surprising that T5 without augmentation achieves a score of 6.96%, suggesting that the T5 model is capable of storing the titles seen at training time, and, to a limited extent, is able to recall and return them at test time. Finally, we point out that the title accuracy is still 5.91% lower than top-1 recall of the DPR retriever (Table 8), indicating that T5 fails to learn when to return a table (despite the agent retrieving a table in 23.4% of all turns). In the case of top-5 titles, T5 struggles to decide which table title to return.

Qualitative analysis of generated responses We examine various conversations to understand what type of responses are generated by T5. We find that it can generate simple speech acts like greetings, but can struggle with context-specific speech acts such as clarification questions. Moreover, it can
reliably reply with the title of a common table, struggles for uncommon ones, and is sometimes capable of generating unseen titles with the help of DPR. The full analysis is in Appendix A.6.

Figure 3: Results for DPR B+M for tables appearing in the training set frequently (10+ times), infrequently (1-10 times), and unseen at train time. Axis: Recall@5; one series per split (valid, test).

Figure 4: Validation and test recall for a selected set of retrieval models. We observe a significant drop in performance. B+M denotes Basic + member. Axis: Recall@10; one series per split (valid, test).

Temporal drifts As explained in Section 3.4, we use a temporal split to test the model’s ability to generalize to future conversations. We observe a significant drop in recall (13%-28.3%) in Figure 4 when we compare the validation and test set performance, even when the models are trained with varying metadata representations. Similarly, T5 achieves low scores on the test split for the response generation task (Table 10). This large gap suggests that trained models struggle to generalize to future conversations. First, we found that this is likely caused by the number of new tables that appear in the test split (145) compared to validation (41), as shown in Table 3. Moreover, the subjects of the conversations have significantly changed: users started to care more about businesses, health and IT, and less about demography, income and pensions. This is likely motivated by real-world events affecting the users, which are more difficult to implicitly capture from simulated environments, but desirable in order to understand a model’s robustness under temporal shift and for real-world applications. In the Appendix, Figure 8 displays the differences between the training and test splits for all subjects.

Generalizing to unseen tables As shown in Figure 3, DPR performs well for tables appearing frequently in the validation split, but poorly in the test split, which could be caused by temporal drift. As expected, tables that were not seen during training resulted in poor recall@5 in either split. Moreover, the difference in recall between valid and test for infrequent tables could be caused by many potential reasons (learning bias, temporal overfitting, spurious correlation with hidden factors). Thus, future models should aim to close the gap between unseen and frequent tables and within the temporal spectrum of infrequent tables.

French results In both tasks, we see a drop across all metrics for all models, likely due to the smaller dataset size. Some observations remain valid: temporal drift, poor BM25 performance, and augmentations benefit mT5 for certain metrics. However, others differ: adding member items hurts test results and mT5 performs poorly on title accuracy. Modeling details and results can be found in Appendix A.5.

7 Conclusion

In this paper, we introduce the StatCan Dialogue Dataset, a novel corpus consisting of 20K+ English and French conversations between online visitors of statcan.gc.ca and operators of Statistics Canada. Based on this dataset, we propose two tasks centered on helping users find specific data tables: the table retrieval task and the response generation task. For the table retrieval task, we experiment with various DPR and TAPAS variants, finding that DPR strongly outperforms its TAPAS counterpart, as well as the BM25 baseline. For the response generation task, we investigate fine-tuned T5-large models and explore variants where the input is augmented with table titles from DPR. We find that retrieval-augmented T5 models more frequently return the correct tables, although their title accuracy is still lower than the corresponding recall of the DPR retriever. This result suggests that the generation models struggle to decide when to return a table. We also find that retrieval and generation models have difficulty generalizing to future conversations, as our temporal test split revealed a big performance gap
between the validation and test set. All in all, we believe that our tasks pose significant challenges to currently available models and encourage the research community to further explore this dataset and build conversational models that help users of Statistics Canada.

Limitations

Tasks and models limitations The tables in the retrieval task are sourced from statcan.gc.ca, which means that the content is primarily about Canadian demographics and is professionally edited by StatCan employees. Moreover, the generation task is specifically designed to model responses with high fidelity based on retrieved tables, so this task should not be directly used in an unintended or non-research setting (e.g., deploying a virtual assistant) as it poses risks of hallucination that could negatively impact stakeholders. Furthermore, those limitations can be reflected in the models we trained, so we will share those limitations in the model cards (Mitchell et al., 2019) on release.

to be published in the same data repository with the original terms of use and licenses preserved or extended.

Risk of toxicity in online discourse StatCan agents are trained to work with online users in a professional manner. However, since the users access statcan.gc.ca anonymously and virtually, it is more likely to observe toxic online disinhibition (Lapidot-Lefler and Barak, 2012), which could translate to toxicity in users’ utterances. Thus, we request dataset users to report any instance of toxicity in conversations, which will be reviewed in the same manner as PII leaks.

Acknowledgement

We thank our collaborators at Statistics Canada for providing us the dataset, guiding us through the technical aspects of the tables, and sharing valuable feedback on the project. We thank Sivan Milton for the helpful discussions on dialogue acts and analysis.
Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gašić. 2018. MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5016–5026, Brussels, Belgium. Association for Computational Linguistics.

Harry Bunt, Jan Alexandersson, Jean Carletta, Jae-Woong Choe, Alex Chengyu Fang, Koiti Hasida, Kiyong Lee, Volha Petukhova, Andrei Popescu-Belis, Laurent Romary, Claudia Soria, and David Traum. 2010. Towards an ISO Standard for Dialogue Act Annotation. In Seventh conference on International Language Resources and Evaluation (LREC’10).

Harry Bunt, Volha Petukhova, Emer Gilmartin, Catherine Pelachaud, Alex Fang, Simon Keizer, and Laurent Prevot. 2020. The ISO standard for dialogue act annotation. In 12th Edition of its Language Resources and Evaluation Conference (LREC 2020), pages 549–558. European Language Resources Association (ELRA).

Derek Chen, Howard Chen, Yi Yang, Alexander Lin, and Zhou Yu. 2021. Action-based conversations dataset: A corpus for building more in-depth task-oriented dialogue systems. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3002–3017, Online. Association for Computational Linguistics.

Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training Deep Nets with Sublinear Memory Cost. arXiv:1604.06174 [cs].

Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Yang Wang, and William W. Cohen. 2020. Open Question Answering over Tables and Text. In International Conference on Learning Representations.

Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question Answering in Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2174–2184, Brussels, Belgium. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Martin d’Hoffschmidt, Wacim Belblidia, Quentin Heinrich, Tom Brendlé, and Maxime Vidal. 2020. FQuAD: French Question Answering Dataset. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1193–1208, Online. Association for Computational Linguistics.

Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2018. Wizard of Wikipedia: Knowledge-Powered Conversational Agents. In International Conference on Learning Representations.

Etalab Lab-IA. 2021. DPR CamemBERT trained on French Q&A.

Song Feng, Siva Sankalp Patel, Hui Wan, and Sachindra Joshi. 2021. MultiDoc2Dial: Modeling dialogues grounded in multiple documents. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6162–6176, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Song Feng, Hui Wan, Chulaka Gunasekara, Siva Patel, Sachindra Joshi, and Luis Lastras. 2020. doc2dial: A goal-oriented document-grounded dialogue dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8118–8128, Online. Association for Computational Linguistics.

Timnit Gebru, Jamie H. Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna M. Wallach, Hal Daumé, and Kate Crawford. 2021. Datasheets for datasets. Communications of the ACM, 64:86–92.

Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, and Haifeng Wang. 2018. DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications. In Proceedings of the Workshop on Machine Reading for Question Answering, pages 37–46, Melbourne, Australia. Association for Computational Linguistics.

Charles T. Hemphill, John J. Godfrey, and George R. Doddington. 1990. The ATIS spoken language systems pilot corpus. In Proceedings of the workshop on Speech and Natural Language - HLT ’90, pages 96–101, Hidden Valley, Pennsylvania. Association for Computational Linguistics.

Jonathan Herzig, Thomas Müller, Syrine Krichene, and Julian Eisenschlos. 2021. Open Domain Question Answering over Tables via Dense Retrieval. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 512–519, Online. Association for Computational Linguistics.

Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, and Julian Eisenschlos. 2020. TaPas: Weakly Supervised Table Parsing via Pre-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4320–4333, Online. Association for Computational Linguistics.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431, Valencia, Spain. Association for Computational Linguistics.

Ali Kabbadj. 2021. French-SQuAD : French Machine Reading for Question Answering. Original-date: 2019-09-22T17:54:45Z.

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online. Association for Computational Linguistics.

Rachel Keraron, Guillaume Lancrenon, Mathilde Bras, Frédéric Allary, Gilles Moyse, Thomas Scialom, Edmundo-Pavel Soriano-Morales, and Jacopo Staiano. 2020. Project PIAF: Building a native French question-answering dataset. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5481–5490, Marseille, France. European Language Resources Association.

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural Questions: A Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics, 7:453–466.

Noam Lapidot-Lefler and Azy Barak. 2012. Effects of anonymity, invisibility, and lack of eye-contact on toxic online disinhibition. Computers in Human Behavior, 28(2):434–443.

Harrison Lee, Raghav Gupta, Abhinav Rastogi, Yuan Cao, Bin Zhang, and Yonghui Wu. 2022. SGD-X: A benchmark for robust generalization in schema-guided dialogue systems. In Proceedings of the AAAI Conference on Artificial Intelligence.

Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, et al. 2021. Datasets: A community library for natural language processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 175–184.

Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL ’04, pages 605–es, Barcelona, Spain. Association for Computational Linguistics.

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. In International Conference on Learning Representations.

Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 285–294, Prague, Czech Republic. Association for Computational Linguistics.

Marco Lui and Timothy Baldwin. 2011. Cross-domain Feature Selection for Language Identification. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 553–561, Chiang Mai, Thailand. Asian Federation of Natural Language Processing.

Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric de la Clergerie, Djamé Seddah, and Benoît Sagot. 2020. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7203–7219, Online. Association for Computational Linguistics.

Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pages 220–229, New York, NY, USA. Association for Computing Machinery.

Ankur Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, and Dipanjan Das. 2020. ToTTo: A controlled table-to-text generation dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1173–1186, Online. Association for Computational Linguistics.

Chen Qu, Liu Yang, Cen Chen, Minghui Qiu, W. Bruce Croft, and Mohit Iyyer. 2020. Open-Retrieval Conversational Question Answering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, pages 539–548, New York, NY, USA. Association for Computing Machinery.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140):1–67.

Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, and Pranav Khaitan. 2020. Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8689–8696.

Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A Conversational Question Answering Challenge. Transactions of the Association for Computational Linguistics, 7:249–266.

Radim Rehurek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50.

Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends® in Information Retrieval, 3(4):333–389.

Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. 2000. The Earth Mover’s Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2):99–121.

Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, and Sebastian Riedel. 2018. Interpretation of natural language rules in conversational machine reading. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2087–2097, Brussels, Belgium. Association for Computational Linguistics.

Noam Shazeer and Mitchell Stern. 2018. Adafactor: Adaptive learning rates with sublinear memory cost. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4596–4604. PMLR.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.

Ellen M. Voorhees. 2001. The TREC question answering track. Natural Language Engineering, 7(4):361–378.

Harm de Vries, Dzmitry Bahdanau, and Christopher Manning. 2020. Towards Ecologically Valid Research on Language User Interfaces. arXiv:2007.14435 [cs].

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2021. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Online. Association for Computational Linguistics.

Yi Yang, Wen-tau Yih, and Christopher Meek. 2015. WikiQA: A Challenge Dataset for Open-Domain Question Answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2013–2018, Lisbon, Portugal. Association for Computational Linguistics.

Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8413–8426, Online. Association for Computational Linguistics.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating Text Generation with BERT. In International Conference on Learning Representations.

Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, and Steffen Eger. 2019. MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 563–578, Hong Kong, China. Association for Computational Linguistics.

A Appendices

A.1 Complete User Intents

To provide insight into what kind of help is offered by StatCan’s live chat, we qualitatively analyze the conversations and highlight examples of the main user intents below.

Finding a table. The main intent of the chat functionality is to help users find specific data tables. For example, one user was looking for the population numbers in certain regions of Montreal for 2012-2016. This intent is the focus of our work.
Obtaining meta-information of table. Instead of finding data tables, some users are interested in meta-information about a specific table. For example, one visitor wanted to know when the next Census is released. Another user was interested in understanding the definition of Workforce Availability (WFA) and Labour Market Availability (LMA).

Manipulating a data table. Some users would like to obtain the data tables in a different format or representation. For example, one user was looking at a specific data table and asked if they could see annual instead of monthly values.

Help with user interface. Some users are looking for help with the user interface. For example, one user wanted to download a specific data table but was unable to find the download link.

Out of domain requests. We find many conversations that are outside of the scope of StatCan’s live chat. For example, one user asked what documentation needs to be provided to ship a specific product to a foreign country.

The first intent is covered in Table 1, and subsequent intents are in Table 18. While we believe all intents are interesting directions for dialogue research, we focus on the table retrieval intent because (i) there are many conversations available for it and (ii) there is a clear measure of task success, i.e., whether the correct table is retrieved. Throughout the rest of this paper, we work with conversations where the agent returns a table URL.

A.2 Supplementary Table Description

A.2.1 Formatting and size

Full tables are stored in a long format, where each column corresponds to a dimension, except the last one, which corresponds to the value. The number of rows corresponds to the Cartesian product of the dimensions’ members. This means that for $n_i$ members in the $i$-th of $d$ dimensions, we have a total of $\prod_{i=1}^{d} n_i$ rows in the full table. When a table has many member items, the full table can become extremely large. Moreover, tables shown on statcan.gc.ca are usually a pivoted and filtered view of the full table, which means certain members will become columns, others will become rows, and many are simply omitted.

A.2.2 Detailed Specifications

This section provides supplementary details for Section 3.3.

Product ID (PID). Unique 8 to 10-digit identifier given to each published data table. Although other types of tables might be shared by the agent, the PID will always be given for the official data tables; as a result, any table that does not have a PID in the URL is not considered for this task. The first two digits (1-2) represent the code of the subject associated with the table (this can be found in the basic information), then digits 3-4 represent the product type, which in our case is tables, commonly coded as “10". Digits 5-8 are a unique identifier representing that table for the given subject. Digits 9-10 indicate the view of the table and will commonly be “01", which represents the default view; they are needed when constructing the URL but are otherwise optional, and are omitted in the released dataset.

Member items. Labels for individual tables, which can be represented either as a column or as a row index through a pivot operation. Each table will have member items different from other tables.

Dimensions. Non-overlapping sets of member items; each member item must belong to a dimension. For a table with d dimensions, each data value is associated with a single d-tuple in the d-fold Cartesian product of the dimensions, but not every tuple in the product will have an associated data value (if it was not recorded or if the tuple is invalid).

Basic information. Metadatum consisting of the title, the date range, the frequency, the dimensions, the subject category, and the survey³ from which the data was sourced.

Footnotes. Unstructured comments often included with a table if supplementary details need to be given; usually, those notes are associated with a specific member item that requires more explanation.

Full table. For all tables with a PID, the full tables (as a CSV file), their complete metadata, and their basic information are made available as part of the data release and can be used in our proposed tasks. A sample table can be found in Section 3.3.

³ The list of surveys and link to detailed information can be found at https://www.statcan.gc.ca/en/survey/list
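
As a rough illustration of the PID layout described under Product ID (PID), the sketch below splits an identifier into its parts. The names `ProductId`, `parse_pid`, and `with_default_view` are hypothetical and are not part of the data release; the sketch also assumes that the table identifier occupies digits 5-8, and the sample values in the usage note are purely illustrative.

```python
from typing import NamedTuple, Optional


class ProductId(NamedTuple):
    subject: str         # digits 1-2: subject code
    product_type: str    # digits 3-4: product type ("10" for tables)
    table: str           # digits 5-8: table identifier within the subject (assumed span)
    view: Optional[str]  # digits 9-10: view ("01" is the default); may be absent


def parse_pid(pid: str) -> ProductId:
    """Split an 8- or 10-digit PID into its components."""
    if not pid.isdigit() or len(pid) not in (8, 10):
        raise ValueError(f"expected 8 or 10 digits, got {pid!r}")
    view = pid[8:10] if len(pid) == 10 else None
    return ProductId(pid[0:2], pid[2:4], pid[4:8], view)


def with_default_view(pid: str) -> str:
    """Return the 10-digit form, appending the default view "01" if it is missing."""
    parsed = parse_pid(pid)
    return parsed.subject + parsed.product_type + parsed.table + (parsed.view or "01")
```

For instance, `parse_pid("18100004")` yields a subject code of `"18"`, a product type of `"10"`, a table identifier of `"0004"`, and no view, while `with_default_view("18100004")` restores the two view digits that a URL would need.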
A.2.3 Table updates and archives

Some tables on StatCan are updated regularly, whereas others are released only once. When they stop being updated and the information becomes outdated, they are marked as archived. In some cases, a new version of an archived table may be created with substantial changes (such as new columns). As shown in Table 14, tables are released without a predetermined schedule, but the number of releases has on average been increasing since 2000, with major peaks in 2017, which was caused by many health-related tables, and 2021, which was caused by the release of many labour, science, and income tables, as shown in Figure 6. Moreover, Figure 5 shows that a majority of tables released between 2019 and 2021 are still up-to-date (current), whereas most of the tables before then have been archived.

A.3 Supplementary Statistics

A.3.1 Frequently requested tables

The most frequent tables are summarized in Table 21. Whereas 2 of them are sourced from the consumer price index survey (commonly used to track inflation), the 4 other tables cover more general and broad subjects like income, demography, business performance and crime. Each of those 4 tables is sourced from a different survey. One table is updated monthly, another semi-annually, and the rest are updated annually. The oldest table was updated in 2019, which is when the conversations started being recorded.

A.3.2 Table Frequency Statistics

From Tables 3 and 12, we can calculate that a table is returned on average 5.68 times (with standard deviation of 12.86) in English conversations and 4.25 times (standard deviation of 9.73) in French conversations. Thus, the most requested tables are disproportionately represented compared to less popular tables, and there is a very high variance in the number of times a table is used. In fact, there are 294 tables that appear only once in either split (i.e., 28.9%).

A.3.3 Fine-grained Conversation Statistics

In Table 13, we observe that the number of messages and turns varies significantly around the mean, with over 68% of conversations lasting between 2 and 7 turns. In extreme cases, a conversation can last up to 28 turns. Moreover, we also notice that, although most messages have around 32 tokens, the longest message can have up to 1374 tokens; in those scenarios, agents write a large body of text, and sometimes also copy and paste large amounts of text (for example, from a database of templates) when responding to the user. Although those are usually sent in multiple consecutive messages within a turn, agents may decide to send everything at once.

A.4 Implementation Details

This section provides the details for implementing the models in Section 5.

Implementing transformer models. All models based on the transformer architecture (Vaswani et al., 2017) were implemented using HuggingFace’s library (Wolf et al., 2020).

BM25. To facilitate reproducibility, we implemented the model in Gensim (Rehurek and Sojka, 2010).

DPR and TAPAS-NQ. We used the base variant of DPR and the large variant of TAPAS-NQ. We use the DPR checkpoints that were trained on 320K questions from Natural Questions (Kwiatkowski et al., 2019) (NQ). During training, the networks were optimized with AdamW (Loshchilov and Hutter, 2017) at a learning rate of $10^{-5}$ and zero weight decay. Based on the original work, the networks were trained for 30 epochs, with a batch size of 64 queries, positive passages, and hard negative passages (the latter are retrieved with BM25). In-batch negative sampling was used to increase the number of negative examples. To ensure reproducibility, the networks were trained on a single 32GB GPU and used gradient checkpointing (Chen et al., 2016) to reduce memory usage. The conversation length was 512 tokens, and the metadata token lengths were 128 for title, 256 for basic information (defined in Section 3.3), and 512 for the rest.

T5. We used an Adafactor optimizer (Shazeer and Stern, 2018) with a learning rate of 0.001. We used a batch size of 16 with 8 steps of gradient accumulation (16 × 8 = 128), together with gradient checkpointing, to reproduce the batch size of 128 samples in the original implementation. The models were trained on a single 32GB GPU for 10 epochs. The source and target lengths were respectively 512 and 256 tokens, where the source was truncated from the right to ensure that the latest messages remained after truncation. We used a beam size of 4 and length penalty of 0.6 following the original implementation.

Training time. For the English split, each variant of DPR can be trained in 68 minutes on an A100 GPU. Each large variant of TAPAS-NQ takes 15h to train on a V100 GPU. Each variant of T5 can be trained in 16h on a V100 GPU. All DPR results can be reproduced in 8h, TAPAS in 90h, and T5 in 96h, and proportionally less time would be needed for the French split.

A.5 Modeling the French subsets

Basic statistics. The number of messages by conversation varies between 2 and 59 with a median of 11 for the French split (see Figure 7 for the distribution). Based on Table 12, there are on average 3.9 turns but 12.3 messages. On average, there are over 30 tokens for each message (using the T5 tokenizer).

Language splitting. In order to determine the language of each conversation, we used two popular language identification libraries: langid.py (Lui and Baldwin, 2011) and a fasttext network fine-tuned for language detection (Joulin et al., 2017). After applying the models on every conversation, we only retain the conversations with matching language labels (both English or both French).

Training and evaluation. The training procedure and evaluation on the French subsets follow exactly the tasks specified in Section 4.

Modeling response generation. Instead of T5, we used the multilingual T5 model by Xue et al. (2021) as it naturally handles text in French.

Modeling retrieval. We used a variant of DPR derived from CamemBERT (Martin et al., 2020) and trained on three French Q&A datasets (Keraron et al., 2020; d’Hoffschmidt et al., 2020; Kabbadj, 2021) by Etalab Lab-IA (2021).

Retrieval results. In Table 17, we observe that, unlike the English split, adding member items to the basic information or to the title improves validation results but not test results, which likely indicates overfitting. However, we notice a high variance between the runs, which makes it difficult to determine whether member items are helpful. Both overfitting and high variance are likely caused by the smaller size of the training set. Moreover, BM25 performs extremely poorly on any metadata view, which can also be linked to the dataset size.

Generation results. In Table 33, we notice a significant decrease across all metrics, with the title accuracy being consistently 0%. This is likely because the French split is significantly smaller, yet remains as complex as the English split, which becomes challenging for mT5 to model. In the case of title accuracy, we found 55 instances in the French test split where the title is in the target text (i.e., returned by an agent). However, in 54 cases, the augmented mT5 returned a generic reply (e.g., “Veuillez patienter pendant que j’effectue une recherche.", i.e., “Please wait while I perform a search.") instead of the expected title, which indicates that mT5 is incapable of determining when it is relevant to return a title and cannot generate non-templated responses.

A.6 Responses generated by T5

In this section, we select a few conversations from the validation set and examine the messages generated by T5 and T5 augmented with DPR-retrieved titles (T5+D).

Common and uncommon responses. In Table 23, we notice that both T5 and T5+D are capable of generating common speech acts like “Thank you” and “Please wait...”, but struggle when faced with an unfamiliar situation (having to ask for clarification for a user that has been accidentally disconnected).

Common table. In Table 24, among the titles of the retrieved tables ($R_i$), the first one was partially correct. Both T5 and T5+D extended the title and also output the desired ID, matching the expected agent’s response, which is one of the most popular tables in the training set (see Table 21).

Multiple tables, date selection. In Table 28, we notice that T5 only returns one of the two tables that the agent returned. On the other hand, the correct tables were retrieved by DPR, but T5+D failed to select the ones with the correct dates (it selected June 2019 instead of December 2019), although the selected tables were otherwise relevant.

Verbosity of explanations. T5+D additionally provided a paragraph of explanation while linking to relevant resources, both on the StatCan website (non-tabular) and external resources. This is because T5+D memorized this information during training, and simply replaced the tables with the updated dates (Table 27).
Uncommon tables. In the conversations shown in Tables 25, 26, 29, the retrieved table appears 10 times in the training set, which is significantly less common than the table retrieved in Table 24. For Table 25, the table returned by the agent was not retrieved by DPR, leading to T5+D returning the first title retrieved. However, in Table 26, the correct title was retrieved by DPR (title #4), yet T5+D failed to use that correct title in the generated message. As for Table 29, the agent gave a hint by stating “As a standard product, we have tables about employment by industry”, which was correctly acknowledged by DPR, as the second retrieved title perfectly matches the PID of the table in the agent’s response. However, T5+D fails again at selecting the correct title, instead opting to return a generic response (“Please hold while I find the information”), and T5 hallucinates a PID that is different from the title it generated (both of which are wrong).

3. Claim: TAPAS-NQ performs better with title and member items compared to the full table. With A being the model using the full table, and B using title + member, our p-value is 0.016.

4. Claim: In Figure 4, the validation recall@10 is higher than on the test split for TAPAS, DPR Title and DPR Basic + member. With A being the validation scores and B the test scores, the p-values are respectively 0.00197, $2.18 \times 10^{-5}$, and 0.00014.
• target_augmented_1: The same content as source_augmented, but using only the first table instead of top-5.

• Data from real users: We wanted a dataset that captures the linguistic challenges that exist in the real world
[Plot: tables released (0 to 1500) by year (2000 to 2020), colored by status (Archived, Current).]

[Plot: tables released (0 to 1500) by year (2000 to 2020).]
Figure 6: The release year of all tables by subject. Only the top 8 subjects are shown for readability.
[Histogram: x-axis messages, y-axis percent.]
Figure 7: Histogram of messages by conversation in the French task splits. English split in Figure 1.
[Chart: subject categories (Labour, Health, Economic accounts, Agriculture, Construction, Environment, Government, International trade, Transportation, Energy, Aboriginal peoples, Housing, Manufacturing, Languages), colored by split (train, valid, test).]
Figure 8: Distribution of subject categories for tables in the retrieval task, colored by the split.
          Train  Valid  Test  Overall
Train         0    470   535        0
Valid        41      0   199        0
Test        145    238     0        0
Overall     181    610   571        0
(a) Differences

          Train  Valid  Test  Overall
Train       778    308   243      778
Valid       308    349   150      349
Test        243    150   388      388
Overall     778    349   388      959
(b) Overlaps

          Train  Valid  Test  Overall
Train         0    154   184        0
Valid        18      0    64        0
Test         44     60     0        0
Overall      61    197   201        0
(c) Differences (French)

          Train  Valid  Test  Overall
Train       224     70    40      224
Valid        70     88    24       88
Test         40     24    84       84
Overall     224     88    84      285

Dataset             Train  Valid  Test   All
Max Tokens / Msg     1243    857   912  1243
Max Turns / Conv.      28     18    21    28
Max Msg / Conv.        78     58    73    78
Max Qs / Conv.         15     11    11    15
Std Tokens / Msg        1      1     1     1
Std Turns / Conv.    2.58   2.36  2.62  2.56
Std Msg / Conv.      8.53   7.55  8.98  8.47
Std Qs / Conv.       1.27   1.24  1.18  1.25

Max Tokens / Msg     1374   1373   352  1374
Max Turns / Conv.      15     10    15    15
Max Msg / Conv.        59     38    47    59
Max Qs / Conv.         15     15     5    15
Std Tokens / Msg        1      2     2     1
Std Turns / Conv.    2.21   2.08   2.1  2.17
Std Msg / Conv.      7.02   6.74  6.61  6.91
Std Qs / Conv.       1.84   2.18  0.77  1.79
Table 13: More statistics (max and standard deviation) at the conversation level (top: English, bottom: French), following Table 3.
Merged Acts Count Original Speech Acts
answer 50 answer, confirm, disconfirm
request 31 request, acceptOffer, declineOffer
timeManagement 28 stalling, pausing
inform 63 inform, agreement, disagreement, correction
infoSeekingQuestions 25 propositionalQuestion, checkQuestion, setQuestion, choiceQuestion, directQuestion
promise 18 promise, acceptRequest, declineRequest
autoFeedback 25 positiveAutoFeedback, negativeAutoFeedback
offer 11 offer
instruct 18 instruct
clarificationQuestion 16 clarificationQuestion
greeting 28 initialGreeting, returnGreeting
selfIntroduction 17 initialSelfIntroduction, returnSelfIntroduction
thanking 47 thanking
acceptThanking 15 acceptThanking
Table 15: Merged and original speech acts occurring in 100 turns in conversations from the validation set. This
table is summarized in Table 4.
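The merging in Table 15 can be expressed as a lookup from original to merged speech acts. The sketch below copies the mapping from the table; the helper name `count_merged` and the example annotation list are hypothetical, and the snippet is only an illustration of the merging step, not the annotation tooling used for the analysis.

```python
from collections import Counter

# Merged act -> original speech acts, as listed in Table 15.
MERGED_ACTS = {
    "answer": ["answer", "confirm", "disconfirm"],
    "request": ["request", "acceptOffer", "declineOffer"],
    "timeManagement": ["stalling", "pausing"],
    "inform": ["inform", "agreement", "disagreement", "correction"],
    "infoSeekingQuestions": [
        "propositionalQuestion", "checkQuestion", "setQuestion",
        "choiceQuestion", "directQuestion",
    ],
    "promise": ["promise", "acceptRequest", "declineRequest"],
    "autoFeedback": ["positiveAutoFeedback", "negativeAutoFeedback"],
    "offer": ["offer"],
    "instruct": ["instruct"],
    "clarificationQuestion": ["clarificationQuestion"],
    "greeting": ["initialGreeting", "returnGreeting"],
    "selfIntroduction": ["initialSelfIntroduction", "returnSelfIntroduction"],
    "thanking": ["thanking"],
    "acceptThanking": ["acceptThanking"],
}

# Inverted lookup: original speech act -> merged act.
ORIGINAL_TO_MERGED = {
    original: merged
    for merged, originals in MERGED_ACTS.items()
    for original in originals
}


def count_merged(original_acts):
    """Count merged acts for a list of original speech-act labels."""
    return Counter(ORIGINAL_TO_MERGED[act] for act in original_acts)


# Hypothetical annotations for a few messages.
print(count_merged(["stalling", "answer", "confirm", "thanking"]))
```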
Valid Test
Model Encoded R@1 R@10 R@20 R@1 R@10 R@20
DPR Basic 37.8 ± 1.82 73.1 ± 1.02 80.8 ± 0.51 14.7 ± 1.13 45.0 ± 0.24 55.0 ± 1.67
Basic + footnote 40.0 ± 0.64 73.3 ± 0.85 81.3 ± 1.19 13.9 ± 0.37 44.4 ± 2.73 54.2 ± 2.65
Basic + member 40.0 ± 0.44 74.5 ± 1.38 81.6 ± 0.55 15.7 ± 0.7 46.2 ± 0.44 56.3 ± 0.58
Member 28.0 ± 0.47 62.1 ± 1.07 71.4 ± 1.06 10.7 ± 0.88 35.0 ± 0.57 46.3 ± 0.83
Title 37.8 ± 0.21 70.9 ± 0.67 78.2 ± 1.18 13.9 ± 0.59 43.8 ± 1.22 53.4 ± 1.42
Title + footnote 39.5 ± 0.69 73.2 ± 0.81 80.8 ± 0.47 13.3 ± 0.7 43.6 ± 0.46 53.6 ± 1.5
Title + member 38.6 ± 1.19 72.1 ± 0.92 79.4 ± 0.72 14.3 ± 0.59 45.1 ± 0.44 54.2 ± 1.07
TAPAS Table 16.9 ± 4.26 45.9 ± 5.46 55.5 ± 3.13 3.9 ± 0.65 17.2 ± 2.83 24.6 ± 3.45
Title 21.5 ± 0.33 54.7 ± 3.39 63.7 ± 1.38 4.7 ± 0.53 20.2 ± 0.85 28.7 ± 1.98
Title + member 24.6 ± 3.6 54.9 ± 3.34 63.7 ± 2.74 6.1 ± 0.98 22.1 ± 2.66 31.5 ± 3.02
TAPAS-NQ Table 25.3 ± 3.46 60.0 ± 2.43 70.0 ± 2.89 5.3 ± 0.8 22.8 ± 2.32 32.0 ± 2.48
Title 26.5 ± 1.85 64.4 ± 1.95 71.6 ± 2.32 6.3 ± 0.64 27.4 ± 1.84 36.1 ± 2.53
Title + member 29.8 ± 2.74 62.9 ± 1.35 71.5 ± 0.63 7.4 ± 0.73 30.0 ± 1.46 39.3 ± 0.86
BM25 Basic 1.0 7.5 10.3 1.0 7.2 9.8
Basic + footnote 0.9 2.4 3.1 0.9 2.8 3.9
Basic + member 0.1 1.3 2.6 0.2 2.4 3.8
Full 0.3 1.3 1.9 0.2 1.1 2.0
Title 2.0 10.3 13.8 2.5 10.9 14.0
Title + footnote 0.8 2.3 3.1 0.7 2.9 4.0
Title + member 0.0 1.4 2.4 0.3 2.3 3.8
Table 16: Full retrieval results for the English splits. The values reported are in recall % at k. DPR and TAPAS were
run 3 times and averaged (standard deviation given after ±). Selected results in Table 8.
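For reference, recall at k and the "mean ± standard deviation" aggregation over runs can be sketched as follows. This is an illustrative computation under stated assumptions, not the evaluation code behind Table 16: it assumes one gold table per query, and it uses the population standard deviation, since the caption does not say whether the population or sample form was used. All names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def recall_at_k(ranked_ids, gold_ids, k):
    """Percentage of queries whose gold table is among the top-k retrieved tables."""
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_ids, gold_ids))
    return 100.0 * hits / len(gold_ids)


def summarize_runs(values):
    """Mean and (population) standard deviation over repeated runs."""
    return mean(values), pstdev(values)


# Toy rankings: one ranked list of table identifiers per query.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
gold = ["a", "e", "z", "l"]

for k in (1, 2, 3):
    print(k, recall_at_k(ranked, gold, k))

# Toy scores from three runs of the same configuration.
avg, std = summarize_runs([37.0, 38.0, 39.0])
print(f"{avg:.1f} ± {std:.2f}")
```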
Valid Test
Model Encoded R@1 R@10 R@20 R@1 R@10 R@20
DPR Basic 19.7 ± 1.52 57.4 ± 2.55 67.0 ± 5.01 12.5 ± 1.78 40.0 ± 1.08 48.5 ± 4.27
Basic + footnote 23.2 ± 3.8 55.4 ± 1.88 66.8 ± 1.04 13.9 ± 1.48 35.9 ± 0.82 49.9 ± 3.91
Basic + member 22.7 ± 4.35 57.9 ± 2.45 67.7 ± 3.48 12.8 ± 2.13 38.1 ± 2.17 47.0 ± 1.78
Title 23.7 ± 2.92 58.5 ± 2.24 69.8 ± 2.35 14.7 ± 2.28 35.7 ± 3.2 48.2 ± 2.84
Title + footnote 23.2 ± 2.74 54.7 ± 3.59 64.7 ± 2.28 14.4 ± 1.48 36.6 ± 2.69 45.4 ± 2.13
Title + member 24.2 ± 1.52 55.4 ± 4.51 65.3 ± 2.01 14.4 ± 2.28 39.5 ± 0.82 48.0 ± 2.05
BM25 Basic 0.0 0.0 0.0 0.0 0.0 0.0
Basic + footnote 0.0 0.0 0.0 0.0 0.0 0.0
Basic + member 0.5 1.0 1.0 0.0 0.7 0.7
Full 0.0 0.0 0.5 0.0 0.0 0.0
Title 0.0 0.5 0.5 0.0 0.0 0.7
Title + footnote 0.0 0.0 0.0 0.0 0.0 0.0
Title + member 0.0 0.5 1.0 0.0 0.7 0.7
Table 17: Full retrieval results for the French splits. The values reported are in recall % at k ∈ {1, 10, 20}. DPR
was run 3 times and averaged (standard deviation given after ±).
Obtaining meta-information of table
U: Hi I have a question on the date of new census
A: Hello my name is Graham, how may I help you?
U: Hi Graham
U: I am currently looking at : Census of Canada and National Household Survey
U: I see that the last census were made in 2011 and 2016
U: should I expect a new census in the coming month in 2020 ?
A: no - the next Census is going to be taking place in May 2021
A: it’s every 5 years
U: Ok perfect thank you
A: you’re welcome! have a good day
Help with user interface
U: Hello Graham - is there way I can see total annual values instead of monthly?
U: for - https://www150.statcan.gc.ca/t1/tbl1/en/cv!recreate.action?...
A: please hold - I will see if a table like this with an annual frequency is available
U: wonderful thank you
A: unfortunately no, the only active table on construction investment is the monthly one I sent you - there is an inactive
quarterly table Investment in Building Construction (https://www150.statcan.gc.ca/n1/en/surveys/5014)
A: it is also possible to download the monthly data in .CSV form, add together the months in a spreadsheet to get
annual totals
A: or if you’d prefer, you can make a custom request to our analysts (a fee may apply).Please send us your request,
clearly indicating the variables, geographic areas, time frame and data frequency (i.e., annually, monthly or quarterly)
needed at infostats@canada.ca (mailto:infostats@canada.ca) and refer to Case number: 986005 so we can access the
transcript of this live chat. Your request must include your name, address, postal code, and phone number (please do
not share your personal information on the live chat.)An agent will get in touch with you to provide an estimate and
proceed with the payment, if applicable.You may also submit your request by telephone at 1-800-263-1136.
U: thats fine! thank you very much :)
U: very helpful
U: :)!
U: have a good day!
A: glad I could help! you too
Help with user interface
U: Hello, Do you know where I can download the dataset for General Social Survey Cycle 32?
A: Please wait while I try to find this information for you.
U: Thank you
U: (I’m referring to this one: https://www23.statcan.gc.ca/imdb/p2SV.pl?... )
A: General Social Survey: Caregiving and Care Receiving, Public Use Microdata File (https://www150.statcan.gc.ca/...)
U: I’m sorry, but im having trouble finding the download link for cycle 32 (2018)
A: you have to click on ’More information’ and that will bring you to the order form
U: ohhhh, I see
U: thank you for letting me know
U: thank you for your help!
A: you’re welcome - happy to help!
Out of domain requests
U: I’m wondering how to obtain a copy of an autopsy that’s been done for a family member. Doctor office says they
don’t get the autopsy report.
A: Statistics Canada does not issue certificates for birth, death or marriage. Civil registration is a provincial jurisdiction.
Therefore, you must contact the government ministry, registry or vital statistics office of the province or territory
in which the birth, death or marriage occurred. To obtain the contact information for your provincial or territorial
government, please call Service Canada at 1-800-O-Canada (1-800-622-6232) or visit the following page: Vital
Statistics contacts (https://www.statcan.gc.ca/eng/health...) .
A: also: Coroner’s report (https://spvm.qc.ca/en/...)
U: Hi, I’m looking to obtain quarterly data in regards to GDP grow (Canada), BC Housing STarts, Canada Oil Price/BBL
A: Hello, my name is Kelly C. Give me one moment as I search
U: thanks!
A: For GDP growth rates, please consult the following link: Gross domestic product (GDP) at basic prices, by industry,
monthly, growth rates
A: I’ll continue searching for the rest. One moment
A: What do you mean by BC Housing Starts?
U: I’m required to research all of the housing starts for BC on a quarterly basis
A: Define "starts".
U: unit basis
U: Housing starts are the number of new residential construction projects that have begun during any particular month
A: Perfect. Give me one moment.
A: I’m still searching. Sorry about the wait.
U: no worries
A: I would have monthly data regarding new building permits being issued.
U: thank you
A: To view/ manipulate the variables available in a data table: • Click on "Add/Remove data" • Select at least one variable in
each tab and click on “Apply” in order to view your customized table To download the data: • Click on “Download” •
Select one of the three output formats then click on the hyperlink and save the table The following video may also be
helpful to you: How to use the data tables (https://www.statcan.gc.ca/eng/sc/video/howto) .
A: I’ll have a look for oil prices. One moment.
U: yes
A: Would those geographies be enough?
A: Or are you looking for Canada only?
U: hmm
U: I would need something that pertains more to all of canada
A: What about this? Monthly average retail prices for food and other selected products (https://www150.statcan.gc.ca/t1/...)
A: Is there anything else I can do for you?
Table 19: Full Example (Sample #42) taken from the conversations dataset. URLs were updated to link to default
view (rather than filtered). See Table 1 for truncated conversation.
Title: Production and value of maple products
Date range: 1924-01-01 to 2020-01-01
Dimensions: Geography, Maple products
Subject: Agriculture
Survey: Maple Products
Frequency: Annual
Geography:
ID: 1, Parent: None, Name: Canada
ID: 2, Parent: 1, Name: Nova Scotia
ID: 3, Parent: 1, Name: New Brunswick
ID: 4, Parent: 1, Name: Quebec
ID: 5, Parent: 1, Name: Ontario
Maple products:
ID: 9, Parent: None, Name: Maple products expressed as syrup, total
ID: 10, Parent: None, Name: Gross value of maple products
ID: 1, Parent: None, Name: Production of maple sugar
ID: 2, Parent: None, Name: Value of maple sugar
ID: 3, Parent: None, Name: Production of maple syrup
ID: 4, Parent: None, Name: Value of maple syrup
ID: 5, Parent: None, Name: Production of maple taffy
ID: 6, Parent: None, Name: Value of maple taffy
ID: 7, Parent: None, Name: Production of maple butter
ID: 8, Parent: None, Name: Value of maple butter
ID: 0, Note: Conversion factors: 1 gallon of syrup equals 10.0 pounds of maple sugar. One gallon of syrup weighs
13.24760 pounds. One gallon of syrup equals 10.4 pounds of taffy. Maple taffy is reported by Quebec and Nova
Scotia only and commenced reporting in 1965 and 1983 respectively. The conversion of maple taffy to syrup varies
with the density of syrup that year.
ID: 0, Note: Commercial production and value figures exclude inventory.
ID: 2, Note: Estimates produced by Nova Scotia horticulture industry specialists.
ID: 3, Note: Beginning in 1986, survey data for New Brunswick and Ontario are weighted using the number of taps
made on maple trees in the spring of the year of the most recently completed census of agriculture.
ID: 4, Note: Estimates produced by l’Institut de la statistique du Québec, Groupe AGÉCO and Fédération des
producteurs acéricoles du Québec.
ID: 5, Note: Beginning in 1986, survey data for New Brunswick and Ontario are weighted using the number of taps
made on maple trees in the spring of the year of the most recently completed census of agriculture.
ID: 9, Note: “These products are represented by North American Product Classification System (NAPCS) Canada
2017 category 115136
Maple syrup and other maple products. Data collected for maple butter, maple taffy and maple sugar are converted
into a maple syrup equivalent and are included in this category.”
ID: 10, Note: “These products are represented by North American Product Classification System (NAPCS) Canada
2017 category 115136 – Maple syrup and other maple products. Data collected for maple butter, maple taffy and
maple sugar are converted into a maple syrup equivalent and are included in this category.”
ID: 0, Note: “These products are represented by North American Product Classification System (NAPCS) Canada
2017 category 115136 – Maple syrup and other maple products. Data collected for maple butter, maple taffy and
maple sugar are converted into a maple syrup equivalent and are included in this category.”
Table 20: Full metadata of table in Section 3.3. Single lines were added to delimit the scope: Title, basic information,
item names, and footnotes. The double line was added to delimit the truncation limit of the DPR and TAPAS model
(512 tokens). The basic information is presented in Table 5.
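The effect of the 512-token limit can be illustrated with a small sketch that concatenates selected metadata sections and keeps only the first tokens. Several assumptions apply: whitespace splitting stands in for the subword tokenizers of DPR and TAPAS, the section keys (`title`, `basic`, `member`, `footnote`) are our reading of the scopes delimited in Table 20, and the function name is hypothetical.

```python
def encode_metadata(sections, keys, max_tokens=512):
    """Concatenate the chosen metadata sections and keep the first max_tokens tokens."""
    text = " ".join(sections[key] for key in keys)
    # Assumption: whitespace tokens approximate the model's subword tokens.
    tokens = text.split()
    return " ".join(tokens[:max_tokens])


# Abbreviated sections of the table shown above.
sections = {
    "title": "Production and value of maple products",
    "basic": "Date range: 1924-01-01 to 2020-01-01 Subject: Agriculture Frequency: Annual",
    "member": "Canada Nova Scotia New Brunswick Quebec Ontario",
    "footnote": "Commercial production and value figures exclude inventory.",
}

# A "Title + member"-style input with a deliberately small budget.
print(encode_metadata(sections, ["title", "member"], max_tokens=8))
```

Because truncation keeps the head of the sequence, sections placed later in the concatenation (here, the footnotes of a long table) are the first to be cut.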
PID 18100004 (count = 170)
Title Consumer Price Index, monthly, not seasonally adjusted
Dates 1914-01-01 to 2021-08-01
Dim. Geography, Products and product groups
Subject Prices and price indexes
Survey Consumer Price Index
Freq. Monthly
PID 11100222 (count = 125)
Title Household spending, Canada, regions and provinces
Dates 2010-01-01 to 2019-01-01
Dim. Geography, Statistic, Household expenditures, summary-level categories
Subject Income, pensions, spending and wealth
Survey Survey of Household Spending
Freq. Annual
PID 17100005 (count = 123)
Title Population estimates on July 1st, by age and sex
Dates 1971-01-01 to 2021-01-01
Dim. Geography, Sex, Age group
Subject Population and demography
Survey Annual Demographic Estimates Canada, Provinces and Territories
Freq. Annual
PID 33100214 (count = 117)
Title Canadian Business Counts, with employees, June 2019
Dates 2019-01-01 to 2019-01-01
Dim. Geography, Employment size, North American Industry Classification System (NAICS)
Subject Business performance and ownership
Survey Business Register
Freq. Semi-annual
PID 35100177 (count = 104)
Title Incident-based crime statistics, by detailed violations, Canada, provinces, territories and Census Metropolitan Areas
Dates 1998-01-01 to 2020-01-01
Dim. Geography, Violations, Statistics
Subject Crime and justice
Survey Uniform Crime Reporting Survey
Freq. Annual
PID 18100005 (count = 91)
Title Consumer Price Index, annual average, not seasonally adjusted
Dates 1914-01-01 to 2020-01-01
Dim. Geography, Products and product groups
Subject Prices and price indexes
Survey Consumer Price Index
Freq. Annual
Table 21: Most frequently retrieved tables across all splits in the retrieval task. The basic information is provided for each table. Analysis presented in Section A.3.1.
U: hi, i was wondering if you vae any statistics on video game sales, ot high school drop out rates? infoSeekingQuestions; request;
A: Hello my name is Sylvain, how may I help you? offer;
U: hello, I am looking for high school drop out r ates and teen depression rates, but cant seem to find anything that goes back more than a year request;
A: Data for High School dropouts is compiled by the Provincial Education Ministry. To obtain this infor please contact the Education Ministry of your province inform; promise;
A: Data for mood disorder (depression) is available on our website from the Data Table 13- 10-0096-18 (https://www150.statcan.gc.ca/...) answer;
A: To view/ manipulate the variables available in a data table: • Click on "Add/Remove data" • Select at least one variable in each tab and click on “Apply” in order to view your customized table To download the data: • Click on “Download” • Select one of the three output formats then click on the hyperlink and save the table The following video may also be helpful to you: How to use the data tables (https://www.statcan.gc.ca/eng/sc/video/...). inform; request;
U: Hi team, can you please help me retrieve data of public deposits of chartered banks? infoSeekingQuestions; request;
A: Please hold timeManagement;
U: Surely. autoFeedback;
A: You can obtain this info on our website from our Data Table 10-10-0116-01 (https://www150.statcan.gc.ca/t1/...) answer;
A: To view/ manipulate the variables available in a data table: • Click on "Add/Remove data" • Select at least one variable in each tab and click on “Apply" in order to view your customized table To download the data: • Click on “Download" • Select one of the three output formats then click on the hyperlink and save the table The following video may also be helpful to you: How to use the data tables (https://www.statcan.gc.ca/eng/sc/video/...) . inform; instruct;
A: Please take note that this is a non-Statistics Canada Table. This info is provide from The Bank of Canada inform;
U: Thanks a lot! So, how should I get down to Public deposits? or rather which of these lines would direct me to Public deposits? clarificationQuestion;
A: Under the category Chartered bank deposits. For more info please contact The Bank of Canada on their website by using their Online Request Form (https://www.bankofcanada.ca/...) answer; inform;
Table 22: Samples of the dialogue analysis in Section 3.2, which are summarized in Table 15. The speech acts are in blue at the end of each message. The remaining annotations are in the supplementary materials. Sample examples are given in Table 4.
U: Thank you for your information, but I have done that already. Your previous website had a "manipu-
late" function to calculate the percentage change. I don’t see the button anymore, and wondered if you
have other schedule prepared as that function is gone.
A: Hello, my name is Jimmy N., how may I help you?
T5: Hello, my name is Jimmy N., how may I help you?
T5+D: Hello, my name is Jimmy N., how may I help you?
U: I was talking with Sylvain, will you be able to see the log?
A: Yes, please hold while I verify.
T5: Yes, please hold while I verify.
T5+D: Yes, please hold while I verify.
A: Just to clarify, you’re looking for all-items ?
T5: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price Index
by product group, monthly, percentage change, not seasonally adjusted, Canada, provinces, Whitehorse,
Yellowknife and Iqaluit ( )
T5+D: You can obtain this information at the following hyperlink : Table: 18-10-0004-01 Consumer Price Index by
product group, monthly, [...] and Iqaluit ( )
Table 23: Conversation #4890. Each message is separated by a horizontal line. The original conversation, in
bold, only contains messages by the user (U) and agent (A). The generated responses by T5 and T5 with top-5
DPR-retrieved titles (T5+D) predict the corresponding agent’s message (in bold).
[...]
U: Ideally, I want the percentage change for these items
R1 : Consumer Price Index, monthly, not seasonally adjusted
R2 : Monthly adjusted price index, provisional
R3 : Consumer Price Index, monthly, seasonally adjusted
R4 : Historical (real-time) releases of the industrial product price index, by major product group, monthly, inactive
R5 : Consumer Price Index (CPI) statistics, alternative measures, unadjusted and seasonally adjusted, Bank of
Canada
A: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price
Index by product group, monthly, percentage change, not seasonally adjusted, Canada, provinces,
Whitehorse, Yellowknife and Iqaluit (https://www150.statcan.gc.ca/t1/tbl1/...)
T5: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price Index [...]
and Iqaluit ( )
T5+D: You can obtain this information at the following hyperlink : Table: 18-10-0004-13 Consumer Price Index [...]
and Iqaluit ( )
Table 24: The second part of conversation #4890, immediately after Table 23. Ri indicates the i-th table retrieved
by DPR, which is only seen by T5+D.
U: How can I find information relating to Kelowna population estimates, demography, population projection
estimates, population by age and sex
A: hello how may i help you today?
U: How can I find information relating to Kelowna population estimates, demography, population projection
estimates, population by age and sex
A: let me see what i can find on the subject for the projections information our tables only provide it to the
provincial level it does not go lowe then that <extra_id_0> ( <extra_id_0> )
R1 : Estimates and projections of population aged 0 to 29, by age group, Canada, provinces and territories
R2 : Projected population, by projection scenario, age and sex, as of July 1
R3 : Population estimates on July 1st, by age and sex
R4 : Components of projected population growth, by projection scenario
R5 : Population estimates, quarterly
A: https://www150.statcan.gc.ca/t1/tbl1/en/cv.action?pid=1710005801
T5: https://www150.statcan.gc.ca/n1/daily-quotidien/190619/dq190619f-cansim-eng.htm
T5+D: https://www150.statcan.gc.ca/n1/pub/71-607-x/71-607-x2018005-eng.htm
U: Hi there, do you know approximately how many craft distillers are operating in Canada?
A: Hello my name is Sylvain, how may I help you?
U: I was wondering if you had any information about craft distillers in Canada. Specifically, how many craft distillers
are operating in Can
A: The industries are classified according to the North American Industry Classification System (NAICS). To find
the NAICS code for an indu refer to this page (http://www23.statcan.gc.ca/imdb/...1181553) . You may start your
search by selecting one of the 20 sectors, until a six-d obtained. [...] You will find the number of business locations
by NAICS code in the following data tables: Table 33-10-0105-01 Canadian Business Counts, with employees,
December 2018 (https://www150.statcan.gc.ca/t1/...3310010501) [...] I also suggest consulting the financial data of
small and medium enterprise in Canada; these organizations have annual earning revenues ranges of $30,000 to
$20 million. T information is available by industry, province, territory and legal status of companies (incorporated
or unincorporated). This data is not available on our website; however, i available on Innovation, Science and
Economic Development Canada’s website, using their Financial Performance Data (http://www.ic.gc.ca/...)[...]
U: Hi. I want to know how many small, medium and big companies are there in canada. Thank you
A: Hello, my name is Olivier C. Please wait while i find the information you’re looking for.
R1 : Canadian Business Counts, without employees, December 2019
R2 : Canadian Business Counts, with employees, December 2019
R3 : Canadian Business Counts, without employees, June 2019
R4 : Canadian Business Counts, with employees, June 2019
R5 : Canadian Business Counts, without employees, December 2017
A: Thank you for waiting, you will find the information about the number of business per business size
if you click the following hyperlinks : Canadian Business Counts, with employees, December 2019
(https://www150.statcan.gc.ca/t1/...pid=3310022201) Canadian Business Counts, without employees,
December 2019 (https://www150.statcan.gc.ca/t1/...pid=3310022301)
T5: Thank you for waiting. You will find the information you’re looking for if you click the following hyperlink :
Canadian Business Counts, with employees, June 2019 ( )
T5+D: Thank you for waiting. The industries are classified according to the North American Industry
Classification System (NAICS). To find the NAICS code for an industry, please refer to this page
(http://www23.statcan.gc.ca/imdb/...1181553). You may start your search by selecting one of the 20 sectors,
until a six-digit NAICS is obtained. You will find the number of business locations by NAICS code in the
following data tables: Table 33-10-0214-01 Canadian Business Counts, with employees, June 2019 ( ) Table
33-10-0215-01 Canadian Business Counts, without employees, June 2019 ( ) I also suggest consulting the
financial data of small and medium enterprise in Canada; these organizations have annual earning revenues
ranges of $30,000 to $20 million. This information is available by industry, province, territory and legal
status of companies (incorporated or unincorporated). This data is not available on our website; however, it is
freely available on Innovation, Science and Economic Development Canada’s website, using their Financial
Performance Data (http://www.ic
U: Do you have any data showing labour in manufacturing companies and how it is allocated between direct and
indirect labour?
A: Hello, my name is Jimmy N. Could you please specify what you mean by direct and indirect labour.
U: Direct = labour for building product Indirect = administrative, supervision, etc.
A: The data you are requesting is not available as a standard product. However, it may be available through a
custom order (a fee may apply). [...] Would you like to make this request?
A: As a standard product, we have tables about employment by industry.
U: I would just like to look at the standard product. Can you tell me where I can find it, or what it is called?
R1 : Labour force characteristics by industry, monthly, seasonally adjusted, last 5 months
R2 : Employment by industry, monthly, seasonally adjusted and unadjusted, and trend-cycle, last 5 months
R3 : Employment by industry, annual
R4 : Employment by industry, three-month moving average, unadjusted for seasonality, provinces and economic
regions, inactive
R5 : Usual hours worked by industry, annual
A: You can obtain this information at the following hyperlinks : Table: 14-10-0355-02 Employment by
industry, monthly, seasonally adjusted (x 1,000)Table: 14-10-0023-01 Labour force characteristics
by industry, annual (x 1,000)Employed labour force who worked full year, full time and reported
employment income in 2015, 2016 Census (https://www150.statcan.gc.ca/t1/...1410035502)
T5: Table: 14-10-0296-01 Labour force characteristics by industry, annual (x 1,000) ( )
T5+D: Please hold while I find the information.
U: Hello
A: Hello, my name is Jimmy N., how may I help you?
U: [...] We saw this article from the Vancouver Sun which mentioned StatsCan data on business closures. [...] is
there any way for us to collect localized data for the Township, or at least the Fraser Valley
A: Unfortunately, the information is available on a national and provincial/territorial level : And for some Census
metropolitan areas
Ri : Experimental estimates for business openings and closures for Canada, provinces and territories, census
metropolitan areas, seasonally adjusted
Ri : Quarterly estimates of business entry and exit
Ri : Of businesses where at least one employee was laid off, percentage of workforce laid off and rehired due to
COVID-19, by business characteristics
Ri : Canadian Business Counts, with employees, census metropolitan areas and census subdivisions, December
2020
Ri : Manufacturing sales, by industry for 12 Selected Census Metropolitan Areas
A: Table: 33-10-0270-01 - Experimental estimates for business openings and closures
for Canada, provinces and territories, census metropolitan areas, seasonally adjusted
(https://www150.statcan.gc.ca/t1/...3310027001)
T5: However, it may be available through a custom order (a fee may apply). To obtain a free cost estimate
for this service, please send us your request, clearly indicating the variables, geographic areas, time frame
and data frequency needed, making sure to include your complete contact information (name of orga-
nization, address, postal code, and phone number) at the following email address: infostats@canada.ca
(mailto:infostats@canada.ca) or by telephone at 1-800-263-1136. Our agents are available Monday to Friday
(except holidays) from 7:30 am to 7:30 pm (Eastern Time). An agent will get in touch with you to verify the
information, provide an estimate and proceed with the payment, if applicable.
T5+D: However, it may be available through a custom order (a fee may apply). To obtain a free [...] An agent will get
in touch with you to verify the information, provide an estimate and proceed with the payment, if applicable.
Please do not share any personal information through the live chat. Would you like to make this request?
Table 32: List of tables that do not appear in the training set, but appear at least 3 times in the validation or test sets.
Table 33: Full response generation results. Selected results in Table 10.