AI-Powered Text Generation For Harmonious Human-Machine Interaction: Current State and Future Directions
Abstract—In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation, ranging from template-based methods to neural network-based methods, have emerged. Meanwhile, the research objectives have also shifted from generating smooth and coherent sentences to infusing personalized traits that enrich the diversity of newly generated content. With the rapid development of text generation solutions, a comprehensive survey is urgently needed to summarize the achievements and track the state of the art. In this survey paper, we present a general systematic framework, illustrate the widely utilized models, and summarize the classic applications of text generation.

Keywords—text generation, deep learning, dialog system
I. INTRODUCTION

Text generation is an important research field in natural language processing (NLP) with great application prospects: it enables computers to learn to express themselves like humans using various types of information, such as images, structured data, and text, so as to take over a variety of tasks from humans. The first automatically generated news text dates back to March 17, 2014, when the Los Angeles Times reported a small earthquake near Beverly Hills, California, with detailed information about the time, location, and strength of the earthquake. The report was automatically generated by a 'robot reporter', which converted the automatically registered seismic data into text by filling in the blanks of a predefined template [43]. Since then, the landscape of text generation has been rapidly expanding.
At the initial stage, most studies focused on reducing grammatical errors so as to make the generated text more accurate, smooth, and coherent. In recent years, deep learning has achieved great success in applications ranging from computer vision and speech processing to natural language processing, and most recent advances in the text generation field are based on deep learning technology. Not only the basic Recurrent Neural Networks (RNN) and Sequence-to-sequence (Seq2seq) models, but also Generative Adversarial Networks (GAN) and Reinforcement learning are widely used in the field of text generation.
With the help of these technologies, the generated text has become more coherent, logical, and emotionally harmonious. Many dialogue systems, such as Microsoft XiaoIce, Microsoft Cortana, and Apple Siri, have brought great convenience to people's lives. They not only help people accomplish specific tasks but also communicate with people as virtual partners. Nowadays, researchers have started to study personalized text generation. Just as we adjust our speaking style to the characteristics of our interlocutor in daily communication, the text generation process should also dynamically adjust the generation strategy and the final generated content according to the profile of each user. Therefore, research on personalized text generation is now receiving unprecedented attention.

Different from prior survey papers on text generation, in this overview we introduce the most recent progress from the methodology perspective and summarize the emerging applications of text generation. According to the data modality, text generation tasks can be divided into data-to-text, text-to-text, and image-to-text. Among them, data-to-text tasks include weather forecast generation, financial report generation, and so on. Widely studied text-to-text tasks include news generation, text summarization, text retelling, and review generation, while image-to-text tasks include image captioning, visual question answering, etc.

In short, the main contributions of this paper are as follows:

• We summarize the most recent progress in text generation and present the widely used models in this field.

• We provide a comprehensive collection of primary applications, including dialogue systems, text summarization, review generation, and image captioning & visual question answering, together with the key techniques behind them.

• Finally, we point out a promising research direction of text generation: personalized text generation.

The remainder of this paper is organized as follows. Section 2 introduces the commonly used models in the field of text generation. Section 3 presents the application scenarios of these models in detail. Section 4 highlights applications of personalized text generation in various fields. Section 5 summarizes the evaluation methods, and Section 6 concludes this paper with future work.

II. THE TEXT GENERATION MODELS

In this section, we introduce the basic frameworks of the neural network models widely applied to text generation, including Recurrent Neural Networks (RNN), Sequence-to-sequence (Seq2seq), Generative Adversarial Networks (GAN), and Reinforcement learning.

A. Recurrent Neural Network

RNN is a special neural network structure, proposed according to the view that 'people's cognition is based on past experience and memory'. Different from deep neural networks (DNN) and convolutional neural networks (CNN), RNN not only considers the input at the current moment but also endows the network with a 'memory' of the preceding content. The RNN structure is shown in Figure 1. RNN can remember previous information and apply it to the calculation of the current output. Thus, the nodes of the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
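To make this recurrence concrete, the following minimal NumPy sketch (our own illustration, not code from any surveyed system; names such as rnn_step, W_xh, and W_hh are ours) shows how the hidden state at each moment combines the current input with the previous hidden state:

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # The new hidden state depends on the current input x_t AND the
        # previous hidden state h_prev, which acts as the 'memory'.
        return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

    rng = np.random.default_rng(0)
    input_dim, hidden_dim, seq_len = 8, 16, 5
    W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
    W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    b_h = np.zeros(hidden_dim)

    h = np.zeros(hidden_dim)                   # empty memory at t = 0
    for x_t in rng.normal(size=(seq_len, input_dim)):
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # memory carried forward

Training unrolls this loop over the whole sequence (backpropagation through time); the difficulty of propagating gradients through many such steps is what the gated variants discussed next are designed to alleviate.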
There are also many variants of RNN, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The study in [36] is the pioneering application of RNN to the construction of language models, and the experimental results show that the RNN language model outperforms the traditional method.
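As a schematic of how an RNN language model of the kind pioneered in [36] generates text (a generic sketch under our own parameter names, not the exact architecture of [36]), each step embeds the current token, updates the hidden state, and outputs a probability distribution over the next token:

    import numpy as np

    def softmax(z):
        z = z - z.max()             # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def lm_step(token_id, h_prev, E, W_hh, W_hy, b_h, b_y):
        # Embed the current token, update the recurrent state, and
        # score every vocabulary entry as the possible next token.
        h = np.tanh(E[token_id] + h_prev @ W_hh + b_h)
        return h, softmax(h @ W_hy + b_y)

    rng = np.random.default_rng(1)
    vocab, hidden = 1000, 32
    E = rng.normal(scale=0.1, size=(vocab, hidden))    # input embeddings
    W_hh = rng.normal(scale=0.1, size=(hidden, hidden))
    W_hy = rng.normal(scale=0.1, size=(hidden, vocab))
    b_h, b_y = np.zeros(hidden), np.zeros(vocab)

    h, token = np.zeros(hidden), 0                     # 0 = start symbol
    for _ in range(5):                                 # greedy decoding
        h, p_next = lm_step(token, h, E, W_hh, W_hy, b_h, b_y)
        token = int(p_next.argmax())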
In GAN-based text generation (introduced below), the idea of smooth approximation was used to approximate the output of the generator LSTM, so as to overcome the gradient non-differentiability problem caused by discrete data.

[Figure: in the GAN framework, a random input is fed into the generator to produce generated data, and the generator is optimized during training.]
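One common realization of such a smooth approximation, sketched below under our own assumptions rather than as the exact formulation of the referenced model, replaces the discrete sampled word with a temperature-controlled softmax mixture of word embeddings, which keeps the generator output differentiable:

    import numpy as np

    def softmax(z, temperature=1.0):
        z = z / temperature
        z = z - z.max()             # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def smooth_word(logits, embeddings, temperature=0.1):
        # A discrete choice (argmax or sampling) breaks the gradient chain.
        # Instead, return the probability-weighted mixture of embeddings:
        # as temperature -> 0 this approaches the one-hot argmax word,
        # yet it remains a differentiable function of the logits.
        p = softmax(logits, temperature)
        return p @ embeddings

    rng = np.random.default_rng(2)
    vocab, dim = 1000, 32
    embeddings = rng.normal(size=(vocab, dim))
    logits = rng.normal(size=vocab)             # generator LSTM output scores
    soft_vec = smooth_word(logits, embeddings)  # continuous stand-in for a word

The discriminator can then consume these continuous vectors in place of discrete tokens, allowing its gradients to propagate back into the generator.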