AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions


Qiuyun Zhang, Bin Guo, Hao Wang, Yunji Liang, Shaoyang Hao, Zhiwen Yu
School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, P.R.China
guob@nwpu.edu.cn

Abstract—In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation have emerged, ranging from template-based methods to neural network-based methods. Meanwhile, the research objectives have also changed from generating smooth and coherent sentences to infusing personalized traits that enrich the diversity of newly generated content. With the rapid development of text generation solutions, a comprehensive survey is urgently needed to summarize the achievements and track the state of the art. In this survey paper, we present the general systematic framework, illustrate the widely utilized models and summarize the classic applications of text generation.

Keywords—text generation, deep learning, dialog system

I. INTRODUCTION

Text generation is an important research field in natural language processing (NLP) with great application prospects. It enables computers to learn to express themselves like humans using various types of information, such as images, structured data, text, etc., so as to replace humans in completing a variety of tasks. A well-known early example of automatically generated news dates back to March 17, 2014, when the Los Angeles Times reported a small earthquake that occurred near Beverly Hills, California, providing detailed information about the time, location and strength of the earthquake. The news was automatically generated by a ‘robot reporter’, which converted the automatically registered seismic data into text by filling in the blanks in a predefined template [43]. Since then, the landscape of text generation has been rapidly expanding.

At the initial stage, the majority of studies focused on how to reduce grammatical errors to make the generated text more accurate, smooth and coherent. In recent years, deep learning has achieved great success in many applications, ranging from computer vision and speech processing to natural language processing. Most recent advances in the text generation field are based on deep learning technology. Not only the most basic Recurrent Neural Networks (RNN) and Sequence to sequence (Seq2seq) models, but also Generative Adversarial Networks (GAN) and reinforcement learning are widely used in the field of text generation.

With the help of these technologies, the generated text is more coherent, logical and emotionally harmonious. Many dialogue systems, such as Microsoft XiaoIce, Cortana and Apple Siri, have brought great convenience to people's lives. They not only help people to accomplish specific tasks, but also communicate with people as virtual partners. Nowadays, researchers have started to consider personalized text generation. Just as we adjust our speaking style according to each other's characteristics in daily communication, the text generation process should also dynamically adjust the generation strategy and the final generated content according to the different profiles of the user. Therefore, research on personalized text generation is now receiving unprecedented attention.

Different from prior survey papers on text generation, in this overview we introduce the most recent progress from the methodology perspective and summarize the emerging applications of text generation. According to the data modalities involved, text generation tasks can be divided into data-to-text, text-to-text, and image-to-text. Among them, data-to-text tasks include weather forecast generation, financial report generation and so on. Text-to-text tasks, which include news generation, text summarization, text retelling and review generation, are widely studied. The image-to-text tasks include image captioning, visual question answering, etc.

In short, the main contributions of this paper are as follows:

• We summarize the most recent progress in text generation and present the widely used models in this field.

• We provide a comprehensive collection of primary applications, including dialogue systems, text summarization, review generation, and image captioning & visual question answering, together with the key techniques behind them.

• Finally, we point out a promising research direction of text generation: personalized text generation.

The remainder of this paper is organized as follows. Section 2 introduces the commonly used models in the field of text generation. Section 3 presents the application scenarios of these models in detail. Section 4 highlights the application of personalized text generation in various fields. Section 5 summarizes the evaluation and Section 6 concludes this paper with future work.

II. THE TEXT GENERATION MODELS

In this section, we introduce the basic frameworks of the widely applied neural network models for text generation, including Recurrent Neural Networks (RNN), Sequence to sequence (Seq2seq), Generative Adversarial Networks (GAN) and reinforcement learning.

A. Recurrent Neural Network

RNN is a special neural network structure, which is proposed according to the view that ‘people's cognition is based on past experience and memory’. Different from deep neural networks (DNN) and convolutional neural networks (CNN), RNN not only considers the input at the current moment, but also endows the network with a ‘memory’ of the previous content. The RNN structure is shown in Figure 1. RNN can remember the previous information and apply it to the calculation of the current output. Thus, the nodes between the hidden layer are no longer connectionless
but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.

There are also many variations of RNN, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The study in [36] is the pioneering application of RNN to the construction of language models. The experimental results show that the RNN language model outperforms the traditional method.

Figure 1. The model structure of RNN (a recurrent cell and its unfolding over time; labels: inputs x_{t-1}, x_t, x_{t+1}; hidden states s_{t-1}, s_t, s_{t+1}; outputs o_{t-1}, o_t, o_{t+1}; weights U, W, V)

B. Sequence to sequence structure

The standard seq2seq model uses two RNNs to compose the encoder-decoder structure [7]. The first RNN encodes a sequence of symbols into a fixed-length vector representation, and the second RNN decodes the representation into another sequence of symbols. The encoder and decoder are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The Seq2seq structure is shown in Figure 2.

Figure 2. The model structure of Seq2seq (labels: Encoder, Decoder, X1, X2, X3, ..., XT1, C, y1, y2, y3, ..., yT2)

CNN cannot process sequence data of variable length, and the input and output sequences of a plain RNN must have the same length. By contrast, a seq2seq model that encodes with an RNN in the encoder stage can receive sequences of indefinite length as input, and in the decoder stage it can transform the representation vector into sequences whose length is not affected by the input sequence length. Thus, seq2seq is widely used in a variety of tasks, including machine translation, text summarization, reading comprehension, and speech recognition.

C. GAN and Reinforcement learning

GAN [15], proposed by Goodfellow et al., consists of two parts: a generator and a discriminator. The generator aims to generate a fake sample distribution that is as close as possible to the real samples. The discriminator is used to distinguish generated samples from real samples. The model structure of GAN is shown in Figure 3.

However, the original GAN works well for continuous data but not for discrete data such as text. To address this problem, researchers have made some adjustments to GAN's structure, which brings hope for the generation of discrete data [2; 22]. Zhang et al. used LSTM as the generator and CNN as the discriminator to implement the task of text generation [70]. The idea of smooth approximation was used to approximate the output of the generator LSTM, in order to solve the non-differentiability problem caused by the discrete data.

Figure 3. The model structure of GAN (labels: random input, generator, generated data, real data, discriminator, real/fake, optimize)

Reinforcement learning is usually formulated as a Markov decision process, in which performing an action in each state yields a reward (or a negative reward, i.e., a punishment). The goal of reinforcement learning is to find the optimal policy that maximizes rewards. The dialogue generation task is in line with the operating mechanism of reinforcement learning: the dialogue generation process can be seen as a process of maximizing the expected rewards of the generated dialogue content.

Through the combination of reinforcement learning and GAN, excellent results have been achieved in the field of text generation. SeqGAN [67] used the discriminator in GAN as the source of reward in reinforcement learning. Before the discriminator was updated, the generator continuously optimized itself according to the reward score returned by the current discriminator, until the texts produced by the generator were judged to be real. By using the reward mechanism and policy gradient techniques from reinforcement learning, the problem that the gradient cannot be back-propagated when GAN faces discrete data was skillfully avoided. In the intervals between training the generator with reinforcement learning, the discriminator was trained with the original GAN method.

III. THE TEXT GENERATION APPLICATIONS

In this section, we summarize the classic applications of text generation, including dialogue systems, text summarization, review generation, and image captioning & visual question answering.

A. Task-oriented Dialogue Systems

Dialogue systems have attracted more and more attention in recent years. According to their application fields, dialogue systems can be divided into two categories: task-oriented and non-task-oriented dialogue systems (also known as chatbots).

Task-oriented dialogue systems help users carry out specific tasks, such as restaurant reservations, travel itineraries, etc. Apple Siri and Microsoft Cortana are representatives of task-oriented dialogue systems. Recently, deep learning algorithms have been applied to the construction of task-oriented dialogue systems. Deep learning can automatically learn high-dimensional distributed feature representations and reduce the burden of manual design. Using a large amount of dialogue data to build a purely data-driven end-to-end dialogue system that directly maps user input to system output is a very popular research direction now.

Wen et al. [62] constructed a task-oriented dialogue system by using a modular neural generation model. Neural networks were used to realize all modules, and
the specific task of restaurant reservation was achieved. Bordes et al. used a neural generation model to treat the dialogue process as a mapping between the user input and the model reply, and used the encoder-decoder structure to learn this mapping [4]. In order to solve the problem of dependence on external knowledge bases in task-oriented dialogue systems, Eric et al. proposed the end-to-end key-value retrieval network in [11], which was equipped with an attention-based key-value retrieval mechanism over entries of a knowledge base and could extract relevant information from the knowledge base. In addition, the memory network, a variant of RNN, was proposed to store the current user's dialogue context and similar users' conversation histories in an external memory module [31]. By matching the user input against the context, appropriate replies could be selected from the candidate reply set.

B. Non-task-oriented Dialogue Systems

Known as chatbots, non-task-oriented dialogue systems aim to communicate with humans naturally in open domains. Microsoft XiaoIce is a typical chatbot. There are two main design methods for non-task-oriented dialogue systems: retrieval-based methods and generative methods.

1) Retrieval-based methods

The retrieval-based method directly selects the corresponding reply from the candidate replies of the given corpus according to a matching principle. In [29], a dual encoder model was proposed by Lowe et al. for the semantic representation of context and reply. Context and reply were separately encoded into semantic vectors by the dual RNN model, and then semantic similarity was calculated through a matrix transformation. It was found that matching only at the word level could not achieve good results, so Zhou et al. proposed matching at multiple levels (word level and utterance level), and this multi-view approach provided direction for subsequent papers [71]. In [72], Zhou et al. used the encoder part of the Transformer model to obtain multi-granularity text representations of each context and reply, and then two matching matrices were calculated for the representations of each utterance-response pair at each granularity. The dependency information between the words in the utterance and the words in the response was also added to the calculation of the alignment matrix as a representation of the words, so as to model deeper semantic relationships.

2) Generative methods

Recently, data-driven models have been widely studied in dialogue systems. A purely data-driven model is trained directly on a large amount of dialogue data without relying on external knowledge. Ritter et al. [49] treated reply generation as a translation problem, in which generating a reply was regarded as translating the query into the corresponding reply. Based on the statistical machine translation model, a probabilistic generation model was proposed to model the dialogue system. The disadvantages of that model are obvious. The most important one is that only a single user query is translated into the reply, without considering the context information in the dialogue, so the model cannot work properly in multi-turn dialogues.

With the development of deep learning, neural generation models began to receive attention. Sordoni et al. and Vinyals et al. [53; 59] began to apply RNN to construct the dialogue model, and applied the neural network method to the end-to-end dialogue model for the first time. Based on the Seq2Seq model, the past dialogue history was mapped to the reply.

Many existing studies have realized the importance of context. The simplest method is to use an RNN to directly encode the dialogue sentences as a whole sequence and obtain the semantic representation vector of the context, which is treated as additional input in the decoding stage. This method is used by Yan et al. [64] to utilize the context information. Direct concatenation of all sentences may lose the relative relationship between sentences, so researchers have proposed more complex methods to extract context information. In a multi-layer model, the first level is a sentence-level model that encodes the semantic information of a single sentence, and the second is a cross-sentence model that uses the first level's output as input to integrate all the contextual information. Tian et al. [57] carried out experiments on three different cross-sentence methods and came to the conclusion that the multi-layer context information extraction model outperformed the single-layer model.

As stated in [63], in daily human communication, people often associate dialogue content with related topics in their mind. Based on this assumption, Xing et al. organized content and selected words according to the topics when generating responses. The Latent Dirichlet Allocation (LDA) topic model was used to obtain the topical information in the dialogue sentences, which was taken into consideration as additional input in the decoding process. The experimental results showed that introducing topics into the dialogue model improved performance. Choudhary et al.

Recently, GAN and reinforcement learning have been applied to dialogue systems. Li et al. jointly trained two models by combining GAN and reinforcement learning [26]. The generation model aimed to generate reply sequences, and the discriminator was used to distinguish between human-generated and machine-generated dialogue. Li et al. [25] simulated dialogues between two virtual agents, using policy gradient to reward sequences showing three useful dialogue attributes: informativity, coherence, and ease of answering (related to forward-looking function).

C. Text Summarization

Text summarization is another important research direction in text generation, which provides concise descriptions for users by compressing and refining the original text. Text summarization can be regarded as a process of information synthesis, in which one or more input documents are integrated into a short abstract. Banko et al. viewed summarization as a problem analogous to statistical machine translation and generated headlines using statistical models for selecting and ordering the summary words [3]. There are two methods to realize text summarization, the retrieval-based (extractive) method and the generative (abstractive) method, which are described in detail in the rest of this section.

1) Retrieval-based methods

The retrieval-based method is a simple method that selects a subset of sentences from the original document. This process can be thought of as selecting the most central sentences in the document, which contain the necessary and sufficient information related to the main theme.

Nenkova et al. [40] used word frequency as a feature for summarization. Three attributes related to word frequency were studied: word frequency, composition functions for
estimating sentence importance from word frequency, and the adjustment of word frequency weights based on context. Erkan et al. [12] proposed a model based on eigenvector centrality (prestige), which is known as LexPageRank. This model constructed the sentence connectivity matrix based on cosine similarity.

Svore et al. [55] proposed a new automatic summarization method based on neural networks, named NetSum. The model extracted a set of features from each sentence to help determine its importance in the document. In [6], Cheng et al. used neural networks for extractive summarization, extracting words and sentences separately. What is special about this work is the use of the attention mechanism: they directly used the attention scores to select sentences in a document, which is similar to pointer networks. Cao et al. [5] used the attention mechanism to weight the sentences. The weights were based on the relevance of document sentences to the query (computed by attention), and the summary was then extracted by ranking the sentences.

The disadvantages of the retrieval-based method include the redundancy among the selected sentences and the lack of logical coherence between them.

2) Generative methods

Different from the retrieval-based method, the generative method is able to generate sentences that are not in the original text, which requires the generative model to have a stronger ability of understanding and representation. It is difficult for traditional methods to achieve these abilities.

Paulus et al. [45] introduced the application of reinforcement learning based on the Seq2Seq architecture to summary generation. Pasunuru et al. also used reinforcement learning to generate summaries of articles [44].

The theme of [50] from Facebook was an attention-based neural network for sentence summarization. Rush et al. proposed a sentence summarization model under the encoder-decoder framework. Later, this method was used in many works to construct training data. Nallapati et al. [39] not only included work on sentence compression, but also presented a new dataset for summarizing a document into multiple sentences. This paper added many features, such as POS tags, TF, IDF, NER tags, etc. The feature-rich encoder proposed in this paper was also of great significance for other work.

D. Review Generation

Review generation belongs to data-to-text natural language generation [14]. Within the field of recommender systems, a promising application is to estimate (or generate) personalized reviews that a user would write about a product, to discover their nuanced opinions about each of its individual aspects [41]. In order to recommend products to users, we ultimately need to predict how users will react to new products. However, traditional methods often discard the review text, which makes the underlying dimensions of users and products difficult to explain [35].

Huang et al. [16] determined the meaning of words through both the local and the global document context and accounted for homonymy and polysemy by learning multiple embeddings for each word. Tang et al. [56] generated natural language in a specific context, introducing two text generation models, C2S and gC2S. The C2S model produced semantically and syntactically coherent sentences, and gC2S did better when the sequence became very long. After identifying the product domain, name, and user rating, the model could generate a review of the corresponding rating like “I love Disney movies but this one was not at all what I expected. The story line was so predictable, it was dumb.” Jaech et al. [17] made full use of RNN and concatenated the context with the word embedding at the input layer of the RNN. Experiments on language modeling and classification tasks using three different corpora demonstrated the advantages of this method.

Almahairi et al. [1] developed two new models (BoWLF and LMLF) to regularize rating predictions on the Amazon review data set using text reviews. Lei Zheng et al. [23] proposed a new method for modeling ratings, reviews and their temporal dynamics in conjunction with RNN. A recurrent network was used to capture the temporal evolution of user and movie states, which were directly used to predict ratings. The user’s movie rating history was used as the input for updating the states.

The open problem in review generation is how to use fine-grained attributes as input to generate more diverse and user-specific reviews. Generating long reviews is also a challenge.

E. Image Captioning & Visual Question Answering

With the development of social networks, the task of generating captions for images has received a lot of attention.

1) Image Captioning

Image captioning is a basic multimodal problem in the field of artificial intelligence, which connects computer vision with natural language generation. It can be divided into two steps: feature extraction and natural language generation. A CNN is usually used as the feature extraction sub-model. It extracts salient features, usually represented by a fixed-length context vector. This is followed by an RNN model that generates the corresponding sentence. The whole structure is similar to the encoder-decoder structure.

Srivastava et al. proposed a deep Boltzmann machine in [54] to learn a generative model of such multimodal data and showed that the model can be used to create a fused representation by combining features across modalities. Kiros et al. [19] introduced multimodal neural language models, i.e., language models conditioned on other modalities, and used a CNN to learn word representations and image features jointly. Vinyals et al. [60] proposed a generative model based on a deep RNN architecture. Given a training image, the model is trained to maximize the probability of the target sentence.

Socher et al. introduced a model that recognized objects in images even when no training data was available for the object class [52]. In the completely unsupervised model, the accuracy was up to 90%. Mao et al. [34] proposed a multimodal RNN (m-RNN) model to generate new sentence descriptions explaining the content of images. The model was composed of two subnetworks: a deep RNN for sentences and a deep CNN for images.

Kulkarni et al. [21] introduced a system for automatically generating natural language descriptions from images, which exploited statistics gathered from large amounts of text data together with computer vision recognition algorithms. The system was very effective in generating image-related sentences. Mitchell et al. [37] used a new method to generate language, in which syntactic models were linked to computer vision detections to generate well-formed descriptions of images by filtering out
unlikely attributes and putting objects into ordered syntactic structures. Frome et al. [13] proposed a new deep visual-semantic embedding model, which used annotated image data and semantic information extracted from unannotated text to identify visual objects.

2) Visual Question Answering

Visual question answering (VQA) aims to answer questions about images. The inputs are one image and one question associated with the image, and the output is one answer to the question. Deep learning models for VQA usually use a CNN to extract the image information and an RNN to encode the question.

Ma et al. applied CNN to VQA tasks [32] and provided an end-to-end convolutional framework for learning not only the image and question representations, but also the inter-modal interactions between them to generate answers. Malinowski et al. [33] also used a CNN to encode the image and fed the question together with the image representation into an LSTM network. The system was trained to give correct answers to questions about images. Ren et al. [48] used neural networks and visual semantic embeddings without intermediate stages such as object detection and image segmentation.

There are other methods besides CNN for implementing the VQA task. Noh et al. [42] used a separate parameter prediction network, consisting of a GRU that takes the question as input and a fully connected layer that generates the output. By combining hashing techniques, they reduced the complexity of constructing a parameter prediction network with a large number of parameters. Yang et al. [66] proposed a learning method based on hierarchical attention network, which could help models to answer natural language questions from images. Zhu et al. [73] evaluated human performance and several baseline models on the QA tasks and proposed a new LSTM model with spatial attention to tackle the 7W question answering tasks.

IV. PERSONALIZED TEXT GENERATION

With the development of deep learning techniques, we hope that computers can automatically write high-quality natural language text, but much of the previous research has focused on the generated text content, not the user's personality. In daily conversation, we not only consider the factual content when producing a response, but also consider the other person’s profile in order to adjust our dialogue style and strategy. The purpose of personalized text generation is to let the computer imitate this human behavior and take the personalized characteristics of users into consideration when generating text, so as to dynamically adjust the generated content and produce higher-quality text. Personalized text generation is embodied in many applications, which are briefly introduced below.

A. Personalized Dialogue Systems

The dialogue system is one of the text generation applications that best reflects the user's personalized profile. In the process of chatting with different users, if a chatbot wants to bring a pleasant interactive experience to users, it needs to adjust its dialogue strategies and reply content according to the different characteristics of users.

In [24], the personalized characteristics of the user were modeled for the first time by Li et al., and the persona-based model was proposed. Each user was embedded into a hidden vector space in a manner similar to word embedding. The user embedding vector was used to adjust the dialogue style and content of the dialogue agent. Kottur et al. [20] extended the previous model, and also carried out vector embedding of user features. Combined with the Hierarchical Recurrent Encoder-Decoder structure, the model could better capture context-related information and take user personalized features into account to generate higher-quality dialogue content.

Considering the lack of dialogue data with user personalized characteristics, Luan et al. [30] applied the multi-task learning mechanism to personalized reply generation. A small amount of personalized dialogue data was first used to train the reply generation model, then an auto-encoder model was trained with non-conversational data, and the parameters of the two models were shared through the multi-task learning mechanism to obtain the personalized reply generation model. Mo et al. and Yang et al. made use of the idea of transfer learning [38; 65]. First, they trained on a large amount of general dialogue data to obtain a general reply model, and then used a small amount of personalized dialogue data to fine-tune the model, so that users' personalized information could be considered when generating replies.

Considering the different influence of user characteristics on the reply content, Qian et al. [46] applied a supervision mechanism to judge when to express the appropriate user profile in the reply generation process. Liu et al. [28] built a two-branch neural network to automatically learn user profiles from user dialogues, and then a deep neural network was used to further learn a fused representation of the user queries, replies and user profiles, so as to realize the dialogue process from the user's perspective. Zhang et al.

All the work above was carried out on chatbots. The task-oriented dialogue system is also an important research direction, but there are few studies that consider the user's personalized information in it. Joshi et al. [18] published a dataset for task-oriented dialogue systems, in which each conversation contained the user's personalized information, providing data support for subsequent research. Luo et al. [31] made use of the memory network, a variant of RNN, to realize a task-oriented personalized dialogue system. The Profile Model was used to encode the user's personalized information, and the Preference Model was used to resolve the ambiguity of the same query when facing different users. At the same time, the dialogue histories of similar users were stored. When the reply content was extracted, the personalized reply for different users was generated by combining the similar users' dialogue histories with the user's personalized feature information.

B. Personalized Review Generation

The findings of Tintarev et al. [58] indicated that users mentioned different movie features when describing their favorite movies, and that short, personalized arguments were more persuasive to users. Therefore, personalized review generation contributes to better product recommendation.

Radford et al. [47] demonstrated the direct influence of the sentiment unit on the generative process of the model. Lipton et al. [27] built a system that, given a user/item combination, generates the review that the user would write for the product. They designed a character-level RNN to generate personalized product reviews. The model learned the styles and opinions of nearly a thousand different authors, using a
large number of comments from BeerAdvocate.com. The model in [9] was able to generate sentences that were close to a real user's written comments and could reproduce spelling errors and domain-specific words.

Besides the vanilla RNN, the decoder can be an LSTM or a GRU. Zang et al. [68] introduced a deep neural network model to generate Chinese comments from sentiment scores representing user opinions. In that paper, a hierarchical LSTM decoder with attention consistency was proposed. Dong et al. [10] proposed an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as users, products, and ratings. The attribute encoder learned to represent input attributes as vectors. The sequence decoder then generated comments by conditioning its output on these vectors. They also introduced an attention mechanism to jointly generate comments and align words with input attributes.

Sharma et al. [51] used a model similar to [10] and added loss terms so that the generated comments comply more closely with the input attributes. Ni et al. [41] designed a review generation model that could make use of user and item information as well as auxiliary text input and aspect-aware knowledge. In the encoding stage of the model, three encoders (sequence encoder, attribute encoder, and aspect encoder) were used for information integration. The decoder's processing of the encoded information biased the GRU model toward generating phrases and statements closest to the input.

In addition, GANs can also be used to generate personalized reviews. Wang et al. [61] proposed a new penalty-based objective that took a more rational approach by minimizing the overall penalty rather than maximizing rewards. Theoretical and experimental results showed that the penalty-based objective could force each generator to produce diverse texts of a specific sentiment label, rather than repeated but "safe" and "good" examples. In addition, the multi-class discriminator objective allowed each generator to focus more on generating examples of its own sentiment label. The model could generate high-quality, diverse text for a variety of sentiment labels.

V. EVALUATION METRICS

With the continuous development of text generation technology, the corresponding evaluation methods have gradually become an active research direction. Researchers need established evaluation methods to judge the quality of proposed models. A good evaluation metric is a key factor in promoting research progress. To date, there are two main approaches to text generation evaluation: objective evaluation metrics and human evaluation. Objective evaluation metrics fall into two groups: word overlap metrics, such as BLEU and ROUGE, and word vector metrics, such as Greedy Matching and Embedding Average.

A. Word overlap evaluation metrics

1) BLEU (bilingual evaluation understudy)
BLEU compares the n-grams of the model output with those of the reference output and counts the number of matched fragments. To calculate this metric, one needs machine-translated text (called candidate docs) and text translated by professional translators (called reference docs). In essence, BLEU measures the degree of similarity between the machine-translated text and the reference text. Its value ranges from 0 to 1, and the closer the value is to 1, the better the machine translation result.

BLEU adopts an n-gram matching rule, through which it computes the proportion of n-grams shared between the candidate translation and the reference translation. The BLEU algorithm can produce relatively useful evaluation scores quickly.

2) ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
ROUGE evaluates a summary based on n-gram co-occurrence information and is oriented to n-gram recall. Its basic idea is to have several experts each write a reference summary, forming a reference set, and then to compare the summary automatically generated by the system with these human-written reference summaries. The quality of the summary is evaluated by counting the number of overlapping basic units (n-grams, word sequences, and word pairs) between the two kinds of summary. Comparing against reference summaries from multiple experts improves the stability and robustness of the evaluation. This method has become one of the standard techniques for summary evaluation.

B. Word vector evaluation metrics
In addition to word overlap, another way to evaluate a response is to judge its relevance by taking the meaning of each word into account, and the word vector is the basis of this kind of evaluation. In accordance with distributional semantics, a vector is assigned to each word by a method such as word2vec, which approximates the meaning of a word from how often it co-occurs with other words in the corpus. All the word vector metrics approximate sentence-level vectors by combining the vectors of the individual words. In this way, the sentence vectors of the candidate and target reply sentences can be obtained, and the similarity between them can be measured with cosine distance.

1) Greedy matching
Greedy matching is a word-level matching method. Given two sentences $r$ and $\hat{r}$, every word $w \in r$ is converted into a word vector $e_w$. Each $e_w$ is then matched, by maximum cosine similarity, against the word vectors $e_{\hat{w}}$ of the words $\hat{w} \in \hat{r}$, and the final result is the mean of these maxima over all words. Greedy matching was first proposed for intelligent tutoring systems, and subsequent studies have found that the method favors responses whose key words are semantically similar to those in the reference answer.

2) Embedding average
The embedding average computes a sentence-level vector from the word vectors in the sentence: the vector of a sentence is the average of the vectors of its words. This method has been used in many NLP domains beyond dialogue systems, for example to calculate text similarity. To compare two sentences, the embedding average of each is computed, and the cosine similarity of the two resulting vectors is used as the indicator of their similarity.
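The two word vector metrics described in this section can be illustrated with a small sketch. It is not the implementation used in any of the cited works: the function names and the two-dimensional `TOY_VECTORS` table are hypothetical, real evaluations would load pretrained embeddings such as word2vec, and averaging the greedy score over both matching directions is an assumption taken from common practice rather than from the text above.

```python
import math

# Hypothetical toy embeddings; a real setup would use pretrained vectors.
TOY_VECTORS = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "movie": [0.0, 1.0],
    "film": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_average(tokens, vectors):
    """Average the vectors of all in-vocabulary tokens; None if there are none."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def embedding_average_score(candidate, reference, vectors):
    """Cosine similarity between the two averaged sentence vectors."""
    c = sentence_average(candidate, vectors)
    r = sentence_average(reference, vectors)
    if c is None or r is None:
        return 0.0
    return cosine(c, r)


def greedy_one_way(source, target, vectors):
    """Mean, over source words, of the best cosine match among target words."""
    src = [vectors[t] for t in source if t in vectors]
    tgt = [vectors[t] for t in target if t in vectors]
    if not src or not tgt:
        return 0.0
    return sum(max(cosine(s, t) for t in tgt) for s in src) / len(src)


def greedy_matching(candidate, reference, vectors):
    """Greedy matching averaged over both directions (assumed convention)."""
    forward = greedy_one_way(reference, candidate, vectors)
    backward = greedy_one_way(candidate, reference, vectors)
    return 0.5 * (forward + backward)
```

With the toy table, `greedy_matching(["great", "film"], ["good", "movie"], TOY_VECTORS)` scores close to 1 although the two sentences share no words, which is the property that distinguishes these metrics from word overlap metrics.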
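The n-gram counting behind the word overlap metrics (BLEU and ROUGE) can be sketched as follows. This is only the core counting step under simplifying assumptions: the function names are hypothetical, the full BLEU score additionally combines several n-gram orders and applies a brevity penalty, and ROUGE has further variants beyond the n-gram recall shown here.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """BLEU-style precision: share of candidate n-grams found in the reference.

    Each candidate n-gram is credited at most as often as it occurs in the
    reference, so repeating a matching word does not inflate the score.
    """
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def rouge_n_recall(candidate, references, n):
    """ROUGE-N-style recall: share of reference n-grams covered by the candidate,
    pooled over all reference summaries."""
    cand = ngram_counts(candidate, n)
    matched = 0
    total = 0
    for reference in references:
        ref = ngram_counts(reference, n)
        matched += sum(min(count, cand[gram]) for gram, count in ref.items())
        total += sum(ref.values())
    return matched / total if total else 0.0


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
unigram_precision = clipped_precision(candidate, reference, 1)  # 5 of 6 match
bigram_precision = clipped_precision(candidate, reference, 2)   # 3 of 5 match
unigram_recall = rouge_n_recall(candidate, [reference], 1)      # 5 of 6 covered
```

The precision is normalized by the candidate length and the recall by the reference length, which is the practical difference between the precision-oriented BLEU and the recall-oriented ROUGE.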
VI. FUTURE DIRECTIONS AND CONCLUSION

Although the great achievements of deep learning over the last two decades have spurred the development of text generation, the field is still at a preliminary stage with a large number of open issues. In this section, we highlight some pending questions to inform future research.

A. Dataset Deficiency
Unlike computer vision and machine translation, the field of text generation lacks high-quality data, and manual labeling is difficult. How to train models efficiently with a small amount of data is a primary direction for future research.

B. Ultra-long Dialogue Context
Human-computer dialogue is a hot area of text generation research. Although current chatbots can understand the context to a preliminary degree, they still struggle with ultra-long text. How to effectively capture the semantic information in the text and ensure linguistic and logical consistency throughout the whole dialogue remains an open research topic.

C. Co-textual Information
Most existing studies focus only on the text content and ignore its co-textual information. In reality, however, natural language is usually generated in a specific environment characterized by, for example, time, place, or emotion. Therefore, only by considering this co-textual information can text be generated that is syntactically correct, semantically reasonable, and appropriate to its specific context.

D. Evaluation Metrics
The text generation field lacks a unified system of evaluation metrics, and the best evaluation method is still human judgement. High-quality evaluation metrics are crucial to research in artificial intelligence. Only with reasonable and unified evaluation metrics can researchers know whether their work is sound. This is a major research gap to be addressed in the future.

E. Personalized text generation
Research on personalized text generation is attracting more and more attention. Most existing work is based on encoding users' personalized profiles. How to effectively capture the relationship between personalized profiles and text content is a focus of future research. Another problem is the impact of scarce personalized data on model training. How to achieve personalized text generation with a small amount of personalized data is also a focus for researchers.

This paper gives a comprehensive introduction to the basic concepts, commonly used models, and popular applications in text generation. At the same time, some unsolved problems are put forward. Since many researchers are active in each of these areas and relevant results keep emerging, this survey inevitably has omissions. We hope that this paper will be of help to researchers in this field.

ACKNOWLEDGMENT
This work was partially supported by the National Key R&D Program of China (2017YFB1001800) and the National Natural Science Foundation of China (No. 61772428, 61725205).

REFERENCES
[1] Almahairi, A., Kastner, K., Cho, K., and Courville, A. (2015). Learning distributed representations from reviews for collaborative filtering. In Proceedings of the 9th ACM Conference on Recommender Systems.
[2] Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875.
[3] Banko, M., Mittal, V. O., and Witbrock, M. J. (2000). Headline generation based on statistical translation. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics.
[4] Bordes, A., Boureau, Y.-L., and Weston, J. (2016). Learning end-to-end goal-oriented dialog. arXiv preprint arXiv:1605.07683.
[5] Cao, Z., Li, W., Li, S., Wei, F., and Li, Y. (2016). Attsum: Joint learning of focusing and summarization with neural attention. arXiv preprint arXiv:1604.00125.
[6] Cheng, J., and Lapata, M. (2016). Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252.
[7] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., et al. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
[8] Choudhary, S., Srivastava, P., Ungar, L., and Sedoc, J. (2017). Domain aware neural dialog system. arXiv preprint arXiv:1708.00897.
[9] Costa, F., Ouyang, S., Dolog, P., and Lawlor, A. (2018). Automatic generation of natural language explanations. In Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion.
[10] Dong, L., Huang, S., Wei, F., Lapata, M., Zhou, M., and Xu, K. (2017). Learning to generate product reviews from attributes. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers.
[11] Eric, M., and Manning, C. D. (2017). Key-value retrieval networks for task-oriented dialogue. arXiv preprint arXiv:1705.05414.
[12] Erkan, G., and Radev, D. R. (2004). Lexpagerank: Prestige in multi-document text summarization. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
[13] Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., and Mikolov, T. (2013). Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems.
[14] Gatt, A., and Krahmer, E. (2018). Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research, 61, 65-170.
[15] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In Advances in neural information processing systems.
[16] Huang, E. H., Socher, R., Manning, C. D., and Ng, A. Y. (2012). Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1.
[17] Jaech, A., and Ostendorf, M. (2017). Improving Context Aware Language Models.
[18] Joshi, C. K., Mi, F., and Faltings, B. (2017). Personalization in goal-oriented dialog. arXiv preprint arXiv:1706.07503.
[19] Kiros, R., Salakhutdinov, R., and Zemel, R. (2014). Multimodal neural language models. In International Conference on Machine Learning.
[20] Kottur, S., Wang, X., and Carvalho, V. (2017). Exploring Personalized Neural Conversational Models. In IJCAI.
[21] Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A. C., et al. (2011). Baby talk: Understanding and generating image descriptions. In Proceedings of the 24th CVPR.
[22] Kusner, M. J., and Hernández-Lobato, J. M. (2016). GANs for sequences of discrete elements with the Gumbel-softmax distribution. arXiv preprint arXiv:1611.04051.
[23] Lei, Z., Noroozi, V., and Yu, P. S. (2017). Joint Deep Modeling of Users and Items Using Reviews for Recommendation. In Tenth ACM International Conference on Web Search & Data Mining.
[24] Li, J., Galley, M., Brockett, C., Spithourakis, G. P., Gao, J., and Dolan, B. (2016). A persona-based neural conversation model. arXiv preprint arXiv:1603.06155.
[25] Li, J., Monroe, W., Ritter, A., Galley, M., Gao, J., and Jurafsky, D. (2016). Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541.
[26] Li, J., Monroe, W., Shi, T., Jean, S., Ritter, A., and Jurafsky, D. (2017). Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547.
[27] Lipton, Z. C., Vikram, S., and McAuley, J. (2015). Generative concatenative nets jointly learn to write and classify reviews. arXiv preprint arXiv:1511.03683.
[28] Liu, B., Xu, Z., Sun, C., Wang, B., Wang, X., Wong, D. F., et al. (2018). Content-Oriented User Modeling for Personalized Response Ranking in Chatbots. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 26(1), 122-133.
[29] Lowe, R., Pow, N., Serban, I., and Pineau, J. (2015). The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. arXiv preprint arXiv:1506.08909.
[30] Luan, Y., Brockett, C., Dolan, B., Gao, J., and Galley, M. (2017). Multi-task learning for speaker-role adaptation in neural conversation models. arXiv preprint arXiv:1710.07388.
[31] Luo, L., Huang, W., Zeng, Q., Nie, Z., and Sun, X. (2018). Learning personalized end-to-end goal-oriented dialog. arXiv preprint arXiv:1811.04604.
[32] Ma, L., Lu, Z., and Li, H. (2016). Learning to answer questions from image using convolutional neural network. In Thirtieth AAAI Conference on Artificial Intelligence.
[33] Malinowski, M., Rohrbach, M., and Fritz, M. (2017). Ask your neurons: A deep learning approach to visual question answering. International Journal of Computer Vision, 125(1-3), 110-135.
[34] Mao, J., Xu, W., Yang, Y., Wang, J., and Yuille, A. L. (2014). Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090.
[35] McAuley, J., and Leskovec, J. (2013). Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems.
[36] Mikolov, T., Karafiát, M., Burget, L., Černocký, J., and Khudanpur, S. (2010). Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association.
[37] Mitchell, M., Han, X., Dodge, J., Mensch, A., Goyal, A., Berg, A., et al. (2012). Midge: Generating image descriptions from computer vision detections. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics.
[38] Mo, K., Zhang, Y., Li, S., Li, J., and Yang, Q. (2018). Personalizing a dialogue system with transfer reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
[39] Nallapati, R., Zhou, B., Gulcehre, C., and Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv preprint arXiv:1602.06023.
[40] Nenkova, A., Vanderwende, L., and McKeown, K. (2006). A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
[41] Ni, J., and McAuley, J. (2018). Personalized Review Generation by Expanding Phrases and Attending on Aspect-Aware Representations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
[42] Noh, H., Hongsuck Seo, P., and Han, B. (2016). Image question answering using convolutional neural network with dynamic parameter prediction. In Proceedings of the IEEE conference on computer vision and pattern recognition.
[43] Oremus, W. (2014). The first news report on the LA earthquake was written by a robot. Slate.
[44] Pasunuru, R., and Bansal, M. (2018). Multi-reward reinforced summarization with saliency and entailment. arXiv preprint arXiv:1804.06451.
[45] Paulus, R., Xiong, C., and Socher, R. (2017). A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.
[46] Qian, Q., Huang, M., Zhao, H., Xu, J., and Zhu, X. (2018). Assigning Personality/Profile to a Chatting Machine for Coherent Conversation Generation. In IJCAI.
[47] Radford, A., Jozefowicz, R., and Sutskever, I. (2017). Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444.
[48] Ren, M., Kiros, R., and Zemel, R. (2015). Exploring models and data for image question answering. In Advances in neural information processing systems.
[49] Ritter, A., Cherry, C., and Dolan, W. B. (2011). Data-driven response generation in social media. In Proceedings of the conference on empirical methods in natural language processing.
[50] Rush, A. M., Chopra, S., and Weston, J. (2015). A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
[51] Sharma, V., Sharma, H., Bishnu, A., and Patel, L. (2018). Cyclegen: Cyclic consistency based product review generator from attributes. In Proceedings of the 11th International Conference on Natural Language Generation.
[52] Socher, R., Ganjoo, M., Manning, C. D., and Ng, A. (2013). Zero-shot learning through cross-modal transfer. In Advances in neural information processing systems.
[53] Sordoni, A., Galley, M., Auli, M., Brockett, C., Ji, Y., Mitchell, M., et al. (2015). A neural network approach to context-sensitive generation of conversational responses. arXiv preprint arXiv:1506.06714.
[54] Srivastava, N., and Salakhutdinov, R. R. (2012). Multimodal learning with deep Boltzmann machines. In Advances in neural information processing systems.
[55] Svore, K., Vanderwende, L., and Burges, C. (2007). Enhancing single-document summarization by combining RankNet and third-party sources. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL).
[56] Tang, J., Yang, Y., Carton, S., Zhang, M., and Mei, Q. (2016). Context-aware natural language generation with recurrent neural networks. arXiv preprint arXiv:1611.09900.
[57] Tian, Z., Yan, R., Mou, L., Song, Y., Feng, Y., and Zhao, D. (2017). How to make context more useful? an empirical study on context-aware neural conversational models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
[58] Tintarev, N., and Masthoff, J. (2007). Effective explanations of recommendations: user-centered design. In Proceedings of the 2007 ACM conference on Recommender systems.
[59] Vinyals, O., and Le, Q. (2015). A neural conversational model. arXiv preprint arXiv:1506.05869.
[60] Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition.
[61] Wang, K., and Wan, X. (2018). SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks. In IJCAI.
[62] Wen, T.-H., Vandyke, D., Mrksic, N., Gasic, M., Rojas-Barahona, L. M., Su, P.-H., et al. (2016). A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.
[63] Xing, C., Wu, W., Wu, Y., Liu, J., Huang, Y., Zhou, M., et al. (2017). Topic aware neural response generation. In Thirty-First AAAI Conference on Artificial Intelligence.
[64] Yan, R., Song, Y., and Wu, H. (2016). Learning to respond with deep neural networks for retrieval-based human-computer conversation system. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval.
[65] Yang, M., Zhao, Z., Zhao, W., Chen, X., Zhu, J., Zhou, L., et al. (2017). Personalized response generation via domain adaptation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[66] Yang, Z., He, X., Gao, J., Deng, L., and Smola, A. (2016). Stacked attention networks for image question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition.
[67] Yu, L., Zhang, W., Wang, J., and Yu, Y. (2017). Seqgan: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence.
[68] Zang, H., and Wan, X. (2017). Towards automatic generation of product reviews from aspect-sentiment scores. In Proceedings of the 10th International Conference on Natural Language Generation.
[69] Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., and Weston, J. (2018). Personalizing Dialogue Agents: I have a dog, do you have pets too? arXiv preprint arXiv:1801.07243.
[70] Zhang, Y., Gan, Z., and Carin, L. (2016). Generating text via adversarial training. In NIPS workshop on Adversarial Training.
[71] Zhou, X., Dong, D., Wu, H., Zhao, S., Yu, D., Tian, H., et al. (2016). Multi-view response selection for human-computer conversation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
[72] Zhou, X., Li, L., Dong, D., Liu, Y., Chen, Y., Zhao, W. X., et al. (2018). Multi-turn response selection for chatbots with deep attention matching network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[73] Zhu, Y., Groth, O., Bernstein, M., and Fei-Fei, L. (2016). Visual7w: Grounded question answering in images. In Proceedings of the IEEE conference on computer vision and pattern recognition.
