A Probabilistic Model for Semantic Word Vectors
Andrew L. Maas and Andrew Y. Ng
Computer Science Department, Stanford University, Stanford, CA 94305
{amaas, ang}@cs.stanford.edu
Abstract
Vector representations of words capture relationships in words' functions and meanings. Many existing techniques for inducing such representations from data use a pipeline of hand-coded processing techniques. Neural language models offer principled techniques to learn word vectors using a probabilistic modeling approach. However, learning word vectors via language modeling produces representations with a syntactic focus, where word similarity is based upon how words are used in sentences. In this work we wish to learn word representations that encode word meaning (semantics). We introduce a model which learns semantically focused word vectors using a probabilistic model of documents. We evaluate the model's word vectors in two tasks of sentiment analysis.
Introduction
Word representations are a critical component of many natural language processing systems. Representing words as indices in a vocabulary fails to capture the rich structure of synonymy and antonymy among words. Vector representations encode continuous similarities between words as distance or angle between word vectors in a high-dimensional space. Word representation vectors have proved useful in tasks such as named entity recognition, part of speech tagging, and document retrieval [23, 6, 21].

Neural language models [2, 6, 14, 15] induce word vectors by back-propagating errors in a language modeling task through nonlinear neural networks or linear transform matrices. Language modeling, predicting the next word in a sentence given a few preceding words, is primarily a syntactic task. Issues of syntax concern word function and the structural arrangement of words in a sentence, while issues of semantics concern word meaning. Learning word vectors using the syntactic task of language modeling produces representations which are syntactically focused. Word similarities with a syntactic focus would pair "wonderful" with other highly polarized adjectives such as "terrible" or "awful." These similarities result from the fact that these words have similar syntactic properties: they are likely to occur at the same location in sentences like "the food was absolutely ___." In contrast, word representations capturing semantic similarity would associate "wonderful" with words such as "fantastic" and "prize-winner" because they have similar meaning despite possible differences in syntactic function. The construction of neural language models makes them unable to learn word representations which are primarily semantic.

Neural language models are instances of vector space models (VSMs), a term which broadly refers to any method for inducing vector representations of words. Turney and Pantel [23] give a recent review of both syntactic and semantic vector space models. Most VSMs implement some combination of weighting, smoothing, and dimensionality reduction applied to a word association matrix (e.g. TF-IDF weighting). For semantic or syntactic word representations, VSMs use a term-document or word-context matrix, respectively, as the word association matrix. For each VSM processing stage there are dozens of possibilities, making the design space of VSMs overwhelming. Furthermore, many methods have little theoretical foundation, and a particular weighting or dimension reduction technique is often selected simply because it has been shown to work in practice. Neural language models offer a VSM for syntactic word vectors which has a complete probabilistic foundation. The present work offers a similarly well-founded probabilistic model which learns semantic, as opposed to syntactic, word vectors.

This work develops a model which learns semantically oriented word vectors using unsupervised learning. Word vectors are discovered from data as part of a probabilistic model of word occurrence in documents, similar to a probabilistic topic model. Learning vectors from document-level word co-occurrence allows our model to learn word representations based on the topical information conveyed by words. Building a VSM with a probabilistic foundation allows us to offer a principled solution to word vector learning in place of the hand-designed processing pipelines typically used. Our experiments show that our model learns vectors more suitable for document-level tasks when compared with other VSMs.
Prior work introduced neural probabilistic language models [2], which predict the $n$th word in a sequence given the $n-1$ preceding context words. More formally, a model defines a distribution $P(w_n \mid w_{1:n-1})$ where the number of context words is often small ($n \le 6$). Neural language models encode this distribution using word vectors. Letting $\phi_w$ denote the vector representation of word $w$, a neural language model uses $P(w_n \mid w_{1:n-1}) = P(\phi_{w_n} \mid \phi_{w_{1:n-1}})$. Mnih and Hinton [14] introduce a neural language model which uses a log-bilinear energy function (lblLm). The model parametrizes the log probability of a word occurring in a given context using an inner product of the form

$$\phi_n^\top \sum_{i=1}^{n-1} C_i \phi_i. \qquad (1)$$
This is an inner product between the query word's representation $\phi_n$ and a sum of the context words' representations after each is transformed by a position-specific matrix $C_i$. The vectors learned as part of the language modeling task are useful features for syntactic natural language processing tasks such as named entity recognition and chunking [21]. As a VSM, the lblLm is a theoretically well-founded approach to learning syntactic word representations from word-context information.

The lblLm method does not provide a tractable solution for inducing word vectors from term-document data. The model introduces a transform matrix $C_i$ for each context word, which causes the number of model parameters to grow linearly as the number of context words increases. For 100-dimensional word representation vectors, each $C_i$ contains $10^4$ parameters, which makes for an unreasonably large number of parameters when trying to learn representations from documents containing hundreds or thousands of words. Furthermore, it is unclear how the model could handle documents of variable length, or whether predicting a single word given all other words in the document is a good objective for training semantic word representations. Though the details of other neural language models differ, they face similar challenges in learning semantic word vectors because of their parametrization and language modeling objective.
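To make the parameter growth concrete, the following NumPy sketch (not the authors' code; the sizes and random initialization are illustrative assumptions) scores candidate next words with the inner product of Equation 1. Each position-specific transform $C_i$ alone holds $100 \times 100 = 10^4$ parameters for 100-dimensional vectors.

```python
import numpy as np

# Hypothetical sizes; a real lblLm learns these parameters from data.
vocab_size, dim, context = 20000, 100, 5

rng = np.random.default_rng(0)
phi = rng.normal(scale=0.1, size=(vocab_size, dim))   # one row per word vector
C = rng.normal(scale=0.1, size=(context, dim, dim))   # one transform matrix per context position
b = np.zeros(vocab_size)                              # per-word biases

def next_word_scores(context_ids):
    """Score every candidate next word via phi_n^T sum_i C_i phi_i (Equation 1)."""
    pred = np.zeros(dim)
    for i, w in enumerate(context_ids):
        pred += C[i] @ phi[w]                         # transform and sum the context vectors
    return phi @ pred + b                             # inner product with every word vector

scores = next_word_scores([12, 7, 345, 2, 99])        # arbitrary example context word indices
probs = np.exp(scores - scores.max())
probs /= probs.sum()                                  # softmax over the vocabulary
```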
We now introduce a model which learns word representations from term-document information using principles similar to those used in the lblLm and other neural language models. However, unlike previous work in the neural language model literature, our model naturally handles term-document data to learn semantic word vectors. We derive a probabilistic model with a log-bilinear energy function to model the bag-of-words distribution of a document. This approach naturally handles long, variable-length documents, and learns representations sensitive to long-range word correlations. Maximum likelihood learning can then be efficiently performed with coordinate ascent optimization.
3.1 Model
Starting with the broad goal of matching the empirical distribution of words in a document, we model a document using a continuous mixture distribution over words indexed by a random variable $\theta$. We assume words in a document are conditionally independent given the mixture variable $\theta$, and we assign a probability to a document $d$ using a joint distribution over the document and $\theta$. That is, the model assumes each word $w_i \in d$ is conditionally independent of the other words given $\theta$. The probability of a document is thus

$$p(d) = \int p(d, \theta)\, d\theta = \int p(\theta) \prod_{i=1}^{N} p(w_i \mid \theta)\, d\theta, \qquad (2)$$
where $N$ is the number of words in $d$ and $w_i$ is the $i$th word in $d$. We use a Gaussian prior on $\theta$.

We define the conditional distribution $p(w_i \mid \theta)$ using a log-linear model with parameters $R$ and $b$. The energy function uses a word representation matrix $R \in \mathbb{R}^{\beta \times |V|}$ where each word $w$ (represented as a one-hot vector) in the vocabulary $V$ has a $\beta$-dimensional vector representation $\phi_w = Rw$ corresponding to that word's column in $R$. The random variable $\theta$ is also a $\beta$-dimensional vector, $\theta \in \mathbb{R}^{\beta}$, which weights each of the $\beta$ dimensions of words' representation vectors. We additionally introduce a bias $b_w$ for each word to capture differences in overall word frequencies. The energy assigned to a word $w$ given these model parameters is

$$E(w; \theta, \phi_w, b_w) = -\theta^\top \phi_w - b_w. \qquad (3)$$

To obtain the final distribution $p(w \mid \theta)$ we use a softmax,

$$p(w \mid \theta; R, b) = \frac{\exp(-E(w; \theta, \phi_w, b_w))}{\sum_{w' \in V} \exp(-E(w'; \theta, \phi_{w'}, b_{w'}))} = \frac{\exp(\theta^\top \phi_w + b_w)}{\sum_{w' \in V} \exp(\theta^\top \phi_{w'} + b_{w'})}. \qquad (4)$$
The number of terms in the denominator's summation grows linearly in $|V|$, making exact computation of the distribution possible. For a given $\theta$, a word $w$'s occurrence probability is proportional to how closely its representation vector $\phi_w$ matches the scaling direction of $\theta$. This idea is similar to the word vector inner product used in the lblLm model.

Equation 2 resembles the probabilistic model of latent Dirichlet allocation (LDA) [3], which models documents as mixtures of latent topics. One could view the entries of a word vector as that word's association strength with respect to each latent topic dimension. The random variable $\theta$ then defines a weighting over topics. However, our model does not attempt to model individual topics, but instead directly models word probabilities conditioned on the topic weighting variable $\theta$. Because of the log-linear formulation of the conditional distribution, $\theta$ is a vector in $\mathbb{R}^{\beta}$ and not restricted to the unit simplex as it is in LDA.
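As a minimal illustration (a sketch with arbitrary sizes and random parameters, not the authors' implementation), the conditional distribution of Equations 3 and 4 can be computed directly because the normalization runs only over the vocabulary:

```python
import numpy as np

beta, vocab_size = 50, 20000                        # hypothetical dimensionality and vocabulary size
rng = np.random.default_rng(0)
R = rng.normal(scale=0.1, size=(beta, vocab_size))  # word representation matrix; column w is phi_w
b = np.zeros(vocab_size)                            # per-word bias terms b_w

def word_distribution(theta, R, b):
    """p(w | theta; R, b) from Equations 3-4: a softmax over theta^T phi_w + b_w."""
    scores = theta @ R + b                          # negative energy -E for every word in V
    scores -= scores.max()                          # stabilize the exponentials
    p = np.exp(scores)
    return p / p.sum()

theta = rng.normal(size=beta)                       # one draw of the topic-weighting variable
p = word_distribution(theta, R, b)                  # sums to 1 over the 20,000-word vocabulary
```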
3.2 Learning

Given a document collection $D$, we assume documents are i.i.d. samples and denote the $k$th document as $d_k$. We wish to learn model parameters $R$ and $b$ to maximize
$$\max_{R,b}\; p(D; R, b) = \prod_{d_k \in D} \int p(\theta) \prod_{i=1}^{N_k} p(w_i \mid \theta; R, b)\, d\theta. \qquad (5)$$

Using maximum a posteriori (MAP) estimates for $\theta$, we approximate this learning problem as

$$\max_{R,b}\; \prod_{d_k \in D} p(\hat{\theta}_k) \prod_{i=1}^{N_k} p(w_i \mid \hat{\theta}_k; R, b), \qquad (6)$$
where $\hat{\theta}_k$ denotes the MAP estimate of $\theta$ for $d_k$. We introduce a regularization term for the word representation matrix $R$. The word biases $b$ are not regularized, reflecting the fact that we want the biases to capture whatever overall word frequency statistics are present in the data. By taking the logarithm and simplifying, we obtain the final learning problem
$$\max_{R,b}\; -\lambda \|R\|_F^2 + \sum_{d_k \in D} \left[ -\lambda \|\hat{\theta}_k\|_2^2 + \sum_{i=1}^{N_k} \log p(w_i \mid \hat{\theta}_k; R, b) \right]. \qquad (7)$$
The free parameters in the model are the regularization weight $\lambda$ and the word vector dimensionality $\beta$. We use a single regularization weight $\lambda$ for $R$ and $\theta$ because the two are linearly linked in the conditional distribution $p(w \mid \theta; R, b)$.

The problem of finding optimal values for $R$ and $b$ requires optimization of the non-convex objective function. We use coordinate ascent, which first optimizes the word representations ($R$ and $b$) while leaving the MAP estimates ($\hat{\theta}$) fixed. Then we find the new MAP estimate for each document while leaving the word representations fixed, and continue this process until convergence. The optimization algorithm quickly finds a global solution for each $\hat{\theta}_k$ because we have a low-dimensional, convex problem in each $\hat{\theta}_k$. Because the MAP estimation problems for different documents are independent, we can solve them on separate machines in parallel. This facilitates scaling the model to document collections with hundreds of thousands of documents.
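A minimal sketch of this alternating scheme is shown below. It is not the authors' implementation: plain gradient steps stand in for whatever inner solvers were actually used, documents are summarized by word-count vectors, and the learning rates, dimensionality, and regularization weight are illustrative assumptions.

```python
import numpy as np

def map_theta(counts, R, b, lam, steps=50, lr=0.1):
    """Convex MAP problem for one document: maximize its log-likelihood minus lam * ||theta||^2."""
    theta = np.zeros(R.shape[0])
    for _ in range(steps):
        p = np.exp(theta @ R + b); p /= p.sum()            # p(w | theta; R, b) over the vocabulary
        grad = R @ (counts - counts.sum() * p) - 2 * lam * theta
        theta += lr * grad
    return theta

def train(doc_counts, beta, lam=1e-2, outer_iters=10, lr=1e-3):
    """Alternate MAP estimates (independent per document) with updates to R and b."""
    n_docs, vocab = doc_counts.shape
    rng = np.random.default_rng(0)
    R = rng.normal(scale=0.01, size=(beta, vocab))
    b = np.zeros(vocab)
    for _ in range(outer_iters):
        # MAP estimate of theta for every document (trivially parallelizable across documents).
        thetas = np.array([map_theta(d, R, b, lam) for d in doc_counts])
        # One gradient sweep over documents stands in for a full optimization of R and b.
        for d, theta in zip(doc_counts, thetas):
            p = np.exp(theta @ R + b); p /= p.sum()
            err = d - d.sum() * p                          # gradient of the log-likelihood w.r.t. the scores
            R += lr * (np.outer(theta, err) - 2 * lam * R / n_docs)
            b += lr * err
    return R, b, thetas
```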
Experiments
We evaluate our model with document-level and sentence-level categorization tasks in the domain of online movie reviews. These are sub-tasks of sentiment analysis, which has recently received much attention as a challenging set of problems in natural language processing [4, 18, 22]. In both tasks we compare our model with several existing methods for word vector induction, and with previously reported results from the literature. We also qualitatively evaluate the model's word representations by visualizing word similarities.

4.1 Word Representation Learning
We induce word representations with our model using 50,000 movie reviews from The Internet Movie Database (IMDB). Because some movies receive substantially more reviews than others, we limited ourselves to including at most 30 reviews from any movie in the collection. Previous work [5] shows that function and negating words usually treated as stop words are in fact indicative of sentiment, so we build our dictionary by keeping the 20,000 most frequent unigram tokens without stop word removal. Additionally, because certain non-word tokens (e.g. "!" and ":-)") are indicative of sentiment, we allow them in our vocabulary.

As a qualitative assessment of word representations, we visualize the words most similar to a query word using vector similarity of the learned representations. Given a query word $w$ and another word $w'$, we obtain their vector representations $\phi_w$ and $\phi_{w'}$ and evaluate their cosine similarity as

$$\text{Similarity}(w, w') = \frac{\phi_w^\top \phi_{w'}}{\|\phi_w\| \, \|\phi_{w'}\|}.$$

By assessing the similarity of $w$ with all other words $w'$ in the vocabulary we can find the words deemed most similar by the model. Cosine similarity is often used with word vectors because it ignores differences in magnitude.

Table 1 shows the most similar words to given query words using our model's word representations. The vector similarities capture our intuitive notions of semantic similarity. The most similar words have a broad range of parts of speech and functions, but adhere to the theme suggested by the query word. Previous work on term-document VSMs demonstrated similar results, and compared the recovered word similarities to human concept organization [12, 20]. Table 1 also shows the most similar words to query words using word vectors trained via the lblLm on news articles (obtained already trained from [21]). Word similarities captured by the neural language model are primarily syntactic, where part-of-speech similarity dominates semantic similarity. Word vectors obtained from LDA perform poorly on this task (not shown), presumably because LDA word/topic distributions do not meaningfully embed words in a vector space.
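The nearest-word queries behind Table 1 amount to a cosine-similarity lookup over the columns of $R$. A small sketch (assuming a learned $\beta \times |V|$ matrix `R` and a parallel list `vocab` of word strings; both names are ours):

```python
import numpy as np

def most_similar(query, vocab, R, k=5):
    """Return the k words whose vectors have the highest cosine similarity to the query word."""
    idx = {w: i for i, w in enumerate(vocab)}
    unit = R / np.linalg.norm(R, axis=0, keepdims=True)  # normalize columns so a dot product is cosine
    sims = unit.T @ unit[:, idx[query]]
    sims[idx[query]] = -np.inf                           # exclude the query word itself
    return [vocab[i] for i in np.argsort(-sims)[:k]]

# Hypothetical usage once R and vocab have been learned:
# print(most_similar("awful", vocab, R))
```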
4.2 Other Word Representations

We implemented several alternative vector space models for comparison. With the exception of the lblLm, we induce word representations for each of the models using the same training data used to induce our own word representations.

Latent Semantic Analysis (LSA) [7]. One of the most commonly used tools in information retrieval, LSA applies the singular value decomposition (SVD) algorithm to factor a term-document co-occurrence matrix. To obtain a $k$-dimensional representation for a given word, only the entries corresponding to the $k$ largest singular values are taken from the word's basis in the factored matrix.
Table 1: Similarity of learned word vectors. The five words most similar to each query word (left column), using cosine similarity applied to the word vectors discovered by our model and by the log-bilinear language model.

Query      Our Model                                             lblLm
romance    romantic, love, chemistry, relationship, drama        colours, paintings, joy, diet, craftsmanship
mothers    lesbian, mother, jewish, mom, tolerance               parents, families, veterans, patients, adults
murder     murdered, crime, murders, committed, murderer         fraud, kidnapping, rape, corruption, conspiracy
comedy     funny, laughs, hilarious, serious, few                drama, monster, slogan, guest, mentality
awful      terrible, horrible, ridiculous, bad, stupid           unsettling, vice, energetic, hires, unbelievable
amazing    absolutely, fantastic, truly, incredible, extremely   unbelievable, incredible, obvious, perfect, clear
Latent Dirichlet Allocation (LDA) [3]. LDA is a probabilistic model of documents which assumes each document is a mixture of latent topics. This model is often used to categorize or cluster documents by topic. For each latent topic, the model learns a conditional distribution $p(\text{word} \mid \text{topic})$ for the probability that a word occurs within the given topic. To obtain a $k$-dimensional vector representation of each word $w$, we use each $p(w \mid \text{topic})$ value in the vector after training a $k$-topic model on the data. We normalize this vector to unit length because more frequent words often have high probability in many topics. To train the LDA model we use code released by the authors of [3]. When training LDA we remove from our vocabulary very frequent and very rare words.

Log-Bilinear Language Model (lblLm) [15]. This is the model given in [14] and discussed in Section 2, but extended to reduce training time. We obtained the word representations from this model used in [21], which were trained on roughly 37 million words from a news corpus with a context window of size five.
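For reference, the two matrix-based baselines reduce to a few lines of linear algebra. The sketch below (assuming a dense term-document count matrix and a topic-word matrix produced by an external LDA trainer; the variable names are ours) shows one common way to extract the word vectors described above:

```python
import numpy as np

def lsa_word_vectors(term_doc, k):
    """k-dimensional LSA word vectors: left singular vectors scaled by the top-k singular values."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * s[:k]                 # one k-dimensional row per vocabulary word

def lda_word_vectors(topic_word, eps=1e-12):
    """Rows of p(word | topic) values, length-normalized as described above."""
    vecs = topic_word.T                     # |V| x k matrix of p(w | topic) entries
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + eps)
```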
4.3 Sentiment Classification

Our first evaluation task is document-level sentiment classification. A classifier must predict whether a given review is positive or negative (thumbs up vs. thumbs down) given only the text of the review. As a document-level categorization task, sentiment classification is substantially more difficult than topic-based categorization [22]. We chose this task because word vectors trained using term-document matrices are most commonly used in document-level tasks such as categorization and retrieval.

The evaluation dataset is the polarity dataset version 2.0 introduced by Pang and Lee [17], available at http://www.cs.cornell.edu/people/pabo/movie-review-data. This dataset consists of 2,000 movie reviews, where each is associated with a binary sentiment polarity label. We report 10-fold cross validation results using the authors' published folds to make our results comparable to those previously reported in the literature. We use a linear support vector machine (SVM) classifier trained with LIBLINEAR [8] and set the SVM regularization parameter to the same value used in [18, 17].

Because we are interested in evaluating the capabilities of various word representation learners, we use as features the mean representation vector, an average of the word representations for all words present in the document. The number of times a word appears in a document is often used as a feature when categorizing documents by topic. However, previous work found a binary indicator of whether or not the word is present to be a more useful feature in sentiment classification [22, 18]. For this reason we used term presence for our bag-of-words features. We also evaluate performance using mean representation vectors concatenated with the original bag-of-words vector. In all cases we normalize each feature vector to unit norm, and following the technique of [21] we scale word representation matrices to have unit standard deviation.
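The document features described above could be assembled as in the following sketch (our own illustration, not the experimental code; the SVM regularization value shown is arbitrary rather than the one used in [17, 18]):

```python
import numpy as np
from sklearn.svm import LinearSVC

def doc_features(token_id_lists, R, vocab_size):
    """Unit-normalized mean word vectors concatenated with unit-normalized binary term presence."""
    means, bows = [], []
    for ids in token_id_lists:
        mean = R[:, ids].mean(axis=1)                 # average the word vectors present in the document
        bow = np.zeros(vocab_size); bow[ids] = 1.0    # term presence rather than term frequency
        means.append(mean / np.linalg.norm(mean))
        bows.append(bow / np.linalg.norm(bow))
    return np.hstack([np.array(means), np.array(bows)])

# Hypothetical usage with pre-tokenized reviews (lists of word indices) and labels y in {0, 1}:
# X = doc_features(train_token_ids, R, R.shape[1])
# clf = LinearSVC(C=0.1).fit(X, y)
```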
Table 2: Sentiment classification results on the movie review dataset from [17]. Features labeled with "Mean" are arithmetic means of the word vectors for words present in the review. Our model's representation outperforms other word vector methods, and is competitive with systems specially designed for sentiment classification.

Features                            Accuracy (%)
Bag of Words (BoW)                  86.75
lblLm Mean                          71.30
LDA Mean                            66.70
LSA Mean                            77.45
Our Method Mean                     88.50
lblLm Mean + BoW                    86.10
LDA Mean + BoW                      86.70
LSA Mean + BoW                      85.25
Our Method Mean + BoW               89.35
BoW SVM reported in [17]            87.15
Contextual Valence Shifters [11]    86.20
TF-IDF Weighting [13]               88.10
Appraisal Taxonomy [25]             90.20
Table 2 shows the classification performance of our method, the other VSMs we implemented, and previously reported results from the literature. Our method's features clearly outperform those of other VSMs. On their own, our method's word vectors outperform bag-of-words features while using two orders of magnitude fewer features. When concatenated with the bag-of-words features, our method is competitive with previously reported results which use models engineered specifically for the task of sentiment classification. To our knowledge, the only method which outperforms our model's mean vectors concatenated with bag-of-words features is the work of Whitelaw et al. [25]. This work builds a feature set of adjective phrases expressing sentiment using hand-selected words indicative of sentiment, WordNet, and online thesauri. That such a task-specific model only narrowly outperforms our method is evidence for the power of unsupervised feature learning.

4.4 Subjectivity Detection
As a second evaluation task, we performed sentence-level subjectivity classification. In this task, a classifier is trained to decide whether a given sentence is subjective, expressing the writer's opinions, or objective, expressing purely facts. We used the dataset of Pang and Lee [17], which gathered subjective sentences from movie review summaries and objective sentences from movie plot summaries. This task is substantially different from the review classification task because it uses sentences as opposed to entire documents, and the target concept is subjectivity instead of opinion polarity. We randomly split the 10,000 examples into 10 folds and report 10-fold cross validation accuracy using the SVM training protocol of [17].

Table 3 shows classification accuracies from the sentence subjectivity experiment. Our model provided superior features when compared against other VSMs, and slightly outperformed the bag-of-words baseline. Further improvement over the bag-of-words baseline is obtained by concatenating the two sets of features together.

Table 3: Sentence subjective/objective classification accuracies using the movie review subjectivity dataset of [17]. Features labeled with "Mean" are arithmetic means of the word vectors for words present in the sentence.

Features                   Accuracy (%)
Bag of Words (BoW)         90.25
lblLm Mean                 78.45
LDA Mean                   66.65
LSA Mean                   84.11
Our Method Mean            90.36
lblLm Mean + BoW           87.29
LDA Mean + BoW             88.82
LSA Mean + BoW             88.75
Our Method Mean + BoW      91.54
BoW SVM reported in [17]   90

Related Work

Prior work has developed several models to learn word representations via a probabilistic language modeling objective. Mnih and Hinton [14, 15] introduced an energy-based log-bilinear model for word representations, following earlier work on neural language models [2, 16]. Successful applications of these word representation learners and other neural network models include semantic role labeling, chunking, and named entity recognition [6, 21].

In contrast to the syntactic focus of language models, probabilistic topic models aim to capture document-level correlations among words [20]. Our probabilistic model is similar to LDA [3], which is related to pLSI [10]. However, pLSI does not give a well-defined probabilistic model over previously unseen documents. The recently introduced replicated softmax model [19] uses an undirected graphical model to learn topics in a document collection. Turney and Pantel [23] offer an extensive review of VSMs which employ a matrix factorization technique after applying some weighting or smoothing operation to the matrix entries.

Several recent techniques learn word representations in a principled manner as part of an application of interest. These applications include retrieval and ranking systems [1, 9], and systems to represent images and textual tags in the same vector space [24]. Our work learns word representations via the more basic task of topic modeling as compared to these more specialized representation learners.
Discussion
We presented a vector space model which learns semantically sensitive word representations via a probabilistic model of word occurrence in documents. Its probabilistic foundation gives a theoretically justified technique for word vector induction, as an alternative to the overwhelming number of matrix factorization-based techniques commonly used. Our model is parametrized as a log-bilinear model following recent success in using similar techniques for language models [2, 6, 14, 15]. By assuming word order independence and replacing the language modeling objective with a document modeling objective, our model captures word relations at the document level.

Our model's foundation is closely related to probabilistic latent topic models [3, 20]. However, we parametrize our topic model in a manner which aims to capture word representations instead of latent topics. In our experiments, our method performed better than LDA, which models latent topics directly.

We demonstrated the utility of our learned word vectors on two tasks of sentiment classification. Both were tasks of a semantic nature, and our method's word vectors performed better than word vectors trained with the more syntactic objective of language modeling. Using the mean of word vectors to represent documents ignores vast amounts of information that could help categorization, negated phrases for example. Future work could better capture the information conveyed by words in sequence using convolutional models over word vectors.
Acknowledgments

We thank Chris Potts, Dan Ramage, Richard Socher, and Chris Manning for insightful discussions. This work is supported by the DARPA Deep Learning program under contract number FA8650-10-C-7020.
References
[1] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger. Supervised semantic indexing. In Proceedings of CIKM, 2009.
[2] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(6):1137-1155, August 2003.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(4-5):993-1022, May 2003.
[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the ACL, 2007.
[5] C. K. Chung and J. W. Pennebaker. The psychological function of function words. Social Communication, pages 343-359, 2007.
[6] R. Collobert and J. Weston. A unified architecture for natural language processing. In Proceedings of the 25th ICML, 2008.
[7] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, September 1990.
[8] R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[9] D. Grangier, F. Monay, and S. Bengio. A discriminative approach for the retrieval of images from text queries. In Proceedings of the ECML, 2006.
[10] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of ACM SIGIR, 1999.
[11] A. Kennedy and D. Inkpen. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2):110-125, May 2006.
[12] T. Landauer, P. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse Processes, 25(2):259-284, 1998.
[13] J. Martineau and T. Finin. Delta TFIDF: An improved feature space for sentiment analysis. In Proceedings of the Third AAAI International Conference on Weblogs and Social Media, 2009.
[14] A. Mnih and G. E. Hinton. Three new graphical models for statistical language modelling. In Proceedings of the 24th ICML, 2007.
[15] A. Mnih and G. E. Hinton. A scalable hierarchical distributed language model. In Neural Information Processing Systems, volume 22, 2009.
[16] F. Morin and Y. Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, 2005.
[17] B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, 2004.
[18] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Empirical Methods in Natural Language Processing, 2002.
[19] R. Salakhutdinov and G. E. Hinton. Replicated softmax: an undirected topic model. In Advances in Neural Information Processing Systems, volume 22, 2009.
[20] M. Steyvers and T. L. Griffiths. Probabilistic topic models. In Latent Semantic Analysis: A Road to Meaning, 2006.
[21] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the ACL, 2010.
[22] P. D. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the ACL, 2002.
[23] P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141-188, 2010.
[24] J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. In Proceedings of the ECML, 2010.
[25] C. Whitelaw, N. Garg, and S. Argamon. Using appraisal taxonomies for sentiment analysis. In Proceedings of CIKM, 2005.