Singular Value Decomposition
SAHAS R. PARAB
22MDT1081

The singular value decomposition factorizes a matrix A as

A = UΣV^T

where:
• U is an orthogonal matrix whose columns are the left singular vectors of A,
• Σ is a diagonal matrix containing the singular values of A in decreasing order,
• V is an orthogonal matrix whose columns are the right singular vectors of A.
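The factorization above can be computed directly with NumPy; this is a minimal sketch using a small example matrix chosen only for illustration:

```python
import numpy as np

# A small example matrix (hypothetical data, for illustration only).
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Verify the reconstruction A = U Σ V^T.
A_reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(A, A_reconstructed))  # True
```

Note that `np.linalg.svd` returns V already transposed (as `Vt`), matching the A = UΣV^T form used here.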
Singular Value Decomposition has also been widely used in information retrieval; in this application it is known as Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI). As we will soon see, this idea is very similar to topic models. The basic problem of information retrieval is: given some search terms, the algorithm should retrieve all documents containing those search terms, or perhaps more usefully, return documents whose content is semantically related to the search terms. For example, if one of the search terms is "automobile", documents containing the term "cars" could also be returned.
One approach to this problem is as follows: given a document collection, we convert the plain text into a document-term matrix with one row per document and one column per word. We then convert the search terms to a vector in the same space and retrieve the documents whose vectors are close to the search vector. There are, however, some problems with this vector-based retrieval.
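The approach above can be sketched as follows; this is a minimal illustration with a tiny hypothetical corpus, using raw term counts and cosine similarity (real systems typically apply weighting such as tf-idf):

```python
import numpy as np

# A tiny hypothetical corpus; in practice the vocabulary can exceed 100,000 words.
docs = ["the car is fast",
        "the automobile dealership sells cars",
        "dogs chase the ball"]

# Build the vocabulary and a document-term count matrix (rows = docs, cols = words).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: j for j, w in enumerate(vocab)}
X = np.zeros((len(docs), len(vocab)))
for i, d in enumerate(docs):
    for w in d.split():
        X[i, index[w]] += 1

def retrieve(query):
    # Convert the query to a vector in the same term space.
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in index:
            q[index[w]] += 1
    # Rank documents by cosine similarity to the query vector.
    sims = X @ q / (np.linalg.norm(X, axis=1) * (np.linalg.norm(q) or 1.0))
    return np.argsort(-sims)

print(retrieve("fast car"))  # the first document ranks highest
```

Note that the second document, which mentions "automobile" and "cars", gets no credit for the query "fast car" because "car" and "cars" are distinct columns — exactly the independence problem discussed below.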
• First, the space has a very high dimension. A typical document collection can easily contain more than 100,000 distinct words, even after stemming (i.e., after "jump", "jumping", and "jumped" are all reduced to the same word). This creates distance-measurement problems due to the curse of dimensionality.
• Second, it treats each word as independent, while in languages like English the same word can mean two different things ("left" as the past tense of "leave" versus "left" as a direction), and two different words can mean the same thing ("car" and "automobile").
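Both problems motivate applying a truncated SVD to the document-term matrix, which is the core of LSA: keep only the k largest singular values, so documents and queries live in a low-dimensional latent space where co-occurring words are merged. The following is a minimal sketch with a hypothetical count matrix; the folding-in formula q V_k Σ_k^{-1} is the standard way to project a query into the latent space:

```python
import numpy as np

# Hypothetical document-term count matrix (rows = documents, columns = words).
X = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0]])

# Truncated SVD: keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Documents represented in the k-dimensional latent ("topic") space.
docs_latent = U_k * s_k           # shape: (n_docs, k)

# Fold a query vector (over the same vocabulary) into the latent space.
q = np.array([1.0, 0.0, 0.0, 0.0])
q_latent = q @ Vt_k.T / s_k       # shape: (k,)

print(docs_latent.shape, q_latent.shape)
```

Retrieval then proceeds as before, but with cosine similarity computed between `q_latent` and the rows of `docs_latent`; because words that co-occur across documents share latent dimensions, a query can match documents that never contain its exact terms.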