Semantic Analysis Theory

Frequency Distribution

A frequency distribution records the number of times an outcome of an experiment occurs. In text processing it is used to find the frequency of each word occurring in a document, via the FreqDist class defined in the nltk.probability module.

from nltk.probability import FreqDist

freq_dist = FreqDist()
for token in document:        # document: any iterable of tokens
    freq_dist[token] += 1     # current replacement for the old inc() method

For any word, we can check how many times it occurred in a particular document, e.g.:

1. Count: freq_dist['and'] returns the number of times 'and' occurred. (Older NLTK releases exposed this as the count method, freq_dist.count('and'); current releases index FreqDist like a dictionary.)
2. Frequency method: freq_dist.freq('and') returns the relative frequency of the given sample, i.e. its count divided by the total number of tokens.
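
A quick demonstration of both, using a tiny token list made up for this example:

from nltk.probability import FreqDist

# Build a distribution from a small illustrative token list.
fd = FreqDist(['and', 'the', 'and', 'of'])

print(fd['and'])       # absolute count: 2
print(fd.freq('and'))  # relative frequency: 2/4 = 0.5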

We will write a small program and explain its working in detail: we take some text and calculate the frequency distribution of each word in it.

import nltk
a = ("Guru99 is the site where you can find the best tutorials for "
     "Software Testing Tutorial, SAP Course for Beginners. Java Tutorial "
     "for Beginners and much more. Please visit the site Guru99.com and "
     "much more.")
words = nltk.tokenize.word_tokenize(a)
fd = nltk.FreqDist(words)
fd.plot()

Explanation of code:

1. Import the nltk module.
2. Write the text whose word distribution you need to find.
3. Tokenize the text; the resulting token list serves as input to FreqDist.
4. Pass the tokens to nltk.FreqDist as a list.
5. Plot the word counts with plot().

Visualize the graph for a better understanding of the text.

[Graph: frequency distribution of each word in the text]

NOTE: You need to have matplotlib installed to see the above graph.
Observe the graph above. It counts the occurrence of each word in the text, which helps in studying the text and, further, in implementing text-based sentiment analysis. In a nutshell, nltk has a module for counting the occurrence of each word in a text, which helps in preparing the statistics of natural language features. It plays a significant role in finding the keywords in the text. You can also extract the text from a PDF using libraries such as PyPDF2 and feed the text to nltk.FreqDist.
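
A minimal sketch of that PDF pipeline, assuming PyPDF2 version 3 or later and a hypothetical local file sample.pdf (both are assumptions, not part of the original example):

import nltk
from PyPDF2 import PdfReader

reader = PdfReader("sample.pdf")            # hypothetical input file
text = " ".join(page.extract_text() or "" for page in reader.pages)

words = nltk.tokenize.word_tokenize(text)
fd = nltk.FreqDist(words)
print(fd.most_common(10))                   # ten most frequent tokens in the PDF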

Counting each word on its own may not be very useful. Instead, one should focus on collocations and bigrams, which deal with words in pairs. These pairs identify useful keywords and better natural language features that can be fed to the machine. Please look below for their details.
Collocations: Bigrams and Trigrams
What are Collocations?

Collocations are pairs of words that occur together many times in a document. Their strength is measured by the ratio of the number of times the pair occurs together to the overall word count of the document.

Consider the electromagnetic spectrum, with phrases like ultraviolet rays and infrared rays. The word ultraviolet is rarely used without rays, so the pair can be treated as a collocation. Another example is CT scan: we don't say CT and scan separately, so they too are treated as a collocation.
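
NLTK also ships utilities for scoring collocations directly. A small sketch, using an invented illustrative corpus and pointwise mutual information as the association measure:

import nltk
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

# Invented corpus; any tokenized text works here.
text = ("Ultraviolet rays and infrared rays lie on the electromagnetic "
        "spectrum. A CT scan uses X rays. Ultraviolet rays can burn skin.")
tokens = nltk.word_tokenize(text.lower())

finder = BigramCollocationFinder.from_words(tokens)
# Rank pairs by pointwise mutual information; recurring pairs such as
# ('ultraviolet', 'rays') should score highly.
print(finder.nbest(BigramAssocMeasures.pmi, 5))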

Collocations can be categorized into two types:

 Bigrams: combinations of two words
 Trigrams: combinations of three words

Bigrams and trigrams provide more meaningful and useful features for the feature extraction stage. These are especially useful in text-based sentiment analysis.
Bigrams Example Code

import nltk

text = "Guru99 is a totally new kind of learning experience."


tokens = nltk.word_tokenize(text)
output = list(nltk.bigrams(tokens))
print(output)

Output
[('Guru99', 'is'), ('is', 'a'), ('a', 'totally'), ('totally', 'new'), ('new', 'kind'), ('kind', 'of'), ('of', 'learning'), ('learning', 'experience'), ('experience', '.')]

Trigrams Example Code

Sometimes it becomes important to see groups of three words in a sentence for statistical analysis and frequency counts. This again plays a crucial role in forming NLP (natural language processing) features as well as text-based sentiment prediction.

The same code, with trigrams in place of bigrams, calculates the trigrams.

import nltk
text = "Guru99 is a totally new kind of learning experience."
tokens = nltk.word_tokenize(text)
output = list(nltk.trigrams(tokens))
print(output)

Output

[('Guru99', 'is', 'a'), ('is', 'a', 'totally'), ('a', 'totally', 'new'), ('totally', 'new', 'kind'), ('new', 'kind', 'of'), ('kind', 'of', 'learning'), ('of', 'learning', 'experience'), ('learning', 'experience', '.')]
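
Because FreqDist counts any hashable items, the two techniques combine naturally: bigram (or trigram) tuples can be counted exactly like single words. A short sketch reusing the sentence above:

import nltk

text = "Guru99 is a totally new kind of learning experience."
tokens = nltk.word_tokenize(text)

# Count bigram tuples just like single tokens.
bigram_fd = nltk.FreqDist(nltk.bigrams(tokens))
print(bigram_fd.most_common(3))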
Semantic Analysis
For humans, making sense of text is simple: we recognize individual words and the
context in which they’re used. If you read this tweet:
"Your customer service is a joke! I've been on hold for 30 minutes and
counting!"
You understand that a customer is frustrated because a customer service agent is taking
too long to respond.

However, machines first need to be trained to make sense of human language and
understand the context in which words are used; otherwise, they might misinterpret
the word “joke” as positive.

Powered by machine learning algorithms and natural language processing, semantic analysis systems can understand the context of natural language, detect emotions and sarcasm, and extract valuable information from unstructured data, approaching human-level accuracy.
What Is Semantic Analysis?
Semantic analysis is the process of drawing meaning from text. It allows computers to
understand and interpret sentences, paragraphs, or whole documents, by analyzing
their grammatical structure, and identifying relationships between individual words in
a particular context.
It’s an essential sub-task of Natural Language Processing (NLP) and the driving force
behind machine learning tools like chatbots, search engines, and text analysis.

Semantic analysis-driven tools can help companies automatically extract meaningful information from unstructured data, such as emails, support tickets, and customer feedback. Below, we'll explain how it works.
How Semantic Analysis Works

Lexical semantics plays an important role in semantic analysis, allowing machines to understand relationships between lexical items (words, phrasal verbs, etc.); a short WordNet sketch follows this list:

 Hyponyms: specific lexical items of a generic lexical item (hypernym), e.g., orange is a hyponym of fruit (hypernym)
 Meronymy: a logical arrangement of text and words that denotes a constituent part of or member of something, e.g., a segment of an orange
 Polysemy: a relationship between the meanings of words or phrases that, although slightly different, share a common core meaning, e.g., I read a paper, and I wrote a paper
 Synonyms: words that have the same or nearly the same meaning as another, e.g., happy, content, ecstatic, overjoyed
 Antonyms: words that have close to opposite meanings, e.g., happy, sad
 Homonyms: two words that sound the same and are spelled alike but have different meanings, e.g., orange (color), orange (fruit)
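
A sketch of several of these relations through NLTK's WordNet interface (the specific synset names are illustrative choices, and the WordNet data must be downloaded first):

import nltk
from nltk.corpus import wordnet as wn
# Requires the WordNet data: nltk.download('wordnet')

fruit = wn.synset('fruit.n.01')
print(fruit.hyponyms()[:3])          # hyponyms of the hypernym "fruit"

tree = wn.synset('tree.n.01')
print(tree.part_meronyms()[:3])      # constituent parts, e.g. trunk, limb

happy = wn.synset('happy.a.01')
print(happy.lemmas()[0].antonyms())  # antonym of happy: unhappy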

Semantic analysis also takes into account signs and symbols (semiotics) and
collocations (words that often go together).
Automated semantic analysis works with the help of machine learning algorithms. By feeding semantically enhanced machine learning algorithms with samples of text, you can train machines to make accurate predictions based on past observations. There are various sub-tasks involved in a semantic-based approach for machine learning, including word sense disambiguation and relationship extraction:

Word Sense Disambiguation

The automated process of identifying in which sense a word is used, according to its context.
Natural language is ambiguous and polysemic; sometimes, the same word can have
different meanings depending on how it’s used.

The word “orange,” for example, can refer to a color, a fruit, or even a city in Florida!

The same happens with the word “date,” which can mean either a particular day of the
month, a fruit, or a meeting.
In semantic analysis with machine learning, computers use word sense disambiguation
to determine which meaning is correct in the given context.
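
NLTK ships a classic baseline for this task, the Lesk algorithm. A minimal sketch, assuming the punkt and wordnet data packages are installed:

import nltk
from nltk.wsd import lesk
# Requires: nltk.download('punkt') and nltk.download('wordnet')

sent1 = nltk.word_tokenize("We met on a date at the restaurant")
sent2 = nltk.word_tokenize("Write the date at the top of the page")

# lesk() returns the WordNet Synset whose definition overlaps the
# context words the most; results are a rough baseline, not perfect.
print(lesk(sent1, 'date'))
print(lesk(sent2, 'date'))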
Relationship Extraction

This task consists of detecting the semantic relationships present in a text. Relationships
usually involve two or more entities (which can be names of people, places, company
names, etc.). These entities are connected through a semantic category, such as “works
at,” “lives in,” “is the CEO of,” “headquartered at.”
For example, the phrase “Steve Jobs is one of the founders of Apple, which is headquartered in California” contains two different relationships: “is a founder of” (connecting Steve Jobs and Apple) and “is headquartered in” (connecting Apple and California).
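
Full relation labelling takes more than a few lines of code, but NLTK's built-in chunker can at least find the entity arguments such relationships connect. A sketch, assuming the listed NLTK data packages are downloaded (the exact entity labels it assigns can vary):

import nltk
# Requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('maxent_ne_chunker'), nltk.download('words')

sentence = ("Steve Jobs is one of the founders of Apple, "
            "which is headquartered in California")
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# ne_chunk wraps named entities in labelled subtrees (PERSON, GPE, ...).
tree = nltk.ne_chunk(tagged)
for subtree in tree.subtrees():
    if subtree.label() != 'S':
        print(subtree.label(), ' '.join(word for word, tag in subtree.leaves()))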

Semantic Analysis Techniques

Depending on the type of information you’d like to obtain from data, you can use one
of two semantic analysis techniques: a text classification model (which assigns
predefined categories to text) or a text extractor (which pulls out specific information
from the text).

Semantic Classification Models

 Topic classification: sorting text into predefined categories based on its content. Customer service teams may want to classify support tickets as they drop into their help desk. Through semantic analysis, machine learning tools can recognize if a ticket should be classified as a “Payment issue” or a “Shipping problem.”
 Sentiment analysis: detecting positive, negative, or neutral emotions in a text to denote urgency. For example, tagging Twitter mentions by sentiment to get a sense of how customers feel about your brand, and being able to identify disgruntled customers in real time (a tiny classifier sketch follows this list).
 Intent classification: classifying text based on what customers want to do next. You can use this to tag sales emails as “Interested” and “Not Interested” to proactively reach out to those who may want to try your product.
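
A tiny sentiment-classification sketch using NLTK's Naive Bayes classifier; the four training examples are invented for illustration, and a usable model would need far more data:

import nltk

def feats(text):
    # Bag-of-words features: each lowercase token maps to True.
    return {word: True for word in nltk.word_tokenize(text.lower())}

train = [
    (feats("I love this product, it works great"), "positive"),
    (feats("Fantastic support, thank you"), "positive"),
    (feats("This is terrible and I want a refund"), "negative"),
    (feats("I have been on hold for 30 minutes"), "negative"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(feats("Your customer service is terrible")))  # negative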

Semantic Extraction Models

 Keyword extraction: finding relevant words and expressions in a text. For instance, you could analyze the keywords in a bunch of tweets that have been categorized as “negative” and detect which words or topics are mentioned most often (a small sketch follows this list).
 Entity extraction: identifying named entities in text, like names of people, companies, places, etc. A customer service team might find this useful to automatically extract names of products, shipping numbers, emails, and any other relevant data from customer support tickets.
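
A small keyword-extraction sketch that ties back to FreqDist from the start of this document: strip stopwords from a batch of invented “negative” tweets and count what remains. Requires the punkt and stopwords data packages:

import nltk
from nltk.corpus import stopwords
# Requires: nltk.download('punkt') and nltk.download('stopwords')

tweets = [
    "Shipping took three weeks, never ordering again",
    "Still waiting on my shipping confirmation, terrible service",
    "Terrible service, the package arrived broken",
]

stop = set(stopwords.words('english'))
words = [w for tweet in tweets
         for w in nltk.word_tokenize(tweet.lower())
         if w.isalpha() and w not in stop]

# The most frequent remaining words act as crude keywords.
print(nltk.FreqDist(words).most_common(5))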

Automatically classifying tickets using semantic analysis tools relieves agents of repetitive tasks and allows them to focus on tasks that provide more value, while improving the whole customer experience.

Tickets can be instantly routed to the right hands, and urgent issues can be easily
prioritized, shortening response times, and keeping satisfaction levels high.
Conclusion

When combined with machine learning, semantic analysis allows you to delve into
your customer data by enabling machines to extract meaning from unstructured text at
scale and in real time.

Powerful semantic-enhanced machine learning tools will deliver valuable insights that
drive better decision-making and improve customer experience.
