
Getting Started With Natural Language Processing

Natural language processing (NLP) is a subfield of artificial intelligence that analyzes text, speech, and other forms of natural language. This document discusses several key NLP techniques: text preprocessing, stemming, word standardization, noise/entity removal, and term frequency-inverse document frequency (TF-IDF) analysis. The goal of preprocessing is to remove noise such as stop words, punctuation, and formatting from text before analysis. Stemming reduces words to their root forms. TF-IDF measures word importance by combining a term's frequency in a document with its inverse document frequency across a corpus.

Uploaded by

Omar Benjelloun
Copyright
© All Rights Reserved

Getting Started with Natural Language Processing
• Natural language processing (NLP) is a subfield of
artificial intelligence that helps process and analyze
natural language such as text and speech. This article
explains several of its basic techniques.
Text Preprocessing
Text is mostly unstructured and contains a lot of noise.
Data preprocessing removes this noise. Without proper
preprocessing, the data cannot be analyzed reliably.
Stemming
• Stemming is a rule-based approach that strips
suffixes (ing, ly, s, etc.) from words. For example,
"playing" and "plays" both reduce to "play".
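The suffix-stripping idea can be sketched in a few lines of Python. This is a toy illustration only: the suffix list, the minimum stem length, and the name simple_stem are assumptions made for this example, not the rules of any published stemmer.

```python
# Toy rule-based stemmer: strips a few common suffixes.
# SUFFIXES and MIN_STEM_LENGTH are illustrative assumptions.
SUFFIXES = ("ing", "ly", "es", "s")  # checked in this order
MIN_STEM_LENGTH = 3                  # avoid reducing short words like "is" to "i"


def simple_stem(word: str) -> str:
    """Return the word with the first matching suffix removed."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word  # no rule applied


words = ["playing", "quickly", "plays", "boxes", "is"]
stems = [simple_stem(w) for w in words]
print(stems)  # ['play', 'quick', 'play', 'box', 'is']
```

Because the rules are purely mechanical, a stemmer like this can produce stems that are not dictionary words. That is usually acceptable when the goal is only to group related word forms together.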
Word standardisation:
Text may contain words that are not in the
dictionary. For example, tweets or comments
can contain forms like ’re for "are", ’s for "is",
and "awsm" for "awesome". Our model will not
recognize such words, so we have to
standardise them first.
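One possible way to standardise such words is a small lookup table combined with a couple of regular expressions. The sketch below is hypothetical: the table contents and the name standardise are assumptions for illustration, and treating every ’s as "is" will also rewrite possessives, which a real system would need to handle separately.

```python
import re

# Hypothetical slang table; a real project would curate a much larger one.
SLANG = {"awsm": "awesome"}

# Contraction rules taken from the examples in the text ('re -> are, 's -> is).
# Both the straight and the curly apostrophe are accepted.
CONTRACTIONS = [
    (re.compile(r"(\w)['’]re\b"), r"\1 are"),
    (re.compile(r"(\w)['’]s\b"), r"\1 is"),  # assumption: ignores possessive 's
]


def standardise(text: str) -> str:
    """Expand known contractions, then replace known slang tokens."""
    text = text.lower()
    for pattern, replacement in CONTRACTIONS:
        text = pattern.sub(replacement, text)
    return " ".join(SLANG.get(token, token) for token in text.split())


print(standardise("You're awsm and it's true"))
# you are awesome and it is true
```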
Noise entity removal:

• In this step we remove HTML tags, stop words,
punctuation, and white space. A sentence contains
many extra words, for example "to", "is", and "and".
• These words add little meaning to the sentence and
are called stop words, so we can remove them. In the
code we remove stop words along with punctuation,
numbers, and unnecessary white space.
• We can use the nltk library to remove stop words
and the re library to remove punctuation, numbers, and
white space.
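The cleaning steps above can be sketched with the re library. To keep the example runnable without any downloads, it uses a tiny hard-coded stop-word set, which is an assumption for illustration. In practice the nltk stop-word list mentioned above would take its place. The name remove_noise is also hypothetical, and the character filter assumes English text.

```python
import re

# Tiny illustrative stop-word set (assumption); nltk ships a fuller list.
STOP_WORDS = {"to", "is", "and", "the", "a", "an", "of", "in"}


def remove_noise(text: str) -> str:
    """Strip HTML tags, punctuation, digits, extra white space and stop words."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and digits
    tokens = text.split()                   # split() also collapses white space
    return " ".join(t for t in tokens if t not in STOP_WORDS)


print(remove_noise("<p>The price   is 42 dollars, and it is fair!</p>"))
# price dollars it fair
```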
TF-IDF

• TF-IDF stands for term frequency-inverse
document frequency. The TF-IDF value of a word is
obtained by multiplying its TF score by its IDF score.
Term Frequency (TF):
• The term frequency of a word is the number of
times the word occurs in the document. This count
is often divided by the document length to
normalize it.
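As a minimal sketch of the normalized variant described above, the following computes the term frequency of each word in one tokenized document. The function name term_frequency and the sample sentence are assumptions made for this example.

```python
from collections import Counter


def term_frequency(tokens: list[str]) -> dict[str, float]:
    """Count of each term divided by the document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


tf = term_frequency("the cat sat on the mat".split())
print(tf["the"])  # "the" is 2 of the 6 tokens
print(tf["cat"])  # "cat" is 1 of the 6 tokens
```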
Inverse Document Frequency (IDF):

• Inverse document frequency (IDF) scores how
rare a word is across the documents of a corpus.
Multiplied by TF, it reflects how important a word
is to a document in a collection or corpus.
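The slides do not give a formula for IDF. A common choice, assumed here, is the logarithm of the number of documents divided by the number of documents containing the word. Libraries often use smoothed variants, so their values may differ. The sketch below computes this IDF and multiplies it by the normalized term frequency. The function names and the three-document corpus are made up for illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(documents: list[list[str]]) -> dict[str, float]:
    """idf(term) = log(N / df), with df = number of documents containing term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Normalized term frequency multiplied by the IDF score."""
    counts = Counter(tokens)
    return {term: (count / len(tokens)) * idf[term] for term, count in counts.items()}


corpus = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
idf = inverse_document_frequency(corpus)
scores = tf_idf(corpus[0], idf)

# "the" occurs in every document, so its IDF and TF-IDF are zero.
# "sat" occurs in one document only, so it scores highest in the first document.
for term, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(term, round(score, 3))
```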
Thank you!
Prepared by:
• Omar Benjelloun
• Bouali Mohammed
• Ellaoui Younes
• Soukaina Guedira
Supervised by:
• Abdelaghani Ghanem
