Foundation (Week 4) - DeepTech - Ready Upskilling Program

This document outlines the Week 4 curriculum for a course on Natural Language Processing (NLP), focusing on fundamentals, text preprocessing, and representation techniques. It includes learning objectives, required resources, and a series of applied learning assignments that involve practical tasks such as text cleaning, tokenization, stemming, and lemmatization. Additionally, it provides links to course materials and specific coding tasks using Python and various libraries.

Uploaded by

gurjibecha88

Week 4: Fundamentals of Natural Language Processing

This week, you will cover the following course:

● Fundamentals of NLP
Course 1: Fundamentals of Natural Language Processing

Learning objectives for the course

At the end of this course, you should be able to:

● Understand NLP fundamentals.
● Preprocess and analyze text.
● Apply text representation techniques.
Learning Requirements

To support your learning this week, you will require the following resources:

● Jupyter Notebook
● Google Colab (Recommended)

A guide on how to use Google Colab for your assignments is provided: Google Colab Guide
Course 1: Fundamentals of Natural Language Processing

Link(s) to the course:

• Introduction to Natural Language Processing (NLP)
• Text Preprocessing, Tokenization, Stemming & Lemmatization
• Text Representations in NLP

Learning Resources
Course:

1. Slide 1 - Introduction to Natural Language Processing (NLP)
2. Slide 2 - Text Preprocessing, Tokenization, Stemming & Lemmatization
3. Slide 3 - Text Representations in NLP
4. Notebook 1 - Regex Colab Notebook
5. Notebook 2 - String Processing Colab
6. Notebook 3 - Text Tokenization Colab
7. Notebook 4 - Text Representation Colab
Applied Learning Assignments 1:
1. Define Natural Language Processing (NLP) in your own words.
2. List at least three real-world applications of NLP and explain their significance.
3. Identify and explain two challenges that make NLP complex.
4. Extract the following patterns using regex:
a) All email addresses from the text below:
“Contact us at support@company.com or sales@business.org.
For more, email info@service.net.”

b) All words that end with "ing" from this sentence:
“NLP is amazing for cleaning and processing text while learning new
techniques.”

5. Write a Python program to clean the following text:
“NLP makes AI smarter! But, sometimes, it’s challenging… Don’t you agree?”
The program should:
a) Remove all punctuation.
b) Convert the text to lowercase.
c) Split it into words.
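
As a starting point for tasks 4 and 5, the sketch below uses only Python's built-in re module. The email pattern is a simplified assumption (it is not a complete address validator), and the variable names are illustrative. Punctuation is removed with a Unicode-aware pattern because the sample text contains a curly apostrophe and an ellipsis character, which string.punctuation would not cover.

```python
import re

# Task 4a: extract email addresses.
contact_text = (
    "Contact us at support@company.com or sales@business.org. "
    "For more, email info@service.net."
)
# Simplified pattern (an assumption): local part, "@", domain, dot, 2+ letter suffix.
EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
emails = re.findall(EMAIL_PATTERN, contact_text)

# Task 4b: words that end with "ing".
sentence = (
    "NLP is amazing for cleaning and processing text "
    "while learning new techniques."
)
ing_words = re.findall(r"\b\w+ing\b", sentence)

# Task 5: remove punctuation, lowercase, split into words.
raw_text = "NLP makes AI smarter! But, sometimes, it’s challenging… Don’t you agree?"
no_punct = re.sub(r"[^\w\s]", "", raw_text)  # drop anything that is not a word char or space
lowered = no_punct.lower()
words = lowered.split()

print(emails)
print(ing_words)
print(words)
```

Note that stripping the apostrophe turns "it’s" into "its" and "Don’t" into "dont"; whether that is acceptable depends on the downstream task.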
Applied Learning Assignments 2:
1. Text Cleaning Task
Apply text cleaning techniques to preprocess the following text:
"OMG!! NLP is soooo coool 🤩...!!! It costs $1000. Learn it now at https://3mtt.com 😎."

Refer to the course slides for more information.
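
One possible cleaning pipeline for this text is sketched below. The choice and order of steps (lowercasing, dropping URLs, dropping emojis and symbols, shortening elongated letters, collapsing whitespace) are assumptions, since the course slide is not reproduced here; clean_text is a hypothetical helper name.

```python
import re

def clean_text(text: str) -> str:
    """Apply a small, assumed sequence of cleaning steps to noisy text."""
    text = text.lower()                               # normalize case
    text = re.sub(r"https?://\S+", " ", text)         # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)          # drop emojis, punctuation, symbols
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)    # "coool" -> "cool", "soooo" -> "soo"
    return " ".join(text.split())                     # collapse repeated whitespace

raw = "OMG!! NLP is soooo coool 🤩...!!! It costs $1000. Learn it now at https://3mtt.com 😎."
cleaned = clean_text(raw)
print(cleaned)  # omg nlp is soo cool it costs 1000 learn it now at
```

Shortening runs of three or more letters to two is a heuristic: it repairs "coool" but leaves "soo", so a dictionary lookup or spell checker would be needed for a fully normalized result.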

2. Tokenization Task
Perform both word-level and sentence-level tokenization on the given text.

"Tokenization is the first step in NLP. It splits text into smaller pieces for
analysis."
o Use NLTK to perform word tokenization.
o Use NLTK to perform sentence tokenization.
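
The task calls for NLTK (its nltk.tokenize module provides word_tokenize and sent_tokenize, which need the tokenizer data to be downloaded first). To show what the two levels of tokenization produce without that dependency, the sketch below approximates them with regular expressions. It is a simplified stand-in, not NLTK's algorithm, and will behave differently on harder input such as abbreviations.

```python
import re

text = (
    "Tokenization is the first step in NLP. "
    "It splits text into smaller pieces for analysis."
)

# Sentence-level: split on whitespace that follows ., ! or ? (naive rule).
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Word-level: runs of word characters, or single punctuation marks as their own tokens.
word_tokens = re.findall(r"\w+|[^\w\s]", text)

print(sentences)
print(word_tokens)
```

Comparing this output with NLTK's on the same text is a useful check when completing the assignment.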

3. Stemming and Lemmatization Task

Apply stemming and lemmatization techniques to a list of words:

["running", "flies", "studies", "easily", "studying", "better"]

o Use the Porter Stemmer to perform stemming on the words.
o Use spaCy to perform lemmatization on the same words.


Applied Learning Assignments 3:
1. Define a vocabulary of at least 5 unique words. Write Python code to
generate one-hot encoded vectors for your vocabulary.

2. Use the following sentences as your dataset:

● “The quick brown fox jumps over the lazy dog.”
● “The dog sleeps in the kernel”

– Write Python code to generate a Bag of Words representation for the
dataset using CountVectorizer.
– Write Python code to compute the TF-IDF representation using
TfidfVectorizer.
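
To make the three representations in tasks 1 and 2 concrete, the sketch below computes them by hand in plain Python. It is not a replacement for scikit-learn's CountVectorizer and TfidfVectorizer, which the assignment asks for; it mirrors what are, to my understanding, their default settings (lowercasing, tokens of two or more word characters, an alphabetically sorted vocabulary, smoothed IDF, and L2 normalization), so treat those details as assumptions to verify against the library. The five-word vocabulary for one-hot encoding is hypothetical.

```python
import math
import re

# Task 1: one-hot vectors for a small, hypothetical vocabulary.
vocabulary = ["bird", "cat", "dog", "fish", "horse"]
one_hot = {
    word: [1 if i == index else 0 for i in range(len(vocabulary))]
    for index, word in enumerate(vocabulary)
}

# Task 2: the dataset from the assignment.
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog sleeps in the kernel",
]

def tokenize(doc):
    # Lowercase, then keep tokens of 2+ word characters (assumed default behaviour).
    return re.findall(r"\b\w\w+\b", doc.lower())

tokenized = [tokenize(doc) for doc in corpus]

# Bag of Words: one count per vocabulary term, per document.
bow_vocab = sorted({tok for doc in tokenized for tok in doc})
bow = [[doc.count(term) for term in bow_vocab] for doc in tokenized]

# TF-IDF: raw counts weighted by smoothed IDF, then L2-normalized per document.
n_docs = len(corpus)
doc_freq = [sum(1 for row in bow if row[j] > 0) for j in range(len(bow_vocab))]
idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]

def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

tfidf = [l2_normalize([tf * w for tf, w in zip(row, idf)]) for row in bow]

print(one_hot["dog"])
print(bow_vocab)
print(bow)
print([[round(v, 3) for v in row] for row in tfidf])
```

Words that appear in both sentences ("the", "dog") get the minimum IDF of 1.0, while words unique to one sentence are weighted higher, which is the effect TF-IDF is meant to have.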

3. Create a small dataset of at least 3 sentences related to animals.
Example: "The cat meows. The dog barks. The bird sings."
– Write Python code to:
• Train a Word2Vec model using gensim.
• Retrieve the embedding for the word "dog".

4. Load the pretrained GloVe model (glove-wiki-gigaword-50) using gensim.
– Write Python code to:
• Retrieve the embedding for the word "king".
• Find the 5 most similar words to "king".
