The document introduces key concepts in Natural Language Processing (NLP) such as tokenization, stemming, lemmatization, and text vectorization, along with practical examples using NLTK and Scikit-learn. It also covers the fundamentals of Computer Vision, including Convolutional Neural Networks (CNNs) for image classification, specifically for distinguishing between cats and dogs using the CIFAR-10 dataset. Lastly, it discusses AI ethics, emphasizing the importance of fairness and the understanding of bias in machine learning systems.


18-03-2025

Introduction to NLP Concepts

Tokenization:

Breaking text into words or sentences. Example: "I love NLP" → ["I", "love", "NLP"]

Stemming:

Reduces words to their root form by chopping off suffixes, without consulting a dictionary. Example: "running", "runs" → "run" (note that irregular forms like "ran" are not reduced, as the output below shows).

Lemmatization:

Converts words to their dictionary form. Example: "better" → "good" (lemmatization considers meaning, unlike stemming; see the sketch below).
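The contrast is easy to verify directly. A minimal sketch comparing stemming and lemmatization on these exact words (assuming NLTK is installed; the wordnet download is needed once for the lemmatizer's dictionary):

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('wordnet')  # Needed once for the lemmatizer's dictionary

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("better"))                    # 'better' -- suffix rules only, no dictionary
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'   -- adjective looked up in WordNet
print(stemmer.stem("ran"))                       # 'ran'    -- irregular form is not reduced
print(lemmatizer.lemmatize("ran", pos="v"))      # 'run'    -- verb lemma handles it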

Text Vectorization:

Converting text into numerical format so that a computer can process it. Example: "I love NLP" → [0, 1, 2] (using word indexing, where each word is mapped to an integer id; see the sketch below).
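A minimal sketch of word indexing, building the vocabulary by hand (the full example below uses CountVectorizer word counts instead, which is a different vectorization scheme):

sentence = "I love NLP"
tokens = sentence.split()

# Assign each unique word an integer index in order of first appearance
vocab = {word: idx for idx, word in enumerate(dict.fromkeys(tokens))}
indexed = [vocab[word] for word in tokens]

print(vocab)    # {'I': 0, 'love': 1, 'NLP': 2}
print(indexed)  # [0, 1, 2]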

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet
from sklearn.feature_extraction.text import CountVectorizer

# Download required NLTK resources
nltk.download('punkt_tab')  # punkt_tab tokenizer data
nltk.download('punkt')
nltk.download('wordnet')

# Sample text
text = "I love NLP. NLP is amazing! Running, runs, and ran are different forms of run."

# 1️⃣ Tokenization (Breaking text into words and sentences)
sent_tokens = sent_tokenize(text)  # Sentence tokenization
word_tokens = word_tokenize(text)  # Word tokenization

print("🔹 Sentence Tokenization:", sent_tokens)
print("\n🔹 Word Tokenization:", word_tokens)

# 2️⃣ Stemming (Reducing words to their root form)
stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in word_tokens]

print("\n🔹 Stemming:", stemmed_words)

# 3️⃣ Lemmatization (Converting words to their dictionary form)
lemmatizer = WordNetLemmatizer()
lemmatized_words = [lemmatizer.lemmatize(word, wordnet.VERB) for word in word_tokens]

print("\n🔹 Lemmatization:", lemmatized_words)

# 4️⃣ Text Vectorization (Converting text to numbers)
vectorizer = CountVectorizer()
vectorized_text = vectorizer.fit_transform([text])

print("\n🔹 Vectorized Text (Word Frequency Matrix):\n", vectorized_text.toarray())
print("🔹 Vocabulary:", vectorizer.get_feature_names_out())

🔹 Sentence Tokenization: ['I love NLP.', 'NLP is amazing!', 'Running, runs, and ran are different forms of run.']

🔹 Word Tokenization: ['I', 'love', 'NLP', '.', 'NLP', 'is', 'amazing', '!', 'Running', ',', 'runs', ',', 'and', 'ran', 'are', 'different', 'forms', 'of', 'run', '.']

🔹 Stemming: ['i', 'love', 'nlp', '.', 'nlp', 'is', 'amaz', '!', 'run', ',', 'run', ',', 'and', 'ran', 'are', 'differ', 'form', 'of', 'run', '.']

🔹 Lemmatization: ['I', 'love', 'NLP', '.', 'NLP', 'be', 'amaze', '!', 'Running', ',', 'run', ',', 'and', 'run', 'be', 'different', 'form', 'of', 'run', '.']

🔹 Vectorized Text (Word Frequency Matrix):
 [[1 1 1 1 1 1 1 2 1 1 1 1 1]]
🔹 Vocabulary: ['amazing' 'and' 'are' 'different' 'forms' 'is' 'love' 'nlp' 'of' 'ran' 'run' 'running' 'runs']

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!

Explanation

Tokenization: Splits text into words and sentences using NLTK.

Stemming: Uses PorterStemmer to convert words to their base form (e.g., "running" → "run").

Lemmatization: Uses WordNetLemmatizer, which considers the meaning of words (e.g., "better" → "good").

Text Vectorization: Uses CountVectorizer from Scikit-learn to convert words into a numerical representation.


Introduction to Computer Vision & Image Processing

What is Computer Vision?


Computer Vision is a field of AI that helps computers understand and analyze images and videos,
just like humans do. It is used in face recognition, self-driving cars, and medical image analysis.

What is Image Processing?

Image Processing is a technique to enhance or modify images using mathematical operations.


Example: Converting a colored image to black and white, detecting edges, or blurring an image.
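None of the code in this unit performs these classical image-processing operations, so here is a minimal sketch of all three using the Pillow library (Pillow is an assumption, not used elsewhere in this unit, and "photo.jpg" is a placeholder you would replace with a real image on your machine):

from PIL import Image, ImageFilter

img = Image.open("photo.jpg")  # Placeholder path -- use any image you have

gray = img.convert("L")                                     # Color to black and white
blurred = img.filter(ImageFilter.GaussianBlur(radius=2))    # Blur the image
edges = img.convert("L").filter(ImageFilter.FIND_EDGES)     # Simple edge detection

gray.save("photo_gray.jpg")
blurred.save("photo_blurred.jpg")
edges.save("photo_edges.jpg")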

What is a Convolutional Neural Network (CNN)?

A CNN is a type of deep learning model that can recognize patterns in images.

It is used in image classification, such as identifying whether a picture contains a cat or a dog.

CNN works in 3 simple steps:

1 Extract Features (e.g., Detect edges, shapes, colors)

2 Recognize Patterns (e.g., Understand object parts like eyes, ears, and tails)

3 Classify Images (e.g., "This is a cat!")

pip install tensorflow matplotlib

# Step 1: Import necessary libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

# Step 2: Load the CIFAR-10 dataset
dataset = keras.datasets.cifar10
(train_images, train_labels), (test_images, test_labels) = dataset.load_data()

# CIFAR-10 class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog',
               'frog', 'horse', 'ship', 'truck']

# Filter the dataset to include only cats (class 3) and dogs (class 5)
def filter_cats_dogs(images, labels):
    cat_dog_indices = np.where((labels == 3) | (labels == 5))[0]  # Cats = 3, Dogs = 5
    filtered_images = images[cat_dog_indices]
    filtered_labels = labels[cat_dog_indices]
    # Convert labels to binary: cat = 0, dog = 1
    filtered_labels = np.where(filtered_labels == 3, 0, 1)
    return filtered_images, filtered_labels

train_images, train_labels = filter_cats_dogs(train_images, train_labels)
test_images, test_labels = filter_cats_dogs(test_images, test_labels)

# Step 3: Normalize images (scale pixel values to 0-1 for better training)
train_images, test_images = train_images / 255.0, test_images / 255.0

# Step 4: Build a Simple CNN Model
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),  # Convolution Layer
    layers.MaxPooling2D((2, 2)),  # Pooling Layer
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),  # Flatten to 1D
    layers.Dense(64, activation='relu'),  # Fully Connected Layer
    layers.Dense(1, activation='sigmoid')  # Output Layer (binary classification)
])

# Step 5: Compile the Model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Step 6: Train the Model
print("Training the CNN Model...")
history = model.fit(train_images, train_labels, epochs=5,
                    validation_data=(test_images, test_labels))

# Step 7: Evaluate Model Performance
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f"\nTest Accuracy: {test_acc:.2f}")

# Step 8: Display a Sample Image with Prediction
def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = 1 if predictions_array[i] > 0.5 else 0

    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel(f"Predicted: {'Dog' if predicted_label == 1 else 'Cat'} "
               f"(True: {'Dog' if true_label == 1 else 'Cat'})",
               color=color)

# Make predictions
predictions = model.predict(test_images)

# Plot the first 5 test images with their predictions
num_rows = 1
num_cols = 5
plt.figure(figsize=(2 * num_cols, 2 * num_rows))
for i in range(num_rows * num_cols):
    plt.subplot(num_rows, num_cols, i + 1)
    plot_image(i, predictions, test_labels, test_images)
plt.tight_layout()
plt.show()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step

/usr/local/lib/python3.11/dist-packages/keras/src/layers/convolutional/base_conv.py:107: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Training the CNN Model...
Epoch 1/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 15s 43ms/step - accuracy: 0.5431 - loss: 0.6839 - val_accuracy: 0.6525 - val_loss: 0.6281
Epoch 2/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 13s 41ms/step - accuracy: 0.6604 - loss: 0.6102 - val_accuracy: 0.7030 - val_loss: 0.5889
Epoch 3/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 41ms/step - accuracy: 0.7129 - loss: 0.5674 - val_accuracy: 0.7040 - val_loss: 0.5716
Epoch 4/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 13s 43ms/step - accuracy: 0.7386 - loss: 0.5273 - val_accuracy: 0.7085 - val_loss: 0.5526
Epoch 5/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.7563 - loss: 0.4973 - val_accuracy: 0.7465 - val_loss: 0.5095
63/63 - 1s - 11ms/step - accuracy: 0.7465 - loss: 0.5095

Test Accuracy: 0.75

63/63 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step
Explanation

Dataset Filtering:

The CIFAR-10 dataset contains 10 classes. We filter it to include only cats (class 3) and dogs
(class 5).

Labels are converted to binary: 0 for cats and 1 for dogs.

Model Output:

The output layer uses sigmoid activation for binary classification (cat or dog).
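Because the sigmoid outputs a probability between 0 and 1, each prediction must be thresholded at 0.5 to get a class label. The plot_image function applies this rule one image at a time; a minimal sketch of the same rule applied to the whole prediction array (reusing the predictions variable from the code above):

import numpy as np

# Threshold sigmoid outputs at 0.5: below -> cat (0), above -> dog (1)
predicted_labels = (predictions > 0.5).astype(int).ravel()
print(predicted_labels[:5])  # Class labels for the first five test images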

Visualization:

The plot_image function displays the image along with the predicted and true labels.

If the prediction is correct, the label is displayed in blue; otherwise, it's displayed in red.

AI Ethics and Bias: Understanding Bias in Machine Learning, Ethical Considerations in AI Development

What is AI Ethics?

Artificial Intelligence (AI) is a powerful technology used in robots, self-driving cars, chatbots, and
facial recognition systems. However, AI must be used fairly, safely, and responsibly. AI Ethics
ensures that AI systems:

1. Do not harm people

2. Are fair to everyone

3. Do not invade privacy

4. Are used for good purposes

Understanding Bias in Machine Learning

What is Bias?

Bias happens when AI makes unfair decisions because of incorrect, incomplete, or unbalanced
data.

Example of Bias in AI

1 Facial Recognition Bias – AI may recognize lighter-skinned faces more accurately than darker-skinned faces if it was trained on mostly lighter-skinned images.

2 Job Hiring Bias – If an AI system is trained mostly on male candidates' resumes, it might favor men over women for job selections.

Why Does AI Have Bias?

1. AI learns from data – If the data is not diverse, the AI can be biased.

2. AI models reflect human choices – If humans create biased rules, AI will follow them.

3. AI can amplify discrimination – If AI is not tested properly, it might make unfair decisions.

Ethical Considerations in AI Development

To ensure fair and responsible AI, developers must follow AI ethics principles:

1 Fairness & Equality

AI should treat everyone equally and not favor any group.

Example: AI should not give loans only to rich people but check all applicants fairly.

2 Privacy & Security

AI must protect personal data and not misuse it.

Example: Social media AI should not track users without permission.

3 Transparency & Explainability

AI decisions should be clear and understandable.

Example: If an AI rejects a loan, it should explain why.

4 Accountability & Responsibility

Companies and programmers should take responsibility for AI mistakes.

Example: If a self-driving car causes an accident, who is responsible?

5 Avoiding Harm

AI should never cause harm to humans.

Example: AI in medical diagnosis should be rigorously tested and validated before deployment, because mistakes can directly harm patients.

How Can We Reduce Bias in AI?

1. Use diverse training data – Ensure AI learns from all types of people and situations.

2. Test AI systems for fairness – Check if AI treats everyone equally (see the sketch after this list).

3. Have human oversight – Humans must monitor AI to prevent mistakes.

4. Regulate AI – Governments should set rules for ethical AI use.
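One concrete way to test a system for fairness (item 2 above) is a demographic parity check: compare the selection rate the model gives each group. A minimal sketch with made-up predictions and group labels (all values here are hypothetical, purely for illustration):

import numpy as np

# Hypothetical model predictions and group membership for eight candidates
predictions = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # 1 = Hired, 0 = Not Hired
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])       # Two demographic groups

# Demographic parity check: compare selection rates between groups
for group in ["A", "B"]:
    rate = predictions[groups == group].mean()
    print(f"Group {group}: selection rate = {rate:.2f}")
# A large gap between the two rates is a warning sign of bias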


from sklearn.linear_model import LogisticRegression
import numpy as np

# Sample AI hiring system (favoring candidates with high income)
X_train = np.array([[50000], [60000], [70000], [80000], [90000]])  # Income levels
y_train = np.array([0, 0, 1, 1, 1])  # 0 = Not Hired, 1 = Hired

model = LogisticRegression()  # AI hiring model
model.fit(X_train, y_train)  # Train the model

# Checking if AI is biased by testing on different income levels
X_test = np.array([[40000], [100000]])
y_pred = model.predict(X_test)  # AI's decision

print("AI's Decision:", y_pred)  # Will AI unfairly prefer high-income candidates?
AI's Decision: [0 1]

Explanation

In the Python code for AI bias detection, we trained a simple logistic regression model to decide
whether a candidate should be hired based on income levels.

Model Training Data:

We provided the model with training data where higher-income individuals were more likely to
be hired.

Testing AI with New Candidates:

Candidate 1: Income = 40,000 → AI predicts 0 (Not Hired)

Candidate 2: Income = 100,000 → AI predicts 1 (Hired)

0 (Not Hired) means the AI does not select the person for the job.

1 (Hired) means the AI selects the person for the job.
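As a follow-up check (a sketch reusing the model and X_test defined above), predict_proba shows how confident the model is in each decision, rather than just the hard 0/1 label. Probabilities near 0 or 1 indicate that income alone is strongly driving the outcome:

# Inspect the model's confidence, not just its labels
probabilities = model.predict_proba(X_test)  # Columns: [P(Not Hired), P(Hired)]

for income, (p_not_hired, p_hired) in zip(X_test.ravel(), probabilities):
    print(f"Income {income}: P(Hired) = {p_hired:.2f}, P(Not Hired) = {p_not_hired:.2f}")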

END OF UNIT V
