GENAI lab

The Generative AI Lab Manual outlines a practical course aimed at understanding and implementing generative AI models, with a focus on word embeddings and their applications. Students will engage in various experiments, including exploring pre-trained word vectors, training custom models, and utilizing Hugging Face for sentiment analysis and text summarization. The course emphasizes hands-on learning through projects, assessments, and the development of skills relevant to real-world applications in AI.

Uploaded by

Mayduru Lekhana

GENERATIVE

Lab Manual
Generative AI                                Semester: 6
Course Code: BAIL657C                        CIE Marks: 50
Teaching Hours/Week (L:T:P:S): 0:0:1:0       SEE Marks: 50
Credits: 01                                  Exam Hours: 100
Examination type (SEE): Practical
Course objectives:
● Understand the principles and concepts behind generative AI models
● Explain the knowledge gained to implement generative models using Prompt design frameworks.
● Apply various Generative AI applications for increasing productivity.
● Develop Large Language Model-based Apps.

Sl.NO Experiments
1. Explore pre-trained word vectors. Explore word relationships using vector arithmetic. Perform arithmetic
operations and analyze results.

2. Use dimensionality reduction (e.g., PCA or t-SNE) to visualize word embeddings for Q 1. Select 10 words from
a specific domain (e.g., sports, technology) and visualize their embeddings. Analyze clusters and relationships.
Generate contextually rich outputs using embeddings. Write a program to generate 5 semantically similar words
for a given input.

3. Train a custom Word2Vec model on a small dataset. Train embeddings on a domain-specific corpus (e.g., legal,
medical) and analyze how embeddings capture domain-specific semantics.

4. Use word embeddings to improve prompts for a Generative AI model. Retrieve similar words using word
embeddings. Use the similar words to enrich a GenAI prompt. Use the AI model to generate responses for the
original and enriched prompts. Compare the outputs in terms of detail and relevance.

5. Use word embeddings to create meaningful sentences for creative tasks. Retrieve similar words for a seed word.
Create a sentence or story using these words as a starting point. Write a program that: Takes a seed word.
Generates similar words. Constructs a short paragraph using these words.

6. Use a pre-trained Hugging Face model to analyze sentiment in text. Assume a real-world application. Load the
sentiment analysis pipeline. Analyze the sentiment of the input sentences.

7. Summarize long texts using a pre-trained Hugging Face summarization model. Load the
summarization pipeline. Take a passage as input and obtain the summarized text.

8. Install langchain, cohere (for the API key), and langchain-community. Get the API key (by logging into Cohere and
obtaining the Cohere key). Load a text document from your Google Drive. Create a prompt template to display the
output in a particular manner.

9. Take the Institution name as input. Use Pydantic to define the schema for the desired output and create a custom
output parser. Invoke the Chain and Fetch Results. Extract the below Institution related details from Wikipedia:
The founder of the Institution. When it was founded. The current branches of the institution. How many
employees work in it. A brief 4-line summary of the institution.

10. Build a chatbot for the Indian Penal Code. We'll start by downloading the official Indian Penal Code document,
and then we'll create a chatbot that can interact with it. Users will be able to ask questions about the Indian Penal
Code and have a conversation with it.
Course outcomes (Course Skill Set):
At the end of the course the student will be able to:
● Develop the ability to explore and analyze word embeddings, perform vector arithmetic to investigate
word relationships, visualize embeddings using dimensionality reduction techniques
● Apply prompt engineering skills to real-world scenarios, such as information retrieval, text generation.
● Utilize pre-trained Hugging Face models for real-world applications, including sentiment analysis and text
summarization.
● Apply different architectures used in large language models, such as transformers, and understand their
advantages and limitations.
Assessment Details (both CIE and SEE)
The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50) and for the
SEE minimum passing mark is 35% of the maximum marks (18 out of 50 marks). A student shall be
deemed to have satisfied the academic requirements and earned the credits allotted to each subject/
course if the student secures a minimum of 40% (40 marks out of 100) in the sum total of the CIE
(Continuous Internal Evaluation) and SEE (Semester End Examination) taken together.

Continuous Internal Evaluation (CIE):

CIE marks for the practical course are 50 Marks.
The split-up of CIE marks for record/ journal and test are in the ratio 60:40.
● Each experiment is to be evaluated for conduction with an observation sheet and
record write-up. Rubrics for the evaluation of the journal/write-up for
hardware/software experiments are designed by the faculty who is handling the
laboratory session and are made known to students at the beginning of the practical
session.
● Record should contain all the specified experiments in the syllabus and each
experiment write-up will be evaluated for 10 marks.
● Total marks scored by the students are scaled down to 30 marks (60% of
maximum marks).
● Weightage to be given for neatness and submission of record/write-up on time.
● Department shall conduct a test of 100 marks after the completion of all the
experiments listed in the syllabus.
● In a test, test write-up, conduction of experiment, acceptable result, and
procedural knowledge will carry a weightage of 60% and the rest 40% for viva-
voce.
● The suitable rubrics can be designed to evaluate each student’s performance and
learning ability.
● The marks scored shall be scaled down to 20 marks (40% of the maximum marks).
The Sum of scaled-down marks scored in the report write-up/journal and marks of a test is the
total CIE marks scored by the student.
Semester End Evaluation (SEE):
● SEE marks for the practical course are 50 Marks.
● SEE shall be conducted jointly by the two examiners of the same institute, examiners are
appointed by the Head of the Institute.
● The examination schedule and names of examiners are informed to the university before
the conduction of the examination. These practical examinations are to be conducted
between the schedule mentioned in the academic calendar of the University.
● All laboratory experiments are to be included for practical examination.
● (Rubrics) Breakup of marks and the instructions printed on the cover page of the answer
script to be strictly adhered to by the examiners. OR based on the course requirement
evaluation rubrics shall be decided jointly by examiners.
● Students can pick one question (experiment) from the questions lot prepared by the
examiners jointly.
● Evaluation of test write-up/ conduction procedure and result/viva will be conducted
jointly by examiners.
General rubrics suggested for SEE are: write-up 20%, conduction procedure and result 60%,
viva-voce 20% of maximum marks. SEE for practical shall be evaluated for
100 marks and scored marks shall be scaled down to 50 marks (however, based on course
type, rubrics shall be decided by the examiners).
Change of experiment is allowed only once and 15% of Marks allotted to the procedure part are
to be made zero.
The minimum duration of SEE is 02 hours
Suggested Learning Resources:
Books:

1. Modern Generative AI with ChatGPT and OpenAI Models: Leverage the Capabilities of
OpenAI's LLM for Productivity and Innovation with GPT3 and GPT4, by Valentina
Alto, Packt Publishing Ltd, 2023.
2. Generative AI for Cloud Solutions: Architect modern AI LLMs in secure, scalable, and
ethical cloud environments, by Paul Singh, Anurag Karuparti, Packt Publishing Ltd,
2024.

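Experiment 1: Explore pre-trained word vectors. Explore word relationships using
vector arithmetic. Perform arithmetic operations and analyze results.

Objective: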

● To understand pre-trained word vectors and how they represent words as numbers in a
continuous space.
● To explore word relationships using vector arithmetic.
● To perform arithmetic operations on word vectors and analyze the results using
simple examples.

In this experiment, we will learn about pre-trained word vectors and how they help us
represent words in a way that computers can understand. These vectors capture the meaning
and context of words. For example, the word "apple" can be represented as a set of numbers
that encode its meaning. Words with similar meanings will have similar vectors.

We will also explore vector arithmetic, which is a way to perform mathematical operations
on these word vectors to discover relationships between words.

Example:
If you subtract the vector for "cat" from the vector for "kitten" and add the vector for "dog," the
result lies close to the vector for "puppy," the young counterpart of "dog".

What Are Pre-trained Word Vectors?

Pre-trained word vectors are created by training models on large text datasets. Each word is
mapped to a numerical vector, typically with 100 to 300 dimensions, which captures the
meaning and context of the word.

Why Use Pre-trained Word Vectors?

● Efficient: No need to train a model from scratch.
● Semantically meaningful: Similar words are close to each other in the vector space.
● Useful for NLP Tasks: Such as translation, sentiment analysis, and question-answering.

Example:
The word "banana" might be represented as a vector like this:
[0.4, -0.7, 0.1, ..., 0.9]
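
The idea that "similar words are close to each other" is usually measured with cosine similarity. The following is a minimal sketch using only the standard library. The three-dimensional vectors are invented for illustration (real pre-trained vectors have 100 to 300 learned dimensions), and the helper names `cosine_similarity` and `most_similar` are hypothetical, not part of any library.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "apple":  [0.9, 0.8, 0.1],
    "banana": [0.8, 0.9, 0.2],
    "mango":  [0.85, 0.75, 0.15],
    "car":    [0.1, 0.2, 0.9],
    "truck":  [0.2, 0.1, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors, topn=2):
    """Return the topn other words ranked by cosine similarity to `word`."""
    scores = [(other, cosine_similarity(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

for neighbour, score in most_similar("apple", toy_vectors):
    print(f"{neighbour}: {score:.3f}")
```

With these toy values the fruit words rank far above "car" and "truck", which mirrors how neighbours are found in a real embedding space.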

Vector Arithmetic in Word Vectors

Vector arithmetic allows us to perform mathematical operations on word vectors. By adding
or subtracting vectors, we can reveal hidden relationships between words.

Example:
A "cub" relates to a "lion" the way a "puppy" relates to a "dog": each is the young form of the
adult animal. In vector terms, this can be written as:
Vector("cub") ≈ Vector("lion") - Vector("adult") + Vector("young")

Word Relationships with Real-Time Examples

Example 1: Animal Relationships

● Vector("kitten") - Vector("cat") + Vector("dog") ≈ Vector("puppy")

Example 2: Fruit Relationships

● Vector("orange") - Vector("fruit") + Vector("tropical") ≈ Vector("mango")
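
To see the arithmetic itself without downloading a model, here is a small sketch with hand-made vectors. The three axes (feline, canine, young) and all values are assumptions chosen so that the analogy works exactly; the function `analogy` is a hypothetical helper. As is common practice, the input words are excluded from the candidates.

```python
import math

# Hand-made vectors on three invented axes: (feline, canine, young).
toy_vectors = {
    "cat":    [1.0, 0.0, 0.0],
    "kitten": [1.0, 0.0, 1.0],
    "dog":    [0.0, 1.0, 0.0],
    "puppy":  [0.0, 1.0, 1.0],
    "lion":   [0.9, 0.1, 0.0],
    "cub":    [0.9, 0.1, 1.0],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vectors):
    """Return the word closest to Vector(a) - Vector(b) + Vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))

# kitten - cat + dog
print(analogy("kitten", "cat", "dog", toy_vectors))
```

With real pre-trained vectors the match is only approximate, which is why the sample program reports the nearest word rather than an exact result.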

Sample Program: Exploring Animal and Fruit Relationships

# Install Gensim if not already installed
!pip install gensim

# Load pre-trained GloVe vectors (100-dimensional)
from gensim.downloader import load

word_vectors = load('glove-wiki-gigaword-100')  # Automatically downloads the model on first use

# Example 1: Animal relationship (kitten is to cat as puppy is to dog)
result = word_vectors.most_similar(positive=['kitten', 'dog'], negative=['cat'], topn=1)
print("Result of 'kitten - cat + dog':", result[0][0])  # Expected output: 'puppy' or a related word

# Example 2: Fruit relationship (orange - fruit + tropical)
result = word_vectors.most_similar(positive=['orange', 'tropical'], negative=['fruit'], topn=1)
print("Result of 'orange - fruit + tropical':", result[0][0])  # Expected output: 'mango' or a related word

Output:
Experiment 2: Visualizing Word Embeddings and Generating Semantically
Similar Words.

Objective:

● To visualize word embeddings using dimensionality reduction techniques like PCA or t-SNE.
● To select 10 words from a specific domain (e.g., sports, technology) and analyze the
clusters and relationships between them.
● To generate contextually rich outputs by finding semantically similar words using
pre-trained word embeddings.

Dimensionality Reduction for Word Embeddings

Word embeddings like GloVe or Word2Vec represent words in high-dimensional
spaces (usually 100 to 300 dimensions). Dimensionality reduction techniques help us
visualize these high-dimensional embeddings in a 2D or 3D space. This makes it easier
to observe clusters and relationships between words.

Techniques:

1. Principal Component Analysis (PCA): A linear method to reduce dimensions
while preserving maximum variance.
2. t-SNE (t-Distributed Stochastic Neighbour Embedding): A non-linear method that
captures local structure and forms better clusters for visualization.
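
To make "reduce dimensions while preserving maximum variance" concrete, the sketch below computes the first two principal components by hand, using power iteration on the covariance matrix. It is a dependency-free illustration under stated assumptions: the four-dimensional word vectors are invented, the function names are hypothetical, and in practice a library implementation such as scikit-learn's PCA would be the usual choice.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centred at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    vec = [float(i + 1) for i in range(len(matrix))]  # arbitrary non-zero start
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec

def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    value1, pc1 = dominant_eigenpair(cov)
    # Deflation: remove the first component's variance, then repeat.
    size = len(cov)
    deflated = [[cov[i][j] - value1 * pc1[i] * pc1[j] for j in range(size)]
                for i in range(size)]
    _, pc2 = dominant_eigenpair(deflated)
    points = [(sum(a * b for a, b in zip(row, pc1)),
               sum(a * b for a, b in zip(row, pc2))) for row in centered]
    return points, pc1, pc2

# Invented 4-dimensional vectors: three sports words, then three technology words.
words = ["football", "cricket", "tennis", "computer", "software", "network"]
vectors = [
    [0.90, 0.80, 0.10, 0.00],
    [0.80, 0.90, 0.00, 0.10],
    [0.85, 0.70, 0.10, 0.10],
    [0.10, 0.00, 0.90, 0.80],
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.10, 0.70, 0.85],
]

points, pc1, pc2 = pca_2d(vectors)
for word, (x, y) in zip(words, points):
    print(f"{word:10s} {x:+.3f} {y:+.3f}")
```

In the printed coordinates, the sports words share one sign on the first component and the technology words share the other, which is the kind of cluster separation a 2D plot would show.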

Real-Time Visualization and Semantic Similarity Generation

Step 1: Visualize 10 Words from a Specific Domain

We will select 10 words from the technology domain and visualize their embeddings using
t-SNE.

Step 2: Generate 5 Semantically Similar Words for a Given Input

Given an input word, we will use pre-trained word vectors to find the 5 most semantically
similar words.

Sample Program
# Install required libraries
!pip install gensim matplotlib scikit-learn numpy

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from gensim.downloader import load
import numpy as np  # Import NumPy for array conversion

# Load pre-trained word vectors (GloVe - 100 dimensions)
word_vectors = load('glove-wiki-gigaword-100')

# Select 10 words from the "technology" domain (ensure words exist in the model)
tech_words = ['computer', 'internet', 'software', 'hardware', 'network', 'data', 'cloud', 'robot',
              'algorithm', 'technology']
tech_words = [word for word in tech_words if word in word_vectors.key_to_index]

# Extract word vectors and convert to a NumPy array
vectors = np.array([word_vectors[word] for word in tech_words])

# Reduce dimensions using t-SNE
# Perplexity must be smaller than the number of samples, so it is reduced to suit the 10-word sample
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
reduced_vectors = tsne.fit_transform(vectors)

# Plot the 2D visualization
plt.figure(figsize=(10, 6))
for i, word in enumerate(tech_words):
    plt.scatter(reduced_vectors[i, 0], reduced_vectors[i, 1], label=word)
    plt.text(reduced_vectors[i, 0] + 0.02, reduced_vectors[i, 1] + 0.02, word, fontsize=12)
plt.title("t-SNE Visualization of Technology Words")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.legend()
plt.show()

# Generate 5 semantically similar words for a given input word
input_word = 'computer'

if input_word in word_vectors.key_to_index:
    similar_words = word_vectors.most_similar(input_word, topn=5)
    print(f"5 words similar to '{input_word}':")
    for word, similarity in similar_words:
        print(f"{word} (similarity: {similarity:.2f})")
else:
    print(f"'{input_word}' is not in the vocabulary.")

Output:
Experiment 3: Train a custom Word2Vec model on a small dataset. Train
embeddings on a domain-specific corpus (e.g., legal, medical) and analyze how
embeddings capture domain-specific semantics.

Objective:

1. Train a custom Word2Vec model on a small domain-specific dataset (medical text).
2. Analyze how the embeddings capture domain-specific word relationships.
3. Generate similar words for a given input to observe how the model learned from
the domain-specific data.

# Install required library
!pip install gensim

from gensim.models import Word2Vec

# Step 1: Create a small dataset (list of medical-related word lists)
medical_data = [
    ["patient", "doctor", "nurse", "hospital", "treatment"],
    ["cancer", "chemotherapy", "radiation", "surgery", "recovery"],
    ["infection", "antibiotics", "diagnosis", "disease", "virus"],
    ["heart", "disease", "surgery", "cardiology", "recovery"],
]

# Step 2: Train a Word2Vec model
model = Word2Vec(sentences=medical_data, vector_size=10,
                 window=2, min_count=1, workers=1, epochs=50)

# Step 3: Find similar words for a given input word
input_word = "patient"
if input_word in model.wv:
    similar_words = model.wv.most_similar(input_word, topn=3)
    print(f"3 words similar to '{input_word}':")
    for word, similarity in similar_words:
        print(f"{word} (similarity: {similarity:.2f})")
else:
    print(f"'{input_word}' is not in the vocabulary.")

Output:

What This Code Does:

1. Creates a small medical dataset using lists of related words.
2. Trains a Word2Vec model to learn relationships between these words.
3. Finds 3 words similar to the input word, showing how well the model captures relationships.
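
The window=2 argument in the Word2Vec call controls which words count as context for each other. The sketch below is a simplified illustration of that idea only: it lists every (target, context) pair within two positions in one of the word lists. The helper `context_pairs` is hypothetical, and real Word2Vec training typically adds further steps (for example random window shrinking and negative sampling) that are not shown here.

```python
def context_pairs(sentence, window=2):
    """Return (target, context) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(sentence):
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs

sentence = ["patient", "doctor", "nurse", "hospital", "treatment"]
pairs = context_pairs(sentence, window=2)

print(len(pairs), "pairs in total")
for target, context in pairs[:5]:
    print(target, "->", context)
```

Here "patient" is paired with "doctor" and "nurse" but never with "treatment", because "treatment" lies outside the two-word window. With only four short word lists, the learned similarities depend heavily on such co-occurrences.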
Experiment 4: Use word embeddings to improve prompts for a Generative AI
model. Retrieve similar words using word embeddings. Use the similar
words to enrich a GenAI prompt. Use the AI model to generate responses
for the original and enriched prompts. Compare the outputs in terms of
detail and relevance.

When interacting with Generative AI models (like GPT), the quality of the output often
depends on how well the input prompt is framed. Enhancing prompts using word
embeddings helps improve the model's understanding and provides more contextually rich
and detailed responses.

Here’s how we can enhance prompts using Word2Vec embeddings:

Use Word Embeddings:

Word embeddings represent words as vectors in a continuous vector space. Words with
similar meanings have similar vector representations. For example, the word "AI" might be
similar to "machine learning" or "artificial intelligence."

Retrieve Similar Words:

By training or using pre-trained word embeddings, we can find words that are semantically
close to the original prompt. These similar words help make the prompt richer.

Example:
Original Prompt: "Explain the impact of AI on technology."
Enriched Prompt: "Explain the impact of AI, machine learning, deep learning, and data
science on technology."

Generate Responses:
Use a Generative AI model (e.g., OpenAI GPT) to generate responses for both the original
and enriched prompts.
Comparison: The enriched prompt will usually yield a more detailed and relevant response.
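
The comparison step can be sketched as a small harness that sends both prompts to a model and collects simple statistics. Everything here is an assumption for illustration: `compare_prompts` is a hypothetical helper, `fake_generate` is an offline stand-in for a real model call, and word count plus keyword coverage are only crude proxies for "detail and relevance".

```python
def compare_prompts(generate, original_prompt, enriched_prompt, keywords):
    """Run both prompts through `generate` and collect simple comparison stats."""
    report = {}
    for name, prompt in (("original", original_prompt), ("enriched", enriched_prompt)):
        response = generate(prompt)
        lowered = response.lower()
        report[name] = {
            "response": response,
            "word_count": len(response.split()),
            "keywords_covered": sorted(k for k in keywords if k.lower() in lowered),
        }
    return report

# Stand-in for a real model call; it only echoes the prompt so the sketch runs offline.
def fake_generate(prompt):
    return f"(model output for) {prompt}"

report = compare_prompts(
    fake_generate,
    "Explain the impact of AI on technology.",
    "Explain the impact of AI, machine learning, deep learning, and data science on technology.",
    keywords=["machine learning", "deep learning", "data science"],
)
for name, stats in report.items():
    print(name, stats["word_count"], stats["keywords_covered"])
```

To use it with an actual model, replace `fake_generate` with any function that takes a prompt string and returns the model's text response.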

import string

# Step 1: Pre-defined dictionary of words and their similar terms
# (a static stand-in for real word embeddings)
word_embeddings = {
    "ai": ["machine learning", "deep learning", "data science"],
    "data": ["information", "dataset", "analytics"],
    "science": ["research", "experiment", "technology"],
    "learning": ["education", "training", "knowledge"],
    "robot": ["automation", "machine", "mechanism"]
}

# Step 2: Function to find similar words using the static dictionary
def find_similar_words(word):
    return word_embeddings.get(word, [])

# Step 3: Function to enrich a prompt with similar words
def enrich_prompt(prompt):
    enriched_words = []
    for word in prompt.split():
        # Look up the lowercase word without surrounding punctuation,
        # so that "science." still matches the key "science"
        key = word.strip(string.punctuation).lower()
        similar_words = find_similar_words(key)
        if similar_words:
            enriched_words.append(f"{word} ({', '.join(similar_words)})")
        else:
            enriched_words.append(word)
    return " ".join(enriched_words)

# Step 4: Original prompt
original_prompt = "Explain AI and its applications in science."

# Step 5: Enrich the prompt using similar words
enriched_prompt = enrich_prompt(original_prompt)

# Step 6: Print the original and enriched prompts
print("Original Prompt:")
print(original_prompt)
print("\nEnriched Prompt:")
print(enriched_prompt)

Output:
Experiment 5: Use word embeddings to create meaningful sentences for
creative tasks. Retrieve similar words for a seed word. Create a sentence
or story using these words as a starting point. Write a program that: Takes
a seed word. Generates similar words. Constructs a short paragraph using
these words.

# Step 1: Pre-defined dictionary of words and their similar terms
word_embeddings = {
    "adventure": ["journey", "exploration", "quest"],
    "robot": ["machine", "automation", "mechanism"],
    "forest": ["woods", "jungle", "wilderness"],
    "ocean": ["sea", "waves", "depths"],
    "magic": ["spell", "wizardry", "enchantment"]
}

# Step 2: Function to get similar words for a seed word
def get_similar_words(seed_word):
    if seed_word in word_embeddings:
        return word_embeddings[seed_word]
    else:
        return ["No similar words found"]

# Step 3: Function to create a short paragraph using the seed word and similar words
def create_paragraph(seed_word):
    similar_words = get_similar_words(seed_word)
    if "No similar words found" in similar_words:
        return f"Sorry, I couldn't find similar words for '{seed_word}'."

    # Construct a short story using the seed word and similar words
    paragraph = (
        f"Once upon a time, there was a great {seed_word}. "
        f"It was full of {', '.join(similar_words[:-1])}, and {similar_words[-1]}. "
        f"Everyone who experienced this {seed_word} always remembered it as a remarkable tale."
    )
    return paragraph

# Step 4: Input a seed word
seed_word = "adventure"  # You can change this to "robot", "forest", "ocean", "magic", etc.

# Step 5: Generate and print the paragraph
story = create_paragraph(seed_word)
print("Generated Paragraph:")
print(story)

Output:
What This Program Does:

1. Uses a static dictionary of word embeddings to find similar words for a given seed word.
2. Constructs a short paragraph using the seed word and its similar words.
3. Prints the paragraph, creating a small story based on the seed word.
Experiment 6: Use a pre-trained Hugging Face model to analyze sentiment in
text. Assume a real-world application, Load the sentiment analysis
pipeline. Analyze the sentiment by giving sentences to input.

Pre-trained model: A model that has already been trained on a large dataset and can
perform sentiment analysis without needing additional training.
Hugging Face Pipeline: An easy way to use pre-trained models for tasks like sentiment
analysis, text generation, translation, etc.
Real-world application: Analyzing customer reviews or social media comments to
understand user feedback.

Sample Program:
# Step 1: Install and import the necessary library
# You can uncomment and run this in Google Colab
# !pip install transformers

from transformers import pipeline

# Step 2: Load the sentiment analysis pipeline
sentiment_analyzer = pipeline("sentiment-analysis")

# Step 3: Define sample sentences for analysis
sentences = [
    "I love using this product! It makes my life so much easier.",
    "The service was terrible, and I'm very disappointed.",
    "It's an average experience, nothing special but not bad either."]

# Step 4: Analyze the sentiment for each sentence
for sentence in sentences:
    result = sentiment_analyzer(sentence)[0]
    # Each result is a dict with a 'label' and a confidence 'score'
    print(f"Sentence: {sentence}")
    print(f"Sentiment: {result['label']}, Confidence: {result['score']:.2f}\n")

Output:
What This Program Does:

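1. Loads a pre-trained Hugging Face model for sentiment analysis.
2. Analyzes the sentiment of sample sentences.
3. Prints the sentiment label along with a confidence score. The labels depend on the model
that is loaded: the pipeline's default English model reports only POSITIVE or NEGATIVE, so a
NEUTRAL label requires a model trained with a neutral class.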
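
When the loaded model only distinguishes POSITIVE from NEGATIVE, one simple workaround is to treat low-confidence predictions as neutral. The sketch below assumes results in the pipeline's usual shape (a dict with 'label' and 'score'). The helper `to_three_way`, the 0.75 threshold, and the sample scores are all made up for illustration and would need tuning on real data.

```python
def to_three_way(result, threshold=0.75):
    """Map a binary sentiment result to POSITIVE, NEGATIVE, or NEUTRAL.

    Predictions whose confidence falls below `threshold` are reported as NEUTRAL.
    """
    if result["score"] < threshold:
        return "NEUTRAL"
    return result["label"]

# Hypothetical results in the pipeline's output shape; the scores are invented.
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "POSITIVE", "score": 0.62},
]

labels = [to_three_way(result) for result in sample_results]
for result, label in zip(sample_results, labels):
    print(f"{result['label']} ({result['score']:.2f}) -> {label}")
```

In the sample program, the same function could be applied to `result` inside the loop before printing.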
Experiment 7: Summarize long texts using a pre-trained Hugging Face
summarization model. Load the summarization pipeline.
Take a passage as input and obtain the summarized text.

Sample Program:
# Step 1: Import the Hugging Face pipeline
from transformers import pipeline

# Step 2: Load the summarization pipeline
summarizer = pipeline("summarization")

# Step 3: Input a long passage for summarization
long_text = """
Artificial Intelligence (AI) is transforming various industries by automating tasks,
improving efficiency, and enabling new capabilities. In the healthcare sector, AI is used
for disease diagnosis, personalized medicine, and drug discovery. In the business world,
AI-powered systems are optimizing customer service, fraud detection, and supply chain
management. AI's impact on everyday life is significant, from smart assistants to
recommendation systems in streaming platforms. As AI continues to evolve, it promises even
greater advancements in fields like education, transportation, and environmental
sustainability.
"""

# Step 4: Summarize the input passage
summary = summarizer(long_text, max_length=50, min_length=20,
                     do_sample=False)[0]["summary_text"]

# Step 5: Print the summarized text
print("Summarized Text:")
print(summary)

Output:
What This Program Does:

1. Uses Hugging Face's pipeline("summarization") to load a pre-trained
summarization model.
2. Processes a long text passage and reduces it to a concise summary.
3. Prints the summarized version, which highlights the key points.
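
Pre-trained summarization models typically accept only a limited input length, so a passage much longer than the one above may need to be split first. The sketch below shows one common workaround that is not part of the manual's program: split the text into word-based chunks, summarize each chunk, and join the results. The helper names and `fake_summarize` (an offline stand-in) are hypothetical, and counting words only approximates the model's real token limit.

```python
def split_into_chunks(text, max_words=100):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_long_text(text, summarize, max_words=100):
    """Summarize each chunk with `summarize` and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in split_into_chunks(text, max_words))

# Offline stand-in for a real summarizer: keeps only the first five words.
def fake_summarize(chunk):
    return " ".join(chunk.split()[:5]) + "..."

text = " ".join(f"word{i}" for i in range(250))
chunks = split_into_chunks(text, max_words=100)
combined = summarize_long_text(text, fake_summarize)

print(len(chunks), "chunks")
print(combined)
```

With the pipeline from the sample program, `summarize` could be a function that calls `summarizer(chunk, max_length=50, min_length=20, do_sample=False)[0]["summary_text"]`.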
Experiment 8: Install langchain, cohere (for the API key), and langchain-community. Get the
API key (by logging into Cohere and obtaining the Cohere key). Load a text
document from your Google Drive. Create a prompt template to display the
output in a particular manner.

Step-by-Step Explanation

● Install necessary libraries: We will install langchain, cohere, and
langchain-community.
● Set up the Cohere API: Obtain your Cohere API key by logging into Cohere's platform.
● Load a text document from Google Drive.
● Create a Langchain Prompt Template to process the document and return the
result in a particular format.

# Step 1: Install necessary libraries
!pip install langchain cohere langchain-community

# Step 2: Import the required modules
# Note: import paths differ between LangChain releases; older versions expose
# Cohere under langchain.llms and LLMChain directly under langchain.
from langchain_community.llms import Cohere
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from google.colab import drive

# Step 3: Mount Google Drive to access the document
drive.mount('/content/drive')

# Step 4: Load the text document from Google Drive
file_path = "/content/drive/MyDrive/sample_text.txt"  # Change this path to your file location
with open(file_path, "r") as file:
    text = file.read()

# Step 5: Set up Cohere API key
cohere_api_key = "YOUR_COHERE_API_KEY"  # Replace with your actual Cohere API key

# Step 6: Create a prompt template
prompt_template = """
Summarize the following text in three bullet points:
{text}
"""

# Step 7: Configure the Cohere model with Langchain
llm = Cohere(cohere_api_key=cohere_api_key)
prompt = PromptTemplate(input_variables=["text"], template=prompt_template)

# Step 8: Create an LLMChain with the Cohere model and prompt template
chain = LLMChain(llm=llm, prompt=prompt)

# Step 9: Run the chain on the loaded document and print the result
summary = chain.run(text)
print("Summarized Output in Bullet Points:")
print(summary)


What This Program Does:

1. Mounts Google Drive to access a text document (sample_text.txt).

2. Reads the document's content and prepares it for processing.
3. Uses Langchain’s PromptTemplate to create a structured request for summarization.
4. Cohere LLM processes the text and returns the summarized output in a bullet-
point format.

Output:

Summarized Output in Bullet Points:

- AI is transforming industries like healthcare, business, and education.
- Smart assistants and recommendation systems are examples of AI's impact on daily life.
- Future advancements will bring improvements in transportation and sustainability.
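
Conceptually, a prompt template is a string with named placeholders that are filled in before the text is sent to the model. The sketch below illustrates that idea with the standard library only. It is not LangChain's implementation: `placeholder_names` and `render_prompt` are hypothetical helpers, and the extra {count} placeholder is added here purely to show a template with more than one variable.

```python
import string

template = "Summarize the following text in {count} bullet points:\n{text}"

def placeholder_names(template):
    """List the named placeholders that appear in a format-style template."""
    return [name for _, name, _, _ in string.Formatter().parse(template) if name]

def render_prompt(template, **values):
    """Fill the named placeholders; raises KeyError if one is missing."""
    return template.format(**values)

names = placeholder_names(template)
prompt = render_prompt(template, count="three", text="AI is transforming industries.")

print(names)
print(prompt)
```

Keeping the wording in one template and passing only the variable parts makes it easy to change the output format (for example from bullet points to a single paragraph) without touching the rest of the program.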
Experiment 9: Take the Institution name as input. Use Pydantic to define the
schema for the desired output and create a custom output parser. Invoke the
Chain and Fetch Results. Extract the below Institution related details from
Wikipedia: The founder of the Institution. When it was founded. The current
branches of the institution. How many employees work in it. A brief 4-line
summary of the institution.

Step-by-Step Explanation:

1. Install necessary libraries: Install langchain, pydantic, and
wikipedia-api.
2. Take institution name as input.
3. Use Pydantic to define a schema for the output (structured format).
4. Fetch institution details from Wikipedia and format the output according
to the schema.

# Step 1: Install necessary libraries
!pip install langchain langchain-community cohere pydantic wikipedia-api

# Step 2: Import required modules
# Note: import paths differ between LangChain releases; older versions expose
# Cohere under langchain.llms and LLMChain directly under langchain.
from langchain_community.llms import Cohere
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from pydantic import BaseModel
import wikipediaapi

# Step 3: Define a Pydantic schema for the institution's details
class InstitutionDetails(BaseModel):
    founder: str
    founded: str
    branches: str
    employees: str
    summary: str

# Step 4: Function to fetch details from Wikipedia with user-agent specified
def fetch_wikipedia_summary(institution_name):
    wiki_wiki = wikipediaapi.Wikipedia(
        language='en',
        user_agent="InstitutionInfoBot/1.0 (contact: youremail@example.com)")
    page = wiki_wiki.page(institution_name)
    if page.exists():
        return page.text
    else:
        return "No information available on Wikipedia for this institution."

# Step 5: Prompt template for extracting relevant details
prompt_template = """
Extract the following information from the given text:
- Founder
- Founded (year)
- Current branches
- Number of employees
- 4-line brief summary

Text: {text}

Provide the information in the following format:
Founder: <founder>
Founded: <founded>
Branches: <branches>
Employees: <employees>
Summary: <summary>
"""

# Step 6: Take institution name as input
institution_name = input("Enter the name of the institution: ")

# Step 7: Fetch Wikipedia data for the institution
wiki_text = fetch_wikipedia_summary(institution_name)

# Step 8: Set up Cohere (Replace YOUR_COHERE_API_KEY with your actual key)
cohere_api_key = "YOUR_COHERE_API_KEY"
llm = Cohere(cohere_api_key=cohere_api_key)

# Step 9: Create the Langchain prompt and chain
prompt = PromptTemplate(input_variables=["text"], template=prompt_template)
chain = LLMChain(llm=llm, prompt=prompt)

# Step 10: Run the chain
response = chain.run(wiki_text)

# Step 11: Custom output parser. The prompt asks for "Key: value" lines rather
# than JSON, so the lines are collected into a dict and validated by Pydantic.
FIELD_NAMES = ("founder", "founded", "branches", "employees", "summary")

def parse_institution_details(response):
    fields, current = {}, None
    for line in response.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in FIELD_NAMES:
            current = key.strip().lower()
            fields[current] = value.strip()
        elif current and line.strip():
            fields[current] += " " + line.strip()  # continuation of a multi-line value
    return InstitutionDetails(**fields)  # raises a validation error if a field is missing

try:
    details = parse_institution_details(response)
    print("Institution Details:")
    print(f"Founder: {details.founder}")
    print(f"Founded: {details.founded}")
    print(f"Branches: {details.branches}")
    print(f"Employees: {details.employees}")
    print(f"Summary: {details.summary}")
except Exception as e:
    print("Error parsing the response:", e)

Output:

Enter the name of the institution: Google

Institution Details:

Founder: Larry Page, Sergey Brin

Founded: 1998

Branches: Global offices in more than 50 countries

Employees: Over 100,000

Summary: Google is a multinational technology company specializing in internet-related
services and products. It is known for its search engine, online advertising,
cloud computing, and software. Google is one of the Big Five tech companies. It was
founded by Larry Page and Sergey Brin in 1998.
Experiment 10: Build a chatbot for the Indian Penal Code. We'll start by
downloading the official Indian Penal Code document, and then we'll create a
chatbot that can interact with it. Users will be able to ask questions about the
Indian Penal Code and have a conversation with it.

Step-by-step Explanation

1. Load the IPC text document.

2. Create a chatbot using a basic question-answering chain.
3. Users can ask questions, and the chatbot will retrieve relevant sections from the IPC.

Sample Program:
# Step 1: Install necessary packages
!pip install langchain langchain-community openai

# Step 2: Import required modules
# Note: import paths differ between LangChain releases; older versions expose
# OpenAI under langchain.llms.
from langchain.chains.question_answering import load_qa_chain
from langchain.docstore.document import Document
from langchain_community.llms import OpenAI

# Step 3: Load the Indian Penal Code text from a file
ipc_file_path = "path_to_your_ipc_file.txt"  # Replace with the actual path to your IPC text file

# Read the IPC document
with open(ipc_file_path, "r", encoding="utf-8") as file:
    ipc_text = file.read()

# Step 4: Create a Langchain Document object
ipc_document = Document(page_content=ipc_text)

# Step 5: Set up OpenAI (or any other LLM of your choice)
# Use temperature=0.3 for more factual responses
llm = OpenAI(openai_api_key="YOUR_OPENAI_API_KEY", temperature=0.3)

# Step 6: Create a simple question-answering chain
# The "stuff" chain places the whole document into one prompt, so a very long
# file may exceed the model's input limit and need to be shortened or split.
qa_chain = load_qa_chain(llm, chain_type="stuff")

# Step 7: Chat with the chatbot
print("Chatbot for the Indian Penal Code (IPC)")
print("Ask a question about the Indian Penal Code (type 'exit' to stop):")

while True:
    user_question = input("\nYour question: ")
    if user_question.lower() == "exit":
        print("Goodbye!")
        break

    # Use the QA chain to answer the question
    response = qa_chain.run(input_documents=[ipc_document],
                            question=user_question)
    print(f"Answer: {response}")
Output:
Chatbot for the Indian Penal Code (IPC)
Ask a question about the Indian Penal Code (type 'exit' to stop):

Your question: What is Section 302 of the IPC?

Answer: Section 302 of the Indian Penal Code refers to punishment for
murder, which is punishable with death or life imprisonment and a fine.

Your question: exit

Goodbye!
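
The sample program passes the whole IPC document to the chain on every question, and the "stuff" chain type places all supplied documents into a single prompt, so a long statute may not fit within the model's context limit. One possible workaround is to pick out only the relevant sections before calling the chain. The following is a minimal sketch of that idea in plain Python, not part of the original program. The helper names (split_into_sections, retrieve_sections, STOPWORDS) and the toy text are hypothetical, and it assumes each section in the file starts on its own line as "<number>. <title>".

```python
import re

# Hypothetical stopword list; extend as needed for real questions
STOPWORDS = {"what", "is", "the", "of", "for", "a", "an", "in", "to", "and", "or"}

def split_into_sections(text):
    """Split statute text into {section_number: section_text}.

    Assumes every section begins on its own line as '<number>. ' (e.g. '302. ').
    """
    pattern = re.compile(r"^(\d+[A-Z]*)\.\s", re.MULTILINE)
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.start():end].strip()
    return sections

def tokenize(text):
    """Lowercase word set with common stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def retrieve_sections(question, sections, top_k=3):
    """Return up to top_k section texts that look relevant to the question."""
    # If the question names a section number, return that section directly
    explicit = re.search(r"section\s+(\d+[a-z]*)", question, re.IGNORECASE)
    if explicit:
        number = explicit.group(1).upper()
        if number in sections:
            return [sections[number]]
    # Otherwise rank sections by how many question words they share
    query = tokenize(question)
    scored = [(len(query & tokenize(body)), number) for number, body in sections.items()]
    scored = [(score, number) for score, number in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])
    return [sections[number] for _, number in scored[:top_k]]

# Toy excerpt, shortened and paraphrased for illustration; not the official wording
sample_text = """299. Culpable homicide. Causing death by an act done with the intention of causing death.
302. Punishment for murder. Whoever commits murder shall be punished with death or imprisonment for life, and shall also be liable to fine.
378. Theft. Dishonestly taking movable property out of the possession of a person without that person's consent.
"""

sections = split_into_sections(sample_text)
for question in ("What is Section 302 of the IPC?", "What is the punishment for theft of property?"):
    hits = retrieve_sections(question, sections, top_k=1)
    print(question, "->", [hit.split(".")[0] for hit in hits])
```

To use this with the chatbot, each returned string could be wrapped as Document(page_content=...) and the resulting list passed as input_documents in place of [ipc_document]. Keyword overlap is a deliberately simple stand-in here; an embedding-based retriever would be the more usual choice for a larger document.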

