GENAI lab
Lab Manual
Generative AI Semester 6
Course Code BAIL657C CIE Marks 50
Teaching Hours/Week (L:T:P: S) 0:0:1:0 SEE Marks 50
Credits 01 Exam Hours 100
Examination type (SEE) Practical
Course objectives:
● Understand the principles and concepts behind generative AI models
● Explain how to implement generative models using prompt design frameworks.
● Apply various Generative AI applications for increasing productivity.
● Develop Large Language Model-based Apps.
Sl.NO Experiments
1. Explore pre-trained word vectors. Explore word relationships using vector arithmetic. Perform arithmetic
operations and analyze results.
2. Use dimensionality reduction (e.g., PCA or t-SNE) to visualize word embeddings for Q 1. Select 10 words from a
specific domain (e.g., sports, technology) and visualize their embeddings. Analyze clusters and relationships.
Generate contextually rich outputs using embeddings. Write a program to generate 5 semantically similar words
for a given input.
3. Train a custom Word2Vec model on a small dataset. Train embeddings on a domain-specific corpus (e.g., legal,
medical) and analyze how embeddings capture domain-specific semantics.
4. Use word embeddings to improve prompts for a Generative AI model. Retrieve similar words using word
embeddings. Use the similar words to enrich a GenAI prompt. Use the AI model to generate responses for the
original and enriched prompts. Compare the outputs in terms of detail and relevance.
5. Use word embeddings to create meaningful sentences for creative tasks. Retrieve similar words for a seed word.
Create a sentence or story using these words as a starting point. Write a program that: Takes a seed word. Generates
similar words. Constructs a short paragraph using these words.
6. Use a pre-trained Hugging Face model to analyze sentiment in text. Assume a real-world application; load the
sentiment analysis pipeline and analyze the sentiment of input sentences.
7. Summarize long texts using a pre-trained Hugging Face summarization model. Load the summarization pipeline.
Take a passage as input and obtain the summarized text.
8. Install langchain, cohere, and langchain-community. Get the API key by logging into Cohere and obtaining the
Cohere key. Load a text document from your Google Drive. Create a prompt template to display the output in
a particular manner.
9. Take the Institution name as input. Use Pydantic to define the schema for the desired output and create a custom
output parser. Invoke the chain and fetch results. Extract the following Institution-related details from Wikipedia:
the founder of the Institution, when it was founded, the current branches of the institution, how many
employees work in it, and a brief 4-line summary of the institution.
10. Build a chatbot for the Indian Penal Code. We'll start by downloading the official Indian Penal Code document,
and then we'll create a chatbot that can interact with it. Users will be able to ask questions about the Indian Penal
Code and have a conversation with it.
Course outcomes (Course Skill Set):
At the end of the course the student will be able to:
● Develop the ability to explore and analyze word embeddings, perform vector arithmetic to investigate word
relationships, and visualize embeddings using dimensionality reduction techniques.
● Apply prompt engineering skills to real-world scenarios, such as information retrieval and text generation.
● Utilize pre-trained Hugging Face models for real-world applications, including sentiment analysis and text
summarization.
● Apply different architectures used in large language models, such as transformers, and understand their
advantages and limitations.
Assessment Details (both CIE and SEE)
The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50) and for the
SEE minimum passing mark is 35% of the maximum marks (18 out of 50 marks). A student shall be
deemed to have satisfied the academic requirements and earned the credits allotted to each subject/
course if the student secures a minimum of 40% (40 marks out of 100) in the sum total of the CIE
(Continuous Internal Evaluation) and SEE (Semester End Examination) taken together.
Experiment 1: Explore Pre-trained Word Vectors and Word Relationships Using Vector Arithmetic.
Objective:
To understand pre-trained word vectors and how they represent words as numbers in a
continuous space.
To explore word relationships using vector arithmetic.
To perform arithmetic operations on word vectors and analyze the results using
simple examples.
In this experiment, we will learn about pre-trained word vectors and how they help us
represent words in a way that computers can understand. These vectors capture the meaning
and context of words. For example, the word "apple" can be represented as a set of numbers
that encode its meaning. Words with similar meanings will have similar vectors.
We will also explore vector arithmetic, which is a way to perform mathematical operations
on these word vectors to discover relationships between words.
Example:
If you subtract the vector for "cat" from "kitten" and add the vector for "dog," you get a vector
close to a word for a young dog, such as "puppy".
Pre-trained word vectors are created by training models on large text datasets. Each word is
mapped to a numerical vector, typically with 100 to 300 dimensions, which captures the
meaning and context of the word.
Example:
The word "banana" might be represented as a vector like this:
[0.4, -0.7, 0.1, ..., 0.9]
Example:
If we want to find out what "lion" is to "cub" as "dog" is to "puppy," we can use the
following equation:
cub ≈ lion − adult + young
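Most of the sample program for this experiment is missing; only its final print statement survives below. A minimal sketch of the preceding steps, assuming gensim's pre-trained glove-wiki-gigaword-100 vectors (an assumed but typical choice of model), might look like this:
# Install gensim first if needed: pip install gensim
import gensim.downloader as api

# Load pre-trained 100-dimensional GloVe word vectors (downloaded on first use)
word_vectors = api.load('glove-wiki-gigaword-100')

# Vector arithmetic: kitten - cat + dog should land near 'puppy'
result = word_vectors.most_similar(positive=['kitten', 'dog'], negative=['cat'], topn=1)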
print("Result of 'kitten - cat + dog':", result[0][0]) # Expected output: 'puppy' or a related word
Output:
Experiment 2: Visualizing Word Embeddings and Generating Semantically
Similar Words.
Objective:
To visualize word embeddings using dimensionality reduction techniques like PCA or t-SNE.
To select 10 words from a specific domain (e.g., sports, technology) and analyze the clusters
and relationships between them.
To generate contextually rich outputs by finding semantically similar words using pre-trained
word embeddings.
Techniques:
We will select 10 words from the technology domain and visualize their embeddings using t-SNE.
Given an input word, we will use pre-trained word vectors to find the 5 most semantically
similar words.
Sample Program
# Install required libraries: pip install gensim scikit-learn matplotlib
import numpy as np
import matplotlib.pyplot as plt
from gensim.downloader import load
from sklearn.manifold import TSNE

# Load pre-trained 100-dimensional GloVe word vectors
word_vectors = load('glove-wiki-gigaword-100')

# Select 10 words from the "technology" domain (ensure words exist in the model)
words = ['computer', 'internet', 'software', 'hardware', 'network',
         'data', 'robot', 'algorithm', 'keyboard', 'server']
vectors = np.array([word_vectors[word] for word in words])

# Reduce the vectors to 2 dimensions with t-SNE and plot each word
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
reduced_vectors = tsne.fit_transform(vectors)
plt.figure(figsize=(10, 6))
for i, word in enumerate(words):
    plt.scatter(reduced_vectors[i, 0], reduced_vectors[i, 1], label=word)
    plt.annotate(word, (reduced_vectors[i, 0], reduced_vectors[i, 1]))
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.legend()
plt.show()

# Generate 5 semantically similar words for a given input word
input_word = 'computer'
if input_word in word_vectors.key_to_index:
    print(f"5 words similar to '{input_word}':", word_vectors.most_similar(input_word, topn=5))
else:
    print(f"'{input_word}' is not in the vocabulary.")
Output:
Experiment 3: Train a custom Word2Vec model on a small dataset. Train
embeddings on a domain-specific corpus (e.g., legal, medical) and analyze how
embeddings capture domain-specific semantics.
Objective:
To train a custom Word2Vec model on a small, domain-specific corpus (e.g., legal, medical).
To analyze how the learned embeddings capture domain-specific semantics.
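The sample program for this experiment is missing. A minimal sketch, assuming gensim's Word2Vec and a tiny illustrative medical corpus (the sentences below are placeholders), might look like this:
from gensim.models import Word2Vec

# Small domain-specific (medical) corpus, already tokenized into sentences
corpus = [
    ["patient", "was", "diagnosed", "with", "diabetes", "and", "hypertension"],
    ["doctor", "prescribed", "insulin", "for", "the", "diabetic", "patient"],
    ["nurse", "recorded", "the", "patient", "blood", "pressure", "readings"],
    ["treatment", "for", "hypertension", "includes", "medication", "and", "diet"],
    ["physician", "reviewed", "the", "patient", "medical", "history", "carefully"],
]

# Train a custom Word2Vec model on the small corpus
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200)

# Analyze domain-specific semantics captured by the embeddings
print(model.wv.most_similar("patient", topn=3))
print("diabetes ~ insulin similarity:", model.wv.similarity("diabetes", "insulin"))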
Output:
Experiment 4: Use word embeddings to improve prompts for a Generative AI model. Retrieve similar
words using word embeddings, use them to enrich a GenAI prompt, generate responses for the original
and enriched prompts, and compare the outputs in terms of detail and relevance.
When interacting with Generative AI models (like GPT), the quality of the output often
depends on how well the input prompt is framed. Enhancing prompts using word
embeddings helps improve the model's understanding and provides more contextually rich
and detailed responses.
Example:
Original Prompt: "Explain the impact of AI on technology."
Enriched Prompt: "Explain the impact of AI, machine learning, deep learning, and data
science on technology."
Generate Responses:
Use a Generative AI model (e.g., OpenAI GPT) to generate responses for both the
original and enriched prompts.
Comparison: The enriched prompt will usually yield a more detailed and relevant
response.
# Step 1: Pre-defined dictionary of words and their similar terms (static word embeddings)
word_embeddings = {
    "ai": ["machine learning", "deep learning", "data science"],
    "technology": ["innovation", "automation", "computing"],
    "data": ["information", "analytics", "statistics"],
}

# Step 2: Function to find similar words using the static dictionary
def find_similar_words(word):
    if word in word_embeddings:
        return word_embeddings[word]
    else:
        return []

# Step 3: Function to enrich a prompt by expanding words that have similar terms
def enrich_prompt(prompt):
    words = prompt.lower().split()
    enriched_words = []
    for word in words:
        similar_words = find_similar_words(word.strip(".,?!"))
        if similar_words:
            enriched_words.append(word.strip(".,?!") + ", " + ", ".join(similar_words))
        else:
            enriched_words.append(word)
    return " ".join(enriched_words)

# Step 4: Original prompt
original_prompt = "Explain the impact of AI on technology."

# Step 5: Enrich the prompt with similar words
enriched_prompt = enrich_prompt(original_prompt)

# Step 6: Print the original and enriched prompts
print("Original Prompt:")
print(original_prompt)
print("\nEnriched Prompt:")
print(enriched_prompt)
Output:
Experiment 5: Use word embeddings to create meaningful sentences for creative
tasks. Retrieve similar words for a seed word. Create a sentence or story using these
words as a starting point. Write a program that: Takes a seed word. Generates similar
words. Constructs a short paragraph using these words.
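Steps 1 and 2 of the sample program (a static dictionary of similar words and a lookup helper) are missing; a minimal sketch, with placeholder seed words, might look like this:
# Step 1: Static dictionary of word embeddings (seed word -> similar words)
word_embeddings = {
    "adventure": ["journey", "exploration", "quest", "discovery"],
    "ocean": ["sea", "waves", "tides", "depths"],
}

# Step 2: Function to retrieve similar words for a seed word
def get_similar_words(seed_word):
    return word_embeddings.get(seed_word.lower(), ["No similar words found"])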
# Step 3: Function to create a short paragraph using the seed word and similar words
def create_paragraph(seed_word):
    similar_words = get_similar_words(seed_word)
    if "No similar words found" in similar_words:
        return f"Sorry, I couldn't find similar words for '{seed_word}'."
    # Construct a short story using the seed word and similar words
    paragraph = (
        f"Once upon a time, there was a great {seed_word}. "
        f"It was full of {', '.join(similar_words[:-1])}, and {similar_words[-1]}. "
        f"Everyone who experienced this {seed_word} always remembered it as a remarkable tale."
    )
    return paragraph
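A final step to take a seed word and print the generated paragraph (using the helpers sketched above) could be:
# Step 4: Take a seed word and construct the paragraph
seed_word = "adventure"
print(create_paragraph(seed_word))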
Output:
What This Program Does:
1. Uses a static dictionary of word embeddings to find similar words for a given seed word.
2. Constructs a short paragraph using the seed word and its similar words.
3. Prints the paragraph, creating a small story based on the seed word.
Experiment 6: Use a pre-trained Hugging Face model to analyze sentiment in text.
Assume a real-world application; load the sentiment analysis pipeline and analyze the
sentiment of input sentences.
Pre-trained model: A model that has already been trained on a large dataset and can
perform sentiment analysis without needing additional training.
Hugging Face Pipeline: An easy way to use pre-trained models for tasks like sentiment
analysis, text generation, translation, etc.
Real-world application: Analyzing customer reviews or social media comments to
understand user feedback.
Sample Program:
# Step 1: Install and import the necessary library
# You can uncomment and run this in Google Colab
# !pip install transformers
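The rest of the sample program is missing; a minimal sketch using the Hugging Face sentiment-analysis pipeline (the example sentences are placeholders) might look like this:
# Step 2: Load the sentiment analysis pipeline (uses a default pre-trained model)
from transformers import pipeline
sentiment_analyzer = pipeline("sentiment-analysis")

# Step 3: Analyze the sentiment of real-world style sentences (e.g., customer reviews)
sentences = [
    "The delivery was quick and the product works perfectly!",
    "The customer service was terrible and I want a refund.",
]
for sentence in sentences:
    result = sentiment_analyzer(sentence)[0]
    print(f"{sentence} -> {result['label']} (score: {result['score']:.2f})")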
Output:
What This Program Does:
Experiment 7: Summarize long texts using a pre-trained Hugging Face summarization model.
Load the summarization pipeline. Take a passage as input and obtain the summarized text.
Sample Program:
# Step 1: Import the Hugging Face pipeline
from transformers import pipeline
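Only the import survives from the sample program; a minimal sketch of the remaining steps, with a placeholder passage, might look like this:
# Step 2: Load the summarization pipeline (uses a default pre-trained model)
summarizer = pipeline("summarization")

# Step 3: Take a long passage as input and obtain the summarized text
passage = (
    "Artificial intelligence is transforming industries by automating routine tasks, "
    "improving decision making, and enabling new products and services. From healthcare "
    "to finance, organizations are adopting AI systems to analyze large volumes of data, "
    "detect patterns, and make predictions that would be difficult for humans alone."
)
summary = summarizer(passage, max_length=60, min_length=20, do_sample=False)
print("Summary:", summary[0]['summary_text'])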
Output:
What This Program Does:
Experiment 8: Install langchain, cohere, and langchain-community. Get the API key by logging
into Cohere. Load a text document from your Google Drive. Create a prompt template to display
the output in a particular manner.
Step-by-Step Explanation
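Steps 1 through 7 of the sample program are missing; a minimal sketch, assuming Google Colab, LangChain's Cohere wrapper, and a placeholder text file path in Google Drive, might look like this:
# Step 1: Install the required packages (in Colab)
# !pip install langchain langchain-community cohere

# Step 2: Import the libraries
from google.colab import drive
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.llms import Cohere

# Step 3: Set the Cohere API key obtained from the Cohere dashboard
cohere_api_key = "YOUR_COHERE_API_KEY"

# Step 4: Mount Google Drive
drive.mount('/content/drive')

# Step 5: Load a text document from Google Drive (the path is a placeholder)
with open('/content/drive/MyDrive/sample.txt', 'r') as f:
    text = f.read()

# Step 6: Create a prompt template that controls how the output is displayed
prompt = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following document as three numbered points:\n\n{text}"
)

# Step 7: Create the Cohere LLM
llm = Cohere(cohere_api_key=cohere_api_key)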
# Step 8: Create an LLMChain with the Cohere model and prompt template
chain = LLMChain(llm=llm, prompt=prompt)
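A closing step to run the chain on the loaded document (using the variables from the sketch above) could be:
# Step 9: Run the chain on the document text and print the formatted output
print(chain.run(text))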
Output:
Experiment 9: Take the Institution name as input. Use Pydantic to define the schema for the
desired output and create a custom output parser. Invoke the chain and fetch results. Extract
the following Institution-related details from Wikipedia: the founder of the Institution, when
it was founded, the current branches of the institution, how many employees work in it, and a
brief 4-line summary of the institution.
Step-by-Step Explanation:
# Step 1: Import the required libraries
import wikipediaapi
from pydantic import BaseModel
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.llms import Cohere

# Step 2: Use Pydantic to define the schema for the desired output
class InstitutionDetails(BaseModel):
    founder: str
    founded: str
    branches: str
    employees: str
    summary: str

# Step 3: Take the institution name as input
institution_name = input("Enter the institution name: ")

# Step 4: Function to fetch details from Wikipedia with user-agent specified
def fetch_wikipedia_summary(institution_name):
    wiki_wiki = wikipediaapi.Wikipedia(
        language='en',
        user_agent="InstitutionInfoBot/1.0 (contact: youremail@example.com)")
    page = wiki_wiki.page(institution_name)
    if page.exists():
        return page.text
    else:
        return None

# Step 5: Prompt template that asks for the required fields in a fixed format
prompt_template = """
Extract the following details about the institution from the text:
- Founder
- Founded (year)
- Current branches
- Number of employees
- A brief 4-line summary

Text: {text}

Answer in exactly this format:
Founder: <founder>
Founded: <founded>
Branches: <branches>
Employees: <employees>
Summary: <summary>
"""
prompt = PromptTemplate(input_variables=["text"], template=prompt_template)

# Step 6: Fetch the Wikipedia text and set up the Cohere model
wiki_text = fetch_wikipedia_summary(institution_name)
if wiki_text is None:
    raise SystemExit(f"No Wikipedia page found for '{institution_name}'.")
cohere_api_key = "YOUR_COHERE_API_KEY"
llm = Cohere(cohere_api_key=cohere_api_key)

# Step 7: Create the LLMChain, invoke it, and fetch the results
chain = LLMChain(llm=llm, prompt=prompt)
response = chain.run(wiki_text)

# Step 8: Custom output parser: convert the "Field: value" lines into the Pydantic model
def parse_response(text):
    fields = {"founder", "founded", "branches", "employees", "summary"}
    data, current_key = {}, None
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in fields:
            current_key = key.strip().lower()
            data[current_key] = value.strip()
        elif current_key == "summary" and line.strip():
            data[current_key] += " " + line.strip()  # the summary may span several lines
    return InstitutionDetails(**data)

try:
    details = parse_response(response)
    print("Institution Details:")
    print(f"Founder: {details.founder}")
    print(f"Founded: {details.founded}")
    print(f"Branches: {details.branches}")
    print(f"Employees: {details.employees}")
    print(f"Summary: {details.summary}")
except Exception as e:
    print("Could not parse the model response:", e)
Output:
Institution Details:
Founded: 1998
Experiment 10: Build a chatbot for the Indian Penal Code. We'll start by downloading the official
Indian Penal Code document, and then we'll create a chatbot that can interact with it. Users will
be able to ask questions about the Indian Penal Code and have a conversation with it.
Step-by-step Explanation
Sample Program:
# Step 1: Install necessary packages
!pip install langchain pydantic wikipedia-api openai
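The middle of the sample program (loading the Indian Penal Code document and building a question-answering chain over it) is missing. A minimal sketch, assuming the additional packages pypdf, faiss-cpu, langchain-community, and langchain-openai, an OpenAI API key set in the environment, and a locally downloaded copy of the IPC PDF (the file name below is a placeholder), might look like this:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Load the downloaded Indian Penal Code PDF (placeholder file name)
docs = PyPDFLoader("indian_penal_code.pdf").load()

# Split the document into overlapping chunks and index them in a vector store
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Build a retrieval-based question-answering chain over the indexed document
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(),
)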
# Interactive chat loop: type questions about the IPC, or "exit" to quit
while True:
    user_question = input("\nYour question: ")
    if user_question.lower() == "exit":
        print("Goodbye!")
        break
    # Answer the question with the retrieval chain sketched above (assumed name: qa_chain)
    answer = qa_chain.run(user_question)
    print("Answer:", answer)