GENAI lab
Lab Manual
Generative AI Semester 6
Course Code BAIL657C CIE Marks 50
Teaching Hours/Week (L:T:P: S) 0:0:1:0 SEE Marks 50
Credits 01 Exam Hours 100
Examination type (SEE) Practical
Course objectives:
● Understand the principles and concepts behind generative AI models
● Explain how to implement generative models using prompt design frameworks.
● Apply various Generative AI applications for increasing productivity.
● Develop Large Language Model-based Apps.
Sl.NO Experiments
1. Explore pre-trained word vectors. Explore word relationships using vector arithmetic. Perform arithmetic
operations and analyze results.
2. Use dimensionality reduction (e.g., PCA or t-SNE) to visualize word embeddings for Q 1. Select 10 words from a
specific domain (e.g., sports, technology) and visualize their embeddings. Analyze clusters and relationships.
Generate contextually rich outputs using embeddings. Write a program to generate 5 semantically similar words
for a given input.
3. Train a custom Word2Vec model on a small dataset. Train embeddings on a domain-specific corpus (e.g., legal,
medical) and analyze how embeddings capture domain-specific semantics.
4. Use word embeddings to improve prompts for a Generative AI model. Retrieve similar words using word
embeddings. Use the similar words to enrich a GenAI prompt. Use the AI model to generate responses for the
original and enriched prompts. Compare the outputs in terms of detail and relevance.
5. Use word embeddings to create meaningful sentences for creative tasks. Retrieve similar words for a seed word.
Create a sentence or story using these words as a starting point. Write a program that: Takes a seed word. Generates
similar words. Constructs a short paragraph using these words.
6. Use a pre-trained Hugging Face model to analyze sentiment in text. Assume a real-world application; load the
sentiment analysis pipeline and analyze the sentiment of input sentences.
7. Summarize long texts using a pre-trained Hugging Face summarization model. Load the summarization pipeline.
Take a passage as input and obtain the summarized text.
8. Install langchain, cohere, and langchain-community. Get the API key by logging into Cohere and obtaining the
Cohere key. Load a text document from your Google Drive. Create a prompt template to display the output in
a particular manner.
9. Take the Institution name as input. Use Pydantic to define the schema for the desired output and create a custom
output parser. Invoke the chain and fetch results. Extract the following Institution-related details from Wikipedia:
the founder of the Institution, when it was founded, the current branches of the institution, how many
employees work in it, and a brief 4-line summary of the institution.
10. Build a chatbot for the Indian Penal Code. We'll start by downloading the official Indian Penal Code document,
and then we'll create a chatbot that can interact with it. Users will be able to ask questions about the Indian Penal
Code and have a conversation with it.
Course outcomes (Course Skill Set):
At the end of the course the student will be able to:
● Develop the ability to explore and analyze word embeddings, perform vector arithmetic to investigate word
relationships, and visualize embeddings using dimensionality reduction techniques.
● Apply prompt engineering skills to real-world scenarios, such as information retrieval and text generation.
● Utilize pre-trained Hugging Face models for real-world applications, including sentiment analysis and text
summarization.
● Apply different architectures used in large language models, such as transformers, and understand their
advantages and limitations.
Assessment Details (both CIE and SEE)
The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50) and for the
SEE minimum passing mark is 35% of the maximum marks (18 out of 50 marks). A student shall be
deemed to have satisfied the academic requirements and earned the credits allotted to each subject/
course if the student secures a minimum of 40% (40 marks out of 100) in the sum total of the CIE
(Continuous Internal Evaluation) and SEE (Semester End Examination) taken together.
Experiment 1: Explore Pre-trained Word Vectors and Word Relationships Using Vector Arithmetic.
Objective:
To understand pre-trained word vectors and how they represent words as numbers in a
continuous space.
To explore word relationships using vector arithmetic.
To perform arithmetic operations on word vectors and analyze the results using
simple examples.
In this experiment, we will learn about pre-trained word vectors and how they help us
represent words in a way that computers can understand. These vectors capture the meaning
and context of words. For example, the word "apple" can be represented as a set of numbers
that encode its meaning. Words with similar meanings will have similar vectors.
We will also explore vector arithmetic, which is a way to perform mathematical operations
on these word vectors to discover relationships between words.
Example:
If you subtract the vector for "cat" from "kitten" and add the vector for "dog," you get a vector
close to a word for a young dog, such as "puppy".
Pre-trained word vectors are created by training models on large text datasets. Each word is
mapped to a numerical vector, typically with 100 to 300 dimensions, which captures the
meaning and context of the word.
Example:
The word "banana" might be represented as a vector like this:
[0.4, -0.7, 0.1, ..., 0.9]
Example:
If we want to find out what "lion" is to "cub" as "dog" is to "puppy," we can use the
following equation:
cub ≈ lion − adult + young
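Most of the sample program for this experiment is missing; only its final print statement survives below. A minimal sketch of the preceding steps, assuming gensim's pre-trained glove-wiki-gigaword-100 vectors (an assumed but typical choice of model), might look like this:
# Install gensim first if needed: pip install gensim
import gensim.downloader as api

# Load pre-trained 100-dimensional GloVe word vectors (downloaded on first use)
word_vectors = api.load('glove-wiki-gigaword-100')

# Vector arithmetic: kitten - cat + dog should land near 'puppy'
result = word_vectors.most_similar(positive=['kitten', 'dog'], negative=['cat'], topn=1)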
print("Result of 'kitten - cat + dog':", result[0][0]) # Expected output: 'puppy' or a related word
Output:
Experiment 2: Visualizing Word Embeddings and Generating Semantically
Similar Words.
Objective:
To visualize word embeddings using dimensionality reduction techniques like PCA or t-SNE.
To select 10 words from a specific domain (e.g., sports, technology) and analyze the clusters
and relationships between them.
To generate contextually rich outputs by finding semantically similar words using pre-trained
word embeddings.
Techniques:
We will select 10 words from the technology domain and visualize their embeddings using t-SNE.
Given an input word, we will use pre-trained word vectors to find the 5 most semantically
similar words.
Sample Program
# Install required libraries: pip install gensim scikit-learn matplotlib
import numpy as np
import matplotlib.pyplot as plt
from gensim.downloader import load
from sklearn.manifold import TSNE

# Load pre-trained 100-dimensional GloVe word vectors
word_vectors = load('glove-wiki-gigaword-100')

# Select 10 words from the "technology" domain (ensure words exist in the model)
words = ['computer', 'internet', 'software', 'hardware', 'network',
         'data', 'robot', 'algorithm', 'keyboard', 'server']
vectors = np.array([word_vectors[word] for word in words])

# Reduce the vectors to 2 dimensions with t-SNE and plot each word
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
reduced_vectors = tsne.fit_transform(vectors)
plt.figure(figsize=(10, 6))
for i, word in enumerate(words):
    plt.scatter(reduced_vectors[i, 0], reduced_vectors[i, 1], label=word)
    plt.annotate(word, (reduced_vectors[i, 0], reduced_vectors[i, 1]))
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.legend()
plt.show()

# Generate 5 semantically similar words for a given input word
input_word = 'computer'
if input_word in word_vectors.key_to_index:
    print(f"5 words similar to '{input_word}':", word_vectors.most_similar(input_word, topn=5))
else:
    print(f"'{input_word}' is not in the vocabulary.")
Output:
Experiment 3: Train a custom Word2Vec model on a small dataset. Train
embeddings on a domain-specific corpus (e.g., legal, medical) and analyze how
embeddings capture domain-specific semantics.
Objective:
To train a custom Word2Vec model on a small, domain-specific corpus (e.g., legal, medical).
To analyze how the learned embeddings capture domain-specific semantics.
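The sample program for this experiment is missing. A minimal sketch, assuming gensim's Word2Vec and a tiny illustrative medical corpus (the sentences below are placeholders), might look like this:
from gensim.models import Word2Vec

# Small domain-specific (medical) corpus, already tokenized into sentences
corpus = [
    ["patient", "was", "diagnosed", "with", "diabetes", "and", "hypertension"],
    ["doctor", "prescribed", "insulin", "for", "the", "diabetic", "patient"],
    ["nurse", "recorded", "the", "patient", "blood", "pressure", "readings"],
    ["treatment", "for", "hypertension", "includes", "medication", "and", "diet"],
    ["physician", "reviewed", "the", "patient", "medical", "history", "carefully"],
]

# Train a custom Word2Vec model on the small corpus
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200)

# Analyze domain-specific semantics captured by the embeddings
print(model.wv.most_similar("patient", topn=3))
print("diabetes ~ insulin similarity:", model.wv.similarity("diabetes", "insulin"))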
Output:
Experiment 4: Use word embeddings to improve prompts for a Generative AI model. Retrieve similar
words using word embeddings, use them to enrich a GenAI prompt, generate responses for the original
and enriched prompts, and compare the outputs in terms of detail and relevance.
When interacting with Generative AI models (like GPT), the quality of the output often
depends on how well the input prompt is framed. Enhancing prompts using word
embeddings helps improve the model's understanding and provides more contextually rich
and detailed responses.
Example:
Original Prompt: "Explain the impact of AI on technology."
Enriched Prompt: "Explain the impact of AI, machine learning, deep learning, and data
science on technology."
Generate Responses:
Use a Generative AI model (e.g., OpenAI GPT) to generate responses for both the
original and enriched prompts.
Comparison: The enriched prompt will usually yield a more detailed and relevant
response.
# Step 1: Pre-defined dictionary of words and their similar terms (static word embeddings)
word_embeddings = {
    "ai": ["machine learning", "deep learning", "data science"],
    "technology": ["innovation", "automation", "computing"],
    "data": ["information", "analytics", "statistics"],
}

# Step 2: Function to find similar words using the static dictionary
def find_similar_words(word):
    if word in word_embeddings:
        return word_embeddings[word]
    else:
        return []

# Step 3: Function to enrich a prompt by expanding words that have similar terms
def enrich_prompt(prompt):
    words = prompt.lower().split()
    enriched_words = []
    for word in words:
        similar_words = find_similar_words(word.strip(".,?!"))
        if similar_words:
            enriched_words.append(word.strip(".,?!") + ", " + ", ".join(similar_words))
        else:
            enriched_words.append(word)
    return " ".join(enriched_words)

# Step 4: Original prompt
original_prompt = "Explain the impact of AI on technology."

# Step 5: Enrich the prompt with similar words
enriched_prompt = enrich_prompt(original_prompt)

# Step 6: Print the original and enriched prompts
print("Original Prompt:")
print(original_prompt)
print("\nEnriched Prompt:")
print(enriched_prompt)
Output:
Experiment 5: Use word embeddings to create meaningful sentences for creative
tasks. Retrieve similar words for a seed word. Create a sentence or story using these
words as a starting point. Write a program that: Takes a seed word. Generates similar
words. Constructs a short paragraph using these words.
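Steps 1 and 2 of the sample program (a static dictionary of similar words and a lookup helper) are missing; a minimal sketch, with placeholder seed words, might look like this:
# Step 1: Static dictionary of word embeddings (seed word -> similar words)
word_embeddings = {
    "adventure": ["journey", "exploration", "quest", "discovery"],
    "ocean": ["sea", "waves", "tides", "depths"],
}

# Step 2: Function to retrieve similar words for a seed word
def get_similar_words(seed_word):
    return word_embeddings.get(seed_word.lower(), ["No similar words found"])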
# Step 3: Function to create a short paragraph using the seed word and similar words
def create_paragraph(seed_word):
    similar_words = get_similar_words(seed_word)
    if "No similar words found" in similar_words:
        return f"Sorry, I couldn't find similar words for '{seed_word}'."
    # Construct a short story using the seed word and similar words
    paragraph = (
        f"Once upon a time, there was a great {seed_word}. "
        f"It was full of {', '.join(similar_words[:-1])}, and {similar_words[-1]}. "
        f"Everyone who experienced this {seed_word} always remembered it as a remarkable tale."
    )
    return paragraph
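A final step to take a seed word and print the generated paragraph (using the helpers sketched above) could be:
# Step 4: Take a seed word and construct the paragraph
seed_word = "adventure"
print(create_paragraph(seed_word))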
Output:
What This Program Does:
1. Uses a static dictionary of word embeddings to find similar words for a given seed word.
2. Constructs a short paragraph using the seed word and its similar words.
3. Prints the paragraph, creating a small story based on the seed word.
Experiment 6: Use a pre-trained Hugging Face model to analyze sentiment in text.
Assume a real-world application; load the sentiment analysis pipeline and analyze the
sentiment of input sentences.
Pre-trained model: A model that has already been trained on a large dataset and can
perform sentiment analysis without needing additional training.
Hugging Face Pipeline: An easy way to use pre-trained models for tasks like sentiment
analysis, text generation, translation, etc.
Real-world application: Analyzing customer reviews or social media comments to
understand user feedback.
Sample Program:
# Step 1: Install and import the necessary library
# You can uncomment and run this in Google Colab
# !pip install transformers
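The rest of the sample program is missing; a minimal sketch using the Hugging Face sentiment-analysis pipeline (the example sentences are placeholders) might look like this:
# Step 2: Load the sentiment analysis pipeline (uses a default pre-trained model)
from transformers import pipeline
sentiment_analyzer = pipeline("sentiment-analysis")

# Step 3: Analyze the sentiment of real-world style sentences (e.g., customer reviews)
sentences = [
    "The delivery was quick and the product works perfectly!",
    "The customer service was terrible and I want a refund.",
]
for sentence in sentences:
    result = sentiment_analyzer(sentence)[0]
    print(f"{sentence} -> {result['label']} (score: {result['score']:.2f})")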
Output:
What This Program Does:
Experiment 7: Summarize long texts using a pre-trained Hugging Face summarization model.
Load the summarization pipeline. Take a passage as input and obtain the summarized text.
Sample Program:
# Step 1: Import the Hugging Face pipeline
from transformers import pipeline
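Only the import survives from the sample program; a minimal sketch of the remaining steps, with a placeholder passage, might look like this:
# Step 2: Load the summarization pipeline (uses a default pre-trained model)
summarizer = pipeline("summarization")

# Step 3: Take a long passage as input and obtain the summarized text
passage = (
    "Artificial intelligence is transforming industries by automating routine tasks, "
    "improving decision making, and enabling new products and services. From healthcare "
    "to finance, organizations are adopting AI systems to analyze large volumes of data, "
    "detect patterns, and make predictions that would be difficult for humans alone."
)
summary = summarizer(passage, max_length=60, min_length=20, do_sample=False)
print("Summary:", summary[0]['summary_text'])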
Output:
What This Program Does:
Experiment 8: Install langchain, cohere, and langchain-community. Get the API key by logging
into Cohere. Load a text document from your Google Drive. Create a prompt template to display
the output in a particular manner.
Step-by-Step Explanation
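Steps 1 through 7 of the sample program are missing; a minimal sketch, assuming Google Colab, LangChain's Cohere wrapper, and a placeholder text file path in Google Drive, might look like this:
# Step 1: Install the required packages (in Colab)
# !pip install langchain langchain-community cohere

# Step 2: Import the libraries
from google.colab import drive
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.llms import Cohere

# Step 3: Set the Cohere API key obtained from the Cohere dashboard
cohere_api_key = "YOUR_COHERE_API_KEY"

# Step 4: Mount Google Drive
drive.mount('/content/drive')

# Step 5: Load a text document from Google Drive (the path is a placeholder)
with open('/content/drive/MyDrive/sample.txt', 'r') as f:
    text = f.read()

# Step 6: Create a prompt template that controls how the output is displayed
prompt = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following document as three numbered points:\n\n{text}"
)

# Step 7: Create the Cohere LLM
llm = Cohere(cohere_api_key=cohere_api_key)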
# Step 8: Create an LLMChain with the Cohere model and prompt template
chain = LLMChain(llm=llm, prompt=prompt)
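A closing step to run the chain on the loaded document (using the variables from the sketch above) could be:
# Step 9: Run the chain on the document text and print the formatted output
print(chain.run(text))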
Output:
Experiment 9: Take the Institution name as input. Use Pydantic to define the schema for the
desired output and create a custom output parser. Invoke the chain and fetch results. Extract
the following Institution-related details from Wikipedia: the founder of the Institution, when
it was founded, the current branches of the institution, how many employees work in it, and a
brief 4-line summary of the institution.
Step-by-Step Explanation:
# Step 1: Import the required libraries
import wikipediaapi
from pydantic import BaseModel
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.llms import Cohere

# Step 2: Use Pydantic to define the schema for the desired output
class InstitutionDetails(BaseModel):
    founder: str
    founded: str
    branches: str
    employees: str
    summary: str

# Step 3: Take the institution name as input
institution_name = input("Enter the institution name: ")

# Step 4: Function to fetch details from Wikipedia with user-agent specified
def fetch_wikipedia_summary(institution_name):
    wiki_wiki = wikipediaapi.Wikipedia(
        language='en',
        user_agent="InstitutionInfoBot/1.0 (contact: youremail@example.com)")
    page = wiki_wiki.page(institution_name)
    if page.exists():
        return page.text
    else:
        return None

# Step 5: Prompt template that asks for the required fields in a fixed format
prompt_template = """
Extract the following details about the institution from the text:
- Founder
- Founded (year)
- Current branches
- Number of employees
- A brief 4-line summary

Text: {text}

Answer in exactly this format:
Founder: <founder>
Founded: <founded>
Branches: <branches>
Employees: <employees>
Summary: <summary>
"""
prompt = PromptTemplate(input_variables=["text"], template=prompt_template)

# Step 6: Fetch the Wikipedia text and set up the Cohere model
wiki_text = fetch_wikipedia_summary(institution_name)
if wiki_text is None:
    raise SystemExit(f"No Wikipedia page found for '{institution_name}'.")
cohere_api_key = "YOUR_COHERE_API_KEY"
llm = Cohere(cohere_api_key=cohere_api_key)

# Step 7: Create the LLMChain, invoke it, and fetch the results
chain = LLMChain(llm=llm, prompt=prompt)
response = chain.run(wiki_text)

# Step 8: Custom output parser: convert the "Field: value" lines into the Pydantic model
def parse_response(text):
    fields = {"founder", "founded", "branches", "employees", "summary"}
    data, current_key = {}, None
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in fields:
            current_key = key.strip().lower()
            data[current_key] = value.strip()
        elif current_key == "summary" and line.strip():
            data[current_key] += " " + line.strip()  # the summary may span several lines
    return InstitutionDetails(**data)

try:
    details = parse_response(response)
    print("Institution Details:")
    print(f"Founder: {details.founder}")
    print(f"Founded: {details.founded}")
    print(f"Branches: {details.branches}")
    print(f"Employees: {details.employees}")
    print(f"Summary: {details.summary}")
except Exception as e:
    print("Could not parse the model response:", e)
Output:
Institution Details:
Founded: 1998
Experiment 10: Build a chatbot for the Indian Penal Code. We'll start by downloading the official
Indian Penal Code document, and then we'll create a chatbot that can interact with it. Users will
be able to ask questions about the Indian Penal Code and have a conversation with it.
Step-by-step Explanation
Sample Program:
# Step 1: Install necessary packages
!pip install langchain pydantic wikipedia-api openai
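The middle of the sample program (loading the Indian Penal Code document and building a question-answering chain over it) is missing. A minimal sketch, assuming the additional packages pypdf, faiss-cpu, langchain-community, and langchain-openai, an OpenAI API key set in the environment, and a locally downloaded copy of the IPC PDF (the file name below is a placeholder), might look like this:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Load the downloaded Indian Penal Code PDF (placeholder file name)
docs = PyPDFLoader("indian_penal_code.pdf").load()

# Split the document into overlapping chunks and index them in a vector store
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Build a retrieval-based question-answering chain over the indexed document
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(),
)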
# Interactive chat loop: type questions about the IPC, or "exit" to quit
while True:
    user_question = input("\nYour question: ")
    if user_question.lower() == "exit":
        print("Goodbye!")
        break
    # Answer the question with the retrieval chain sketched above (assumed name: qa_chain)
    answer = qa_chain.run(user_question)
    print("Answer:", answer)