Hyperparameter Tuning for Poetry Generation to Enhance Creativity
Bachelor of Technology
in
Computer Science and Engineering
by
Manish Goyal (20BCE2849)
May, 2024
DECLARATION
I hereby declare that the thesis entitled “Hyperparameter Tuning for Poetry
Generation to Enhance Creativity” submitted by me, for the award of the degree of Bachelor
of Technology in Computer Science and Engineering to VIT, is a record of bonafide work carried out by me under the
supervision of Dr. Dinesh R.
I further declare that the work reported in this thesis has not been submitted and will not
be submitted, either in part or in full, for the award of any other degree or diploma in this
institute or any other institute or university.
Place : Vellore
Date : 09/05/2024
CERTIFICATE
This is to certify that the thesis entitled “Hyperparameter Tuning for Poetry
Generation to Enhance Creativity” submitted by Manish Goyal (20BCE2849), School of
Computer Science and Engineering, VIT, for the award of the degree of Bachelor of
Technology in Computer Science and Engineering, is a record of bonafide work carried out by him under my
supervision during the period 01.12.2023 to 30.04.2024, as per the VIT code of academic and
research ethics.
The contents of this report have not been submitted and will not be submitted either in
part or in full, for the award of any other degree or diploma in this institute or any other institute
or university. The thesis fulfills the requirements and regulations of the University and in my
opinion meets the necessary standards for submission.
Place : Vellore
Date : 09/05/2024
ACKNOWLEDGEMENTS
I extend my heartfelt gratitude and appreciation to all those whose
contributions made my Capstone Project course a truly enriching and rewarding journey.
Foremost, I express my gratitude to VIT Vellore and SCOPE for giving me the opportunity to
partake in this course and for their steadfast support throughout my project. Their guidance,
wealth of knowledge, and expertise have been pivotal in deepening my understanding of
essential concepts. Special appreciation goes to my mentor, Dr. Dinesh R, whose unwavering
guidance, generous sharing of time and resources, and expert insights have empowered me to
acquire practical skills and insights in the field. His mentorship has played a pivotal role in
shaping my comprehension of industry dynamics and preparing me for future career endeavors.
Furthermore, I am indebted to the support and encouragement of my reviewers. Their keen
interest and fervor for the project have served as a constant source of inspiration, driving me
forward and instilling in me a deeper passion for the subject matter. I am truly grateful for the
invaluable learning opportunities they provided. In closing, I extend my sincerest thanks to all
those who have supported and believed in me throughout this journey. Your contributions have
been instrumental in my growth and development, and I am profoundly grateful for your
unwavering support.
Manish Goyal
Executive Summary
Our project focuses on developing advanced models for automated poetry generation,
leveraging cutting-edge natural language processing techniques. By harnessing pretrained
models such as T5, Mistral 7B, and GPT2, we have created a versatile framework capable of
crafting poetry in various forms including ballads, epics, and odes. Through iterative training
and fine-tuning processes, we continuously enhance the models' capabilities, striving for
improved poetry generation quality. A key aspect of our work involves comparative analysis
among different models, enabling us to identify their strengths and weaknesses. This informs
our decision-making process regarding model selection and further refinement strategies.
Additionally, we prioritize documentation of system modifications and enhancements,
ensuring that our framework evolves in alignment with advancements in NLP technologies. In
summary, our project represents a significant contribution to AI-driven creative expression,
advancing the field of automated poetry generation while exploring broader applications of
natural language processing in creative domains.
Table of Contents
ACKNOWLEDGEMENTS i
Executive Summary ii
List of Figures v
List of Tables vi
Abbreviations vii
1. Introduction 1
1.1 Objectives 1
1.2 Motivation 1
1.3 Background 2
3. Technical Specification 9
3.1 Requirements 9
3.1.1 Functional Requirements 9
3.1.2 Non-functional Requirements 10
4. Design Approach and Details 14
4.2 Design 17
4.2.1 Data Flow Diagram 17
4.2.2 Use Case Diagram 20
4.2.3 Class Diagram 20
4.2.4 Sequence Diagram 21
5.3 Testing 25
7. Summary 30
8. References 31
List of Figures
Figure i. Basic System Architecture
Figure ii. Transformer Architecture
Figure iii. Adam vs AdamW
Figure iv. Level 0 DFD
Figure v. Level 1 DFD 18
Figure vi. Data Processing Process Level 2 DFD 18
Figure vii. Prompt Engineering Level 2 DFD 19
Figure viii. Model Selection and Fine-Tuning Process Level 2 DFD 19
Figure ix. Use Case Diagram 20
Figure x. Class Diagram 20
Figure xi. Sequence Diagram 21
Figure xii. Gantt Chart 22
Figure xiv. Poem Analysis Example
Figure xiii. Evaluating Train and Loss
Figure xv. An Example of Sentiment Analysis 26
Figure xvi. Mistral 7B GPU Utilisation
Figure xvii. Mistral Training Stats 26
Figure xviii. Mistral GPU Usage
Figure xix. Mistral Generated Poem 27
Figure xx. GPT2 Generated Poem 28
Figure xxi. FastAI's Suggested LR 28
Figure xxii. The T5 Model's Start 29
Figure xxiii. Continuous Model Generation GPT2 Modular 29
Figure xxiv. Different Generated Forms 30
List of Tables
NA
Abbreviations
1. Introduction
1.1 Objectives
The domain of our work lies at the intersection of Natural Language Processing (NLP)
and creative writing, specifically focusing on the refinement of Large Language Models
(LLMs) to generate poetry with stylistic coherence and creativity. Leveraging advanced
techniques in hyperparameter tuning and fine-tuning pre-trained models like FLAN T5, Mistral
7B and GPT2, the project aims to augment the capability of LLMs in crafting poetry across
specific styles and themes. Ultimately, we seek to demonstrate the potential of LLMs as tools
for fostering artistic expression and creativity within poetry generation. By overcoming
limitations associated with generic text production, this project underscores the potential of
LLMs to make a meaningful contribution to the field of creative writing.
1.2 Motivation
Poetry is art that thrives when we express ourselves creatively and emotionally. While
there have been a lot of NLP advancements recently, generating creative text formats like
poetry remains a challenge. This project is driven by the potential of LLMs to bridge this gap
and empower artistic exploration. The problem with many current systems designed
specifically for generating poems is that they often produce generic lines instead of original
ones, due partly or wholly to limitations within their programming paradigms, which leaves
considerable room for improvement in artistic AI design. Powerful
models exist, but they're often closed source and computationally expensive. This project seeks
to address this by refining open-source LLMs using readily available resources like free tier
GPUs offered by Google Colab/Kaggle notebooks. Our motivation lies not only in pushing
LLM capabilities but also in fostering a future where these models become valuable tools for
artists and writers, not as a replacement but as partners for enhancing creativity. We believe this
project can demonstrate the power of LLMs as collaborators, unlocking new avenues for
creative exploration within poetry generation. By making LLM exploration more accessible
and cost-effective, this research can empower a wider range of researchers and students to
contribute to this exciting field.
1.3 Background
Among the diverse applications of LLMs, poetry generation stands out as a particularly
intriguing and challenging endeavor. Traditional approaches to poetry generation have often
relied on rule-based systems or statistical methods, which may struggle to capture the
intricacies of poetic language and artistic expression. In contrast, LLMs based on transformer
architectures have demonstrated remarkable capabilities in producing coherent and
contextually relevant text across different domains. LLMs offer a promising avenue for
automated poetry generation due to their ability to learn complex patterns and structures from
vast amounts of textual data. Hyperparameter tuning, a process of optimizing the configuration
parameters of a machine learning model, has emerged as a key technique for enhancing the
performance and capabilities of LLMs. Through this endeavor, the project aims to demonstrate
the potential of LLMs as powerful tools for fostering artistic expression and creativity within
the realm of poetry.
Recent research in Natural Language Processing has witnessed a surge in the application
of Large Language Models for various tasks, including question answering and machine
translation. However, a gap exists in leveraging LLMs for creative writing tasks, particularly
those requiring stylistic coherence and artistic expression, such as poetry generation. Some of
the findings are listed below along with their respective research papers:
1. Hyperband, A Novel Bandit-Based Approach to Hyperparameter Optimization:
Experiments on model selection tasks demonstrate the effectiveness of Hyperband in
optimizing hyperparameters for complex machine learning models.
2. Bits of Grass, Does GPT already know how to write like Whitman: The ability of
GPT-3.5, GPT-3.5-turbo (ChatGPT), and GPT-4 models to generate poems in the style
of specific authors using zero-shot and many-shot prompts was examined in this study.
The performance of models not fine-tuned for generating poetry in the style of specific
authors was assessed via automated evaluation. Findings indicated that without fine-
tuning, even when provided with maximum examples, these models did not generate
poetry in the desired style. Recommendations were made for future work to analyse
GPT's ability to write poetry in the style of other poets, particularly those using
structured and rhymed writing, and to investigate how few-shot prompt engineering
can enhance the models' ability to generate poetry in requested styles.
3. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: This
study explores how the ability of large language models to perform complex reasoning
can be significantly enhanced through the generation of a chain of thought—a series
of intermediate reasoning steps. The emergence of such reasoning abilities in
sufficiently large language models is demonstrated through a method called chain of
thought prompting, where a few demonstrations of chain of thought are provided as
exemplars in the prompt. Experiments on three large language models reveal that chain
of thought prompting improves performance on various arithmetic, commonsense, and
symbolic reasoning tasks, with striking empirical gains observed. For instance,
prompting a 540B-parameter language model with just eight chain of thought
exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word
problems, surpassing even fine-tuned GPT-3 with a verifier.
4. Training language models to follow instructions with human feedback: This paper
explores the alignment of LLMs with user intent through fine-tuning with human
feedback. Through a method termed InstructGPT, the models are trained using
supervised learning on labeller demonstrations of desired model behaviour, followed
by reinforcement learning from human feedback. Human evaluations demonstrate that
outputs from the 1.3B parameter InstructGPT model are preferred to those from the
175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models
exhibit improvements in truthfulness and reductions in toxic output generation, with
minimal performance regressions on public NLP datasets. While InstructGPT still
makes simple mistakes, findings suggest that fine-tuning with human feedback holds
promise for aligning language models with human intent.
5. Scaling Instruction-Finetuned Language Models: This paper explores instruction
finetuning of language models with a focus on scaling the number of tasks, model size,
and finetuning on chain-of-thought data. Results demonstrate dramatic improvements
in performance across various model classes, prompting setups, and evaluation
benchmarks. For instance, Flan-PaLM 540B instruction finetuned on 1.8K tasks
achieves state-of-the-art performance on several benchmarks, including 75.2% on five-
shot MMLU. Flan-T5 models are also released, surpassing baseline T5 models by a
large margin. The study underscores the generality of instruction finetuning in
enhancing the performance and usability of pretrained language models.
6. The Flan Collection: Designing Data and Methods for Effective Instruction
Tuning: This study examines the design choices of instruction tuning methods,
focusing on the development of Flan 2022. Through ablation studies, it highlights the
importance of task balancing and enrichment techniques in improving model
performance. Training with mixed prompt settings, including zero-shot, few-shot, and
chain-of-thought, yields stronger performance across all settings. Flan-T5 requires less
finetuning and converges faster than T5 on single downstream tasks, making it a more
computationally-efficient starting checkpoint. The Flan 2022 collection, including
datasets, templates, and methods, is publicly available for further research.
7. Direct Preference Optimization: Your Language Model is Secretly a Reward
Model: The paper introduces a novel method called DPO for fine-tuning large-scale
unsupervised LMs to align with human preferences. Unlike existing reinforcement
learning methods, DPO optimizes the language model's behaviour using a simple
classification loss, eliminating the need for complex RL procedures. The study
demonstrates that DPO achieves comparable or better performance than existing
RLHF algorithms, particularly in sentiment modulation, summarization, and dialogue
tasks. The authors also discuss potential limitations and future research directions, such
as generalization of DPO policies and scaling to larger models. Overall, DPO offers a
computationally lightweight and effective approach to steering language model
behaviour according to human preferences.
8. Unsupervised Cross-Task Generalization via Retrieval Augmentation: The paper
introduces a retrieval-augmentation method called ReCross, aimed at enhancing the
cross-task generalization ability of language models. By leveraging unlabelled
examples as queries, ReCross retrieves a small subset of upstream data, which is then
used to update the multi-task model for improved generalization. The method
combines efficient dense retrieval and pair-wise reranking techniques, resulting in
significant performance improvements over non-retrieval methods and other baseline
methods. The research demonstrates the effectiveness of ReCross in improving cross-
task generalization in unsupervised settings, particularly in sentiment modulation,
summarization, and dialogue tasks. They also discuss potential future directions for
enhancing the re-learning stage, extending distant supervision mining, and analysing
the correlation between upstream data and target tasks.
9. Crosslingual Generalization through Multitask Finetuning: The paper explores the
effectiveness of MTF in enhancing the zero-shot task generalization ability of large
multilingual language models. MTF was applied to pretrained multilingual models
such as BLOOM and mT5, resulting in finetuned variants called BLOOMZ and mT0.
It was found that finetuning these large multilingual models on English tasks with
English prompts enables task generalization to non-English languages present in the
pretraining corpus. Moreover, finetuning on multilingual tasks with English prompts
further improves performance on both English and non-English tasks, leading to
various state-of-the-art zero-shot results. Additionally, the study investigates the
effectiveness of finetuning on multilingual tasks with prompts that have been machine-
translated from English to match the language of each dataset, demonstrating better
performance on human-written prompts in respective languages. Surprisingly, the
models exhibit zero-shot generalization capabilities to tasks in languages they have
never intentionally seen, suggesting the acquisition of higher-level task- and language-
agnostic capabilities. The paper introduces xP3, a composite of supervised datasets in
46 languages with English and machine-translated prompts.
10. Data-Efficient Finetuning Using Cross-Task Nearest Neighbours: The paper
introduces DEFT, a method for improving the efficiency of finetuning large language
models by leveraging cross-task nearest neighbours retrieved from a multitask data
pool. The study demonstrates that DEFT significantly outperforms traditional
finetuning methods in terms of data efficiency, yielding superior performance on
various held-out tasks. DEFT also provides better initialization for few shot finetuning
on target-task data, showcasing its effectiveness in scenarios with limited labelled data.
11. ChatPLUG: Open-Domain Generative Dialogue System with Internet-
Augmented Instruction Tuning for Digital Human: The paper introduces
ChatPLUG, a Chinese open-domain dialogue system for digital human applications,
trained on a diverse range of dialogue tasks using internet-augmented instruction
tuning. Unlike other models focusing solely on large-scale pre-training, ChatPLUG
aims to achieve practicality and versatility by incorporating various skills through
instruction tuning. It combines large-scale pre-training with instruction tuning to align
the dialogue agent with user intent and task-specific skills, outperforming existing
Chinese dialogue systems. ChatPLUG demonstrates strong multi-task generalization
and possesses fundamental skills such as open-world knowledge, distinct personality,
and multi-turn memory. The system's deployment in real-world applications like Smart
Speaker and Instant Message applications showcases its practical utility.
12. Unveiling the Pitfalls of Knowledge Editing for Large Language Models: The
paper explores the risks associated with editing knowledge in LLMs, emphasizing
concerns of knowledge conflict and distortion that could lead to unintended
consequences. It introduces benchmark datasets and innovative evaluation metrics to
assess these risks. Knowledge conflict arises when groups of edited facts clash
logically, magnifying inconsistencies in LLMs, while knowledge distortion occurs
when editing parameters warp the innate knowledge structure of LLMs. Experimental
results demonstrate that knowledge editing may inadvertently introduce unintended
consequences, warranting attention in future research. The CONFLICTEDIT dataset
and metrics like Conflict Magnitude are introduced to quantify knowledge conflicts,
while MLE serves as a simple solution to evaluate the impact of knowledge distortion
on post-edited models.
13. InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models:
INSTRUCTZERO is a method that optimizes a soft prompt to generate instructions for
black-box language models, outperforming other auto-instruction methods across
various tasks. It leverages an open-source LLM to convert the soft prompt into a
human-readable instruction, which is then evaluated on the target task. Bayesian
optimization guides the optimization process by proposing new prompts to improve
zero-shot performance. Despite its limitations in using only one open-source LLM and
simpler tasks, INSTRUCTZERO provides a promising approach for optimizing
instructions without direct access to the black-box model. It reduces the complexity of
instruction optimization by operating in a low-dimensional space and demonstrates
competitive performance in generating task-relevant instructions.
14. LongForm: Optimizing Instruction Tuning for Long Text Generation with
Corpus Extraction: The LongForm dataset enables instruction tuning for language
models, enhancing their understanding of user intent and improving generalization
across various tasks. It leverages human-written documents augmented with
instructions, providing a cost-effective and clean dataset suitable for long text
generation. Finetuning models on LongForm outperforms larger models without
instruction tuning and prior instruction-tuned models, achieving significant
improvements in tasks like story/recipe generation and long-form question answering.
Additionally, LongForm models demonstrate effectiveness in following and answering
multilingual instructions. While the approach excels in long text generation, it has
limitations in structured prediction tasks and may encounter hallucination issues
similar to other large language models. Nonetheless, the benefits of LongForm models
for research outweigh the associated risks, especially in facilitating exploration and
improvement of instruction-following language models.
15. Visual Instruction Tuning: The paper introduces LLaVA, a large multimodal model
combining a vision encoder and a LLM to enhance visual and language understanding.
By leveraging language-only GPT-4 to generate multimodal language-image
instruction-following data, LLaVA achieves impressive chat abilities and outperforms
GPT-4 on a synthetic multimodal instruction-following dataset by 85.1%. When fine-
tuned on Science QA, LLaVA and GPT-4 achieve a new state-of-the-art accuracy. This
innovative approach addresses the less-explored realm of instruction tuning in the
multimodal field, demonstrating the potential of multimodal models in understanding
and following instructions across different modalities. The availability of GPT-4
generated visual instruction tuning data, along with the model and code, fosters further
research in visual instruction following and multimodal understanding.
Based on the research carried out, some existing gaps were identified. Some of them are
explored below:
2. Prompt Engineering: Research suggests that LLMs without fine-tuning struggle with
specific author styles [2].
3. Evaluation Metrics: Current evaluation methods often combine human evaluation
with automated metrics [2, 8]. These approaches, though valuable, do not incorporate
additional metrics specific to poetry, such as rhyme scheme adherence, meter detection,
or sentiment analysis tailored for poetic language.
4. Knowledge Integration: Studies on knowledge editing for LLMs highlight potential
risks [12]. However, incorporating domain-specific knowledge (e.g., poetry thesaurus,
rhyme dictionaries) could potentially enhance the LLM's ability to generate creative
and stylistically coherent poems.
5. Multimodal Poetry Generation: Research on multimodal instruction tuning focuses
primarily on language and vision [15]. There is potential for incorporating visual
elements (e.g., paintings, photographs) as prompts or inspiration for the LLM to
generate poems. This could be a novel approach to explore the interplay between visual
and poetic creativity.
6. Generalizability and Explainability: While instruction tuning offers promising
results [6, 14], limited research explores the generalizability of these techniques across
different poetry styles or forms. We can investigate how well models trained on specific
poetic styles perform on unseen styles.
7. Human-in-the-Loop Poetry Generation: While some studies employ human
feedback for fine-tuning [4], we can build interactive poem generation systems. This could
involve allowing users to iteratively refine prompts or provide feedback on generated
poems to guide the LLM towards a desired style or theme.
These are just some potential research gaps that we can explore in the project. By
addressing these gaps, we can contribute to the advancement of creative LLM applications and
enhance the quality and variety of AI-generated poetry.
The challenge lies in optimizing LLMs to produce poems that not only adhere to specific
styles, themes, and authorial voices but also resonate with aesthetic sensibilities. There is a
pressing need to explore novel strategies, such as hyperparameter tuning and architectural
exploration, to enhance the creative output and stylistic coherence of LLM-generated poetry.
By addressing these limitations, this project aims to demonstrate the potential of LLMs as
powerful tools for fostering artistic expression and creativity within the domain of poetry
generation. This endeavor underscores the potential of fine-tuning LLMs for specialized tasks
like poem generation, emphasizing the significance of deliberate data selection and evaluation
methods in achieving desired stylistic outcomes.
3. Technical Specification
3.1 Requirements
The functional requirements describe the specific features and capabilities that the
software system must possess to fulfil its intended purpose. These requirements serve as the
basis for defining the system's behavior and functionality. Some of them are listed below:
1. Training Pipeline: The system should have a training pipeline capable of fine-tuning
pre-trained LLM models on a curated poetry corpus. It should support the integration of
different pre-trained LLM models such as GPT-2, T5, or BART. The pipeline should
facilitate hyperparameter tuning and optimization for poetry generation, including
temperature, sampling strategies, and prompt engineering.
2. Data Management: The system should manage training data effectively, including the
ingestion of poetry corpora from various sources such as Gutenberg, Kaggle, and web-
scraped collections. It should provide preprocessing capabilities to clean and format the
training data for model training.
3. Model Integration: System should integrate with a pre-trained LLM suitable for text
generation (e.g., GPT-2, T5). The system should be capable of fine-tuning the LLM for the
specific task of poetry generation.
4. Prompt Engineering: System should provide an interface for creating and managing
different types of prompts for poem generation. There should be functionality to specify
style, theme, poem length, and potentially rhyme scheme within the prompt.
5. Poem Generation: System should generate poems based on user-provided prompts.
Users should be able to control the length and potentially the rhyme scheme of the
generated poems (a minimal prompt-and-generation sketch is given after this list).
6. Model Evaluation: The system should include mechanisms for evaluating the quality of
generated poems using both automated metrics and human assessment. It should support
the calculation of readability scores, grammatical correctness, and stylistic coherence of
generated poems.
7. Model Deployment: The system should enable the deployment of trained LLM models
for real-time poetry generation. It should support integration with external applications
or platforms for accessing generated poems.
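To ground requirements 1, 4, and 5, the sketch below shows one way a style/theme/length prompt could be assembled and passed to a Hugging Face causal language model with sampling controls. The checkpoint name, prompt template, and parameter values are illustrative assumptions rather than fixed choices of the system.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; a fine-tuned poetry checkpoint would be used in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical prompt template covering style, theme, and length (requirement 4).
prompt = "Write a ballad about the monsoon in roughly 8 lines:\n"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=120,      # rough control over poem length (requirement 5)
    do_sample=True,          # sampling strategy (requirement 1)
    temperature=0.9,         # higher values favour more adventurous word choices
    top_p=0.95,
    no_repeat_ngram_size=3,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))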
The non-functional requirements specify the qualities or characteristics that the system
must exhibit. These requirements address the system's performance, usability, and other
attributes that contribute to its overall quality and effectiveness. Some of them are listed below:
3.2 Feasibility Study
• Technical Skills: The project requires expertise in natural language processing, LLM
fine-tuning techniques, and potentially software development for the user interface.
• Available Resources: Several pre-trained LLMs are publicly available, and open-
source libraries can be used for LLM fine-tuning and text generation. Cloud platforms
offer powerful computing resources for training and running the model.
• Technical Challenges:
o Fine-tuning LLMs specifically for poetry generation requires expertise and
experimentation to achieve optimal results.
o Prompt engineering plays a crucial role in guiding the LLM towards desired styles.
Considerable research and development efforts might be needed to refine prompt
creation techniques.
o Evaluating the creativity and stylistic coherence of generated poems remains
challenging. Developing robust evaluation metrics is an ongoing research area.
• Overall: The project is technically feasible with the available resources and
advancements in NLP. Addressing challenges related to fine-tuning, prompt
engineering, and evaluation will require research and development efforts.
• Costs:
o Computational Resources: Training and fine-tuning LLMs can be
computationally expensive, requiring access to powerful GPUs or cloud-based
platforms. This might incur significant costs depending on the chosen LLM model
and training duration.
o Data Collection: Curating a high-quality poetry corpus might require acquiring
data from paid sources or dedicating resources to scraping and cleaning publicly
available data.
o Development: Developing the system, including the user interface and integration
with the LLM, might involve developing new tools and interfaces.
• Benefits:
o Educational Tools: The project could lead to educational applications, potentially
creating interactive platforms for learning about poetry styles and fostering
creativity.
o Content Creation: The system could be used for generating creative text formats
like poems for marketing materials, advertisements, or personalized greetings.
o Accessibility: The project could offer new avenues for artistic expression,
potentially making poetry creation more accessible to non-writers.
• Overall: Economic feasibility depends on the project's scale and monetization strategy.
Open-source tools and exploring free data sources could reduce financial burden.
• Positive Impacts:
o Promote Creativity: The project can encourage creative expression and provide a
tool for exploring different poetic styles.
o Accessibility: The system can make poetry creation more accessible to individuals
who might not consider themselves writers.
o Educational Value: The project has the potential to be used for educational
purposes, enhancing learning about poetry and creative writing.
• Negative Impacts:
o Plagiarism Concerns: Generated poems might be misused for plagiarism if proper
attribution is not established.
o Job Displacement: There's a minimal risk of the project displacing professional
poets or writers, but it could potentially affect freelance content creation jobs that
rely on writing generic poems.
o Ethical Considerations: AI-generated content raises ethical concerns about
potential biases or offensive outputs. The project should prioritize responsible AI
development practices to mitigate these risks.
• Overall: The social impact is mostly positive, promoting creativity and accessibility.
Addressing ethical concerns through responsible development practices is crucial.
3.3 System Specification
• Computing Power:
o High-performance GPUs are essential for efficient training and fine-tuning of large
language models.
o Cloud-based platforms like Google Colab, Kaggle notebooks, Amazon SageMaker,
or Microsoft Azure Machine Learning offer access to powerful GPUs on a pay-
as-you-go basis.
• Storage:
o Sufficient storage capacity is required to house the pre-trained LLM model, the
curated poetry corpus, and any intermediate training files.
• System Memory:
o Large amounts of RAM are crucial for processing large text datasets and running
the LLM during generation. The specific amount required will depend on the
chosen LLM model.
• Networking:
o Reliable internet connectivity is essential for accessing online resources,
downloading datasets, and collaborating with team members.
o High-bandwidth internet connections will facilitate efficient data transfer and
communication.
• Operating System:
o A Linux-based operating system is preferred if the model is run locally.
• Frameworks:
o Huggingface Transformers will be used for accessing the pre-trained models.
o PyTorch and TensorFlow will be used to implement LLM fine-tuning and poem
generation functionalities.
• Programming Language and Software:
o Python will be used along with Jupyter notebooks to execute, visualize, and
monitor the whole process.
o CUDA and cuDNN will be used for accessing GPU capabilities through Python (a quick
availability check is sketched below).
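As a quick sanity check that the CUDA and cuDNN stack is reachable from Python, a minimal sketch (assuming PyTorch is installed) is shown below.
import torch

# Report whether a CUDA-capable GPU is visible to PyTorch and which device it is.
if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
    print("cuDNN enabled:", torch.backends.cudnn.enabled)
else:
    print("No GPU detected; training will fall back to CPU.")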
3.3.3 Standards and Policies
1. Data Privacy and Security: The project must comply with relevant data privacy
regulations, such as GDPR or CCPA, when collecting, storing, and processing user data
or personal information. Measures should be implemented to secure sensitive data and
protect against unauthorized access or data breaches.
2. Ethical Guidelines: Adherence to ethical guidelines for AI research and development
is essential, including principles of fairness, transparency, and accountability. The
project should strive to mitigate biases in training data and model outputs, and avoid
harmful content generation.
3. Intellectual Property Rights: Respect for intellectual property rights, including
copyright and licensing agreements, is paramount when using third-party datasets, pre-
trained models, or poetry corpora. Proper attribution and permission should be obtained
for any copyrighted materials used in the project, and licensing terms should be
followed accordingly.
4. Model Deployment and Accessibility: Considerations should be given to the
accessibility and usability of the developed model, ensuring that it can be easily
deployed and integrated into applications or platforms. Documentation and tutorials
should be provided to facilitate usage by developers and end-users, and efforts should
be made to address potential biases or limitations in the model's performance.
4. Design Approach and Details
1. Data Collection and Preprocessing: Data is collected from sources such as the Poetry
Foundation, the Kaggle poetry dataset, and web-scraped collections. Data preprocessing
techniques are applied to clean, tokenize, and format the raw text data for training.
5. Deployment and Accessibility: The trained and fine-tuned LLM models, along with
the prompt engineering strategies, are deployed and made accessible for further
experimentation and integration. Documentation and tutorials are provided to facilitate
the usage of the developed models by developers and end-users. Efforts are made to
address potential biases or limitations in the model's performance, ensuring fairness and
inclusivity in its usage.
processing and creative AI. Updates and enhancements are made to the system
architecture, data collection, preprocessing techniques, model training, and evaluation
methodologies to further enhance the creative capabilities of the LLMs for poetry
generation.
AdamW provides improved convergence. AdamW's adaptive learning rates and weight
decay can help the transformer converge to a good solution faster compared to simpler
optimizers like SGD. It creates better generalization. By preventing overfitting through weight
decay, AdamW can lead to models that perform well on unseen data. Figure iii shows the
training loss curves for both Adam and AdamW optimizers. The loss decreases over epochs for
both optimizers, but AdamW may converge slightly faster and with less fluctuation due to the
inclusion of weight decay regularization.
The Adam/AdamW update can be summarised as follows:
• Corrected first-moment estimate ($\hat{m}_t$):
o This adjusts for bias in the initial estimate of the mean: $\hat{m}_t = m_t / (1 - \beta_1^t)$.
• Corrected second-moment (variance) estimate ($\hat{v}_t$):
o Similar to $\hat{m}_t$, this corrects for bias in the initial variance estimate: $\hat{v}_t = v_t / (1 - \beta_2^t)$.
• Parameter update:
o The learning rate ($\eta$) is scaled by the inverse square root of the corrected variance, with a small $\varepsilon$ added for numerical stability, and the parameter is updated using the corrected mean: $\theta_{t+1} = \theta_t - \eta \, \hat{m}_t / (\sqrt{\hat{v}_t} + \varepsilon)$.
• Decoupled weight decay (AdamW):
o AdamW applies the weight-decay term directly to the parameters, $\theta_{t+1} = \theta_t - \eta \left( \hat{m}_t / (\sqrt{\hat{v}_t} + \varepsilon) + \lambda \theta_t \right)$, rather than folding it into the gradient. This avoids the potential issue in the standard Adam formulation where the weight-decay contribution is rescaled by the adaptive moment estimates.
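To make the decoupled weight decay concrete, the following is a minimal PyTorch sketch; the stand-in model and the learning-rate and decay values are illustrative assumptions rather than the settings used in our experiments.
import torch
import torch.nn as nn

model = nn.Linear(768, 768)  # stand-in for a transformer sub-layer's parameters

# AdamW applies weight decay directly to the parameters (decoupled),
# instead of folding an L2 penalty into the gradient as classic Adam does.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-5,             # learning rate (eta)
    betas=(0.9, 0.999),  # beta_1 and beta_2 for the moment estimates
    eps=1e-8,            # epsilon for numerical stability
    weight_decay=0.01,   # decoupled weight-decay coefficient (lambda)
)

loss = model(torch.randn(4, 768)).pow(2).mean()  # dummy objective for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()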
4.2 Design
4.2.1 Data Flow Diagram
Figure iv. Level 0 DFD
Figure v. Level 1 DFD
Figure vi. Data Processing Process Level 2 DFD
Figure vii. Prompt Engineering Level 2 DFD
4.2.2 Use Case Diagram
4.2.4 Sequence Diagram
In undertaking this project, it's essential to consider various constraints, alternatives, and
tradeoffs. Firstly, constraints include the availability of computational resources, which may
limit the scale and complexity of hyperparameter tuning and architectural exploration.
Additionally, the quality and availability of poetry datasets pose challenges, impacting the
model's ability to generate varied and stylistically coherent poetry. Time constraints further
limit the depth of experimentation achievable within the project's timeline, while the
interpretability of complex LLM architectures adds another layer of constraint, hindering
understanding and interpretation of hyperparameter effects.
Lastly, integrating human feedback loops during model training could offer valuable guidance,
albeit with added complexity and resource requirements.
5.2 Module Description
The module showcased here exemplifies the process of fine-tuning the Mistral 7B large
language model using LoRA (Low-Rank Adapters) for the purpose of poetry generation. It
embarks on this journey with a meticulous setup of the environment, initiating library
installations via pip to ensure access to essential tools for quantization, model architecture, and
training acceleration. Moreover, the integration of Hugging Face and Weights & Biases adds a
layer of functionality for model management and performance monitoring throughout the
training process. Configuration and data loading follow suit, with careful consideration given
to paths for model and dataset access, thereby laying the groundwork for subsequent model
manipulation and training. Loading and quantizing the base Mistral 7B model mark a pivotal
moment in the process, with the integration of BitsAndBytesConfig enabling efficient
quantization for model optimization. The subsequent addition of Low-Rank Adapters (LoRA)
further enhances model flexibility and efficiency, setting the stage for hyperparameter setup
and training. This phase sees the meticulous tuning of various training parameters, from
optimizer selection and learning rate scheduling to gradient accumulation and sequence length
grouping. The utilization of SFTTrainer for supervised fine-tuning encapsulates the training
process, orchestrating the fine-tuning of the Mistral 7B model with LoRA adapters on the poem
dataset. Upon completion of training, the fine-tuned model is saved, and evaluation ensues,
exemplified by text generation using a provided prompt. This comprehensive approach not
only demonstrates the technical intricacies of model fine-tuning but also showcases the model's
creative potential in the domain of poetry generation. Through each step of the process, from
environment setup to evaluation, the module underscores the fusion of innovation and artistry
inherent in modern language model development.
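A minimal sketch of the quantized loading and LoRA attachment described above is given below; the checkpoint name, 4-bit settings, and LoRA ranks are illustrative assumptions rather than the exact values used in our runs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed checkpoint name

# Quantize the base model to 4-bit so it fits on a free-tier GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Attach Low-Rank Adapters: only these small matrices are updated during training.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
The resulting model and peft_config objects are what the SFTTrainer configuration shown in Appendix A expects.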
This module exemplifies the process of fine-tuning the GPT-2 language model for poetry
generation using TensorFlow and the Hugging Face Transformers library. It commences with
data preprocessing, where a dataset of poetry is read from a CSV file and cleaned by removing
any NaN values. The poems are then concatenated into a single string, ensuring a continuous
flow of text for training. Subsequently, the dataset is saved into a text file in the specified data
location. The GPT-2 tokenizer is initialized, followed by configuration setup for the GPT-2
model, defining parameters such as vocabulary size and special tokens. The model is
instantiated with the configured settings, setting the stage for training. The text data is
tokenized, and examples are generated by segmenting the tokenized text into blocks of a fixed
size. These examples are then split into inputs and labels, forming the basis of the training
dataset. TensorFlow's dataset API is utilized to create an efficient input pipeline, shuffling and
batching the data for training. The model is compiled with an Adam optimizer and sparse
categorical cross-entropy loss function. Training commences on the dataset, iterating through
epochs to refine the model's parameters. Following training, the fine-tuned model is ready for
text generation. This is demonstrated by providing prompts to the model and generating text
sequences using beam search, showcasing the model's ability to produce coherent and
contextually relevant poetry. The module encapsulates the entire pipeline of fine-tuning and
utilizing the GPT-2 model for poetry generation, underscoring the synergy between deep
learning frameworks and natural language processing libraries in creative AI applications.
This module illustrates the process of fine-tuning the GPT-2 language model for poetry
generation using the Fastai and Hugging Face Transformers libraries. Initially, the dataset,
organized into folders based on forms and topics, is read using Fastai's get_text_files function,
facilitating easy access to the poems. The number of poems in the dataset is printed for
reference. Next, the poems are grouped based on their forms, and forms with fewer than 25
sample poems are filtered out. This preprocessing step ensures that the model is trained on
forms with a sufficient number of examples to learn meaningful patterns. A function is defined
to create a dictionary containing forms with at least 25 poems as keys and their corresponding
poems as values. Training is then performed iteratively for selected forms. For each form, the
poems are loaded, split into training and validation sets, and tokenized using the GPT-2
tokenizer. The model is trained using the Fastai Learner class, which handles the training loop,
optimization, and evaluation. Training progress is monitored, and the model is saved after
training completion. The module also includes a function for generating poetry based on user
prompts. The trained model is utilized to generate poetry given a starting prompt. Beam search
is employed to improve the quality of generated sequences, ensuring coherence and relevance.
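The grouping-and-filtering step can be sketched as follows; the directory layout, file extension, and helper name are assumptions, while the 25-poem threshold follows the description above.
from collections import defaultdict
from pathlib import Path

MIN_POEMS = 25  # forms with fewer sample poems are filtered out

def poems_by_form(root):
    """Group poem files by form (top-level folder) and keep only forms
    with at least MIN_POEMS examples."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*.txt"):
        form = path.relative_to(root).parts[0]  # assumed layout: <form>/<topic>/<poem>.txt
        groups[form].append(path.read_text(encoding="utf-8"))
    return {form: poems for form, poems in groups.items() if len(poems) >= MIN_POEMS}

forms = poems_by_form("poems")
print({form: len(poems) for form, poems in forms.items()})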
5.3 Testing
The functionality of the modules undergoes rigorous testing by generating poems and
evaluating their quality. This evaluation process involves analyzing the poems using natural
language processing (NLP) techniques and comparing the results using various graphical
representations. Fig. xiv analyzes the generated poem by evaluating its theme, language,
literary devices, sound, and rhythm using an LLM. During the training phase, the train/loss is
continuously tracked to measure the performance of the fine-tuning. Mistral performed better
than Flan-T5 and GPT-2 in this respect, with loss consistently below 3%.
Figure xiv. Poem Analysis Example
Figure xiii. Evaluating Train and Loss
The testing phase also utilizes the spaCy library to analyse sentiment in the generated
poem text. This helps to gauge the mood of the poem being generated by the model and
provides feedback on whether the sentiment is what we actually wanted from the model.
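A minimal sketch of this sentiment check is shown below; it uses TextBlob as the polarity backend (a spaCy pipeline can expose the same score through the spacytextblob extension), and the example line is purely illustrative.
from textblob import TextBlob

def poem_mood(poem):
    # Polarity ranges from -1 (negative) to +1 (positive).
    polarity = TextBlob(poem).sentiment.polarity
    if polarity > 0.1:
        return "positive"
    if polarity < -0.1:
        return "negative"
    return "neutral"

print(poem_mood("I used to love a girl, but autumn carried her away"))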
The utilization of various models empowers us to effectively craft poems across a rich
spectrum of forms, encompassing diverse styles like ballads, epics, odes, and more. By
harnessing a range of models, including those pretrained on T5, Mistral 7B, and GPT2
architectures, we not only gain the ability to deploy existing models but also embark on the
journey of refining them through iterative training sessions. This iterative process enables us
to fine-tune the models to better capture the nuances of poetry generation, thereby enhancing
their creative output.
We finetuned the Mistral 7B Instruct model using bitsandbytes, PEFT, LoRA, etc., which made
the training efficient. Figures xvi, xvii, and xviii show the GPU utilization and training
statistics. T5 has fewer parameters than Mistral, so it could be trained without the above
techniques, but its performance was hampered. We trained for 10 epochs. Mistral performed
better than Flan-T5 because it has more parameters and was trained more efficiently. We also
experimented with the GPT-2 model: by loading the trained weights of the model and
performing prompt engineering, the generated results were good. We utilized FastAI's GPT-2
trained weights and trained them on our own dataset to generate poems of different genres.
Overall, the quality of the poems generated by our LLMs improved by utilizing open-source
LLMs, datasets, and the free-tier GPUs provided by Kaggle and Google Colab, and we were
able to achieve our objectives without over-reliance on expensive GPUs and hardware
resources.
Figure xviii. Mistral GPU Usage
Figure xx. GPT2 Generated Poem
Figure xxii. The T5 Model's Start
Figure xxiv. Different Generated Forms
7. Summary
Our project centers on the development and refinement of models for
automated poetry generation. By adopting various natural language processing (NLP)
techniques, we have built a framework capable of crafting poems in diverse forms such as
ballads, epics, odes, and more.
The basis of our approach is the refinement and further training of various pretrained models
based on T5, Mistral 7B, and GPT2 architectures. These models serve as the foundation upon
which we conduct iterative training sessions, fine-tuning their parameters to enhance their
ability to generate high-quality poetry.
A significant aspect of our project involves comparative analysis among different models.
Through meticulous evaluation, we discern the strengths and weaknesses of each model,
informing our decision-making process regarding model selection and further refinement
strategies. By systematically tracking changes and identifying areas for improvement, we
ensure that our poetry generation framework evolves in tandem with advancements in NLP
technologies.
8. References
1. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2016, August).
Hyperband: A novel bandit-based approach to hyperparameter optimization. In
Proceedings of the 33rd International Conference on Machine Learning (ICML) (pp. 3560-
3568).
2. Sawicki, P., Grześ, M., Goes, F., et al. (2023, June). Bits of Grass: Does GPT already know
how to write like Whitman? In Proceedings of the International Conference on
Computational Creativity (ICCC) (pp. 1-7).
3. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... Le, Q. V. (2022,
January 25). Chain-of-thought prompting elicits reasoning in large language models.
4. Ouyang, L., Wu, J., Jiang, X., et al. (2022, March 7). Training language models to follow
instructions with human feedback.
5. Chung, H. W., Hou, L., Longpre, S., et al. (2022, October 26). Scaling instruction-
finetuned language models.
6. Longpre, S., Hou, L., Vu, T., et al. (2023, January 18). The Flan collection: Designing data
and methods for effective instruction tuning.
7. Rafailov, R., Sharma, A., Mitchell, E., et al. (2023). Direct preference optimization: Your
language model is secretly a reward model.
8. Lin, B. Y., Tan, K., Miller, C., et al. (2022, April 20). Unsupervised cross-task
generalization via retrieval augmentation.
9. Muennighoff, N., Wang, T., Sutawika, L., et al. (2022, November 1). Crosslingual
generalization through multitask finetuning.
10. Ivison, H., Smith, N. A., Hajishirzi, H., & Dasigi, P. (2022, December 1). Data-efficient
finetuning using cross-task nearest neighbors.
11. Tian, J., Chen, H., Xu, G., et al. (2023, April 19). ChatPLUG: Open-domain generative
dialogue system with internet-augmented instruction tuning for digital human.
12. Li, Z., Zhang, N., Yao, Y., et al. (2023). Unveiling the pitfalls of knowledge editing for
large language models.
13. Chen, L., Chen, J., Goldstein, T., et al. (2023). InstructZero: Efficient instruction
optimization for black-box large language models.
14. Koksal, A., Schick, T., Korhonen, A., & Schutze, H. (2023). LongForm: Optimizing
instruction tuning for long text generation with corpus extraction.
15. Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning.
Appendix A – Sample Code
import pandas as pd
import tensorflow as tf
from transformers import GPT2Config, TFGPT2LMHeadModel, GPT2Tokenizer
from transformers import WEIGHTS_NAME, CONFIG_NAME
import os
data = pd.read_csv('../input/poetry-foundation-poems/PoetryFoundationData.csv')
data = data.dropna()
data = data['Poem'].str.lower()
string = ''
for x in data:
    string += x + "</s>"
data_location = "data"
if not os.path.exists(data_location):
    os.makedirs(data_location)
# Initialize the GPT-2 tokenizer and a GPT-2 model with a matching configuration.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
config = GPT2Config(vocab_size=tokenizer.vocab_size,
                    bos_token_id=tokenizer.bos_token_id,
                    eos_token_id=tokenizer.eos_token_id)
model = TFGPT2LMHeadModel(config)
# Tokenize the whole corpus and cut it into fixed-size blocks.
string_tokenized = tokenizer.encode(string)
block_size = 100
BATCH_SIZE = 12
BUFFER_SIZE = 1000
examples = []
for i in range(0, len(string_tokenized) - block_size + 1, block_size):
    examples.append(string_tokenized[i:i + block_size])
inputs, labels = [], []
for ex in examples:
    inputs.append(ex[:-1])
    labels.append(ex[1:])
dataset = tf.data.Dataset.from_tensor_slices((inputs, labels))
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
print("Done creating dataset")
optimizer = tf.keras.optimizers.Adam(
    learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)
model.fit(dataset, epochs=30)
text = "I used to love a girl"
input_ids = tokenizer.encode(text, return_tensors='tf')
beam_output = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    temperature=0.7,
    no_repeat_ngram_size=2,
    num_return_sequences=5
)
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
# Fine-tuning Mistral 7B with TRL's SFTTrainer (separate script from the GPT-2 code above).
# model, tokenizer, dataset, and peft_config are assumed to be prepared as described in Section 5.2.
from transformers import TrainingArguments
from trl import SFTTrainer

training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=50,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=250,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="wandb"
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=None,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)
trainer.train()  # run supervised fine-tuning