Prompt Engineering for Generative AI: Practical Techniques and Applications
DOI: 10.5923/j.se.20241102.02
Syed Aamir Aarfi, Nashrah Ahmed
1 IT Senior Product Manager, Amazon.com Inc., Seattle WA, USA
2 Computer Science Engineer, Integral University, Lucknow UP, India
Abstract Large language models (LLMs) have revolutionized the field of natural language processing, demonstrating
remarkable capabilities in tasks such as text generation, summarization, and question-answering. However, their performance
is highly dependent on the quality of the prompts or inputs provided. Prompt engineering, the practice of crafting prompts to
guide LLMs towards desired outputs, has emerged as a critical area of study and application. This paper provides an analysis
of various prompt engineering techniques, ranging from basic methods to advanced strategies, aimed at enhancing the
performance and reliability of generative AI systems. Through illustrative examples and theoretical insights, we demonstrate
how these techniques can unlock the full potential of LLMs across diverse real-world use cases. Our findings contribute to the
growing body of knowledge in this rapidly evolving field and offer practical guidance for Information Technology (IT) professionals and
researchers working with generative AI applications.
Keywords Generative AI, Prompt engineering, Chain-of-thought, Large language models (LLMs), RAG
* Corresponding author:
aamiraarfi@gmail.com (Syed Aamir Aarfi)
Received: Sep. 19, 2024; Accepted: Oct. 3, 2024; Published: Nov. 19, 2024
Published online at http://journal.sapub.org/se
Figure 1. Prompt input and output to LLMs
Prompts are instructions given to the LLMs to generate a response for a specific task. We can elaborate the task, set conditions, show examples, or even define the output format of the expected response from the model.

A prompt may have the following parts (Table 1):

Instructions: Task or instructions for the LLM model (e.g., summarize, give examples, calculate, etc.).
Context: Other relevant information to augment the model response (e.g., use this structure, recent news, etc.).
Output structure: Type or format of the expected output generated by the LLM (e.g., bullet points, JSON file, etc.).
Input: Input query or question to get responses from the LLM model.

Table 1. Parts of an LLM prompt – An example

Context: Below is the recent customer chat history.
Instruction: Summarize the pain point of the customer
Output structure: in one line.
Input: Recent chat history of the customer
Agent: "Hi there, how can we help?"
Customer: "I can't buy ABC book"
Customer: "The buy button isn't working"
Agent: "This book is out of stock"
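These parts can be composed programmatically before being sent to a model. Below is a minimal Python sketch of that assembly using the Table 1 example; the `build_prompt` helper is a hypothetical function of ours, not part of any vendor SDK.

```python
def build_prompt(instruction: str, context: str,
                 output_structure: str, input_text: str) -> str:
    """Assemble the four prompt parts from Table 1 into one string."""
    return (
        f"{instruction}\n"
        f"Output structure: {output_structure}\n\n"
        f"Context: {context}\n\n"
        f"Input:\n{input_text}"
    )

prompt = build_prompt(
    instruction="Summarize the pain point of the customer",
    context="Below is the recent customer chat history.",
    output_structure="in one line",
    input_text='Agent: "Hi there, how can we help?"\n'
               'Customer: "I can\'t buy ABC book"\n'
               'Customer: "The buy button isn\'t working"\n'
               'Agent: "This book is out of stock"',
)
print(prompt)  # this single string is what gets sent to the LLM
```

Keeping the four parts as separate arguments makes it easy to iterate on one part (for example, the output structure) without rewriting the whole prompt.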
Different LLM models may require prompts to be provided in a specific format or structure. Anthropic's Claude model is trained on a Human/Assistant style of interaction, and this needs to be incorporated in the prompt, e.g., Human: Why is the sky blue? Assistant:. The Open Assistant model, by contrast, requires specific tokens to be configured as parts of the prompt, e.g., <|prefix_begin|>You are a large language model that wants to be helpful.<|prefix_end|> <|prompter|>Why is the sky blue?<|endoftext|><|assistant|>.
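As a rough illustration of such model-specific formatting, the sketch below wraps the same question for both conventions. The function names are ours, and the exact token handling of any given deployment should be verified against its documentation.

```python
def to_claude_format(question: str) -> str:
    # Claude-style prompts alternate Human/Assistant turns.
    return f"\n\nHuman: {question}\n\nAssistant:"

def to_open_assistant_format(question: str, system_prefix: str) -> str:
    # Open Assistant expects special tokens delimiting the system
    # prefix and the prompter/assistant turns.
    return (
        f"<|prefix_begin|>{system_prefix}<|prefix_end|>"
        f"<|prompter|>{question}<|endoftext|><|assistant|>"
    )

q = "Why is the sky blue?"
print(to_claude_format(q))
print(to_open_assistant_format(
    q, "You are a large language model that wants to be helpful."))
```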
To generate meaningful responses from LLMs, it is important to write clear and specific instructions. It is preferable to highlight the parts of the prompt that the model should consider, add details, and define constraints to force the model to generate responses that are logical and useful. This requires an iterative approach, and if the task involves multiple steps, we may instruct the model to follow a step-by-step approach. Depending upon the complexity of the task, the prompt engineering techniques needed may vary from basic to advanced.

The basic techniques include zero-shot, one-shot, and few-shot prompting, as well as chain-of-thought prompting, which encourages the LLM to generate step-by-step reasoning before producing the final answer, potentially improving performance on tasks requiring logical reasoning. Through examples, we will learn how these basic LLM prompt techniques can be used in different use cases.

4.1. Zero-Shot Prompt

Zero-shot prompting is the simplest prompting strategy, where the model is given only the task description or input, without any additional examples or guidance. This approach, as shown in Table 2, tests the broad capabilities acquired by the LLM during pretraining.

Table 2. Zero-shot LLM Prompt

Prompt: Summarize the key points of the article: "The Future of Robotics and Its Impact on Society… [article text]".
LLM Response: [LLM generates a summary based solely on its pretraining knowledge, without any additional context or examples.]

4.2. One-Shot and Few-Shot Prompt

One-shot prompting involves providing a single example before the test input (Table 3), while few-shot prompting provides multiple examples (Table 4). These examples demonstrate the desired output format or structure, grounding the task for the model.

Table 3. One-shot LLM Prompt

Prompt: TRUE or FALSE: Are aurora lights often seen in the Northern Hemisphere? The answer is TRUE.
TRUE or FALSE: Does it snow near the equator?
LLM Response: The answer is FALSE

One-shot prompting is suitable for a simple task with a capable LLM model, where one example might be enough, while few-shot prompts can be used for complex tasks and less capable models. Examples only guide the model but do not generally enhance its underlying capability, and with capable models even a single example may yield effective responses.

Table 4. Few-shot LLM Prompt

Prompt: Lamb :: Sheep; Kid :: ? Answer is Goat
Baby :: Human; Calf :: ? Answer is Whale
Puppy :: Dog; Cub :: ?
LLM Response: Answer is Lion
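One-shot and few-shot prompts like the above can be generated mechanically from example pairs. Below is a minimal sketch using the Table 4 analogies; the `few_shot_prompt` helper is a hypothetical function of ours.

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Prepend worked examples to the query; one example = one-shot."""
    lines = [f"{q} Answer is {a}" for q, a in examples]
    lines.append(query)  # the unanswered test input comes last
    return "\n".join(lines)

# Few-shot analogy prompt from Table 4
prompt = few_shot_prompt(
    examples=[("Lamb :: Sheep; Kid :: ?", "Goat"),
              ("Baby :: Human; Calf :: ?", "Whale")],
    query="Puppy :: Dog; Cub :: ?",
)
print(prompt)
```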
4.3. Chain-of-Thought (CoT) Prompt

While few-shot CoT can improve results, Kojima et al. (2022) [2] discovered that simply appending the instruction "Let's think step by step" to a prompt elicits a reasoning chain from the LLM, which generates more accurate responses. When forced to reason in this way, LLMs tend to produce more precise answers even with zero-shot prompts, as demonstrated by Kojima et al. (2022) [2] in Table 6.

CoT can improve the performance of LLMs on tasks where reasoning is required, such as common-sense, symbolic, or arithmetic reasoning, and these prompts can be designed for specific types of problems. As per Wei et al. (2022) [1], the performance gains are visible in LLM models with over 100 billion parameters; smaller models may generate illogical CoT-based responses with lower precision.

Table 6. Zero-shot vs zero-shot CoT (Kojima et al., 2022): LLM prompt instruction with the reasoning cue "Let's think step by step"
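In code, zero-shot CoT amounts to appending the trigger phrase to the question. A minimal sketch, using the "Let's think step by step" cue from Kojima et al. (2022) [2]; the function name and example question are ours.

```python
def zero_shot_cot(question: str) -> str:
    # Appending the trigger phrase elicits an explicit reasoning chain
    # before the final answer (Kojima et al., 2022 [2]).
    return f"Q: {question}\nA: Let's think step by step."

print(zero_shot_cot(
    "A juggler has 16 balls. Half are golf balls, and half of the "
    "golf balls are blue. How many blue golf balls are there?"))
```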
5. Advanced Prompt Techniques

Basic prompting is not always sufficient, particularly for complex or specialized tasks. This has given rise to automated prompt engineering methods that aim to streamline and optimize the prompting process itself, further unlocking the potential of these powerful language models.

5.1. Self-Consistency Prompt

Self-consistency builds on chain-of-thought (CoT) prompting by generating multiple, diverse reasoning paths (Figure 2) through few-shot CoT prompting and using them to verify the consistency of the responses. Wang et al. (2022) [3] demonstrated that self-consistency increases the accuracy of CoT reasoning in LLMs. This approach aims to boost performance on arithmetic and common-sense reasoning tasks by exploring multiple chains of reasoning and then deriving a final answer through voting, as given in Table 7.

Self-consistency is compute-intensive compared to plain CoT. In practice, it elicits a small number of reasoning paths (5-10), and the responses often saturate beyond that. Another drawback is that self-consistency cannot repair flawed reasoning: if the CoT prompt itself does not elicit sound reasoning, self-consistency will not show any improvement, and further model tuning and training will be a better option in that case.
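The sample-and-vote procedure can be sketched as follows, assuming a hypothetical `generate(prompt, temperature)` wrapper around an LLM API and a task-specific answer parser; neither is a real library call.

```python
from collections import Counter

def generate(prompt: str, temperature: float) -> str:
    raise NotImplementedError  # call your LLM API here

def extract_answer(completion: str) -> str:
    # Task-specific parsing, e.g., the text after the final "The answer is".
    return completion.rsplit("The answer is", 1)[-1].strip(" .")

def self_consistent_answer(cot_prompt: str, n_paths: int = 10) -> str:
    # Sample diverse reasoning paths at a non-zero temperature, then
    # return the most frequent final answer (majority vote). As noted
    # above, 5-10 paths are often enough before responses saturate.
    answers = [extract_answer(generate(cot_prompt, temperature=0.7))
               for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]
```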
5.2. Tree-of-Thoughts Prompt

Tree-of-thoughts (ToT) prompting is a strategy that guides LLMs to generate, evaluate, expand on, and decide among multiple potential solutions. Similar to how humans approach problem-solving, ToT encourages the exploration of "thoughts" that serve as intermediate steps toward solving a problem. The LLM generates these thoughts via chain-of-thought prompting, and a tree-branching technique (e.g., breadth-first or depth-first search) is used to systematically explore and evaluate the potential solutions, enabling look-ahead and backtracking, as given in Figure 3.

Figure 3. Tree-of-thoughts (ToT) prompt tree-branching technique

Research by Yao et al. (2023) [4] and Long (2023) [5] indicates that tree-of-thoughts elicits exploration over "thoughts," which paves the way for general problem solving via LLMs. In Table 8, we added four thoughts as an input to the prompt, eliciting a more human-like problem-solving technique in the generated responses.

Table 8. Tree-of-thoughts LLM Prompt – An example

Prompt: Generate three short and crisp copies for an advertisement promoting a new eco-friendly product line.
Thought 1: Highlight the product's environmental benefits and sustainability features.
Thought 2: Appeal to consumers' desire to make a positive impact on the planet.
Thought 3: Emphasize the high quality and durability of the products.
Thought 4: Contrast the eco-friendly line with conventional, wasteful alternatives.
LLM Response: [LLM expands on the thought tree, evaluating and refining the potential ad copy ideas.]
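A breadth-first variant of ToT can be sketched as below. The `propose_thoughts` and `score` stubs stand in for LLM calls that generate candidate thoughts and rate partial solutions; both names are ours, not from the cited papers.

```python
def propose_thoughts(state: str, k: int) -> list[str]:
    raise NotImplementedError  # ask the LLM for k candidate next thoughts

def score(state: str) -> float:
    raise NotImplementedError  # ask the LLM to rate a partial solution (0-1)

def tree_of_thoughts(problem: str, depth: int = 3,
                     k: int = 4, beam: int = 2) -> str:
    # Breadth-first search over partial solutions ("thoughts"):
    # expand every frontier state, then keep only the `beam`
    # highest-scoring states at each level (pruning = backtracking).
    frontier = [problem]
    for _ in range(depth):
        candidates = [f"{s}\n{t}"
                      for s in frontier
                      for t in propose_thoughts(s, k)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]  # highest-scoring reasoning path
```

Swapping the breadth-first loop for a recursive descent would give the depth-first variant mentioned above.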
5.3. Retrieval-Augmented Generation (RAG) Prompt

While LLMs possess remarkable knowledge acquired during pretraining, there may be instances where additional information or knowledge is required to generate accurate and relevant responses. Retrieval-augmented generation (RAG) is a technique that allows LLMs to incorporate knowledge from external sources, such as documents, databases, or APIs, into their generated outputs.

The RAG approach involves three main steps (Figure 4): (1) retrieving relevant data from an external source, (2) augmenting the context of the prompt with the retrieved data, and (3) generating a response based on the prompt and the retrieved data using the LLM. As demonstrated by Lewis et al. (2020) [6], long documents are split into chunks to fit within the LLM's context window, and vectorized representations of these chunks are stored in a vector database. The query is converted into a vector, and similar chunks are retrieved from the database to augment the prompt. A RAG prompt can be thought of as a dynamic prompt that can be used for repeatable tasks (e.g., extracting data from large documents, Table 9) or for answering questions (e.g., a chatbot assistant).

Figure 4. RAG prompt implementation using embeddings
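The three RAG steps can be sketched end to end as follows; `embed` is a placeholder for any embedding model, and the fixed-size chunking with brute-force cosine search stands in for a real vector database.

```python
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError  # call an embedding model here

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def chunk(document: str, size: int = 500) -> list[str]:
    # Split a long document into pieces that fit the context window.
    return [document[i:i + size] for i in range(0, len(document), size)]

def rag_prompt(question: str, document: str, top_k: int = 3) -> str:
    index = [(c, embed(c)) for c in chunk(document)]   # (1) index the source
    q_vec = embed(question)
    best = sorted(index, key=lambda p: cosine(q_vec, p[1]),
                  reverse=True)[:top_k]                # (1) retrieve similar chunks
    context = "\n".join(c for c, _ in best)            # (2) augment the context
    return (f"Answer using only this context:\n{context}\n\n"
            f"Question: {question}")                   # (3) generate from this prompt
```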
Table 9. Dynamic RAG Prompt – An example

Prompt (dynamic): What are the key features of the Tesla Cybertruck, and how does its self-driving control work?
[System retrieves relevant information dynamically from a document]
Retrieved context: "The Tesla Cybertruck is an electric pickup truck with a stainless-steel body that was first introduced as a concept in 2019. It's known for its controversial design and is considered to be in a category of its own in the automotive industry.
Key features:
Design: It has a flat, stainless-steel body.
Range: It has an EPA estimated range of 250–340 miles, but an external battery pack can extend the range to over 470 miles.
Towing capacity: It has a maximum towing capacity of 11,000 pounds.
Payload: It has a payload capacity of up to 2,500 pounds. [+ more text]"
LLM Response: [LLM generates a response summarizing the key features and self-driving control based on the prompt and retrieved context]

RAG prompts also have limitations: the chunking and retrieval process does not consider the surrounding context, and the search is semantic in nature. The key is to use relevant information to augment the prompt and obtain a more accurate response. RAG is not free from hallucinations, but it prevents them to some extent by using relevant information retrieved from the supplied documents. RAG can also become expensive, as the document or indexed data size grows too large and costly to store.

5.4. Automated Prompt Engineering (APE)

As prompt engineering becomes more complex, there is a growing need to automate the process of crafting and validating prompts. Automated prompt engineering (APE) aims to reduce the human effort involved in prompt engineering by automating various aspects of the process, such as prompt generation, prompt optimization, and prompt evaluation.

APE techniques can involve methods like least-to-most prompting, where the model is encouraged to generate its own prompts by starting with a minimal prompt and iteratively adding more information until the desired output is achieved. Another approach is reasoning without observation (ReWOO), which separates the reasoning process from external observations, allowing the model to optimize its token consumption and potentially improve performance on tasks that require breaking down problems into subproblems.

Table 10. APE Prompt techniques – An example
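A least-to-most style refinement loop of the kind described above can be sketched as follows; `generate` and `is_good_enough` are hypothetical stubs for an LLM call and an automated evaluation, and the hint schedule is ours.

```python
def generate(prompt: str) -> str:
    raise NotImplementedError  # call your LLM API here

def is_good_enough(output: str) -> bool:
    raise NotImplementedError  # automated check: schema, keywords, eval prompt

def refine_prompt(task: str, hints: list[str],
                  max_rounds: int = 5) -> tuple[str, str]:
    # Start from a minimal prompt and iteratively add information
    # until the output passes the automated evaluation.
    prompt, output = task, ""
    for round_ in range(max_rounds):
        output = generate(prompt)
        if is_good_enough(output):
            break
        if round_ < len(hints):
            prompt = f"{prompt}\n{hints[round_]}"  # add the next guidance
    return prompt, output
```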
6. Conclusions
Prompt engineering has emerged as a critical area of study and application in the field of generative AI, enabling IT professionals and researchers to unlock the full potential of large language models (LLMs) for a wide range of natural language processing tasks. This paper has provided a comprehensive overview of various prompt engineering techniques, ranging from basic methods like zero-shot, one-shot, and few-shot prompting, to advanced strategies such as chain-of-thought prompting, self-consistency, tree-of-thoughts prompting, retrieval-augmented generation, and automated prompt engineering.

Through illustrative examples and theoretical insights, we have demonstrated how these techniques can enhance the performance and reliability of LLM deployments across diverse real-world use cases, such as text generation, summarization, question-answering, and decision support systems. By carefully crafting prompts and leveraging the appropriate prompt engineering techniques, IT professionals can effectively guide LLMs towards desired outputs, enabling more robust and efficient generative AI solutions.

While prompt engineering holds immense promise for unlocking the potential of large language models, it is a discipline fraught with challenges that must be navigated skillfully. As this study has demonstrated, achieving consistent and reliable results through prompting requires a delicate balance of specificity and flexibility, as well as a deep understanding of the unique strengths and constraints of these powerful models.
Even seemingly simple tasks can demand extensive experimentation and trial-and-error, highlighting the need for diverse skill sets and multidisciplinary knowledge in this field.

Looking ahead, the demand for sophisticated prompt engineering techniques will only intensify as large language models continue to advance and find application in increasingly complex domains. Addressing issues related to hallucinations, biases, and safety will be paramount, as will the development of more automated, scalable, and adaptive prompting methods. The prompt engineering community must embrace these challenges head-on, fostering a spirit of innovation that can harness the full potential of these models in a responsible and impactful manner.
Future research should prioritize the development of robust and reliable prompting techniques capable of consistently producing accurate and contextually relevant outputs, even for highly specialized tasks. Integrating reinforcement learning algorithms to enable models to adapt their prompting strategies based on experience could be a promising avenue. Additionally, the creation of automated prompt generation frameworks tailored to specific output requirements could streamline the engineering process and broaden accessibility.

Crucially, mitigating hallucinations and biases in model outputs must be a key focus area. Prompt engineering strategies that incorporate external knowledge sources or employ specialized validation prompts could help ensure the trustworthiness and reliability of these models, particularly in high-stakes applications.

As the field of prompt engineering evolves, addressing scalability and adaptability challenges will also be essential. Developing prompting methods that can seamlessly accommodate changes in model architectures, training data, and use cases will be critical to maintaining the relevance and effectiveness of these techniques in the rapidly advancing landscape of generative AI.

By confronting these limitations and charting a course for future research, the prompt engineering community can unlock the full transformative potential of large language models, driving meaningful advancements across a wide range of natural language processing applications. Through sustained innovation and a commitment to responsible development, prompt engineering can shape the future of how we interact with and leverage the power of artificial intelligence.

REFERENCES

[1] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.

[2] Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916.

[3] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171.

[4] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.

[5] Long, J. (2023). Large language model guided Tree-of-Thought. arXiv preprint arXiv:2305.08291.

[6] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[7] Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large language models are human-level prompt engineers. arXiv preprint arXiv:2211.01910.