Generative AI For Software Practitioners
Christof Ebert and Panos Louridas
IEEE Software | Published by the IEEE Computer Society | July/August 2023 | 0740-7459/23 © 2023 IEEE
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23, 2024 at 05:44:55 UTC from IEEE Xplore. Restrictions apply.
SOFTWARE TECHNOLOGY
Generative AI Technologies
Generative AI has been around for many years. With no means to prove validity, researchers hesitated to bring such technology to the mass market of rather naïve data citizens. As we have observed many times in recent IT history, the perceived gold rush makes people close their eyes to obvious risks. Even tools designed for good will eventually have devastating consequences. When ChatGPT was finally released to a wide public audience in 2022, the AI arms race started at a speed never seen before. It took just two months for ChatGPT to reach 100 million users. Figure 1 shows this fast evolution for different technologies spanning a mere 100 years of recent human history. A technology like the wheel even took thousands of years to reach 100 million users.

FIGURE 1. Time to reach 100 million users for different technologies in months after initial deployment: ChatGPT, 2; TikTok, 9; WhatsApp, 40; Internet, 80; Mobile Phone, 190; Car, 400; Telephone, 900.

For every developer, turning to StackOverflow or Google has been a natural part of the job for years now. Condemning the “not invented here syndrome” to the dustbin, our first reaction when in doubt about how to code something has been to look it up on the Internet. Search engines have become better at indexing code repositories, myriads of which exist online, and community advice sites, such as StackOverflow, provide reasoned solutions and valuable commentary on user questions. What is common in search engines and question-and-answer websites is that you can look up information that has already been stored there.

Generative AI is different. As the name suggests, it can synthesize, or generate, the answers to the questions you pose. Instead of trawling for a prefabricated answer, as classic search engines do, it will create an answer for you. The answer is based on vast amounts of data on which it has been trained, such as those archived and indexed by search engines.

To provide meaningful answers, generative AI undergoes further training based on human feedback. Many human trainers pose questions and provide feedback on the generated answers, rewarding good answers and punishing unsatisfying ones. This kind of reinforcement learning guides the system toward providing more accurate answers, while guarding against harmful responses. This has led to glimpses of a new way of working, where the focus is on “prompt engineering”: find the most appropriate way to frame a question or a whole dialogue. Generative AI does not work with individual questions and answers: it maintains a context window, which can be used to guide the AI in generating contextually relevant and well-informed responses.

Underlying all this, generative AI is powered by large language models (LLMs). As the name again suggests, these are large neural network models that are trained on big language corpora. Technically, they have a transformer architecture, which is based on a mechanism called attention. The publication of the attention mechanism, by Google researchers, must now rank among the most influential papers in computer science [1]. Two early LLMs were the Bidirectional Encoder Representations from Transformers (BERT), developed by Google in 2018 [2], and
Generative Pretrained Transformer 1 (GPT-1), developed by OpenAI, who went on to develop subsequent GPT models, getting to GPT-4 today [3]. The basic idea is to use a large language corpus to train a neural network to learn the language, by hiding part of the text and asking the network to guess the missing parts [4]. The neural network does that by paying selective attention to the words comprising the surrounding context of the missing parts. The words themselves are represented as vectors, called embeddings, in a multidimensional space. In essence, the neural network learns and represents the meaning of each word as its embedding. Once you can represent the words in the appropriate way, you can use these representations to generate new material by transforming the representations to new words: for instance, the answer to a query. When we talk about language and words, we are not restricted to human language: it can be computer code, having code tokens instead of words; the idea is the same.

You can see a simplified depiction of the internals of an LLM in Figure 2. The model follows the transformer architecture [4]. The inputs, which are language or code tokens, are represented as vector embeddings. Then they go through an encoder, which is a series of attention mechanisms. The attention mechanism is an algorithm used in LLMs that enables the AI to focus on specific parts of the input text when generating an output. The output of the encoder is a vector representation of the input, which is produced by analyzing surrounding context and attentions. You can think of the encoder’s output as the meaning of the input, as understood by the neural network. The meaning is a vector, corresponding to a point in a multidimensional space.

Once we have the encoded input, we need to transform it to the desired output; that is, take it from a vector representation and transform it back to a language or code token. To do this we feed it to another series of attention mechanisms, the decoder. The outputs of the decoder are candidate tokens, which are then assigned probabilities. The most probable token is the final output. These probabilities result from training the entire transformer model, including both the encoder and decoder, with vast amounts of text. ChatGPT is said to be trained with the “entire internet.” The training process is referred to as self-supervised learning (or masked language modeling), which is achieved by hiding some parts of known text and checking the quality of how it is automatically completed. In this way the de-

[Figure 2: diagram with the labels Encoder, Attention, Encoder Output, Decoder, Attention, Decoder Output, Calculate Probabilities, and Probabilities.]
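The attention step and the final choice of the most probable token can be illustrated with a minimal sketch. It assumes the scaled dot-product formulation from the attention paper [1], which the article itself does not spell out, and it leaves out the learned projection matrices, multiple heads, and stacked layers of a real transformer. The two-dimensional embeddings, the three-word vocabulary, and the scores are made-up values for illustration only.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # How strongly this position attends to every other position.
        weights = softmax([dot(q, k) / scale for k in keys])
        # The output is the attention-weighted average of the value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

def most_probable_token(vocabulary, scores):
    """Assign probabilities to candidate tokens and pick the most probable one."""
    probabilities = softmax(scores)
    best = max(range(len(vocabulary)), key=lambda i: probabilities[i])
    return vocabulary[best], probabilities[best]

# Hypothetical embeddings for three input tokens (illustrative values only).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention(embeddings, embeddings, embeddings)

# Hypothetical decoder scores for three candidate tokens.
token, probability = most_probable_token(["print", "return", "import"], [0.2, 2.0, 0.5])
```

Each row of `context` blends all three input vectors, weighted by how similar they are to the token at that position, which is the sense in which the network pays selective attention to the surrounding context.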
mentioned above, a vital part of generative AI models is that once they can produce sensible outputs from a prompt, they can be further trained by having humans provide feedback to their output. The feedback is used to fine-tune the model using reinforcement learning techniques.

GPT-4 can generate code from docstrings and solve coding questions in software engineering interviews on a par with or surpassing human performance. It can program for the front-

can respond to particular questions and act on them.

Code completion, at a line or whole function level, is offered by Tabnine, which positions itself apart from the
ChatGPT | https://chat.openai.com/ | GPT-4 | USD$20 per month for the chat interface; pricing for application programming interface use depends on usage. | Code completion, code generation, code comprehension, reverse engineering, pseudo code reasoning and execution.
CoPilot | https://github.com/features/copilot | OpenAI CodeX, GPT-4 | USD$10 per month/USD$100 per year for individuals, USD$19 per user per month for business plans. | Code completion for CoPilot; CoPilot X uses the more advanced GPT-4 model and can answer questions based on code documentation; aid in pull requests, shell commands, and scripting.
Tabnine | https://www.tabnine.com/ | Proprietary ML engine trained with OSS | Basic tier is free, Pro tier starts at USD$12 per month per user; also possible to self-host | Code completion. Runs also on private desktop to protect IPR.
Hugging Face (various different models) | https://huggingface.co/docs/transformers/index | Transformer, details vary depending on the model | Free and open source | Code completion and code generation, depending on the model.
A lot of trusted safety-critical software in domains, such as power plants or defense, is decades old. Many federal systems are based on Cobol and other antiquated languages. The challenge is increasingly to find people able to maintain such a legacy. Generative AI tools in the future might explain how the code works and translate it into any other language, e.g., Python code into JavaScript. In many cases, the software not only points out possible quality problems, it also provides you with concrete alternatives.
• Improving software quality: Generative AI can analyze large amounts of data and identify patterns that human developers might miss. An example is the selection of appropriate test cases. This can help developers to write more accurate and efficient code, and to identify and fix bugs more quickly. New challenges arise, such as how to ensure that AI-based systems would not be validated by AI systems that are programmed to overlook certain defects or backdoors. Deep fake applies to
At Vector we are often called to improve the quality of software systems. One typical finding is insufficient requirements and test strategy. Some software is tested several times, while some requirements and scenarios remain untested, making increasingly complex software systems impossible to trust.

While software development is simple, identifying the right requirements and test cases is a challenge for practically all companies. This is a high risk, especially when automating critical systems, such as autonomous vehicles, medical devices, and finance systems. Policy makers demand trusted AI, which requires a clear specification of intended functionality, border cases, and clear demarcation lines of what must not happen. With today’s level of requirements engineering and test methodology, we are far away from trusted AI systems.

In such cases, requirements traceability ensures that requirements are properly tracked, verified, and validated throughout the development cycle. It is perceived as highly necessary and demanded by all standards along functional safety and cybersecurity. However, in practice traceability is not maintained, and most software systems today are insufficiently tested [S1]. Here are some ways in which AI can help with requirements traceability:

• Automated tagging and categorization: AI can be used to automatically tag and categorize requirements based on their type, priority, and other characteristics. This can help to ensure that requirements are properly tracked and easily searchable.
• Natural language processing: AI can be used to analyze natural language requirements and identify potential issues, such as ambiguous or conflicting requirements. It can also help to identify missing requirements or inconsistencies in the requirements documentation.
• Predictive analytics: AI can be used to analyze historical data and predict the likelihood of certain requirements being implemented successfully. This can help to identify potential risks and prioritize requirements accordingly.
• Testing automation: AI can be used to automate testing processes and ensure that each requirement is properly tested and verified. This can help to reduce the time and effort required for manual testing and improve overall testing accuracy.

The return of generative AI in connecting requirements engineering and testing comes in different currencies, namely less effort for maintaining traceability, higher product quality by test case updates even across heterogeneous tool chains, and better understanding of complex systems. The risks of generative AI remain as mentioned in the article. Be aware of your intellectual property rights and never upload software to external platforms. Do not take results of generative AI as sufficient to automate quality checks, because these tools are neither deterministic nor explainable in their chain-of-thought.

Reference
S1. C. Ebert, D. Bajaj, and M. Weyrich, “Testing of software systems,” IEEE Softw., vol. 39, no. 4, pp. 8–17, Jul./Aug. 2022, doi: 10.1109/MS.2022.3166755.
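The traceability finding described in this sidebar (some requirements tested several times, others not at all) can be made concrete with a small check. This is only a sketch of the bookkeeping, not an AI technique and not the authors' tooling; the function name `trace_report`, the requirement and test identifiers, and the link format are all hypothetical.

```python
from collections import Counter

def trace_report(requirements, test_links):
    """Summarize test coverage from (test_id, requirement_id) links."""
    hits = Counter(req for _test, req in test_links)
    return {
        # Requirements that no test case points to.
        "untested": sorted(r for r in requirements if hits[r] == 0),
        # Requirements covered by more than one test case.
        "tested_repeatedly": sorted(r for r in requirements if hits[r] > 1),
        # Links that reference a requirement missing from the specification.
        "unknown_requirements": sorted(set(hits) - set(requirements)),
    }

# Hypothetical requirement identifiers and test-to-requirement links.
requirements = ["REQ-1", "REQ-2", "REQ-3"]
test_links = [("T-1", "REQ-1"), ("T-2", "REQ-1"), ("T-3", "REQ-3")]
report = trace_report(requirements, test_links)
```

In this made-up data set, `REQ-1` is covered twice and `REQ-2` not at all. A generative tool could plausibly propose the links in `test_links`, but, in line with the warning above, its proposals would still need human review before being trusted.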
software even more than only pictures.
• Improving data quality: An important feature of using generative AI for software-related tasks is the ability to fine-tune an existing model on specific data. LLMs are trained on open data that is trawled from the Internet, and their answers are based on what they can learn from that data. It is possible, through appropriate APIs, to give LLMs our own data. The typical steps are to prepare our training data, then upload them to the model and let it train on that data, on top of its existing training. Then we use the fine-tuned model, which will be able to provide more relevant answers to our prompts, either typed in or through API calls.
• Achieving trust: While traditional software is based on predefined algorithms, current software is adaptive, self-changing, and learning. Such systems do not behave according to initial specifications and might even “unlearn” what they were initially developed to do. It is not meaningful to discuss the validation of AI systems without using nearly realistic systems and contexts. Testing nondeterministic systems is difficult with deterministic tests. With traditional software testing, release criteria are based on comparing reactions to a given series of inputs with expected outputs. Simulations of cyberphysical systems suffer from the enormous space of AI systems if not applied purposefully. For instance, autonomous vehicles need several hundred million kilometers to statistically prove that they are suitable for real traffic. Synthetic data developed by generative AI around corner cases and critical scenarios facilitate development, testing, approval, and homologation of automatic, robotic, and autonomous systems in critical industries, such as medical, aerospace, mobility, and industrial production [5].

Hints for the Practitioner
While generative AI can help companies to grow competences, there are several risks that need to be considered and mitigated [6]. Technology companies and especially their venture-capitalist backers tend to just look for fast money and repeat past mistakes, namely prioritizing growth over safety. OpenAI’s 2020 predecessor to ChatGPT, for instance, was known for “creative” outputs, which were as easy to read and use as Wikipedia entries but were inhumane and racist [7]. Google explicitly announced plans to release a premature Bard, accepting the high risk it is willing to take when releasing tools based on AI technology. The lessons from social media should guide us in developing AI. What is labeled social networks has over the past 10 years eroded true social connections and trust between people. With several hours per day on these networks, mental health and intelligence are declining at an alarming pace. Societies in many countries are deeply polarized due to “fake news,” which is consumed without much thinking about it. Tools such as ChatGPT could further replace professional independent media and spread fake news that is neither traceable nor explainable. As developers we must get hands-on and deal with such risks.

The name generative AI cues the major pitfall. If humans rely on AI for information, it will be increasingly difficult to tell what is factual, what is an exaggerating advertisement, and what is completely made up for misinformation. The answers and solutions are generated by models based on probabilities, not necessarily found from some authoritative source. That means that they may be wrong. AI tools can hallucinate, responding in a wildly erroneous manner while being supremely confident that they are right. Things are improving (GPT-4 seems to be better than its predecessors), but the user should always check the answers. Relying on AI tools for tasks where you cannot determine the correct answer or how to verify it can lead to complications and pitfalls. For software development, it means that human supervision and intervention, such as reviews, are necessary.

With statistically driven synthesis of results, generative AI does not have a real understanding of language. More dangerous is that it has no knowledge of the real world. The language model produces its “facts” with nice-to-read text or code. But it is the user’s responsibility to verify these statements. Creativity and the capability to detect defects are the most important characteristics that distinguish you and your code from AI. Even if it is tempting to let an algorithm do as much work for you as possible, you should always be aware that it makes mistakes, which it admittedly packages very credibly. When users started posting bug fixes generated with tools such as ChatGPT, StackOverflow banned such posts. How do they identify the fake content? The same way a professor today has to verify homework assignments, namely by means of AI. It is an arms race, and quite good tools are around to identify AI-generated documents with statistical analyses.

Practitioners should also be aware of privacy and security implications. When using a tool that analyzes your code, you should be careful about
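The first of the fine-tuning steps named under “Improving data quality,” preparing our own training data, might look like the sketch below. It assumes a JSON Lines file with one prompt and completion pair per line; the field names `prompt` and `completion`, the helper `to_jsonl`, and the sample records are illustrative assumptions, since the article does not prescribe a format and each provider documents its own. The upload and training steps are provider-specific and deliberately left out.

```python
import json

def to_jsonl(examples):
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    lines = []
    for prompt, completion in examples:
        if not prompt.strip() or not completion.strip():
            continue  # drop incomplete records instead of training on them
        # Field names are an assumption; check the provider's documentation.
        lines.append(json.dumps({"prompt": prompt, "completion": completion}))
    return "\n".join(lines)

# Hypothetical in-house examples; the second one is incomplete and is skipped.
examples = [
    ("Write a function that adds two integers.",
     "def add(a, b):\n    return a + b"),
    ("", "orphan completion without a prompt"),
]
training_file = to_jsonl(examples)
```

Before any such file leaves the company, the advice in this article on intellectual property and privacy applies: check what the data contains and where it will be stored.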
code which is hard to understand and test. Verification and validation of AI will grow in relevance. Software practitioners need to enhance their competences on the right side of the “V” in order to verify accuracy of underlying AI and resulting artifacts.

Software engineering for and with AI must start with assertions that create boundaries of what is allowed and what is not. Like Isaac Asimov’s robotic rules, our society, and specifically

els, forgetting about other challenges we have. A generative AI that copies such single-minded behavior might conclude to just stop human life on earth to reduce climate change.

In a world of generative AI and low code, it’s hard to imagine a future where software engineers are as highly paid as today. Many traditional roles, such as programmer, will change. With the current evolution speed, we can expect that within the next three years

These are exciting times, not just for software engineers. Some even argue that we may be witnessing the first sparks of artificial general intelligence [8], or maybe not (yet) [9]. In any case, generative AI is here to stay, is likely to be a game changer, and will change software engineering as well. As with any new technology, it can be used, misused, and abused. Elon Musk, who is known for
much, but not stopping innovations, demands oversight for AI, having described the technology as “potentially more dangerous than nukes.” As those who develop this technology, we as leading practitioners must safeguard and control AI.

References
1. A. Vaswani et al., “Attention is all you need,” in Proc. 31st Conf. Adv. Neural Inf. Process. Syst. (NIPS), 2017, vol. 30, pp. 5998–6008. [Online]. Available: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
2. J. Devlin et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” 2018. [Online]. Available: https://arxiv.org/abs/1810.04805
3. A. Radford et al., “Improving language understanding by generative pre-training,” OpenAI, San Francisco, CA, USA, Jun. 2018. [Online]. Available: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
4. S. Wolfram, What is ChatGPT Doing… and Why Does it Work? (2023). [Online]. Available: https://wolfr.am/SW-ChatGPT
5. C. Ebert, D. Bajaj, and M. Weyrich, “Testing of software systems,” IEEE Softw., vol. 39, no. 4, pp. 8–17, Jul./Aug. 2022, doi: 10.1109/MS.2022.3166755.
6. C. Ebert and U. Hemel, “Technology trends 2023: The competence challenge,” IEEE Softw., vol. 40, no. 3, pp. 20–28, May/Jun. 2023, doi: 10.1109/MS.2023.3242179.
7. A. R. Chow and B. Perrigo, “The AI arms race is changing everything,” Time, Feb. 2023. [Online]. Available: https://time.com/6255952/ai-impact-chatgpt-microsoft-google/
8. S. Bubeck et al., “Sparks of artificial general intelligence: Early experiments with GPT-4,” 2023. [Online]. Available: https://arxiv.org/abs/2303.12712
9. R. Lim, “GPT-4 is amazing but still struggles at high school math competitions.” Cantor’s Paradise. Accessed: Apr. 1, 2023. [Online]. Available: https://russelllim22.medium.com/gpt-4-is-amazing-but-still-struggles-at-high-school-math-competitions-cbc2e73738e