Week 14

The document discusses the intersection of Artificial Intelligence (AI) and Software Engineering (SE), focusing on AI4SE, which enhances software engineering tasks, and SE4AI, which uses software engineering techniques to build AI systems. It highlights various applications of AI in software development, including code generation, bug detection, and automated testing, as well as the workflow of integrating large language models (LLMs) into software engineering processes. Additionally, it presents specific examples of LLM applications in requirements classification, code generation, and software testing.

AI for Software Engineering

Jibesh Patra
Week 14 - CS20202 – Spring 2025 – IIT Kharagpur
Materials adapted from “A Survey on Large Language Models for Software Engineering” by Zhang et al. and papers of the respective authors.

Week 14 - Software Engineering – Spring 2025

Intersection of Artificial Intelligence (AI) and Software Engineering (SE)

AI4SE vs SE4AI
● AI4SE: Using AI to enhance software engineering tasks, which helps in building robust and reliable systems.
● SE4AI: Using software engineering techniques to build robust AI systems.

This lecture: AI4SE

Popular Public Applications
● Code generation - GitHub Copilot
● Bug Detection - DeepCode
● Automated Testing - Testim
● Code Documentation - Swim (documentation + modernization)

Usages of AI in Software Development

Opinion of Developers – Do you use AI in Development Process?

Opinion of Developers – In Which Part of the Development Process Is AI Used?

Opinion of Developers – How much do you Trust AI Output?

Large Language Models (LLMs) for Code

Papers over the years

A Survey on Large Language Models for Software Engineering - Zhang et al.

Downstream Tasks

A Survey on Large Language Models for Software Engineering - Zhang et al.

General Workflow of AI4SE Systems

Typical LLM4SE
1. Collect data
2. Pre-train
3. Fine-tune for specific tasks
4. Integrate into SE workflows

Data Collection
Data collection is the first step of the system:

● Source code from open-source repositories such as GitHub and GitLab.
● Bug reports, issues, commits.
● Documentation, code reviews.
● Example: The Stack Dataset
○ Uses GitHub archive to extract the dataset
○ Contains data from more than 350 programming languages
○ 6 TB of code data

Pre-training
● Trained on massive corpora of code and natural language obtained from
○ Public repositories such as GitHub, GitLab
○ Q&A websites frequently visited by developers, such as Stack Overflow and Reddit
○ Documentation such as API docs
● Quality of the data is very important.
● Example training objectives:
○ Causal language modeling
○ Masked language modeling
○ Replaced token detection
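
The first two objectives above differ mainly in how training pairs are built from a token sequence. The following is a minimal sketch, not taken from the lecture: the function names, the `[MASK]` placeholder string, and the hand-picked mask positions are illustrative assumptions (real pre-training typically selects positions at random and works on token IDs).

```python
from typing import Dict, List, Tuple

MASK = "[MASK]"  # assumed placeholder token, for illustration only


def causal_lm_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Causal LM: predict each token from the tokens to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(
    tokens: List[str], positions: List[int]
) -> Tuple[List[str], Dict[int, str]]:
    """Masked LM: hide the chosen positions and predict them from both sides."""
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets


tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
clm = causal_lm_pairs(tokens)                        # 7 (context, next token) pairs
mlm_input, mlm_targets = masked_lm_example(tokens, [1, 5])
```

In the causal case the model only ever sees left context; in the masked case it sees the tokens on both sides of each hidden position.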

Fine-tuning
Refine the pre-trained language model for particular tasks by training it further on task-specific datasets.

Example: CodeBERT, a pre-trained language model, is fine-tuned for natural language code search.

Integration Into SE Workflows
Integrate the LLM into practical software engineering tasks.

Example: GitHub Copilot has been integrated as a plugin for VS Code and assists developers with AI code completion, natural language chat, etc.

Software Requirement & Design

SpecGen
SpecGen: Automated Generation of Formal Program Specifications via Large Language Models by Lezhi Ma et al. (2025)

● Program specifications encompass precise statements that describe the intended or actual behaviors of a particular program.

Program and corresponding specifications generated by SpecGen.

SpecGen
● Motivation: Existing automated approaches to program specification generation rely on manually created templates, resulting in simple and trivial specifications.
● Approach:
a. Given input code and a prompt, the LLM generates a specification, which is checked by a verifier.
b. The specification is refined further through conversation.
c. For generated specifications that still fail verification, apply mutation:
■ Take the locations where the LLM generated a wrong specification.
■ Mutate the specification at those locations with various mutation operators.
d. Check the candidates with the verifier and select the best specification.
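
Steps a and b above can be read as a generate, verify, refine loop. The sketch below is an illustration under stated assumptions, not SpecGen's implementation: `ask_llm` and `run_verifier` are hypothetical callables supplied by the caller, the toy stubs and specification strings are made up, and the mutation phase (steps c and d) is omitted.

```python
from typing import Callable, Optional, Tuple


def generate_spec(
    code: str,
    ask_llm: Callable[[str, str], str],                    # hypothetical: (code, feedback) -> spec
    run_verifier: Callable[[str, str], Tuple[bool, str]],  # hypothetical: (code, spec) -> (ok, message)
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for a spec, verify it, and feed verifier errors back to the model."""
    feedback = ""
    for _ in range(max_rounds):
        spec = ask_llm(code, feedback)
        ok, message = run_verifier(code, spec)
        if ok:
            return spec
        feedback = message  # the next round sees why the last spec failed
    return None  # still failing; a mutation phase would take over here


# Toy stand-ins so the loop can be run without a model or a real verifier.
def toy_llm(code: str, feedback: str) -> str:
    return "ensures result >= 0" if feedback else "ensures result > 0"


def toy_verifier(code: str, spec: str) -> Tuple[bool, str]:
    ok = spec == "ensures result >= 0"
    return ok, "" if ok else "postcondition may not hold when x == 0"


result = generate_spec("int abs(int x)", toy_llm, toy_verifier)
```

With these stubs the first specification is rejected, the verifier message is passed back, and the second round succeeds.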

Examples of LLM Applications
● Requirement classification: Categorizing software requirements into different classes or types, e.g., functional or non-functional.
○ NoRBERT by Hey et al. (2020) fine-tunes an existing language model, BERT. Existing automatic classifiers perform poorly on unseen projects; the authors improve on this.
■ Evaluated on a dataset of requirements labeled by class.
● Requirement ambiguity detection: Ambiguity (where the natural language description may be interpreted in more than one way) in software requirements may result in poor-quality software.
○ TABASCO by Moharil et al. (2023) finds ambiguities by using BERT to capture the different meanings a word can have depending on its context within a requirement.

Software Development

ARCHCODE
ARCHCODE: Incorporating Software Requirements in Code Generation with Large Language Models by Han et al.
Motivation
● A natural language description includes both functional and non-functional requirements.
● This can lead to LLM-generated code that is functionally correct but violates certain requirements.

ARCHCODE
Approach

● Start with textual software requirements. Use LLMs to extrapolate unexpressed or implicit requirements and structure them.
● Incorporate the structured requirements in the prompt and generate code and test cases.
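
The second step above, placing structured requirements in the prompt, might look roughly like this. It is a hedged sketch only: `build_prompt`, the section labels, and the example requirements are assumptions made for illustration and are not ARCHCODE's actual prompt format.

```python
from typing import Dict, List


def build_prompt(description: str, requirements: Dict[str, List[str]]) -> str:
    """Combine a task description with requirements grouped by kind."""
    lines = ["Task description:", description, ""]
    for kind, items in requirements.items():
        lines.append(f"{kind} requirements:")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    lines.append("Write the code, then test cases covering each requirement.")
    return "\n".join(lines)


prompt = build_prompt(
    "Return the n-th Fibonacci number.",
    {
        "Functional": ["fib(0) is 0 and fib(1) is 1"],
        "Non-functional": ["Runs in O(n) time", "Rejects negative n with an error"],
    },
)
```

Listing the implicit requirements explicitly gives the model, and the generated tests, something concrete to check against.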

Comparing code and test generation between ARCHCODE and existing approaches.

● Existing approaches directly generate code from the description.
● In comparison, ARCHCODE introduces structure.

Examples of LLM Applications
● Code search: Given a natural language query, retrieve functionally relevant
code.
○ CodeRetriever by Li et al. (2022) performs code-text contrastive pre-training to learn
function-level code semantics. This aids in better code search.
● Code Translation: Translating code in one programming language to another.
○ TransMap by Wang et al. (2023) detects semantic mistakes in code translated by ChatGPT
using tests from source and translated program.
● Code Summarization: Take code as input and generate high-level natural language summaries.
○ ESALE by Fang et al. (2024) uses multi-task learning (unidirectional language modeling,
masked language modeling, action word prediction) to improve code summarization.
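
The idea of checking a translation with tests from the source and translated programs can be illustrated by running both on the same inputs and reporting disagreements. This is a simplified sketch, not TransMap itself: both functions are written in Python here as stand-ins for a source program and its translation, and all names are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Tuple


def find_mismatches(
    source_fn: Callable[..., Any],
    translated_fn: Callable[..., Any],
    inputs: Iterable[Tuple[Any, ...]],
) -> List[Tuple[Tuple[Any, ...], Any, Any]]:
    """Return (args, source result, translated result) wherever the two differ."""
    mismatches = []
    for args in inputs:
        expected = source_fn(*args)
        actual = translated_fn(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


def source_div(a: int, b: int) -> int:
    # Stand-in for the source program: division truncating toward zero.
    return int(a / b)


def translated_div(a: int, b: int) -> int:
    # Stand-in for a naive translation: floor division rounds toward -infinity.
    return a // b


mismatches = find_mismatches(source_div, translated_div, [(7, 2), (-7, 2), (6, 3)])
# Only (-7, 2) disagrees: the source gives -3, the translation gives -4.
```

The two versions agree on positive inputs, so the semantic mistake only shows up once a test exercises a negative operand.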

Software Testing

Fuzz4All
Fuzz4All: Universal Fuzzing with Large Language Models by Xia et al. (2024)

Motivation

● Discover bugs using fuzzing.

● Traditional approaches often target a specific language or set of features and cannot easily be applied to other languages or features.

Fuzz4All
Approach

● Use LLMs as the input generation and mutation engines.

● Since LLMs are trained on multiple programming languages, they are
applicable to multiple languages.
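
A fuzzing loop with pluggable generation and mutation engines can be sketched as follows. This is an assumption-laden illustration, not Fuzz4All's implementation: `generate` and `mutate` are the slots where an LLM would sit, and the toy versions below use random digits instead so the loop runs on its own.

```python
import random
from typing import Callable, List, Tuple


def fuzz(
    target: Callable[[str], object],
    generate: Callable[[random.Random], str],     # stand-in for LLM input generation
    mutate: Callable[[str, random.Random], str],  # stand-in for LLM-driven mutation
    budget: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Run `target` on generated inputs; collect (input, exception name) pairs."""
    rng = random.Random(seed)
    failures: List[Tuple[str, str]] = []
    current = generate(rng)
    for _ in range(budget):
        try:
            target(current)
        except Exception as exc:
            failures.append((current, type(exc).__name__))
        # Next input: either mutate the current one or generate a fresh one.
        current = mutate(current, rng) if rng.random() < 0.5 else generate(rng)
    return failures


def toy_generate(rng: random.Random) -> str:
    return f"{rng.randint(0, 9)} / {rng.randint(0, 2)}"


def toy_mutate(text: str, rng: random.Random) -> str:
    return text.replace(str(rng.randint(1, 9)), "0")


def toy_target(text: str) -> float:
    # System under test: fails when the divisor is zero.
    left, right = text.split(" / ")
    return int(left) / int(right)


failures = fuzz(toy_target, toy_generate, toy_mutate, budget=200)
```

Because the loop only depends on the two callables, the same driver could be pointed at a different language or system by swapping what they produce.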

Examples of LLM Applications
● Fault localization: Identify specific locations in a software system where faults or bugs are present.
○ TROBO by Zhu et al. (2021) performs cross-project knowledge transfer of bug reports and code files.
● Vulnerability detection: Identify potential security bugs in software systems.
○ VulLLM by Du et al. (2024) performs multi-task learning (vulnerability localization, vulnerability interpretation) with LLMs to detect vulnerabilities.
● Unit test generation: Creating a set of test cases to test the adequacy of software programs.
○ MuTAP by Dakhel et al. (2023) uses mutation testing to augment prompts that guide LLMs in generating test cases that can detect bugs.
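
The mutation-testing idea behind prompt augmentation can be shown in miniature: run the current tests against mutated versions of the program, and report the mutants that no test catches. The sketch below is illustrative only and is not MuTAP's pipeline; the mutants are hand-written, and `surviving_mutants` and `augment_prompt` are hypothetical helper names.

```python
from typing import Callable, Dict, List

Program = Callable[[int, int], int]
Test = Callable[[Program], bool]  # returns True when the program passes


def surviving_mutants(tests: List[Test], mutants: Dict[str, Program]) -> List[str]:
    """Names of mutants that pass every test, i.e. are not killed."""
    return [
        name
        for name, mutant in mutants.items()
        if all(test(mutant) for test in tests)
    ]


def augment_prompt(base_prompt: str, survivors: List[str]) -> str:
    """Append the surviving mutants so the next round can target them."""
    if not survivors:
        return base_prompt
    return base_prompt + "\nThe tests miss these faulty variants: " + ", ".join(survivors)


# Program under test: max_of(a, b) = a if a >= b else b. Two hand-made mutants:
mutants: Dict[str, Program] = {
    "ge_to_le": lambda a, b: a if a <= b else b,  # comparison flipped
    "return_a": lambda a, b: a,                   # else branch dropped
}
tests: List[Test] = [lambda f: f(2, 1) == 2]

survivors = surviving_mutants(tests, mutants)
prompt = augment_prompt("Write unit tests for max_of(a, b).", survivors)
```

Here the single test kills `ge_to_le` but not `return_a`, so the augmented prompt asks for a test that distinguishes the two arguments, such as one where `b` is larger.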

THE END
