Week 14
Jibesh Patra
Week 14 - CS20202 – Spring 2025 – IIT Kharagpur
Materials adapted from “A Survey on Large Language Models for Software Engineering” (Zhang et al.) and the papers of the respective authors.
A Survey on Large Language Models for Software Engineering (Zhang et al.)
● Source code from open-source repositories hosted on platforms such as GitHub and GitLab.
● Bug reports, issues, commits.
● Documentation, code reviews.
● Example: The Stack Dataset
○ Uses the GitHub Archive to extract the dataset
○ Covers more than 350 programming languages
○ 6 TB of source code
Example: GitHub Copilot is integrated as a plugin for VS Code and assists developers with AI code completion, natural-language chat, etc.
● Existing approaches directly generate code from the description.
● In comparison, ARCHCODE introduces structure.
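The contrast above can be illustrated with a minimal sketch. It assumes that the added structure takes the form of explicit functional and non-functional requirements listed before the code request; the slides do not show ARCHCODE's actual prompts or pipeline, so the names `Requirements`, `direct_prompt`, and `structured_prompt` are hypothetical and only build prompt strings.

```python
from dataclasses import dataclass, field


@dataclass
class Requirements:
    """Hypothetical container for requirements inferred from a task description."""
    functional: list[str] = field(default_factory=list)
    non_functional: list[str] = field(default_factory=list)


def direct_prompt(description: str) -> str:
    """Baseline: pass the description to the model unchanged."""
    return f"Write a Python function for the following task.\n\nTask: {description}\n"


def structured_prompt(description: str, reqs: Requirements) -> str:
    """Structured variant (assumption): spell out requirements before asking for code."""
    lines = [
        "Write a Python function for the following task.",
        "",
        f"Task: {description}",
        "",
        "Functional requirements:",
    ]
    lines += [f"- {r}" for r in reqs.functional]
    lines.append("Non-functional requirements:")
    lines += [f"- {r}" for r in reqs.non_functional]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    task = "Sum a list of integers."
    reqs = Requirements(
        functional=["Return 0 for an empty list"],
        non_functional=["Run in O(n) time"],
    )
    print(direct_prompt(task))
    print(structured_prompt(task, reqs))
```

In this sketch the requirements are written by hand; in a real pipeline they would presumably be produced by the model in an earlier step and could also guide test-case generation.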
Motivation