assess which tasks are better performed by engineers or by AI processes, enabling more efficient collaboration; see also [26]. Waseem et al. [34] show that ChatGPT significantly addresses skill gaps in software development education, improving efficiency, accuracy, and collaboration. Code quality through AI-powered tools is discussed in [20]. In another study, Bhandari et al. [2] discuss newly emerging areas of fusion between AI and software engineering. Taken together, these studies suggest that integrating AI into software engineering projects has significant implications for learning and skill development. While AI tools can automate routine tasks and increase productivity, they also require a shift in skill sets. This could lead to new insights into novel AI approaches that keep a human in the loop to support SE tasks.

The position paper by Daun and Brings [8] discusses ChatGPT's ability to understand and generate human language and thus provide personalized feedback to students. In this way, it can be used as a tutor to support the education of software engineers. In [4], Borghoff, Minas, and Mönch demonstrate the potential of automated tools for program evaluation, grading, and code generation in software development courses. These AI-driven systems offer valuable support by streamlining assessment processes and enhancing the learning experience.

Systems for automatically assessing student programs, known as autograders [1], have a long history and can be broadly divided into several categories. In test-based assessment systems, instructors define test cases for assignments, and students' programs are executed against these cases. Programs are considered correct if they pass all tests, with early systems dating back to the 1960s; see [10] for a review.
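As a concrete illustration (ours, not taken from the literature discussed here), such test-based assessment boils down to instructor-defined unit tests that every submission must pass. A minimal JUnit 5 sketch, where the Fraction class is a hypothetical student assignment:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    // Instructor-defined test cases run against each submission; the
    // submission is considered correct only if all tests pass.
    class FractionGraderTest {

        @Test
        void addsFractionsInLowestTerms() {
            // 1/2 + 1/3 = 5/6
            Fraction sum = new Fraction(1, 2).add(new Fraction(1, 3));
            assertEquals(5, sum.numerator());
            assertEquals(6, sum.denominator());
        }

        @Test
        void normalizesNegativeDenominators() {
            Fraction f = new Fraction(1, -2);   // should be stored as -1/2
            assertEquals(-1, f.numerator());
            assertEquals(2, f.denominator());
        }
    }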
A contemporary example is ArTEMiS^1 [18], which provides an online code editor with interactive tutorials that give immediate and consistent feedback in large classes. Current platforms such as Praktomat^2 are web-based, support languages such as Java, and use JUnit for test case implementation. Newer methods also incorporate machine learning techniques, for example, [33].

Chrysafiadi et al. [7] introduce a fuzzy-based personalized assessment approach that develops customized test cases for each student based on their knowledge level, programming experience, common error patterns, and item difficulty. This adaptability is a key focus of recent research [14]. Praktomat not only tests student programs against JUnit cases, but also integrates JPlag^3 for plagiarism detection and Checkstyle^4 to ensure that coding standards are met.

With deep learning and Large Language Models (LLMs) capable of generating code from large code bases, new challenges arise. Tools such as DeepMind's AlphaCode [19], Google's PaLM,^5 Microsoft's CodeBERT [11], and OpenAI's ChatGPT and Codex [12] can generate reasonable code from natural language descriptions. Gao et al. [13] present techniques to improve model performance and usability by comparing Retrieval-Augmented Generation (RAG) for LLMs with fine-tuning and prompt engineering. They evaluate these methods based on their ability to incorporate external knowledge and adapt to new contexts. While fine-tuning enables LLMs to deliver consistent and repeatable results, it struggles to incorporate new information. In contrast, RAG can adapt almost instantly to updated contexts. In addition, Ovadia et al. [25] demonstrate that RAG outperforms unsupervised fine-tuning in generating results that integrate both existing and new knowledge.
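To make the distinction concrete, here is a minimal sketch of the RAG pattern (our own illustration; all types are hypothetical, not a real library API): retrieve the documents relevant to a query, then let the model answer grounded in that context.

    import java.util.List;

    // Hypothetical building blocks of a RAG pipeline.
    interface Retriever {
        // the k documents most similar to the query, e.g. via vector search
        List<String> topK(String query, int k);
    }

    interface Llm {
        String complete(String prompt);
    }

    class RagPipeline {
        private final Retriever retriever;
        private final Llm llm;

        RagPipeline(Retriever retriever, Llm llm) {
            this.retriever = retriever;
            this.llm = llm;
        }

        String answer(String question) {
            // 1. Fetch up-to-date context instead of relying on frozen training data.
            String context = String.join("\n", retriever.topK(question, 3));
            // 2. Ground the model's answer in the retrieved context.
            return llm.complete("Context:\n" + context + "\n\nQuestion: " + question);
        }
    }

Because new knowledge enters only through the document store, updating that store changes the answers immediately, with no retraining; this is exactly the adaptability advantage over fine-tuning described above.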
These developments also provide opportunities to use LLMs to improve code quality and support novice learners [16]. AI systems such as Amazon's CodeWhisperer^6 offer real-time suggestions to prevent security vulnerabilities, while tools such as Replit's GhostWriter,^7 Codeium,^8 and Codota/Tabnine^9 assist with real-time code completion, reducing syntax errors. Other notable tools include mutable.ai,^10 CodePal,^11 and Sourcegraph's Cody.^12

More and more programming tasks that traditionally required human expertise can now be performed by AI systems, either autonomously or with human input. Raman and Kumar [28] analyze the Coding & Testing phase using a six-step framework. They evaluate the impact of these automated systems at each stage, including GitHub's Copilot [21], and conclude that future education should prioritize code comprehension over mere coding skills [15]. Moroz et al. [22] conclude that even when using Copilot, finding a specific solution is not always the main goal; finding strategies that provide solution instances is equally important.

But criticism and negative experiences have also been reported. For example, Choudhuri et al. [6] found no statistical differences in the productivity or self-efficacy of participants using ChatGPT compared to conventional resources, but instead observed a significantly higher level of frustration. They name pitfalls such as limited advice on niche topics, incomplete assistance, and hallucination, among others. Pudari and Ernst [27] show that while syntax support for a programming language is well on the way to providing useful AI support, more abstract problems, such as language idioms and complex design rules, are still far from being solved by AI.

^1 https://github.com/ls1intum/ArTEMiS
^2 https://www.waxmann.com/automatisiertebewertung/
^3 https://github.com/jplag/JPlag
^4 https://checkstyle.org/
^5 https://ai.google/discover/palm2/
^6 https://aws.amazon.com/de/codewhisperer/
^7 https://blog.replit.com/ai
^8 https://codeium.com/
^9 https://www.tabnine.com/blog/codota-is-now-tabnine/
^10 https://mutable.ai/
^11 https://codepal.ai/
^12 https://about.sourcegraph.com/cody

3 The Programming Project

For most of our students, the programming project is their first software development project. They work in teams of about seven students and realize a complete computer game in Java. The course is worth 9 credits, which corresponds to a workload of about 270 hours in the period from October to December.

We have offered and improved this course over many years (see Sect. 3.3); the following paragraphs describe its organization in the last term, from October to December 2024.

3.1 Basic Organization
Each team was assigned a tutor and a supervisor. The tutors were students who had taken the course the previous year; they assisted the team members with the usual problems encountered in a software development project. The supervisors were members of the academic staff who provided higher-level support to the teams, monitored and evaluated the teams' progress, approved artifacts, and controlled when the teams were allowed to proceed to the next phase and when the project was successfully completed.

Prior to the course, a list of twelve game proposals was compiled, each of which was sketched on a page by the tutors. All games had to be multiplayer 3D games with network support, based on popular board and card games (e.g., Monopoly, Ludo, Risk), and about equally difficult to implement. Students enrolled in the course were asked to form seven teams and to prioritize three games from this list before the course began. Based on these priorities, games were assigned so that each team realized a different game.

The course then began with the kick-off meeting, where all students, supervisors, and tutors met in the lecture hall to discuss organizational issues. In the following weeks, each team worked independently on their project, following the traditional waterfall process model as described below (Sect. 3.2). This process model was deliberately chosen for three reasons. First, it is simple and clearly structured, and this is the first software development project in a team for most of our students; as a result, they are unfamiliar with the different phases and artifacts involved in software development. Second, the course lasts only 11 weeks, which is very little time to learn and try out a more modern, agile process model. Finally, and most importantly, the more rigid waterfall setup makes it easier to measure the students' use of AI in concrete parts of the project.

At the end of the term, the teams presented their games to the other teams and invited guests, e.g., other students, in a so-called public game presentation. This event also included a competition in which each participant (including all guests) had the opportunity to rate each game presented, and the winning team was awarded a prize.

The project schedule for each team was organized into four sequential phases, as shown in Figure 1 and explained below. Each team had a fixed weekly one-hour meeting with its supervisor and tutor, where team members presented their progress and discussed problems and solutions. Each phase ended at a milestone, and teams were not allowed to proceed to the next phase until they had successfully completed the previous one. It was the supervisor's responsibility to evaluate and approve the team's progress and artifacts. As a result, each team's individual phases could take longer than planned, causing teams to fall behind schedule. This was the case for some teams (especially in the Design phase), but all were able to catch up by the end of the term and participate in the public game presentation with an executable game.

3.2 Project Phases

The phases, shown in Figure 1, were as follows. The first phase was the Preparation phase (2 weeks), during which each team member worked on training tasks and practiced programming techniques. This phase was independent of the specific games that the teams would implement in the later phases. Instead, a realization of the simple Battleship game was available to all teams from the start. Like the games to be realized later, it is a multiplayer 3D game, playable over the network, and implemented on top of the freely available 3D game engine jMonkeyEngine. Battleship was designed to demonstrate the architecture and various concepts of 3D games, and it also served as a starting point and blueprint for the teams' own game development. The task of the Preparation phase was therefore to familiarize the students with Battleship and its structure and to extend it.
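For readers unfamiliar with the engine, the following is a minimal jMonkeyEngine 3 application of the kind a game such as Battleship builds on; the SimpleApplication entry point is the engine's standard idiom, while the scene content is our own placeholder, not actual Battleship code.

    import com.jme3.app.SimpleApplication;
    import com.jme3.material.Material;
    import com.jme3.math.ColorRGBA;
    import com.jme3.scene.Geometry;
    import com.jme3.scene.shape.Box;

    public class HelloGame extends SimpleApplication {

        public static void main(String[] args) {
            new HelloGame().start();    // opens the window and starts the render loop
        }

        @Override
        public void simpleInitApp() {
            // Build a simple 3D scene: one blue box attached to the scene graph.
            Geometry box = new Geometry("Box", new Box(1, 1, 1));
            Material mat = new Material(assetManager,
                    "Common/MatDefs/Misc/Unshaded.j3md");
            mat.setColor("Color", ColorRGBA.Blue);
            box.setMaterial(mat);
            rootNode.attachChild(box);
        }

        @Override
        public void simpleUpdate(float tpf) {
            // Per-frame game logic; tpf is the time per frame in seconds.
        }
    }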
The following three phases follow the waterfall process model. During the Analysis phase (2 weeks), the teams created a product requirement specification based on the one-page sketch of their game. The teams had to produce artifacts such as an extended game description as prose text, a data model described by a class diagram, a list of all use cases and their descriptions based on a use case template, a description of acceptance tests as a table of test cases, and a first version of the user manual, including sketches of the GUI.

The software specification document was created by the teams in the following Design phase (2 weeks) and consisted of a prose description of the game specification, in particular the chosen architecture and the protocol of the required network communication. Recommended diagrams for specifying details were, e.g., class diagrams, BPMN diagrams, sequence diagrams, and state diagrams. System, integration, and unit tests were specified in a test case table.
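To give a flavor of what such a network protocol specification eventually maps to in code: jMonkeyEngine ships with the SpiderMonkey networking layer, in which client and server exchange annotated message classes. The following message type and its fields are our own illustrative example, not taken from the course material.

    import com.jme3.network.AbstractMessage;
    import com.jme3.network.serializing.Serializable;

    // One message type of a client-server game protocol (illustrative).
    @Serializable
    public class MoveTokenMessage extends AbstractMessage {
        private int playerId;
        private int fieldIndex;

        public MoveTokenMessage() { }   // no-arg constructor required by the serializer

        public MoveTokenMessage(int playerId, int fieldIndex) {
            this.playerId = playerId;
            this.fieldIndex = fieldIndex;
        }

        public int getPlayerId()   { return playerId; }
        public int getFieldIndex() { return fieldIndex; }
    }

Both client and server register such classes once at startup (Serializer.registerClass(MoveTokenMessage.class)) so that instances can be sent over the wire.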
In the Coding & Testing phase (about 3 weeks), the games were finally implemented and tested according to the teams' software specification documents. Any deviations from these documents had to be documented, and the final version of the user manual had to be made available.
A team was considered to have successfully completed the programming course when their supervisor had approved their final product and they had presented their project and game to the supervisor in a final team presentation. Upon meeting these criteria, all team members were awarded the course credits.

3.3 Course Evolution

The programming project has evolved over the years, but we have kept its basic organization: teams realize games, use Java and the waterfall process model, and present all games in the public game presentation. The latter has proven to be an essential aspect of the programming project for the following reason: participants do not receive a grade at the end of the course, because it is difficult to evaluate the individual performance of the participants; only successful participation is confirmed. Without the "reward" of a better grade, there is less motivation to commit to the project beyond the minimum effort. The public presentation of one's own game and, above all, the competition between the teams for the prize for the best, most beautiful, and funniest game counteracts this. This is evidenced by the fact that all teams were able to finish their game by the public presentation date, even if they had fallen behind schedule because one of the phases took longer than planned. In addition, many students stated that the public presentation of their game was a highlight of the course and was fun.

All of the major evolutionary steps over the years have been in response to external events. The first major step was triggered by the Covid pandemic: the entire course and teamwork had to be done online. We had to respond to user feedback as described in [3]: unlike before, all teams had to make the same game (a dungeon crawler in 2022 and a racing game in 2023), but with different themes. To prevent the teams from sharing their solutions, all participants had to complete assignments that were evaluated by an automatic program assessment system.

With the end of the pandemic, we abandoned the assignments and again had different games realized. This decision was also influenced by the fact that the teams spent a lot of time solving the assignments, which meant that the actual project work and the quality of the games suffered.

In the evaluation of the last term, a large number of students confirmed that they preferred the return to a programming project without assignments and that it was more fun that way. In fact, the quality of the 2024 games increased significantly compared to the two years with assignments, considering the number of implemented functions and the graphical design of the games.

There was also a growing trend of students adopting generative AI tools in their own workflows. While such tools were not formally integrated into the course curriculum, their use was encouraged during this year's course for specific tasks such as code documentation and text proofreading.

4 Study

A user study was conducted during the final presentations to investigate the extent and nature of AI tool usage. Understanding how students adopt and apply general-purpose AI tools, mostly large language models, during the programming project can provide valuable insights for developing dedicated AI tools to better support our software engineering course. However, these tools are neither formally integrated into the course curriculum nor openly discussed by students, highlighting the need for further investigation to assess their usage and impact.

Rather than advocating a specific approach to AI integration, we explore how students naturally incorporate AI tools into a software development project. We aim to establish a baseline understanding of AI adoption in order to determine what types of support, tutoring, or other AI-driven interventions may be needed in the future.

A recent experiment by Rapaka et al. [30] examines two versions of an educational game. One version followed a traditional design, while the other incorporated an intelligent, AI-based process. The experiment demonstrates that immersive and AI technologies can serve as valuable tools in the development of educational games and entertainment applications. At the same time, previous research [6] highlights that using AI in learning software engineering can significantly increase frustration levels.

To address these questions, we conducted a user study during the October-December term of 2024 to capture a snapshot of AI adoption among the student body. The goal of the study was to determine which aspects of the course students engaged with using AI tools, identify successful applications, and uncover any remaining challenges.
4.1 Demographics

The study included 38 participants, representing approximately 78% of the 49 students enrolled in the course. As shown in Table 1, most of the participants were Computer Science students, followed by Business Informatics and Mathematical Engineering students. The gender and academic program distribution of the study participants closely mirrored that of the overall course population.

Of the 49 students, 43 were in the middle of their bachelor's program, while 6 were nearing its completion. At the time of the study, all participants had completed the required undergraduate courses in computer science and object-oriented programming.

4.2 Methodology

Data for the study were collected using a structured questionnaire distributed during the final team presentation of the programming project. This timing allowed participants to reflect on their experiences during all phases of the project.

Participation in the study was voluntary, and students were informed of their right to withdraw at any time. Anonymity of the data was ensured, and participants were given the opportunity to request that their data be deleted. The study adhered to general data protection regulations to ensure the ethical handling and protection of personal data.

The questionnaire was structured to comprehensively evaluate the adoption and application of AI tools during the programming project, following the Technology Acceptance Model [9]. This framework emphasizes factors that influence technology acceptance, such as perceived usefulness and ease of use.

4.3 Questionnaire

We invited all attending students to participate in the study, regardless of whether or not they used AI tools. Including students who did not use AI was essential, as their insights provide valuable context for building an authentic picture of AI adoption. We asked those students to share their reasons for not using AI, providing perspective on barriers to adoption or other influencing factors.

(1) General question
    Did you use AI tools during your programming project?
    Options: Yes, No. If "No", participants provided reasons such as lack of knowledge about available tools, no access to suitable tools, privacy or trust concerns, and other (open-ended response).

Perceived usefulness was incorporated into the study by asking students to rate how AI tools supported their tasks throughout the programming project phases, which included Preparation, Analysis, Design, and Coding & Testing. The questionnaire captured several dimensions that are helpful in understanding the adoption and use of generative AI tools. Participants were asked to describe the tasks they completed using AI tools and to identify the platforms they used. Questions focused on the effectiveness of AI in Debugging, Documentation, and Code Generation, gauging how well these tools contributed to student success and efficiency in specific project contexts.

(2) Repeated questions for each project phase
    (a) Did you use AI tools during this phase?
        Options: Yes, No.
    (b) What specific AI platforms did you use during this phase?
        Examples: ChatGPT, Copilot, Claude, Gemini, LLaMA, or other (open-ended response).
    (c) What tasks have you worked on using AI tools?
        Participants could choose from predefined tasks or provide their own description in an open-ended response.
    (d) How helpful was AI in this phase?
        Scale: 1 (not helpful) to 5 (very helpful).
    (e) How much time did the AI save you during this phase?
        Options: None, less than 1 hour, 1–3 hours, 4–6 hours, 7–9 hours, 10+ hours.
    (f) Did you face challenges when using AI tools?
        Options: Yes, No. If "Yes", participants could specify challenges such as unclear results, difficulty integrating tools, and other (open-ended response).

Responses on helpfulness were recorded using a five-point Likert scale ranging from "not helpful" to "very helpful". Ease of use was addressed by exploring the challenges of integrating AI tools into the workflow. Open-ended questions invited participants to describe barriers to usability, such as unclear results. This provided qualitative insights into the effort required to effectively integrate the tools into students' workflows.

To assess the overall usefulness of these tools, participants provided a holistic rating of their experience throughout the project. The survey also explored participants' behavioral intentions, including their openness to using AI tools in future projects.

(3) Behavioral intentions and final reflections
    (a) How do you rate the overall usefulness of AI tools in the programming project?
        Scale: 1 (not helpful) to 5 (very helpful).
    (b) Would you use AI tools in future programming projects?
        Scale: 1 (very unlikely) to 5 (very likely).
    (c) Could you have achieved similar results without AI?
        Options: Yes (without extra effort), Yes (with more effort), No (AI was essential).
    (d) What new or additional AI tools or features would you find helpful?
        Open-ended response for suggestions and improvements.
    (e) Do you have any additional comments or feedback regarding the use of AI in your project?
        Open-ended response for qualitative feedback.

5 Results

Of the 38 participants in this study, 34 used AI tools during the programming project, resulting in an adoption rate of 89.4%. The remaining four participants all reported that they did not use AI tools, citing either a lack of need or personal reservations. In addition, one participant stated that they were unaware of the AI tools available.
As shown in Figure 2, the use of AI tools varied significantly across the four phases of the programming project. During the Preparation phase, 22 participants used AI tools, while 16 did not. In the Analysis phase, the use of AI dropped slightly, with 13 participants using the tools and 25 not using them. The Design phase saw the lowest adoption rate, with only 10 participants using AI tools, compared to 28 who abstained. Conversely, the Coding & Testing phase had the highest adoption rate, with 33 participants using AI tools and only 5 opting out. Interestingly, ChatGPT was used by all students who used AI tools, regardless of the phase. The next most commonly used tool was GitHub's Copilot, especially in the coding phase. Other tools were rarely used. Below we take a closer look at the role of AI in each phase.

[Figure 2: AI tool adoption across the project phases. Bar chart of participants per phase: Preparation 22 used / 16 not used; Analysis 13 / 25; Design 10 / 28; Coding & Testing 33 / 5.]

Figure 3 illustrates the perceived helpfulness of AI tools across the four project phases using a Likert scale visualization. During both the Preparation and Coding & Testing phases, a significant proportion of participants rated AI tools as "helpful" or "very helpful". In contrast, the Design phase saw not only the lowest adoption rates, but also the most critical ratings, with "neutral" and "not helpful" responses dominating. This is likely due to the limitations of most AI tools in generating syntactically correct diagrams, such as class or sequence diagrams, which are heavily used in this phase.

Overall, the students showed a willingness to incorporate AI into their workflows and actively attempted to complete the project tasks using AI tools. However, the nature of the tasks and the results varied depending on the phase of the project.

5.1 Preparation Phase

During the Preparation phase, 19 responses were collected about the tasks that participants completed using AI tools. Of these, 18 responses were categorized into specific task groups, while one response was too vague to be included in the analysis. The categorized responses highlighted different ways in which students used AI tools in their work, demonstrating both the diversity of their applications and the challenges they faced.

The most commonly reported task (n=9) was comprehension. Students relied on AI tools to understand an existing code base that formed the basis of their later project. This included working with the jMonkeyEngine, design patterns, and architectural choices, which required the use of AI to explain code snippets and clarify technical concepts. However, comprehension tasks were also frequently associated with challenges. Of the nine participants who used AI for comprehension, six reported encountering "inaccurate results" and two mentioned "missing context". These problems occurred when the AI did not relate to the existing code into which the students were tasked with implementing additional features. Of these participants, three noted issues with "inaccurate results" that required significant corrections to the AI-suggested code snippets. This issue highlights a broader limitation of AI tools in producing reliable code when the underlying logic or requirements are highly specific.

Similarly, code improvement (n=3) emerged as a recurring theme: students used AI to incorporate feedback from their supervisors into their solutions. However, two participants reported frustration with generic or irrelevant suggestions that could not be integrated into the existing code.

The use of AI for documentation (n=3) was also notable, consistent with staff recommendations to use AI to create and refine code documentation. Although documentation tasks were relatively straightforward, one participant reported challenges with "inaccurate results". Debugging (n=2) was surprisingly underrepresented, despite being an introductory task for this phase, in which students were asked to identify bugs in the provided code base. Both participants who used AI for debugging noted difficulties stemming from the AI's inability to understand the larger project scope, resulting in ineffective or misleading suggestions. Finally, one participant mentioned image generation (n=1), although it remains unclear what specific artifact was produced.

These findings highlight the multiple roles that AI tools played during the Preparation phase, but also reveal significant limitations in their contextual understanding and output reliability when working with unfamiliar code. The recurring issues of inaccuracy, lack of context, and overly generic suggestions show that while AI tools can improve productivity, they require careful oversight and integration into specific workflows.

5.2 Analysis Phase

During the Analysis phase, 13 participants reported using AI tools, with 11 providing specific tasks that were categorized into five different groups. None of the responses had to be discarded as vague or uninterpretable.

Requirements specification (n=8) was the most frequently reported task. Here, students created their own version of the requirements document, outlining how the game to be developed should work. We believe that AI helped with these tasks because text generation is one of its strengths, and many AI systems are familiar with the popular games that students adapted into software during the project. Since no complaints were reported, this seems to have been a successful use of AI tools to help formulate precise and comprehensive requirements. This was followed by use case generation (n=4), where AI tools helped create structured lists that outlined each use case's ID, priority, and key details such as purpose, actors, and conditions. This repetitive and systematic task benefited from AI by streamlining the process. However, one complaint was that the results were unreliable and required manual corrections.
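For illustration, a use case entry of the kind described might look as follows (our own example for a Monopoly-style game; the fields mirror those named above):

    ID:             UC-07
    Name:           Buy property
    Priority:       High
    Purpose:        A player acquires an unowned property after landing on it.
    Actors:         Active player, game server
    Precondition:   The player's token is on an unowned property, and the player has sufficient money.
    Postcondition:  The property belongs to the player; the purchase price has been deducted.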
Documentation tasks (n=4) focused primarily on creating user manuals. These required, among other things, detailed descriptions of the game's components, controls, and gameplay elements. As this is also a text-heavy task, it benefited from the same strengths of the AI as those observed in the requirements specification mentioned above. There were no complaints.
[Figure 3: Perceived helpfulness of AI tools per project phase, shown as a diverging Likert chart (Not helpful, Slightly helpful, Neutral, Helpful, Very helpful).]
Diagram generation (n=2) in this phase attempted to generate class diagrams that are simple in design and complexity, as they primarily outline the preliminary classes and structure of the project. However, both mentions were accompanied by complaints that the diagrams were not correct or detailed enough; the AI's difficulty in generating meaningful diagrams quickly became apparent to the students. Finally, GUI mockups (n=2) were used to design and present the user interface at this early stage, collaborating with the "customer" on how the program would look. The strength of AI in generating images definitely played a role here, allowing students to visualize and iterate on their designs. Despite the low number of mentions, several groups used AI-generated images in their projects, which was evident in the final presentations.

While only about one-third of participants reported using AI in this phase, the overall helpfulness was rated positively and few problems were reported. Text-heavy tasks such as requirements specification and documentation clearly benefited from the strengths of AI tools. Challenges such as lack of detail and unreliable output underscore the importance of critically evaluating and refining AI-generated output to align with project requirements.

5.3 Design Phase

Of the 10 participants who reported using AI tools, nine provided specific tasks that were categorized into four groups. Again, no response had to be discarded as uninterpretable.

The most commonly reported task was creating diagrams and charts (n=5), which in this phase included flowcharts as well as class, package, sequence, and state diagrams. These artifacts are meant to visualize the structural and behavioral aspects of the software and serve as a transition from conceptual work to concrete implementation. However, as in the previous phase, diagrams remained problematic: two participants reported insufficient diagrams, and one student specifically mentioned the lack of content in a generated diagram.

Evaluating and comparing design approaches (n=4) was another key task category, which involved evaluating different design strategies to select effective solutions, such as thick-client and thin-client models in a network-based architecture. One participant reported the challenge of designing a prompt with the right amount of context to elicit a sufficient response.

Optimizing design prototypes (n=3) was also mentioned; here, students asked AI tools to improve their ideas. Participants did not report any problems with this task.

Finally, creating images (n=1) was mentioned because one student reportedly continued to improve GUI elements. No problems were reported with this task.

While most of the reported tasks were completed without reported problems, the results must be considered in light of the relatively low adoption of AI tools: only 10 of the 38 students who participated in the study reported using them. Furthermore, as shown in Figure 3, participants rated the helpfulness of AI at this stage the lowest, which may be due to the high demand for diagram-based artifacts.

5.4 Coding & Testing Phase

This phase saw the highest adoption of AI tools, with 33 of 38 participants reporting their use. All 33 participants identified at least one task where AI provided assistance. These were grouped into six tasks. However, problems were prevalent, with 27 participants reporting difficulties and 24 citing one or more challenges in their feedback.

The tasks of Code Generation, Code Improvement, and Code Understanding, which were grouped together due to common themes and frequent concurrent mentions (n=28), were widely used by participants. Because this phase was primarily focused on implementing the designs from the previous phase, students reported a variety of challenges. The most commonly reported challenge, cited by 10 participants, was incorrect code: the generated code often did not integrate with the existing code base, either because of logical errors or because existing classes and methods were overlooked or misinterpreted.
One participant specifically mentioned encountering "hallucinations", where the AI generated incorrect code that relied on framework methods that did not exist. Six participants noted difficulties because some AI models were not trained on domain-specific frameworks or libraries, such as jMonkeyEngine or Lemur, resulting in output that was incompatible with their projects. Three participants reported that AI tools struggled to handle the complexity of their projects because the students could not provide the entire code base or large numbers of classes to the AI tool; this resulted in output that conflicted with existing code and did not integrate seamlessly. Finally, two participants highlighted problems with prompting, attributing their struggles to an inability to effectively communicate their requirements to the AI. Similar challenges were observed across tasks, particularly in code understanding and improvement efforts, as the AI performed better with standard Java but often failed with specialized frameworks.

Documentation tasks (n=24) were also among the most frequently reported, with participants using AI tools to create or refine their project documentation. At this stage, documentation primarily involved the creation of student-written Javadocs for the classes. Encouraged by the staff, many students found the AI to be particularly effective in analyzing the methods and generating comprehensive and contextual documentation. Only one participant reported encountering an erroneous AI-generated document, indicating that the tools performed well in this context.
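The output in question is ordinary Javadoc; a hypothetical example of an AI-drafted comment for a game method (the method and its class are our own invention, not from the students' projects):

    /**
     * Moves the given player's token forward by the number of fields
     * rolled and triggers the action of the field it lands on.
     *
     * @param player the player whose token is moved; must not be null
     * @param roll   the result of the die roll, expected to be between 1 and 6
     * @throws IllegalStateException if the game is not in the move phase
     */
    public void moveToken(Player player, int roll) {
        // ...
    }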
Debugging tasks (n=20) were closely related to code generation, as both required AI tools to work within existing code bases and project frameworks. Five participants reported that the AI struggled with unfamiliar frameworks or libraries, such as jMonkeyEngine, echoing similar issues observed with code generation. Three participants reported incorrect output during debugging, including one instance where the AI "hallucinated" by generating nonsensical code that was incompatible with the context. Other cases involved solutions that did not address the actual problem or conflicted with the existing implementation, complicating integration efforts. One participant also noted difficulties with prompting, particularly in framing complex code problems or generating contextually relevant methods.

Test generation (n=7) was another area where AI tools were used, specifically to test the model created within the Model-View-Controller architecture. While generally effective, some problems were reported. One participant noted that poorly defined tasks for the AI resulted in outputs that did not meet the intended goals. Another participant observed that in some cases the AI produced nonsensical tests. Finally, general questions (n=1) and music generation (n=1) were reported, but no problems with these tasks were documented.
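One reason the model lends itself to AI-assisted testing is that, in an MVC design, it can be exercised without the jMonkeyEngine view or the network layer. A hypothetical JUnit 5 test of the kind reported (the GameModel class and its rules are our own assumptions):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;
    import org.junit.jupiter.api.Test;

    class GameModelTest {

        @Test
        void playerCannotMoveOutOfTurn() {
            // model-only test: no view or network code involved
            GameModel game = new GameModel(2);      // two players
            game.startGame();
            assertThrows(IllegalStateException.class,
                    () -> game.moveToken(1, 4));    // player 1 acts on player 0's turn
        }

        @Test
        void turnPassesToNextPlayerAfterMove() {
            GameModel game = new GameModel(2);
            game.startGame();
            game.moveToken(0, 4);
            assertEquals(1, game.currentPlayer());
        }
    }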
While the Coding & Testing phase saw the highest adoption of AI tools among participants, it also revealed significant challenges, particularly with tasks such as code generation, debugging, and improving existing implementations. The tools proved valuable for automating repetitive tasks and generating initial designs, but recurring problems such as incorrect output, difficulties with domain-specific frameworks such as jMonkeyEngine, and limitations in handling complex code bases underscored the need for careful oversight. Despite these challenges, tasks such as documentation and test generation demonstrated the potential of AI tools to improve productivity when clear prompts and well-defined workflows were used. These findings underscore that while AI tools can significantly assist the Coding & Testing phase, their effective use requires both user expertise and improvements in tool design, particularly for specialized contexts.

5.5 Final Observations

In addition to analyzing specific project phases, participants were asked if they believed they could have achieved the same results without the use of AI tools. Seven participants indicated that they could have achieved similar results without AI, while 24 indicated that they could have done so, but with significantly more effort. Notably, three participants reported that AI was critical to their success. These responses underscore the significant role AI played in assisting participants during the project. While some were confident that they could achieve similar results without AI, the majority acknowledged the significant time and effort saved by integrating these tools into their workflows, as shown in Figure 4. A smaller but notable group felt that AI was essential.

Interestingly, as shown in Figure 5, the overwhelmingly positive perception of the role of AI contrasts with the problems reported throughout the study. Self-reported time savings varied by project phase, with the highest savings in the Coding & Testing phase and the lowest in the Design phase. However, participants frequently encountered challenges such as inaccurate or incomplete output, difficulties integrating AI-generated code with domain-specific frameworks such as jMonkeyEngine, and the inability of AI tools to handle complex project contexts.

Despite these difficulties, respondents were generally positive about AI. The majority recognized the potential of AI to improve productivity, particularly for repetitive or structured tasks such as documentation and test generation. The high likelihood that respondents will use AI tools again in future projects underscores their willingness to integrate AI into their workflows despite its limitations.

6 Conclusions

In this paper we have presented a user study and its results. The background of the user study was the programming project, a software development project course, with a view to integrating AI tools into its curriculum. The participants of the last run of the programming project were asked to use freely available AI tools (such as ChatGPT and GitHub's Copilot) in their project. In the user study at the end of the project, they were asked about their experiences using a structured questionnaire.

The results of the study were ambivalent. The vast majority of participants found the AI tools useful in the project and wanted to continue using them. However, they were considered useful only for text-intensive tasks, such as documentation and coding, albeit with significant drawbacks when coding, as the code generated by the AI tools often could not be integrated into the existing code. Experiences with AI tools were mostly negative when it came to diagrams, such as class or sequence diagrams. Despite these limitations, we observed a strong willingness to adopt AI, suggesting that students will continue to rely on such tools regardless of potential risks and frustrations. This raises concerns about AI dependency, such as skill degradation or the inhibition of basic skill development, and highlights the need for careful integration and monitoring.
[Figure 4: Self-reported time savings through AI per project phase. Figure 5: Diverging Likert chart of responses for "Overall usefulness of AI in this project" and "Likelihood of using AI again" (Strongly disagree to Strongly agree).]
The project has maintained a similar workload and structure across iterations, and previous classes also saw little or no use of AI tools. This suggests that AI adoption was driven by its increased availability and by its encouragement in the course rather than by project demands. It may be worth considering that the adoption rate of AI tools in this study was influenced by the general encouragement of AI use early in the course. However, since the focus of the study is not on adoption rates, but rather on how students integrate AI tools into the programming project, our main takeaway is to understand the patterns of AI use.

We have drawn the following lessons from these results. As we have learned from previous years, students use AI tools even in a project course like our programming project. Therefore, we need to include such tools in the curriculum of the programming project and adapt the training accordingly. To prevent solutions generated by AI tools from being adopted without reflection, and to increase learning success, we plan to pursue the approach of AI-based tutoring in the coming months. Instead of providing direct and sometimes incorrect answers, AI tools should support and motivate students through structured thinking and reflection. Since freely available AI tools have shown weaknesses in code generation, at least in terms of integration into existing code, we will continue to pursue the approach based on RAG [13, 25]. Our goal is to make a context-aware AI tutor a more attractive alternative to the AI tools used by the students in the study. For the time being, we will focus on more text-heavy aspects, as diagrams are not yet well supported. This is particularly unfortunate because diagrams are often used in planning activities in software development. By designing the AI's role as a tutor, we aim to more effectively develop students' skills and competencies, while preventing them from relying on AI tools for definitive answers. Our findings reflect the experiences of the students who participated in this specific programming project. Future studies should look at a wider range of software engineering projects and courses to improve generalizability.

Acknowledgments

We would like to thank our best students for their entertaining and humorous games, which we enjoy testing at the Institute's traditional Christmas party.
The authors wish to acknowledge the use of DeepL Translator, DeepL Write, and Grammarly in the writing of this paper. These tools were used to improve the language of the paper and to translate some parts of the paper from the original German. Of course, the paper remains an accurate representation of the authors' underlying work and novel intellectual contributions.

References

[1] Kirsti Ala-Mutka. 2005. A Survey of Automated Assessment Approaches for Programming Assignments. Comput. Sci. Educ. 15, 2 (2005), 83–102. doi:10.1080/08993400500150747
[2] Kirti Bhandari, Kuldeep Kumar, and Amrit Lal Sangal. 2023. Artificial Intelligence in Software Engineering: Perspectives and Challenges. In 2023 Third International Conference on Secure Cyber Computing and Communication (ICSCCC). 133–137. doi:10.1109/ICSCCC58608.2023.10176436
[3] Uwe M. Borghoff, Mark Minas, and Kim Mönch. 2023. Using Automatic Program Assessment in a Software Development Project Course. In Proceedings of the 5th European Conference on Software Engineering Education, ECSEE 2023, Seeon/Bavaria, Germany, June 19-21, 2023, Jürgen Mottok (Ed.). ACM, 22–30. doi:10.1145/3593663.3593669
[4] Uwe M. Borghoff, Mark Minas, and Kim Mönch. 2024. Automatic Program Assessment, Grading and Code Generation: Possible AI-Support in a Software Development Course. In Artificial Intelligence and Soft Computing - 23rd International Conference, ICAISC 2024, Zakopane, Poland, June 16-20, 2024, Proceedings, Part III (Lecture Notes in Artificial Intelligence 15166), L. Rutkowski et al. (Eds.). Springer, 39–51. doi:10.1007/978-3-031-81596-6_4
[5] Anita D. Carleton, Davide Falessi, Hongyu Zhang, and Xin Xia. 2024. Generative AI: Redefining the Future of Software Engineering. IEEE Softw. 41, 6 (2024), 34–37. doi:10.1109/MS.2024.3441889
[6] Rudrajit Choudhuri, Dylan Liu, Igor Steinmacher, Marco Gerosa, and Anita Sarma. 2024. How Far Are We? The Triumphs and Trials of Generative AI in Learning Software Engineering. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal). 184:1–184:13. doi:10.1145/3597503.3639201
[7] Konstantina Chrysafiadi, Maria Virvou, and George A. Tsihrintzis. 2022. A fuzzy-based mechanism for automatic personalized assessment in an e-learning system for computer programming. Intell. Decis. Technol. 16, 4 (2022), 699–714. doi:10.3233/IDT-220227
[8] Marian Daun and Jennifer Brings. 2023. How ChatGPT Will Change Software Engineering Education. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1, ITiCSE 2023, Turku, Finland, July 7-12, 2023, Mikko-Jussi Laakso, Mattia Monga, Simon, and Judithe Sheard (Eds.). ACM, 110–116. doi:10.1145/3587102.3588815
[9] Fred Davis. 1989. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13, 3 (1989), 319–340. doi:10.2307/249008
[10] Christopher Douce, David Livingstone, and James Orwell. 2005. Automatic test-based assessment of programming: A review. ACM J. Educ. Resour. Comput. 5, 3 (2005), 4 pages. doi:10.1145/1163405.1163409
[11] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 1536–1547. doi:10.18653/v1/2020.findings-emnlp.139
[12] James Finnie-Ansley, Paul Denny, Brett A. Becker, Andrew Luxton-Reilly, and James Prather. 2022. The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. In ACE '22: Australasian Computing Education Conference, Virtual Event, Australia, February 14-18, 2022, Judy Sheard and Paul Denny (Eds.). ACM, 10–19. doi:10.1145/3511861.3511863
[13] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. 2024. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997 (2024). doi:10.48550/arXiv.2312.10997
[14] Carlos G. Hidalgo-Suarez, Víctor A. Bucheli, and Hugo Ordoñez. 2022. Automatic Assessment of Learning Outcomes as a New Paradigm in Teaching a Programming Course: Engineering in Society 5.0. Rev. Iberoam. de Tecnol. del Aprendiz. 17, 4 (2022), 379–385. doi:10.1109/RITA.2022.3217193
[15] Cruz Izu, Carsten Schulte, Ashish Aggarwal, Quintin I. Cutts, Rodrigo Duran, Mirela Gutica, Birte Heinemann, Eileen T. Kraemer, Violetta Lonati, Claudio Mirolo, and Renske Weeda. 2019. Fostering Program Comprehension in Novice Programmers - Learning Activities and Learning Trajectories. In Proceedings of the Working Group Reports on Innovation and Technology in Computer Science Education, ITiCSE-WGR 2019, Aberdeen, Scotland, UK, July 15-17, 2019, Bruce Scharlau, Roger McDermott, Arnold Pears, and Mihaela Sabin (Eds.). ACM, 27–52. doi:10.1145/3344429.3372501
[16] Majeed Kazemitabaar, Justin Chow, Carl Ka To Ma, Barbara J. Ericson, David Weintrop, and Tovi Grossman. 2023. Studying the Effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI 2023, Hamburg, Germany, April 23-28, 2023, Albrecht Schmidt et al. (Eds.). ACM, 455:1–455:23. doi:10.1145/3544548.3580919
[17] Peter Kokol. 2024. The Use of AI in Software Engineering: A Synthetic Knowledge Synthesis of the Recent Research Literature. Inf. 15, 6 (2024), 354. doi:10.3390/INFO15060354
[18] Stephan Krusche and Andreas Seitz. 2018. ArTEMiS: An Automatic Assessment Management System for Interactive Learning. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education, SIGCSE 2018, Baltimore, MD, USA, February 21-24, 2018, Tiffany Barnes, Daniel D. Garcia, Elizabeth K. Hawthorne, and Manuel A. Pérez-Quiñones (Eds.). ACM, 284–289. doi:10.1145/3159450.3159602
[19] Yujia Li et al. 2022. Competition-level code generation with AlphaCode. Science 378, 6624 (2022), 1092–1097. doi:10.1126/science.abq1158
[20] Boris Martinovic and Robert Rozic. 2025. Perceived Impact of AI-Based Tooling on Software Development Code Quality. SN Comput. Sci. 6, 1 (2025), 63. doi:10.1007/s42979-024-03608-4
[21] Antonio Mastropaolo et al. 2023. On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot. In 45th International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE/ACM, 2149–2160. doi:10.1109/ICSE48619.2023.00181
[22] Ekaterina A. Moroz, Vladimir O. Grizkevich, and Igor M. Novozhilov. 2022. The Potential of Artificial Intelligence as a Method of Software Developer's Productivity Improvement. In Proc. 2022 Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus). IEEE, 386–390.
[23] Nathalia Nascimento, Paulo Alencar, and Donald Cowan. 2023. Comparing Software Developers with ChatGPT: An Empirical Investigation. arXiv:2305.11837 (2023). doi:10.48550/arXiv.2305.11837
[24] Christian Nitzl, Achim Cyran, Sascha Krstanovic, and Uwe M. Borghoff. 2024. The Use of Artificial Intelligence in Military Intelligence: An Experimental Investigation of Added Value in the Analysis Process. arXiv:2412.03610 (2024), 1–28. doi:10.48550/arXiv.2412.03610
[25] Oded Ovadia, Menachem Brief, Moshik Mishaeli, and Oren Elisha. 2024. Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs. arXiv:2312.05934 (2024). doi:10.48550/arXiv.2312.05934
[26] Alfonso Piscitelli, Gennaro Costagliola, Mattia De Rosa, and Vittorio Fuccella. 2024. Influence of Large Language Models on Programming Assignments - A User Study. In Proceedings of the 16th International Conference on Education Technology and Computers, ICETC 2024, Porto, Portugal, September 18-21, 2024. ACM, 33–38. doi:10.1145/3702163.3702168
[27] Rohith Pudari and Neil A. Ernst. 2023. From Copilot to Pilot: Towards AI Supported Software Development. arXiv:2303.04142 (2023). doi:10.48550/arXiv.2303.04142
[28] Arun Raman and Viraj Kumar. 2022. Programming Pedagogy and Assessment in the Era of AI/ML: A Position Paper. In COMPUTE 2022, Jaipur, India, November 9-11, 2022, Venkatesh Choppella, Amey Karkare, Chitra Babu, and Sridhar Chimalakonda (Eds.). ACM, 29–34. doi:10.1145/3561833.3561843
[29] Nitin Rane, Saurabh Choudhary, and Jayesh Rane. 2023. Education 4.0 and 5.0: Integrating Artificial Intelligence (AI) for Personalized and Adaptive Learning. Available at SSRN 4638365 (2023). doi:10.2139/ssrn.4638365
[30] Anuj Rapaka, S. C. Dharmadhikari, Kishori Kasat, Chinnem Rama Mohan, Kuldeep Chouhan, and Manu Gupta. 2025. Revolutionizing Learning - A Journey into Educational Games with Immersive and AI Technologies. Entertain. Comput. 52 (2025), 100809. doi:10.1016/j.entcom.2024.100809
[31] Daniel Russo. 2024. Navigating the Complexity of Generative AI Adoption in Software Engineering. ACM Trans. Softw. Eng. Methodol. 33, 5 (2024), 135:1–135:50. doi:10.1145/3652154
[32] Cigdem Sengul, Rumyana Neykova, and Giuseppe Destefanis. 2024. Software Engineering Education in the Era of Conversational AI: Current Trends and Future Directions. Frontiers Artif. Intell. 7 (2024). doi:10.3389/FRAI.2024.1436350
[33] Shashank Srikant and Varun Aggarwal. 2013. Automatic Grading of Computer Programs: A Machine Learning Approach. In 2013 12th International Conference on Machine Learning and Applications, Miami, FL, USA, Vol. 1. 85–92. doi:10.1109/ICMLA.2013.22
[34] Muhammad Waseem, Teerath Das, Aakash Ahmad, Peng Liang, Mahdi Fahmideh, and Tommi Mikkonen. 2024. ChatGPT as a Software Development Bot: A Project-Based Study. In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2024, Angers, France, April 28-29, 2024, Hermann Kaindl, Mike Mannion, and Leszek A. Maciaszek (Eds.). SCITEPRESS, 406–413. doi:10.5220/0012631600003687
[35] Yanming Yang, Xin Xia, David Lo, and John C. Grundy. 2022. A Survey on Deep Learning for Software Engineering. ACM Comput. Surv. 54, 10s (2022), 206:1–206:73. doi:10.1145/3505243