ChatGPT - Physics Class
Ding et al. International Journal of Educational Technology in Higher Education (2023) 20:63
https://doi.org/10.1186/s41239-023-00434-1
*Correspondence: luding@southalabama.edu
1 Department of Counselling and Instructional Sciences, UCOM 3858, University of South Alabama, Mobile, AL 36688, USA
2 Center for Emerging Media Design and Development, Ball State University, Muncie, IN, USA
3 Teacher Education and Learning Sciences, North Carolina State University, Raleigh, NC, USA
4 Department of Physics, University of South Alabama, Mobile, AL, USA

Abstract
The latest development of Generative Artificial Intelligence (GenAI), particularly ChatGPT, has drawn the attention of educational researchers and practitioners. We have witnessed many innovative uses of ChatGPT in STEM classrooms. However, studies regarding students' perceptions of ChatGPT as a virtual tutoring tool in STEM education are rare. The current study investigated undergraduate students' perceptions of using ChatGPT in a physics class as an assistant tool for addressing physics questions. Specifically, the study examined the accuracy of ChatGPT in answering physics questions, the relationship between students' ChatGPT trust levels and answer accuracy, and the influence of trust on students' perceptions of ChatGPT. Our finding indicates that despite the inaccuracy of GenAI in question answering, most students trust its ability to provide correct answers. Trust in GenAI is also associated with students' perceptions of GenAI. In addition, this study sheds light on students' misconceptions toward GenAI and provides suggestions for future considerations in AI literacy teaching and research.
Keywords: GenAI, ChatGPT, Perception, Misconception, Physics problems
Introduction
Generative Artificial Intelligence (GenAI) has overhauled the landscape of educational
practices. GenAI is a subclass of machine learning (ML) algorithms that can learn from
text, images, audio, and video to produce new content based on trained data (Kasneci
et al., 2023). Unlike other supervised algorithms, known as conditional models, GenAI produces artifacts of wide variety and complexity. GenAI rose to worldwide prominence in November 2022, when OpenAI released its chatbot ChatGPT, built on the third major version of its GPT models (GPT-3). The release shocked the world with its capability to produce human-like text and conversations (Hu, 2023); in just two months the platform gained 100 million users and generated a plethora of headlines worldwide.
GPT (Generative Pre-trained Transformer) models are trained using a large amount of
publicly available online digital content. The data used to train the GPT-3 model came
from various sources: the Common Crawl dataset, the expanded WebText dataset, two
internet-based books corpora, and English Wikipedia (Brown et al., 2020). Because ChatGPT was trained on a large corpus of text data with a complex language model (more than 175 billion parameters), it can comprehend human
language and respond to complex and varied prompts while maintaining contextual coherence in conversations (OpenAI, 2022).
Because ChatGPT can perform a wide range of tasks, educators have suggested that it can be used to support teaching across many subjects, including programming (Sun et al., 2022), engineering (Qadir, 2023), journalism and media (Pavlik, 2023), nursing education (O'Connor & ChatGPT, 2023), and business (Alshater, 2022). Beyond subject-specific applications, ChatGPT has been proposed to support teachers in creating syllabi and curricula, to facilitate flipped classrooms (Lo, 2023), and to enable adaptive learning, automated essay grading, and personalized tutoring (Baidoo-Anu & Ansah, 2023). Despite its competence in supporting teaching and
learning, a literature review has revealed varying performance levels of ChatGPT across
different subjects. Notably, it has demonstrated outstanding performance in economics,
and satisfactory performance in programming, but falls short of expectations in math-
ematics (Lo, 2023).
Current knowledge about how students perceive GenAI and how it can be used for
teaching and learning remains limited. It is imperative to examine GenAI from the
student’s perspective to understand how and what pedagogical solutions are needed
to minimize the challenges that GenAI introduces while maximizing its potential for
teaching and learning. In this study, we particularly tested one GenAI—ChatGPT—in
an authentic physics class to understand student perceptions toward GenAI. We imple-
mented ChatGPT in the classroom by utilizing its tutoring assistant potential as sug-
gested by Baidoo-Anu and Ansah (2023). In STEM education, an instructor usually
needs to teach a substantial number of students with various levels of proficiency and
understanding (Karabenick, 2003). Consequently, students' questions are more likely to be left unresolved, leading to increased confusion. In this study, we are particularly interested in investigating how students perceive ChatGPT as a virtual tutor. To address this inquiry, we aim to answer the following research questions: (1) How accurate is ChatGPT in answering students' physics questions? (2) How are students' trust levels in ChatGPT related to the accuracy of its answers? (3) How does trust relate to students' perceptions of ChatGPT?
Background
AI in education
AI has been widely used in education for various purposes prior to the emergence of
ChatGPT. Many intelligent tutor systems have been developed to monitor students’
learning processes, analyze their performance, and provide immediate personalized
instructions and feedback. Some dialogue-based Intelligent tutor systems, such as Auto-
Tutor, not only track students’ knowledge status and engage them in adaptive conver-
sations but also detect and respond to students’ emotional states, such as confusion
or frustration (Graesser, 2016). In addition to intelligent tutor systems, AI has played
other roles in helping students learn. For instance, following the learning-by-teaching
paradigm, AI has been used to simulate virtual teachable agents or tutees, enabling stu-
dents to act as “tutors” to enhance their learning by teaching the AI agent, as exemplified
by the SimStudent chatbot (Matsuda et al., 2013). AI has also been employed to assist
teachers in grading and assessing students’ homework. For instance, automated writing
evaluation (AWE) systems, such as the Writing Pal (W-Pal), were developed to alleviate
teachers’ workload by helping them evaluate essays and generate feedback (McNamara
et al., 2013).
The latest breakthrough in AI, particularly ChatGPT, represents a critical advance-
ment of GenAI. Unlike previous conversational AI chatbots, ChatGPT is built on deep
learning and can learn and generate human-like responses (Sahoo et al., 2023). What distinguishes it from other GenAI is its ability not only to provide a response but also to generate related content based on subsequent questions and prompts derived from its initial responses (Sun et al., 2022). It has been used as a writing
assisting tool to aid students, especially English as a second language (ESL) students, in
receiving feedback on their writing (Su et al., 2023). ESL instructors also use ChatGPT as
a proofreading tool to help students improve the grammatical accuracy of their writing
(Schmidt-Fajlik, 2023). Beyond language support, ChatGPT has served as a pedagogi-
cal tool to foster students’ critical thinking skills in a physics class (Bitzenbauer, 2023).
Accordingly, many discussions have been held on its potential to transform traditional
teaching practices (e.g., Adiguzel et al., 2023; Baidoo-Anu & Owusu Ansah, 2023).
Despite the widespread integration of AI in education, studies have shown that stu-
dents hold mixed attitudes toward AI. On one hand, students acknowledge the benefits
of AI and believe it can be used as a learning tool to provide personalized and immedi-
ate learning support (Chan & Hu, 2023). They also recognize the potential impact of AI
on their disciplines and future careers (Bisdas et al., 2021; Buabbas et al., 2023). On the
other hand, research has revealed students’ concerns about AI, including its accuracy
and transparency (Chan & Hu, 2023), ethical considerations (Gillissen et al., 2022), and
the potential for job displacement (Gong et al., 2019). Understanding students’ percep-
tions of AI in the educational context is essential for the effective integration of AI tools
and technologies in STEM education.
Anthropomorphism
Despite the potential weaknesses, flaws, and biases of AI, students may encounter chal-
lenges in recognizing these issues and could overly rely on or blindly trust AI for impor-
tant decisions or interactions. Even worse, students may hold misconceptions about AI,
such as the belief that “AI is infallible and can be 100% objective” (Bewersdorff et al.,
2023, p.9), or the notion that “AI is a human mind but in a machine” (Mertala et al., 2022,
p.6).
One factor contributing to these misconceptions is anthropomorphism, which
involves attributing human-like characteristics to AI, such as feelings, mental states, or
behavioral characteristics (Airenti, 2015; Epley et al., 2007). On one hand, anthropomor-
phism increases students’ perceived social connection with AI and their willingness to
adopt AI technology (Cao et al., 2022). On the other hand, anthropomorphism may mis-
lead students into believing that AI systems are both trustworthy and capable of per-
forming any task.
Particularly, “Warmth” and “Competence” are two perceived anthropomorphic fea-
tures of AI that may cause this misconception. “Warmth” refers to the “perceived friend-
liness, helpfulness, sincerity, trustworthiness, and morality” (Pizzi et al, 2023, p.1375)
of AI systems. A kind and caring AI system can increase emotional trust in AI (Aaker
et al., 2012). Due to perceived “Warmth”, people are more likely to establish and main-
tain effective connections with the AI system (Gonzalez-Jiminez, 2018). “Competence”
refers to the perceived problem-solving abilities of AI. A highly intelligent or compe-
tent AI system increases people’s rational trust and makes them believe that AI can truly
help them achieve their goals. It is worth noting that both “Warmth” and “Competence”
influence people’s trust in AI as well as their skepticism toward AI (Pizzi et al., 2023).
These anthropomorphic features may lead to the misconception of a “super AI”—an AI
that possesses human consciousness and can automatically solve problems in any area
(Kaplan & Haenlein, 2019).
In this study, we explore students' perceptions of ChatGPT used as a virtual tutor in a physics class, particularly their perceptions of and trust in ChatGPT for answering their questions. Understanding these dynamics can better assist researchers and educators in integrating GenAI into teaching and learning.
Method
Participants
The study took place in an introductory college-level physics class at a public university in the southern United States. A total of 40 students enrolled in the class agreed to participate in this study, of which 36 self-identified as female (90%), 3 as male (8%), and 1 as non-binary (3%). The majority of participants were Caucasian/White (n = 29, 73%), followed by African American/Black (n = 9, 23%), with 1 Asian and 1 Native American. The average age was 21, ranging from 19 to 38. Nearly all participants had no prior experience using ChatGPT; three had used it only for testing or occasional purposes.
Procedure
This study was granted an exemption by the Institutional Review Board (IRB) of the university where it was carried out before its implementation.
The class was taught in a 16-week semester covering three physics concepts with each
concept being assessed by an exam. This study was implemented for the second exam
consisting of 50 multiple-choice questions measuring participants’ understanding of
light, radioactivity, and related information. The participants took the exam in an in-per-
son class session and the responses were graded as correct or incorrect. The participants
then had a week and a half to complete a makeup exam assignment to regain lost cred-
its from the exam by “chatting” with ChatGPT. After they completed the assignment,
the participants were instructed to complete an end-of-study survey asking about their
experience in learning with ChatGPT for this assignment. To prevent any misconceptions about AI or physics that ChatGPT might have introduced in this study, the instructor reviewed commonly missed questions at the end of the study and emphasized that AI, particularly ChatGPT, can be error-prone.
Materials
Make-up exam assignment The makeup exam assignment was designed as a fillable form
and a maximum of 10 questions were allowed for the participants to make up for their
exam. Appendix A shows the fillable form for one question. The first page of the assignment provided four-step instructions on how to complete the assignment, including how to create a ChatGPT account and what to do for each question answered incorrectly on the exam (Fig. 1). The participants were also informed that ChatGPT is an artificial intelligence system that can make mistakes, and they were specifically asked to think critically about the answers it provided. The exam key was released prior to the assignment of the makeup exam.
For each question, the participants needed to follow a specific order when “chat-
ting” with ChatGPT; however, they also had a certain level of flexibility. As shown in
Fig. 2, first, the participants were asked to ask the original question to ChatGPT and
get an answer from it; once ChatGPT provided an answer, the participants needed to
check if their original answers (i.e., incorrect answers) were the same as ChatGPT’s
answers. If the answers were consistent, the participants were asked to decide whether they agreed or disagreed with ChatGPT's answers and to provide the rationales for their decisions; if the answers were inconsistent, the participants needed to tell ChatGPT their original (incorrect) answers and then again decide whether they agreed or disagreed with ChatGPT's responses, once more providing the rationales for their decisions. Aside from the provided prompts, students were allowed to ask ChatGPT any questions they had during the entire conversation.
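To make the branching in this procedure easier to follow, the sketch below summarizes the per-question flow in Python; the function name, inputs, and messages are illustrative only, since the actual assignment was a fillable form rather than code.

```python
def makeup_question_flow(original_answer: str, chatgpt_answer: str) -> list[str]:
    """Illustrative summary of the per-question dialogue protocol described above."""
    steps = ["Pose the original exam question to ChatGPT and record its answer."]
    if chatgpt_answer == original_answer:
        # ChatGPT's answer matches the student's original (incorrect) answer.
        steps.append("State whether you agree or disagree with ChatGPT and give a rationale.")
    else:
        # ChatGPT's answer differs: share the original answer, then read its follow-up response.
        steps.append("Tell ChatGPT your original answer and read its follow-up response.")
        steps.append("State whether you agree or disagree with ChatGPT and give a rationale.")
    steps.append("Optionally ask ChatGPT any further questions about the problem.")
    return steps

# Example: the student originally chose "B" and ChatGPT answered "C".
print("\n".join(makeup_question_flow("B", "C")))
```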
Survey The end-of-study survey was administered online via Qualtrics. The sur-
vey consisted of 13 Likert-scale items for 3 subscales, with responses ranging from 1
(strongly disagree) to 5 (strongly agree). Demographic information was also collected
through the survey. At the end of the survey, the participants were required to draw a
visual representation of their perception of ChatGPT. Drawing as a research method has
been extensively employed to comprehend students’ perceptions of scientists and has
proven to be an effective approach that surpasses the limitations of verbal communica-
tion and provides in-depth insights (Finson, 2002).
Table 1 shows the measured concepts of the Likert-scale items, the number of items
for each subscale, a sample item from each scale, and Cronbach’s α of each subscale.
The subscales of perceived usefulness and perceived ease of use were from Davis’s (1989)
scales and were slightly modified to fit the current study. The items measuring partic-
ipants’ continuous intention to use ChatGPT in the future were modified from items
developed by Davis (1989) and published in Falode’s (2018) study. The reliabilities for the
subscales used in this study were all at an acceptable level (Hair, 2009) and ranged from
0.851 to 0.931.
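For readers unfamiliar with the reliability statistic, the sketch below computes Cronbach's α directly from an item-level response matrix; the matrix shown is hypothetical and does not reproduce the study's survey data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of Likert-scale scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items in the subscale
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-point Likert responses for a three-item subscale (rows = participants).
demo = np.array([[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 3], [4, 4, 5]])
print(round(cronbach_alpha(demo), 3))
```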
Fig. 3 The elbow point for the clustering analysis of the participants’ trust levels
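Figure 3 refers to an elbow analysis used to cluster participants by trust level. A minimal sketch of that kind of analysis, assuming participants are clustered on their rates of agreement with ChatGPT's answers (an assumption on our part; the values below are invented), is shown here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-participant agreement rates (proportion of ChatGPT answers agreed with).
agreement = np.array([1.0, 0.95, 0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.3, 0.2]).reshape(-1, 1)

# Fit k-means for k = 1..6 and record the within-cluster sum of squares (inertia);
# the "elbow" where the curve flattens suggests the number of clusters. Three clusters
# would correspond to the Distrust, Partial Trust, and Trust groups discussed below.
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(agreement)
    print(k, round(km.inertia_, 3))
```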
Table 2 Descriptive statistics for the accuracy of ChatGPT's answers at trust levels (columns: Group, N, Agreement Mean and SD, Accuracy Mean and SD)
A one-way ANOVA was conducted to compare the accuracy of ChatGPT's answers across the three trust groups. The accuracy of the answers was normalized as the percentage of questions correctly answered by ChatGPT out of the total number of questions asked. All assumptions were checked and met. The ANOVA was significant, F (2, 37) = 6.383, p = 0.004, η2 = 0.257.
A post hoc Tukey HSD test indicated that the accuracy of ChatGPT’s answers of the
Distrust group was significantly lower than that of the Partial Trust group (p = 0.010)
and the Trust group (p = 0.003). However, there were no significant differences in answer
accuracy between the Partial Trust group and the Trust group (p = 0.671). Table 2 shows
descriptive statistics for the participants’ trust level and the accuracy of ChatGPT’s
answers. Even though the accuracy of ChatGPT's answers was only 82% correct, the participants in the Trust group agreed with the answers 100% of the time. On the other hand, the accuracy of ChatGPT's answers was slightly higher than the agreement levels in the Partial Trust group and Distrust group.
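As an illustration of this analysis, the following sketch runs a one-way ANOVA and a Tukey HSD post hoc test in Python; the accuracy values and group sizes are invented for demonstration and are not the study's data.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical normalized accuracy scores (% of ChatGPT answers that were correct)
# for the three trust groups.
distrust = np.array([55, 60, 65, 70, 58])
partial = np.array([75, 80, 78, 85, 82])
trust = np.array([80, 85, 82, 88, 79])

f_stat, p_val = stats.f_oneway(distrust, partial, trust)
print(f"F = {f_stat:.3f}, p = {p_val:.3f}")

# Tukey HSD pairwise comparisons between the three groups.
scores = np.concatenate([distrust, partial, trust])
groups = (["Distrust"] * len(distrust) + ["Partial Trust"] * len(partial)
          + ["Trust"] * len(trust))
print(pairwise_tukeyhsd(scores, groups))
```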
To assess if there were differences in the perception of ChatGPT among the three
levels of trust groups, a one-way MANOVA (Multivariate analysis of variance) was
performed to determine if the three groups differed in the perception of ChatGPT in
terms of perceived usefulness, perceived ease of use, and intention to use in the future.
Assumptions of MANOVA were checked and were met. A significant difference was
found in the perception of ChatGPT based on trust levels, F (6, 70) = 2.478, p = 0.031;
Wilk’s lambda = 0.680, η2 = 0.175. A post hoc Tukey HSD test revealed that the Trust group perceived ChatGPT as significantly easier to use than the Partial Trust group (p = 0.026), and a borderline significant difference was found among the three groups in terms of continuous intention to use ChatGPT in the future (p = 0.059), with the Trust group showing a higher intention to use it than the Partial Trust group. Table 3 shows
descriptive statistics for the participants’ perception of ChatGPT in terms of trust levels.
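A one-way MANOVA of this form can be sketched with statsmodels; the perception scores and trust-group labels below are illustrative only and do not reproduce the study's data.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical subscale scores per participant for the three perception measures.
df = pd.DataFrame({
    "usefulness":  [4.2, 3.8, 4.5, 3.1, 2.9, 4.8, 3.5, 4.0, 2.7, 4.4],
    "ease_of_use": [4.5, 3.9, 4.7, 3.0, 2.8, 4.9, 3.6, 4.1, 2.5, 4.6],
    "intention":   [4.0, 3.5, 4.6, 2.9, 2.6, 4.7, 3.3, 3.9, 2.4, 4.3],
    "group": ["Trust", "Partial", "Trust", "Distrust", "Distrust",
              "Trust", "Partial", "Trust", "Distrust", "Partial"],
})

# Do the trust groups differ on the combined perception measures (Wilks' lambda, etc.)?
fit = MANOVA.from_formula("usefulness + ease_of_use + intention ~ group", data=df)
print(fit.mv_test())
```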
Table 3 Descriptive statistics for the perception of ChatGPT at trust levels (rows: perception concepts; columns: Mean and SD for the Trust, Partial Trust, and Distrust groups)
Unfortunately, only 28 participants submitted their drawings, and one file was unreadable and could not be opened. Therefore, 27 drawings were included in the
analysis. A deductive analysis approach was adopted (Bingham & Witkowsky, 2022).
That is, drawing from the literature on the anthropomorphism of AI as well as viewing
the participants’ drawings repeatedly, the first author of this study developed a coding
scheme that mainly focused on two aspects of the drawings—perceived humanness and
perceived warmth (Belanche et al., 2021). Perceived humanness was coded as Human
and Robot/Machine; perceived warmth was coded as Positive, Negative, and Neu-
tral/No Expression. Each category in the coding scheme was provided with a detailed
description and this coding scheme was employed by a second researcher to code the
participants’ drawings. An interrater reliability analysis using the Kappa statistic was
performed to determine consistency between the two researchers. The reliability for perceived humanness was 0.723 (p < 0.001) and the reliability for perceived warmth was 0.875 (p < 0.001). Both were at substantial agreement levels (Landis & Koch, 1977).
The two researchers then met to address any inconsistencies in the codes until an agree-
ment was reached.
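The interrater statistic can be computed with scikit-learn's cohen_kappa_score; the two code sequences below are hypothetical and stand in for the researchers' actual coding data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical perceived-humanness codes assigned by the two researchers to the same drawings.
rater_1 = ["Robot", "Robot", "Human", "Robot", "Human", "Robot", "Robot", "Human"]
rater_2 = ["Robot", "Robot", "Human", "Robot", "Robot", "Robot", "Robot", "Human"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.3f}")
```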
Overall, as shown in Table 4, the majority of the participants perceived ChatGPT to be
a Robot/Machine and to be either positive or neutral. Two Chi-square analyses were car-
ried out to assess if there was any correlation between trust levels, perceived humanness,
and perceived warmth. Due to the violation of assumptions for Chi-square tests, we
adopted the Likelihood Ratio instead of Pearson Chi-square statistics as recommended
by Field (2009). It appeared that trust levels in ChatGPT were significantly correlated with perceived humanness, χ2 (2, N = 27) = 7.37, p = 0.025; compared with the other two groups, the majority of the participants in the Trust group perceived ChatGPT as more of a Robot/Machine than a Human.
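The likelihood-ratio (G) statistic used here can be obtained in scipy by passing lambda_="log-likelihood" to chi2_contingency; the trust-by-humanness counts below are hypothetical and do not reproduce the study's contingency table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3 x 2 contingency table: trust group (rows) by perceived humanness
# (columns: Robot/Machine, Human).
table = np.array([
    [10, 1],  # Trust
    [5, 5],   # Partial Trust
    [2, 4],   # Distrust
])

# lambda_="log-likelihood" requests the likelihood-ratio statistic instead of Pearson's chi-square.
g_stat, p_val, dof, expected = chi2_contingency(table, lambda_="log-likelihood")
print(f"G = {g_stat:.2f}, df = {dof}, p = {p_val:.3f}")
```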
Other than perceiving ChatGPT as a human or robot, several other misconceptions found in people's understanding of other types of AI systems also appeared in the participants' drawings. First, some participants directly or indirectly indicated in their drawings that ChatGPT possesses super-intelligence or "magic" abilities.
Discussion
This study explores students’ perceptions toward using ChatGPT as a virtual tutor in a
physics class. Our findings carry similarities and differences relative to previous research
on other AI-based systems. Consistent with previous research on accuracy issues of other AI systems (Mhlanga, 2023), in our study ChatGPT provides only 85% accuracy in solving physics questions, particularly on light- and radioactivity-related topics. Although students are specifically informed that AI can make mistakes and the exam keys are even provided, almost half of the students still agree with all the answers provided by ChatGPT regardless of whether the answers are correct or incorrect. Those students also perceive ChatGPT as easy to use and are more likely to use it in the future. Previous research has found that people's perceived ease of use of an AI-based technology enhances their trust in that technology (Qin et al., 2020); our findings suggest that the reverse relationship also exists. Nev-
ertheless, caution is warranted when it comes to employing GenAI as a virtual tutor.
Any incorrect information offered by GenAI can lead to misconceptions in learning in
the future. Misconceptions are notoriously challenging to correct in learning; once a misconception has formed, it is very hard to change (Smith
et al., 1993). Given the language power of GenAI (particularly ChatGPT) and the lack of
transparency regarding how GenAI works, people without AI literacy training are more
likely to blindly trust incorrect information provided by AI systems (Lockey et al., 2021).
Adding the blind trust of AI systems on top of science misconceptions when using AI
systems to teach would make the science misconceptions even more robust and more
challenging to address.
In addition, our findings reveal that students hold several misconceptions about Chat-
GPT that are consistent with those observed toward other AI systems. The first preva-
lent misconception is the anthropomorphic conception. Many students draw ChatGPT
in the form of a human or a machine that has a brain. The ability of ChatGPT to engage
in human-like conversations may mislead students into perceiving ChatGPT as human/
human-like or possessing human-like cognitive abilities. This misconception has per-
sisted throughout the history of computer advancement and is not only limited to AI
(Mertala et al., 2022; Rücker & Pinkwart, 2016).
The second most prominent misconception shown in our data is that students con-
fuse ChatGPT with robots. This misconception could link back to the anthropomor-
phizing of ChatGPT and AI systems in general as both embody an abstract concept of
GenAI as a concrete representation of a human or human-like object. Mass media may
also have played a critical role in forming the misconception about ChatGPT as a robot
given the portrayal of general AI characters in Western movies such as the Star Wars
series (e.g., C-3PO, R2-D2, and BB-8), and Marvel Studios productions (e.g., J.A.R.V.I.S.,
F.R.I.D.A.Y., and Ultron).
Another misconception that appears in our study is that students seem to believe
ChatGPT has super-intelligence and can perform tasks in a “magic” way. This miscon-
ception mostly emerges from the Trust group in which students blindly trust ChatGPT’s
answers without questioning their accuracy. Previous research conducted with K-12
teachers found that their participants believed that AI-enhanced educational technolo-
gies should be perfect and make no errors (Nazaretsky et al., 2021). This result indirectly
aligns with our finding that if people believe AI should be perfect then they would trust
the AI’s output (i.e., ChatGPT’s answers) uncritically.
Contrary to the findings reported from previous studies, we find that the students who
perceive ChatGPT as being human or human-like are primarily from the Partial Trust
and Distrust groups, whereas the students who perceived ChatGPT as being a machine
or robot are majorly from the Trust group. Previous research has shown that individuals
were more likely to trust AI-enhanced technologies when they perceived them as pos-
sessing human-like characteristics and exhibiting positive attributes such as warmth and
friendliness (Cheng et al., 2022; Glikson & Woolley, 2020). One possible reason for the difference between our findings and those reported in previous studies could be that those studies were conducted in industry environments rather than educational settings.
Implications
Our findings suggest both pedagogical and design implications. With the development
and increased popularity of AI, our findings support the call for teaching AI literacy to
students to maximize the potential AI could bring to education, especially GenAI.
First, we find that many students blindly trust ChatGPT’s answers. Therefore, in AI
literacy education, we suggest first and foremost improving students’ critical thinking
skills in assessing the information from their surroundings. This is not only limited to
text information they receive from GenAI such as ChatGPT and Google Bard, but also
includes images, audio, and videos (e.g., deep fakes). Students need to be able to criti-
cally analyze the information they receive from GenAI and know how to find reliable
resources to verify the validity of the information. It is also important for them to be
able to recognize artifacts created by GenAI (Michaeli et al., 2023). Second, an intriguing
finding from our study is that among the 206 additional questions asked by the students,
ChatGPT changed its original answers 41 times. One reason for this variability in ChatGPT's answers could be how the students phrased their questions. ChatGPT has the unique ability to not only provide a response but also generate
related content based on subsequent questions and prompts (Sun et al., 2022). There-
fore, prompt engineering is indeed needed in AI literacy education to help students
understand how to ask proper questions that yield the desired information. This skill
is particularly important when tackling open-ended science problems that do not have
definitive answers.
Third, anthropomorphism and, by extension, equating AI to robots seem to have been the predominant misconceptions persistently shown across all different types of AI, including GenAI in our study. When designing materials to teach AI literacy, pre-
vious pedagogical strategies to address misconceptions in science education, such as
conceptual change (Daniel & Carrascosa, 1990), can be adapted to address this miscon-
ception in AI. Fourth, some students in our study manifested the mindset that Chat-
GPT has super-intelligence and is infallible. Therefore, in AI literacy education, students
need to develop a warranted judgment through understanding how AIs work and need
to be aware that AI, including GenAI, largely relies on the data used for training their
algorithms and that AI can make critical mistakes and be very error-prone (Buolamwini
& Gebru, 2018). In relation to this finding from our study (i.e., super-intelligence), it is
also imperative to allow students to differentiate the concepts of general AI versus nar-
row AI. The super-intelligence mentality implies the concept of general AI; however, the
majority of existing AI systems are tailored to specific domains and have clear bounda-
ries within which they operate effectively, such as utilizing natural language processing
(NLP) for answering questions or employing computer vision (CV) to identify individu-
als by their facial features (Kim et al., 2023). Last but not least, some students in our study equate ChatGPT with a search engine or Siri, suggesting that AI literacy education should strongly emphasize that not all AI systems work in the same way.
Our results also provide implications for designing GenAI for educational purposes.
The most used large language model (LLM) is GPT-3 (Kasneci et al., 2023), and GPT-3 is trained on online data rather than scientifically vetted data. As a result, GPT-3 is error-prone and is not entirely reliable when it comes to addressing scientific concepts. In our study, ChatGPT performed at only an 85% accuracy level and sometimes changed its correct answers to incorrect ones when additional questions were asked.
Furthermore, due to a lack of AI knowledge, many students heavily relied upon the
answers provided by the model. This suggests that caution should be exercised when
using it directly for teaching sciences. Therefore, in future design and employment of
GenAI for teaching and learning, techniques such as explainable AI or interpretable AI
can be considered to “white box” how models work and allow students to make war-
ranted judgments. Our findings also suggest that when designing GenAI for facilitat-
ing teaching and learning, especially utilizing it as a virtual tutor, designers may want
to shy away from making it too anthropomorphic. The anthropomorphic characteristics
of ChatGPT may have contributed to the students’ misconception of perceiving it as a
human or human-like entity in our study. Once this misconception is formed, it will be
challenging to correct it. Considering the fact that a majority of students from the Trust
group perceive ChatGPT as a know-it-all or a magic machine or robot, it would be ben-
eficial to avoid using persuasive communication cues in designing GenAI for teaching
and learning that could trigger authority heuristics in students, even when the generated
information is incorrect.
Limitations
Several limitations in this study could be improved in future studies. First, the majority
of the students enrolled in the study were female and the study was conducted within
one physics class. It has been found that gender can play a critical role in individuals’ per-
ceptions of AI. Females are more likely than males to have less knowledge about AI and to hold anthropomorphic beliefs about it (Ding et al., 2023). This may have limited the generalizability of the results found in this study. Future studies can benefit from investigating a more
balanced sample to verify the results reported in this study. Second, more participants
who have more experience with AI systems could have been recruited for the study to
test if there are any differences in the trust of ChatGPT’s answers between a more expe-
rienced group and a less experienced group. Third, we could not interview students for
additional questions about their drawings of ChatGPT. Students might have intended to
convey specific details or nuances in their drawings that are not immediately apparent
through coding alone. Through solely coding students’ drawings, some information may
have been lost or the drawing could have been wrongly interpreted by the researchers.
Follow-up interviews could be beneficial to confirm the accuracy of our interpretation
of students’ drawings and to verify the findings of this study. For instance, the wizard hat
on the robot may indicate a misconception that AI has a magic power as we interpreted.
Interviews could help confirm the accuracy of our interpretation, clarifying whether the
hat actually suggests a magic power or if it is simply a cosmetic element. Finally, students
in this study worked individually. Considering the potential value of group discussions,
conducting focus groups in the future could offer a broader perspective. Exploring the
impact of potential group dynamics might shed light on whether a collaborative setting
could make a significant difference in the results, enriching the overall understanding of
students’ perceptions and experiences with ChatGPT.
Conclusion
Large language models (LLMs) offer many opportunities for assisting teaching and learning and hold great potential for researchers to develop and enhance the models to fulfill future educational needs. In this study, we tested ChatGPT's performance in answering physics questions and examined students' perceptions of ChatGPT. ChatGPT was used in
an undergraduate-level introductory physics class as a virtual tutor to address questions
in an exam that were incorrectly answered by students. ChatGPT provided 85% accuracy; however, it would occasionally change its answers from correct to incorrect (and vice versa) when additional questions were asked. Students held several
misconceptions of ChatGPT that were similar to those found in the studies conducted
with other forms of AI (e.g., anthropomorphism, AI thinks the same as humans, AI has
super-intelligence). Almost half of the students trusted ChatGPT’s answers regardless of
their accuracy and the majority of them believed ChatGPT was a know-it-all Machine/
Robot. Those students also found ChatGPT easy to use and were more likely to use it in the future compared with the Partial Trust group and Distrust group.
Appendix A
Acknowledgements
We thank all the students who took their valuable time to participate in this study and Removed for Blinded Review for
their edits to the manuscript. In addition, we would like to express our sincere appreciation to all the reviewers for their
comments and feedback to improve this paper.
Author contributions
LD: Conceptualization, Methodology, Formal analysis, Project administration, Writing—Original Draft, Writing—
Reviewing & Editing; TL: Writing—Original Draft, Writing—Reviewing & Editing; SJ: Writing—Reviewing & Editing;
AG—Investigation.
Funding
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial inter-
est or non-financial interest in the subject matter or materials discussed in this manuscript.
Declarations
Competing interests
The authors declare that they have no competing interests.
References
Aaker, J. L., Garbinsky, E. N., & Vohs, K. D. (2012). Cultivating admiration in brands: Warmth, competence, and landing in
the “golden quadrant.” Journal of Consumer Psychology, 22(2), 191–194.
Adiguzel, T., Kaya, M. H., & Cansu, F. K. (2023). Revolutionizing education with AI: Exploring the transformative potential of
ChatGPT. Contemporary Educational Technology, 15(3), ep429.
Airenti, G. (2015). The cognitive bases of anthropomorphism: From relatedness to empathy. International Journal of Social
Robotics, 7(1), 117–127. https://doi.org/10.1007/s12369-014-0263-x
Alshater, M. (2022). Exploring the role of artificial intelligence in enhancing academic performance: A case study of Chat-
GPT. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4312358
Baidoo-Anu, D. & Owusu Ansah, L (2023). Education in the era of generative artificial intelligence (AI): Understanding
the potential benefits of ChatGPT in promoting teaching and learning. SSRN. https://www.researchgate.net/publi
cation/369385210
Belanche, D., Casaló, L. V., Schepers, J., & Flavián, C. (2021). Examining the effects of robots’ physical appearance, warmth,
and competence in frontline services: The Humanness-Value-Loyalty model. Psychology and Marketing, 38(12),
2357–2376. https://doi.org/10.1002/mar.21532
Bewersdorff, A., Zhai, X., Roberts, J., & Nerdel, C. (2023). Myths, mis- and preconceptions of artificial intelligence: A review of the literature. Computers and Education: Artificial Intelligence, 100143.
Bingham, A. J., & Witkowsky, P. (2022). Deductive and inductive approaches to qualitative data analysis. In C. Vanover,
P. Mihas, & J. Saldaña (Eds.), Analyzing and interpreting qualitative data: After the interview (pp. 133–146). SAGE
Publications.
Bisdas, S., Topriceanu, C. C., Zakrzewska, Z., Irimia, A. V., Shakallis, L., Subhash, J., ... & Ebrahim, E. H. (2021). Artificial intel-
ligence in medicine: a multinational multi-center survey on the medical and dental students’ perception. Frontiers in
Public Health, 9, 795284.
Bitzenbauer, P. (2023). ChatGPT in physics education: A pilot study on easy-to-implement activities. Contemporary Educa-
tional Technology, 15(3), ep430.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agar-
wal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D.
(2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
http://arxiv.org/abs/2005.14165
Buabbas, A. J., Miskin, B., Alnaqi, A. A., Ayed, A. K., Shehab, A. A., Syed-Abdul, S., & Uddin, M. (2023). Investigating Students’
Perceptions towards Artificial Intelligence in Medical Education. Healthcare, 11, 1298.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification.
In S. A. Friedler & C. Wilson (Eds.), Conference on fairness, accountability and transparency (pp. 77–91). PMLR.
Chan, C. K. Y., & Hu, W. (2023). Students’ Voices on Generative AI: Perceptions, Benefits, and Challenges in Higher Educa-
tion. arXiv preprint arXiv:2305.00290
Chatterjee, J., & Dethlefs, N. (2023). This new conversational AI model can be your friend, philosopher, and guide... and
even your worst enemy. Patterns, 4(1).
Cheng, X., Zhang, X., Cohen, J., & Mou, J. (2022). Human vs. AI: Understanding the impact of anthropomorphism on
consumer response to chatbots from the perspective of trust and relationship norms. Information Processing and
Management. https://doi.org/10.1016/j.ipm.2022.102940
Daniel, G.-P., & Carrascosa, J. (1990). What to do about science “misconceptions.” Science Education, 74(5), 531–540.
Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS
Quarterly, 13(3), 319–340.
Ding, L., Li, T., & Turkson, A. (2023). (Mis)conceptions and perceptions of artificial intelligence: A scoping review. Manu-
script Submitted for Publication.
Epley, N., Waytz, A., & Cacioppo, J. T. (2007). On seeing human: A three-factor theory of anthropomorphism. Psychological
Review, 114(4), 864–886. https://doi.org/10.1037/0033-295x.114.4.864
Falode, O. (2018). Pre-service teachers’ perceived ease of use, perceived usefulness, attitude, and intentions towards
virtual laboratory package utilization in teaching and learning of physics. Malaysian Online Journal of Educational
Technology, 6(3), 63–72. https://doi.org/10.17220/mojet.2018.03.005
Ferrara, E. (2023). Should ChatGPT be biased? challenges and risks of bias in large language models. arXiv preprint arXiv:
2304.03738.
Field, A. (2009). Discovering statistics using SPSS. Sage publications.
Finson, K. D. (2002). Drawing a scientist: What we do and do not know after fifty years of drawings. School Science and
Mathematics, 102(7), 335–345. https://doi.org/10.1111/j.1949-8594.2002.tb18217.x
Gillissen, A., Kochanek, T., Zupanic, M., & Ehlers, J. (2022). Medical students’ perceptions towards digitalization and artificial
intelligence: A mixed-methods study. Healthcare, 10(4), 723.
Glikson, E., & Woolley, A. W. (2020). Human trust in artificial intelligence: Review of empirical research. Academy of Man-
agement Annals, 14(2), 627–660. https://doi.org/10.5465/annals.2018.0057
Gong, B., Nugent, J. P., Guest, W., Parker, W., Chang, P. J., Khosa, F., & Nicolaou, S. (2019). Influence of artificial intelligence
on Canadian medical students’ preference for radiology specialty: A national survey study. Academic Radiology, 26(4),
566–577.
Gonzalez-Jiminez, H. (2018). Taking the fiction out of science fiction: (Self-aware) robots and what they mean for society,
retailers and marketers. Futures, 98, 49–56. https://doi.org/10.1016/j.futures.2018.01.004
Graesser, A. C. (2016). Conversations with AutoTutor help students learn. International Journal of Artificial Intelligence in
Education, 26, 124–132.
Hair, J. F. (2009). Multivariate data analysis (7th ed.). Prentice Hall.
Hancer, E., & Karaboga, D. (2017). A comprehensive survey of traditional, merge-split and evolutionary approaches
proposed for determination of cluster number. Swarm and Evolutionary Computation, 32, 49–67. https://doi.org/10.
1016/j.swevo.2016.06.004
Hu, K. (2023). ChatGPT sets record for fastest-growing user base. Reuters.
Kaplan, A. M., & Haenlein, M. (2019). Siri, siri, in my hand: Who’s the fairest in the land? On the interpretations, illustrations
and implications of artificial intelligence. Business Horizons, 62(1), 15–25. https://doi.org/10.1016/j.bushor.2018.08.
004
Karabenick, S. A. (2003). Seeking help in large college classes: A person-centered approach. Contemporary Educational
Psychology, 28(1), 37–58. https://doi.org/10.1016/S0361-476X(02)00012-7
Kasneci, E., Sessler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hül-
lermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., …
Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education.
Learning and Individual Differences, 103, 102274. https://doi.org/10.1016/j.lindif.2023.102274
Kim, K., Kwon, K., Ottenbreit-Leftwich, A., Bae, H., & Glazewski, K. (2023). Exploring middle school students’ common
naive conceptions of Artificial Intelligence concepts, and the evolution of these ideas. Education and Information
Technologies. https://doi.org/10.1007/s10639-023-11600-3
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Liao, Q. V., & Sundar, S. S. (2022). Designing for responsible trust in AI systems: A communication perspective. ACM Inter-
national Conference Proceeding Series. https://doi.org/10.1145/3531146.3533182
Lim, W. M., Gunasekara, A., Pallant, J. L., Pallant, J. I., & Pechenkina, E. (2023). Generative AI and the future of education:
Ragnarök or reformation? A paradoxical perspective from management educators. The International Journal of
Management Education, 21(2), 100790.
Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences. https://
doi.org/10.3390/educsci13040410
Lockey, S., Gillespie, N., Holm, D., & Someh, I. A. (2021). A review of trust in artificial intelligence: Challenges, vulnerabilities
and future directions. Proceedings of the 54th Hawaii International Conference on System Sciences, 5463–5472.
Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Cohen, W. W., Stylianides, G. J., & Koedinger, K. R. (2013). Cognitive
anatomy of tutor learning: Lessons learned with SimStudent. Journal of Educational Psychology, 105(4), 1152.
McNamara, D. S., Crossley, S. A., & Roscoe, R. (2013). Natural language processing in an intelligent writing strategy tutor-
ing system. Behavior Research Methods, 45, 499–515.
Mertala, P., Fagerlund, J., & Calderon, O. (2022). Finnish 5th and 6th grade students’ pre-instructional conceptions of arti-
ficial intelligence (AI) and their implications for AI literacy education. Computers and Education Artificial Intelligence.
https://doi.org/10.1016/j.caeai.2022.100095
Mhlanga, D. (2023). Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning. Social
Science Research Network. https://doi.org/10.2139/ssrn.4354422
Michaeli, T., Romeike, R., & Seegerer, S. (2023). What students can learn about artificial intelligence-recommendations for
K-12 computing education. IFIP WCCE 2022: World Conference on Computers in Education. https://doi.org/10.48550/
arXiv.2305.06450
Nazaretsky, T., Cukurova, M., Ariely, M., & Alexandron, G. (2021). Confirmation bias and trust: Human factors that influence teachers' attitudes towards AI-based educational technology.
O’Connor, S., ChatGPT. (2023). Open artificial intelligence platforms in nursing education: Tools for academic progress or
abuse? Nurse Education in Practice, 66, 103537. https://doi.org/10.1016/j.nepr.2022.103537
Pavlik, J. V. (2023). Collaborating with ChatGPT: Considering the implications of generative artificial intelligence for
journalism and media education. Journalism & Mass Communication Educator, 78(1), 84–93. https://doi.org/10.1177/
10776958221149577
Pizzi, G., Vannucci, V., Mazzoli, V., & Donvito, R. (2023). I, chatbot! The impact of anthropomorphism and gaze direction on
willingness to disclose personal information and behavioral intentions. Psychology & Marketing, 40(7), 1372–1387.
Qadir, J. (2023). Engineering Education in the Era of ChatGPT: Promise and Pitfalls of Generative AI for Education. IEEE
Global Engineering Education Conference (EDUCON), 2023, 1–9. https://doi.org/10.1109/EDUCON54358.2023.10125
121
Qin, F., Li, K., & Yan, J. (2020). Understanding user trust in artificial intelligence-based educational systems: Evidence from
China. British Journal of Educational Technology, 51(5), 1693–1710. https://doi.org/10.1111/bjet.12994
Removed for blinded review.
Rücker, M. T., & Pinkwart, N. (2016). Review and discussion of children’s conceptions of computers. Journal of Science
Education and Technology, 25(2), 274–283. https://doi.org/10.1007/s10956-015-9592-2
Sahoo, S., Kumar, S., Abedin, M. Z., Lim, W. M., & Jakhar, S. K. (2023). Deep learning applications in manufacturing opera-
tions: A review of trends and ways forward. Journal of Enterprise Information Management, 36(1), 221–251.
Sallam, M. (2023). ChatGPT utility in health care education, research, and practice: Systematic review on the promising
perspectives and valid concerns. Healthcare, 11(6), 887. https://doi.org/10.3390/healthcare11060887
Schmidt-Fajlik, R. (2023). ChatGPT as a Grammar Checker for Japanese English Language Learners: A Comparison with
Grammarly and ProWritingAid. AsiaCALL Online Journal, 14(1), 105–119.
Smith, J. P., diSessa, A. A., & Roschelle, J. (1993). Misconceptions reconceived: A constructivist analysis of knowledge in
transition. The Journal of Learning Sciences, 3(2), 115–163.
Su, Y., Lin, Y., & Lai, C. (2023). Collaborating with ChatGPT in argumentative writing classrooms. Assessing Writing, 57,
100752.
Sun, J., Liao, Q. V., Muller, M., Agarwal, M., Houde, S., Talamadupula, K., & Weisz, J. D. (2022). Investigating Explainability
of Generative AI for Code through Scenario-based Design. International Conference on Intelligent User Interfaces,
Proceedings IUI. https://doi.org/10.1145/3490099.3511119
Sundar, S. S., & Kim, J. (2019). Machine heuristic: When we trust computers more than humans with our personal informa-
tion. Conference on Human Factors in Computing Systems Proceedings. https://doi.org/10.1145/3290605.3300768