1 Introduction

This paper investigates two emerging and expanding technology paradigms, Virtual Reality (VR) and Augmented Reality (AR), in terms of user performance and user reaction (i.e., simulator sickness and immersion), comparing the two through the task of sorting virtual balls. The impetus of the research is to understand how a ball-sorting task, which generalizes to other applications, can be completed with two different combinations of Head-Mounted Display (HMD) and controller. Three ball-sorting scenarios were implemented. The aim is to inform future VR and AR applications.

1.1 Virtual Reality and Augmented Reality

VR and AR can be defined by the proportion of virtual elements displayed to a user's senses. VR provides a completely fabricated, new environment via software; AR alters, or augments, the display of a real environment with additional input, such as computer-generated graphics (Chavan 2016). In short, VR is completely synthetic, whereas AR is only partially synthetic. Recreational VR and AR games remain a popular form of technology; however, applications have been expanding in the education, healthcare, and manufacturing fields through serious games and simulations.

1.2 Games, Serious Games, and Simulations

VR games have been commercially available since the 1990s, beginning with systems such as the Nintendo Virtual Boy, although these first attempts at popular VR gaming consoles failed (Kushner 2014). Yet, newer VR HMDs have differentiated themselves from their predecessors with enhanced display resolution, low latency, and computing power (Kress et al. 2014). Following these early attempts at VR gaming, one of the first AR systems, ARQuake, was created. An extension of the First-Person Shooter (FPS) game Quake, ARQuake utilized two core elements of AR: tracking (of a player's orientation and position) and overlaying computer-generated graphics onto real environments (Piekarski and Thomas 2002). Since then, AR gaming has branched into two directions: HMD gaming and mobile phone gaming.

Serious games are, fundamentally, games intended for more than solely entertainment purposes (Susi et al. 2007). Serious games can utilize game characteristics as a means toward learning goals. Serious games differ from edutainment (a merging of entertainment into education) in that edutainment typically uses drill-and-practice activities to teach lower-order thinking skills, whereas serious games typically facilitate learning of higher-order thinking skills (Charsky 2010).

The defining element of simulation is to mirror and represent some aspect of the real world (Aldrich 2009). Training with virtual simulations offers reduced time and cost, and allows flexible modification to match a workplace (Grajewski et al. 2015). According to Aldrich (2009), when comparing serious games and educational simulations, serious games tend to be more engaging, but with less fidelity and transferability than educational simulations.

2 Current Task Background

The current effort focuses on the simplified task of sorting virtual, colored balls of different sizes, accomplished through either a VR or AR system. This task involves grasping and placing, and thus potentially relates to games, serious games, and simulation aspects. The task is thought to involve spatial ability, such as depth perception of ball location and judging how close one needs to be to a ball to grasp it. Since the task involves balls of different sizes, it also reveals how well different-sized balls can be manipulated. Several tasks similar to ball sorting appear in prior work.

Cidota et al. (2016) examined participants who used their fingers to clasp a virtual object and place it in a box, in both VR and AR conditions; the hands served as the interface in both VR and AR. Young et al. (2014) tested a VR glove (paired with a high-cost VR HMD) for ordering randomly shuffled, numbered boxes in ascending order; participants in this condition underperformed compared to a condition using a handheld controller and a consumer-grade VR HMD. Krichenbauer et al. (2018) used the same object-placement task, with the same controller (i.e., a 3D input device), for VR and AR conditions: participants performed better in the AR condition than in the VR condition. Further, a system for virtually sorting red and blue balls into appropriate holes was detailed as a means of reducing phantom limb pain (Zweighaft et al. 2012). Finally, sorting has become a game mechanic: Sort 'Em is a VR game (Steam, n.d.); in contrast, a serious game focused on sorting (and teaching about) different types of waste has also been created (Menon et al. 2017). The latter game used gesture-based interaction (via a Microsoft Kinect system).

Several crucial points emerge from this body of similar tasks. Comparing a VR system using a handheld controller with an AR system using a hand as an interface has yet to be investigated. Nevertheless, the value of a sorting task is evident, as demonstrated by its applications in clinical therapy and instruction. In any case, user performance, simulator sickness, and immersion in these differing systems have implications for further use cases and practical applications, motivating further research. This paper seeks to better understand the relationship between participants and VR and AR systems, through the task of sorting different-colored balls. The VR system consisted of the HP Mixed Reality headset and controllers, whereas the AR system consisted of the Meta 2 headset and one's hand(s) as controller(s). Hereinafter, the total VR system will be referred to as the HP Mixed Reality, and the total AR system will be referred to as the Meta 2.

3 Simulator Sickness

A current standard for measuring Simulator Sickness (SS) is the Simulator Sickness Questionnaire (SSQ), a self-report survey developed by Kennedy et al. (1993) that scores SS on Nausea, Oculomotor, and Disorientation subscales. By examining the descriptive SSQs of participants exposed to virtual environments, Lampton et al. (1994) concluded that SS might lead to undesirable effects. According to a review paper, many potential factors can influence SS, such as individual characteristics (e.g., age and gender); simulator characteristics, including field of view; and task characteristics, such as session duration and maneuver intensity (Johnson 2005).

Pettijohn et al. (2019) compared the use of two different HMDs, one VR and one AR, for SS differences. The researchers implemented three motion conditions (i.e., no motion, synchronous motion, and asynchronous motion); the task was to destroy a virtual hostile ship with a machine gun. The findings showed no significant differences in SS between the VR and AR headsets.

Muth et al. (2006) tested whether uncoupled motion affected overall performance (measured by task completion time and task accuracy) by having participants complete a motor task: maneuvering a vehicle between cones in a virtual driving game. Participants played the game in two conditions: while seated in a moving vehicle and while seated in a stationary vehicle. Performance was worse and SS was higher in the motion condition than in the stationary condition. Note that the driving game task did not incorporate a VR or AR HMD.

4 Immersion

To further clarify participant reactions to different VR and AR systems, a participant's sense of immersion may be investigated. Jennett et al. (2008) pinpointed a type of immersion believed to be part of a positive experience, at least for gaming: "Immersion involves a lack of awareness of time, a loss of awareness of the real world, involvement and a sense of being in the task environment. Most importantly, immersion is the result of a good gaming experience." (p. 657). The same authors developed the immersion questionnaire. The present study incorporated a modified immersion questionnaire, which this paper's authors term the immersion measure.

5 Research Questions

The following questions were developed to assess different HMD systems, for the task of sorting virtual balls.

  • RQ1: Is there a statistically significant difference between the HP Mixed Reality and Meta 2 for effectiveness (i.e., trial effectiveness for each of the three scenarios, completion rate, percentage error, and total false positives)?

  • RQ2: Is there a statistically significant difference between the HP Mixed Reality and Meta 2 for efficiency (i.e., trial time duration for each of the three scenarios, total time duration, and overall relative efficiency)?

  • RQ3: Is there a statistically significant difference between the HP Mixed Reality and Meta 2 for post-test SS, including SS subscales (i.e., Nausea, Oculomotor, and Disorientation) and Total SS?

  • RQ4: Is there a statistically significant difference between pre-test and post-test SS, including SS subscales (i.e., Nausea, Oculomotor, and Disorientation) and Total SS, in the HP Mixed Reality condition?

  • RQ5: Is there a statistically significant difference between pre-test and post-test SS, including SS subscales (i.e., Nausea, Oculomotor, and Disorientation) and Total SS, in the Meta 2 condition?

  • RQ6: Is there a correlation between post-test SS (including SS subscales [i.e., Nausea, Oculomotor, and Disorientation] and Total SS) and performance (i.e., effectiveness: trial 3; time duration: trial 3, completion rate, total false positives, and percentage errors) for the HP Mixed Reality and Meta 2 conditions?

  • RQ7: Is there a statistically significant difference between the HP Mixed Reality and Meta 2 for immersion?

6 Method

6.1 Participants

Forty-two individuals from the University of Central Florida (UCF) and its surrounding areas participated in the study: 20 males and 22 females. Ages ranged from 19 to 34 years (M = 22, SD = 3.38). Participants were vetted during the sign-up process using the Institute for Simulation and Training SONA system (i.e., an online sign-up portal). In order to participate, each individual had to meet the inclusion criteria: an age of at least 18 years, United States citizenship, normal or corrected-to-normal vision, no previous history of seizures, and no color blindness. Following the study, which lasted up to one hour, participants received $10 as compensation for their time and travel.

6.2 Experimental Design

A between-subjects design was employed to assess the difference between the HP Mixed Reality (VR) system and the Meta 2 (AR) system in completing a ball-sorting task. The Independent Variable (IV) was the type of system (i.e., the headset and its related controller[s]). The Dependent Variables (DVs) were the ball-sorting task scenario measurements (i.e., performance outcomes and survey data). Additionally, a within-subjects comparison was conducted to assess SS, with the IV being time of measurement (i.e., pre- vs. post-scenarios) and the DV being the SS scores.

6.3 Testbed

The three scenarios were executed through a single application on one desktop computer (see Table 1 for the desktop specifications). A tutorial and the three scenarios were developed in the Unity game engine; Unity was selected for its user-friendly development interface and capability to support different software development kits.

Table 1. Desktop specifications.

The tutorial required the participant to practice using the controller scheme corresponding to the headset (i.e., two handheld Mixed Reality controllers for the HP Mixed Reality headset and physical hand gestures for the Meta 2 headset).

In the tutorial, participants were instructed to pick up colored balls (i.e., red and blue balls) from within a purple box and sort them into the respective colored boxes (i.e., red balls were to be sorted into a red box, and blue balls into a blue box; for HP Mixed Reality and Meta 2 examples, see Fig. 1). The AR interface (which did not differ between the tutorial and the scenarios) used a blue status indicator, or highlight, that hovered over each hand when that hand was near a ball. Each indicator was a hollow circle until a fist was made; then the hollow circle would become a filled blue circle. The filled blue circle meant a ball was grabbed; a grabbed ball could be moved by moving one's fist, and the ball could be dropped by opening one's hand. In contrast, the VR interface (which did not differ between the tutorial and the scenarios) used a single button on each controller to grab a ball. A ball could be moved while the button was held down, and the ball was released when the button was released. The tutorial had a maximum time limit of 10 min and allowed the participant to sort 10 red balls and 10 blue balls. Further, the participant was allowed to sort with one hand or with both hands; no instruction regarding hand strategy was given. When the participant was ready to move on, the tutorial ended and the first of the three scenarios began.
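To make the two control schemes concrete, a minimal sketch of the grab-and-release logic is given below (Python for illustration only; the actual implementation was built in Unity, and all names here are hypothetical):

```python
# Minimal sketch of the grab/release logic described above (hypothetical
# names; the actual implementation was built in Unity).

def ar_update(hand_near_ball: bool, fist_closed: bool, holding: bool) -> tuple[bool, str]:
    """AR condition: a hollow circle marks a grabbable ball; closing a fist grabs it."""
    if holding and not fist_closed:
        return False, "ball dropped (hand opened)"
    if not holding and hand_near_ball and fist_closed:
        return True, "filled circle: ball grabbed, follows fist"
    if not holding and hand_near_ball:
        return False, "hollow circle: ready to grab"
    return holding, "carrying ball" if holding else "idle"

def vr_update(button_held: bool, touching_ball: bool, holding: bool) -> tuple[bool, str]:
    """VR condition: a single controller button grabs while held down."""
    if holding and not button_held:
        return False, "ball released (button released)"
    if not holding and button_held and touching_ball:
        return True, "ball grabbed, follows controller"
    return holding, "carrying ball" if holding else "idle"
```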

Fig. 1. Ball-sorting scenario within HP Mixed Reality (left) and Meta 2 (right) conditions. (Color figure online)

Each of the three scenarios required the participant to apply what they practiced in the tutorial: the participant’s task was to sort a given number of red and blue balls into their respective color-coded boxes within a 5-min time limit. The first scenario contained 20 red balls and 20 blue balls; all balls had a diameter of 0.15 m. A difficulty curve was introduced in each subsequent scenario: the number of balls would increase (i.e., adding 5 red balls and 5 blue balls), and the size of the balls would decrease (i.e., decreasing the diameter of each ball by 0.025 m). As the participant completed each scenario, the software enacted a process of taking input, processing that input, and outputting the processed results.
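The difficulty curve amounts to a simple parameter schedule, sketched below (illustrative Python; variable names are hypothetical, and the values follow the counts and diameters stated above):

```python
# Parameter schedule for the three scenarios, per the difficulty curve
# described above (illustrative; names are hypothetical).
def scenario_parameters(scenario: int) -> dict:
    step = scenario - 1  # scenario is 1, 2, or 3
    return {
        "red_balls": 20 + 5 * step,              # 20, 25, 30
        "blue_balls": 20 + 5 * step,             # 20, 25, 30
        "ball_diameter_m": 0.15 - 0.025 * step,  # 0.150, 0.125, 0.100
        "time_limit_min": 5,
    }
```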

The input in the software process was the participant’s controller input and ball manipulation. The software processed the input by utilizing virtual colliders on the balls, the boxes, and the rest of the environment: if the balls collided with the correct corresponding box, the software processed the ball manipulation as correctly sorted; if the balls collided with the incorrect box, the software processed the ball manipulation as incorrectly sorted; if the balls collided with any other element of the virtual environment, the software processed the ball manipulation as dropped. The software outputted the processed results into a collective virtual dataset; this dataset was parsed into objective data measurements and logged into a Comma-Separated Values (CSV) file stored onto the desktop.
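The outcome classification can be sketched as follows (illustrative Python; the actual implementation used Unity colliders, and the file name, participant ID, and field layout shown here are hypothetical):

```python
# Sketch of the collider-based outcome classification and CSV logging
# described above (hypothetical names; the real system ran inside Unity).
import csv
from datetime import datetime, timezone

def classify(ball_color: str, collided_with: str) -> str:
    """Map a collision event to one of the three logged outcomes."""
    if collided_with == f"{ball_color}_box":
        return "correctly_sorted"
    if collided_with.endswith("_box"):
        return "incorrectly_sorted"
    return "dropped"  # collided with any other element of the environment

def log_event(writer, participant_id, system, ball_color, collided_with):
    """Append one timestamped outcome to the CSV dataset."""
    writer.writerow([participant_id, system,
                     datetime.now(timezone.utc).isoformat(),
                     ball_color, classify(ball_color, collided_with)])

with open("results.csv", "a", newline="") as f:
    log_event(csv.writer(f), "P01", "HP Mixed Reality", "red", "blue_box")
```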

6.4 Measurements: Surveys

The survey measures used during the data collection procedure included a demographics questionnaire, the SSQ, and the immersion measure.

Demographics Questionnaire.

The demographics questionnaire collected biographical data, such as age, gender, educational experience, and computer game usage.

Simulator Sickness Questionnaire (SSQ).

The SSQ measured the user's response to the VR and AR systems, in terms of SS symptoms: responses comprised 16 symptoms, each rated from none (0) to severe (3). The responses were scored into three subscales (i.e., Nausea, Oculomotor, and Disorientation), as well as a weighted Total SS score (Kennedy et al. 1993).
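For reference, SSQ scoring can be sketched as below (illustrative Python). The symptom-to-subscale mapping and weights follow the published scoring key in Kennedy et al. (1993), in which some symptoms load on more than one subscale; this sketch is not taken from the present study's materials.

```python
# SSQ scoring per Kennedy et al. (1993): 16 symptoms rated 0-3; raw subscale
# sums are multiplied by the published weights. Some symptoms load on more
# than one subscale.
NAUSEA = ["general discomfort", "increased salivation", "sweating", "nausea",
          "difficulty concentrating", "stomach awareness", "burping"]
OCULOMOTOR = ["general discomfort", "fatigue", "headache", "eyestrain",
              "difficulty focusing", "difficulty concentrating", "blurred vision"]
DISORIENTATION = ["difficulty focusing", "nausea", "fullness of head",
                  "blurred vision", "dizziness (eyes open)",
                  "dizziness (eyes closed)", "vertigo"]

def score_ssq(ratings: dict[str, int]) -> dict[str, float]:
    n = sum(ratings.get(s, 0) for s in NAUSEA)
    o = sum(ratings.get(s, 0) for s in OCULOMOTOR)
    d = sum(ratings.get(s, 0) for s in DISORIENTATION)
    return {"Nausea": n * 9.54, "Oculomotor": o * 7.58,
            "Disorientation": d * 13.92, "Total": (n + o + d) * 3.74}
```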

Immersion Measure.

The immersion measure comprised eight questions related to the level of immersion experienced during the three presented scenarios. The questions were rated from strongly disagree (1) to strongly agree (5; Jennett et al. 2008).

6.5 Measurements: Performance

Objective performance data were tracked and logged into a CSV file. Participant ID and type of system (i.e., HP Mixed Reality or Meta 2) were logged as identifiers for each file; each logged dataset within the file was labeled with a unique timestamp. Performance data included effectiveness and efficiency measurements.

Effectiveness was tracked through four metrics: per-scenario trial effectiveness, completion rate, percentage error, and total false positives. Per-scenario trial effectiveness, expressed as a percentage, was calculated by dividing the number of balls correctly sorted by the total number of balls in the given scenario. Completion rate, expressed as a percentage, was found by dividing the number of successful scenarios by the total number of scenarios (i.e., 3); a successful scenario was defined as answering yes to the question, "Did the participant sort all balls, whether correctly or incorrectly, within the scenario time limit?" Percentage error, expressed as a percentage, was calculated by dividing the sum of balls incorrectly sorted and balls dropped by the total number of balls in the given scenario. Total false positives, expressed as a percentage, was found by dividing the number of balls dropped by the total number of balls in the given scenario.
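The four effectiveness metrics reduce to simple ratios, sketched below (illustrative Python; variable names are hypothetical):

```python
# The four effectiveness metrics as defined above, each as a percentage.
def trial_effectiveness(correct: int, total_balls: int) -> float:
    return 100.0 * correct / total_balls

def completion_rate(successful_scenarios: int, total_scenarios: int = 3) -> float:
    return 100.0 * successful_scenarios / total_scenarios

def percentage_error(incorrect: int, dropped: int, total_balls: int) -> float:
    return 100.0 * (incorrect + dropped) / total_balls

def total_false_positives(dropped: int, total_balls: int) -> float:
    return 100.0 * dropped / total_balls
```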

Efficiency was tracked through three metrics: per-scenario trial time duration, total time duration, and overall relative efficiency. Per-scenario trial time duration, expressed in minutes, was measured as the time it took the participant to complete the scenario; completion was determined by (a) having all balls sorted or dropped within the scenario, or (b) the maximum allotted time of 5 min passing. Total time duration, expressed in minutes, was the sum of the three per-scenario trial time durations. Overall relative efficiency (see Eq. 1), expressed as a percentage, incorporated completion rate and per-scenario trial time duration (Mifsud 2015):

$$ \text{Overall Relative Efficiency} = \frac{\sum_{j=1}^{R} \sum_{i=1}^{N} n_{ij} t_{ij}}{\sum_{j=1}^{R} \sum_{i=1}^{N} t_{ij}} \times 100\% $$
(1)

For the purpose of this study, N represents the three scenarios, R represents the number of participants in the individual dataset (i.e., one participant per dataset), n_{ij} represents the result of scenario completion (i.e., completion rate), and t_{ij} represents the time spent completing the scenario (i.e., per-scenario trial time duration).
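A minimal sketch of Eq. 1 for a single participant (R = 1), with a worked example, is given below (illustrative Python; n[i] is taken as 1 for a completed scenario and 0 otherwise, per the definitions above):

```python
# Eq. 1 for one participant (R = 1): n[i] is 1 if scenario i was completed
# (0 otherwise) and t[i] is the time spent on scenario i, in minutes.
def overall_relative_efficiency(n: list, t: list) -> float:
    assert len(n) == len(t)
    return 100.0 * sum(ni * ti for ni, ti in zip(n, t)) / sum(t)

# Example: scenarios 1 and 2 completed in 2.5 and 3.0 min; scenario 3 timed
# out at the 5-min limit -> (2.5 + 3.0) / (2.5 + 3.0 + 5.0) * 100 ≈ 52.4%.
print(overall_relative_efficiency([1, 1, 0], [2.5, 3.0, 5.0]))
```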

6.6 Procedure

Prior to the study, a random number generator was used to order the conditions. Upon participant arrival, the experimenter greeted and escorted the individual to the lab space. Next, the experimenter presented the informed consent form and asked the participant to read, date, and sign the form. If the participant agreed to participate in the study, the experimenter administered a color blindness test. Only if the participant passed the color blindness test would the experiment continue: the experimenter would then provide the demographics questionnaire to complete.

Next, the experimenter administered the pre-scenarios SSQ to obtain a baseline from each participant. Thereafter, the participant viewed an instructional PowerPoint about the experiment. The instructional PowerPoint provided information on the purpose of the experiment, the task to complete, the tools (i.e., HMD and form of controller), and instructions on the control scheme for the participant's given condition. Additionally, the Meta 2 PowerPoint included a discussion of the Meta 2's environmental mapping process. After the participant practiced using the condition's control scheme (i.e., handheld controllers or physical hand gestures) in the tutorial, the participant began the ball-sorting task scenarios. Following each scenario, the participant received a one-minute break. After the last break, each participant completed the post-scenarios SSQ and the immersion measure. Finally, the experimenter provided the participant with a compensation receipt and dismissed him or her.

7 Results

Preliminary tests were conducted to check normality, outliers, and homogeneity of variance. Regarding normality, the data violated the assumptions of both the Kolmogorov-Smirnov Test (with Lilliefors significance correction) and the Shapiro-Wilk Test. Outliers were screened using the 5% trimmed mean as well as inspection of the box plots. Three data points were identified as potential outliers; however, these data points were retained after checking the experimental log and finding no inconsistencies. Because the parametric assumptions, including homogeneity of variance, could not be met, non-parametric tests were used for data analysis.
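These preliminary checks can be reproduced as sketched below (illustrative Python using SciPy and statsmodels; the paper does not state which statistical software the authors actually used, and the data here are placeholders):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

# Placeholder data standing in for one DV within one condition.
rng = np.random.default_rng(42)
scores = rng.gamma(shape=1.0, scale=5.0, size=21)

# Normality: Kolmogorov-Smirnov with Lilliefors correction, and Shapiro-Wilk.
ks_stat, ks_p = lilliefors(scores, dist="norm")
sw_stat, sw_p = stats.shapiro(scores)

# Outlier screen: compare the mean against the 5% trimmed mean.
mean, trimmed = scores.mean(), stats.trim_mean(scores, proportiontocut=0.05)

# With assumptions violated, fall back to non-parametric tests, e.g.:
#   stats.mannwhitneyu(vr_scores, ar_scores)  # between-subjects RQs
#   stats.wilcoxon(pre_scores, post_scores)   # pre/post RQs
#   stats.spearmanr(ss_scores, performance)   # correlational RQ
```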

7.1 Data Analysis

RQ1. There was a statistically significant difference between the HP Mixed Reality and Meta 2 for effectiveness for each of the three scenarios, completion rate, percentage error, and total false positives. The Mann-Whitney U Test indicated statistically significant findings (see Table 2).

Table 2. Significant effectiveness differences between the HP Mixed Reality and Meta 2.

RQ2. There was a statistically significant difference between the HP Mixed Reality and Meta 2 for efficiency (i.e., trial time duration for each of the three scenarios, total time duration, and overall relative efficiency). The Mann-Whitney U Test indicated statistically significant findings (see Table 3).

Table 3. Significant efficiency differences between the HP Mixed Reality and Meta 2.

RQ3. A Mann-Whitney U Test revealed no statistically significant differences between the HP Mixed Reality and Meta 2 for SS, in terms of post-test SS subscales (i.e., Nausea, Oculomotor, and Disorientation) and Total SS.

RQ4. A Wilcoxon Signed-Rank Test revealed a statistically significant increase in Disorientation following the three test scenarios in the HP Mixed Reality condition, z = −2.14, p < .05, with a medium effect size (r = .33). The median Disorientation score increased from pre-test (Md = .00) to post-test (Md = 6.98).
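For context, the reported effect size is consistent with the common convention of dividing z by the square root of the number of observations, assuming an even split of the 42 participants (i.e., 21 in the HP Mixed Reality condition, each measured pre and post, giving N = 42 observations):

$$ r = \frac{|z|}{\sqrt{N}} = \frac{2.14}{\sqrt{42}} \approx .33 $$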

RQ5. A Wilcoxon Signed-Rank Test revealed no statistically significant difference between pre-test and post-test SS, in terms of SS subscales and Total SS, in the Meta 2 condition.

RQ6. There were no statistically significant Spearman's correlations found between post-test SS (i.e., neither SS subscales nor Total SS) and performance (i.e., effectiveness: trial 3; time duration: trial 3; completion rate; total false positives; and percentage errors) for the HP Mixed Reality and Meta 2 conditions.

RQ7. A Mann-Whitney U Test revealed a statistically significant difference between the HP Mixed Reality and Meta 2 for the immersion measure statement “The scenarios were challenging.” See Table 4 for statistical findings.

Table 4. Significant immersion differences between the HP Mixed Reality and Meta 2.

8 Discussion

In terms of objective performance for the ball-sorting task, participants were more effective and efficient with the VR system. This suggests a higher level of objective usability for the VR system, as it allowed participants to complete the task more easily. Perhaps the AR system had interface difficulties, such as in the ability to grab a ball, or because the system could only control objects displayed within the HMD (i.e., if a grabbed ball left the headset's field of view, the ball was no longer registered as grabbed). Note, AR participants were told (in the instructional PowerPoint) that if a ball moved out of their view, it was no longer considered in their possession. Given that the two systems differed in both HMD and controller, the attribution of usability causality is ambiguous. Several solutions could improve the AR interface: additional feedback via a haptic armband could alert the user when a ball is grasped, and allowing the software to remember that an off-screen ball was still grasped might affect performance. Ultimately, the AR system's usability appears to need further maturation.

Further, the higher challenge in the AR system may also be due to immersion, and thus can be interpreted as AR having a positive challenge that captivates participants. Although this challenge may benefit games, more serious applications could be hindered. For example, surgery may benefit from a bare-hand interface to explore medical images (Gallo et al. 2011); perhaps future surgery could involve a bare-hand interface merged with AR visuals, due to the restriction of using handheld controllers during surgery. This is a potential way AR maturation for object selection tasks has practical merit.

Between the VR and AR systems, the relatively stationary task of ball sorting produced similar SS. Still, participants in the VR system experienced increased disorientation from pre- to post-scenarios. Methods for reducing disorientation in tasks similar to ball sorting may mitigate SS effects in VR. When disorientation is a concern, AR may be a more forgiving option than VR, in terms of SS.

8.1 Limitations

The ball-sorting task contained notable limitations. One limitation relates to defining the source of performance and reaction (i.e., SS and immersion) differences, since the systems differed in both HMD and controllers. Another limitation focuses on the study design: there was little to no control over the participant’s method for sorting the balls into the bins (e.g., whether to use one or two hands when sorting); this lack of control could have impacted participant performance. Finally, it is difficult to generalize the research findings outside of the ball-sorting task completed via the VR and AR systems.

9 Conclusion

This paper investigated aspects of both VR and AR systems, in terms of users' performance and reaction (i.e., SS and immersion), with regard to a virtual ball-sorting task. The task goal and scenarios were similar, but the tools differed between VR and AR. As VR provided enhanced performance in both effectiveness and efficiency, AR may require further maturation of its interface conventions. Conversely, AR may be better suited to low-disorientation settings than VR, at least for the systems and task compared.

9.1 Next Steps

The next steps of the study include determining whether the increased challenge relating to immersion in AR is beneficial to educational applications or is restrictive and imposes undue workload. Another step would be to test the ball-sorting task using exclusively VR systems or exclusively AR systems, thereby isolating the source of performance differences. Also, refining the AR interface could impact performance. Finally, another experimental design would test different natural, hand-based interfaces (e.g., haptic gloves) to complete identical ball-sorting tasks.