
nature medicine

Article    https://doi.org/10.1038/s41591-022-02180-9

Early detection of visual impairment in young children using a smartphone-based deep learning system

A full list of authors and their affiliations appears at the end of the paper.
e-mail: dingxiaowei@sjtu.edu.cn; linht5@mail.sysu.edu.cn

Received: 16 June 2022
Accepted: 9 December 2022
Published online: 26 January 2023

Early detection of visual impairment is crucial but is frequently missed in
young children, who are capable of only limited cooperation with standard
vision tests. Although certain features of visually impaired children, such as
facial appearance and ocular movements, can assist ophthalmic practice,
applying these features to real-world screening remains challenging. Here,
we present a mobile health (mHealth) system, the smartphone-based Apollo
Infant Sight (AIS), which identifies visually impaired children with any of 16
ophthalmic disorders by recording and analyzing their gazing behaviors
and facial features under visual stimuli. Videos from 3,652 children (≤48
months in age; 54.5% boys) were prospectively collected to develop and
validate this system. For detecting visual impairment, AIS achieved an
area under the receiver operating curve (AUC) of 0.940 in an internal
validation set and an AUC of 0.843 in an external validation set collected
in multiple ophthalmology clinics across China. In a further test of AIS for
at-home implementation by untrained parents or caregivers using their
smartphones, the system was able to adapt to different testing conditions
and achieved an AUC of 0.859. This mHealth system has the potential to be
used by healthcare professionals, parents and caregivers for identifying
young children with visual impairment across a wide range of ophthalmic
disorders.

Visual impairment is one of the most important causes of long-term disability in children worldwide and has a detrimental impact on education and socioeconomic achievements1,2. Infancy and toddlerhood (early childhood) are critical periods for visual development3, during which early detection and prompt treatment of ocular pathology can prevent irreversible visual loss4,5. Young children are unable to complain of visual difficulties, and since they are unwilling or find it difficult to cooperate with standard vision tests (for example, optotype tests), age-appropriate tests such as grating acuity cards are commonly used to observe their reactions to visual stimuli6,7. However, evaluating the vision of young children using these tests requires highly trained operators, which greatly hinders their wider adoption, especially in low-income and middle-income countries with the highest prevalence of visual impairment but poor medical resources8. In addition, these tests, even when performed by experienced pediatric ophthalmologists, have been shown to have low repeatability in large-scale population screening studies9–11. Therefore, it is imperative to develop an easy-to-use and effective detection tool to enable the timely diagnosis of visual impairment in young children and prompt intervention.


Ocular abnormalities causing visual impairment in children often manifest with typical phenotypic features, such as leukocoria (white eye) in cataract12 and retinoblastoma13, eyelid drooping in congenital ptosis14, and a cloudy and enlarged cornea in congenital glaucoma15. In addition, previous studies have found that dynamic aberrant behavioral features such as abnormal ocular movement, fixation patterns or visual preference can also point toward an underlying ocular pathology in children16,17. These phenotypic manifestations are frequently seen in ocular diseases, such as amblyopia and strabismus, and they can provide valuable clues for diagnosing visual impairment in young children18–20. However, systematically recording and applying these features to real ophthalmic practice are still in their infancy due to the lack of practical and effective tools.

Given the rapid development of mobile health (mHealth) and artificial intelligence (AI) algorithms in identifying or monitoring disease states21,22, the use of mobile devices, such as smartphones, to record and analyze phenotypic features to help identify visual impairment in young children presents great opportunities. However, developing such a system for large-scale ophthalmic application is hindered by three main challenges: (1) collecting phenotypic data that reliably reflect the visual status of the children in complex environments, (2) generalizing the system for large-scale applications and (3) providing evidence of its feasibility. The major bottleneck that impedes the widespread adoption of many medical AI systems is the limited feasibility and reliability when applied to settings with various data distributions in the real world23,24. A lack of cooperation is very common in pediatric ophthalmic practice, with constant head movement during examinations introducing test noise that poses several challenges to the stability of the system25. For the nascent technology of mHealth, rigorous evidence of clinical application is necessary but generally lacking21. These major difficulties explain the current lack of an effective and practical tool for detecting visual impairment in young children.

In this prospective, multicenter, observational study, we developed and validated a smartphone-based system, the Apollo Infant Sight (AIS), to identify visual impairment in young children in real-world settings. AIS was designed to induce a steady gaze in children by using cartoon-like video stimuli and to collect videos that capture phenotypic features (facial appearance and ocular movements) for further analysis using deep learning (DL) models with a robust quality control design against test noise. We collected more than 25,000,000 frames of videos from 3,652 children using AIS for DL model training and testing. We evaluated the system for detecting visual impairment caused by any of 16 ophthalmic disorders in five clinics at different institutions. Furthermore, we validated this system under different conditions with various test noise levels or ambient interference presented in real-world settings. We also evaluated AIS used by untrained parents or caregivers at home to test its wider applicability. This preliminary study indicates that AIS shows potential for early detection of visual impairment in young children in both clinical and community settings.

Results

Overview of the study
We conducted this prospective, multicenter and observational study (identifier: NCT04237350) in three stages from 14 January 2020 to 30 January 2022 and collected a total of 3,865 videos with 25,972,800 frames of images from 3,652 Chinese children (aged ≤48 months) to develop and validate the AIS system in clinical and at-home settings (Fig. 1). The AIS system was developed and comprehensively tested (internal validation and reliability analyses under different testing conditions) at the clinic of Zhongshan Ophthalmic Center (ZOC) in the first stage, and was further tested in four other centers (external validation) and community settings (at-home implementation) in the second and third stages, respectively.

Development of the mHealth AIS system
We developed AIS for detecting visual impairment in young children tailored to the present study (Fig. 1a and Supplementary Video 1). A child-friendly app was designed to attract children to maintain their gaze using cartoon-like stimuli (Extended Data Fig. 1). The inbuilt front camera of the smartphone recorded 3.5-min videos that captured phenotypic features of the facial appearance and ocular movements during gazing. In this process, the mHealth app interactively guided users (healthcare professionals, volunteers, parents and caregivers) to familiarize themselves with the system and complete standardized preparations, including choosing and maintaining a suitable testing setting (Extended Data Fig. 2). After data collection was completed, DL models were applied to analyze the collected features and identify visually impaired children. To ensure the system's performance in chaotic settings (environments with various interference factors or biases that can impact the system's performance), a series of algorithm-based quality checking operations, including face detection (max-margin object detection (MMOD) convolutional neural network (CNN)); facial key point localization (ensemble of regression trees); and crying, occlusion and interference factor detections (the EfficientNet-B2 backbone shown in Extended Data Fig. 3a,b), was first automatically performed by a quality control module to extract consecutive frames of high quality from the original video as short clips. Facial areas were cropped out to further eliminate environmental interference before the qualified clips were sent to a DL-based detection model for identifying visually impaired children and a diagnostic model for discriminating multiple ocular disorders (the EfficientNet-B4 backbone shown in Extended Data Fig. 3c). The final results were returned to the mHealth app to alert users to promptly refer children at high risk of visual impairment to experienced pediatric ophthalmologists for timely diagnosis and intervention.

We first developed the data quality control module. Two facial detection and key point localization models were pretrained on publicly available datasets and adopted from an open-source library26. Additionally, we developed three CNNs for crying, interference and occlusion detection using images sampled from raw videos collected at the ZOC clinic (Extended Data Fig. 3d and Supplementary Table 1). Then, we trained and validated the detection/diagnostic models on the development dataset collected by trained volunteers using iPhone-7/8 smartphones at the clinic of ZOC (Extended Data Fig. 3e). A total of 2,632 raw videos from 2,632 children were collected, and after automatic quality control, videos of 2,344 children (89.1%) were reserved as the development dataset (Fig. 1b), including 871 (37.2%) children in the 'nonimpairment' group, 861 (36.7%) in the 'mild impairment' group and 612 (26.1%) in the 'severe impairment' group. Detailed information on the qualified dataset is provided in Table 1. Before model training, the development dataset was randomly split into training, tuning and validation sets stratified on sex, age and the ophthalmic condition (Supplementary Table 2). The videos utilized for quality control module development were excluded from the detection/diagnostic model validation.

Performance of the detection model in real clinical settings with trained volunteers
The detection model was trained to discriminate visually impaired children from nonimpaired children based on the high-quality clips extracted from the phenotypic videos. At the clip level, the detection model achieved an area under the receiver operating curve (AUC) of 0.925 (95% confidence interval (95% CI), 0.914–0.936) in the internal validation (Extended Data Fig. 4a). Furthermore, we evaluated the performance of the detection model via an independent external validation performed by trained volunteers using iPhone-7/iPhone-8 smartphones at the routine clinics of four other centers. In this stage, quality checking was embedded in the data acquisition process, and the quality control module automatically reminded volunteers to recollect data when the videos were of low quality (Fig. 1b). Qualified videos for 298 children undergoing ophthalmic examinations were utilized for final validation, including 188 (63.1%) nonimpaired children, 67 (22.5%) mildly impaired children and 43 (14.4%) severely impaired children (Table 1). At the clip level, the detection model achieved an AUC of 0.814 (95% CI, 0.790–0.838) in the external validation (Extended Data Fig. 4b).
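The frame-level quality control described above can be pictured with a short sketch. The code below is a minimal illustration and not the authors' implementation: dlib's MMOD face detector and 68-point shape predictor stand in for the face detection and key point localization steps, `is_frame_disturbed` is a hypothetical placeholder for the crying/occlusion/interference CNNs, and the model file paths, clip length and crop size are assumptions.

```python
# Minimal sketch of frame-level quality control and clip extraction (illustrative only).
import cv2
import dlib

FACE_DETECTOR = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
LANDMARKS = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
CLIP_LEN = 32  # consecutive qualified frames per clip (assumed value)

def extract_qualified_clips(video_path, is_frame_disturbed):
    """Scan a recorded video and return clips of consecutive, cropped facial frames.

    `is_frame_disturbed` stands in for the crying/occlusion/interference classifiers."""
    cap = cv2.VideoCapture(video_path)
    clips, current = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        faces = FACE_DETECTOR(rgb, 0)
        if len(faces) != 1 or is_frame_disturbed(rgb):
            # Low-quality frame: close the current run of qualified frames.
            if len(current) >= CLIP_LEN:
                clips.append(current[:CLIP_LEN])
            current = []
            continue
        rect = faces[0].rect
        shape = LANDMARKS(rgb, rect)  # 68 facial key points (used for alignment checks)
        x0, y0 = max(rect.left(), 0), max(rect.top(), 0)
        face_crop = rgb[y0:rect.bottom(), x0:rect.right()]
        current.append(cv2.resize(face_crop, (224, 224)))
    cap.release()
    if len(current) >= CLIP_LEN:
        clips.append(current[:CLIP_LEN])
    return clips
```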


[Figure 1 graphics omitted: panel a, workflow of the AIS system; panel b, participant flow diagram.]

Fig. 1 | Overall study design and participant flow diagram. a, Workflow of the system. The smartphone-based AIS system consists of two key components: an app for user education, testing preparation and data collection and a DL-based back end for data analysis. Parents or other users utilize the app to induce children to gaze at the smartphone, allowing the app to record their phenotypic states as video data. Then, the phenotypic videos are sent to a quality control module to discard low-quality frames. After automatic quality checking, multiple sets of consecutive qualified frames are extracted from the original video as clips, and the child's facial regions are cropped from the clips to serve as candidate inputs to the detection/diagnostic models. A small rectangle indicates input or output data, a large rectangle indicates mathematical operation, and a trapezoid indicates DL or machine learning algorithm. b, Participant flow diagram. Children were recruited at multiple clinics to develop and comprehensively test the AIS system in stage 1 and stage 2. Children were recruited online to perform an at-home validation by untrained parents or caregivers in stage 3. I, input video; O, clip-level model outputs; P, key point coordinates; Q, qualified clips; Sface, facial regions of the clips; FC, fully connected.
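To make the encoder/temporal pooling/decoder arrangement labeled in Fig. 1a concrete, the following is a minimal PyTorch sketch under stated assumptions (torchvision's EfficientNet-B4 as the frame encoder, mean pooling over frames, a single fully connected output). It is illustrative only and is not the released AIS model.

```python
# Sketch of a clip-level detector: EfficientNet-B4 frame encoder -> temporal pooling -> FC decoder.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b4

class ClipDetector(nn.Module):
    def __init__(self, num_outputs: int = 1):
        super().__init__()
        backbone = efficientnet_b4(weights=None)
        self.encoder = backbone.features                 # per-frame feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)               # spatial pooling per frame
        feat_dim = backbone.classifier[1].in_features     # 1,792 for EfficientNet-B4
        self.decoder = nn.Linear(feat_dim, num_outputs)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, frames, 3, H, W) of cropped facial regions
        b, t, c, h, w = clip.shape
        x = self.encoder(clip.reshape(b * t, c, h, w))
        x = self.pool(x).flatten(1).reshape(b, t, -1)
        x = x.mean(dim=1)                                  # temporal average pooling
        return torch.sigmoid(self.decoder(x))              # clip-level probability of impairment

# e.g. ClipDetector()(torch.rand(2, 8, 3, 380, 380)) -> tensor of shape (2, 1)
```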


Table 1 | Summary of the qualified datasets used in this study

Values are given in the column order Dataset A (n = 2,344 children, 2,344 videos), Dataset B (n = 187 children, 374 videos), Dataset C (n = 361 children, 361 videos), Dataset D (n = 298 children, 298 videos), Dataset E (n = 32 children, 32 videos) and Dataset F (n = 88 children, 88 videos).

Sources: ZOC clinic | ZOC clinic | ZOC clinic | Clinics of multiple hospitals | At-home environment | At-home environment
Usage of dataset: Model development and reliability analyses | Retest analysis | Across-smartphone analysis | External validation | Model fine-tuning | At-home validation
Images, n: 15,751,680 | 2,513,280 | 2,425,920 | 2,002,560 | 215,040 | 591,360
Visual conditions, n (%)
  Nonimpairment: 871 (37.2%) | 102 (54.5%) | 87 (24.1%) | 188 (63.1%) | 10 (31.3%) | 31 (35.2%)
  Mild impairment: 861 (36.7%) | 52 (27.8%) | 169 (46.8%) | 67 (22.5%) | 14 (43.8%) | 31 (35.2%)
  Severe impairment: 612 (26.1%) | 33 (17.6%) | 105 (29.1%) | 43 (14.4%) | 8 (25.0%) | 26 (29.5%)
Age in months (mean ± s.d.): 25.2 ± 11.7 | 28.7 ± 10.9 | 25.7 ± 10.8 | 28.0 ± 13.0 | 29.5 ± 10.1 | 30.0 ± 10.9
Sex, n (%)
  Boys: 1,265 (54.0%) | 107 (57.2%) | 202 (56.0%) | 169 (56.7%) | 16 (50.0%) | 50 (56.8%)
  Girls: 1,079 (46.0%) | 80 (42.8%) | 159 (44.0%) | 129 (43.3%) | 16 (50.0%) | 38 (43.2%)
Room illuminance, lx (mean ± s.d.): 289.5 ± 130.1 | 280.5 ± 122.5* | 334.0 ± 117.5 | N/A | N/A | N/A
Testing distance, n (%)
  Short: 195 (8.3%) | 45 (12.0%)* | 34 (9.4%) | 61 (20.5%) | 6 (18.8%) | 19 (21.6%)
  Medium: 1,738 (74.2%) | 291 (77.8%)* | 279 (77.3%) | 125 (42.0%) | 9 (28.1%) | 51 (58.0%)
  Long: 411 (17.5%) | 38 (10.2%)* | 48 (13.3%) | 112 (37.6%) | 17 (53.1%) | 18 (20.5%)
Laterality of the eye disorder, n (%)
  Bilateral: 995 (67.6%) | 49 (57.7%) | 209 (76.3%) | 77 (70.0%) | 8 (36.4%) | 29 (50.9%)
  Unilateral: 478 (32.5%) | 36 (42.4%) | 65 (23.7%) | 33 (30.0%) | 14 (63.6%) | 28 (49.1%)
Smartphones used: iPhone-7/iPhone-8 | iPhone-7/iPhone-8 | Huawei Honor-6 Plus/Redmi Note-7 | iPhone-7/iPhone-8 | Parents' own smartphones (no restriction) | Parents' own smartphones (no restriction)

*Metrics calculated in the unit of video. Except for the asterisk-marked metrics in dataset B, metrics were calculated in the unit of child. ZOC, Zhongshan Ophthalmic Center; N/A, not applicable.

The performance of the detection model in identifying visually impaired children was evaluated by averaging the clip-level predictions. Figure 2a shows distinct clip-level predicted probability patterns for children with various visual conditions. At the child level, the detection model achieved an AUC of 0.940 (95% CI, 0.920–0.959), an accuracy of 86.5% (95% CI, 83.4%–89.0%), a sensitivity of 84.1% (95% CI, 80.2%–87.4%) and a specificity of 91.9% (95% CI, 86.9%–95.1%) in the internal validation (Fig. 2b and Supplementary Table 3). It achieved a child-level AUC of 0.843 (95% CI, 0.794–0.893), an accuracy of 82.6% (95% CI, 77.8%–86.4%), a sensitivity of 80.9% (95% CI, 72.6%–87.2%) and a specificity of 83.5% (95% CI, 77.6%–88.1%) in the external validation (Fig. 2c and Supplementary Table 3).

Furthermore, we investigated whether our system could identify visual impairment with any of 16 common ophthalmic disorders at the child level (Table 2 and Supplementary Table 4). For different ophthalmic disorders, the predicted probabilities of the detection model were all significantly higher than those for nonimpairment (Fig. 2d). AIS achieved AUCs of over 0.800 in 15 of 16 binary classification tasks to distinguish visual impairment with various causes from nonimpairment (Fig. 2e,f and Supplementary Table 5), except for limbal dermoid with an AUC of 0.747 (95% CI, 0.646–0.849). Even for diseases not present in the training set, our system showed effective discriminative capabilities, revealing wider extendibility and generalizability to other conditions (Fig. 2f). In addition, we initially recruited children with aphakia (including iatrogenic aphakia cases with common features of visual impairment, accounting for 10.2% of the visually impaired participants enrolled) to increase the diversity of training samples for the robustness of the system. Therefore, to evaluate the performance of AIS in the natural population without iatrogenic cases or cases with medical interventions, the children with aphakia were removed from the validation datasets for further analysis, and AIS remained reliable (Supplementary Table 6). These results indicate the advanced classification ability of AIS in detecting common causes of visual impairment in young children.

Additionally, the performance of AIS in discriminating mild or severe impairment from nonimpairment was assessed at the child level (Fig. 2g–j and Supplementary Table 3). Significantly lower predicted probabilities of AIS were obtained for the nonimpaired group than for the mild or severe impairment groups. For discriminating mild impairment from nonimpairment, an AUC of 0.936 (95% CI, 0.912–0.960) and an AUC of 0.833 (95% CI, 0.774–0.892) were obtained for the internal validation and the external validation, respectively. For discriminating severe impairment from nonimpairment, an AUC of 0.944 (95% CI, 0.919–0.969) and an AUC of 0.859 (95% CI, 0.779–0.939) were obtained for the internal validation and the external validation, respectively.

To further evaluate the performance of AIS when applied to a population with a rare-case prevalence of visual impairment, we conducted a 'finding a needle in a haystack' test based on the internal validation dataset, with simulated prevalences ranging from 0.1% to 9%. AIS successfully identified visually impaired children at different simulated prevalences, with AUCs stabilized around 0.940 (Supplementary Table 7).
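The child-level scoring and the 'finding a needle in a haystack' evaluation described above can be sketched as follows. The averaging rule matches the text, while the resampling scheme, cohort size and random seed are assumptions made for illustration.

```python
# Sketch of child-level aggregation and AUC evaluation at a simulated prevalence.
import numpy as np
from sklearn.metrics import roc_auc_score

def child_level_score(clip_probs):
    """Average the clip-level predicted probabilities to obtain one score per child."""
    return float(np.mean(clip_probs))

def auc_at_prevalence(scores_impaired, scores_normal, prevalence, rng, n_total=10_000):
    """Resample children so that visually impaired cases form `prevalence` of the cohort."""
    n_pos = max(1, int(round(n_total * prevalence)))
    pos = rng.choice(scores_impaired, size=n_pos, replace=True)
    neg = rng.choice(scores_normal, size=n_total - n_pos, replace=True)
    y = np.concatenate([np.ones(n_pos), np.zeros(n_total - n_pos)])
    return roc_auc_score(y, np.concatenate([pos, neg]))

rng = np.random.default_rng(0)
# e.g. auc_at_prevalence(vi_scores, ni_scores, prevalence=0.001, rng=rng)
```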


[Figure 2 graphics omitted: panels a–n (predicted probability patterns, ROC curves and the diagnostic-model confusion matrix).]

Fig. 2 | Performance of the AIS system in clinical and at-home settings. a, Typical predicted probability patterns of the detection model. b,c, Receiver operating characteristic (ROC) curves of the detection model for distinguishing visually impaired children from nonimpaired children in the internal validation set (b) and in the external validation set (c). Center lines show ROC curves and shaded areas show 95% CIs. d, The predicted probabilities of children with the indicated ophthalmic disorders and nonimpaired children in the internal validation set. Results are expressed as mean ± s.d. *P < 0.001 (ranging from 4.83 × 10−27 for congenital cataract (CC) to 2.40 × 10−5 for high ametropia (HA) compared with nonimpairment (NI), two-tailed Mann–Whitney U-tests). e,f, ROC curves of the detection model for distinguishing nonimpaired children from children with the indicated ophthalmic disorders that overlap (e) or did not overlap (f) with those in the training set (AUCs range from 0.747 for limbal dermoid (LD) to 0.989 for congenital ptosis (CP)). g,i,k, The predicted probabilities of the detection model for the nonimpaired, mildly impaired and severely impaired groups in the internal validation set (g), in the external validation set (i) and in the at-home implementation (k). Results are expressed as mean ± s.d. #P < 0.001, two-tailed Mann–Whitney U-tests. h,j, ROC curves of the detection model for distinguishing mildly or severely impaired children from nonimpaired children in the internal validation set (h) and in the external validation set (j). l, ROC curves of the detection model for distinguishing impaired, mildly impaired or severely impaired children from nonimpaired children in the at-home implementation. m, The confusion matrix of the diagnostic model. n, ROC curves of the diagnostic model for discriminating each category of ophthalmic disorder from the other categories (aphakia (AA), AUC = 0.947 (0.918–0.976); congenital glaucoma (CG), AUC = 0.968 (0.923–1.000); NI, AUC = 0.976 (0.959–0.993); CP, AUC = 0.996 (0.989–1.000); strabismus (SA), AUC = 0.918 (0.875–0.961)). 95% DeLong CIs are shown for AUC values. MO, microphthalmia; NA, nystagmus; OF, other fundus diseases; PA, Peters' anomaly; PFV, persistent fetal vasculature; PM, pupillary membrane; RB, retinoblastoma; SSOM, systemic syndromes with ocular manifestations; VI, visual impairment.

Performance of the detection model in at-home settings with untrained parents or caregivers
After validation in real clinical settings, we further implemented a more challenging application in at-home settings by parents or caregivers using their smartphones according to the system's instructions (Fig. 1b). Of the 125 children recruited online from the Guangdong area, 122 children (97.6%) successfully completed qualified video collection, among whom 120 children undergoing ophthalmic examinations were enrolled. Other detailed information on the qualified data is summarized in Table 1. Given the great difference in data distributions for the home environments compared with the clinics, we fine-tuned the detection model using qualified videos from 32 children and then tested it on the subsequently collected validation set from another 88 children. On the validation set, 31 (35.2%) children were classified as nonimpaired and 57 (64.8%) children were classified as visually impaired. AIS achieved effective performance in the at-home implementation, with an AUC of 0.817 (95% CI, 0.756–0.881) for discriminating clips of visually impaired children from those of nonimpaired children (Extended Data Fig. 4c). At the child level, significantly lower predicted probability patterns were obtained for the nonimpaired children compared with mildly or severely impaired children (Fig. 2k). An AUC of 0.859 (95% CI, 0.767–0.950), an accuracy of 77.3% (95% CI, 67.5%–84.8%), a sensitivity of 77.2% (95% CI, 64.8%–86.2%) and a specificity of 77.4% (95% CI, 60.2%–88.6%) were attained for discriminating visual impairment from nonimpairment (Fig. 2l and Supplementary Table 3).
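Fine-tuning the detection model on the 32-child at-home pilot set, as described above, might look like the following sketch. Freezing the frame encoder, the learning rate, the number of epochs and the `home_clips` loader are assumptions, not the study's protocol; `ClipDetector` refers to the sketch model shown after Fig. 1.

```python
# Hedged sketch of fine-tuning the clip-level detector on at-home pilot data.
import torch
import torch.nn as nn

def fine_tune(model, home_clips, epochs: int = 5, lr: float = 1e-4, device: str = "cpu"):
    """`home_clips` is a hypothetical DataLoader yielding (clip, label) pairs."""
    model.to(device).train()
    # Assumption: adapt only the decoder (final layer) to the new domain,
    # keeping the frame encoder learned from clinic data frozen.
    for p in model.encoder.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.decoder.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        for clips, labels in home_clips:
            clips, labels = clips.to(device), labels.float().to(device)
            optimizer.zero_grad()
            probs = model(clips).squeeze(1)   # clip-level probabilities
            loss = loss_fn(probs, labels)
            loss.backward()
            optimizer.step()
    return model
```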


Table 2 | Summary of the ophthalmic conditions of participants in this study

Values are n (%), given in the column order Dataset A (n = 2,344 children), Dataset B (n = 187 children), Dataset C (n = 361 children), Dataset D (n = 298 children), Dataset E (n = 32 children) and Dataset F (n = 88 children).

Nonimpairment: 871 (37.2%) | 102 (54.5%) | 87 (24.1%) | 188 (63.1%) | 10 (31.3%) | 31 (35.2%)
Aphakia: 153 (6.5%) | 7 (3.7%) | 44 (12.2%) | 4 (1.3%) | 6 (18.8%) | 6 (10.5%)
Congenital cataract: 348 (14.8%) | 7 (3.7%) | 133 (36.8%) | 28 (9.4%) | 10 (31.3%) | 30 (52.6%)
Congenital glaucoma: 95 (4.1%) | 5 (2.7%) | 0 | 2 (0.7%) | 0 | 1 (1.8%)
High ametropia: 69 (2.9%) | 7 (3.7%) | 2 (0.6%) | 10 (3.4%) | 2 (6.3%) | 3 (5.3%)
Peters' anomaly: 39 (1.7%) | 4 (2.1%) | 0 | 0 | 0 | 1 (1.8%)
Nystagmus: 174 (7.4%) | 7 (3.7%) | 39 (10.8%) | 21 (7.0%) | 1 (3.1%) | 4 (7.0%)
PFV: 36 (1.5%) | 2 (1.1%) | 6 (1.7%) | 0 | 1 (3.1%) | 3 (5.3%)
Other fundus diseases: 54 (2.3%) | 4 (2.1%) | 2 (0.6%) | 7 (2.3%) | 0 | 0
Congenital ptosis: 101 (4.3%) | 10 (5.3%) | 0 | 2 (0.7%) | 0 | 0
Retinoblastoma: 41 (1.7%) | 6 (3.2%) | 2 (0.6%) | 3 (1.0%) | 0 | 2 (3.5%)
Strabismus: 245 (10.5%) | 15 (8.0%) | 35 (9.7%) | 28 (9.4%) | 1 (3.1%) | 6 (10.5%)
Limbal dermoid: 34 (1.5%) | 3 (1.6%) | 0 | 0 | 0 | 0
Microphthalmia: 19 (0.8%) | 2 (1.1%) | 1 (0.3%) | 3 (1.0%) | 0 | 0
Pupillary membranes: 19 (0.8%) | 2 (1.1%) | 7 (1.9%) | 1 (0.3%) | 1 (3.1%) | 1 (1.8%)
SSOM: 11 (0.5%) | 0 | 0 | 1 (0.3%) | 0 | 0
Other: 35 (1.5%) | 4 (2.1%) | 3 (0.8%) | 0 | 0 | 0

PFV, persistent fetal vasculature; SSOM, systemic syndromes with ocular manifestations.

Model visualization and explanation
We improved the interpretability of the detection model outputs by visualizing the model results in the internal validation set. After being projected into a two-dimensional space, the feature information extracted by the detection model exhibited distinct patterns between the visually impaired and nonimpaired clips (Fig. 3a). The attention patterns of the detection model presented by the average heat maps varied with the children's visual functions and underlying ophthalmic disorders (Fig. 3b,c). Among the visually impaired children, the detection model focused more on the eyes and areas around the neck (Fig. 3c). In particular, for the clips extracted from visually impaired samples, those classified by human experts as having abnormal patterns were more likely to be predicted by our system as 'visual impairment' than those that were randomly extracted (Fig. 3d,e and Supplementary Table 8), indicating that the detection model might pay more attention to the morphological appearance or behavioral patterns of the eye and head regions, as we previously reported16.

Additionally, the clips misidentified by the system exhibited different clustering characteristics from the correctly recognized clips (true visually impaired or true nonimpaired clips), and more of the misidentified clips fell in the intermediate zone of the two clusters for the correctly recognized clips (Extended Data Fig. 5). Moreover, for the 20% of samples with the lowest predicted confidence values, the false identification rate was significantly higher than that of the other groups, and the system was equivocal. We aimed to find a solution for cases in which the system was unreliable by filtering out equivocal samples for manual review by ophthalmologists. The results show that system performance improved substantially as the proportion of cases referred for manual review increased. For instance, when selecting cases with confidence values less than 0.071 for manual review, accounting for 3% of the total cases, the sensitivity improved from 84.1% to 85.1% and the specificity improved from 91.9% to 93.1%; when selecting cases with confidence values less than 0.193 for manual review, accounting for 7% of the total cases, the sensitivity and specificity improved to 85.4% and 94.2%, respectively (Extended Data Fig. 6).

Multiple-category classification of ophthalmic disorders
Considering that our system exhibited different attention patterns for visual impairment caused by specific ophthalmic disorders (Fig. 3c), we further developed a DL-based diagnostic model to differentiate ophthalmic disorders with characteristic attention patterns in the detection model (aphakia, congenital glaucoma, congenital ptosis and strabismus) and nonimpairment at the child level. In the diagnostic validation, our system effectively discriminated multiple ophthalmic disorders, achieving AUCs ranging from 0.918 for strabismus to 0.996 for congenital ptosis (Fig. 2m,n).
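The confidence-based referral strategy described in the visualization section above (deferring equivocal cases to ophthalmologist review) can be prototyped as below. The mapping from predicted probability to a 'confidence value' is an assumption; only the idea of deferring the least confident fraction of cases mirrors the text, which reports confidence cutoffs of 0.071 and 0.193 directly.

```python
# Sketch of deferring low-confidence predictions to manual review.
import numpy as np

def triage(probs, threshold=0.5, review_fraction=0.03):
    """Return automatic decisions plus a mask of cases deferred for manual review.

    Confidence is taken here as the distance of the child-level probability from
    the decision threshold (an assumption for illustration)."""
    probs = np.asarray(probs, dtype=float)
    confidence = np.abs(probs - threshold)
    cutoff = np.quantile(confidence, review_fraction)
    needs_review = confidence <= cutoff   # the most equivocal cases go to an ophthalmologist
    decisions = probs >= threshold        # automatic calls for the remaining cases
    return decisions, needs_review

# Example: defer the 3% least confident cases, as in the text.
# decisions, needs_review = triage(child_probs, review_fraction=0.03)
```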


[Figure 3 graphics omitted: panels a–e (t-SNE embedding, face cropping, average heat maps and clip-level predicted probabilities).]

Fig. 3 | Interpretability and visualization of the detection model. a, The t-SNE algorithm was applied to visualize the detection model at the clip level. b, Facial detection and facial landmark localization algorithms were applied to detect and crop the facial regions of the children before data served as inputs to the AIS system. c, Average heat maps obtained from the detection model based on the inputs of facial regions in (b) for nonimpaired children and for children with the indicated ophthalmic disorders. d, The predicted probabilities for various types of clips were compared: clips randomly extracted from the videos of nonimpaired children, clips randomly extracted from the videos of visually impaired children and clips labeled by experienced ophthalmologists as having abnormal behavioral patterns extracted from videos of visually impaired children. e, Predicted probabilities of the detection model for various types of clips in (d) were compared: motionless fixation, n = 48; squinting, n = 18; nystagmus (NA), n = 95; head position, n = 115; suspected strabismus (SA), n = 360; random visual impairment (VI), n = 1,000; nonimpairment (NI), n = 1,000. Results are expressed as mean ± s.d. *P < 0.01 for comparisons with random VI (motionless fixation, P = 4.60 × 10−8; suspected SA, P = 1.09 × 10−15; NA, P = 1.52 × 10−7; head position, P = 0.005; two-tailed Mann–Whitney U-tests). AA, aphakia; CC, congenital cataract; CG, congenital glaucoma; CP, congenital ptosis; HA, high ametropia; LD, limbal dermoid; MO, microphthalmia; OF, other fundus diseases; PA, Peters' anomaly; PFV, persistent fetal vasculature; PM, pupillary membrane; RB, retinoblastoma; SSOM, systemic syndromes with ocular manifestations.

Reliability and adjusted analyses
Stable performance is critical for real-world applications of mHealth and medical AI systems. Thus, we investigated the reliability of AIS at the clinic of ZOC. We first evaluated the influences of patient-related factors, including sex, age, laterality of the eye disorder and the conspicuousness of the phenotypic features, on the performance of AIS. For the reliability stratified by sex, AIS achieved an AUC of 0.948 (95% CI, 0.921–0.971) in the boys group and an AUC of 0.931 (95% CI, 0.899–0.961) in the girls group (Fig. 4a). The predicted probability pattern of AIS remained stable under various age conditions (Fig. 4b), and the system achieved AUCs ranging from 0.909 for age group 4 to 0.954 for age group 3 (Fig. 4c). Additionally, AIS effectively identified visually impaired children with bilateral or unilateral eye disorders, with an AUC of 0.921 (95% CI, 0.891–0.952) in the unilateral group and an AUC of 0.952 (95% CI, 0.932–0.973) in the bilateral group (Fig. 4d). In addition, AIS achieved satisfactory performance with an AUC of 0.939 (95% CI, 0.918–0.960) in identifying hard-to-spot visually impaired children, who could have insidious phenotypic features and be easily neglected by community ophthalmologists (Supplementary Table 9).

Furthermore, we investigated the reliability of AIS under different data capture conditions, including testing distance, room illuminance, repeated testing and duration of the video recording. Similarly, AIS obtained stable detection performance among groups of different testing distances, with the lowest AUC of 0.935 (95% CI, 0.912–0.958) in the medium-distance group (Fig. 4e).
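Subgroup reliability results of the kind reported above (AUCs by sex, age, laterality or testing distance) can be reproduced with a small helper. The paper reports DeLong CIs; the sketch below substitutes a simpler percentile bootstrap, and the subgroup masks are assumptions.

```python
# Sketch of subgroup AUCs with percentile-bootstrap CIs (a stand-in for DeLong CIs).
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_score, mask, n_boot=2000, seed=0):
    y_true, y_score = np.asarray(y_true)[mask], np.asarray(y_score)[mask]
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:   # the resample needs both classes
            continue
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)

# e.g. subgroup_auc(labels, ais_scores, mask=(sex == "girl"))
```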


[Figure 4 graphics omitted: panels a–i (subgroup performance, dispersion scatterplots, ROC curves, retest scatterplot and duration curves).]

Fig. 4 | Performance of the AIS system in reliability analyses. a, Performance of AIS in detecting children with visual impairment (VI) based on sex: girls, n = 254; boys, n = 315. b, Scatterplot of dispersion of the AIS predicted probability changes by age (months). c, Receiver operating characteristic (ROC) curves of AIS for detecting children with VI by age groups: age group 1, age ≤ 12 months, n = 98, AUC = 0.925 (0.847–1.000); age group 2, 12 months < age ≤ 24 months, n = 160, AUC = 0.936 (0.895–0.977); age group 3, 24 months < age ≤ 36 months, n = 189, AUC = 0.954 (0.928–0.980); age group 4, 36 months < age ≤ 48 months, n = 122, AUC = 0.909 (0.855–0.964). d, Performance of AIS for identifying children with unilateral or bilateral VI: unilateral, n = 158; bilateral, n = 238; nonimpairment (NI), n = 173. e, Performance of AIS for detecting children with VI under various testing distance conditions: long distance, n = 47; medium distance, n = 432; short distance, n = 90. f, Scatterplot of dispersion of the AIS predicted probability changes by room illuminance (in lux (lx)). g, ROC curves of AIS for distinguishing children with VI under various room illuminance conditions: illuminance group 1, room illuminance ≤ 200 lx, n = 125, AUC = 0.936 (0.895–0.976); illuminance group 2, 200 lx < room illuminance ≤ 400 lx, n = 317, AUC = 0.932 (0.901–0.963); illuminance group 3, room illuminance > 400 lx, n = 127, AUC = 0.950 (0.915–0.985). h, Predicted probabilities of the detection model for repeated detection tests (NI, n = 102; VI, n = 85). i, Performance curves of AIS by video duration. In a,d,e, results are expressed as means and 95% CIs with DeLong CIs for AUC values and 95% Wilson CIs for other metrics. ACC, accuracy; SEN, sensitivity; SPE, specificity.

Additionally, the AIS predicted probability pattern remained stable under different room illuminance conditions (Fig. 4f). Our system achieved the lowest AUC of 0.932 (95% CI, 0.901–0.963) in the medium illuminance group (Fig. 4g). In the retest analysis, the system remained robust, with an intraclass correlation coefficient for predicted probabilities of 0.880 (95% CI, 0.843–0.908) and a Cohen's κ for predicted categories of 0.837 (95% CI, 0.758–0.916) in another independent validation population recruited at ZOC (Fig. 4h and Table 1). In addition, as the duration of the video recording increased, AIS remained stable and achieved a maximal AUC of 0.931 (95% CI, 0.914–0.956) with a video duration longer than 30 s (Fig. 4i).
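The retest agreement statistics quoted above (intraclass correlation for repeated predicted probabilities and Cohen's κ for repeated predicted categories) can be computed as sketched below. The use of pingouin's ICC2 estimate and a 0.5 decision threshold are assumptions, not the study's specification.

```python
# Sketch of test-retest agreement: ICC on probabilities, Cohen's kappa on categories.
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

def retest_agreement(p_test, p_retest, threshold=0.5):
    n = len(p_test)
    long = pd.DataFrame({
        "child": list(range(n)) * 2,
        "session": ["test"] * n + ["retest"] * n,
        "prob": list(p_test) + list(p_retest),
    })
    icc_table = pg.intraclass_corr(data=long, targets="child",
                                   raters="session", ratings="prob")
    icc2 = icc_table.loc[icc_table["Type"] == "ICC2", "ICC"].item()
    kappa = cohen_kappa_score([p >= threshold for p in p_test],
                              [p >= threshold for p in p_retest])
    return icc2, kappa
```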


To further verify that the detection results of our system were reliable and not solely mediated by baseline characteristics as confounders, we examined the odds ratios (ORs) of the AIS predictions adjusted for baseline characteristics at the child level. Even after controlling for potential baseline confounders, the AIS predictions had statistically significant adjusted ORs for detecting visual impairment in the internal and external validations and the at-home implementation (P < 0.001). The adjusted ORs ranged from 3.034 to 3.248 for tasks in the internal validation (Supplementary Table 10) and from 2.307 to 2.761 for tasks in the external validation (Supplementary Table 11). For the at-home implementation, the AIS predictions had a statistically significant adjusted OR of 2.496 (95% CI, 1.748–3.565, P = 4.815 × 10−7) for detecting visual impairment (Supplementary Table 12).

Performance of the AIS across different smartphone platforms
To test the stability of our system in more complex settings, we applied adjustments to a dataset randomly sampled from the ZOC validation set with various blurring, brightness, color or Gaussian noise adjustment gradients to simulate the diversity of data quality collected by different smartphone cameras. Our system remained reliable and achieved AUCs of over 0.800 with blurring factors no more than 25 or brightness factors no more than 0.7, and it achieved AUCs of over 0.930 under different color adjustments and over 0.820 under various Gaussian noise adjustments (Extended Data Fig. 7).

Furthermore, an independent validation set from 389 children was collected at ZOC using the Huawei Honor-6 Plus and Redmi Note-7 smartphones with the Android operating system to evaluate the performance of AIS (Fig. 1b and Supplementary Table 13). After data quality checking, videos of 361 children were reserved (92.8%), including 87 (24.1%) children without visual impairment, 169 (46.8%) children with mild visual impairment and 105 (29.1%) children with severe visual impairment (Table 1). AIS showed significantly higher predicted probabilities for mild or severe impairment than for nonimpairment and achieved an AUC of 0.932 (95% CI, 0.902–0.963) for identifying visual impairment with the Android system at the child level (Extended Data Fig. 8).
Discussion
With the high incidence of visual problems during the first few years of life, timely intervention to counter pathological visual deprivation mechanisms during this critical development period can prevent or minimize long-term visual loss3. However, early detection of visual impairment in young children is challenging due to the lack of accurate and easy-to-use tools applicable to both clinical and community environments. To overcome these challenges, we developed and validated a smartphone-based system (AIS) that provides a holistic and quantitative technique to identify visual impairment in young children in real-world settings. We comprehensively evaluated this system for 16 important causes of childhood vision loss. Our system achieved an AUC of 0.940 in the internal validation and an AUC of 0.843 in the external validation at the clinics of four different hospitals. Furthermore, our system proved reliable when used by parents or caregivers at home, achieving an AUC of 0.859 under these specific testing conditions.

One of the merits of AIS is its applicability to different ocular diseases. Previous studies have utilized photographs to detect ocular and visual abnormalities in childhood27,28. These technologies, which focus on a single static image, are not suitable for large-scale applications due to their limited effectiveness and inability to handle multiple abnormalities with variable patterns. Given the complexity of ocular pathologies in children, the concept of accurately assessing a broad range of ocular conditions is attractive. In our prospective multicenter study, we analyzed more than 25,000,000 frames of information-rich phenotypic videos and accurately identified visual impairment caused by a wide range of sight-threatening eye diseases. Strikingly, AIS was able to detect most of the common causes of visual impairment in childhood, including anterior and posterior segment disorders, strabismus, ocular neoplasms, developmental abnormalities and ocular manifestations of systemic and genetic diseases29. Although cases like congenital cataracts tend to be easily diagnosed in specialist settings by experienced doctors, they are still frequently missed in the community, especially in areas with a pediatric ophthalmic resource shortfall28. To apply AIS to various scenarios, we recruited cases of a broad range of eye disorders with variable severity in terms of their impact on vision. Our system was reasonably accurate in identifying mildly impaired children who could have subtle phenotypic features, making them easy to miss. Furthermore, our results indicate that AIS can be extended to diseases that have not been previously encountered in the training process, demonstrating its broader applicability.

The use of smartphones to detect visual impairment caused by extraocular or systemic diseases is an important future application, but its feasibility remains to be verified. Some systemic diseases, such as cardiovascular, hepatobiliary and renal diseases, can exhibit ocular manifestations that are recognizable by algorithms, which is also indicated by our findings in small samples30–32. Furthermore, disorders of the neurological system can impact vision and cause cerebral visual impairment with pathology outside the eye, which is a common type of visual impairment in developed countries but was not represented in this study33,34. Therefore, future work is needed to evaluate the merit of AIS in detecting visual impairment caused by a broad range of diseases, such as cerebral visual impairment, and in reducing the extraocular morbidity associated with systemic diseases in a larger population: for example, the cardiovascular complications linked with Marfan syndrome.

A major strength of AIS is its reliability in real-world practice. Although a large number of medical AI systems have been evaluated with high performance in the laboratory setting, only a few systems have demonstrated real-world medical feasibility23,25. Bias from training data and low stability of the model design greatly limit the generalizability of these AI systems. Previously, we evaluated the feasibility of identifying visual impairment in children by analyzing their phenotypic characteristics using DL algorithms16. For that study, the evaluation was conducted by experienced experts under a tightly controlled, standardized laboratory setting to strictly control for interference factors, which is not possible in routine ophthalmic practice. In this study, we prospectively collected a large amount of phenotypic data (facial features and ocular movements) to develop a DL system with a highly reliable design. Our results show that AIS exhibited high stability and prediction effectiveness under various testing conditions. Importantly, AIS remained effective in multicenter external validation and, crucially, when rolled out in the community and used by parents or caregivers at home. When transferred to at-home settings, factors such as environmental interference, blurring, brightness, the pixels of different cameras and the influence of untrained operators may impact the system's performance. Therefore, we used a pilot dataset to fine-tune our system for its generalizability to various home environments and broader applications. AIS achieved an acceptable AUC of 0.859 in the subsequent implementation, which indicates that it can benefit from further model updating on larger-scale datasets for broader applications. Importantly, AIS remained stable across 88 different home environments after one round of fine-tuning, demonstrating its potential to be used in a variety of complex environments without requiring regular adaptation or fine-tuning in future applications.

Our findings demonstrate that sensory states, especially vision, can be derived from phenotypic video data recorded using consumer-grade smartphones. Two types of underlying features seemed to be captured by smartphones. First, changes in facial appearance caused by ocular pathologies can be directly recorded by mobile devices, especially those of the ocular surface or adnexa: for example, eyelid drooping in congenital ptosis. Second and more importantly, individuals may display aberrant behaviors to adapt to changes in their sensory modality, a process conserved from arthropods to mammals35,36 and confirmed in human children16. Our results show that the model can focus on behavioral features replicated in various eye diseases, such as abnormal ocular movement or alignment/fixation patterns.


These common behavioral patterns may broaden the applicability of AIS to multiple ocular diseases, including posterior segment abnormalities that are more challenging to diagnose based on phenotypic video data.

A smartphone-based system to detect ocular pathology in children has obvious clinical implications. Early identification by parents or caregivers of ocular abnormalities facilitates timely referral to pediatric ophthalmologists and prompt intervention. AIS does not require professional medical equipment; smartphones and simple stabilization are sufficient. This low-barrier system is a promising tool for the timely testing of children in the community, which is a major advantage given the rapidly changing nature of the ocular pathology encountered in children. This could have a major impact by improving vision-related outcomes and even survival rates in cases such as retinoblastoma37,38. Furthermore, AIS is a promising tool to screen young children for ocular abnormalities remotely, which can reduce ophthalmologists' exposure risk to infectious agents, as exemplified by the impact of the coronavirus disease 2019 (COVID-19) pandemic, in the so-called 'new normal' period39.

This study has several limitations. First, although we may have missed the recruitment of some patients with conditions causing slight visual impairment in specialist clinical settings, our system was satisfactorily accurate in identifying mildly impaired children with subtle phenotypic features. Importantly, the versatile AIS system maintained reliable performance in detecting visually impaired children who were hard to spot even for community ophthalmologists, which supports expanding our future work to the general population and to groups of children with mild or early-stage ocular pathology. Second, to develop the quality control module and analyze the influencing factors, only a single video was collected for each child at ZOC, accounting for the relatively high rate of unsuccessful cases in this stage. However, our system allowed users to repeat video recordings until qualified videos were acquired. As a result, the success rate of identification greatly improved. Although a proportion of uncooperative children may not be appropriate for our tool, our AIS system has greatly lowered the minimal operating threshold for untrained users, indicating its potential for general application. Third, our cohorts recruited in clinical settings may not represent the real-world population. Although AIS effectively identified visually impaired children in the 'finding a needle in a haystack' test with a prevalence simulated to a general population, a large-scale screening trial is needed in the future to validate the utility of the AIS system in real-world applications. Fourth, AIS requires collecting facial information from children, which may pose a risk of privacy exposure. To avoid potential privacy risks, future techniques such as lightweight model backbones40 and model pruning41 could be applied to deploy the DL system on individual smartphones with no requirement for additional computing resources. In addition, digital fingerprint technology, such as blockchain42, can also be applied to monitor data usage and mitigate abuse effectively. Additionally, we developed a real-time three-dimensional facial reconstruction technology to irreversibly erase biometric attributes while retaining gaze patterns and eye movements43, which can be used in the future to safeguard children's privacy when using AIS.

In conclusion, we developed and validated an innovative smartphone-based technique to detect visual impairment in young children affected with a broad range of eye diseases. Given the ubiquity of smartphones, AIS is a promising tool that can be applied in real-world settings for secondary prevention of visual loss in this particularly vulnerable age group.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41591-022-02180-9.

References
1. Kliner, M., Fell, G., Pilling, R. & Bradbury, J. Visual impairment in children. Eye 25, 1097–1097 (2011).
2. Mariotti, A. & Pascolini, D. Global estimates of visual impairment. Br. J. Ophthalmol. 96, 614–618 (2012).
3. Bremond-Gignac, D., Copin, H., Lapillonne, A. & Milazzo, S. Visual development in infants: physiological and pathological mechanisms. Curr. Opin. Ophthalmol. 22, S1–S8 (2011).
4. Teoh, L., Solebo, A. & Rahi, J. Temporal trends in the epidemiology of childhood severe visual impairment and blindness in the UK. Br. J. Ophthalmol. https://doi.org/10.1136/bjophthalmol-2021-320119 (2021).
5. Gothwal, V. K., Lovie-Kitchin, J. E. & Nutheti, R. The development of the LV Prasad-Functional Vision Questionnaire: a measure of functional vision performance of visually impaired children. Investigative Ophthalmol. Vis. Sci. 44, 4131–4139 (2003).
6. Brown, A. M. & Yamamoto, M. Visual acuity in newborn and preterm infants measured with grating acuity cards. Am. J. Ophthalmol. 102, 245–253 (1986).
7. Dutton, G. N. & Blaikie, A. J. How to assess eyes and vision in infants and preschool children. BMJ Br. Med. J. 350, h1716 (2015).
8. Blindness and Vision Impairment (World Health Organization, 2021); https://www.who.int/en/news-room/fact-sheets/detail/blindness-and-visual-impairment
9. Mayer, D. L. & Dobson, V. in Developing Brain Behaviour (ed. Dobbing, J.) 253–292 (Academic, 1997).
10. Quinn, G. E., Berlin, J. A. & James, M. The Teller acuity card procedure: three testers in a clinical setting. Ophthalmology 100, 488–494 (1993).
11. Johnson, A., Stayte, M. & Wortham, C. Vision screening at 8 and 18 months. Steering Committee of Oxford Region Child Development Project. Br. Med. J. 299, 545–549 (1989).
12. Long, E. et al. Monitoring and morphologic classification of pediatric cataract using slit-lamp-adapted photography. Transl. Vis. Sci. Technol. 6, 2 (2017).
13. Balmer, A. & Munier, F. Differential diagnosis of leukocoria and strabismus, first presenting signs of retinoblastoma. Clin. Ophthalmol. 1, 431 (2007).
14. SooHoo, J. R., Davies, B. W., Allard, F. D. & Durairaj, V. D. Congenital ptosis. Surv. Ophthalmol. 59, 483–492 (2014).
15. Mandal, A. K. & Chakrabarti, D. Update on congenital glaucoma. Indian J. Ophthalmol. 59, S148 (2011).
16. Long, E. et al. Discrimination of the behavioural dynamics of visually impaired infants via deep learning. Nat. Biomed. Eng. 3, 860–869 (2019).
17. Brown, A. M. & Lindsey, D. T. Infant color vision and color preferences: a tribute to Davida Teller. Vis. Neurosci. 30, 243–250 (2013).
18. Holmes, J. M. & Clarke, M. P. Amblyopia. Lancet 367, 1343–1351 (2006).
19. Abadi, R. & Bjerre, A. Motor and sensory characteristics of infantile nystagmus. Br. J. Ophthalmol. 86, 1152–1160 (2002).
20. Wright, K. W., Spiegel, P. H. & Hengst, T. Pediatric Ophthalmology and Strabismus (Springer, 2013).
21. Sim, I. Mobile devices and health. N. Engl. J. Med. 381, 956–968 (2019).
22. Grady, C. et al. Informed consent. N. Engl. J. Med. 376, 856–867 (2017).
23. Beede, E. et al. A human-centered evaluation of a deep learning system deployed in clinics for the detection of diabetic retinopathy. In Proc. 2020 CHI Conference on Human Factors in Computing Systems 1–12 (Association for Computing Machinery, 2020).
24. Davenport, T. H. & Ronanki, R. Artificial intelligence for the real world. Harvard Bus. Rev. 96, 108–116 (2018).


25. Lin, H. et al. Diagnostic efficacy and therapeutic decision-making capacity of an artificial intelligence platform for childhood cataracts in eye clinics: a multicentre randomized controlled trial. eClinicalMedicine 9, 52–59 (2019).
26. King, D. E. Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009).
27. Munson, M. C. et al. Autonomous early detection of eye disease in childhood photographs. Sci. Adv. 5, eaax6363 (2019).
28. Long, E. et al. An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nat. Biomed. Eng. 1, 0024 (2017).
29. Gogate, P., Gilbert, C. & Zin, A. Severe visual impairment and blindness in infants: causes and opportunities for control. Middle East Afr. J. Ophthalmol. 18, 109–114 (2011).
30. Cheung, C. Y. et al. A deep-learning system for the assessment of cardiovascular disease risk via the measurement of retinal-vessel calibre. Nat. Biomed. Eng. 5, 498–508 (2021).
31. Sabanayagam, C. et al. A deep learning algorithm to detect chronic kidney disease from retinal photographs in community-based populations. Lancet Digital Health 2, e295–e302 (2020).
32. Xiao, W. et al. Screening and identifying hepatobiliary diseases through deep learning using ocular images: a prospective, multicentre study. Lancet Digital Health 3, e88–e97 (2021).
33. Pehere, N., Chougule, P. & Dutton, G. N. Cerebral visual impairment in children: causes and associated ophthalmological problems. Indian J. Ophthalmol. 66, 812–815 (2018).
34. Gilbert, C. & Foster, A. Childhood blindness in the context of VISION 2020—the right to sight. Bull. World Health Organ. 79, 227–232 (2001).
35. Dey, S. et al. Cyclic regulation of sensory perception by a female hormone alters behavior. Cell 161, 1334–1344 (2015).
36. Klein, M. et al. Sensory determinants of behavioral dynamics in Drosophila thermotaxis. Proc. Natl Acad. Sci. USA 112, E220–E229 (2015).
37. Finger, P. T. & Tomar, A. S. Retinoblastoma outcomes: a global perspective. Lancet Glob. Health 10, e307–e308 (2022).
38. Wong, E. S. et al. Global retinoblastoma survival and globe preservation: a systematic review and meta-analysis of associations with socioeconomic and health-care factors. Lancet Glob. Health 10, e380–e389 (2022).
39. Romano, M. R. et al. Facing COVID-19 in ophthalmology department. Curr. Eye Res. 45, 653–658 (2020).
40. Howard, A. et al. Searching for MobileNetV3. In Proc. IEEE/CVF International Conference on Computer Vision 1314–1324 (IEEE, 2019).
41. Hoefler, T., Alistarh, D., Ben-Nun, T., Dryden, N. & Peste, A. Sparsity in deep learning: pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22, 1–124 (2021).
42. Leeming, G., Ainsworth, J. & Clifton, D. A. Blockchain in health care: hype, trust, and digital health. Lancet 393, 2476–2477 (2019).
43. Yang, Y. et al. A digital mask to safeguard patient privacy. Nat. Med. 28, 1883–1892 (2022).

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature America, Inc. 2023

Wenben Chen1,24, Ruiyang Li1,24, Qinji Yu2,24, Andi Xu1,24, Yile Feng3,24, Ruixin Wang1, Lanqin Zhao 1, Zhenzhe Lin1,
Yahan Yang1, Duoru Lin1, Xiaohang Wu1, Jingjing Chen1, Zhenzhen Liu1, Yuxuan Wu1, Kang Dang3, Kexin Qiu3,
Zilong Wang 3, Ziheng Zhou3, Dong Liu1, Qianni Wu1, Mingyuan Li1, Yifan Xiang 1, Xiaoyan Li1, Zhuoling Lin1,
Danqi Zeng1, Yunjian Huang1, Silang Mo4, Xiucheng Huang4, Shulin Sun5, Jianmin Hu6, Jun Zhao7, Meirong Wei8,
Shoulong Hu9,10, Liang Chen11, Bingfa Dai6, Huasheng Yang1, Danping Huang1, Xiaoming Lin1, Lingyi Liang1, Xiaoyan Ding1,
Yangfan Yang1, Pengsen Wu1, Feihui Zheng12, Nick Stanojcic13, Ji-Peng Olivia Li 14, Carol Y. Cheung15, Erping Long 1,
Chuan Chen16, Yi Zhu17, Patrick Yu-Wai-Man 14,18,19,20, Ruixuan Wang21, Wei-shi Zheng 21, Xiaowei Ding 2,3 &
Haotian Lin 1,22,23

1State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of
Ophthalmology and Vision Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China. 2Institute of Image
Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China. 3VoxelCloud, Shanghai, China. 4School of Medicine,
Sun Yat-sen University, Shenzhen, China. 5Department of Urology, Peking University Third Hospital, Peking University Health Science Center, Beijing,
China. 6Department of Ophthalmology, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China. 7Shenzhen People’s Hospital
(The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen,
China. 8Liuzhou Maternity and Child Healthcare Hospital, Affiliated Women and Children’s Hospital of Guangxi University of Science and Technology,
Liuzhou, China. 9National Center for Children’s Health, Department of Ophthalmology, Beijing Children’s Hospital, Capital Medical University, Beijing,
China. 10Department of Ophthalmology, Zhengzhou Children’s Hospital, Zhengzhou, China. 11Shenzhen Eye Hospital, Jinan University, Shenzhen Eye
Institute, Shenzhen, China. 12Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore. 13Department of Ophthalmology,
St. Thomas’ Hospital, London, UK. 14Moorfields Eye Hospital, London, UK. 15Department of Ophthalmology & Visual Sciences, Faculty of Medicine, The
Chinese University of Hong Kong, Hong Kong, China. 16Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami,
FL, USA. 17Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Miami, FL, USA. 18University College
London Institute of Ophthalmology, University College London, London, UK. 19Cambridge Eye Unit, Addenbrooke’s Hospital, Cambridge University
Hospitals, Cambridge, UK. 20Cambridge Center for Brain Repair and Medical Research Council (MRC) Mitochondrial Biology Unit, Department of Clinical
Neurosciences, University of Cambridge, Cambridge, UK. 21School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
22Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, China. 23Center for Precision
Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China. 24These
authors contributed equally: Wenben Chen, Ruiyang Li, Qinji Yu, Andi Xu, Yile Feng. e-mail: dingxiaowei@sjtu.edu.cn; linht5@mail.sysu.edu.cn


Methods

Ethics approval
The predefined protocol of the clinical study was approved by the Institutional Review Board/Ethics Committee of ZOC and prospectively registered at ClinicalTrials.gov (identifier: NCT04237350), and it is shown in Supplementary Note. Consent was obtained from all individuals whose eyes or faces are shown in the figures or video for publication. Before data collection, informed written consent was obtained from at least one parent or guardian of each child. The investigators followed the requirements of the Declaration of Helsinki throughout the study.

Study design and study population
This prospective, multicenter and observational study was conducted between 14 January 2020 and 30 January 2022 to recruit children for the development and validation of the mHealth system in three stages (Fig. 1b). Major eligibility criteria included an age of 48 months or younger and informed written consent obtained from at least one parent or guardian of each child. We did not include children having central nervous system diseases, mental illnesses or other known illnesses that could affect their behavioral patterns, in the absence of ocular manifestations. Children who could not cooperate to complete the ophthalmic examinations or the detection test using AIS were excluded. We also excluded children who had received ocular interventions and treatments in the month immediately preceding data collection.
In the first stage, completed from 14 January 2020 to 15 September 2021, children were enrolled at the clinic of ZOC (Guangdong Province) to develop and comprehensively validate (internal validation and reliability analyses) the system. In the second stage, which occurred from 22 September 2021 to 19 November 2021, children were enrolled at the clinics of the Second Affiliated Hospital of Fujian Medical University (Fujian Province), Shenzhen Eye Hospital (Guangdong Province), Liuzhou Maternity and Child Healthcare Hospital (Guangxi Province) and Beijing Children's Hospital of Capital Medical University (Beijing) to additionally evaluate the system (external validation). We selected these sites from three provinces across northern and southern China, representing the variations in clinical settings. In the first two stages, recruited children underwent ophthalmic examinations by clinical staff, and phenotypic videos were collected by trained volunteers using mHealth apps installed on iPhone-7 or iPhone-8 smartphones at each center. In the third stage, conducted from 24 November 2021 to 30 January 2022, we advertised our study through the online platform of the Pediatric Department of ZOC and the social media platform WeChat. We recruited children and their parents or caregivers online from the Guangdong area for at-home implementation. The investigators recruited the children following the same eligibility criteria as the previous two stages by collecting their basic information and medical history online. In addition, children who could not come to ZOC for an ophthalmic assessment or who had been included in other stages of this study were excluded. Untrained parents or caregivers recorded the phenotypic videos with their smartphones according to the instructions of the AIS app at home (Extended Data Figs. 1 and 2). The quality control module automatically reminded parents or caregivers to repeat data collection when the video recordings were unqualified. In this stage, all the children who completed successful video recordings underwent ophthalmic examinations at ZOC. A total of 3,652 children were finally enrolled, recording more than 25,000,000 frames of videos for development and validation of the system.

Definition of visual impairment
Comprehensive functional and structural examinations were performed to stratify children's visual conditions for developing and validating the DL-based AIS. For unified examination, a Teller vision card (Stereo Optical Company) was utilized to measure children's monocular visual acuity44. In addition, high-resolution slit lamp examinations, fundoscopy examinations and cycloplegic refraction were used to detect abnormalities in the eyes. Additional examinations, such as intraocular pressure, ultrasound, computerized tomography scans and genetic tests, were determined by experienced pediatric ophthalmologists when necessary.
According to the results of the abovementioned examinations and a referenced distribution of monocular visual acuity45, experienced pediatric ophthalmologists comprehensively stratified children's visual conditions into three groups. Children with the best-corrected visual acuity (BCVA) of both eyes in the 95% referenced range with no abnormalities of structure or other examination results were assigned to the nonimpaired group. Children with the BCVA in the 99% referenced range in both eyes with abnormalities of structure or other examination results were assigned to the mildly impaired group. Children with the BCVA of at least one eye outside the 99% referenced range or worse than light perception with structural abnormalities or other examination results were assigned to the severely impaired group16. We recruited visually impaired children with primary diagnoses of the following 16 ocular disorders: aphakia, congenital cataract, congenital glaucoma, high ametropia, Peters' anomaly, nystagmus, congenital ptosis, strabismus, persistent fetal vasculature, retinoblastoma, other fundus diseases, limbal dermoid, microphthalmia, pupillary membranes, systemic syndromes with ocular manifestations and other ocular conditions (Table 2 and Supplementary Table 4). A tiered panel consisting of two groups of experts assigned and confirmed the primary diagnosis as the most significant diagnostic label for each child. The first group of experts consisted of two pediatric ophthalmologists with over 10 years of experience in each recruiting ophthalmic center who separately provided the preliminary labeling information. If a consensus was not reached at this stage, a second group of more senior pediatric ophthalmologists with over 20 years of experience at ZOC verified the diagnostic labels as the ground truth. The diagnoses of children recruited online for the at-home implementation were made by experts at ZOC following the same criteria.

Concept of the AIS system
The AIS system consisted of a smartphone app (available for iPhone and Android operating systems) for data collection and a DL back end for data analysis (Fig. 1a and Extended Data Fig. 1). To ensure the quality of data collected in real-world settings, AIS interactively instructed users to follow a standardized preparation sequence for data collection (Extended Data Fig. 2). Before data collection, a short demo video was displayed to instruct users on the standard operation and how to choose an appropriate environment to minimize testing biases (for example, room illuminance, background, testing distance and interference). Once the smartphone was firmly in place, a face-positioning frame was shown on the screen to help adjust the distance and position of the child in relation to the smartphone. After all preparations were completed properly, AIS played a cartoon-like video stimulus lasting approximately 3.5 min to attract children's attention, and the inbuilt front camera recorded the children's phenotypic features (ocular movements and facial appearance) in video format.
Then, the collected data were transferred to the DL-based back end, where the quality control module automatically performed quality checking on each frame first. To eliminate background interference, the children's facial regions were then cropped out of consecutive frames of sufficient quality to form short video clips as inputs of the subsequent DL models for final decision-making (a detection model to distinguish visually impaired children from nonimpaired individuals and a diagnostic model to discriminate multiple ocular disorders). The DL models produced classification probabilities for short video clips, which were eventually merged into the video-level classification probability as the final outcome by averaging. The final results were returned to the mHealth app to alert users to promptly refer children at high risk of visual impairment to experienced pediatric ophthalmologists for further diagnosis and intervention.
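
As a concrete illustration of the decision logic just described, the following minimal Python sketch shows how clip-level probabilities could be merged into the video-level outcome: clip probabilities are averaged, the detection task compares the mean with a 0.50 cut-off, and the diagnostic task takes the category with the highest mean probability. This is not the authors' released code; the function names, example numbers and class list are illustrative assumptions.

```python
# Hedged sketch of the video-level aggregation described above (illustrative only).
from typing import List, Sequence

import numpy as np


def detect_video(clip_probs: Sequence[float], threshold: float = 0.50) -> dict:
    """Average clip-level impairment probabilities and apply the 0.50 cut-off."""
    video_prob = float(np.mean(clip_probs))
    return {"probability": video_prob, "visually_impaired": video_prob > threshold}


def diagnose_video(clip_class_probs: List[Sequence[float]],
                   class_names: Sequence[str]) -> str:
    """Return the category with the highest average probability across clips."""
    mean_probs = np.asarray(clip_class_probs).mean(axis=0)
    return class_names[int(np.argmax(mean_probs))]


# Example with made-up numbers for one video containing five qualified clips.
print(detect_video([0.62, 0.71, 0.55, 0.48, 0.66]))
print(diagnose_video(
    [[0.1, 0.5, 0.2, 0.1, 0.1], [0.2, 0.4, 0.2, 0.1, 0.1]],
    ["nonimpairment", "aphakia", "congenital glaucoma", "congenital ptosis", "strabismus"],
))
```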


Deep quality control module
To ensure prediction reliability, we adopted a strict data quality control strategy to ensure that the input clips of the detection/diagnostic models satisfied certain quality criteria (Fig. 1a). First, for each frame, the child's facial area was detected, and frames without successful face detection were rejected. If two or more faces were detected in a given frame, it suggested that the child's parents or other persons were inside the scene, and such a frame was also rejected. The facial region detection algorithm was based on MMOD CNN46, which consisted of a series of convolutional layers for feature learning and max-margin operation during model training. In this study, the MMOD CNN face detector pretrained on publicly available datasets was adopted from the Dlib Python Library, which has been proven to be effective and robust in facial detection tasks26.
Second, a facial key point localization algorithm was applied to the detected facial area to extract the landmarks of facial regions, including the left eye, right eye, nose tip, chin and mouth corners, which served as the reference coordinates for the cropping of facial regions. The facial key point localization algorithm was realized based on a pretrained ensemble of regression trees, which was also provided by the Dlib Python Library47,48. We adopted a cascade of regressors to take the facial region of the frame as the input. The network was able to learn coarse-to-fine feature representations of the child's face, especially details of the facial patterns. The output of this model was then fitted to the coordinates representing facial structures to generate 68 key target points. The coordinates of the key points then served as the reference for facial region cropping. All video data and image data processing were performed using the FFmpeg toolkit and OpenCV Python Library49.
Then, a combination of crying, interference and occlusion classification models based on EfficientNet-B2 networks (Extended Data Fig. 3a,b) was applied to each frame, which was trained based on the data collected at ZOC (Extended Data Fig. 3d and Supplementary Table 1)50. During model training and inference, the input frame was first rescaled to 384 × 384 resolution and then sent into the models for deep feature representation learning (Supplementary Table 14). Positive outputs by the models indicated that the child was crying, was interfered with or had its facial region blocked by objects such as toys or other persons' hands, and the corresponding frames were also discarded. In practice, we fine-tuned the models pretrained on the ImageNet public dataset51.
Eventually, the remaining frames were considered high-quality candidates, and consecutive high-quality frames were selected to form short video clips. Each clip lasted at least 1.5 s and at most 5 s. The child's facial region within each clip was then cropped out to serve as the final input of the subsequent detection/diagnostic models based on the facial key point coordinates to eliminate the interference of the background region. A qualified video should contain more than ten clips; otherwise, the video was treated as a low-quality sample and discarded.

DL framework of the detection/diagnostic models
Two models with various clinical purposes were developed in this study: a detection model to detect visually impaired children from nonimpaired children and a five-category diagnostic model to discriminate specific ophthalmic disorders (aphakia, congenital glaucoma, congenital ptosis and strabismus) and nonimpairment. The backbone of each DL model was built on a deep convolutional network known as EfficientNet-B4 (Extended Data Fig. 3c and Supplementary Table 14)50. The models made predictions on the children's cropped facial regions. Specifically, spatial cues of the input clips were learned by cascaded convolutional layers, while temporal cues were integrated by temporal average pooling layers, which was inspired by successful applications in gait recognition52. The temporal average pooling operator was given by $\frac{1}{n}\sum_{i=1}^{n}\vec{x}_i$, where $n$ was the number of frames in the input clip and $\vec{x}_i$ was the feature map of each frame output by the last convolutional layer of the network. Before training, all convolutional blocks were initialized by the parameters of the models pretrained on the ImageNet dataset51. At the inference stage, class scores given by the models were treated as the final clip-level probability outcomes. For the detection model, the output of the last classification layer, indicated by $x_i$, was normalized to the range between 0.00 and 1.00 for each clip using the sigmoid function $p_i = \frac{1}{1+\exp(-x_i)}$, representing the final probability of the $i$th clip being classified as a visually impaired candidate. To train the detection model, the cost function was given by the classic binary cross-entropy loss $L = -\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i \log(p_i) + (1-\hat{y}_i)\log(1-p_i)\right)$, where $N$ was the number of clips within each batch, $\hat{y}_i$ was the ground truth label of the $i$th clip and $p_i$ was the output classification probability of the model.
The diagnostic model was developed based on the same EfficientNet-B4 backbone as the detection model. The only difference was that the output of the diagnostic model was activated by a five-category softmax function that indicated the probability of each class: $p_k = \frac{e^{x_k}}{\sum_{j=1}^{5} e^{x_j}}$, where $x_k$ was the output of the last classification layer for the $k$th class. The cost function of the network was given by the stochastic cross-entropy loss $L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{5}\hat{y}_k^{\,i}\log(p_k^i)$, where $N$ was the batch size and $\hat{y}_k^{\,i} \in \{0,1\}$ was the binary Boolean variable of the $i$th input clip within each batch, indicating whether the $k$th class matched the ground truth label of the $i$th clip.
Child-level classification was based on clip-level predictions. A sliding window integrated with the quality control module was applied along the temporal dimension of the whole video to extract high-quality clips. Such clips then served as the candidate inputs for detection/diagnostic models. For the detection model, if the average score of the clips exceeded 0.50 within each video, the child was eventually classified as a visually impaired individual. For the diagnostic model, the category with the highest average probability was treated as the final prediction outcome.

Model training and internal validation
We first developed the data quality control module using both publicly available datasets and the ZOC dataset (Supplementary Table 1). Then, we trained and validated the detection/diagnostic models with the ground truth of the visual conditions using the development dataset at ZOC. In this stage, data collection preceded the development of the quality control module, so raw videos without quality checking were collected. In total, raw videos from 2,632 children undergoing ophthalmic examinations were collected by trained volunteers using the mHealth apps installed on iPhone-7 or iPhone-8 smartphones. After initial quality checking by the quality control module, qualified videos of 2,344 (89.1%) children were reserved as the development dataset, which was randomly split into training, tuning and validation (internal validation) sets using a stratified sampling strategy according to sex, age and the category of ophthalmic disorder to train and internally validate the detection/diagnostic models (Fig. 1b, Extended Data Fig. 3e and Supplementary Table 2). The age distribution and the proportions of children with unilateral and bilateral severe visual impairment for different datasets are shown in Supplementary Tables 15 and 16, respectively. Internal validation refers to the assessment of the performance of the selected optimized model, after training and hyperparameter selection and tuning, on the independent datasets from the same settings as training datasets. The top-performing checkpoint was selected on the basis of accuracy on the tuning set. In particular, the videos utilized for quality control module development did not overlap with those in the detection/diagnostic model validation.
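
To make the frame-level quality control described above more tangible, the sketch below runs the Dlib MMOD CNN face detector, rejects frames with no face or more than one face, localizes the 68 facial key points and crops the facial region. The model file names are the standard public Dlib releases; the crying/interference/occlusion EfficientNet-B2 classifiers and the clip-assembly rules (1.5–5 s clips, more than ten clips per qualified video) are omitted, so this is only an illustrative fragment, not the authors' pipeline.

```python
# Minimal sketch of the per-frame quality control and face cropping (assumptions noted above).
import cv2
import dlib

detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")


def crop_face(frame_bgr):
    """Return the cropped facial region, or None if the frame fails quality control."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = detector(rgb, 1)              # upsample once to help with small faces
    if len(detections) != 1:                   # reject frames with no face or several faces
        return None
    rect = detections[0].rect
    landmarks = predictor(rgb, rect)           # 68 facial key points
    xs = [landmarks.part(i).x for i in range(68)]
    ys = [landmarks.part(i).y for i in range(68)]
    x0, y0 = max(min(xs), 0), max(min(ys), 0)
    return frame_bgr[y0:max(ys), x0:max(xs)]   # facial region only; background removed
```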

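The clip-level classifier of the 'DL framework' subsection above can be pictured with the following minimal PyTorch sketch, which applies an EfficientNet-B4 backbone to every frame, merges frame features by temporal average pooling and maps the pooled feature to a logit that is turned into a probability with a sigmoid (detection) or a softmax (five-category diagnosis). It assumes the EfficientNet-PyTorch package listed under Code availability; the input resolution, head design and training loop are simplified and this is not the released model.

```python
# Hedged sketch of an EfficientNet-B4 clip classifier with temporal average pooling.
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet  # https://github.com/lukemelas/EfficientNet-PyTorch


class ClipClassifier(nn.Module):
    """EfficientNet-B4 on each frame, temporal average pooling, then a linear head."""

    def __init__(self, num_outputs: int = 1):
        super().__init__()
        self.backbone = EfficientNet.from_pretrained("efficientnet-b4")  # ImageNet weights
        self.pool = nn.AdaptiveAvgPool2d(1)       # spatial pooling of each frame's feature map
        self.head = nn.Linear(1792, num_outputs)  # 1792 = EfficientNet-B4 feature width

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clip.shape                              # (batch, frames, 3, H, W)
        feats = self.backbone.extract_features(clip.reshape(b * t, c, h, w))
        feats = self.pool(feats).flatten(1).reshape(b, t, -1)   # (batch, frames, 1792)
        feats = feats.mean(dim=1)                               # temporal average pooling
        return self.head(feats)                                 # clip-level logits x_i


# Detection task: one logit per clip; p_i = sigmoid(x_i), trained with binary cross-entropy.
model = ClipClassifier(num_outputs=1)
logits = model(torch.randn(2, 4, 3, 380, 380))                  # input resolution illustrative
loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([[1.0], [0.0]]))
probs = torch.sigmoid(logits)
```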

Finding a needle in a haystack test
To estimate the performance of the AIS system in the general population with a rare-case prevalence of visual impairment, we simulated a gradient of prevalences ranging from 0.1% to 9% to conduct a finding a needle in a haystack test. For each simulated prevalence, we resampled 10,000 children based on the internal validation dataset in a bootstrap manner to test whether the AIS system could pick up the 'needle' (visually impaired children at the simulated prevalence) in the 'haystack' (10,000 resampled children) and repeated this process 100 times to estimate the 95% CIs.

Data augmentation
To ensure better model capacity and reliability in complex environments, data augmentation was performed during model training using brightness and contrast adjustments, together with blurring techniques. Specifically, the brightness of the input frames was randomly adjusted by a factor of 0.40, and the contrast was randomly adjusted by a factor of 0.20. Blurring techniques included Gaussian blur, median blur and motion blur. The factor of all blurring techniques was set to five. Each input frame had a probability of 0.50 to perform data augmentation (Supplementary Table 17). All data augmentation processes were based on a publicly available Python library known as Albumentations53.

Multicenter external validation
External validation refers to the assessment of the performance of the AI system using independent datasets, captured from different clinical settings. This is to ensure the generalizability of the system to different settings. Trained volunteers used mHealth apps installed on iPhone-7 or iPhone-8 smartphones to perform external validation in the ophthalmology clinics of the Second Affiliated Hospital of Fujian Medical University, Shenzhen Eye Hospital, Liuzhou Maternity and Child Healthcare Hospital and Beijing Children's Hospital of Capital Medical University. In this stage, the quality control module automatically reminded volunteers to repeat data collection when the videos were of low quality. In total, 305 children were recruited and qualified videos for 301 children (98.7%) were successfully collected. Qualified videos for 298 children undergoing ophthalmic examinations were reserved for final validation of the detection model (see Fig. 1b and Table 1 for details of the participants and the dataset used for external validation).

Implementation by untrained parents or caregivers at home
We further challenged our system in an application administered by untrained parents or caregivers with their smartphones in daily routines (Fig. 1b). Children (independent from the development and external validation participants) were recruited online, and their parents or caregivers autonomously used AIS at home according to the system's instructions to collect qualified videos and perform tests without pretraining or controlling any biases before testing, such as brands and models of smartphones and the home environment. This process generated data with huge variations of distributions that had an extremely high requirement of generalizability and extensibility for the DL-based system. Thus, before final implementation, we performed a pilot study to collect a dataset for fine-tuning our system to chaotic home environments. To efficiently evaluate the performance of AIS for identifying visual impairment in at-home settings, a sufficient proportion of visually impaired children with various ocular diseases were recruited. Of the 125 children recruited, 122 children (97.6%) successfully completed the detection tests and collected qualified videos, among whom 120 children undergoing ophthalmic examinations were enrolled to fine-tune and evaluate the detection model. We fine-tuned the detection model using qualified videos from 32 children collected first and then tested it by the subsequently collected validation set from another 88 children. See Fig. 1b and Table 1 for more information on the fine-tuning and implementation.

Reliability analyses and adjusted analyses
To test the stability and generalizability of AIS under various conditions, investigators conducted a batch of reliability analyses and adjusted analyses (Fig. 1b and Table 1).

Reliability across different smartphone platforms
We applied blur, brightness, color or Gaussian noise adjustments at different gradients to a dataset (n = 200 children and n = 200 qualified videos) randomly sampled from the ZOC validation set to simulate the characteristics of data collected by various cameras and evaluate the reliability of AIS. Furthermore, we collected another dataset in an independent population of children at ZOC to assess the reliability of the AIS system across different operating systems. In total, raw videos from 389 children undergoing ophthalmic examinations were collected by trained volunteers using two Android smartphones, Redmi Note 7 and Huawei Honor-6 Plus. After initial quality checking, qualified videos of 361 (92.8%) children were reserved for testing. The technical specifications of the smartphones used in this study are summarized in Supplementary Table 13.

Retest reliability analysis
We performed detection tests for each child twice by two volunteers at least 1 day apart on another independent population recruited at ZOC to evaluate the retest reliability. Raw videos from 213 children undergoing ophthalmic examinations were collected using iPhone-7 or iPhone-8 smartphones. Qualified videos of 187 (87.8%) children were reserved for retest analysis after initial quality checking (Fig. 1b and Table 1). An intraclass correlation coefficient was calculated for repeated predicted probabilities of the detection model, and a Cohen's κ was calculated for repeated predicted categories to evaluate retest reliability.

Hard-to-spot test
To investigate the influence of the apparency of the phenotypic features on the AIS system, a panel of 14 community ophthalmologists with 3–5 years of clinical experience identified 'likely impaired' children based on the phenotypic videos in the ZOC validation dataset. The true impaired and nonimpaired children were mixed at a ratio of 1:1 during identification. Each case was independently reviewed by three ophthalmologists. When no more than one ophthalmologist provided 'likely impaired' labels for one true impaired child, this child was classified as a hard-to-spot case with insidious phenotypic features rather than a relatively evident case. The performance of the AIS system for relatively evident/hard-to-spot cases was assessed.

Other reliability analyses
We tested AIS under different room illuminance conditions. Photometers (TESTES-1330A; TES Electrical Electronic Corp.) were used to measure the mean room illuminance intensity before and after data collection. The following criteria were applied to estimate the distances between the children and the smartphones to assess the reliability of AIS in different testing distance groups. When most of the vertical lengths of a child's head regions were less than one-third of the height of the smartphone screen at the frame level, the video was determined to be taken from a long distance. When most of the lengths were between one-third and one-half of the height of the screen, the video was judged to be taken from a medium distance, and when most of the lengths were larger than one-half of the height of the screen, the video was judged to be taken at a close distance. For each full-length video, subvideos with various durations were generated to serve as inputs to evaluate the influence of the duration of the video recording on the performance of AIS. We also evaluated the performances of AIS grouped by patient-related factors including sex, age and laterality of the eye disorder.

Adjusted analyses
To further verify that the predictions of this system were not solely mediated by sample characteristics as confounders, we performed adjusted analyses to examine the ORs of the predictions of the system adjusted for sample characteristics leveraging logistic regression models.
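
As an illustration of the adjusted analyses just described, the sketch below fits a logistic regression of the system's prediction on impairment status and sample characteristics and reports exponentiated coefficients as adjusted odds ratios. The column names and the simulated data are assumptions of this sketch and do not reproduce the study data or covariate set.

```python
# Hedged sketch of an adjusted analysis with a logistic regression model (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
impaired = rng.integers(0, 2, n)                    # ground-truth visual impairment (example)
age_months = rng.uniform(1, 48, n)                  # example covariate
sex_male = rng.integers(0, 2, n)                    # example covariate
logit = -1.0 + 2.5 * impaired + 0.01 * age_months   # made-up effect sizes
predicted = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

df = pd.DataFrame({"predicted_impaired": predicted, "impaired": impaired,
                   "age_months": age_months, "sex_male": sex_male})

X = sm.add_constant(df[["impaired", "age_months", "sex_male"]])
fit = sm.Logit(df["predicted_impaired"], X).fit(disp=0)
print(np.exp(fit.params))   # adjusted odds ratios for each covariate
```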

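The training-time augmentation described in the 'Data augmentation' subsection above could be expressed with Albumentations roughly as follows. The parameter mapping (brightness 0.40, contrast 0.20, blur limit 5, overall probability 0.50 per frame) follows the text, but the exact composition the authors used may differ, so treat this as a sketch rather than the study pipeline.

```python
# Hedged sketch of the augmentation pipeline with Albumentations (parameters from the text).
import albumentations as A

augment = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.4, contrast_limit=0.2, p=1.0),
    A.OneOf([
        A.GaussianBlur(blur_limit=5, p=1.0),
        A.MedianBlur(blur_limit=5, p=1.0),
        A.MotionBlur(blur_limit=5, p=1.0),
    ], p=1.0),
], p=0.5)  # each input frame is augmented with probability 0.50

# Usage: augmented = augment(image=frame)["image"] on an HxWx3 uint8 frame (e.g. from OpenCV).
```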

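For the 'finding a needle in a haystack' simulation described earlier in this section, one way to implement the prevalence-controlled resampling is sketched below: for each simulated prevalence, 10,000 children are drawn with replacement from the validation set at that prevalence, the detection metrics are recomputed, and the draw is repeated 100 times for percentile confidence intervals. The arrays `probs` and `labels` (per-child predicted probability and ground truth) are assumptions of this sketch.

```python
# Hedged sketch of the prevalence-controlled bootstrap ('needle in a haystack' test).
import numpy as np

rng = np.random.default_rng(0)


def simulate(probs, labels, prevalence, n=10_000, repeats=100, threshold=0.5):
    probs, labels = np.asarray(probs), np.asarray(labels)
    pos, neg = np.where(labels == 1)[0], np.where(labels == 0)[0]
    n_pos = max(1, round(n * prevalence))          # number of 'needles' at this prevalence
    sens, spec = [], []
    for _ in range(repeats):
        idx = np.concatenate([rng.choice(pos, n_pos, replace=True),
                              rng.choice(neg, n - n_pos, replace=True)])
        pred = probs[idx] > threshold
        truth = labels[idx] == 1
        sens.append((pred & truth).sum() / truth.sum())
        spec.append((~pred & ~truth).sum() / (~truth).sum())
    return {"sensitivity": (np.mean(sens), np.percentile(sens, [2.5, 97.5])),
            "specificity": (np.mean(spec), np.percentile(spec, [2.5, 97.5]))}
```
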
Detection model visualization and explanation
Two strategies were used to interpret and visualize the detection model: t-distributed stochastic neighbor embedding (t-SNE) and gradient-weighted class activation mapping (Grad-CAM)54–56. The former was used to visualize the high-dimensional activation status of the deep CNN at the clip level by projecting its feature vector into a two-dimensional space, and the latter was adopted to create a heat map showing the area within each frame of the clip that contributed most to the output class of the network. In practice, the feature vectors output by the temporal average pooling layer and flatten operation and the feature maps output by the last convolutional layer before the temporal average pooling operation were chosen to visualize the results generated by t-SNE and Grad-CAM, respectively. Specifically, 1,200 visually impaired clips and 1,200 nonimpaired clips were randomly selected from the ZOC validation set to perform t-SNE analysis. To generate average heat maps, we randomly sampled ten videos for each ophthalmic disorder from the internal validation dataset. Since each video had multiple clips, we ranked these clips according to the model predicted probabilities and selected the two clips with the highest probabilities. For each selected clip, we took 30 frames at equal intervals to generate the corresponding average heat map. In summary, we had a total of 600 heat maps for each type of disorder, and we summed and averaged these heat maps to obtain the typical heat map for a certain disease. A public machine learning Python library named Scikit-learn was used to generate two-dimensional coordinates of t-SNE results, and Grad-CAM analysis was performed based on an open-source GitHub code set57.
Additionally, we compared the model-predicted probabilities of three groups of clips (clips randomly sampled from videos of nonimpaired children, clips randomly sampled from videos of visually impaired children, and clips annotated by experts as having abnormal behavioral patterns from videos of visually impaired children) to investigate whether the detection model focused on specific behavioral patterns in children (Fig. 3d and Supplementary Table 8).

Triage-driven approach to select equivocal cases for manual review
We assessed a triage strategy to find a solution when the system was likely unreliable by choosing equivocal cases for manual review in the internal validation set. An equivocal case referred to a child predicted by the AIS system with a low confidence value, given by $|p - 0.50|$, where $p$ was the predicted probability for the child. Three ophthalmologists from ZOC with over 10 years of clinical experience vetted the phenotypic videos of the equivocal cases and the AIS predictions in a voting manner. Additional information, including baseline information and medical histories, was provided when necessary. An increasing ratio from 0 to 19% of equivocal cases with the lowest confidence values was chosen for manual review to evaluate this triage strategy.

Statistical analysis
The primary outcomes were the AUCs of the detection/diagnostic models. The secondary outcomes included the accuracy, sensitivity and specificity of the models and the reliability of the detection model under various settings. The 95% CIs of the AUC, accuracy, sensitivity and specificity of the models were estimated. Specifically, the DeLong CIs of AUCs were calculated at the child level. To eliminate bias due to the association of multiple clips for the same child, the bootstrap CIs of the AUCs of the detection model were calculated at the clip level. One clip for each child was randomly taken to form a bootstrap sample, and this process was repeated 1,000 times. Wilson CIs were reported for other proportional metrics. Descriptive statistics, including means, s.d., numbers and percentages, were used. Mann–Whitney U-tests were used to compare means on continuous variables, and Fisher exact tests were used to compare distributions on categorical variables. A two-sided P value of <0.05 indicates statistical significance. All statistical analyses were performed in R Statistics (v.4.1.2) or Python Programs (v.3.9.7), and plots were created with the ggplot2 package (v.3.3.5) in R Statistics.

Computational hardware
Hardware information for this study is shown as follows: graphics processing unit (GPU), Nvidia Titan RTX 24 GB memory × 4, Driver v.440.82, Cuda v.10.2; central processing unit (CPU), Intel(R) Xeon(R) CPU E5-2678 v.3 @ 2.50 GHz × 2, 48 threads; random access memory (RAM), Samsung 64 GB RAM × 8, configured speed 2,133 MHz.

Use of human data
The ethical review of this study was approved by the Institutional Review Board/Ethics Committee of ZOC. The test was prospectively registered at ClinicalTrials.gov (identifier: NCT04237350).

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The data that support the findings of this study are divided into two groups: published data and restricted data. The authors declare that the published data supporting the main results of this study can be obtained within the paper and its Supplementary Information. For research purposes, a representative video deidentified using digital masks on children's faces for each disorder or behavior in this study is available. In the case of noncommercial use, researchers can sign the license, complete a data access form provided at https://github.com/RYL-gif/Data-Availability-for-AIS and contact H.L. Submitted license and data access forms will be evaluated by the data manager. For requests from verified academic researchers, access will be granted within 1 month. Due to portrait rights and patient privacy restrictions, restricted data, including raw videos, are not provided to the public.

Code availability
Since we made use of proprietary libraries in our study, release of our codes for system development and validation to the public is therefore not feasible. We detail the methods and experimental protocol in this paper and its Supplementary Information to provide enough information to reproduce the experiment. Several major components of our work are available in open-source repositories: PyTorch (v.1.7.1): https://pytorch.org; Dlib Python Library (v.19.22.1): https://github.com/davisking/dlib (frameworks for facial region detection and facial key point localization); EfficientNet-PyTorch: https://github.com/lukemelas/EfficientNet-PyTorch (frameworks for models in the quality control module and the detection/diagnostic models); Albumentations (v.0.5.2): https://github.com/albumentations-team/albumentations (data augmentation); and OpenCV Python Library (v.4.5.3.56): https://github.com/opencv/opencv-python (video data and image data processing).
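
To illustrate the triage rule defined in the 'Triage-driven approach' subsection above, the snippet below ranks children by the confidence value |p − 0.50| and flags the least confident fraction for manual review. The function name, review fraction and example values are placeholders, not the authors' implementation.

```python
# Hedged sketch of the confidence-based triage selection described above.
import numpy as np


def select_for_review(probs, review_fraction):
    """Return indices of the most equivocal children (lowest |p - 0.50|)."""
    probs = np.asarray(probs)
    confidence = np.abs(probs - 0.50)
    n_review = int(round(review_fraction * len(probs)))
    return np.argsort(confidence)[:n_review]


# Example: flag the 20% least confident predictions for manual review.
equivocal_idx = select_for_review([0.93, 0.52, 0.11, 0.47, 0.88], review_fraction=0.2)
```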

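The clip-level bootstrap used for the AUC confidence intervals in the 'Statistical analysis' subsection above can be sketched as follows, assuming a mapping from each child to that child's clip probabilities. This is illustrative only and not the analysis code used in the study.

```python
# Hedged sketch of the clip-level bootstrap CI for the AUC (one clip per child, 1,000 draws).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)


def clip_bootstrap_auc_ci(clip_probs, child_labels, n_boot=1000):
    """clip_probs: {child_id: [clip probabilities]}; child_labels: {child_id: 0 or 1}."""
    children = list(clip_probs)
    aucs = []
    for _ in range(n_boot):
        y_score = [rng.choice(clip_probs[c]) for c in children]  # one clip per child
        y_true = [child_labels[c] for c in children]
        aucs.append(roc_auc_score(y_true, y_score))
    return np.percentile(aucs, [2.5, 97.5])


# Example (toy data):
# ci = clip_bootstrap_auc_ci({"a": [0.7, 0.8], "b": [0.2, 0.4]}, {"a": 1, "b": 0})
```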

References
44. Drover, J. R., Wyatt, L. M., Stager, D. R. & Birch, E. E. The Teller acuity cards are effective in detecting amblyopia. Optom. Vis. Sci. 86, 755 (2009).
45. Mayer, D. L. et al. Monocular acuity norms for the Teller Acuity Cards between ages one month and four years. Investigative Ophthalmol. Vis. Sci. 36, 671–685 (1995).
46. King, D. E. Max-margin object detection. Preprint at https://ui.adsabs.harvard.edu/abs/2015arXiv150200046K (2015).
47. Zhou, E., Fan, H., Cao, Z., Jiang, Y. & Yin, Q. Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In 2013 IEEE International Conference on Computer Vision Workshops 386–391 (IEEE, 2013).
48. Kazemi, V. & Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In 2014 IEEE Conference on Computer Vision and Pattern Recognition 1867–1874 (IEEE, 2014).
49. Bradski, G. The OpenCV library. Dr. Dobb's J. Softw. Tools 25, 120–123 (2000).
50. Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In Proc. 36th International Conference on Machine Learning 6105–6114 (PMLR, 2019).
51. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
52. Chao, H., He, Y., Zhang, J. & Feng, J. GaitSet: regarding gait as a set for cross-view gait recognition. In Proc. Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence Article 996 (AAAI Press, 2019).
53. Buslaev, A. et al. Albumentations: fast and flexible image augmentations. Information 11, 125 (2020).
54. Hinton, G. E. & Roweis, S. Stochastic neighbor embedding. In Advances in Neural Information Processing Systems 15 (Eds. Becker, S., Thrun, S. and Obermayer, K.) 833–840 (NIPS, 2002).
55. Belkina, A. et al. Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nat. Commun. 10, 5415 (2019).
56. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) 618–626 (IEEE, 2017).
57. Zuppichini, F. S. FrancescoSaverioZuppichini/cnn-visualisations. GitHub https://github.com/FrancescoSaverioZuppichini/cnn-visualisations (2018).

Acknowledgements
We thank all the participants and the institutions for supporting this study. We thank H. Sun, T. Wang, T. Li, W. Lai, X. Wang, L. Liu, T. Cui, S. Zhang, Y. Gong, W. Hu, Y. Huang, Y. Pan and C. Lin for supporting the data collection; M. Yang for the help with statistical suggestions and Y. Mu for the help with our demo video. This study was funded by the National Natural Science Foundation of China (grant nos. 82171035 and 91846109 to H.L.), the Science and Technology Planning Projects of Guangdong Province (grant no. 2021B1111610006 to H.L.), the Key-Area Research and Development of Guangdong Province (grant no. 2020B1111190001 to H.L.), the Guangzhou Basic and Applied Basic Research Project (grant no. 2022020328 to H.L.), the China Postdoctoral Science Foundation (grant no. 2022M713589 to W.C.), the Fundamental Research Funds of the State Key Laboratory of Ophthalmology (grant no. 2022QN10 to W.C.) and Hainan Province Clinical Medical Center (H.L.). P.Y.-W.-M. is supported by an Advanced Fellowship Award (NIHR301696) from the UK National Institute of Health Research (NIHR). P.Y.-W.-M. also receives funding from Fight for Sight (UK), the Isaac Newton Trust (UK), Moorfields Eye Charity (GR001376), the Addenbrooke's Charitable Trust, the National Eye Research Centre (UK), the International Foundation for Optic Nerve Disease, the NIHR as part of the Rare Diseases Translational Research Collaboration, the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014) and the NIHR Biomedical Research Centre based at Moorfields Eye Hospital National Health Service Foundation Trust and University College London Institute of Ophthalmology. The views expressed are those of the author(s) and not necessarily those of the National Health Service, the NIHR or the Department of Health. The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author contributions
W.C., R.L. and H.L. contributed to the concept of the study and designed the research. W.C., R.L., A.X., Ruixin Wang, Yahan Yang, D. Lin, X.W., J.C., Z. Liu, Y.W., K.Q., Z.Z., D. Liu, Q.W., Y.X., X.L., Zhuoling Lin, D.Z., Y.H., S.M., X.H., S.S., J.H., J.Z., M.W., S.H., L.C., B.D., H.Y., D.H., X.L., L.L., Xiaoyan Ding, Yangfan Yang and P.W. collected the data. W.C., R.L., Q.Y., Y.F., Zhenzhe Lin, K.D., Z.W., M.L. and Xiaowei Ding conducted the study. W.C., R.L. and L.Z. analyzed the data. W.C., R.L., Q.Y., Y.F. and H.L. cowrote the manuscript. D. Lin, X.W., F.Z., N.S., J.-P.O.L., C.Y.C., E.L., C.C., Y.Z., P.Y.-W.-M., Ruixuan Wang and W.-s.Z. critically revised the manuscript. Zhenzhe Lin, Ruixuan Wang, W.-s.Z., Xiaowei Ding and H.L. performed the technical review. All authors discussed the results and provided comments regarding the manuscript.

Competing interests
Zhongshan Ophthalmic Center and VoxelCloud have filed for patent protection for W.C., R.L., A.X., Y.F., Zhenzhe Lin, K.D., K.Q., Xiaowei Ding and H.L. for work related to the methods of detection of visual impairment in young children. All other authors declare no competing interests.

Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41591-022-02180-9.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41591-022-02180-9.

Correspondence and requests for materials should be addressed to Xiaowei Ding or Haotian Lin.

Peer review information Nature Medicine thanks Pete Jones, Ameenat Lola Solebo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Michael Basson, in collaboration with the Nature Medicine team.

Reprints and permissions information is available at www.nature.com/reprints.


Extended Data Fig. 1 | The app for data collection. a, The operation interface of the app. b, Using the smartphone for data collection in real-world settings.


Extended Data Fig. 2 | The standard preparation sequence guided by the app for data collection.



Extended Data Fig. 3 | Development of deep learning models of the AIS system. a, Basic building blocks and architecture of EfficientNet. Two model architectures, EfficientNet-B2 and EfficientNet-B4, were used in data quality control and detection/diagnostic tasks, respectively. b, Architecture of the EfficientNet-B2 model. c, Architecture of the EfficientNet-B4 model. d, ROC curves of the models trained for the quality control module. e, The training and tuning curves of the detection model at the clip level. Conv 2d, 2-dimensional convolutional layer; ReLU, rectified linear unit; Temporal Avg Pooling, average pooling along the temporal dimension; ROC curve, receiver operating characteristic curve; AIS, Apollo Infant Sight.


Extended Data Fig. 4 | Performance of the detection model at the clip level. a, ROC curves of the detection model in the internal validation (NI, n = 6,735; mild, n = 8,310; severe, n = 6,685; VI versus NI, AUC = 0.925 (0.914–0.936); mild versus NI, AUC = 0.916 (0.904–0.928); severe versus NI, AUC = 0.935 (0.924–0.946)). b, ROC curves of the detection model in the external validation (NI, n = 7,392; mild, n = 2,580; severe, n = 1,569; VI versus NI, AUC = 0.814 (0.790–0.838); mild versus NI, AUC = 0.802 (0.770–0.831); severe versus NI, AUC = 0.834 (0.807–0.863)). c, ROC curves of the detection model in the at-home implementation by parents or caregivers (NI, n = 947; mild, n = 943; severe, n = 809; VI versus NI, AUC = 0.817 (0.756–0.881); mild versus NI, AUC = 0.809 (0.735–0.884); severe versus NI, AUC = 0.825 (0.764–0.886)). Parentheses show 95% bootstrap CIs. A cluster-bootstrap biased-corrected 95% CI was computed, with individual children as the bootstrap sampling clusters. NI, nonimpairment; VI, visual impairment; ROC curve, receiver operating characteristic curve; AUC, area under the curve; CI, confidence interval.


Extended Data Fig. 5 | Visualization of the clips correctly classified or misclassified by the detection model. a, The t-distributed stochastic neighbor embedding (t-SNE) algorithm was applied to visualize the clustering patterns of clips correctly classified or misclassified by the detection model. b, Distances from true VI and false clips to the center of true VI clips in the t-SNE scatter plot were compared. *P < 0.001 (true VI clip, n = 999; false clip, n = 317; P < 1.00 × 10−36, two-tailed Mann-Whitney U test). c, Distances from true NI and false clips to the center of true NI clips in the t-SNE scatter plot were compared. *P < 0.001 (true NI clip, n = 1,084; false clip, n = 317; P < 1.00 × 10−36, two-tailed Mann-Whitney U test). The thick central lines denote the medians, the lower and upper box limits denote the first and third quartiles, and the whiskers extend from the box to the outermost extreme value but no further than 1.5 times the interquartile range (IQR). VI, visual impairment; NI, nonimpairment.


Extended Data Fig. 6 | The triage-driven approach to select the equivocal cases with the lowest predicted confidence values for manual review. a, The false predicted rate (both false positive and false negative) in different percentile intervals of predicted confidence values. *P < 0.001 (0th–9th, n = 51; 10th–20th, n = 61; 20th–30th, n = 59; 30th–40th, n = 57; 40th–50th, n = 57; 50th–60th, n = 56; 60th–70th, n = 57; 70th–80th, n = 57; 80th–90th, n = 57; 90th–100th, n = 57; 0th–9th percentile versus other percentile intervals, P ranging from 7.92 × 10−8 for 90th–100th to 1.45 × 10−3 for 20th–30th; 10th–20th percentile versus other percentile intervals, P ranging from 2.02 × 10−6 for 90th–100th to 2.02 × 10−2 for 20th–30th; two-tailed Fisher's exact tests). Results are expressed as means and the 95% Wilson confidence intervals (CIs). b, The performance of the triage-driven system with increasing manual review ratios for the equivocal cases. SPE, specificity; SEN, sensitivity; ACC, accuracy.



Extended Data Fig. 7 | Performance of the detection model under blurring, brightness, color, and noise adjustment gradients. a, Cartoon diagram showing adjusting effect on the input data by blurring factors. b, Cartoon diagram showing adjusting effect on the input data by brightness factors. c, Cartoon diagram showing adjusting effect on the input data by color factors. d, Cartoon diagram showing adjusting effect on the input data by noise factors. e, ROC curves of the detection model for identifying visual impairment change by blurring factors (AUCs range from 0.683 for factor 37 to 0.951 for factor 0). f, ROC curves of the detection model for identifying visual impairment change by brightness factors (AUCs range from 0.551 for factor 0.9 to 0.951 for factor 0). g, ROC curves of the detection model for identifying visual impairment change by color factors (AUCs range from 0.930 for factor 70 to 0.952 for factor 20). h, ROC curves of the detection model for identifying visual impairment change by noise factors (AUCs range from 0.820 for factor 1800 to 0.951 for factor 0). NI, n = 60; VI, n = 140; ROC curve, receiver operating characteristic curve; VI, visual impairment; NI, nonimpairment.


Extended Data Fig. 8 | Performance of the AIS system using Huawei Honor-6 Plus/Redmi Note-7 smartphones. a, Comparisons of the predicted probabilities for the AIS system between the nonimpairment, mild impairment, and severe impairment groups. *P < 0.001 (NI versus mild, P = 8.10 × 10−28; NI versus severe, P = 1.51 × 10−27; two-tailed Mann-Whitney U tests). The cross symbols denote the means, the thick central lines and triangle symbols denote the medians, the lower and upper box limits denote the first and third quartiles, and the whiskers extend from the box to the outermost extreme value but no further than 1.5 times the interquartile range (IQR). b, ROC curves of the AIS system with Android smartphones. c, Performance of the AIS system in the across-smartphone analysis. VI, visual impairment; NI, nonimpairment; ROC curve, receiver operating characteristic curve; AIS, Apollo Infant Sight.

