Application of Machine Learning in Predicting Non Alcoholic Fatty Liver Disease Using Anthropometric and Body Composition Indices
Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease, which can
progress from simple steatosis to advanced cirrhosis and hepatocellular carcinoma. Clinical diagnosis
of NAFLD is crucial in the early stages of the disease. The main aim of this study was to apply machine
learning (ML) methods to identify significant classifiers of NAFLD using body composition and
anthropometric variables. A cross-sectional study was carried out among 513 individuals aged 13 years
old or above in Iran. Anthropometric measurements were performed manually, and body composition was
measured using an InBody 270 body composition analyzer. Hepatic steatosis and fibrosis were determined using
a Fibroscan. ML methods including k-Nearest Neighbor (kNN), Support Vector Machine (SVM),
Radial Basis Function (RBF) SVM, Gaussian Process (GP), Random Forest (RF), Neural Network (NN),
AdaBoost and Naïve Bayes were examined for model performance and to identify anthropometric and
body composition predictors of fatty liver disease. RF generated the most accurate model for fatty
liver (presence of any stage), steatosis stages and fibrosis stages, with 82%, 52% and 57% accuracy,
respectively. Abdomen circumference, waist circumference, chest circumference, trunk fat and body
mass index were among the most important variables contributing to fatty liver disease. ML-based
prediction of NAFLD using anthropometric and body composition data can assist clinicians in decision
making. ML-based systems provide opportunities for NAFLD screening and early diagnosis, especially
at the population level and in remote areas.
Abbreviations
NAFLD Non-alcoholic fatty liver disease
BMI Body mass index
LSM Liver stiffness measurement
ROCc Receiver operating characteristic curve
PCA Principal component analysis
CAP Controlled attenuation parameter
CI Confidence interval
ANOVA Analysis of variance
DCL Dorso-cervical lipohypertrophy
1 Department of Nutrition, Faculty of Medicine, Hormozgan University of Medical Sciences, Shahid Chamran Boulevard, Bandar Abbas, Iran.
2 Institute for Physical Activity and Nutrition (IPAN), Deakin University, Geelong Victoria, Australia.
3 Department of Nutrition, School of Nutrition Sciences and Food Technology, Kermanshah University of Medical Sciences, Kermanshah, Iran.
4 Institute for Intelligent Systems Research and Innovation (IISRI), Geelong Waurn Ponds Victoria, Australia.
5 Biomedical Machine Learning Lab, University of New South Wales, Sydney, Australia.
6 Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada.
7 Department of Biomedical Engineering, Faculty of Engineering, Imam Reza International University, Mashhad, Iran.
8 Metabolic Syndrome Research Center, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
9 Department of Electronic Learning, Shiraz University, Shiraz, Iran.
10 These authors contributed equally: Farkhondeh Razmpour and Reza Daryabeygi-Khotbehsara.
* email: frazmpoor@gmail.com; reza.d@deakin.edu.au
www.nature.com/scientificreports/
Non-alcoholic fatty liver disease (NAFLD), the hepatic manifestation of metabolic syndrome, is the most
common chronic liver disease [1,2]. The worldwide prevalence of metabolic syndrome and NAFLD has increased in
parallel with increased obesity prevalence [3–5], and is about 20–30% in developed countries and one-third
among American adults [6–8].
Obesity is a common metabolic risk factor associated with NAFLD [9–11]. The prevalence of NAFLD is directly
related to increased body mass index (BMI) and central obesity [12–14]. Most studies have shown that visceral fat
contributes to hepatic steatosis independently of BMI [15,16]. The amount of adipose tissue
and its distribution differ between men and women [17]. Women have higher overall fat tissue with relatively more
subcutaneous adipose tissue in the hips and thighs, whereas men accumulate visceral and subcutaneous
fat mainly in the trunk and abdomen, with continuous changes before and after puberty [17–19]. Increased fat
distribution around the waist (i.e. an apple-shaped body) is linked to NAFLD in both genders [20]. In a pear-shaped
body, subcutaneous fat accumulates mainly in the thighs and buttocks [21,22]. This pattern is typical among females
but can increase metabolic syndrome in males, which is a risk factor for NAFLD independent of central obesity [23].
In support of the role of fat distribution and anthropometric measures in NAFLD, studies have found several
contributing factors, including abdomen, waist and neck circumferences and fat accrual in the trunk and arms [24–29].
Most people with NAFLD, whether children or adults, do not have distinguishing symptoms at the early
stages of the disease [30]. After the development of cirrhosis, signs such as caput medusae,
spider angioma, palmar erythema, ascites, and jaundice appear [31]. Therefore, early diagnosis is critical to prevent
severe complications.
Ultrasonography and laboratory tests are typical diagnostic methods for detecting fatty liver disease.
Ultrasound has relatively high accuracy in detecting moderate-to-severe steatosis and lower
accuracy in earlier stages of NAFLD [32]. Notably, hepatic fibrosis cannot be diagnosed by ultrasonography [14,33].
Although typically used to detect fatty liver disease, laboratory tests are not useful for all age and gender groups
due to low accuracy [34,35]. Therefore, a precise, cost-effective, and non-invasive method to analyze signs of
the various stages of fatty liver for NAFLD diagnosis is desirable. Such an approach could support
early diagnosis of NAFLD, which could help prevent progression of hepatic steatosis to fibrosis, advanced cirrhosis,
and hepatocellular carcinoma.
In recent years, machine learning (ML) models have been used as a novel approach to predicting NAFLD [36–39].
However, these studies have focused mainly on laboratory outcomes and have not considered body
composition and anthropometric factors. Therefore, the primary aim of this study is to identify the best-performing ML
classifiers of NAFLD using body composition and anthropometric indices. The secondary aim is to identify
feature contributions to the prediction of NAFLD.
Data collection. At each medical clinic, eligibility, a demographics questionnaire, and anthropometric and
body composition measurements were assessed by two trained nutritionists. Medical examination and
disease diagnosis were performed by a general physician and an internal specialist, respectively. Demographic
information, including sex, age, education, disease history and medications, was assessed by a researcher using a
questionnaire. Weight was measured using a digital weighing scale (Seca 704; Hamburg, Germany), and height was
measured using a wall height chart. Body composition was assessed using an InBody 270 body analyzer (Inbody
Co. Ltd, South Korea) to measure percent (%) body fat, total fat mass, muscle mass, and fat
mass in the right/left leg, right/left arm and trunk, with participants in light clothing and without shoes. The
circumferences of the neck, chest, arm, wrist, waist, hips, abdomen and thighs, and the lengths of the ulna and leg,
were measured using a flexible tape measure with an accuracy of 0.1 cm. BMI was calculated by dividing weight
(kilograms) by height in meters squared [26]. Subcutaneous fat in the area below the scapula, over the biceps and
triceps, and at the upper iliac crest was measured using a Saehan calliper (Saehan SH5020, Korea). Participants were
also examined for acanthosis at the back of the neck and in the armpits, and for the presence of subcutaneous fat
under the chin and at the back of the neck. A Fibroscan equipped with the M and XL probes (Echosens 504, Paris,
France) was used to assess controlled attenuation parameter (CAP) (dB/m) and liver stiffness measurement (LSM) (kPa)
values simultaneously. A reliable LSM was defined as the median liver stiffness of 10 measurements, with a success
rate greater than 60% and an interquartile range (IQR) below 30% of the median LSM value [40]. CAP values range from
100 to 400 dB/m, and the following cut-off values were used for the diagnosis of steatosis stages: Stage 0,
< 238 dB/m; Stage 1, ≥ 238 to < 260 dB/m; Stage 2, ≥ 260 to < 292 dB/m; and Stage 3, ≥ 292 dB/m [41]. LSM values
range from 1.5 to 75 kPa, and the following cut-off values were used for the diagnosis of liver fibrosis stages:
no significant fibrosis (F0), < 6.2 kPa; mild fibrosis (F1), ≥ 6.2 to < 7.6 kPa; moderate fibrosis (F2), ≥ 7.6 to
< 8.8 kPa; severe fibrosis (F3), ≥ 8.8 to < 11.8 kPa; and cirrhosis (F4), ≥ 11.8 kPa [42].
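The BMI formula and the CAP and LSM cut-offs described above can be expressed as simple threshold functions. The following Python sketch is illustrative only and is not the authors' code: the function names are hypothetical, and the upper bound of each band is read as exclusive, an assumption based on the next stage starting at "≥" that value.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def steatosis_stage(cap_db_m: float) -> int:
    """Map a CAP reading (dB/m) to steatosis stage 0-3.

    Cut-offs are those quoted in the text; upper bounds are treated
    as exclusive (assumption).
    """
    if cap_db_m < 238:
        return 0
    if cap_db_m < 260:
        return 1
    if cap_db_m < 292:
        return 2
    return 3


def fibrosis_stage(lsm_kpa: float) -> str:
    """Map an LSM reading (kPa) to fibrosis stage F0-F4.

    Cut-offs are those quoted in the text; upper bounds are treated
    as exclusive (assumption).
    """
    if lsm_kpa < 6.2:
        return "F0"
    if lsm_kpa < 7.6:
        return "F1"
    if lsm_kpa < 8.8:
        return "F2"
    if lsm_kpa < 11.8:
        return "F3"
    return "F4"


def has_fatty_liver(cap_db_m: float) -> bool:
    """Binary label used for the first outcome: any steatosis stage vs. none."""
    return steatosis_stage(cap_db_m) > 0
```

For example, a participant with a CAP of 275 dB/m and an LSM of 7.0 kPa would be labelled steatosis stage 2 and fibrosis stage F1 under these assumptions.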
Statistical analysis. Descriptive and non-predictive data analysis was performed using SPSS version 21
software (SPSS, Inc., Chicago, IL). Data were expressed as mean ± standard deviation or frequencies. Between-
group comparisons were performed using an independent sample t test and analysis of variance (ANOVA),
followed by Tukey’s post hoc test. A P-value of less than 0.05 was considered statistically significant.
Machine learning models. Three label variables were considered: fatty liver (stage I, II and III vs. no
steatosis), steatosis stages, and fibrosis stages. Eight ML techniques were applied to the dataset to identify the best
modelling approach: k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Radial Basis
Function (RBF) SVM, Gaussian Process (GP), Random Forest (RF), Neural Network (NN), AdaBoost and Naïve
Bayes. A detailed explanation of these classifiers can be found elsewhere [43]. These models
were tested using the Scikit-learn library in the Python programming language [44].
To comprehensively compare the classifiers, we trained and evaluated each of them on the dataset 50 times. This is
because different classifiers sometimes predict slightly different outputs, and the initial points of a given
classifier differ in each run. A reliable output can therefore be estimated by averaging each classifier's results
over several runs. Model accuracy and area under the curve (AUC) are reported for each ML technique. Importance
values are reported for individual feature variables.
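The repeated-run comparison described above can be sketched with Scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the study data are not public, so a synthetic binary dataset stands in for them; the 70/30 stratified split, the hyperparameters, the helper name compare_classifiers, and the use of only three of the eight classifiers with 10 runs instead of 50 are all choices made here to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def compare_classifiers(X, y, classifiers, n_runs=50, test_size=0.3):
    """Average accuracy and AUC of each classifier over repeated random splits."""
    summary = {}
    for name, make_model in classifiers.items():
        accuracies, aucs = [], []
        for run in range(n_runs):
            # A different seed per run changes both the split and the model's
            # random initialisation (assumed source of run-to-run variation).
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=test_size, stratify=y, random_state=run
            )
            model = make_model(run).fit(X_tr, y_tr)
            accuracies.append(accuracy_score(y_te, model.predict(X_te)))
            aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
        summary[name] = {
            "accuracy": float(np.mean(accuracies)),
            "auc": float(np.mean(aucs)),
        }
    return summary


# Synthetic stand-in for the anthropometric dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Factories take the run index so that stochastic models are re-seeded each run.
classifiers = {
    "kNN": lambda seed: KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": lambda seed: GaussianNB(),
    "Random Forest": lambda seed: RandomForestClassifier(
        n_estimators=100, random_state=seed
    ),
}

results = compare_classifiers(X, y, classifiers, n_runs=10)
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, AUC={scores['auc']:.2f}")
```

Collecting the per-run scores instead of only their means would give the data needed for box plots of the kind shown in Figures 1–3.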
Pre-processing involved data normalization and segmentation. The few missing values in the numerical results
of the experiments were replaced using linear interpolation [45]. Principal component analysis (PCA)
was used to extract features from the data [46,47]. Data were divided into two parts, a training set and a test set.
Processing involved feature selection and classification with the best features. A variety of models were evaluated,
and the model with the highest performance was selected.
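Two of the pre-processing steps named above, filling missing values by linear interpolation and normalizing a variable, can be illustrated in plain Python. This is a sketch under assumptions: the text does not say along which ordering values were interpolated or which normalization was used, so interpolation over record order and min-max scaling are assumed here, and the function names are hypothetical.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def interpolate_missing(values):
    """Fill missing entries by linear interpolation between known neighbours.

    Gaps at the start or end are filled with the nearest known value
    (assumption; the text does not describe edge handling).
    """
    known = [(i, v) for i, v in enumerate(values) if not _is_missing(v)]
    if not known:
        raise ValueError("at least one known value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if not _is_missing(v):
            continue
        left = [(j, w) for j, w in known if j < i]
        right = [(j, w) for j, w in known if j > i]
        if left and right:
            (j0, v0), (j1, v1) = left[-1], right[0]
            filled[i] = v0 + (v1 - v0) * (i - j0) / (j1 - j0)
        else:
            filled[i] = (left[-1] if left else right[0])[1]
    return filled


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical waist circumferences (cm) with two missing readings.
waist = [80.0, None, 90.0, float("nan"), 100.0]
waist_filled = interpolate_missing(waist)
waist_scaled = min_max_normalize(waist_filled)
```

In a train/test workflow, the normalization range would normally be computed on the training set only and then applied to the test set, to avoid leaking test information into the model.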
Patient consent. All patients provided written consent for participation in this study. For participants aged
under 18 years, written informed consent was obtained from their guardians.
Ethics approval. Ethical approval was received from the research ethics committee at Mashhad University
of Medical Sciences (Code: IR.MUMS.fm.REC.1395.64).
Results
In total, 513 participants (240 males and 273 females) took part in the study, of whom 169 (74.1%) male and 220
(80.6%) female cases had a degree of hepatic steatosis. The mean age, weight, and BMI were 37.04 ± 15.44 years,
77.26 ± 17.31 kg, and 28.15 ± 4.89 kg m−2, respectively. Overall demographic characteristics and biochemical
measures are presented in Table 1. Significant differences were found in most anthropometric variables between
male and female participants (see Tables 2 and 3).
Machine learning results. Figures 1–3 present box plots for each classification method applied to the three
outcomes. The Random Forest (RF) method generated the most accurate ML model for fatty liver (presence of any
stage), steatosis stage and fibrosis stage. The average accuracy and AUC values obtained with RF were 0.82 and 0.84
for fatty liver, 0.52 and 0.69 for steatosis stages, and 0.57 and 0.58 for fibrosis stages, respectively. Average
accuracy and AUC are presented in the Supplemental file (Model Iterations) for all conditions. Moreover, sensitivity,
specificity, true positive and true negative measures are presented for fatty liver disease.
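For the binary fatty liver outcome, accuracy, sensitivity and specificity follow directly from the counts of true and false positives and negatives. The sketch below shows the standard definitions; the function name and the example labels are hypothetical and are not taken from the study data.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for labels coded 0/1.

    1 = fatty liver present, 0 = absent (coding assumed for illustration).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "tp": tp,
        "tn": tn,
        "fp": fp,
        "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        # Sensitivity: share of true cases that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of non-cases that the model correctly rules out.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels for eight participants.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Because most participants in this sample had some degree of steatosis, accuracy alone can look favourable even for a model that over-predicts the positive class, which is why sensitivity and specificity are worth reading alongside it.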
The feature variables with the highest predictive value for fatty liver were abdomen circumference (average
importance value, IV = 0.061), waist circumference (IV = 0.061), chest circumference (IV = 0.054), trunk fat
(IV = 0.056) and BMI (IV = 0.053); for steatosis stage, they were abdomen circumference (IV = 0.053), waist
circumference (IV = 0.052), chest circumference (IV = 0.052), trunk fat (IV = 0.051) and BMI (IV = 0.050); and
for fibrosis, they were abdomen circumference (IV = 0.049), waist circumference (IV = 0.049), chest circumference
(IV = 0.043), BMI (IV = 0.045) and weight (IV = 0.045). See Figs. 4, 5, 6 and Tables 4, 5, 6.
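Average importance values of this kind can be obtained by fitting a random forest several times and averaging its per-feature importances. The sketch below assumes Scikit-learn's impurity-based feature_importances_ attribute as the importance measure, which the text does not confirm; the data are synthetic, and the feature names and the helper mean_importance are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def mean_importance(X, y, feature_names, n_runs=10):
    """Rank features by random forest importance averaged over several fits."""
    importances = np.zeros((n_runs, X.shape[1]))
    for run in range(n_runs):
        forest = RandomForestClassifier(n_estimators=100, random_state=run)
        forest.fit(X, y)
        # Impurity-based importances; they sum to 1 across features.
        importances[run] = forest.feature_importances_
    mean_iv = importances.mean(axis=0)
    order = np.argsort(mean_iv)[::-1]  # highest importance first
    return [(feature_names[i], float(mean_iv[i])) for i in order]


# Synthetic stand-in data with placeholder feature names (assumption).
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

ranking = mean_importance(X, y, feature_names)
for name, iv in ranking:
    print(f"{name}: IV = {iv:.3f}")
```

Since the importances are normalized to sum to 1, values shrink as the number of features grows, so closely spaced values such as 0.061 and 0.053 are best read as a ranking rather than as large differences in effect.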
Further assessment identified gender-specific features (see Supplemental Figs. 1–6; Supplemental Tables 1–6).
Among females, the most important predictors of fatty liver disease were waist circumference (IV = 0.057), abdomen
circumference (IV = 0.056), trunk fat (IV = 0.055), fat mass (IV = 0.052), chest circumference (IV = 0.048), and
BMI (IV = 0.048). Among males, waist circumference (IV = 0.053), chest
circumference (IV = 0.052), trunk fat (IV = 0.051), BMI (IV = 0.052), abdomen circumference (IV = 0.049) and
fat mass (IV = 0.048) had the highest predictive value for fatty liver. Among females, the most important predictors
of steatosis were abdomen circumference (IV = 0.048), waist circumference (IV = 0.047), weight (IV = 0.046),
trunk fat (IV = 0.045), fat mass (IV = 0.044), and BMI (IV = 0.043). Among
males, waist circumference (IV = 0.051), chest circumference (IV = 0.050), abdomen circumference (IV = 0.049),
trunk fat (IV = 0.048), BMI (IV = 0.048), and fat mass (IV = 0.046) had the highest predictive value for steatosis.
Among females, the most important predictors of fibrosis were abdomen circumference (IV = 0.048), waist
circumference (IV = 0.047), BMI (IV = 0.046), trunk fat (IV = 0.045), chest circumference (IV = 0.043), and
muscle mass (IV = 0.043). Among males, abdomen circumference (IV = 0.045),
waist circumference (IV = 0.043), weight (IV = 0.043), BMI (IV = 0.043), right arm fat (IV = 0.042) and fat mass
(IV = 0.042) had the highest predictive value for fibrosis.
Discussion
This study applied ML techniques to determine the optimal body composition and anthropometric classifier
of NAFLD and to identify feature contributions to the prediction of the disease. RF generated the most accurate
ML model for predicting fatty liver presence, steatosis stages and fibrosis. To our knowledge, this is the first
study applying ML to body composition and anthropometric data to predict NAFLD. The high accuracy (82%)
highlights the potential of ML techniques for the primary prevention and screening of NAFLD using
anthropometric measurements.
Previous studies using ML techniques to predict fatty liver disease have mainly focused on biochemical
measurements, with similar levels of accuracy: 83.0% using a Bayesian Network [38], 76.3% using Logistic Regression [37],
86.4% using RF [39] and 80% using Classification Tree techniques [36]. However, we tested the predictive value of body
composition and anthropometric measurements rather than biochemical variables. Anthropometry, as a lower-cost
and more feasible approach, can be considered a primary screening method for fatty liver disease.
Abdominal obesity is a significant risk factor leading to NAFLD [27]. Waist circumference and trunk fat have
been shown to significantly predict the risk of NAFLD [24]. Although BMI is one of the risk factors for
NAFLD [31], it has been argued that BMI is limited compared to other anthropometric measures (e.g., waist
circumference) in identifying lean NAFLD individuals [25]. In a similar vein, the findings of the present study
show the importance of these body composition and anthropometric measures and their relative contribution
to the prediction of NAFLD.
Neck circumference reflects the amount of subcutaneous fat in the upper body, and is a reliable factor in
determining central obesity [48]. A positive correlation has been shown between neck circumference and hepatic
steatosis [26,28]. Neck circumference showed a positive association with other anthropometric components, such
as BMI, waist circumference and waist-to-hip ratio. In the present study, neck circumference contributed almost
equally to hepatic steatosis and fibrosis.
A study by Subramanian revealed that the level of arm fat index in both males and females had a negative
association with the degree and severity of NAFLD [29]. In our study, a strong and positive relationship between
arm circumference and the severity of steatosis and fibrosis was detected, validated by the ML model. Rafiee et al.
showed that the amount of fat in the hips and legs and the hip circumference were negatively associated with fatty liver
and the severity of the disease. In contrast, the waist-to-hip ratio was closely associated with fatty liver. They also
showed that the accuracy of this ratio in predicting NAFLD was greater than that of BMI and waist-to-height ratio [49].
Most ML studies for the prediction of NAFLD have used ultrasonography to diagnose fatty liver
disease [36–39]. Ultrasound is a commonly used method for the diagnosis of hepatic steatosis [50]. Ultrasonography
is a safe, well-tolerated, non-invasive and low-cost technique [50]; however, there are limitations associated with
ultrasound use, including limited capability in detecting fatty infiltration (less than 20% steatosis), operator
dependency and subjective assessment [51,52], and ML is expected to minimise some of these. Application of ML
techniques to body composition and anthropometric measures, as a less time-consuming and easy-to-undertake
method, can help physicians in their clinical decision making.
Hepatic steatosis
Variables | Gender | Grade 0 | Grade I | Grade II | Grade III | P-value
Number | F | 53 | 39 | 55 | 129 | –
Number | M | 81 | 38 | 48 | 77 | –
Age (years) | F | 31.13 ± 15.79# | 34.64 ± 16.67 | 42.02 ± 15.37# | 41.79 ± 15.88# | 0.001
Age (years) | M | 27.54 ± 12.32 | 35.51 ± 19.29 | 39.33 ± 13.91# | 36.12 ± 13.68# | 0.001
Weight (kg) | F | 59.90 ± 11.40 | 67.09 ± 11.08# | 69.26 ± 11.24# | 79.93 ± 14.56#&$ | < 0.001
Weight (kg) | M | 66.71 ± 13.59 | 75.69 ± 13.90# | 82.56 ± 10.39# | 87.84 ± 16.70#& | < 0.001
Height (cm) | F | 156.84 ± 8.46 | 158.64 ± 8.04 | 155.83 ± 7.68 | 159.00 ± 6.94 | 0.206
Height (cm) | M | 167.50 ± 13.60 | 167.78 ± 13.47 | 171.23 ± 6.68 | 172.15 ± 8.99 | 0.31
BMI (kg m−2) | F | 24.30 ± 4.00 | 26.53 ± 3.16 | 28.45 ± 3.44# | 31.60 ± 5.21#&$ | < 0.001
BMI (kg m−2) | M | 23.83 ± 3.21 | 26.86 ± 3.21# | 28.19 ± 3.53# | 29.67 ± 4.03#& | < 0.001
Arm circumference (cm) | F | 27.60 ± 3.53 | 29.74 ± 3.14# | 31.30 ± 2.58# | 33.44 ± 4.23#&$ | < 0.001
Arm circumference (cm) | M | 28.81 ± 3.04 | 30.51 ± 2.70 | 31.90 ± 2.53# | 33.27 ± 3.47#& | < 0.001
Neck circumference (cm) | F | 31.68 ± 2.37 | 32.77 ± 1.65 | 33.25 ± 1.89# | 35.70 ± 2.34#&$ | < 0.001
Neck circumference (cm) | M | 36.09 ± 2.83 | 37.80 ± 2.73 | 38.94 ± 2.87# | 39.56 ± 2.87#& | < 0.001
Chest circumference (cm) | F | 89.04 ± 8.79 | 94.30 ± 8.19# | 98.78 ± 7.61# | 106.18 ± 9.56#&$ | < 0.001
Chest circumference (cm) | M | 89.32 ± 10.24 | 96.78 ± 10.43# | 101.34 ± 7.52# | 104.26 ± 10.86#& | < 0.001
Waist circumference (cm) | F | 81.74 ± 10.85 | 86.69 ± 7.25 | 94.57 ± 7.92#& | 101.96 ± 10.53#&$ | < 0.001
Waist circumference (cm) | M | 85.71 ± 9.08 | 96.42 ± 7.99# | 99.83 ± 8.75# | 102.68 ± 10.07#& | < 0.001
Abdomen circumference (cm) | F | 85.82 ± 11.11 | 91.77 ± 7.60# | 98.15 ± 7.70# | 105.33 ± 10.52#&$ | < 0.001
Abdomen circumference (cm) | M | 87.61 ± 8.88 | 98.50 ± 7.12# | 101.27 ± 8.93# | 104.15 ± 9.83#& | < 0.001
Hip circumference (cm) | F | 98.03 ± 12.80 | 101.87 ± 9.20 | 103.50 ± 8.82 | 106.60 ± 13.60# | 0.002
Hip circumference (cm) | M | 96.70 ± 8.88 | 101.31 ± 6.67 | 103.88 ± 7.21# | 106.57 ± 8.00#& | < 0.001
Wrist circumference (cm) | F | 14.98 ± 0.94 | 15.63 ± 0.742 | 15.78 ± 1.13# | 16.30 ± 1.44# | < 0.001
Wrist circumference (cm) | M | 16.71 ± 0.87 | 17.44 ± 1.11# | 17.66 ± 0.85# | 17.92 ± 1.10# | < 0.001
Subscapular skinfold (mm) | F | 15.40 ± 5.40 | 19.05 ± 5.19 | 21.29 ± 4.81# | 25.33 ± 9.10#& | < 0.001
Subscapular skinfold (mm) | M | 14.01 ± 5.32 | 19.58 ± 7.06# | 19.56 ± 9.05# | 22.93 ± 8.56# | < 0.001
Biceps skinfold (mm) | F | 8.02 ± 3.29 | 9.47 ± 3.38 | 9.94 ± 2.48 | 12.76 ± 5.17#&$ | < 0.001
Biceps skinfold (mm) | M | 6.69 ± 3.65 | 8.83 ± 3.80 | 9.01 ± 4.14# | 9.34 ± 3.15# | 0.001
Triceps skinfold (mm) | F | 11.22 ± 3.44 | 14.35 ± 6.19# | 14.02 ± 3.75 | 16.33 ± 6.49# | < 0.001
Triceps skinfold (mm) | M | 9.02 ± 3.71 | 11.16 ± 5.05 | 10.40 ± 4.34 | 11.48 ± 3.98# | 0.019
Suprailiac skinfold (mm) | F | 14.82 ± 5.27 | 17.59 ± 6.08 | 19.26 ± 6.06# | 21.30 ± 7.54# | < 0.001
Suprailiac skinfold (mm) | M | 13.98 ± 5.96 | 18.98 ± 7.35# | 16.62 ± 7.41 | 19.42 ± 7.99# | 0.001
Table 2. A comparison of the anthropometric variables across the different stages of hepatic steatosis in
male and female participants. BMI body mass index, F female, M male. Data are presented as means ± standard
deviations. P values were obtained from analysis of variance (ANOVA), followed by Tukey’s post hoc test.
# Significant difference (P < 0.05) compared with grade 0; & Significant difference (P < 0.05) compared with grade
1; $ Significant difference (P < 0.05) compared with grade 2.
The presence of liver fibrosis in patients with NAFLD is considered the strongest predictor of long-term
outcome [53]. The NAFLD Fibrosis Score (NFS) and Fibrosis-4 (FIB-4) have been recommended as appropriate methods
for the initial assessment of fibrosis in NAFLD patients [54]. Both of these methods use a combination of variables
including age, BMI and biochemical measures (e.g. aspartate aminotransferase (AST), alanine aminotransferase
(ALT) and platelets). Graupera et al. concluded that NFS and FIB-4 are not optimal for screening as they
correlate poorly with liver stiffness [55]. In their study, waist circumference was found to be the ideal measure for
fibrosis screening among high-risk people from the general population [55]. However, other studies found that NFS and
FIB-4 have the potential to detect advanced fibrosis and the progression of fibrosis among people with NAFLD [56].
It seems that NFS and FIB-4 are more useful for the diagnosis of fibrosis in NAFLD than for fibrosis screening
in the general population. The present study showed suboptimal accuracy (57%) in detecting fibrosis using
less expensive and non-invasive factors, i.e. anthropometric and body composition measures. Further studies
might explore a combination of these methods, including anthropometric, body composition and biochemical
variables together.
Hepatic fibrosis
Variables | Gender | Grade 0 | Grade I | Grade II | Grade III–IV | P-value
Number | F | 137 | 82 | 37 | 17 | –
Number | M | 156 | 59 | 14 | 15 | –
Age (years) | F | 34.78 ± 16.36 | 38.11 ± 17.09 | 49.33 ± 10.64# | 50.12 ± 8.09# | 0.005
Age (years) | M | 33.05 ± 14.87 | 36.42 ± 14.47 | 36.02 ± 15.02 | 42.15 ± 15.30 | 0.119
Weight (kg) | F | 65.97 ± 12.73 | 72.39 ± 14.79# | 81.56 ± 22.85# | 81.77 ± 12.44# | < 0.001
Weight (kg) | M | 76.25 ± 14.83 | 83.92 ± 15.62# | 90.08 ± 20.44# | 90.37 ± 18.95# | < 0.001
Height (cm) | F | 157.78 ± 8.53 | 157.48 ± 6.70 | 157.55 ± 3.84 | 156.12 ± 6.49 | 0.9150
Height (cm) | M | 169.34 ± 11.82 | 171.77 ± 8.39 | 177.08 ± 11.01 | 171.46 ± 9.09 | 0.445
BMI (kg m−2) | F | 26.41 ± 4.21 | 29.07 ± 4.99# | 32.91 ± 9.21# | 33.46 ± 4.13# | < 0.001
BMI (kg m−2) | M | 26.55 ± 3.66 | 28.36 ± 4.41# | 31.28 ± 4.08#& | 30.43 ± 4.36# | < 0.001
Arm circumference (cm) | F | 29.60 ± 3.82 | 31.66 ± 4.24# | 33.05 ± 7.32 | 33.25 ± 3.32 | 0.001
Arm circumference (cm) | M | 30.83 ± 3.10 | 32.34 ± 3.49# | 33.52 ± 4.23# | 33.53 ± 4.20# | < 0.001
Neck circumference (cm) | F | 32.93 ± 2.27 | 33.89 ± 3.30 | 35.55 ± 3.08# | 35.00 ± 3.33 | 0.003
Neck circumference (cm) | M | 37.57 ± 2.84 | 38.98 ± 2.78# | 40.28 ± 2.88# | 41.15 ± 4.35# | < 0.001
Chest circumference (cm) | F | 94.30 ± 9.73 | 99.46 ± 10.27# | 110.88 ± 16.56#& | 108.18 ± 9.60# | < 0.001
Chest circumference (cm) | M | 96.83 ± 11.55 | 100.68 ± 11.14 | 107.04 ± 8.37# | 106.88 ± 10.37# | < 0.001
Waist circumference (cm) | F | 87.93 ± 11.06 | 94.54 ± 12.68# | 104.83 ± 17.32# | 105.37 ± 11.66# | < 0.001
Waist circumference (cm) | M | 94.32 ± 9.73 | 99.82 ± 11.81# | 106.64 ± 9.77#& | 104.23 ± 10.25# | < 0.001
Abdomen circumference (cm) | F | 91.79 ± 10.98 | 99.01 ± 12.68# | 108.27 ± 16.02# | 108.37 ± 10.72# | < 0.001
Abdomen circumference (cm) | M | 96.00 ± 9.25 | 101.15 ± 11.56# | 108.52 ± 9.23#& | 106.64 ± 10.55# | < 0.001
Hip circumference (cm) | F | 101.34 ± 11.28 | 103.33 ± 12.13 | 108.11 ± 17.07 | 104.62 ± 20.85 | 0.346
Hip circumference (cm) | M | 101.22 ± 8.25 | 104.64 ± 8.24# | 108.64 ± 8.76# | 106.15 ± 8.87 | < 0.001
Wrist circumference (cm) | F | 15.49 ± 1.15 | 15.71 ± 1.22 | 16.71 ± 1.97 | 16.00 ± 0.70 | 0.052
Wrist circumference (cm) | M | 17.44 ± 1.15 | 17.61 ± 0.97 | 18.12 ± 1.22 | 17.36 ± 1.31 | 0.075
Subscapular skinfold (mm) | F | 18.66 ± 6.34 | 21.34 ± 8.21 | 28.86 ± 15.23& | 23.96 ± 5.39 | 0.001
Subscapular skinfold (mm) | M | 19.10 ± 7.03 | 19.56 ± 9.10 | 26.77 ± 10.79#& | 22.00 ± 7.69 | 0.002
Biceps skinfold (mm) | F | 9.38 ± 3.27 | 10.88 ± 5.31 | 14.03 ± 8.77# | 9.92 ± 2.94 | 0.013
Biceps skinfold (mm) | M | 8.71 ± 3.80 | 8.60 ± 3.48 | 8.99 ± 3.76 | 8.54 ± 3.37 | 0.979
Triceps skinfold (mm) | F | 13.23 ± 4.85 | 13.97 ± 4.62 | 18.40 ± 13.00 | 15.44 ± 3.92 | 0.76
Triceps skinfold (mm) | M | 10.70 ± 4.38 | 10.23 ± 3.77 | 12.95 ± 4.96 | 10.66 ± 3.16 | 0.081
Suprailiac skinfold (mm) | F | 17.15 ± 5.85 | 18.88 ± 8.27 | 23.97 ± 10.18# | 19.92 ± 4.05 | 0.037
Suprailiac skinfold (mm) | M | 17.32 ± 6.57 | 17.05 ± 7.44 | 23.50 ± 11.31#& | 15.77 ± 6.70 | 0.004
Table 3. A comparison of the anthropometric variables across the different stages of hepatic fibrosis in male
and female participants. BMI body mass index, F female, M male. Data are presented as means ± standard deviations.
P values were obtained from analysis of variance (ANOVA), followed by Tukey’s post hoc test. # Significant difference
(P < 0.05) compared with grade 0; & Significant difference (P < 0.05) compared with grade 1; $ Significant
difference (P < 0.05) compared with grade 2.
The proposed algorithm identified in this research can be used by health systems for several reasons.
Screening for the presence or absence of NAFLD with the help of non-invasive anthropometric measurements
can be achieved with simple and cheap equipment [57]. Moreover, performing the measurements requires less
specialist knowledge and can therefore be implemented in many health centres (e.g., primary practice) and also in
remote areas. Once validated, the resulting assistive technology can serve clinicians in the prevention of liver
diseases. The present research has limitations that need to be addressed. The small sample size might have
limited the results of ML prediction, although it was accounted for by multiple
cross-validations, which reduced potential errors. Future studies with larger sample sizes can allocate separate
validation sets and evaluate the model. Moreover, although it is the most common method for fatty liver diagnosis,
ultrasound is not the gold standard. Using liver biopsy outcomes would generate more valid results.
Also, to increase the predictive accuracy of the proposed model for NAFLD prediction, future studies should
include other body composition and anthropometric measures such as sagittal abdominal diameter (SAD) and
peri-renal fat [58].
Figure 1. Box plots showing different classification methods applied to the dataset for presence of fatty liver.
Box plots were generated from 50 individual runs of each classifier, which helps ensure that the
results are reliable.
Figure 2. Box plots showing different classification methods applied to the dataset for stages of steatosis. Box
plots were generated from 50 individual runs of each classifier, which helps ensure that the results
are reliable.
Figure 3. Box plots showing different classification methods applied to the dataset for stages of fibrosis. Box
plots were generated from 50 individual runs of each classifier, which helps ensure that the results
are reliable.
Figure 4. Box plots showing relative feature importance for presence of fatty liver. hx history, cm centimeter, kg
kilograms, BMI body mass index, MUAC mid-upper arm circumference.
Figure 5. Box plots showing relative feature importance for stages of steatosis. hx history, cm centimeter, kg
kilograms, BMI body mass index, MUAC mid-upper arm circumference.
Figure 6. Box plots showing relative feature importance for stages of fibrosis. hx history, cm centimeter, kg
kilograms, BMI body mass index, MUAC mid-upper arm circumference.
Table 4. Variable importance from the random forest method for fatty liver (presence of any stage).
Table 5. Variable importance from the random forest method for steatosis stage.
Conclusion
The present findings show that applying an ML classification model to anthropometric and body composition variables
predicted the presence of fatty liver disease. ML-based decision support systems offer potential to assist physicians
with screening, diagnosis and prevention of NAFLD. Such systems could be of particular
value for providing services at the population level and in remote health care where there is a lack of trained specialists.
Table 6. Variable importance from the random forest method for fibrosis.
Data availability
The datasets used and/or analyzed in the current study are available from the corresponding author upon
reasonable request.
References
1. Aggarwal, A., Puri, K., Thangada, S., Zein, N. & Alkhouri, N. Nonalcoholic fatty liver disease in children: Recent practice guidelines,
where do they take us?. Curr. Pediatr. Rev. 10(2), 151–161 (2014).
2. Khashab, M. A., Liangpunsakul, S. & Chalasani, N. Nonalcoholic fatty liver disease as a component of the metabolic syndrome.
Curr. Gastroenterol. Rep. 10(1), 73–80 (2008).
3. Wagenknecht, L. E. et al. Correlates and heritability of nonalcoholic fatty liver disease in a minority cohort. Obesity 17(6), 1240–
1246 (2009).
4. Abdelmalek, M. F. & Diehl, A. M. Nonalcoholic fatty liver disease as a complication of insulin resistance. Med. Clin. North Am.
91(6), 1125–1149 (2007).
5. Milić, S. & Štimac, D. Nonalcoholic fatty liver disease/steatohepatitis: Epidemiology, pathogenesis, clinical presentation and
treatment. Dig. Dis. 30(2), 158–162 (2012).
6. Clark, J. M., Brancati, F. L. & Diehl, A. M. The prevalence and etiology of elevated aminotransferase levels in the United States.
Am. J. Gastroenterol. 98(5), 960–967 (2003).
7. Kim, W. R., Brown, R. S. Jr., Terrault, N. A. & El-Serag, H. Burden of liver disease in the United States: Summary of a workshop.
Hepatology 36(1), 227–242 (2002).
8. McCullough, A. J. Pathophysiology of nonalcoholic steatohepatitis. J. Clin. Gastroenterol. 40, S17–S29 (2006).
9. Chalasani, N. et al. The diagnosis and management of non-alcoholic fatty liver disease: Practice Guideline by the American
Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological
Association. Hepatology 55(6), 2005–2023 (2012).
10. Ertle, J. et al. Non-alcoholic fatty liver disease progresses to hepatocellular carcinoma in the absence of apparent cirrhosis. Int. J.
Cancer 128(10), 2436–2443 (2011).
11. Bellentani, S. & Marino, M. Epidemiology and natural history of non-alcoholic liver disease (NAFLD). Ann. Hepatol. 8(S1), 4–8
(2009).
12. Patton, H. M. et al. Pediatric nonalcoholic fatty liver disease: A critical appraisal of current data and implications for future research.
J. Pediatr. Gastroenterol. Nutr. 43(4), 413–427 (2006).
13. Shiotani, A., Motoyama, M., Matsuda, T. & Miyanishi, T. Brachial-ankle pulse wave velocity in Japanese university students. Intern.
Med. 44(7), 696–701 (2005).
14. Razmpour, F., Abbasi, B. & Ganji, A. Evaluating the accuracy and sensitivity of anthropometric and laboratory variables in
diagnosing the liver steatosis and fibrosis in adolescents with non-alcoholic fatty liver disease. J. Liver Res. Disord. Ther. 4(3),
121–125 (2018).
15. Bellentani, S. et al. Prevalence of and risk factors for hepatic steatosis in Northern Italy. Ann. Intern. Med. 132(2), 112–119 (2000).
16. Omagari, K. et al. Fatty liver in non-alcoholic non-overweight Japanese adults: Incidence and clinical characteristics. J.
Gastroenterol. Hepatol. 17(10), 1098–1105 (2002).
17. Shaw, N. J., Crabtree, N. J., Kibirige, M. S. & Fordham, J. N. Ethnic and gender differences in body fat in British schoolchildren as
measured by DXA. Arch. Dis. Child. 92(10), 872–875 (2007).
18. Chumlea, W. C., Siervogel, R., Roche, A. F., Webb, P. & Rogers, E. Increments across age in body composition for children 10 to
18 years of age. Hum. Biol. 55, 845–852 (1983).
19. Van der Sluis, I., De Ridder, M., Boot, A., Krenning, E. & de Muinck, K.-S. Reference data for bone density and body composition
measured with dual energy x ray absorptiometry in white children and young adults. Arch. Dis. Child. 87(4), 341–347 (2002).
20. Alferink, L. J. M. et al. Nonalcoholic fatty liver disease in the Rotterdam study: About muscle mass, sarcopenia, fat mass, and fat
distribution. J. Bone Miner. Res. 34(7), 1254–1263 (2019).
21. He, Q. et al. Sex and race differences in fat distribution among Asian, African-American, and Caucasian prepubertal children. J.
Clin. Endocrinol. Metab. 87(5), 2164–2170 (2002).
22. Płudowski, P., Matusik, H., Olszaniecka, M., Lebiedowski, M. & Lorenc, R. S. Reference values for the indicators of skeletal and
muscular status of healthy Polish children. J. Clin. Densitom. 8(2), 164–177 (2005).
23. Yang, K. C. et al. Association of non-alcoholic fatty liver disease with metabolic syndrome independently of central obesity and
insulin resistance. Sci. Rep. 6(1), 1–10 (2016).
24. Balakrishnan, M. et al. Obesity and risk of nonalcoholic fatty liver disease: A comparison of bioelectrical impedance analysis and
conventionally-derived anthropometric measures. Clin. Gastroenterol. Hepatol. 15(12), 1965–1967 (2017).
25. Brambilla, P., Bedogni, G., Heo, M. & Pietrobelli, A. Waist circumference-to-height ratio predicts adiposity better than body mass
index in children and adolescents. Int. J. Obes. 37(7), 943–946 (2013).
26. Huang, B.-A. et al. Neck circumference, along with other anthropometric indices, has an independent and additional contribution
in predicting fatty liver disease. PLoS One 10(2), e0118071 (2015).
27. Sookoian, S. & Pirola, C. J. Systematic review with meta-analysis: Risk factors for non-alcoholic fatty liver disease suggest a shared
altered metabolic and cardiovascular profile between lean and obese patients. Aliment. Pharmacol. Ther. 46(2), 85–95 (2017).
28. Stabe, C. et al. Neck circumference as a simple tool for identifying the metabolic syndrome and insulin resistance: Results from
the Brazilian Metabolic Syndrome Study. Clin. Endocrinol. 78(6), 874–881 (2013).
29. Subramanian, V., Johnston, R., Kaye, P. & Aithal, G. Regional anthropometric measures associated with the severity of liver injury
in patients with non-alcoholic fatty liver disease. Aliment. Pharmacol. Ther. 37(4), 455–463 (2013).
30. Borruel, S. et al. Surrogate markers of visceral adiposity in young adults: Waist circumference and body mass index are more
accurate than waist hip ratio, model of adipose distribution and visceral adiposity index. PLoS One 9(12), e114112 (2014).
31. Rankinen, T., Kim, S., Perusse, L., Despres, J. & Bouchard, C. The prediction of abdominal visceral fat level from body composition
and anthropometry: ROC analysis. Int. J. Obes. 23(8), 801 (1999).
32. Lee, S. S. & Park, S. H. Radiologic evaluation of nonalcoholic fatty liver disease. World J. Gastroenterol. 20(23), 7392 (2014).
33. EskandarNejad, M. Correlation of perceived body image and physical activity in women and men according to the different levels
of Body Mass Index (BMI). J. Health Promot. Manag. 2, 59–40 (2013).
34. Belghaisi-Naseri, M. et al. Plasma levels of vascular endothelial growth factor and its soluble receptor in non-alcoholic fatty liver.
J. Fast. Health (2018).
35. Dehnavi, Z. et al. Fatty Liver Index (FLI) in predicting non-alcoholic fatty liver disease (NAFLD). Hepat. Mon. 18(2) (2018).
36. Birjandi, M., Ayatollahi, S. M. T., Pourahmad, S. & Safarpour, A. R. Prediction and diagnosis of non-alcoholic fatty liver disease
(NAFLD) and identification of its associated factors using the classification tree method. Iran. Red Crescent Med. J. 18(11) (2016).
37. Islam, M., Wu, C.-C., Poly, T. N., Yang, H.-C. & Li, Y.-C.J. Applications of machine learning in fatty live disease prediction. Building
Continents of Knowledge in Oceans of Data: The Future of Co-Created eHealth 166–170 (IOS Press, 2018).
38. Ma, H., Xu, C.-F., Shen, Z., Yu, C.-H. & Li, Y.-M. Application of machine learning techniques for clinical predictive modeling: A
cross-sectional study on nonalcoholic fatty liver disease in China. BioMed Res. Int. 2018 (2018).
39. Wu, C.-C. et al. Prediction of fatty liver disease using machine learning algorithms. Comput. Methods Programs Biomed. 170, 23–29
(2019).
40. Gaia, S. et al. Reliability of transient elastography for the detection of fibrosis in non-alcoholic fatty liver disease and chronic viral
hepatitis. J. Hepatol. 54(1), 64–71 (2011).
41. Sasso, M. et al. Controlled attenuation parameter (CAP): A novel VCTE™ guided ultrasonic attenuation measurement for the
evaluation of hepatic steatosis: Preliminary study and validation in a cohort of patients with chronic liver disease from various
causes. Ultrasound Med. Biol. 36(11), 1825–1835 (2010).
42. Hsu, C. et al. Magnetic resonance vs transient elastography analysis of patients with nonalcoholic fatty liver disease: A systematic
review and pooled analysis of individual participants. Clin. Gastroenterol. Hepatol. 17(4), 630–637 (2019).
43. Shamsi, A. et al. An uncertainty-aware transfer learning-based framework for COVID-19 diagnosis. IEEE Trans. Neural Netw.
Learn. Syst. 32(4), 1408–1417 (2021).
44. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
45. Noor, N. M. et al. (eds) (Trans Tech Publ, 2015).
46. Norazian, M. N. Comparison of linear interpolation method and mean method to replace the missing values in environmental
data set (2007).
47. Cunningham, J. P. & Ghahramani, Z. Linear dimensionality reduction: Survey, insights, and generalizations. J. Mach. Learn. Res.
16(1), 2859–2900 (2015).
48. Onat, A. et al. Neck circumference as a measure of central obesity: Associations with metabolic syndrome and obstructive sleep
apnea syndrome beyond waist circumference. Clin. Nutr. 28(1), 46–51 (2009).
49. Rafiei, R., Fouladi, L. & Torabi, Z. Which component of metabolic syndrome is the most important one in development of colorectal
adenoma?
50. Albhaisi, S. Noninvasive imaging modalities in nonalcoholic fatty liver disease: Where do we stand? EMJ 4(3), 57–62 (2019).
51. Ferraioli, G. & Monteiro, L. B. S. Ultrasound-based techniques for the diagnosis of liver steatosis. World J. Gastroenterol. 25(40),
6053 (2019).
52. Khov, N., Sharma, A. & Riley, T. R. Bedside ultrasound in the diagnosis of nonalcoholic fatty liver disease. World J. Gastroenterol.
20(22), 6821 (2014).
53. Angulo, P. et al. Liver fibrosis, but no other histologic features, is associated with long-term outcomes of patients with nonalcoholic
fatty liver disease. Gastroenterology 149(2), 389–397.e10 (2015).
54. Lee, J. et al. Prognostic accuracy of FIB-4, NAFLD fibrosis score and APRI for NAFLD-related events: A systematic review. Liver
Int. 41(2), 261–270 (2021).
55. Graupera, I. et al. Low accuracy of FIB-4 and NAFLD fibrosis scores for screening for liver fibrosis in the population. Clin.
Gastroenterol. Hepatol. 20(11), 2567–76.e6 (2022).
56. Siddiqui, M. S. et al. Diagnostic accuracy of noninvasive fibrosis models to detect change in fibrosis stage. Clin. Gastroenterol.
Hepatol. 17(9), 1877–85.e5 (2019).
57. Eaton-Evans, J. Nutritional assessment | Anthropometry (2005).
58. Vitturi, N. et al. Ultrasound, anthropometry and bioimpedance: A comparison in predicting fat deposition in non-alcoholic fatty
liver disease. Eat. Weight Disord. Stud. Anorex. Bulimia Obes. 20(2), 241–247 (2015).
Acknowledgements
We are grateful to all participants and assistants of this study.
Author contributions
Study concept and design: F.R. and M.R. Analysis and interpretation of data: H.A., A.S., R.D., G.S. and D.S.
Drafting the manuscript: R.D., M.N., S.M.S.I. Critical revision of the manuscript for important intellectual
content: F.R., R.M., S.M.S.I., R.D. Study supervision: F.R. All authors provided final approval of the version to
be published and agreed to be accountable for all aspects of the work.
Funding
The study was supported by Mashhad University of Medical Sciences, Mashhad, Iran (Code:
IR.MUMS.fm.REC.1395.64). The funder was not involved in the study design, data analysis and interpretation, or
manuscript writing.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at
https://doi.org/10.1038/s41598-023-32129-y.
Correspondence and requests for materials should be addressed to F.R. or R.D.-K.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.