Abstract: | BACKGROUND: Fatty liver disease (FLD) has become a widespread condition and is associated with high morbidity and mortality in the population. Early prediction of FLD would allow patients to pursue the necessary preventive, diagnostic, and treatment measures. The main objective of this research is to develop a machine learning (ML) model to predict FLD that can help physicians identify individuals at high risk of FLD, make new diagnoses, and guide the management and prevention of FLD. METHODS: A total of 3,419 subjects were recruited, of whom 845 had been screened for FLD. Classification models were used to detect the disease: logistic regression (LR), random forest (RF), artificial neural network (ANN), k-nearest neighbors (KNN), extreme gradient boosting (XGBoost), and linear discriminant analysis (LDA). Predictive performance was assessed by area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value. RESULTS: The ML models gave accurate predictions; the best accuracy, 0.9415, was achieved by the XGBoost model. Feature importance analysis not only confirmed some well-known FLD risk factors but also identified several novel features for predicting the risk of FLD, such as hemoglobin. CONCLUSION: By implementing the XGBoost model, physicians can efficiently identify FLD in general patients, which would help in the prevention, early treatment, and management of FLD. |
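The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The code below is not the study's pipeline: the labels, scores, and the 0.5 decision threshold are made-up toy values, and the function names (`confusion_counts`, `screening_metrics`, `auc`) are hypothetical helpers introduced here only for illustration. It shows how sensitivity, specificity, positive and negative predictive value, accuracy, and AUC follow from a classifier's outputs.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels, where 1 = FLD and 0 = no FLD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def screening_metrics(y_true, y_pred):
    """Metrics listed in the abstract; assumes every denominator is non-zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = screening_metrics(y_true, y_pred)
print(metrics)
print("AUC:", auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is why studies of this kind typically report both.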