
dc.contributor.author: Mike, Nsubuga
dc.contributor.author: Nsubuga, Mike
dc.date.accessioned: 2024-02-28T08:06:49Z
dc.date.available: 2024-02-28T08:06:49Z
dc.date.issued: 2023
dc.identifier.uri: http://hdl.handle.net/10570/13162
dc.description.abstract: Background: Antimicrobial resistance (AMR) is a significant global health threat, particularly in low- and middle-income countries (LMICs) such as Uganda, where reliable and rapid methods for detecting AMR in E. coli and other pathogens are scarce. This scarcity can lead to inappropriate treatment and the spread of drug-resistant infections. In this thesis, machine learning models that predict AMR in E. coli for ciprofloxacin (CIP), ampicillin (AMP), and cefotaxime (CTX) were trained on whole-genome sequencing (WGS) data from England, where such data are more readily available. A separate Ugandan dataset was used for validation, further demonstrating the generalizability and effectiveness of the models in LMICs. Methods: Sequences from England (1496 for CIP, 1428 for CTX, and 1396 for AMP) were divided into training and testing sets; 42 sequences from Uganda were used for validation. Eight machine learning models were trained and tested: Logistic Regression (LR), Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), LightGBM (LGBM), CatBoost (CB), Feed-Forward Neural Network (FFNN), and Support Vector Machine (SVM). The models were evaluated on precision, recall, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Upsampling techniques were used to address class imbalance in the data. Results: Model predictive performance varied significantly across antibiotics, underlining the critical role of model selection and dataset characteristics. Notably, during testing the FFNN model performed best for CIP (accuracy 84%; F1 0.55; AUC 91%), LR for CTX (accuracy 91%; F1 0.37; AUC 83%), and GB for AMP (accuracy 57%; F1 0.62; AUC 53%), while the LGBM and RF models outperformed the others in some scenarios (p < 0.001). Upsampling did not significantly improve the models' performance, underscoring the complexity and high dimensionality of single-nucleotide polymorphism (SNP) data.
Despite high accuracy scores on the Ugandan validation dataset (FFNN with CIP, accuracy 95%; LR with AMP, accuracy 98%; GB with CTX, accuracy 65%), the models struggled on recall because of severe class imbalance. Key mutations associated with antimicrobial resistance were identified for these antibiotics. Conclusion: As the threat of AMR continues to rise, the successful application of these models, particularly on the Ugandan dataset, signals a promising avenue for improving AMR detection and treatment strategies in LMICs, where genomic data are scarce. This work thus not only expands our current understanding of the genetic underpinnings of AMR but also provides a robust methodological framework that can guide future research and applications in the fight against antimicrobial resistance. (en_US)
dc.description.sponsorship: The author was funded as a Master's scholar by the East African Network for Bioinformatics Training (EANBIT), under the Fogarty International Center at the U.S. National Institutes of Health (NIH), award number U2RTW010677. The author would also like to acknowledge the Open Science Grid (OSG) consortium, which provided computational resources to carry out this study. The OSG is supported by National Science Foundation award numbers 2030508 and 1836650. (en_US)
dc.language.iso: en (en_US)
dc.publisher: Makerere University (en_US)
dc.subject: Antimicrobial Resistance (en_US)
dc.subject: AMR (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Genomics (en_US)
dc.subject: ML (en_US)
dc.subject: E. coli (en_US)
dc.subject: Escherichia Coli (en_US)
dc.subject: Antibiotics drugs (en_US)
dc.title: A machine learning approach to predict E. coli antibacterial resistance using whole-genome sequencing data (en_US)
dc.type: Thesis (en_US)
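
The abstract reports high validation accuracy alongside poor recall under severe class imbalance. The following minimal Python sketch illustrates how those two results can coexist. It is not the thesis code: the function name classification_metrics and the label counts (2 resistant isolates out of 42) are hypothetical and chosen only for illustration, since the record does not give the class distribution of the Ugandan dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; ratios with a zero
    denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# ASSUMPTION: made-up labels, not data from the thesis.
# 42 isolates, of which 2 are resistant (1) and 40 susceptible (0).
y_true = [1, 1] + [0] * 40
# A model that always predicts "susceptible".
y_pred = [0] * 42

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these made-up labels the always-susceptible model is correct on 40 of 42 isolates (accuracy of about 0.95) yet has a recall of 0.0, because it finds none of the resistant isolates. This is why recall, F1, and AUC-ROC are more informative than accuracy alone on imbalanced validation sets.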

