This is the code for the paper "Prediction and interpretation of antibiotic-resistance genes occurrence at recreational beaches using machine learning models" authored by Sara Iftikhar, Asad Mustafa Karim, Aoun Murtaza Karim, Mujahid Aizaz Karim, Muhammad Aslam, Fazila Rubab, Sumera Kausar Malik, Jeong Eun Kwon, Imran Hussain, Esam I. Azhar, Se Chan Kang and Muhammad Yasir. Folder "data" in "scripts" folder contains the Pakistan data used in this study to build this model in addition with Busan data which can be found on https://github.com/AtrCheema/ai4water-datasets.
Fully reproducible code is available in "scripts" folder, which further contains folder for each ARG considered in this study. For "aac_coppml(6โฒ-Ib-cr)", the folder named "aac" contains files for cross validation and interpretation methods. File named "aac_cross_validation.py" was used to apply 10-Fold timeseries cross validation. Files "ale_results.py", "lime_results.py", "pdp_results.py", "permutation_results.py", "sensitivity_results.py", and "shap_results.py" were used to generate results for Accumulated Local Effects (ALE), Local Interpretable Model 90 Explanations (LIME), Partial Dependance PLots (PDP), Permutation Analysis, Sensitivity Analysis and Shaply Additive exPlanations (SHAP) respectively.
The codes and other relevent files for "sul1_coppml" and "tetx_coppml" are available in their respective folders.