Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration

cg.contact: lamanehoudageo@gmail.com
cg.contributor.center: International Center for Agricultural Research in the Dry Areas - ICARDA
cg.contributor.center: Hassan II University - UH2C
cg.contributor.center: Institute of Agronomy and Veterinary Hassan II - IAV HASSAN II
cg.contributor.center: National Institute of Agronomic Research Morocco - INRA Morocco
cg.contributor.center: Technical University of Applied Sciences Lübeck - ULUBECK
cg.contributor.center: Ilia State University - ISU Georgia
cg.contributor.center: River Basin Agency of Bouregreg and Chaouia - AABHBC
cg.contributor.funder: Not Applicable
cg.contributor.project: CODIS - Corporate-Communication and Documentation Information Services
cg.contributor.project-lead-institute: International Center for Agricultural Research in the Dry Areas - ICARDA
cg.coverage.country: MA
cg.coverage.region: Northern Africa
cg.identifier.doi: https://doi.org/10.1016/j.ijsrc.2024.10.002
cg.isijournal: ISI Journal
cg.issn: 1001-6279
cg.issue: 1
cg.journal: International Journal of Sediment Research
cg.reviewStatus: Peer Review
cg.subject.agrovoc: soil erosion
cg.volume: 40
dc.contributor: Mouhir, Latifa
dc.contributor: Moussadek, Rachid
dc.contributor: Baghdad, Bouamar
dc.contributor: Kisi, Ozgur
dc.contributor: El Bilali, Ali
dc.creator: Lamane, Houda
dc.date.accessioned: 2025-09-18T20:06:08Z
dc.date.available: 2025-09-18T20:06:08Z
dc.description.abstract: Machine learning (ML) has become a powerful tool for predicting suspended sediment concentration (SSC). Nonetheless, the difficulty of interpreting the underlying physical processes is considered the main obstacle to applying most ML approaches. In this regard, the current study presents a novel framework involving four standalone ML models (extra trees (ET), random forest (RF), categorical boosting (CatBoost), and extreme gradient boosting (XGBoost)) and their combination with genetic programming (GP). Three metrics (the correlation coefficient (r), root mean square error (RMSE), and Nash–Sutcliffe efficiency (NSE)) are used to assess the performance of these models, applied to hydro-climatic datasets for SSC prediction, and a more advanced interpretation system, SHapley Additive exPlanations (SHAP), is used to interpret them. The models were calibrated on data from 2016 to 2020 and validated on 2021 data. The framework is further described and applied in a case study of the Bouregreg watershed. The results revealed that all implemented models predict SSC efficiently, with NSE ranging from 0.53 to 0.86, RMSE from 1.20 to 2.55 g/L, and r from 0.83 to 0.91. Box plot diagrams confirm the enhanced performance of the combined models; the best-performing models at the four hydrological stations are the combined RF + GP model at the Aguibat Ziar station, the combined XGBoost + GP model at the Ain Loudah station, the CatBoost model at the Ras Fathia station, and the RF model at the Sidi Med Cherif station. The interpretability results showed that flow (Q) and seasonality (S) are the features with the greatest impact on SSC. These outcomes indicate that the applied models can extract accurate and detailed information on the interactions between the hydro-climatic factors (inputs) and sediment generation by erosion (output).
Overall, the approach demonstrated the reliability and transparency of the ML models developed for predicting SSC in a semi-arid setting, offered new perspectives for reducing the “black box” character of ML models, and provided a useful source of information for assessing the consequences of SSC on water quality. Further use of SHAP and exploration of other interpretability techniques are recommended in future research. In addition, incorporating additional input data could improve SSC predictions and deepen the understanding of sediment transport dynamics.
dc.format: PDF
dc.identifier: https://mel.cgiar.org/reporting/downloadmelspace/hash/77ca19aa28b7ffed9db50ab6ddde4f63
dc.identifier.citation: Houda Lamane, Latifa Mouhir, Rachid Moussadek, Bouamar Baghdad, Ozgur Kisi, Ali El Bilali. (1/2/2025). Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration. International Journal of Sediment Research, 40 (1), pp. 91-107.
dc.identifier.status: Open access
dc.identifier.uri: https://hdl.handle.net/20.500.11766/70118
dc.language: en
dc.publisher: KeAi Communications
dc.rights: CC-BY-NC-ND-4.0
dc.source: International Journal of Sediment Research; 40, (2025) Pagination 91-107
dc.subject: machine learning (ml)
dc.subject: interpretability
dc.subject: shapley values
dc.subject: suspended sediment concentration (ssc)
dc.subject: bouregreg watershed (bw)
dc.title: Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration
dc.type: Journal Article
dcterms.available: 2025-01-11
dcterms.extent: 91-107
dcterms.hasVersion: V3 - 2025-09-18
dcterms.issued: 2025-02-01
mel.impact-factor: 3.7
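
The abstract reports model skill with three metrics: r, RMSE and NSE. The sketch below computes them with their standard textbook definitions; this is an assumption, since the record does not reproduce the paper's exact formulas, and the sample numbers are illustrative only, not data from the study.

```python
import math
from typing import Sequence


def _check(obs: Sequence[float], sim: Sequence[float]) -> int:
    """Validate that both series are non-empty and of equal length."""
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")
    return len(obs)


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient r (dimensionless, between -1 and 1)."""
    n = _check(obs, sim)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim))
    return cov / (sd_o * sd_s)


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (e.g. g/L for SSC)."""
    n = _check(obs, sim)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 is no better than the observed mean."""
    n = _check(obs, sim)
    mean_o = sum(obs) / n
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))  # model error
    sst = sum((o - mean_o) ** 2 for o in obs)          # variance of observations
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Illustrative numbers only (not data from the study).
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [2.0, 3.0, 4.0, 5.0]  # constant +1 bias on every value
    print(pearson_r(observed, simulated), rmse(observed, simulated), nse(observed, simulated))
```

A constant bias leaves r at (approximately) 1 while RMSE is 1 and NSE drops to about 0.2, which is one reason the three metrics are reported together: r measures linear association only, whereas RMSE and NSE also penalize systematic offsets.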

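SHAP attributes a prediction to input features using Shapley values from cooperative game theory. As a minimal sketch of the underlying idea (not the authors' pipeline; the record does not say which SHAP explainer they used), the code below computes exact Shapley values by brute-force enumeration against a single baseline point. The model `toy_ssc_model`, its inputs `[Q, S, P]` (flow, seasonality and a hypothetical third feature) and all numbers are hypothetical. In practice, libraries such as `shap` use much faster algorithms for tree ensembles like RF, ET, XGBoost and CatBoost.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x relative to one baseline point.

    A coalition is evaluated as f(z), where z takes x's values for features in
    the coalition and the baseline's values elsewhere. Cost grows as 2^n, so
    this is only practical for a handful of features.
    """
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


def toy_ssc_model(z: Sequence[float]) -> float:
    """Hypothetical toy model with inputs [Q, S, P]; NOT the study's fitted model."""
    q, s, p = z
    return 0.8 * q + 0.5 * s + 0.1 * p + 0.2 * q * s


if __name__ == "__main__":
    x = [3.0, 1.0, 2.0]         # hypothetical sample to explain
    baseline = [1.0, 0.0, 1.0]  # hypothetical reference point
    phi = exact_shapley(toy_ssc_model, x, baseline)
    print(phi, sum(phi), toy_ssc_model(x) - toy_ssc_model(baseline))
```

The attributions are additive: they sum to the difference between the prediction for `x` and the prediction for the baseline. The Q-S interaction term is split between Q and S rather than assigned to either one alone.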