Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration

Lamane, Houda; Mouhir, Latifa; Moussadek, Rachid; Baghdad, Bouamar; Kisi, Ozgur; El Bilali, Ali

doi:https://doi.org/10.1016/j.ijsrc.2024.10.002

cg.contact	lamanehoudageo@gmail.com	en_US
cg.contributor.center	International Center for Agricultural Research in the Dry Areas - ICARDA	en_US
cg.contributor.center	Hassan II University - UH2C	en_US
cg.contributor.center	Institute of Agronomy and Veterinary Hassan II - IAV HASSAN II	en_US
cg.contributor.center	National Institute of Agronomic Research Morocco - INRA Morocco	en_US
cg.contributor.center	Technical University of Applied Sciences Lübeck - ULUBECK	en_US
cg.contributor.center	Ilia State University - ISU Georgia	en_US
cg.contributor.center	River Basin Agency of Bouregreg and Chaouia - AABHBC	en_US
cg.contributor.funder	Not Applicable	en_US
cg.contributor.project	CODIS - Corporate-Communication and Documentation Information Services	en_US
cg.contributor.project-lead-institute	International Center for Agricultural Research in the Dry Areas - ICARDA	en_US
cg.coverage.country	MA	en_US
cg.coverage.region	Northern Africa	en_US
cg.identifier.doi	https://doi.org/10.1016/j.ijsrc.2024.10.002	en_US
cg.isijournal	ISI Journal	en_US
cg.issn	1001-6279	en_US
cg.issue	1	en_US
cg.journal	International Journal of Sediment Research	en_US
cg.reviewStatus	Peer Review	en_US
cg.subject.agrovoc	soil erosion	en_US
cg.volume	40	en_US
dc.contributor	Mouhir, Latifa	en_US
dc.contributor	Moussadek, Rachid	en_US
dc.contributor	Baghdad, Bouamar	en_US
dc.contributor	Kisi, Ozgur	en_US
dc.contributor	El Bilali, Ali	en_US
dc.creator	Lamane, Houda	en_US
dc.date.accessioned	2025-09-18T20:06:08Z
dc.date.available	2025-09-18T20:06:08Z
dc.description.abstract	Machine learning (ML) has become a powerful tool for predicting suspended sediment concentration (SSC). Nonetheless, the limited ability to interpret the underlying physical processes is considered the main obstacle to applying most ML approaches. In this regard, the current study presents a novel framework involving four standalone ML models (extra trees (ET), random forest (RF), categorical boosting (CatBoost), and extreme gradient boosting (XGBoost)) and their combinations with genetic programming (GP). Three metrics (the correlation coefficient (r), root mean square error (RMSE), and Nash–Sutcliffe model-fit efficiency (NSE)) are used to assess the performance of these models, applied to hydro-climatic datasets for the prediction of SSC, and a more advanced interpretation system, SHapley Additive exPlanations (SHAP), is used to interpret them. The calibration process was based on data from 2016 to 2020, and the validation was done using 2021 data. The framework is further described and applied in a case study of the Bouregreg watershed. The results revealed that all implemented models are efficient in SSC prediction, with NSE, RMSE, and r ranging from 0.53 to 0.86, 1.20 to 2.55 g/L, and 0.83 to 0.91, respectively. Box plot diagrams confirm the enhanced performance of the combined models. The best-performing models for the four hydrological stations are the combined RF + GP model at the Aguibat Ziar station, the combined XGBoost + GP model at the Ain Loudah station, the CatBoost model at the Ras Fathia station, and the RF model at the Sidi Med Cherif station. The interpretability results showed that flow (Q) and seasonality (S) are the features with the greatest impact on SSC. These outcomes indicate that the applied models can extract accurate and detailed information from the interactions between the hydro-climatic factors (inputs) and the generation of sediment by erosion (output).
The ML approaches demonstrated good reliability and transparency in predicting SSC in a semi-arid setting, offered new perspectives for reducing the “black box” character of ML models, and provided a useful source of information for assessing the consequences of SSC for water quality. Further use of SHAP, together with other interpretability techniques, is recommended in future research to provide additional insight. In addition, incorporating more input data could enhance SSC predictions and deepen understanding of sediment transport dynamics.	en_US
dc.format	PDF	en_US
dc.identifier	https://mel.cgiar.org/reporting/downloadmelspace/hash/77ca19aa28b7ffed9db50ab6ddde4f63	en_US
dc.identifier.citation	Houda Lamane, Latifa Mouhir, Rachid Moussadek, Bouamar Baghdad, Ozgur Kisi, Ali El Bilali. (1/2/2025). Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration. International Journal of Sediment Research, 40 (1), pp. 91-107.	en_US
dc.identifier.status	Open access	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11766/70118
dc.language	en	en_US
dc.publisher	KeAi Communications	en_US
dc.rights	CC-BY-NC-ND-4.0	en_US
dc.source	International Journal of Sediment Research;40,(2025) Pagination 91-107	en_US
dc.subject	machine learning (ml)	en_US
dc.subject	interpretability	en_US
dc.subject	shapley values	en_US
dc.subject	suspended sediment concentration (ssc)	en_US
dc.subject	bouregreg watershed (bw)	en_US
dc.title	Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration	en_US
dc.type	Journal Article	en_US
dcterms.available	2025-01-11	en_US
dcterms.extent	91-107	en_US
dcterms.hasVersion	V3 - 2025-09-18	en_US
dcterms.issued	2025-02-01	en_US
mel.impact-factor	3.7	en_US
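The three performance metrics named in the abstract (r, RMSE, NSE) have standard definitions. A minimal sketch in Python, assuming the usual textbook formulas (the record does not give the authors' exact implementation), computes them from paired observed and simulated SSC series. The sample values are made up for illustration and are not data from the study.

```python
import math


def pearson_r(obs, sim):
    """Pearson correlation coefficient r (dimensionless) between two series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim))
    return cov / (sd_o * sd_s)


def rmse(obs, sim):
    """Root mean square error, in the units of the data (e.g. g/L for SSC)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.

    Undefined (division by zero) if all observed values are identical.
    """
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


# Illustrative values only (hypothetical, not from the Bouregreg dataset).
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
print(f"r={pearson_r(obs, sim):.3f} RMSE={rmse(obs, sim):.3f} NSE={nse(obs, sim):.3f}")
# r=0.983 RMSE=0.500 NSE=0.800
```

Note that r is dimensionless, whereas RMSE carries the units of SSC (g/L in the study). NSE equals 1 for a perfect fit and falls below 0 when the model does worse than simply predicting the observed mean.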
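SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. As a minimal sketch, assuming a hypothetical toy model `toy_ssc` of flow (Q), seasonality (S), and a third input (P) that is not taken from the study, the code below computes exact Shapley values by enumerating every feature coalition, with "absent" features set to a single baseline point. This brute-force baseline variant is only practical for a few features; SHAP implementations typically average over a background dataset and, for tree ensembles such as RF, ET, CatBoost, and XGBoost, use faster algorithms such as `TreeExplainer` in the `shap` Python package. The record does not state which implementation the authors used.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    m = len(x)

    def value(coalition):
        # Features in the coalition come from x, all others from the baseline point.
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions S of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_ssc(z):
    """Hypothetical SSC model of Q, S, P (illustrative only; P is ignored)."""
    q, s, p = z
    return 0.5 * q + 2.0 * s + 0.1 * q * s


x = [10.0, 1.0, 5.0]         # hypothetical Q, S, P for one time step
baseline = [2.0, 0.0, 3.0]   # hypothetical reference point
phi = shapley_values(toy_ssc, x, baseline)
print(phi)  # approximately [4.4, 2.6, 0.0]
```

The values satisfy the efficiency property: they sum to f(x) minus f(baseline), here 8.0 minus 1.0. P receives zero because the toy model ignores it, and ranking features by mean |φ| over many samples yields the kind of global importance ordering through which the study identified Q and S as the most influential inputs.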

Collections

Agricultural Research Knowledge