Developing automated machine learning approach for fast and robust crop yield prediction using a fusion of remote sensing, soil, and weather dataset


Views
0% 0
Downloads
0 0%
CC-BY-4.0

Citation

Ahmed M. S. Kheir, Ajit Govind, Vinay Nangia, Mina Devkota Wasti, Abdelrazek Elnashar, Mohie Omar, Til Feike. (25/4/2024). Developing automated machine learning approach for fast and robust crop yield prediction using a fusion of remote sensing, soil, and weather dataset. Environmental Research Communications, 6 (4).
Estimating smallholder crop yields robustly and timely is crucial for improving agronomic practices, determining yield gaps, guiding investment, and policymaking to ensure food security. However, there is poor estimation of yield for most smallholders due to lack of technology, and field scale data, particularly in Egypt. Automated machine learning (AutoML) can be used to automate the machine learning workflow, including automatic training and optimization of multiple models within a userspecified time frame, but it has less attention so far. Here, we combined extensive field survey yield across wheat cultivated area in Egypt with diverse dataset of remote sensing, soil, and weather to predict field-level wheat yield using 22 Ml models in AutoML. The models showed robust accuracies for yield predictions, recording Willmott degree of agreement, (d>0.80) with higher accuracy when super learner (stacked ensemble) was used (R2=0.51, d=0.82). The trained AutoML was deployed to predict yield using remote sensing (RS) vegetative indices (VIs), demonstrating a good correlation with actual yield (R2=0.7). This is very important since it is considered a low-cost tool and could be used to explore early yield predictions. Since climate change has negative impacts on agricultural production and food security with some uncertainties, AutoML was deployed to predict wheat yield under recent climate scenarios from the Coupled Model Intercomparison Project Phase 6 (CMIP6). These scenarios included single downscaled General Circulation Model (GCM) as CanESM5 and two shared socioeconomic pathways (SSPs) as SSP2-4.5and SSP5-8.5during the mid-term period (2050). The stacked ensemble model displayed declines in yield of 21% and5%under SSP5-8.5 and SSP2-4.5 respectively during mid-century, with higher uncertainty under the highest emission scenario (SSP5- 8.5). The developed approach could be used as a rapid, accurate and low-cost method to predict yield for stakeholder farms all over the world where ground data is scarce.