Investigating Fine-tuning Strategies in Statistical and Machine Learning Models for Developing Region-Specific Potato Yield Prediction Models

Accurate crop yield prediction is essential for informed market decisions, maintaining supply–demand balance, preventing distress sales, and stabilizing prices, particularly for potatoes, whose prices are highly volatile. Because weather affects crop yield more strongly at the regional level, district-level modelling is ideal. Accurate predictions require proper model selection, data preparation, feature selection and parameter tuning. This study introduced path coefficient-based weather indices, which weight weekly weather by its true effect on potato yield, and used them alongside correlation-based indices. Three penalized regression models (Ridge regression, LASSO and Elastic Net) and two machine learning (ML) models (artificial neural networks [ANN] and support vector regression [SVR]) were evaluated. To prevent overfitting, stepwise regression, principal component analysis and partial least squares regression (PLSR) were employed for feature selection. The ANN model was tested with three different nonlinear activation functions, with optimized learning rates and numbers of hidden-layer neurons, while the SVR model was tested with three different kernel functions and optimized hyperparameters. The proposed path coefficient-based indices improved yield prediction when used alongside correlation-based indices. ANN and SVR models with PLSR-selected features outperformed the others, achieving the lowest prediction errors. The best-performing configurations were an ANN with a hyperbolic tangent activation function, a learning rate below 0.1 and fewer hidden-layer neurons than input-layer neurons, and an SVR with a radial basis function kernel and optimized hyperparameters. This systematic and comprehensive approach combines the construction of inputs based on information-rich weather indices, PLSR-based feature selection and fine-tuning of ML models. It effectively captures complex nonlinear weather-crop interactions and is crucial for developing robust location-specific potato yield prediction models.
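
The two kinds of weather index named above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not give the index formulas, so the sketch assumes a common construction in which a season's index is the weighted sum of its weekly weather values. The correlation-based weights are taken to be the Pearson correlation of each week with yield, and the path coefficient-based weights are taken to be the direct effects from a standard path analysis (the solution p of R_xx p = r_xy). The function names, the synthetic data and the six-week window are all hypothetical.

```python
import numpy as np


def correlation_weights(weekly, yield_):
    """Pearson correlation between each week's weather value and yield."""
    xc = weekly - weekly.mean(axis=0)
    yc = yield_ - yield_.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (xc * yc[:, None]).sum(axis=0) / denom


def path_weights(weekly, yield_):
    """Direct effects (path coefficients), assumed here to solve R_xx p = r_xy."""
    r_xx = np.corrcoef(weekly, rowvar=False)   # week-to-week correlations
    r_xy = correlation_weights(weekly, yield_)  # week-to-yield correlations
    # The pseudo-inverse keeps the sketch usable if weeks are nearly collinear.
    return np.linalg.pinv(r_xx) @ r_xy


def weather_index(weekly, weights):
    """One index value per year: weighted sum of that year's weekly values."""
    return weekly @ weights


# Synthetic illustration only: 30 years x 6 weeks of one weather variable.
rng = np.random.default_rng(0)
weekly = rng.normal(size=(30, 6))
yield_ = weekly @ np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.5]) + rng.normal(scale=0.3, size=30)

r_w = correlation_weights(weekly, yield_)
p_w = path_weights(weekly, yield_)
z_corr = weather_index(weekly, r_w)  # correlation-weighted index
z_path = weather_index(weekly, p_w)  # path coefficient-weighted index
```

In a workflow like the one described, indices such as `z_corr` and `z_path` would be computed for each weather variable and passed to the feature selection and modelling steps. In practice the weights should be estimated on the training years only, so that information from held-out years does not leak into the inputs.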