Abstract: Accurate acquisition of soil moisture dynamics is crucial for hydrological process modeling, precision agriculture, and climate change impact studies. This study aimed to predict soil water content at a depth of 5 cm. A multi-dimensional feature dataset was constructed from high-frequency in-situ monitoring data of layered soil temperature at depths of 0, 1, 3, 5, and 10 cm, together with surface net radiation, by extracting physical features such as phase, amplitude, and daily temperature range through harmonic and statistical analysis. The predictive performance of six machine learning models was systematically compared: Ridge Regression, Lasso Regression, Support Vector Machine, Random Forest, Gradient Boosting Decision Tree, and eXtreme Gradient Boosting (XGBoost). Model generalization was evaluated by dividing the dataset into time-series segments. The results indicated that the XGBoost model performed relatively well, achieving a coefficient of determination (R²) of 0.565 on the independent test set and outperforming the linear models and the Support Vector Machine. Further comparison of input datasets revealed that the dataset of daily-scale feature parameters extracted from physical processes yielded significantly higher prediction accuracy than the dataset of half-hourly observations, indicating that feature extraction can effectively filter noise and focus on the core mechanism of hydrothermal coupling. Feature importance analysis revealed that thermal dynamic features from the surface soil layers (0-3 cm) were key drivers for the model, with higher importance than those from the target layer (5 cm), confirming the vertical transmission pattern of soil hydrothermal coupling.
Models relying solely on temporal features failed completely, demonstrating that the seasonal background provided by time information must work in synergy with physical features reflecting diurnal fluctuations to achieve effective prediction. The proposed combination of physical feature extraction and XGBoost enhances the generalizability and interpretability of soil moisture time series prediction and provides a methodological reference for deriving shallow soil moisture from easily observable physical quantities.
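
As an illustration of the kind of daily-scale feature extraction described above, the following minimal sketch fits a single diurnal harmonic to one day of half-hourly temperature samples and derives amplitude, phase, and daily temperature range, then splits days chronologically. It is not the study's implementation: the function names, the use of only the first harmonic, the 48-samples-per-day layout, and the 20% test fraction are all assumptions made for illustration.

```python
import numpy as np


def daily_harmonic_features(temps, samples_per_day=48):
    """Fit T(t) = mean + A*cos(2*pi*t - phase) to one day of samples.

    Hypothetical helper: assumes evenly spaced samples covering exactly
    one day (48 half-hourly values by default) and a single harmonic.
    """
    temps = np.asarray(temps, dtype=float)
    if temps.shape != (samples_per_day,):
        raise ValueError("expected exactly one day of samples")

    # Time expressed as a fraction of the day, so one cycle spans [0, 1).
    t = np.arange(samples_per_day) / samples_per_day
    omega = 2.0 * np.pi

    # Least-squares fit of mean + a*cos(wt) + b*sin(wt).
    design = np.column_stack(
        [np.ones_like(t), np.cos(omega * t), np.sin(omega * t)]
    )
    mean, a, b = np.linalg.lstsq(design, temps, rcond=None)[0]

    return {
        "mean": float(mean),
        "amplitude": float(np.hypot(a, b)),
        # Phase in radians; the fitted peak occurs at phase / (2*pi) of a day.
        "phase": float(np.arctan2(b, a)),
        "daily_range": float(temps.max() - temps.min()),
    }


def chronological_split(n_days, test_fraction=0.2):
    """Return (train_idx, test_idx) with the test days strictly later in time.

    The test fraction is an assumed value, not one reported by the study.
    """
    cut = int(round(n_days * (1.0 - test_fraction)))
    return np.arange(cut), np.arange(cut, n_days)
```

Applying `daily_harmonic_features` to each depth and each day would yield one feature row per day, which could then be passed to any of the regressors named above. Keeping the test days later than all training days avoids leaking future information into training.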