Soil moisture governs surface evaporation, runoff, and the exchange of energy between land and atmosphere. During droughts, soil moisture stays persistently low, and before heavy rainfall, the initial soil water content directly influences whether floods form. Despite its importance, soil moisture is difficult to observe: ground-based monitoring stations are sparse and unevenly distributed, satellite remote sensing is susceptible to cloud interference, and numerical weather models are computationally expensive and prone to systematic biases.
The new dataset, named CSMX and described in a study published in the journal Advances in Atmospheric Sciences, enables daily monitoring of soil dryness and wetness across China. It provides critical support for drought early warning, flood forecasting, and agricultural management.
The team trained a CatBoost machine learning model on daily data from more than 2,300 automated soil moisture observation stations operated by the China Meteorological Administration (CMA). The modeling approach also incorporated feature selection and automated hyperparameter optimization to improve accuracy and generalizability.
In benchmark comparisons, CSMX showed smaller biases than the vast majority of existing soil moisture products. Most notably, it substantially reduces the long-standing "wet bias" in reanalysis data, a systematic overestimation of soil moisture that has been especially pronounced in southern China.
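The article does not say exactly how bias was measured in the study. As a minimal sketch, assuming the common definition of mean bias (estimate minus in-situ observation, averaged over days), a "wet bias" appears as a positive mean bias. The helper names `mean_bias` and `rmse` are hypothetical, and the soil moisture values are invented for illustration; they are not taken from CSMX or from any reanalysis product.

```python
import math

# Hypothetical helpers for illustration only; not code from the CSMX study.


def mean_bias(estimates, observations):
    """Average of (estimate - observation); a positive value indicates a wet bias."""
    if not estimates or len(estimates) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(e - o for e, o in zip(estimates, observations)) / len(estimates)


def rmse(estimates, observations):
    """Root-mean-square error between estimates and observations."""
    if not estimates or len(estimates) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - o) ** 2 for e, o in zip(estimates, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Invented daily volumetric soil moisture (m3/m3) at a single station.
observed = [0.20, 0.22, 0.25, 0.18]
reanalysis = [0.28, 0.30, 0.31, 0.27]  # systematically too wet
fused = [0.21, 0.21, 0.26, 0.19]  # closer to the station record

for name, series in [("reanalysis", reanalysis), ("fused", fused)]:
    print(f"{name}: bias={mean_bias(series, observed):+.4f}, "
          f"rmse={rmse(series, observed):.4f}")
```

Mean bias isolates the systematic offset, while RMSE also captures random day-to-day error, so a product can have near-zero bias and still a sizable RMSE. Comparisons like the one described above typically report both.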
"Our model significantly reduces soil moisture estimation errors while preserving the temporal evolution characteristics of soil humidity," said Prof. Huiling Yuan, the corresponding author of the study.
The dataset has been made publicly available through the Tibetan Plateau Data Center. The research team identifies three primary application domains. In flood forecasting, CSMX provides more accurate antecedent soil moisture conditions for hydrological models, improving predictions of how saturated soils will respond to incoming precipitation. In land-atmosphere interaction research, the dataset supports improved simulation of land surface processes. For agricultural drought monitoring, it enables early identification of drought risks affecting crops before they become critical.
"This dataset is particularly well-suited for capturing extreme events such as 'rapid transitions between droughts and floods'," said Yifan Dong, a PhD candidate and the lead author of the study, highlighting the operational value of daily temporal resolution at fine spatial scales.
The fusion framework integrates ground station observations with satellite retrievals and reanalysis fields to produce a spatially continuous, temporally consistent product at 1 km resolution across all of China.
Research Report: China's 1 km Daily Surface Soil Moisture Fusion Dataset (2000-2025) Based on Explainable Machine Learning
Related Links
Institute of Atmospheric Physics, Chinese Academy of Sciences