
This is the official repository for the paper “CN-AEBench: Hourly Atmospheric Environmental Dataset for China (Since 2023) by Fusing Station Monitoring Data and ECMWF IFS Data”.
For dataset validity studies (extreme events), visualizations, and benchmark test details, please visit: https://aiweather126.github.io/CN-AEBench/pages
To download the dataset, please visit: https://huggingface.co/datasets/AIWeather126/CN-AEBench
CN-AEBench is a comprehensive multi-source atmospheric & environmental dataset integrating ground meteorological observations, environmental monitoring, and ECMWF NWP forecast data.
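
For scripted downloads, the Hugging Face dataset repository linked above can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the helper name `download_cn_aebench`, the default `local_dir`, and the optional `patterns` filter are illustrative and not part of this repository.

```python
from pathlib import Path

# Repository id taken from the dataset URL above.
REPO_ID = "AIWeather126/CN-AEBench"


def download_cn_aebench(local_dir="data/CN-AEBench", patterns=None):
    """Download the dataset repo (or a filtered subset) and return its local path.

    `patterns` is an optional list of glob patterns; the file layout inside
    the repo is not described here, so any pattern you pass is your own guess.
    """
    # Imported lazily so this module still loads without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",      # required for dataset (non-model) repos
        local_dir=local_dir,
        allow_patterns=patterns,  # None downloads everything
    )
    return Path(path)
```

Calling `download_cn_aebench()` with no arguments would fetch the whole repository, which may be large; passing `patterns` limits the transfer to matching files.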

📊 Data Coverage: 2023 to present (continuously updated)
🌍 Meteorological Stations: 2,250+; Environmental Monitoring Stations: ~2,600
⏱️ Resolution: 1 hour

| No. | Variable Name | Unit | Description |
|---|---|---|---|
| 1 | elevation | m | Station elevation |
| 2 | lon | degree | Station longitude |
| 3 | lat | degree | Station latitude |
| 4 | station_province | – | Province where station is located |
| 5 | station_city | – | City where station is located |
| 6 | station_id | – | Station identifier |
| 7 | type | – | Land use type at station location |
| 8 | ndvi | (-1 ~ 1) | NDVI value at station location |

| No. | Variable Name | Unit | Description |
|---|---|---|---|
| 1 | ws_2min | m/s | 2-minute average wind speed |
| 2 | ws_10min | m/s | 10-minute average wind speed |
| 3 | wd_2min | degree | 2-minute average wind direction |
| 4 | wd_10min | degree | 10-minute average wind direction |
| 5 | wd_instant | degree | Instantaneous wind direction |
| 6 | ws_instant | m/s | Instantaneous wind speed |
| 7 | vis | m | Horizontal visibility |
| 8 | t | °C | Air temperature |
| 9 | dt | °C | Dew point temperature |
| 10 | precipitation | mm | Hourly precipitation |
| 11 | rh | % | Relative humidity |
| 12 | p | hPa | Atmospheric pressure |
| 13 | slp | hPa | Sea level pressure |
| 14 | vapor | hPa | Vapor pressure |
| 15 | phenomena | – | Weather phenomena |
| 16 | ec_vis | m | NWP horizontal visibility |
| 17 | ec_sh2 | kg/kg | NWP 2m specific humidity |
| 18 | ec_t2m | °C | NWP 2m air temperature |
| 19 | ec_d2m | °C | NWP 2m dew point temperature |
| 20 | ec_sp | hPa | NWP surface pressure |
| 21 | ec_msl | hPa | NWP mean sea level pressure |
| 22 | ec_u10 | m/s | NWP 10m u-component of wind |
| 23 | ec_v10 | m/s | NWP 10m v-component of wind |
| 24 | ec_rh | % | NWP relative humidity (diagnostic variable) |
| 25 | ec_ws | m/s | NWP wind speed (diagnostic variable) |
| 26 | ec_wd | degree | NWP wind direction (diagnostic variable) |
| 27 | ec_cbh | m | NWP cloud base height |
| 28 | ec_sf | m of water equivalent | NWP snowfall |
| 29 | ec_blh | m | NWP boundary layer height |
| 30 | ec_fal | (0 ~ 1) | NWP albedo |
| 31 | ec_lcc | (0 ~ 1) | NWP low cloud cover |
| 32 | ec_mcc | (0 ~ 1) | NWP medium cloud cover |
| 33 | ec_hcc | (0 ~ 1) | NWP high cloud cover |
| 34 | ec_tp | m | NWP total precipitation |
| 35 | PM2.5 | μg/m³ | Hourly mean PM2.5 concentration |
| 36 | PM10 | μg/m³ | Hourly mean PM10 concentration |
| 37 | SO2 | μg/m³ | Hourly mean SO2 concentration |
| 38 | NO2 | μg/m³ | Hourly mean NO2 concentration |
| 39 | O3 | μg/m³ | Hourly mean O3 concentration |
| 40 | CO | mg/m³ | Hourly mean CO concentration |
| 41 | AQI | – | Real-time AQI value |
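
The table marks `ec_ws` and `ec_wd` as diagnostic variables. A minimal sketch of how such quantities are commonly derived from the wind components follows, assuming they are computed from `ec_u10` and `ec_v10` under the standard meteorological convention (direction the wind blows from, in degrees clockwise from north). The function name `wind_from_uv` is illustrative, and the dataset's actual derivation may differ.

```python
import math


def wind_from_uv(u, v):
    """Return (speed in m/s, direction in degrees) from u/v wind components.

    Assumed convention: direction is where the wind comes FROM,
    measured clockwise from north (0 = northerly, 90 = easterly).
    """
    speed = math.hypot(u, v)
    # atan2(-u, -v) points back toward the wind's origin; % 360 maps to [0, 360).
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


# A westerly wind blows toward the east: u > 0, v = 0.
speed, direction = wind_from_uv(5.0, 0.0)
print(speed, direction)
```

For calm conditions (`u = v = 0`) the direction is undefined; this sketch returns 0 in that case, which downstream code may want to treat as missing.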

The experiments were conducted on CN-AEBench-L3.
To standardize experimental conditions, the first version of L3 covers the period 2023.09.01–2025.08.31. You may also run your own experiments on this version.
We selected four representative regions across China for comprehensive evaluation:
🏔️ Lanzhou Urban Agglomeration
🌺 Kunming Urban Agglomeration
❄️ Harbin Urban Agglomeration
🌊 Shanghai
Each region is trained and tested independently to ensure robust regional performance.
Overall ratio: Training:Validation:Test = 7:1:2
Validation Set: Days 1-10 of Oct 2023; Jan, Apr, Jul, Sep, and Dec 2024; and Mar and Jun 2025
Test Set: Day 14 through the end of each of those months
Training Set: All remaining data
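
The split rule above can be sketched as a timestamp-to-split function. This is a minimal illustration rather than the repository's own code: the names `HELD_OUT_MONTHS` and `assign_split` are hypothetical, and it takes "all remaining data" literally, so days 11-13 of the held-out months fall into the training set.

```python
from collections import Counter
from datetime import date, timedelta

# (year, month) pairs that contribute validation and test days.
HELD_OUT_MONTHS = {
    (2023, 10),
    (2024, 1), (2024, 4), (2024, 7), (2024, 9), (2024, 12),
    (2025, 3), (2025, 6),
}


def assign_split(ts):
    """Map a date or datetime to 'train', 'val', or 'test'."""
    if (ts.year, ts.month) in HELD_OUT_MONTHS:
        if ts.day <= 10:
            return "val"
        if ts.day >= 14:
            return "test"
    # Assumption: everything else, including days 11-13, is training data.
    return "train"


# Count days per split over the first L3 version (2023.09.01 to 2025.08.31).
start, end = date(2023, 9, 1), date(2025, 8, 31)
n_days = (end - start).days + 1
counts = Counter(assign_split(start + timedelta(days=i)) for i in range(n_days))
print(counts)
```

Counted by day over that period, this rule gives 510 training, 80 validation, and 141 test days out of 731, which is close to the stated 7:1:2 ratio.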
Configuration: 24 steps → 1/4/12/24 steps
Evaluation: Lead times at 1h (nowcasting), 4h, 12h, 24h (short-term), and overall performance
Configuration: 48 steps → 1/24/48/72/96/120 steps
Evaluation: Lead times at 1h (nowcasting), 24h, 48h, 72h, 96h, 120h (extended range), and overall performance
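
To make the input/output configuration concrete, here is a minimal sketch of sliding-window sample construction and per-lead-time scoring. It rests on several assumptions not stated above: a stride of one hour, a model that emits the full forecast horizon in one pass, and mean absolute error as the metric. The names `make_windows` and `mae_by_lead` are illustrative.

```python
def make_windows(series, n_in, n_out):
    """Slice a 1-D hourly sequence into (input, target) pairs with stride 1."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


def mae_by_lead(preds, targets, leads):
    """Mean absolute error at selected lead times (1-based hours) and overall."""
    n = len(targets)
    scores = {}
    for lead in leads:
        errs = (abs(p[lead - 1] - t[lead - 1]) for p, t in zip(preds, targets))
        scores[lead] = sum(errs) / n
    total = sum(abs(a - b) for p, t in zip(preds, targets) for a, b in zip(p, t))
    scores["overall"] = total / (n * len(targets[0]))
    return scores


# Toy series rising by 1 per hour, scored with a persistence baseline
# (repeat the last observed value for every forecast step).
series = list(range(100))
pairs = make_windows(series, n_in=24, n_out=24)
preds = [[x[-1]] * 24 for x, _ in pairs]
targets = [y for _, y in pairs]
scores = mae_by_lead(preds, targets, leads=(1, 4, 12, 24))
print(scores)
```

The same helpers cover the second configuration by passing `n_in=48`, `n_out=120`, and `leads=(1, 24, 48, 72, 96, 120)`.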
The training configuration was as follows: maximum epochs were set to 500 with an early stopping patience of 30 epochs. We employed the Adam optimizer with an initial learning rate of 0.001, coupled with a MultiStepLR scheduler for learning rate decay. The batch size was set to 512. All reported results were obtained by selecting the checkpoint with the lowest validation loss and evaluating it on the held-out test set to ensure unbiased performance assessment.
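
The schedule and checkpoint selection described above can be sketched in framework-free Python. The maximum epochs, patience, and initial learning rate come from the text; the `milestones` and `gamma` values are placeholders, since the actual MultiStepLR settings are not stated here. Both function names are illustrative.

```python
def multistep_lr(epoch, base_lr=1e-3, milestones=(100, 200), gamma=0.1):
    """Learning rate under a multi-step decay schedule.

    The rate is multiplied by `gamma` once for each milestone reached.
    `milestones` and `gamma` are hypothetical values, not the benchmark's.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def select_checkpoint(val_losses, max_epochs=500, patience=30):
    """Return (epoch, loss) of the lowest validation loss seen before stopping.

    Training stops once `patience` epochs pass without a new best loss,
    or after `max_epochs`, whichever comes first.
    """
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if best_epoch is None or loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # early stopping triggered
    return best_epoch, best_loss


# Validation loss improves until epoch 1, then plateaus.
history = [1.0, 0.5] + [0.6] * 100
print(select_checkpoint(history))
print(multistep_lr(0), multistep_lr(150))
```

In a real run, the checkpoint saved at the returned epoch is the one that would be evaluated on the held-out test set.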
All experiments were conducted on a single server equipped with 4× NVIDIA A100 GPUs, an Intel Xeon Ice Lake processor (32 cores, 64 threads at 2.6GHz), and 188GB RAM. The system runs Ubuntu Linux with CUDA 11.7 and PyTorch 1.13.
Experiment results: https://aiweather126.github.io/CN-AEBench/pages
CN-AEBench L3 data is specifically designed for building end-to-end intelligent forecasting models and is currently at version 1.0.0.
To ensure benchmark stability and comparability of research results, we release new versions only when significant improvements are made to accommodate new weather and environmental changes, with clear version numbering.
Data for timeliness (near-real-time) tasks are uploaded automatically to the Hugging Face repository twice daily, at around 12:30 and 21:30. Please note that, due to copyright restrictions on the raw data, the uploaded data lag by approximately 2 days.
The raw multi-source data are obtained from provincial-level platforms including https://sthj.sh.gov.cn/, https://sthjt.jiangsu.gov.cn/, https://aqi.zjemc.org.cn/, https://sthjt.ah.gov.cn/site/tpl/5371, https://nmg.merryai.com:33864/publish/#/realtime, http://hjzlxxfb.sthjt.jiangxi.gov.cn:9317/eipp/, https://sthjj.zhengzhou.gov.cn/air24.jhtml, https://sthjt.henan.gov.cn/, https://fjaqi.fjemc.org.cn/fb/, … Additional data sources will be monitored and integrated on an ongoing basis.