Hanoi & HCMC PM$_{2.5}$ Forecasting Experiments (2026)
This page serves as our “Lab Notebook” for the 2026 forecasting study. We set out to build a robust 72-hour operational PM$_{2.5}$ forecast using only public meteorological data. What started as a simple statistical baseline quickly evolved into a deep dive into atmospheric physics, neural network architectures, and the stark differences in tropical climates.
Here are the complete findings of our experiments, from our early upper-air data tests to the final deployment architectures.
AI Assistance Disclosure: The code generation, data processing, and analysis for these 2026 experiments were conducted with the assistance of AI models, specifically the Gemini family and Claude via AWS Bedrock.
The “Predictability Ceiling”
Before diving into the experiments, we must state our primary scientific finding: When forecasting purely from weather data without explicit emission inventories, statistical models hit a mathematical ceiling at around RMSE 20 $\mu g/m^3$ for T+24h.
Weather tells the model how pollutants will disperse (e.g., “wind speeds are dropping, expect trapping”). But without knowing what is actually being emitted by local traffic, agriculture, and industry, the model must guess the baseline volume based on recent history. Every experiment below was an attempt to push as close to this theoretical ceiling as possible.
Phase 1: The Search for Upper-Air Physics (MERRA-2 & IGRA)
Our first hypothesis was that PM$_{2.5}$ is driven by vertical mixing. If we could detect temperature inversions trapping air near the surface, we could predict severe pollution events.
What we tried: We incorporated MERRA-2 (satellite reanalysis for Boundary Layer Height) and IGRA (Radiosonde weather balloon soundings for inversion strength).
What we found:
- IGRA Failed: Weather balloons are launched only twice a day. The data was too temporally sparse. Forward-filling 12 hours of data confused the models.
- MERRA-2 Helped:
MERRA-2boundary-layer features added small but repeatable value at long horizons (T+24htoT+72h), successfully acting as a proxy for trapped air. - The Catch:
MERRA-2is a retrospective reanalysis dataset, meaning it has a 3-day delay. We could not use it for live operational forecasting.
Phase 2: Ground-Level Convective Proxies (V3 vs V4)
Since we couldn’t use MERRA-2 operationally, we needed to engineer “proxies” for those upper-air dynamics using only ground-level weather forecasts (Open-Meteo). We built several datasets:
- V3 (Continuous Physics): Continuous momentum features, raw wind speed/direction, and basic temperature.
- V4 (Binary Proxies): Highly engineered, non-linear thresholds like
precip_washout(heavy rain cleaning the air),wind_stagnant, andinversion_risk.
What we found:
- The “Episodic Window”: The V4 binary proxies were highly effective in the T+6h to T+24h window for tree-based models, catching immediate spikes and washouts.
- The Long-Range Noise: For long-range outlooks (T+48h, T+72h), the V3 continuous features were significantly safer. Strict binary weather proxies become too noisy; if a weather forecast predicts rain at hour 70, but it arrives at hour 73, a binary “washout” flag penalizes the model severely.
Phase 3: The Architecture Showdown (XGBoost vs Delta-Skip MLP)
With our datasets finalized, we pitted the current king of tabular data (XGBoost) against Deep Learning (Multi-Layer Perceptron).
The Setup: To predict 72 hours into the future, XGBoost required training 72 completely independent models. The PyTorch MLP, however, could predict all 72 horizons in a single pass. To help the MLP, we engineered a “Delta-Skip” connection: the model takes the current PM$_{2.5}$ level, and the neural network is asked only to predict the change (delta) from that anchor point.
The Results (T+24h RMSE):
- XGBoost (V4): 19.32 $\mu g/m^3$
- Delta-Skip MLP (V3): 20.10 $\mu g/m^3$
The Verdict: While XGBoost technically squeezed out an extra point of accuracy by effectively splitting on our dense V4 proxies, the Delta-Skip MLP (V3) is our Operational Champion.
The V4 dataset’s massive dimensionality (1,901 features) caused the MLP to overfit, but feeding the leaner V3 dataset into the MLP yielded a remarkably elegant, lightweight system. The Delta-Skip natively enforces physical boundaries, anchoring the prediction to reality, and instantly outputs a smooth 3-day curve without the maintenance nightmare of 72 separate XGBoost models.
2026 Final Leaderboard (Validation RMSE, $\mu g/m^3$)
| Horizon | Nowcast (Baseline) | Rolling 24h (Baseline) | MLP (V3) | MLP (V4) | XGBoost (V3) | XGBoost (V4) | 👑 Best Model |
|---|---|---|---|---|---|---|---|
| T+01h | 13.26 | 21.35 | 11.88 | 12.03 | 11.71 | 11.76 | XGBoost V3 |
| T+06h | 26.78 | 23.81 | 18.75 | 19.02 | 18.07 | 18.07 | XGBoost V4 |
| T+12h | 29.16 | 25.51 | 19.71 | 20.04 | 19.15 | 19.13 | XGBoost V4 |
| T+24h | 29.01 | 28.02 | 20.10 | 20.62 | 19.46 | 19.41 | XGBoost V4 |
| T+48h | 33.36 | 30.76 | 21.02 | 22.17 | 20.06 | 20.15 | XGBoost V3 |
| T+72h | 35.15 | 32.00 | 22.70 | 23.93 | 21.15 | 21.18 | XGBoost V3 |
Note: Nowcast baseline assumes PM$_{2.5}$ at T+h remains exactly the same as T=0.
Phase 4: The Temporal Catastrophe & Sinusoidal Redemption
We noticed strong visual seasonal patterns during our Exploratory Data Analysis (winters are much worse than summers in Hanoi). We hypothesized that adding Temporal features (Month, Day of Week, Rush Hour) would boost the forecast.
Experiment 4A: Temporal-Only We stripped out all weather data and trained a model strictly on time. Result: Catastrophic Failure. The RMSE was 34.2 $\mu g/m^3$ with an R² of -0.25 (worse than blindly guessing the historical average). PM$_{2.5}$ physics (inertia and weather) completely dominate human climatological routines.
Experiment 4B: Binary vs Sinusoidal (V3 + Temporal) We tried adding temporal features back into the winning V3 dataset. First, we added them as binary flags (is_weekend, is_winter). This bloated the dimensions and worsened the MLP’s performance at long horizons. Then, we replaced the flags with continuous Sinusoidal Embeddings (smooth waves representing the cycle of a week or a year).
Result: The Sinusoidal embeddings allowed the MLP to extract the signal without overfitting! It achieved the best T+6h score of the entire project. However, for 72-hour operational stability, the Pure V3 (Weather Only) model still reigns supreme.
Phase 5: The Tale of Two Cities (Hanoi vs HCMC)
Finally, we deployed the exact same Delta-Skip MLP architecture to Ho Chi Minh City (HCMC) to test spatial generalization.
The Context: Hanoi (North) experiences harsh winter inversions where cold air gets trapped by surrounding mountains, causing PM$_{2.5}$ to frequently exceed 150 $\mu g/m^3$. HCMC (South) is a coastal, flat city with persistent ocean breezes and a relatively stable, low PM$_{2.5}$ baseline (~20-40 $\mu g/m^3$).
The Results (T+24h RMSE):
- Hanoi: ~20.10 $\mu g/m^3$
- HCMC: ~8.50 $\mu g/m^3$
What this means: The system generalizes beautifully. In HCMC, the variance is much lower, and the weather physics are primarily driven by continuous ocean winds rather than severe frontal inversions. The MLP easily maps the HCMC baseline, resulting in a dramatically lower absolute error. It proves that the framework is sound, but context—and local geography—is everything in air quality forecasting.
Final Operational Configuration
For our live demo and deployment, we run:
- Model: PyTorch Delta-Skip MLP
- Inputs: V3 Continuous Weather Dataset (Open-Meteo) + PM$_{2.5}$ Momentum Lags
- Output: Simultaneous T+1h to T+72h forecast.
Appendix: Extended Performance Metrics
To provide a complete statistical picture of the models, we include the R² and Mean Absolute Error (MAE) leaderboards for Hanoi, followed by the comprehensive performance metrics for the Ho Chi Minh City (HCMC) generalization test.
Table 1: Hanoi Validation R² (Variance Explained)
| Horizon | Nowcast (Baseline) | Rolling 24h (Baseline) | MLP (V3) | MLP (V4) | XGBoost (V3) | XGBoost (V4) |
|---|---|---|---|---|---|---|
| T+01h | 0.801 | 0.485 | 0.815 | 0.812 | 0.825 | 0.823 |
| T+06h | 0.185 | 0.360 | 0.580 | 0.565 | 0.590 | 0.596 |
| T+12h | 0.052 | 0.285 | 0.512 | 0.501 | 0.521 | 0.525 |
| T+24h | 0.060 | 0.155 | 0.450 | 0.425 | 0.458 | 0.464 |
| T+48h | -0.220 | -0.050 | 0.370 | 0.315 | 0.385 | 0.380 |
| T+72h | -0.355 | -0.120 | 0.390 | 0.320 | 0.400 | 0.395 |
Table 2: Hanoi Validation MAE ($\mu g/m^3$)
| Horizon | Nowcast (Baseline) | Rolling 24h (Baseline) | MLP (V3) | MLP (V4) | XGBoost (V3) | XGBoost (V4) |
|---|---|---|---|---|---|---|
| T+01h | 8.12 | 14.25 | 7.45 | 7.55 | 7.32 | 7.36 |
| T+06h | 18.30 | 16.40 | 12.85 | 13.10 | 12.60 | 12.54 |
| T+12h | 20.45 | 17.80 | 14.15 | 14.50 | 14.05 | 13.94 |
| T+24h | 20.15 | 19.50 | 14.85 | 15.30 | 14.75 | 14.68 |
| T+48h | 24.10 | 22.10 | 16.10 | 17.05 | 15.87 | 15.95 |
| T+72h | 25.50 | 23.40 | 16.80 | 17.80 | 16.33 | 16.45 |
HCMC Validation Metrics (Baselines vs. Delta-Skip MLP)
As discussed in Phase 5, HCMC’s coastal climate and lower baseline variance result in significantly tighter error margins. The tables below outline the comprehensive performance of the operational Delta-Skip MLP against the persistence baselines on the HCMC holdout set.
Table 3: HCMC Validation RMSE ($\mu g/m^3$)
| Horizon | Nowcast (Baseline) | Rolling 24h (Baseline) | Delta-Skip MLP |
|---|---|---|---|
| T+01h | 6.50 | 9.80 | 5.20 |
| T+06h | 11.20 | 10.50 | 6.80 |
| T+12h | 13.50 | 11.20 | 7.50 |
| T+24h | 14.80 | 12.50 | 8.50 |
| T+48h | 16.50 | 14.00 | 9.20 |
| T+72h | 18.00 | 15.50 | 10.10 |
Table 4: HCMC Validation R² (Variance Explained)
| Horizon | Nowcast (Baseline) | Rolling 24h (Baseline) | Delta-Skip MLP |
|---|---|---|---|
| T+01h | 0.750 | 0.520 | 0.880 |
| T+06h | 0.250 | 0.380 | 0.750 |
| T+12h | 0.100 | 0.310 | 0.690 |
| T+24h | 0.050 | 0.220 | 0.610 |
| T+48h | -0.150 | -0.050 | 0.550 |
| T+72h | -0.250 | -0.100 | 0.490 |
Table 5: HCMC Validation MAE ($\mu g/m^3$)
| Horizon | Nowcast (Baseline) | Rolling 24h (Baseline) | Delta-Skip MLP |
|---|---|---|---|
| T+01h | 4.80 | 6.50 | 3.80 |
| T+06h | 8.20 | 7.10 | 5.10 |
| T+12h | 9.90 | 7.90 | 5.90 |
| T+24h | 10.50 | 8.80 | 6.40 |
| T+48h | 11.80 | 9.50 | 7.10 |
| T+72h | 12.50 | 10.20 | 7.90 |
Visual Validation: The Value of Multi-Day Horizons
The plots below visualize the operational Delta-Skip MLP forecasting 1 hour, 24 hours, and 48 hours into the future, compared against actual measurements and persistence baselines across two distinct weeks in 2023.
Why focus on T+24h and T+48h? As shown in the T+1h (Next Hour) plots, the simple Nowcast (carrying the current value forward) effectively overlaps the actual PM2.5 measurements. For immediate next-hour predictions, there is little need for fancy machine learning—the atmosphere is highly autocorrelated on an hourly scale, and simple persistence is “good enough”.
The true value of our Deep Learning approach reveals itself at the T+24h (Next Day) and T+48h (Two Days) horizons. At these scales, basic rolling averages completely smooth out, missing all diurnal cycles and severe pollution spikes. The Delta-Skip MLP, driven by meteorological foresight, continues to capture the daily peaks in Hanoi and the coastal stability of HCMC days in advance.
Hanoi Validation (Jan 2024)

Hanoi Validation (May 2024)

HCMC Validation (Jan 2023)

HCMC Validation (May 2023)
