
Appendix

Sources, terms, and a couple of the technical specifics — for the curious or for anyone who wants to dig deeper. Skip if you're skimming.

Glossary

RMSE: Root mean squared error, the standard accuracy metric for comparing forecasts against truth. Lower is better.
GFS: Global Forecast System, NOAA's operational global weather model. The industry-standard baseline.
ERA5: ECMWF's gold-standard reanalysis dataset. Used as ground truth for validation.
CHIRPS: Climate Hazards Group InfraRed Precipitation with Station data. A blended satellite and station precipitation dataset. Second validation source.
EMOS-NGR: Ensemble Model Output Statistics with Non-homogeneous Gaussian Regression. The post-processing method that combines the three ensemble members and produces calibrated uncertainty.
FCN3: NVIDIA's spherical Fourier neural operator weather model. One of the three ensemble members.
GraphCast: Google DeepMind's graph neural network weather model. Another ensemble member.
Calibrated uncertainty: A forecast distribution whose stated probabilities match observed frequencies. When the model says "80% chance of more than 5 mm," more than 5 mm should fall on about 80% of those occasions.
DMH / DINAC: Dirección de Meteorología e Hidrología, the Paraguayan national met service, which sits inside the civil aviation authority DINAC.
MAG: Ministerio de Agricultura y Ganadería, Paraguay's agriculture ministry.
Itaipú Binacional: The binational (Paraguay + Brazil) hydroelectric dam authority. Operates a substantial gauge network in the Paraná basin, exactly the region where this project's biggest validation gap is.
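
The "calibrated uncertainty" entry can be made concrete with a small reliability check. The sketch below is illustrative only: the function name `hit_rate_at`, the tolerance, and the toy forecast/outcome pairs are assumptions made for the example, not the project's code or data. It collects the cases where the stated probability of exceeding 5 mm was close to 80% and reports how often the event actually occurred.

```python
def hit_rate_at(forecast_probs, outcomes, target=0.8, tol=0.05):
    """Observed event frequency among forecasts whose stated probability is near `target`.

    forecast_probs: stated probabilities of the event (e.g. P(rain > 5 mm)).
    outcomes: booleans, True where the event actually happened.
    Returns None when no forecast falls within `tol` of `target`.
    """
    if len(forecast_probs) != len(outcomes):
        raise ValueError("forecast_probs and outcomes must have the same length")
    selected = [hit for p, hit in zip(forecast_probs, outcomes) if abs(p - target) <= tol]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Toy data (made up for illustration): ten forecasts near 80%, eight of which verified.
probs = [0.80, 0.82, 0.78, 0.80, 0.79, 0.81, 0.80, 0.83, 0.77, 0.80, 0.20, 0.35]
events = [True, True, True, False, True, True, True, True, False, True, False, True]

rate = hit_rate_at(probs, events)
print(f"Observed frequency for ~80% forecasts: {rate:.2f}")
```

With enough cases, a calibrated model would return a value close to 0.80 here. A value well below that would mean the model is overconfident at that probability level.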

Source materials

The validation regions

The Paraguayan domain used throughout:

| Region | Lat range | Lon range | Use |
| --- | --- | --- | --- |
| Full domain | −27.5 to −19.0 | −63.0 to −54.0 | National-level reporting |
| East / soybean belt | −27.0 to −22.0 | −57.0 to −54.0 | Primary commercial relevance |
| Chaco (northwest) | −23.0 to −19.0 | −63.0 to −58.0 | Drier; secondary cattle/cotton region |
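
These bounding boxes translate directly into a cell mask. The sketch below assumes a regular 0.25° grid with both edges included. That spacing is an assumption, chosen because it reproduces the 35 × 37 cell count quoted for the full domain in the stratified results. The `REGIONS` dictionary and the helper names are hypothetical.

```python
# Bounding boxes from the table: (lat_min, lat_max, lon_min, lon_max), in degrees.
REGIONS = {
    "full": (-27.5, -19.0, -63.0, -54.0),
    "east": (-27.0, -22.0, -57.0, -54.0),
    "chaco": (-23.0, -19.0, -63.0, -58.0),
}

STEP = 0.25  # assumed grid spacing in degrees


def in_region(lat, lon, region):
    """True if the cell centre lies inside the named box (edges inclusive)."""
    lat_min, lat_max, lon_min, lon_max = REGIONS[region]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def grid_cells(region="full"):
    """All (lat, lon) cell centres of the assumed grid that fall in a region."""
    lat0, lat1, lon0, lon1 = REGIONS["full"]
    n_lat = int(round((lat1 - lat0) / STEP)) + 1
    n_lon = int(round((lon1 - lon0) / STEP)) + 1
    cells = [(lat0 + i * STEP, lon0 + j * STEP) for i in range(n_lat) for j in range(n_lon)]
    return [(lat, lon) for lat, lon in cells if in_region(lat, lon, region)]


for name in REGIONS:
    print(name, len(grid_cells(name)))
```

Under the 0.25° assumption the full domain comes out at 1,295 cells, which matches 35 × 37. The east and Chaco boxes do not overlap, so a cell belongs to at most one of the two sub-regions.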

Stratified results — the honest table

Per-bin RMSE on the full × ERA5 view (60 dates × 35 × 37 cells = 77,700 observations). This is the table that lives behind the +25.7% headline.

| Truth bin | n obs | Mean truth (mm) | Ensemble RMSE (mm) | GFS RMSE (mm) | Skill vs. GFS |
| --- | --- | --- | --- | --- | --- |
| Dry (<1 mm) | 48,904 | 0.17 | 1.86 | 3.20 | +41.7% |
| Light (1–10 mm) | 20,705 | 3.80 | 4.98 | 9.18 | +45.8% |
| Moderate (10–25 mm) | 6,056 | 15.48 | 10.85 | 14.00 | +22.5% |
| Heavy (>25 mm) | 2,035 | 40.57 | 33.98 | 30.68 | −10.8% |
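
The skill column follows from the two RMSE columns. The sketch below assumes skill is defined as 1 − RMSE_ensemble / RMSE_GFS, the usual RMSE skill score. The source does not spell out the formula, and the function names are illustrative. Because it works from the rounded table values, recomputed figures can differ from the published column in the last digit.

```python
import math


def rmse(predictions, truths):
    """Root mean squared error between paired forecasts and observations."""
    if len(predictions) != len(truths) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths))


def skill_vs_baseline(model_rmse, baseline_rmse):
    """Percentage RMSE improvement over the baseline; negative means worse."""
    return 100.0 * (1.0 - model_rmse / baseline_rmse)


# (bin, n obs, ensemble RMSE, GFS RMSE), copied from the table above.
BINS = [
    ("Dry (<1 mm)", 48904, 1.86, 3.20),
    ("Light (1-10 mm)", 20705, 4.98, 9.18),
    ("Moderate (10-25 mm)", 6056, 10.85, 14.00),
    ("Heavy (>25 mm)", 2035, 33.98, 30.68),
]

for name, n_obs, ens, gfs in BINS:
    print(f"{name:<20} n={n_obs:>6}  skill={skill_vs_baseline(ens, gfs):+.1f}%")

# The bin counts should add up to 60 dates x 35 x 37 cells.
total_obs = sum(n for _, n, _, _ in BINS)
print(total_obs == 60 * 35 * 37)
```

The light, moderate, and heavy bins reproduce the published skill values. The dry bin recomputes to about +41.9% rather than +41.7%, a gap small enough to come from rounding in the two RMSE inputs.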

The pattern is consistent across all four validation views (full × ERA5, east × ERA5, full × CHIRPS, east × CHIRPS). In each of them the ensemble loses to GFS on heavy events, by between 6% and 19%. The heavy-event bias is structural at 25 km cell resolution.

What's running passively right now

Why I'm doing this

Two reasons.

First, I think AI weather models are interesting and the application to underserved regions is more impactful than another marginal improvement on a North American or European benchmark. Paraguay is a real economy with real exposure to weather risk and worse forecasting infrastructure than its neighbors.

Second, I wanted to find out whether one person with a laptop can take a credible technical artifact from "research" to "actually used by a stakeholder" without an institution behind them. The honest answer to that is "I don't know yet — that's what the next 90 days are about."

That's the end of the plan site. Head back to the overview if you want to reread anything, or to how to help if something specific came to mind.