The project
Paraguay's agricultural sector — soybean especially — runs on weather. A drought or a flash flood at the wrong time wipes out a harvest. Existing forecasts are not very good in this part of South America. Arawave is an attempt to do better.
The problem in one paragraph
Paraguay is the world's 4th-largest soybean exporter. Most production is concentrated in the eastern part of the country — Itapúa, Alto Paraná, Caaguazú, Canindeyú departments. Farmers and cooperatives make planting, spraying, and harvest decisions weeks in advance, and those decisions are sensitive to whether it's going to rain. The operational global weather model that everyone defaults to (NOAA's GFS) doesn't perform especially well over this region. There's headroom for a better product.
What I built
A three-model AI ensemble. The members are:
- FCN3 — NVIDIA's spherical Fourier neural operator weather model.
- GraphCast — Google DeepMind's graph neural network weather model.
- GFS — the standard NOAA physical model, kept as a baseline + ensemble member.
The three models each forecast the next 10 days at 6-hour intervals. A statistical post-processing layer (EMOS-NGR: ensemble model output statistics via non-homogeneous Gaussian regression) combines them and adds a calibrated uncertainty estimate. So instead of "it will rain 12 mm on Tuesday," the output is "Tuesday's rainfall is most likely 12 mm, with an 80% chance of being between 6 and 22 mm." That probabilistic framing is what farmers actually need to make a decision.
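The post-processing step can be sketched like this. It's a toy illustration of NGR fitted by minimizing the closed-form Gaussian CRPS, not Arawave's actual implementation: the synthetic data, coefficient names, and optimizer settings are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy training set: three ensemble members and observed precipitation (mm).
n = 500
truth = rng.gamma(2.0, 3.0, n)
members = np.stack([truth + rng.normal(b, 2.0, n) for b in (-1.0, 0.5, 1.5)])
ens_vars = members.var(axis=0)  # ensemble spread per case

def crps_gauss(y, mu, sigma):
    # Closed-form CRPS for a Gaussian predictive distribution.
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def loss(params):
    # NGR: predictive mean is linear in the members,
    # predictive variance is linear in the ensemble spread.
    a, b1, b2, b3, c, d = params
    mu = a + b1 * members[0] + b2 * members[1] + b3 * members[2]
    sigma = np.sqrt(np.maximum(c + d * ens_vars, 1e-6))
    return crps_gauss(truth, mu, sigma).mean()

res = minimize(loss, x0=[0, 1 / 3, 1 / 3, 1 / 3, 1.0, 1.0], method="Nelder-Mead",
               options={"maxiter": 5000})
a, b1, b2, b3, c, d = res.x

# Forecast for one new case: hypothetical member values of 10, 12, 14 mm.
f = np.array([10.0, 12.0, 14.0])
mu = a + b1 * f[0] + b2 * f[1] + b3 * f[2]
sigma = np.sqrt(max(c + d * f.var(), 1e-6))
lo, hi = norm.ppf([0.10, 0.90], loc=mu, scale=sigma)  # central 80% interval
print(f"most likely {mu:.1f} mm, 80% chance between {lo:.1f} and {hi:.1f} mm")
```

Minimizing CRPS rather than RMSE is what makes the output calibrated: the fit is rewarded for getting the whole distribution right, not just the point estimate.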
Why the three-model approach works
Each model has different biases — FCN3 tends to under-predict heavy events, GraphCast tends to be smoother than reality, GFS handles convective events differently than either. Combining them with weights that account for those biases produces a forecast that's better than any single member.
The number
+25.7% RMSE improvement over raw GFS
Validated on 60 forecast dates against ERA5 reanalysis (the industry-standard ground truth) and CHIRPS satellite-based precipitation, over a 25 km grid covering the Paraguayan soybean belt. The improvement holds on both validation sources and on both regional and sub-regional cuts.
The honest caveat
The +25.7% headline lives in the dry, light, and moderate precipitation bins, which are about 90% of the data. On heavy events (more than 25 mm in a day), the model actually loses to GFS by 6–19% across all four validation views.
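The per-bin evaluation behind this caveat can be sketched as a small helper. The 25 mm heavy threshold comes from the text; the other bin edges, and the function itself, are assumptions for illustration.

```python
import numpy as np

def binned_rmse_improvement(truth, model, gfs):
    """Per-bin RMSE improvement (%) over GFS, binned by observed daily rain.

    Positive values mean the ensemble beats GFS in that bin;
    negative values mean it loses (as in the heavy bin).
    """
    bins = {
        "dry (<1 mm)": truth < 1,
        "light (1-10 mm)": (truth >= 1) & (truth < 10),
        "moderate (10-25 mm)": (truth >= 10) & (truth < 25),
        "heavy (>=25 mm)": truth >= 25,
    }
    out = {}
    for name, mask in bins.items():
        if not mask.any():
            continue
        rmse_model = np.sqrt(np.mean((model[mask] - truth[mask]) ** 2))
        rmse_gfs = np.sqrt(np.mean((gfs[mask] - truth[mask]) ** 2))
        out[name] = 100 * (rmse_gfs - rmse_model) / rmse_gfs
    return out
```

Binning by the *observed* amount, as here, is the standard way to surface exactly this failure mode: an aggregate +25.7% can coexist with a loss in the rare heavy bin, because the dry and light bins dominate the sample.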
This is structural. A 25 km grid cell is bigger than most thunderstorm cells, so the model averages over what's really a small intense feature. Fixing this requires either denser ground-truth data (gauges) or a fundamentally different architecture (kilometer-scale diffusion models, which are 100× more expensive).
The system is best described as a regional aggregate forecaster and dry-day false-alarm fixer at scale — not a heavy-event predictor. That's an honest framing. It's still a useful product for most of what cooperatives need to plan around.
What the demo shows
The technical demo at stellar-pika-fa02e6.netlify.app has:
- The headline number with confidence intervals.
- Per-event maps for five showcase events (heavy rain, drought, moderate, ties), showing forecast vs. truth side by side.
- Calibrated probability distributions per grid cell.
- The honest caveats spelled out: heavy-bin loss, point-station gap, sample size.
What it cost to build
About $120 in total compute on Modal over two months of experimentation — $60 out of my own pocket plus $60 from Modal's monthly free-tier credits ($30/month × 2 months). The compute is cheap; the work is in the methodology, the validation, and the post-processing. Solo project, self-funded.
Next: where we are. Honestly, the technical work is done; the relational work hasn't started.