2 of 8 · ~2 min read

Where we are — honestly

Two scoreboards run in parallel on a project like this. One is technical; one is relational. The technical board passed. The relational board hasn't started.

Technical: passed

The original kill metric was +15% improvement over GFS on precipitation forecasts at 72-hour lead time in the soybean belt. The current model passes it at +25.7% with confidence intervals that don't cross zero. There is a honest "heavy events" caveat, which is documented openly. The demo is live and reviewable. A technically literate reader can read the demo and form their own judgement in 10 minutes.

Relational: not yet

Zero Paraguayan institutions have evaluated this. Zero cooperatives have used it for a decision. The DMH (Paraguay's national met service) doesn't know it exists. Itaipú Binacional (the dam authority that operates the gauge network we'd want for tighter validation) doesn't know either. The Ministry of Agriculture doesn't know. No newspaper has covered it. No academic has cited it.

That's not a complaint. It's the actual state. The technical work happened first because that's what one person with a computer can do alone. The relational work is what comes next, and it's not something one person with a computer can do alone — especially when that person doesn't speak Spanish and doesn't live in Paraguay.

What changed when I realized this

For most of the last two months I was optimizing the model. There was always one more thing — better calibration, more dates, another ensemble member, a fancier post-processing scheme. Each one promised a few more percentage points. Some of them delivered.

A few weeks ago I noticed the percentage points were no longer the bottleneck. Even at +25.7%, no Paraguayan stakeholder was going to look at it on their own. The bottleneck was that nobody knew the project existed, and the people who could change that — particularly my Paraguayan collaborator — were not consistently following through on the introductions and calls.

The reframe

The kill metric isn't another percentage point. It's one Paraguayan stakeholder seriously evaluating this product. That's a relational outcome, not a technical one. It changes the priority order for everything else.

When the kill metric is "improve RMSE by another point," the work is familiar — write code, run experiments, look at numbers. When the kill metric is "get a cooperative to actually look at this," the work is unfamiliar to me — make calls in Spanish, navigate Paraguayan ministries, build trust over months, accept that file movement is slow.

I can't do that work directly. So I'm hiring someone who can.

What I have (concretely)

What I need

Next: the plan — three parallel tracks for the next 90 days.