Applied AI · public benefits

AI for the public-benefits safety net

Four build patterns (agents, retrieval, fine-tuning, and the Model Context Protocol) applied to one domain: public benefits eligibility. A wrong answer here costs a family its food or medical coverage. That stake is why every project ships a measured evaluation.

The four projects

Each runs standalone and is documented and evaluated on its own. Together they compose into one intake-to-determination pipeline a state agency could run.

MCP

rules-as-code-mcp

Deterministic SNAP eligibility exposed as auditable MCP tools. Every determination returns a result, a rule trace, and a policy citation behind it.
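As a rough illustration of what "a result, a rule trace, and a policy citation" can look like in code, here is a minimal sketch of a single deterministic rule. Everything in it is an assumption made for illustration: the function name, the `Determination` type, the income limits, and the citation string are hypothetical placeholders, not real SNAP policy figures and not this project's actual API. The MCP wiring that would expose the function as a tool is left out.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only; NOT real SNAP policy figures.
HYPOTHETICAL_GROSS_INCOME_LIMIT = {1: 1500, 2: 2000, 3: 2500}
HYPOTHETICAL_CITATION = "EXAMPLE-MANUAL section 0.0 (placeholder citation)"


@dataclass
class Determination:
    eligible: bool
    trace: list = field(default_factory=list)      # ordered rule steps that were evaluated
    citations: list = field(default_factory=list)  # policy source behind each step


def determine_gross_income_test(household_size: int, gross_monthly_income: float) -> Determination:
    """Apply one deterministic rule and record how the result was reached."""
    if household_size not in HYPOTHETICAL_GROSS_INCOME_LIMIT:
        raise ValueError(f"no limit defined for household size {household_size}")
    limit = HYPOTHETICAL_GROSS_INCOME_LIMIT[household_size]
    passed = gross_monthly_income <= limit
    step = (
        f"gross_income_test: income {gross_monthly_income:.2f} "
        f"{'<=' if passed else '>'} limit {limit:.2f} "
        f"for household of {household_size}"
    )
    return Determination(eligible=passed, trace=[step], citations=[HYPOTHETICAL_CITATION])
```

Because the same inputs always produce the same result, trace, and citation, a reviewer can replay any determination and see exactly which comparison decided it.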

100% on 18 labeled cases (decision, rule-trace, citation) · 9/9 robustness cases handled cleanly

RAG

policy-manual-rag

Grounded Q&A over the real Michigan eligibility manual. Every answer is cited to its exact section, and the system refuses when the corpus cannot ground one.
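The cite-or-refuse behavior can be sketched with a toy retriever. This is only an illustration under stated assumptions: the two-section corpus, the section ids, the word-overlap scoring, and the `MIN_OVERLAP` threshold are all invented here, and a real system would use a proper retriever and a threshold tuned on labeled questions.

```python
import re

# Hypothetical mini-corpus: section ids and text are invented placeholders,
# not excerpts from any real manual.
CORPUS = {
    "SEC-100": "Households must report changes in income within ten days.",
    "SEC-200": "Applications may be filed online, by mail, or in person.",
}

MIN_OVERLAP = 2  # assumed refusal threshold; a real system would tune this


def tokens(text: str) -> set:
    """Lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(question: str) -> dict:
    """Return the best-supported section with its citation, or refuse."""
    q = tokens(question)
    best_id, best_score = None, 0
    for section_id, text in CORPUS.items():
        score = len(q & tokens(text))  # toy relevance: shared-word count
        if score > best_score:
            best_id, best_score = section_id, score
    if best_id is None or best_score < MIN_OVERLAP:
        return {"refused": True, "reason": "no section in the corpus supports an answer"}
    return {"refused": False, "text": CORPUS[best_id], "citation": best_id}
```

Asking `answer("How can applications be filed?")` returns the `SEC-200` text with its citation, while an off-corpus question such as `answer("What is the capital of France?")` comes back as a refusal.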

retrieval Hit@4 0.92→1.00, citable 0.97→1.00 · faithfulness 0.98, grounded-refusal 0.88

Agents

benefits-intake-agent

A document-intake triage layer: reads an application and its uploads, screens via the rules core, and flags what needs a human. Never auto-denies.
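One way to make "never auto-denies" structural, so that it does not depend on the model behaving, is to give the triage layer no way to express a denial. The sketch below assumes a hypothetical `triage` function and field names, with a conflict check reduced to one income comparison. It is not the project's actual graph, only an illustration of the idea.

```python
from enum import Enum


class Triage(Enum):
    # Deliberately no DENY member: the triage layer cannot express a denial.
    LIKELY_ELIGIBLE = "likely_eligible"
    NEEDS_HUMAN_REVIEW = "needs_human_review"


def triage(form_income: float, document_income: float, rules_say_eligible: bool,
           tolerance: float = 1.0) -> tuple:
    """Route an application; anything uncertain or unfavorable goes to a person."""
    flags = []
    # Conflict check: the stated income should match the uploaded document.
    if abs(form_income - document_income) > tolerance:
        flags.append("income on the form conflicts with the uploaded document")
    # An unfavorable screen is escalated, never turned into a denial here.
    if not rules_say_eligible:
        flags.append("rules screen was not favorable; a caseworker must decide")
    if flags:
        return Triage.NEEDS_HUMAN_REVIEW, flags
    return Triage.LIKELY_ELIGIBLE, flags
```

With this shape, the worst outcome the automated layer can produce is extra work for a caseworker, and each flag says why the case was escalated.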

0% wrongful wave-through, 0 denials, 100% conflict recall · 100% vision extraction

Fine-tuning

plain-language-notices

A small fine-tune that rewrites bureaucratic notices into plain language while a faithfulness gate proves it kept every operative fact.
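A faithfulness gate of this kind can be approximated by extracting the facts from the original notice and checking that each one survives the rewrite. The sketch below assumes a narrow, hypothetical definition of "operative fact" (dollar amounts and ISO-style dates matched by a regular expression). The project's real gate may define and detect facts quite differently.

```python
import re

# Assumed definition of "operative fact": dollar amounts and YYYY-MM-DD dates.
FACT_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?|\d{4}-\d{2}-\d{2}")


def extract_facts(text: str) -> set:
    """All dollar amounts and dates found in the text."""
    return set(FACT_PATTERN.findall(text))


def faithfulness_gate(original: str, rewritten: str) -> dict:
    """Pass only if every fact found in the original also appears in the rewrite."""
    dropped = extract_facts(original) - extract_facts(rewritten)
    return {"passed": not dropped, "dropped": sorted(dropped)}
```

For example, rewriting "Your allotment of $291.00 will terminate effective 2024-03-01." as "Your benefit ends on 2024-03-01." fails the gate, which reports `$291.00` as dropped. A rewrite that keeps both the amount and the date passes. The notice text here is invented for the example.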

prompting hits the reading target 20% of the time, RAG 100% · faithfulness gate disqualifies any dropped fact

How they compose

The MCP rules core is the deterministic decision layer the others hand the real call to. The agent screens through it. The RAG index sits inside it as a policy-lookup tool, and the notice the agent produces is what the fine-tune makes readable.

application + documents → intake agent → rules-as-code (MCP) → determination → plain-language notice

The thread through all four

Deterministic core, probabilistic edge

The model reads and reasons; the legal determination is made by auditable code. Each project draws that boundary explicitly, in a named graph node or a tool call.

Citations and auditability

Every answer and determination traces to a source document or a cited rule. "The model said so" is not a basis for denying benefits.

Evaluation is the deliverable

Every repo ships a hand-built labeled set and reports real numbers, including how often it fails and how it recovers.
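A labeled-set evaluation can be as small as the harness sketched here, which reports an accuracy figure and also keeps every failing case for inspection. The `evaluate` function, the `toy_screen` stand-in system, and the three labeled cases are hypothetical and exist only to exercise the idea. They are not taken from any of the repos.

```python
def evaluate(system, labeled_cases):
    """Score a system against labeled cases and keep the failures for inspection."""
    failures = []
    for case in labeled_cases:
        predicted = system(case["input"])
        if predicted != case["expected"]:
            failures.append(
                {"input": case["input"], "expected": case["expected"], "got": predicted}
            )
    total = len(labeled_cases)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "accuracy": passed / total if total else 0.0,
        "failures": failures,
    }


# Hypothetical stand-in system and labels, only to exercise the harness.
def toy_screen(income):
    return income <= 2000


CASES = [
    {"input": 1500, "expected": True},
    {"input": 2500, "expected": False},
    # This label deliberately disagrees with toy_screen so a failure is reported.
    {"input": 2000, "expected": False},
]

report = evaluate(toy_screen, CASES)
```

Keeping the failing cases next to the headline number is what lets a report say how the system fails as well as how often.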

Synthetic data only

Every project ships a synthetic data generator, so no real applicant data is needed to run or evaluate it. Handling applicant data correctly is a core public-sector competency.

Design system

Civil

A design system in the USWDS lineage, elevated. It is the shared visual language across these tools.

DESIGN_SYSTEM.md ↗