How it works

A tour of the architecture, the research it copies, and the difference between a measured result and a self-reported one.

Grant discovery is a fragmentation problem before it is an AI problem. A researcher in climate-adapted agriculture has to check Grants.gov, NSF's portal, USDA's NIFA, DOE, and a long tail of private foundations, each with its own search box, its own vocabulary, and its own idea of what a deadline field looks like. The work is not hard so much as it is scattered, and scattered work is exactly what a well-built agent can compress.

This tool is an independent build of a published system: Zhisheng Tang and Mayank Kejriwal's "A Compound AI Agent for Conversational Grant Discovery", out of USC and GRAIL, presented at the 2026 ACM Conference on Agentic and AI Systems. Their live version runs at grail.page. I rebuilt the architecture to understand it from the inside, and to show that AI can already help bring academic ideas to life in the real world, quickly and easily.

Two layers, loosely coupled

The paper's core idea is a compound system: several components in sequence rather than one model doing everything. There are two.

The first is an aggregation layer. In the paper, LLM-equipped browser agents crawl Grants.gov, DARPA, and foundation websites on a biweekly schedule, pass the raw HTML through a language model with a schema, and extract canonical fields like title, description, URL, and end date into a unified index. By their count the index holds about 11,800 opportunities, and foundations dominate it at 8,696, followed by NSF at 805 and a tail of NIH, DARPA, and DOE. I followed that idea but took a shortcut that happens to be a real improvement: instead of scraping foundations, this tool queries Kindora, which already maintains a corpus from IRS 990 filings and funder sites. Federal opportunities come live from the Grants.gov public API. Both are normalized into one record shape at request time, so the agent never has to know which portal a result came from.
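The normalization step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the Opportunity fields follow the canonical fields named above (title, description, URL, end date), but the raw payload keys (synopsis, link, close_date, program_name, summary, website, deadline) are hypothetical stand-ins and are not the real Grants.gov or Kindora response schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opportunity:
    """Unified record shape built from the paper's canonical fields."""
    title: str
    description: str
    url: str
    end_date: Optional[str]  # ISO date string, or None when no deadline is listed
    source: str              # kept for provenance; the agent does not branch on it


def from_federal(raw: dict) -> Opportunity:
    # Hypothetical payload keys, NOT the real Grants.gov response schema.
    return Opportunity(
        title=raw["title"].strip(),
        description=raw.get("synopsis", "").strip(),
        url=raw["link"],
        end_date=raw.get("close_date") or None,
        source="grants.gov",
    )


def from_foundation(raw: dict) -> Opportunity:
    # Hypothetical payload keys, NOT the real Kindora response schema.
    return Opportunity(
        title=raw["program_name"].strip(),
        description=raw.get("summary", "").strip(),
        url=raw["website"],
        end_date=raw.get("deadline") or None,
        source="kindora",
    )


def normalize(federal: list[dict], foundation: list[dict]) -> list[Opportunity]:
    """Merge both sources into one list of uniformly shaped records."""
    return [from_federal(r) for r in federal] + [from_foundation(r) for r in foundation]
```

Everything downstream only ever sees Opportunity records, which is what lets the agent ignore which portal a result came from.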

The second layer is the query agent: a ReAct-style loop with exactly two tools, search_index and web_search. You can hand it a proposal PDF and it reads the full text as context, pulling out the domain and methods itself rather than asking you to type keywords. Its strategy is fixed and deliberately conservative: search the structured index first, because that data is grounded and verifiable, and only fall back to web search when the index comes up thin or you ask about something posted in the last few days. Every result it shows is tied to a real URL from a tool call. The design exists specifically to stop a language model from doing the thing language models love to do, which is confidently describe a grant program that does not exist.
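The routing policy, stripped of the language model, looks something like the sketch below. It is not the ReAct loop itself, only the fixed index-first rule and the grounding filter described above. The two tools are passed in as plain callables; the threshold MIN_INDEX_HITS, the wants_recent flag, and the result shape (a dict with a "url" key) are my assumptions for illustration, not values from the paper.

```python
from typing import Callable

Tool = Callable[[str], list]  # takes a query, returns a list of result dicts

MIN_INDEX_HITS = 3  # assumed cutoff for "the index came up thin"


def discover(query: str, search_index: Tool, web_search: Tool,
             wants_recent: bool = False) -> tuple:
    """Index first, web search only as a fallback.

    Returns (grounded_results, tools_called) so the caller can see
    which tools ran and in what order.
    """
    calls = ["search_index"]
    results = list(search_index(query))

    # Fall back only when the index is thin or the user wants very recent postings.
    if len(results) < MIN_INDEX_HITS or wants_recent:
        calls.append("web_search")
        results += web_search(query)

    # Grounding rule: keep only results that carry a URL from a tool call,
    # and drop duplicates of the same URL.
    seen = set()
    grounded = []
    for r in results:
        url = r.get("url")
        if url and url not in seen:
            seen.add(url)
            grounded.append(r)
    return grounded, calls
```

Returning the list of tools called alongside the results is a small choice that pays off later: it makes each answer auditable without reading model output.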

One deviation: where the paper serves cached, pre-crawled data, this prototype queries Kindora and Grants.gov live on each request. For a single-instance build, that trades a little latency for zero crawl infrastructure and fresher federal data. If you wanted the paper's exact setup, you would mirror both sources into a Postgres index on a schedule and search that. For a prototype at this scale, I see little reason to.

The dubious 30-to-10 number

The headline claim from the paper is that discovery time drops from 30-45 minutes of manual portal searching to under 10 minutes. It is a genuinely useful framing of the problem, and I use it as a design target. It is also not a benchmark.

The authors are explicit about this. The paper offers no formal evaluation and no controlled comparison. The 30-to-45-minute baseline is described as what a manual search "typically" takes, and the under-10-minutes figure comes from a walkthrough scenario, not a timed study with participants. The same goes for the other numbers you will see quoted: 3,000-plus users, a first token within two seconds, the biweekly refresh. These are deployment observations and product telemetry, reported in good faith, but they are not results in the sense a reviewer would use the word. There is no control group, no task suite, no measure of whether the opportunities surfaced were the right ones.

So the real version is this: the architecture is sound and the efficiency story is plausible, but anyone citing specific time savings as a proven outcome is overreading the paper. AI can arguably make discovery faster, but it's not going to make it better without your very human judgement of the results.

What to measure next

The interesting open question is not speed; it is precision. A system that returns twelve opportunities in eight seconds has done nothing for you if three of them are off-topic and the best fit is missing. The claims that would actually move me are recall and relevance against a labeled set: given a real proposal and an expert's shortlist of programs that fit it, how many does the agent find, and how much of what it returns is noise? None of the deployed systems, this one included, report that yet. Until they do, the efficiency numbers are a story about convenience, not about quality.
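For a single proposal, that measurement is only a few lines. The sketch below assumes each opportunity has a stable identifier (a URL would do) and that an expert has supplied a shortlist for the same proposal; the function name and the example values are illustrative, not drawn from any real evaluation.

```python
def precision_recall(returned: list, shortlist: set) -> tuple:
    """Score one query.

    returned  -- identifiers the agent surfaced for a proposal
    shortlist -- identifiers an expert judged to be a real fit
    """
    returned_set = set(returned)
    hits = returned_set & shortlist
    # Precision: how much of what came back is signal rather than noise.
    precision = len(hits) / len(returned_set) if returned_set else 0.0
    # Recall: how much of the expert's shortlist the agent actually found.
    recall = len(hits) / len(shortlist) if shortlist else 0.0
    return precision, recall


# Made-up example: four results returned, two of them on a three-item shortlist.
p, r = precision_recall(["a", "b", "c", "d"], {"a", "b", "e"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Averaged over a set of real proposals, those two numbers would say something the time-savings figures cannot: whether the results were the right ones.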

That gap is the whole reason to build in public. You can read the code, run it against your own work, and see exactly which tool was called and why before you trust a single result.