Methodology

    How we collect, normalize, and verify program data

    Where the numbers come from, how we keep them fresh, and what we don't claim to know.

    Source extraction

    For each program, we identify the authoritative pages on the school's own website - typically the program's own page, the graduate catalog, the tuition page, and any program-specific FAQs. We use the Exa search API to surface these pages, then extract structured fields from them. Every program record stores the list of source URLs we read; you can see them at the bottom of any program detail page.

    We do not use unofficial aggregators, blog posts, or LLM-generated summaries as the source of record. If a school's own pages are contradictory or out of date, we reflect that on the program page rather than picking the version that sounds best.

    Normalization

    Schools publish the same information in wildly different formats. One says "tuition: $1,940/credit"; another says "Total cost of attendance: $71,266 over 12 months." One school's GPA minimum is a number; another's is "B+ average." We normalize every field into common units (total tuition in USD, GPA on a 4.0 scale, months to completion, deadlines as ISO dates) so two programs can actually be compared side by side.

    Where a school's stated value is ambiguous (e.g., a GPA expressed as a percentage with no scale), we keep the original text in a public field rather than guessing at a number.
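    As an illustration only, the normalization rules above could be sketched as follows. The function names, the letter-grade table, and the 30-credit figure are assumptions made for this example, not a description of our actual pipeline.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumption: a common US letter-grade conversion, used here only for illustration.
LETTER_GRADE_POINTS = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7}


@dataclass
class NormalizedGpa:
    value: Optional[float]  # None when the stated value is ambiguous
    original_text: str      # always kept, so the school's wording stays visible


def normalize_gpa(text: str) -> NormalizedGpa:
    """Map a stated GPA minimum onto a 4.0 scale, or keep only the text."""
    cleaned = text.strip()
    # Letter grades such as "B+" or "B+ average".
    letter = re.fullmatch(r"([A-D][+-]?)(?:\s+average)?", cleaned)
    if letter and letter.group(1) in LETTER_GRADE_POINTS:
        return NormalizedGpa(LETTER_GRADE_POINTS[letter.group(1)], text)
    # Plain numbers such as "3.0" or "3.0/4.0".
    # Assumption: a bare number no larger than 4.0 is already on a 4.0 scale.
    numeric = re.fullmatch(r"(\d(?:\.\d+)?)(?:\s*/\s*4(?:\.0)?)?", cleaned)
    if numeric and float(numeric.group(1)) <= 4.0:
        return NormalizedGpa(float(numeric.group(1)), text)
    # Anything else (e.g. "85%" with no scale) is ambiguous: do not guess.
    return NormalizedGpa(None, text)


def parse_usd(text: str) -> Optional[float]:
    """Extract the first dollar amount from text like 'tuition: $1,940/credit'."""
    match = re.search(r"\$\s*([\d,]+(?:\.\d+)?)", text)
    return float(match.group(1).replace(",", "")) if match else None


def total_tuition_from_per_credit(per_credit_usd: float, credits: int) -> float:
    """Convert a per-credit rate into total tuition for the whole program."""
    return per_credit_usd * credits


def to_iso_date(text: str) -> str:
    """Convert a deadline written like 'January 15, 2026' to an ISO date."""
    return datetime.strptime(text, "%B %d, %Y").date().isoformat()


rate = parse_usd("tuition: $1,940/credit")
print(total_tuition_from_per_credit(rate, credits=30))  # 30 credits is hypothetical
print(normalize_gpa("B+ average"))
print(normalize_gpa("85%"))
print(to_iso_date("January 15, 2026"))
```

    The key design choice in this sketch is that an ambiguous value yields no number at all, while the original text is always preserved alongside it.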

    Verification

    Every program record carries a "last verified" date that reflects when our pipeline most recently re-checked the source pages. The top programs in our database - those most frequently viewed - are re-verified manually. The long tail is re-verified by automated re-extraction on a rolling basis. We display the date on every program page and surface it on the programs index so you can sort by recency.
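    A minimal sketch of how a "last verified" date can drive both rolling re-verification and sorting by recency. The record shape, the program slugs, and the 30-day threshold are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class ProgramRecord:
    slug: str            # hypothetical identifier for a program
    last_verified: date  # when the source pages were last re-checked


def due_for_reverification(
    records: List[ProgramRecord], today: date, max_age_days: int
) -> List[ProgramRecord]:
    """Return records older than the threshold, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [r for r in records if r.last_verified < cutoff]
    return sorted(stale, key=lambda r: r.last_verified)


def sort_by_recency(records: List[ProgramRecord]) -> List[ProgramRecord]:
    """Most recently verified programs first, as on an index page."""
    return sorted(records, key=lambda r: r.last_verified, reverse=True)


records = [
    ProgramRecord("ms-data-science", date(2025, 1, 1)),
    ProgramRecord("mba-full-time", date(2025, 3, 1)),
    ProgramRecord("ms-statistics", date(2025, 2, 1)),
]
# The 30-day threshold is an assumption for illustration.
queue = due_for_reverification(records, today=date(2025, 3, 15), max_age_days=30)
print([r.slug for r in queue])
print([r.slug for r in sort_by_recency(records)])
```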

    What we don't claim to know

    • Real-time deadlines. Programs occasionally extend or move deadlines mid-cycle. We capture the published deadline at verification time, so check the program's own site in the week before you apply.
    • Acceptance rates. Most programs don't publish program-level acceptance rates; the figures we surface are school-level when available, and we mark missing data as missing rather than imputing.
    • Funding amounts. We record only whether a program offers scholarships or assistantships, as a yes/no flag; the dollar amounts vary widely by applicant. Treat funding flags as a signal to dig deeper, not a guaranteed offer.
    • Outcomes. When we surface salary or placement data, it comes from school-published outcomes reports. These reports have known biases (graduates with positive outcomes are more likely to respond). We link to the source so you can read the methodology.
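    To illustrate "missing rather than imputed", here is a small sketch in which an unpublished acceptance rate stays empty and is displayed as such. The function name and display strings are assumptions for this example.

```python
from typing import Optional


def format_acceptance_rate(rate: Optional[float]) -> str:
    """Render a school-level acceptance rate, or say plainly that none exists."""
    if rate is None:
        # No estimate, average, or placeholder number is substituted.
        return "Not published"
    return f"{rate:.0%} (school-level)"


print(format_acceptance_rate(0.25))
print(format_acceptance_rate(None))
```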

    Corrections

    If you spot incorrect data, email hello@gradsmatch.com with the program URL and the value you believe is wrong. We re-verify against the source within 7 days. If the school's own page is the issue, we'll note that on the program page and link to the source.

    For more on who runs GradsMatch, see the About page.