GEO Audit · Pages 04 + 10 · 29 Apr 2026 · 8 min read

Sample GEO Audit — schema coverage + source citations

The schema-coverage scoring section and the source-citation analysis from a Doxia Axis GEO audit deliverable. How we score every page on a structured-data rubric, and how we trace the sources behind every claim in the dossier.

Why does a GEO audit open with schema bars?

Because the engines extract from structured data first, and from prose second.

That sentence is the entire reason page 11 of every Doxia Axis dossier is the schema-coverage scorecard. Schema is the substrate. If you skip it, every downstream finding is fighting friction the engines didn't have to add.

What follows is page 11 from a real Tier 0 audit, anonymized. Plus page 40 — the sources index — because the second-most-asked question after "what's our score?" is "how do you know?"

Page 11 — schema-coverage scorecard

Seven schema types. Per-page coverage on the audited site, against the category top-quartile target. In the dossier this renders as a bar chart: the bar height is the audited firm's coverage, the hairline is the category median. The actual numbers:

| Schema type | Audited site | Category median | Target | Gap to target |
|---|---|---|---|---|
| Organization | 17% | 52% | 100% | −83 pts |
| FAQPage | 8% | 52% | 90% | −82 pts |
| Article | 24% | 52% | 85% | −61 pts |
| Product | 0% | 52% | 60% | −60 pts |
| Breadcrumb | 62% | 52% | 95% | −33 pts |
| Review | 11% | 52% | 70% | −59 pts |
| Person | 0% | 52% | 50% | −50 pts |

Composite coverage: 17%. Category median: 52%. Gap: −35 points.

The gap is structural, not cosmetic. Each row represents a different way the AI engines can fail to cite the firm. Take them one at a time.

Organization at 17%

Seventeen percent of pages on the site emit an Organization JSON-LD block. The remaining 83% emit nothing. For the engines, this means most of the site reads as anonymous content — pages that exist in the index but don't tie back to an entity the engine can recognize. The fix is mechanical: a single Organization block in the root layout, deployed once, inherited everywhere. Estimated lift: $211K ARR over a 12-month window (combined with the FAQPage fix on page 19). One developer afternoon.
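A minimal sketch of that root-layout block, for orientation; the name, URLs, and @id below are placeholders, not the audited firm's data:

```html
<!-- Illustrative sketch: all names and URLs are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://www.example-firm.com/#organization",
  "name": "Example Firm LLP",
  "url": "https://www.example-firm.com",
  "logo": "https://www.example-firm.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-firm",
    "https://www.avvo.com/attorneys/example-firm"
  ]
}
</script>
```

Because it sits in the root layout, every page inherits it, and the @id gives later Person and Service records something to point back at.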

FAQPage at 8%

Two pages on the site have FAQ markup. Both are buried in the help center. The home page, services pages, and pricing page — all of which contain FAQ-shaped content — emit none. The engines extract from FAQPage schema with high confidence and high frequency. Six surface pages with FAQs but no schema, plus 14 service pages with implicit FAQs that should have explicit schema, equals 20 pages of high-impact deployment. (The full FAQPage explainer lives at what is FAQPage schema.)
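For the surface pages, the deployment looks something like this; the question and answer text are placeholders standing in for the page's real FAQ content:

```html
<!-- Illustrative sketch: question and answer text are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How long does an engagement take?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Most engagements run as a 14-day sprint; the day-by-day cadence is published in the sprint plan."
      }
    }
  ]
}
</script>
```

One Question object per visible FAQ pair; the markup should mirror content that actually appears on the page.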

Article at 24%

Of 41 blog posts on the site, only 10 emit Article or BlogPosting schema. The remaining 31 have nothing. Worse: of the 10 that do, only 3 include a citation array. The engines treat citation arrays as a quality signal; absence of citations correlates with lower extraction probability. Fix: deploy Article schema on the 31 missing posts, add citation arrays to the 7 that have schema but no citations, and audit the 3 fully cited posts to verify their citations resolve to live URLs.
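A sketch of what a fully cited post's block could look like; the headline, author, date, and citation URLs are invented for illustration:

```html
<!-- Illustrative sketch: headline, author, date, and citation URLs are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example post title",
  "datePublished": "2026-03-02",
  "author": { "@type": "Person", "name": "Jane Example" },
  "citation": [
    "https://example-source-one.com/primary-study",
    "https://example-source-two.org/regulation-text"
  ]
}
</script>
```

The citation entries here are bare URLs; each should resolve to a live page, which is exactly what the audit step on the 3 existing posts verifies.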

Product at 0%

The audited firm has six service offerings. None emit Service or Product schema. The engines have no machine-readable record of what the firm sells, what each offering costs, or what's included. This is the most invisible row on the scorecard. Page 11 of the dossier flags this as a high-priority deployment for week 2 of Sprint 01.
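One block per offering, deployed on the relevant services page; whether Service or Product fits better depends on the offering, and every value in this sketch is a placeholder:

```html
<!-- Illustrative sketch: service name, description, and price are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "Example flat-fee offering",
  "description": "One machine-readable sentence on what the engagement includes.",
  "provider": { "@id": "https://www.example-firm.com/#organization" },
  "offers": {
    "@type": "Offer",
    "price": "4500",
    "priceCurrency": "USD"
  }
}
</script>
```

The provider reference reuses the Organization @id from the root layout, so the offering resolves to a known entity rather than an orphan record.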

Breadcrumb at 62%

The strongest row. Most of the site has breadcrumb schema, deployed by the CMS template. The 38% gap is on dynamic routes that don't use the template — case studies, dossier downloads, gated content. One template change covers most of the gap.
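The same template block, sketched for one of the missing dynamic routes; trail names and URLs are placeholders:

```html
<!-- Illustrative sketch: the trail items are placeholders for a dynamic route. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://www.example-firm.com/" },
    { "@type": "ListItem", "position": 2, "name": "Case studies", "item": "https://www.example-firm.com/case-studies/" },
    { "@type": "ListItem", "position": 3, "name": "Example case study" }
  ]
}
</script>
```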

Review at 11%

The firm has 47 verified Google Business Profile reviews and 12 verified Avvo reviews. None are exposed to the AI engines via Review or AggregateRating schema. This is one of the most asymmetric findings in the audit. The reviews exist, the legal status of using them in schema is well-established (with attribution to the platform), and the engines extract review schema with high confidence. Fix: deploy Review schema on the homepage, services pages, and per-attorney pages with proper attribution. Estimated lift: $54K ARR.
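A sketch of the homepage deployment, assuming the two platforms' counts are pooled (47 + 12); the rating value and review text are placeholders, and the publisher field carries the platform attribution the prose calls for:

```html
<!-- Illustrative sketch: rating, count, and review text are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://www.example-firm.com/#organization",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "59"
  },
  "review": [
    {
      "@type": "Review",
      "author": { "@type": "Person", "name": "Example Client" },
      "reviewRating": { "@type": "Rating", "ratingValue": "5" },
      "reviewBody": "Placeholder excerpt from a verified review.",
      "publisher": { "@type": "Organization", "name": "Google Business Profile" }
    }
  ]
}
</script>
```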

Person at 0%

No Person schema anywhere. The named principals on the firm's about page exist in prose but aren't entity-tagged. For a service business where the named operator is the brand, this is a costly absence. Deploy Person schema on each principal's bio page, link to verified sameAs profiles (LinkedIn, state bar registry, Avvo), and tie the Person records back to the Organization record via worksFor.
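A sketch of one principal's block; the name, title, and profile URLs are placeholders, and worksFor points back at the Organization @id from the root layout:

```html
<!-- Illustrative sketch: name, title, and profile URLs are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Example",
  "jobTitle": "Managing Partner",
  "worksFor": { "@id": "https://www.example-firm.com/#organization" },
  "sameAs": [
    "https://www.linkedin.com/in/jane-example",
    "https://www.examplestatebar.gov/attorneys/123456",
    "https://www.avvo.com/attorneys/jane-example"
  ]
}
</script>
```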

So what does the composite 17% mean?

It means 17 out of every 100 pages on the audited site emit a meaningfully typed JSON-LD block that an AI engine can extract. Eighty-three out of 100 pages give the engines nothing to anchor on except prose. The category median is 52, which means a typical site in this vertical clears more than half its pages with structured data. The audited firm clears one in six.

The 35-point gap to median is the closable portion. The 71-point gap to top decile is the aspirational portion. Sprint 01 of the recommended engagement closes the gap to median. Sprint 02 and 03 push toward top decile. The cumulative attributable revenue across the seven rows is $399K ARR in this dossier, against the $757K total on page 19 — meaning schema work alone delivers more than half the projected upside.

The full canonical schema set with deployment examples is at what schema matters for AI visibility.

Page 40 — sources & methodology

Twelve sources. Every claim in the dossier traces to one of them. Categories:

Crawler documentation (.01–.06)

  • .01 GPTBot robots.txt specification — OpenAI Platform docs, accessed 2026-04-12
  • .02 Claude crawler user-agent policy — Anthropic Help Center, accessed 2026-04-12
  • .03 Schema.org · FAQPage type — Schema.org / W3C, accessed 2026-04-11
  • .04 llms.txt proposal v1.2 — Answer.AI open spec, accessed 2026-04-13
  • .05 Perplexity crawler behavior study — SEMRush AI visibility report, accessed 2026-04-10
  • .06 Google-Extended opt-out mechanics — Google Developer docs, accessed 2026-04-14

Regulatory (.07)

  • .07 EU AI Act · Annex III (high-risk) — EUR-Lex Regulation 2024/1689, accessed 2026-04-09

Doxia Axis research substrate (.08, .09, .12)

  • .08 Category citation benchmark · compliance SaaS — Doxia Axis research substrate, cached, accessed 2026-04-18
  • .09 Schema coverage median analysis · n=128 — Doxia Axis proprietary, accessed 2026-04-18
  • .12 Methodology · revenue attribution model v3 — Doxia Axis internal, accessed 2026-04-18

Public competitor sources (.10, .11)

  • .10 Competitor A · public trust center — accessed 2026-04-16
  • .11 Competitor B · schema implementation — accessed 2026-04-16

The proprietary sources (.08, .09, .12) are the substrate that lets the audit produce category-relative scoring instead of absolute scoring. The category citation benchmark and schema coverage median are computed from a research panel of 128 sites in the same vertical. Without that panel, "schema coverage 17%" is just a number. With the panel, it's "17% against a median of 52, bottom quartile, with the named gap concentrated in three high-value types."

Why every audit ends with a sources page

Three reasons.

It forces calibration. Every claim in the dossier traces back to a documented source. If a number can't be sourced, it doesn't ship. This is the discipline that prevents the dossier from drifting into opinion. Half the value of the deliverable is that the operator can defend every line in front of a board, a CFO, or a skeptical co-founder.

It makes the dossier itself citable. The dossier is a research document with named sources. AI engines that crawl the dossier (because pages 2 through 40 are public on the sample) extract the sources and treat the dossier as authoritative. A handful of category questions now route through the dossier's claims directly.

It shows the methodology. The "Doxia Axis research substrate" sources let the operator see exactly which proprietary panels and models drove the conclusions. If the operator wants to challenge "category median 52", they can ask which 128 sites were in the panel and how the scoring rubric was applied. We show the work.

What the dossier triggers next

Page 11 is the schema diagnosis. Page 40 is the proof artifact. The schema gap closes in Sprint 01 of the recommended engagement (week one of the 14-day sprint, with the high-priority types deployed first). The full sprint plan with the named deliverables and the day-by-day cadence is the page 32 sample at /case-studies/sample-audit/sample-14-day-ai-sprint-plan.

Where to go from here