DECISION GUIDE · 29 Apr 2026 · 8 min read

What metrics matter for AI visibility?

Six metrics, ranked by leverage: citation share-of-voice, schema coverage, AI-traffic share, response quality, crawler accessibility, third-party citation density. Measured together, they tell you which engagements actually moved the needle.

Want the short version?

Six metrics. Ranked by leverage. Each one measures a different way the AI surface either cites your brand or doesn't.

| # | Metric | What it measures | How to gather it |
|---|---|---|---|
| 1 | Citation share-of-voice | % of category answers across 6 engines that cite your brand | Manual query set, or audit |
| 2 | Schema coverage | % of pages emitting canonical JSON-LD types | schema-coverage-check |
| 3 | AI-traffic share | % of search-shaped intent now routing through AI assistants | Inferred from inbound traffic + first-touch attribution |
| 4 | Response quality | Whether engines describe your brand correctly when they cite it | Manual query set + verbatim recording |
| 5 | Crawler accessibility | % of AI bots with successful, full-render fetch access | robots.txt + curl with named UA |
| 6 | Third-party citation density | Mentions across LLM-training surfaces (Reddit, GitHub, Wikipedia, podcasts) | Manual or third-party tools |

The first metric is the headline. The other five are levers that move it.

Metric 1 — citation share-of-voice

The single most important number, and the one most operators don't yet measure.

Definition. When a fixed query set in your category is run through six AI engines (ChatGPT, Claude, Perplexity, Gemini, Copilot, Grok) on the same date, what percentage of the answers cite your brand by name?

How to measure. Pick 20 to 30 category-shaped queries — "best [vertical] for [use case]", "[competitor] alternatives", "how to [task] in [vertical]". Run each query across all six engines. Record verbatim what gets cited. Calculate brand-citation count divided by total citation slots. The result is your citation share.
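
A minimal sketch of that arithmetic, assuming the verbatim citations were recorded into a nested dict (engine, then query, then brands cited). The structure and brand names are illustrative, not Doxia Axis tooling:

```python
# Hypothetical recorded results: results[engine][query] -> brands cited verbatim.
results = {
    "ChatGPT":    {"best crm for startups": ["Acme", "Rival"],
                   "rival alternatives":    ["Acme"]},
    "Perplexity": {"best crm for startups": ["Rival"],
                   "rival alternatives":    ["Acme", "Other"]},
}

def citation_share(results: dict, brand: str) -> float:
    """Brand-citation count divided by total citation slots, as a percentage."""
    slots = [b for answers in results.values()
               for brands in answers.values()
               for b in brands]
    return 100 * slots.count(brand) / len(slots) if slots else 0.0

print(f"{citation_share(results, 'Acme'):.1f}%")  # -> 50.0%
```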

Benchmark. Across the audits we've shipped through Q1 2026:

  • Bottom quartile — 0% to 5% citation share
  • Median — 12% to 20%
  • Top quartile — 35%+
  • Category leader — 50%+

The metric is the cleanest read on whether the AI engines treat your brand as an answer to category questions. Brands at 0% are structurally invisible. Brands at 50%+ own canonical answers.

How often to measure. Quarterly. Citation patterns shift slowly because foundation models retrain on 12-to-18-month cycles. Monthly measurement adds noise without adding signal.

Metric 2 — schema coverage

The substrate that lets every other metric improve.

Definition. Of the canonical AI-citation schema types — Organization, WebSite, BreadcrumbList, FAQPage, Article (with citation array), Person, Service, Offer, Review, AggregateRating — what percentage are deployed across your priority pages?

How to measure. The Doxia Axis schema-coverage-check Python utility fetches a URL with a named AI-crawler user-agent and reports coverage against the canonical set. The same logic runs inside every audit.

python3 schema_coverage.py https://yourdomain.com/ --user-agent GPTBot

The output names which canonical types are deployed and which are missing. The full canon with deployment guidance is at what schema matters for AI visibility.
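
If you want to reproduce the check without the utility, a minimal sketch follows: it fetches a page with a named user-agent, pulls the JSON-LD blocks with a regex, and diffs the @type values against the canonical set. This is an illustration of the approach, not the schema-coverage-check source.

```python
import json, re, sys, urllib.request

CANONICAL = {"Organization", "WebSite", "BreadcrumbList", "FAQPage", "Article",
             "Person", "Service", "Offer", "Review", "AggregateRating"}

def deployed_types(url: str, user_agent: str = "GPTBot") -> set[str]:
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    html = urllib.request.urlopen(req).read().decode("utf-8", "replace")
    found = set()
    # JSON-LD blocks live inside <script type="application/ld+json"> tags.
    for block in re.findall(r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>',
                            html, re.DOTALL | re.IGNORECASE):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        nodes = data if isinstance(data, list) else data.get("@graph", [data])
        for node in nodes:
            if isinstance(node, dict):
                t = node.get("@type")
                found.update([t] if isinstance(t, str) else t or [])
    return found & CANONICAL

if __name__ == "__main__":
    types = deployed_types(sys.argv[1])
    print(f"deployed {len(types)}/{len(CANONICAL)}; missing: {sorted(CANONICAL - types)}")
```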

Benchmark. Composite coverage across audited sites:

  • Bottom quartile — under 20%
  • Median — 50% to 60%
  • Top decile — 80%+

Coverage at the homepage is usually highest. Coverage on nested content pages (blog posts, case studies, service pages) is usually lowest. Schema coverage on pricing pages tends to be the highest-leverage gap.

Metric 3 — AI-traffic share

The variable that determines how much the other metrics matter.

Definition. Of all the search-shaped intent reaching your business — inbound from search, content, and discovery — what percentage now routes through an AI assistant rather than Google?

How to measure. The cleanest signal is first-touch attribution. Add a "how did you hear about us" question on your inbound forms. Manually classify the answers. The percentage that names ChatGPT, Claude, Perplexity, or Gemini is your AI-traffic share. Inferred signals (referrer headers, traffic-source patterns) help calibrate but aren't reliable on their own.
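
A toy version of the classification step, with the caveat that the keyword list and sample answers are made up, and real answers need a human pass for misspellings and indirect phrasing:

```python
AI_ASSISTANTS = ("chatgpt", "claude", "perplexity", "gemini")

answers = ["found you on google", "chatgpt recommended you",
           "a colleague mentioned you", "asked perplexity for alternatives"]

# An answer counts toward AI-traffic share if it names any AI assistant.
ai_hits = sum(any(name in a.lower() for name in AI_ASSISTANTS) for a in answers)
print(f"AI-traffic share: {100 * ai_hits / len(answers):.0f}%")  # -> 50%
```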

Benchmark. Across the categories we audit through Q1 2026:

  • B2B SaaS, technical buyers (DevTools, infra, observability) — 18% to 25% AI-traffic share
  • B2B SaaS, non-technical buyers (HR tech, sales enablement, finance ops) — 8% to 15%
  • Professional services (legal, accounting, consulting) — 5% to 12%
  • Hospitality and local services — 3% to 8%
  • E-commerce and DTC — 4% to 10%

The number is volatile and rising. Across most categories, AI-traffic share is growing 1.5x to 2.5x year-over-year through 2026.

How often to measure. Monthly during the year-over-year growth phase. Quarterly once growth stabilizes.

Metric 4 — response quality

When the engines do cite your brand, do they cite it correctly?

Definition. Of the citations recorded in the share-of-voice measurement, what percentage describe your brand accurately — correct ICP, correct pricing range, correct integration claims, correct location?

How to measure. Run the share-of-voice query set. For each citation that names your brand, score the description on three axes (a scoring sketch follows the list):

  • Identity — does the engine describe what you actually do?
  • Pricing / cost — does the engine cite current pricing or an outdated number?
  • Capabilities — does the engine claim correctly what you support and what you don't?
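
Aggregating those axis scores into a single response-quality percentage can be as simple as the sketch below. The strict all-three-axes-pass definition of "accurate" is one reasonable reading of the metric, not a canonical one:

```python
from dataclasses import dataclass

@dataclass
class CitationScore:
    identity_ok: bool      # describes what you actually do
    pricing_ok: bool       # cites current pricing
    capabilities_ok: bool  # claims only what you support

    def accurate(self) -> bool:
        # Strict reading: a citation is accurate only if every axis passes.
        return self.identity_ok and self.pricing_ok and self.capabilities_ok

scores = [CitationScore(True, True, True),
          CitationScore(True, False, True),   # stale pricing
          CitationScore(True, True, True)]

print(f"{100 * sum(s.accurate() for s in scores) / len(scores):.0f}%")  # -> 67%
```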

A brand can have high citation share and low response quality. The result is buyers reading bad descriptions of your business and disqualifying you on false premises. The fix usually requires content rewrites — particularly the pricing page and the integration page.

Benchmark. Most brands score 70% to 85% response quality. Brands below 70% have a content-shape problem; brands above 85% are well-structured for AI extraction.

Metric 5 — crawler accessibility

The floor. If this metric fails, every other metric stays low.

Definition. What percentage of priority AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, Meta-ExternalAgent) can fetch your site successfully and read the rendered HTML?

How to measure. Three commands per crawler:

curl -sA "GPTBot" https://yourdomain.com/ -I    # status code
curl -sA "GPTBot" https://yourdomain.com/ | grep -c "<p>"    # paragraph tags in served HTML
curl -sA "GPTBot" https://yourdomain.com/ | grep "application/ld+json"    # schema present

If the first command returns 200, the crawler isn't blocked. If the second returns a count above zero, the content is present in the served HTML without JavaScript rendering. If the third returns matches, JSON-LD schema is present.
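
To run the same three checks across the full crawler list in one pass, a Python sketch, with urllib standing in for curl and a placeholder URL:

```python
import urllib.error, urllib.request

CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
            "Applebot-Extended", "Bytespider", "CCBot", "Meta-ExternalAgent"]
URL = "https://yourdomain.com/"  # placeholder

for ua in CRAWLERS:
    req = urllib.request.Request(URL, headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            status, html = resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # 403 etc. means this UA is blocked
        status, html = err.code, ""
    print(f"{ua:<22} status={status} "
          f"p_tags={html.count('<p>')} "
          f"jsonld={'yes' if 'application/ld+json' in html else 'no'}")
```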

The full mode-by-mode diagnostic lives at why your website is invisible to ChatGPT.

Benchmark. Most operator sites we audit show 3 to 5 crawlers fully accessible out of 8. Sites with 7+ accessible are the exception, not the rule.

Metric 6 — third-party citation density

The slow-moving lever that compounds longest.

Definition. How often is your brand mentioned across surfaces the AI engines train on — Reddit, GitHub, Wikipedia, Hacker News, podcast transcripts, YouTube descriptions, trade publications?

How to measure. Manual monitoring with named-mention queries on each surface, monthly. Or paid services (Mention, Brand24, Brandwatch) that automate the monitoring. Or — for technical brands — GitHub stars and forks plus npm download counts as proxy signals.
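
For the GitHub proxy signal, the public REST API exposes stars and forks directly. A sketch, with a placeholder repo slug:

```python
import json, urllib.request

def repo_stats(slug: str) -> tuple[int, int]:
    # GET /repos/{owner}/{repo} returns stargazers_count and forks_count.
    req = urllib.request.Request(f"https://api.github.com/repos/{slug}",
                                 headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["stargazers_count"], data["forks_count"]

stars, forks = repo_stats("yourorg/yourtool")  # placeholder slug
print(f"stars={stars} forks={forks}")
```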

Benchmark. For most B2B brands at $5M-$50M ARR:

  • Bottom quartile — fewer than 50 third-party mentions per month
  • Median — 100 to 300
  • Top quartile — 500+
  • Category leaders — 2,000+

This metric moves slowly because building third-party citation density is operator-led work — content cross-posting, podcast appearances, GitHub utility releases, Wikipedia entity claims. The compounding window is months to quarters, not weeks.

What metrics don't matter (much)?

Three commonly tracked metrics that don't move citation share materially.

Domain authority (DA / DR). Inherited from SEO. Doesn't predict AI citation. A brand can have DA 80 and 0% citation share. The signals the engines use are different.

Pageviews. Doesn't predict AI citation. The engines extract from training data and from site fetches; user pageviews aren't an input.

Time on page. SEO surface metric. Doesn't predict AI citation.

These metrics aren't useless — they still matter for traditional SEO, conversion, and engagement. They just don't move the AI-search needle, which is why operators should measure the six above as the AI-specific KPIs.

So what's the operator move?

Three concrete actions.

Action 1 — establish a baseline on metrics 1, 2, and 5. All three are measurable in a single operator afternoon: citation share via the manual query set, schema coverage via the open-source utility, crawler accessibility via three curl commands per crawler. Total: roughly 90 minutes.

Action 2 — add metric 3 to your inbound funnel. Add the "how did you hear about us" question to your contact form. Start collecting AI-traffic share data the moment the form is updated.

Action 3 — set a quarterly review cadence. Run the full six-metric scorecard quarterly. Compare to baseline. The Tier 0 audit deliverable ships the full baseline scorecard in five business days.

Where to go from here