How do I measure AI citation share-of-voice?
Five steps. One operator afternoon. Six engines, twenty to thirty queries, the results recorded verbatim. Here's the exact methodology Doxia Axis uses inside every audit, ready for any operator to run on their own brand today.
Want to run this yourself?
You can. The methodology is mechanical, defensible, and runnable by any operator in one afternoon.
Five steps. Roughly 90 minutes. One spreadsheet. The output is your citation share-of-voice — the single most important AI-visibility metric, and the one most operators don't yet track.
The steps below are the same steps the audit dossier reproduces at scale across six engines and 20 to 30 queries. If you want the full audit, /audit ships it free in five business days. If you want to validate the gap yourself first, run the methodology below.
Step 1 — write the query set
Twenty to thirty queries. The kind of queries your prospects would actually type into an AI assistant when looking for a service like yours.
Three buckets, roughly equal weight:
Bucket A — category-shaped queries. "Best [vertical] for [use case]", "top [vertical] in [region]". Six to ten queries.
Bucket B — comparison-shaped queries. "[Competitor] alternatives", "[Brand X] vs [Brand Y]". Five to eight queries.
Bucket C — capability-shaped queries. "How do I [task] for [vertical]", "Tool that does [specific feature]". Five to eight queries.
The full set should cover the realistic spectrum of intent — discovery, evaluation, comparison, capability fit. Worked examples by vertical:
Estate-planning law firm:
- Best estate planning attorney Charlotte NC
- Board-Certified estate planning specialist Charlotte
- Flat-fee estate planning North Carolina
- Trust attorney Charlotte vs estate attorney Raleigh
- Cost of estate planning North Carolina
B2B SaaS in DevTools:
- Best observability platform for mid-market SaaS
- Datadog alternatives for startups
- How do I monitor microservices on a budget
- Open-source observability tools 2026
- LangSmith vs Langfuse comparison
The query set is the foundation. Spend 20 minutes on it. Bias toward queries with measurable buyer intent.
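If you'd rather assemble the set programmatically, here is a minimal Python sketch that expands the three bucket templates into concrete queries. Every template string and placeholder value below is illustrative, not part of the methodology; swap in your own vertical, region, brand, and competitors.

```python
# Minimal sketch: expand the three query buckets into a concrete list.
# All values in `context` are placeholders -- substitute your own.

BUCKET_A = [  # category-shaped
    "Best {vertical} for {use_case}",
    "Top {vertical} in {region}",
]
BUCKET_B = [  # comparison-shaped
    "{competitor} alternatives",
    "{brand} vs {competitor}",
]
BUCKET_C = [  # capability-shaped
    "How do I {task} for {vertical}",
    "Tool that does {feature}",
]

context = {
    "vertical": "observability platform",  # placeholder
    "use_case": "mid-market SaaS",         # placeholder
    "region": "North Carolina",            # placeholder
    "competitor": "Datadog",               # placeholder
    "brand": "YourBrand",                  # placeholder
    "task": "monitor microservices",       # placeholder
    "feature": "distributed tracing",      # placeholder
}

queries = [t.format(**context) for t in BUCKET_A + BUCKET_B + BUCKET_C]
for q in queries:
    print(q)
```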
Step 2 — list six engines and three competitors
The engines:
- ChatGPT (OpenAI) — both the standard mode and the search-enabled mode
- Claude (Anthropic) — with web search enabled when available
- Perplexity — citation-chain heavy
- Gemini (Google) — including AI Overview when it triggers
- Microsoft Copilot (Bing-grounded) — workplace context
- Grok (xAI) — emerging surface
The competitors — three to five direct competitors in your category. Not your dream competitors; the real ones who win the same buyers. Write them down before you start running queries; if you pick them afterward, the surprises the engines produce will bias your selection.
Step 3 — set up the recording sheet
A simple Google Sheet or Airtable. One row per (query × engine) combination. So if you have 25 queries and 6 engines, you get 150 rows.
Columns:
| Column | Type | Notes |
|---|---|---|
| Query | text | The verbatim query |
| Engine | enum | ChatGPT, Claude, Perplexity, Gemini, Copilot, Grok |
| Date / time | timestamp | The engines drift; record when |
| Brands cited (verbatim) | text | Comma-separated names in citation order |
| Your brand cited? | boolean | Yes / No |
| Your brand position | int | 1 = first cited, 2 = second, etc.; null if not cited |
| Description accuracy | enum | Correct / Mostly correct / Wrong / N/A |
| Quoted page (if any) | URL | The URL the engine quoted from, if visible |
| Notes | text | Anything noteworthy |
The sheet structure matters. "Brands cited verbatim" is the load-bearing column — without it, the data is noise. Record exactly what the engine wrote, not your interpretation.
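To save some typing, the empty matrix can be pre-generated. A minimal sketch, assuming the column set above and a `queries` list from Step 1; the file name `citation_sheet.csv` and the function name are arbitrary:

```python
# Minimal sketch: pre-build the recording sheet as a CSV,
# one row per (query x engine) combination, using the columns above.
import csv
import itertools

ENGINES = ["ChatGPT", "Claude", "Perplexity", "Gemini", "Copilot", "Grok"]

COLUMNS = [
    "Query", "Engine", "Date / time", "Brands cited (verbatim)",
    "Your brand cited?", "Your brand position",
    "Description accuracy", "Quoted page (if any)", "Notes",
]

def build_sheet(queries, path="citation_sheet.csv"):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for query, engine in itertools.product(queries, ENGINES):
            # The remaining columns stay blank; they get filled in by hand
            # as you run each query in Step 4.
            writer.writerow([query, engine] + [""] * (len(COLUMNS) - 2))

# build_sheet(queries)  # 25 queries x 6 engines -> 150 rows
```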
Step 4 — run the queries
Walk through the matrix. 150 rows takes 60 to 75 minutes if you don't get distracted.
Three discipline points:
Discipline 1 — fresh session per engine. Use a new chat, no prior context. Memory features inside ChatGPT, Claude, etc. will bias the answer toward what the engine knows about you specifically. Test with a clean state.
Discipline 2 — record verbatim. Copy the cited brands into the "Brands cited" column exactly as the engine wrote them. Don't paraphrase. The exact wording is what you'll need for the response-quality scoring.
Discipline 3 — note the citation source if visible. Perplexity and ChatGPT-with-search show the source URL the engine quoted. Record it. The URLs the engines preferentially quote are the URLs your competitors built schema around — they tell you what extraction shapes work.
Step 5 — score the citation share
Three calculations.
Calculation 1 — overall citation share.
rows where your brand was cited / total rows (query × engine)
If your brand was cited 18 times out of 150 rows, your overall citation share is 12%. Below 10% is the bottom quartile across the categories Doxia Axis audits; 12% to 20% is median; 35%+ is top quartile.
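If you exported the sheet as CSV, the arithmetic is one pass over the rows. A minimal sketch, assuming the Step 3 column names and a "Yes"/"No" value in the "Your brand cited?" column:

```python
# Minimal sketch of Calculation 1: overall citation share from the filled-in CSV.
import csv

def overall_citation_share(path="citation_sheet.csv"):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    cited = sum(1 for r in rows if r["Your brand cited?"].strip().lower() == "yes")
    return cited / len(rows)  # e.g. 18 / 150 = 0.12 -> 12%

# print(f"{overall_citation_share():.0%}")
```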
Calculation 2 — citation share per engine.
your brand cited count in [engine] / total query count for [engine]
If your brand was cited 8 times out of 25 queries in ChatGPT, your ChatGPT-specific citation share is 32%. The per-engine number tells you which engines are favoring you and which are blind to you.
Most brands we audit show wide variance across engines. A brand might score 35% in Perplexity (which is hyper-citation-friendly) and 5% in Gemini (which weights Google-indexed authority heavily). The variance is diagnostic — a 30-point gap usually points to a specific schema or technical fix.
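The per-engine split is the same pass grouped by the Engine column. A minimal sketch under the same CSV assumptions as above:

```python
# Minimal sketch of Calculation 2: citation share per engine.
import csv
from collections import defaultdict

def per_engine_share(path="citation_sheet.csv"):
    cited, total = defaultdict(int), defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            engine = row["Engine"]
            total[engine] += 1
            if row["Your brand cited?"].strip().lower() == "yes":
                cited[engine] += 1
    return {e: cited[e] / total[e] for e in total}  # e.g. {"ChatGPT": 0.32, ...}

# for engine, share in per_engine_share().items():
#     print(f"{engine}: {share:.0%}")
```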
Calculation 3 — competitor share.
rows where [competitor] was cited / total rows (query × engine)
Run this for each of your three to five named competitors. The output is a competitive scorecard. The brand with the highest competitor citation share is your category leader on AI search. The gap between you and that brand is the gap the engagement is built to close.
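A minimal sketch of the competitor scorecard, assuming the same CSV and a plain case-insensitive substring match against the "Brands cited (verbatim)" column; the competitor names are placeholders, and the match only works if that column was recorded cleanly in Step 4:

```python
# Minimal sketch of Calculation 3: competitor citation share.
import csv

COMPETITORS = ["Competitor A", "Competitor B", "Competitor C"]  # placeholders

def competitor_shares(path="citation_sheet.csv"):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    shares = {}
    for name in COMPETITORS:
        cited = sum(
            1 for r in rows
            if name.lower() in r["Brands cited (verbatim)"].lower()
        )
        shares[name] = cited / len(rows)
    return shares  # highest share = your category leader on AI search

# print(competitor_shares())
```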
What does the output look like?
A worked example from a real audit (anonymized):
| Brand | Citation share | Top engine | Bottom engine |
|---|---|---|---|
| Competitor A | 68% | Perplexity (88%) | Gemini (52%) |
| Competitor B | 52% | ChatGPT (68%) | Grok (32%) |
| Competitor C | 38% | Gemini (52%) | Perplexity (24%) |
| Audited brand | 0% | All engines (0%) | All engines (0%) |
Zero. Across 30 queries times 6 engines, the audited brand never appeared. Top competitor at 68%. The gap is structural — the audited brand is missing the substrate (schema, llms.txt, third-party citation density) that the cited competitors deployed.
The full sample dossier walking through this output lives at the sample competitive citation benchmark.
What do you do with the output?
Three concrete moves.
Move 1 — set the baseline. Whatever number you get, it's your starting line. Re-run the same query set quarterly to measure drift. That first measurement is the baseline before any visibility work; the same query set after a Doxia Axis sprint typically shifts 8 to 25 percentage points within 90 days.
Move 2 — diagnose the gap pattern. Per-engine variance tells you what's broken. Wide gap on Perplexity? Probably citation-chain weak — you need inline source links. Wide gap on ChatGPT? Probably schema-extraction weak. Wide gap on Gemini? Probably backlink and traditional-SEO authority weak.
Move 3 — pick the highest-leverage fix. If your citation share is below 10% and your top competitor is above 50%, the fix is the substrate (schema + llms.txt + crawler unblocks). If you're at 30% and the leader is at 50%, the fix is content shape and third-party density. The full diagnosis is what the audit deliverable produces against your specific gap pattern.
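If you want that decision rule as code, here is a rough sketch that encodes the two thresholds above. The fallback branch is an assumption for gap patterns the prose doesn't cover, and the thresholds are guides rather than hard cutoffs:

```python
# Rough sketch of the Move 3 heuristic; thresholds mirror the prose above.
def highest_leverage_fix(your_share: float, leader_share: float) -> str:
    if your_share < 0.10 and leader_share > 0.50:
        return "Substrate: schema + llms.txt + crawler unblocks"
    if your_share >= 0.30 and leader_share >= 0.50:
        return "Content shape and third-party citation density"
    # Assumption: anything in between needs the full gap diagnosis.
    return "Mixed gap: diagnose against your specific pattern"

# highest_leverage_fix(0.12, 0.68)
```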
What this DIY measurement misses
Three things only a structured audit produces:
Miss 1 — content-shape grading. The DIY method tells you whether you're cited, not what content shape would change the answer. The audit grades section-by-section citability against a structured rubric.
Miss 2 — revenue quantification. The DIY method gives you a percentage. The audit ties each percentage point to a dollar number, sequenced by impact, so you can prioritize the fix.
Miss 3 — full-engine triangulation. The DIY method runs the engines once. The audit runs the same query set across multiple sessions, controls for cache and personalization effects, and produces a confidence-banded share-of-voice number rather than a point estimate.
For most operators, the DIY method is sufficient to see whether the gap exists. The full audit is for sequencing the fix once you've confirmed it.
Where to go from here
- The full metrics list: /answers/ai-visibility-metrics-that-matter.
- Sample competitive benchmark: /case-studies/sample-audit/sample-competitive-citation-benchmark.
- What the audit produces: /answers/what-is-an-ai-visibility-audit.
- Or just request the audit: /audit. Five business days. The dossier ships the citation-share scorecard with revenue tags per gap.