How we measure 'are AI engines mentioning us' — quarterly answer-engine audits
The question every operator now asks about their digital strategy is some variation of “are we showing up in AI?” The honest answer requires measurement, and measuring AI citations is messier than measuring search rankings. This is the methodology we use for quarterly answer-engine audits.
Why this measurement matters
For most operator-run businesses, AI engines are increasingly part of the buyer’s research process. Some buyers go to ChatGPT first; many use AI as a complement to classic search. The buyers who don’t use AI are still seeing Google AI Overviews when they search.
Whether you’re cited affects whether buyers see you at all. And that question isn’t speculative — it’s measurable, and operators who measure can act on what they find.
What makes this measurement hard
Three difficulties:
Personalization. AI responses can vary based on user history, geography, and conversation context. Two users running the same query may get different answers. This is less true for queries run in fresh sessions, but the variability is real.
Volatility. AI engines update their training data, retrieval indexes, and prompting on schedules nobody outside the companies fully tracks. A query that produced one citation pattern in March might produce a different pattern in May for reasons that have nothing to do with your work.
Lack of ground truth. Search Console gives you Google’s own data about your queries. No equivalent dashboard exists from OpenAI, Anthropic, Perplexity, or others. You’re sampling externally.
These difficulties don’t make measurement impossible — they make it qualitative as well as quantitative, and they reward consistent methodology over absolute precision.
The query set
A good query set for an operator-run business covers three categories:
Category one — vendor recommendation queries
Queries an ideal customer might ask when looking for someone like you:
- “Best [your category] for [your buyer profile]”
- “Who should I hire for [your service]?”
- “What firms specialize in [your specific scope]?”
- “Recommend a [category] that [specific differentiator]”
These test whether the AI recommends you when it has the chance. The ideal answer cites you among the top recommendations with an accurate description.
Category two — branded queries
Queries that explicitly reference your business:
- “What is [your business name]?”
- “Tell me about [your business name]”
- “Does [your business name] do [specific service]?”
- “What does [your business name] specialize in?”
These test whether the AI knows who you are and represents you accurately. The ideal answer correctly describes what you do, who you serve, and any distinguishing details.
Category three — informational queries in your expertise territory
Queries in topics you have content about:
- “How do I [problem you have content about]?”
- “What’s the difference between [A] and [B] in your domain?”
- “When should I [scenario your content addresses]?”
- “Best practices for [your expertise area]”
These test whether your content gets cited as authority on topics you’ve published about. The ideal answer cites your content as one of the sources, ideally with attribution to your business.
A balanced query set is 20–40 queries across these categories, weighted toward what matters most for your business.
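One way to keep a query set consistent across quarters is to store it as structured data rather than a loose document. A minimal sketch, assuming a hypothetical business ("Acme Advisory") and illustrative fractional-CFO queries — every string here is a placeholder you'd swap for your own categories and buyer language:

```python
# Sketch of a query set organized by the three categories above.
# All business-specific strings ("Acme Advisory", fractional CFO, etc.)
# are hypothetical placeholders.
QUERY_SET = {
    "recommendation": [
        "Best fractional CFO for bootstrapped SaaS companies",
        "Who should I hire for outsourced financial planning?",
    ],
    "branded": [
        "What is Acme Advisory?",
        "Does Acme Advisory do fractional CFO work?",
    ],
    "informational": [
        "How do I build a 13-week cash flow forecast?",
        "When should a startup hire a fractional CFO?",
    ],
}

def flatten(query_set):
    """Yield (category, query) pairs so the audit loop can run every query."""
    for category, queries in query_set.items():
        for query in queries:
            yield category, query
```

Keeping the set in one file also makes the "20–40 queries, weighted toward what matters" decision explicit: the weighting is just how many queries each category gets.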
Running the audit
Once a quarter, run each query across the major engines. As of 2026:
- ChatGPT (with web search enabled)
- Perplexity
- Claude (with web search enabled)
- Google AI Overview (via search, capturing the AI-generated response)
- Bing Copilot
For each query and each engine, capture:
- The full response text
- Whether your business is mentioned
- Where in the response (top, middle, buried, in a “see also” section)
- What description accompanies the mention
- Whether a link is provided and whether it’s correct
- Whether the description is accurate, neutral, flattering, or unflattering
Run queries in fresh sessions where possible to minimize personalization effects. Repeat each query 2–3 times to test consistency.
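The capture fields above map cleanly onto a flat record per (query, engine) pair. A minimal sketch of that record as a Python dataclass — the field names and the allowed `position`/`tone` values are one possible encoding, not a standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaptureRecord:
    """One audit observation: a single query run against a single engine."""
    engine: str                        # e.g. "chatgpt", "perplexity"
    query: str
    response_text: str                 # the full response, kept for later review
    mentioned: bool
    position: Optional[str] = None     # "top", "middle", "buried", "see_also"
    description: Optional[str] = None  # what the engine said about you
    link_provided: bool = False
    link_correct: bool = False
    tone: Optional[str] = None         # "accurate", "neutral", "flattering", "unflattering"
```

Because each repeat run (the 2–3 consistency checks) is its own record, consistency shows up naturally in the data as agreement or disagreement across records for the same query.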
Scoring framework
A consistent rubric makes quarter-over-quarter comparison meaningful. A simple scoring framework:
| Score | Meaning |
|---|---|
| 0 | Not mentioned at all |
| 1 | Mentioned in a peripheral way (e.g., “see also”) |
| 2 | Mentioned among multiple sources, accurate description |
| 3 | Mentioned prominently, accurate and flattering description |
| 4 | Cited as a primary or definitive source |
Aggregate across the query set to get an average citation score per engine and an overall average.
Track the score quarter over quarter. Movement up suggests your AEO work is compounding. Movement down suggests something has changed: your own work, the engines’ treatment of you, or competitors catching up.
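The aggregation step is simple enough to sketch directly. Assuming each scored observation is a `(engine, query, score)` tuple with the 0–4 rubric above, a per-engine and overall average falls out of the standard library:

```python
from collections import defaultdict
from statistics import mean

def aggregate(scores):
    """scores: iterable of (engine, query, score) with score on the 0-4 rubric.
    Returns ({engine: average}, overall_average)."""
    by_engine = defaultdict(list)
    for engine, _query, score in scores:
        by_engine[engine].append(score)
    per_engine = {engine: mean(vals) for engine, vals in by_engine.items()}
    overall = mean(score for _, _, score in scores)
    return per_engine, overall
```

Running this once per quarter on the same query set is what makes the quarter-over-quarter comparison meaningful; the absolute numbers matter less than their direction.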
What the audit reveals
Three categories of insight typically emerge:
Insight one — gaps to fill
Queries where you’re absent or peripherally mentioned. These are the action items: content gaps, schema gaps, third-party authority gaps, or simply queries where you should be more visible.
Insight two — accuracy issues
Queries where you’re mentioned but the description is wrong or misleading. The AI has a representation of you that doesn’t match what you’d want to communicate. The fix is usually content work — making the right narrative more prominent than the inaccurate one.
Insight three — pattern signals
Patterns across queries that reveal something about how the AI sees you overall. Maybe you’re well-cited for one service but absent for another. Maybe one engine cites you frequently while another doesn’t. These patterns inform strategy.
What to do with the results
The audit produces a punch list. Prioritize:
- High-value gaps where the fix is straightforward (content, schema, basic AEO patterns)
- High-value accuracy issues where the AI has the wrong story about you
- Lower-value gaps where the fix is harder (third-party authority, PR, broader brand work)
- Strategic shifts if the pattern reveals something that should change about positioning
The work is then sequenced for the quarter and reviewed in the next audit. Results compound over 4–8 quarters as content matures, schema strengthens, and citation patterns stabilize.
Tools and automation
The tooling space is maturing. As of 2026, several tools partially automate this work:
- Profound, Athena, and similar AI-citation tracking platforms — automate query running and citation detection across engines
- Custom scripts using AI APIs — viable for businesses with technical capacity; produce real data at low cost
- Manual sampling — still produces the highest-quality qualitative insights
The right approach for most operators is a hybrid: tooling for breadth (running 100+ queries automatically) combined with manual review of the most important queries for qualitative depth.
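For the "custom scripts" end of that hybrid, the hard part (fetching responses) means calling each engine's API, which is out of scope here; but the citation-detection half is a few lines. A minimal sketch, assuming you already have response text captured and a list of brand aliases — the alias strings are placeholders, and real tooling would need fuzzier matching than exact substrings:

```python
def detect_mention(response_text, aliases):
    """Return (mentioned, position_fraction): whether any brand alias appears,
    and how far into the response the first mention falls (0.0 = very top).
    aliases: brand name variants, e.g. ["Acme Advisory", "acmeadvisory.com"]."""
    lowered = response_text.lower()
    hits = [lowered.find(alias.lower()) for alias in aliases]
    hits = [h for h in hits if h != -1]  # drop aliases that never appear
    if not hits:
        return False, None
    return True, min(hits) / max(len(lowered), 1)
```

This is breadth tooling only: it can flag "mentioned / not mentioned / roughly where" across 100+ queries, but judging whether a description is accurate or flattering still takes the manual review the article recommends.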
What “we handle” looks like at this layer
For operators who want to track and improve AI citations without running the audit themselves:
- The query set is built from the operator’s actual buyer profile and competitive context
- The audit runs quarterly with consistent methodology and tooling
- Results get a written summary with prioritized action items
- The action items become part of the ongoing content, schema, and authority work
- Trends are tracked over multiple quarters with clear visibility into what’s working
This is the highest-leverage AEO measurement an operator-run business can do, and it’s the work most providers don’t offer because the methodology is still maturing. We do it because the alternative — flying blind on whether AI engines are citing your business — isn’t an acceptable answer for premium operators in 2026.
Where to start if you’ve never run this
A first audit you can run yourself in roughly two hours:
- Write down 10 queries an ideal customer might ask. Mix recommendation, branded, and informational.
- Run each query in ChatGPT, Perplexity, and Google (with AI Overview). Just three engines for a first audit.
- Note for each: are you mentioned? Accurately? Prominently?
- Score each answer with the 0–4 rubric above and add up the totals. This is your baseline.
- Identify the three biggest gaps and write down what would close them.
You now have a measurement and a plan. The work to close the gaps is the next several quarters of AEO investment, and the next audit tells you whether the work is paying back.
The audit itself is straightforward; running it consistently is what produces the compounding signal. Operators who measure are positioned to improve. Operators who don’t are guessing.
You don't have to act on any of this yourself.
Everything in this article — the strategy, the build, the integration, the ongoing tending — is the kind of work we own end-to-end for premium operators. One partner. One number. Off your plate.