How AI scoring works
Five metrics, five surfaces, one composite score. Here's exactly what each one measures and how we calculate it.
The 5 surfaces
For every prompt you track, we send the exact same question to each of:
- ChatGPT (GPT-4o) — OpenAI's chat assistant.
- Claude (Sonnet 4.5) — Anthropic's chat assistant.
- Gemini (1.5 Pro) — Google's chat assistant.
- Perplexity — search-grounded AI with explicit citations.
- Google AI Overviews — the AI block at the top of Google search results.
We capture the raw response and parse it for your brand name, competitor brands, sentiment cues, and any cited URLs.
Visibility Score (0–100)
The headline number. It's a weighted composite of four components, with weights that sum to 100:
- Mention Rate × 40 (how often you appear at all)
- Average Position × 25 (when you appear, how prominent)
- Sentiment × 20 (positive vs. neutral vs. negative tone)
- Citation Rate × 15 (how often your own domain gets cited as a source)
A score of 90+ means you reliably show up first with positive framing and your own site cited. Most newly-tracked brands start in the 30–60 range. Below 30 = significant visibility gap.
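As a rough illustration of the weighting, here is a minimal Python sketch. It is not the production formula. It assumes each component has already been normalized to a 0–1 value where higher is better; this page does not say how Average Position or Sentiment are mapped onto that scale, so the caller supplies the normalized values. The names `WEIGHTS` and `visibility_score` are hypothetical.

```python
# Weights as listed above; they sum to 100.
WEIGHTS = {
    "mention_rate": 40,
    "average_position": 25,
    "sentiment": 20,
    "citation_rate": 15,
}


def visibility_score(components: dict[str, float]) -> float:
    """Combine normalized components (0.0-1.0, higher is better) into a 0-100 score.

    Assumption: the mapping from each raw metric onto 0-1 is not specified
    in this document, so already-normalized values are passed in.
    """
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Made-up example values, for illustration only.
example = {
    "mention_rate": 0.5,
    "average_position": 0.6,
    "sentiment": 0.5,
    "citation_rate": 0.2,
}
print(round(visibility_score(example), 1))  # prints 48.0
```

Because Mention Rate carries the largest weight, the same 0.1 change moves the score by 4 points there versus 1.5 points for Citation Rate.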
Mention Rate
Percentage of scans (prompt × surface combinations) where your brand name appears in the response. If you track 5 prompts and your brand appears in 3 of the 25 (5 prompts × 5 surfaces) total responses, mention rate is 12%.
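The arithmetic in that example can be checked with a short Python sketch. The function name and the list-of-booleans input are assumptions made for illustration, with one entry per scan.

```python
def mention_rate(mentioned: list[bool]) -> float:
    """Percentage of scans whose response names the brand."""
    if not mentioned:
        raise ValueError("need at least one scan")
    return 100 * sum(mentioned) / len(mentioned)


# 5 prompts x 5 surfaces = 25 scans; the brand appears in 3 of them.
scans = [True] * 3 + [False] * 22
print(mention_rate(scans))  # prints 12.0
```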
Average Position
When your brand is mentioned, where does it rank in the list? Position 1 = first brand named (best). Position 5 = fifth (still visible but lower attention). Unmentioned scans don't count toward the average.
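A minimal Python sketch of that rule, assuming each scan is recorded as the brand's position or `None` when the brand was not mentioned (a hypothetical representation chosen for this example):

```python
from typing import Optional


def average_position(positions: list[Optional[int]]) -> Optional[float]:
    """Mean position across scans that mention the brand.

    Scans recorded as None (brand not mentioned) are left out of the average.
    Returns None when the brand was never mentioned.
    """
    mentioned = [p for p in positions if p is not None]
    if not mentioned:
        return None
    return sum(mentioned) / len(mentioned)


# Five scans: named first, absent, named third, absent, named second.
scan_positions = [1, None, 3, None, 2]
print(average_position(scan_positions))  # prints 2.0
```

Skipping unmentioned scans keeps this metric independent of Mention Rate: a brand that appears rarely but always first still averages position 1.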
Sentiment
For each mention, we classify the surrounding sentence as Positive ("a leading…"), Neutral ("offers basic features"), or Negative ("known for outages"). Sentiment shifts week-over-week are usually more meaningful than absolute numbers.
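To make the week-over-week comparison concrete, here is a small Python sketch that tallies labels for two weeks and reports how each share moved. The label strings, function names, and sample data are all hypothetical, and the classification step itself is not shown.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def sentiment_shares(labels: list[str]) -> dict[str, float]:
    """Fraction of mentions carrying each label (all zeros if there are none)."""
    total = len(labels)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    counts = Counter(labels)
    return {label: counts[label] / total for label in LABELS}


def weekly_shift(last_week: list[str], this_week: list[str]) -> dict[str, float]:
    """Change in each label's share from last week to this week."""
    before = sentiment_shares(last_week)
    after = sentiment_shares(this_week)
    return {label: after[label] - before[label] for label in LABELS}


# Made-up labels for four mentions per week.
last_week = ["positive", "neutral", "neutral", "negative"]
this_week = ["positive", "positive", "neutral", "negative"]
print(weekly_shift(last_week, this_week))
# prints {'positive': 0.25, 'neutral': -0.25, 'negative': 0.0}
```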
Citation Rate
Perplexity and Google AI Overviews cite URLs explicitly. We track what percentage of those citations point at your own domain (vs. competitors, vs. third-party reviews). High citation rate = your content is the source LLMs trust.
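One way to compute such a percentage is sketched below in Python. It assumes that subdomains such as `blog.example.com` count as your own domain, which this page does not specify. The function names and sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def is_own_domain(url: str, own_domain: str) -> bool:
    """True if the URL's host is own_domain or one of its subdomains.

    Assumption: subdomains are treated as part of your own domain.
    """
    host = (urlparse(url).hostname or "").lower()
    own = own_domain.lower()
    return host == own or host.endswith("." + own)


def citation_rate(cited_urls: list[str], own_domain: str) -> float:
    """Percentage of cited URLs that point at own_domain (0.0 if none are cited)."""
    if not cited_urls:
        return 0.0
    own = sum(is_own_domain(url, own_domain) for url in cited_urls)
    return 100 * own / len(cited_urls)


urls = [
    "https://www.example.com/pricing",
    "https://blog.example.com/guide",
    "https://reviews.example.org/best-tools",
    "https://competitor.example.net/",
]
print(citation_rate(urls, "example.com"))  # prints 50.0
```

Matching on the parsed hostname rather than on a substring avoids counting a URL like `https://notexample.com/` as your own.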
Why scores fluctuate day-to-day
LLM outputs aren't deterministic. Even with temperature 0, the same prompt can return different brand orderings, different mention sets, and slightly different sentiment from one scan to the next. Single-day swings of ±5 points are normal noise. Trust 7-day trends over single data points.
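To see why a 7-day trend is steadier than any single day, here is a minimal Python sketch of a trailing average. The daily scores are invented for illustration, and a plain trailing mean is an assumption; the dashboard's actual smoothing method is not described here.

```python
def rolling_mean(scores: list[float], window: int = 7) -> list[float]:
    """Trailing mean: entry i averages up to `window` scores ending at day i."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(scores)):
        start = max(0, i - window + 1)
        chunk = scores[start : i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Eight days of made-up scores that bounce around by a few points.
daily = [52, 57, 49, 54, 50, 56, 53, 58]
trend = rolling_mean(daily)
print(round(trend[-2], 1), round(trend[-1], 1))  # prints 53.0 53.9
```

In this example the last daily score jumps 5 points (53 to 58) while the 7-day average moves by less than 1.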
Improving your score
The Recommendations page surfaces ranked actions: write content targeting the prompts where you're missing, request citations from sources LLMs already trust, and fix factual errors in how AI describes you. Most score improvements take 2–6 weeks because AI training cutoffs lag the live web.
Want to see exactly what an LLM said about your brand? Open the Inspector page in your dashboard — it shows the raw response per scan.