How AI scoring works — Brandswarm Docs

Five metrics, five surfaces, one composite score. Here's exactly what each one measures and how we calculate it.

The 5 surfaces

For every prompt you track, we send the exact same question to each of:

ChatGPT (GPT-4o) — OpenAI's chat assistant.
Claude (Sonnet 4.5) — Anthropic's chat assistant.
Gemini (1.5 Pro) — Google's chat assistant.
Perplexity — search-grounded AI with explicit citations.
Google AI Overviews — the AI block at the top of Google search results.

We capture the raw response and parse it for your brand name, competitor brands, sentiment cues, and any cited URLs.

Visibility Score (0–100)

The headline number. It's a weighted composite of four components, with weights that sum to 100:

Mention Rate × 40 (how often you appear at all)
Average Position × 25 (when you appear, how prominent)
Sentiment × 20 (positive vs. neutral vs. negative tone)
Citation Rate × 15 (how often your own domain gets cited as a source)

A score of 90+ means you reliably show up first, with positive framing and your own site cited. Most newly tracked brands start in the 30–60 range. A score below 30 indicates a significant visibility gap.
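
To make the weighting concrete, here is a minimal sketch of how a composite like this could be computed. The weights come from the list above; everything else is an assumption for illustration, not our production formula. In particular, the sketch assumes each component is first scaled to a 0–1 range, and it assumes a simple linear mapping from average position to a 0–1 score (`position_score` and `max_position` are hypothetical names).

```python
# Weights taken from the list above; they sum to 100.
WEIGHTS = {"mention_rate": 40, "position": 25, "sentiment": 20, "citation_rate": 15}


def position_score(avg_position, max_position=10):
    """Assumed normalization: position 1 maps to 1.0, max_position or worse to 0.0."""
    if avg_position is None:  # brand never mentioned, so no position to score
        return 0.0
    clamped = min(max(avg_position, 1), max_position)
    return (max_position - clamped) / (max_position - 1)


def visibility_score(mention_rate, avg_position, sentiment, citation_rate):
    """Combine components (each assumed to be in 0..1, except avg_position) into 0..100."""
    components = {
        "mention_rate": mention_rate,
        "position": position_score(avg_position),
        "sentiment": sentiment,
        "citation_rate": citation_rate,
    }
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Mentioned in half of scans, always first, mixed sentiment, never cited.
print(visibility_score(0.5, 1, 0.5, 0.0))  # 55.0
```

Under these assumptions, a brand only reaches 100 when every component is at its maximum, which matches the description of a 90+ score above.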

Mention Rate

The percentage of scans (prompt × surface combinations) where your brand name appears in the response. If you track 5 prompts, that's 25 responses in total (5 prompts × 5 surfaces). If your brand appears in 3 of them, your mention rate is 3 ÷ 25 = 12%.
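
The arithmetic above can be sketched in a few lines. This is an illustration only: it assumes a mention is a case-insensitive substring match on the brand name, which is simpler than real parsing would need to be, and "Acme" is a made-up brand.

```python
def mention_rate(responses, brand):
    """Percentage of responses (one per prompt x surface scan) that name the brand."""
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if brand.lower() in text.lower())
    return 100 * hits / len(responses)


# 5 prompts x 5 surfaces = 25 responses, 3 of which mention the brand.
responses = ["Try Acme for this."] * 3 + ["Other tools are popular."] * 22
print(mention_rate(responses, "Acme"))  # 12.0
```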

Average Position

When your brand is mentioned, where does it fall in the order of brands named? Position 1 = first brand named (best). Position 5 = fifth (still visible, but it gets less attention). Scans where you aren't mentioned don't count toward the average.
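
As a rough sketch of the idea, position can be approximated by ordering brands by where each one first appears in the response, then averaging over the scans that mention you. Both function names and the first-occurrence rule are assumptions for illustration, and the brand names are invented.

```python
def brand_position(text, brand, all_brands):
    """Rank of `brand` among the tracked brands, ordered by first appearance in the text."""
    lowered = text.lower()
    first_seen = {name: lowered.find(name.lower()) for name in all_brands}
    mentioned = sorted((idx, name) for name, idx in first_seen.items() if idx >= 0)
    for rank, (_, name) in enumerate(mentioned, start=1):
        if name == brand:
            return rank
    return None  # brand not mentioned in this response


def average_position(positions):
    """Mean position over scans with a mention; unmentioned scans (None) are skipped."""
    ranked = [p for p in positions if p is not None]
    if not ranked:
        return None
    return sum(ranked) / len(ranked)


brands = ["Acme", "Globex", "Initech"]
print(brand_position("Top picks: Globex, Acme, and Initech.", "Acme", brands))  # 2
print(average_position([1, None, 3, 2, None]))  # 2.0
```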

Sentiment

For each mention, we classify the surrounding sentence as Positive ("a leading…"), Neutral ("offers basic features"), or Negative ("known for outages"). Sentiment shifts week-over-week are usually more meaningful than absolute numbers.
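
To show what a week-over-week shift might look like numerically, the sketch below maps each label to a number and compares weekly averages. The numeric values (1.0, 0.5, 0.0) are an assumption chosen for illustration; the classification itself is not shown.

```python
# Assumed numeric values for the three labels; illustrative only.
SENTIMENT_VALUES = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


def sentiment_score(labels):
    """Average sentiment over a list of per-mention labels, in the range 0..1."""
    if not labels:
        return None
    return sum(SENTIMENT_VALUES[label] for label in labels) / len(labels)


def weekly_shift(last_week, this_week):
    """Change in average sentiment between two weeks (positive means improving)."""
    prev, curr = sentiment_score(last_week), sentiment_score(this_week)
    if prev is None or curr is None:
        return None
    return curr - prev


print(weekly_shift(["neutral", "neutral"], ["positive", "neutral"]))  # 0.25
```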

Citation Rate

Perplexity and Google AI Overviews cite URLs explicitly. We track what percentage of those citations point to your own domain (as opposed to competitors or third-party reviews). A high citation rate means your content is the source LLMs trust.
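
A minimal sketch of the calculation, assuming a citation counts as yours when its hostname is your domain or one of its subdomains. The function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_own_domain(url, domain):
    """True if the URL's host is `domain` itself or any subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def citation_rate(cited_urls, domain):
    """Percentage of cited URLs that point to your own domain."""
    if not cited_urls:
        return 0.0
    own = sum(1 for url in cited_urls if is_own_domain(url, domain))
    return 100 * own / len(cited_urls)


cited = [
    "https://www.example.com/pricing",
    "https://example.com/blog/guide",
    "https://reviews.example.org/top-tools",
    "https://notexample.com/",
]
print(citation_rate(cited, "example.com"))  # 50.0
```

Matching on the parsed hostname rather than a raw substring keeps look-alike domains such as notexample.com from being counted as yours.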

Why scores fluctuate day-to-day

LLM outputs aren't deterministic. Even with temperature 0, the same prompt can return different brand orderings, different mention sets, and slightly different sentiment from one scan to the next. Single-day swings of ±5 points are normal noise. Trust 7-day trends over single data points.
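
If you export daily scores, one simple way to read the trend rather than the noise is a trailing 7-day average. The sketch below is an illustration under that assumption; the scores are made-up numbers, and the first few days average over whatever history is available.

```python
def trailing_mean(daily_scores, window=7):
    """Trailing average over the last `window` days (shorter at the start of the series)."""
    smoothed = []
    for i in range(len(daily_scores)):
        chunk = daily_scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


daily = [60, 65, 58, 62, 55, 64, 61, 63]  # made-up daily Visibility Scores
for day, (raw, smooth) in enumerate(zip(daily, trailing_mean(daily)), start=1):
    print(f"day {day}: raw={raw} 7-day={smooth:.1f}")
```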

Improving your score

The Recommendations page surfaces ranked actions: write content targeting the prompts where you're missing, request citations from sources LLMs already trust, and fix factual errors in how AI describes you. Most score improvements take 2–6 weeks because AI training cutoffs lag the live web.