Methodology

The CiteLayer Index

Five structural signals that, in published AI research, correlate with citation probability in generative answer engines. Each dimension maps to a documented stage in the Retrieval-Augmented Generation (RAG) pipeline that powers ChatGPT, Perplexity, Google AI Overviews, and Claude.

Why This Matters Now

AI-referred website sessions grew 527% year-over-year in 2025. Traditional search volume is projected to drop 25% by 2026 and 50% by 2028. The question isn’t whether AI answer engines will replace search — it’s whether your business is structured to be cited when they do.

The CiteLayer Index doesn’t measure SEO. It measures whether AI systems can find, understand, extract, compare, and recommend your business when someone asks a relevant question. These are different problems with different solutions.

527%: YoY growth in AI-referred website sessions (2025)
88%: of AI citations come from non-Google-page-one sources
2.8x: higher citation rate for pages with structured data

How AI Selects Sources

Every major AI answer engine uses Retrieval-Augmented Generation (RAG) — a two-stage process where the system first retrieves relevant documents from an index, then synthesizes an answer citing the sources it relied on. The CiteLayer Index measures readiness at each stage of this pipeline.

01 Crawl: Findability
02 Parse: Describability
03 Chunk: Summarizability
04 Retrieve: Comparability
05 Cite: Recommendability
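The two-stage process above can be sketched in miniature. The keyword-overlap scorer, toy corpus, and citation format below are illustrative assumptions, not any engine's actual implementation:

```python
# Minimal sketch of a two-stage RAG pipeline: retrieve, then synthesize
# an answer citing the sources used. All data here is placeholder.

def retrieve(query, corpus, k=2):
    """Stage 1: rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc["text"].lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def synthesize(query, sources):
    """Stage 2: compose an answer that cites the sources it relied on."""
    citations = ", ".join(doc["url"] for doc in sources)
    return f"Answer to {query!r} drawing on: {citations}"

corpus = [
    {"url": "https://example.com/a", "text": "structured data helps AI citation"},
    {"url": "https://example.com/b", "text": "gardening tips for spring"},
]
sources = retrieve("how does structured data affect AI citation", corpus)
print(synthesize("how does structured data affect AI citation", sources))
```

Pages that never make it into the stage-1 candidate set can never be cited in stage 2, which is why the Index starts with findability.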

Five Dimensions, Five Pipeline Stages

🔍 Findability

Can AI crawlers access and index your content?

Maps to the RAG ingestion stage. If GPTBot, ClaudeBot, or PerplexityBot is blocked in robots.txt — or if your content requires JavaScript to render — nothing downstream matters. These AI crawlers generally do not execute JavaScript, so content must be visible in raw HTML.

We check: crawler permissions (robots.txt), redirect chains, sitemap presence, server-side rendering, and content accessibility without JavaScript execution.

Research basis: 60% of ChatGPT queries are answered from parametric knowledge alone — but the other 40% rely on real-time retrieval. If your pages aren’t crawlable, you’re excluded from that 40% entirely (Princeton GEO study).
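One way to run the robots.txt portion of this check with Python's standard library. The bot names come from the text above; the example rules and URL are placeholders:

```python
# Check whether the major AI crawlers may fetch a given page,
# using only stdlib urllib.robotparser.
from urllib import robotparser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

def parse_ai_permissions(robots_txt, page_url):
    """Return {bot: allowed?} for each AI crawler user agent."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, page_url) for bot in AI_BOTS}

# Placeholder robots.txt: blocks GPTBot, allows everyone else.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(parse_ai_permissions(rules, "https://example.com/services"))
```

In real use you would fetch the live robots.txt (e.g. via `RobotFileParser.set_url` plus `read()`) rather than passing a string.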
🏷️ Describability

Does AI have structured data to understand what you do?

Maps to the knowledge graph construction stage. AI systems build internal entity representations from structured data — Organization schema, LocalBusiness schema, FAQPage markup, product/service attributes. Without this, AI guesses what you offer — or cites a competitor who made it explicit.

We check: JSON-LD schema markup, entity consistency, business attribute completeness (hours, location, services, reviews), and whether schema is server-side rendered.

Research basis: A Relixir study of 50 sites found FAQPage schema produced 2.7x higher citation rates (41% vs. 15%). Microsoft confirmed at SMX Munich (March 2025) that schema markup directly helps LLMs understand content.
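A minimal sketch of the kind of server-side-rendered JSON-LD this check looks for. The FAQPage, Question, and Answer types are real schema.org vocabulary; the business details are placeholders:

```python
# Build FAQPage JSON-LD and wrap it in the script tag that belongs
# in server-rendered HTML. Field values are illustrative only.
import json

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What services do you offer?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "We offer X, Y, and Z, serving the Example City area.",
            },
        }
    ],
}

snippet = (
    '<script type="application/ld+json">'
    + json.dumps(faq_schema)
    + "</script>"
)
print(snippet)
```

The same pattern applies to Organization and LocalBusiness markup; the key requirement from the check above is that the tag appears in the raw HTML, not injected client-side.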
📄 Summarizability

Can AI extract clean, citable passages from your content?

Maps to the RAG chunking and extraction stage. AI systems break pages into segments and evaluate each for relevance. Content structured as modular, self-contained sections (200-500 words each) with clear headings and direct answers in the first paragraph scores highest.

We check: first-paragraph answer density, heading structure (questions vs. vague labels), content modularity, FAQ presence, and overall extractability.

Research basis: NVIDIA’s chunking research found page-level chunking achieves 0.648 accuracy with the lowest variance. The Princeton GEO study showed well-designed content optimization boosts source visibility by up to 40%. Pages with first-answer paragraphs under 40 words generated 67% more AI citations.
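A rough illustration of heading-based chunking with a word cap. The "#" heading convention and the 500-word cap are assumptions for this sketch, not NVIDIA's method:

```python
# Split a page into heading-delimited sections, then cap each chunk
# at max_words so every chunk stays a self-contained passage.

def chunk_page(text, max_words=500):
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            sections.append(" ".join(current))  # close previous section
            current = []
        current.append(line.lstrip("# ").strip())
    if current:
        sections.append(" ".join(current))

    chunks = []
    for section in sections:
        words = section.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks

page = "# Pricing\nPlans start at $49/month.\n# FAQ\nYes, we offer refunds."
print(chunk_page(page))
```

Content already written as modular sections with one direct answer per heading survives this kind of segmentation with its meaning intact; sprawling undifferentiated prose does not.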
⚖️ Comparability

Does AI have enough data to compare you against alternatives?

Maps to the retrieval and ranking stage. When a user asks “best X in Y,” AI must compare entities across consistent attributes. Businesses with structured, explicit differentiation data — pricing, specialties, service areas, unique value — win the comparison. Those without it lose by default.

We check: brand entity consistency across platforms, sameAs profile links, category identification, contact detail consistency, and cross-platform presence.

Research basis: Rand Fishkin’s study (2,961 prompts across ChatGPT, Claude, and Google AI) found entities present across knowledge graphs, document indices, AND concept spaces are chosen “far more reliably.” Brands on 4+ platforms are 2.8x more likely to appear in ChatGPT responses.
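A toy version of the entity-consistency portion of this check. The compared fields and platform records are invented for illustration:

```python
# Flag business attributes whose values disagree across platform
# profiles (website, Google Business Profile, social pages, etc.).

def inconsistent_fields(profiles, fields=("name", "phone", "category")):
    """Return the fields whose values differ across platform records."""
    return [
        f for f in fields
        if len({p.get(f) for p in profiles.values()}) > 1
    ]

profiles = {
    "website":  {"name": "Acme Dental", "phone": "555-0100", "category": "Dentist"},
    "gbp":      {"name": "Acme Dental", "phone": "555-0100", "category": "Dentist"},
    "facebook": {"name": "Acme Dental Clinic", "phone": "555-0100", "category": "Dentist"},
}
print(inconsistent_fields(profiles))  # flags the mismatched business name
```

A mismatched name or category across platforms is exactly the kind of inconsistency that fragments an entity across knowledge graphs and weakens comparison-time retrieval.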

Recommendability

Does AI have trust signals to justify citing you by name?

Maps to the citation and attribution stage. AI systems need verifiable trust signals — third-party mentions, review data, content depth, freshness signals — to justify recommending a specific entity. Without these, AI defaults to safer, more documented alternatives.

We check: third-party credibility signals, review data accessibility, content depth and freshness, competitive positioning signals, and citation history across AI platforms.

Research basis: Brand search volume has a 0.334 correlation with LLM citations — the strongest single predictor. Adding statistics to content boosts visibility by 22%. Adding quotations: 37%. Semantic completeness shows 0.87 correlation with citation inclusion.

Scoring Methodology

Each dimension is scored 0-10 based on automated checks that evaluate structural signals. The five dimension scores sum to the CiteLayer AI Score (0-50). Letter grades translate the composite score into an at-a-glance assessment.

Grade | Score Range | What It Means
A+ / A / A- | 42-50 | AI systems can reliably find, describe, extract, compare, and recommend your business.
B+ / B / B- | 33-41 | Most structural signals are in place. Targeted improvements will close remaining gaps.
C+ / C / C- | 24-32 | Partial visibility. AI can find you but lacks enough data to consistently recommend you.
D+ / D / D- | 15-23 | Significant structural gaps. AI defaults to competitors with better-structured data.
F | 0-14 | Structurally invisible to AI answer engines regardless of traditional SEO performance.
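The score-to-grade mapping follows directly from the bands above. This sketch returns only the band letter, leaving the +/- refinement (which the table groups) unspecified:

```python
# Map five 0-10 dimension scores to a composite score and grade band.
BANDS = [(42, "A"), (33, "B"), (24, "C"), (15, "D"), (0, "F")]

def citelayer_grade(dimension_scores):
    """Sum the five dimension scores and return (band letter, total)."""
    assert len(dimension_scores) == 5
    total = sum(dimension_scores)
    for floor, letter in BANDS:
        if total >= floor:
            return letter, total
    return "F", total

print(citelayer_grade([9, 8, 7, 9, 9]))
```

Because the composite is a plain sum, a single weak dimension (say, blocked crawlers dropping Findability to 2) can pull an otherwise strong site down a full band.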

What We Don’t Measure (And Why)

The CiteLayer Index deliberately excludes traditional SEO metrics like keyword rankings, backlink profiles, and domain authority. These are valuable for Google Search — but AI answer engines use a fundamentally different selection process. A page can rank #1 on Google and be completely invisible to ChatGPT if it lacks structured data, isn’t crawlable by AI bots, or can’t be cleanly extracted into a citable passage.

We also don’t claim to predict exact AI responses. Rand Fishkin’s research showed fewer than 1 in 100 identical prompts produced the same brand list from AI systems. What we measure is structural readiness — whether your content has the signals that make citation possible and probable, not guaranteed.

Research Foundation

1. Princeton, Georgia Tech, Allen AI, IIT Delhi — published at KDD 2024. Foundational paper defining generative engine optimization. Well-designed optimizations boost source visibility by up to 40%.
2. Rand Fishkin / SparkToro — 2,961 prompts across ChatGPT, Claude, and Google AI. Multi-representation entities are chosen far more reliably.
3. NVIDIA Research — page-level chunking achieves 0.648 accuracy with the lowest variance across document types. Validates modular content architecture.
4. AirOps — brand search volume has a 0.334 correlation with LLM citations, the strongest single predictor. Cross-platform presence increases citation 2.8x.
5. Search Engine Land — visibility percentage across multiple runs is statistically meaningful. Defines recommendation share as a measurable KPI.
6. Frase — comprehensive analysis of GEO factors. Adding statistics: +22% visibility. Adding quotations: +37% visibility. Semantic completeness: 0.87 correlation.
7. Search Atlas — platform-specific citation patterns across ChatGPT, Perplexity, and Gemini. Wikipedia: 47.9% of ChatGPT citations. Reddit: 46.7% of Perplexity citations.

The CiteLayer Index measures structural signals correlated with AI citation in published research. It does not guarantee citation by any specific AI platform. AI system responses vary by query, region, index state, and model version. The methodology is updated as new research emerges. Signal assessment is point-in-time as of scan date.

CiteLayer AI does not claim authorship of the underlying research. We operationalize published findings into a diagnostic framework. All research sources are cited and linked above.