Search Marketing FAQ
Concise answers to the most common questions relevant to SEO, GEO, CRO, and PPC. Filter by discipline, platform, and topic. Cortex references its corpus of platform-published best practices to draft each answer, with citations linking back to the source documents.
How do I block ClaudeBot?
Tags: GEO, SEO, Claude, AI Overviews & Citations, Robots.txt
Add 'User-agent: ClaudeBot' followed by 'Disallow: /' to robots.txt. ClaudeBot is Anthropic's training crawler. Note: blocking ClaudeBot does not affect Claude's web search feature, which uses real-time retrieval rather than training data. Most sites benefit from allowing both the training crawler and the retrieval agents for maximum AI engine visibility.
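A minimal sketch of how the rule above can be checked before deploying, using Python's standard-library robots.txt parser. The robots.txt content and the example.com URL are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block ClaudeBot site-wide, leave other agents alone.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


claudebot_ok = is_allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/blog/post")
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post")
print(claudebot_ok, googlebot_ok)
```

This only confirms what the file says; it does not prove that a given crawler obeys it.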
in llms.txt & AI Bot Management
How do I block PerplexityBot?
Tags: GEO, SEO, Perplexity, AI Overviews & Citations, Robots.txt
Add 'User-agent: PerplexityBot' followed by 'Disallow: /' to robots.txt. Note: Perplexity has been accused of ignoring robots.txt in some cases. For stronger blocking, add server-level or Cloudflare rules. Blocking PerplexityBot reduces Perplexity's ability to cite your site, which lowers your citation share in its answers.
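One way a server-level block could look, sketched as a small WSGI middleware in Python. The blocklist, the `block_ai_bots` wrapper, and the demo app are hypothetical names for illustration; an Nginx, Apache, or Cloudflare rule would achieve the same result at a different layer.

```python
from wsgiref.util import setup_testing_defaults

BLOCKED_AGENT_TOKENS = ("PerplexityBot",)  # hypothetical blocklist


def block_ai_bots(app, blocked_tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""
    lowered = tuple(token.lower() for token in blocked_tokens)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(app, user_agent):
    """Call a WSGI app in-process and return the status line it sent."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    list(app(environ, start_response))  # consume the response body
    return captured["status"]


protected = block_ai_bots(hello_app)
print(status_for(protected, "Mozilla/5.0 (compatible; PerplexityBot/1.0)"))
print(status_for(protected, "Mozilla/5.0 (Windows NT 10.0)"))
```

User-agent matching only stops crawlers that identify themselves honestly; a crawler that sends a browser user agent would need IP-based or behavioral rules instead.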
in llms.txt & AI Bot Management
How do I block Bytespider?
Tags: GEO, SEO, Crawl Efficiency, Robots.txt
Add 'User-agent: Bytespider' followed by 'Disallow: /' to robots.txt. Bytespider is the AI training crawler of ByteDance (TikTok's parent company) and is known for aggressive crawling. Many sites block it to reduce server load. Cloudflare AI Audit also offers one-click Bytespider blocking. Bytespider respects robots.txt directives in most cases but not reliably, so pair the robots.txt rule with a server-level or Cloudflare block when server load is the concern.
in llms.txt & AI Bot Management
What is Cloudflare AI Audit?
Tags: GEO, SEO, Reporting & KPIs
Cloudflare AI Audit (since renamed AI Crawl Control) is a free dashboard that shows which AI bots crawl your site and how much data they pull, and offers one-click controls to block or allow specific bots. It is available to all Cloudflare-fronted sites and combines bot identification, traffic analytics, and policy controls. For sites already on Cloudflare, it is one of the most actionable AI bot management tools as of 2026.
in llms.txt & AI Bot Management
How does Cloudflare AI Audit help manage AI bot traffic?
Tags: GEO, SEO, Perplexity, Claude, AI Overviews & Citations, Reporting & KPIs, Robots.txt
Three capabilities. Identify: shows AI bot traffic by user agent (GPTBot, ClaudeBot, PerplexityBot, etc.). Quantify: shows data pulled per bot, with a page-level breakdown. Control: one-click allow/block per bot or per URL pattern. Supplements manual robots.txt management with a visual dashboard and edge-level enforcement. Available to all Cloudflare-fronted sites at no additional cost.
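For sites that are not behind Cloudflare, the Identify and Quantify steps can be approximated from server logs. The sketch below assumes combined-format access logs; the sample lines, IP addresses, and simplified user-agent strings are made up for illustration.

```python
import re
from collections import defaultdict

AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Pulls the request path, status, byte count, and user agent out of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_ai_bots(lines):
    """Return {bot: {"requests": n, "bytes": total}} for the known AI bots."""
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines in an unexpected format
        agent = match.group("agent").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in agent:
                size = match.group("bytes")
                totals[token]["requests"] += 1
                totals[token]["bytes"] += 0 if size == "-" else int(size)
                break
    return dict(totals)


# Made-up sample data using documentation-reserved IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET /blog/a HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2026:00:00:02 +0000] "GET /blog/b HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2026:00:00:03 +0000] "GET /pricing HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [01/Jan/2026:00:00:04 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

summary = summarize_ai_bots(SAMPLE_LOG)
print(summary)
```

Extending `AI_BOT_TOKENS` is enough to track further crawlers; grouping by `path` instead of bot would give the page-level view.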
in llms.txt & AI Bot Management
Can Cloudflare block AI crawlers automatically?
Tags: GEO, SEO, Robots.txt
Yes. Cloudflare offers a one-click 'Block AI Bots' setting that blocks known AI training crawlers while preserving access for search engines and legitimate visitors. It is available on all plans, including Free. Because the block is enforced at Cloudflare's edge rather than left to the bot, it also works against crawlers that ignore robots.txt. For non-technical site owners, it is the strongest AI bot control mechanism.
in llms.txt & AI Bot Management
Which AI bots respect robots.txt?
Tags: GEO, SEO, Perplexity, Claude, AI Overviews & Citations, Crawl Efficiency, Robots.txt
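Reputable AI crawlers respect robots.txt: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot (mostly), and Bytespider (mostly). Google-Extended is also honored, but it is a robots.txt control token rather than a separate crawler: it tells Google not to use content fetched by its existing crawlers for Gemini. Less reputable scrapers may ignore robots.txt entirely. Recent investigations have found Perplexity and others occasionally ignoring directives. For strict control, layer Cloudflare or server-level blocks behind robots.txt as defense in depth.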
in llms.txt & AI Bot Management
Do AI bots ignore robots.txt?
Tags: GEO, SEO, Perplexity, AI Overviews & Citations, Reporting & KPIs, Robots.txt
Most reputable bots respect it. Some have been documented ignoring it in specific cases (Perplexity, Anthropic, OpenAI have all faced reporting on this). The reliability of robots.txt as the only AI bot control mechanism has weakened in 2024-2026. Combine robots.txt with server-level or Cloudflare-level enforcement for stronger guarantees.
in llms.txt & AI Bot Management
What is the best way to control AI crawling: robots.txt, server rules, or Cloudflare?
Tags: GEO, SEO, Robots.txt
Three-layer stack. robots.txt for low-effort baseline (respected by most reputable bots). Server-level rules (Nginx, Apache config) for stricter enforcement. Cloudflare AI Bot Management for visual control and one-click policy updates. Most sites use robots.txt + Cloudflare; enterprise sites add server rules. Pick by site complexity and risk tolerance.
in llms.txt & AI Bot Management
What is the difference between crawling, indexing, and training for AI bots?
Tags: GEO, SEO, ChatGPT, Perplexity, Gemini, AI Overviews & Citations
Crawling: a bot fetches your pages. Indexing: the fetched content is stored for retrieval. Training: the content is used to train the AI model. Search-enabled AI engines (ChatGPT with search, Perplexity, Gemini with grounding) answer from real-time retrieval (crawl + index) in addition to what the model learned in training. Blocking a training crawler (GPTBot) doesn't block the corresponding retrieval crawler (OAI-SearchBot).
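The training-versus-retrieval split can be expressed as a small policy table that generates robots.txt. This is a sketch under the assumption that only the two OpenAI crawlers named above are being managed; `CRAWLER_PURPOSE` and `build_robots_txt` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Purpose labels follow the answer above; extend the mapping for other bots.
CRAWLER_PURPOSE = {
    "GPTBot": "training",
    "OAI-SearchBot": "retrieval",
}


def build_robots_txt(purposes, blocked_purposes):
    """Emit one robots.txt group per crawler, disallowing blocked purposes."""
    groups = []
    for agent, purpose in purposes.items():
        rule = "Disallow: /" if purpose in blocked_purposes else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(CRAWLER_PURPOSE, blocked_purposes={"training"})
print(robots_txt)

# Confirm the generated file does what the policy intends.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
gptbot_ok = parser.can_fetch("GPTBot", "https://example.com/guide")
searchbot_ok = parser.can_fetch("OAI-SearchBot", "https://example.com/guide")
print(gptbot_ok, searchbot_ok)
```

Keeping the policy in one table makes it harder to block a retrieval crawler by accident when the intent was only to opt out of training.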
in llms.txt & AI Bot Management
How do I make content discoverable to AI bots without exposing everything?
Tags: GEO, SEO, Structured Data / Schema, AI Overviews & Citations, Robots.txt
Block AI bots from sensitive sections (admin, internal tools, pricing pages) via robots.txt. Because robots.txt is advisory and publicly readable, protect anything truly private with authentication as well. Allow AI bots on public content. Use llms.txt to highlight specific high-value URLs. Add Schema.org structured data to extractable content. The goal is selective visibility: AI engines see what you want them to cite, not your entire site.
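A sketch of what selective rules could look like, verified with Python's standard-library parser. The paths (/admin/, /internal/, /pricing/) and the example.com URLs are hypothetical stand-ins for a site's real sections.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical selective policy: two AI crawlers share one rule group that
# closes off three sections and leaves everything else open.
SELECTIVE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /admin/
Disallow: /internal/
Disallow: /pricing/
"""

parser = RobotFileParser()
parser.parse(SELECTIVE_ROBOTS_TXT.splitlines())

checks = {
    path: parser.can_fetch("GPTBot", f"https://example.com{path}")
    for path in ("/admin/login", "/pricing/enterprise", "/blog/launch-post")
}
for path, allowed in checks.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```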
in llms.txt & AI Bot Management
What are the risks of blocking AI bots?
Tags: GEO, SEO, AI Overviews & Citations
Three risks. Lost visibility in AI engine citations. Lost AI search referrals (limited today but growing). Missing out on future indexing if AI search becomes a major traffic source. Most sites should err on the side of allowing AI bots unless server costs or content theft are real concerns. Test the impact by monitoring AI bot crawl volume against citation share before blocking.
in llms.txt & AI Bot Management
How do I test whether GPTBot, ClaudeBot, or PerplexityBot can access my site?
Tags: GEO, SEO, Perplexity, Claude, AI Overviews & Citations, Robots.txt
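Use curl with the bot's user agent: 'curl -A "GPTBot" -I https://yoursite.com/some-page'. Check the response code: 200 means the server let the request through, and 403 usually means a server-level or CDN block. A 404 means the page was not found, although some firewalls return it to hide blocked content. curl does not read robots.txt, so check those rules separately with a robots.txt parser or testing tool that accepts a custom user agent. A spoofed user agent also will not reproduce rules keyed to the bot's IP address. Verify both robots.txt rules and any server-level blocks (Cloudflare, Nginx) before assuming access state.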
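The same check can be scripted. The sketch below sends a HEAD request with a chosen user agent, mirroring the curl command; the local demo server that rejects GPTBot is a stand-in so the example runs without touching a real site, and `probe_status` is a hypothetical helper name.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def probe_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send a HEAD request with a custom User-Agent and return the status code."""
    parts = urlsplit(url)
    connection_cls = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    connection = connection_cls(parts.netloc, timeout=timeout)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        connection.request("HEAD", target, headers={"User-Agent": user_agent})
        return connection.getresponse().status
    finally:
        connection.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in server for the demo: refuses anything identifying as GPTBot."""

    def do_HEAD(self):
        blocked = "gptbot" in self.headers.get("User-Agent", "").lower()
        self.send_response(403 if blocked else 200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Start the demo server on a free local port, probe it, then shut it down.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
demo_url = f"http://127.0.0.1:{server.server_port}/some-page"

gptbot_status = probe_status(demo_url, "GPTBot")
browser_status = probe_status(demo_url, "Mozilla/5.0")
server.shutdown()
server.server_close()
print(gptbot_status, browser_status)
```

Against a real site, call `probe_status` with the page URL and each bot's user agent, and treat the result as a test of server-level rules only.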
in llms.txt & AI Bot Management
What is the best practice for AI bot management on a modern website?
Tags: GEO, SEO, Perplexity, Claude, AI Overviews & Citations, Robots.txt
Five-step setup. Publish robots.txt with selective AI bot rules. Publish llms.txt with a clean site summary. Enable Cloudflare AI Bot Management or an equivalent for visibility. Monitor AI bot traffic monthly via server logs or Cloudflare AI Audit. Block aggressive or training-only crawlers if that is a concern (Bytespider, GPTBot), and allow citation bots (OAI-SearchBot, PerplexityBot). ClaudeBot and Google-Extended control training use rather than search citations, so allow or block them according to your training policy.
in llms.txt & AI Bot Management
Is RDFa still used for structured data, and how does it compare to JSON-LD?
Tags: SEO, Structured Data / Schema
RDFa is rarely used today. JSON-LD is the dominant format (and the one Google recommends) because it lives in a separate script tag and doesn't pollute HTML markup. RDFa inlines structured data into HTML attributes, which makes it harder to maintain and prone to breakage when designers edit markup. Migrate any RDFa or microdata to JSON-LD as a one-time cleanup.
in Schema Formats & Validation
Which schema format is best for SEO: JSON-LD, microdata, or RDFa?
Tags: SEO, Structured Data / Schema
JSON-LD. It is Google's recommended format, is cleanly separated from HTML markup, is easier to maintain, and is supported by the major search engines. Microdata and RDFa still validate but offer no advantage. The migration cost is one-time; the maintenance benefit is permanent. Standardize on JSON-LD across the site.
in Schema Formats & Validation
Should I use JSON-LD, microdata, or RDFa for structured data?
Tags: SEO, Structured Data / Schema
JSON-LD. Three reasons. Recommended by Google. Cleaner separation from HTML (no markup pollution). Easier to maintain (one script block per page, not scattered attributes). Use JSON-LD for all new structured data. Migrate legacy microdata or RDFa during scheduled refactors, not as urgent cleanup.
in Schema Formats & Validation
Where should JSON-LD be placed in the HTML document?
Tags: SEO, Structured Data / Schema
Inside <script type='application/ld+json'> tags. Placement: head or body both work for Google. Head placement is conventional and keeps the markup near the top of the document, where it is easy to find. For dynamic JSON-LD injected via JavaScript, ensure it is in the DOM by the time the crawler finishes rendering the page. Multiple JSON-LD blocks per page are allowed.
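A minimal sketch of rendering a JSON-LD block on the server so it is present in the initial HTML. The `json_ld_script` helper and the Article values are hypothetical; the returned string can be placed in the head or body of a template.

```python
import json


def json_ld_script(data: dict) -> str:
    """Serialize data as a JSON-LD <script> block for server-side rendering."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so text inside the data cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so parsers read the same value back.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical Article data for illustration.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2026-01-15",
}

snippet = json_ld_script(article)
print(snippet)
```

Building the block from a dictionary rather than a hand-written string avoids the quoting and trailing-comma mistakes that break JSON-LD most often.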
in Schema Formats & Validation
Can structured data be added in both the <head> and <body>?
Tags: SEO, Google, Structured Data / Schema, SERP Features
Yes. Google parses JSON-LD from anywhere in the document. Convention is head for static schemas; body for content-specific schemas (Article schema near the article, FAQPage near the FAQ). Multiple JSON-LD blocks on one page are fine. Tools like Google Rich Results Test detect all blocks regardless of placement.
in Schema Formats & Validation
Can structured data be added dynamically with JavaScript?
Tags: SEO, Structured Data / Schema
Yes, with caveats. Google's Web Rendering Service renders JavaScript and detects dynamically injected JSON-LD. But injection delays visibility, and some bots (including AI engines without full rendering) may miss it. Best practice: server-render JSON-LD when possible. Dynamic injection works but adds risk for partial-rendering crawlers.
in Schema Formats & Validation
How do I validate Schema.org structured data before publishing?
Tags: SEO, Google, Structured Data / Schema, SERP Features
Two validators. Schema.org Validator (validator.schema.org): generic syntax and semantic checks. Google Rich Results Test (search.google.com/test/rich-results): Google-specific rich result eligibility. Google's older Structured Data Testing Tool (SDTT) has been retired in favor of the Schema.org Validator; it was a web tool, not a CLI, so bulk testing needs a script or crawler. Validate before deploying. Re-validate after deploy to catch rendering issues.
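For bulk pre-checks, a local script can at least confirm that every JSON-LD block on a page parses as JSON before the page goes to the online validators. This is a sketch using only the Python standard library; `JsonLdExtractor`, `check_json_ld`, and the sample HTML are hypothetical, and the check covers JSON syntax only, not Schema.org vocabulary or rich result eligibility.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def check_json_ld(html: str):
    """Return (parsed_blocks, errors) for the JSON-LD found in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    parsed, errors = [], []
    for index, raw in enumerate(extractor.blocks):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as err:
            errors.append(f"block {index}: {err.msg} at line {err.lineno}")
    return parsed, errors


# Sample page: the second block has a trailing comma, which is invalid JSON.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article", "headline": "Hi"}</script>
<script type="application/ld+json">{"@type": "FAQPage",}</script>
</head><body></body></html>
"""

parsed_blocks, parse_errors = check_json_ld(SAMPLE_HTML)
print(len(parsed_blocks), "valid block(s);", parse_errors)
```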
in Schema Formats & Validation
What is the Google Rich Results Test used for?
Tags: SEO, Google, Structured Data / Schema, SERP Features
Tests whether a page is eligible for Google's rich result features (FAQ snippets, breadcrumbs, product cards, recipe cards, etc.). Reports which structured data types Google detected, errors that block eligibility, and warnings for missing recommended properties. Run before publishing any page with structured data. Available free at search.google.com/test/rich-results.
in Schema Formats & Validation
How can I check whether Google has detected my structured data?
Tags: SEO, Google, Structured Data / Schema, Analytics & Tracking, Indexing, SERP Features
Three sources. Google Search Console -> Enhancements reports (per schema type: FAQs, Articles, Products, etc.). Rich Results Test on individual URLs. URL Inspection tool in GSC for live structured data view. GSC Enhancement reports are the definitive site-wide view; Rich Results Test is for one-off checks. Check weekly during rollout, monthly thereafter.
in Schema Formats & Validation
What are the most common structured data errors?
Tags: SEO, Structured Data / Schema
Seven common errors. Missing required properties (name and image for Product; mainEntity with Question and acceptedAnswer entries for FAQPage). Wrong @type for the content. Invalid date formats (use ISO 8601). Image URLs that are not absolute. Schema that doesn't match visible content (the markup says one thing, the page shows another). Multiple Organization schemas on one page. AggregateRating without reviews.
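Several of these errors can be caught mechanically before a page ships. The sketch below checks three of them (missing required properties, non-ISO dates, relative image URLs) on an already-parsed schema object. The `REQUIRED` table follows the examples in the answer above rather than any authoritative list, and `lint_schema` is a hypothetical helper.

```python
from datetime import date, datetime
from urllib.parse import urlsplit

# Assumed per-type requirements for this sketch, taken from the answer above;
# check the relevant search engine documentation for the authoritative list.
REQUIRED = {"Product": ("name", "image"), "FAQPage": ("mainEntity",)}
DATE_FIELDS = ("datePublished", "dateModified")


def is_iso_8601(value) -> bool:
    """Accept the ISO 8601 date and datetime forms Python can parse."""
    for parse in (date.fromisoformat, datetime.fromisoformat):
        try:
            parse(value)
            return True
        except (TypeError, ValueError):
            continue
    return False


def is_absolute_url(value: str) -> bool:
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def lint_schema(item: dict) -> list:
    """Return a list of human-readable problems found in one schema object."""
    problems = []
    schema_type = item.get("@type")
    required = REQUIRED.get(schema_type, ()) if isinstance(schema_type, str) else ()
    for prop in required:
        if not item.get(prop):
            problems.append(f"missing required property: {prop}")
    for field in DATE_FIELDS:
        if field in item and not is_iso_8601(item[field]):
            problems.append(f"invalid date in {field}: {item[field]!r}")
    image = item.get("image")
    images = image if isinstance(image, list) else [image] if image else []
    for url in images:
        # Only plain string URLs are checked; ImageObject dicts are skipped.
        if isinstance(url, str) and not is_absolute_url(url):
            problems.append(f"image URL is not absolute: {url!r}")
    return problems


# Hypothetical Product markup with two deliberate mistakes.
product = {
    "@type": "Product",
    "name": "Widget",
    "image": "/img/widget.png",
    "datePublished": "01/15/2026",
}
product_problems = lint_schema(product)
for problem in product_problems:
    print(problem)
```

Checks that depend on the rendered page, such as schema that doesn't match visible content, still need a manual review or a rendering-aware tool.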
in Schema Formats & Validation