Perplexity is built as a retrieval-first AI search engine in which the model acts as a synthesis layer over retrieved evidence rather than as the source of truth, so earning a citation depends on a page being both retrievable and cleanly extractable.
PerplexityBot. Perplexity runs two distinct agents, and the difference decides whether a site is reachable. PerplexityBot is the indexing crawler, independent of Bing, Google, and Brave, that builds the search index Perplexity links to; it obeys robots.txt, so a Disallow keeps your pages out of that index. Perplexity-User is a separate, user-triggered fetcher that visits a page in real time when someone asks a question, and because a person requested the fetch, Perplexity states that it generally ignores robots.txt. The practical takeaway is that blocking PerplexityBot removes you from proactive retrieval, even though a live user fetch can still reach you. For GEO, you want PerplexityBot explicitly allowed in robots.txt and permitted through any WAF or IP filtering, because being in the index is what makes you eligible for citation instead of dependent on a one-off live fetch.
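To check how a robots.txt file treats PerplexityBot, Python's standard-library `urllib.robotparser` can evaluate the rules the way a compliant crawler would. The sketch below compares two hypothetical robots.txt files for a placeholder domain, example.com, and `SomeOtherBot` is an invented user agent. It models robots.txt compliance only: it says nothing about WAF or IP rules, and it does not describe Perplexity-User, which by Perplexity's account generally ignores these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts PerplexityBot out of the whole site
BLOCKING_ROBOTS = """\
User-agent: PerplexityBot
Disallow: /
"""

# Hypothetical robots.txt that explicitly allows PerplexityBot,
# while other crawlers are kept out of /private/
ALLOWING_ROBOTS = """\
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


for name, rules in [("blocking", BLOCKING_ROBOTS), ("allowing", ALLOWING_ROBOTS)]:
    print(name, is_allowed(rules, "PerplexityBot", "https://example.com/guide"))
```

To audit a live site instead, `RobotFileParser.set_url()` followed by `read()` loads the real file, after which the same `can_fetch()` call catches an accidental Disallow.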
Multi-source retrieval. Perplexity does not rely on a single retrieval backend: for a given query it searches its own crawler-built index alongside third-party search results and returns a set of candidate sources rather than a single best link. The system combines keyword and semantic retrieval, so a page can surface either because it matches the literal terms or because it is conceptually close to the question. The mix varies per query, weighted toward what looks current and relevant for that intent. This is the gate before anything can be cited: if your page is neither in Perplexity's index nor returned by the third-party search it pulls from, it never enters the candidate pool. For GEO, the first job is simply being retrievable for your priority queries, which means being indexed by PerplexityBot and ranking in the conventional search results Perplexity draws on.
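Perplexity does not publish how it merges its retrieval sources, so the following only illustrates the general idea using reciprocal rank fusion (RRF), a common technique for combining ranked lists that is swapped in here as an assumption. Each list stands in for one hypothetical retrieval path (keyword matches, semantic matches, third-party results), and the URLs are placeholders. A URL scores higher the more lists it appears in and the higher it ranks in each, and a URL absent from every list never reaches the candidate pool.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of URLs into one candidate pool.

    Each URL earns 1 / (k + rank) from every list it appears in (rank starts at 1).
    A URL that appears in no list is never scored, so it never enters the pool.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results from three retrieval paths for one query
keyword_hits = ["a.com/pricing", "b.com/guide", "c.com/faq"]
semantic_hits = ["b.com/guide", "d.com/overview", "a.com/pricing"]
third_party_hits = ["d.com/overview", "b.com/guide"]

pool = reciprocal_rank_fusion([keyword_hits, semantic_hits, third_party_hits])
print(pool)  # → ['b.com/guide', 'd.com/overview', 'a.com/pricing', 'c.com/faq']
```

The page found by all three paths leads the pool, while a page such as a hypothetical `e.com/unindexed`, returned by none of them, is simply absent: retrievability is the precondition for everything downstream.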
Reranking + chunk selection. The candidate sources are not handed to the model whole. Perplexity reranks them on signals such as authority, relevance, factual density, and freshness, then breaks the survivors into passages and selects the specific chunks that best answer the query. Those chunks become the grounding context, and each inline citation maps a claim back to the passage it came from. The unit of citation is therefore the passage, not the page. A page can be retrieved yet still lose if its relevant answer is buried in long, meandering prose that the reranker cannot cleanly lift. For GEO, this is why extractable, passage-level structure matters: a direct answer in the opening lines, self-contained sections, clear headings, and tight paragraphs give the reranker a clean chunk to select and cite.
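The passage-level idea can be sketched with a deliberately simple scorer. The chunking rule (blank-line-separated blocks), the query-term coverage score, and the length penalty are all hypothetical stand-ins rather than Perplexity's reranker, whose features and weights are not public, and the page text is invented. The only point is that a tight passage answering the query directly outscores a longer one that eventually contains the same terms.

```python
import re


def split_into_passages(page_text: str) -> list[str]:
    # Hypothetical chunking rule: each blank-line-separated block is one passage
    return [block.strip() for block in page_text.split("\n\n") if block.strip()]


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; punctuation and symbols such as "$" are dropped
    return re.findall(r"[a-z0-9]+", text.lower())


def score_passage(query: str, passage: str) -> float:
    query_terms = set(tokenize(query))
    tokens = tokenize(passage)
    if not query_terms or not tokens:
        return 0.0
    # Coverage: share of query terms that appear anywhere in the passage
    coverage = len(query_terms & set(tokens)) / len(query_terms)
    # Length penalty: the same coverage spread over more words scores lower
    return coverage / (1 + len(tokens) / 100)


def select_chunks(query: str, page_text: str, top_n: int = 1) -> list[str]:
    passages = split_into_passages(page_text)
    ranked = sorted(passages, key=lambda p: score_passage(query, p), reverse=True)
    return ranked[:top_n]


PAGE = """Over the years our team has thought carefully about value, and after a long
history of experiments with bundles, trials and seasonal discounts, we eventually
settled on a Pro plan whose price works out to 20 dollars per month for most teams.

The Pro plan price is $20 per month.

Our offices are in Lisbon and Toronto."""

best = select_chunks("pro plan price per month", PAGE)
print(best)  # → ['The Pro plan price is $20 per month.']
```

Both pricing passages cover every query term, but the meandering one spreads them across 42 tokens against 8, so the direct answer is the chunk that gets lifted.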
Multi-model generation. Perplexity does not depend on one model to write the answer. It routes generation across frontier models from other labs, such as OpenAI's GPT and Anthropic's Claude, alongside its own in-house Sonar family built specifically for grounded search and reasoning. The selected model shapes the tone, depth, and reasoning style of the synthesis, and Pro users can choose a preferred model. Critically, the model choice does not decide which sources get cited; that is settled upstream by the retrieval and rerank layers, so the citation game is won before generation begins. For GEO, the implication is freeing: you optimize for retrieval and extractability, not for any single model's quirks, because whichever model writes the answer draws from the same selected chunks.
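The claim that citations are settled before generation can be made concrete with a small pipeline sketch. Everything here is hypothetical: `Chunk`, `plan_generation`, and the model names are placeholders rather than Perplexity's actual components or routing targets, and the routing rule is invented for illustration. The design point it shows is ordering: the numbered grounding context is built from the selected chunks first, so swapping the model afterwards changes who writes the answer but not which sources can be cited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    url: str
    text: str


def build_grounded_prompt(query: str, chunks: list[Chunk]) -> str:
    # Number each selected chunk so the answer can cite it as [1], [2], ...
    context = "\n".join(
        f"[{i}] {chunk.text} (source: {chunk.url})"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them inline.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


def route_model(query: str, pinned_model: Optional[str] = None) -> str:
    # Hypothetical routing rule: honor a user-pinned model, else pick by query length
    if pinned_model is not None:
        return pinned_model
    return "placeholder-reasoning-model" if len(query.split()) > 12 else "placeholder-search-model"


def plan_generation(
    query: str, chunks: list[Chunk], pinned_model: Optional[str] = None
) -> tuple[str, str]:
    prompt = build_grounded_prompt(query, chunks)  # citable sources fixed here, upstream
    model = route_model(query, pinned_model)       # model chosen only afterwards
    return model, prompt


chunks = [
    Chunk("https://example.com/pricing", "The Pro plan price is $20 per month."),
    Chunk("https://example.org/review", "Reviewers rate the Pro plan highly for small teams."),
]
model_a, prompt_a = plan_generation("How much is the Pro plan?", chunks, pinned_model="model-a")
model_b, prompt_b = plan_generation("How much is the Pro plan?", chunks, pinned_model="model-b")
print(model_a, model_b, prompt_a == prompt_b)
```

Two different pinned models receive byte-identical grounding, which is the sense in which optimizing for retrieval and extractability covers every model at once.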
Pro Search & Deep Research. These are Perplexity's multi-step research modes, and they widen the surface for citation. Pro Search runs a deeper pass than a standard query, issuing follow-up searches and pulling in several times more sources before answering. Deep Research goes further still - it runs many searches across a topic, reads through the results, and synthesizes a longer, heavily cited report rather than a single answer. Because both modes retrieve and cite far more sources per question, they reward breadth of credible coverage: a brand referenced across multiple authoritative pages has many more chances to be pulled in. For GEO, this is where sustained, topic-wide presence pays off - thin single-page coverage may catch a quick answer, but the multi-step modes surface the brands that own the whole subject.
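The breadth effect of the multi-step modes can be illustrated with a toy loop. Perplexity's actual Pro Search and Deep Research pipelines generate their own follow-up queries and are not public; here the follow-ups are supplied by hand, the index is a hypothetical in-memory dictionary, and every domain is a placeholder. A brand with pages across the whole topic (acme.com) appears once in a single-step search but several times once follow-up searches run, while a brand with one page (thin.com) gains nothing.

```python
# Hypothetical toy index: query -> ranked, scheme-less URLs
TOY_INDEX = {
    "crm for startups": ["acme.com/crm", "thin.com/crm"],
    "crm pricing comparison": ["reviews.example/crm-pricing", "acme.com/pricing"],
    "crm data migration": ["acme.com/migration-guide", "forum.example/migration"],
    "crm integrations": ["acme.com/integrations", "docs.example/crm-apis"],
}


def search(query: str, top_k: int = 2) -> list[str]:
    # Stand-in for one retrieval call; unknown queries return nothing
    return TOY_INDEX.get(query, [])[:top_k]


def multi_step_research(seed_query: str, follow_ups: list[str], max_steps: int) -> list[str]:
    # Run the seed query plus up to (max_steps - 1) follow-ups, keeping each URL once
    sources: list[str] = []
    for query in [seed_query, *follow_ups][:max_steps]:
        for url in search(query):
            if url not in sources:
                sources.append(url)
    return sources


def brand_hits(sources: list[str], domain: str) -> int:
    # Count collected sources whose host matches the brand's domain
    return sum(1 for url in sources if url.split("/")[0] == domain)


SEED = "crm for startups"
FOLLOW_UPS = ["crm pricing comparison", "crm data migration", "crm integrations"]

standard = multi_step_research(SEED, FOLLOW_UPS, max_steps=1)
deep = multi_step_research(SEED, FOLLOW_UPS, max_steps=4)
for label, sources in [("standard", standard), ("multi-step", deep)]:
    print(label, brand_hits(sources, "acme.com"), brand_hits(sources, "thin.com"))
```

The loop is a plain breadth-first expansion; it ignores reranking and answer synthesis entirely, so it speaks only to how many chances each brand gets to enter the source set.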