
The Complete Guide to Bing SPTAG

SPTAG (Space Partition Tree And Graph) is the open-source approximate-nearest-neighbor library Microsoft Research and Bing use to find semantically related results at billion-vector scale. Here is what it is, what Microsoft has actually confirmed, and what it means for your search visibility.

Key takeaways

SPTAG (Space Partition Tree And Graph) is an open-source approximate-nearest-neighbor (ANN) library that Microsoft Research and Bing built to search billions of vector embeddings in milliseconds. It powers part of Bing's semantic layer: when exact keyword matches are sparse, SPTAG finds documents whose meaning is closest to the query. Practitioners report that Bing's semantic matching is more conservative than Google's, so exact-match keywords still appear to carry real ranking weight on Bing.

  • SPTAG is a Microsoft-confirmed, MIT-licensed open-source ANN library released on GitHub on May 15, 2019.
  • Per Microsoft, Bing can search an index of 100 billion-plus vectors and return the most related results in about 5 milliseconds.
  • It uses vector embeddings: deep learning models turn queries and documents into numbers so meaning can be compared, not just keywords.
  • Bing's semantic layer is more conservative than Google's, so exact-match keywords in titles and headings still matter on Bing (practitioner observation, not a Microsoft statement).
  • Microsoft Clarity is a separate, free behavioral-analytics tool; claims that it feeds Bing ranking are industry inference, not confirmed by Microsoft.

What is SPTAG?

Definition

SPTAG (Space Partition Tree And Graph) is an open-source approximate-nearest-neighbor vector search library from Microsoft Research and Microsoft Bing, used to find the documents whose meaning is closest to a search query. It converts queries and documents into numerical vectors, then locates the closest stored vectors among billions, in milliseconds.

SPTAG is one retrieval component, not Bing's entire ranking system. It is part of a broader stack of Bing's ranking algorithms, sitting alongside the neural relevance model RankNet and the conversational layer Prometheus. SPTAG handles the semantic-retrieval job: surfacing meaning-based matches when literal keyword matches are thin.

SPTAG at a glance

Full name: Space Partition Tree And Graph
Built by: Microsoft Research and Microsoft Bing
Released: GitHub, May 15, 2019
License: MIT (open source)
Job: Approximate nearest neighbor (ANN) vector search
Scale: 100 billion-plus vectors searched
Speed: About 5 milliseconds per query
Variants: SPTAG-KDT and SPTAG-BKT

What SPTAG actually is

SPTAG stands for Space Partition Tree And Graph. It is an open-source library for large-scale vector approximate-nearest-neighbor (ANN) search, released by Microsoft Research and Microsoft Bing. The project lives on GitHub under the MIT license, which means anyone can read, use, and modify the code.

The core job of SPTAG is narrow but powerful: given a query represented as a vector of numbers, find the stored vectors that are closest to it. Microsoft describes it as a library that provides high-quality vector index building, search, and distributed online serving toolkits for large-scale vector search scenarios. Closeness is measured by either L2 (straight-line) distance or cosine distance.
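
To make the two distance measures concrete, here is a minimal sketch in plain Python. It is not SPTAG code; the function names and the sample vectors are made up for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean (straight-line) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Illustrative vectors only; real embeddings have many more dimensions.
query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # points the same way as the query, but is longer
orthogonal = [0.0, 1.0]

print(l2_distance(query, same_direction))      # 2.0
print(cosine_distance(query, same_direction))  # 0.0
print(cosine_distance(query, orthogonal))      # 1.0
```

The two measures can disagree: the longer vector is 2.0 away by L2 distance yet 0.0 away by cosine distance, because cosine distance ignores length and compares direction only.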

It is important to be precise about scope. SPTAG is one retrieval component, not Bing's whole ranking system. Microsoft has confirmed SPTAG exists, is open source, and is used in Bing's vector search. Microsoft has not published a complete map of how SPTAG output is blended with traditional keyword ranking, freshness, links, and other signals to produce the final results page.

Vector embeddings and ANN, explained simply

To understand SPTAG you need two concepts: embeddings and approximate nearest neighbor search.

An embedding is a numerical representation of a piece of content. A deep learning model reads a word, a sentence, a document, or even an image and outputs a list of numbers (a vector) that captures what the content means. Two pieces of content with similar meaning end up with vectors that sit close together in this number space, even if they share no words. Microsoft frames a vector as a numerical representation that helps capture what a piece of data actually means.

Nearest neighbor search is the act of finding which stored vectors are closest to a query vector. Doing this exactly across billions of vectors would be far too slow for live search. So engines use approximate nearest neighbor (ANN) search, which trades a tiny amount of accuracy for an enormous speed gain. This is exactly the trade-off SPTAG is built to make.
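
For contrast, this is what exact nearest neighbor search looks like: a linear scan that scores every stored vector. It is a sketch with made-up sizes and random data, included only to show why the cost grows with the size of the index.

```python
import heapq
import math
import random

def exact_nearest(query, vectors, k):
    """Exact k-nearest-neighbor search: score every stored vector, keep the k closest."""
    scored = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)

random.seed(0)  # fixed seed so the run is repeatable
dim, count = 8, 10_000  # hypothetical sizes, far smaller than a web index
vectors = [[random.random() for _ in range(dim)] for _ in range(count)]
query = [random.random() for _ in range(dim)]

top = exact_nearest(query, vectors, k=3)
print(top)  # three (distance, index) pairs, closest first
print(f"distance computations: {len(vectors)}")  # one per stored vector
```

Every query touches every vector, so the work scales linearly with the index. ANN methods aim to inspect only a small fraction of the index and accept that they will occasionally miss a true nearest neighbor.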

  • Keyword search matches the literal words a user typed.
  • Vector search matches the meaning behind those words, so it can surface a great answer that uses entirely different wording.

Microsoft's own example: a user asks how tall the tower in Paris is, and Bing can return the Eiffel Tower's height even though the query never says the words Eiffel Tower. That is meaning-based retrieval at work.
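
A toy version of that example shows the difference between the two retrieval styles. The document texts are invented, and the three-number vectors are hand-assigned stand-ins for what an embedding model would produce; no real model is involved.

```python
import math

def keyword_overlap(query, document):
    """Count the distinct words the query and the document share."""
    return len(set(query.lower().split()) & set(document.lower().split()))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query_text = "how tall is the tower in paris"
documents = {
    "eiffel": "eiffel height and visitor facts",
    "pisa": "the leaning tower in pisa",
}

# Hand-assigned, hypothetical embeddings (not model output).
query_vector = [0.9, 0.8, 0.0]
document_vectors = {"eiffel": [0.9, 0.9, 0.0], "pisa": [0.1, 0.2, 0.9]}

by_keywords = max(documents, key=lambda n: keyword_overlap(query_text, documents[n]))
by_meaning = max(document_vectors,
                 key=lambda n: cosine_similarity(query_vector, document_vectors[n]))

print(by_keywords)  # pisa: it shares "the", "tower", and "in" with the query
print(by_meaning)   # eiffel: no shared words, but the closest vector
```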

How SPTAG works under the hood

SPTAG combines two data structures, which is where the name comes from: a space partition tree and a graph. The library contains two basic modules, an index builder and a searcher.

The graph is a relative neighborhood graph (RNG) built on top of a k-nearest-neighbor graph to boost how well-connected the vectors are. The search itself works in two stages, which the documentation describes plainly: the search begins in the space partition trees to find several seed points, then continues from those seeds through the relative neighborhood graph, and the searches in the trees and the graph are conducted iteratively. The tree gives you a fast, rough starting point; the graph refines it into genuinely close matches.
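
The tree-then-graph idea can be sketched in a few lines. This is a toy illustration, not SPTAG's code: it uses a small kd-style tree and a brute-force k-nearest-neighbor graph rather than the relative neighborhood graph SPTAG builds, it runs the tree step once instead of iterating, and every function name and parameter (such as leaf_size and budget) is made up for the example.

```python
import heapq
import math
import random

def build_tree(ids, points, leaf_size, depth=0):
    """Recursively median-split on alternating axes until leaves are small."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    axis = depth % len(points[0])
    ids = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ids) // 2
    return {
        "axis": axis,
        "split": points[ids[mid]][axis],
        "left": build_tree(ids[:mid], points, leaf_size, depth + 1),
        "right": build_tree(ids[mid:], points, leaf_size, depth + 1),
    }

def seeds_from_tree(tree, query):
    """Stage 1: descend to the leaf whose region holds the query; its points are seeds."""
    while "leaf" not in tree:
        tree = tree["left"] if query[tree["axis"]] < tree["split"] else tree["right"]
    return tree["leaf"]

def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph as adjacency lists (fine for a toy set)."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph

def graph_search(query, points, graph, seeds, k, budget):
    """Stage 2: best-first walk from the seeds, expanding at most `budget` nodes."""
    visited = set(seeds)
    frontier = [(math.dist(query, points[s]), s) for s in seeds]
    heapq.heapify(frontier)
    found = list(frontier)
    for _ in range(budget):
        if not frontier:
            break
        _, node = heapq.heappop(frontier)  # closest unexpanded candidate
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                item = (math.dist(query, points[neighbor]), neighbor)
                heapq.heappush(frontier, item)
                found.append(item)
    return heapq.nsmallest(k, found), len(visited)

random.seed(1)
points = [(random.random(), random.random()) for _ in range(500)]
query = (0.5, 0.5)

tree = build_tree(list(range(len(points))), points, leaf_size=8)
graph = build_knn_graph(points, k=8)
seeds = seeds_from_tree(tree, query)
approx, scored = graph_search(query, points, graph, seeds, k=5, budget=20)

print(approx)  # five (distance, index) pairs, closest first
print(f"scored {scored} of {len(points)} points")
```

The point of the sketch is the division of labor: the tree supplies a cheap starting neighborhood, and the graph walk improves on it while scoring only part of the data set.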

Microsoft ships two variants:

  • SPTAG-KDT pairs a kd-tree with the relative neighborhood graph. Microsoft notes it is advantageous in index-building cost.
  • SPTAG-BKT pairs a balanced k-means tree with the relative neighborhood graph. Microsoft notes it is advantageous in search accuracy on very high-dimensional data.
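
To give a feel for the two partitioning styles, the sketch below splits the same points once in each way. It is an illustration under simplifying assumptions, not SPTAG's implementation: the kd-style split cuts at a median, and the k-means-style split is plain 2-means with a naive starting guess, without the balancing that SPTAG-BKT applies. Function names and sample points are invented.

```python
import math

def kd_split(points):
    """kd-tree style: cut at the median of the coordinate with the widest spread."""
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def two_means_split(points, rounds=10):
    """k-means style (k=2): assign each point to its nearest centroid, then re-average."""
    centroids = [points[0], points[-1]]  # naive initialisation, for illustration only
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for p in points:
            nearest = min((0, 1), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return groups

# Five points near the origin and one far-away outlier (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]

left, right = kd_split(points)
print(len(left), len(right))           # 3 3
groups = two_means_split(points)
print([len(g) for g in groups])        # [5, 1]
```

The median cut needs only a sort and always yields even halves, while the k-means-style split takes repeated passes over the data but follows the shape of the clusters.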

The library also supports online vector deletion and insertion and distributed serving across multiple machines, which is how it scales to a production search engine rather than a single benchmark.

The scale: 100 billion-plus vectors in 5 milliseconds

The numbers Microsoft published are the most concrete confirmed facts about SPTAG in practice. According to Microsoft, Bing processes billions of documents every day and can search through an index of 100 billion-plus vectors to find the most related results in about 5 milliseconds. Microsoft program manager Jeffrey Zhu is the source of that figure in the company's 2019 announcement.

Microsoft has since gone further. A 2021 Microsoft Research talk by principal researcher Qi Chen introduced SPTAG++, described as supporting hundreds-of-billions-scale vector search in production with millisecond response time and more than ten thousand queries per second. Related research papers from the same team include SPANN (NeurIPS 2021) on billion-scale ANN, SPFresh (SOSP 2023) on incremental index updates, and VBASE (OSDI 2023) on unifying vector and relational queries.

The takeaway: this is not a research toy. Vector retrieval at this scale runs in live web search, and the response budget is measured in single-digit milliseconds because, as Microsoft's Rangan Majumder put it, even a couple of seconds for a search can make an app unusable.

History of SPTAG: a timeline

SPTAG moved from a Microsoft Research project into open source in 2019, then evolved through a series of published advances toward hundreds-of-billions-scale vector search.

  1. 2019

    Microsoft open-sources SPTAG

    On May 15, 2019, Microsoft released Space Partition Tree And Graph (SPTAG) on GitHub under the MIT license, alongside example techniques and a video from Microsoft's AI Lab.

  2. 2019

    Bing vector search at scale

    Microsoft states Bing can search a 100 billion-plus vector index and return the most related results in about 5 milliseconds.

  3. 2021

    SPANN published

    The SPTAG team publishes SPANN at NeurIPS 2021, a memory-disk hybrid approach to billion-scale approximate nearest neighbor search.

  4. 2021

    SPTAG++ presented

    Principal researcher Qi Chen presents SPTAG++ at Microsoft Research Summit 2021: hundreds-of-billions-scale vector search, millisecond response, 10,000-plus queries per second in production.

  5. 2023

    SPFresh and VBASE

    Two related papers are published: SPFresh (SOSP 2023) for incremental in-place index updates and VBASE (OSDI 2023) for unifying vector similarity and relational queries.

Why Bing's semantic layer is more conservative than Google's

Here is the practitioner-facing point that matters most for your strategy. Across the SEO community, the widely held observation is that Bing applies its semantic and vector layer more conservatively than Google does. Bing tends to keep significant weight on traditional, literal signals: exact-match keywords in page titles, headings, and on-page copy. Google, by contrast, leans harder on interpreting intent and rewarding broad topical coverage even when the exact phrase is absent.

This is an industry inference, not an official Microsoft statement. Microsoft confirms that SPTAG and vector search exist and are used in Bing; it has not published the relative weighting of semantic versus exact-match signals. What practitioners consistently report is the behavior, not the internal formula.

The practical implication is useful precisely because it is conservative. On Bing, the older playbook still pays off:

  • Use the target keyword exactly in the title tag and an H1 or H2, not just a paraphrase.
  • Match the phrasing real users type, rather than relying on the engine to infer synonyms.
  • Treat clean on-page optimization as a ranking lever, not a legacy formality.
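
A quick way to audit the first point is to check whether the exact phrase appears in the title and headings. The sketch below uses Python's standard-library HTML parser; the class name, function name, and sample page are made up for illustration, and it checks only for a verbatim, case-insensitive match.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect the text inside <title>, <h1>, and <h2> elements."""
    TRACKED = ("title", "h1", "h2")

    def __init__(self):
        super().__init__()
        self.current = None
        self.found = {tag: [] for tag in self.TRACKED}

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.current = tag
            self.found[tag].append("")

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.found[self.current][-1] += data

def exact_match_report(html, keyword):
    """Report, per tracked tag, whether the keyword appears verbatim (case-insensitive)."""
    parser = HeadingCollector()
    parser.feed(html)
    needle = keyword.lower()
    return {tag: any(needle in text.lower() for text in texts)
            for tag, texts in parser.found.items()}

# Hypothetical page used only to exercise the check.
page = """<html><head><title>Bing SPTAG explained</title></head>
<body><h1>How Bing's vector search works</h1>
<h2>What is Bing SPTAG?</h2></body></html>"""

report = exact_match_report(page, "Bing SPTAG")
print(report)  # {'title': True, 'h1': False, 'h2': True}
```

Here the H1 paraphrases the topic without using the phrase, which is the pattern the first bullet warns against.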

Doing this does not hurt you on Google, and it tends to help more on Bing, which is the index that also feeds AI answer engines such as ChatGPT search and Microsoft Copilot. The same conservatism shows up in how Bing weighs UX behavior signals and social signals compared with Google.

Signals that matter given how SPTAG works

SPTAG handles one job: semantic retrieval. But ranking well on Bing depends on a small set of signals working together. Here is where SPTAG sits among them, and which are confirmed versus inferred.

How SPTAG-related and on-page signals influence Bing visibility
Signal: what it does for you

  • Exact-match keyword usage (practitioner-observed): Bing keeps meaningful weight on exact keywords in titles, H1/H2 headings, and body copy. Use the literal query phrasing, not only paraphrases.
  • Semantic / topical coverage: Vector retrieval via SPTAG surfaces meaning-based matches when exact terms are sparse. Comprehensive coverage of the concept improves your odds of being retrieved as a near neighbor.
  • Crawlability and indexation: Vector and keyword retrieval only help once Bing has the page in its index. Verify in Bing Webmaster Tools and submit URLs via IndexNow to get content indexed faster.
  • AI answer surfaces (Bing-fed): ChatGPT search and Microsoft Copilot draw on Bing's index, so Bing visibility now extends to AI answers, not just the Bing results page.

The practical takeaway is that pages satisfying both literal phrasing and broad semantic coverage are the ones best positioned to be retrieved by a conservative engine like Bing.

What this means for SEO and AI search visibility

SPTAG matters to you for two compounding reasons. First, Bing's index increasingly underpins AI answer surfaces. ChatGPT search and Microsoft Copilot draw on Bing's results, so being well-retrieved by Bing is no longer just about the Bing search box.

Second, because Bing appears to reward both exact-match and semantic signals, the highest-leverage content is content that satisfies both at once: pages that use the literal query phrasing and cover the surrounding concepts thoroughly. That dual approach suits a conservative engine like Bing today and should remain durable as semantic retrieval matures everywhere.

Concretely, that means writing genuinely comprehensive pages, anchoring them with the exact terms your audience searches, and making sure Bing can actually crawl and index them. Verifying your site in Bing Webmaster Tools and submitting fresh URLs through IndexNow gets new and updated content into the index that feeds these AI surfaces faster.
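
For the IndexNow step, a bulk submission is a small JSON POST. The sketch below builds the request without sending it. It assumes the shared api.indexnow.org endpoint and the host, key, keyLocation, and urlList fields described in the public IndexNow protocol; the domain, key, and URLs are placeholders, and the helper name is made up.

```python
import json
import urllib.request

def build_indexnow_request(host, key, urls,
                           endpoint="https://api.indexnow.org/indexnow"):
    """Build (but do not send) an IndexNow bulk-submission request."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# Placeholder values: swap in your own domain, key, and URLs.
request = build_indexnow_request(
    host="www.example.com",
    key="replace-with-your-indexnow-key",
    urls=["https://www.example.com/new-guide",
          "https://www.example.com/updated-page"],
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually submit, you would call: urllib.request.urlopen(request)
```

Before sending anything, check the current IndexNow documentation for key format rules and response codes, since this sketch does not validate either.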

How to optimize for Bing given how SPTAG works

To get the most out of Bing's hybrid keyword-plus-vector retrieval, pair exact-match phrasing with comprehensive topical coverage, then make sure Bing indexes the page quickly.

  1. Put the exact target keyword in the title tag and an H1 or H2.

    Bing's more conservative semantic layer still rewards literal exact-match signals, so explicit phrasing earns ranking weight that a paraphrase alone may not.

  2. Write comprehensively around the concept, not just the phrase.

    SPTAG retrieves semantic near-neighbors when exact matches are sparse, so thorough topical coverage increases the chance your page is found as a close vector match.

  3. Mirror the literal wording real users type, including questions.

    Bing leans on user-facing phrasing, and Microsoft's own example shows vector search resolving natural-language questions, so matching real query language helps on both fronts.

  4. Verify the site in Bing Webmaster Tools and submit URLs via IndexNow.

    Neither keyword nor vector retrieval can help a page Bing has not indexed; fast indexation also speeds visibility on Bing-fed AI surfaces like ChatGPT search and Copilot.

  5. Install Microsoft Clarity for UX insight, but not as a ranking tactic.

    Clarity is a free, high-value behavioral analytics tool; there is no Microsoft confirmation it feeds Bing ranking, so use it to improve the page, not to chase an unverified signal.

A note on Microsoft Clarity (don't confuse the two)

SPTAG often gets discussed alongside Microsoft Clarity, so it is worth separating them clearly. Clarity is a free behavioral-analytics tool for understanding how people use your site, through session recordings, heatmaps, and event and funnel tracking. Microsoft describes it as a behavioral analytics tool to understand user interaction with your website or app, and it is free forever.

A common claim in SEO circles is that Clarity engagement data feeds Bing's ranking. Treat this as industry inference, not a confirmed fact. Microsoft's Clarity documentation positions it as a UX and conversion analytics product; it does not state that Clarity data is a Bing ranking input. The honest position is that there is no public Microsoft confirmation that installing Clarity changes your Bing rankings. Install Clarity because it is a strong, free way to see user behavior, not because of an unverified ranking benefit. For how Bing does treat behavioral data, see our guide to Bing's Clarity-derived UX signals.

SPTAG myths vs. reality

SPTAG sits at the intersection of search infrastructure and SEO folklore, which breeds confusion. Here are the most common myths and what is actually true.

Myth: SPTAG is Bing's ranking algorithm.

Reality: SPTAG is one retrieval component, an approximate-nearest-neighbor vector library. It helps find semantically related candidates; it is not the full system that ranks and orders the final results page.

Myth: Because Bing uses vector search, exact-match keywords no longer matter.

Reality: The opposite is the practitioner consensus. Bing applies its semantic layer more conservatively than Google, so exact-match keywords in titles and headings still carry real weight on Bing.

Myth: Microsoft Clarity data feeds Bing's search ranking.

Reality: This is industry inference, not a confirmed fact. Microsoft documents Clarity as a free behavioral-analytics tool and does not state that Clarity data is a Bing ranking input.

Myth: SPTAG is a proprietary black box you cannot inspect.

Reality: SPTAG is open source on GitHub under the MIT license. Anyone can read the code, the algorithms (SPTAG-KDT and SPTAG-BKT), and the documentation.

Myth: Vector search means keyword optimization is dead everywhere.

Reality: Vector retrieval complements keyword retrieval rather than replacing it. The durable strategy is content that satisfies both literal phrasing and broad semantic coverage at the same time.

Frequently asked questions

What does SPTAG stand for?

SPTAG stands for Space Partition Tree And Graph. It is an open-source approximate-nearest-neighbor (ANN) vector search library from Microsoft Research and Microsoft Bing. The name reflects its design: it combines a space partition tree to find starting seeds with a neighborhood graph that refines those seeds into genuinely close matches.

Is SPTAG open source?

Yes. Microsoft released SPTAG on GitHub on May 15, 2019, and the entire codebase is under the MIT license. That means anyone can freely read, use, and modify it. The repository documents both algorithm variants, SPTAG-KDT and SPTAG-BKT, along with index-building and search modules.

How fast is SPTAG?

According to Microsoft, Bing can search an index of 100 billion-plus vectors and return the most related results in about 5 milliseconds. A later evolution, SPTAG++, was presented in 2021 as supporting hundreds-of-billions-scale vector search in production with millisecond response and more than ten thousand queries per second.

What is the difference between keyword search and vector search?

Keyword search matches the literal words a user typed. Vector search converts content and queries into numerical embeddings and matches by meaning, so it can surface a strong answer that shares no words with the query. SPTAG is the engine that finds those meaning-based near neighbors quickly at scale.

Does vector search mean exact-match keywords no longer matter on Bing?

No. The widely held practitioner observation is that Bing applies its semantic layer more conservatively than Google, so exact-match keywords in titles and headings still carry real weight on Bing. This is an industry inference about behavior, not an official Microsoft statement about Bing's internal ranking weights.

Does Microsoft Clarity affect Bing rankings?

There is no Microsoft confirmation that it does. Microsoft documents Clarity as a free behavioral-analytics tool offering heatmaps, session recordings, and funnel tracking to understand user behavior. The claim that Clarity engagement data influences Bing ranking is industry inference, so treat it as unverified rather than fact.

How do I optimize for Bing given how SPTAG works?

Use the exact target keyword in your title and a heading, cover the topic comprehensively so vector search can match you on meaning, and mirror the literal phrasing real users type. Then verify the site in Bing Webmaster Tools and submit URLs via IndexNow so Bing indexes the content quickly.

Why does Bing retrieval matter for AI search visibility?

Because AI answer engines, including ChatGPT search and Microsoft Copilot, draw on Bing's index. Being well-retrieved by Bing now extends your visibility into those AI surfaces. Strong on-page keywords plus thorough semantic coverage is the approach that performs across both classic results and AI answers.

The bottom line

SPTAG is the Microsoft-confirmed, open-source library behind part of Bing's meaning-based retrieval, and Microsoft cites searches across 100 billion-plus vectors in about 5 milliseconds. Practitioners report that Bing applies that semantic layer conservatively, so exact-match keywords in titles and headings still appear to carry real weight. Write pages that satisfy both at once (exact phrasing plus thorough topical coverage), then verify in Bing Webmaster Tools and submit via IndexNow so the content reaches Bing and the AI surfaces it feeds.

About the author

Capconvert Search Intelligence Team

Search and AI Visibility Research at Capconvert

The Capconvert Search Intelligence Team studies how search engines and AI answer engines retrieve, rank, and cite web content. Their guides translate confirmed engine documentation and field-tested practitioner observation into actionable SEO and GEO strategy.

References

  1. microsoft/SPTAG - GitHub repository (README)
  2. As search needs evolve, Microsoft makes AI tools for better search available to researchers and developers - Microsoft Source
  3. Research talk: SPTAG++: Fast hundreds of billions-scale vector search with millisecond response time - Microsoft Research
  4. Microsoft open-sources a crucial algorithm behind its Bing Search services - TechCrunch
  5. Microsoft goes open source with one of its Bing algorithms - Search Engine Land
  6. Clarity Overview - Microsoft Learn
  7. 5 Big Ways Bing SEO Differs From Optimizing For Google - Search Engine Journal