UPDATED 16:01 EDT / MAY 15 2019

BIG DATA

Microsoft open-sources one of the core algorithms powering Bing

Microsoft Corp. today open-sourced one of the cornerstone algorithms powering its Bing search engine in an effort to help developers build faster, more easily navigable applications.

The Space Partition Tree And Graph algorithm, or SPTAG for short, is available under the permissive MIT License. Microsoft has bundled it into a library that includes tools to help developers incorporate the code into their projects.

SPTAG is what allows Bing to instantly display relevant search results even when a user enters a query that can’t be processed by simply matching keywords to web pages. Looking up the phrase “largest lake in the United States,” for instance, brings up a panel with information about Lake Superior even though the query and the answer share only one word, “lake.”

SPTAG makes that possible by transforming queries into data constructs known as vectors. A vector is essentially a long sequence of numbers that can encapsulate various kinds of information, from individual words to entire web pages.

Translating different records into a common numerical format has the benefit of allowing them to be compared more easily. The vector for the phrase “largest lake in the United States” will share similarities with, among others, the vector that Bing generates from the text of the Wikipedia page “List of largest lakes of the United States by area.” And that Wikipedia page has Lake Superior at the top of the ranking.
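To make the comparison concrete, here is a minimal sketch in Python of how two vectors can be scored for similarity. It uses cosine similarity, a common measure for this kind of comparison; the article does not say which measure Bing uses, so that choice is an assumption. The four-number vectors, the second page title and the helper names `cosine_similarity` and `rank_pages` are invented for illustration, and real vectors would come from a trained model and be far longer.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_pages(query_vec, page_vecs):
    """Sort page titles from most to least similar to the query vector."""
    return sorted(
        page_vecs,
        key=lambda title: cosine_similarity(query_vec, page_vecs[title]),
        reverse=True,
    )


# Toy vectors, made up for this example (not real Bing data).
query = [0.9, 0.8, 0.1, 0.0]  # stands in for "largest lake in the United States"
pages = {
    "List of largest lakes of the United States by area": [0.8, 0.9, 0.2, 0.1],
    "History of the bicycle": [0.0, 0.1, 0.9, 0.8],
}

best = rank_pages(query, pages)[0]
print(best)
```

With these made-up numbers the lakes page ranks first because its vector points in nearly the same direction as the query vector, while the unrelated page scores close to zero.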

Bing groups the vectors representing web content based on similarity to speed up searches. “Once the numerical point has been assigned to a piece of data, vectors can be arranged, or mapped, with close numbers placed in proximity to one another to represent similarity. These proximal results get displayed to users, improving search outcomes,” Microsoft detailed in a blog post.
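The speedup from grouping can be illustrated with a deliberately simplified sketch in Python. It is not SPTAG's actual data structure: it splits a handful of toy vectors into cells around two hand-picked pivot points, then compares a query only against the vectors in the closest cell. The names `build_partitions` and `search`, the pivots and the two-number vectors are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(point, candidates):
    """Index of the candidate vector closest to the given point."""
    return min(range(len(candidates)), key=lambda i: distance(point, candidates[i]))


def build_partitions(vectors, pivots):
    """Assign every named vector to the cell of its nearest pivot."""
    cells = {i: [] for i in range(len(pivots))}
    for name, vec in vectors.items():
        cells[nearest_index(vec, pivots)].append(name)
    return cells


def search(query, vectors, pivots, cells):
    """Scan only the cell nearest the query instead of every vector."""
    cell = cells[nearest_index(query, pivots)]
    if not cell:
        return None
    return min(cell, key=lambda name: distance(query, vectors[name]))


# Toy data, invented for this example.
vectors = {
    "lake_a": [0.9, 0.1],
    "lake_b": [0.8, 0.2],
    "bike_a": [0.1, 0.9],
    "bike_b": [0.2, 0.8],
}
pivots = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked cell centers (an assumption)

cells = build_partitions(vectors, pivots)
result = search([0.85, 0.1], vectors, pivots, cells)
print(cells, result)
```

The trade-off in this sketch is that a query near a cell boundary can miss a closer vector stored in the neighboring cell, so the answer is approximate. Production systems accept that kind of trade-off in exchange for not scanning every vector.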

According to the company, SPTAG enables Bing to sift through billions of pieces of data in just a few milliseconds. The search engine has access to a repository of more than 150 billion vectors that is continuously expanded with new content from the web.

One obvious application for SPTAG is improving the search experience for users of collaboration services, email clients and other text-heavy applications. But the algorithm is not limited to processing written content. SPTAG is also capable of generating vectors for images and audio files, which means developers can use it to build advanced capabilities such as automated photo comparison.

SPTAG is available on GitHub.  

