UPDATED 22:08 EDT / JULY 18 2017

BIG DATA

Yandex open-sources CatBoost, a machine learning library that can be trained with minimal data

Russia’s search engine market leader Yandex Europe AG has just open-sourced a new machine learning library called CatBoost.

The company is the latest in a long line of tech giants to offer a machine learning framework, following in the footsteps of Google Inc., Facebook Inc., Microsoft Corp. and others. However, while these companies have focused on building neural networks, systems modeled on the human brain that can be trained to recognize specific objects, images and events, CatBoost is described as a “gradient boosting” library.

Gradient boosting is a machine learning technique that can produce useful models even when only a limited amount of data is available, and it is particularly well suited to transactional or historical data, Yandex’s head of machine intelligence and research, Misha Bilenko, explained in a blog post.

The method is “widely applied to the kinds of problems businesses encounter every day like detecting fraud, predicting customer engagement and ranking recommended items like top web pages or most relevant ads,” Bilenko said. “It delivers highly accurate results even in situations where there is relatively little data, unlike deep learning frameworks that need to learn from a massive amount of data.”

According to CatBoost’s GitHub page, the framework is designed for “open-source gradient boosting on decision trees.” In other words, it offers a way to classify and rank data via a collection of decision-making mechanisms, called “learners,” rather than just one. The results generated by these learners are weighted and combined, with each learner compensating for the weaknesses of the others. The idea is that by combining multiple learners, CatBoost can produce more accurate results than frameworks that use only one learner.
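To make the idea of combining learners concrete, here is a minimal sketch of gradient boosting for a squared-error regression task, written in plain Python. It is not CatBoost’s implementation, which the article does not describe in detail. The function names, the toy data and the choice of one-split “stumps” as learners are all assumptions made for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split decision tree (a "stump") to the residuals."""
    best = None
    # Try each distinct value except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_learners=50, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current errors."""
    base = mean(ys)  # start from a constant prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_learners):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return mean((y - predict_boosted(model, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: a noisy three-level step function.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 4.9, 5.1]

model_one = fit_boosted(xs, ys, n_learners=1)
model_many = fit_boosted(xs, ys, n_learners=50)
print("1 learner MSE:  ", mse(model_one, xs, ys))
print("50 learners MSE:", mse(model_many, xs, ys))
```

Each new stump is trained on what the ensemble so far still gets wrong, so the training error with 50 learners comes out lower than with a single one. Production libraries add a great deal on top of this basic loop, including deeper trees, other loss functions and safeguards against overfitting.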

Bilenko said Yandex has already begun using CatBoost with its own services. The framework is replacing the older MatrixNet machine learning algorithm that Yandex uses for tasks such as search engine rankings, weather forecasts, recommendations and even its Yandex.Taxi service, which is being spun off in a $3.72 billion joint venture with ride-sharing company Uber Technologies Inc. Yandex said the transition from MatrixNet to CatBoost has already started and should be complete within a few months.

In addition, Yandex is making CatBoost available free of charge under the Apache License 2.0, which means anyone can use it in their own programs and services.

One organization that has already taken Yandex up on this offer is CERN, the Switzerland-based European Organisation for Nuclear Research, which is using CatBoost to improve the performance of its particle identification systems. “CatBoost will improve how efficiently we can identify charged particles, providing greater accuracy in the selection of our data,” said Marianna Fontana and Donal Hill, coordinators of the particle identification project at LHCb.

“By making CatBoost available as an open-source library, we hope to enable data scientists and engineers to obtain top-accuracy models with no effort, and ultimately define a new standard of excellence in machine learning,” Bilenko said.

Image: Yandex
