UPDATED 22:08 EDT / JULY 18 2017

Yandex open-sources CatBoost, a machine learning library for training models with minimal data

Russia’s search engine market leader Yandex Europe AG has just open-sourced a new machine learning library called CatBoost.

The company is the latest in a long line of tech giants to offer a machine learning framework, following in the footsteps of Google Inc., Facebook Inc., Microsoft Corp. and others. However, while these companies have focused on building neural networks, systems modeled on the human brain that can be trained to recognize specific objects, images and events, CatBoost is described as a “gradient boosting” library.

Gradient boosting is a machine learning technique that can train accurate models even when only a limited amount of data is available, and it is particularly suited to transactional or historical data, Yandex’s head of machine intelligence and research, Misha Bilenko, explained in a blog post.

The method is “widely applied to the kinds of problems businesses encounter every day like detecting fraud, predicting customer engagement and ranking recommended items like top web pages or most relevant ads,” Bilenko said. “It delivers highly accurate results even in situations where there is relatively little data, unlike deep learning frameworks that need to learn from a massive amount of data.”

According to CatBoost’s GitHub page, the framework is designed for “open-source gradient boosting on decision trees.” In other words, it offers a way to classify and rank data via a collection of decision-making mechanisms, called “learners,” rather than just one. In gradient boosting, the learners are built one after another, with each new decision tree trained to correct the errors left by the trees before it, and their outputs are then added together to form the final prediction. The idea is that by combining many simple learners, CatBoost can produce more accurate results than frameworks that use only one learner.
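The idea of combining many weak learners can be illustrated with a short, self-contained Python sketch. This is not CatBoost's code or API. It is a toy gradient boosting loop for squared-error regression on one feature, using single-split decision trees ("stumps"). All function names, the sample data and the parameter values are assumptions made up for this illustration.

```python
# Toy gradient boosting with decision stumps (illustrative only;
# this is NOT how CatBoost is implemented, and all names are hypothetical).

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the single split
    that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_learners, learning_rate):
    """Build learners one by one; each fits the errors left by the previous ones."""
    base = sum(ys) / len(ys)           # start from the mean prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_learners):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up sample data: a noisy step from roughly 1 to roughly 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

one_learner = fit_boosted(xs, ys, n_learners=1, learning_rate=0.3)
many_learners = fit_boosted(xs, ys, n_learners=50, learning_rate=0.3)

mse_one = mean_squared_error(one_learner, xs, ys)
mse_many = mean_squared_error(many_learners, xs, ys)
print("training error with 1 learner:  ", mse_one)
print("training error with 50 learners:", mse_many)
```

On this toy data the training error with 50 learners is lower than with one, since every added stump removes part of the remaining error. A production library such as CatBoost uses deeper trees, handles many features and adds safeguards against overfitting, none of which this sketch attempts.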

Bilenko said Yandex has already begun using CatBoost with its own services. The framework is replacing the older MatrixNet machine learning algorithm that Yandex uses for tasks such as search engine rankings, weather forecasts, recommendations and even its Yandex.Taxi service, which is being spun off in a $3.72 billion joint venture with ride-sharing company Uber Technologies Inc. Yandex said the transition from MatrixNet to CatBoost has already started and should be complete within a few months.

In addition, Yandex is making CatBoost available for free under the Apache License 2.0, which means anyone can use it in their own programs and services.

One organization that has already taken Yandex up on this offer is CERN, the Switzerland-based European Organisation for Nuclear Research, which is using CatBoost to improve the performance of its particle identification systems. “Catboost will improve how efficiently we can identify charged particles, providing greater accuracy in the selection of our data,” said Marianna Fontana and Donal Hill, coordinators of the particle identification project in LHCb.

“By making CatBoost available as an open-source library, we hope to enable data scientists and engineers to obtain top-accuracy models with no effort, and ultimately define a new standard of excellence in machine learning,” Bilenko said.
