Twitter’s Big Data-crunching ‘BotMaker’ muscles in on spam
If you’re an avid Twitter user, you might have noticed a significant drop in the number of spam messages and tweets bugging you. That’s because Twitter has introduced a new anti-spam system called BotMaker, which has helped it achieve a 40 percent reduction in its key spam metrics.
Twitter’s Raghav Jeyaraman explains in a lengthy blog post why fighting spam on Twitter is much more challenging than defending against traditional email spam. He also describes how Twitter’s developers went about creating BotMaker, and offers a simplified look at its architecture.
Why spam loves Twitter
There’s a good reason why Twitter is so vulnerable to spam: its wide-ranging APIs, which are designed to let developers easily interact with the site, mean that spammers “know (almost) everything” there is to know about how it functions. As a result, it has proven very easy to create and distribute spam, and very difficult to deploy countermeasures against it.
Twitter’s real-time nature presents another problem, because any countermeasures it deploys must not add to the latency of the user’s overall experience.
With these challenges in mind, Twitter’s spam fighters needed to design a system that would do three things: prevent spam from being created, reduce the amount of time spam is visible, and reduce the reaction time to new spam attacks. At the same time, Twitter had to ensure that no one could tamper with or bypass the system, and that it didn’t add latency.
BotMaker to the rescue!
Such a complex challenge called for a correspondingly complex system, and BotMaker was devised in three parts. “Scarecrow” is a low-latency subsystem that checks for spam in real time in the write path of Twitter’s main processes (tweets, retweets, favorites, messages and so on). Meanwhile, “Sniper” is described as a “computationally-intense and learning sub-system” that checks the user and content event logs of Scarecrow in “near real-time”.
Finally, there’s BotMaker itself, which is constantly fed data from Scarecrow and Sniper. Its job is to return one of three verdicts to the write path (accept, challenge or deny) and to issue follow-up commands to the actioner (delete message, reset password, suspend), cutting out much of the spam. In addition, Twitter runs periodic checks on all of the data BotMaker compiles to sniff out more spam and dodgy accounts.
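To make that division of labor concrete, here is a minimal Python sketch of how a rule engine might turn a single event into a write-path verdict plus a list of actioner commands. Everything in it is an assumption for illustration. The `Event` fields, the `decide` function, the "most severe verdict wins" policy and both sample rules are hypothetical. Only the three verdicts and three actions come from the article, and Twitter has not published BotMaker's code in this form.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Verdict(Enum):
    # Responses returned to the write path (names taken from the article).
    ACCEPT = "accept"
    CHALLENGE = "challenge"
    DENY = "deny"


class Action(Enum):
    # Commands sent to the actioner (names taken from the article).
    DELETE_MESSAGE = "delete message"
    RESET_PASSWORD = "reset password"
    SUSPEND = "suspend"


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; not Twitter's real schema.
    user_id: int
    kind: str  # e.g. "tweet", "retweet", "favorite", "message"
    text: str
    account_age_days: int


# A rule inspects an event and either abstains (None) or returns a verdict plus actions.
Rule = Callable[[Event], Optional[Tuple[Verdict, List[Action]]]]

# Hypothetical policy: when several rules fire, the most severe verdict wins.
SEVERITY = {Verdict.ACCEPT: 0, Verdict.CHALLENGE: 1, Verdict.DENY: 2}


def decide(event: Event, rules: List[Rule]) -> Tuple[Verdict, List[Action]]:
    """Run every rule, keep the most severe verdict and collect distinct actions in order."""
    verdict = Verdict.ACCEPT
    actions: List[Action] = []
    for rule in rules:
        result = rule(event)
        if result is None:
            continue
        rule_verdict, rule_actions = result
        if SEVERITY[rule_verdict] > SEVERITY[verdict]:
            verdict = rule_verdict
        for action in rule_actions:
            if action not in actions:
                actions.append(action)
    return verdict, actions


# Two made-up example rules.
def new_account_with_link(e: Event) -> Optional[Tuple[Verdict, List[Action]]]:
    if e.account_age_days < 1 and "http" in e.text:
        return Verdict.CHALLENGE, []
    return None


def known_spam_phrase(e: Event) -> Optional[Tuple[Verdict, List[Action]]]:
    if "free followers" in e.text.lower():
        return Verdict.DENY, [Action.DELETE_MESSAGE, Action.SUSPEND]
    return None
```

Consider a brand-new account posting `"Get FREE followers http://x"`. Both sample rules fire, and the stricter one wins, so `decide` returns `(Verdict.DENY, [Action.DELETE_MESSAGE, Action.SUSPEND])`. An ordinary tweet from an older account falls through every rule and is accepted.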
Image credit: Twitter blog
The end result is an anti-spam system that combines a low-latency filter in the write path with higher-latency processes that clean up spam after the fact. It’s also capable of machine learning, which means it can adapt and improve over time.
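The split between a fast filter and slower cleanup can be sketched in a few lines of Python. This is a hypothetical illustration, not Twitter's design. `fast_check` stands in for a Scarecrow-style gate that must stay cheap, and the in-memory `deque` stands in for an event log. `slow_pass` plays the role of a Sniper-style or periodic job: it can spot patterns that no single write reveals, here the same text repeated by one user. All names and the duplicate threshold are assumptions.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Deque, Set


@dataclass(frozen=True)
class Write:
    user_id: int
    text: str


# Hypothetical blocklist: the write path only does cheap in-memory substring checks.
BLOCKLIST = {"buy followers"}


def fast_check(write: Write) -> bool:
    """Low-latency gate on the write path: reject writes containing a blocklisted phrase."""
    lowered = write.text.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)


class Pipeline:
    def __init__(self, dup_threshold: int = 3) -> None:
        self.log: Deque[Write] = deque()  # stands in for a persistent event log
        self.dup_threshold = dup_threshold

    def handle_write(self, write: Write) -> bool:
        """Synchronous path: run the cheap check, log accepted writes for later analysis."""
        accepted = fast_check(write)
        if accepted:
            self.log.append(write)
        return accepted

    def slow_pass(self) -> Set[int]:
        """Asynchronous path: heavier analysis over everything logged so far.

        Flags users who posted identical text at least dup_threshold times,
        which the per-write check cannot see on its own. Clears the log afterwards.
        """
        counts = Counter((w.user_id, w.text) for w in self.log)
        flagged = {user for (user, _), n in counts.items() if n >= self.dup_threshold}
        self.log.clear()
        return flagged
```

The trade-off shown is that the write path does only work it can finish within its latency budget and defers everything else. The cost is that some spam stays visible until the slow pass catches it, which is why the article lists shortening that window as a separate goal.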
BotMaker’s rule language and data structures were built to allow rapid development, testing and deployment of system-wide code changes. This lets Twitter quickly iterate on and refine BotMaker’s rules and models in the evolving fight against spam.
“Spam evolves constantly,” wrote Jeyaraman. “Spammers respond to the system defenses and the cycle never stops. In order to be effective, we have to be able to collect data, and evaluate and deploy rules and models quickly.”
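Jeyaraman's point about evaluating rules quickly can be illustrated with a small offline check that scores a candidate rule against previously labeled events before it ships. This Python sketch is an assumption, not BotMaker's actual workflow. `LabeledEvent`, `evaluate`, `should_deploy` and the precision and recall thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class LabeledEvent:
    text: str
    is_spam: bool  # hypothetical label from past human review


def evaluate(rule: Callable[[str], bool], events: Iterable[LabeledEvent]) -> Tuple[float, float]:
    """Return (precision, recall) of a candidate rule on labeled events."""
    tp = fp = fn = 0
    for e in events:
        fired = rule(e.text)
        if fired and e.is_spam:
            tp += 1  # correctly flagged spam
        elif fired and not e.is_spam:
            fp += 1  # legitimate message wrongly flagged
        elif not fired and e.is_spam:
            fn += 1  # spam the rule missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def should_deploy(
    rule: Callable[[str], bool],
    events: Iterable[LabeledEvent],
    min_precision: float = 0.95,
    min_recall: float = 0.5,
) -> bool:
    """Hypothetical gate: only ship rules that rarely hit legitimate users."""
    precision, recall = evaluate(rule, list(events))
    return precision >= min_precision and recall >= min_recall
```

Take a sample of three spam messages ("win a free iphone", "free iphone giveaway click", "buy cheap pills") and one legitimate one ("my free afternoon"). A rule matching "iphone" scores precision 1.0 and recall of about 0.67 and passes the gate. A broader rule matching "free" also flags the legitimate message and is held back.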
Jeyaraman explained that this was achieved by making the BotMaker language type-safe, all functions pure and all data structures immutable, while ensuring that the runtime supports common functional programming idioms.
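The article does not show BotMaker's rule language, but the properties Jeyaraman lists can be approximated in Python for illustration. In this hypothetical sketch, a frozen dataclass plays the role of an immutable data structure and rules are pure predicates over it. `all_of`, `any_of` and `functools.reduce` stand in for the functional idioms. None of these names come from Twitter. Python does not enforce purity or static type safety the way a dedicated rule language could, so this only mimics the discipline.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class UserFeatures:
    # Immutable record: assigning to a field after construction raises an error.
    followers: int
    following: int
    urls_last_hour: int
    tags: Tuple[str, ...] = ()


Predicate = Callable[[UserFeatures], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Combine predicates with AND; the result is itself a pure function."""
    return lambda u: all(p(u) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Combine predicates with OR."""
    return lambda u: any(p(u) for p in preds)


# Pure predicates: the output depends only on the argument and there are no side effects.
def follow_ratio_high(u: UserFeatures) -> bool:
    return u.following > 10 * max(u.followers, 1)


def link_burst(u: UserFeatures) -> bool:
    return u.urls_last_hour >= 20


suspicious = all_of(follow_ratio_high, link_burst)


def add_tag(u: UserFeatures, tag: str) -> UserFeatures:
    """'Updating' builds a new value; the original record is untouched."""
    return replace(u, tags=u.tags + (tag,))


def tag_all(u: UserFeatures, tags: Iterable[str]) -> UserFeatures:
    """Fold a sequence of tags into a record, one new value per step."""
    return reduce(add_tag, tags, u)
```

Because `suspicious` depends only on its argument, replaying the same recorded `UserFeatures` always yields the same answer, which makes rules cheap to test. Because `add_tag` returns a new value, code that still holds the original record never sees it change. Assigning to a field, as in `u.followers = 1`, raises `dataclasses.FrozenInstanceError`.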
photo credit: Tinkerbots via photopin cc