In cybersecurity, it’s AI vs. AI: Will the good guys or the bad guys win?
Artificial intelligence research group OpenAI last month made an unusual announcement: It had built an AI-powered content generation engine so sophisticated that it wouldn’t release the full model to developers.
Anyone who works in cybersecurity immediately knew why. Phishing emails, which try to trick recipients into clicking malicious links, were the entry point for 91 percent of all cyberattacks in 2016, according to a study by Cofense Inc. Combining software bots that scrape personal information from social networks and public databases with such a powerful content generation engine could produce far more persuasive phishing emails, ones that might even mimic a particular person’s writing style, said Nicolas Kseib, lead data scientist at TruSTAR Technology Inc.
The potential result: Cybercriminals could launch phishing attacks much faster and on an unprecedented scale.
That danger neatly sums up the never-ending war that is the state of cybersecurity today, one in which no one can yet answer a central question: Will artificial intelligence provide more help to criminals or to the people trying to stop them? AI is a new weapon that some people believe could finally give security professionals a leg up on their adversaries. At the same time, experts worry that the potential for criminal misuse of AI technologies such as machine learning and deep learning could quickly outstrip the benefits.
The debate hasn’t stopped cybersecurity vendors from co-opting AI as a marketing slogan. That will be evident this week at the annual RSA Conference in San Francisco, one of the industry’s biggest gatherings of the year. But in the process, those vendors may be creating false expectations about how much the technology can do.
Most technology that’s touted as AI today is little more than classification algorithms built upon a single type of data, said John Omernik, a distinguished technologist at MapR Technologies Inc. and a veteran security officer in the banking industry. “Most AI solutions are typically solving a problem better than humans,” he said. “We haven’t found ways to solve problems humans can’t solve.”
Until that happens, said Rob Westervelt, research director in International Data Corp.’s security practice, “The cat-and-mouse game will continue because attackers will be able to use tools to evade defenses.”
Smarter attacks
In fact, they already are. So far there have been no documented cases of an attack enabled primarily by machine learning, but there’s growing evidence that cybercrime tools are getting smarter. For example, a new version of the Rakhni Trojan that struck last July included a context-aware feature that installed malware best-suited for the systems it infected. Computers with cryptocurrency wallets installed were infected with ransomware, while others were co-opted for cryptocurrency mining. “This is just one small example of where the future of threats can lead,” said Jacob Serpa, a product marketing manager at Bitglass Inc.
Most of the experts contacted by SiliconANGLE said machine learning and deep learning tools could boost the ability of a skills-starved industry to identify intrusions and target responses better as well as to strengthen defenses against access breaches. But they also noted that AI technologies are still in their relative infancy and criminals have yet to exploit their potential fully.
Growing state-sponsored cybercrime and the availability of training data for malicious use on the dark web, the shady part of the internet accessible only with special software, are among the factors that could tip the balance the other way. What’s important right now is that organizations are starting to understand the potential of AI to boost their defenses while remaining realistic about its limitations.
Looking for patterns
Many security experts scoff at the use of the term “intelligence” to describe technology that today can barely mimic the cognitive capabilities of a two-year-old child. But they agree that fields such as machine learning, which is useful in detecting patterns across very large data sets, and deep learning, which excels in areas such as image and voice recognition, can deliver value to security organizations right now.
For example, Microsoft Corp. researchers are fighting the battle against phishing attacks by using machine learning to spot fraudulent log-in attempts. “We worked with customers to train our ML to be 99.999 percent accurate identifying malicious log-ins,” said Ram Shankar Siva Kumar, whose formal Microsoft title is data cowboy. “Even a single-digit margin of error is too much for enterprise customers, who routinely deal with billions of sign-ins.”
Security operations centers struggle with such a high volume of false alerts from intrusion detection systems that, according to one survey, nearly 32 percent of security pros admit to sometimes ignoring them. Chasing false alarms is wasted time that organizations can ill afford amid a shortage of nearly 2 million cybersecurity professionals. Nearly 60 percent of the activity in enterprise SOCs is repetitive, said Koos Lodewijkx, chief information security officer at IBM Corp.: “Detection has accelerated, but response is still at human speed.”
Machine learning algorithms can be trained to spot patterns in log data that have a high likelihood of correlating with malicious activity, enabling cybersecurity personnel to focus their response efforts better. Combing through masses of data that are too large for humans to handle is one area where machines excel, especially when they’re armed with machine learning and deep learning, which enable them to learn without explicit programming.
“Humans are good at finding patterns, but when dealing with the fast types of data you see in detection, it’s impossible to keep attributes and patterns in mind,” said Anna Westelius, senior director of engineering at Arkose Labs. “That’s what large-scale machine learning models allow us to do.”
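To make the idea concrete, here is a minimal sketch, in Python with scikit-learn, of the kind of alert-scoring model the experts describe. It is not any vendor’s pipeline; the log-derived features and the synthetic labels below are illustrative assumptions standing in for an analyst’s past triage decisions.

```python
# A minimal alert-triage sketch: train on past analyst verdicts, then rank new
# alerts by predicted risk. Feature names and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical features per alert: failed logins per hour, megabytes sent out,
# distinct destination IPs contacted, and an "unusual geography" score (0-1).
X = rng.random((5000, 4)) * [50, 500, 40, 1]
# Synthetic labels standing in for alerts an analyst confirmed malicious (1) or benign (0).
y = ((X[:, 0] > 30) & (X[:, 3] > 0.5) | (X[:, 1] > 400)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score today's alerts and surface the riskiest ones first.
new_alerts = rng.random((100, 4)) * [50, 500, 40, 1]
risk = model.predict_proba(new_alerts)[:, 1]
for i in np.argsort(risk)[::-1][:5]:
    print(f"alert {i:3d}  risk={risk[i]:.2f}")
```

In production, those labels come from humans, which is one reason the experts quoted below see machine learning as an assistant to analysts rather than a replacement.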
Microsoft uses machine learning algorithms in a new service called Azure Sentinel that “correlates trillions of activities from different products…to piece together tens of high-fidelity, security-interesting cases,” Kumar said. “Security interesting” cases are those that go beyond the bounds of statistical probability. The result is that Azure Sentinel can improve alert prioritization by 90 percent, giving security pros a short list of likely problems, he said.
But the process isn’t black magic, Kumar emphasized. “It’s a combination of domain knowledge from the security researchers and AI expertise from algorithm designers,” he said. “Machine learning is not a silver bullet and needs human intelligence.” Most experts agree that machine learning will be, at best, an assistant to human operators for the foreseeable future.
Machine learning’s roots in predictive analytics make it useful in that role. Balbix Inc. figures that the average Fortune 1000 company today has more than 260 potential attack vectors, including cloud services, mobile phones, “internet of things” devices and partner accounts. The company’s machine learning-based software predicts which vulnerabilities are most likely to be exploited, enabling security organizations to make better deployment choices.
Pattern detection can also be used to detect anomalies by tying behavior to known user profiles, said TruSTAR’s Kseib. “Your system could detect when you share sensitive information with someone outside of your organization and trigger a set of defensive behaviors such as blocking the action,” he said.
In addition, machine learning can be applied to user and behavioral analytics, a discipline that establishes baselines for user activity and detects departures from the norm. That can help identify and prevent user error, a problem that two-thirds of respondents to an Experian Corp. survey last year identified as the weakest link in their security fabrics.
“Most users don’t know they’ve committed an error, are never informed, never guided on how to correct the error and never learn how to avoid it in the future,” said Scott Totman, vice president of engineering at DivvyCloud Corp. Machine learning can be used to identify inconsistencies and “create a feedback loop that educates the end user.”
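The baseline idea behind such analytics can be illustrated with a stripped-down sketch; the per-user activity counts below are hypothetical, not any product’s telemetry.

```python
# A minimal behavioral-baseline sketch: flag activity that deviates sharply
# from a user's own history. Thresholds and counts are illustrative assumptions.
import numpy as np

def is_anomalous(history, today, z_threshold=3.0):
    """Return True if today's activity count is an outlier versus the user's baseline."""
    mean, std = np.mean(history), np.std(history) + 1e-9
    return abs(today - mean) / std > z_threshold

# Hypothetical daily counts of files a user shared outside the organization.
history = [2, 3, 1, 4, 2, 3, 2, 5, 3, 2]
print(is_anomalous(history, today=4))    # False: within the user's normal range
print(is_anomalous(history, today=60))   # True: likely an error or policy violation
```

Real behavioral analytics products track many more signals and learn the thresholds themselves, but the principle of comparing activity to a per-user baseline is the same.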
Deep learning, the branch of AI that has been responsible for breakthroughs in speech and image recognition in recent years, is becoming a key weapon in cybersecurity as well. It can improve detection of hypertargeted “spear phishing” emails as well as strengthen access controls in areas such as biometric recognition.
For example, Jumio Corp. uses deep learning to combine image and document recognition for user authentication. Its technology compares a selfie photograph taken on the spot with government-issued identification to verify that the user is one and the same. The software can adjust for variations in image quality as well as factors such as user weight change or the addition or loss of facial hair. It can also detect subtle factors that indicate an ID has been tampered with, such as small variations in typeface.
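Jumio’s models are proprietary, but the general pattern, comparing deep-learning embeddings of two images and accepting the match only if they are similar enough, can be sketched as follows. The embed_face function here is a hypothetical stand-in for a trained network, not Jumio’s software.

```python
# A minimal face-matching sketch: map each image to an embedding vector and
# compare the two by cosine similarity. embed_face is a hypothetical stand-in.
import numpy as np

def embed_face(image: np.ndarray) -> np.ndarray:
    # Placeholder: a real system would run a trained convolutional network here.
    v = image.flatten()[:128]
    return v / (np.linalg.norm(v) + 1e-9)

def same_person(selfie: np.ndarray, id_photo: np.ndarray, threshold: float = 0.8) -> bool:
    a, b = embed_face(selfie), embed_face(id_photo)
    cosine = float(np.dot(a, b))     # both embeddings are (near) unit length
    return cosine >= threshold       # high similarity -> accept the match

# Usage with dummy pixel arrays standing in for the selfie and the ID photo.
selfie = np.random.rand(64, 64)
print(same_person(selfie, selfie + 0.01 * np.random.rand(64, 64)))   # True
```

Training the embedding network so that the same face lands close together despite changes in lighting, weight or facial hair is where the deep learning does its work.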
Providers pounce
Companies have been quick to jump on AI terminology, and users are buying in: A recent survey of more than 400 security professionals commissioned by ProtectWise Inc. found that 73 percent of respondents say they’ve implemented security products that incorporate at least some aspect of AI.
However, the relative immaturity of the technology was evident in the 46 percent who said rules creation and implementation are a chore and the one in four who said they don’t plan to implement more AI-enabled products in the future.
For all its promise, there are areas in which AI adds little value or may even create new vulnerabilities. Machine and deep learning work best when the problem domain is well-known, and the variables don’t change very much. Algorithms are good at detecting variations in patterns but not at identifying new ones. “To say you’re going to find the unknown is really tough,” said Tom Clare, senior product marketing manager at Fidelis Cybersecurity Inc., which specializes in threat detection and response.
Changing variables can flummox machine learning algorithms, which is one reason they have so far had limited value in combating malware, the incidence of which has risen fivefold since 2013, according to SafetyDetective.com. Machine learning algorithms “inherently fail because the training set of malware changes too quickly,” said Doug Swanson, chief technology officer at Malwarebytes Corp. “The malware your model will see in the future will end up looking little to nothing like the malware it has seen, and been trained on, in the past.”
AI models rely upon large amounts of high-quality source data for training, which can limit their ability to respond quickly to all but well-known threat patterns. “At the end of the day, [models] can be large and slow,” said Labhesh Patel, Jumio’s chief scientist.
The value of the results is also only as good as the data that’s used for training. That’s why MapR’s Omernik advises organizations to exercise skepticism when providers claim that they have a comprehensive approach to AI for security.
He recalled his experiences working for two different banks, one with a local customer base and another whose customers traveled the world. “People would log in from computers in Africa or Russia and that was normal,” he said, but such activity would derail an intrusion detection system hardwired to treat location as a risk signal.
No black boxes
Machine learning also can’t be the black box it often is if it’s to be useful in battling cyberthreats. Models need constant attention to ensure that training data is complete, relevant and, not least, untainted by attackers. The introduction of false or misleading data can cause results to degrade or worse. “If the technology is relied on too heavily, it eventually begins to teach itself,” said Arkose Labs’ Westelius. “Machine learning could retrain itself to think a normal behavior is no longer normal.”
Then there are nuances of how computers work that defy human logic. For example, researchers have demonstrated ways to fool voice assistants such as Amazon.com Inc.’s Alexa and Apple Inc.’s Siri by embedding hidden commands in ordinary human speech and even music. Self-driving cars have been duped into misidentifying road signs by the introduction of small stickers that humans would barely even notice.
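Those tricks are examples of adversarial inputs. The widely published fast gradient sign method shows the basic mechanics: nudge an input in the direction that most increases the model’s error. The toy, untrained classifier below is only an assumption to keep the sketch self-contained; against a trained model, the same barely visible perturbation can flip the prediction.

```python
# A minimal FGSM (fast gradient sign method) sketch in PyTorch. The untrained
# toy model and random "image" are stand-ins to demonstrate the mechanics only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # toy classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28, requires_grad=True)   # stand-in input "image"
true_label = torch.tensor([3])

# Compute the gradient of the loss with respect to the *input*, not the weights.
loss = loss_fn(model(x), true_label)
loss.backward()

epsilon = 0.1                                       # perturbation budget
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1)   # barely visible change

print("original prediction: ", model(x).argmax().item())
print("perturbed prediction:", model(x_adv).argmax().item())
```

Defenders use the same technique in reverse, generating adversarial examples during training so that models learn to resist them.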
The upshot: The same tools that fortify corporate defenses can also be used to breach them in novel ways. Take XEvil, a tool that can decipher the twisted and obscured characters in CAPTCHA codes with up to 90 percent accuracy. It’s a byproduct of the deep learning-based machine vision software developed to guide autonomous vehicles, and it can also be used to defeat a common second line of defense in password authentication systems.
Open to all
Like many AI models, XEvil is open source, making it simple for good and bad actors alike to build upon it. For better or for worse, most popular machine learning models have been released to open source, meaning there’s no way to know who’s using them.
That fact triggered an ominous warning from IBM’s Lodewijkx. “The knowledge and skills are democratizing,” he noted. Programming skills have historically been based upon working with databases, but the model-based approach common to machine learning is rapidly going mainstream. “The kids we’re hiring from college have already shifted their programming skills,” he said. “That also impacts the criminal side.”
Machine learning excels at detecting variations in patterns, but that technology can also be used to cover tracks. “Currently, many criminals’ signatures are based on the consistent nature of their activities,” said DivvyCloud’s Totman. “Criminals can use machine learning to randomize their patterns and blend in to avoid detection.”
Bad actors can also use machine learning to get a better understanding of the environments they penetrate and move more quickly toward targets. It took Chinese hackers nine months to steal 23 million records from the U.S. government’s Office of Personnel Management in 2015, noted Robert Ackerman Jr., managing director of Allegis Cyber, a venture capital firm that invests extensively in cybersecurity startups. “One official told me that with AI, that could be compressed to nine hours,” he said.
Anti-malware makers worry about the potential for AI to create self-mutating malware that “could adjust the code of the detected malware, compile it and redeploy it to avoid further detection,” said Adam Kujawa, director of Malwarebytes Labs. “This can happen in the blink of an eye and greatly increase the amount of malware we deal with on a regular basis.”
Fake news at scale
As cybercrime increasingly becomes the domain of professionals and rogue states, the potential misuse of AI grows more troubling yet. The fake-news phenomenon that swirled around the 2016 U.S. presidential election has been elevated by the arrival of “deep-fake” technology, which manipulates image, video and sound files to make it appear that events and actions occurred that never actually happened. So far, applications of deep-faking have been limited to embarrassing celebrities, but there’s no reason the same technology can’t be applied to damage the reputations of executives and political figures.

Which leads back to the question of who will benefit more from AI: the white hats or the black hats?
There’s no consensus yet. What nearly everyone does agree on is that the new tools will elevate the stakes in an ongoing arms race. “The attackers will use AI to create better phishing emails and the security companies will get better at detecting them,” said IDC’s Westervelt.
In the same way that cloud computing has made powerful computers and software affordable, the democratization of AI-based security tools could benefit organizations that tend to have weaker defenses, said Mark Weiner, chief marketing officer at Balbix: “I think it’s going to help shore up a lot of less-than-mature organizations more than it will give the bad guys better opportunity.”
Ackerman believes growing attacker sophistication will force organizations to concentrate more on protecting data than preventing penetration. Cybersecurity practices today are “a dike with 1,000 holes in it and we’re running around trying to plug them all,” he said.
There’s some hope. Ackerman said technologies such as homomorphic encryption, which enables data to be processed while it’s in an encrypted state, will gain favor. Others see potential in adversarial machine learning, which pits models against each other in an attempt to fortify defenses.
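Homomorphic encryption is easiest to see in its simplest, partially homomorphic form. The toy Paillier-style sketch below, with deliberately tiny primes nothing like production key sizes, adds two numbers without ever decrypting them; the fully homomorphic schemes Ackerman alludes to extend the same idea to arbitrary computation.

```python
# A toy Paillier-style demo of additively homomorphic encryption.
# Deliberately tiny primes for readability; real keys are hundreds of digits long.
import math
import random

def lcm(a, b):
    return a * b // math.gcd(a, b)

def keygen(p, q):
    n = p * q
    lam = lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                # modular inverse (Python 3.8+)
    return (n, n + 1), (lam, mu, n)     # public key (n, g), private key

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:          # r must be coprime to n
        r = random.randrange(2, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

pub, priv = keygen(1009, 1013)
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
c_sum = (c1 * c2) % (pub[0] ** 2)       # multiplying ciphertexts adds plaintexts
print(decrypt(priv, c_sum))             # prints 42; c1 and c2 were never decrypted
```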
But all of this activity is taking place against the backdrop of growing complexity brought on by the use of multiple clouds, mobile devices and myriad IoT devices. That doesn’t bode well. Said Westervelt: “The idea that chief information security officers can have granular control over their data, not to mention how people share and use it, gets really complicated.”
And in cybersecurity, complexity is the criminal’s best friend.