A fundamental flaw in a U.S. National Security Agency (NSA) machine learning program designed to identify terrorist suspects in Pakistan may have led to thousands of people in that country being wrongly labeled.
A report in Ars Technica says that the SKYNET program – yep, they copied Terminator – performs what’s called “analytic triage”, using an 80-point analytical score to calculate an individual’s probability of being a terrorist. The system, details of which appeared in documents provided by whistleblower Edward Snowden to The Intercept last year, factors in things like a person’s phone calls, movements, location, social media activity and more to evaluate whether or not that person may have terrorist links.
In theory the system sounds like a good idea, but Ars Technica reports that SKYNET is far from foolproof. While its false-positive rate of 0.008 percent sounds pretty accurate, the NSA is said to have evaluated 55 million of Pakistan’s 182 million people. Even at that rate, roughly 4,400 of them would be wrongly flagged as a terrorist threat. The reported figure of up to 99,000 people potentially mislabeled implies a higher false-positive rate of about 0.18 percent.
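The arithmetic behind estimates like these is simple: the expected number of false positives is the number of people evaluated multiplied by the false-positive rate. Below is a minimal sketch in Python, assuming that practically everyone evaluated is innocent. The function and constant names are illustrative and do not come from the leaked documents.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumption: nearly all evaluated people are innocent, so the
# false-positive rate applies to the whole evaluated group.

EVALUATED = 55_000_000          # people evaluated, per the article
FPR_PERCENT = 0.008             # false-positive rate in percent, per the article
REPORTED_MISLABELED = 99_000    # upper figure quoted in the article


def expected_false_positives(people: int, rate_percent: float) -> float:
    """Expected number of innocent people flagged at a given rate."""
    return people * rate_percent / 100


def implied_rate_percent(false_positives: int, people: int) -> float:
    """False-positive rate (in percent) implied by a count of mislabeled people."""
    return false_positives / people * 100


at_quoted_rate = expected_false_positives(EVALUATED, FPR_PERCENT)
implied_rate = implied_rate_percent(REPORTED_MISLABELED, EVALUATED)

print(f"Flagged at {FPR_PERCENT}%: about {at_quoted_rate:,.0f} people")
print(f"Rate implied by {REPORTED_MISLABELED:,} people: about {implied_rate:.2f}%")
```

Either way, a rate that looks tiny as a percentage turns into thousands of individuals once it is applied to tens of millions of people.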
Ars Technica says that those confirmed to have been mislabeled include Ahmad Zaidan, the bureau chief of TV news network Al Jazeera’s Islamabad office. Zaidan was flagged due to his habit of regularly traveling to conflict areas to report on terrorism in the country.
SKYNET is similar to business-oriented machine learning algorithms that scour data about people’s traits and behavior and compare these to model profiles to try to find sales leads. Just like regular machine learning systems, SKYNET was trained by being fed data on 100,000 individuals, together with data on seven known terrorists. The system was then challenged to identify one of those terrorists, and apparently was able to do so. The problem is that this is still a very small dataset, and SKYNET hasn’t been tested with any new data since then, according to Patrick Ball, a data scientist and director of the Human Rights Data Analysis Group, who examined SKYNET’s leaked slides.
“There are very few ‘known terrorists’ to use to train and test the model,” said Ball. “If they are using the same records to train the model as they are using to test the model, their assessment of the fit is completely bullshit.”
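Ball’s point can be illustrated with a toy example. The sketch below is not SKYNET’s method. It is a hypothetical 1-nearest-neighbour classifier run on pure random noise, where the features say nothing about the label. Scored on the same records it was trained on, it looks perfect. Scored on records it has never seen, it does no better than a coin flip.

```python
import random


def nearest_label(train, point):
    """1-nearest-neighbour: copy the label of the closest training record."""
    closest = min(
        train,
        key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], point)),
    )
    return closest[1]


def accuracy(train, test):
    """Fraction of test records whose predicted label matches the true one."""
    hits = sum(nearest_label(train, feats) == label for feats, label in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Pure noise: five random features and a random 0/1 label per record,
# so there is nothing real for the model to learn.
records = [
    ([rng.random() for _ in range(5)], rng.choice([0, 1]))
    for _ in range(400)
]
train, held_out = records[:200], records[200:]

same_data_score = accuracy(train, train)    # tested on its own training data
held_out_score = accuracy(train, held_out)  # tested on unseen records

print(f"Accuracy on training records: {same_data_score:.2f}")
print(f"Accuracy on held-out records: {held_out_score:.2f}")
```

The first score is 1.0 because every training record is its own nearest neighbour, which says nothing about how the model behaves on new people. Only the held-out score is an honest estimate, and with just a handful of known positives even that estimate would be very noisy.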
The news is concerning because it’s not clear how SKYNET’s data is used by U.S. authorities. Ars Technica speculates that it could be used to help identify targets for drone strikes, in which case there’s a very clear risk of someone who is wrongly identified as a terrorist being targeted. The U.S. military has carried out “hundreds” of drone strikes in Pakistan since 2004, the Bureau of Investigative Journalism says.