Data Diver Disses Terror-Mining
Jeff Jonas is one of the country's leading practitioners of the dark art of data analysis. Casino chiefs and government spooks alike have used his CIA-funded "Non-Obvious Relationship Awareness" software to scour databases for hidden connections.
So you'd think that Jonas would be all into the idea of using these data-mining systems to predict who the next terrorist attacker might be.
Think again. "Though data mining has many valuable uses, it is not well suited to the terrorist discovery problem," he writes in a new study, co-authored with the Cato Institute's Jim Harper. "This use of data mining would waste taxpayer dollars, needlessly infringe on privacy and civil liberties, and misdirect the valuable time and energy of the men and women in the national security community." Are you listening, NSA?
Jonas doesn't have a problem cobbling together information on suspects from various databases. It's using these databases to forecast a terrorist's behavior -- think market research, but for Al-Qaeda -- that Jonas hates. "The possible benefits of predictive data mining for finding planning or preparation for terrorism are minimal. The financial costs, wasted effort, and threats to privacy and civil liberties are potentially vast," he writes.
One of the fundamental underpinnings of predictive data mining in the commercial sector is the use of training patterns. Corporations that study consumer behavior have millions of patterns that they can draw upon to profile their typical or ideal consumer. Even when data mining is used to seek out instances of identity and credit card fraud, this relies on models constructed using many thousands of known examples of fraud per year.
Terrorism has no similar indicia. With a relatively small number of attempts every year and only one or two major terrorist incidents every few yearsâeach one distinct in terms of planning and executionâthere are no meaningful patterns that show what behavior indicates planning or preparation for terrorism. Unlike consumersâ shopping habits and financial fraud, terrorism does not occur with enough frequency to enable the creation of valid predictive models. Predictive data mining for the purpose of turning up terrorist planning using all available demographic and transactional data points will produce no better results than the highly sophisticated commercial data mining done today [with results in the low single-digits â ed.]. The one thing predictable about predictive data mining for terrorism is that it would be consistently wrong.
Without patterns to use, one fallback for terrorism data mining is the idea that any anomaly may provide the basis for investigation of terrorism planning. Given a âtypicalâ American pattern of Internet use, phone calling, doctor visits, purchases, travel, reading, and so on, perhaps all outliers merit some level of investigation. This theory is offensive to traditional American freedom, because in the United States everyone can and should be an âoutlierâ in some sense. More concretely, though, using data mining in this way could be worse than searching at random; terrorists could defeat it by acting as normally as possible.
Treating âanomalousâ behavior as suspicious may appear scientific, but, without patterns to look for, the design of a search algorithm based on anomaly is no more likely to turn up terrorists than twisting the end of a kaleidoscope is likely to draw an image of the Mona Lisa.
Civil libertarians and bloggers have talked 'til they're blue in the face about how lame this kind of terror-predicting is. But I don't think I've ever heard a giant of the field, like Jonas, come out against the practice -- at least not on-the-record. Let's hope this is one conversation that the feds are monitoring.
(Big ups: Daou)
UPDATE 11:49 AM: Shane Harris here. Die-hard proponents of pattern-based 'data mining' to catch terrorists will remain unconvinced by Jonas' and Harper's argument. While it's true that data mining in the commercial sector is based upon "training patterns," backers of systems such as Total Information Awareness will say, yes, and that's why data mining for terrorists has to start with hundreds -- maybe thousands -- of known or potential terrorist patterns to look for. A major part of TIA research was the creation of terrorist attack templates through red teaming exercises, in which experts were paid to come up with devious and clandestine plots that a terrorist might conceivably attempt. Their various machinations would, presumably, leave a set of digital footprints -- airline tickets purchased, money wired, hotels paid for, and so on -- and THAT data would be mined for clues.
What's also interesting about this paper is the combination of the authors. Jim Harper is a well-known and articulate activist, and has long since staked out central territory in the security vs. privacy debate. But Jonas has stayed out of politics. Indeed, those who've met him will know that he sticks out like a sore West coast thumb among Washington gear heads, being unafraid to use the word "dude" in formal conversation and happily acknowledging his ignorance of most Beltway insider baseball. But those who know Jonas and have heard him speak about electronic terrorist hunting know that, like his co-author Harper, he has a strong libertarian streak. Maybe Jonas wouldn't put it quite that way -- dude -- but it's there.
Obviously Jef Jonas has not heard of the new kid in town. His name is ARGUS; it's not an acronym, but the mythological, all-seeing Greek god. ARGUS is an open-source data mining tool which focuses on social disruption as an indicator or nefarious activity; but its content engine and Bayesian algorithms can be evolved to focus on any subject, human or otherwise. This tool is in its initial stages, but is being grown into a global capability which currently operates in 23 languages.
Yes, most data mining tools are very limited, especially for an predictive analysis. But ARGUS is winning over everyone who's seen its capability because it drills down below the official [read censored] websites and blogs and captures individual postings which are aching to be read by the rest of the world. ARGUS will be peering into an open source near you...
peace out. Bob
Posted by: Bob Stevens at December 18, 2006 9:58 PM