US gov't report: Data mining is ineffective
Published: 08 Oct 2008 14:59 BST
The most extensive US government report to date on whether terrorists can be identified through data mining has concluded that it does not work well.
A US National Research Council (NRC) report, years in the making and scheduled to be released on Tuesday, concludes that automated identification of terrorists through data mining or any other mechanism "is neither feasible as an objective nor desirable as a goal of technology development efforts". Inevitable false positives will result in "ordinary, law-abiding citizens and businesses" being incorrectly flagged as suspects.
The 352-page report, called Protecting Individual Privacy in the Struggle Against Terrorists, amounts to at least a partial repudiation of the Defense Department's data-mining programme called Total Information Awareness, which was limited by Congress in 2003.
But the authors' ambition goes far beyond revisiting the problems of the TIA programme and its successors. They set out to produce a scholarly evaluation of existing data-mining technologies: how effective they are, and how government agencies should use them to limit false positives.
The report was written by a committee whose members include William Perry, a professor at Stanford University; Charles Vest, the former president of MIT; W Earl Boebert, a retired senior scientist at Sandia National Laboratories; Cynthia Dwork of Microsoft Research; R Gil Kerlikowske, Seattle's police chief; and Daryl Pregibon, a research scientist at Google.
The committee acknowledged that far more Americans now live their lives online than a decade ago, using everything from VoIP phones and Facebook to RFID tags in cars, and that the databases those activities create are tempting targets for federal agencies. The report also drew a distinction between subject-based data mining (starting with one individual and looking for connections) and pattern-based data mining (looking for anomalous activity that could indicate illegal behaviour).
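To make the report's distinction concrete, here is a minimal Python sketch of the two approaches, assuming a toy contact graph and a per-person activity count. The function names, the data and the z-score threshold are hypothetical illustrations chosen for this article; they are not methods described in the NRC report.

```python
from collections import deque
from statistics import mean, pstdev


def subject_based(contacts, seed, max_hops):
    """Subject-based mining (hypothetical sketch): start from one known
    individual and return everyone reachable within max_hops links,
    mapped to their distance from the seed (breadth-first search)."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in contacts.get(person, ()):
            if other not in seen:
                seen[other] = seen[person] + 1
                queue.append(other)
    del seen[seed]  # report connections, not the starting subject
    return seen


def pattern_based(activity, threshold):
    """Pattern-based mining (hypothetical sketch): with no starting suspect,
    flag anyone whose activity lies more than `threshold` population
    standard deviations from the mean."""
    values = list(activity.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()  # everyone is identical, so nothing is anomalous
    return {p for p, v in activity.items() if abs(v - mu) / sigma > threshold}


if __name__ == "__main__":
    # Toy, made-up data: who contacts whom, and how active each person is.
    contacts = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"]}
    activity = {"A": 5, "B": 6, "C": 4, "D": 5, "E": 40}
    print(subject_based(contacts, "A", 2))
    print(pattern_based(activity, 1.5))
```

On this toy data the subject-based search returns B, C and D (E is three hops from A), while the pattern-based check flags only E. Nothing in the data says whether E is a suspect or simply a busy, law-abiding user, which is exactly the false-positive problem the committee describes.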
But the authors concluded the type of data mining government bureaucrats would like to do can't work. "If it were possible to automatically find the digital tracks of terrorists and automatically monitor only the communications of terrorists, public policy choices in this domain would be much simpler. But it is not possible to do so."
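The committee's warning about inevitable false positives comes down to base rates: when genuine targets are extremely rare, even a very accurate screen flags far more innocent people than real suspects. The short Python sketch that follows works through the arithmetic. Every figure in it (population size, number of real targets, the screen's hit rate and its false-alarm rate) is a hypothetical assumption chosen for illustration; none comes from the NRC report.

```python
def screening_outcomes(population, true_targets, sensitivity, false_positive_rate):
    """Expected counts when a screen is applied once to an entire population.

    sensitivity: fraction of real targets the screen flags.
    false_positive_rate: fraction of innocent people the screen flags anyway.
    """
    innocents = population - true_targets
    true_positives = sensitivity * true_targets
    false_positives = false_positive_rate * innocents
    flagged = true_positives + false_positives
    # Precision: of everyone flagged, what share is a real target?
    precision = true_positives / flagged if flagged else 0.0
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not figures from the report.
    result = screening_outcomes(
        population=300_000_000,
        true_targets=3_000,
        sensitivity=0.99,
        false_positive_rate=0.01,
    )
    print(f"Real targets flagged:    {result['true_positives']:,.0f}")
    print(f"Innocent people flagged: {result['false_positives']:,.0f}")
    print(f"Share of flags that are correct: {result['precision']:.3%}")
```

With these assumed inputs the screen catches 2,970 of the 3,000 hypothetical targets but also flags about 3 million innocent people, so fewer than one flag in a thousand points at a real target. Roughly speaking, unless the false-alarm rate can be pushed below the tiny fraction of the population who are genuine targets, flagged innocents will outnumber flagged suspects.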
The recommendations of the report can be summarised as follows:
- US government agencies should be required to follow a systematic process to evaluate the effectiveness, lawfulness and consistency with US values of every information-based programme, whether classified or unclassified, for detecting and countering terrorists before it can be deployed, and periodically thereafter.
- Periodically, after a programme has been operationally deployed, and in particular before a programme enters a new phase in its lifecycle, policy-makers should [carefully review] the programme before allowing it to continue operations or to proceed to the next phase.
- To protect the privacy of innocent people, the research and development of any information-based counterterrorism programme should be conducted with synthetic population data... At all stages of a phased deployment, data about individuals should be rigorously subjected to the full safeguards of the framework.
- Any information-based counterterrorism programme of the US government should be subjected to robust, independent oversight of the operations of that programme, a part of which would entail a practice of using the same data-mining technologies to "mine the miners and track the trackers".
- Counterterrorism programmes should provide meaningful redress to any individuals inappropriately harmed by their operation.
- The US government should periodically review the nation's laws, policies and procedures that protect individuals' private information for relevance and effectiveness in light of changing technologies and circumstances. In particular, Congress should re-examine existing law to consider how privacy should be protected in the context of information-based programmes (for example, data mining) for counterterrorism.
By itself, this is merely a report with non-binding recommendations that Congress and the executive branch could ignore. But NRC reports tend to represent a working consensus of technologists and lawyers.
The great encryption debate of the 1990s is a case in point: the NRC's so-called Crisis report on encryption in 1996 concluded that export controls treating software such as web browsers and PGP as munitions had failed and should be relaxed. That eventually happened two years later.