One of the biggest downfalls of “Big Data” as it stands now is simply how broad, unorganized, and abstract it all seems to be. In his 2019 article “The Exaggerated Promise of So-Called Unbiased Data Mining,” Gary Smith highlights this problem, distilling it into something he calls the Feynman trap. The Feynman trap, as Smith defines it, is the “ransacking [of] data for patterns without any preconceived idea of what one is looking for” (Smith). I can imagine it’s an easy trap to fall into. How can we, researchers and society at large, leverage the technology we have when there is a near-infinite amount of information to comb through?
The answer, in my humble history-major opinion, resides in the definition of the Feynman trap itself. As Smith himself says, “good research begins with a clear idea of what one is looking for and expects to find [while data mining] just looks for patterns and inevitably finds some” (Smith). So we have a “how”: for data mining to be truly useful, we need to be specific and targeted.
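To see why ransacking data “inevitably finds some” patterns, here’s a toy sketch in Python (my own illustration, not anything from Smith’s article). Every series below is pure random noise, yet combing through all the pairwise comparisons still turns up an impressive-looking correlation:

```python
# Toy demonstration of the Feynman trap: generate nothing but noise,
# then mine it for "patterns" by checking every pair of series.
import random
from itertools import combinations

random.seed(42)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# 50 "datasets" of 30 observations each: all noise, no real signal.
series = [[random.gauss(0, 1) for _ in range(30)] for _ in range(50)]

# Ransack all 1,225 pairs for the strongest apparent relationship.
best_r = max(abs(pearson(a, b)) for a, b in combinations(series, 2))
print(f"strongest correlation found in pure noise: r = {best_r:.2f}")
```

With over a thousand comparisons, at least one spuriously strong correlation is all but guaranteed, which is exactly Smith’s point: start with a hypothesis, or the data will hand you a “discovery” regardless.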
A great example of this kind of targeted data mining can be found in a 2014 article titled “Data Mining Reveals How Conspiracy Theories Emerge on Facebook.” In this study, researchers used data mining to analyze how much time users spent engaging with mainstream news outlets versus alternative ones. The results indicated that, across a sample of one million Facebook users, the average amount of time spent engaging with mainstream news, official political channels, and alternative sources was approximately the same (MIT Tech Review). I was amazed to find a study from 2014 dealing with a practical application of data mining that would undoubtedly be useful both now and in the future.
And lest we forget: there was most likely explicit meddling by foreign agents in the 2016 American presidential election, as well as in subsequent elections in Europe. Imagine how useful data mining might be in observing and predicting how populations respond to this kind of interference. Imagine using data mining to counter or block attempts at digital meddling. This kind of technology could be vital in securing the openness of information and political processes in the 21st century.
I’m sure people high above my pay grade have already been considering these possibilities. Or maybe not; after all, the study referred to in this blog is from 2014. Nevertheless, the point still stands. If data mining is to remain credible and useful for the rest of the 21st century, it needs to move beyond its greedy roots. Data can only be useful when given context. What does it matter that Bitcoin goes up when it rains in New England? If data mining does not adapt, it may never escape the Feynman trap.
Sources Accessed:
“Data Mining Reveals How Conspiracy Theories Emerge on Facebook” – MIT Technology Review
“The Exaggerated Promise of So-Called Unbiased Data Mining” – Gary Smith
