Synopsis: We have always used information to make decisions, but there is a radical change taking place. Society is going from a constant shortage of data to a surfeit of information – and this upends everything.
For centuries, we have only collected and crunched a sliver of information because of the cost and complexity of processing larger amounts. We relied on data of the cleanest, highest quality possible, since we only tapped a little of it. And we tried to uncover the reasons behind how the world worked, to generalize. Yet all this was actually a function of a small-data world, when we never had enough information. Change that, and a lot of other things need to change as well.
Think of a car engine. Breakdowns rarely happen all at once. Instead, one hears strange noises or the driving “feels funny” a few days in advance. Many vehicles are fitted with sensors that can measure the heat and vibration from the engine. By capturing these readings as data, one can know what a healthy engine’s “data signature” looks like, as well as how it changes prior to a breakdown. That way, one can identify when a part is about to fail before it actually breaks. The car can alert the driver to visit a service station to get it repaired, as if it were clairvoyant. But this requires lots of data, a willingness to accept messy data, and giving up on knowing why the engine is about to break in exchange for the practical knowledge that it is.
That’s big data. It ushers in three big shifts: more, messy and correlations (the book’s chapters 2, 3 and 4). First, more. We can finally harness a vast quantity of information, and in some cases, we can analyze all the data about a phenomenon. This lets us drill down into details we could never see before. Second, messy. When we harness more data, we can shed our preference for data that’s only of the best calibre, and let in some imperfections. The benefits of using more data outweigh those of using cleaner but less data. Third, correlations. Instead of trying to uncover causality, the reasons behind things, it is often sufficient to simply uncover practical answers. So if some combination of aspirin and orange juice puts a deadly disease into remission, it is less important to know what the biological mechanism is than to just drink the potion. For many things, with big data it is faster, cheaper and good enough to learn “what,” not “why.”
One reason we can do these things is that we have so much more data, and one reason for that is that we are taking more aspects of society and rendering them into a data form (discussed in chapter 5). With so much data around, and the ability to process it, big data is the bedrock of new companies.
The value of data is in its secondary uses, not simply in the primary purpose for which it was initially collected, which is the way we tended to value it in the past (noted in chapter 6). Hence, a big delivery company can reuse data on who sends packages to whom to make economic forecasts. A travel site crunches billions of old flight-price records from airlines to predict whether a given airfare is a good one, or if the price is likely to increase or decrease. These extraordinary data services require three things: the data, the skills, and a big data mindset (examined in chapter 7). Today, the skills are lacking and few have the mindset, even though the data seems abundant. But over time, the skills and creativity will become commonplace, and the most prized part will be the data itself.
Big data also has a dark side (chapter 8). Privacy is harder to protect because the traditional legal and technical mechanisms don’t work well with big data. And a new problem emerges: propensity, that is, penalizing people based on what they are predicted to do, not what they have done. At the same time, there will be an increasing need to stay vigilant so that we don’t fall victim to the “dictatorship of data,” in which we shut off our reasoned judgment and invest data-driven decisions with more trust than they deserve.
Solutions to these thorny problems (raised in chapter 9) include a fundamental rethink of privacy law and of the technology to protect personal information. They also include a new class of professional, the “algorithmist,” who will do for the big data age what accountants and auditors did 100 years ago, when the cornucopia of information swamping society was in the form of financial data.
What role is left for humanity? For intuition, experience and acting in defiance of what the data suggests? Big data is set to change not only how we interact with the world, but also ourselves.
Published: March, 2013 | ISBN-13: 978-0544002692
[Image credit: http://sme-blog.com/files/2013/04/big-data-a-revolution-that-will-transform-how-we-live-work-and-think-286×440.jpg]