Tom Chatfield writes: Massive, inconceivable numbers are commonplace in conversations about computers. The exabyte, a one followed by 18 zeroes' worth of bytes; the petaflop, one quadrillion calculations performed in a single second. Beneath the surface of our lives churns an ocean of information, from whose depths answers and optimisations ascend like munificent kraken.
This is the much-hyped realm of “big data”: unprecedented quantities of information generated at unprecedented speed, in unprecedented variety.
From particle physics to predictive search and aggregated social media sentiments, we reap its benefits across a broadening gamut of fields. We agonise about over-sharing while the numbers themselves tick upwards. Mostly, though, we fail to address a handful of questions more fundamental even than privacy. What are machines good at; what are they less good at; and when are their answers worse than useless?
Consider cats. As commentators like the American psychologist Gary Marcus have noted, it’s extremely difficult to teach a computer to recognise cats. And that’s not for want of trying. Back in the summer of 2012, Google fed 10 million feline-featuring images (there’s no shortage online) into a massively powerful custom-built system. The hope was that the alchemy of big data would do for images what it has already done for machine translation: that an algorithm could learn from a sufficient number of examples to approximate accurate solutions to the question “what is that?”
Sadly, cats proved trickier than words. Although the system did develop a rough measure of “cattiness”, it struggled with variations in size, positioning, setting and complexity. Once expanded to encompass 20,000 potential categories of object, the identification process managed just 15.8% accuracy: a huge improvement on previous efforts, but hardly a new digital dawn.