data analysis
THE NEED FOR SPEED
Partha Sen of Fuzzy Logix believes that near-real-time data analysis is the new standard for today’s businesses to remain competitive.
For those who hit the ‘snooze’ button on earlier Big Data wake-up calls, consider this your espresso shot of information: our society creates as much data in two days – approximately five exabytes – as all of civilisation did prior to 2003. In light of that statistic, maybe they should change Moore’s law to ‘A Little Moore’s Law’ out of respect.
If you want to know where all of this data is coming from, simply look around you: smartphones, tablets, social media, GPS devices, telemetry sensors, smartwatches. All of these devices generate massive amounts of data, to the extent that we can soon expect the amount of data in the world to double every year. Not surprisingly, it’s business that feels the growing pains of Big Data most acutely. Enterprises have become data-dependent to make decisions, service customer relationships, foster innovation, improve operational efficiencies and even tell them when to change the lightbulbs.
The current challenge: Too much, too late
A few examples illustrate the magnitude of today’s data challenge. If an average-sized healthcare insurer (with approximately 30 million customers) wishes to improve outcomes for diabetic patients, it may need to analyse more than 60,000 medical codes across 10 billion claims and factor a separate silo of pharmacy data into the equation. The challenge is no less daunting in other industries. A national retail chain that wants to improve its product replenishment could be looking at sales data from thousands of separate stock-keeping units (SKUs) across hundreds or thousands of stores over the last several years – more than 100 billion rows of data.
For years, data analysis has been, in a sense, a ‘moving’ experience. Enterprises moved the data that they wanted to analyse from their database onto analytic servers in order to break the analytic work into smaller pieces. Many enterprises, in fact, still do this today. This approach has several problems. As data gets bigger – a foregone conclusion today – it can take hours (or even longer) to transfer and stage the data on multiple servers, then return it to the database and reassemble it. For time-sensitive analyses, that’s a deal-breaker. As a workaround, enterprises often choose to analyse only a subset of their data, but this sort of data sampling leads to less-than-ideal analytic models and can ultimately create more data confusion by generating multiple versions of the same information.
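To make the cost of that pattern concrete, here is a minimal sketch of the ‘move the data’ workflow in Python, with SQLite standing in for an enterprise warehouse. The claims table and its columns are hypothetical, not drawn from any particular vendor:

```python
import sqlite3

# SQLite stands in for an enterprise warehouse; table and column
# names are hypothetical, used only to illustrate the workflow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [(i, float(i % 97)) for i in range(100_000)],
)

# Step 1: move ALL of the data out of the database. At warehouse
# scale, this is the hours-long transfer described above.
rows = conn.execute("SELECT amount FROM claims").fetchall()

# Step 2: only now can the analytic server start computing.
amounts = [r[0] for r in rows]
print(f"mean claim amount: {sum(amounts) / len(amounts):.2f}")
```

Every row crosses the wire before a single calculation runs; at billions of rows, step 1 dominates the entire job.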
Move the analytics, not the data
To solve the current challenges of Big Data, enterprises are turning to a new strategy: in-database analytics. The idea behind this approach can be summed up in one simple concept: Move the analytics, not the data. By bringing the analytics engine into the database and leveraging massively parallel map-reduce technology (popularised by tools like Hadoop), enterprises can perform highly complex analyses directly in the database.
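As an illustrative contrast, under the same hypothetical schema, an in-database version might push the computation down as SQL aggregates, so that only the final coefficient, a single number, ever leaves the database. This is a sketch of the idea, not any vendor’s actual API:

```python
import sqlite3

# Same hypothetical schema as before, but the analytics now run
# inside the database engine rather than on a separate server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (member_id INTEGER, amount REAL, visits INTEGER)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(i, float(i % 97), i % 7) for i in range(100_000)],
)

# Least-squares slope of amount on visits, pushed down as aggregates:
# slope = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
(slope,) = conn.execute("""
    SELECT (COUNT(*) * SUM(visits * amount) - SUM(visits) * SUM(amount))
         / (COUNT(*) * SUM(visits * visits) - SUM(visits) * SUM(visits))
    FROM claims
""").fetchone()
print(f"OLS slope of amount on visits: {slope:.4f}")
```

On a massively parallel database, each node would compute these partial sums over its own shard of the table and the engine would combine them, which mirrors the map-reduce shape described above.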