data analysis
THE NEED FOR SPEED
Partha Sen of Fuzzy Logix believes that near-real-time data analysis is the new standard for today’s businesses to remain competitive.
For those who hit the ‘snooze’ button on earlier Big Data wake-up calls, consider this your espresso shot of information: our society creates as much data in two days – approximately five exabytes – as all of civilisation did prior to 2003. In light of that statistic, maybe they should change Moore’s law to ‘A Little Moore’s Law’ out of respect.
If you want to know where all of this data is coming from, simply look around you: smartphones, tablets, social media, GPS devices, telemetry sensors, smartwatches. All of these devices generate massive amounts of data, to the extent that we can soon expect the amount of data in the world to double every year. Not surprisingly, it’s business that feels the growing pains of Big Data most acutely. Enterprises have become data-dependent to make decisions, service customer relationships, foster innovation, improve operational efficiencies and even tell them when to change the lightbulbs.
The current challenge: Too much, too late
A few examples illustrate the magnitude of today’s data challenge. If an average-sized healthcare insurer (with approximately 30 million customers) wishes to improve outcomes for diabetic patients, it may need to analyse more than 60,000 medical codes across 10 billion claims and factor a separate silo of pharmacy data into the equation. The challenge is no less daunting in other industries. A national retail chain that wants to improve its product replenishment could be looking at sales data from thousands of separate stock-keeping units (SKUs) across hundreds or thousands of stores over the last several years – more than 100 billion rows of data.
For years, data analysis has been, in a sense, a ‘moving’ experience. Enterprises moved the data that they wanted to analyse from their database onto analytic servers in order to break the analytic work into smaller pieces. Many enterprises, in fact, still do this today. This approach has several problems. As data gets bigger – a foregone conclusion today – it can take hours (or even longer) to transfer and stage the data on multiple servers, then return it to the database and reassemble it. For time-sensitive analyses, that’s a deal-breaker. As a workaround, enterprises often choose to analyse only a subset of their data, but this sort of data sampling leads to less-than-ideal analytic models and can ultimately create more data confusion by generating multiple versions of the same information.
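To make the cost of that pattern concrete, here is a minimal sketch of the ‘move the data’ workflow in Python, with SQLite standing in for an enterprise warehouse. The claims table and its columns are hypothetical, not drawn from any particular vendor:

```python
import sqlite3

# SQLite stands in for an enterprise warehouse; table and column
# names are hypothetical, used only to illustrate the workflow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [(i, float(i % 97)) for i in range(100_000)],
)

# Step 1: move ALL of the data out of the database. At warehouse
# scale, this is the hours-long transfer described above.
rows = conn.execute("SELECT amount FROM claims").fetchall()

# Step 2: only now can the analytic server start computing.
amounts = [r[0] for r in rows]
print(f"mean claim amount: {sum(amounts) / len(amounts):.2f}")
```

Every row crosses the wire before a single calculation runs; at billions of rows, step 1 dominates the entire job.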
Move the analytics, not the data
To solve the current challenges of Big Data, enterprises are turning to a new strategy: in-database analytics. The idea behind this approach can be summed up in one simple concept: Move the analytics, not the data. By bringing the analytics engine into the database and leveraging massively parallel map-reduce technology (popularised by tools like Hadoop), enterprises can perform highly complex analyses directly in the database.
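As an illustrative contrast, under the same hypothetical schema, an in-database version might push the computation down as SQL aggregates, so that only the final coefficient, a single number, ever leaves the database. This is a sketch of the idea, not any vendor’s actual API:

```python
import sqlite3

# Same hypothetical schema as before, but the analytics now run
# inside the database engine rather than on a separate server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (member_id INTEGER, amount REAL, visits INTEGER)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(i, float(i % 97), i % 7) for i in range(100_000)],
)

# Least-squares slope of amount on visits, pushed down as aggregates:
# slope = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
(slope,) = conn.execute("""
    SELECT (COUNT(*) * SUM(visits * amount) - SUM(visits) * SUM(amount))
         / (COUNT(*) * SUM(visits * visits) - SUM(visits) * SUM(visits))
    FROM claims
""").fetchone()
print(f"OLS slope of amount on visits: {slope:.4f}")
```

On a massively parallel database, each node would compute these partial sums over its own shard of the table and the engine would combine them, which mirrors the map-reduce shape described above.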