Picking Winners In Big Data
Big data solutions are picking up
speed in the IT industry. There’s a Cambrian explosion of interesting
start-ups, and all the database and business intelligence incumbents have moved
to create big data offerings, or rebrand into the new universe.
The key to seeing the value of big
data is understanding that it’s a business problem, not a matter of picking the
right tools and waiting for the magic to happen. That said, technology choices
still need to be made in an increasingly crowded and confused market.
The biggest question for anybody
wanting to invest in or adopt technology in this area is how to pick the winner.
Glancing through marketing materials will do little to help you: everybody
claims relevance to big data.
As I am often asked my opinion about
big data companies, I thought I’d share some of the principles I use to help me
think about the industry.
Where’s the value?
We need to understand where the
actual value is in the data world. For the most part, this value doesn’t lie
purely in the software. Over the past decade we have seen a rising tide of
commoditization of the software stack: from operating system, to relational
databases, to Hadoop itself. In fact, without this, we wouldn’t have the big
data revolution as we know it.
As Hadoop has become a de facto standard, so has the notion of building on
top of it with open source. There is some advantage in software innovation, but
it is momentary. Once something is known to be possible, there are enough smart
programmers out there that reproducing it becomes straightforward. (Because of
this, I gloomily predict no shortage of patent battles in the not-too-distant
future.)
Despite the flux in the software
world, two things about big data remain constant: the need for compute and
storage, and data itself. It’s ultimately to ownership of one or both of these
factors that IT industry value will gravitate.
Compute and storage
The ever growing need for computing
power and storage bodes well for companies providing the basics. These fall
into two categories: hardware manufacturers, and cloud infrastructure
providers. Not that these two markets are immune to their own fluctuations,
thanks to standardization and commoditization, but fundamentally, getting paid
for use of metal is the name of the game.
In this respect, it’s not too much of
a mystery why storage company EMC has plowed so early and so deep into the big
data world. Neither is it hard to see the reasoning behind Intel creating its
own optimized Hadoop distribution.
Data
Value resting in data is the more
subtle of the two axes of big data success.
Big data is ultimately about the
smart use of data to drive a business. There are two kinds of data: data about
your business, and data external to your business that you can create value
from.
It’s easy to see who might get
success from the latter, external data. We can expect that massive data owners
such as Google, Facebook, Thomson Reuters and Bloomberg will experience ongoing
success for as long as they are able to create products from their data.
Who owns your data, though? The
obvious answer, you, isn’t the only answer. In fact, your data is locked up
inside the platform choices you make, at both the hardware and software level.
If your systems are based on Oracle or Microsoft, you are very unlikely to move
in a hurry. Data likes to stay where it is, and tends to attract more data as
you build systems around it. Production systems are expensive to replace.
So, the vendors of your software
platforms of choice also get long term value from your data. For this reason,
it’s hard to bet against existing enterprise application platform incumbents in
the big data world. Big data, for most of today’s organizations, is an additive
phenomenon, not a challenge to the core of IT.
We’ll see more large platforms
ensuring nobody needs to move away. Examples include SAP adding HANA, in order
to enable their existing customers for big data, or Amazon Web Services’
addition of their data warehouse Redshift, to ensure that the entire data needs
of a company can be met on their platform.
Finally, it would be lunacy to bet
against either Oracle, who have been portentously quiet in the big data world
over the last year, or Microsoft, who in Excel have the world’s most popular
data manipulation environment.
Is it all business as usual?
I’m not saying there is no new
opportunity in the big data industry. What I am saying is that, as an additive
technology, big data is unlikely to enable anybody to challenge Oracle or
Microsoft for the throne.
There will be change, though, and it
will bring both winners and losers.
The area of value we haven’t yet
looked at is the point where data interacts with the actual mechanism of a
business. That’s where it transfers value to you and enables you to leverage
data to get ahead. This breaks down into several areas of opportunity for big
data innovators.
- Domain specific: tools that enable the manipulation and exploitation of data in a way that’s specific to a business segment. We see initial evidence of this market opportunity in the evolution of web and customer analytics products.
- Machine learning: more data means more to understand, and the only way an organization can realistically do this is with the aid of computers. Machine learning helps automate many parts of the data wrangling process. Furthermore, cross-company data sharing can significantly boost the effectiveness of machine learning, which creates an opportunity for companies that gain early market share. A recent example of this is Sift Science, a fraud detection application.
- Tools for exploration: human interaction with data is a requirement that’s hard to abstract away. Tableau has a head start in this market for big data, filling the role of “Microsoft Office for Data”, but the field is ripe for new innovation, especially with the increasing power of graphical capabilities and new device formats.
- Data agility: speed-to-decision is a critical factor in business competitiveness. Any solution that removes laborious steps has an advantage. A particularly problematic area here is data integration, the loading of both internal and external data sources ready for analytics. Most of today’s big data solutions are frankensteined combinations of layers, so there’s a great opportunity for vertically integrated solutions that remove needless impedance to data manipulation.
Areas of risk
What are the riskier big data
options? If a solution isn’t scoring high in the categories above, it’s not
likely to be around for the long run.
The biggest risk is with solutions
that address only a single horizontal part of the data architecture. Time is
against companies in this game. By selling just part of a complete solution,
they’re working against the rising tide of commoditization. Customers will
demand standardization (e.g. Hadoop compatibility) in order to feel safe
adopting such solutions, but that prevents the lock-in that will protect that
business. It’s not Oracle’s relational database that cements their position:
it’s their vertical position up and down the application stack.
Therefore, it’s not surprising to see
the pure Hadoop distribution companies making partnerships, and branching out
into other vertical layers. In the long term, it’s a tough road they’ve chosen.
One company working actively to solve
this problem is DataStax, who have pivoted from being seen as the corporate
backer for the Cassandra NoSQL database—a risky horizontal play—to selling an
integrated stack of Cassandra, Hadoop and Solr, intended as a complete platform
for building enterprise applications. They’re going after some of the platform
business: being involved in the actual use of data to drive business value.
For some start-ups, not having long
term big data viability might be just fine as a strategy. As the bigger
enterprise companies lumber into the arena, they’ll prove handy acquisitions.
But given the crowded space and progressive commoditization, this isn’t a
certain future. And certainly for their customers, the risks are growing.
More controversially, another area at
risk is that of traditional ETL or, in its broader sense, data integration.
This is a hard, hard problem. It’s not easy to integrate data retrospectively,
and it’s not at all certain that the integration players from the data warehouse
world will be able to translate that success into the big data world. Many
early adopters of Hadoop were motivated by the fact that existing ETL solutions
couldn’t meet their needs.
In the long term, data integration is
likely to be best served by entirely new architectures that don’t try to separate
and tame the data in the first place. It’s a lot easier to build truly
integrated data infrastructure as greenfield.
For those who crack the integration
problem, the potential rewards are high. But so is the risk.
Conclusion
In the long term, the additive nature
of big data, combined with inertia, makes it a safe bet that current enterprise
IT incumbents will continue their reign, as long as they move to embrace big
data in their architectures.
There is plenty of opportunity,
though, especially in greenfield and cloud scenarios. Expect to see increasing
returns for those who provide integrated solutions and do a good job of
equipping human decision makers.
There’s long term value in metal, and
value in data. About everything else, it’s worth thinking carefully.
Source: Wikipedia/Forbes