Big data is a scorching hot topic, currently capturing the lion's share of the market's available stock of hyperbole, and for good reason: data is growing at a meteoric rate.
As we continue to innovate, as business accelerates technology adoption, as the line blurs between corporate and personal computing, and as we interact more through digital mediums, we are creating mountains of data. Much of this data is garbage, but some of it is gold ("Big Data: Are You Creating a Garbage Dump or Mountains of Gold?").
Unfortunately, as with all overly hyped technologies, there is a lot of misinformation, failed expectations, and the inevitable trough of disillusionment. But that doesn't mean you have to spend months or years curled up in a fetal position, disillusioned and wondering what went so wrong. With a thoughtful approach you can venture through the murky swamp of your big data and find the insights that give your company a significant competitive and market advantage.
“Information is not knowledge” – Albert Einstein
I recently read a couple of posts about BigData from my friend Chris Hoff – “Infosec Fail: The Problem With BigData is Little Data” and “More on Security and BigData…Where Data Analytics and Security Collide”
In these posts Hoff posits that the mass centralization of information will benefit the industry and that monitoring tools will experience a boon, especially those that leverage a cloud-computing architecture…
This will bring about a resurgence of DLP and monitoring tools using a variety of deployment methodologies via virtualization and cloud that was at first seen as a hindrance but will now be an incredible boon.
As Big Data and the databases/datastores it lives in interact with the proliferation of PaaS and SaaS offerings, we have an opportunity to explore better ways of dealing with these problems — this is the benefit of mass centralization of information.
Hoff then goes on to describe how new data warehousing and analytics technologies, such as Hadoop, would positively impact the industry…
Even when we do start to be able to integrate and correlate event, configuration, vulnerability or logging data, it’s very IT-centric. It’s very INFRASTRUCTURE-centric. It doesn’t really include much value about the actual information in use/transit or the implication of how it’s being consumed or related to.
This is where using Big Data and collective pools of sourced “puddles” as part of a larger data “lake” and then mining it using toolsets such as Hadoop come into play…
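To make the "puddles into a lake" idea concrete, here is a minimal sketch of the kind of mining Hoff describes, written in the mapper/reducer style that Hadoop Streaming uses. The log format and field names are hypothetical, invented purely for illustration; a real deployment would run this logic distributed across the cluster rather than in-process.

```python
# Minimal Hadoop Streaming-style sketch: pool log "puddles" from several
# sources into one "lake", then count security event types.
# NOTE: the tab-separated 'host<TAB>event_type<TAB>detail' log format
# below is a hypothetical example, not a real product's schema.

from collections import defaultdict

def mapper(line):
    """Emit (event_type, 1) pairs for each log line."""
    fields = line.split("\t")
    if len(fields) >= 2:
        yield fields[1], 1

def reducer(pairs):
    """Sum counts per event type, as a reducer would after the shuffle phase."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Usage: logs pooled from multiple sources ("puddles") into one lake.
logs = [
    "web01\tlogin_failure\tuser=alice",
    "db02\tlogin_failure\tuser=bob",
    "web01\tport_scan\tsrc=10.0.0.5",
]
pairs = [pair for line in logs for pair in mapper(line)]
print(reducer(pairs))  # {'login_failure': 2, 'port_scan': 1}
```

The point is not the toy counting itself but the shape of the computation: once event data from many sources sits in one place, simple map/reduce passes like this can correlate across the whole pool instead of each silo separately.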