BigData, Hadoop, and the Impending Informationpocalypse

“Information is not knowledge” – Albert Einstein

I recently read a couple of posts about BigData from my friend Chris Hoff – “Infosec Fail: The Problem With BigData is Little Data” and “More on Security and BigData…Where Data Analytics and Security Collide”.

In these posts Hoff posits that the mass centralization of information will benefit the industry and that monitoring tools will experience a boon, especially those that leverage a cloud-computing architecture…

This will bring about a resurgence of DLP and monitoring tools using a variety of deployment methodologies via virtualization and cloud that was at first seen as a hindrance but will now be an incredible boon.

As Big Data and the databases/datastores it lives in interact with the proliferation of PaaS and SaaS offers, we have an opportunity to explore better ways of dealing with these problems – this is the benefit of mass centralization of information.

Hoff then goes on to describe how new data warehousing and analytics technologies, such as Hadoop, would positively impact the industry…

Even when we do start to be able to integrate and correlate event, configuration, vulnerability or logging data, it’s very IT-centric.  It’s very INFRASTRUCTURE-centric.  It doesn’t really include much value about the actual information in use/transit or the implication of how it’s being consumed or related to.

This is where using Big Data and collective pools of sourced “puddles” as part of a larger data “lake” and then mining it using toolsets such as Hadoop come into play…
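Hoff’s “puddles into a lake” point is, at bottom, a batch-analytics one. As a purely hypothetical sketch of what mining such a lake can look like, here is a Hadoop Streaming job in Python that counts security events per source IP across pooled logs – the log format, file name, and job invocation below are illustrative assumptions, not anything from Hoff’s posts.

```python
#!/usr/bin/env python3
# events.py -- mapper and reducer for Hadoop Streaming, combined for brevity.
# Assumes newline-delimited log records whose first whitespace-separated
# field is a source IP (a hypothetical format).
import sys

def mapper():
    # Emit "src_ip<TAB>1" for every log line.
    for line in sys.stdin:
        fields = line.split()
        if fields:
            print(f"{fields[0]}\t1")

def reducer():
    # Streaming sorts mapper output by key, so identical IPs arrive together.
    current_ip, count = None, 0
    for line in sys.stdin:
        ip, _, n = line.partition("\t")
        if ip != current_ip:
            if current_ip is not None:
                print(f"{current_ip}\t{count}")
            current_ip, count = ip, 0
        count += int(n)
    if current_ip is not None:
        print(f"{current_ip}\t{count}")

if __name__ == "__main__":
    {"map": mapper, "reduce": reducer}[sys.argv[1]]()
```

The job would be submitted along the lines of hadoop jar hadoop-streaming.jar -input logs/ -output counts/ -mapper “events.py map” -reducer “events.py reduce” -file events.py. The point isn’t the word-count-style plumbing – it’s that once the puddles are pooled, a question like “which sources touch us most?” becomes a single batch job instead of a per-silo archaeology dig.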


Amazon AWS, Google App Engine, Microsoft Azure, and More – Part 1: Can We Secure The Cloud?

Cloud computing – or, as I like to call it, the return of the mainframe and thin-client computing architecture, only cloudier – has been creating a lot of interesting discussion throughout IT recently.

Let us define cloud computing as any service or set of services delivered through the Internet (the Cloud) without requiring additional infrastructure on the part of the organization. Although broad, this definition encompasses everything from storage and capacity services, to applications like CRM or email, to development platforms, and everything in between that is delivered and accessed through the Internet (Cloud).

Obviously the concept of ubiquitous broadband connectivity, combined with a highly mobile workforce enabled to be productive independent of location – and with the promise of limited, if any, additional infrastructure costs – offers new levels of efficiency for many organizations looking to extend their shrinking IT budgets.

There is little doubt that cloud computing offers benefits in how organizations drive greater value from their IT dollars, but there are also many trade-offs that can dramatically reduce, or negate altogether, those benefits. Understanding these trade-offs will allow an organization to make the right decisions.

As with most advancements in computing, security is generally an afterthought, bolted on once the pain is great enough to elicit the medication. Security is sort of like the back pain of IT: enhancements tend to come only once agility (availability, reliability, etc.) is somehow inhibited, or because they are prescribed as the result of a doctor’s visit (a compliance audit). Cloud computing is no different.

But before we can understand the strengths or inadequacies of cloud computing security models, we need an understanding of the baseline security principles that all organizations face. This will allow us to draw parallels and define what is and isn’t an acceptable level of risk.

Again, for the sake of brevity I will keep this high-level, but it really comes down to two main concepts: visibility and control. All security mechanisms are an exercise in trying to gain better visibility or to implement better controls, all balanced against the demands of the business. For the most part, the majority of organizations struggle with even the most basic of security demands. Take, for example, visibility into the computing infrastructure itself:

  • How many assets do you own? How many are actively connected to the network right now? How many do you actively manage? Are they configured according to corporate policy? Are they up to date with the appropriate security controls? Are they running licensed applications? Are they functioning to acceptable levels? How do you know? (One crude way to answer the “connected right now” question is sketched below.)
  • How about the networking infrastructure? Databases? Application servers? Web servers? Are they all configured properly? Who has access to them? Have they been compromised? Are they secure against the universe of known external threats? How do you know?
  • Do internal applications follow standard secure development processes? Do they provide sufficient auditing capabilities? Do they export this data in a format that can be easily consumed by the security team? Can access/authentication anomalies be easily identified? How do you know?
  • What happens when an FTE is no longer allowed access to certain services/applications? Are they able to access them even after they have been terminated? Do they try? Are they successful? How do you know?

These are all pretty basic security questions and it is only a small subset of issues IT is concerned with, but most organizations cannot answer any one of them, let alone all of them, without significant improvement to their current processes. It is fair to say that the majority of organizations lack adequate visibility into their computing infrastructures.
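To make the visibility gap concrete, here is a toy sketch that answers just one of the questions above – how many addresses respond on the network right now – with a parallel ping sweep. It is an assumption-laden illustration (Linux ping flags, a placeholder subnet), not a substitute for real asset discovery, which needs ARP tables, switch data, and authenticated inventory.

```python
#!/usr/bin/env python3
"""Crude "who is on the network right now" probe: a parallel ping sweep."""
import ipaddress
import subprocess
from concurrent.futures import ThreadPoolExecutor

SUBNET = "192.0.2.0/24"  # placeholder (TEST-NET); substitute your own range

def is_alive(ip: str) -> bool:
    # One ICMP echo with a 1-second timeout (Linux ping syntax).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

hosts = [str(h) for h in ipaddress.ip_network(SUBNET).hosts()]
with ThreadPoolExecutor(max_workers=64) as pool:
    alive = [ip for ip, up in zip(hosts, pool.map(is_alive, hosts)) if up]

print(f"{len(alive)} of {len(hosts)} addresses responded:")
for ip in alive:
    print(" ", ip)
```

Even this toy exposes the gap: the set of addresses that answer a ping rarely matches the asset register, and reconciling the two is exactly where the “how do you know?” questions start to bite.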

Of course, a lack of visibility doesn’t imply a lack of control:

  • Are assets that are not actively managed blocked from accessing corporate services? Are they blocked from accessing internal applications? Based on what criteria – lack of policy adherence? How granular is the control? And if you lack visibility, how can you be sure the control is working?
  • What controls have you implemented to prevent external access to internal resources? Does this apply to mobile/remote employees? How long after an employee is released does it take to remove access to all corporate resources? What authentication mechanisms are in place to validate the identity of an employee accessing corporate resources? Without visibility, how do you know? (A toy cross-check of exactly this is sketched below.)
  • What controls are in place to ensure the concept of least privilege? What controls are in place to ensure internal applications (web, non-web, or modifications to COTS) adhere to corporate secure coding standards? If you lack visibility, how do you know?
  • What controls are in place to ensure that a malicious actor cannot access internal corporate resources if they have stolen the credentials of a legitimate employee? How do you know the controls are adequate?

Again, this is just a small subset of the controls IT must be concerned with. As with visibility, most organizations are barely able to implement proper controls for some of these, let alone the universe of security controls required in most organizations. Let me state, in case it isn’t obvious: the goal of security isn’t to prevent all bad things from occurring – that is an unachievable goal. The goal of security is to implement the visibility and controls that limit the probability of a successful incident, and, when an incident does occur, to quickly limit its impact.
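As a concrete illustration of one such control check, here is a minimal sketch of the terminated-employee cross-check from the list above: did any terminated account appear in an application’s auth log after its termination date? The file names and formats are hypothetical stand-ins for your HR export and audit log.

```python
#!/usr/bin/env python3
"""Toy control check: flag auth-log activity by terminated accounts."""
import csv
from datetime import datetime

# terminated.csv (hypothetical HR export): username,termination_date
terminated = {}
with open("terminated.csv", newline="") as f:
    for row in csv.DictReader(f):
        terminated[row["username"]] = datetime.fromisoformat(row["termination_date"])

# auth.log lines assumed to look like:
#   2008-11-02T14:31:09 jsmith SUCCESS 10.1.2.3
with open("auth.log") as f:
    for line in f:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        ts, user, outcome, src_ip = parts
        if user in terminated and datetime.fromisoformat(ts) > terminated[user]:
            print(f"ALERT: {user} ({outcome}) from {src_ip} at {ts} "
                  f"-- terminated {terminated[user].date()}")
```

Trivial as it is, many organizations couldn’t run even this, because the HR export and the application logs either don’t exist or don’t line up – which is precisely the visibility problem.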

So what happens when we move services to the cloud? When we allow services to be delivered by a third party we lose all control over how they secure and maintain the health of their environment, and in many cases we lose all visibility into the controls themselves. That said, cloud computing platforms have the potential to offer adequate security controls – but it will require a level of transparency the providers will most likely not be comfortable providing.

Our current computing paradigm is inherently insecure because, for the most part, it is built on top of fundamentally insecure platforms. Cloud computing has some potential to offset these deficiencies, but to date there has been little assurance that it will. Some areas that require transparency, and that will become the fulcrum points of a sound cloud computing security model:

  • Infrastructural security controls
  • Transport mechanism and associated controls
  • Authentication and authorization access controls
  • Secure development standards and associated controls
  • Monitoring and auditing capabilities
  • SLAs and methods for deploying security updates throughout the infrastructure
  • Transparency across these controls and visibility into how they function on a regular basis

Most organizations struggle with their own internal security models; they are barely able to focus their efforts on a segment of the problem, and in many cases they are ill-equipped to implement the mechanisms needed to meet even a base level of security controls. For these organizations, looking to a 3rd party to provide security controls may prove beneficial. Organizations that are highly efficient in implementing their security programs, are risk averse, or are under significant regulatory pressure, however, will find that cloud computing models eliminate too much visibility to be a viable alternative to deploying their own infrastructure.

I will leave you with one quick story. When I was an analyst with Gartner I presented at a SOA/Web Services/Enterprise Architecture Summit a presentation titled “Security 101 for Web 2.0”. The room was overwhelmingly developers who were trying to understand how to better enable security as part of the internal applications they were tasked to develop. The one suggestion that elicited the greatest interest and the most questions was a simple one: develop your applications so that they can be easily audited by the security and IT teams once they are in production – enable auditing that can capture access attempts (successful or not), date/time, source IP address, etc. The folks I talked to afterwards told me it was probably the single most important concept for them during the summit – enable visibility.
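For what it’s worth, that advice is easy to sketch. Below is a minimal, hypothetical example – not anything presented at the summit – of WSGI middleware that records every access attempt, successful or not, with timestamp, source IP, method, path, and result status, in a flat log a security team can consume.

```python
#!/usr/bin/env python3
"""Sketch: audit-logging middleware for any WSGI application."""
import logging
from wsgiref.simple_server import make_server

audit = logging.getLogger("audit")
logging.basicConfig(filename="access-audit.log",
                    format="%(asctime)s %(message)s", level=logging.INFO)

def app(environ, start_response):
    # Stand-in application; your real app goes here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

def audit_middleware(inner):
    def wrapped(environ, start_response):
        outcome = {"status": "UNKNOWN"}
        def recording_start_response(status, headers, exc_info=None):
            outcome["status"] = status  # capture 200/401/403/...
            return start_response(status, headers, exc_info)
        try:
            return inner(environ, recording_start_response)
        finally:
            # Log every attempt, successful or not.
            audit.info("src=%s method=%s path=%s status=%s",
                       environ.get("REMOTE_ADDR", "-"),
                       environ.get("REQUEST_METHOD", "-"),
                       environ.get("PATH_INFO", "-"),
                       outcome["status"])
    return wrapped

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, audit_middleware(app)).serve_forever()
```

The design point is that auditing lives in one wrapper rather than being sprinkled through application code, so the security team gets a consistent, parseable record regardless of what the application itself does.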

In Part 2 we will take an in-depth look at the security models of various cloud computing platforms – stay tuned for more to come….

Some interesting “Cloud” Resources that you can find in the cloud:

  • Amazon Web Services Blog (here)
  • Google App Engine Blog (here)
  • Microsoft Azure Blog (here)
  • Developer.force.com Blog (here)
  • Gartner’s Application Architecture, Development and Integration Blog (here)
  • The Daily Cloud Feed (here)
  • Craig Balding – Cloudsecurity.org (here)
  • James Urquhart – The wisdom of Clouds (here)
  • Chris Hoff – Rational Survivability (here)

Cloud Computing – The Good, The Bad, and the Cloudy

And on the second day God said “let there be computing – in the cloud” and he gave unto man cloud computing…on the seventh day man said “hey, uhmm, dude where’s my data?”

There has been much talk lately about the “Cloud”: the promise of information stored in massive virtual data centers that exist in the ethereal world of the Internet, then delivered as data or services to any computing device with connectivity to the “Cloud”. Hoff recently ranted poetic on the “Cloud” (here) and asked the question “How does one patch the Cloud?” (here).

So what the hell is the cloud anyway, and how is it different from the ASPs (application service providers) and MSPs (managed service providers) of yesteryear, the SaaS/PaaS/CaaS (crap as a Service) “vendors” of today, and the telepathic, quantum, metaphysical neural nets of tomorrow?

I am not going to spend any time distinguishing between services offered by, or including the participation of, a 3rd party – whether they take the name ASP, SOA, Web services, Web 2.0, SaaS/PaaS, or cloud computing. Whatever label the ‘topic du jour’ is given, and regardless of the stark differences or subtle nuances between them, the result is the same: an organization cedes almost complete visibility and control over some aspect of its information and/or IT infrastructure.

There should be no doubt that the confluence of greater computing standardization, an increasing need for service orientation, advances in virtualization technology, and nearly ubiquitous broad-band connectivity enable radical forms of content and service delivery. The benefits could be revolutionary, the fail could be Biblical.

Most organizations today can barely answer simple questions, such as: how many assets do we own? How many do we actively manage, and of those how many adhere to corporate policy? So of course it makes sense to look to a 3rd party to assist in creating a foundation for operational maturity, and it is assumed that once we turn over accountability to a 3rd party we significantly reduce cost, improve service levels, and experience wildly efficient processes. This is rarely the case; in fact most organizations will find that the lack of transparency creates more questions than it answers and instills a level of mistrust and resentment within the IT team, who now have to ask whether the provider has performed something as simple as applying a security patch. The “Cloud” isn’t magic; it isn’t built on advanced alien technology or forged in the fires of Mount Doom in Mordor. No, it is built on the same crappy stuff that delivers lolcats (here) and The Official Webpage of the Democratic People’s Republic of Korea (here) – that’s right, the same DNS, BGP, click-jacking, and Microsoft security badness that plagues most everybody. So how does an IT organization reliably and repeatably gain visibility into a 3rd party’s operational processes and current security state? More importantly, when we allow services to be delivered by a third party we lose all control over how they secure and maintain the health of their environment – and you simply can’t enforce what you can’t control.

In the best case an organization will be able to focus already taxed IT resources on solving tomorrow’s problems while the problems of today are outsourced; in the worst case, using SaaS or cloud computing might end up as the digital equivalent of driving drunk through Harlem while wearing a blindfold and waving a confederate flag with $100 bills stapled to it, hoping that “nothing bad happens”. Yes, cloud computing could result in revolutionary benefits, or it could result in failures of Biblical proportions, but most likely it will result in incremental improvements to IT service delivery marked by cyclical periods of confusion, pain, disillusionment, and success – just like almost everything else in IT. This is all assuming, of course, that there is such a thing as the “Cloud”.

Update: To answer Hoff’s original question “How do we patch the cloud?” – no differently than we patch anything else. Unfortunately the problem lies in the “if and when” of patching the cloud, which can result in mismatched priorities between the cloud owners and the cloud users.

Myths, Misconceptions, Half-Truths and Lies about Virtualization

Thanks to VMware you can barely turn around today without someone using the V-word, and with every aspect of the English language – and some from ancient Sumeria – now beginning with V, it will only get worse. There is no question that virtualization holds a lot of promise for the enterprise, from decreased cost to increased efficiency, but between the ideal and the reality is a chasm of broken promises, mismatched expectations, and shady vendors waiting to gobble up your dollars and leave a trail of misery and despair in their wake. To help you avoid the landmines, I give you the top myths, misconceptions, half-truths, and outright lies about virtualization.

Virtualization reduces complexity (I know what server I am. I’m the server, playing a server, disguised as another server)

It seems counter-intuitive that virtualization would introduce management complexity, but the reality is that all the security and systems management requirements facing enterprises today do not disappear simply because an OS is a guest within a virtual environment – in fact, they increase. Not only does one need to continue to maintain the integrity of the guest OS (configuration, patch, security, application and user management and provisioning), one also needs to maintain the integrity of the virtual layer as well. The problem is that this is done through disparate tools managed by FTEs (full-time employees) with disparate skill sets. Organizations also move from a fairly static environment in the physical world, where it takes time to provision a system and deploy the OS and associated applications, to a very dynamic environment in the virtual world, where managing guest systems – VMsprawl – becomes an exercise in whack-a-mole. Below are some management capabilities that VMware shared/demoed at VMworld.

  • VDDK (Virtual Disk Development Kit) – Allows one to apply updates by mounting an offline virtual machine as a file system and then performing file operations against it. This ignores the fact that file operations are a poor replacement for systems management tasks such as applying patches: the method won’t work with Windows patch executables, nor with RPM patches, which must execute to apply.
  • Offline VDI – The virtual machine can be checked out to a mobile computer in anticipation of a user going on the road and being disconnected from the data center. Unfortunately the data transfers, including the diffs, are very large, and one needs to be aware of the impact on the network.
  • Guest API – Allows one to inspect the properties of the host environment, but this is limited to the hardware assigned to the virtual machine.
  • vCenter – A management framework for viewing and managing a large set of virtual machines across a large set of hardware – a separate management framework from what IT will use to manage physical environments.
  • Linked Clones – Among other things, this allows multiple virtual machine images to serve as the source for a VM instance; however, without a link to the parent, clones won’t work.
  • Virtual Machine Proliferation – Since it is so easy to snapshot a machine and to provision a new one simply by copying another and tweaking a few key parameters (like the computer name), tons of machines get made. Keeping track of the resulting virtual machines – VMsprawl – is a huge problem (a toy census sketch follows this list). Additionally, disk utilization is often underestimated, as the number of these machines and their snapshots grows very quickly.
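As a hypothetical illustration of keeping tabs on VMsprawl, here is a toy census using pyVmomi, the Python bindings for the vSphere API that vCenter exposes. The hostname and credentials are placeholders, and the script only reads inventory.

```python
#!/usr/bin/env python3
"""Toy VMsprawl census via the vSphere API (pyVmomi)."""
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; verify certs in production
si = SmartConnect(host="vcenter.example.com", user="readonly",
                  pwd="secret", sslContext=ctx)
try:
    content = si.RetrieveContent()
    # Walk the whole inventory tree for VirtualMachine objects.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vms = list(view.view)
    powered_on = [vm for vm in vms if vm.runtime.powerState == "poweredOn"]
    print(f"{len(vms)} VMs in inventory, {len(powered_on)} powered on")
    for vm in vms:
        print(f"  {vm.name:40} {vm.runtime.powerState}")
    view.Destroy()
finally:
    Disconnect(si)
```

Run on a schedule and diffed day over day, even a census this crude makes the sprawl visible: which machines appeared overnight, and which clones never got powered off.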

Want to guess how many start-ups will be knocking on your door to solve one or more of the above management issues?

Virtualization increases security (I’m trying to put tiger balm on these hackers nuts)

Customers drawn to virtualization should be aware that it adds another layer that needs to be managed and secured. Data starts moving around in ways it never did before, as virtual machines are simply files that can be moved wherever. Static security measures like physical security and network firewalls don’t apply in the same way and need to be augmented with additional security measures, which will increase both cost and complexity. Network operations, security operations, and IT operations will inherit management of both the physical and the virtual systems, so their jobs get more complicated in some ways and simpler in others.

Again, it may seem counter-intuitive to say that virtualization doesn’t increase security, but the reality is that virtualization adds a layer of complexity to organizational security, marked by new attack vectors in the virtual layer and by the lack of security built into virtual environments – all made more difficult by the expertise required to secure virtual environments, skills that are sadly lacking in the industry.

The Hoff has written extensively about virtualization security and securing virtual environments (here) – they are different, yet equally complex and hairy – and nowhere will you find a better overall resource to help untangle the Tet offensive of virtualization security or securing virtual environments than from the Hoff.

Virtualization will not require specialization (A nutless monkey could do your job)

What is really interesting about the current state of virtualization technology in the enterprise is the amount of specialization required to effectively manage and secure these environments. Not only will one need to understand, at least conceptually, the dynamics of systems and security management, one will also need to understand the technical implementation of the various controls, the use and administration of the management tools, and, of course, follow what is a very dynamic evolution of technology in a rapidly changing market.

Virtualization will save you money today (That’s how you can roll. No more frequent flyer bitch miles for my boy! Oh yeah! Playa….playa!)

Given the current economic climate, the CFO is looking for hard-dollar savings today. Virtualization has shown itself to provide more efficient use of resources and faster time to value than traditional environments; however, the reality is that reaching the promised land requires an initial investment of time, resources, and planning if one is to realize the benefits. Here are some areas where virtualization may provide cost savings, and some realities about each of them:

  • Infrastructure consolidation – Adding big iron and removing a bunch of smaller machines may look like an exercise in cost-cutting, but remember: you still have to buy the big iron, hire consultants to help with the implementation, acquire new licenses, and deploy everything – and of course no one is going to give you money for the machines you no longer use.
  • FTE reduction – Consolidating infrastructure should allow one to realize a reduction in FTEs, right? The problem is that now you need FTEs with different skill sets – people who can actually deploy, manage, and secure these virtual environments, which now require separate management infrastructures.
  • Decrease in licensing costs – Yes, well, no; it depends on whether you want to pirate software, which is actually easier in virtual environments. With virtual sprawl, software asset and license management just jumped the complexity shark.
  • Lower resource consumption – See the above references to complexity, security, and FTEs; however, one area where virtualization will have an immediate impact is power consumption and support of green IT initiatives – though being green can come at a cost.

Virtualization won’t make you rich, teach you how to invest in real estate, help you lose weight, or grow a full head of hair; it won’t make you attractive to the opposite sex, nor will it solve all your problems. It can improve the efficiency of your operating environment, but it requires proper planning, expectation setting, and careful deployment. There will be an initial, in some cases substantial, investment of capital, time, and resources, as well as an ongoing effort to manage the environment with new tools and to train employees in new skills. Many will turn to consulting companies, systems integrators, and service providers promising to implement solutions that generate a quick payback with virtually no risk and position your organization to take advantage of available and emerging real-time infrastructure enablers designed to closely align your business needs with IT resources.

As Les Grossman said in Tropic Thunder “The universe….is talking to us right now. You just gotta listen.”