The big conversation of big data
I've been meaning to get more familiar with the concept of "big data"... and it seems I'm not alone. Amber Cox from Data Center Knowledge writes: "A recent survey shows that only 27 percent of industry leaders surveyed understood what 'big data' means. Meanwhile, many of those that understand it say they don’t have adequate tools to manage their data and mine it for business value."
From her post on a recent survey by Echelon One:
... according to Bob West, Founder and CEO of Echelon One... “While big data, cloud needs and compliance requirements are clearly major concerns, the majority of companies are not prepared to deal with any of them adequately,” said West. “It’s fascinating to see the rift, and the overwhelming percentage of companies surveyed are not prepared to manage big data properly, monitor cloud environments effectively, or report network and device activities properly.”
Big data is often characterized by data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data sizes can range from a few dozen terabytes to many petabytes in a single data set. Increasingly, businesses face the challenge of huge data sets and need the technology to mine the secrets they contain and convert them into useful business intelligence.
As it happens, the folks over at O'Reilly Radar have posted a number of takes on big data over the past few days. Brian Ahier was part of a discussion examining issues in health information technology and "made the proposal that big data is the next big thing in health IT": When I talk about "big data" I am referring to a dataset that is too large for a typical database software tool to store, manage, and analyze. Obviously, as technology changes and improves, the size of a dataset that would qualify as "big data" will change as well...
The proliferation of digital health information, including both clinical and claims information, is creating some very large datasets. This also creates some significant opportunities. For instance, analyzing and synthesizing clinical records and claims data can help identify patients appropriate for inclusion in a particular clinical trial. These new datasets can also help to provide insight into improved clinical decision making. One great example of this is when an analysis of a database of 1.4 million Kaiser Permanente members helped determine that Vioxx, a popular pain reliever widely used by arthritis patients, was dangerous. Vioxx was a big moneymaker for Merck, generating about $2.5 billion in yearly sales, and there was quite a battle to get the drug off the market. This was only possible because the huge dataset built up from years of electronic health records was available, along with tools to properly analyze the data.
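As a toy illustration of the trial-screening idea described above (not anything from Ahier's actual analysis), here is a minimal Python sketch. The record fields, eligibility criteria, and sample data are all invented for the example; a real screen would run against a clinical database, not an in-memory list:

```python
# Hypothetical sketch: screening merged clinical/claims records for
# trial eligibility. Field names and criteria are invented for illustration.

def eligible_for_trial(record, min_age=50, required_diagnosis="arthritis"):
    """Return True if a patient record meets the (made-up) trial criteria."""
    return (
        record["age"] >= min_age
        and required_diagnosis in record["diagnoses"]
        and not record["in_active_trial"]
    )

# Tiny in-memory stand-in for what would really be a very large dataset.
records = [
    {"id": 1, "age": 62, "diagnoses": ["arthritis"], "in_active_trial": False},
    {"id": 2, "age": 45, "diagnoses": ["arthritis"], "in_active_trial": False},
    {"id": 3, "age": 71, "diagnoses": ["hypertension"], "in_active_trial": False},
]

candidates = [r["id"] for r in records if eligible_for_trial(r)]
print(candidates)  # [1]
```

The interesting part at big-data scale isn't the filter itself but running it over millions of records quickly enough to be useful, which is exactly the tooling gap the survey above describes.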
I've included the video from the Dell Think Tank panel below. Below that, you'll find a link to a press release from Microsoft, issued today, which rightly acknowledges "big data’s strain on privacy protection, the shifting relationship between government and the Internet, and the evolving threat model all raise new challenges for industry and governments globally." Looks like the beginning of another big discussion.