How Many Data Scientists Does It Take to Find the Bug?

Guidance Software

Ideally, zero.
When we think about corporate security teams, we often picture a large group of people with state-of-the-art technology, monitoring end users’ every action around the clock. In reality, corporate security teams are often understaffed and can barely keep up with reacting to threats that have already surfaced, let alone examine every endpoint at Big Data scale.
And as much as I live and dream Big Data, I cannot deny that without analytics, Big Data is just noise. Regardless of the sources and richness of the data, Big Data in itself does not provide big insights. That said, you would think almost every organization would embark on the journey to Big Data analytics to improve operations and enterprise security. The reality is, the desire to do Big Data analytics is often extinguished by these challenges:
Step 1: Technology selection
Technology selection has never been easy and it is definitely getting more confusing, if not downright difficult. Not that long ago, organizations were convinced that the enterprise data warehouse (EDW) should be the “one place for all your data.” Yet the enterprise data warehouse, which is used to store critical operational data, hardly seems like a proper place for the more “informal” and unstructured data, such as social media data, marketing automation data, machine log data, or BYOD endpoint data. Along with that comes the wide variety of data platforms such as Hadoop, massively-parallel processing (MPP) databases, NoSQL databases, in-memory databases… And as if that is not enough confusion, once the data platforms have been selected, you will also need to select the method to derive intelligence out of these data. (Think SAS, R, MADlib, etc.)
Yet the biggest problem is that security teams are rarely in a position to provide input and requirements during the technology selection process, even though they are the ultimate users.
Step 2: Data collection and preparation
Once the data platform has been chosen, organizations need to aggregate all the data in one place. Data collection and preparation are often the most time-consuming tasks of any analytics project. Collecting vast amounts of unstructured data is laborious, and cleansing that data for analysis requires a deep understanding of the data sources, data models, and metadata. Additionally, data engineering resources are scarce and often not available to security operations teams.
Step 3: Data science talent recruitment and analytics
When it comes to data science talent, the McKinsey Global Institute (MGI) said it best. Its May 2011 report, Big data: The next frontier for innovation, competition, and productivity, stated that “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data.”
Educational institutions are only now creating curricula to meet this demand. For corporations today, however, hiring data science talent with a good understanding of math/stats, programming, technology platforms, and business requirements is virtually impossible. And only after all this can an organization start using those data scientists to derive insights from Big Data.
Step 4: Operationalization and sharing of insights
When organizations get to this step, they are often quick to congratulate themselves. But remember, non-technical stakeholders usually cannot follow the logic of a Naïve Bayes model that classifies types of threats or a logistic regression that scores the likelihood of risks, hence the high demand for visual representations of the security posture.
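To make the gap concrete, here is a minimal sketch of what "using logistic regression to score the likelihood of risks" can look like for a single endpoint. The feature names, weights, and bias are hypothetical and chosen purely for illustration; a real model would learn them from labeled data, and nothing here describes any particular product.

```python
import math

# Hypothetical weights for illustration only; a real model would
# learn these from labeled endpoint data.
WEIGHTS = {
    "failed_logins": 0.8,
    "off_hours_access": 1.2,
    "unsigned_binaries": 1.5,
}
BIAS = -4.0


def risk_score(features):
    """Return a risk score in (0, 1) by applying the logistic function
    to a weighted sum of the endpoint's features. Missing features
    are treated as zero."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up endpoints: one quiet, one showing several warning signs.
quiet = {"failed_logins": 0, "off_hours_access": 0, "unsigned_binaries": 0}
noisy = {"failed_logins": 3, "off_hours_access": 1, "unsigned_binaries": 2}

print(f"quiet endpoint risk: {risk_score(quiet):.2f}")
print(f"noisy endpoint risk: {risk_score(noisy):.2f}")
```

Even in this toy form, the output is just a number between 0 and 1. Explaining why one endpoint scored higher than another means walking a stakeholder through weights and a sigmoid curve, which is exactly where a visual view of the results earns its keep.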
Given the complexity of the above process and the resources it requires, getting security insights out of endpoint data becomes a daunting task that organizations are hesitant to embark on. There is a desperate need for something automated and simple that exposes suspicious patterns, commonalities, and anomalies through an interactive visual interface, allowing for on-the-fly adjustments to zero in on threats.
So, how many data scientists does it take to find the bug? With help from Guidance Software, ZERO. Want to know how? Join us at CEIC 2013 in Orlando on May 19th-22nd to find out more. 
