Big Data is all the rage, but it will apparently come to a crashing halt due to a shortage of data scientists. As I've argued elsewhere, this is mostly a sham. Context is critical for making use of a company's data, and the people with context already work for the enterprise. So it becomes a matter of training the people one has, rather than going off on a scouting trip for the mythical data scientist.
Nor will the "science" of Big Data remain such for long, according to IBM's James Kobielus. As he notes, "core data scientist aptitudes -- curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature -- are widely distributed throughout workforces everywhere." He then points to a few key trends that will make data science less of a science:
- As more data discovery, acquisition, preparation, and modeling functions are automated through better tools, today's data scientists will have more time for the core of their jobs: statistical analysis, modeling, and interaction exploration.
- Data scientists are developing fewer models from scratch. That's because more and more big data projects run on application-embedded analytic models integrated into commercial solutions....
- Open source communities and tools will greatly expand the pool of knowledgeable, empowered data scientists at your disposal, either as employees or partners.
This jibes with Cloudera CEO Mike Olson's contention that "There will be enormous Hadoop adoption, but you'll get it by virtue of the applications you run."
But whether an organization interprets its data through applications or directly using open-source technologies, one thing remains true in all this: people are critical to making sense of Big Data. The data won't speak for itself. It's therefore essential to find people inside one's organization who can help make sense of the organization's data. The good news? They're already available and on the payroll.