Most people know what marketing executives do every day. They try to catch people’s attention through email, ads, tweets, and press releases. As for data scientists, well, their work is not nearly as well understood.

That’s been slowly changing this year as companies loosen up about letting their hard-won data scientists talk about their work.

This year, VentureBeat has learned a lot about these fawned-over specimens. But that knowledge hasn’t arrived all at once. That’s why we’ve brought together some of the lessons we’ve picked up in 2013:

Data scientists should be creative

This point became clear as Jeremy Howard, the former president of data science competition-holder Kaggle, spoke with fellow luminaries in the field at VentureBeat’s 2013 DataBeat/Data Science Summit event a few weeks ago.

While popular algorithms such as Random Forests often have a hand in helping people win Kaggle competitions, “the best applied real-life data scientists … are extremely creative,” Howard said. They pore over the raw data for the competitions they enter and discern patterns. Howard prefers to just jump into the data without any context about it.

Choose a business problem and then the tools, not the other way around

During his talk at the DataBeat/Data Science Summit, MailChimp data scientist John Foreman poked fun at companies that hear about big data in headlines and then throw money at elaborate, expensive technology before they’ve decided what to do with it.

“This is kind of like when you decide, ‘I’m going to get fit,’” Foreman said, referring to the way many people spend on workout clothes and a gym membership. “You just buy all the tools first because you get this illusion of making progress. ‘I’ve spent some of my budget. Something must be happening.’”

But companies should first see what data they can work with, and then they should identify business problems that can be solved.

“Only then do you actually start work,” Foreman said. “You choose data techniques and the technologies that solve the problem you’ve identified.”

The process might seem boring, but it could save time and money by circumventing needless “research.”

Low-cost educational resources abound

Universities from Columbia to the University of California, Berkeley, have introduced data science programs, and they have their merits. At the same time, massive open online course (MOOC) sites such as Coursera provide introductions to data science, and at least one more, from Udacity, is on the way (although it won’t be free).

Those wishing to learn more can find additional high-quality information in other corners of the Internet. The recently formed website DataTau aims to bring together many of these resources and provide a place for data scientists to discuss their work. Rohit Sivaprasad, who started DataTau earlier this month, has picked up data science skills online alongside his academic studies, and he’s now ranked 532nd on Kaggle out of more than 135,000 users.

Image via gcbb/Flickr