What is a Data Scientist? An In-Depth Look

Right now, there’s a lot of talk about data scientists. What do they do? How do I become one? Do I need one for my company?

All are valid questions since the position has recently gained momentum alongside the big data boom. According to RJ Metrics’ report The State of Data Science (which looked only at profiles on LinkedIn), at least 52% of data scientists were hired within the last four years. In 2014 alone, there were between 140,000 and 190,000 data science positions available.

Why the sudden demand?

It’s because of the data-driven approach businesses have adopted to remain competitive. The big data boom has left companies drowning in more data than they know what to do with, and they need someone to make sense of it all. That person is the modern data scientist.

How do you know when your company needs one? An in-depth look at the position may help you decide.

Defining the title

Google “What is a data scientist,” and an array of answers will turn up.

For example:

Anjul Bhrambhri, IBM’s VP of big data projects, said, “A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to the organization.”

Josh Wills, the data scientist at office communication software vendor Slack, tweeted, “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”

Fortune.com called it a “loosey-goosey term” where “practitioners are expected to know statistical analysis, predictive modeling and programming.”

The head of Amadeus Travel Intelligence in Madrid, Pascal Clement, kept his statement simple yet profound: “Data scientists are the new superheroes.”

The problem with the title “data scientist” is that the role is growing so quickly that no one has had the time to define the nitty-gritty details of the position.

How it started

According to Forbes, the term “data scientist” was coined by Jeffrey Hammerbacher (founder and chief scientist at Cloudera) and DJ Patil (the first chief data scientist at the White House). Both are better known for their roles in creating the original data science teams at Facebook (led by Hammerbacher) and LinkedIn (led by Patil).

The RJ Metrics report states that Hammerbacher is more or less responsible for the “rise of the modern data scientist.”

In Information Platforms and the Rise of the Data Scientist, Hammerbacher explains his methodology in creating the team and how the overlap of responsibilities brought about the role of data scientists:

“People on the Facebook team were originally given one of two titles: data analyst or research scientist. This was primarily based on academic background: if you had a Ph.D., then you were a research scientist … The work at Facebook called for a mash-up of skills combining computer science, business, social science, statistics and more. It was out of the need to accomplish this growing ‘multitude of tasks’ that the role of the data scientist at Facebook was born.”

From that point on, the data scientist position became valued in all types of businesses that crunch large sets of data, though the actual role of a data scientist is still a matter of debate.

The bottom line

Despite Clement’s comment, capes and masks haven’t become the standard for data scientists.

However, there are specific skills employers are looking for. When RJ Metrics looked at LinkedIn, five technical skills were listed most often:

  • Data analysis
  • R
  • Python
  • Data mining
  • Machine learning

But technical skills are only a portion of what a data scientist needs. Data scientists have to be effective managers and communicators, as well as data crunchers. Burtch Works, a firm specializing in the placement of quantitative professionals, identified the following “soft skills” data scientists should have:

  • Intellectual curiosity – Data scientists are naturally interested in discovery. Instead of being daunted when you hand them a mountain of unstructured data, they’re instinctively driven to find a truth within it. An important skill one needs to develop in this position is knowing what questions to ask and how to go about answering them.
  • Business insight – The top data scientists will be driven by their own natural curiosity. But they will need the business knowledge to prioritize tasks and apply their findings to the company through actionable decisions.
  • Communication – Good data scientists will be able to clearly discuss their methods and insights. Being able to do so visually with a creative flair has become the “bonus” skill most employers look for.

Just as Hammerbacher’s team at Facebook learned, data scientists have to wear an array of hats. This isn’t to say you can start getting rid of your analysts or engineers. A data scientist isn’t a “jack-of-all-trades.” But the person must have a good grasp of a multitude of skills to work with various teams and pinpoint actionable insights.

What’s the difference?

Without a concrete definition, the lines between a data scientist and data analyst are becoming blurred.

The co-founders of Leada (a programming and data science learning company) and authors of the Statsguys blog, Brian Liou and Tristan Tao, talked with some of the leading data scientists and data analysts at companies like LinkedIn, Facebook and Yelp to clear up some of the confusion.

Abraham Cabangnang, a data scientist at LinkedIn, took a stab at defining the two roles by referencing his own experience:

“It’s definitely a gray area. At [my] previous company I did both analyst and scientist jobs and as an analyst we were more customer facing; the tasks we did were directly related to the tangible business – what the customers wanted/requested. It was very directed. The scientist role is a little more free form. The first thing I did as a data scientist is work on building out internal dashboards, basically surfacing information that we were tracking on the back end, but weren’t being used by the data analysts for any reasons; for example, we might have lacked the infrastructure to display it, or the data was just not very well processed. It really wasn’t anything tailored out from a customer need, but came from what I noticed the analyst team needed in order to do their job.”

Josh Wills talked more about the progression from data analyst to data scientist.

He said that when he worked as an analyst, the amount and structure of data was not “too extreme or unusual.” Therefore, the algorithms he was running were linear. He went on to say, “I think the transition into data scientist happens when the data cleansing process or volume of the data becomes so extreme that you need to worry about the computational complexity of your algorithms in order to get an answer in a reasonable time frame.”

Peter Harrington, HG Data’s chief data scientist, tried to make the line between the two roles a little clearer:

“A data analyst doesn’t know how to code, but instead is expected to be proficient in industry tools such as Excel or if you, say, work in Finance, a Bloomberg terminal. A data scientist definitely has a much higher understanding of computer science and is expected to develop tools on their own or put to use some non-standard tools for the product’s needs or company’s needs.”

The bottom line is, a data scientist is a more robust form of a data analyst. Both bring a data-driven perspective to the table and both need to clearly communicate their methods to non-data professionals. The difference lies in the way they reach their answers, the amount of data they need to work through, and the tools they use on a regular basis.

Do I need school to be one?

In its report mentioned earlier, RJ Metrics found that only 12% of self-identified data scientists on LinkedIn don’t have a degree.

Beyond that, it also reports that:

  • 79% have graduate degrees
  • 42% have a master’s degree
  • 38% have Ph.D.s

The ideal data scientist candidate has a background in both computer science and statistics. Other popular fields of study for data scientists include graduate-level work in science, technology, engineering and mathematics (STEM) disciplines, physics among them.

In Liou’s and Tao’s The Data Analytics Handbook: Data Analysts + Data Scientists, Harrington states, “Companies right now are asking for Ph.D.s because up to now, these techniques for applying machine learning and data mining in industry haven’t been well defined, but I think a lot of the techniques are becoming more standard and accessible for public use.”

But is the degree absolutely needed to be a successful data scientist?

Frank Lo, the Head of Data Science at Wayfair, thinks that an advanced degree should be considered last:

“Some Ph.D. candidates have very well-rounded skills and do become top performers. Though, I’ve found that many others find themselves mentally stuck far down the academic rabbit hole, and have difficulty translating their focused depth into value in a business environment.”

Edwin Chen, who has held many positions including ad quality at Twitter, quantitative analysis at Google, and data science at Dropbox, supports the self-learning path. In the article 5 Things You Should Know Before Getting a Degree in Data Science, he mentions that the skills of a data scientist aren’t learned in school:

“I studied math, computer science, and linguistics in school, and did a lot of research in natural language processing, so I had some background from there. But in terms of most of the stuff I apply day to day – machine learning, ads, recommendations, data munging, statistical analysis, etc. – I picked those up while I was working.”

However, Linda Burtch, the Managing Director of Burtch Works, puts an emphasis on education in her article The Must-Have Skills You Need to Become a Data Scientist. According to Burtch, “While there are notable exceptions, a very strong educational background is usually required to develop the depth of knowledge necessary to be a data scientist.”

In the same article as Chen, Mark Madsen, the president of analytics and information management consulting company Third Nature Inc., refers back to the soft skills of a data scientist as the “make-its or break-its” for the position:

“The part that really separates people who are successful from [those who] are not is just a core curiosity and desire to answer questions that people have – to solve problems. Don’t do it because you think you’ll make a lot of money, chances are by the time you’re trained, you either don’t know the right stuff or there’s a hundred other people competing for the same position, so the only thing that’s going to stand out is whether you really like what you’re doing.”
