One Path to Becoming a Data Scientist

Well, I am happy to announce that the job search is finally over and I am officially a “data scientist.” It was a long and circuitous path, and there is certainly no single path to becoming a data scientist. I’d like to describe mine here for those aspiring to a career in data science.

Estimating Applications

Before that, let’s all bask in the glory that is Bayesian statistics. After nine applications, I constructed a Bayesian model to estimate how many additional job applications I would need to submit. I submitted 11 additional applications, receiving a job offer on the 8th, for a total of 20 applications.
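For readers who want to try something similar, here is a minimal sketch of one way such an estimate could be set up. It is not necessarily the model described above. It assumes a constant per-application probability of success, a flat Beta(1, 1) prior on that probability, and (purely as an illustrative input) zero successes in the first nine applications. The function names and default values are hypothetical.

```python
import math
import random


def simulate_additional_applications(successes, failures,
                                     prior_a=1.0, prior_b=1.0,
                                     n_draws=10_000, seed=0):
    """Monte Carlo draws of how many more applications until the first success.

    Assumes every application succeeds independently with the same unknown
    probability p, with a Beta(prior_a, prior_b) prior on p.
    """
    rng = random.Random(seed)
    # Beta prior + binomial data -> Beta posterior (conjugate update).
    post_a = prior_a + successes
    post_b = prior_b + failures
    draws = []
    for _ in range(n_draws):
        # Draw a plausible success probability from the posterior.
        p = rng.betavariate(post_a, post_b)
        p = max(p, 1e-12)  # guard against a degenerate p of exactly 0
        if p >= 1.0:
            k = 1
        else:
            # Inverse-CDF draw from a geometric distribution:
            # number of trials up to and including the first success.
            u = 1.0 - rng.random()  # uniform on (0, 1]
            k = max(1, math.ceil(math.log(u) / math.log1p(-p)))
        draws.append(k)
    return sorted(draws)


def quantile(sorted_draws, q):
    """Empirical quantile of an already-sorted list of draws."""
    idx = min(len(sorted_draws) - 1, int(q * len(sorted_draws)))
    return sorted_draws[idx]


if __name__ == "__main__":
    # Hypothetical input: nine applications so far, no successes yet.
    draws = simulate_additional_applications(successes=0, failures=9)
    print("median additional applications:", quantile(draws, 0.5))
    print("80% interval:", quantile(draws, 0.1), "to", quantile(draws, 0.9))
```

The sketch reports a median and an interval rather than a mean because the predictive distribution has a very long right tail when few successes have been observed, so the average is dominated by a handful of extreme draws.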

Here is the posterior distribution of additional applications from that model, plus the actual outcome.


What did I take away from this exercise?

  1. Applying to jobs is a random process. Don’t take rejections personally—just keep applying and trying to optimize your chances.

  2. You’ll have a good sense of how many applications you need to submit after 10 to 20 applications.

  3. With a Master’s degree from a reputable school (i.e., Tier 2 in US News Rankings), a reasonable lower bound for interview success rate is 10%. If you’re above 20%, then you’re in a good spot, provided you’re not botching interviews.

  4. If your interview success rate is approaching or below 10%, it’s a good time to regroup and take a long look at your resumé.

To elaborate on point 4, midway through the process I completely overhauled my resumé and, most importantly, added new projects to it. And yes, this breaks the assumption of constant probability of success in the Bayesian model. But all models are wrong and some are useful, and I’d argue that this model was useful.

Applying to jobs is a random process. But from my handful of job interviews, as well as a few informational interviews, I think there are a few “core competencies” that employers are looking for. In a highly technical position, there are assuredly more competencies, but I think these are common across any position calling itself “data science.”

Core Competency #1: Track Record of GTD

When hiring for a technical role, a manager needs to see that you have a track record of getting things done. GTD is an entire productivity framework, but here I just mean a track record of managing tasks and projects. I can imagine two worst-case scenarios for a hiring manager: (1) hiring someone who isn’t proficient in stats and machine learning, and (2) hiring someone who gets lost in the weeds, doesn’t deliver projects on time, and can’t convince people that the model is useful. This competency addresses the latter.

This track record can take many forms. I have a background of five years in consulting, using data and research to improve client programs. You can easily talk about managing graduate school research or graduate school teaching. It’s just a matter of articulating your experience.

A major challenge for many data science teams is convincing other parts of the company to adopt their models. Make sure your experience shows you can deliver the models and communicate their value.

Core Competency #2: Applying Models to Projects

Coming straight out of grad school, I could only talk about a few class projects as evidence that I knew how to apply stats and machine learning methods to “real world” problems. Even in grad school, a class project is a two- to three-week affair (and of course some students do them in a weekend). A data science project can span many months, and employers want to see evidence you are prepared to tackle something of that scale. A thesis demonstrates this competency, but as a non-thesis Master’s student, I could not point to a long-term research project.

The solution? Find some interesting problems, and apply a model to them. My MLB attendance model was a big undertaking: I had to wrangle the data, tune the model, interpret the model outputs, and produce the visualizations. I think that project was the single biggest factor in landing my job.

Core Competency #3: Articulate Your Data Science Strength

Data science can be described as the intersection of statistics, hacking, and subject matter knowledge. Where do you sit within that intersection? I am a strong believer in the importance of understanding statistical theory. It’s why I went back to school, why I chose my program, and what I prioritized in terms of coursework. Some of my colleagues come from an engineering and computer science background, while others come from an applied math background. While we all aspire to be strong in all disciplines of data science, it’s important to plant your flag somewhere and articulate what you believe your greatest strength to be.

Good luck to all the aspiring data scientists out there! There is no single path to the career, and mine is just one example. However, I hope it gives you some nuggets of information. The “no data” problem is no fun!