A Fully Bayesian Approach to the Job Search

The job search is a long and random process. So long that my statistical skills may need some sharpening! To put them to use, I developed a fully Bayesian model to estimate how long it will take me to get a job!

Model

I am estimating how many applications it will take before I get a job using a Bayesian model. With some simplification, there are two major steps to the application process: (1) getting a first-round interview, and (2) getting a job offer after the interview. There may actually be many interviews between the first-round interview and landing an offer, but I will likely not get enough data in those intervening steps to build a worthwhile model.

For this model, I am treating each application submitted as a Bernoulli trial where a success is getting a first-round interview. I'm going to assume this happens with a fixed probability \(P(\text{First-round Interview})= p_1\). Then, if the first trial is a success, there is a second Bernoulli trial where a success is getting a job offer. Again I'm going to assume a fixed, but different, parameter for this conditional probability of success, which I will call \(P(\text{Job Offer}|\text{First-round Interview}) = p_2\).

Note that I have data for both Bernoulli steps. For the first step, I have \(n_1=9\) trials and \(y_1=1\) success. For the second step, I have \(y_1=1\) trial (this is the number of first-round interviews I've had - we could also call this \(n_2\)) and \(y_2=0\) successes.

Honestly, I have no idea what my success probability is. So I chose a uniform prior for \(p_1\) on [0,1], which leads to a neat posterior. The posterior distribution of \(p_1|y_1, n_1\) is \(\text{Beta}(y_1+1,n_1-y_1+1)\) where \(n_1\) is the number of applications submitted. The same holds for \(p_2\) with a uniform prior; the posterior for \(p_2|y_1, y_2\) is \(\text{Beta}(y_2+1,y_1-y_2+1)\).
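
For anyone wondering where the Beta comes from, here is a short sketch of the derivation. It relies only on the standard fact that a uniform prior has density 1 on [0,1], so the posterior is proportional to the binomial likelihood:

\[
p(p_1 \mid y_1, n_1) \;\propto\; \underbrace{p_1^{y_1}(1-p_1)^{n_1-y_1}}_{\text{likelihood}} \times \underbrace{1}_{\text{uniform prior}} \;=\; p_1^{(y_1+1)-1}(1-p_1)^{(n_1-y_1+1)-1},
\]

which is the kernel of a \(\text{Beta}(y_1+1,\,n_1-y_1+1)\) density. Plugging in my data (\(n_1=9\), \(y_1=1\), \(y_2=0\)) gives

\[
p_1 \mid y_1, n_1 \sim \text{Beta}(2, 9), \qquad p_2 \mid y_1, y_2 \sim \text{Beta}(1, 2).
\]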

Posterior Distributions

Let's visualize these densities.

img1.png

Nothing crazy here. The median of the posterior for \(p_1\) (chance of getting a first-round interview) is 0.16, with a 90% credible interval of 0.04 to 0.39. So the uniform prior is nudging that upwards from the estimate \(\hat{p} = \frac{y_1}{n_1} = \frac{1}{9} \approx 0.11\) that the data alone would give.

The median of the posterior for \(p_2\) (chance of getting a job offer after first-round interview) is 0.29, which is not really saying much since our 90% credible interval is 0.03 to 0.78. We just don't have much data, which is doubly sad :( The prior in this case is giving me the benefit of the doubt, saying that I have a nearly 30% chance of getting a job offer even though I've received no job offers!
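
If you want to check those summaries yourself, here is a minimal sketch in Python using only the standard library. It is not the code behind the plots; the helper names `beta_cdf` and `beta_quantile` are my own for illustration, and the shortcut only works because both posteriors have integer parameters, which lets the Beta CDF be written as a binomial tail sum.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x, valid for positive integer a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) for k in range(a, n + 1))

def beta_quantile(prob, a, b, tol=1e-10):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Data from the post: 9 applications, 1 first-round interview, 0 offers.
n1, y1, y2 = 9, 1, 0
posteriors = {
    "p1": (y1 + 1, n1 - y1 + 1),  # Beta(2, 9)
    "p2": (y2 + 1, y1 - y2 + 1),  # Beta(1, 2)
}

# 5%, 50% and 95% quantiles: the median and the 90% credible interval.
summary = {
    name: tuple(beta_quantile(p, a, b) for p in (0.05, 0.5, 0.95))
    for name, (a, b) in posteriors.items()
}

for name, (low, median, high) in summary.items():
    print(f"{name}: median={median:.2f}, 90% interval=({low:.2f}, {high:.2f})")
```

The printed values should agree with the numbers quoted above to two decimal places. With SciPy installed, `scipy.stats.beta.ppf` would do the same job in one line.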

So how many jobs do I need to apply for?

Let's return to the original question. We have \(P(\text{First-round Interview})= p_1\) and \(P(\text{Job Offer}|\text{First-round Interview}) = p_2\). Then \(P(\text{First-round Interview and Job Offer}) = p_1 p_2\). Let's call this joint probability \(q\). Now we can treat each application as a single Bernoulli trial with success probability \(q\). The number of trials needed to achieve the first success follows a geometric distribution, whose expected value is \(\frac{1}{q}\).
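
As a sanity check on the \(\frac{1}{q}\) claim, here is a small simulation sketch of the two-step process. The values \(p_1 = 0.2\) and \(p_2 = 0.5\) are made up purely for illustration (they are not estimates from my data), and `applications_until_offer` is a hypothetical helper name.

```python
import random

def applications_until_offer(p1, p2, rng):
    """Count applications up to and including the first one that ends in an offer."""
    count = 0
    while True:
        count += 1
        got_interview = rng.random() < p1
        # The second Bernoulli trial only happens if the first one succeeded.
        if got_interview and rng.random() < p2:
            return count

rng = random.Random(0)      # fixed seed so the run is repeatable
p1, p2 = 0.2, 0.5           # illustrative values only, not from the post's data
q = p1 * p2
runs = 50_000

mean_count = sum(applications_until_offer(p1, p2, rng) for _ in range(runs)) / runs
print(f"simulated mean: {mean_count:.2f}, theoretical 1/q: {1 / q:.2f}")
```

The simulated average should sit close to \(\frac{1}{q}\), up to simulation noise.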

Okay… so how does this help me? Well, I can simply simulate draws from the posteriors of \(p_1\) and \(p_2\), multiply each pair of draws to get a draw of \(q\), and take the reciprocal to estimate the posterior distribution of \(\frac{1}{q}\), which is the expected number of applications required to land a job! Here is the posterior for \(\frac{1}{q}|y_1,n_1,y_2\):

img2.png
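
The simulation described above can be sketched in a few lines of Python. This is a reconstruction under my own assumptions rather than the exact code behind the plot: it draws \(p_1\) and \(p_2\) independently from their Beta posteriors using the standard library's `random.betavariate`, and the seed, the number of draws, and the `quantile` helper are arbitrary choices for illustration.

```python
import random

# Data from the post: 9 applications, 1 first-round interview, 0 offers.
n1, y1, y2 = 9, 1, 0

rng = random.Random(42)   # arbitrary seed for repeatability
draws = 100_000           # arbitrary number of posterior draws

apps_needed = []
for _ in range(draws):
    p1 = rng.betavariate(y1 + 1, n1 - y1 + 1)  # draw from Beta(2, 9)
    p2 = rng.betavariate(y2 + 1, y1 - y2 + 1)  # draw from Beta(1, 2)
    q = p1 * p2
    if q > 0:  # guard against a (vanishingly unlikely) exact zero
        apps_needed.append(1.0 / q)

apps_needed.sort()

def quantile(sorted_values, prob):
    """Simple empirical quantile: the value at the matching sorted position."""
    return sorted_values[int(prob * (len(sorted_values) - 1))]

low, median, high = (quantile(apps_needed, p) for p in (0.05, 0.5, 0.95))
print(f"median={median:.1f}, 90% interval=({low:.1f}, {high:.1f})")
```

Because \(\frac{1}{q}\) has a very heavy right tail, the upper end of the interval moves around noticeably from run to run, while the median is much more stable.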

The median of the posterior is 25.11 with a 90% credible interval of 5.22 to 383.9. Note that due to the memoryless property of the geometric distribution, we should interpret those as additional applications, i.e., my best estimate is 25 additional applications. But take a look at that upper bound. Yikes! So the job search may take me another week to ... another year. Better get back to the job search!