The aim of this guide is to provide the “delta” between a software engineering interview and a data science interview, for a mathematically-oriented researcher. This guide is suitable for researchers well-versed in some combination of statistics, probability, machine learning, optimization, and related areas.
If you don’t already have software engineering interview skills, I recommend looking additionally at guides for building those skills. Almost all of those skills (writing code live, designing algorithms, etc.) and good interviewing practices (asking good questions, showing interest, communicating well) still apply here, except there will be fewer computer systems questions and less emphasis on best practices in software engineering.
Anyway, here is how I prepared for my data science interviews.
Step 0: what is data science?
Learn your audience. For context, it’s good to know exactly what people mean when they talk about data science. It’s sufficient to read the top articles from Google, and I’ve provided some here.
Step 1: imagine the problems that the company will ask you
- What questions would the company like to answer from their data? What kind of data do they have?
- How can the data be related to the company’s mission / bottom line / target audience?
- How can their data affect their day-to-day operations? How can it affect their longer-term business decisions?
- What kinds of methods would you use to solve these problems? What kinds of visualizations would be appropriate?
- How do some of these questions then translate into products or features?
Step 2: pseudo-structured wandering through wikipedia++
The goal here is primarily to familiarize oneself with the terminology of data mining, with an emphasis on the high-level concepts and what people tend to use in practice. I’ve listed a bunch of topics I reviewed below, and the goal is to develop the following for each area:
- Have an intuitive understanding of (and a way to explain) the concept / technique / method
- Be able to write the (main) relevant equations; know the key properties
- Have an example at hand to demonstrate usage/understanding
- Understand when it is used and why; know related tools and when to prefer one over another; know what counts as a “good” value and how to determine/assess/validate it
- Data mining: [overview of techniques, data dredging]
- Logistic regression: [main article, R^2, covariance, multinomial logit model, the minute details]
- Hypothesis testing [p-value, type I & type II errors, Wald test, likelihood ratio test]
- Data centering: [src]
- Time series analysis: [main article, stochastic processes, ARMA models]
- Fourier analysis, Fourier transform, FFT, Nyquist: [aka spectrum analysis, power of a signal]
- Probability theory: [CLT, random variables, convergence of RV, more convergence]
- SVMs [derivation, perceptron]
- Decision trees: [random forests, bootstrap, bootstrap aggregation]
- Collaborative filtering
- Probability problems
- Stochastic gradient descent
- Data visualization
- Inverse covariance
- Taylor expansion / Maclaurin series
- Geospatial prediction/analysis methods
- Learning rate (machine learning) == step size (convex optimization)
- Spectrum analysis == frequency domain analysis == spectral density estimation
- Predictive analytics == predictive modeling and forecasting
- Multinomial logistic regression == softmax regression == multinomial logit == maximum entropy (MaxEnt) classifier == conditional maximum entropy model
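To make the hypothesis-testing entry above concrete, here is a minimal sketch of computing a two-sided p-value for a standard-normal test statistic, using only the standard library (the function name is my own):

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard-normal test statistic z.

    P(|Z| >= |z|) = erfc(|z| / sqrt(2)), via the complementary
    error function from the standard library.
    """
    return math.erfc(abs(z) / math.sqrt(2))

p = two_sided_p_value(1.96)  # the classic ~5% threshold
```

A good follow-up in an interview is being able to say what this p-value does and does not mean (it is not the probability that the null hypothesis is true).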
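The bootstrap entry above is easy to demonstrate with a tiny example: estimate the standard error of a sample mean by resampling with replacement. This resampling step is also the core of bootstrap aggregation (bagging) in random forests. The data and function name here are illustrative, not from any source:

```python
import random
import statistics

def bootstrap_se_of_mean(data, n_boot=2000, seed=0):
    """Estimate the standard error of the sample mean by drawing
    bootstrap resamples (with replacement) and taking the standard
    deviation of the resampled means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

data = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1, 2.5, 3.7]
se = bootstrap_se_of_mean(data)  # should be near s / sqrt(n)
```

Being able to explain why the bootstrap estimate should roughly match the analytic formula s/√n is exactly the kind of “key properties” knowledge the checklist above asks for.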
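For the stochastic gradient descent entry, a toy one-parameter regression makes the mechanics (and the learning rate == step size equivalence noted above) concrete. This is a sketch under simplifying assumptions: one weight, no intercept, and cyclic rather than random sampling of points:

```python
def sgd_fit_slope(xs, ys, lr=0.05, epochs=100):
    """Fit y ~ w * x by per-sample gradient updates on squared error.

    Updating on one point at a time (rather than the full batch)
    is what makes this stochastic gradient descent."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = 2 * x * (w * x - y)  # d/dw of (w*x - y)**2
            w -= lr * grad              # lr is the learning rate / step size
    return w

# Toy data generated from y = 2x; SGD should recover w close to 2.
w = sgd_fit_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

An interviewer may well push on what happens when `lr` is too large (divergence) or too small (slow convergence), which is the optimization-side view of the same parameter.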
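The Taylor/Maclaurin entry is worth having a worked example for. A Maclaurin series is just a Taylor expansion around zero; for e^x the partial sums converge very quickly (function name is my own):

```python
import math

def exp_maclaurin(x, terms=15):
    """Approximate e^x by its Maclaurin (Taylor-at-zero) series:
    the sum of x**k / k! for k = 0 .. terms-1."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

approx = exp_maclaurin(1.0)  # should closely match math.e
```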
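Finally, the long chain of names in the last equivalence above (multinomial logistic regression == softmax regression == MaxEnt classifier) all center on the softmax function, which turns raw class scores into probabilities. A minimal standard-library sketch:

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1.

    Subtracting the max score first is a standard numerical-stability
    trick; it changes nothing because softmax is invariant to adding
    a constant to every score."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The shift-invariance in the comment is exactly why these models are only identified up to a constant, a typical “minute details” follow-up question.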
Step 3: okay, what do other people do for interview prep?
If you are a grad student in a technical field, leave a comment with your interview preparation techniques!