Recommendations on Columbia Courses for Machine Learning & Statistics

For years I’ve kept an extremely long list of resources, online and offline, in various forms, covering machine learning, statistics, programming, video game production and much more. Since many people have asked me what to take at Columbia to advance their career in data science, let me begin with one tiny fraction of that list: great Columbia courses that teach you the theory, practice, implementation and thought process of doing things the pro way. If you’re starting at Columbia with some basic knowledge of statistics and probability, this is the road-to-pro list I’ve come up with (most of these I’ve taken or audited; some are listed with old course numbers from before they switched to the GR ones):

Ph.D. level courses:

  • Bayesian Data Analysis (Andrew Gelman, STAT G6103) (Applied Bayesian statistics, social studies, hierarchical regression models, PPC, Bayesian inference, Stan, etc.)
  • Computational Statistics (Liam Paninski, STAT G6104)
  • Gaussian Process & Kernel Methods (John Cunningham, STAT G8325) (RKHS, kernels, advanced approximate Bayesian inference, GP, etc.)
  • Bayesian Nonparametrics (John Paisley, ELEN 9801) (Beta process, Poisson process, DP, IBP, etc.)
  • Statistical Communication (Andrew Gelman)
  • Causal Inference (Jose Zubizarreta, DROM B9124)
  • Foundations of Graphical Models (David Blei, STAT G6509)
  • Applied Causality (David Blei, STAT GR8101)
  • Probabilistic Models with Discrete Data (David Blei, COMS 6998)
  • Probability Theory I (Marcel Nutz, STAT GR6301) (Probability, measure, expectations, LLN, CLT, etc.)
  • Probability Theory II (Peter Orbanz, STAT G6106) (Topology, filtrations, measure theory, martingales, etc.)
  • Probability Theory III (Marcel Nutz, STAT GR6303) (Semimartingales, stochastic processes, Wiener process, SDEs, etc.)
  • Statistical Inference II (Sumit Mukherjee, STAT GR6202) (Statistical testing, nonparametric inference, etc.)
  • Statistical Inference III (Zhiliang Ying, STAT GR6203) (Semiparametric inference, etc.)
  • Bayesian Nonparametrics (Peter Orbanz, STAT GR8201) (DP, IBP, GP, PP, random measures, approximate inference)
  • Neural Networks and Deep Learning (Aurel Lazar, COMS 6998)
  • Seminar in Theoretical Statistics (STAT GR9201)
  • High Dimensional Data Analysis (Aleksandr Aravkin & Aurelie Lozano, COMS 6998)
  • Optimization I (Donald Goldfarb, IEOR 6613) (LP, convex optimization, Newton methods, quasi-Newton methods, etc.)
  • Optimization II (Clifford Stein, IEOR 6614) (Graph theory)

Non-Ph.D. level courses:

  • Natural Language Processing (Michael Collins, COMS W4705)
  • Design and Analysis of Sample Surveys (Andrew Gelman, POLS GU4764)
  • Advanced Machine Learning (Tony Jebara, COMS W4772)
  • Advanced Machine Learning (Daniel Hsu, COMS W4772)
  • Nonlinear Optimization (Donald Goldfarb, IEOR E4009)

You might now wonder:

  • Where are the distributed systems courses?
  • Where are the MapReduce courses?
  • Where are the SaaS courses?
  • Where are the low-latency courses?
  • Where are the data visualization courses?
  • Where are the data mining courses?
  • Where are the mathematical finance courses?
  • Where are the data structure courses?
  • Where are the algorithm courses?
  • Where are the learning theory courses?

My answer: although those topics play prime roles in modern implementations of machine learning systems, it is genuinely difficult to draw the right (and better) extrapolations from data, and all too easy to sink into and get buried by the implementation details of real-world application systems. The courses in my list concentrate on deepening your insight through the history of the mathematical and statistical developments that produced these tools, and on understanding the limitations those tools have. Courses answering the questions above, on the other hand, can easily be found on Columbia’s own data science master’s degree roadmaps; they are rather standard, and they give you more industrial preparation as an engineer specializing in data science rather than the preparation you’ll need for complicated real-world analysis of non-trivial problems.

Moved to WordPress!

Always looking for a better publishing solution!


I wanted to temporarily go without a personal server, and this turned out to be one of the best solutions. Setting up webpages with LaTeX rendered in HTML takes some struggle (not hard, but I am too lazy right now). I am seriously considering KaTeX plus other webpage packages to fire up a highly personalized site some time soon, but more on that later.

Let’s try this:

p(x \mid y) = \frac{p(x, y)}{p(y)}

It works!
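For fun, the conditional probability identity above can be checked numerically on a toy joint distribution (the table below is made up purely for illustration):

```python
# Numeric check of p(x | y) = p(x, y) / p(y) on a toy 2x2 joint table.
# The probabilities here are invented for illustration only.
joint = {
    ("rain", "cloudy"): 0.30,
    ("rain", "clear"): 0.05,
    ("no_rain", "cloudy"): 0.20,
    ("no_rain", "clear"): 0.45,
}

def marginal_y(y):
    # p(y) = sum over x of p(x, y)
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional(x, y):
    # p(x | y) = p(x, y) / p(y)
    return joint[(x, y)] / marginal_y(y)

print(conditional("rain", "cloudy"))  # 0.30 / 0.50 = 0.6
```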

Couple of topics to be discussed here soon (details, order and time-schedule TBD):

  • Gumbel softmax trick / Concrete distribution
  • Hamiltonian Monte Carlo (HMC, NUTS, stochastic gradient HMC)
  • Variational inference (VI, SVI, BBVI, ADVI, RSVI, OPVI, VGP, Hamiltonian VI, etc.)
  • Generative adversarial networks (GAN, InfoGAN, Conditional GAN, Wasserstein GAN, DCGAN, ALI, BiGAN, LS-GAN, etc.)
  • Causal inference (potential outcomes, SUTVA, instrumental variables, propensity scores, causal graphs, Bayesian, etc.)
  • Expectation maximization (EM, stochastic EM, Monte Carlo EM, etc.)
  • Attention models (DRAW, One-shot)
  • Recurrent neural nets (RNN, LSTM, GRU)
  • Resampling methods (parametric bootstrap, nonparametric bootstrap, jackknife estimator)
  • Tensor probabilistic models (Tucker decomposition, HOSVD, PARAFAC)
  • Sequential Monte Carlo
  • Nonparametric Bayesian models (CRP, HDP, IBP, GP, stick breaking construction, complete random measures, etc.)
  • Variational autoencoders (VAE, SVAE)
  • Proximal gradient methods
  • Reinforcement learning (model-based, model-free, Q learning, Monte Carlo tree search, TD(\lambda), SARSA, etc.)
  • Dimensionality reduction (PCA, robust PCA, probabilistic PCA, ICA, etc.)
  • Linear regression
  • Logistic regression
  • ML/Statistics tricks