Learning types

$$\gdef \sam #1 {\mathrm{softargmax}(#1)}$$ $$\gdef \vect #1 {\boldsymbol{#1}} $$ $$\gdef \matr #1 {\boldsymbol{#1}} $$ $$\gdef \E {\mathbb{E}} $$ $$\gdef \V {\mathbb{V}} $$ $$\gdef \R {\mathbb{R}} $$ $$\gdef \N {\mathbb{N}} $$ $$\gdef \relu #1 {\texttt{ReLU}(#1)} $$ $$\gdef \D {\,\mathrm{d}} $$ $$\gdef \deriv #1 #2 {\frac{\D #1}{\D #2}}$$ $$\gdef \pd #1 #2 {\frac{\partial #1}{\partial #2}}$$ $$\gdef \set #1 {\left\lbrace #1 \right\rbrace} $$

Here we present different learning types.

  1. Abstract
  2. Active Learning
  3. Reinforcement Learning (RL)
    • On-policy and Off-policy learning
    • Policy-based methods
  4. Multi-task Learning (MTL)
  5. Meta-Learning
  6. Continual/Life-long Learning
  7. Online/Offline Learning
  8. Neural Architecture Search (NAS)
  9. Bayesian Learning (BL)
    • Bayes topics
    • Bayes in Deep Learning
    • Uncertainties
    • Optimization under uncertainty
    • Gaussian Process and Kernels
  10. Summary

Abstract

Active Learning

Reinforcement Learning (RL)

On-policy and Off-policy learning

Policy-based methods

  • Note that RL methods split into two types: value-based (which we covered in the STATE SPACE slides, here and here) and policy-based, which we cover here. The difference between the two types can be seen here.
  • More about policy gradients and actor-critic (AC) methods here; a minimal policy-gradient sketch follows this list.
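
As a hedged illustration of the policy-based side, below is a minimal REINFORCE-style sketch in PyTorch. The names (`PolicyNet`, the gym-style `env` API) and the hyper-parameters are illustrative assumptions, not from the slides; the point is only that the policy network outputs a distribution over actions and its parameters are pushed in the direction of higher expected return.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP that maps an observation to a distribution over actions."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def reinforce_episode(policy, env, optimizer, gamma=0.99):
    """Run one episode (gym-style env, an assumption) and take one policy-gradient step."""
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        dist = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns G_t, computed backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # simple variance reduction

    # Gradient ascent on expected return = descent on the negative weighted log-probabilities.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```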

Multi-task Learning (MTL)

Meta-Learning

Continual/Life-long Learning

Online/Offline Learning

Neural Architecture Search (NAS)

Bayesian Learning (BL)

Bayes topics

Bayes in Deep Learning

Uncertainties

Optimization under uncertainty

Gaussian Process and Kernels

Gaussian Process

  • How do we draw the plot of the mean function with its uncertainty band? Simply by computing the posterior mean and covariance at the train and test points (see the sketch after this list).
  • Note that the GP posterior is actually $\mathbf{f}^* \vert \mathbf{y}$ and not $\mathbf{f}^* \vert \mathbf{X}^*$ (see the black box).
  • See different GP distributions here.
  • More on GPs here and here, and a Python implementation here.
  • A great GP explanation (with detailed math), followed by the sparse/variational GP versions: here and here. Sparse GP reduces the dimensions of $K$, and the variational formulation is a way to avoid the huge computations with the original $K$.
  • More on Bayesian linear regression (BLR) here and here.
  • See “The Automatic Statistician” here, at time ~1:34:00.
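
To make the first bullet concrete, here is a minimal NumPy sketch of the GP posterior $\mathbf{f}^* \vert \mathbf{y}$. The RBF kernel, the noise level `sigma_n`, and the toy data are assumptions for illustration; the mean and the diagonal of the covariance are exactly what gets plotted as the mean function with an uncertainty band.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two sets of 1-D points."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, sigma_n=0.1):
    """Posterior mean and covariance of f* | y at the test points (zero prior mean)."""
    K = rbf_kernel(X_train, X_train) + sigma_n**2 * np.eye(len(X_train))  # K(X, X) + noise
    K_s = rbf_kernel(X_train, X_test)                                     # K(X, X*)
    K_ss = rbf_kernel(X_test, X_test)                                     # K(X*, X*)
    mean = K_s.T @ np.linalg.solve(K, y_train)                            # E[f* | y]
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)                          # Cov[f* | y]
    return mean, cov

# Usage (toy data, assumed): plot mean +/- 2 std over a grid of test points.
X = np.array([-2.0, -1.0, 0.5, 1.5])
y = np.sin(X)
Xs = np.linspace(-3, 3, 100)
mu, cov = gp_posterior(X, y, Xs)
std = np.sqrt(np.diag(cov))  # uncertainty band: mu - 2*std to mu + 2*std
```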

Kernels summary

  • The “Kernels summary” is based on here, here, and here (from time ~1:20:00).

Summary