The “Multivariate Gaussian distribution” part of the video is based on this, this, this, and this. See more details here.
Note that besides $r$, the Pearson linear correlation between continuous variables, there are other correlation measures for other variable types: Cramér's V for nominal or categorical variables, and Spearman's rank correlation for ordinal variables, while Pearson is for interval- or ratio-scale variables. There are also other correlation coefficients, e.g. here.
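To make the distinction concrete, here is a minimal sketch (using `scipy.stats`, with illustrative data not taken from the notes) showing how Pearson and Spearman differ on a relationship that is monotonic but not linear:

```python
# Sketch: Pearson vs. Spearman on a monotonic but non-linear relationship.
# The data here is purely illustrative.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = np.exp(x)  # monotonic in x, but strongly non-linear

r_pearson, _ = pearsonr(x, y)    # measures *linear* association only
r_spearman, _ = spearmanr(x, y)  # rank-based: 1.0 for any monotonic relation

print(f"Pearson r:  {r_pearson:.3f}")
print(f"Spearman r: {r_spearman:.3f}")
```

Spearman comes out as exactly 1 because the ranks of `y` match the ranks of `x`, while Pearson is pulled below 1 by the curvature.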
Note also that a single Gaussian distribution is too simplistic an assumption for general data, hence we usually use a mixture of Gaussians, i.e. a model with several blobs, to represent actual data.
It is formulated as a weighted sum of multiple Gaussian distributions, each representing a cluster or component within the data: $p(\mathbf{x})=\sum\limits_{i=1}^{K}\omega_i\mathcal{N}(\mathbf{x}|\mu_i,\Sigma_i)$, where $\sum\limits_{i=1}^{K}\omega_i=1$ ensures that the mixture weights form a valid probability distribution. The parameters $\mu_i,\Sigma_i,\omega_i$ are learned via a parameter-estimation method, for example the Expectation-Maximization (EM) algorithm, which maximizes the likelihood of the observed data.
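The mixture density above can be evaluated directly as a weighted sum of component densities. Below is a minimal sketch for a hypothetical 2-component, 2-D mixture (the weights, means, and covariances are illustrative assumptions, not values from the notes):

```python
# Sketch: evaluating p(x) = sum_i w_i * N(x | mu_i, Sigma_i)
# for an illustrative 2-component, 2-D Gaussian mixture.
import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.4, 0.6])  # omega_i; must sum to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[1.0, 0.5],
                             [0.5, 1.0]])]

def mixture_pdf(x):
    """Weighted sum of the component Gaussian densities at point x."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

point = np.array([1.0, 1.0])
print(mixture_pdf(point))
```

In practice the parameters would be fit to data (e.g. with EM, as in `sklearn.mixture.GaussianMixture`) rather than set by hand as here.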
This section is the basis for the Gaussian processes in the Bayesian Learning section.
Algebra clarifications
More about linear transformations: here, here, and here.
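One concrete use of a linear transformation in this context is sampling from a multivariate Gaussian: if $\mathbf{z}\sim\mathcal{N}(\mathbf{0},I)$ and $LL^\top=\Sigma$ (e.g. via a Cholesky factorization), then $\mu+L\mathbf{z}\sim\mathcal{N}(\mu,\Sigma)$. A minimal sketch, with illustrative $\mu$ and $\Sigma$ chosen for this example:

```python
# Sketch: a linear transformation turns standard-normal samples into
# correlated Gaussian samples. mu and Sigma here are illustrative.
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

L = np.linalg.cholesky(Sigma)        # Sigma = L @ L.T
z = rng.standard_normal((10000, 2))  # rows are z ~ N(0, I)
x = mu + z @ L.T                     # rows are x ~ N(mu, Sigma)

print(np.cov(x, rowvar=False))       # approximately Sigma
```

The sample covariance of `x` recovers `Sigma` up to Monte Carlo error, which is exactly the affine-transformation property of the Gaussian.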