The “Multivariate Gaussian distribution” part of the video is based on this, this, this, and this. See more details here.
Note that besides $r$, the Pearson linear correlation between continuous variables, there are other correlation measures for other variable types: Cramér's V for nominal or categorical variables, and Spearman's rank correlation for ordinal variables, while Pearson is for interval- or ratio-scale variables. There are also other correlation coefficients, e.g. here.
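To make the distinction concrete, here is a minimal sketch (using `scipy.stats`, with illustrative data not taken from the notes) showing how Pearson and Spearman differ on a relationship that is monotonic but not linear:

```python
# Sketch: Pearson vs. Spearman on a monotonic but non-linear relationship.
# The data here is purely illustrative.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = np.exp(x)  # monotonic in x, but strongly non-linear

r_pearson, _ = pearsonr(x, y)    # measures *linear* association only
r_spearman, _ = spearmanr(x, y)  # rank-based: 1.0 for any monotonic relation

print(f"Pearson r:  {r_pearson:.3f}")
print(f"Spearman r: {r_spearman:.3f}")
```

Spearman comes out as exactly 1 because the ranks of `y` match the ranks of `x`, while Pearson is pulled below 1 by the curvature.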
Note also that a single Gaussian distribution is too simplistic an assumption for general data, hence we usually use a mixture of Gaussians, i.e. a model with several blobs, to represent actual data.
It is formulated as a weighted sum of multiple Gaussian distributions, each representing a cluster or component within the data: $p(\mathbf{x})=\sum\limits_{i=1}^{K}\omega_i\mathcal{N}(\mathbf{x}|\mu_i,\Sigma_i)$, where $\sum\limits_{i=1}^{K}\omega_i=1$ ensures that the mixture weights form a valid probability distribution. The parameters $\mu_i,\Sigma_i,\omega_i$ are learned via a parameter-estimation method, for example the Expectation-Maximization (EM) algorithm, which maximizes the likelihood of the observed data.
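The mixture density above can be evaluated directly as a weighted sum of component densities. Below is a minimal sketch for a hypothetical 2-component, 2-D mixture (the weights, means, and covariances are illustrative assumptions, not values from the notes):

```python
# Sketch: evaluating p(x) = sum_i w_i * N(x | mu_i, Sigma_i)
# for an illustrative 2-component, 2-D Gaussian mixture.
import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.4, 0.6])  # omega_i; must sum to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[1.0, 0.5],
                             [0.5, 1.0]])]

def mixture_pdf(x):
    """Weighted sum of the component Gaussian densities at point x."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

point = np.array([1.0, 1.0])
print(mixture_pdf(point))
```

In practice the parameters would be fit to data (e.g. with EM, as in `sklearn.mixture.GaussianMixture`) rather than set by hand as here.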
This section is the basis for the Gaussian processes in the Bayesian Learning section.
Algebra clarifications
More about linear transformations: here, here, and here.
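One concrete use of a linear transformation in this context is sampling from a multivariate Gaussian: if $\mathbf{z}\sim\mathcal{N}(\mathbf{0},I)$ and $LL^\top=\Sigma$ (e.g. via a Cholesky factorization), then $\mu+L\mathbf{z}\sim\mathcal{N}(\mu,\Sigma)$. A minimal sketch, with illustrative $\mu$ and $\Sigma$ chosen for this example:

```python
# Sketch: a linear transformation turns standard-normal samples into
# correlated Gaussian samples. mu and Sigma here are illustrative.
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

L = np.linalg.cholesky(Sigma)        # Sigma = L @ L.T
z = rng.standard_normal((10000, 2))  # rows are z ~ N(0, I)
x = mu + z @ L.T                     # rows are x ~ N(mu, Sigma)

print(np.cov(x, rowvar=False))       # approximately Sigma
```

The sample covariance of `x` recovers `Sigma` up to Monte Carlo error, which is exactly the affine-transformation property of the Gaussian.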