Posts

How regularization affects the critical points in linear networks

$\newcommand{\transpose}{\intercal}$ $\DeclareMathOperator*{\minimize}{minimize}$ Let $X_0\in\mathbb{R}^n$ be a random initial input vector with distribution $p_X$ and covariance matrix $\Sigma_{X_0}=\mathbb{E}[X_0{X_0}^\transpose]$. Assume the input-output model has the following linear form: \[Z=RX_0+\xi,\] where $\xi\in\mathbb{R}^n$ is the noise and $Z\in\mathbb{R}^n$ is the output. The noise $\xi$ is assumed to have distribution $p_\xi$ and to be independent of the input $X_0$, i.e. $\mathbb{E}[\xi{X_0}^\transpose]=0$. The problem is to use i.i.d. input-output samples $\{({X_0}^{(k)},Z^{(k)})\}_{k=1}^K$ to learn the weights of a linear feed-forward neural network \[\dfrac{dX_t}{dt}=A_tX_t\] so that it matches the input-output relation $R$. Note that $A_t$ are the network weights, $t$ is the continuous layer index with depth at most $T$, and $K$ is the total number of training samples. Consider the following regularized form of the optimization problem: …
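Since the excerpt cuts off before the objective, here is a minimal numerical sketch of the setup under stated assumptions: the continuous-depth network is discretized with forward Euler steps, and the regularizer is taken to be a squared-norm penalty on the weights (the post's actual objective may differ). The step count, noise level, and `lam` below are illustrative choices, not values from the post.

```python
import numpy as np

# Sketch: simulate the linear network dX/dt = A_t X_t with forward Euler
# and evaluate an (assumed) L2-regularized least-squares objective.
rng = np.random.default_rng(0)
n, K, T, steps = 4, 100, 1.0, 50      # dimension, samples, depth, Euler steps
dt = T / steps

R = rng.standard_normal((n, n))                 # true input-output map
X0 = rng.standard_normal((n, K))                # i.i.d. input samples
Z = R @ X0 + 0.1 * rng.standard_normal((n, K))  # noisy outputs

# Piecewise-constant weights A_t, one matrix per Euler step.
A = [0.01 * rng.standard_normal((n, n)) for _ in range(steps)]

def forward(A, X):
    """Propagate inputs through the discretized network."""
    for A_t in A:
        X = X + dt * (A_t @ X)   # Euler step of dX/dt = A_t X_t
    return X

lam = 1e-3  # regularization weight (assumed form of the penalty)
loss = np.mean(np.sum((forward(A, X0) - Z) ** 2, axis=0)) \
     + lam * sum(np.sum(A_t ** 2) for A_t in A)
print(f"regularized loss: {loss:.4f}")
```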

Entropy and Mutual Information

1. $\textbf{Entropy}$ definition: Let $X$ be a continuous random variable, defined on a probability space $(\Omega,\mathcal{F},\mathcal{P})$ with $X\in\mathcal{X}\subseteq\mathbb{R}$, with cumulative distribution function (CDF) given by \[F(x)=\mathcal{P}\{X\leq x\},\] and probability density function (pdf) given by \[f(x)=\dfrac{dF(x)}{dx},\] which are both assumed to be continuous functions. Then, the entropy of the continuous random variable $X$ is defined as \[h(X)=-\int_\mathcal{X} f(x)\log f(x)dx,\] where the integration is carried out over the support of the random variable. Example 1.1 Entropy: The entropy of a normal distribution $X\sim\mathcal{N}(0,\sigma^2)$, with $f(x)=\dfrac{1}{\sqrt{2\pi\sigma^2}}\exp{\left(-\dfrac{x^2}{2\sigma^2}\right)},$ is \[\begin{align*} h(X)&=-\int f(x)\log f(x)dx\\ &=-\int f(x)\left[-\dfrac{1}{2}\log{\left(2\pi\sigma^2\right)}-\dfrac{x^2}{2\sigma^2}\log e\right]dx\\ &=\dfrac{1}{2}\log{\left(2\pi\sigma^2\right)}\int f(x)dx+\dfrac{\log e}{2\sigma^2}\int x^2 f(x)dx\\ &=\dfrac{1}{2}\log{\left(2\pi\sigma^2\right)}+\dfrac{\log e}{2\sigma^2}\,\sigma^2\\ &=\dfrac{1}{2}\log{\left(2\pi e\sigma^2\right)}, \end{align*}\] using $\int f(x)dx=1$ and $\int x^2 f(x)dx=\sigma^2$.
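As a sanity check on this closed form, here is a small numerical sketch (my addition, not from the post): it integrates $-f(x)\log f(x)$ on a grid and compares against $\frac{1}{2}\log(2\pi e\sigma^2)$, using the natural logarithm so the entropy is in nats.

```python
import numpy as np
from scipy.stats import norm

# Numerical check of h(X) = 0.5*log(2*pi*e*sigma^2) for X ~ N(0, sigma^2).
sigma = 2.0

# Riemann-sum approximation of -integral of f(x) log f(x) over a wide grid.
x = np.linspace(-10 * sigma, 10 * sigma, 200_001)
dx = x[1] - x[0]
f = norm.pdf(x, scale=sigma)
h_numeric = -np.sum(f * np.log(f)) * dx

h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_closed)  # both come out around 2.1121 nats for sigma = 2
```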

MNIST Dataset

[Figure: a sample handwritten digit from the MNIST dataset]
The MNIST dataset is a well-known handwritten-digit dataset, and many people use it as their first pattern-recognition exercise. I started with this GitHub tutorial and this reference website. There are 60,000 samples in the training set and 10,000 samples in the test set. Each sample is 28x28 pixels with values ranging from 0 to 255, and each digit is centered within a 20x20-pixel region. A sample is shown above. Note that the number of samples per digit in the training set is not uniform (i.e., there are not exactly 6,000 samples of each digit), and the same holds for the test set. In this post, we try to use 8 kernels to build a filter for the 10 digits. Define the vector of probability functions $[p_0\ p_1\ \dots\ p_9]^T$, where entry $p_n$ corresponds to digit $n$ from 0 to 9, and each probability function can be written as \[p_n(t)=P\{X=n\mid y(1),y(2),\dots,y(t-1),y(t)\},\ \forall\ n\in\{0,1,\dots,9\},\] where $y(t)$ is an observation value. For now, we assume $y(t)$ can be modeled as \[y(t)=h_k(n,t)+N(t),\] …
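To make the recursion behind $p_n(t)$ concrete, here is a minimal sketch of a sequential Bayes update under an assumed Gaussian observation-noise model; the template matrix `h` and the noise level `sigma` are hypothetical stand-ins, since the excerpt cuts off before the post's actual kernel model.

```python
import numpy as np

# Sketch of p_n(t) = P{X = n | y(1), ..., y(t)} updated one observation
# at a time, assuming y(t) = h(n, t) + Gaussian noise.
rng = np.random.default_rng(1)
T_len, sigma = 30, 0.5
h = rng.standard_normal((10, T_len))   # hypothetical template h(n, t) per digit

true_digit = 7
y = h[true_digit] + sigma * rng.standard_normal(T_len)  # simulated observations

log_p = np.full(10, np.log(0.1))       # uniform prior over the 10 digits
for t in range(T_len):
    # Gaussian log-likelihood of y(t) under each digit hypothesis n
    # (constant terms dropped; renormalization absorbs them).
    log_p += -0.5 * ((y[t] - h[:, t]) / sigma) ** 2
    log_p -= np.logaddexp.reduce(log_p)  # renormalize so probabilities sum to 1

print(np.argmax(log_p))                # most likely digit after all observations
```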

Visualization of the Keyword Recording

The following shows 5 different keyword commands in the time domain and the frequency-time (spectrogram) domain: "Bed", "Up", "Down", "Yes", "No". It is interesting to note a few facts from these pictures. First, within each category, the soundtracks all share a similar frequency response; the characteristic resonant bands are called formants. Even when the responses are not very similar (such as the following two "Yes" commands), we can still see the formants. Second, these recordings were all made at a 16,000 Hz sampling rate, and in our spectrum analysis the frequency resolution is 50 Hz. However, the frequency resolution of the human ear is as fine as 3.6 Hz in the 1,000–2,000 Hz octave [2]. This means a human ear is more precise and can hear much finer details than our machine analysis. But a good question is: "Do we need that much precision to distinguish each command?" Hope there are more results next week. The dataset is produced by Google, with the link here. References: - [0] my GitHub code - [1] Speech …
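For reference, the 50 Hz frequency resolution quoted above pins down the analysis window: at a 16,000 Hz sampling rate it corresponds to $16000/50=320$ samples (20 ms). Below is a minimal sketch of that spectrogram computation using a synthetic tone in place of an actual keyword clip (the real input would come from the Google dataset).

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16_000                                # sampling rate of the recordings
t = np.arange(fs) / fs                     # 1 second of audio
x = np.sin(2 * np.pi * 440 * t)            # stand-in for a keyword recording

nperseg = fs // 50                         # 320-sample window -> 50 Hz bins
f, times, Sxx = spectrogram(x, fs=fs, nperseg=nperseg)
print(f[1] - f[0])                         # frequency resolution: 50.0 Hz
```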