MNIST Dataset
There are 60,000 samples in the training set and 10,000 samples in the test set. Each sample is a 28x28-pixel image with pixel values ranging from 0 to 255, and each digit is centered within a 20x20-pixel box. A sample is shown below.
Note that the number of samples per digit is not uniform in either the training set or the test set (i.e., there are not exactly 6,000 training samples for each digit).
In this post, we try to use 8 kernels to build a filter for the 10 digits. Let $[p_0\ p_1\ \dots\ p_9]^T$ be the vector of probability functions, one for each digit $n$ from 0 to 9, where each probability function is written as \[p_n(t)=P\{X=n|\ y(1),y(2),\dots,y(t-1),y(t)\},\ \forall\ n\in\{0,1,\dots,9\}.\] Here $y(t)$ is an observation value, which for now we assume can be modeled as \[y(t)=h_k(n,t)+N_k(t),\] where $h_k(n,t)$ are filters and $N_k(t)$ is zero-mean Gaussian noise with variance ${\sigma_k}^2(t)$: \[N_k(t)\sim N(0,{\sigma_k}^2(t)).\]
Note that time $t$ here is an integer in the range $[l,L)$, which will be further discussed later.
First, let's build the filters $h_k(n,t)$. Each filter $h_k$ corresponds to a different kernel $w_k$, but all filters share the same time window of $l=4$ pixels. Here we try 8 kernels, listed below:
| $w_1(t)=\sin\left(\dfrac{\pi x}{L}\right)$ | $w_3(t)=\sin\left(\dfrac{2\pi x}{L}\right)$ | $w_5(t)=\sin\left(\dfrac{3\pi x}{L}\right)$ | $w_7(t)=\sin\left(\dfrac{4\pi x}{L}\right)$ |
| $w_2(t)=\sin\left(\dfrac{\pi y}{L}\right)$ | $w_4(t)=\sin\left(\dfrac{2\pi y}{L}\right)$ | $w_6(t)=\sin\left(\dfrac{3\pi y}{L}\right)$ | $w_8(t)=\sin\left(\dfrac{4\pi y}{L}\right)$ |
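As a rough illustration, the eight kernels in the table can be evaluated on a pixel grid as follows. This is only a sketch: it assumes $L=28$ (the image side length) and integer pixel coordinates $x,y\in\{0,\dots,L-1\}$, neither of which is stated explicitly in the post, and the function names `kernel_value` and `kernel_grid` are hypothetical.

```python
import math

L = 28  # assumption: L is the MNIST image side length


def kernel_value(k, x, y, L=L):
    """Value of kernel w_k (k = 1..8) at pixel (x, y), following the table."""
    m = (k + 1) // 2                 # frequency multiplier: 1,1,2,2,3,3,4,4
    coord = x if k % 2 == 1 else y   # odd k vary along x, even k along y
    return math.sin(m * math.pi * coord / L)


def kernel_grid(k, L=L):
    """L x L grid of w_k values, indexed as grid[y][x]."""
    return [[kernel_value(k, x, y, L) for x in range(L)] for y in range(L)]


# One grid per kernel, keyed by the kernel index used in the table.
grids = {k: kernel_grid(k) for k in range(1, 9)}
```

Odd-numbered kernels vary only along $x$ and even-numbered kernels only along $y$, so each pair $(w_{2m-1}, w_{2m})$ shares the same frequency $m$.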
Next, we compute each $h_k(n,t)$ for the first 100 samples of each digit and take the average and variance across those samples to build the final filters $h_k(n,t)$, shown below:
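The averaging step could be sketched as below. The post does not spell out how a single image is turned into a feature sequence, so this sketch assumes that has already been done and that each sample is a list of values indexed by $t$. The function name `fit_filter` is hypothetical, and the use of the population variance is also an assumption.

```python
from statistics import fmean, pvariance


def fit_filter(samples):
    """Per-time mean and variance for one digit and one kernel.

    samples: list of equal-length feature sequences, one per training image
             (assumed to be precomputed).
    Returns (mean, var), each a list indexed by t.
    """
    columns = list(zip(*samples))             # columns[t] = value at t from every sample
    mean = [fmean(col) for col in columns]    # estimate of h_k(n, t)
    var = [pvariance(col) for col in columns] # per-digit variance at time t
    return mean, var


# Usage with two toy sequences; with real data one would pass samples[:100].
mean, var = fit_filter([[1.0, 2.0], [3.0, 2.0]])
```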
However, by assumption the noise $N_k(t)$ is a function of time $t$ only; it does not depend on the digit $n$. So we further average the variance over all digits at each time $t$.
After $h_k(n,t)$ and $N_k(t)$ are constructed, we can calculate $p_n(t)$ using Bayes' rule: \[\begin{align*} p_n(t)&=P\{X=n|y(1),y(2),\dots,y(t-1),y(t)\}\\\\ &=\dfrac{P\{y(t)|X=n\}\ P\{X=n|y(1),y(2),\dots,y(t-1)\}}{P\{y(t)|y(1),y(2),\dots,y(t-1)\}}\\\\ &\propto P\{y(t)|X=n\}\ p_n(t-1).\end{align*}\] The second line uses $P\{y(t)|X=n,y(1),\dots,y(t-1)\}=P\{y(t)|X=n\}$, i.e., given the digit, the noise is assumed to be independent across time. Note that $P\{y(t)|y(1),y(2),\dots,y(t-1)\}$ does not depend on the digit $n$; therefore, we can write the unnormalized probability $\tilde p_n(t)$ as: \[\tilde p_n(t)=P\{y(t)|X=n\}\ \tilde p_n(t-1),\] where, dropping the Gaussian normalization constants (which do not depend on $n$), \[P\{y(t)|X=n\}\propto\prod_k\exp\left(-\dfrac{(y(t)-h_k(n,t))^2}{2{\sigma_k}^2(t)}\right).\] The relation between $p_n(t)$ and $\tilde p_n(t)$ is: \[p_n(t)=\dfrac{\tilde p_n(t)}{\sum\limits_{m=0}^9\tilde p_m(t)}.\] Furthermore, for coding simplicity, we take $\mu_n(t)=\ln\tilde p_n(t)$, which gives, up to an additive constant that is the same for every $n$, \[\mu_n(t)=-\sum_k\dfrac{(y(t)-h_k(n,t))^2}{2{\sigma_k}^2(t)}+\mu_n(t-1).\]
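The recursion for $\mu_n(t)$ and the normalization back to $p_n(t)$ might look like the following in code. This is a minimal sketch under stated assumptions rather than the post's actual implementation: it assumes one observation per kernel at each time step (written `y_t[k]`, whereas the post writes simply $y(t)$), a uniform prior $\mu_n=0$ before the first observation, and hypothetical function names throughout.

```python
import math


def pool_variance(var_by_digit):
    """Average var_by_digit[n][t] over digits n to get a digit-independent variance per t."""
    return [sum(col) / len(col) for col in zip(*var_by_digit)]


def update_log_posterior(mu_prev, y_t, h_t, sigma2_t):
    """One step of mu_n(t) = mu_n(t-1) - sum_k (y - h_k(n,t))^2 / (2 sigma_k^2(t)).

    mu_prev[n]  : log of the unnormalized posterior for digit n at time t-1
    y_t[k]      : observation for kernel k at time t (assumed: one per kernel)
    h_t[k][n]   : filter value h_k(n, t)
    sigma2_t[k] : pooled noise variance for kernel k at time t
    """
    return [
        mu_prev[n]
        - sum((y_t[k] - h_t[k][n]) ** 2 / (2.0 * sigma2_t[k]) for k in range(len(y_t)))
        for n in range(len(mu_prev))
    ]


def normalize(mu):
    """Convert log unnormalized posteriors to probabilities that sum to 1."""
    top = max(mu)                              # subtract the max to avoid underflow
    weights = [math.exp(v - top) for v in mu]
    total = sum(weights)
    return [w / total for w in weights]


# Toy usage: two candidate digits, one kernel, uniform prior (an assumption).
mu = update_log_posterior([0.0, 0.0], y_t=[1.0], h_t=[[1.0, 3.0]], sigma2_t=[2.0])
p = normalize(mu)
```

Working in the log domain turns the product over kernels and time steps into a running sum, and subtracting the maximum before exponentiating keeps the normalization stable when the $\mu_n(t)$ become very negative.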
The following are 10 test samples under evaluation:
The accuracy table:

