How regularization affects the critical points in linear networks
$\newcommand{\transpose}{\intercal}$ $\DeclareMathOperator*{\minimize}{minimize}$ Let $X_0\in\mathbb{R}^n$ be a random input vector with distribution $p_X$ and covariance matrix $\Sigma_{X_0}=\mathbb{E}[X_0{X_0}^\transpose]$. Assume the input-output model takes the linear form \[Z=RX_0+\xi,\] where $Z\in\mathbb{R}^n$ is the output and $\xi\in\mathbb{R}^n$ is the noise. The noise $\xi$ is assumed to have distribution $p_\xi$ and to be independent of the input $X_0$, so that $\mathbb{E}[\xi{X_0}^\transpose]=0$.

The problem is to use i.i.d. input-output samples $\{({X_0}^{(k)},Z^{(k)})\}_{k=1}^K$ to learn the weights of a linear feed-forward neural network \[\dfrac{dX_t}{dt}=A_tX_t\] so that it matches the input-output relation $R$. Here $A_t$ are the network weights, $t\in[0,T]$ is the continuous layer index with $T$ the network depth, and $K$ is the total number of training samples. Consider the following regularized form of the optimization problem: \[\begin{align} ...
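To make the setup concrete, here is a minimal numerical sketch of the data model and of the continuous-depth network $\dfrac{dX_t}{dt}=A_tX_t$. The dimensions, the standard normal choice of $p_X$ and $p_\xi$, the noise level, and the forward-Euler discretization with piecewise-constant weights are all illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# Illustrative choices (not specified in the post): dimension, sample count,
# depth T, number of Euler steps, and noise level.
n, K, T = 3, 1000, 1.0
num_steps = 50
sigma_xi = 0.1
rng = np.random.default_rng(0)

# Ground-truth input-output map R (drawn at random here for illustration).
R = rng.standard_normal((n, n))

# Draw i.i.d. samples (X0^(k), Z^(k)) from the linear model Z = R X0 + xi,
# with xi independent of X0.
X0 = rng.standard_normal((K, n))             # inputs, p_X taken as standard normal
xi = sigma_xi * rng.standard_normal((K, n))  # noise, p_xi taken as standard normal
Z = X0 @ R.T + xi

def forward(X0, A_list, dt):
    """Propagate inputs through the linear ODE network dX_t/dt = A_t X_t
    using forward Euler with piecewise-constant weights:
    X_{t+dt} = (I + dt * A_t) X_t."""
    X = X0
    for A in A_list:
        X = X @ (np.eye(A.shape[0]) + dt * A).T
    return X

# One weight matrix per Euler step (small random initialization).
dt = T / num_steps
A_list = [0.01 * rng.standard_normal((n, n)) for _ in range(num_steps)]

# Empirical (unregularized) least-squares loss over the K training samples.
X_T = forward(X0, A_list, dt)
loss = np.mean(np.sum((X_T - Z) ** 2, axis=1))
print(f"empirical loss at initialization: {loss:.4f}")
```

Under this discretization, the network realizes the product $\prod_t (I + \Delta t\, A_t)$, which plays the role of the learned end-to-end map that should approximate $R$; the regularized objective below adds a penalty on the weights $A_t$ to this empirical loss.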