Entropy and Mutual Information

1. $\textbf{Entropy}$ definition: Let $X$ be a continuous random variable, defined on the probability space $(\Omega,\mathcal{F},\mathcal{P})$ and taking values in $\mathcal{X}\subseteq\mathbb{R}$, with cumulative distribution function (CDF) \[F(x)=\mathcal{P}\{X\leq x\}\] and probability density function (pdf) \[f(x)=\dfrac{dF(x)}{dx},\] both assumed to be continuous. Then the (differential) entropy of $X$ is defined as \[h(X)=-\int_\mathcal{X} f(x)\log f(x)dx,\] where the integration is carried out over the support $\mathcal{X}$ of the random variable.
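As a quick numerical illustration of this definition (a minimal sketch, not from the original derivation; the exponential density and the rate $\lambda=2$ are assumptions chosen only for the example), the snippet below integrates $-f(x)\log f(x)$ over the support and compares the result with the known closed form $1-\log\lambda$. Natural logarithms are used, so the entropy is in nats.

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0                                     # assumed rate of the exponential density
f = lambda x: lam * np.exp(-lam * x)          # pdf of Exp(lam) on the support [0, inf)

# -f(x) log f(x), with log f(x) expanded as log(lam) - lam*x for numerical stability
integrand = lambda x: -f(x) * (np.log(lam) - lam * x)

h_numeric, _ = quad(integrand, 0, np.inf)     # h(X) = -∫ f(x) log f(x) dx over the support
h_closed = 1 - np.log(lam)                    # known closed form for Exp(lam), in nats
print(h_numeric, h_closed)                    # both ≈ 0.3069
```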

Example 1.1 Entropy: The entropy of a normal distribution $X\sim\mathcal{N}(0,\sigma^2)$, with pdf $f(x)=\dfrac{1}{\sqrt{2\pi\sigma^2}}\exp{\left(-\dfrac{x^2}{2\sigma^2}\right)},$ is \[\begin{align*} h(X)&=-\int f(x)\log f(x)dx\\ &=-\int f(x)\left[-\dfrac{1}{2}\log{\left(2\pi\sigma^2\right)}-\dfrac{x^2}{2\sigma^2}\log e\right]dx\\ &=\dfrac{1}{2}\log{\left(2\pi\sigma^2\right)}\int f(x)dx+\dfrac{\log e}{2\sigma^2}\int x^2f(x)dx\\ &=\dfrac{1}{2}\log{\left(2\pi\sigma^2\right)}+\dfrac{1}{2}\log e=\log{\sqrt{2\pi e\sigma^2}}, \end{align*}\] where $\displaystyle\int f(x)dx=1$ and $\displaystyle\int x^2f(x)dx=\sigma^2$.
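A small numerical check of this closed form (an illustrative sketch; the value $\sigma=3$ is an assumption for the example) compares $\log\sqrt{2\pi e\sigma^2}$ with a Monte Carlo estimate of $-\mathbb{E}[\log f(X)]$, again using natural logarithms:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 3.0                                   # assumed standard deviation
x = rng.normal(0.0, sigma, size=1_000_000)    # samples from N(0, sigma^2)

log_f = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)   # log f(x)
h_mc = -log_f.mean()                          # Monte Carlo estimate of -E[log f(X)]
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)                  # log sqrt(2*pi*e*sigma^2)
print(h_mc, h_closed)                         # both ≈ 2.518
```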

Example 1.2 Joint Entropy: Let $\mathbf{X}=(X_1,X_2,\dots,X_n)^\intercal\in\mathcal{X}\subseteq\mathbb{R}^n$ be an $n$-dimensional multivariate normal random vector, $\mathbf{X}\sim\mathcal{N}(\mu_{\mathbf{x}},C)$, with pdf \[f(\mathbf{x})=f(x_1,x_2,\dots,x_n)=\dfrac{1}{\sqrt{(2\pi)^n|C|}}\exp{\left(-\dfrac{1}{2}(\mathbf{x}-\mu_{\mathbf{x}})^\intercal{C}^{-1}(\mathbf{x}-\mu_{\mathbf{x}})\right)},\] where $|\cdot|$ denotes the determinant of a matrix. Then the entropy of $\mathbf{X}$ is \[\begin{align*} h(\mathbf{X})&=-\int f(\mathbf{x})\log f(\mathbf{x})d\mathbf{x}\\ &=-\int f(\mathbf{x})\left[-\dfrac{1}{2}\log{\left((2\pi)^n|C|\right)}-\dfrac{\log e}{2}(\mathbf{x}-\mu_{\mathbf{x}})^\intercal C^{-1}(\mathbf{x}-\mu_{\mathbf{x}})\right]d\mathbf{x}\\ &=\dfrac{1}{2}\log{\left((2\pi)^n|C|\right)}\int f(\mathbf{x})d\mathbf{x}+\dfrac{\log e}{2}\int f(\mathbf{x})(\mathbf{x}-\mu_{\mathbf{x}})^\intercal C^{-1}(\mathbf{x}-\mu_{\mathbf{x}})d\mathbf{x}\\ &=\dfrac{1}{2}\log{\left((2\pi)^n|C|\right)}+\dfrac{n\log e}{2}=\log{\sqrt{(2\pi e)^n|C|}}, \end{align*}\] where $\displaystyle\int f(\mathbf{x})d\mathbf{x}=1$, and $\displaystyle\int (\mathbf{x}-\mu_{\mathbf{x}})^\intercal C^{-1}(\mathbf{x}-\mu_{\mathbf{x}})f(\mathbf{x})d\mathbf{x}=n$, the latter following from $\mathbb{E}\left[(\mathbf{X}-\mu_{\mathbf{x}})^\intercal C^{-1}(\mathbf{X}-\mu_{\mathbf{x}})\right]=\operatorname{tr}\left(C^{-1}C\right)=\operatorname{tr}(I_n)=n$.
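The same closed form can be checked numerically (a sketch with an assumed mean vector and $2\times 2$ covariance matrix) against SciPy's built-in entropy for a multivariate normal, which also reports values in nats:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])                    # assumed mean vector
C = np.array([[2.0, 0.5],                     # assumed covariance matrix
              [0.5, 1.0]])
n = len(mu)

sign, logdet = np.linalg.slogdet(C)           # log|C|, computed in a numerically stable way
h_closed = 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)   # log sqrt((2*pi*e)^n |C|)
h_scipy = multivariate_normal(mean=mu, cov=C).entropy()
print(h_closed, h_scipy)                      # both ≈ 3.118
```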

Properties of $\textbf{Entropy}$:
(i) Invariance to translation: \[h(X+a)=h(X).\]
(ii) Not invariant to a change of scale: \[h(aX)=h(X)+\log |a|,\] or, in vector form, \[h(A\mathbf{X})=h(\mathbf{X})+\log|\text{det}(A)|.\]
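Both properties are easy to verify numerically with the Gaussian closed form derived above (a minimal sketch; the values of $\sigma$, $a$, and the matrix $A$ are assumptions chosen only for illustration):

```python
import numpy as np

def h_gauss(sigma):
    """Entropy of N(0, sigma^2) in nats: log sqrt(2*pi*e*sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

sigma, a = 1.5, 4.0                           # assumed values for illustration
# (i) translation: X + a has the same density shape, so h(X + a) = h(X).
# (ii) scaling: aX ~ N(0, a^2 sigma^2), so h(aX) equals h(X) + log|a|.
print(h_gauss(a * sigma), h_gauss(sigma) + np.log(abs(a)))   # equal

# Vector form: for X ~ N(0, C), AX ~ N(0, A C A^T), and the |C|-dependent part of
# the entropy satisfies (1/2) log|A C A^T| = (1/2) log|C| + log|det A|.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
C = np.eye(2)
lhs = 0.5 * np.linalg.slogdet(A @ C @ A.T)[1]
rhs = 0.5 * np.linalg.slogdet(C)[1] + np.log(abs(np.linalg.det(A)))
print(lhs, rhs)                               # equal
```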
