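This post is probably full of mistakes. I would appreciate it if you could point them out in the comments. (Please note that I frequently rewrite things I notice, without any particular announcement.)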

Considering the entropy and KL divergence of the Bernoulli model

Considering \(H_n(x)\) and \(D_n(x\parallel y)\) for \(P_{Ber}\)

Let \(m\) be the number of 1s in \(x^n\), the outcome of \(n\) trials. Then the maximum likelihood estimate \(\hat{\theta}\) of the parameter \(\theta\) of \(P_{Ber}\) is \(\frac{m}{n}\). Expanding the entropy and the KL divergence at \(\hat{\theta}\) gives the following.

$$\begin{array}{rcl}
\displaystyle \hat{\theta} &=&\displaystyle \frac{m}{n}\\
H_n(P)&\overset{\mathrm{def}}{=}& E^{n}_{P}\left[-\log_2{P(X^n)}\right] \\
\displaystyle H(\theta) &\overset{\mathrm{def}}{=}& \displaystyle -\theta\log_{2}{\left(\theta\right)}-\left(1-\theta\right)\log_{2}{\left(1-\theta\right)} \quad\dotso \text{with } P=P_{Ber}\text{, the parameter } \theta \text{ is written as the argument in parentheses} \\
\displaystyle H(\hat{\theta}) &=&\displaystyle -\hat{\theta}\log_{2}{\left(\hat{\theta}\right)}-\left(1-\hat{\theta}\right)\log_{2}{\left(1-\hat{\theta}\right)}\\
&=&\displaystyle -\frac{m}{n}\log_{2}{\left(\frac{m}{n}\right)}-\left(1-\frac{m}{n}\right)\log_{2}{\left(1-\frac{m}{n}\right)}\\
&=&\displaystyle \frac{1}{n}\left\{-m\log_{2}{\left(\frac{m}{n}\right)}-\left(n-m\right)\log_{2}{\left(\frac{n-m}{n}\right)}\right\}\\
&=&\displaystyle \frac{1}{n}\left\{-m\log_{2}{\left(m\right)}+m\log_{2}{\left(n\right)}-\left(n-m\right)\log_{2}{\left(n-m\right)}+\left(n-m\right)\log_{2}{\left(n\right)}\right\}\\
\displaystyle D(\hat{\theta}\parallel \theta) &=&\displaystyle \hat{\theta}\log_{2}{\left(\frac{\hat{\theta}}{\theta}\right)}+\left(1-\hat{\theta}\right)\log_{2}{\left(\frac{1-\hat{\theta}}{1-\theta}\right)}\\
&=&\displaystyle \frac{m}{n}\log_{2}{\left(\frac{\frac{m}{n}}{\theta}\right)}+\left(1-\frac{m}{n}\right)\log_{2}{\left(\frac{1-\frac{m}{n}}{1-\theta}\right)}\\
&=&\displaystyle \frac{1}{n}\left\{m\log_{2}{\left(\frac{\frac{m}{n}}{\theta}\right)}+\left(n-m\right)\log_{2}{\left(\frac{1-\frac{m}{n}}{1-\theta}\right)}\right\}\\
&=&\displaystyle \frac{1}{n}\left\{m\log_{2}{\left(\frac{m}{n}\right)}-m\log_{2}{\left(\theta\right)}+\left(n-m\right)\log_{2}{\left(1-\frac{m}{n}\right)}-\left(n-m\right)\log_{2}{\left(1-\theta\right)}\right\}\\
&=&\displaystyle \frac{1}{n}\left\{m\log_{2}{\left(m\right)}-m\log_{2}{\left(n\right)}-m\log_{2}{\left(\theta\right)}+\left(n-m\right)\log_{2}{\left(n-m\right)}-\left(n-m\right)\log_{2}{\left(n\right)}-\left(n-m\right)\log_{2}{\left(1-\theta\right)}\right\}\\
\displaystyle H(\hat{\theta})+D(\hat{\theta}\parallel \theta) &=&\displaystyle \frac{1}{n}\left\{-m\log_{2}{\left(\theta\right)}-\left(n-m\right)\log_{2}{\left(1-\theta\right)}\right\}\\
\displaystyle n\left\{H(\hat{\theta})+D(\hat{\theta}\parallel \theta)\right\} &=&\displaystyle -m\log_{2}{\left(\theta\right)}-\left(n-m\right)\log_2{\left(1-\theta\right)}\\
\end{array}$$

The likelihood of \(\theta\) given \(x^n\) is \(L(\theta|x^n)=\theta^{m}\left(1-\theta\right)^{n-m}\), so its negative logarithm matches the right-hand side of the last line above.

$$\begin{array}{rcl}
-\log_2{\left( L(\theta|x^n) \right)} &=&-m\log_2{\left( \theta \right)}-\left( n-m \right)\log_2{\left(1-\theta\right)}\\
n\left\{H\left( \hat{\theta} \right)+D\left( \hat{\theta}\parallel \theta \right)\right\} &=&-m\log_2{\left( \theta \right)}-\left( n-m \right)\log_2{\left( 1-\theta \right)}\\
-\log_2{\left( L\left(\theta|x^n\right) \right)} &=&n\left\{H\left( \hat{\theta} \right)+D\left( \hat{\theta}\parallel \theta \right)\right\}\\
&=&nH\left( \hat{\theta} \right)+nD\left( \hat{\theta}\parallel \theta \right)\\
-\log_2{\left( L\left(\hat{\theta}|x^n\right) \right)} &=&nH\left( \hat{\theta} \right)+nD\left( \hat{\theta}\parallel \hat{\theta} \right)\quad\dotso\theta=\hat{\theta}\\
&=&nH\left( \hat{\theta} \right)+n\cdot 0\quad\dotso D\left( \hat{\theta}\parallel \hat{\theta} \right)=0\\
&=&nH\left( \frac{m}{n} \right)\quad\dotso \hat{\theta}=\frac{m}{n}\\
\end{array}$$
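As a sanity check, the identity \(-\log_2 L(\theta|x^n) = nH(\hat{\theta}) + nD(\hat{\theta}\parallel\theta)\) derived above can be verified numerically. The following is only a minimal sketch in Python. The function names `binary_entropy`, `binary_kl` and `neg_log2_likelihood` and the values \(n=10\), \(m=3\), \(\theta=0.5\) are assumptions chosen for illustration; they do not come from the derivation itself. The sketch assumes \(0<\theta<1\) and uses the usual convention \(0\log_2 0=0\).

```python
import math


def binary_entropy(p):
    """H(p) in bits, with the convention 0 * log2(0) = 0."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def binary_kl(p, q):
    """D(p || q) in bits between Bernoulli(p) and Bernoulli(q); needs 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:  # terms with a == 0 contribute nothing
            total += a * math.log2(a / b)
    return total


def neg_log2_likelihood(m, n, theta):
    """-log2 L(theta | x^n) for a sequence with m ones in n trials; needs 0 < theta < 1."""
    return -m * math.log2(theta) - (n - m) * math.log2(1.0 - theta)


# Illustrative values (assumed, not from the text above).
n, m, theta = 10, 3, 0.5
theta_hat = m / n  # maximum likelihood estimate

lhs = neg_log2_likelihood(m, n, theta)
rhs = n * (binary_entropy(theta_hat) + binary_kl(theta_hat, theta))
print(lhs, rhs)  # the two values should agree up to rounding

# At theta = theta_hat the KL term vanishes and only n * H(m / n) remains.
print(neg_log2_likelihood(m, n, theta_hat), n * binary_entropy(theta_hat))
```

With \(\theta=0.5\) every outcome costs exactly one bit, so both sides of the first comparison come out at about 10 bits for \(n=10\), whatever \(m\) is.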
