Solution
1
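\[
\begin{aligned}
&Q_{\mathrm{EM}}(\theta \mid \theta') + H(\theta') + \mathrm{KL}(\theta' \,\|\, \theta) \\
&= \sum_{h=1}^{n}\sum_{z_h} \log\bigl(p(x_h, z_h \mid \theta)\bigr)\,\frac{p(x_h, z_h \mid \theta')}{p(x_h \mid \theta')}
 - \sum_{h=1}^{n}\sum_{z_h} p(z_h \mid x_h, \theta') \log\bigl(p(z_h \mid x_h, \theta')\bigr) \\
&\qquad + \sum_{h=1}^{n}\sum_{z_h} p(z_h \mid x_h, \theta') \log\!\left(\frac{p(z_h \mid x_h, \theta')}{p(z_h \mid x_h, \theta)}\right) \\
&= \sum_{h=1}^{n}\sum_{z_h} \log\bigl(p(x_h, z_h \mid \theta)\bigr)\,\frac{p(x_h, z_h \mid \theta')}{p(x_h \mid \theta')}
 - \sum_{h=1}^{n}\sum_{z_h} p(z_h \mid x_h, \theta') \log\bigl(p(z_h \mid x_h, \theta)\bigr) \\
&= \sum_{h=1}^{n}\sum_{z_h} \log\bigl(p(x_h, z_h \mid \theta)\bigr)\, p(z_h \mid x_h, \theta')
 - \sum_{h=1}^{n}\sum_{z_h} p(z_h \mid x_h, \theta') \log\bigl(p(z_h \mid x_h, \theta)\bigr)
 && (\because \text{conditional probability}) \\
&= \sum_{h=1}^{n}\sum_{z_h} p(z_h \mid x_h, \theta') \log\bigl(p(x_h \mid \theta)\bigr)
 && (\because \text{conditional probability}) \\
&= \sum_{h=1}^{n} \log\bigl(p(x_h \mid \theta)\bigr) \sum_{z_h} p(z_h \mid x_h, \theta') \\
&= \sum_{h=1}^{n} \log\bigl(p(x_h \mid \theta)\bigr)
 && (\because \text{marginalization}) \\
&= l(\theta \mid D)
\end{aligned}
\]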
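The decomposition can be sanity-checked numerically. The sketch below assumes a toy two-component Bernoulli mixture, which is not part of the original problem: the model, the data, and the parameter values are all hypothetical and chosen only so that each term can be summed by brute force.

```python
import math

def joint(x, z, theta):
    """p(x, z | theta) for a toy two-component Bernoulli mixture (assumed model)."""
    pi, a, b = theta  # pi = p(z=1), a = p(x=1|z=0), b = p(x=1|z=1)
    pz = pi if z == 1 else 1.0 - pi
    mu = b if z == 1 else a
    return pz * (mu if x == 1 else 1.0 - mu)

def marginal(x, theta):
    """p(x | theta), obtained by summing the joint over z."""
    return sum(joint(x, z, theta) for z in (0, 1))

def posterior(z, x, theta):
    """p(z | x, theta) = p(x, z | theta) / p(x | theta)."""
    return joint(x, z, theta) / marginal(x, theta)

def log_lik(data, theta):
    """l(theta | D) = sum_h log p(x_h | theta)."""
    return sum(math.log(marginal(x, theta)) for x in data)

def q_em(data, theta, theta_old):
    """Q_EM(theta | theta') = sum_h sum_z p(z|x_h,theta') log p(x_h,z|theta)."""
    return sum(posterior(z, x, theta_old) * math.log(joint(x, z, theta))
               for x in data for z in (0, 1))

def entropy(data, theta_old):
    """H(theta') = -sum_h sum_z p(z|x_h,theta') log p(z|x_h,theta')."""
    return -sum(posterior(z, x, theta_old) * math.log(posterior(z, x, theta_old))
                for x in data for z in (0, 1))

def kl(data, theta_old, theta):
    """KL(theta' || theta) between the two posteriors over z, summed over the data."""
    return sum(posterior(z, x, theta_old)
               * math.log(posterior(z, x, theta_old) / posterior(z, x, theta))
               for x in data for z in (0, 1))

data = [1, 0, 1, 1, 0]          # hypothetical observations
theta_old = (0.3, 0.2, 0.7)     # plays the role of theta'
theta = (0.6, 0.4, 0.9)         # an arbitrary different theta

lhs = q_em(data, theta, theta_old) + entropy(data, theta_old) + kl(data, theta_old, theta)
rhs = log_lik(data, theta)
print(lhs, rhs)  # the two values should agree up to floating-point rounding
```

Because the identity holds for every pair of parameters, changing `theta` or `theta_old` to any other valid probabilities should leave the two printed values equal up to rounding.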
2
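\[
\begin{aligned}
\mathrm{KL}(\theta' \,\|\, \theta)\Big|_{\theta=\theta'}
&= \sum_{h=1}^{n}\sum_{z_h} p(z_h \mid x_h, \theta') \log\!\left(\frac{p(z_h \mid x_h, \theta')}{p(z_h \mid x_h, \theta')}\right) \\
&= \sum_{h=1}^{n}\sum_{z_h} p(z_h \mid x_h, \theta') \cdot \log(1) \\
&= 0
\end{aligned}
\]
Substituting this into the decomposition from part 1 shows that, at \(\theta = \theta'\), \(l(\theta \mid D) = Q_{\mathrm{EM}}(\theta \mid \theta') + H(\theta')\).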
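Moreover, since \(H(\theta')\) does not depend on \(\theta\),
\[
\begin{aligned}
\frac{\partial}{\partial\theta}\, l(\theta \mid D)\Big|_{\theta=\theta'}
&= \sum_{h=1}^{n} \frac{\partial}{\partial\theta}\bigl(\log p(x_h \mid \theta)\bigr)\Big|_{\theta=\theta'} \\
&= \sum_{h=1}^{n} \frac{1}{p(x_h \mid \theta')}\,\frac{\partial}{\partial\theta}\, p(x_h \mid \theta)\Big|_{\theta=\theta'}
\\[1ex]
\frac{\partial}{\partial\theta}\bigl(Q_{\mathrm{EM}}(\theta \mid \theta') + H(\theta')\bigr)\Big|_{\theta=\theta'}
&= \sum_{h=1}^{n}\sum_{z_h} \frac{\frac{\partial}{\partial\theta}\, p(x_h, z_h \mid \theta)}{p(x_h, z_h \mid \theta)}\Bigg|_{\theta=\theta'} \frac{p(x_h, z_h \mid \theta')}{p(x_h \mid \theta')} \\
&= \sum_{h=1}^{n} \frac{1}{p(x_h \mid \theta')} \sum_{z_h} \frac{\partial}{\partial\theta}\, p(x_h, z_h \mid \theta)\Big|_{\theta=\theta'} \\
&= \sum_{h=1}^{n} \frac{1}{p(x_h \mid \theta')}\,\frac{\partial}{\partial\theta}\, p(x_h \mid \theta)\Big|_{\theta=\theta'}
\end{aligned}
\]
so the same equality also holds for the first derivatives with respect to \(\theta\).
Hence the claim holds.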
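As a rough numerical illustration of part 2, the sketch below compares values and first derivatives of \(l(\theta \mid D)\) and \(Q_{\mathrm{EM}}(\theta \mid \theta') + H(\theta')\) at \(\theta = \theta'\). It assumes a hypothetical one-parameter Bernoulli mixture in which only the mixing weight is free, and it approximates derivatives by central differences rather than computing them analytically.

```python
import math

A, B = 0.2, 0.7  # fixed p(x=1|z=0) and p(x=1|z=1); assumed values for illustration

def joint(x, z, theta):
    """p(x, z | theta), where theta = p(z=1) is the only free parameter."""
    pz = theta if z == 1 else 1.0 - theta
    mu = B if z == 1 else A
    return pz * (mu if x == 1 else 1.0 - mu)

def marginal(x, theta):
    """p(x | theta), obtained by summing the joint over z."""
    return joint(x, 0, theta) + joint(x, 1, theta)

def log_lik(data, theta):
    """l(theta | D) = sum_h log p(x_h | theta)."""
    return sum(math.log(marginal(x, theta)) for x in data)

def q_plus_h(data, theta, theta_old):
    """Q_EM(theta | theta') + H(theta'), summed term by term over h and z."""
    total = 0.0
    for x in data:
        for z in (0, 1):
            post = joint(x, z, theta_old) / marginal(x, theta_old)  # p(z|x,theta')
            total += post * (math.log(joint(x, z, theta)) - math.log(post))
    return total

def central_diff(f, t, eps=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + eps) - f(t - eps)) / (2.0 * eps)

data = [1, 0, 1, 1, 0]  # hypothetical observations
theta_old = 0.3         # plays the role of theta'

value_gap = log_lik(data, theta_old) - q_plus_h(data, theta_old, theta_old)
grad_l = central_diff(lambda t: log_lik(data, t), theta_old)
grad_q = central_diff(lambda t: q_plus_h(data, t, theta_old), theta_old)
print(value_gap)       # should be zero up to rounding
print(grad_l, grad_q)  # should agree up to finite-difference error
```

Evaluating the same two gradients at a point other than `theta_old` would generally give different numbers, which is consistent with the equality being claimed only at \(\theta = \theta'\).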