Notes for Stein Variational Gradient Descent

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

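A concept that comes up constantly in Bayesian machine learning is inference: computing the posterior from a known prior and observed data. Concretely, this is Bayes' rule: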

$p(y|x)=\frac{p(x|y)p(y)}{\int_{\mathcal{Y}}p(x|y)p(y)dy}$

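Arguably, everything in Bayesian machine learning revolves around inference. The biggest practical problem with the formula above is that the integral in the denominator does not always have a closed-form solution. Bayesians have come up with many ways around this, including but not limited to: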

Sampling: MCMC, HMC, etc.
Variational inference: approximate $p(y|x)$ with a tractable $q(y)$, for example
- Mean field: $q(y)=\prod_{i}q_{i}(y_{i})$
- Black-box variational inference: VAE, etc.
Expectation propagation…
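
To make the role of the denominator concrete, here is a minimal sketch that evaluates Bayes' rule on a grid for a toy model where the answer is known. The model is an assumption chosen for illustration and does not come from the paper: prior $y\sim\mathcal{N}(0,1)$, likelihood $x|y\sim\mathcal{N}(y,1)$, one observation $x=1$. The function names and grid settings are hypothetical. In one dimension a Riemann sum is enough to approximate the integral; in high dimensions this brute-force approach becomes infeasible, which is why the methods listed above exist.

```python
import math


def normal_pdf(v, mean, std):
    """Density of N(mean, std^2) evaluated at v."""
    z = (v - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def grid_posterior(x_obs, lo=-10.0, hi=10.0, n=4001):
    """Approximate p(y|x) on a uniform grid (toy model, assumed for illustration)."""
    step = (hi - lo) / (n - 1)
    ys = [lo + i * step for i in range(n)]
    # Numerator of Bayes' rule: likelihood p(x|y) times prior p(y).
    unnorm = [normal_pdf(x_obs, y, 1.0) * normal_pdf(y, 0.0, 1.0) for y in ys]
    # Denominator: the integral over y, approximated by a Riemann sum.
    evidence = sum(unnorm) * step
    post = [u / evidence for u in unnorm]
    return ys, post, evidence


ys, post, evidence = grid_posterior(1.0)
step = ys[1] - ys[0]
post_mean = sum(y * p for y, p in zip(ys, post)) * step
print(f"evidence ~ {evidence:.6f}, posterior mean ~ {post_mean:.6f}")
```

For this conjugate toy model the exact posterior is $\mathcal{N}(0.5, 0.5)$, so the grid estimate of the posterior mean should land close to 0.5.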

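The SVGD algorithm proposed in this paper is also an approximate inference method. Its core idea is to minimize the KL divergence between $q(y)$ and $p(y|x)$ through a sequence of density transformations: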

$q(y)=q_{T} \circ q_{T-1} \circ q_{T-2} \dots \circ q_{1} \circ q_{0}$

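Define the Stein operator $\mathcal{A}_{p}\phi(x):=\phi(x)\nabla_{x}\log p(x)^{T}+\nabla_{x}\phi(x)$ for a smooth vector-valued function $\phi$ (here $x$ denotes a generic variable with density $p$). Then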

$\mathbb{E}_{x\sim{p}}[\mathcal{A}_{p}\phi(x)]=0$
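
As a sanity check, the identity can be verified numerically in one dimension. The sketch below is an illustration under assumed choices, not part of the paper: $p=\mathcal{N}(0,1)$, so $\nabla_{x}\log p(x)=-x$, and the test function is $\phi(x)=\sin x$. All function names are hypothetical. It estimates $\mathbb{E}[\mathcal{A}_{p}\phi(x)]$ by Monte Carlo, first with samples from $p$ itself and then with samples from a shifted distribution $\mathcal{N}(1,1)$.

```python
import math
import random


def score_std_normal(x):
    """Score function d/dx log p(x) for p = N(0, 1)."""
    return -x


def stein_operator(phi, dphi, score, x):
    """1-D Stein operator: phi(x) * score(x) + phi'(x)."""
    return phi(x) * score(x) + dphi(x)


def mc_stein_expectation(sample, n=200_000, seed=0):
    """Monte Carlo estimate of E[A_p phi(x)] with phi = sin and p = N(0, 1).

    `sample` draws one x from the distribution the expectation is taken under.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample(rng)
        total += stein_operator(math.sin, math.cos, score_std_normal, x)
    return total / n


# Samples drawn from p itself: the estimate should be close to zero.
matched = mc_stein_expectation(lambda rng: rng.gauss(0.0, 1.0))
# Samples drawn from N(1, 1) instead of p: the estimate is clearly nonzero.
mismatched = mc_stein_expectation(lambda rng: rng.gauss(1.0, 1.0))
print(f"x ~ p: {matched:+.4f}   x ~ N(1,1): {mismatched:+.4f}")
```

When the samples come from a distribution other than $p$, the expectation no longer vanishes. In this example its exact value is $-\sin(1)e^{-1/2}\approx-0.51$.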

The identity holds true if either