Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
References
- [Paper] Liu & Wang, Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
- [Paper] Liu, Lee & Jordan, A Kernelized Stein Discrepancy for Goodness-of-fit Tests
- [Blog] The Stein Gradient
- [Tutorial slides] Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
- [Paper] Liu et al., Stein Variational Policy Gradient
- [Paper] Haarnoja et al., Reinforcement Learning with Deep Energy-Based Policies
Introduction
A concept that comes up constantly in Bayesian machine learning is inference: the process of obtaining the posterior from a known prior and observed data,
$$p(y|x)=\frac{p(x|y)\,p(y)}{p(x)}=\frac{p(x|y)\,p(y)}{\int p(x|y)\,p(y)\,dy}.$$
One could say that everything in Bayesian learning revolves around inference. The biggest practical problem with the formula above is that the integral in the denominator rarely admits a closed-form solution, and Bayesians have devised many workarounds, including but not limited to:
- Sampling: MCMC, HMC, etc.
- Variational inference: approximate $p(y|x)$ with a tractable $q(y)$, typically by minimizing $\mathrm{KL}(q\,\|\,p)$, e.g.
  - Mean field: $q(y)=\prod_{i}q_{i}(y_{i})$
  - Black box variational inference: VAEs, etc.
- Expectation propagation, etc.
The SVGD algorithm proposed in this paper is another approximate inference method. Its core idea is to minimize the KL divergence between $q(y)$ and $p(y|x)$ through an incremental density transformation, implemented as a deterministic update of a set of particles.
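To preview where this leads: the paper turns that idea into a simple particle update, in which each particle follows a kernel-smoothed gradient of $\log p$ plus a repulsive term that keeps the particles from collapsing onto a mode. Below is a minimal NumPy sketch of that update (the function name `svgd_step`, the step size, and the toy target are my own choices for illustration; the median-heuristic bandwidth follows the authors' reference code):

```python
import numpy as np

def svgd_step(X, score, eps=0.1):
    """One SVGD update on n particles X of shape (n, d).

    score(X) returns grad_x log p(x) at each particle, shape (n, d);
    p only needs to be known up to its normalizing constant.
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = x_i - x_j
    sq = (diff ** 2).sum(axis=-1)                  # squared pairwise distances
    h = np.median(sq) / np.log(n + 1) + 1e-8       # median-heuristic bandwidth
    K = np.exp(-sq / h)                            # RBF kernel matrix
    grad_K = (2.0 / h) * diff * K[..., None]       # grad of k(x_j, x_i) w.r.t. x_j
    phi = (K @ score(X) + grad_K.sum(axis=1)) / n  # driving term + repulsive term
    return X + eps * phi

# Toy check: transport particles toward a standard 1-D Gaussian (score = -x).
rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(200, 1))
for _ in range(1000):
    X = svgd_step(X, lambda x: -x)
print(X.mean(), X.var())  # should approach 0 and 1
```

With a single particle the repulsive term vanishes (for an RBF kernel, $\nabla_{x}k(x,x)=0$), so the update degenerates to plain gradient ascent on $\log p$, i.e. MAP estimation.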
Stein’s identity
Define the operator $\mathcal{A}_{p}\phi(x):=\phi(x)\nabla_{x}\log p(x)^{T}+\nabla_{x}\phi(x)$; then Stein's identity states that
$$\mathbb{E}_{x\sim p}\left[\mathcal{A}_{p}\phi(x)\right]=0.$$
The identity holds if either
- $p(x)\phi(x)=0$ for all $x\in\partial\mathcal{X}$ (when $\mathcal{X}$ is compact), or
- $\lim_{\|x\|\rightarrow\infty}p(x)\phi(x)=0$ when $\mathcal{X}=\mathbb{R}^{d}$.
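Stein's identity is easy to verify numerically. Here is a small Monte Carlo check (the target $p=\mathcal{N}(0,1)$, whose score is $\nabla_{x}\log p(x)=-x$, and the test function $\phi(x)=\sin(x)$ are my own choices; $p(x)\sin(x)\to 0$ as $|x|\to\infty$, so the second condition above holds):

```python
import numpy as np

rng = np.random.default_rng(0)

score = lambda x: -x            # grad_x log p(x) for p = N(0, 1)
phi, grad_phi = np.sin, np.cos  # smooth test function and its derivative

# Monte Carlo estimate of E_p[ A_p phi(x) ] = E_p[ phi(x) score(x) + phi'(x) ]
x = rng.standard_normal(1_000_000)
stein = phi(x) * score(x) + grad_phi(x)
print(stein.mean())  # ~0, up to O(n^{-1/2}) Monte Carlo error
```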