Chen Shawn's Blogs


Week Summaries

20190904

Assessing Generalization in Deep Reinforcement Learning

Systematic empirical evaluation shows that vanilla deep RL algorithms generalize better than specialized deep RL algorithms designed specifically for generalization. In other words, simply training on varied environments is so far the most effective strategy for generalization.

The authors run a broad empirical study of generalization across a variety of deep RL algorithms and find that training the agent on varied environments is, so far, the most effective way to improve generalization.

Quantifying Generalization in Reinforcement Learning

A paper from OpenAI that releases a new benchmark called CoinRun. The biggest difference from earlier environments is that the game levels are procedurally generated on the fly, i.e., the agent never encounters the same level twice. The agent therefore cannot learn a policy by rote memorization, which poses a stronger challenge to the generalization ability of DRL.

The authors train a 3-layer CNN with PPO for on the order of 256M timesteps, with an average trajectory length of 100. Some agents are trained on a closed (fixed) set of levels and others on an open (unrestricted) set. The experiments show that every agent overfits to some degree, but, as expected, agents trained on the open set score much higher at test time than those trained on the closed set.
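
A minimal sketch of the closed vs. open training setup, assuming the later `procgen` packaging of CoinRun (the paper itself used OpenAI's standalone `coinrun` repo); the level counts and seeds below are illustrative:

```python
import gym

# Closed training set: the agent only ever sees 500 fixed levels,
# so it can memorize them and overfit.
closed_env = gym.make("procgen:procgen-coinrun-v0",
                      num_levels=500, start_level=0)

# Open training set: num_levels=0 requests unbounded procedural
# generation, so the agent (almost) never sees the same level twice.
open_env = gym.make("procgen:procgen-coinrun-v0",
                    num_levels=0, start_level=0)

# Held-out levels (seeds disjoint from the training ones), used to
# measure the train/test generalization gap.
test_env = gym.make("procgen:procgen-coinrun-v0",
                    num_levels=500, start_level=10_000)

obs = closed_env.reset()
obs, reward, done, info = closed_env.step(closed_env.action_space.sample())
```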

Moreover, the IMPALA CNN architecture generalizes far better than the other CNN architectures tested (which may point to the importance of learning an abstract embedding of the state).
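
A minimal PyTorch sketch of an IMPALA-style residual CNN; the channel sizes (16, 32, 32) follow the IMPALA architecture, while the 64x64 input size and 256-dim hidden layer below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = self.conv1(F.relu(x))
        out = self.conv2(F.relu(out))
        return x + out  # skip connection

class ImpalaCNN(nn.Module):
    def __init__(self, in_channels=3, hidden_dim=256):
        super().__init__()
        blocks, c_in = [], in_channels
        for c_out in (16, 32, 32):
            blocks += [
                nn.Conv2d(c_in, c_out, 3, padding=1),
                nn.MaxPool2d(3, stride=2, padding=1),
                ResidualBlock(c_out),
                ResidualBlock(c_out),
            ]
            c_in = c_out
        self.body = nn.Sequential(*blocks)
        # assumes 64x64 inputs: three stride-2 pools leave an 8x8 feature map
        self.fc = nn.Linear(32 * 8 * 8, hidden_dim)

    def forward(self, x):
        x = self.body(x)
        x = torch.flatten(F.relu(x), start_dim=1)
        return F.relu(self.fc(x))

# 64x64 RGB observations, as in CoinRun
features = ImpalaCNN()(torch.zeros(1, 3, 64, 64))
```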

Some other findings (a rough sketch of these regularizers follows the list):

  • Dropout and L2 regularization: Both noticeably reduce the generalization gap, though L2 regularization has a bigger impact.
  • Data augmentation (a modified Cutout) and batch normalization: both significantly improve generalization.
  • Environmental stochasticity: Training with stochasticity improves generalization to a greater extent than any of the previously mentioned techniques (see the paper for details).
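
A sketch of how these regularizers might be wired into a PyTorch policy network; the dropout rate, weight-decay coefficient, Cutout patch size, and action count are illustrative values, not the paper's settings:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.BatchNorm2d(32),                 # batch normalization on conv features
    nn.ReLU(),
    nn.Dropout(p=0.1),                  # dropout on intermediate features
    nn.Flatten(),
    nn.Linear(32 * 64 * 64, 15),        # 15 discrete actions (illustrative)
)

# L2 regularization, applied as weight decay in the optimizer.
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4, weight_decay=1e-4)

def cutout(obs: torch.Tensor, max_size: int = 16) -> torch.Tensor:
    """Cutout-style augmentation: zero out a random rectangle per batch."""
    obs = obs.clone()
    _, _, h, w = obs.shape
    ch = torch.randint(1, max_size + 1, (1,)).item()
    cw = torch.randint(1, max_size + 1, (1,)).item()
    y = torch.randint(0, h - ch + 1, (1,)).item()
    x = torch.randint(0, w - cw + 1, (1,)).item()
    obs[:, :, y:y + ch, x:x + cw] = 0.0
    return obs

batch = torch.rand(8, 3, 64, 64)        # a batch of 64x64 RGB observations
logits = policy(cutout(batch))
```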