Plotting curves with a variance band using matplotlib

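In reinforcement learning papers you often see a convergence curve surrounded by a lightly shaded band. I was long unsure what this band actually represents, and it seems to mean different things in different papers.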

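For example, in some papers the band fluctuates sharply over time and represents the standard deviation of results across different random seeds, whereas in most Berkeley and OpenAI papers the band is fairly smooth and appears to represent a moving average of the standard deviation.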

It was only when reading the TD3 paper that I found the plotting method stated explicitly, in the caption of Figure 5:

The shaded region represents half a standard deviation of the average evaluation over 10 trials. Curves are smoothed uniformly for visual clarity.
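
One possible reading of that caption, as a rough NumPy sketch: average the evaluation returns over the trials, take half the standard deviation across trials as the band width, then smooth with a uniform (box) moving average. The array shape, the window size of 5, the synthetic data, and the helper names `smooth_uniform` and `band_from_trials` are all assumptions made for illustration. The caption does not say whether the band itself is smoothed, so smoothing both the mean and the band here is a guess rather than the TD3 authors' procedure.

```python
import numpy as np


def smooth_uniform(values, window=5):
    """Moving average with a uniform (box) kernel; output length equals input length."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # Pad with edge values so the ends of the curve are not pulled toward zero.
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(values, (pad_left, pad_right), mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def band_from_trials(returns, window=5):
    """returns: array-like of shape (n_trials, n_evaluations)."""
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean(axis=0)            # average over trials
    half_std = 0.5 * returns.std(axis=0)   # half a standard deviation over trials
    mean_s = smooth_uniform(mean, window)
    half_s = smooth_uniform(half_std, window)
    return mean_s, mean_s - half_s, mean_s + half_s


# Synthetic stand-in for 10 trials with 50 evaluations each (not real results).
rng = np.random.default_rng(0)
returns = np.linspace(0.0, 100.0, 50) + rng.normal(0.0, 10.0, size=(10, 50))
mean, lower, upper = band_from_trials(returns)
```

The three returned arrays can be passed to a line plot and to fill_between directly. Setting `window=1` disables the smoothing, which gives the sharply fluctuating kind of band described earlier.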

matplotlib.pyplot.fill_between

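The most important function for producing this kind of plot is matplotlib.pyplot.fill_between. See the official documentation for details on the parameters; the function prototype is as follows: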

matplotlib.pyplot.fill_between(x, 
                               y1, 
                               y2=0, 
                               where=None,
                               interpolate=False, 
                               step=None, *, 
                               data=None, **kwargs)

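The function fills the area between two curves with a specified color. When plotting, we only need to compute the lower and upper bounds of the band ourselves and then fill the region between them with this function.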

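Note that the alpha parameter should be set to a small value. A lower alpha makes the filled region more transparent, so it does not cover the curve that was already drawn.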
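
As a minimal illustration of the call itself, the sketch below draws a curve and fills a band of constant width around it. The sine curve, the band width of 0.3, and the `Agg` backend (chosen only so the snippet runs without a display) are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is required
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(100)
y = np.sin(x / 10.0)            # the curve to draw
band = np.full_like(y, 0.3)     # constant band half-width, for illustration only

fig, ax = plt.subplots()
ax.plot(x, y, color="tab:blue")
# y1 and y2 are the lower and upper bounds; alpha=0.2 keeps the band translucent.
poly = ax.fill_between(x, y - band, y + band, color="tab:blue", alpha=0.2)
```

With `alpha=1.0` the band would be fully opaque. Call `fig.savefig(...)` or switch to an interactive backend and call `plt.show()` to view the result.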

Implementation

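import matplotlib.pyplot as plt
import numpy as np

# Suppose variable `reward_sum` is a list containing all the reward summary scalars;
# `reward_mean` and `reward_std` are assumed to have been computed from it beforehand.
def plot_with_variance(reward_mean, reward_std, color='yellow', savefig_dir=None):
    """Plot the mean reward curve with a shaded band of half a standard deviation.

        reward_mean: list or array, the means of the reward summary scalars collected during training
        reward_std: list or array, the corresponding standard deviations (same length as reward_mean)
        savefig_dir: if not None, this must be a str giving the file path to save the figure to (SVG)
    """
    reward_mean = np.asarray(reward_mean, dtype=float)
    # Convert to an array first: dividing a plain Python list by 2.0 raises TypeError.
    half_reward_std = np.asarray(reward_std, dtype=float) / 2.0
    lower = reward_mean - half_reward_std
    upper = reward_mean + half_reward_std
    plt.figure()
    xaxis = np.arange(len(reward_mean))
    plt.plot(xaxis, reward_mean, color=color)
    # A low alpha keeps the band translucent so the mean curve stays visible.
    plt.fill_between(xaxis, lower, upper, color=color, alpha=0.2)
    plt.grid()
    plt.xlabel('Episode')
    plt.ylabel('Average reward')
    plt.title('The convergence of rewards')
    # Save before plt.show(), since the figure may be cleared once the window closes.
    if isinstance(savefig_dir, str):
        plt.savefig(savefig_dir, format='svg')
    plt.show()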