Figure 1: Diffuser
Diffuser is also a sharp departure from traditional reinforcement-learning models: the plans it generates are not unrolled step by step along the time axis, but instead sharpen from blurry to precise over the entire sequence at once. Diffusion models themselves remain a hot research topic in computer vision, and further breakthroughs at the model level are quite possible in the coming years.
That said, diffusion models currently have one notable drawback compared with other generative models: generation is slower. Many experts in related fields expect this to be mitigated within a few years, but for now, generation times of several seconds are hard to accept in reinforcement-learning settings that demand real-time control. Diffuser proposes a way to speed up generation: produce the next plan by starting from the previous step's plan with a small amount of noise added, rather than from pure noise, although doing so degrades the model's performance to some extent.
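The trade-off above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Diffuser's actual implementation: `denoise_step`, `cold_start_plan`, and `warm_start_plan` are hypothetical names, and the "denoiser" is a trivial placeholder standing in for the learned network.

```python
import numpy as np

def denoise_step(traj, t):
    """Stand-in for the learned denoising network: one refinement step.
    A real model would predict and remove noise conditioned on step t."""
    return traj * 0.9

def cold_start_plan(horizon, state_dim, n_steps=50, seed=0):
    """Standard diffusion planning: start the whole trajectory from pure
    noise and refine it jointly over many denoising steps (slow)."""
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=(horizon, state_dim))  # entire plan starts as noise
    for t in reversed(range(n_steps)):
        traj = denoise_step(traj, t)  # every timestep refined together
    return traj

def warm_start_plan(prev_plan, n_steps=10, noise_scale=0.1, seed=0):
    """Speed-up described in the text: add a little noise to the previous
    plan and run only a few denoising steps. Faster, but the plan is
    refined less thoroughly, so performance can drop."""
    rng = np.random.default_rng(seed)
    traj = prev_plan + noise_scale * rng.normal(size=prev_plan.shape)
    for t in reversed(range(n_steps)):
        traj = denoise_step(traj, t)
    return traj
```

Because the warm start begins near a plausible plan, far fewer denoising steps are needed per control step, which is the source of the speedup and of the performance loss.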
References
- Decision Transformer: Reinforcement Learning via Sequence Modeling https://arxiv.org/abs/2106.01345
- Offline Reinforcement Learning as One Big Sequence Modeling Problem https://arxiv.org/abs/2106.02039
- A Generalist Agent https://arxiv.org/abs/2205.06175
- Planning with Diffusion for Flexible Behavior Synthesis https://arxiv.org/abs/2205.09991
- Attention Is All You Need https://arxiv.org/abs/1706.03762
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale https://arxiv.org/abs/2010.11929
- Masked Autoencoders Are Scalable Vision Learners https://arxiv.org/abs/2111.06377
- Relational Deep Reinforcement Learning https://arxiv.org/abs/1806.01830
- Grid-to-Graph: Flexible Spatial Relational Inductive Biases for Reinforcement Learning https://arxiv.org/abs/2102.04220
- Transformers are Meta-Reinforcement Learners https://arxiv.org/abs/2206.06614
- Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions https://arxiv.org/abs/1912.02875