CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley), No. 4: Policy gradients introduction

The green bars are the reward function; the blue curve is the probability of the different trajectories.

If all the green bars are raised by the same amount (the yellow bars), the policy gradient estimated from the samples changes!
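The effect of shifting every reward by the same constant can be checked numerically. The sketch below is only an illustration under assumed values: the three-trajectory softmax policy, the reward numbers, the shift of 10, and the fixed set of sampled trajectories are all hypothetical and do not come from the lecture. It compares the sample-based gradient estimate with the exact expected gradient, before and after the shift.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over trajectory logits.
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_log_prob(logits, k):
    # Gradient of log softmax(logits)[k] w.r.t. logits: onehot(k) - softmax(logits).
    g = -softmax(logits)
    g[k] += 1.0
    return g

def sampled_gradient(logits, samples, rewards):
    # Sample-based estimate: average of grad log pi(tau_i) * r(tau_i).
    return np.mean([grad_log_prob(logits, k) * rewards[k] for k in samples], axis=0)

def exact_gradient(logits, rewards):
    # Exact expectation: sum over all trajectories, weighted by their probability.
    p = softmax(logits)
    return sum(p[k] * grad_log_prob(logits, k) * rewards[k] for k in range(len(p)))

# Hypothetical setup: three possible trajectories and their rewards ("green bars").
logits = np.array([0.0, 0.5, -0.5])
rewards = np.array([-2.0, 1.0, 3.0])
shifted = rewards + 10.0          # same constant added everywhere ("yellow bars")
samples = [0, 1, 1]               # a fixed, small batch of sampled trajectory indices

g_raw = sampled_gradient(logits, samples, rewards)
g_shift = sampled_gradient(logits, samples, shifted)
e_raw = exact_gradient(logits, rewards)
e_shift = exact_gradient(logits, shifted)

print("sampled estimate, original rewards:", g_raw)
print("sampled estimate, shifted rewards: ", g_shift)
print("exact gradient, original rewards:  ", e_raw)
print("exact gradient, shifted rewards:   ", e_shift)
```

In this toy setting the two sampled estimates differ, while the two exact gradients agree, because the probability-weighted sum of the log-probability gradients is zero. The shift therefore only matters when the gradient is estimated from a finite number of samples.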


Reposted from www.cnblogs.com/ecoflex/p/9085805.html