Q-Learning vs. Policy Gradients

  1. Policy Gradients is generally believed to apply to a wider range of problems. For instance, when the Q function (i.e. the action-value function, not the reward function) is too complex to be learned, DQN will fail miserably.
  2. In such cases, Policy Gradients is still capable of learning a good policy, since it operates directly in policy space (see the REINFORCE sketch after this list).
  3. Policy Gradients usually shows a faster convergence rate than DQN, but tends to converge to a local optimum.
  4. Since Policy Gradients models the probabilities of actions, it is capable of learning stochastic policies, whereas DQN's greedy argmax over Q values yields a deterministic policy.
  5. Policy Gradients can be applied naturally to continuous action spaces, since the policy network can output the parameters of a probability distribution (e.g. the mean and variance of a Gaussian; see the second sketch below), whereas DQN has to go through an expensive action-discretization process.
  6. One of the biggest drawbacks of Policy Gradients is the high variance of the gradient estimate; in practice this is usually mitigated with a baseline, as in the sketch below.
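
Below is a minimal REINFORCE sketch in PyTorch that makes points 2, 4, and 6 concrete: the network outputs action probabilities (a stochastic policy), the update moves directly in policy space, and the returns are standardized as a crude baseline against gradient variance. It assumes `torch` and `gymnasium` are installed; the environment, network sizes, and hyperparameters are illustrative choices, not from the original post.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# The policy network outputs logits over actions, i.e. a probability
# distribution: the learned policy is stochastic (point 4).
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 64),
    nn.Tanh(),
    nn.Linear(64, env.action_space.n),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()  # sample instead of argmax
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards through the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Standardizing returns acts as a crude baseline, reducing the
    # gradient variance noted in point 6.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE objective: maximize E[log pi(a|s) * return] by descending
    # its negation, a direct update in policy space (point 2).
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```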

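For point 5, the sketch below shows how a policy network can parameterize a Gaussian over a continuous action space, so no discretization is needed. The class name `GaussianPolicy` and the dimensions are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # hypothetical sizes

class GaussianPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mean = nn.Linear(64, act_dim)
        # State-independent log standard deviation, a common choice.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

policy = GaussianPolicy()
dist = policy(torch.randn(obs_dim))
action = dist.sample()                  # real-valued action vector
log_prob = dist.log_prob(action).sum()  # plugs into the same REINFORCE loss
```

A DQN, by contrast, would have to enumerate a discrete grid over this two-dimensional action space before it could take a max over Q values, which quickly becomes expensive as the action dimension grows.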

Reposted from blog.csdn.net/liyaohhh/article/details/81784036