Agent Examples
Agents solving the HighwayEnv environments are available in the eleurent/rl-agents and DLR-RM/stable-baselines3 repositories.
See the quickstart guide for training examples and notebooks.
Deep Q-Network
The DQN agent solving highway-v0.
This model-free value-based reinforcement learning agent performs Q-learning with function approximation, using a neural network to represent the state-action value function Q.
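The update rule behind this description can be sketched in a few lines. This is a minimal illustration, not the reference implementation: it assumes a linear approximator (one weight vector per action) standing in for the neural network, and it leaves out the replay buffer and target network that DQN normally uses. The names `q_values` and `q_learning_step` are hypothetical.

```python
def q_values(weights, state):
    """Approximate Q(s, a) for every action as a dot product w_a . s."""
    return [sum(w * x for w, x in zip(w_a, state)) for w_a in weights]


def q_learning_step(weights, state, action, reward, next_state, done,
                    gamma=0.9, lr=0.1):
    """Apply one semi-gradient Q-learning update for a single transition."""
    # Bootstrapped target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    target = reward
    if not done:
        target += gamma * max(q_values(weights, next_state))
    td_error = target - q_values(weights, state)[action]
    # For a linear model, the gradient of Q(s, a) w.r.t. w_a is the state itself.
    weights[action] = [w + lr * td_error * x
                       for w, x in zip(weights[action], state)]
    return td_error


# Two one-hot states, two actions, all weights initialised to zero.
weights = [[0.0, 0.0], [0.0, 0.0]]
q_learning_step(weights, [1.0, 0.0], action=1, reward=1.0,
                next_state=[0.0, 1.0], done=True)
q_learning_step(weights, [0.0, 1.0], action=0, reward=0.0,
                next_state=[1.0, 0.0], done=False)
```

A neural-network agent replaces the explicit weight update with a gradient step on the squared TD error, but the target is computed in the same way.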
Reference implementation: eleurent/rl-agents — Deep Q-Network
Deep Deterministic Policy Gradient
The DDPG agent solving parking-v0.
This model-free, policy-based reinforcement learning agent optimizes its policy directly by gradient ascent. It uses Hindsight Experience Replay (HER) to learn a goal-conditioned task efficiently.
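The relabeling idea at the core of Hindsight Experience Replay can be sketched as follows. This is an illustration under stated assumptions, not the openai/baselines code: transitions are plain dictionaries, each `achieved_goal` is the goal reached after the action, the reward is a sparse 0/-1 signal with a hypothetical `tolerance`, and only the "final" relabeling strategy is shown.

```python
def sparse_reward(achieved_goal, desired_goal, tolerance=0.5):
    """Return 0 when the goal is reached within tolerance, -1 otherwise."""
    distance = max(abs(a - g) for a, g in zip(achieved_goal, desired_goal))
    return 0.0 if distance <= tolerance else -1.0


def relabel_with_final_goal(episode):
    """Copy an episode, pretending its goal was the last state actually reached."""
    hindsight_goal = episode[-1]["achieved_goal"]
    return [
        {
            **transition,
            "desired_goal": hindsight_goal,
            # The reward is recomputed against the substituted goal.
            "reward": sparse_reward(transition["achieved_goal"], hindsight_goal),
        }
        for transition in episode
    ]


# A failed episode: the agent never gets close to the goal at (10, 0).
episode = [
    {"achieved_goal": (x, 0.0), "desired_goal": (10.0, 0.0), "reward": -1.0}
    for x in (1.0, 2.0, 3.0)
]
relabeled = relabel_with_final_goal(episode)
```

Storing both `episode` and `relabeled` in the replay buffer gives the off-policy learner at least one successful transition per episode, even when the original goal was never reached.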
Reference implementation: openai/baselines — HER
Value Iteration
The Value Iteration agent solving highway-v0.
Value Iteration is only compatible with finite discrete MDPs, so the environment is first approximated by a finite-mdp environment using env.to_finite_mdp(). This simplified state representation describes the nearby traffic in terms of predicted Time-To-Collision (TTC) on each lane of the road. The transition model is simplistic: it assumes that each vehicle keeps driving at a constant speed without changing lanes. This model bias can lead the agent to make mistakes when other vehicles behave differently.
The agent then runs Value Iteration to compute the corresponding optimal state-value function.
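As an illustration of the planning step, here is a minimal sketch of Value Iteration on a small deterministic finite MDP. The table layout (`transition[s][a]` gives the next state, `reward[s][a]` the immediate reward) and the three-state toy problem are assumptions made for this example; they are not the output of env.to_finite_mdp().

```python
def value_iteration(transition, reward, gamma=0.9, iterations=100):
    """Repeatedly apply the Bellman optimality backup to every state."""
    values = [0.0] * len(transition)
    for _ in range(iterations):
        values = [
            max(reward[s][a] + gamma * values[transition[s][a]]
                for a in range(len(transition[s])))
            for s in range(len(transition))
        ]
    return values


def greedy_policy(transition, reward, values, gamma=0.9):
    """Pick, in each state, the action with the best one-step lookahead value."""
    return [
        max(range(len(transition[s])),
            key=lambda a: reward[s][a] + gamma * values[transition[s][a]])
        for s in range(len(transition))
    ]


# Toy chain: action 0 stays in place, action 1 moves right.
# Moving from state 1 to the absorbing state 2 pays a reward of 1.
transition = [[0, 1], [1, 2], [2, 2]]
reward = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

values = value_iteration(transition, reward)
policy = greedy_policy(transition, reward, values)
```

The agent acts by following the greedy policy derived from the converged values, so any error in the transition table carries over directly to the chosen actions.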
Reference implementation: eleurent/rl-agents — Value Iteration
Monte-Carlo Tree Search
The MCTS agent solving highway-v0.
This agent leverages a transition and reward model to perform a stochastic tree search (Coulom, 2006) for the optimal trajectory. It requires no particular assumption about the state representation or transition model.
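The search loop can be sketched as below. This is a compact UCT-style variant written for illustration, not the rl-agents implementation. To stay short it assumes a deterministic model, passed in as a hypothetical `step(state, action)` function returning `(next_state, reward, done)`. The toy `corridor_step` model and all parameter values are likewise assumptions.

```python
import math
import random


class Node:
    def __init__(self, state, done=False):
        self.state, self.done = state, done
        self.children = {}  # action -> (reward, child Node)
        self.visits, self.value = 0, 0.0  # value: running mean of returns


def rollout(state, done, step, actions, depth, gamma, rng):
    """Estimate a leaf's return by playing uniformly random actions."""
    ret, discount = 0.0, 1.0
    while not done and depth > 0:
        state, reward, done = step(state, rng.choice(actions))
        ret += discount * reward
        discount *= gamma
        depth -= 1
    return ret


def simulate(node, step, actions, depth, gamma, c, rng):
    """Run one simulation from this node and return its sampled return."""
    if node.done or depth == 0:
        ret = 0.0
    else:
        untried = [a for a in actions if a not in node.children]
        if untried:  # expansion: add one new child and evaluate it by rollout
            action = rng.choice(untried)
            state, reward, done = step(node.state, action)
            child = Node(state, done)
            node.children[action] = (reward, child)
            child.visits, child.value = 1, rollout(
                state, done, step, actions, depth - 1, gamma, rng)
            ret = reward + gamma * child.value
        else:  # selection: mean return plus a UCB exploration bonus
            def ucb(a):
                reward, child = node.children[a]
                bonus = c * math.sqrt(math.log(node.visits) / child.visits)
                return reward + gamma * child.value + bonus
            action = max(actions, key=ucb)
            reward, child = node.children[action]
            ret = reward + gamma * simulate(
                child, step, actions, depth - 1, gamma, c, rng)
    node.visits += 1
    node.value += (ret - node.value) / node.visits
    return ret


def mcts(root_state, step, actions, iterations=500, depth=6,
         gamma=0.9, c=1.0, rng=None):
    """Return the most visited root action after a fixed simulation budget."""
    rng = rng or random.Random(0)
    root = Node(root_state)
    for _ in range(iterations):
        simulate(root, step, actions, depth, gamma, c, rng)
    return max(root.children, key=lambda a: root.children[a][1].visits)


def corridor_step(position, action):
    """Toy model: reaching +3 pays 1, stepping to -1 ends the episode with 0."""
    position += action
    if position >= 3:
        return position, 1.0, True
    return position, 0.0, position <= -1


best_action = mcts(0, corridor_step, actions=(-1, 1))
```

Handling a stochastic model would require sampling the next state on every visit instead of caching a single child per action.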
Reference implementation: eleurent/rl-agents — Monte-Carlo Tree Search