Agent Examples

Agents solving the HighwayEnv environments are available in the eleurent/rl-agents and DLR-RM/stable-baselines3 repositories.

See the quickstart guide for training examples and notebooks.

Deep Q-Network

The DQN agent solving highway-v0.

This model-free value-based reinforcement learning agent performs Q-learning with function approximation, using a neural network to represent the state-action value function Q.
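As a rough illustration of Q-learning with function approximation, the sketch below applies a single semi-gradient update. To stay self-contained it swaps the neural network for a linear approximator (one weight row per action). The function names, feature vectors, and hyperparameters are assumptions made for this example and are not taken from the rl-agents implementation.

```python
import numpy as np

def q_values(weights, state):
    """Q(s, .) for a linear approximator: one weight row per action."""
    return weights @ state

def q_learning_update(weights, state, action, reward, next_state, done,
                      gamma=0.9, lr=0.1):
    """Apply one semi-gradient Q-learning step in place; return the TD error."""
    # Bootstrapped target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    if done:
        target = reward
    else:
        target = reward + gamma * np.max(q_values(weights, next_state))
    td_error = target - q_values(weights, state)[action]
    # For a linear model, the gradient of Q(s, a) w.r.t. weights[action] is the state.
    weights[action] += lr * td_error * state
    return td_error

# Hypothetical toy transition: 2 actions, 3 state features.
weights = np.zeros((2, 3))
state = np.array([1.0, 0.0, 0.5])
next_state = np.array([0.0, 1.0, 0.5])
td_error = q_learning_update(weights, state, action=1, reward=1.0,
                             next_state=next_state, done=False)
```

A DQN replaces the linear model with a neural network and typically adds a replay buffer and a target network, which this sketch leaves out.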

Reference implementation: eleurent/rl-agents — Deep Q-Network

Deep Deterministic Policy Gradient

The DDPG agent solving parking-v0.

This model-free policy-based reinforcement learning agent learns a deterministic policy that is optimized directly by gradient ascent on the return estimated by a learned critic. It uses Hindsight Experience Replay to efficiently learn how to solve a goal-conditioned task.
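The idea behind Hindsight Experience Replay can be sketched as a relabeling step: transitions from an episode that missed its goal are stored again as if the goal had been the state actually reached. The sketch below uses the "final" relabeling strategy. The transition dictionary layout, the sparse reward, and the tolerance are assumptions made for this example, not the openai/baselines implementation or the parking-v0 reward.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, tol=0.05):
    """Hypothetical sparse reward: 0 if the goal is reached within tol, else -1."""
    return 0.0 if np.linalg.norm(achieved_goal - goal) <= tol else -1.0

def relabel_with_final_goal(episode):
    """Return copies of the transitions with the goal replaced by the final achieved goal."""
    final_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        new_transition = dict(transition, goal=final_goal)
        # Recompute the reward against the substituted goal.
        new_transition["reward"] = sparse_reward(transition["achieved_goal"], final_goal)
        relabeled.append(new_transition)
    return relabeled

# A failed two-step episode: the desired goal (1, 1) is never reached.
goal = np.array([1.0, 1.0])
episode = [
    {"achieved_goal": np.array([0.2, 0.0]), "goal": goal, "reward": -1.0},
    {"achieved_goal": np.array([0.4, 0.1]), "goal": goal, "reward": -1.0},
]
relabeled = relabel_with_final_goal(episode)
```

Both the original and the relabeled transitions would then go into the replay buffer, so that the last relabeled transition provides a successful example even though the episode failed.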

Reference implementation: openai/baselines — HER

Value Iteration

The Value Iteration agent solving highway-v0.

Value Iteration is only compatible with finite discrete MDPs, so the environment is first approximated by a finite-mdp environment using env.to_finite_mdp(). This simplified state representation describes the nearby traffic in terms of predicted Time-To-Collision (TTC) on each lane of the road. The transition model is simplistic and assumes that each vehicle will keep driving at a constant speed without changing lanes. Since real vehicles do accelerate, brake, and change lanes, this model bias can lead the agent to make mistakes.
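Under the constant-speed assumption, the Time-To-Collision with a vehicle ahead in the same lane reduces to the gap divided by the closing speed. The function below is a simplified sketch of that formula only. Its name and parameters are hypothetical, and it does not reproduce the per-lane TTC grid built by the library.

```python
import math

def time_to_collision(gap, ego_speed, other_speed):
    """TTC [s] with a vehicle ahead at distance gap [m], both at constant speed [m/s]."""
    closing_speed = ego_speed - other_speed
    if closing_speed <= 0:
        # The ego vehicle is not catching up: no collision is predicted.
        return math.inf
    return gap / closing_speed

# Ego at 25 m/s, 30 m behind a vehicle driving at 20 m/s.
ttc = time_to_collision(gap=30.0, ego_speed=25.0, other_speed=20.0)
```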

The agent then runs Value Iteration to compute the corresponding optimal state-value function.
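A minimal sketch of Value Iteration on a small tabular MDP is given below. It assumes deterministic transitions stored as an array of next-state indices, and the three-state chain, discount factor, and function name are made up for this example. It is not the rl-agents implementation.

```python
import numpy as np

def value_iteration(transition, reward, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular Value Iteration for a deterministic MDP.

    transition[s, a] is the next state index, reward[s, a] the immediate reward.
    Returns the state values and a greedy policy.
    """
    values = np.zeros(transition.shape[0])
    for _ in range(max_iter):
        # Bellman optimality backup: Q(s, a) = r(s, a) + gamma * V(s').
        q = reward + gamma * values[transition]
        new_values = q.max(axis=1)
        converged = np.max(np.abs(new_values - values)) < tol
        values = new_values
        if converged:
            break
    policy = (reward + gamma * values[transition]).argmax(axis=1)
    return values, policy

# Hypothetical 3-state chain: action 0 stays in place, action 1 moves right.
# Moving from state 1 into the absorbing state 2 yields a reward of 1.
transition = np.array([[0, 1], [1, 2], [2, 2]])
reward = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
values, policy = value_iteration(transition, reward)
```

On this chain, the greedy policy moves right from states 0 and 1, and the value of state 0 is the discounted value of state 1.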

Reference implementation: eleurent/rl-agents — Value Iteration