Agent Examples

Agents solving the HighwayEnv environments are available in the eleurent/rl-agents and DLR-RM/stable-baselines3 repositories.

See the quickstart guide for training examples and notebooks.

Deep Q-Network

This model-free value-based reinforcement learning agent performs Q-learning with function approximation, using a neural network to represent the state-action value function Q.
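
As a rough illustration of the Q-learning step, the sketch below computes the bootstrapped regression targets that a Q-network would be trained towards. It uses plain NumPy arrays in place of a neural network, and the function name `q_learning_targets` and its arguments are hypothetical, not part of rl-agents or stable-baselines3.

```python
import numpy as np


def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute r + gamma * max_a' Q(s', a'), without bootstrapping at terminal states.

    next_q_values has shape (batch, n_actions) and stands in for the output
    of a Q-network evaluated on the next states (an assumption of this sketch).
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Greedy value of the next state, masked out where the episode ended.
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)


targets = q_learning_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [1.0, 3.0]],
    dones=[False, True],
    gamma=0.9,
)
```

In an actual agent, the network's prediction Q(s, a) for the action taken would be regressed onto these targets, typically with a replay buffer and a separate target network.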

Reference implementation: eleurent/rl-agents — Deep Q-Network

Deep Deterministic Policy Gradient

This model-free, policy-based reinforcement learning agent optimizes its policy directly by gradient ascent. It uses Hindsight Experience Replay to learn efficiently how to solve a goal-conditioned task.
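
The idea behind Hindsight Experience Replay can be sketched as a relabeling step: a failed episode is stored again as if the goal it actually reached had been the desired one. The snippet below is a minimal sketch of the "final" relabeling strategy only. The transition dictionary layout, the sparse reward, and both function names are assumptions made for illustration and do not reproduce the openai/baselines code.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, tolerance=1e-3):
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    distance = np.linalg.norm(
        np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    )
    return 0.0 if distance < tolerance else -1.0


def relabel_with_final_goal(episode, reward_fn=sparse_reward):
    """Copy an episode, replacing the desired goal with the goal achieved at its end."""
    final_goal = episode[-1]["achieved_goal"]
    return [
        {
            **transition,
            "desired_goal": final_goal,
            # Rewards are recomputed against the substituted goal.
            "reward": reward_fn(transition["achieved_goal"], final_goal),
        }
        for transition in episode
    ]


# A two-step episode that never reaches its original goal at 3.0.
episode = [
    {"observation": [0.0], "action": 0.5, "achieved_goal": [0.5],
     "desired_goal": [3.0], "reward": -1.0},
    {"observation": [0.5], "action": 0.5, "achieved_goal": [1.0],
     "desired_goal": [3.0], "reward": -1.0},
]
relabeled = relabel_with_final_goal(episode)
```

Both the original and the relabeled transitions would then be added to the replay buffer, so that the agent sees at least some successful outcomes even when the original goal is rarely reached.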

Reference implementation: openai/baselines — HER

Value Iteration

Value Iteration is only compatible with finite discrete MDPs, so the environment is first approximated by a finite-mdp environment using env.to_finite_mdp(). This simplified state representation describes the nearby traffic in terms of predicted Time-To-Collision (TTC) on each lane of the road. The transition model is simplistic and assumes that each vehicle will keep driving at a constant speed without changing lanes. This model bias can be a source of mistakes.

The agent then runs Value Iteration on this approximate MDP to compute the corresponding optimal state-value function.
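
The Value Iteration step itself can be sketched on a small deterministic MDP. The code below assumes the MDP is given as two arrays indexed by state and action, one holding next-state indices and one holding rewards. This layout and the three-state toy problem are assumptions for illustration, and may differ from what `env.to_finite_mdp()` returns.

```python
import numpy as np


def value_iteration(transition, reward, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value Iteration for a deterministic finite MDP.

    transition[s, a] is the index of the next state, reward[s, a] the
    immediate reward (assumed layout). Returns the state values and a
    greedy policy.
    """
    transition = np.asarray(transition)
    reward = np.asarray(reward, dtype=float)
    value = np.zeros(transition.shape[0])
    for _ in range(max_iter):
        # Bellman optimality backup: V(s) = max_a [r(s, a) + gamma * V(s')].
        new_value = (reward + gamma * value[transition]).max(axis=1)
        converged = np.max(np.abs(new_value - value)) < tol
        value = new_value
        if converged:
            break
    policy = (reward + gamma * value[transition]).argmax(axis=1)
    return value, policy


# Toy chain: action 0 stays in place, action 1 moves right; entering state 2 pays 1.
transition = [[0, 1], [1, 2], [2, 2]]
reward = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
value, policy = value_iteration(transition, reward, gamma=0.5)
```

On this toy chain the greedy policy moves right from states 0 and 1, which is the behaviour one would expect from the reward placement.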

Reference implementation: eleurent/rl-agents — Value Iteration

Monte-Carlo Tree Search

This agent leverages a transition and reward model to perform a stochastic tree search (Coulom, 2006) for the optimal trajectory. No particular assumption is required on the state representation or transition model.
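
To make the search loop concrete, here is a minimal sketch of one common MCTS variant (UCT selection, random rollouts, undiscounted returns). It is not the rl-agents implementation: the `Node` class, the `mcts` function, the `step` model signature, and the toy one-dimensional problem are all hypothetical, and the sketch assumes a deterministic model so that each action leads to a single child state.

```python
import math
import random


class Node:
    def __init__(self, state, depth):
        self.state = state
        self.depth = depth
        self.children = {}       # action -> Node
        self.visits = 0
        self.total_return = 0.0  # sum of returns observed after the action leading here

    @property
    def value(self):
        return self.total_return / self.visits if self.visits else 0.0


def mcts(root_state, step, actions, horizon, iterations=100, c=math.sqrt(2), rng=None):
    """Return the root action with the highest mean return found by the search.

    step(state, action) -> (next_state, reward) is the assumed model interface.
    """
    rng = rng or random.Random(0)
    root = Node(root_state, 0)
    for _ in range(iterations):
        node, path, rewards = root, [root], []
        # Selection and expansion: follow UCB until an untried action or the horizon.
        while node.depth < horizon:
            untried = [a for a in actions if a not in node.children]
            if untried:
                action = rng.choice(untried)
            else:
                action = max(actions, key=lambda a: node.children[a].value
                             + c * math.sqrt(math.log(node.visits) / node.children[a].visits))
            next_state, reward = step(node.state, action)
            is_new = action not in node.children
            if is_new:
                node.children[action] = Node(next_state, node.depth + 1)
            node = node.children[action]
            path.append(node)
            rewards.append(reward)
            if is_new:
                break
        # Rollout: random actions from the new leaf until the horizon.
        state, ret = node.state, 0.0
        for _ in range(horizon - node.depth):
            state, reward = step(state, rng.choice(actions))
            ret += reward
        # Backup: each node accumulates the return that followed the action leading to it.
        for i in range(len(path) - 1, 0, -1):
            ret += rewards[i - 1]
            path[i].visits += 1
            path[i].total_return += ret
        root.visits += 1
    return max(root.children, key=lambda a: root.children[a].value)


def toy_step(state, action):
    """Toy model: move along a line, rewarded by the position reached."""
    next_state = state + action
    return next_state, float(next_state)


best_action = mcts(root_state=0, step=toy_step, actions=(-1, 1), horizon=2)
```

A stochastic model would require sampling the next state on every visit rather than caching it in the child node, and a planning agent would typically re-run the search from the new state after each executed action.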

Reference implementation: eleurent/rl-agents — Monte-Carlo Tree Search