Parking#

A goal-conditioned continuous control task in which the ego-vehicle must park in a given space with the appropriate heading.

https://raw.githubusercontent.com/eleurent/highway-env/gh-media/docs/media/parking-env.gif

Usage#

import gym
import highway_env  # importing highway_env registers its environments, including parking-v0

env = gym.make("parking-v0")
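Continuing from this snippet, a minimal interaction loop could look as follows. With the default KinematicsGoal observation the environment returns a goal-style dictionary holding observation, achieved_goal and desired_goal arrays; the loop below just samples random continuous actions and assumes the Gym 0.26+ reset/step signatures, so treat it as a sketch rather than a reference implementation.

obs, info = env.reset()

# Goal-conditioned observation: the agent's current state alongside the
# achieved and desired parking goals.
print(obs["achieved_goal"], obs["desired_goal"])

done = truncated = False
while not (done or truncated):
    action = env.action_space.sample()  # random continuous control command
    obs, reward, done, truncated, info = env.step(action)
env.close()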

Default configuration#

{
    "observation": {
        "type": "KinematicsGoal",
        "features": ['x', 'y', 'vx', 'vy', 'cos_h', 'sin_h'],
        "scales": [100, 100, 5, 5, 1, 1],
        "normalize": False
    },
    "action": {
        "type": "ContinuousAction"
    },
    "simulation_frequency": 15,
    "policy_frequency": 5,
    "screen_width": 600,
    "screen_height": 300,
    "centering_position": [0.5, 0.5],
    "scaling": 7
    "show_trajectories": False,
    "render_agent": True,
    "offscreen_rendering": False
}

More specifically, it is defined in:

classmethod ParkingEnv.default_config() → dict

Default environment configuration.

Can be overloaded in environment implementations, or by calling configure().

Returns:

a configuration dict
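These defaults can be overridden before an episode starts, for instance through the configure() method mentioned above. The snippet below is a sketch; depending on the gym version, the method may have to be reached through env.unwrapped.

import gym
import highway_env

env = gym.make("parking-v0")
# Override selected fields of the default configuration, then reset so
# that the new values take effect.
env.unwrapped.configure({
    "simulation_frequency": 30,
    "screen_width": 800,
})
env.reset()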

API#

class highway_env.envs.parking_env.ParkingEnv(config: dict | None = None, render_mode: str | None = None)

A continuous control environment.

It implements a reach-type task, where the agent observes its position and speed and must control its acceleration and steering so as to reach a given goal.

Credits to Munir Jojo-Verge for the idea and initial implementation.

classmethod default_config() → dict

Default environment configuration.

Can be overloaded in environment implementations, or by calling configure().

Returns:

a configuration dict

define_spaces() → None

Set the types and spaces of observation and action from config.
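For example, the spaces built from the default configuration above can be inspected directly; their exact shapes depend on the configured observation and action types.

import gym
import highway_env

env = gym.make("parking-v0")
# Spaces created by define_spaces() from the "observation" and "action"
# entries of the configuration: a goal-style Dict observation space and a
# continuous Box action space.
print(env.observation_space)
print(env.action_space)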

compute_reward(achieved_goal: ndarray, desired_goal: ndarray, info: dict, p: float = 0.5) → float

Proximity to the goal is rewarded.

We use a weighted p-norm.

Parameters:
  • achieved_goal – the goal that was achieved

  • desired_goal – the goal that was desired

  • info (dict) – any supplementary information

  • p – the exponent of the weighted p-norm used in the reward. Use p < 1 to obtain high kurtosis for rewards in [0, 1]

Returns:

the corresponding reward
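As a rough illustration of such a weighted p-norm reward, the sketch below uses a hypothetical weights vector over the goal features [x, y, vx, vy, cos_h, sin_h]; it mirrors the description above rather than reproducing the environment's exact implementation.

import numpy as np

def weighted_p_norm_reward(achieved_goal: np.ndarray,
                           desired_goal: np.ndarray,
                           weights: np.ndarray,
                           p: float = 0.5) -> float:
    # Negative weighted p-norm of the goal error: 0 at the goal, decreasing
    # with distance. With p < 1 the penalty grows steeply for small errors,
    # so near-zero penalties are only reached very close to the goal.
    return -float(np.power(np.dot(np.abs(achieved_goal - desired_goal), weights), p))

# Hypothetical weights emphasising position over heading and velocity
weights = np.array([1.0, 0.3, 0.0, 0.0, 0.02, 0.02])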