If you’re looking to get started with reinforcement learning, OpenAI Gym is undeniably the most popular choice for implementing environments to train your agents. It ships with a wide range of environments that are used as benchmarks for proving the efficacy of new research methods, and it also provides an easy API for implementing your own environments.
In this article, I will introduce the basic building blocks of OpenAI Gym. Here is a list of things covered in this article:

- Installing `gym`
- The `Env` class and its `reset`, `step`, and `render` functions
- Observation and action spaces (`Box`, `Discrete`)
- Wrappers for modifying environments
- Vectorized environments with `baselines`
So let’s get started.
The first thing we do is to make sure we have the latest version of `gym` installed. One can either use `conda` or `pip` to install `gym`. In our case, we’ll use `pip`.
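In a terminal, this amounts to something like the following (`gym` is the standard package name on PyPI):

```bash
pip install gym
```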
The fundamental building block of OpenAI Gym is the `Env` class. It is a Python class that essentially implements a simulator of the environment you want to train your agent in. OpenAI Gym comes packed with a lot of environments, such as one where you can move a car up a hill, balance a swinging pendulum, or score well on Atari games. Gym also gives you the ability to create custom environments.
We start with an environment called `MountainCar`, where the objective is to drive a car up a mountain. The car is on a one-dimensional track, positioned between two “mountains”. The goal is to drive up the mountain on the right; however, the car’s engine is not strong enough to scale the mountain in a single go. Therefore, the only way to succeed is to drive back and forth to build up momentum.
The goal of the Mountain Car Environment is to gain momentum and reach the flag.
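The environment itself is created with `gym.make`. A minimal sketch, using the standard `MountainCar-v0` id from the classic-control suite:

```python
import gym

# Create the MountainCar environment.
env = gym.make("MountainCar-v0")
```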
The basic structure of the environment is described by the `observation_space` and the `action_space` attributes of the Gym `Env` class.
The `observation_space` defines the structure as well as the legitimate values for the observation of the state of the environment. The observation can be different things for different environments. The most common form is a screenshot of the game. There can be other forms of observations as well, such as certain characteristics of the environment described in vector form.
Similarly, the `Env` class also defines an attribute called the `action_space`, which describes the numerical structure of the legitimate actions that can be applied to the environment.
The observation for the mountain car environment is a vector of two numbers representing velocity and position. The middle point between the two mountains is taken to be the origin, with right being the positive direction and left being the negative direction.
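Continuing with the `env` created above, we can simply print both spaces to inspect them:

```python
# Assumes `env` is the MountainCar environment created earlier.
print("The observation space: {}".format(env.observation_space))
print("The action space: {}".format(env.action_space))
```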
We see that the observation space and the action space are represented by classes called `Box` and `Discrete`, respectively. These are two of the various data structures provided by `gym` to implement observation and action spaces for different kinds of scenarios (discrete action space, continuous action space, and so on). We will dig further into these later in the article.
In this section, we cover the functions of the `Env` class that help the agent interact with the environment. Two such important functions are:
- `reset`: This function resets the environment to its initial state, and returns the observation of the environment corresponding to that initial state.
- `step`: This function takes an action as an input and applies it to the environment, which leads to the environment transitioning to a new state. The `step` function returns four things:
  - `observation`: The observation of the state of the environment.
  - `reward`: The reward that you get from the environment after executing the action that was given as the input to the `step` function.
  - `done`: Whether the episode has been terminated. If true, you may need to end the simulation or reset the environment to restart the episode.
  - `info`: Additional information depending on the environment, such as the number of lives left, or general information that may be conducive to debugging.

Let us now see an example that illustrates the concepts discussed above. We first begin by resetting the environment, then we inspect an observation. We then apply an action and inspect the new observation.
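A minimal sketch of this, assuming the classic Gym API in which `reset` returns only the observation and `step` returns four values:

```python
import gym

env = gym.make("MountainCar-v0")

# Reset the environment and inspect the initial observation.
obs = env.reset()
print("The initial observation is {}".format(obs))

# Sample a random action from the action space of the environment.
random_action = env.action_space.sample()

# Apply the action and inspect the new observation.
new_obs, reward, done, info = env.step(random_action)
print("The new observation is {}".format(new_obs))
```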
In this case, our observation is not a screenshot of the task being performed. In many other environments (like Atari, as we will see), the observation is a screenshot of the game. In either scenario, if you want to see how the environment looks in its current state, you can use the `render` method.
This should display the environment in its current state in a pop-up window. You can close the window using the `close` function.
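For example (again assuming the classic Gym API, where `render` takes a `mode` argument):

```python
# Pops up a window showing the current state of the environment.
env.render(mode="human")

# Close the rendering window.
env.close()
```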
If you want to see a screenshot of the game as an image, rather than as a pop-up window, you should set the `mode` argument of the `render` function to `rgb_array`.
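Something along these lines should work, assuming `matplotlib` is available for displaying the returned array:

```python
import matplotlib.pyplot as plt

# In the classic Gym API, render(mode="rgb_array") returns the frame as a NumPy array.
env_screen = env.render(mode="rgb_array")
env.close()

plt.imshow(env_screen)
plt.show()
```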
Collecting all the little blocks of code we have covered so far, the typical code for running your agent inside the `MountainCar` environment would look like the following. In our case we just take random actions, but you can have an agent that does something more intelligent based on the observation you get.
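A sketch of such a random-agent loop, under the same classic Gym API assumptions as above:

```python
import time
import gym

env = gym.make("MountainCar-v0")

# Number of steps to run the agent for.
num_steps = 1500

obs = env.reset()

for step in range(num_steps):
    # Take a random action; a smarter agent would choose it based on `obs`.
    action = env.action_space.sample()

    # Apply the action to the environment.
    obs, reward, done, info = env.step(action)

    # Render the environment and slow things down so we can watch.
    env.render()
    time.sleep(0.001)

    # If the episode is over, start a new one.
    if done:
        obs = env.reset()

env.close()
```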
The `observation_space` for our environment was `Box(2,)`, and the `action_space` was `Discrete(3)`. What do these actually mean? Both `Box` and `Discrete` are types of data structures called “Spaces”, provided by Gym to describe the legitimate values for the observations and actions of environments.
All of these data structures are derived from the `gym.Space` base class.
`Box(n,)` corresponds to an `n`-dimensional continuous space. In our case `n=2`, thus the observation space of our environment is a 2-D space. Of course, the space is bounded by upper and lower limits which describe the legitimate values our observations can take. We can determine these using the `high` and `low` attributes of the observation space. They correspond to the maximum and minimum positions/velocities in our environment, respectively.
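Continuing with our MountainCar environment:

```python
# Upper and lower bounds of the observation space
# (maximum/minimum position and velocity for MountainCar).
print("Upper bound for observations: {}".format(env.observation_space.high))
print("Lower bound for observations: {}".format(env.observation_space.low))
```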
You can set these upper/lower limits while defining your space, as well as when you are creating an environment.
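For illustration, a hypothetical 2-D `Box` space with custom bounds could be defined like this:

```python
import numpy as np
from gym import spaces

# A hypothetical 2-D continuous space with custom per-dimension bounds.
custom_space = spaces.Box(low=np.array([-1.0, -2.0]),
                          high=np.array([1.0, 2.0]),
                          dtype=np.float32)

# Draw a random sample that respects those bounds.
print(custom_space.sample())
```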
`Discrete(n)` describes a discrete space with possible values `[0, ..., n-1]`. In our case `n = 3`, meaning our actions can take values of 0, 1, or 2. Unlike `Box`, `Discrete` does not have `high` and `low` attributes, since, by the very definition, it is clear what type of values are allowed.
If you try to pass an invalid value to the `step` function of our environment (in our case, say, 4), it will lead to an error.
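For example (many environments enforce this with an assertion on the action space, so the exact error type may vary):

```python
# 4 is outside Discrete(3), so this should fail.
try:
    env.step(4)
except Exception as e:
    print("Invalid action raised: {}: {}".format(type(e).__name__, e))
```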
There are multiple other spaces available for various use cases, such as `MultiDiscrete`, which allows you to use more than one discrete variable for your observation and action space.
The `Wrapper` class in OpenAI Gym provides you with the functionality to modify various parts of an environment to suit your needs. Why might such a need arise? Maybe you want to normalize your pixel input, or maybe you want to clip your rewards. While you could typically accomplish the same by writing another class that subclasses your environment’s `Env` class, the `Wrapper` class allows us to do it more systematically.
But before we begin, let’s switch to a more complex environment that will really help us appreciate the utility that `Wrapper` brings to the table. This complex environment is going to be the Atari game Breakout.
Before we begin, we install the Atari components of `gym`.
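Depending on your version of `gym`, the Atari extra can typically be installed with:

```bash
pip install "gym[atari]"
```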
If you get an error along the lines of `AttributeError: module 'enum' has no attribute 'IntFlag'`, you might need to uninstall the `enum` package, and then re-attempt the install.
Gameplay of Atari Breakout
Let’s now run the environment with random actions.
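A sketch of this, assuming the Atari extra installed above and the environment id `BreakoutNoFrameskip-v4`:

```python
import time
import gym

env = gym.make("BreakoutNoFrameskip-v4")
print("Observation space: {}".format(env.observation_space))
print("Action space: {}".format(env.action_space))

obs = env.reset()
for _ in range(1000):
    # Take a random action and render the game.
    obs, reward, done, info = env.step(env.action_space.sample())
    env.render()
    time.sleep(0.001)
    if done:
        obs = env.reset()
env.close()
```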
Our observation space is a continuous space of dimensions (210, 160, 3), corresponding to an RGB pixel observation of the same size. Our action space contains 4 discrete actions (Left, Right, Do Nothing, Fire).
Now that we have our environment loaded, let us suppose we have to make certain changes to the Atari environment. It’s a common practice in deep RL to construct our observation by concatenating the past `k` frames together. We have to modify the Breakout environment such that both our `reset` and `step` functions return concatenated observations.
For this we define a class of type `gym.Wrapper` to override the `reset` and `step` functions of the Breakout `Env`. The `Wrapper` class, as the name suggests, is a wrapper on top of an `Env` class that modifies some of its attributes and functions.
The `__init__` function takes the `Env` class for which the wrapper is written, along with the number of past frames to be concatenated. Note that we also need to redefine the observation space, since we are now using concatenated frames as our observations. (We modify the observation space from (210, 160, 3) to (210, 160, 3 * num_past_frames).)
In the `reset` function, since we don’t have any previous observations to concatenate while initializing the environment, we just concatenate the initial observation repeatedly.
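Putting this together, a sketch of such a wrapper (here called `ConcatObs`, a name chosen for illustration) might look like the following, assuming the classic Gym API:

```python
import gym
import numpy as np
from collections import deque
from gym import spaces


class ConcatObs(gym.Wrapper):
    def __init__(self, env, k):
        super().__init__(env)
        self.k = k
        self.frames = deque([], maxlen=k)

        # Redefine the observation space: (210, 160, 3) becomes (210, 160, 3 * k).
        shp = env.observation_space.shape
        self.observation_space = spaces.Box(
            low=0, high=255,
            shape=(shp[0], shp[1], shp[2] * k),
            dtype=env.observation_space.dtype)

    def reset(self):
        # No previous frames exist yet, so repeat the initial observation k times.
        obs = self.env.reset()
        for _ in range(self.k):
            self.frames.append(obs)
        return self._get_obs()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return self._get_obs(), reward, done, info

    def _get_obs(self):
        # Concatenate the stored frames along the channel axis.
        return np.concatenate(list(self.frames), axis=-1)
```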
Now, to effectively get our modified environment, we wrap our environment `Env` in the wrapper we just created.
Let us now verify whether the observations are indeed concatenated or not.
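Continuing with the `ConcatObs` class sketched above:

```python
# Wrap the Breakout environment so that 4 past frames are concatenated.
env = ConcatObs(gym.make("BreakoutNoFrameskip-v4"), 4)
print("The new observation space is: {}".format(env.observation_space))

# Reset and step, checking the shapes of the returned observations.
obs = env.reset()
print("Shape after reset: {}".format(obs.shape))      # expected (210, 160, 12)

obs, _, _, _ = env.step(env.action_space.sample())
print("Shape after one step: {}".format(obs.shape))   # expected (210, 160, 12)
```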
There is more to wrappers than the vanilla `Wrapper` class. Gym also provides you with specific wrappers that target specific elements of the environment, such as observations, rewards, and actions. Their use is demonstrated in the following section.
- `ObservationWrapper`: This helps us make changes to the observation using the `observation` method of the wrapper class.
- `RewardWrapper`: This helps us make changes to the reward using the `reward` function of the wrapper class.
- `ActionWrapper`: This helps us make changes to the action using the `action` function of the wrapper class.

Let us suppose that we have to make a couple of changes to our environment, for example normalizing the pixel observations and clipping the rewards (echoing the examples mentioned earlier). A sketch of such wrappers is shown below.
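The class names here are chosen for illustration, under the assumption that we scale observations to [0, 1] and clip rewards to [0, 1]:

```python
import gym
import numpy as np


class NormalizeObservation(gym.ObservationWrapper):
    """Scale pixel observations to the [0, 1] range."""
    def observation(self, obs):
        return obs.astype(np.float32) / 255.0


class ClipReward(gym.RewardWrapper):
    """Clip rewards to the [0, 1] range."""
    def reward(self, reward):
        return float(np.clip(reward, 0, 1))
```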
Now we apply all of these wrappers to our environment in a single line of code to get a modified environment. Then, we verify that all of our intended changes have been applied to the environment.
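Continuing with the wrapper classes sketched above:

```python
# Apply both wrappers in a single line to get the modified environment.
env = ClipReward(NormalizeObservation(gym.make("BreakoutNoFrameskip-v4")))

# Verify the changes.
obs = env.reset()
print("Maximum pixel value after normalization: {}".format(obs.max()))  # should be <= 1.0

obs, reward, done, info = env.step(env.action_space.sample())
print("Reward after clipping: {}".format(reward))                       # should lie in [0, 1]
```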
In case you want to recover the original `Env` after applying wrappers to it, you can use the `unwrapped` attribute of the `Env` class. While the `Wrapper` class may look like just any other class that subclasses `Env`, it does maintain a list of the wrappers applied to the base `Env`.
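For instance, continuing with the wrapped Breakout environment from above:

```python
print(env)            # shows the chain of wrappers applied on top of the base Env
print(env.unwrapped)  # the original, unwrapped Breakout Env
```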
A lot of Deep RL algorithms (like Asynchronous Actor Critic Methods) use parallel threads, where each thread runs an instance of the environment to both speed up the training process and improve efficiency.
Now we will use another library, also by OpenAI, called `baselines`. This library provides us with performant implementations of many standard deep RL algorithms to compare any novel algorithm with. In addition to these implementations, `baselines` also provides us with many other features that enable us to prepare our environments in accordance with the way they were used in OpenAI experiments.
One of these features is a set of wrappers that allow you to run multiple environments in parallel using a single function call. Before we begin, we first install `baselines` by running the following commands in a terminal.
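The commands below show one common way to install it from source; the exact steps may differ depending on your setup:

```bash
git clone https://github.com/openai/baselines
cd baselines
pip install -e .
```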
You may need to restart your Jupyter notebook for the installed package to be available.
The wrapper of interest here is called `SubprocVecEnv`, which runs all the environments in an asynchronous manner. We first create a list of functions that return the environment we are running. In the code, I have used a `lambda` function to create an anonymous function that returns the gym environment.
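A sketch of this, assuming `baselines` is installed as above and using the Breakout environment id from earlier:

```python
import gym
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

num_envs = 3

# Each element is a function (here, a lambda) that builds a fresh environment.
env_fns = [lambda: gym.make("BreakoutNoFrameskip-v4") for _ in range(num_envs)]

# Run the environments in separate subprocesses.
envs = SubprocVecEnv(env_fns)
```

Note that on some platforms, code that spawns subprocesses like this may need to live under an `if __name__ == "__main__":` guard.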
This `envs` now acts as a single environment, where we can call the `reset` and `step` functions. However, these functions now return an array of observations/actions, rather than a single observation/action.
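Continuing with the `envs` object created above:

```python
# reset returns one observation per environment, stacked together.
init_obs = envs.reset()
print("Number of environments: {}".format(len(init_obs)))
print("Shape of one observation: {}".format(init_obs[0].shape))

# step expects one action per environment and returns batched results.
actions = [0, 1, 2]
obs, rewards, dones, infos = envs.step(actions)
print("Rewards: {}".format(rewards))
```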
Calling the `render` function on the vectorized `envs` displays screenshots of the games in a tiled fashion.
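For example:

```python
import time

for _ in range(1000):
    # One random action per environment.
    actions = [envs.action_space.sample() for _ in range(num_envs)]
    envs.step(actions)
    envs.render()      # tiles the individual environments into a single window
    time.sleep(0.001)

envs.close()
```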
The following screen plays out.
The `render` output for the vectorized `SubprocVecEnv` environment.
You can find more about Vectorized environments here.
That’s it for Part 1. Given the things we have covered in this part, you should be able to start training your reinforcement learning agents in environments available from OpenAI Gym. But what if the environment you want to train your agent in is not available anywhere? If that’s the case, you are in luck for a couple of reasons!
First, OpenAI Gym offers you the flexibility to implement your own custom environments. Second, doing exactly that is what Part 2 of this series is going to be about. Till then, enjoy exploring the enterprising world of reinforcement learning using OpenAI Gym!