[RL] References for Creating a Custom Environment

2020. 9. 6. 13:13 · Topics of Interest / RL

References for creating a custom gym environment.

 

 

Official gym documentation


https://github.com/openai/gym/blob/master/docs/creating-environments.md

 

openai/gym: a toolkit for developing and comparing reinforcement learning algorithms.

https://stackoverflow.com/questions/45068568/how-to-create-a-new-gym-environment-in-openai

 

How to create a new gym environment in OpenAI? (Stack Overflow) — the accepted answer lays out the package structure and environment class skeleton below.

gym-foo/
  README.md
  setup.py
  gym_foo/
    __init__.py
    envs/
      __init__.py
      foo_env.py
      foo_extrahard_env.py
The environment class skeleton from the answer. Note that the underscore-prefixed `_step`/`_reset`/`_render` methods are the older gym API; recent gym versions expect plain `step`, `reset`, and `render`.

import gym
import hfo_py  # HFO soccer bindings; this skeleton is adapted from the gym-soccer env


class FooEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        pass

    def _step(self, action):
        """

        Parameters
        ----------
        action :

        Returns
        -------
        ob, reward, episode_over, info : tuple
            ob (object) :
                an environment-specific object representing your observation of
                the environment.
            reward (float) :
                amount of reward achieved by the previous action. The scale
                varies between environments, but the goal is always to increase
                your total reward.
            episode_over (bool) :
                whether it's time to reset the environment again. Most (but not
                all) tasks are divided up into well-defined episodes, and done
                being True indicates the episode has terminated. (For example,
                perhaps the pole tipped too far, or you lost your last life.)
            info (dict) :
                 diagnostic information useful for debugging. It can sometimes
                 be useful for learning (for example, it might contain the raw
                 probabilities behind the environment's last state change).
                 However, official evaluations of your agent are not allowed to
                 use this for learning.
        """
        # `self.env` is the underlying game handle (an HFO client in the
        # original gym-soccer example)
        self._take_action(action)
        self.status = self.env.step()
        reward = self._get_reward()
        ob = self.env.getState()
        episode_over = self.status != hfo_py.IN_GAME
        return ob, reward, episode_over, {}

    def _reset(self):
        pass

    def _render(self, mode='human', close=False):
        pass

    def _take_action(self, action):
        pass

    def _get_reward(self):
        """ Reward is given for XY. """
        # `FOOBAR`, `ABC`, and `self.somestate` are placeholders for your
        # own status flags and state
        if self.status == FOOBAR:
            return 1
        elif self.status == ABC:
            return self.somestate ** 2
        else:
            return 0
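
The `__init__.py` and `setup.py` files in the tree above hold the registration wiring. A minimal sketch following the gym docs linked at the top (`foo-v0` and `gym_foo` are the example's placeholder names):

# gym_foo/__init__.py -- register the env so gym.make() can find it
from gym.envs.registration import register

register(
    id='foo-v0',
    entry_point='gym_foo.envs:FooEnv',
)

# gym_foo/envs/__init__.py -- expose the env classes
from gym_foo.envs.foo_env import FooEnv
from gym_foo.envs.foo_extrahard_env import FooExtraHardEnv

# setup.py -- make the package installable with `pip install -e .`
from setuptools import setup

setup(name='gym_foo',
      version='0.0.1',
      install_requires=['gym'])

After installing, `import gym_foo` followed by `gym.make('foo-v0')` should return a FooEnv instance.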

 

Example gym environments

https://github.com/openai/gym/tree/master/gym/envs#how-to-create-new-environments-for-gym

 

Said to be a good example:

https://github.com/openai/gym-soccer/blob/master/gym_soccer/envs/soccer_env.py

 

 


 

The custom environment interface defined by stable-baselines

https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html

 

Using Custom Environments (Stable Baselines docs): to use the RL baselines with custom environments, the environment just needs to follow the gym interface, i.e. implement the methods below and inherit from gym.Env.

import gym
import numpy as np
from gym import spaces


class CustomEnv(gym.Env):
  """Custom Environment that follows gym interface"""
  metadata = {'render.modes': ['human']}

  def __init__(self, arg1, arg2):
    super(CustomEnv, self).__init__()
    # Define action and observation space
    # They must be gym.spaces objects
    # Example when using discrete actions
    # (N_DISCRETE_ACTIONS, HEIGHT, WIDTH, N_CHANNELS are placeholders):
    self.action_space = spaces.Discrete(N_DISCRETE_ACTIONS)
    # Example for using image as input:
    self.observation_space = spaces.Box(low=0, high=255,
                                        shape=(HEIGHT, WIDTH, N_CHANNELS), dtype=np.uint8)

  def step(self, action):
    ...
    return observation, reward, done, info
  def reset(self):
    ...
    return observation  # reward, done, info can't be included
  def render(self, mode='human'):
    ...
  def close(self):
    ...
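
The same doc page provides an environment checker; a quick sanity check might look like this (a minimal sketch, assuming `CustomEnv` takes the two constructor arguments above):

from stable_baselines.common.env_checker import check_env

env = CustomEnv(arg1, arg2)  # arg1/arg2 are the placeholder ctor args above
# warns if the env deviates from the gym interface (spaces not defined,
# reset() not returning an observation, step() not returning 4 values, ...)
check_env(env, warn=True)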
  

 

The environment interface defined by rllib

https://docs.ray.io/en/latest/rllib-env.html

import gym, ray
from ray.rllib.agents import ppo

# NB: the angle-bracketed values below are placeholders from the ray docs
class MyEnv(gym.Env):
    def __init__(self, env_config):
        self.action_space = <gym.Space>
        self.observation_space = <gym.Space>
    def reset(self):
        return <obs>
    def step(self, action):
        return <obs>, <reward: float>, <done: bool>, <info: dict>

You can see that the interfaces are nearly identical.
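
Continuing the ray docs snippet, the class can be handed straight to a trainer, which instantiates it with `env_config`. A minimal sketch (the empty config is an assumption):

import ray
from ray.rllib.agents import ppo

ray.init()
trainer = ppo.PPOTrainer(env=MyEnv, config={
    "env_config": {},  # passed to MyEnv.__init__
})
while True:
    print(trainer.train())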

 

https://www.quora.com/How-can-one-create-their-own-reinforcement-learning-environment-in-Python

 

How can one create their own reinforcement learning environment in Python? (Quora) — OpenAI Gym is the starting point; RL libraries such as stable-baselines and keras-rl work with Gym out of the box, and it is just as usable when implementing an algorithm by hand.

import gym 
from gym import error, spaces, utils 
from gym.utils import seeding 
 
class FooEnv(gym.Env): 
  metadata = {'render.modes': ['human']} 
 
  def __init__(self): 
    ... 
  def step(self, action): 
    ... 
  def reset(self): 
    ... 
  def render(self, mode='human'): 
    ... 
  def close(self): 
    ... 

 

Checking an environment's inputs and outputs

import gym

env = gym.make("CartPole-v1")
observation = env.reset()
for _ in range(1000):
    env.render()
    action = env.action_space.sample()  # your agent here (this takes random actions)
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()
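
To see exactly what a given environment accepts and returns, the action and observation spaces can be inspected directly; a quick sketch:

import gym

env = gym.make("CartPole-v1")
print(env.action_space)            # Discrete(2)
print(env.observation_space)       # Box(4,)
print(env.observation_space.high)  # per-dimension upper bounds
print(env.observation_space.low)   # per-dimension lower bounds
print(env.action_space.sample())   # a random valid action
env.close()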

 

 


Reference material for creating a custom rllib env

 

docs.ray.io/en/latest/rllib-env.html

import gym
from gym.utils import seeding


class Example_v0(gym.Env):
    # possible actions
    MOVE_LF = 0
    MOVE_RT = 1

    # possible positions
    LF_MIN = 1
    RT_MAX = 10

    # land on the GOAL position within MAX_STEPS steps
    MAX_STEPS = 10

    # possible rewards
    REWARD_AWAY = -2
    REWARD_STEP = -1
    REWARD_GOAL = MAX_STEPS

    metadata = {
        "render.modes": ["human"]
        }


    def __init__(self):
        # the action space ranges [0, 1] where:
        #  `0` move left
        #  `1` move right
        self.action_space = gym.spaces.Discrete(2)

        # NB: Ray throws exceptions for any `0` value in Discrete
        # observations, so we make position a 1-based value
        self.observation_space = gym.spaces.Discrete(self.RT_MAX + 1)

        # possible positions to choose on `reset()`
        self.goal = int((self.LF_MIN + self.RT_MAX - 1) / 2)

        self.init_positions = list(range(self.LF_MIN, self.RT_MAX))
        self.init_positions.remove(self.goal)

        # NB: change to guarantee the sequence of pseudorandom numbers
        # (e.g., for debugging)
        self.seed()

        self.reset()


    def reset(self):
        """
        Reset the state of the environment and return an initial observation.
        Returns
        -------
        observation (object): the initial observation of the space.
        """
        self.position = self.np_random.choice(self.init_positions)
        self.count = 0

        # for this environment, state is simply the position
        self.state = self.position
        self.reward = 0
        self.done = False
        self.info = {}

        return self.state


    def step(self, action):
        """
        The agent takes a step in the environment.
        Parameters
        ----------
        action : Discrete
        Returns
        -------
        observation, reward, done, info : tuple
            observation (object) :
                an environment-specific object representing your observation of
                the environment.
            reward (float) :
                amount of reward achieved by the previous action. The scale
                varies between environments, but the goal is always to increase
                your total reward.
            done (bool) :
                whether it's time to reset the environment again. Most (but not
                all) tasks are divided up into well-defined episodes, and done
                being True indicates the episode has terminated. (For example,
                perhaps the pole tipped too far, or you lost your last life.)
            info (dict) :
                 diagnostic information useful for debugging. It can sometimes
                 be useful for learning (for example, it might contain the raw
                 probabilities behind the environment's last state change).
                 However, official evaluations of your agent are not allowed to
                 use this for learning.
        """
        if self.done:
            # code should never reach this point
            print("EPISODE DONE!!!")

        elif self.count == self.MAX_STEPS:
            self.done = True

        else:
            assert self.action_space.contains(action)
            self.count += 1

            if action == self.MOVE_LF:
                if self.position == self.LF_MIN:
                    # invalid
                    self.reward = self.REWARD_AWAY
                else:
                    self.position -= 1

                    if self.position == self.goal:
                        # on goal now
                        self.reward = self.REWARD_GOAL
                        self.done = True
                    elif self.position < self.goal:
                        # moving away from goal
                        self.reward = self.REWARD_AWAY
                    else:
                        # moving toward goal
                        self.reward = self.REWARD_STEP

            elif action == self.MOVE_RT:
                if self.position == self.RT_MAX:
                    # invalid
                    self.reward = self.REWARD_AWAY
                else:
                    self.position += 1

                    if self.position == self.goal:
                        # on goal now
                        self.reward = self.REWARD_GOAL
                        self.done = True
                    elif self.position > self.goal:
                        # moving away from goal
                        self.reward = self.REWARD_AWAY
                    else:
                        # moving toward goal
                        self.reward = self.REWARD_STEP

            self.state = self.position
            self.info["dist"] = self.goal - self.position

        try:
            assert self.observation_space.contains(self.state)
        except AssertionError:
            print("INVALID STATE", self.state)

        return self.state, self.reward, self.done, self.info


    def render(self, mode="human"):
        """Renders the environment.
        The set of supported modes varies per environment. (And some
        environments do not support rendering at all.) By convention,
        if mode is:
        - human: render to the current display or terminal and
          return nothing. Usually for human consumption.
        - rgb_array: Return a numpy.ndarray with shape (x, y, 3),
          representing RGB values for an x-by-y pixel image, suitable
          for turning into a video.
        - ansi: Return a string (str) or StringIO.StringIO containing a
          terminal-style text representation. The text can include newlines
          and ANSI escape sequences (e.g. for colors).
        Note:
            Make sure that your class's metadata 'render.modes' key includes
              the list of supported modes. It's recommended to call super()
              in implementations to use the functionality of this method.
        Args:
            mode (str): the mode to render with
        """
        s = "position: {:2d}  reward: {:2d}  info: {}"
        print(s.format(self.state, self.reward, self.info))


    def seed(self, seed=None):
        """Sets the seed for this env's random number generator(s).
        Note:
            Some environments use multiple pseudorandom number generators.
            We want to capture all such seeds used in order to ensure that
            there aren't accidental correlations between multiple generators.
        Returns:
            list<bigint>: Returns the list of seeds used in this env's random
              number generators. The first value in the list should be the
              "main" seed, or the value which a reproducer should pass to
              'seed'. Often, the main seed equals the provided 'seed', but
              this won't be true if seed=None, for example.
        """
        self.np_random, seed = seeding.np_random(seed)
        return [seed]


    def close(self):
        """Override close in your subclass to perform any necessary cleanup.
        Environments will automatically close() themselves when
        garbage collected or when the program exits.
        """
        pass
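
A quick way to exercise the class above is a random rollout, rendering each step; a minimal sketch:

env = Example_v0()
state = env.reset()
done = False

# take random actions until the goal is reached or MAX_STEPS runs out
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    env.render()

env.close()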

 

github.com/ray-project/ray/blob/master/rllib/examples/custom_env.py

www.datahubbs.com/building-custom-gym-environments-for-rl/

rllib reference:

import ray
from ray.rllib import agents


def env_creator(env_name):
    if env_name == 'MyEnv-v0':
        from custom_gym.envs.custom_env import CustomEnv0 as env
    elif env_name == 'MyEnv-v1':
        from custom_gym.envs.custom_env import CustomEnv1 as env
    else:
        raise NotImplementedError
    return env
    
ray.init()

env_name = 'MyEnv-v0'
config = {
    # Whatever config settings you'd like...
    }
trainer = agents.ppo.PPOTrainer(
    env=env_creator(env_name),
    config=config)
max_training_episodes = 10000
while True:
    results = trainer.train()
    # Enter whatever stopping criterion you like
    if results['episodes_total'] >= max_training_episodes:
        break
print('Mean Rewards:\t{:.1f}'.format(results['episode_reward_mean']))
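
ray can also register an env under a string name via `ray.tune.registry.register_env`, so the trainer config can refer to it by name. A minimal sketch (assuming `CustomEnv0` accepts the `env_config` dict, as the class-based usage above implies):

from ray.tune.registry import register_env
from custom_gym.envs.custom_env import CustomEnv0

# the creator function receives the trainer's `env_config` dict
register_env("MyEnv-v0", lambda env_config: CustomEnv0(env_config))

trainer = agents.ppo.PPOTrainer(env="MyEnv-v0", config={"env_config": {}})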

 

R provides something similar as a single function. Impressive.

https://rdrr.io/github/markdumke/reinforcelearn/man/makeEnvironment.html

 

makeEnvironment (markdumke/reinforcelearn on rdrr.io): creates a reinforcement learning environment in R.

(The YouTube playlist appears to walk through code for several other reinforcement learning algorithms.)

https://www.youtube.com/watch?v=vmrqpHldAQ0&list=PL-9x0_FO_lgmP3TtVCD4X1U9oSalSuI1o&index=31&ab_channel=MachineLearningwithPhil

https://www.neuralnet.ai/designing-your-own-open-ai-gym-compatible-reinforcement-learning-environment/

 

 

https://towardsdatascience.com/create-your-own-reinforcement-learning-environment-beb12f4151ef

 

Create Your Own Reinforcement Learning Environment (towardsdatascience.com) — building an environment from scratch rather than just solving an existing one.

https://towardsdatascience.com/creating-a-custom-openai-gym-environment-for-stock-trading-be532be3910e

 

Creating a Custom OpenAI Gym Environment for Stock Trading (towardsdatascience.com) — gym comes with quite a few pre-built environments and also lets you create custom ones.

https://github.com/sungreong/gym-soccer/blob/master/gym_soccer/envs/soccer_env.py

 


https://towardsdatascience.com/reinforcement-learning-from-scratch-designing-and-solving-a-task-all-within-a-python-notebook-48c40021da4

 
