
pyrddlgym-project / pyrddlgym


A toolkit for auto-generation of OpenAI Gym environments from RDDL description files.

Home Page: https://pyrddlgym.readthedocs.io/

License: Other

Python 100.00%
Topics: gym-environments, model-based, rddl, reinforcement-learning, gym, gymnasium, planning, simulator, simulation

pyrddlgym's Introduction

pyRDDLGym

Warning

As of Feb 9, 2024, the pyRDDLGym API has been updated to version 2.0 and is no longer backwards compatible with the previous stable version 1.4.4. While we strongly recommend that you update to 2.0, in case you require the old API you can install the last stable version with pip: pip install pyRDDLGym==1.4.4, or directly from GitHub: pip install git+https://github.com/pyrddlgym-project/pyRDDLGym@version_1.4.4_stable.

A Python toolkit for auto-generation of OpenAI Gym environments from Relational Dynamic Influence Diagram Language (RDDL) description files. This is currently the official parser, simulator and evaluation system for RDDL in Python, with new features and enhancements to the RDDL language.

Contents

  • Purpose and Benefits
  • Installation
  • Usage
  • Status
  • Citing pyRDDLGym
  • License
  • Contributors

Installation

We require Python 3.8+ and the following packages: ply, pillow>=9.2.0, numpy>=1.22, matplotlib>=3.5.0, gymnasium, pygame, termcolor. You can install our package, along with all of its prerequisites, using pip:

pip install pyRDDLGym

Since pyRDDLGym does not come with any premade environments, you can either load RDDL documents from your local file system or install rddlrepository for easy access to preexisting domains:

pip install rddlrepository

Usage

Running the Example

pyRDDLGym comes with several run scripts as starting points for you to use in your own scripts. To simulate an environment, from the install directory of pyRDDLGym, type the following into a shell supporting the python command (you need rddlrepository):

python -m pyRDDLGym.examples.run_gym "Cartpole_Continuous_gym" "0" 1

which loads instance "0" of the CartPole control problem with continuous actions from rddlrepository and simulates it with a random policy for one episode.

Loading an Environment

Instantiation of an existing environment by name is as easy as:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous_gym", "0")

Loading your own domain files is just as straightforward:

import pyRDDLGym
env = pyRDDLGym.make("/path/to/domain.rddl", "/path/to/instance.rddl")

Both versions above instantiate env as an OpenAI gym environment, so that the usual reset() and step() calls work as intended.

You can also pass custom settings to the make command, e.g.:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous_gym", "0", enforce_action_constraints=True, ...)

Creating your Own Visualizer

You can design your own visualizer by subclassing pyRDDLGym.core.visualizer.viz.BaseViz and overriding the render(state) method. Then, changing the visualizer of the environment is easy:

viz_class = ...   # the class name of your custom viz
env.set_visualizer(viz_class)
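A minimal sketch of such a subclass, assuming render() should return an image object such as a PIL Image (the class name, image size, and constructor behaviour here are assumptions, not taken from the built-in visualizers):

from PIL import Image
from pyRDDLGym.core.visualizer.viz import BaseViz

class MyViz(BaseViz):
    """Illustrative visualizer: ignores the state and returns a blank frame."""

    def render(self, state):
        # state is the dict of grounded fluent values for the current decision step
        return Image.new('RGB', (640, 480), color='white')

env.set_visualizer(MyViz)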

Recording Movies

You can record an animated gif or movie of the agent interaction with an environment (described below). To do this, simply pass a MovieGenerator object to the set_visualizer method:

from pyRDDLGym.core.visualizer.movie import MovieGenerator
movie_gen = MovieGenerator("/path/where/to/save", "env_name")
env.set_visualizer(viz_class, movie_gen=movie_gen)

Interacting with an Environment

Agents map states to actions through the sample_action(obs) function, and can be used to interact with an environment. For example, to initialize a random agent:

from pyRDDLGym.core.policy import RandomAgent
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)

All agent instances support one-line evaluation in a given environment:

stats = agent.evaluate(env, episodes=1, verbose=True, render=True)

which returns a dictionary of summary statistics (e.g. "mean", "std", etc...), and which also visualizes the domain in real time. Of course, if you wish, the standard OpenAI gym interaction is still available to you:

total_reward = 0
state, _ = env.reset()
for step in range(env.horizon):
    env.render()
    action = agent.sample_action(state)
    next_state, reward, terminated, truncated, _ = env.step(action)
    print(f'state = {state}, action = {action}, reward = {reward}')
    total_reward += reward
    state = next_state
    done = terminated or truncated
    if done:
        break
print(f'episode ended with reward {total_reward}')

# release all viz resources, and finish logging if used
env.close()

Note

All observations (for a POMDP), states (for an MDP) and actions are represented by dict objects, whose keys correspond to the appropriate fluents as defined in the RDDL description. Here, the syntax is pvar-name___o1__o2..., where pvar-name is the pvariable name, followed by 3 underscores, and object parameters o1, o2... are separated by 2 underscores.
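For example, a grounded action fluent ship(?w) applied to object w1 becomes the dict key ship___w1 (fluent names here are borrowed from the SupplyChain transcript further down this page, purely for illustration):

# keys follow: pvariable-name + '___' + object names joined by '__'
state, _ = env.reset()
# e.g. state == {'stock-warehouse___w1': 0, 'demand-new___w1': 18, ...}

action = {'ship___w1': 1}    # ship(?w) grounded on object w1
next_state, reward, terminated, truncated, info = env.step(action)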

Warning

There are two known issues not documented with RDDL:

  1. the minus (-) arithmetic operation must have spaces on both sides; otherwise it is ambiguous whether it refers to a subtraction or is part of a variable name
  2. aggregation-union-precedence parsing requires encapsulating parentheses around aggregations, e.g., (sum_{}[]).

Status

A complete archive of past and present RDDL problems, including all IPPC problems, is also available to clone or install via pip.

Software for related simulators:

The parser used in this project is based on the parser from Thiago P. Bueno's pyrddl (used in rddlgym).

Citing pyRDDLGym

Please see our paper describing pyRDDLGym. If you found this useful, please consider citing us:

@article{taitler2022pyrddlgym,
      title={pyRDDLGym: From RDDL to Gym Environments},
      author={Taitler, Ayal and Gimelfarb, Michael and Gopalakrishnan, Sriram and Mladenov, Martin and Liu, Xiaotian and Sanner, Scott},
      journal={arXiv preprint arXiv:2211.05939},
      year={2022}}

License

This software is distributed under the MIT License.

Contributors

  • Ayal Taitler (University of Toronto, CA)
  • Michael Gimelfarb (University of Toronto, CA)
  • Jihwan Jeong (University of Toronto, CA)
  • Sriram Gopalakrishnan (Arizona State University/J.P. Morgan, USA)
  • Martin Mladenov (Google, BR)
  • Jack Liu (University of Toronto, CA)

pyrddlgym's People

Contributors

ataitler, buvnswrn, gmmdmdidems, haz, iliathesmirnov, jackliuto, jihwan-jeong, marirsg2, mike-gimelfarb, mmladenov-google, pecey, ssanner, thomaskeller79


pyrddlgym's Issues

Segmentation fault

Hi

I was trying to run the demo, and it terminated with the following output.

episode ended with reward -24845.0
Segmentation fault

I am wondering how to solve the "Segmentation fault" part? I am using wsl2 with Ubuntu 20.04, is that related?

Thank you.

Import Error

I'm new to pyRDDLGym; I just installed the package with
pip install pyRDDLGym

However, when I follow the Getting Started page of the official documentation with
from pyRDDLGym import ExampleManager

this error occurs:
ImportError: cannot import name 'ExampleManager' from 'pyRDDLGym' (/usr/local/lib/python3.8/dist-packages/pyRDDLGym/__init__.py)

RDDLSimServer data.json file truncated

With large instances (many objects and states) and correspondingly large data.json files, it very often happens that the data.json files are cut off, i.e. a part is missing and is not formatted correctly.

I cannot exactly identify the cause of the problem, but it is not due to the implementation. I can rule out a lack of disk space, and it shouldn't be due to memory either, as a Docker container has no resource constraints by default. No error is thrown when a file is saved incorrectly formatted/truncated, and it is also not possible to say that files above a certain size are no longer written correctly: some 13 MB files were written correctly, while some whose correct size is 8 MB were only written up to 2.4 MB.

However, switching from json to orjson, a faster and more memory-efficient alternative, eliminates all problems.

Fix

Replace

def dump_data(self, fn):

with

import orjson

def dump_data(self, fn):
    """Dumps the data to a json file."""
    json_content = orjson.dumps(self.logs)
    with open(fn, mode="wb") as f:
        f.write(json_content)

Reservoir viz references invalid fluent

The reservoir viz references an object "res" that does not exist, instead of "id". I believe there is a corrected reservoir domain we discussed on another branch, so perhaps we should merge it into main?

action-preconditions hard enforcement

Currently, action-preconditions are not enforced and are present in the RDDL only for model-based planning usage.
A boolean flag should be added to the environment so that the user can choose to enforce action-preconditions; there may be users who want this functionality.
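As of the 2.0 API documented at the top of this page, a flag of this kind is exposed through make (see the Loading an Environment section); a minimal usage sketch, reusing the domain and instance names from that example:

import pyRDDLGym

# opt in to hard enforcement; violating actions would then presumably be rejected
env = pyRDDLGym.make("Cartpole_Continuous_gym", "0",
                     enforce_action_constraints=True)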

Questions regarding changes to JaxRDDLCompiler - params to jax_expr

There have been some changes in JaxRDDLCompiler that have altered the lambda functions returned for CPF evaluation. In the file the logic is jax_cpfs[cpf] = self._jax(expr, info, dtype=dtype). Earlier, expr was a lambda function that had two params: 1. a dictionary of state, action and interim variables with their current values, and 2. a PRNG key.

Now I think it is expecting three values. I couldn't find documentation of what the three params should be. Can someone please let me know where I should be looking, or just explain what the three params that expr now expects are?

Thank you.

HVAC - Instance 0 - Issues with dimension?

Hi,

We are getting a dimension issue with HVAC when we run instance 0, but not with instances that have multiple objects. The other environments run fine. The error message is as follows: (expr for temp-1)
[screenshot of the error message]

Can someone please help?

Thanks!

Release 1.0.0 on PyPI potentially missing files

I was trying out pyRDDLGym and installed it using pip. I tried to run the example code from the README, but got an error while importing the package.
[screenshot of the import error]

I verified the source files by downloading versions 1.0.0 and 0.99.0: while 0.99.0 contains a Debug folder inside Core, 1.0.0 doesn't, yet it is still referenced. Surprisingly, the folder exists on GitHub.

Can this be fixed on PyPI? Please let me know if I am missing something or you would like any more information.

ArithmeticError: Cannot evaluate arithmetic operation

When running PROST via the provided Docker, I get the following ArithmeticError from the RDDLSimAgent (domain.rddl and instance.rddl are appended):

raise ArithmeticError(
ArithmeticError: Cannot evaluate arithmetic operation / at [0 0 0 0] and [0 0 1 2].
( sum_{?c2: course} [ PREREQ(?c2, ?c) ^ passed(?c2) ] ) / ( sum_{?c2: course} [ PREREQ(?c2, ?c) ] )

The domain is a slightly modified version of the Academic Advising domain. Considering the error message, there must be an error in the cpfs part for the definition of passed'(?c):

           else if ([[sum_{?c2 : course} (PREREQ(?c2,?c) ^ passed(?c2))]
                                      / [sum_{?c2 : course} PREREQ(?c2,?c)]] == 1.0)
                 then Bernoulli( 0.95 )

The error message and the lists provided confuse me a little, because they lead me to believe that the summation does not work (e.g. missing brackets). I changed the part above to the following, which I think should lead to the same result and it works:

else if ((sum_{?c2 : course}[PREREQ(?c2,?c) ^ passed(?c2)]) == (sum_{?c2 : course}[PREREQ(?c2,?c)]))
                                then Bernoulli( 0.95 )

domain.rddl

//
// Academic Advising Domain
//
// Author:  Libby Ferland ([email protected])
//
// In this domain, a student may take courses at a given cost
// and passes the course with a probability determined by how
// many of the prerequisites they have successfully passed.
// A student also receives a penalty at each time step if they
// have not yet graduated from their program (i.e., completed
// all required courses).  We allow multiple courses to be
// taken in a semester in some instances.
//
// Modified for competition and translation purposes by Scott Sanner.
//
///////////////////////////////////////////////////////////////

domain academic_advising_prob_mdp {

  types {
    course : object;
  };

  pvariables {

    // Nonfluents: course prerequisites
    PREREQ(course, course) : { non-fluent, bool, default = false }; // First argument is a prereq of second argument

    // Nonfluents: course passing probabilities
    PRIOR_PROB_PASS_NO_PREREQ(course) : { non-fluent, real, default = 1.0 }; // Probability of passing a course with no prereqs
    PRIOR_PROB_PASS(course)           : { non-fluent, real, default = 0.2 }; // Probability of passing a course regardless of prereq status
    
    // Nonfluents: program requirements for graduation
    PROGRAM_REQUIREMENT(course) : { non-fluent, bool, default = false }; // Specifies whether course is program requirement 
    
    // Nonfluents: costs/penalties
    COURSE_COST(course)        : { non-fluent, real, default = -1 }; // Cost for taking a course
    COURSE_RETAKE_COST(course) : { non-fluent, real, default = -2 }; // Cost for re-taking a course (heavily discouraged)
    PROGRAM_INCOMPLETE_PENALTY : { non-fluent, real, default = -5 }; // Penalty at each time step for having an incomplete program

	// State
    passed(course) : { state-fluent, bool, default = false };
    taken(course)  : { state-fluent, bool, default = false };

	// Action
    takeCourse(course)   : { action-fluent, bool, default = false };
  };

  cpfs {

	// Determine whether each course was passed
	// Modification: differentiate courses with no prereqs since should be easier to pass such introductory courses
	// For courses with prereqs:
	//   if PRIOR_PROB_PASS=.2 and 0 out of 3 prereqs were taken, the distribution is Bernoulli(.2 + .8 * (0/4)) = Bernoulli(.2)
	//                             1 out of 3 prereqs were taken, the distribution is Bernoulli(.2 + .8 * (1/4)) = Bernoulli(.4)
	//                             3 out of 3 prereqs were taken, the distribution is Bernoulli(.2 + .8 * (3/4)) = Bernoulli(.8)
    passed'(?c) = 
    	if (takeCourse(?c) ^ ~passed(?c)) // If take a course and not already passed 
			then [ 	if (~exists_{?c2 : course} PREREQ(?c2,?c))
			       	then Bernoulli( PRIOR_PROB_PASS_NO_PREREQ(?c) )  
                                else if ([[sum_{?c2 : course} (PREREQ(?c2,?c) ^ passed(?c2))]
                                                   / [sum_{?c2 : course} PREREQ(?c2,?c)]] == 1.0)
                                then Bernoulli( 0.95 )
                                else
				     Bernoulli( 0.05 ) 
			]
			else
				passed(?c); // Value persists if course not taken or already passed
	
	taken'(?c) = taken(?c) | takeCourse(?c);

  };

  // A student is assessed a cost for taking each course and a penalty for not completing their program   
  reward = 
 	  [sum_{?c : course} [COURSE_COST(?c) * (takeCourse(?c) ^ ~taken(?c))]]
 	+ [sum_{?c : course} [COURSE_RETAKE_COST(?c) * (takeCourse(?c) ^ taken(?c))]]
 	+ [PROGRAM_INCOMPLETE_PENALTY * ~[forall_{?c : course} (PROGRAM_REQUIREMENT(?c) => passed(?c))]];

}     

instance.rddl

non-fluents nf_academic_advising_prob_inst_mdp__0 {
	domain = academic_advising_prob_mdp; 
	objects{
		course : {CS11, CS12, CS21, CS22 };
	};

	non-fluents {
		PREREQ(CS12,CS21);
		PREREQ(CS11,CS22);
		PREREQ(CS12,CS22);
		PROGRAM_REQUIREMENT(CS22);
	};
}

instance academic_advising_prob_inst_mdp__0 {
	domain = academic_advising_prob_mdp;
	non-fluents = nf_academic_advising_prob_inst_mdp__0;
	max-nondef-actions = 1;
	horizon = 40;
	discount = 1.0;
}

Edit: I have corrected the error message to match the instance specified above.

Does aggregation support enum types?

Does RDDL support aggregations such as sum_{?q : enum_values} [ do something with ?q ]? What is a typical use case for this?
There is currently a design choice to be made about whether enum types should be handled the same as objects: there are two separate dictionaries for storing enums vs. objects, and only the object dict is checked for valid types during simulation.

Movie automation

We should add built-in functionality to RDDLEnv for creating movies of episodes from the rendered images.

RecSim Visualization breaks sometimes

The RecSim visualization breaks sometimes. The stacktrace is as follows:

Traceback (most recent call last):
  File "main.py", line 228, in <module>
    main(args)
  File "main.py", line 206, in main
    run(cfg, args.n_episodes, key, env, cfg_env, g_obs_keys, ga_keys, args.render)
  File "main.py", line 83, in run
    env.render()
  File "/nfs/nfs2/home/palchatt/Research/IPC23/pyRDDLGymHelper/Core/Env/RDDLEnv.py", line 253, in render
    image = self._visualizer.render(self.state)
  File "/nfs/nfs2/home/palchatt/Research/IPC23/pyRDDLGymHelper/Visualizer/RecSimViz.py", line 138, in render
    self._user_plot = self._ax_scatter.scatter(
  File "/nfs/nfs9/home/nobackup/palchatt/ipc23/lib/python3.8/site-packages/matplotlib/__init__.py", line 1442, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/nfs/nfs9/home/nobackup/palchatt/ipc23/lib/python3.8/site-packages/matplotlib/axes/_axes.py", line 4673, in scatter
    collection = mcoll.PathCollection(
  File "/nfs/nfs9/home/nobackup/palchatt/ipc23/lib/python3.8/site-packages/matplotlib/collections.py", line 996, in __init__
    self.set_sizes(sizes)
  File "/nfs/nfs9/home/nobackup/palchatt/ipc23/lib/python3.8/site-packages/matplotlib/collections.py", line 963, in set_sizes
    scale = np.sqrt(self._sizes) * dpi / 72.0 * self._factor
FloatingPointError: invalid value encountered in sqrt

My guess is that some of the state values are becoming too small. Maybe rounding off the state values to a certain precision would help?
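A hedged sketch of that workaround: sanitize whatever ends up as marker sizes before the scatter call (the helper and variable names below are hypothetical, not taken from RecSimViz):

import numpy as np

def safe_sizes(raw_sizes, floor=1e-3):
    """Clamp scatter marker sizes so matplotlib never sees NaN or negative values."""
    sizes = np.nan_to_num(np.asarray(raw_sizes, dtype=float), nan=0.0)
    return np.clip(np.round(sizes, 6), floor, None)

# e.g. ax.scatter(xs, ys, s=safe_sizes(state_values))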

Bitrot in XADD framework

Running pylint, I get the following errors in the XADD code files:

************* Module pyRDDLGym.XADD.RDDLLevelAnalysisXADD
pyRDDLGym\XADD\RDDLLevelAnalysisXADD.py:28:20: E1101: Instance of 'RDDLLevelAnalysisWXADD' has no '_check_is_fluent' member (no-member)
pyRDDLGym\XADD\RDDLLevelAnalysisXADD.py:40:8: E1101: Instance of 'RDDLLevelAnalysisWXADD' has no '_check_deps_by_fluent_type' member (no-member)
pyRDDLGym\XADD\RDDLLevelAnalysisXADD.py:55:12: E1101: Instance of 'RDDLLevelAnalysisWXADD' has no '_check_deps_by_fluent_type' member (no-member)

************* Module pyRDDLGym.XADD.RDDLModelXADD
pyRDDLGym\XADD\RDDLModelXADD.py:246:31: E1101: Instance of 'RDDLModelWXADD' has no '_aggr_to_scope' member (no-member)
pyRDDLGym\XADD\RDDLModelXADD.py:260:8: E1101: Instance of 'RDDLModelWXADD' has no '_aggr_to_scope' member (no-member)

************* Module pyRDDLGym.XADD.RDDLSimulatorXADD
pyRDDLGym\XADD\RDDLSimulatorXADD.py:4:0: E0611: No name 'RDDLSimulatorWConstraints' in module 'pyRDDLGym.Core.Simulator.RDDLSimulator' (no-name-in-module)

Some of them suggest that the XADD framework needs to be updated to the latest API. For example, the constraints parsing has been moved to a separate class RDDLConstraints in the env folder.

Jihwan, when you get a chance, can you look into this?

Mountain Car not working

I tried running the following code, which was solving MountainCar about a month ago, but now something seems to be broken and MountainCar doesn't reach its goal position.

python3 JaxExample.py MountainCar 1 60 1 False

I would appreciate any help from your side.

Integration of interval analysis with constraints parsing (Difficulty: DIFFICULT)

Currently, we have interval analysis that infers bounds on fluents, e.g. state in [100, 500], reward in [-1.0, 5.0], etc.
We also have a constraints parser that parses inequality constraints on action/state fluents, e.g. action-fluent(?x) >= non-fluent(?x) + 100.
However, we do not currently have a way to integrate them. For example, what if bounds on state fluents are not stated as constraints but are instead clipped in the cpfs? How can they be integrated into the constraints parser to get tighter gym bounds for RDDLEnv?

Run error for the demo code on the tutorials

I was trying to run the demo code but ran into the following error:

agent = RandomAgent(action_space=myEnv.action_space, num_actions=myEnv.NumConcurrentActions)

AttributeError: 'RDDLEnv' object has no attribute 'NumConcurrentActions'

Could someone provide some help?

I am attaching the demo code below for reference.

from pyRDDLGym import RDDLEnv
from pyRDDLGym import ExampleManager
from pyRDDLGym.Policies.Agents import RandomAgent

ENV = 'Wildfire'

# get the environment infos
EnvInfo = ExampleManager.GetEnvInfo(ENV)

# set up the environment class, choose instance 0 because every example has at least one example instance
myEnv = RDDLEnv.RDDLEnv(domain=EnvInfo.get_domain(), instance=EnvInfo.get_instance(0))

# set up the environment visualizer
myEnv.set_visualizer(EnvInfo.get_visualizer())

# set up an example agent
agent = RandomAgent(action_space=myEnv.action_space, num_actions=myEnv.NumConcurrentActions)

total_reward = 0
state = myEnv.reset()

for step in range(myEnv.horizon):
    myEnv.render()
    action = agent.sample_action()
    next_state, reward, done, info = myEnv.step(action)
    total_reward += reward
    state = next_state
    if done:
        break

print("episode ended with reward {}".format(total_reward))
myEnv.close()
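For reference, the README at the top of this page constructs the random agent with env.max_allowed_actions rather than NumConcurrentActions, so under the 2.0 API the equivalent construction would be roughly:

import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent

env = pyRDDLGym.make("Cartpole_Continuous_gym", "0")   # any registered domain/instance
agent = RandomAgent(action_space=env.action_space,
                    num_actions=env.max_allowed_actions)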

Additional features

There are a couple things that I’d say might still be worthwhile to implement because they give flexibility at potentially little development cost:

  1. Support for categorical distribution - if I recall we don’t have this because it requires enum. However we could hypothetically implement this without enum through the use of pvariable like so

sample = Categorical[p] where p is a parameterized real type eg p(?atom)

this is just syntactic sugar for something that could be done via Bernoulli but will be quite hard to specify.

  2. Support for state-action constraints - we talked about this, but I think there could be cases where we want constraints to be a function of both state and action. Currently, doing this with either an invariant or a precondition will trigger a crash. It most likely wouldn't be supported by gym, so perhaps this is not as important for the time being.

What do you think?

Prost integration

Is it possible to use Prost as RDDLSimAgent? Is there any documentation on how to do this?

Mountain Car Reward Fn

The current definition of mountain car uses pos' and vel', while the termination criterion uses pos and vel. Should the reward function also use pos and vel rather than pos' and vel' to keep things in sync?

Fix function argument type

Hi,
The function get_instance(self, num: int) in the ExampleManager class assumes an int argument to load specific instance files. As some domains in the examples have instance files whose names include characters, like instance1c.rddl, it is not possible to use the function above. Changing the argument type to str would fix the issue.
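A minimal sketch of the proposed signature change; the stub below is hypothetical and only illustrates the str-typed argument, not the real lookup logic of ExampleManager:

class ExampleManager:   # hypothetical stub, not the real class body
    def get_instance(self, num: str) -> str:
        """Accept '1c' as well as '1' so files like instance1c.rddl resolve."""
        return f"instance{num}.rddl"   # hypothetical: build the matching file name

print(ExampleManager().get_instance("1c"))   # instance1c.rddl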

Support for gymnasium

We talked about this, but I am just putting it here together with a proposal for how to support it. Both rllib and stable-baselines require gymnasium, so we may want to switch. Their API is largely the same, so I propose the following fix in the RDDLEnv imports:

try:
    import gymnasium as gym
    gymnasium = True
except Exception:
    import gym
    gymnasium = False

Then, since gymnasium uses the "new" openai gym api interface, we need to set new_gym_api = True and the rest should take care of itself.

This way, if the user installs pyRDDLGym-rl, gymnasium will be installed automatically. If not, then pyRDDLGym requires gym, so it will default to that. For full details on how we can modify the training loop, see: https://gymnasium.farama.org/content/migration-guide/

Do we need the noop_values field in RandomAgent, do we need seed in agent.evaluate()?

Discussed in https://github.com/orgs/pyrddlgym-project/discussions/251

Originally posted by GMMDMDIDEMS February 23, 2024
I try to achieve reproducible results with the RandomAgent. As I understand it, I should get the same results by specifying a seed, e.g.:

agent = RandomAgent(
    action_space=env.action_space, num_actions=env.max_allowed_actions, seed=42
)

However, if I change the number of episodes I get different results. Shouldn't the results be identical in every episode?
What is the noop_values argument in the RandomAgent used for?

What is the purpose of the env seed that can be assigned in the def evaluate(...) method?
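A hedged sketch of what this is trying to achieve, seeding both the agent and the evaluation; the seed argument of evaluate is the one referenced in the question, and whether it makes repeated episodes reproducible is exactly what is being asked:

import pyRDDLGym
from pyRDDLGym.core.policy import RandomAgent

env = pyRDDLGym.make("Cartpole_Continuous_gym", "0")
agent = RandomAgent(action_space=env.action_space,
                    num_actions=env.max_allowed_actions,
                    seed=42)
# seed here is the evaluate() argument discussed above
stats = agent.evaluate(env, episodes=2, seed=42, verbose=True)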

About derived fluents in existing RDDL domains

Pull request #71 will break some RDDL domains due to dependency checks of derived fluents. In future domain coding, please make sure a derived CPF does not depend on an action fluent, or the dependency analysis will complain.

JAXPlanner: discrete control problems

Is it possible to use the JAXPlanner for discrete control problems?

When applying the introductory JAX tutorial to the SupplyChain problem, the empty dict is (almost) always output as the action during evaluation:

step       = 15
state      = {'demand-old___w1': 20, 'demand-old___w2': 15, 'demand-old___w3': 5, 'demand-new___w1': 18, 'demand-new___w2': 11, 'demand-new___w3': 1, 'epoch': 15, 'stock-factory': 0, 'stock-warehouse___w1': 0, 'stock-warehouse___w2': 0, 'stock-warehouse___w3': 0}
action     = {}
next state = {'demand-old___w1': 18, 'demand-old___w2': 11, 'demand-old___w3': 1, 'demand-new___w1': 15, 'demand-new___w2': 5, 'demand-new___w3': 0, 'epoch': 16, 'stock-factory': 0, 'stock-warehouse___w1': 0, 'stock-warehouse___w2': 0, 'stock-warehouse___w3': 0}
reward     = 20.0
step       = 16
state      = {'demand-old___w1': 18, 'demand-old___w2': 11, 'demand-old___w3': 1, 'demand-new___w1': 15, 'demand-new___w2': 5, 'demand-new___w3': 0, 'epoch': 16, 'stock-factory': 0, 'stock-warehouse___w1': 0, 'stock-warehouse___w2': 0, 'stock-warehouse___w3': 0}
action     = {'ship___w1': 1}
next state = {'demand-old___w1': 15, 'demand-old___w2': 5, 'demand-old___w3': 0, 'demand-new___w1': 11, 'demand-new___w2': 1, 'demand-new___w3': 1, 'epoch': 17, 'stock-factory': 0, 'stock-warehouse___w1': 0, 'stock-warehouse___w2': 0, 'stock-warehouse___w3': 0}
reward     = 12.94

However, the action produce is never part of the actions in any of the steps.

Improve branched exception handling in vectorized ops

Currently, a branched expression such as

cpf(?x) = if(pvar(?x) == 0) then default(?x) else 1.0 / pvar(?x);

will raise a divide-by-zero exception, since 1.0 / pvar(?x) must be evaluated for all ?x regardless of the branch condition.
This is an unfortunate consequence of vectorized sampling. A current workaround is to call numpy.seterr(all='ignore') before simulation, but it is not a true fix.

We should try to fix this generally.
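A minimal sketch of the workaround mentioned above; as noted, it only silences the floating-point errors and is not a true fix:

import numpy as np
import pyRDDLGym

np.seterr(all='ignore')   # suppress divide-by-zero raised by the untaken branch

env = pyRDDLGym.make("/path/to/domain.rddl", "/path/to/instance.rddl")
state, _ = env.reset()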

Bug in Reservoir visualization

If the alpha value here becomes greater than 1, then reservoir visualization breaks. Here is the stacktrace.

 File "/geode2/home/u070/palchatt/BigRed3/IPC23/main.py", line 71, in run
    obs = env.reset()
  File "/geode2/home/u070/palchatt/BigRed3/IPC23/pyRDDLGymHelper/Core/Env/RDDLEnv.py", line 228, in reset
    image = self._visualizer.render(self.state)
  File "/geode2/home/u070/palchatt/BigRed3/IPC23/pyRDDLGymHelper/Visualizer/ReservoirViz.py", line 283, in render
    fig, ax = self.render_res(curr_t)
  File "/geode2/home/u070/palchatt/BigRed3/IPC23/pyRDDLGymHelper/Visualizer/ReservoirViz.py", line 223, in render_res
    shape_rect = plt.Rectangle((init_x, init_y + interval), interval, interval / 4, fc='darkcyan',
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/_api/deprecation.py", line 454, in wrapper
    return func(*args, **kwargs)
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/patches.py", line 714, in __init__
    super().__init__(**kwargs)
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/_api/deprecation.py", line 454, in wrapper
    return func(*args, **kwargs)
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/patches.py", line 100, in __init__
    self._internal_update(kwargs)
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/artist.py", line 1223, in _internal_update
    return self._update_props(
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/artist.py", line 1199, in _update_props
    ret.append(func(v))
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/patches.py", line 379, in set_alpha
    super().set_alpha(alpha)
  File "/geode2/home/u070/palchatt/BigRed3/IPC23-venv/lib/python3.9/site-packages/matplotlib/artist.py", line 1020, in set_alpha
    raise ValueError(f'alpha ({alpha}) is outside 0-1 range')

Clipping the alpha values to the [0, 1] range should fix the problem.
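A hedged sketch of that fix, clamping the computed alpha before it reaches matplotlib (the helper and the variables in the usage comment are hypothetical, not the actual ReservoirViz code):

def safe_alpha(value):
    """Clamp an alpha value into matplotlib's required [0, 1] range."""
    return min(max(float(value), 0.0), 1.0)

# e.g. plt.Rectangle((x, y), w, h, fc='darkcyan', alpha=safe_alpha(level / capacity))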
