
Denoised MDPs: Learning World Models Better Than The World Itself

Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian

We provide a PyTorch implementation of Denoised MDPs: Learning World Models Better Than The World Itself, published in ICML 2022.

(We also provide a PyTorch implementation of Dreamer that is carefully written and verified to reproduce results. See here for usage.)

The raw real world is noisy. How can a reinforcement learning agent learn successfully from such raw data, where signal can be strongly entangled with noise? Denoised MDP characterizes information into four distinct types, based on controllability and relation to rewards, and proposes to extract a state representation space containing only information that is both controllable and reward-relevant. Under this view, several prior works can be seen as insufficiently removing noisy information.

To properly extract only the useful signal, Denoised MDP considers novel factorized MDP transition structures, where the signal representation and the noise representation are separated into distinct latent spaces. The state abstraction (i.e., representation learning) problem is thus turned into a regularized model-fitting problem: fit the factorized forward model to collected trajectories while requiring the signal latents to be minimally informative about the raw observations. A schematic sketch of this factorization is given below.
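To make the idea concrete, here is a schematic sketch (my own illustration with made-up sizes, not the repository's actual stochastic RSSM-based architecture) of such a factorized transition: the signal latent x evolves conditioned on the action and determines the reward, while the noise latent z evolves on its own.

import torch
import torch.nn as nn

class FactorizedTransition(nn.Module):
    """Schematic factorized dynamics: x is controllable and reward-relevant;
    z is uncontrollable noise whose dynamics ignore the action."""

    def __init__(self, x_dim=20, z_dim=10, action_dim=6, hidden=64):
        super().__init__()
        self.x_dynamics = nn.Sequential(  # p(x' | x, a): controllable signal
            nn.Linear(x_dim + action_dim, hidden), nn.ELU(), nn.Linear(hidden, x_dim))
        self.z_dynamics = nn.Sequential(  # p(z' | z): no action input
            nn.Linear(z_dim, hidden), nn.ELU(), nn.Linear(hidden, z_dim))
        self.reward_head = nn.Linear(x_dim, 1)  # reward depends on x only

    def forward(self, x, z, action):
        next_x = self.x_dynamics(torch.cat([x, action], dim=-1))
        next_z = self.z_dynamics(z)  # noise transitions are action-independent
        return next_x, next_z, self.reward_head(next_x)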

The resulting variational formulation (derivation in paper) successfully disentangles a variety of noise types (and also noiseless settings), outperforming baseline methods that often can only do well for certain particular noise types.

Visualizations

For environments with distinct types of noise, we visualize the latent factorization identified by Denoised MDP and by other baseline methods. Only Denoised MDP successfully disentangles signal from noise across all environments.

  • Task: Press the green button to shift the TV hue to green (a RoboDesk variant).
    True Signal: Robot joint position, TV green-ness, Green light on desk.
    True Noise: Lighting, Camera, TV content, Imperfect sensor.

    robodesk_tv_green_hue_noisy.mp4
  • Task: Make the reacher robot touch the red target object.
    True Signal: Robot joint position, Target location.
    True Noise: Background.

    dmc_reacher_easy_video_background.mp4
  • Task: Make the walker robot move forward while sensor readings are noisily affected by background images.
    True Signal: Robot joint position.
    True Noise: Background, Imperfect sensor.

    dmc_walker_walk_video_background_noisy_sensor.mp4
  • Task: Move the half-cheetah robot forward while the camera is shaky.
    True Signal: Robot joint position.
    True Noise: Background, Camera.

    dmc_cheetah_run_video_background_camera_jitter.mp4

Requirements

The code has been tested on

  • CUDA 11 with NVIDIA RTX Titan, NVIDIA 2080Ti, and NVIDIA Titan XP,
  • mujoco==2.2.0 with the EGL renderer.

Software dependencies (also in requirements.txt):

torch>=1.9.0
tqdm
numpy>=1.17.0
PIL
tensorboardX>=2.5
attrs>=21.4.0
hydra-core==1.2.0
omegaconf==2.2.1
mujoco
dm_control

Environments

The code supports the following environments:

| kind | spec | Description |
| --- | --- | --- |
| robodesk | ${TASK_NAME} or ${TASK_NAME}_noisy (e.g., tv_green_hue_noisy) | RoboDesk environment (96x96 resolution) with a diverse set of distractors (when using the ${TASK_NAME}_noisy variant). The distractors are implemented and described in detail at this RoboDesk fork. |
| dmc | ${DOMAIN_NAME}_${TASK_NAME}_${VARIANT}, with VARIANT one of [noiseless, video_background, video_background_noisy_sensor, video_background_camera_jitter] (e.g., cheetah_run_video_background_camera_jitter) | DeepMind Control (DMC) environment (64x64 resolution) with four possible variants, representing different types of noise. |

In the paper, we used the following 13 environments, all with a maximum episode length of 1000 and an action repeat of 2:

| kind | spec |
| --- | --- |
| robodesk | tv_green_hue_noisy |
| dmc | ${DOMAIN_NAME}_${TASK_NAME} in [cheetah_run, walker_walk, reacher_easy], each with all 4 VARIANT options |

NOTE: All noisy environments require the driving_car class of the Kinetics-400 training dataset. Some instructions for downloading the dataset can be found here. After downloading, you may either place it under ~/kinetics/070618/400 (so that the videos are at ~/kinetics/070618/400/train/driving_car/*.mp4) or specify the KINETICS_DIR environment variable (so that the videos are at ${KINETICS_DIR}/train/driving_car/*.mp4).
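Before launching a long run, a quick sanity check like the following (my own helper, not part of the repository) can confirm that the videos are where the code will look for them:

import glob
import os

# Mirror the lookup convention above: KINETICS_DIR if set, else the default path.
kinetics_dir = os.environ.get('KINETICS_DIR', os.path.expanduser('~/kinetics/070618/400'))
videos = glob.glob(os.path.join(kinetics_dir, 'train', 'driving_car', '*.mp4'))
assert videos, f'no driving_car videos found under {kinetics_dir}'
print(f'found {len(videos)} driving_car videos')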

Training and Evaluation

env CUDA_VISIBLE_DEVICES=0 \              # GPU ID for training
    EGL_DEVICE_ID=0 \                     # GPU ID for rendering
    KINETICS_DIR=/path/to/kinetics/ \     # Videos for noisy env
    python main.py \
        env.kind=robodesk \               # Env kind
        env.spec=tv_green_hue_noisy \     # Env spec
        learning.model_learning.kl.alpha=2 \       # alpha, weight of the KL terms
        learning.model_learning.kl.beta_y=0.125 \  # beta, smaller => stronger regularization
        learning.model_learning.kl.beta_z=0.125 \  # beta, smaller => stronger regularization
        seed=12 \                         # Seed
        output_folder=subdir/for/output/  # [Optional] subdirectory under `./results`
                                          # for storing outputs. If not given, a folder name will
                                          # be automatically constructed with information from
                                          # given config

Hyperparameter choices (see also Appendix A.2 for more details):

  • The alpha parameter is chosen proportionally to the observation size. For DMC (64x64x3 observations), we use alpha=1. For RoboDesk (96x96x3 observations), we use alpha=2.
  • The beta parameters (for the y and z components) control the regularization strength and should be set in (0, 1). Noisier environments benefit from smaller values.
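(As a rough sanity check of the proportionality, assuming alpha scales with the number of observation pixels: (96 × 96) / (64 × 64) = 2.25, consistent with alpha=2 for RoboDesk given alpha=1 for DMC.)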

Default behaviors:

  • Train the Figure 2b Denoised MDP variant for 10^6 environment steps, with 5000 steps prefilling the replay buffer, and then 100 training iterations every 100 environment steps.
  • Optimize the policy by backpropagating through the dynamics (Dreamer-style). One can switch to Soft Actor-Critic by specifying learning/policy_learning=sac (note the / rather than .).
  • Evaluate for 10 episodes every 10000 steps.
  • Visualize 3 episodes (both with full reconstruction and with the noise latents fixed) every 20000 steps.

We use Hydra to handle argument specification. You can use Hydra's override syntax to specify all sorts of config options. See config.py for the complete set of options. You can also check the config.yaml file generated in the output directory for all resolved options.

Figure 2c Variant with x, y and z Latents

To use the Figure 2c variant with three sets of latents x, y, and z, set

learning.model.transition.z.belief_size={NONZERO_Z_BELIEF_SIZE} \
learning.model.transition.z.state_size={NONZERO_Z_STATE_SIZE} \

for some non-zero belief and state sizes for the z latent component.

In Appendix B.5 of the paper, we compare a variant similar to Figure 2c, where the z prior does not depend on y. To reproduce those results on DeepMind Control Suite environments, set

learning.model.transition.x.belief_size=120 \
learning.model.transition.x.state_size=20 \
learning.model.transition.y.belief_size=70 \
learning.model.transition.y.state_size=10 \
learning.model.transition.z.belief_size=70 \
learning.model.transition.z.state_size=10 \
learning.model.transition.z_prior_uses_y=False \

Reproducing Dreamer

When the y and z latent spaces are completely turned off (i.e., empty), the code is essentially Dreamer. This can be done by setting

learning.model.transition.x.belief_size=200 \  # give `x` the dimensionality specified in the Dreamer paper
learning.model.transition.x.state_size=30 \
learning.model.transition.y.belief_size=0 \
learning.model.transition.y.state_size=0 \
learning.model.transition.z.belief_size=0 \
learning.model.transition.z.state_size=0 \

Code Structure

To make this repository easier to parse and use, we provide a detailed note on how our code is structured here.

Pre-emption

Upon receiving SIGUSR1, the provided code writes all necessary state (including the replay buffer) into a folder under the output directory (usually taking up to 10 minutes) and then exits. When the code is rerun with the same output directory, it resumes from that state (and deletes the saved state). This is particularly useful when running on a shared cluster.
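The general pattern looks roughly like this (a minimal sketch of signal-driven checkpointing, not the repository's exact implementation):

import signal

checkpoint_requested = False

def on_sigusr1(signum, frame):
    # Only set a flag: serializing a large replay buffer inside a signal
    # handler is unsafe, so the training loop performs the actual save.
    global checkpoint_requested
    checkpoint_requested = True

signal.signal(signal.SIGUSR1, on_sigusr1)

# Inside the training loop (save_state is a hypothetical helper):
#     if checkpoint_requested:
#         save_state(output_dir)  # model, optimizers, replay buffer, RNG states
#         sys.exit(0)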

Citation

Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian. "Denoised MDPs: Learning World Models Better Than The World Itself." International Conference on Machine Learning. 2022.

@inproceedings{wang2022denoisedmdps,
  title={Denoised MDPs: Learning World Models Better Than The World Itself},
  author={Wang, Tongzhou and Du, Simon S. and Torralba, Antonio and Isola, Phillip and Zhang, Amy and Tian, Yuandong},
  booktitle={International Conference on Machine Learning},
  organization={PMLR},
  year={2022}
}

If you find the RoboDesk distractor options (see this repository for more options and details) useful for your research, please also cite the following:

@misc{wang2022robodeskdistractor,
  author = {Tongzhou Wang},
  title = {RoboDesk with A Diverse Set of Distractors},
  year = {2022},
  howpublished = {\url{https://github.com/SsnL/robodesk}},
}

@misc{kannan2021robodesk,
  author = {Harini Kannan and Danijar Hafner and Chelsea Finn and Dumitru Erhan},
  title = {RoboDesk: A Multi-Task Reinforcement Learning Benchmark},
  year = {2021},
  howpublished = {\url{https://github.com/google-research/robodesk}},
}

Questions

For questions about the code provided in this repository, please open a GitHub issue.

For questions about the paper, please contact Tongzhou Wang (tongzhou _AT_ mit _DOT_ edu).

License

This repo is under CC BY-NC 4.0. Please check the LICENSE file.


denoised_mdp's Issues

Error in robodesk: ValueError: No way to determine width or height from video.

I tried to reproduce the results in the RoboDesk tv_green_hue environment. I ran into the following error when loading the videos.

2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/interaction.py", line 124, in env_interact
2022-12-16 17:25:14 [ERROR]      observation, info = env.reset()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/utils.py", line 74, in reset
2022-12-16 17:25:14 [ERROR]      observation, info = self.non_auto_reset_env.reset()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/__init__.py", line 73, in reset
2022-12-16 17:25:14 [ERROR]      obs = self.inner.reset()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/robodesk.py", line 345, in reset
2022-12-16 17:25:14 [ERROR]      return self._get_obs()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/robodesk.py", line 500, in _get_obs
2022-12-16 17:25:14 [ERROR]      return {'image': self.render(resize=True),
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/anaconda3/envs/interp/lib/python3.8/site-packages/gym/core.py", line 66, in render
2022-12-16 17:25:14 [ERROR]      return render_func(self, *args, **kwargs)
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/robodesk.py", line 223, in render
2022-12-16 17:25:14 [ERROR]      m.pre_render()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/utils.py", line 567, in pre_render
2022-12-16 17:25:14 [ERROR]      self.ensure_mujoco_updated()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/utils.py", line 594, in ensure_mujoco_updated
2022-12-16 17:25:14 [ERROR]      self.ensure_texure_updated()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/utils.py", line 573, in ensure_texure_updated
2022-12-16 17:25:14 [ERROR]      img = self.tv_source.get_image()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/video_source.py", line 131, in get_image
2022-12-16 17:25:14 [ERROR]      [s.get_image() for s in self.sources],
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/video_source.py", line 131, in <listcomp>
2022-12-16 17:25:14 [ERROR]      [s.get_image() for s in self.sources],
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/video_source.py", line 103, in get_image
2022-12-16 17:25:14 [ERROR]      self.load_frames_if_needed()
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/research/denoised_mdp/denoised_mdp/envs/robodesk/robodesk/robodesk/video_source.py", line 70, in load_frames_if_needed
2022-12-16 17:25:14 [ERROR]      video = skvideo.io.vread(video_f)
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/anaconda3/envs/interp/lib/python3.8/site-packages/skvideo/io/io.py", line 144, in vread
2022-12-16 17:25:14 [ERROR]      reader = FFmpegReader(fname, inputdict=inputdict, outputdict=outputdict, verbosity=verbosity)
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/anaconda3/envs/interp/lib/python3.8/site-packages/skvideo/io/ffmpeg.py", line 44, in __init__
2022-12-16 17:25:14 [ERROR]      super(FFmpegReader,self).__init__(*args, **kwargs)
2022-12-16 17:25:14 [ERROR]    File "/nfs/users/ext_yuren.liu/anaconda3/envs/interp/lib/python3.8/site-packages/skvideo/io/abstract.py", line 115, in __init__
2022-12-16 17:25:14 [ERROR]      raise ValueError(
2022-12-16 17:25:14 [ERROR]  ValueError: No way to determine width or height from video. Need `-s` in `inputdict`. Consult documentation on I/O.

The command line used to run the experiment is as follows:

env CUDA_VISIBLE_DEVICES=0 EGL_DEVICE_ID=0 KINETICS_DIR=/nfs/users/ext_yuren.liu/research/kinetics-downloader/dataset HYDRA_FULL_ERROR=1 python main.py env.kind=robodesk env.spec=tv_green_hue_noisy learning.model_learning.kl.alpha=2 learning.model_learning.kl.beta_y=0.125 learning.model_learning.kl.beta_z=0.125 seed=12

I downloaded many videos (about 33 GB), as the readme suggested, into the directory /nfs/users/ext_yuren.liu/research/kinetics-downloader/dataset/train/driving_car.

[screenshot of the downloaded videos]

I tried to solve this problem by searching on Google. The following answer did not work for me, because the error occurred when the very first video was loaded.

scikit-video/scikit-video#60

finger_to_target_dist() got an unexpected keyword argument 'maybe_noisy'

Hi, I ran into a new error. When I run reacher_easy, it shows:

denoised_mdp_origin/denoised_mdp/envs/dmc/dmc2gym/local_dm_control_suite/reacher.py", line 124, in get_reward
2023-05-05 22:39:31 [ERROR] return rewards.tolerance(physics.finger_to_target_dist(maybe_noisy=True, success_radius=radius), (0, radius))
2023-05-05 22:39:31 [ERROR] TypeError: finger_to_target_dist() got an unexpected keyword argument 'maybe_noisy'

How can I solve this?

TypeError: Multiple inheritance with NamedTuple is not supported

On Python 3.9, multiple inheritance with NamedTuple does not seem to be supported.
I am uncertain whether a different Python version fixes this, but if so, the required Python version should be stated in the README.

Traceback (most recent call last):
  File "/home/sebbo/Projekte/denoised_mdp/main.py", line 32, in <module>
    from denoised_mdp.envs import (
  File "/home/sebbo/Projekte/denoised_mdp/denoised_mdp/__init__.py", line 7, in <module>
    from . import envs
  File "/home/sebbo/Projekte/denoised_mdp/denoised_mdp/envs/__init__.py", line 19, in <module>
    from .interaction import env_interact_random_actor, env_interact_with_model, EnvInteractData
  File "/home/sebbo/Projekte/denoised_mdp/denoised_mdp/envs/interaction.py", line 49, in <module>
    class EnvInteractData(Generic[StateT], NamedTuple):  # thank god we py37 https://stackoverflow.com/a/50531189
  File "/usr/lib/python3.9/typing.py", line 1929, in _namedtuple_mro_entries
    raise TypeError("Multiple inheritance with NamedTuple is not supported")
TypeError: Multiple inheritance with NamedTuple is not supported

Perhaps this could be solved by using a dataclass instead of a NamedTuple, as is done in other places. A sketch follows.
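For illustration, here is a minimal sketch of the failing pattern and the dataclass alternative (simplified fields of my own; the real EnvInteractData carries more):

from dataclasses import dataclass
from typing import Generic, TypeVar

StateT = TypeVar('StateT')

# Raises "TypeError: Multiple inheritance with NamedTuple is not supported"
# on Python 3.9, as in the traceback above:
#
#     class EnvInteractData(Generic[StateT], NamedTuple):
#         state: StateT
#         reward: float

# A frozen dataclass expresses the same immutable-record intent portably:
@dataclass(frozen=True)
class EnvInteractData(Generic[StateT]):
    state: StateT
    reward: float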

A question about the test_env_seed

I found that in your code, no matter what changes are made to the main seed, the test_env_seed is never affected. In other words, runs with different training seeds all evaluate with the same seed (test_seed: int = 1841). Is this choice intentional? Or should I split off one more seed for evaluation, something like this:

# original seeding at https://github.com/facebookresearch/denoised_mdp/blob/main/main.py#L546C5-L546C104
torch_seed, np_seed, data_collect_env_seed, replay_buffer_seed = split_seed(cast(int, cfg.seed), 4)
# to
torch_seed, np_seed, data_collect_env_seed, replay_buffer_seed, test_env_seed = split_seed(cast(int, cfg.seed), 5)

A question about visualization

I find that the floor appears in the visualization image when I run a video-background experiment, like this:

[screenshot: test_episode_0005000]

And I notice that the video shown in the repository has no floor:

[screenshot]

Is what I am seeing expected?
Thanks.

TIA and Dreamer Question

Hi, thanks for the great paper and great code!

I am wondering:

  1. Can I reproduce the TIA results using this codebase?
  2. You said the RSSM is modified from the unofficial PyTorch implementation with some inconsistencies fixed. May I know what kind of inconsistencies, since I am using the unofficial PyTorch implementation as well?

Thank you

y_prior_state is not defined in the false branch

Hi, I met this error :

RuntimeError:

y_prior_state is not defined in the false branch:
File "/data0/svc4/code/rl/denoised_mdp/denoised_mdp/agents/networks/transition.py", line 607
x_prior_states[t] = x_prior_state

        if self.y_belief_size > 0:  # JIT doesn't seem to like empty tensors that much, so use this
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            # [Y prior] Compute belief (deterministic hidden state)
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            y_belief = self.y_rnn(
            ~~~~~~~~~~~~~~~~~~~~~~
                self.y_state_pre_rnn(prev_y_state),
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                prev_y_belief,
                ~~~~~~~~~~~~~~
            )
            ~
            # [Y prior] Compute state prior by applying transition dynamics
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            y_prior_mean, _y_prior_stddev = torch.chunk(self.y_belief_to_state_prior(y_belief), 2, dim=-1)
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            y_prior_stddev = F.softplus(_y_prior_stddev) + self.min_stddev
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            y_prior_state = y_prior_mean + y_prior_stddev * y_prior_noises[t]
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            # [Y prior] save results
            ~~~~~~~~~~~~~~~~~~~~~~~~
            y_beliefs[t] = y_belief
            ~~~~~~~~~~~~~~~~~~~~~~~
            y_prior_means[t] = y_prior_mean
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            y_prior_stddevs[t] = y_prior_stddev
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            y_prior_states[t] = y_prior_state
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        else:
        ~~~~~
            y_belief = y_beliefs[t]
            ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        # [XY posterior]

and was used here:
File "/data0/svc4/code/rl/denoised_mdp/denoised_mdp/agents/networks/transition.py", line 663
y_posterior_state = y_posterior_states[t]
x_state_for_z_belief = x_prior_state
y_state_for_z_belief = y_prior_state
~~~~~~~~~~~~~ <--- HERE

        if self.z_belief_size > 0:

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I didn't change any code in the original project. Thanks for any suggestions.
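For context, this message comes from a TorchScript rule: a variable used after an if must be defined in every branch. A minimal standalone reproduction and fix (my own sketch, unrelated to the repository's code):

import torch

# This fails to script with "y is not defined in the false branch":
#
#     @torch.jit.script
#     def broken(flag: bool, t: torch.Tensor) -> torch.Tensor:
#         if flag:
#             y = t + 1
#         return y

@torch.jit.script
def fixed(flag: bool, t: torch.Tensor) -> torch.Tensor:
    y = torch.zeros_like(t)  # defined unconditionally, visible in both branches
    if flag:
        y = t + 1
    return y

print(fixed(False, torch.ones(2)))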

Cannot run the experiments. KeyError: 'planning_horizon'

I ran across this error when I tried to run the experiments using the following command.

env CUDA_VISIBLE_DEVICES=0 EGL_DEVICE_ID=0 KINETICS_DIR=/nfs/users/ext_yuren.liu/research/kinetics-downloader/dataset HYDRA_FULL_ERROR=1 python main.py env.kind=robodesk env.spec=tv_green_hue_noisy learning.model_learning.kl.alpha=2 learning.model_learning.kl.beta_y=0.125 learning.model_learning.kl.beta_z=0.125 seed=12

The error information is as follows:

 File "/nfs/users/ext_yuren.liu/research/denoised_mdp/config.py", line 256, in to_config_and_instantiate
    global_config = convert(dict_cfg, Config)
  File "/nfs/users/ext_yuren.liu/research/denoised_mdp/config.py", line 249, in convert
    kwargs[k.lstrip('_')] = convert(subv, fields[k].type)
  File "/nfs/users/ext_yuren.liu/research/denoised_mdp/config.py", line 249, in convert
    kwargs[k.lstrip('_')] = convert(subv, fields[k].type)
  File "/nfs/users/ext_yuren.liu/research/denoised_mdp/config.py", line 249, in convert
    kwargs[k.lstrip('_')] = convert(subv, fields[k].type)
KeyError: 'planning_horizon'

I added some code to print the fields and the values of v.

def convert(v: Any, desired_ty: Type):
    if isinstance(v, DictConfig):
        ty = OmegaConf.get_type(v)
        assert issubclass(ty, desired_ty)
        if attrs.has(ty):
            fields = attrs.fields_dict(ty)
            print('fields keys: #################### \n', fields.keys())
            print('fields values: #################### \n', fields.values())
            kwargs = {}
            print('v.keys(): #################### \n', v.keys())
            for k, subv in v.items():
                # sigh, attrs auto-strips the leading underscore, so we have to
                # do this manually rather than using `OmegaConf.to_object`
                # print(k, subv)
                # print('fields type: ', fields[k].type)
                kwargs[k.lstrip('_')] = convert(subv, fields[k].type)
            v = ty(**kwargs)
    elif isinstance(desired_ty, type) and issubclass(desired_ty, enum.Enum):
        if isinstance(v, str) and v != MISSING:
            v = desired_ty[v]
    return v

The printed information shows that the keys of fields and v do not match. It seems that fields comes from BasePolicyLearning.Config, while v is a DynamicsBackpropagateActorCritic.Config.

fields keys: ####################
dict_keys(['_target_', '_partial_', 'discount', 'actor_lr', 'actor_grad_clip_norm'])

v.keys(): ####################
 dict_keys(['_partial_', 'discount', 'actor_lr', 'actor_grad_clip_norm', '_target_', 'planning_horizon', 'lambda_return_discount', 'value', 'value_lr', 'value_grad_clip_norm'])

I didn't change any code in the original project. My Python version is 3.8.13, and packages were installed via pip install -r requirements.txt.

AssertionError when Results Directory does not exist

When trying the code for the first time, config validation throws an error because the default results path does not exist.

Traceback (most recent call last):
  File "/home/sebbo/Projekte/denoised_mdp/main.py", line 48, in <module>
    from config import InstantiatedConfig, to_config_and_instantiate
  File "/home/sebbo/Projekte/denoised_mdp/config.py", line 227, in <module>
    cs.store(name='config', node=Config())
  File "<attrs generated init config.Config>", line 24, in __init__
  File "/home/sebbo/Projekte/denoised_mdp/config.py", line 80, in exists_path
    assert os.path.exists(value)
AssertionError

This could be made easier by automatically creating the directory when it does not exist.
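One possible fix along those lines (a sketch assuming the attrs-validator signature suggested by the traceback; exact names may differ) is to create the directory before asserting:

import os

def exists_path(instance, attribute, value):
    # Create the default results directory on first use instead of failing.
    os.makedirs(value, exist_ok=True)
    assert os.path.exists(value)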
