
rddy / mimi


Code for the paper, "First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization"

License: MIT License

Languages: Python 26.45%, Jupyter Notebook 73.55%

mimi's People

Contributors: rddy

Forkers: guyko81, q87718350

mimi's Issues

rollout_policy LunarLander

Running the lander notebook, I got this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-f979eb03d797> in <module>
      4   gp_min_kwargs=gp_min_kwargs,
      5   ep_kwargs=ep_kwargs,
----> 6   reward_model_train_kwargs=mi_model_train_kwargs
      7 )

~\anaconda3\envs\mimienv\lib\site-packages\mimi\opt.py in run(self, n_pols, n_steps_per_pol, n_eps_per_pol, gp_min_kwargs, ep_kwargs, reward_model_train_kwargs)
     80       self.param_bounds,
     81       n_calls=n_pols,
---> 82       **gp_min_kwargs
     83     )
     84     policy = self.policy_from_params(res.x)

~\anaconda3\envs\mimienv\lib\site-packages\skopt\optimizer\gp.py in gp_minimize(func, dimensions, base_estimator, n_calls, n_random_starts, n_initial_points, initial_point_generator, acq_func, acq_optimizer, x0, y0, random_state, verbose, callback, n_points, n_restarts_optimizer, xi, kappa, noise, n_jobs, model_queue_size)
    266         n_restarts_optimizer=n_restarts_optimizer,
    267         x0=x0, y0=y0, random_state=rng, verbose=verbose,
--> 268         callback=callback, n_jobs=n_jobs, model_queue_size=model_queue_size)

~\anaconda3\envs\mimienv\lib\site-packages\skopt\optimizer\base.py in base_minimize(func, dimensions, base_estimator, n_calls, n_random_starts, n_initial_points, initial_point_generator, acq_func, acq_optimizer, x0, y0, random_state, verbose, callback, n_points, n_restarts_optimizer, xi, kappa, n_jobs, model_queue_size)
    297     for n in range(n_calls):
    298         next_x = optimizer.ask()
--> 299         next_y = func(next_x)
    300         result = optimizer.tell(next_x, next_y)
    301         result.specs = specs

~\anaconda3\envs\mimienv\lib\site-packages\mimi\opt.py in cost_of_policy_params(self, policy_params)
     51   def cost_of_policy_params(self, policy_params):
     52     policy = self.policy_from_params(policy_params)
---> 53     rollouts = utils.rollout_policy(
     54       policy,
     55       self.env,

AttributeError: module 'mimi.utils' has no attribute 'rollout_policy'
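
For reference, here is a minimal, hypothetical sketch of what a rollout helper with this call signature typically does; it is not the library's actual implementation, and the keyword arguments (n_steps, ep_kwargs) are assumptions based on the truncated traceback:

def rollout_policy(policy, env, n_steps=1000, ep_kwargs=None):
  """Run one episode of `policy` in `env` and collect transitions."""
  obs = env.reset(**(ep_kwargs or {}))
  rollout = {'obses': [], 'actions': [], 'next_obses': []}
  for _ in range(n_steps):
    action = policy(obs)
    next_obs, _, done, _ = env.step(action)  # classic gym 4-tuple step API
    rollout['obses'].append(obs)
    rollout['actions'].append(action)
    rollout['next_obses'].append(next_obs)
    obs = next_obs
    if done:
      break
  return [rollout]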

format_rollouts

This is so much fun! :)
But there's one more error: in format_rollouts (utils.py) we define 'rewards': []. However, during training, slice_data tries to select the validation indices of every key-value pair, while 'rewards' is never filled (at least not in LunarLander). So it throws an IndexError, because the 'rewards' list remains empty.

I simply commented the rewards definition out; I hope the model still learns (based on the paper it should; I haven't checked that part of the code yet):

def format_rollouts(rollouts, env):
  data = {
    'obses': [],
    'actions': [],
    'next_obses': [],
    #'rewards': []  # commented out so slice_data doesn't index an empty list
  }
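
For illustration, here is a minimal numpy sketch of the failure mode, assuming slice_data indexes every key of the rollout dict with the same validation indices (the exact internals may differ):

import numpy as np

data = {
  'obses': [np.zeros(8)] * 10,
  'actions': [np.zeros(2)] * 10,
  'rewards': [],  # never populated, at least in the LunarLander env
}
val_idxs = np.array([2, 5, 7])
for key, values in data.items():
  try:
    _ = np.array(values)[val_idxs]
  except IndexError as err:
    print(key, err)  # only 'rewards' fails, because the list is empty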

question about setup

First of all, thank you for the great articles and research. I get "No module named 'disvae'" when I run "import disvae.utils.modelIO" in mimi-main\notebooks\mimi\models.py. I tried to install the module with pip, but it failed.

The pip error was: "ERROR: Could not find a version that satisfies the requirement disvae (from versions: none)" and "ERROR: No matching distribution found for disvae".

Please tell me how to install it. I was wondering whether you wrote the module yourself; if so, please tell me where it is and how to install it.

question about setting up the lander notebook

First of all, thank you for the great articles and research.

I ran into the problem "'LunarLander' object has no attribute 'helipad_idx'" while setting up the lander notebook. Please tell me what "self.goal = self.env.helipad_idx" (around line 319 of envs.py) means; maybe I can change or remove that line to fix the problem.

I changed some parts of the code to fix problems I ran into earlier, and I am wondering whether those changes may have caused this one.

1. Around line 304 of envs.py, I changed "self.n_act_dim = self.env.action_space.low.size" to "self.n_act_dim = self.env.action_space" and "self.n_env_obs_dim = self.env.observation_space.low.size" to "self.n_env_obs_dim = self.env.observation_space". I did that to fix the error "'Discrete' object has no attribute 'low'".

2. I use the following code:
mi_model_init_kwargs = {
  'n_env_obs_dim': env.n_min_env_obs_dim,
  'n_user_obs_dim': env.n_user_obs_dim,
  #'n_act_dim': env.n_act_dim,
  'n_act_dim': 4,  # hard-coded for LunarLander's Discrete(4) action space
  'n_layers': 2,
  'layer_size': 64
}

This helped me work around the error "Error converting shape to a TensorShape: Dimension value must be integer or None or have an index method, got value 'Discrete(4)' with type '<class 'gym.spaces.discrete.Discrete'>'".

It seems that I always run into problems with n_act_dim. Please help me fix this; a sketch of how the dimensions could be derived from the gym space types is shown below.
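
A minimal sketch (not the repo's code) of how both dimensions could be read from the space types instead of hard-coding them, assuming env wraps a standard gym environment such as LunarLander:

import gym

def space_dim(space):
  """Flat integer dimension for Box or Discrete gym spaces."""
  if isinstance(space, gym.spaces.Box):
    return space.low.size
  if isinstance(space, gym.spaces.Discrete):
    return space.n  # e.g. Discrete(4) -> 4 for LunarLander-v2
  raise NotImplementedError(type(space))

# In envs.py this could replace the hard-coded values:
# self.n_act_dim = space_dim(self.env.action_space)
# self.n_env_obs_dim = space_dim(self.env.observation_space)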

question: mutual information as a simple reward signal for general RL algorithms

Hi,

I thought it would be easiest to ask here: I was wondering whether one could simply compute the mutual information between the user input $x_t$ and the states $s_t$ and $s_{t+1}$, give it to the system as the reward $r_t$, and then let any general RL algorithm (TD3, DDPG, PPO) use it as its optimization target. By maximizing the mutual information, the agent could then navigate as the user requests. A rough sketch of what I mean is below.
I'm probably missing something important, but could you give me some feedback, please?
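
A rough sketch of the idea, assuming the classic gym step API and a hypothetical mi_reward_fn standing in for whatever estimator (e.g. a MINE-style critic) scores the mutual information contribution of a transition:

import gym

class MIRewardWrapper(gym.Wrapper):
  """Replace the environment reward with an MI-based score so that any
  standard RL algorithm (TD3, DDPG, PPO) can maximize it directly."""

  def __init__(self, env, mi_reward_fn, get_user_input):
    super().__init__(env)
    self.mi_reward_fn = mi_reward_fn      # hypothetical: scores (x_t, s_t, s_{t+1})
    self.get_user_input = get_user_input  # hypothetical: returns the current x_t
    self._last_obs = None

  def reset(self, **kwargs):
    self._last_obs = self.env.reset(**kwargs)
    return self._last_obs

  def step(self, action):
    next_obs, _, done, info = self.env.step(action)
    x_t = self.get_user_input()
    r_t = self.mi_reward_fn(x_t, self._last_obs, next_obs)  # MI-based reward
    self._last_obs = next_obs
    return next_obs, r_t, done, info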

Thanks!

Questions

Hello.

First of all, thank you for the great articles and research. Currently, I'm working on software for people with disabilities and trying to apply your research to it. Unfortunately, I'm not very familiar with the area of mutual information estimation and I have a bunch of questions. I hope you can provide some brief answers.

  • Can we reuse old samples? Can I collect new samples with the current interface but also keep samples collected with the old interface?

  • Can we reuse the old estimator, or does it need to be trained from scratch? Reusing the old estimator could drastically speed up convergence, but it could also introduce bias. We need to provide fast feedback to the user and can't train an estimator/interface for too long, so I'm trying to find ways to overcome this limitation.

  • In my project, I have fairly big and complex raw observations. I train a network that takes them and, in a supervised fashion, estimates the desired action. Can I use the latent representation from that network as input for MIMI? The first network provides fairly good estimates, but they are not always intuitive, so I want to apply MIMI and train a policy that samples more intuitive actions from those estimates. I'm concerned that the latent representation could cause data leakage and make MIMI collapse to a trivial solution. Obviously, I could use other ways to reduce the inputs, but they would require additional computation and might introduce other problems.

  • As I see it, we use one network to score joint (x, y) pairs and n_mine_samp=32 networks to score marginal pairs. Is that just the default implementation, or have you tried separable estimators (f(x, y) = g(x) * h(y), so we process only 2N "items" instead of N * N) and found that they fail? In my opinion, they could be much more efficient, but I don't know how crucial a role bias/variance plays here.
    Theoretically, separable estimators cover all pairs, 32 * N * N instead of just 32 * N, so they can be more restrictive. They have bias/variance issues, but since we use 32 of them, that may not be a problem. At the same time, we would have access to all the marginals, so we could estimate tf.reduce_mean(tf.exp(shuffled_stats), axis=1) more precisely.
    What do you think about that? Of course, it would be better just to test it, but that requires some time and effort. (A rough sketch of the separable-critic idea is shown after this list.)
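
For concreteness, a minimal numpy sketch of the separable-critic idea (not the repo's implementation; G and H are batch embeddings produced by hypothetical networks g and h):

import numpy as np

def dv_bound_separable(G, H):
  """Donsker-Varadhan lower bound with a separable critic f(x, y) = g(x) . h(y).

  G, H: (N, d) arrays of embeddings for the batch of x's and y's. One matrix
  product yields all N * N pair scores: the diagonal holds joint samples and
  the off-diagonal entries serve as marginal ("shuffled") samples.
  """
  scores = G @ H.T
  joint_term = np.mean(np.diag(scores))
  off_diag = scores[~np.eye(len(G), dtype=bool)]
  marginal_term = np.log(np.mean(np.exp(off_diag)))
  return joint_term - marginal_term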

Best regards.
