homework's People

Contributors

abhishekunique, carmezim, cbfinn, gkahn13, joschu, katerakelly, mbchang, rddy, snasiriany, svlevine

homework's Issues

[README] Mujoco Requirement

Hi folks, first of all, thanks for the amazing material and work you open sourced. It is an outstanding and generous gift to the community.

I wonder whether it would be worth adding a side note about MuJoCo 1.3 on recent Macs.
That version did not support NVMe disks, and there is still an open issue with more details about it.

Cost function in HW4, index error, dimension of states

Hi,

It seems the cost function for the half-cheetah environment (cost_functions.py) has a mistake. I'm getting the following error:

---> 50     score -= (next_state[17] - state[17]) / 0.01 #+ 0.1 * (np.sum(action**2))
IndexError: index 17 is out of bounds for axis 0 with size 17

Is the dimension 17 correct for the environment? Thanks :)
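
For context on the error message: Python indexing is zero-based, so a state vector of size 17 only has valid indices 0 through 16, and index 17 would require a state of at least 18 entries. The following is a minimal sketch of a bounds check that reports this more directly. The helper name `safe_component` is hypothetical and not part of cost_functions.py, and the sketch does not say which index the cost function should actually use.

```python
def safe_component(vec, idx):
    """Return vec[idx], raising a descriptive error if idx is outside the vector."""
    n = len(vec)
    if not 0 <= idx < n:
        raise IndexError(
            f"index {idx} is out of range for a state of size {n}; "
            f"valid indices are 0..{n - 1}"
        )
    return vec[idx]


# A 17-dimensional state, as reported in the traceback above.
state = [0.0] * 17
last = safe_component(state, 16)  # the last valid entry
```

Calling `safe_component(state, 17)` on this state raises the descriptive IndexError instead of the bare NumPy message.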

`gym.benchmark_spec` no longer valid

I got AttributeError: module 'gym' has no attribute 'benchmark_spec' thrown from this line.

gym no longer has this experimental feature. As a workaround, I manually set env = gym.make(game_name) and max_timestep, as mentioned here.
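
The workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `make_task` is a hypothetical helper, the gym module is passed in as an argument so the sketch runs without gym installed, and the game name and step budget are placeholder values that you would choose yourself.

```python
from types import SimpleNamespace


def make_task(gym_module, game_name, max_timesteps):
    """Build the env and step budget by hand instead of reading them
    from the removed gym.benchmark_spec feature."""
    env = gym_module.make(game_name)
    return SimpleNamespace(env=env, env_id=game_name, max_timesteps=max_timesteps)


# Stand-in for the real gym module, used only so this sketch is runnable.
# In real use, pass the imported `gym` module instead.
fake_gym = SimpleNamespace(make=lambda name: "env:" + name)

# Placeholder game name and budget (assumptions, not values from the homework).
task = make_task(fake_gym, "game_name", 1000)
```

The training code can then read `task.env` and `task.max_timesteps` wherever it previously relied on the benchmark task.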

Copyrights of lecture notes

Thanks for the amazing work.
I am currently writing my own lecture notes based on the cs294-fall2017 course, mainly to clarify my understanding of some of the ideas and math. I am also planning to open source them (here is the repo), but I don't know whether there are any copyright issues I should be aware of. Do you think this is appropriate?

Could you please release some experiment result so I can check if my code is right?

I am not a student of this class; I am studying it by watching the lecture videos.
I am doing the homework for practice. I know it's not appropriate to release the answers, but would it be possible to release some experiment results so I can check for myself whether my answers are right?

For example:
when I follow the guide, complete the code for an assignment, and run the experiments the guide suggests, the guide would tell me what results to expect. If my results deviate too much, then my code is probably wrong.

Port HW1 over to use OpenAI RoboSchool

OpenAI has re-implemented the MuJoCo environments in Bullet, making them available to everyone. It would be really helpful for people following the course after the fact if HW1 could be ported over to use these new environments.

I could probably spend some time working on this, but I'm not sure how I would generate new expert policies.

ReplayBuffer - a subtle bug around head pointer boundary

Hi!

Looking at the _encode_observation function it seems you have a subtle bug in there.

Namely, you're only handling the start_idx edge case where the buffer is still not full and start_idx is negative.

But even in the case where the buffer is full but start_idx crosses the buffer's head pointer boundary you'll be stacking fresh experience with super old experience (especially in a 1M slot buffer).

The bigger the buffer, the less probable this event is, and even when it happens, it is a low-frequency event that is unlikely to affect the learned Q-function. I still thought it was worth flagging.

Since I'm implementing my own version of DQN, here is a snippet showing how I handle the start index:

    def _handle_start_index_edge_cases(self, start_index, end_index):
        # Edge case 1: the buffer is not full yet, so negative indices point at empty slots.
        if not self._buffer_full() and start_index < 0:
            start_index = 0

        # Edge case 2: the start index crosses the buffer head pointer. The data before and
        # after the head pointer belongs to completely different episodes.
        if self._buffer_full():
            if 0 < (self.current_free_slot_index - start_index) % self.max_buffer_size < self.num_previous_frames_to_fetch:
                start_index = self.current_free_slot_index

        return start_index

where my num_previous_frames_to_fetch is your frame_history_len.
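
To make the head-pointer case concrete, here is a standalone sketch of the same idea as a pure function. It is a variation rather than the code from either implementation: the name `clamp_start_index` and its parameters are hypothetical, and it assumes frames are later read with `range(start, end)` and `index % buffer_size`, so it returns `start_index + gap` (which equals the head slot modulo the buffer size) in order to keep the start below the end index.

```python
def clamp_start_index(start_index, head, buffer_size, history_len, buffer_full):
    """Return a start index whose frame window does not straddle the write head.

    `head` is the next slot to be overwritten: slots just before it hold the
    newest frames, and slots from `head` onward hold the oldest ones.
    """
    if not buffer_full:
        # Negative indices would wrap around to slots that were never written.
        return max(start_index, 0)

    # Forward distance on the ring from the window start to the write head.
    gap = (head - start_index) % buffer_size
    if 0 < gap < history_len:
        # The window would mix the newest and the oldest frames,
        # so move the start forward onto the head slot.
        return start_index + gap
    return start_index


# Buffer of 10 slots, 4-frame history, head at slot 9, window ending before slot 1.
# The unclamped window covers slots 7, 8, 9, 0 and therefore crosses the head.
start = clamp_start_index(-3, head=9, buffer_size=10, history_len=4, buffer_full=True)
slots = [i % 10 for i in range(start, 1)]
```

In this example the window shrinks to slots 9 and 0, which are contiguous in time, instead of stacking the two newest frames (slots 7 and 8) on top of the two oldest ones.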

hw4 and HalfCheetah pybullet/roboschool

Hi all,
I'm trying to figure out how to port the cost function of the MuJoCo HalfCheetah to the pybullet HalfCheetah for model-based RL. The state vector is not the same in the two environments: it is 26-dim for pybullet instead of 20-dim. Any idea how to implement the cost function so that it produces the same behaviour in both environments?

AttributeError for get_wrapper_by_name(env, "Monitor") in hw3

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-005c8955d609> in <module>()
     12     frame_history_len=4,
     13     target_update_freq=10000,
---> 14     grad_norm_clipping=10
     15 )

/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
    135     for t in count():
    136         ### Check stopping criterion
--> 137         if stopping_criterion is not None and stopping_criterion(env, t):
    138             break
    139 

<ipython-input-11-ab71394a8fd6> in stopping_criterion(env, t)
      2         # notice that here t is the number of steps of the wrapped env,
      3         # which is different from the number of steps in the underlying env
----> 4         return get_wrapper_by_name(env, "Monitor").get_total_steps() >= num_timesteps

AttributeError: '_Monitor' object has no attribute 'get_total_steps'
AttributeError                            Traceback (most recent call last)
<ipython-input-9-005c8955d609> in <module>()
     12     frame_history_len=4,
     13     target_update_freq=10000,
---> 14     grad_norm_clipping=10
     15 )

/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
    201 
    202         ### 4. Log progress
--> 203         episode_rewards = get_wrapper_by_name(env, "Monitor").get_episode_rewards()
    204         if len(episode_rewards) > 0:
    205             mean_episode_reward = np.mean(episode_rewards[-100:])

AttributeError: '_Monitor' object has no attribute 'get_episode_rewards'

It seems that I can't access any of the expected attributes on the object returned by get_wrapper_by_name(env, "Monitor").

I have updated my gym version, but that didn't help.

hw1 dropout implementation

In HW1's tf_utils.py, I found the following dropout implementation:

def dropout(x, pkeep, phase=None, mask=None):
    mask = tf.floor(pkeep + tf.random_uniform(tf.shape(x))) if mask is None else mask
    if phase is None:
        return mask * x
    else:
        return switch(phase, mask*x, pkeep*x)

In the test phase, should it be x/pkeep instead of x*pkeep (based on inverted dropout)? If not, why?

Thanks for your explanation.
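
One way to reason about the question is to compare expected values. The sketch below assumes that the first branch of switch (mask*x) is the training branch. In that reading, training multiplies by the mask without dividing by pkeep, so the expected training output is pkeep*x, and scaling by pkeep at test time matches it. Dividing by pkeep belongs to the inverted variant, where training uses mask*x/pkeep and test time uses x unchanged. The function names are hypothetical and the code uses plain Python rather than TensorFlow.

```python
import random


def dropout_train(x, pkeep, rng):
    """Non-inverted dropout: keep each unit with probability pkeep, no rescaling."""
    return [xi if rng.random() < pkeep else 0.0 for xi in x]


def dropout_test(x, pkeep):
    """Test-time counterpart: scale by pkeep to match the training-time mean."""
    return [pkeep * xi for xi in x]


rng = random.Random(0)
pkeep = 0.8
x = [2.0] * 100_000

train_mean = sum(dropout_train(x, pkeep, rng)) / len(x)
test_mean = sum(dropout_test(x, pkeep)) / len(x)
wrong_test_mean = sum(xi / pkeep for xi in x) / len(x)  # x/pkeep at test time
```

Here `train_mean` and `test_mean` both land near pkeep * 2.0, while `wrong_test_mean` is 2.0 / pkeep, so x/pkeep at test time would not match what the network saw during training with this particular training branch.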

HW1 Setup

I encountered two issues when trying to follow the setup instructions for HW1.

  1. The latest version of mujoco-py wants MuJoCo 1.50 rather than MuJoCo 1.31.
  2. The XXX-v1 environments are no longer available in the latest version of gym; I had to use XXX-v2 instead.
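
For the second point, the rename can be applied mechanically to environment IDs. This is a small sketch under the assumption that IDs end in a -vN suffix; `bump_env_version` is a hypothetical helper, and whether a given v2 environment behaves the same as its v1 counterpart is not something this sketch checks.

```python
import re


def bump_env_version(env_id, new_version=2):
    """Replace the trailing -vN suffix of a gym-style environment ID."""
    return re.sub(r"-v\d+$", f"-v{new_version}", env_id)


# "XXX" stands for the environment name, as in the list above.
new_id = bump_env_version("XXX-v1")
```

IDs without a version suffix are returned unchanged.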

I cannot find tensorflow 1.10.5

When I use conda search, I can find versions 1.10.0, 1.11.0, and so on, but not 1.10.5. I don't know whether it would matter if I chose version 1.10.0 instead.

Or where could I find version 1.10.5?

Sorry, I am new to TensorFlow.
