berkeleydeeprlcourse / homework



Assignments for CS294-112.

License: MIT License

Python 92.56% Shell 0.74% TeX 6.71%

homework's People


kalugny conansi tsingjinyun dotchen jeiting jxwuyi shivajid manu34414 gojira mcdavid109 vertix o-tawab shi69 vladimir84 alok botyue aidsj yongduek ian09 williamd4112 tmannen shuanglu66liu gwding vashishtmadhavan hamzamerzic nottombrown yuxuanhuang yangarbiter junhongxu shi-yan zidanmusk abdullahmohamed55 ymao1993 amiltonwong fangyizhang timsl mimoralea valdersoul th-yong luofan18 zimuw scitator carmezim kkweon haanvid alexseong ebtech yangwenca lgvaz g-wang minghchen ariesll ajinomotos xmyqsh shmuma cthorey janehanzhen rossjwark markcsie radu ajaytalati stevenxxiu zhanghc12 iretiayo zhengliz kmather73 emo-eth kevinleestone johannah jerome89 akki2825 kitu2007 rogertrullo tenzindj chenglongchen vdpappu jaisanliang hippogriff anair13 adityab arendu-zz importsysu timwee nathanin magnusja mquad guoxs girishsk l1aoxingyu dhawgupta kvony litoeknee vladprytula rockt peratham ethanluoyc robindume karen7j brahmasp brennannichyporuk

homework's Issues

hw2 instruction, Probable Typo?

Hi,
At the end of page three of the HW2 instructions (Problem 1.a), a few equations sum over i. Shouldn't they sum over t?

[README] Mujoco Requirement

Hi folks, first of all, thanks for the amazing material and work you open sourced, outstanding and generous gift to the community.

I wondered whether it would be worth adding a side note about MuJoCo 1.3 on recent Macs: that version doesn't support NVMe disks, and this is still an open issue (which has more details).

Cost function in HW4, index error, dimension of states

Hi,

It seems the cost function for the half-cheetah environment (cost_functions.py) has a mistake. I'm getting the error

---> 50     score -= (next_state[17] - state[17]) / 0.01 #+ 0.1 * (np.sum(action**2))
IndexError: index 17 is out of bounds for axis 0 with size 17

Is a state dimension of 17 correct for this environment? Thanks :)
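An IndexError like this means the hard-coded index does not exist in the observation the environment actually returns: an array of size 17 only has indices 0 through 16. The standard Gym HalfCheetah observation is 17-dimensional, whereas state[17] assumes a longer state vector (the pybullet issue further down mentions a 20-dimensional MuJoCo state for this homework). The sketch below is hypothetical: forward_progress_cost and x_index are illustrative names, not part of cost_functions.py. It computes the same forward-velocity term as the quoted line but checks the index first, so a mismatch fails with a clearer message.

```python
import numpy as np


def forward_progress_cost(state, next_state, x_index, dt=0.01):
    """Cost term equal to minus the forward velocity of one coordinate.

    Hypothetical helper: x_index must point at a position coordinate that
    really exists in the environment's observation vector.
    """
    state = np.asarray(state)
    next_state = np.asarray(next_state)
    if state.shape != next_state.shape:
        raise ValueError(f"shape mismatch: {state.shape} vs {next_state.shape}")
    size = state.shape[0]
    if not 0 <= x_index < size:
        # Give a clearer message than NumPy's generic IndexError.
        raise IndexError(
            f"x_index={x_index} is invalid for a state of size {size} "
            f"(valid indices are 0..{size - 1})"
        )
    # Moving forward (positive displacement) lowers the cost.
    return -(next_state[x_index] - state[x_index]) / dt
```

In the original loop this value would be added to score, which plays the role of the quoted `score -= (next_state[17] - state[17]) / 0.01`.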

`gym.benchmark_spec` no longer valid

I got AttributeError: module 'gym' has no attribute 'benchmark_spec', raised from this line.

Gym no longer has this experimental feature. As a workaround, I set env = gym.make(game_name) and the maximum number of timesteps manually, as mentioned here.
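One way to keep the rest of the script unchanged is to replace the benchmark task with a small record holding the environment ID and the timestep budget, and to build the environment with gym.make directly. TaskSpec and make_env are hypothetical names; the field names env_id and max_timesteps are chosen on the assumption that they mirror the task attributes the script reads. The values themselves are yours to choose, since the removed benchmark used to supply them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical stand-in for one task of the removed gym.benchmark_spec(...)."""
    env_id: str         # e.g. an Atari ID such as "PongNoFrameskip-v4"
    max_timesteps: int  # timestep budget passed on to the learning loop


def make_env(task):
    """Create the environment directly with gym.make instead of via the benchmark."""
    import gym  # imported here so TaskSpec itself can be used without gym installed
    return gym.make(task.env_id)
```

Usage would look like task = TaskSpec("PongNoFrameskip-v4", max_timesteps=N), then env = make_env(task), passing task.max_timesteps wherever the script previously used the benchmark task's budget. Seeding and the Monitor/Atari wrappers from the original script still need to be applied as before.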

Thanks for the amazing work.
I am writing lecture notes based on the CS294 Fall 2017 course, partly to clarify my own understanding of some of the ideas and math. I plan to open-source them (here is the repo), but I don't know whether there are any copyright issues I should be aware of. Do you think it would be appropriate?

Could you please release some experimental results so I can check whether my code is right?

I am not a student in this class; I am studying it by watching the lecture videos.
I am working through the homework for practice. I know it's not appropriate to release solutions, but would it be possible to release some experimental results so I can check whether my answers are right?

For example:
when I complete the code for an assignment following the guide and run the experiments it suggests, the guide would tell me what results to expect; if my results deviate too much, my implementation is probably wrong.

Port HW1 over to use OpenAI RoboSchool

OpenAI has re-implemented the MuJoCo environments in Bullet, making them available to everyone. It would be really helpful for people following the course after the fact if HW1 could be ported to use these new environments.

I could probably spend some time working on this, but I'm not sure how I would generate new expert policies.

ReplayBuffer - a subtle bug around head pointer boundary

Hi!

Looking at the _encode_observation function, it seems there is a subtle bug in it.

Namely, you only handle the start_idx edge case where the buffer is not yet full and start_idx is negative.

But even when the buffer is full, if the window starting at start_idx crosses the buffer's head pointer, you will stack fresh experience with very old experience (especially in a 1M-slot buffer).

The bigger the buffer, the less likely this event is, and even when it happens it is rare enough that it probably won't affect the learned Q-function, but I still thought it was worth flagging.

Since I'm implementing my own version of DQN, here is a snippet showing how I handle the start index:

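    def _handle_start_index_edge_cases(self, start_index, end_index):
        # Edge case 1: the buffer is not yet full, so there is no valid data
        # before slot 0; clamp the window to start at the first stored frame.
        if not self._buffer_full() and start_index < 0:
            start_index = 0

        # Edge case 2: the buffer is full and the window crosses the head pointer.
        # The data just before the head pointer is the newest and the data from the
        # head pointer onward is the oldest, so they belong to completely different
        # episodes; start the window at the head pointer instead.
        if self._buffer_full():
            if 0 < (self.current_free_slot_index - start_index) % self.max_buffer_size < self.num_previous_frames_to_fetch:
                start_index = self.current_free_slot_index

        # Return the adjusted index; otherwise the caller never sees the fix.
        return start_index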

where my num_previous_frames_to_fetch is your frame_history_len.
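The same boundary logic can be written as a standalone function, which makes it easy to check on a small buffer. This is a sketch under assumptions: idx is the slot of the frame being encoded, next_idx is the head pointer (the next slot to be written), and num_in_buffer and size describe how full the buffer is. These names are meant to mirror the hw3 ReplayBuffer attributes, but that correspondence is assumed here. Like the snippet above, it ignores episode boundaries marked by done flags, which still have to be handled separately.

```python
def window_start(idx, frame_history_len, next_idx, num_in_buffer, size):
    """Return the first slot to stack when encoding the frame stored at slot idx.

    Slots from the returned start up to idx (wrapping modulo size) form one
    contiguous stretch of time; the caller zero-pads any missing frames.
    Episode boundaries (done flags) are NOT handled here.
    """
    end = idx + 1                    # exclusive end of the window
    start = end - frame_history_len  # may be negative before wrapping

    if num_in_buffer < size:
        # Edge case 1: buffer not yet full, so nothing valid exists before slot 0.
        return max(start, 0)

    # Edge case 2: buffer full. Slots just before next_idx hold the newest data and
    # slots from next_idx onward hold the oldest, so a window that contains
    # next_idx after its first slot mixes unrelated episodes. Start it at next_idx.
    if 0 < (next_idx - start) % size < frame_history_len:
        return next_idx
    return start % size
```

For example, with size=10, a full buffer and next_idx=5, encoding slot 6 with frame_history_len=4 returns 5, so the newest frames in slots 3 and 4 are not stacked in front of the oldest frames in slots 5 and 6.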

hw4 and HalfCheetah pybullet/roboschool

Hi all,
I'm trying to figure out how to port the HalfCheetah MuJoCo cost function to HalfCheetah in pybullet for model-based RL. The state vectors differ between the two environments: 26-dim in pybullet versus 20-dim in MuJoCo. Any idea how to implement the cost function so that it produces the same behavior in both environments?

AttributeError for get_wrapper_by_name(env, "Monitor") in hw3

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-005c8955d609> in <module>()
     12     frame_history_len=4,
     13     target_update_freq=10000,
---> 14     grad_norm_clipping=10
     15 )

/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
    135     for t in count():
    136         ### Check stopping criterion
--> 137         if stopping_criterion is not None and stopping_criterion(env, t):
    138             break
    139 

<ipython-input-11-ab71394a8fd6> in stopping_criterion(env, t)
      2         # notice that here t is the number of steps of the wrapped env,
      3         # which is different from the number of steps in the underlying env
----> 4         return get_wrapper_by_name(env, "Monitor").get_total_steps() >= num_timesteps

AttributeError: '_Monitor' object has no attribute 'get_total_steps'

AttributeError                            Traceback (most recent call last)
<ipython-input-9-005c8955d609> in <module>()
     12     frame_history_len=4,
     13     target_update_freq=10000,
---> 14     grad_norm_clipping=10
     15 )

/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
    201 
    202         ### 4. Log progress
--> 203         episode_rewards = get_wrapper_by_name(env, "Monitor").get_episode_rewards()
    204         if len(episode_rewards) > 0:
    205             mean_episode_reward = np.mean(episode_rewards[-100:])

AttributeError: '_Monitor' object has no attribute 'get_episode_rewards'

It seems that I can't access any of these attributes on the wrapper returned by get_wrapper_by_name(env, "Monitor").

I have updated gym, but it didn't help.
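If the installed gym's Monitor does not expose these methods, one workaround is to track the statistics yourself rather than rely on Monitor internals. The wrapper below is a hypothetical sketch (EpisodeStatsWrapper is not part of gym or the homework code). It assumes the classic gym API where step() returns (obs, reward, done, info), and it should sit directly around the raw environment, where Monitor normally sits, so that done marks a true end of episode rather than, say, a lost life.

```python
class EpisodeStatsWrapper:
    """Hypothetical wrapper that records total steps and per-episode rewards.

    Assumes the classic gym step API: (obs, reward, done, info).
    """

    def __init__(self, env):
        self.env = env
        self.total_steps = 0
        self.episode_rewards = []
        self._current_reward = 0.0

    def reset(self, **kwargs):
        # Start accumulating reward for a fresh episode.
        self._current_reward = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.total_steps += 1
        self._current_reward += reward
        if done:
            # Close out the finished episode's return.
            self.episode_rewards.append(self._current_reward)
            self._current_reward = 0.0
        return obs, reward, done, info

    def get_total_steps(self):
        return self.total_steps

    def get_episode_rewards(self):
        return list(self.episode_rewards)

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself:
        # forward them (action_space, observation_space, ...) to the wrapped env.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)
```

Keeping a direct reference (for example stats = EpisodeStatsWrapper(env)) and calling stats.get_total_steps() and stats.get_episode_rewards() avoids depending on how get_wrapper_by_name finds the Monitor.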

hw1 dropout implementation

In HW1's tf_utils.py, I found the following dropout implementation:

    def dropout(x, pkeep, phase=None, mask=None):
        mask = tf.floor(pkeep + tf.random_uniform(tf.shape(x))) if mask is None else mask
        if phase is None:
            return mask * x
        else:
            return switch(phase, mask*x, pkeep*x)

In the test phase, shouldn't it be x/pkeep instead of x*pkeep (based on inverted dropout)? If not, why?

Thanks for your explanation.
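As far as the snippet shows, it implements standard (non-inverted) dropout, and the two conventions differ only in where the scaling happens. The mask floor(pkeep + U), with U uniform on [0, 1), equals 1 with probability pkeep, so the expected training output is pkeep * x, and multiplying by pkeep at test time matches that expectation. Inverted dropout instead divides by pkeep during training, so its expected training output is already x and the test-time layer is the identity. Dividing by pkeep at test time would match neither scheme. The NumPy sketch below (function names are illustrative) makes the comparison concrete:

```python
import numpy as np


def standard_dropout(x, pkeep, train, rng):
    """Non-inverted dropout: no scaling while training, multiply by pkeep at test."""
    if train:
        # Keep each unit with probability pkeep (same distribution as floor(pkeep + U)).
        mask = (rng.random(x.shape) < pkeep).astype(x.dtype)
        return mask * x
    return pkeep * x


def inverted_dropout(x, pkeep, train, rng):
    """Inverted dropout: divide by pkeep while training, identity at test time."""
    if train:
        mask = (rng.random(x.shape) < pkeep).astype(x.dtype)
        return mask * x / pkeep
    return x
```

With x = np.ones(100_000) and pkeep = 0.8, the training-time mean of standard_dropout is close to 0.8, matching its test output of 0.8, while inverted_dropout averages close to 1.0 in training, matching its test output of 1.0.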

HW1 Setup

I encountered two issues when trying to follow the setup instructions for HW1:

The latest version of mujoco-py requires MuJoCo 1.50 rather than MuJoCo 1.31.
The XXX-v1 environments are no longer available in the latest version of gym; I had to use XXX-v2 instead.

I cannot find tensorflow 1.10.5

When I use conda search, I can find versions 1.10.0, 1.11.0, ..., but I don't know whether it matters if I choose version 1.10.0.

Or, where could I find version 1.10.5?

Sorry, I am new to TensorFlow.
