higgsfield / rl-adventure

PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL

Languages: Jupyter Notebook 97.90%, Python 2.10%

rl-adventure's Introduction

higgsfield - multi-node training without crying

Higgsfield is an open-source, fault-tolerant, highly scalable GPU orchestrator and machine learning framework designed for training models with billions to trillions of parameters, such as large language models (LLMs).


(architecture diagram)

Higgsfield serves as a GPU workload manager and machine learning framework with five primary functions:

  1. Allocating exclusive and non-exclusive access to compute resources (nodes) to users for their training tasks.
  2. Supporting DeepSpeed's ZeRO-3 API and PyTorch's Fully Sharded Data Parallel (FSDP) API, enabling efficient sharding for trillion-parameter models (see the sketch below).
  3. Offering a framework for initiating, executing, and monitoring the training of large neural networks on allocated nodes.
  4. Managing resource contention by maintaining a queue of running experiments.
  5. Facilitating continuous integration of machine learning development through seamless integration with GitHub and GitHub Actions.

Higgsfield streamlines the process of training massive models and gives developers a versatile, robust toolset.
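For context on point 2, here is a minimal sketch of the raw PyTorch FSDP analogue. This is plain PyTorch, not Higgsfield's internal API, and it assumes the distributed process group is already initialized (e.g. the script was launched via torchrun):

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Wrapping a module in FSDP shards its parameters, gradients,
# and optimizer state across the participating ranks.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
model = FSDP(model)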

Install

$ pip install higgsfield==0.0.3

Train example

That's all you have to do to train LLaMA in a distributed setting:

from higgsfield.llama import Llama70b
from higgsfield.loaders import LlamaLoader
from higgsfield.experiment import experiment

import torch.optim as optim
from alpaca import get_alpaca_data

@experiment("alpaca")
def train(params):
    # 70B LLaMA with ZeRO stage 3 sharding and bf16 precision
    model = Llama70b(zero_stage=3, fast_attn=False, precision="bf16")

    optimizer = optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.0)

    dataset = get_alpaca_data(split="train")
    train_loader = LlamaLoader(dataset, max_words=2048)

    # A standard PyTorch training loop
    for batch in train_loader:
        optimizer.zero_grad()
        loss = model(batch)
        loss.backward()
        optimizer.step()

    model.push_to_hub("alpaca-70b")

How is it all done?

  1. We install all the required tools on your server (Docker, your project's deploy keys, the higgsfield binary).
  2. Then we generate deploy and run workflows for your experiments.
  3. As soon as your code lands on GitHub, it is automatically deployed to your nodes.
  4. Then you access your experiments' run UI through GitHub, which launches experiments and saves the checkpoints.

Design

We follow the standard PyTorch workflow, so you can incorporate anything beyond what we provide: DeepSpeed, Accelerate, or your own custom PyTorch sharding implemented from scratch.

Environment hell

No more juggling different versions of PyTorch, NVIDIA drivers, and data-processing libraries. You can easily orchestrate experiments and their environments, and document and track the specific versions and configurations of all dependencies to ensure reproducibility.

Config hell

No need to define 600 arguments for your experiment. No more YAML witchcraft. You can use whatever you want, whenever you want; we just provide a simple interface to define your experiments. We have taken it even further: now you only need to design the way you interact with your experiments.

Compatibility

We need you to have nodes with:

  • Ubuntu
  • SSH access
  • Non-root user with sudo privileges (passwordless sudo is required)

Clouds we have tested on:

  • Azure
  • LambdaLabs
  • FluidStack

Feel free to open an issue if you have any problems with other clouds.

Getting started

Here you can find the quick start guide on how to set up your nodes and start training.

An API for common tasks in large language model training.

Platform      | Purpose                                                            | Estimated Response Time | Support Level
GitHub Issues | Bug reports, feature requests, install issues, usage issues, etc. | < 1 day                 | Higgsfield Team
Twitter       | For staying up-to-date on new features.                            | Daily                   | Higgsfield Team
Website       | Discussion, news.                                                  | < 2 days                | Higgsfield Team

rl-adventure's People

Contributors: cehnegaitne, higgsfield, xjohn600

rl-adventure's Issues

batch_size for DQN

Thanks for your code, which works very well.
The original DQN notebook (1.dqn.ipynb) sets batch_size=32, which under-utilizes the GPU, so I changed it to 256, but performance got worse. Do you have any experience with this?

batch_size=32

(training-curve screenshot, 2019-10-24)

batch_size=256

(training-curve screenshot, 2019-10-24)

Or should I run more experiments with batch_size=256, or try more random seeds?
Thanks again.

The update in DQN

Hi,

I have a question about your implementation of DQN, which is supposed to sync the target Q-network with the current Q-network every C steps; I only see this update in your implementation of DDQN. Can you please tell me why it is this way?

From my point of view, your implementation of DDQN is actually DQN.


Best,
Yuxuan
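For reference, a minimal sketch of the C-step target-network sync the reporter describes (the models, interval C, and loop below are illustrative, not the repo's exact code):

import torch.nn as nn

current_model = nn.Linear(4, 2)
target_model = nn.Linear(4, 2)

C = 100  # sync interval (illustrative)
for frame_idx in range(1, 1001):
    # ... one environment step and TD update on current_model ...
    if frame_idx % C == 0:
        # copy the current weights into the target network
        target_model.load_state_dict(current_model.state_dict())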

Distributional Reinforcement Learning with Quantile Regression

Hi, what does the "u" in the following code snippet mean? It seems that "u" is not defined in the code. Thanks!

huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
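For context, a sketch under the QR-DQN formulation (Dabney et al.), where u is the pairwise TD error between Bellman-target quantiles and predicted quantiles; the names theta, target_theta, and tau below are illustrative, not taken from the notebook:

import torch

batch, n_quantiles, k = 32, 51, 1.0
theta = torch.randn(batch, n_quantiles)         # predicted quantiles for the taken actions
target_theta = torch.randn(batch, n_quantiles)  # r + gamma * target-network quantiles
tau = (torch.arange(n_quantiles).float() + 0.5) / n_quantiles

u = target_theta.unsqueeze(1) - theta.unsqueeze(2)  # (batch, N, N) pairwise TD errors
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau.view(1, -1, 1) - (u < 0).float()).abs() * huber_loss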

RL-Adventure/3.dueling dqn.ipynb missing forward?

def compute_td_loss(batch_size):
    state, action, reward, next_state, done = replay_buffer.sample(batch_size)

    state      = Variable(torch.FloatTensor(np.float32(state)))
    next_state = Variable(torch.FloatTensor(np.float32(next_state)))
    action     = Variable(torch.LongTensor(action))
    reward     = Variable(torch.FloatTensor(reward))
    done       = Variable(torch.FloatTensor(done))

    q_values      = current_model(state)
    next_q_values = target_model(next_state)

    q_value          = q_values.gather(1, action.unsqueeze(1)).squeeze(1)
    next_q_value     = next_q_values.max(1)[0]
    expected_q_value = reward + gamma * next_q_value * (1 - done)
    
    loss = (q_value - expected_q_value.detach()).pow(2).mean()
        
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    return loss

No forward call?

Edit: sorry, I found my issue was caused by Variable, not a missing forward. Calling the model directly invokes forward(), and the result is the same. This can be closed.
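For readers hitting the same confusion: in PyTorch, calling a module dispatches to forward() through nn.Module.__call__ (which also runs hooks), so current_model(state) is the idiomatic way to invoke forward. A minimal demonstration:

import torch
import torch.nn as nn

net = nn.Linear(4, 2)
x = torch.randn(1, 4)
# __call__ dispatches to forward(), so both calls compute the same output
assert torch.equal(net(x), net.forward(x))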

Error - possibly due to "Variable()" ?

Hi, many thanks for sharing the code.

I have experienced an error running 1.dqn straight out of the box. The error message shown after I run the 12th code cell is below.

My computer is running PyTorch 0.4.1, and I suspect that the error is due to a change in the Variable API (as used in cells 8 and 10, for example). If so, has anyone updated the code for the latest PyTorch 0.4.1?

Any ideas would be appreciated! Thanks in advance!


Error message after cell 12:


/home/USER/anaconda3/envs/RL/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.


AssertionError                            Traceback (most recent call last)
in ()
     12 action = model.act(state, epsilon)
     13
---> 14 next_state, reward, done, _ = env.step(action)
     15 replay_buffer.push(state, action, reward, next_state, done)
     16

~/anaconda3/envs/RL/lib/python3.7/site-packages/gym/wrappers/time_limit.py in step(self, action)
     29     def step(self, action):
     30         assert self._episode_started_at is not None, "Cannot call env.step() before calling reset()"
---> 31         observation, reward, done, info = self.env.step(action)
     32         self._elapsed_steps += 1
     33

~/anaconda3/envs/RL/lib/python3.7/site-packages/gym/envs/classic_control/cartpole.py in step(self, action)
     52
     53     def step(self, action):
---> 54         assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
     55         state = self.state
     56         x, x_dot, theta, theta_dot = state

AssertionError: tensor(0) (<class 'torch.Tensor'>) invalid
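A likely fix for PyTorch >= 0.4, sketched under the assumption that act() returns a single-element tensor: convert the action to a Python int before passing it to env.step(), e.g. via .item():

import torch

q_value = torch.tensor([[0.1, 0.9]])  # illustrative Q-values for one state
action = q_value.max(1)[1].item()     # .item() yields a Python int that env.step() accepts
assert isinstance(action, int)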

development

Hello, thanks for sharing your code.
I want to implement dueling DQN for MountainCar. Can you suggest anything?

frame_stack default to False

Hello, is it correct that frame_stack in wrap_deepmind is never used? Will you be able to get velocity information if you only pass one frame at a time?
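For context: from a single frame the agent cannot observe velocity, so stacking consecutive frames is the standard fix. A sketch using the repo's wrappers as the reporter describes them (the env id and the frame_stack flag's exact behavior are assumptions):

from common.wrappers import make_atari, wrap_deepmind, wrap_pytorch

env = make_atari("PongNoFrameskip-v4")      # illustrative env id
env = wrap_deepmind(env, frame_stack=True)  # stack frames so observations encode motion
env = wrap_pytorch(env)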

Licensing

Hello, I plan to use your DQN code for my bachelor's thesis and will of course reference where I got the code from. Is there any further acceptable-use policy on your code?

Error in projection_distribution (Distributional DQN) ?

Hi,

I have a question regarding the projection_distribution method. It seems that when you project back onto the support/bins, on these lines:

proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist * (u.float() - b)).view(-1)) 
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist * (b - l.float()) ).view(-1))

the distribution next_dist is scaled by the support, from the line

next_dist = target_model(next_state).data.cpu() * support

It seems like this should not be the case: it results in the final projected distribution not summing to one. It seems one should do something like

next_dist_raw = target_model(next_state).data.cpu()
next_dist = next_dist_raw * support
next_action = next_dist.sum(2).max(1)[1]
next_action = next_action.unsqueeze(1).unsqueeze(1).expand(next_dist.size(0), 1, next_dist.size(2))
next_dist = next_dist.gather(1, next_action).squeeze(1)
next_dist_raw = next_dist_raw.gather(1, next_action).squeeze(1)
proj_dist.view(-1).index_add_(0, (l + offset).view(-1), (next_dist_raw * (u.float() - b)).view(-1))
proj_dist.view(-1).index_add_(0, (u + offset).view(-1), (next_dist_raw * (b - l.float()) ).view(-1))

This results in a distribution that contains the same amount of mass as the original one.

Thank you,
Lucas

cuda tensor instead of int in 1.dqn

Hi! Thanks for the great tutorials!
I had an issue with class DQN(nn.Module): in its act method, the line

action = q_value.max(1)[1].data[0]

seemed to return a torch CUDA tensor, which env.step naturally couldn't take as input. I replaced it with

action = int(q_value.max(1)[1].data[0].cpu().int().numpy())
and it works for me.

Error in Priority Update for Prioritized Replay

It looks like you're updating the priorities in the replay buffer according to the weighted, squared TD error:

loss  = (q_value - expected_q_value.detach()).pow(2) * weights
prios = loss + 1e-5
replay_buffer.update_priorities(indices, prios.data.cpu().numpy())

However, the algorithm in the original paper updates the priority according to the absolute value of the TD error only, which is not weighted. I believe this is a mistake in your implementation.
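A hedged sketch of the fix the reporter describes (per the prioritized-replay paper, Schaul et al., 2016): priorities come from the absolute, unweighted TD error, while importance-sampling weights apply only to the loss. The tensors below are illustrative stand-ins:

import torch

q_value = torch.randn(32)           # illustrative Q(s, a)
expected_q_value = torch.randn(32)  # illustrative TD targets
weights = torch.rand(32)            # importance-sampling weights

td_error = q_value - expected_q_value.detach()
prios = td_error.abs() + 1e-5                 # p_i = |delta_i| + eps (proportional variant)
loss = (td_error.pow(2) * weights).mean()     # IS weights applied to the loss only
# replay_buffer.update_priorities(indices, prios.data.cpu().numpy())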

Environment/dependencies

Hi all,

I am currently trying to run the 'quantile regression dqn' notebook, but it breaks in the training stage at the line

loss = compute_td_loss(batch_size)

At some point I realised I have no idea whether it could be due to my environment, and I wasn't able to verify this with the documentation.

Could someone please report which Python/torch versions were successfully tested? I'd be grateful to be able to put this nice code to work!

Best regards,

Jan

About Distributional DQN--projection_distribution

I am confused about distributional DQN. Why is next_dist multiplied by support in the projection_distribution function? My model learns badly after using it. I would appreciate it if you could give me an answer in your spare time!

ModuleNotFoundError: No module named 'common'

I can't install the packages

ModuleNotFoundError                       Traceback (most recent call last)
in
----> 1 from common.wrappers import make_atari, wrap_deepmind, wrap_pytorch

ModuleNotFoundError: No module named 'common'
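A likely cause, assuming the repository layout: common is a local directory shipped with the repo (e.g. common/wrappers.py), not a pip-installable package, so the notebooks must be run from the repository root, or the clone's path must be added manually (the path below is hypothetical):

import sys
sys.path.append("/path/to/RL-Adventure")  # hypothetical path to your local clone

from common.wrappers import make_atari, wrap_deepmind, wrap_pytorch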
