rentruewang / koila Goto Github PK

Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code.

Home Page: https://koila.rentruewang.com

License: MIT License

Python 100.00%

pytorch lazy-evaluation out-of-memory python machine-learning deep-learning memory-management gradient-accumulation neural-network

koila's Introduction

👋 You've reached the GitHub profile of RenChu!

The languages I love and write:

and Python C extensions for best of both worlds.

See the snippets repository for more of my thoughts.

koila's People

Contributors

Stargazers

Watchers

koila's Issues

i have no idea how to use Koila in my code

Hey, I use Coqui-ai TTS through a simple Python code.

from TTS.api import TTS

tts = TTS(model_name="MODEL NAME",
          progress_bar=True,
          gpu=True)


tts.tts_to_file(text="TEXT", file_path='Audio.wav')

but I always get CUDA out of memory.
and I'm not really sure how to use Koila with my code or Coqui-ai tts in general. any help?

[BUG] pip can't find the package on Kaggle & Colab

Hello, as who's suffering from "cuda out of memory" errors on Kaggle notebook, I can't wait to use your package. However, I run into errors when I try to install koila on both Kaggle and Colab notebooks.

Describe the bug
!pip install koila outputs the following error message: ERROR: Could not find a version that satisfies the requirement koila (from versions: none) ERROR: No matching distribution found for koila on Kaggle and Colab notebooks.

To Reproduce
Steps to reproduce the behavior:

Run !pip install koila on Kaggle / Colab

I'd appreciate it if anyone provides me with an alternate solution until this error gets fixed.

Compatibility with python 3.7?

I'm wondering what makes it incompatible with python 3.7 because that is the python version I am using and I can't upgrade to 3.8

DataLoader Implementation

If it isn't already possible, it would be nice to be able to integrate this with the dataloader class

is kolia support python3.6?

Getting NameError for lazy in (input, label) = lazy(input, label, batch=0) ?

Compatibility with conv2d and batch norm?

I'm still getting the memory error... wondering if it's because of conv layers or batch norm...

Can't install from pip (PyPi)

I am unable to install this from PyPi using Pip. I'm not sure why, but I opened this issue in case anyone else was having this problem and was searching here.

The output I get is this:

pip install koila
ERROR: Could not find a version that satisfies the requirement koila
ERROR: No matching distribution found for koila

Compatibility with PyTorch hooks.

Hello, I found this project is interesting. However, I found the lazy tensor mechanism is impossible to work with the PyTorch backward hooks, which makes it difficult to be used in combination with PyTorch checkpointing (https://pytorch.org/docs/stable/checkpoint.html). Checkpointing is a common way to avoid OOM in training.

Major overhaul

I'm planning on making a major overhaul, to simplify the code and make it more scalable.

Currently this project relies too much on checks to determine if an object is a LazyTensor or a torch.Tensor, however, it's not only difficult to maintain, but can also negatively affect performance.

I'm on my way to create a new wrapper for torch.Tensor that matches LazyTensor's API but executes immediately for internal use.

Also, I'm modifying the LazyTensor's API to match torch.Tensor's.

I'll be using this issue to track my progress.

Closes: #22
Closes: #25

Can not get is_cuda attribute from LazyTensor

Hi! We are building LazyTensor too!

Hi! I'm with Pytorch team and it looks like we are also building something similar to koila here: https://github.com/pytorch/pytorch/tree/lazy_tensor_staging . We would love to connect and learn more about your work!
If you are interested, could you please reply to this issue and drop me a line at k o r o v a i k o n AT gmail.com (no spaces obviously)

Using Koila with Big Sleep?

Hi, this project could be revolutionary, if only I knew how to use it :)

You surely heard of Big Sleep, right? Using CLIP and BIGGAN, from just a line of text it's capable of generating amazing visuals and unique works of art, which is why is getting more and more popular among an ever growing number of artists and curious people who have been deeply fascinated by the potential of these techniques...

However many of us have not been able to run these kind of projects on our machines because of low VRAM in consumer GPUs and crazy market prices and ended up stumbling almost immediately on the infamous CUDA Memory Error... (Yes, Google Colab is nice and all, but running this projects locally makes for a totally different kind of "technological chill" if you know what I mean :) )

So, I was thinking, would it be possible to apply Koila to Big Sleep, to fix those errors?
If so, that'd be a game changer! It would at the same time benefit a huge number of users, and translate into massive traction for Koila!
Looking at the README I thought the whole process would have been very simple so I tried looking at it myself... but in the end I had to give up because I've just approached this field and I still miss much of the necessary background to figure out these kind of details.

So yeah, would you consider providing a short example for this use case of Koila + Big Sleep, if feasible? In that case just a few lines of code could potentially mean the beginning of a little revolution :)

KeyError: 0

Thanks for your nice work!
I wrapped my input and label (feat, label) = lazy(feat, label, batch=0)
Then I met the following error when running it.

File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/lazy.py", line 504, in lazy_forward
out = LazyTensor(LazyFunction(func, shape_func)(*args, **kwargs))
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/lazy.py", line 51, in __call__
prepass = self.prepass_func(*args, **kwargs)
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/prepasses.py", line 286, in tranpose
batch = b.map(lambda x: {dim0: dim1, dim1: dim0}[x])
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/interfaces.py", line 78, in map
index = func(self.index)
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/prepasses.py", line 286, in
batch = b.map(lambda x: {dim0: dim1, dim1: dim0}[x])
KeyError: 0

Integration with huggingface

If somehow we can integrate this with hugging face models while doing inference then its job is done for production-level deployments.

wrong error in getting-started.py

Hello,
I noticed you fix the lazy label bug and the getting-started.py is able to run.
But it can not pass the assertion. The grad diff is quite large!

assert all(
[print(torch.max(grad - lazy_grad)) for (grad, lazy_grad) in zip(grads, lazy_grads)]
)

tensor(0.0698)
tensor(0.0227)
tensor(0.0717)
tensor(0.0415)
tensor(0.5402)
tensor(0.7869)

RecursionError: maximum recursion depth exceeded while calling a Python object

  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
    return lazy_forward(func, shape_impl, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
    return lazy_forward(func, shape_impl, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
    return lazy_forward(func, shape_impl, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
    return lazy_forward(func, shape_impl, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
    return lazy_forward(func, shape_impl, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
    return lazy_forward(func, shape_impl, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
    return lazy_forward(func, shape_impl, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 408, in __torch_function__
    if not builtins.all(
  File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 409, in <genexpr>
    issubclass(typ, (LazyTensor, Tensor, int, float, bool)) for typ in types
  File "/opt/conda/lib/python3.8/abc.py", line 102, in __subclasscheck__
    return _abc_subclasscheck(cls, subclass)
RecursionError: maximum recursion depth exceeded while calling a Python object

Incompatible with einops

Thanks for your nice project. But it is not compatible with einops.
Einops is convenient for pytorch user, and I think many people use it.
I hope it works on einops, too. Thank you.

Stack overflow (endless loop) when gradients are disabled

I've just installed and tried out koila. However there seems to be an endless loop when applying it to my backbone model. It uses Conv1d and gradients are disabled. Also it seems like koila does not handle the permute operation.

Issues with "No custom methods found. Evaluating eagerly."

I tried this with a HuggingFace transformers model and set my batch size artificially large. Initially I saw the following before OOM memory.

DEBUG    __getattr__ called for pin_memory. Automatically resolving function. 
DEBUG    No custom methods found. Evaluating eagerly.

I changed the option of dataloader_pin_memory = False and got a little farther.

DEBUG    __getattr__ called for to. Automatically resolving function.
DEBUG    No custom methods found. Evaluating eagerly.

This was resolved by moving the data to the GPU (calling .to('cuda:0')) in the collator ( this is done in the model). The next error was..

DEBUG    __getattr__ called for float. Automatically resolving function
DEBUG    No custom methods found. Evaluating eagerly.

This one I'm not sure how to resolve and I'm not certain that "Evaluating eagerly" is even the issue. However, after the first one of those debug statements I see the OOM error. Any advice?

cannot get "device" attribute from LazyTensor

I have code that depends on getting the device on which the tensor is stored. The device is then used to initialize a new empty tensor that my model needs. Long story short, if tensor x is wrapped in LazyTensor then accessing x.device leads to an error.

Maybe you need to consider transparently exposing most (if not all) attributes of the wrapped tensor?

Maths domain error

I am using Koila to solve an OOM error during my training. But the following error occurs :
``Traceback (most recent call last):
File "/mnt/sdb2/Adama/configure_docker_for_transvw/pytorch/train.py", line 92, in
loss.backward()
File "/home/nanaa/.local/lib/python3.10/site-packages/koila/lazy.py", line 435, in backward
for mini_batch_size in gpus.split_batch(
File "/home/nanaa/.local/lib/python3.10/site-packages/koila/gpus.py", line 100, in split_batch
batch_size = 2 ** (math.floor(math.log2(max_batch)))
ValueError: math domain error```
Probably due to the value of max_batch ?

Typo in README

Just a typo for an incomplete sentence. Just wanted to let you know :)

koila/README.md

Line 150 in cca5830

 `Koila` solves that by eagerly evaluating when being converted to strings, integers, or any Python values. This way, when debugging 

This is fantastic, great work! Just to be clear...

Just making sure, this lazy wrapper somehow divvies up the computations per GPU budget, right? it doesn't just... sub-sample a smaller batch and ignore the remainder, right?

unet3d - koila.errors.UnsupportedError

I am trying to apply koila lazy eval on a Unet3D.

# defining the model
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv3(in_channels, out_channels, stride, norm='BatchNorm3d', act='GELU'):
    return nn.Sequential(
            nn.Conv3d(in_channels, out_channels, 3, 1, 1),
            getattr(nn, norm)(out_channels),
            getattr(nn, act)())


def double_conv3(in_channels, out_channels, stride):
    return nn.Sequential(conv3(in_channels, out_channels, 1),
                         conv3(out_channels, out_channels, stride))

def merge_skip(x, skip):
    x = F.upsample(x, size=skip.shape[-3:], mode='trilinear', align_corners=True)
    return torch.cat((x,skip),dim=1)



class Unet3D(nn.Module):
    def __init__(self, in_channels, out_channels, num_layers=4, base=16):  
        super().__init__()
	
        enc_channels = [in_channels]+[base * 2**i for i in range(num_layers)]
        dec_channels = [base * 2**i for i in range(num_layers-1,-1,-1)]+[out_channels]

        self.encoders = nn.ModuleList()
        for i in range(len(enc_channels)-1):
            cin = enc_channels[i]
            cout = enc_channels[i+1]
            enc = double_conv3(cin, cout, 2)
            self.encoders.append(enc)

        self.decoders = nn.ModuleList()
        for i in range(len(dec_channels)-1):
            cin_skip = enc_channels[-i-2]
            cin_up = dec_channels[i]
            cin = cin_skip + cin_up 
            cout = dec_channels[i+1]
            dec = double_conv3(cin, cout, 1)	
            self.decoders.append(dec)

    def forward(self, x, return_all=False):
        out = [x]
        for encoder in self.encoders:
            x = encoder(x)
            out.append(x)
        n = len(out)
        for i, decoder in enumerate(self.decoders): 
            skip = out[n - 2 - i]
            x = merge_skip(out[-1], skip)
            x = decoder(x)
            out.append(x)

        if return_all:
            return out 
        else:
            return out[-1]

# test of koila on unet
def test_lazy():
    net = Unet3D(1,3)
    net.cuda()
    s = 64 
    b,c,d,h,w = 2,1,s,s,s
    x = torch.randn(b,c,d,h,w).cuda()
    t = torch.randint(0,3, (b,d,h,w)).cuda()

    loss_fn = nn.CrossEntropyLoss()
    net.zero_grad()

    lazy_x, lazy_t = lazy(x, t, batch=0)
    lazy_out = net(lazy_x)
    lazy_loss = loss_fn(lazy_out, lazy_t) 
    assert isinstance(lazy_loss, LazyTensor), type(lazy_loss)
    lazy_loss.backward()



# This fails
test_lazy()

This fails and outputs:

tensors = (tensor([[[[[-8.9936e-02, -7.9037e-02, -1.5048e-02,  ...,  2.9969e-01,
             2.9774e-01, -1.0489e-01],
        ...]]], device='cuda:0',
       grad_fn=<UpsampleTrilinear3DBackward1>), <koila.lazy.LazyTensor object at 0x7fa21bf99880>)
dim = 1, args = (), kwargs = {}, shapes = [torch.Size([2, 128, 64, 64, 64]), (2, 64, 64, 64, 64)]
no_dim = [torch.Size([2, 64, 64, 64]), (2, 64, 64, 64)], result_size = torch.Size([2, 64, 64, 64])
size = (2, 64, 64, 64)

    def cat(
        tensors: Sequence[TensorLike], dim: int = 0, *args: Any, **kwargs: Any
    ) -> PrePass:
        mute_unused_args(*args, **kwargs)

        if len(tensors) == 0:
            raise ValueError("Expected a sequence of tensors. Got empty sequence.")

        shapes = [t.size() for t in tensors]
        no_dim = [t[:dim] + t[dim + 1 :] for t in shapes]

        result_size = no_dim[0]
        for size in no_dim[1:]:
            if result_size != size:
                raise ValueError(
                    f"Dimension should be equal outside dim {dim}. Got {shapes}."
                )

        if len(set(interfaces.bat(t) for t in tensors)) != 1:
>           raise UnsupportedError
E           koila.errors.UnsupportedError

../miniconda3/envs/snakes/lib/python3.9/site-packages/koila/prepasses.py:423: UnsupportedError

getting-started.py failed!

I run the following code and set the input batch size as 20. (pytorch 1.10.0)
python example/getting-started.py
The errros.
Traceback (most recent call last):
File "/home/user/codes/koila/examples/getting-started.py", line 97, in
lazy_loss.backward()
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/koila/tensors.py", line 439, in backward
mini_batch = self.run((total, total + mini_batch_size))
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/koila/tensors.py", line 187, in run
return data.run(partial)
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/koila/tensors.py", line 94, in _run
result = self.func(*real_args, **real_kwargs)
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (16) to match target batch_size (20).

Compatibility with GANs?

Not an issue, but a question. Would you think this works well and correctly in a GAN setting where two networks competing with each other?

Got an error when using lazy.

I'm doing a NMT task.I use my own data loading function rather than using torch dataset.I got an "int object doesn't has attribute 'size' " error.
Here's my data loading code:

def get_batches(sz, pad=0):
    for i in range(0, len(datatmp), sz):
        n=0
        srcdata = []
        trgdata = []
        for j in range(n, sz):
            srcdata.append(datatmp[i+j][0])
            trgdata.append(datatmp[i+j][1])
        a = randint(1, 2)
        src_max_seq_length=max([len(srcdata[i]) for i in range(len(srcdata))])
        trg_max_seq_length=max([len(trgdata[i]) for i in range(len(trgdata))])
        # pad src to src_max_seq_length
        for i in range(len(srcdata)):
            srcdata[i] = srcdata[i] + [pad for j in range(src_max_seq_length-len(srcdata[i]))]
        #pad trg to trg_max_seq_length
        for i in range(len(trgdata)):
            trgdata[i] = trgdata[i] + [pad for j in range(trg_max_seq_length-len(trgdata[i]))]

        sr = np.ndarray(shape=(sz, src_max_seq_length))
        tg = np.ndarray(shape=(sz, trg_max_seq_length))
        for i in range(len(srcdata)):
            for j in range(len(srcdata[i])):
                sr[i][j] = srcdata[i][j]
        for i in range(len(trgdata)):
            for j in range(len(trgdata[i])):
                tg[i][j] = trgdata[i][j]
        #srcdata = np.array(srcdata)
        #trgdata = np.array(trgdata)
        srcdata = torch.from_numpy(sr)
        trgdata = torch.from_numpy(tg)
        src = Variable(srcdata, requires_grad=False).long()
        trg = Variable(trgdata, requires_grad=False).long()
        yield Batch(src, trg, pad)#Batch is only a simple class
class Batch:
    "Object for holding a batch of data with mask during training."
    def __init__(self, src, trg=None, pad=0):
        self.src = src
        self.src_mask = (src != pad).unsqueeze(-2)
        if trg is not None:
            self.trg = trg[:, :-1]
            self.trg_y = trg[:, 1:]
            self.trg_mask = \
                self.make_std_mask(self.trg, pad)
            self.ntokens = (self.trg_y != pad).data.sum()
    
    @staticmethod
    def make_std_mask(tgt, pad):
        "Create a mask to hide padding and future words."
        tgt_mask = (tgt != pad).unsqueeze(-2)
        tgt_mask = tgt_mask & Variable(
            subsequent_mask(tgt.size(-1)).type_as(tgt_mask.data))
        return tgt_mask

ps:The code is adapted from 'Annotated Transformer'

rentruewang / koila Goto Github PK

koila's Introduction

👋 You've reached the GitHub profile of RenChu!

The languages I love and write:

koila's People

Contributors

Stargazers

Watchers

Forkers

koila's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs