and Python C extensions for best of both worlds.
See the snippets repository for more of my thoughts.
Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code.
Home Page: https://koila.rentruewang.com
License: MIT License
Hey, I use Coqui-ai TTS through a simple Python script:
from TTS.api import TTS
tts = TTS(model_name="MODEL NAME",
          progress_bar=True,
          gpu=True)
tts.tts_to_file(text="TEXT", file_path='Audio.wav')
But I always get a CUDA out-of-memory error, and I'm not sure how to use Koila with my code, or with Coqui-ai TTS in general. Any help?
Hello, as someone suffering from "CUDA out of memory" errors in Kaggle notebooks, I can't wait to use your package. However, I run into errors when I try to install koila in both Kaggle and Colab notebooks.
Describe the bug
!pip install koila
outputs the following error message on Kaggle and Colab notebooks:
ERROR: Could not find a version that satisfies the requirement koila (from versions: none)
ERROR: No matching distribution found for koila
To Reproduce
Steps to reproduce the behavior:
!pip install koila
on Kaggle / Colab
I'd appreciate it if anyone could provide an alternative solution until this error is fixed.
I'm wondering what makes it incompatible with Python 3.7, because that is the version I am using and I can't upgrade to 3.8.
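One possible cause of pip's "from versions: none" message is that every published release requires a newer interpreter than the one running pip. The sketch below checks the running interpreter against a minimum version. The minimum of 3.8 is an assumption taken from the issue text above, not from koila's packaging metadata, and `meets_minimum` is a hypothetical helper.

```python
import sys

# Assumption: 3.8 is the minimum supported version, inferred from the issue text.
ASSUMED_MINIMUM = (3, 8)


def meets_minimum(version, minimum=ASSUMED_MINIMUM):
    """Return True when the (major, minor) part of version is at least minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    ok = meets_minimum(sys.version_info)
    status = "meets" if ok else "is below"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} {status} the assumed minimum")
```

Running this in a notebook cell before `pip install` shows whether the interpreter version is a likely culprit.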
If it isn't already possible, it would be nice to be able to integrate this with the DataLoader class.
I'm still getting the memory error... wondering if it's because of conv layers or batch norm...
I am unable to install this from PyPI using pip. I'm not sure why, but I opened this issue in case anyone else was having this problem and was searching here.
The output I get is this:
pip install koila
ERROR: Could not find a version that satisfies the requirement koila
ERROR: No matching distribution found for koila
Hello, I find this project interesting. However, I found that the lazy tensor mechanism does not work with PyTorch backward hooks, which makes it difficult to use in combination with PyTorch checkpointing (https://pytorch.org/docs/stable/checkpoint.html). Checkpointing is a common way to avoid OOM in training.
I'm planning on making a major overhaul, to simplify the code and make it more scalable.
Currently this project relies too much on checks to determine whether an object is a LazyTensor or a torch.Tensor. This is not only difficult to maintain, but can also negatively affect performance.
I'm working on a new wrapper for torch.Tensor that matches LazyTensor's API but executes immediately, for internal use.
I'm also modifying LazyTensor's API to match torch.Tensor's.
I'll be using this issue to track my progress.
Hi! I'm with the PyTorch team, and it looks like we are also building something similar to koila here: https://github.com/pytorch/pytorch/tree/lazy_tensor_staging . We would love to connect and learn more about your work!
If you are interested, could you please reply to this issue and drop me a line at k o r o v a i k o n AT gmail.com (no spaces obviously)
Hi, this project could be revolutionary, if only I knew how to use it :)
You have surely heard of Big Sleep, right? Using CLIP and BigGAN, it can generate amazing visuals and unique works of art from just a line of text, which is why it is getting more and more popular among an ever-growing number of artists and curious people who are fascinated by the potential of these techniques...
However, many of us have not been able to run these kinds of projects on our machines because of the low VRAM in consumer GPUs and crazy market prices, and ended up stumbling almost immediately on the infamous CUDA memory error... (Yes, Google Colab is nice and all, but running these projects locally makes for a totally different kind of "technological chill", if you know what I mean :) )
So, I was thinking, would it be possible to apply Koila to Big Sleep, to fix those errors?
If so, that'd be a game changer! It would at the same time benefit a huge number of users, and translate into massive traction for Koila!
Looking at the README, I thought the whole process would be very simple, so I tried it myself... but in the end I had to give up, because I'm new to this field and still lack much of the background needed to figure out these kinds of details.
So yeah, would you consider providing a short example for this use case of Koila + Big Sleep, if feasible? In that case just a few lines of code could potentially mean the beginning of a little revolution :)
Thanks for your nice work!
I wrapped my input and label with (feat, label) = lazy(feat, label, batch=0)
Then I hit the following error when running it:
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/lazy.py", line 504, in lazy_forward
out = LazyTensor(LazyFunction(func, shape_func)(*args, **kwargs))
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/lazy.py", line 51, in __call__
prepass = self.prepass_func(*args, **kwargs)
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/prepasses.py", line 286, in tranpose
batch = b.map(lambda x: {dim0: dim1, dim1: dim0}[x])
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/interfaces.py", line 78, in map
index = func(self.index)
File "/home/victor/anaconda3/envs/py38_tab/lib/python3.8/site-packages/koila/prepasses.py", line 286, in
batch = b.map(lambda x: {dim0: dim1, dim1: dim0}[x])
KeyError: 0
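The failing line looks up the batch index in `{dim0: dim1, dim1: dim0}`, which raises `KeyError` whenever the batch dimension is not one of the two transposed dimensions. A minimal sketch of a lookup that leaves uninvolved dimensions in place follows. `swap_batch_index` is a hypothetical helper, not koila's code, and it assumes non-negative dimension indices.

```python
def swap_batch_index(batch_index, dim0, dim1):
    """Return where the batch dimension ends up after transposing dim0 and dim1."""
    swapped = {dim0: dim1, dim1: dim0}
    # A dimension that takes no part in the transpose stays where it is,
    # so fall back to the original index instead of raising KeyError.
    return swapped.get(batch_index, batch_index)


print(swap_batch_index(0, 1, 2))  # batch dim 0 is untouched by transpose(1, 2)
print(swap_batch_index(1, 1, 2))  # batch dim 1 moves to position 2
```

Negative indices such as `-1` would need to be normalized against the tensor rank before this lookup.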
If we could somehow integrate this with Hugging Face models for inference, it would be ready for production-level deployments.
Hello,
I noticed you fixed the lazy label bug, and getting-started.py now runs.
But it does not pass the assertion. The gradient difference is quite large!
assert all(
[print(torch.max(grad - lazy_grad)) for (grad, lazy_grad) in zip(grads, lazy_grads)]
)
tensor(0.0698)
tensor(0.0227)
tensor(0.0717)
tensor(0.0415)
tensor(0.5402)
tensor(0.7869)
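One detail about the assertion as written: `print()` returns `None`, so `all([...])` over a non-empty list of `print()` results is always `False`, and the assertion fails regardless of how close the gradients are. Also, `max(grad - lazy_grad)` ignores negative differences. The sketch below separates printing from checking, using plain floats as stand-ins for tensors. The helper names and the tolerance are assumptions. With real tensors, `torch.allclose` would be the usual choice.

```python
import math


def max_abs_diff(grads, lazy_grads):
    """Largest absolute element-wise difference between two equal-length sequences."""
    return max(abs(g - lg) for g, lg in zip(grads, lazy_grads))


def grads_match(grads, lazy_grads, abs_tol=1e-5):
    """True when every pair of values agrees within abs_tol (an assumed tolerance)."""
    return all(math.isclose(g, lg, abs_tol=abs_tol) for g, lg in zip(grads, lazy_grads))


# print() returns None, so this is False whatever values are printed.
always_false = all([print(d) for d in (0.0, 0.0)])
```

Reporting `max_abs_diff` first and then asserting `grads_match` keeps the diagnostic output without tying the assertion to the return value of `print()`.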
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
return lazy_forward(func, shape_impl, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
return lazy_forward(func, shape_impl, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
return lazy_forward(func, shape_impl, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
return lazy_forward(func, shape_impl, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
return lazy_forward(func, shape_impl, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
return lazy_forward(func, shape_impl, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 420, in __torch_function__
return lazy_forward(func, shape_impl, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 572, in lazy_forward
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 408, in __torch_function__
if not builtins.all(
File "/opt/conda/lib/python3.8/site-packages/koila/tensors.py", line 409, in <genexpr>
issubclass(typ, (LazyTensor, Tensor, int, float, bool)) for typ in types
File "/opt/conda/lib/python3.8/abc.py", line 102, in __subclasscheck__
return _abc_subclasscheck(cls, subclass)
RecursionError: maximum recursion depth exceeded while calling a Python object
Thanks for your nice project. However, it is not compatible with einops.
Einops is convenient for PyTorch users, and I think many people use it.
I hope it can work with einops, too. Thank you.
I've just installed and tried out koila. However there seems to be an endless loop when applying it to my backbone model. It uses Conv1d and gradients are disabled. Also it seems like koila does not handle the permute operation.
I tried this with a HuggingFace transformers model and set my batch size artificially large. Initially I saw the following before the OOM error:
DEBUG __getattr__ called for pin_memory. Automatically resolving function.
DEBUG No custom methods found. Evaluating eagerly.
I set the option dataloader_pin_memory = False and got a little farther.
DEBUG __getattr__ called for to. Automatically resolving function.
DEBUG No custom methods found. Evaluating eagerly.
This was resolved by moving the data to the GPU (calling .to('cuda:0')) in the collator (this is done in the model). The next message was:
DEBUG __getattr__ called for float. Automatically resolving function
DEBUG No custom methods found. Evaluating eagerly.
This one I'm not sure how to resolve and I'm not certain that "Evaluating eagerly" is even the issue. However, after the first one of those debug statements I see the OOM error. Any advice?
I have code that depends on getting the device on which the tensor is stored. The device is then used to initialize a new empty tensor that my model needs. Long story short, if tensor x is wrapped in LazyTensor, then accessing x.device leads to an error.
Maybe you need to consider transparently exposing most (if not all) attributes of the wrapped tensor?
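The suggestion above, transparently exposing the wrapped object's attributes, is commonly done in Python with `__getattr__` delegation. A minimal sketch follows. `Wrapper` and `FakeTensor` are hypothetical stand-ins, not koila's `LazyTensor` or a real `torch.Tensor`.

```python
class FakeTensor:
    """Stand-in for a tensor; only carries a device attribute."""

    device = "cuda:0"


class Wrapper:
    """Hypothetical wrapper that forwards unknown attributes to the wrapped object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so attributes defined
        # on Wrapper itself take priority over the wrapped object's.
        if name == "_wrapped":
            # Guard against infinite recursion if _wrapped is not set yet.
            raise AttributeError(name)
        return getattr(self._wrapped, name)


x = Wrapper(FakeTensor())
print(x.device)
```

For a lazy wrapper the trade-off is that some attributes (shape, device, dtype) can be answered without running the computation, while others would force evaluation, so a real implementation would likely need to treat the two groups differently.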
I am using Koila to solve an OOM error during training, but the following error occurs:
```
Traceback (most recent call last):
File "/mnt/sdb2/Adama/configure_docker_for_transvw/pytorch/train.py", line 92, in
loss.backward()
File "/home/nanaa/.local/lib/python3.10/site-packages/koila/lazy.py", line 435, in backward
for mini_batch_size in gpus.split_batch(
File "/home/nanaa/.local/lib/python3.10/site-packages/koila/gpus.py", line 100, in split_batch
batch_size = 2 ** (math.floor(math.log2(max_batch)))
ValueError: math domain error
```
Probably due to the value of max_batch?
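`math.log2` raises `ValueError` for zero or negative input, so the line in the traceback would fail if `max_batch` came out as 0 or below. The sketch below reproduces the rounding with an explicit guard. `largest_power_of_two_at_most` is a hypothetical helper, not koila's function, and how `max_batch` is computed inside koila is not shown here.

```python
import math


def largest_power_of_two_at_most(max_batch):
    """Round max_batch down to a power of two, rejecting values below 1."""
    if max_batch < 1:
        # math.log2 would raise a less informative ValueError here.
        raise ValueError(f"max_batch must be at least 1, got {max_batch}")
    return 2 ** math.floor(math.log2(max_batch))


print(largest_power_of_two_at_most(20))
```

Printing `max_batch` just before the failing line would confirm whether this is what happens in the reported run.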
Just a typo for an incomplete sentence. Just wanted to let you know :)
Line 150 in cca5830
Just making sure: this lazy wrapper somehow divvies up the computations per GPU budget, right? It doesn't just... sub-sample a smaller batch and ignore the remainder, right?
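The behavior the question hopes for, where every sample is processed and none is dropped, can be sketched as follows. This is an illustration only and not a description of what koila actually does. `split_into_chunks` and its parameters are hypothetical names, and the power-of-two chunk sizes are an assumption.

```python
def split_into_chunks(total, max_chunk):
    """Yield (start, stop) ranges with power-of-two sizes that cover range(total)."""
    if total < 0 or max_chunk < 1:
        raise ValueError("total must be >= 0 and max_chunk must be >= 1")
    start = 0
    while start < total:
        size = min(max_chunk, total - start)
        # Round the chunk size down to a power of two.
        size = 1 << (size.bit_length() - 1)
        yield (start, start + size)
        start += size


print(list(split_into_chunks(20, 16)))
```

Because the loop continues until `start` reaches `total`, the remainder after the first chunk is processed in further, smaller chunks instead of being discarded.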
I am trying to apply koila lazy eval on a Unet3D.
# defining the model
import torch
import torch.nn as nn
import torch.nn.functional as F
from koila import LazyTensor, lazy  # import missing from the original snippet

def conv3(in_channels, out_channels, stride, norm='BatchNorm3d', act='GELU'):
    return nn.Sequential(
        nn.Conv3d(in_channels, out_channels, 3, 1, 1),
        getattr(nn, norm)(out_channels),
        getattr(nn, act)())

def double_conv3(in_channels, out_channels, stride):
    return nn.Sequential(conv3(in_channels, out_channels, 1),
                         conv3(out_channels, out_channels, stride))

def merge_skip(x, skip):
    x = F.upsample(x, size=skip.shape[-3:], mode='trilinear', align_corners=True)
    return torch.cat((x,skip),dim=1)

class Unet3D(nn.Module):
    def __init__(self, in_channels, out_channels, num_layers=4, base=16):
        super().__init__()
        enc_channels = [in_channels]+[base * 2**i for i in range(num_layers)]
        dec_channels = [base * 2**i for i in range(num_layers-1,-1,-1)]+[out_channels]
        self.encoders = nn.ModuleList()
        for i in range(len(enc_channels)-1):
            cin = enc_channels[i]
            cout = enc_channels[i+1]
            enc = double_conv3(cin, cout, 2)
            self.encoders.append(enc)
        self.decoders = nn.ModuleList()
        for i in range(len(dec_channels)-1):
            cin_skip = enc_channels[-i-2]
            cin_up = dec_channels[i]
            cin = cin_skip + cin_up
            cout = dec_channels[i+1]
            dec = double_conv3(cin, cout, 1)
            self.decoders.append(dec)

    def forward(self, x, return_all=False):
        out = [x]
        for encoder in self.encoders:
            x = encoder(x)
            out.append(x)
        n = len(out)
        for i, decoder in enumerate(self.decoders):
            skip = out[n - 2 - i]
            x = merge_skip(out[-1], skip)
            x = decoder(x)
            out.append(x)
        if return_all:
            return out
        else:
            return out[-1]

# test of koila on unet
def test_lazy():
    net = Unet3D(1,3)
    net.cuda()
    s = 64
    b,c,d,h,w = 2,1,s,s,s
    x = torch.randn(b,c,d,h,w).cuda()
    t = torch.randint(0,3, (b,d,h,w)).cuda()
    loss_fn = nn.CrossEntropyLoss()
    net.zero_grad()
    lazy_x, lazy_t = lazy(x, t, batch=0)
    lazy_out = net(lazy_x)
    lazy_loss = loss_fn(lazy_out, lazy_t)
    assert isinstance(lazy_loss, LazyTensor), type(lazy_loss)
    lazy_loss.backward()

# This fails
test_lazy()
This fails and outputs:
tensors = (tensor([[[[[-8.9936e-02, -7.9037e-02, -1.5048e-02, ..., 2.9969e-01,
2.9774e-01, -1.0489e-01],
...]]], device='cuda:0',
grad_fn=<UpsampleTrilinear3DBackward1>), <koila.lazy.LazyTensor object at 0x7fa21bf99880>)
dim = 1, args = (), kwargs = {}, shapes = [torch.Size([2, 128, 64, 64, 64]), (2, 64, 64, 64, 64)]
no_dim = [torch.Size([2, 64, 64, 64]), (2, 64, 64, 64)], result_size = torch.Size([2, 64, 64, 64])
size = (2, 64, 64, 64)
def cat(
tensors: Sequence[TensorLike], dim: int = 0, *args: Any, **kwargs: Any
) -> PrePass:
mute_unused_args(*args, **kwargs)
if len(tensors) == 0:
raise ValueError("Expected a sequence of tensors. Got empty sequence.")
shapes = [t.size() for t in tensors]
no_dim = [t[:dim] + t[dim + 1 :] for t in shapes]
result_size = no_dim[0]
for size in no_dim[1:]:
if result_size != size:
raise ValueError(
f"Dimension should be equal outside dim {dim}. Got {shapes}."
)
if len(set(interfaces.bat(t) for t in tensors)) != 1:
> raise UnsupportedError
E koila.errors.UnsupportedError
../miniconda3/envs/snakes/lib/python3.9/site-packages/koila/prepasses.py:423: UnsupportedError
I ran the following code with the input batch size set to 20 (PyTorch 1.10.0):
python example/getting-started.py
The errors:
Traceback (most recent call last):
File "/home/user/codes/koila/examples/getting-started.py", line 97, in
lazy_loss.backward()
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/koila/tensors.py", line 439, in backward
mini_batch = self.run((total, total + mini_batch_size))
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/koila/tensors.py", line 187, in run
return data.run(partial)
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/koila/tensors.py", line 94, in _run
result = self.func(*real_args, **real_kwargs)
File "/home/user/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (16) to match target batch_size (20).
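One reading of this error is that the input was cut down to a mini-batch of 16 while the target still held all 20 samples. The sketch below shows the invariant that has to hold, with plain lists standing in for tensors: both sides are sliced with the same range. `take_mini_batch` is a hypothetical helper and says nothing about how koila slices internally.

```python
def take_mini_batch(inputs, targets, start, stop):
    """Slice inputs and targets with the same range along the batch dimension."""
    if len(inputs) != len(targets):
        raise ValueError(
            f"batch sizes differ: {len(inputs)} inputs vs {len(targets)} targets"
        )
    return inputs[start:stop], targets[start:stop]


inputs = list(range(20))
targets = list(range(20))
x, y = take_mini_batch(inputs, targets, 0, 16)
print(len(x), len(y))
```

With a batch of 20 and chunks of 16, the second call would use the range (16, 20), so the last four samples of both sequences stay paired.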
Not an issue, but a question. Do you think this would work correctly in a GAN setting, where two networks compete with each other?
I'm doing an NMT task. I use my own data loading function rather than a torch dataset. I got an "'int' object has no attribute 'size'" error.
Here's my data loading code:
def get_batches(sz, pad=0):
    for i in range(0, len(datatmp), sz):
        n=0
        srcdata = []
        trgdata = []
        for j in range(n, sz):
            srcdata.append(datatmp[i+j][0])
            trgdata.append(datatmp[i+j][1])
        a = randint(1, 2)
        src_max_seq_length=max([len(srcdata[i]) for i in range(len(srcdata))])
        trg_max_seq_length=max([len(trgdata[i]) for i in range(len(trgdata))])
        # pad src to src_max_seq_length
        for i in range(len(srcdata)):
            srcdata[i] = srcdata[i] + [pad for j in range(src_max_seq_length-len(srcdata[i]))]
        #pad trg to trg_max_seq_length
        for i in range(len(trgdata)):
            trgdata[i] = trgdata[i] + [pad for j in range(trg_max_seq_length-len(trgdata[i]))]
        sr = np.ndarray(shape=(sz, src_max_seq_length))
        tg = np.ndarray(shape=(sz, trg_max_seq_length))
        for i in range(len(srcdata)):
            for j in range(len(srcdata[i])):
                sr[i][j] = srcdata[i][j]
        for i in range(len(trgdata)):
            for j in range(len(trgdata[i])):
                tg[i][j] = trgdata[i][j]
        #srcdata = np.array(srcdata)
        #trgdata = np.array(trgdata)
        srcdata = torch.from_numpy(sr)
        trgdata = torch.from_numpy(tg)
        src = Variable(srcdata, requires_grad=False).long()
        trg = Variable(trgdata, requires_grad=False).long()
        yield Batch(src, trg, pad)#Batch is only a simple class

class Batch:
    "Object for holding a batch of data with mask during training."
    def __init__(self, src, trg=None, pad=0):
        self.src = src
        self.src_mask = (src != pad).unsqueeze(-2)
        if trg is not None:
            self.trg = trg[:, :-1]
            self.trg_y = trg[:, 1:]
            self.trg_mask = \
                self.make_std_mask(self.trg, pad)
            self.ntokens = (self.trg_y != pad).data.sum()

    @staticmethod
    def make_std_mask(tgt, pad):
        "Create a mask to hide padding and future words."
        tgt_mask = (tgt != pad).unsqueeze(-2)
        tgt_mask = tgt_mask & Variable(
            subsequent_mask(tgt.size(-1)).type_as(tgt_mask.data))
        return tgt_mask
P.S. The code is adapted from the 'Annotated Transformer'.