ai-forever / ru-dalle

Generate images from texts. In Russian

Home Page: https://rudalle.ru/

License: Apache License 2.0

Python 1.23% Jupyter Notebook 98.77%
image-generation text-to-image python pytorch dalle openai russian russian-language transformer

ru-dalle's Introduction

ruDALL-E

Generate images from texts


pip install rudalle==1.1.3

🤗 HF Models:

ruDALL-E Malevich (XL)
ruDALL-E Emojich (XL) (readme here)
ruDALL-E Surrealist (XL)
ruDALL-E Kandinsky (XXL) (soon)

Minimal Example:

Open In Colab | Kaggle | Hugging Face Spaces

Example usage of ruDALL-E Malevich (XL) with only 3.5 GB of vRAM! Open In Colab

Finetuning example Open In Colab

Generation with ruDALL-E:

import ruclip
from rudalle.pipelines import generate_images, show, super_resolution, cherry_pick_by_ruclip
from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_realesrgan
from rudalle.utils import seed_everything

# prepare models:
device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device)
tokenizer = get_tokenizer()
vae = get_vae(dwt=True).to(device)

# pipeline utils:
realesrgan = get_realesrgan('x2', device=device)
clip, processor = ruclip.load('ruclip-vit-base-patch32-384', device=device)
clip_predictor = ruclip.Predictor(clip, processor, device, bs=8)
text = 'радуга на фоне ночного города'  # 'a rainbow over the night city'

seed_everything(42)
pil_images = []
scores = []
for top_k, top_p, images_num in [
    (2048, 0.995, 24),
]:
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, bs=8, top_p=top_p)
    pil_images += _pil_images
    scores += _scores

show(pil_images, 6)

auto cherry-pick by ruCLIP:

top_images, clip_scores = cherry_pick_by_ruclip(pil_images, text, clip_predictor, count=6)
show(top_images, 3)

super resolution:

sr_images = super_resolution(top_images, realesrgan)
show(sr_images, 3)

text, seed = 'красивая тян из аниме', 6955  # 'a beautiful anime girl'
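To reproduce a captioned sample like this one, apply the seed before generating (a sketch reusing the API above; the top_k/top_p values here are assumptions, not the ones used for this caption):

seed_everything(seed)
pil_images, scores = generate_images(text, tokenizer, dalle, vae, top_k=2048, top_p=0.995, images_num=4)
show(pil_images, 2)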

Image Prompt

see jupyters/ruDALLE-image-prompts-A100.ipynb

text, seed = 'Храм Василия Блаженного', 42  # 'Saint Basil's Cathedral'
skyes = [red_sky, sunny_sky, cloudy_sky, night_sky]
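A hypothetical sketch of how these pieces fit together (assuming the ImagePrompts API from rudalle.image_prompts; red_sky and friends are PIL images not defined here, and borders follows the documented format where 1 unit equals 8 pixels):

from rudalle.image_prompts import ImagePrompts

borders = {'up': 4, 'left': 0, 'right': 0, 'down': 0}  # keep the top 32 pixels of the prompt image
for sky in skyes:
    image_prompts = ImagePrompts(sky, borders, vae, device, crop_first=True)
    seed_everything(seed)
    _pil_images, _ = generate_images(text, tokenizer, dalle, vae, top_k=1024, top_p=0.99,
                                     images_num=4, image_prompts=image_prompts)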

VideoDALL-E | ruCogVideo by @cene555

Video generation example: Open In Colab. Finetuning example: Open In Colab.

Aspect ratio images (NEW)

Request access: Here

роботы акварелью в стиле ван гога ('robots in watercolor in the style of Van Gogh')

FID = 15.4 (COCO Valid)


ru-dalle's People

Contributors

ak391, alexwortega, andrewtrefilov, boomb0om, cene555, denndimitrov, ivksu, minimaxir, nastyamittseva, neverix, ollmer, oribetelgeuse, pre-commit-ci[bot], shonenkov, skylion007, tatianashavrina, thedenk


ru-dalle's Issues

ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory

I've started getting this error recently:

import transformers
import more_itertools
from tqdm.auto import tqdm

from rudalle.pipelines import show, cherry_pick_by_clip
from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_ruclip
from rudalle.utils import seed_everything, torch_tensors_to_pil_list

# prepare models:
device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device, cache_dir='./')

causes

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/tmp/ipykernel_209/2921329298.py in <module>
      3 from tqdm.auto import tqdm
      4 
----> 5 from rudalle.pipelines import show, cherry_pick_by_clip
      6 from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_ruclip
      7 from rudalle.utils import seed_everything, torch_tensors_to_pil_list

~/.conda/envs/default/lib/python3.9/site-packages/rudalle/__init__.py in <module>
      6 from .ruclip import get_ruclip
      7 from .emojich_unet import get_emojich_unet
----> 8 from . import vae, dalle, tokenizer, realesrgan, pipelines, ruclip, image_prompts
      9 
     10 

~/.conda/envs/default/lib/python3.9/site-packages/rudalle/pipelines.py in <module>
      4 from os.path import join
      5 
----> 6 import cv2
      7 import torch
      8 import torchvision

~/.conda/envs/default/lib/python3.9/site-packages/cv2/__init__.py in <module>
      6 import sys
      7 
----> 8 from .cv2 import *
      9 from .cv2 import _registerMatType
     10 from . import mat_wrapper

ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory

I cannot install anything, since I'm running on SageMaker Studio Lab (apt-get installs are apparently not supported). This was not happening in previous versions.
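A likely workaround (my suggestion, not from this thread): the missing library is a GUI dependency of opencv-python, and the headless wheel drops it and installs with pip alone, which does work on Studio Lab:

pip uninstall -y opencv-python
pip install opencv-python-headless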

Finetuning script at the Kaggle notebook provides no output

There is an error in the Kaggle notebook with the example code for "Emojich" finetuning, published here:
https://www.kaggle.com/shonenkov/emojich-rudall-e
When finetuning with a custom dataset, on the first try training was interrupted early because of the line:
EARLY_STOP = True
This value was changed to False, but then training continues until the kernel is killed after 6 hours and 34 minutes due to Kaggle's time limit, and afterwards no output data is available.
The question is: how can training be set to stop after a certain time (or a certain number of steps or epochs), so that saved checkpoints remain available after execution ends?
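One generic way to do this (a plain-PyTorch sketch, not the notebook's own code; model, train_step and train_loader stand in for the notebook's existing objects):

import time
import torch

MAX_SECONDS = 6 * 3600      # leave a margin under Kaggle's session limit
MAX_STEPS = 20_000          # or cap by step count instead
start, step, done = time.time(), 0, False

while not done:
    for batch in train_loader:                       # the notebook's existing dataloader
        loss = train_step(batch)                     # the notebook's existing train step
        step += 1
        if step % 500 == 0:
            torch.save(model.state_dict(), 'last.pt')   # /kaggle/working persists after the run
        if step >= MAX_STEPS or time.time() - start > MAX_SECONDS:
            torch.save(model.state_dict(), 'last.pt')
            done = True
            break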

CUDA out of memory

When trying to start, it reports a lack of video memory on an RTX 3050 Ti.

◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
x4 --> ready
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
Traceback (most recent call last):
  File ".\Text Document.py", line 12, in <module>
    ruclip = ruclip.to(device)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 899, in to
    return self._apply(convert)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 570, in _apply
    module._apply(fn)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 570, in _apply
    module._apply(fn)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 570, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 593, in _apply
    param_applied = fn(param)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 897, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 2.86 GiB already allocated; 0 bytes free; 2.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Kernel dies with GTX 1050 4GB

My Jupyter notebook kernel dies ("The kernel appears to have died. It will restart automatically.") when trying to load the main model after downloading it:

device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device, cache_dir='./')

I have split cells for the vae, tokenizer and clip, which all load fine. My nvidia-smi output is the following:

Total GPU RAM: 3.94 Gb
CPU: 4
RAM GB: 7.8
PyTorch version: 1.10.1+cu102
CUDA version: 10.2
cuDNN version: 7605
Allowed GPU RAM: 3.5 Gb
GPU part 0.8886
Tue Jan  4 18:22:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 45%   25C    P0    N/A /  75W |    849MiB /  4033MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1104      G   /usr/lib/xorg/Xorg                 84MiB |
|    0   N/A  N/A      1682      G   /usr/bin/gnome-shell               31MiB |
|    0   N/A  N/A     12024      G   ...AAAAAAAA== --shared-files       38MiB |
|    0   N/A  N/A     13204      C   /usr/bin/python                   689MiB |
+-----------------------------------------------------------------------------+

while system memory is:

loreto@ombromanto:~/Projects/notebooks/rudalle$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7,8G        3,8G        2,8G         97M        1,1G        3,6G
Swap:          2,0G        993M        1,0G

and the CPU is:
loreto@ombromanto:~/Projects/notebooks/rudalle$ cat /proc/cpuinfo  | grep 'name'| uniq
model name	: Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz

With this configuration I'm able to load models like CLIP, GLIDE, LAMA, etc with minor limitations.

I have also tried to follow this approach:

device = 'cpu'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=False, device=device, cache_dir='./')
if has_cuda:
     device = 'cuda'
     dalle.to(device)

loading the model on the CPU and then moving it to CUDA, but I still get the notebook issue:

[D 18:22:25.471 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:25.476 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[D 18:22:25.477 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:22:30.023 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:30.024 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[I 18:23:41.356 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05 restarted
[D 18:23:41.831 NotebookApp] Starting kernel: ['/usr/bin/python', '-m', 'ipykernel_launcher', '-f', '/home/loreto/.local/share/jupyter/runtime/kernel-464591fd-7e62-4cd7-80e8-0ac4f3f9ac05.json']
[D 18:23:42.303 NotebookApp] Connecting to: tcp://127.0.0.1:36147
[D 18:23:44.736 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (starting)
[D 18:23:44.759 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:23:44.761 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:23:45.040 NotebookApp] 200 GET /static/base/images/favicon-notebook.ico (127.0.0.1) 122.080000ms
[D 18:23:46.533 NotebookApp] 200 GET /api/contents/rudalle/Malevich_3_5GB_vRAM_usage.ipynb?content=0&_=1641316902647 (127.0.0.1) 19.390000ms
[D 18:23:54.294 NotebookApp] KernelRestarter: restart apparently succeeded

Of course, in this case it would also be necessary to convert the model to FP16 afterwards, with something like dalle.convert_to_fp16(), but I'm not sure how to do that.
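For what it's worth, a generic way to do that with plain PyTorch rather than a rudalle-specific API (a sketch; whether the wrapped model behaves well under a blanket .half() is an assumption):

import torch

dalle = get_rudalle_model('Malevich', pretrained=True, fp16=False, device='cpu', cache_dir='./')
if torch.cuda.is_available():
    dalle = dalle.half().to('cuda')  # convert parameters to FP16, then move to the GPU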

XXL Model

Hey, good afternoon.
Do you guys plan to release the XXL model to the public in the near future?

image_prompts.py – borders crop not working properly

From the official documentation:

borders (dict[str] | int): borders that we croped from pil_image
example: {'up': 4, 'right': 0, 'left': 0, 'down': 0} (1 int eq 8 pixels)

The 'up' crop works just fine, but if I pass any border other than 'up' as the crop argument, I get an AssertionError:
[screenshot of the AssertionError]

Thank you for a fantastic algo ✨

Auto cut pictures into separated images

Are there any parameters that will automatically cut and save the generated images separately?
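For reference, the pipeline returns a plain list of PIL images, so saving them separately takes a couple of lines (a sketch, not a built-in option):

for i, img in enumerate(pil_images):
    img.save(f'image_{i:03d}.png')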

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

I use the default code and get this error after generation reaches 100%.
Please help; I'm using Windows and conda.

◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
x4 --> ready
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
100%|██████████████████████████████████████████████████████████████████████████████| 1024/1024 [00:46<00:00, 22.14it/s]
Traceback (most recent call last):
  File "gen.py", line 29, in <module>
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, top_p=top_p)
  File "C:\Users\1\anaconda3\lib\site-packages\rudalle\pipelines.py", line 60, in generate_images
    images = vae.decode(codebooks)
  File "C:\Users\1\anaconda3\lib\site-packages\rudalle\vae\model.py", line 38, in decode
    img = self.model.decode(z)
  File "C:\Users\1\anaconda3\lib\site-packages\rudalle\vae\model.py", line 98, in decode
    quant = self.post_quant_conv(quant)
  File "C:\Users\1\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\1\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\1\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([3, 256, 32, 32], dtype=torch.float, device='cuda', requires_grad=True).to(memory_format=torch.channels_last)
net = torch.nn.Conv2d(256, 256, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float().to(memory_format=torch.channels_last)
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [0, 0, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = true
allow_tf32 = true
input: TensorDescriptor 0000020481F094B0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 3, 256, 32, 32,
strideA = 262144, 1, 8192, 256,
output: TensorDescriptor 0000020481F09590
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 3, 256, 32, 32,
strideA = 262144, 1, 8192, 256,
weight: FilterDescriptor 000001FFD2E76AF0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NHWC
nbDims = 4
dimA = 256, 256, 1, 1,
Pointer addresses:
input: 0000001538C7D000
output: 000000153B87D000
weight: 00000014D3BB0000

Question about hard code part in model.py

Hi there :)
First of all, I really want to thank you for sharing such great work with the public. Thanks to your work, I was able to experiment with and research text-to-image generation. Great job!

I have a question about the hard-coded part in model.py (from line 101 to line 116).
My question is: what is the purpose of this hard-coded part?
Is there a specific reason you put it there?

That is my question. Thank you again for the great work! Merry Christmas and a happy New Year!
[screenshot of model.py, lines 101–116]

missing VAE encoder with DWT

@shonenkov Great work, everyone!
As far as I can tell, there is only a VAE decoder with DWT and no corresponding encoder.
Encoding with get_vae(dwt=True) produces the same number of tokens as get_vae(dwt=False) for the same picture size, but the tokens differ, and the DWT decoder doubles the original image size. The result is large but blurry, and I see quality loss even after downscaling to the original size; an image decoded and re-encoded with the default VQGAN model still seems better than with the DWT model.
@bes-dev Is this due to the need to re-train the model end-to-end that you mentioned in #42?
I would expect a compatible VAE DWT encoder to encode a 512x512 image into 1024 tokens, and the decoder to restore the image back to 512x512.
I think for now the DWT VAE needs 256x256 image prompts rather than 512x512, but then the resulting quality is unfortunately not worth the effort. Looking forward to seeing DALL-E trained end-to-end on 512px images.

Add pre-commit CI

Seeing how you already have a pre-commit config, you should add https://pre-commit.ci/ so that all PRs are automatically checked and formatted against the existing style config. (It even applies autofixes to PRs).

[Colab] Text embedding optimization

Input image: https://static8.depositphotos.com/1370441/848/i/600/depositphotos_8486144-stock-photo-beach-and-tropical-sea.jpg

Input text: 'elon musk'

Result: [result images]

Colab that runs out of memory: https://colab.research.google.com/drive/1ancv6fQMrzaz67Ikvfv3wnjlwpWsoebO?usp=sharing

My method is to optimize the text embedding of the transformer, in order to make the output closer to the input image. It's the same idea as fine-tuning, but optimizing text embeddings instead of model weights. I had to modify the model's forward pass to make it retain the gradient. Sorry for the messy code.
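In generic terms the technique looks like this (a self-contained toy sketch of embedding optimization against a frozen model; the real notebook operates on ruDALL-E's text embeddings with a CLIP-style image loss):

import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.Tanh())  # stand-in for the frozen generator
model.requires_grad_(False)                        # freeze all model weights
target = torch.randn(64)                           # stand-in for the input image

emb = torch.nn.Parameter(torch.zeros(16))          # the trainable "text embedding"
opt = torch.optim.Adam([emb], lr=1e-2)

for step in range(200):
    out = model(emb)                               # forward pass keeps grad w.r.t. emb only
    loss = torch.nn.functional.mse_loss(out, target)  # real notebook: CLIP-style similarity loss
    opt.zero_grad()
    loss.backward()
    opt.step()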

Also, I wonder if it's possible to generate the same picture every time? This could be a way to do text-based image modification. I tried removing temperature and filtering, which didn't help. The seed is always the same (presumably).

New 512x with image inputs

Hi there, I just wanted to say this is fantastic work you've done here; I'm amazed at the results and really look forward to how your team progresses with this.

As for my query, I've been trying to work out how to use image prompts with the new 512x512 notebook (Malevich-3.5GB-vRAM-usage.ipynb), but I haven't managed to make it work. I was wondering if you could help?

generate_images does not run

I'm trying to run on device = 'cpu', with the very first example from the README.

It crashes with the traceback below. What am I doing wrong?

◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
x4 --> ready
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
  0%|          | 0/1024 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\pipelines.py", line 46, in generate_images
    logits, has_cache = dalle(out, attention_mask,
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\fp16.py", line 51, in forward
    return fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs))
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\model.py", line 150, in forward
    transformer_output, present_has_cache = self.transformer(
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\transformer.py", line 76, in forward
    hidden_states, present_has_cache = layer(hidden_states, mask, has_cache=has_cache, use_cache=use_cache)
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\transformer.py", line 146, in forward
    layernorm_output = self.input_layernorm(hidden_states)
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\normalization.py", line 173, in forward
    return F.layer_norm(
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\functional.py", line 2346, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Smaller / Distilled model?

Will there be a smaller or distilled model release? The problem with inference in Google Colab is speed: 4:32 for one image on a P100, and 2+ hours for 3 images on a K80.

Test results

[screenshot of generated results]



Hello! I have tested your neural net and got these results. They are so good!

But no kidding, it's a good idea.

Commercial Use?

Fabulous work! Can we use your code and models for commercial purposes?

Updates decoder to allow generation of 512x512 images

Thanks for your excellent work!

So, as I investigated from the code, your pipeline currently has a quality bottleneck: DALL-E can only predict a VQVAE latent code of 1024 tokens, and the vanilla VQVAE decodes that code into a 256x256 image. But there is a trick (described in my MobileStyleGAN article): instead of predicting the image in the pixel domain, we can predict the discrete wavelet transform (DWT) of the target image. The DWT maps an image of shape [3, H, W] to a tensor of shape [12, H/2, W/2], and there is an inverse transform with no accuracy loss. So the VQVAE can decode x2 larger images from the same 1024-token latent codes.
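To make the shape bookkeeping concrete, here is a minimal single-level 2D Haar DWT pair in PyTorch (my own illustration of the transform, not code from this PR):

import torch

def haar_dwt(x):
    # [B, 3, H, W] -> [B, 12, H/2, W/2]: one average band and three detail bands per channel
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    return torch.cat([(a + b + c + d) / 2, (a + b - c - d) / 2,
                      (a - b + c - d) / 2, (a - b - c + d) / 2], dim=1)

def haar_idwt(y):
    # exact inverse: [B, 12, H/2, W/2] -> [B, 3, H, W]
    ll, lh, hl, hh = y.chunk(4, dim=1)
    x = torch.empty(y.shape[0], y.shape[1] // 4, y.shape[2] * 2, y.shape[3] * 2,
                    dtype=y.dtype, device=y.device)
    x[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[..., 0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[..., 1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

x = torch.randn(1, 3, 512, 512)
assert torch.allclose(haar_idwt(haar_dwt(x)), x, atol=1e-6)  # lossless round trip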

I built a pipeline (based on knowledge distillation) to train a modified VQVAE decoder that uses this trick, and got a baseline checkpoint that lets you generate 512x512 images with your pre-trained ruDALL-E model. My results are available as an open-source project here (a link to the baseline checkpoint is also available): https://github.com/bes-dev/vqvae_dwt_distiller.pytorch.

I think the results could be better if we trained the whole DALL-E end-to-end using my version of the decoder (for now, I didn't modify the codebook, to keep compatibility with the ruDALL-E checkpoint).

Error in the ruDALL-E code published on Kaggle

Running the ruDALL-E code from the Kaggle notebook (as published) in a GPU session ends with an error:

ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_29/1914141142.py in <module>
----> 1 from rudalle.pipelines import generate_images, show, super_resolution, cherry_pick_by_clip
      2 from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_realesrgan, get_ruclip
      3 from rudalle.utils import seed_everything

ModuleNotFoundError: No module named 'rudalle'

The error message refers to this code:

!pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html > /dev/null
!pip install rudalle==0.0.1rc1 > /dev/null

ruDALL-E with HuggingFace Accelerated Inference API: query

Firstly, thanks for developing this great model, the library, and the demo at HuggingFace.

My question is:

Can we use the ruDALL-E model via the HuggingFace Accelerated Inference API now? I couldn't find any details on the HuggingFace website. If yes, kindly provide the Inference API usage details.

Tokenizer decoding bug

It seems like the tokenizer ignores the first letter when it's uppercase; chaining encode_text + decode_text shows this. What could be the source of this bug? Is this the intended behavior?
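A minimal round trip that should demonstrate it (a sketch; the exact encode_text/decode_text signatures are my assumption based on the report):

from rudalle import get_tokenizer

tokenizer = get_tokenizer()
tokens = tokenizer.encode_text('Радуга', text_seq_length=128)
print(tokenizer.decode_text(tokens))  # reportedly comes back without the uppercase first letter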

image prompts

The model uses an uploaded image as a background image. Is it possible to use an image as an initial state instead?

ImageNet classification with ru-dalle?

Hi Team,
Thanks for the excellent contribution to open source.
I've been trying to adapt your code. I'm mostly focused on getting image embeddings from a given image and training a classifier on top of them. I gather the DALL-E code is built on text and image embeddings.
Any direction on generating an image feature vector? What part of the code should I modify?
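One plausible entry point (a sketch; get_codebook_indices is my assumption about the VAE API, based on how image prompts encode images, and 8192 is the assumed image-token vocabulary size; preprocessing may differ):

import torch
from PIL import Image
import torchvision.transforms as T

# `vae` and `device` as prepared in the README example
img = T.ToTensor()(Image.open('example.jpg').resize((256, 256))).unsqueeze(0).to(device)
with torch.no_grad():
    codes = vae.get_codebook_indices(img)  # [1, 1024] discrete image tokens
# crude pooled feature vector: a histogram of codebook usage
feats = torch.nn.functional.one_hot(codes, 8192).float().mean(dim=1)

A linear classifier could then be trained on feats.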

Any help would be greatly appreciated.

Thanks.

RuntimeError: CUDA error: out of memory

I've been running this code fine, but as of today it no longer works: https://github.com/sberbank-ai/ru-dalle

Using Anaconda on Windows 10 with an RTX 3090 GPU. RuntimeError: CUDA error: out of memory

I get this message when I run from anaconda command line: python rdalle_generate_images.py

=====================
\anaconda3\lib\site-packages\rudalle\dalle\model.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').

RuntimeError: CUDA error: out of memory

Any idea how to fix this? Should I re-download the code repo and try again?

show(pil_images, 6) not working

pil_images = []
scores = []
for top_k, top_p, images_num in [
    (2048, 0.995, 3),
    (1536, 0.99, 3),
#    (1024, 0.99, 3),
#    (1024, 0.98, 3),
#    (512, 0.97, 3),
#    (384, 0.96, 3),
#    (256, 0.95, 3),
#    (128, 0.95, 3), 
]:
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, top_p=top_p)
    pil_images += _pil_images
    scores += _scores

show(pil_images, 6)

Sparse attention support

Currently, the inference code creates the entire attention matrix and then masks it. Sparse attention implementations like Triton's are more efficient. Does the pre-training code support sparse attention? Will it ever be released?

how to make better resolution and save images?

I don't understand how to get better resolution or save images directly. Why does the command show([pil_image for pil_image, score in sorted(zip(pil_images, scores), key=lambda x: -x[1])], 6) display the pictures so small? And if I use

top_images, clip_scores = cherry_pick_by_clip(pil_images, text, ruclip, ruclip_processor, device=device, count=1)
show(top_images, 3)

will that be full resolution? How do I get all generated photos at full resolution, or save them directly to files?
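Combining the README pipeline steps should do both (a sketch using the API shown in the README, with realesrgan prepared as there):

sr_images = super_resolution(top_images, realesrgan)  # Real-ESRGAN x2 upscale
for i, img in enumerate(sr_images):
    img.save(f'result_{i}.png')                       # each element is a plain PIL image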

How to generate only 1 image? cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Setting images_num or bs to 1 results in the following error:

Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
  File "D:\Software\Python39\Lib\site-packages\torch\nn\modules\conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "D:\Software\Python39\Lib\site-packages\torch\nn\modules\conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\Software\Python39\Lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Software\Python39\Lib\site-packages\rudalle\vae\model.py", line 98, in decode
    quant = self.post_quant_conv(quant)
  File "D:\Software\Python39\Lib\site-packages\rudalle\vae\model.py", line 38, in decode
    img = self.model.decode(z)

Problem with the PyTorch version?

I have looked through the issues but couldn't find the same problem, so sorry to bother you.
GPU:
[screenshot of nvidia-smi]
My Python environment: pytorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.3.1, cudnn=8.2.1. I tried rudalle=0.3.0 following the readme.md, and 0.0.1rc5 via the RTX3090.ipynb, but I only got the following error:
[screenshot of the error]

Is there a problem with my environment? Waiting for your reply!

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx

I have a GTX 1660 SUPER 6 gb vram, Ubuntu 20.04, Python 3.9, Driver Version: 460.91.03 , CUDA Version: 11.2

At this stage:
3%|███▏ | 27/1024
I get an error:

.../ru-dalle/main.py", line 31, in <module>
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, top_p=top_p)
..........
.../ru-dalle/venv/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

The video memory used at the time of the error:
4893MiB / 5936MiB

Also, at the very beginning of generation, I get a warning:
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').

MultiGPU support ETA?

I'd like to finetune my own model with your code, but DalleModel as it stands doesn't seem to support PyTorch's DistributedDataParallel wrapper.
Do you plan to add such an option?
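For reference, this is the standard pattern being asked about (a generic DistributedDataParallel sketch; per the report above, the current DalleModel doesn't yet cooperate with it):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from rudalle import get_rudalle_model

dist.init_process_group('nccl')            # launch with: torchrun --nproc_per_node=N train.py
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)

dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=f'cuda:{local_rank}')
dalle = DDP(dalle, device_ids=[local_rank])  # gradients would sync across GPUs during backward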

Minor error in Finetuning example code

In the 'freeze' method there is a trivial error, I think.

When a user intends to freeze the attention layers, they set freeze_attn=True.

But no parameter names in model.module.named_parameters() contain 'attn', so the condition "elif 'attn' in name:" never matches. As a result, when a user freezes the 'other' layers, the layers containing 'attention' get frozen too.

The proper fix is to change "'attn' in name" to "'attention' in name".
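A hypothetical version of the corrected filter (the surrounding structure is my reconstruction of the notebook's freeze helper; the 'attention' check is the point):

def freeze(model, freeze_emb=False, freeze_ln=False, freeze_attn=True, freeze_ff=True, freeze_other=False):
    for name, p in model.module.named_parameters():
        name = name.lower()
        if 'ln' in name or 'norm' in name:
            p.requires_grad = not freeze_ln
        elif 'embeddings' in name:
            p.requires_grad = not freeze_emb
        elif 'attention' in name:   # was 'attn', which never matches any parameter name
            p.requires_grad = not freeze_attn
        elif 'mlp' in name:
            p.requires_grad = not freeze_ff
        else:
            p.requires_grad = not freeze_other
    return model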

Anyway, thank you all developers of ruDALL-E for providing awesome pre-trained model!

EXIF rotation and accessible colab for image completion with more features

Hi, like others, I prepared a colab with a more accessible GUI and presets here:
https://gist.github.com/eyaler/0cee9a71f5dd3fdfa9c0c03656ebdd4c
It is oriented toward easily running batch experiments, and you can see some results here:
https://twitter.com/eyaler/status/1468682110860992521
There are some very simple things you may want to take from the notebook, such as fixing image rotation due to EXIF:

from PIL import Image, ImageOps
im = ImageOps.exif_transpose(Image.open(file))

and fixing the aspect ratio in post-processing back to the original (although it would be better to use non-square encoding).
Thanks for rudalle!

Website Enhancement: Replace word captcha with different captcha

For example, a captcha with Russian words from Yandex, or a captcha where you need to place some figure (triangle, oval, square) into its place in the picture. That would be much handier for users; the current captcha on the website is hard to use, as the letters are often hard to make out.


What top_k, top_p parameters mean?

Good evening!
What do the top_k and top_p parameters mean in:

for top_k, top_p, images_num in [
    (2048, 0.995, 3),
    (1536, 0.99, 3)

And what does the 'score' parameter show?
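For context, these names refer to the standard sampling filters: top_k keeps only the k most probable image tokens at each generation step, and top_p (nucleus sampling) additionally drops tokens outside the smallest set whose cumulative probability exceeds p. A generic sketch of the idea (not the repo's exact implementation):

import torch

def filter_logits(logits, top_k=2048, top_p=0.995):
    # top-k: keep the k highest-scoring tokens, mask out everything else
    kth = torch.topk(logits, top_k)[0][..., -1, None]
    logits = logits.masked_fill(logits < kth, float('-inf'))
    # top-p: drop tokens past cumulative probability p (shift keeps the first one over)
    sorted_logits, idx = torch.sort(logits, descending=True)
    cum = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    mask = cum > top_p
    mask[..., 1:] = mask[..., :-1].clone()
    mask[..., 0] = False
    sorted_logits = sorted_logits.masked_fill(mask, float('-inf'))
    return torch.full_like(logits, float('-inf')).scatter(-1, idx, sorted_logits)

Larger top_k / top_p keep more candidate tokens (more diversity, more artifacts); smaller values are more conservative.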

Demo results

Hi, nice job with this implementation.
I just wanted to know if it is possible to tweak the demo to get better results, because it seems to generate worse results than the notebooks.
Thank you for your time!

Should use_cache=True be used for image prompts?

Should we now use use_cache=True for image prompts? I was using the latest code but with use_cache=False; after changing it, the results look the same but generation is 10x faster.
Is there any reason to use use_cache=False under some conditions?
