ai-forever / ru-dalle

Generate images from texts. In Russian

Home Page: https://rudalle.ru/

License: Apache License 2.0

Python 1.23% Jupyter Notebook 98.77%
image-generation text-to-image python pytorch dalle openai russian russian-language transformer

ru-dalle's Introduction

ruDALL-E

Generate images from texts


pip install rudalle==1.1.3

🤗 HF Models:

ruDALL-E Malevich (XL)
ruDALL-E Emojich (XL) (readme here)
ruDALL-E Surrealist (XL)
ruDALL-E Kandinsky (XXL) (soon)

Minimal Example:

Open In Colab | Kaggle | Hugging Face Spaces

Example usage of ruDALL-E Malevich (XL) with only 3.5 GB of vRAM! Open In Colab

Finetuning example Open In Colab

Generation with ruDALL-E:

import ruclip
from rudalle.pipelines import generate_images, show, super_resolution, cherry_pick_by_ruclip
from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_realesrgan
from rudalle.utils import seed_everything

# prepare models:
device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device)
tokenizer = get_tokenizer()
vae = get_vae(dwt=True).to(device)

# pipeline utils:
realesrgan = get_realesrgan('x2', device=device)
clip, processor = ruclip.load('ruclip-vit-base-patch32-384', device=device)
clip_predictor = ruclip.Predictor(clip, processor, device, bs=8)
text = 'радуга на фоне ночного города'  # 'a rainbow over the night city'

seed_everything(42)
pil_images = []
scores = []
for top_k, top_p, images_num in [
    (2048, 0.995, 24),
]:
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, bs=8, top_p=top_p)
    pil_images += _pil_images
    scores += _scores

show(pil_images, 6)

auto cherry-pick by ruCLIP:

top_images, clip_scores = cherry_pick_by_ruclip(pil_images, text, clip_predictor, count=6)
show(top_images, 3)

super resolution:

sr_images = super_resolution(top_images, realesrgan)
show(sr_images, 3)

text, seed = 'красивая тян из аниме', 6955  # 'a beautiful anime girl'
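To reproduce a captioned sample like this one, apply the seed before generating (a sketch reusing the API above; the top_k/top_p values here are assumptions, not the ones used for this caption):

seed_everything(seed)
pil_images, scores = generate_images(text, tokenizer, dalle, vae, top_k=2048, top_p=0.995, images_num=4)
show(pil_images, 2)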

Image Prompt

see jupyters/ruDALLE-image-prompts-A100.ipynb

text, seed = 'Храм Василия Блаженного', 42  # 'Saint Basil's Cathedral'
skyes = [red_sky, sunny_sky, cloudy_sky, night_sky]
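A hypothetical sketch of how these pieces fit together (assuming the ImagePrompts API from rudalle.image_prompts; red_sky and friends are PIL images not defined here, and borders follows the documented format where 1 unit equals 8 pixels):

from rudalle.image_prompts import ImagePrompts

borders = {'up': 4, 'left': 0, 'right': 0, 'down': 0}  # keep the top 32 pixels of the prompt image
for sky in skyes:
    image_prompts = ImagePrompts(sky, borders, vae, device, crop_first=True)
    seed_everything(seed)
    _pil_images, _ = generate_images(text, tokenizer, dalle, vae, top_k=1024, top_p=0.99,
                                     images_num=4, image_prompts=image_prompts)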

VideoDALL-E | ruCogVideo by @cene555

Video generation example: Open In Colab. Finetuning example: Open In Colab.

Aspect ratio images (NEW)

Request access: Here

роботы акварелью в стиле ван гога ('robots in watercolor in the style of Van Gogh')

FID = 15.4 (COCO Valid)


ru-dalle's People

Contributors

ak391, alexwortega, andrewtrefilov, boomb0om, cene555, denndimitrov, ivksu, minimaxir, nastyamittseva, neverix, ollmer, oribetelgeuse, pre-commit-ci[bot], shonenkov, skylion007, tatianashavrina, thedenk


ru-dalle's Issues

ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory

I've started getting this error recently:

import transformers
import more_itertools
from tqdm.auto import tqdm

from rudalle.pipelines import show, cherry_pick_by_clip
from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_ruclip
from rudalle.utils import seed_everything, torch_tensors_to_pil_list

# prepare models:
device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device, cache_dir='./')

causes

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/tmp/ipykernel_209/2921329298.py in <module>
      3 from tqdm.auto import tqdm
      4 
----> 5 from rudalle.pipelines import show, cherry_pick_by_clip
      6 from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_ruclip
      7 from rudalle.utils import seed_everything, torch_tensors_to_pil_list

~/.conda/envs/default/lib/python3.9/site-packages/rudalle/__init__.py in <module>
      6 from .ruclip import get_ruclip
      7 from .emojich_unet import get_emojich_unet
----> 8 from . import vae, dalle, tokenizer, realesrgan, pipelines, ruclip, image_prompts
      9 
     10 

~/.conda/envs/default/lib/python3.9/site-packages/rudalle/pipelines.py in <module>
      4 from os.path import join
      5 
----> 6 import cv2
      7 import torch
      8 import torchvision

~/.conda/envs/default/lib/python3.9/site-packages/cv2/__init__.py in <module>
      6 import sys
      7 
----> 8 from .cv2 import *
      9 from .cv2 import _registerMatType
     10 from . import mat_wrapper

ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory

I cannot install anything, since I'm running on SageMaker Studio Lab (apt-get installs are apparently not supported). This was not happening in previous versions.
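A likely workaround (my suggestion, not from this thread): the missing library is a GUI dependency of opencv-python, and the headless wheel drops it and installs with pip alone, which does work on Studio Lab:

pip uninstall -y opencv-python
pip install opencv-python-headless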

Finetuning script at the Kaggle notebook provides no output

There is an error in the Kaggle notebook with the example code for "Emojich" finetuning, published here:
https://www.kaggle.com/shonenkov/emojich-rudall-e
When finetuning with a custom dataset, on the first try training was interrupted early because of the line:
EARLY_STOP = True
This value was changed to False, but then training continues until the kernel is killed after 6 hours and 34 minutes due to Kaggle's time limit, and afterwards no output data is available.
The question is: how can training be set to stop after a certain time (or a certain number of steps or epochs), so that saved checkpoints remain available after execution ends?
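One generic way to do this (a plain-PyTorch sketch, not the notebook's own code; model, train_step and train_loader stand in for the notebook's existing objects):

import time
import torch

MAX_SECONDS = 6 * 3600      # leave a margin under Kaggle's session limit
MAX_STEPS = 20_000          # or cap by step count instead
start, step, done = time.time(), 0, False

while not done:
    for batch in train_loader:                       # the notebook's existing dataloader
        loss = train_step(batch)                     # the notebook's existing train step
        step += 1
        if step % 500 == 0:
            torch.save(model.state_dict(), 'last.pt')   # /kaggle/working persists after the run
        if step >= MAX_STEPS or time.time() - start > MAX_SECONDS:
            torch.save(model.state_dict(), 'last.pt')
            done = True
            break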

CUDA out of memory

When trying to start, it reports a lack of video memory on an RTX 3050 Ti.

◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
x4 --> ready
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
Traceback (most recent call last):
  File ".\Text Document.py", line 12, in <module>
    ruclip = ruclip.to(device)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 899, in to
    return self._apply(convert)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 570, in _apply
    module._apply(fn)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 570, in _apply
    module._apply(fn)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 570, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 593, in _apply
    param_applied = fn(param)
  File "C:\Users\bropi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\nn\modules\module.py", line 897, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 2.86 GiB already allocated; 0 bytes free; 2.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Kernel dies with GTX 1050 4GB

My Jupyter notebook kernel dies ("The kernel appears to have died. It will restart automatically.") when trying to load the main model after downloading it:

device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device, cache_dir='./')

I have split cells for the vae, tokenizer and clip, which all load fine. My nvidia-smi output is the following:

Total GPU RAM: 3.94 Gb
CPU: 4
RAM GB: 7.8
PyTorch version: 1.10.1+cu102
CUDA version: 10.2
cuDNN version: 7605
Allowed GPU RAM: 3.5 Gb
GPU part 0.8886
Tue Jan  4 18:22:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 45%   25C    P0    N/A /  75W |    849MiB /  4033MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1104      G   /usr/lib/xorg/Xorg                 84MiB |
|    0   N/A  N/A      1682      G   /usr/bin/gnome-shell               31MiB |
|    0   N/A  N/A     12024      G   ...AAAAAAAA== --shared-files       38MiB |
|    0   N/A  N/A     13204      C   /usr/bin/python                   689MiB |
+-----------------------------------------------------------------------------+

while system memory is:

loreto@ombromanto:~/Projects/notebooks/rudalle$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7,8G        3,8G        2,8G         97M        1,1G        3,6G
Swap:          2,0G        993M        1,0G

and the CPU is:
loreto@ombromanto:~/Projects/notebooks/rudalle$ cat /proc/cpuinfo  | grep 'name'| uniq
model name	: Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz

With this configuration I'm able to load models like CLIP, GLIDE, LAMA, etc with minor limitations.

I have also tried to follow this approach:

device = 'cpu'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=False, device=device, cache_dir='./')
if has_cuda:
     device = 'cuda'
     dalle.to(device)

loading the model on the CPU and then moving it to CUDA, but I still get the notebook issue:

[D 18:22:25.471 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:25.476 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[D 18:22:25.477 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:22:30.023 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:30.024 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[I 18:23:41.356 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05 restarted
[D 18:23:41.831 NotebookApp] Starting kernel: ['/usr/bin/python', '-m', 'ipykernel_launcher', '-f', '/home/loreto/.local/share/jupyter/runtime/kernel-464591fd-7e62-4cd7-80e8-0ac4f3f9ac05.json']
[D 18:23:42.303 NotebookApp] Connecting to: tcp://127.0.0.1:36147
[D 18:23:44.736 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (starting)
[D 18:23:44.759 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:23:44.761 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:23:45.040 NotebookApp] 200 GET /static/base/images/favicon-notebook.ico (127.0.0.1) 122.080000ms
[D 18:23:46.533 NotebookApp] 200 GET /api/contents/rudalle/Malevich_3_5GB_vRAM_usage.ipynb?content=0&_=1641316902647 (127.0.0.1) 19.390000ms
[D 18:23:54.294 NotebookApp] KernelRestarter: restart apparently succeeded

Of course, in this case it would also be necessary to convert the model to FP16 afterwards, with something like dalle.convert_to_fp16(), but I'm not sure how to do that.
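For what it's worth, a generic way to do that with plain PyTorch rather than a rudalle-specific API (a sketch; whether the wrapped model behaves well under a blanket .half() is an assumption):

import torch

dalle = get_rudalle_model('Malevich', pretrained=True, fp16=False, device='cpu', cache_dir='./')
if torch.cuda.is_available():
    dalle = dalle.half().to('cuda')  # convert parameters to FP16, then move to the GPU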

XXL Model

Hey, good afternoon.
Do you guys plan to release the XXL model to the public in the near future?

image_prompts.py – borders crop not working properly

From the official documentation:

borders (dict[str] | int): borders that we croped from pil_image
example: {'up': 4, 'right': 0, 'left': 0, 'down': 0} (1 int eq 8 pixels)

The 'up' crop works just fine, but if I pass any border other than 'up' as the crop argument, I get an AssertionError:
[screenshot of the AssertionError]

Thank you for a fantastic algo ✨

Auto cut pictures into separated images

Are there any parameters that will automatically cut and save the generated images separately?
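For reference, the pipeline returns a plain list of PIL images, so saving them separately takes a couple of lines (a sketch, not a built-in option):

for i, img in enumerate(pil_images):
    img.save(f'image_{i:03d}.png')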

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

I use the default code and get this error after generation reaches 100%.
Please help; I'm using Windows and conda.

◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
x4 --> ready
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
100%|██████████████████████████████████████████████████████████████████████████████| 1024/1024 [00:46<00:00, 22.14it/s]
Traceback (most recent call last):
  File "gen.py", line 29, in <module>
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, top_p=top_p)
  File "C:\Users\1\anaconda3\lib\site-packages\rudalle\pipelines.py", line 60, in generate_images
    images = vae.decode(codebooks)
  File "C:\Users\1\anaconda3\lib\site-packages\rudalle\vae\model.py", line 38, in decode
    img = self.model.decode(z)
  File "C:\Users\1\anaconda3\lib\site-packages\rudalle\vae\model.py", line 98, in decode
    quant = self.post_quant_conv(quant)
  File "C:\Users\1\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\1\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\1\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([3, 256, 32, 32], dtype=torch.float, device='cuda', requires_grad=True).to(memory_format=torch.channels_last)
net = torch.nn.Conv2d(256, 256, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float().to(memory_format=torch.channels_last)
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [0, 0, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = true
allow_tf32 = true
input: TensorDescriptor 0000020481F094B0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 3, 256, 32, 32,
strideA = 262144, 1, 8192, 256,
output: TensorDescriptor 0000020481F09590
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 3, 256, 32, 32,
strideA = 262144, 1, 8192, 256,
weight: FilterDescriptor 000001FFD2E76AF0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NHWC
nbDims = 4
dimA = 256, 256, 1, 1,
Pointer addresses:
input: 0000001538C7D000
output: 000000153B87D000
weight: 00000014D3BB0000

Question about hard code part in model.py

Hi there :)
First of all, I really want to thank you for sharing such great work with the public. Thanks to your work, I was able to experiment with and research text-to-image generation. Great job!

I have a question about the hard-coded part in model.py (from line 101 to line 116).
My question is: what is the purpose of this hard-coded part?
Is there a specific reason you put it there?

That is my question. Thank you again for the great work! Merry Christmas and a happy New Year!
[screenshot of model.py, lines 101–116]

missing VAE encoder with DWT

@shonenkov Great work, everyone!
As far as I can tell, there is only a VAE decoder with DWT and no corresponding encoder.
Encoding with get_vae(dwt=True) produces the same number of tokens as get_vae(dwt=False) for the same picture size, but the tokens differ, and the DWT decoder doubles the original image size. The result is large but blurry, and I see quality loss even after downscaling to the original size; an image decoded and re-encoded with the default VQGAN model still seems better than with the DWT model.
@bes-dev Is this due to the need to re-train the model end-to-end that you mentioned in #42?
I would expect a compatible VAE DWT encoder to encode a 512x512 image into 1024 tokens, and the decoder to restore the image back to 512x512.
I think for now the DWT VAE needs 256x256 image prompts rather than 512x512, but then the resulting quality is unfortunately not worth the effort. Looking forward to seeing DALL-E trained end-to-end on 512px images.

Add pre-commit CI

Seeing how you already have a pre-commit config, you should add https://pre-commit.ci/ so that all PRs are automatically checked and formatted against the existing style config. (It even applies autofixes to PRs).

[Colab] Text embedding optimization

Input image: https://static8.depositphotos.com/1370441/848/i/600/depositphotos_8486144-stock-photo-beach-and-tropical-sea.jpg

Input text: 'elon musk'

Result: [result images]

Colab that runs out of memory: https://colab.research.google.com/drive/1ancv6fQMrzaz67Ikvfv3wnjlwpWsoebO?usp=sharing

My method is to optimize the text embedding of the transformer, in order to make the output closer to the input image. It's the same idea as fine-tuning, but optimizing text embeddings instead of model weights. I had to modify the model's forward pass to make it retain the gradient. Sorry for the messy code.
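In generic terms the technique looks like this (a self-contained toy sketch of embedding optimization against a frozen model; the real notebook operates on ruDALL-E's text embeddings with a CLIP-style image loss):

import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.Tanh())  # stand-in for the frozen generator
model.requires_grad_(False)                        # freeze all model weights
target = torch.randn(64)                           # stand-in for the input image

emb = torch.nn.Parameter(torch.zeros(16))          # the trainable "text embedding"
opt = torch.optim.Adam([emb], lr=1e-2)

for step in range(200):
    out = model(emb)                               # forward pass keeps grad w.r.t. emb only
    loss = torch.nn.functional.mse_loss(out, target)  # real notebook: CLIP-style similarity loss
    opt.zero_grad()
    loss.backward()
    opt.step()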

Also, I wonder if it's possible to generate the same picture every time? This could be a way to do text-based image modification. I tried removing temperature and filtering, which didn't help. The seed is always the same (presumably).

New 512x with image inputs

Hi there, I just wanted to say this is fantastic work you've done here; I'm amazed at the results and really look forward to how your team progresses with this.

As for my query, I've been trying to work out how to use image prompts with the new 512x512 notebook (Malevich-3.5GB-vRAM-usage.ipynb), but I haven't managed to make it work. I was wondering if you could help?

generate_images does not run

I'm trying to run on device = 'cpu', with the very first example from the README.

It crashes with the traceback below. What am I doing wrong?

◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
x4 --> ready
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
  0%|          | 0/1024 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\pipelines.py", line 46, in generate_images
    logits, has_cache = dalle(out, attention_mask,
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\fp16.py", line 51, in forward
    return fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs))
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\model.py", line 150, in forward
    transformer_output, present_has_cache = self.transformer(
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\transformer.py", line 76, in forward
    hidden_states, present_has_cache = layer(hidden_states, mask, has_cache=has_cache, use_cache=use_cache)
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\rudalle\dalle\transformer.py", line 146, in forward
    layernorm_output = self.input_layernorm(hidden_states)
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\modules\normalization.py", line 173, in forward
    return F.layer_norm(
  File "%projectfolder%\test\venv\lib\site-packages\torch\nn\functional.py", line 2346, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Smaller / Distilled model?

Will there be a smaller or distilled model release? The problem with inference in Google Colab is speed: 4:32 for one image on a P100, and 2+ hours for 3 images on a K80.

Test results

[screenshot of generated results]



Hello! I have tested your neural net and got these results. They are so good!

But no kidding, it's a good idea.

Commercial Use?

Fabulous work! Can we use your code and models for commercial purposes?

Updates decoder to allow generation of 512x512 images

Thanks for your excellent work!

So, as I investigated from the code, your pipeline currently has a quality bottleneck: DALL-E can only predict a VQVAE latent code of 1024 tokens, and the vanilla VQVAE decodes that code into a 256x256 image. But there is a trick (described in my MobileStyleGAN article): instead of predicting the image in the pixel domain, we can predict the discrete wavelet transform (DWT) of the target image. The DWT maps an image of shape [3, H, W] to a tensor of shape [12, H/2, W/2], and there is an inverse transform with no accuracy loss. So the VQVAE can decode x2 larger images from the same 1024-token latent codes.
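To make the shape bookkeeping concrete, here is a minimal single-level 2D Haar DWT pair in PyTorch (my own illustration of the transform, not code from this PR):

import torch

def haar_dwt(x):
    # [B, 3, H, W] -> [B, 12, H/2, W/2]: one average band and three detail bands per channel
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    return torch.cat([(a + b + c + d) / 2, (a + b - c - d) / 2,
                      (a - b + c - d) / 2, (a - b - c + d) / 2], dim=1)

def haar_idwt(y):
    # exact inverse: [B, 12, H/2, W/2] -> [B, 3, H, W]
    ll, lh, hl, hh = y.chunk(4, dim=1)
    x = torch.empty(y.shape[0], y.shape[1] // 4, y.shape[2] * 2, y.shape[3] * 2,
                    dtype=y.dtype, device=y.device)
    x[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[..., 0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[..., 1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

x = torch.randn(1, 3, 512, 512)
assert torch.allclose(haar_idwt(haar_dwt(x)), x, atol=1e-6)  # lossless round trip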

I built a pipeline (based on knowledge distillation) to train a modified VQVAE decoder that uses this trick, and got a baseline checkpoint that lets you generate 512x512 images with your pre-trained ruDALL-E model. My results are available as an open-source project here (a link to the baseline checkpoint is also available): https://github.com/bes-dev/vqvae_dwt_distiller.pytorch.

I think the results could be better if we trained the whole DALL-E end-to-end using my version of the decoder (for now, I didn't modify the codebook, to keep compatibility with the ruDALL-E checkpoint).

Error in the ruDALL-E code published on Kaggle

Running the ruDALL-E code from the Kaggle notebook (as published) in a GPU session ends with an error:

ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_29/1914141142.py in <module>
----> 1 from rudalle.pipelines import generate_images, show, super_resolution, cherry_pick_by_clip
      2 from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_realesrgan, get_ruclip
      3 from rudalle.utils import seed_everything

ModuleNotFoundError: No module named 'rudalle'

The error message refers to this code:

!pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html > /dev/null
!pip install rudalle==0.0.1rc1 > /dev/null

ruDALL-E with HuggingFace Accelerated Inference API: query

Firstly, thanks for developing this great model, the library, and the demo at HuggingFace.

My question is:

Can we use the ruDALL-E model via the HuggingFace Accelerated Inference API now? I couldn't find any details on the HuggingFace website. If yes, kindly provide the Inference API usage details.

Tokenizer decoding bug

It seems like the tokenizer ignores the first letter when it's uppercase; chaining encode_text + decode_text shows this. What could be the source of this bug? Is this the intended behavior?
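A minimal round trip that should demonstrate it (a sketch; the exact encode_text/decode_text signatures are my assumption based on the report):

from rudalle import get_tokenizer

tokenizer = get_tokenizer()
tokens = tokenizer.encode_text('Радуга', text_seq_length=128)
print(tokenizer.decode_text(tokens))  # reportedly comes back without the uppercase first letter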

image prompts

The model uses an uploaded image as a background image. Is it possible to use an image as an initial state instead?

ImageNet classification with ru-dalle?

Hi Team,
Thanks for the excellent contribution to open source.
I've been trying to adapt your code. I'm mostly focused on getting image embeddings from a given image and training a classifier on top of them. I gather the DALL-E code is built on text and image embeddings.
Any direction on generating an image feature vector? What part of the code should I modify?
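One plausible entry point (a sketch; get_codebook_indices is my assumption about the VAE API, based on how image prompts encode images, and 8192 is the assumed image-token vocabulary size; preprocessing may differ):

import torch
from PIL import Image
import torchvision.transforms as T

# `vae` and `device` as prepared in the README example
img = T.ToTensor()(Image.open('example.jpg').resize((256, 256))).unsqueeze(0).to(device)
with torch.no_grad():
    codes = vae.get_codebook_indices(img)  # [1, 1024] discrete image tokens
# crude pooled feature vector: a histogram of codebook usage
feats = torch.nn.functional.one_hot(codes, 8192).float().mean(dim=1)

A linear classifier could then be trained on feats.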

Any help would be greatly appreciated.

Thanks.

RuntimeError: CUDA error: out of memory

I've been running this code fine, but as of today it no longer works: https://github.com/sberbank-ai/ru-dalle

Using Anaconda on Windows 10 with an RTX 3090 GPU. RuntimeError: CUDA error: out of memory

I get this message when I run from anaconda command line: python rdalle_generate_images.py

=====================
\anaconda3\lib\site-packages\rudalle\dalle\model.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').

RuntimeError: CUDA error: out of memory

Any idea how to fix this? Should I re-download the code repo and try again?

show(pil_images, 6) not working

pil_images = []
scores = []
for top_k, top_p, images_num in [
    (2048, 0.995, 3),
    (1536, 0.99, 3),
#    (1024, 0.99, 3),
#    (1024, 0.98, 3),
#    (512, 0.97, 3),
#    (384, 0.96, 3),
#    (256, 0.95, 3),
#    (128, 0.95, 3), 
]:
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, top_p=top_p)
    pil_images += _pil_images
    scores += _scores

show(pil_images, 6)

Sparse attention support

Currently, the inference code creates the entire attention matrix and then masks it. Sparse attention implementations like Triton's are more efficient. Does the pre-training code support sparse attention? Will it ever be released?

how to make better resolution and save images?

I don't understand how to get better resolution or save images directly. Why does the command show([pil_image for pil_image, score in sorted(zip(pil_images, scores), key=lambda x: -x[1])], 6) display the pictures so small? And if I use

top_images, clip_scores = cherry_pick_by_clip(pil_images, text, ruclip, ruclip_processor, device=device, count=1)
show(top_images, 3)

will that be full resolution? How do I get all generated photos at full resolution, or save them directly to files?
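Combining the README pipeline steps should do both (a sketch using the API shown in the README, with realesrgan prepared as there):

sr_images = super_resolution(top_images, realesrgan)  # Real-ESRGAN x2 upscale
for i, img in enumerate(sr_images):
    img.save(f'result_{i}.png')                       # each element is a plain PIL image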

How to generate only 1 image? cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Setting images_num or bs to 1 results in the following error:

Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
  File "D:\Software\Python39\Lib\site-packages\torch\nn\modules\conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "D:\Software\Python39\Lib\site-packages\torch\nn\modules\conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\Software\Python39\Lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Software\Python39\Lib\site-packages\rudalle\vae\model.py", line 98, in decode
    quant = self.post_quant_conv(quant)
  File "D:\Software\Python39\Lib\site-packages\rudalle\vae\model.py", line 38, in decode
    img = self.model.decode(z)

Problem with the PyTorch version?

I have looked through the issues but couldn't find the same problem, so sorry to bother you.
GPU:
[screenshot of nvidia-smi]
My Python environment: pytorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.3.1, cudnn=8.2.1. I tried rudalle=0.3.0 following the readme.md, and 0.0.1rc5 via the RTX3090.ipynb, but I only got the following error:
[screenshot of the error]

Is there a problem with my environment? Waiting for your reply!

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx

I have a GTX 1660 SUPER 6 gb vram, Ubuntu 20.04, Python 3.9, Driver Version: 460.91.03 , CUDA Version: 11.2

At this stage:
3%|███▏ | 27/1024
I get an error:

.../ru-dalle/main.py", line 31, in <module>
    _pil_images, _scores = generate_images(text, tokenizer, dalle, vae, top_k=top_k, images_num=images_num, top_p=top_p)
..........
.../ru-dalle/venv/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

The video memory used at the time of the error:
4893MiB / 5936MiB

Also, at the very beginning of generation, I get a warning:
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').

MultiGPU support ETA?

I'd like to finetune my own model with your code, but DalleModel as it stands doesn't seem to support PyTorch's DistributedDataParallel wrapper.
Do you plan to add such an option?
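For reference, this is the standard pattern being asked about (a generic DistributedDataParallel sketch; per the report above, the current DalleModel doesn't yet cooperate with it):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from rudalle import get_rudalle_model

dist.init_process_group('nccl')            # launch with: torchrun --nproc_per_node=N train.py
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)

dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=f'cuda:{local_rank}')
dalle = DDP(dalle, device_ids=[local_rank])  # gradients would sync across GPUs during backward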

Minor error in Finetuning example code

In the 'freeze' method there is a trivial error, I think.

When a user intends to freeze the attention layers, they set freeze_attn=True.

But no parameter names in model.module.named_parameters() contain 'attn', so the condition "elif 'attn' in name:" never matches. As a result, when a user freezes the 'other' layers, the layers containing 'attention' get frozen too.

The proper fix is to change "'attn' in name" to "'attention' in name".
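A hypothetical version of the corrected filter (the surrounding structure is my reconstruction of the notebook's freeze helper; the 'attention' check is the point):

def freeze(model, freeze_emb=False, freeze_ln=False, freeze_attn=True, freeze_ff=True, freeze_other=False):
    for name, p in model.module.named_parameters():
        name = name.lower()
        if 'ln' in name or 'norm' in name:
            p.requires_grad = not freeze_ln
        elif 'embeddings' in name:
            p.requires_grad = not freeze_emb
        elif 'attention' in name:   # was 'attn', which never matches any parameter name
            p.requires_grad = not freeze_attn
        elif 'mlp' in name:
            p.requires_grad = not freeze_ff
        else:
            p.requires_grad = not freeze_other
    return model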

Anyway, thank you all developers of ruDALL-E for providing awesome pre-trained model!

EXIF rotation and accessible colab for image completion with more features

Hi, like others, I prepared a colab with a more accessible GUI and presets here:
https://gist.github.com/eyaler/0cee9a71f5dd3fdfa9c0c03656ebdd4c
It is oriented toward easily running batch experiments, and you can see some results here:
https://twitter.com/eyaler/status/1468682110860992521
There are some very simple things you may want to take from the notebook, such as fixing image rotation due to EXIF:

from PIL import Image, ImageOps
im = ImageOps.exif_transpose(Image.open(file))

and fixing the aspect ratio in post-processing back to the original (although it would be better to use non-square encoding).
Thanks for rudalle!

Website Enhancement: Replace word captcha with different captcha

For example, a captcha with Russian words from Yandex, or a captcha where you need to place some figure (triangle, oval, square) into its place in the picture. That would be much handier for users; the current captcha on the website is hard to use, as the letters are often hard to make out.


What top_k, top_p parameters mean?

Good evening!
What do the top_k and top_p parameters mean in:

for top_k, top_p, images_num in [
    (2048, 0.995, 3),
    (1536, 0.99, 3)

And what does the 'score' parameter show?
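For context, these names refer to the standard sampling filters: top_k keeps only the k most probable image tokens at each generation step, and top_p (nucleus sampling) additionally drops tokens outside the smallest set whose cumulative probability exceeds p. A generic sketch of the idea (not the repo's exact implementation):

import torch

def filter_logits(logits, top_k=2048, top_p=0.995):
    # top-k: keep the k highest-scoring tokens, mask out everything else
    kth = torch.topk(logits, top_k)[0][..., -1, None]
    logits = logits.masked_fill(logits < kth, float('-inf'))
    # top-p: drop tokens past cumulative probability p (shift keeps the first one over)
    sorted_logits, idx = torch.sort(logits, descending=True)
    cum = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    mask = cum > top_p
    mask[..., 1:] = mask[..., :-1].clone()
    mask[..., 0] = False
    sorted_logits = sorted_logits.masked_fill(mask, float('-inf'))
    return torch.full_like(logits, float('-inf')).scatter(-1, idx, sorted_logits)

Larger top_k / top_p keep more candidate tokens (more diversity, more artifacts); smaller values are more conservative.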

Demo results

Hi, nice job with this implementation.
I just wanted to know if it is possible to tweak the demo to get better results, because it seems to generate worse results than the notebooks.
Thank you for your time!

Should use_cache=True be used for image prompts?

Should we now use use_cache=True for image prompts? I was using the latest code but with use_cache=False; after changing it, the results look the same but generation is 10x faster.
Is there any reason to use use_cache=False under some conditions?
