lucidrains / big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

License: MIT License

Python 100.00%
artificial-intelligence deep-learning text-to-image generative-adversarial-networks multimodality

big-sleep's Introduction

artificial intelligence

cosmic love and attention

fire in the sky

a pyramid made of ice

a lonely house in the woods

marriage in the mountains

lantern dangling from a tree in a foggy graveyard

a vivid dream

balloons over the ruins of a city

the death of the lonesome astronomer - by moirage

the tragic intimacy of the eternal conversation with oneself - by moirage

demon fire - by WiseNat

Big Sleep

Ryan Murdock has done it again, combining OpenAI's CLIP and the generator from a BigGAN! This repository wraps up his work so it is easily accessible to anyone who owns a GPU.

You will be able to have the GAN dream up images using natural language with a one-line command in the terminal.

Original notebook Open In Colab

Simplified notebook Open In Colab

User-made notebook with bug fixes and added features, like Google Drive integration Open In Colab

Install

$ pip install big-sleep

Usage

$ dream "a pyramid made of ice"

Images will be saved to the directory from which the command was invoked.

Advanced

You can invoke this in code with

from big_sleep import Imagine

dream = Imagine(
    text = "fire in the sky",
    lr = 5e-2,              # learning rate
    save_every = 25,        # write the image out every 25 iterations
    save_progress = True    # keep a numbered copy at each save point
)

dream()

You can now train on more than one phrase by separating them with the "|" delimiter.

Train on Multiple Phrases

In this example we train on three phrases:

  • an armchair in the form of pikachu
  • an armchair imitating pikachu
  • abstract

from big_sleep import Imagine

dream = Imagine(
    text = "an armchair in the form of pikachu|an armchair imitating pikachu|abstract",
    lr = 5e-2,
    save_every = 25,
    save_progress = True
)

dream()

Penalize certain prompts as well!

In this example we train on the three phrases from before, and penalize the phrases:

  • blur
  • zoom

from big_sleep import Imagine

dream = Imagine(
    text = "an armchair in the form of pikachu|an armchair imitating pikachu|abstract",
    text_min = "blur|zoom",   # phrases to penalize, also "|"-delimited
)
dream()

You can also set a new text by using the .set_text(<str>) command

dream.set_text("a quiet pond underneath the midnight moon")

And reset the latents with .reset()

dream.reset()
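Putting these together, a minimal sketch of reusing one Imagine instance across several prompts (the prompts here are arbitrary examples):

from big_sleep import Imagine

dream = Imagine(text = "fire in the sky")
dream()                                                       # dream the first prompt

dream.set_text("a quiet pond underneath the midnight moon")   # swap in a new prompt
dream.reset()                                                 # re-initialize the latents
dream()                                                       # dream the new prompt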

To save the progression of images during training, you simply have to supply the --save-progress flag

$ dream "a bowl of apples next to the fireplace" --save-progress --save-every 100

Due to the class-conditioned nature of the GAN, Big Sleep often steers off the manifold into noise. You can use a flag to save the best-scoring image (per the CLIP critic) to {filepath}.best.png in your folder.

$ dream "a room with a view of the ocean" --save-best

Larger model

If you have enough memory, you can also try using a bigger vision model released by OpenAI for improved generations.

$ dream "storm clouds rolling in over a white barnyard" --larger-model

Experimentation

You can restrict the classes Big Sleep may use for the BigGAN with the --max-classes flag, as follows (ex. 15 classes). This may add stability during training, at the cost of some expressivity.

$ dream 'a single flower in a withered field' --max-classes 15
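The same restriction is available from Python; a minimal sketch, assuming the max_classes keyword mirrors the --max-classes flag:

from big_sleep import Imagine

dream = Imagine(
    text = 'a single flower in a withered field',
    max_classes = 15   # restrict the BigGAN class vector to 15 classes
)
dream()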

Alternatives

Deep Daze - CLIP and a deep SIREN network

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
@misc{brock2019large,
    title   = {Large Scale GAN Training for High Fidelity Natural Image Synthesis}, 
    author  = {Andrew Brock and Jeff Donahue and Karen Simonyan},
    year    = {2019},
    eprint  = {1809.11096},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

big-sleep's People

Contributors

anomal, cclauss, drjkl, enricoros, erjanmx, jp-krow, ltqxwyeg, lucidrains, notnanton, tevenlescao, walmsley, wisenat


big-sleep's Issues

How did you train this?

Usually AIs train towards a tangible and absolute output, but this does the complete opposite.
How?

The term 'dream' is not recognized as the name of a cmdlet, function, script file, or operable program.

When I run the command I get an error. Using Powershell on Windows 10.

dream : The term 'dream' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ dream "a pyramid made of ice"
    + CategoryInfo          : ObjectNotFound: (dream:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

ResolutionImpossible error when installing

When I try to run pip install big-sleep, it gives me an error:

ERROR: Cannot install big-sleep==0.0.1, big-sleep==0.0.2, big-sleep==0.1.0, big-sleep==0.1.1, big-sleep==0.1.2, big-sleep==0.1.4, big-sleep==0.2.0, big-sleep==0.2.2, big-sleep==0.2.3, big-sleep==0.2.4, big-sleep==0.2.5, big-sleep==0.2.6, big-sleep==0.2.7, big-sleep==0.2.8, big-sleep==0.2.9, big-sleep==0.3.0, big-sleep==0.3.1, big-sleep==0.3.2, big-sleep==0.3.3, big-sleep==0.3.4, big-sleep==0.3.5, big-sleep==0.3.6, big-sleep==0.3.7, big-sleep==0.3.8, big-sleep==0.4.0, big-sleep==0.4.1, big-sleep==0.4.10, big-sleep==0.4.11, big-sleep==0.4.2, big-sleep==0.4.3, big-sleep==0.4.4, big-sleep==0.4.5, big-sleep==0.4.6, big-sleep==0.4.7, big-sleep==0.4.8, big-sleep==0.4.9, big-sleep==0.5.0, big-sleep==0.5.1, big-sleep==0.5.2, big-sleep==0.5.3, big-sleep==0.6.0, big-sleep==0.6.1, big-sleep==0.6.2, big-sleep==0.7.0 and big-sleep==0.7.1 because these package versions have conflicting dependencies.

The conflict is caused by:
big-sleep 0.7.1 depends on torchvision>=0.8.2
big-sleep 0.7.0 depends on torchvision>=0.8.2
big-sleep 0.6.2 depends on torchvision>=0.8.2
big-sleep 0.6.1 depends on torchvision>=0.8.2
big-sleep 0.6.0 depends on torchvision>=0.8.2
big-sleep 0.5.3 depends on torchvision>=0.8.2
big-sleep 0.5.2 depends on torchvision>=0.8.2
big-sleep 0.5.1 depends on torchvision>=0.8.2
big-sleep 0.5.0 depends on torchvision>=0.8.2
big-sleep 0.4.11 depends on torchvision>=0.8.2
big-sleep 0.4.10 depends on torchvision>=0.8.2
big-sleep 0.4.9 depends on torchvision>=0.8.2
big-sleep 0.4.8 depends on torchvision>=0.8.2
big-sleep 0.4.7 depends on torchvision>=0.8.2
big-sleep 0.4.6 depends on torchvision>=0.8.2
big-sleep 0.4.5 depends on torchvision>=0.8.2
big-sleep 0.4.4 depends on torchvision>=0.8.2
big-sleep 0.4.3 depends on torchvision>=0.8.2
big-sleep 0.4.2 depends on torchvision>=0.8.2
big-sleep 0.4.1 depends on torchvision>=0.8.2
big-sleep 0.4.0 depends on torchvision>=0.8.2
big-sleep 0.3.8 depends on torchvision>=0.8.2
big-sleep 0.3.7 depends on torchvision>=0.8.2
big-sleep 0.3.6 depends on torchvision>=0.8.2
big-sleep 0.3.5 depends on torchvision>=0.8.2
big-sleep 0.3.4 depends on torchvision>=0.8.2
big-sleep 0.3.3 depends on torchvision>=0.8.2
big-sleep 0.3.2 depends on torchvision>=0.8.2
big-sleep 0.3.1 depends on torchvision>=0.8.2
big-sleep 0.3.0 depends on torchvision>=0.8.2
big-sleep 0.2.9 depends on torchvision>=0.8.2
big-sleep 0.2.8 depends on torchvision>=0.8.2
big-sleep 0.2.7 depends on torch>=1.7.1
big-sleep 0.2.6 depends on torch>=1.7.1
big-sleep 0.2.5 depends on torch>=1.7.1
big-sleep 0.2.4 depends on torch>=1.7.1
big-sleep 0.2.3 depends on torch>=1.7.1
big-sleep 0.2.2 depends on torch>=1.7.1
big-sleep 0.2.0 depends on torch>=1.7.1
big-sleep 0.1.4 depends on torch>=1.7.1
big-sleep 0.1.2 depends on torch>=1.7.1
big-sleep 0.1.1 depends on torch>=1.7.1
big-sleep 0.1.0 depends on torch>=1.7.1
big-sleep 0.0.2 depends on torch>=1.7.1
big-sleep 0.0.1 depends on torch>=1.7.1

Fresh install. Received this error: "ValueError: Expected tensor to be a tensor image of size (C, H, W). Got tensor.size() = torch.Size([128, 3, 224, 224])"

I got this very specific error that is preventing the command line tool (dream 'query') from running with every query I have tried. I am not sure if we are supposed to input dimensions, but the README did not specify anything.

This is from "installing" this program with the pip install command.

Is it possible this is a torch version issue?

According to print(torch.__version__), my version is 1.7.1+cu110.

Please advise!

AssertionError: CUDA must be available in order to use Deep Daze

Traceback (most recent call last):
  File "reeeeeeeeeeeeeeeeeeeeeeeeeee.py", line 1, in <module>
    from big_sleep import Imagine
  File "C:\Users\FlashlightBulbton\anaconda3\envs\bigsleep\lib\site-packages\big_sleep\__init__.py", line 1, in <module>
    from big_sleep.big_sleep import BigSleep, Imagine
  File "C:\Users\FlashlightBulbton\anaconda3\envs\bigsleep\lib\site-packages\big_sleep\big_sleep.py", line 22, in <module>
    assert torch.cuda.is_available(), 'CUDA must be available in order to use Deep Daze'
AssertionError: CUDA must be available in order to use Deep Daze

(bigsleep) c:\frart>nvidia-smi
Thu Feb 25 09:57:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 451.48       Driver Version: 451.48       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166... WDDM  | 00000000:08:00.0  On |                  N/A |
| 40%   50C    P2    35W / 125W |   2038MiB /  6144MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

Literally every other AI I use (and I use a lot of them: StyleGAN, Wav2Lip, etc.) works fine on this PC (they're all in different conda environments).

Any possible way to preview and re-roll random seed before dreaming?

I'd love to have the option to preview 1-5 random seeds and choose one before I begin iterating over it.

I'm interested in seeing how much seeds impact the final image... but I'm also fairly sure it makes a big difference, and because of that, it'd be awesome to roll through a few seeds until I find one with an interesting composition.
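A hedged workaround sketch using only parameters the Imagine class already exposes (seed, append_seed, epochs, iterations, open_folder); the prompt and iteration counts are arbitrary:

import random
from big_sleep import Imagine

for s in [random.randrange(2 ** 32) for _ in range(5)]:
    preview = Imagine(
        text = "a vivid dream",
        seed = s,
        append_seed = True,    # tag each output filename with its seed
        epochs = 1,
        iterations = 50,       # a short run, just to judge the composition
        open_folder = False
    )
    preview()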

Trouble uninstalling big-sleep

Hello,
I'm trying to uninstall big-sleep; this is the answer in the terminal:
"ERROR: Exception:
Traceback (most recent call last):
File "/usr/lib/python3.8/shutil.py", line 788, in move
os.rename(src, real_dst)
PermissionError: [Errno 13] Brak dostępu: '/usr/local/bin/dream' -> '/tmp/pip-uninstall-emiszm_g/dream'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pip/_internal/cli/base_command.py", line 180, in _main
status = self.run(options, args)
File "/usr/local/lib/python3.8/dist-packages/pip/_internal/commands/uninstall.py", line 85, in run
uninstall_pathset = req.uninstall(
File "/usr/local/lib/python3.8/dist-packages/pip/_internal/req/req_install.py", line 672, in uninstall
uninstalled_pathset.remove(auto_confirm, verbose)
File "/usr/local/lib/python3.8/dist-packages/pip/_internal/req/req_uninstall.py", line 386, in remove
moved.stash(path)
File "/usr/local/lib/python3.8/dist-packages/pip/_internal/req/req_uninstall.py", line 275, in stash
renames(path, new_path)
File "/usr/local/lib/python3.8/dist-packages/pip/_internal/utils/misc.py", line 324, in renames
shutil.move(old, new)
File "/usr/lib/python3.8/shutil.py", line 803, in move
os.unlink(src)
PermissionError: [Errno 13] Brak dostępu: '/usr/local/bin/dream'"

Do you have a solution? (The Polish "Brak dostępu" means "permission denied".)

CUDA must be available in order to use Big Sleep

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

Stability of 'seed'

I've tried specifying Imagine(seed = SOME_CONST) a few times and I get different results all the time. Is the same happening for you, @lucidrains?

I don't know where the other sources of randomness are; I was thinking it would be good to start many different random generations and save the seeds, then select the most promising output and restart the generation at higher quality with a larger number of iterations.
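For reference, a hedged sketch of pinning the other randomness sources: big_sleep's rand_cutout appears to draw from Python's random module, so seeding torch alone is not enough (the constant here is arbitrary):

import random
import torch

SOME_CONST = 1234
random.seed(SOME_CONST)                     # rand_cutout uses random.randint / random.gauss
torch.manual_seed(SOME_CONST)
torch.cuda.manual_seed_all(SOME_CONST)
torch.backends.cudnn.deterministic = True   # trade speed for reproducible kernels
torch.backends.cudnn.benchmark = False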

Confused about what "then restart and rerun everything" means.

In both the Big Sleep and Deep Daze Google Colabs, there is a line that tells you to change the PyTorch version and such.

You must run this cell and then restart and rerun everything for the PyTorch version to be correct. Otherwise the model will run but not produce any meaningful output.

I'm confused about the part saying to restart; does it mean to restart the runtime?
Runtime -> Restart Runtime or CTRL + M . at the top.

Or does it mean to do something else?

Errors when calling the CLI on Colab

I have tried to run the CLI from within Colab (with GPU activated), as follows:

%pip install big-sleep
!dream "a pyramid made of ice"

First, CUDA is version 10.1 on Colab, so I encounter an error regarding PyTorch version, which can be fixed with:

%pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 \
 -f https://download.pytorch.org/whl/torch_stable.html

Second, I encounter another error:

100%|██████████████████████| 353976522/353976522 [00:02<00:00, 135419788.41it/s]
Traceback (most recent call last):
[...]
  File "/usr/local/lib/python3.6/dist-packages/big_sleep/big_sleep.py", line 82, in __init__
    assert image_size in (128, 256, 512), 'image size must be one of 128, 256, or 512'
AssertionError: image size must be one of 128, 256, or 512
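One hedged workaround sketch is to drive the library from Python and pin image_size to a supported value explicitly:

from big_sleep import Imagine

dream = Imagine(
    text = "a pyramid made of ice",
    image_size = 512   # the assertion allows only 128, 256, or 512
)
dream()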

Better differentiable interpolation

I notice that on line 179 of big_sleep.py you use F.interpolate() to downsample: https://github.com/lucidrains/big-sleep/blob/main/big_sleep/big_sleep.py#L179

I've observed that F.interpolate() is bad at downsampling because it doesn't prefilter its input and this leaves weird blocky artifacts in the gradient of the original-sized image. Is that why you are doing 128 random-sized cutouts? I wrote a prefiltering resampling function today that uses F.interpolate() for upsampling but which low-pass filters its input (with a Lanczos 3 kernel it computes) before calling it for downsampling. It's at https://gist.github.com/crowsonkb/a905773ba4d7aa5cd7671315e464369c. Hope this helps!
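For newer PyTorch versions (>= 1.11, later than the torch 1.7.x this repo targets), a minimal sketch of the same prefiltering idea using the built-in antialias flag rather than the gist's custom Lanczos kernel:

import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 512, 512, requires_grad=True)
down = F.interpolate(img, size=(224, 224), mode='bilinear',
                     align_corners=False, antialias=True)   # low-pass filter before downsampling
down.mean().backward()   # gradients flow back without the blocky artifacts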

Question about Colab environments affecting results

This isn't really an issue, but I've been using this package to morph from one text input to the next by updating the input while it's running and storing all intermediate output. I'm using variations of the notebook I've checked in here: https://github.com/lots-of-things/Story2Hallucination

When doing this I've gotten to really see how the algorithm converges on its solution, and I've noticed that there are at least two distinct modes. Sometimes the algorithm quickly converges and sticks, and other times it wobbles around an image and then much more easily warps to something new.

Here are two images with the same input and framerate:
jerky/sticky
output

wobbly/warpy
output

If these two modes happened randomly I'd understand it, but here is the really strange part. This behavior will be consistent in the same Colab environment. If I pull up a Colab env and run the notebook and it goes the sticky way, then no matter how many times I restart the run, it will always be sticky. I have to factory-reset the env to get it to change. Then if it starts to do it the wobbly way, it'll stay wobbly.

It sounds bizarre but is there any reason this would be possible? I know there are different CUDA environments, but not sure if/why that would make it so different.

Some problem with nautilus

First, thanks for your work.

When I run the code as in your example, I get the following output:

(nautilus:51266): Gtk-WARNING **: Failed to register client: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files

** (nautilus:51266): WARNING **: Can not get _NET_WORKAREA

** (nautilus:51266): WARNING **: Can not determine workarea, guessing at layout

Can you help me to figure out what is happening? Thanks a lot

Failure on first epoch

It opens the folder where the picture should be saved, but this error shows up immediately:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

torch version: 1.7.1
torch.cuda.is_available() == true

What am I missing?

dream.reset() = RuntimeError: Tensor for 'out' is on CPU... but expected them to be on GPU

Hi,

If I try to call dream.reset() I get the following error:

Traceback (most recent call last):
  File "testme.py", line 21, in <module>
    dream()
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nerdy/github/big-sleep/big_sleep/big_sleep.py", line 341, in forward
    self.model(self.encoded_text) # one warmup step due to issue with CLIP and CUDA
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nerdy/github/big-sleep/big_sleep/big_sleep.py", line 173, in forward
    out = self.model()
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nerdy/github/big-sleep/big_sleep/big_sleep.py", line 136, in forward
    out = self.biggan(*self.latents(), 1)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nerdy/github/big-sleep/big_sleep/biggan.py", line 574, in forward
    embed = self.embeddings(class_label)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

My file looks like this:

dream.set_text("A photo of a photo")
dream()
dream.reset()
dream.set_text("A picture of a photo")
dream()

A few observations

Not so much an issue, maybe, but a few observations I made while playing a bit with this repo (pretty awesome stuff, btw):

1.) The default iterations and number of epochs are way too high. I noticed that after roughly 500 iterations with the default learning rate it tends to collapse or suddenly runs off into some weird state. That's within the first epoch, so 1 epoch instead of 20 is probably enough.
2.) The default learning rate is probably too high too; I saw more stable convergence with 0.03 (see the sketch after this list).
3.) It's still not quite clear what the best prompting strategy is. I tried "a photo", "photo of", "picture of", or just the object description, and they tend to produce completely different results. So maybe the number of tokens has a bigger influence than the tokens themselves.
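A sketch applying observations 1 and 2 above (these are the poster's numbers, not verified defaults):

from big_sleep import Imagine

dream = Imagine(
    text = "a lonely house in the woods",
    lr = 0.03,         # the reportedly more stable learning rate
    epochs = 1,        # one epoch instead of the default 20
    iterations = 500   # roughly where collapse was observed to begin
)
dream()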

And now a few questions if you don't mind

What's the benefit of having so many region crops to feed into CLIP? It seems a bit excessive, and I wonder if that's the reason why a lot of results tend to look like collages after a while.

Is the gradient accumulation really necessary or does it make a difference?

Question about this BigGAN implementation

When I have used a BigGAN earlier in my code, it has been enough to input two vectors, of dimensions [1, 128] for the z vector and [1, 1000] for the class vector. This one here appears to require a separate conditioning vector (derived from the z vector) for each layer, see here

z = layer(z, cond_vector[i+1].unsqueeze(0), truncation)

I noticed this when I experimented with storing the latents together with each generated image and then interpolating between two generated images using the stored latents. I initially (naively) assumed I could pick out one of the 32 vectors using the value of best in the Latents, but this results in an error on the line quoted above.

OK, I got my experiment to work by using the entire [32, ....] tensors as latent and class vectors. I'm just interested to understand what is going on here. It is apparent to me now that the whole set of 32 is necessary to arrive at the correct image. Taking just one, any one of the 32, and feeding it into my plain old BigGAN will not result in anything like the right image.

I guess this has to do with somehow mixing the text part in, actually inside the BigGAN. New things to me.

Pretrained Big Sleep

When I run !dream "something" on the given Colab, it seems to start the training process and displays epochs / losses. Is there a way to simply look at the output of the model using pretrained weights, instead of retraining everything from scratch?

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

I found this error when attempting to run the sample code provided in the README:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

from big_sleep import Imagine

dream = Imagine(
    text = "fire in the sky",
    lr = 5e-2,
    save_every = 50,
    save_progress = True
)

dream()

The error seems to have occurred here:
File "D:\Anaconda\lib\site-packages\torch\nn\functional.py", line 1753, in linear return torch._C._nn.linear(input, weight, bias)

Does anyone know why this would occur? I am running my code on a 64-bit Windows 10 laptop with a GeForce GTX 1660 Ti graphics card.

CUDA out of memory

Not sure how I ran out of memory, given this is the only time I've tried running something like this myself rather than on a Colab. Doing nvidia-smi shows processes with "N/A" GPU memory usage, and I don't know how to kill any of these (they don't go away when Python quits). The error is as follows:

  File "c:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Python\Scripts\dream.exe\__main__.py", line 7, in <module>
  File "c:\python\lib\site-packages\big_sleep\cli.py", line 65, in main
    fire.Fire(train)
  File "c:\python\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\python\lib\site-packages\fire\core.py", line 471, in _Fire
    target=component.__name__)
  File "c:\python\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\cli.py", line 62, in train
    imagine()
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\big_sleep.py", line 407, in forward
    loss = self.train_step(epoch, i, image_pbar)
  File "c:\python\lib\site-packages\big_sleep\big_sleep.py", line 357, in train_step
    losses = self.model(self.encoded_texts["max"], self.encoded_texts["min"])
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\big_sleep.py", line 216, in forward
    image_embed = perceptor.encode_image(into)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 519, in encode_image
    return self.visual(image.type(self.dtype))
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 410, in forward
    x = self.transformer(x)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 381, in forward
    return self.resblocks(x)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 369, in forward
    x = x + self.mlp(self.ln_2(x))
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\python\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\python\lib\site-packages\big_sleep\clip.py", line 346, in forward
    return x * torch.sigmoid(1.702 * x)
RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 8.00 GiB total capacity; 5.32 GiB already allocated; 28.04 MiB free; 5.53 GiB reserved in total by PyTorch)

Not Working with Nemo File Manager

Installed successfully on Arch 5.11.16 with pip install big-sleep and it worked great, but after a reboot it fails to generate images. After running any command (prime-run dream --num-cutouts=25 --save-progress --save-every 100 "whatever" or just dream --num-cutouts 25 "whatever") it opens the file manager (Nemo) to the directory, but no images are generated. It sits there while still using RAM but no GPU or CPU. Previously I was able to get it to work by reinstalling torch and big-sleep, but that doesn't solve the problem anymore.

Edit: After letting it sit for a few minutes, cancelling the command with CTRL+C shows "detecting keyboard interrupt, gracefully exiting" with empty progress bars before exiting.

Running with the --open-folder False option fixes this

I fixed and improved the Colab Notebook overall immensely, with new features (mount gdrive) and explanations (afaik)

Talking about this notebook: https://colab.research.google.com/drive/1MEWKbm-driRNF8PrU7ogS5o3se-ePyPb?usp=sharing

There were issues when using symbols like "|" (for multiple phrases to be trained on) or any other symbol in the TEXT variable.
I also added a couple more useful things and explanations as far as I understood the inner workings of everything (easily mounting your Google Drive, automatically saving images to a folder in it, checking what kind of GPU is used, making variables more descriptive, etc.).

My Colab Notebook: https://colab.research.google.com/drive/1zVHK4t3nXQTsu5AskOOOf3Mc9TnhltUO?usp=sharing

Thank you for providing all this!

Edit: Updated the notebook link since I have made many changes

Sample results, Tooling on top of big-sleep

big-sleep is GORGEOUS. We need to explore what it can do, where it shines, and what to avoid.

Adding a few pics down below, but I'm still in early experimentation - will update the thread later.

Puppies

« a colorful cartoon of a dog »

  • seed=553905700049900, iteration=160, lr=.07, size=256
    image
  • same, iteration=490
    image

« a colorful cartoon of a dog with blue eyes and a heart »

  • seed=555169003382600, iteration=400, lr=.07, size=256
    image

Clouds

« clouds in the shape of a donut »

  • seed=581307748222100, iteration=360, lr=.07, size=256
    image
  • seed=583134047383400, iteration=390, lr=.07, size=256
    image

This post will be edited to add new samples

What GPU is required to run it?

I tried executing the dream command on my laptop with a Quadro P2000 with 4 GB of VRAM. I got a CUDA out of memory error.

Method 'forward' is not defined

I installed the module via

$ pip install deep-daze
and just tried the provided example with

$ imagine "a house in the forest"
but after it loaded something for a few minutes (the first time I ran the command), it threw this error:

Traceback (most recent call last):

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-32b6fbd8f807> in <module>()
----> 1 from deep_daze import Imagine
     2 
     3 imagine = Imagine(
     4     text = 'cosmic love and attention',
     5     num_layers = 24,

E:\Anaconda\lib\site-packages\deep_daze\__init__.py in <module>()
----> 1 from deep_daze.deep_daze import DeepDaze, Imagine

E:\Anaconda\lib\site-packages\deep_daze\deep_daze.py in <module>()
    37 signal.signal(signal.SIGINT, signal_handling)
    38 
---> 39 perceptor, normalize_image = load()
    40 
    41 # Helpers

E:\Anaconda\lib\site-packages\deep_daze\clip.py in load()
   190                     node.copyAttributes(device_node)
   191 
--> 192     model.apply(patch_device)
   193     patch_device(model.encode_image)
   194     patch_device(model.encode_text)

E:\Anaconda\lib\site-packages\torch\nn\modules\module.py in apply(self, fn)
   471         """
   472         for module in self.children():
--> 473             module.apply(fn)
   474         fn(self)
   475         return self

E:\Anaconda\lib\site-packages\torch\nn\modules\module.py in apply(self, fn)
   471         """
   472         for module in self.children():
--> 473             module.apply(fn)
   474         fn(self)
   475         return self

E:\Anaconda\lib\site-packages\torch\nn\modules\module.py in apply(self, fn)
   471         """
   472         for module in self.children():
--> 473             module.apply(fn)
   474         fn(self)
   475         return self

E:\Anaconda\lib\site-packages\torch\nn\modules\module.py in apply(self, fn)
   471         """
   472         for module in self.children():
--> 473             module.apply(fn)
   474         fn(self)
   475         return self

E:\Anaconda\lib\site-packages\torch\nn\modules\module.py in apply(self, fn)
   471         """
   472         for module in self.children():
--> 473             module.apply(fn)
   474         fn(self)
   475         return self

E:\Anaconda\lib\site-packages\torch\nn\modules\module.py in apply(self, fn)
   471         """
   472         for module in self.children():
--> 473             module.apply(fn)
   474         fn(self)
   475         return self

E:\Anaconda\lib\site-packages\torch\nn\modules\module.py in apply(self, fn)
   472         for module in self.children():
   473             module.apply(fn)
--> 474         fn(self)
   475         return self
   476 

E:\Anaconda\lib\site-packages\deep_daze\clip.py in patch_device(module)
   181 
   182     def patch_device(module):
--> 183         graphs = [module.graph] if hasattr(module, "graph") else []
   184         if hasattr(module, "forward1"):
   185             graphs.append(module.forward1.graph)

E:\Anaconda\lib\site-packages\torch\jit\_script.py in graph(self)
   447             ``forward`` method. See :ref:`interpreting-graphs` for details.
   448             """
--> 449             return self._c._get_method("forward").graph
   450 
   451         @property

RuntimeError: Method 'forward' is not defined.

My system is:
Windows 10
GeForce GTX1060 6G
pytorch 1.8.0+cu111
python 3.7.0

Crashes

It was just working a few hours ago; I went to start it up again, and it got to 2% and crashed. I'm not sure why it's doing this, but it's annoying because I have a list of requests to get through.
image

Possible to change behavior of save-best flag to avoid duplicate files?

Using the save-best flag results in a lot of duplicate png files being saved. Would it be possible to make it only save an additional "best" file if the image is different from the one generated by the final iteration?

Currently, it is necessary to compare the two output files by eye to determine if they are the same before deleting any duplicates.
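A hedged post-processing sketch (not part of big-sleep) that drops the .best copy when it is byte-identical to the final image; the filenames are hypothetical examples of the {filepath}.png / {filepath}.best.png pattern:

import filecmp
import os

final, best = 'prompt.png', 'prompt.best.png'   # hypothetical output names
if os.path.exists(final) and os.path.exists(best) and filecmp.cmp(final, best, shallow=False):
    os.remove(best)   # identical to the final frame, so redundant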

Installation environment?

After running the example code in the README, I am encountering this error:

  File "C:\Users\bengu\AppData\Local\Programs\Python\Python39\lib\site-packages\big_sleep\clip.py", line 189, in patch_device
    if "value" in node.attributeNames() and str(node["value"]).startswith("cuda"):
TypeError: 'torch._C.Node' object is not subscriptable

I am running CUDA11.2.1 (cuda_11.2.r11.2/compiler.29558016_0)
I am using Python 3.9.1
I am on Windows 10
This is a clean installation of win10, python3.9, and cuda11.2.1
GPU: RTX 3070

I installed pytorch off of their site, instead of letting big-sleep install the requirements. I used this command:

pip install torch===1.7.1+cu110 torchvision===0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

What is the proper way to use this? Following the instructions did not yield results.

CUDA out of memory error, can I use shared GPU memory instead of dedicated?

I'm running on an RTX 2060 with 6GB of VRAM, which is probably not enough, as generating at any image size causes an out-of-memory error. People have said that 8GB is roughly the minimum required to run this AI. I have 8GB of shared video memory, so can the AI access the shared video memory instead of strictly staying on the dedicated memory?
stats
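Short of that, a hedged sketch of reducing memory pressure with parameters the package already exposes (image_size and num_cutouts):

from big_sleep import Imagine

dream = Imagine(
    text = "a vivid dream",
    image_size = 128,   # smallest supported BigGAN size
    num_cutouts = 32    # fewer CLIP cutouts per step than the default 128
)
dream()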

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`

I found this error when trying to imagine the text:

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`

The error seems to have occurred right here
~/miniconda3/envs/env/lib/python3.7/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1751     if has_torch_function_variadic(input, weight):
   1752         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753     return torch._C._nn.linear(input, weight, bias)

Can somebody tell me why this happens?

Image Size Increase

When I try to set an image size larger than 512 in cli.py, I get an error saying that the image size must be 128, 256, or 512. Is there any way to increase the image size?

Run on multiple GPUs

Hi, I've tried to add multi-GPU support via nn.DataParallel.
Now it looks like this:
import os
import sys
import subprocess
import signal
import string
import re

from datetime import datetime
from pathlib import Path
import random

import torch
import torch.nn.functional as F
from torch import nn
from torch.optim import Adam
from torchvision.utils import save_image
import torchvision.transforms as T
from PIL import Image
from tqdm import tqdm, trange

from ema import EMA
from resample import resample
from biggan import BigGAN
from clip import load, tokenize

assert torch.cuda.is_available(), 'CUDA must be available in order to use Big Sleep'

# graceful keyboard interrupt

terminate = False

def signal_handling(signum, frame):
    global terminate
    terminate = True

signal.signal(signal.SIGINT, signal_handling)

# helpers

def exists(val):
    return val is not None

def open_folder(path):
    if os.path.isfile(path):
        path = os.path.dirname(path)

    if not os.path.isdir(path):
        return

    cmd_list = None
    if sys.platform == 'darwin':
        cmd_list = ['open', '--', path]
    elif sys.platform == 'linux2' or sys.platform == 'linux':
        cmd_list = ['xdg-open', path]
    elif sys.platform in ['win32', 'win64']:
        cmd_list = ['explorer', path.replace('/', '\\')]
    if cmd_list == None:
        return

    try:
        subprocess.check_call(cmd_list)
    except subprocess.CalledProcessError:
        pass
    except OSError:
        pass

def create_text_path(text=None, img=None, encoding=None):
    input_name = ""
    if text is not None:
        input_name += text
    if img is not None:
        if isinstance(img, str):
            img_name = "".join(img.split(".")[:-1]) # replace spaces by underscores, remove img extension
            img_name = img_name.split("/")[-1]  # only take img name, not path
        else:
            img_name = "PIL_img"
        input_name += "_" + img_name
    if encoding is not None:
        input_name = "your_encoding"
    return input_name.replace("-", "_").replace(",", "").replace(" ", "_").replace("|", "--").strip('-')[:255]

# tensor helpers

def differentiable_topk(x, k, temperature=1.):
    n, dim = x.shape
    topk_tensors = []

    for i in range(k):
        is_last = i == (k - 1)
        values, indices = (x / temperature).softmax(dim=-1).topk(1, dim=-1)
        topks = torch.zeros_like(x).scatter_(-1, indices, values)
        topk_tensors.append(topks)
        if not is_last:
            x = x.scatter(-1, indices, float('-inf'))

    topks = torch.cat(topk_tensors, dim=-1)
    return topks.reshape(n, k, dim).sum(dim = 1)

def create_clip_img_transform(image_width):
    clip_mean = [0.48145466, 0.4578275, 0.40821073]
    clip_std = [0.26862954, 0.26130258, 0.27577711]
    transform = T.Compose([
        #T.ToPILImage(),
        T.Resize(image_width),
        T.CenterCrop((image_width, image_width)),
        T.ToTensor(),
        T.Normalize(mean=clip_mean, std=clip_std)
    ])
    return transform

def rand_cutout(image, size, center_bias=False, center_focus=2):
    width = image.shape[-1]
    min_offset = 0
    max_offset = width - size
    if center_bias:
        # sample around image center
        center = max_offset / 2
        std = center / center_focus
        offset_x = int(random.gauss(mu=center, sigma=std))
        offset_y = int(random.gauss(mu=center, sigma=std))
        # resample uniformly if over boundaries
        offset_x = random.randint(min_offset, max_offset) if (offset_x > max_offset or offset_x < min_offset) else offset_x
        offset_y = random.randint(min_offset, max_offset) if (offset_y > max_offset or offset_y < min_offset) else offset_y
    else:
        offset_x = random.randint(min_offset, max_offset)
        offset_y = random.randint(min_offset, max_offset)
    cutout = image[:, :, offset_x:offset_x + size, offset_y:offset_y + size]
    return cutout

# load clip

perceptor, normalize_image = load('ViT-B/32', jit = False)

# load biggan

class Latents(torch.nn.Module):
    def __init__(
        self,
        num_latents = 15,
        num_classes = 1000,
        z_dim = 128,
        max_classes = None,
        class_temperature = 2.
    ):
        super().__init__()
        self.normu = torch.nn.Parameter(torch.zeros(num_latents, z_dim).normal_(std = 1))
        self.cls = torch.nn.Parameter(torch.zeros(num_latents, num_classes).normal_(mean = -3.9, std = .3))
        self.register_buffer('thresh_lat', torch.tensor(1))

        assert not exists(max_classes) or max_classes > 0 and max_classes <= num_classes, f'max_classes must be between 0 and {num_classes}'
        self.max_classes = max_classes
        self.class_temperature = class_temperature

    def forward(self):
        if exists(self.max_classes):
            classes = differentiable_topk(self.cls, self.max_classes, temperature = self.class_temperature)
        else:
            classes = torch.sigmoid(self.cls)

        return self.normu, classes

class Model(nn.Module):
    def __init__(
        self,
        image_size,
        max_classes = None,
        class_temperature = 2.,
        ema_decay = 0.99
    ):
        super().__init__()
        assert image_size in (128, 256, 512), 'image size must be one of 128, 256, or 512'
        self.biggan = BigGAN.from_pretrained(f'biggan-deep-{image_size}')
        self.max_classes = max_classes
        self.class_temperature = class_temperature
        self.ema_decay = ema_decay

        self.init_latents()

    def init_latents(self):
        latents = Latents(
            num_latents = len(self.biggan.config.layers) + 1,
            num_classes = self.biggan.config.num_classes,
            z_dim = self.biggan.config.z_dim,
            max_classes = self.max_classes,
            class_temperature = self.class_temperature
        )
        self.latents = EMA(latents, self.ema_decay)

    def forward(self):
        self.biggan.eval()
        out = self.biggan(*self.latents(), 1)
        return (out + 1) / 2

class BigSleep(nn.Module):
    def __init__(
        self,
        num_cutouts = 128,
        loss_coef = 100,
        image_size = 512,
        bilinear = False,
        max_classes = None,
        class_temperature = 2.,
        experimental_resample = False,
        ema_decay = 0.99,
        center_bias = False,
    ):
        super().__init__()
        self.loss_coef = loss_coef
        self.image_size = image_size
        self.num_cutouts = num_cutouts
        self.experimental_resample = experimental_resample
        self.center_bias = center_bias

        self.interpolation_settings = {'mode': 'bilinear', 'align_corners': False} if bilinear else {'mode': 'nearest'}

        self.model = torch.nn.DataParallel(Model(
            image_size = image_size,
            max_classes = max_classes,
            class_temperature = class_temperature,
            ema_decay = ema_decay
        ).cuda())

    def reset(self):
        self.model.init_latents()

    def sim_txt_to_img(self, text_embed, img_embed, text_type="max"):
        sign = -1
        if text_type == "min":
            sign = 1
        return sign * self.loss_coef * torch.cosine_similarity(text_embed, img_embed, dim = -1).mean()

    def forward(self, text_embeds, text_min_embeds=[], return_loss = True):
        width, num_cutouts = self.image_size, self.num_cutouts

        out = self.model()

        if not return_loss:
            return out

        pieces = []
        for ch in range(num_cutouts):
            # sample cutout size
            size = int(width * torch.zeros(1,).normal_(mean=.8, std=.3).clip(.5, .95))
            # get cutout
            apper = rand_cutout(out, size, center_bias=self.center_bias)
            if (self.experimental_resample):
                apper = resample(apper, (224, 224))
            else:
                apper = F.interpolate(apper, (224, 224), **self.interpolation_settings)
            pieces.append(apper)

        into = torch.cat(pieces)
        into = normalize_image(into)

        image_embed = perceptor.encode_image(into)

        latents, soft_one_hot_classes = self.model.latents()
        num_latents = latents.shape[0]
        latent_thres = self.model.latents.model.thresh_lat

        lat_loss =  torch.abs(1 - torch.std(latents, dim=1)).mean() + \
                    torch.abs(torch.mean(latents, dim = 1)).mean() + \
                    4 * torch.max(torch.square(latents).mean(), latent_thres)

        for array in latents:
            mean = torch.mean(array)
            diffs = array - mean
            var = torch.mean(torch.pow(diffs, 2.0))
            std = torch.pow(var, 0.5)
            zscores = diffs / std
            skews = torch.mean(torch.pow(zscores, 3.0))
            kurtoses = torch.mean(torch.pow(zscores, 4.0)) - 3.0

            lat_loss = lat_loss + torch.abs(kurtoses) / num_latents + torch.abs(skews) / num_latents

        cls_loss = ((50 * torch.topk(soft_one_hot_classes, largest = False, dim = 1, k = 999)[0]) ** 2).mean()

        results = []
        for txt_embed in text_embeds:
            results.append(self.sim_txt_to_img(txt_embed, image_embed))
        for txt_min_embed in text_min_embeds:
            results.append(self.sim_txt_to_img(txt_min_embed, image_embed, "min"))
        sim_loss = sum(results).mean()
        return out, (lat_loss, cls_loss, sim_loss)

class Imagine(nn.Module):
    def __init__(
        self,
        *,
        text=None,
        img=None,
        encoding=None,
        text_min = "",
        lr = .07,
        image_size = 512,
        gradient_accumulate_every = 1,
        save_every = 50,
        epochs = 20,
        iterations = 1050,
        save_progress = False,
        bilinear = False,
        open_folder = True,
        seed = None,
        append_seed = False,
        torch_deterministic = False,
        max_classes = None,
        class_temperature = 2.,
        save_date_time = False,
        save_best = False,
        experimental_resample = False,
        ema_decay = 0.99,
        num_cutouts = 128,
        center_bias = False,
    ):
        super().__init__()

        if torch_deterministic:
            assert not bilinear, 'the deterministic (seeded) operation does not work with interpolation (PyTorch 1.7.1)'
            torch.set_deterministic(True)

        self.seed = seed
        self.append_seed = append_seed

        if exists(seed):
            print(f'setting seed of {seed}')
            if seed == 0:
                print('you can override this with --seed argument in the command line, or --random for a randomly chosen one')
            torch.manual_seed(seed)

        self.epochs = epochs
        self.iterations = iterations

        model = torch.nn.DataParallel(BigSleep(
            image_size = image_size,
            bilinear = bilinear,
            max_classes = max_classes,
            class_temperature = class_temperature,
            experimental_resample = experimental_resample,
            ema_decay = ema_decay,
            num_cutouts = num_cutouts,
            center_bias = center_bias,
        ).cuda())

        self.model = model

        self.lr = lr
        self.optimizer = Adam(model.model.latents.model.parameters(), lr)
        self.gradient_accumulate_every = gradient_accumulate_every
        self.save_every = save_every

        self.save_progress = save_progress
        self.save_date_time = save_date_time

        self.save_best = save_best
        self.current_best_score = 0

        self.open_folder = open_folder
        self.total_image_updates = (self.epochs * self.iterations) / self.save_every
        self.encoded_texts = {
            "max": [],
            "min": []
        }
        # create img transform
        self.clip_transform = create_clip_img_transform(224)
        # create starting encoding
        self.set_clip_encoding(text=text, img=img, encoding=encoding, text_min=text_min)

    @property
    def seed_suffix(self):
        return f'.{self.seed}' if self.append_seed and exists(self.seed) else ''

    def set_text(self, text):
        self.set_clip_encoding(text = text)

    def create_clip_encoding(self, text=None, img=None, encoding=None):
        self.text = text
        self.img = img
        if encoding is not None:
            encoding = torch.nn.DataParallel(encoding.cuda())
        #elif self.create_story:
        #    encoding = self.update_story_encoding(epoch=0, iteration=1)
        elif text is not None and img is not None:
            encoding = (self.create_text_encoding(text) + self.create_img_encoding(img)) / 2
        elif text is not None:
            encoding = self.create_text_encoding(text)
        elif img is not None:
            encoding = self.create_img_encoding(img)
        return encoding

    def create_text_encoding(self, text):
        tokenized_text = torch.nn.DataParallel(tokenize(text).cuda())
        with torch.no_grad():
            text_encoding = perceptor.encode_text(tokenized_text).detach()
        return text_encoding

    def create_img_encoding(self, img):
        if isinstance(img, str):
            img = Image.open(img)
        normed_img = torch.nn.DataParallel(self.clip_transform(img).unsqueeze(0).cuda())
        with torch.no_grad():
            img_encoding = perceptor.encode_image(normed_img).detach()
        return img_encoding

    def encode_multiple_phrases(self, text, img=None, encoding=None, text_type="max"):
        if text is not None and "|" in text:
            self.encoded_texts[text_type] = [self.create_clip_encoding(text=prompt_min, img=img, encoding=encoding) for prompt_min in text.split("|")]
        else:
            self.encoded_texts[text_type] = [self.create_clip_encoding(text=text, img=img, encoding=encoding)]

    def encode_max_and_min(self, text, img=None, encoding=None, text_min=""):
        self.encode_multiple_phrases(text, img=img, encoding=encoding)
        if text_min is not None and text_min != "":
            self.encode_multiple_phrases(text_min, img=img, encoding=encoding, text_type="min")

    def set_clip_encoding(self, text=None, img=None, encoding=None, text_min=""):
        self.current_best_score = 0
        self.text = text
        self.text_min = text_min

        if len(text_min) > 0:
            text = text + "_wout_" + text_min[:255] if text is not None else "wout_" + text_min[:255]
        text_path = create_text_path(text=text, img=img, encoding=encoding)
        if self.save_date_time:
            text_path = datetime.now().strftime("%y%m%d-%H%M%S-") + text_path

        self.text_path = text_path
        self.filename = Path(f'./{text_path}{self.seed_suffix}.png')
        self.encode_max_and_min(text, img=img, encoding=encoding, text_min=text_min) # Tokenize and encode each prompt

    def reset(self):
        self.model.reset()
        self.model = torch.nn.DataParallel(self.model.cuda())
        self.optimizer = Adam(self.model.model.latents.parameters(), self.lr)

    def train_step(self, epoch, i, pbar=None):
        total_loss = 0

        for _ in range(self.gradient_accumulate_every):
            out, losses = self.model(self.encoded_texts["max"], self.encoded_texts["min"])
            loss = sum(losses) / self.gradient_accumulate_every
            total_loss += loss
            loss.backward()

        self.optimizer.step()
        self.model.model.latents.update()
        self.optimizer.zero_grad()

        if (i + 1) % self.save_every == 0:
            with torch.no_grad():
                self.model.model.latents.eval()
                out, losses = self.model(self.encoded_texts["max"], self.encoded_texts["min"])
                top_score, best = torch.topk(losses[2], k=1, largest=False)
                image = self.model.model()[best].cpu()
                self.model.model.latents.train()

                save_image(image, str(self.filename))
                if pbar is not None:
                    pbar.update(1)
                else:
                    print(f'image updated at "./{str(self.filename)}"')

                if self.save_progress:
                    total_iterations = epoch * self.iterations + i
                    num = total_iterations // self.save_every
                    save_image(image, Path(f'./{self.text_path}.{num}{self.seed_suffix}.png'))

                if self.save_best and top_score.item() < self.current_best_score:
                    self.current_best_score = top_score.item()
                    save_image(image, Path(f'./{self.text_path}{self.seed_suffix}.best.png'))

        return out, total_loss

    def forward(self):
        penalizing = ""
        if len(self.text_min) > 0:
            penalizing = f'penalizing "{self.text_min}"'
        print(f'Imagining "{self.text_path}" {penalizing}...')

        with torch.no_grad():
            self.model(self.encoded_texts["max"][0]) # one warmup step due to issue with CLIP and CUDA

        if self.open_folder:
            open_folder('./')
            self.open_folder = False

        image_pbar = tqdm(total=self.total_image_updates, desc='image update', position=2, leave=True)
        for epoch in trange(self.epochs, desc = '      epochs', position=0, leave=True):
            pbar = trange(self.iterations, desc='   iteration', position=1, leave=True)
            image_pbar.update(0)
            for i in pbar:
                out, loss = self.train_step(epoch, i, image_pbar)
                pbar.set_description(f'loss: {loss.item():04.2f}')

                if terminate:
                    print('detecting keyboard interrupt, gracefully exiting')
                    return

But I receive this error:

UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
Traceback (most recent call last):
  File "/home/jovyan/SberArtist/big-sleep/big_sleep/clip.py", line 101, in load
    model = torch.jit.load(model_path, map_location=device if jit else "cpu").eval()
  File "/home/user/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 275, in load
    cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
RuntimeError:

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor):
Expected at most 12 arguments but found 13 positional arguments.

Backslash character in file names.

I'm now getting file names containing "\" which were previously removed automatically. This is causing havoc when I attempt to use Windows to read files over the network that were created on my Linux system.

Possible bug in latent vector loss calculation?

I'm confused by this, and wondering if it could be a bug. It seems as though latents is of size (32, 128), which means that for array in latents: iterates 32 times. However, the results from these iterations aren't stored anywhere, so they are at best a waste of time and at worst causing a miscalculation. Perhaps the intention was to accumulate the kurtoses and skews for each array in latents, and then compute lat_loss using all the accumulated values?

for array in latents:
    mean = torch.mean(array)
    diffs = array - mean
    var = torch.mean(torch.pow(diffs, 2.0))
    std = torch.pow(var, 0.5)
    zscores = diffs / std
    skews = torch.mean(torch.pow(zscores, 3.0))
    kurtoses = torch.mean(torch.pow(zscores, 4.0)) - 3.0

lat_loss = lat_loss + torch.abs(kurtoses) / num_latents + torch.abs(skews) / num_latents

Occurs at https://github.com/lucidrains/big-sleep/blob/main/big_sleep/big_sleep.py#L211
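A hedged sketch of the accumulation the poster suggests was intended: moving the lat_loss update inside the loop so every latent row contributes (lat_loss and num_latents as defined in the surrounding code):

for array in latents:
    mean = torch.mean(array)
    diffs = array - mean
    var = torch.mean(torch.pow(diffs, 2.0))
    std = torch.pow(var, 0.5)
    zscores = diffs / std
    skews = torch.mean(torch.pow(zscores, 3.0))
    kurtoses = torch.mean(torch.pow(zscores, 4.0)) - 3.0

    # accumulate per-row statistics instead of keeping only the last iteration's
    lat_loss = lat_loss + torch.abs(kurtoses) / num_latents + torch.abs(skews) / num_latents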

RuntimeError: Method 'forward' is not defined

Hi, I've just started with big_sleep and am just trying to run the demo code:

from big_sleep import Imagine

dream = Imagine(
    text = "fire in the sky",
    lr = 5e-2,
    save_every = 25,
    save_progress = True
)

dream()

I keep getting the error "Method 'forward' is not defined".

I looked it up and found this thread, but it seemed the solution was to update to 0.7.1, and I'm using 0.8.5. Any ideas? Thanks!

Segmentation fault (Core dumped)

Every install results in this issue. Other installs seem to run fine.

Trying to run in a Jetson AGX Xavier w/Ubuntu 18.04

Any insight?
