
kjsman / stable-diffusion-pytorch

463 stars · 6 watchers · 57 forks · 26 KB

Yet another PyTorch implementation of Stable Diffusion (probably easy to read)

License: MIT License

Languages: Python 89.27%, Jupyter Notebook 10.73%
Topics: diffusion, image-generation, pytorch, stable-diffusion

stable-diffusion-pytorch's People

Contributors

kjsman, mspronesti


stable-diffusion-pytorch's Issues

How are the models in data.zip made?

Thank you for making this repo; it's very educational. This minimal implementation is brilliant, while the bigger SD repos are very hard to understand.

Did you use a script to convert official models, like this one: https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.ckpt, to the format you use in this repo?

Or are you using a model from some other source?

Are you using the SD 1.5 model?

How hard would it be to make this repo use models trained by others, Inkpunk for example? https://huggingface.co/Envvi/Inkpunk-Diffusion/blob/main/Inkpunk-Diffusion-v2.ckpt
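
For reference, official CompVis-style .ckpt files bundle every component into a single state dict whose keys are prefixed by component name, so a conversion script would first split them apart and then remap the parameter names onto this repo's modules. A rough sketch of the splitting step (the prefixes below are the standard ones in CompVis checkpoints; the output file names are hypothetical):

import torch

# Load the combined checkpoint; the weights live under the "state_dict" key.
ckpt = torch.load("v1-5-pruned-emaonly.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# Standard component prefixes in CompVis-style checkpoints.
prefixes = {
    "cond_stage_model.": "clip.pt",            # CLIP text encoder
    "first_stage_model.": "autoencoder.pt",    # VAE encoder + decoder
    "model.diffusion_model.": "diffusion.pt",  # U-Net
}

for prefix, out_file in prefixes.items():
    part = {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}
    torch.save(part, out_file)  # hypothetical output names

This repo's modules use their own parameter names, so a key-renaming map would still be needed before these files could be loaded directly.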

Image in latent space gets shifted during encoding.

I am using a simple red image as input:

[image: red.png, a solid red input image]

from stable_diffusion_pytorch import pipeline
from PIL import Image

# Image-to-image: condition generation on the red input image.
prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('red.png')]
images = pipeline.generate(prompts, input_images=input_images)
images[0].save('output.png')

But the input image comes out shifted by 8 px along each axis, and the shift produces an ugly brown border:

[image: output.png, with the content shifted and a brown border]

I am pretty sure it happens during the encode pass, as the image is already shifted in latent space. Here is a custom dump of the latent space to an image:

[image: encode/decode round-trip dump of the latent space, showing the shift]

Something in the encode pass is shifting it by one pixel in latent space, and I can't figure out what.
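
A one-latent-pixel shift corresponds to 8 image pixels at the VAE's 8x downsampling factor, which often points at asymmetric padding around a strided downsampling convolution in the encoder. As a quick check, the round-trip shift can be measured directly by brute-force correlation; a self-contained sketch (file names taken from the snippet above):

import numpy as np
from PIL import Image

def measure_shift(a, b, max_offset=16):
    # Find the (dy, dx) roll of b that best matches a (smallest MSE).
    a = np.asarray(a.convert("L"), dtype=np.float32)
    b = np.asarray(b.convert("L"), dtype=np.float32)
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_offset, max_offset + 1):
        for dx in range(-max_offset, max_offset + 1):
            err = float(np.mean((a - np.roll(b, (dy, dx), axis=(0, 1))) ** 2))
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

print(measure_shift(Image.open("red.png"), Image.open("output.png")))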

Should I scale the input image?

Hello, @kjsman, thanks for this easily readable implementation. I have a question: am I correct that I should scale the input with a sampler before passing the image to the U-Net model during the training process?
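
For context, in DDPM-style training the latent is scaled and noised according to the sampler's schedule before it reaches the U-Net, which is then trained to predict the added noise. A minimal sketch of that step, assuming the standard Stable Diffusion "scaled linear" schedule (this mirrors the usual formulation, not necessarily this repo's exact API):

import torch

# Stable Diffusion's scaled-linear beta schedule (beta_start=0.00085, beta_end=0.012).
T = 1000
betas = torch.linspace(0.00085 ** 0.5, 0.0120 ** 0.5, T) ** 2
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    sqrt_ab = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
    sqrt_om = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
    return sqrt_ab * x0 + sqrt_om * noise

# x0 is the VAE latent (in SD it is also multiplied by the constant 0.18215).
x0 = torch.randn(4, 4, 64, 64)
t = torch.randint(0, T, (4,))
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, noise)  # this, plus t, is what the U-Net sees in training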

Is 4GB VRAM too small for this program?

Thanks for the implementation!
When running demo.ipynb I got:

OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 4.00 GiB total capacity; 3.31 GiB already allocated; 0 bytes free; 3.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a solution?
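
Two mitigations worth trying first, following the hint in the error message itself: cap the allocator's split size to reduce fragmentation, and free cached blocks between runs. A minimal sketch (running the models in half precision would roughly halve memory too, but whether this repo's pipeline supports that out of the box is an assumption to verify):

import os

# Must be set before CUDA is initialized, i.e. before the first torch import
# in a fresh process or restarted notebook kernel.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
torch.cuda.empty_cache()  # release blocks cached by earlier cells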

Query about the training process

I hope this message finds you well. I recently came across your repository for Stable Diffusion in PyTorch and I must say, your effort in making the codebase minimal and easy to read is commendable. I am new to generative models, and your implementation has piqued my interest.
I was wondering if you could provide some insights into the training process of your Stable Diffusion model. Specifically, I am curious about the following:

  1. Training Data: Could you please let me know on which dataset you trained your model? Understanding the dataset used would help me get a better understanding of the capabilities and limitations of the model.

  2. Training Time: I'm also interested in how long the model took to train. This information will help me gauge the computational requirements and plan accordingly for any experiments or projects involving Stable Diffusion.

Moreover, I would like to know more about your approach to writing this code. Did you primarily refer to research papers, or did you take inspiration from other implementations? For instance, you mentioned using Andrej Karpathy's minGPT. Could you share your thought process behind choosing this reference, or any other methods you considered during your implementation?
I greatly appreciate your assistance and expertise in this matter. Thank you for your time and for sharing your work with the community. I look forward to your response.

[Enhancement] automate weights download without user action

Hello @kjsman,
this is more of a feature proposal than an actual issue. Instead of requiring the user to download and unpack the tar file containing the weights and the vocabulary from your Hugging Face Hub repository, one can make the model_loader and the Tokenizer download and cache them directly.

For the first part, it only requires replacing torch.load(...) here (and in the other 3 functions in the same file) with

torch.hub.load_state_dict_from_url(weights_url, check_hash=True)

All it takes on your side is to upload the 4 .pt files to the Hugging Face Hub (not in a zipped file), and that's it.
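
For illustration, the resulting loader would look roughly like this (the function name, URL, and file name below are placeholders, not this repo's actual ones):

import torch

WEIGHTS_BASE_URL = "https://huggingface.co/<user>/<repo>/resolve/main"  # placeholder

def load_clip_weights(device):
    # Downloads once into torch.hub's cache directory, then reuses the cached file.
    return torch.hub.load_state_dict_from_url(
        f"{WEIGHTS_BASE_URL}/clip.pt",  # placeholder file name
        map_location=device,
    )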

As for the tokenizer, it just takes adding a default_bpe() method / function:

import os
from functools import lru_cache
from urllib.request import urlretrieve

@lru_cache()
def default_bpe():
    # Prefer the vocab file bundled next to this module.
    p = os.path.join(
        os.path.dirname(os.path.abspath(__file__)), "bpe_simple_vocab_16e6.txt.gz"
    )
    if os.path.exists(p):
        return p
    # Otherwise download it from the original CLIP repository.
    # urlretrieve returns (filename, HTTPMessage); only the filename is needed.
    filename, _ = urlretrieve(
        "https://github.com/openai/CLIP/blob/main/clip/bpe_simple_vocab_16e6.txt.gz?raw=true",
        "bpe_simple_vocab_16e6.txt.gz",
    )
    return filename

Another option, if you prefer to keep your vocab.json and merges.txt, is to upload them as well to the Hugging Face Hub (not in a tar file) or directly to GitHub, like the original repository does with its vocab.

If you like it, I will open a new PR; otherwise, please let me know if you have any better idea, or close this issue if you are not interested in this feature 😄
