auto1111sdk / auto1111sdk

An SDK/Python library for Automatic 1111 to run state-of-the-art diffusion models

License: GNU Affero General Public License v3.0


auto1111sdk's Introduction

Auto 1111 SDK: Stable Diffusion Python library


Auto 1111 SDK is a lightweight Python library for generating, upscaling, and editing images with Stable Diffusion models. It is designed to be a modular, lightweight Python client that encapsulates the main features of the [Automatic 1111 Stable Diffusion Web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui). Auto 1111 SDK currently offers three core features:
  • Text-to-Image, Image-to-Image, Inpainting, and Outpainting pipelines. Our pipelines support the exact same parameters as the Stable Diffusion Web UI, so you can easily replicate Web UI creations with the SDK.
  • Upscaling pipelines that can run inference for any ESRGAN or Real-ESRGAN upscaler in a few lines of code.
  • An integration with CivitAI to download models directly from the website (see the sketch after this list).
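
As a minimal sketch of the CivitAI integration: the civit_download helper is referenced elsewhere on this page, but the exact signature and the placeholder URL here are assumptions, so check the documentation for details.

from auto1111sdk import civit_download, StableDiffusionPipeline

# Download a checkpoint from its CivitAI model page URL (hypothetical
# placeholder) to a local file, then load it like any other checkpoint.
civit_download("<CivitAI model page URL>", "model.safetensors")

pipe = StableDiffusionPipeline("model.safetensors")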

Join our Discord!!

Demo

We have a Colab demo where you can run many of the operations of Auto 1111 SDK. Check it out here!

Installation

We recommend installing Auto 1111 SDK from PyPI inside a virtual environment. We do not yet support conda environments.

pip3 install auto1111sdk

To install the latest version of Auto 1111 SDK (which now includes ControlNet), run:

pip3 install git+https://github.com/saketh12/Auto1111SDK.git

Quickstart

Generating images with Auto 1111 SDK is straightforward. A single pipeline runs inference for Text-to-Image, Image-to-Image, Inpainting, Outpainting, and Stable Diffusion Upscale, which saves a significant amount of RAM compared to solutions that require a separate pipeline object for each operation.

from auto1111sdk import StableDiffusionPipeline

# Load a model from a local .safetensors or .ckpt checkpoint file
pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>")

prompt = "a picture of a brown dog"
output = pipe.generate_txt2img(prompt=prompt, height=1024, width=768, steps=10)

# generate_txt2img returns a list of PIL images
output[0].save("image.png")
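
The same pipeline object can also run the other modes. The following Image-to-Image call is a sketch only: the generate_img2img method and its init_image parameter are assumed to mirror the txt2img call above, so check the documentation for the exact signature.

from PIL import Image

# Reuse the pipeline from above; generate_img2img / init_image are
# assumed names mirroring generate_txt2img -- verify against the docs.
init_image = Image.open("image.png")
output = pipe.generate_img2img(prompt="a picture of a golden dog",
                               init_image=init_image, steps=10)

output[0].save("image2image.png")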

Controlnet

Right now, ControlNet only works with fp32. We are adding support for fp16 very soon.

from auto1111sdk import StableDiffusionPipeline, ControlNetModel

# Load the ControlNet model and its conditioning image
model = ControlNetModel(model="<THE CONTROLNET MODEL FILE NAME (WITHOUT EXTENSION)>",
                        image="<PATH TO IMAGE>")

# Attach the ControlNet model to the pipeline
pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>", controlnet=model)

prompt = "a picture of a brown dog"
output = pipe.generate_txt2img(prompt=prompt, height=1024, width=768, steps=10)

output[0].save("image.png")

Running on Windows

Find the instructions here. Contributed by Marco Guardigli, [email protected].

Documentation

We have more detailed examples and documentation of how you can use Auto 1111 SDK here. For a detailed comparison between Auto 1111 SDK and Hugging Face Diffusers, you can read this.

For a detailed guide on how to use SDXL, we recommend reading this.
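
As a quick orientation, SDXL checkpoints use the dedicated StableDiffusionXLPipeline (the same class that appears in the issue reports further down this page); a minimal sketch:

from auto1111sdk import StableDiffusionXLPipeline

# SDXL models are typically run at 1024x1024
pipe = StableDiffusionXLPipeline("<path to your local SDXL safetensors file>")
output = pipe.generate_txt2img(prompt="a picture of a brown dog",
                               height=1024, width=1024, steps=25)

output[0].save("sdxl_image.png")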

Features

  • Original txt2img and img2img modes
  • Real-ESRGAN and ESRGAN upscaling (compatible with any .pth file)
  • Outpainting
  • Inpainting
  • Stable Diffusion Upscale
  • Attention: specify parts of the text that the model should pay more attention to (see the example after this list)
    • a man in a ((tuxedo)) - will pay more attention to tuxedo
    • a man in a (tuxedo:1.21) - alternative syntax
    • select text and press Ctrl+Up or Ctrl+Down (or Command+Up or Command+Down on macOS) to automatically adjust attention to the selected text (code contributed by anonymous user)
  • Composable Diffusion: a way to use multiple prompts at once
    • separate prompts using uppercase AND
    • also supports weights for prompts: a cat :1.2 AND a dog AND a penguin :2.2
  • Works with a variety of samplers
  • Download models directly from CivitAI, as well as Real-ESRGAN checkpoints
  • Set a custom VAE: works for any model, including SDXL
  • Support for SDXL with Stable Diffusion XL pipelines
  • Pass custom arguments to the models
  • No 77-token prompt limit (unlike Hugging Face Diffusers, which has this limit)
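
Because attention weights and the AND separator live in the prompt string itself, they need no extra parameters; for example, reusing the pipe object from the Quickstart:

# Weighted attention and Composable Diffusion combined in one prompt
prompt = "a man in a (tuxedo:1.21) AND a cat :1.2"
output = pipe.generate_txt2img(prompt=prompt, steps=20)

output[0].save("composed.png")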

Roadmap

  • Adding support for Hires Fix and Refiner parameters for inference
  • Adding support for LoRAs
  • Adding support for face restoration
  • Adding support for the Dreambooth training script
  • Adding support for custom extensions like ControlNet

We will be adding support for these features very soon. We also welcome contributions toward any of these items!

Contributing

Auto1111 SDK is continuously evolving, and we appreciate community involvement. We welcome all forms of contributions - bug reports, feature requests, and code contributions.

Report bugs and request features by opening an issue on GitHub. Contribute to the project by forking/cloning the repository and submitting a pull request with your changes.

Credits

Licenses for borrowed code can be found in the Settings -> Licenses screen, and also in the html/licenses.html file.

auto1111sdk's People

Contributors

conanak99, mgua, saketh12, siddhantx0, superadi04


auto1111sdk's Issues

Please use semver and add a CHANGELOG

Hi, it would be better if you keep a CHANGELOG.md markdown file to include the new features added to each version.
As a new user, I don't know which version I need to upgrade to in order to use SDXL.


You can make it simple like this

# 0.10.0
- Add new feature
- Fix blah blah

# 0.9
- Fix blah blah
- Add blah blah

I recommend using semver for versioning the package: https://semver.org/

Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes
- MINOR version when you add functionality in a backward compatible manner
- PATCH version when you make backward compatible bug fixes
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

feature request: seed travel / interpolation

It would be useful to be able to do seed travel: e.g., with "orange tree" as the input prompt, generate X frames interpolating between seed = 1 and seed = 2.

Furthermore, it would be great to have the ability to interpolate between different prompts over X number of frames.

I appreciate your work on this project regardless, thanks.

Controlnet support?

Any plan for it?
This project is great, but without ControlNet many advanced users might be disappointed.

Possible pytorch-cuda compatibility issue on windows.

I am trying to reproduce your Jupyter notebook (Colab) on my machine, using a local environment with IPython, run through VS Code. It works, but it does not use NVIDIA GPU acceleration because of issues with PyTorch accessing the NVIDIA CUDA libraries (I get a warning exception about 'torch not compiled with cuda enabled'). I have CUDA 12.2 (as reported by nvidia-smi). Are there any cross-compatibility version requirements for PyTorch and CUDA?

How to get random seed value?

In diffusers, we can do the following to get the actual seed value if we pass in -1:

output = pipe.generate_txt2img(
    num_images=num_images,
    cfg_scale=cfg_scale,
    sampler_name=sampler_name,
    seed=seed,  # -1 for random
    prompt=llm_prompt + append_prompt,
    height=height,
    width=width,
    negative_prompt=negative_prompt,
    steps=steps
)

# After generation, retrieve the seed used
used_seed = pipe.scheduler.random_generator.seed

This doesn't work with this pipeline. How can we get the seed value?

Unity integration (GUI support for VR and Desktop)

I am working on a Unity VR project (a modified DepthViewer repo) for viewing Stable Diffusion images in VR on the Quest 3 (DepthViewer modified with the Meta all-in-one SDK for motion controls, plus Virtual Desktop) in mixed-reality mode. DepthViewer converts 2D images/videos to 3D using depth-estimation models.

With this GitHub release, would it be possible for me to implement a sort of UI in Unity for prompting in VR/AR? That would be a game changer.

The new TikTok "Depth Anything" model (not shown in the video) is pretty mind-blowing, but as it stands it's a huge pain to take the headset off in DepthViewer to prompt again in automatic1111 or ComfyUI for a new image. (I've gotten the process down to about 30 seconds, though.)

GPU VRAM stays the same after a certain number of generations

OutOfMemoryError: CUDA out of memory. Tried to allocate 144.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 101.06 MiB is free. Process 20868 has 14.65 GiB memory in use. Of the allocated memory 14.32 GiB is allocated by PyTorch, and 191.06 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON

It stays at 14 GB and doesn't go down after a certain number of generations.

API Key support for civit_download

First off, I wanted to say that I really appreciate what you guys are doing! This is much easier to integrate into projects than trying to use a GUI's API. Now, onto the request:

Some models on CivitAI require an API key in order to download them. An option for supplying your own API key would be a nice thing to have, to easily download any of these models.

Current errors when downloading these "restricted" models are:
ConnectionError: Error downloading the file: 401 Client Error: Unauthorized for url: https://civitai.com/api/download/models/329420

Thanks!

Is there a way to load a VAE?

Nice work on the library. Both the upscaler and StableDiffusionPipeline work as expected 👌.

I don't see VAEs in the roadmap or limitations; will they be supported too?

Switching models within the same script

I have found that when I run a script that switches models at some point, the output images do not change. I believe this is because the next model isn't actually being loaded. Here is a simple script I made to test this:

from auto1111sdk import StableDiffusionXLPipeline

prompt = ("macro photo, a beautiful translucent (glass transparent [cat:frog"
          " macro:0.5]) that glows within made out of multicolored transparent, "
          "glowing lights, beautiful waterfall , magical sparkles,vibrant whimsical colors")

pipe = StableDiffusionXLPipeline("models/checkpoints/albedobase-sdxl.safetensors", "--skip-torch-cuda-test --medvram")
output = pipe.generate_txt2img(prompt,
                               num_images=1,
                               height=1024,
                               width=1024,
                               steps=25,
                               cfg_scale=4,
                               sampler_name="DPM++ 2M SDE Karras",
                               seed=1
                               )
output[0].save(f"temp_image_albedobase-sdxl.png")
print(f"Saved temp_image_albedobase-sdxl.png")
del pipe, output

pipe = StableDiffusionXLPipeline("models/checkpoints/DreamShaper-sdxl.safetensors", "--skip-torch-cuda-test --medvram")
output = pipe.generate_txt2img(prompt,
                               num_images=1,
                               height=1024,
                               width=1024,
                               steps=25,
                               cfg_scale=4,
                               sampler_name="DPM++ 2M SDE Karras",
                               seed=1
                               )
output[0].save(f"temp_image_DreamShaper-sdxl.png")
print(f"Saved temp_image_DreamShaper-sdxl.png")
del pipe, output
print("Finished")

Here are the result images (embedded in the original issue): temp_image_DreamShaper-sdxl, temp_image_albedobase-sdxl.

When switching the order of the model loading and generation, I see the same behavior: temp_image_DreamShaper-sdxl-1, temp_image_albedobase-sdxl-1.

If this is intended behavior, I can open a feature request, as I think it would be nice to be able to swap models without having to stop the application. Otherwise, if this is just a poor understanding of how the library works or bad programming on my part, let me know! I am pretty new to all of this.

Thanks!
