samedii / perceptor

Modular image generation library

License: Other

Python 91.59% C++ 2.05% Cuda 6.36%

pytorch text-to-image-synthesis vq-vae guided-diffusion style-transfer stable-diffusion

perceptor's Issues

Let velocity_diffusion.schedule_ts take timesteps as arguments instead of sigma

Watercolor diffusion model

https://github.com/KaliYuga-ai/Pixel-Art-Diffusion/blob/main/Watercolor_Diffusion_v1_0.ipynb

OpenCLIP ViT-B/16+ 240x240 model

Other checkpoints are already added, but a larger model trained at a higher resolution is also available:
https://github.com/mlfoundations/open_clip

Pixel art diffusion model

https://github.com/KaliYuga-ai/Pixel-Art-Diffusion

StyleGAN Human models

https://stylegan-human.github.io/

ViT-L aesthetic embeddings

Used in majesty-diffusion:
https://github.com/multimodalart/majesty-diffusion

GroupViT model

https://huggingface.co/docs/transformers/model_doc/groupvit

Panini face restoration model

Port the PyTorch model from https://github.com/jianzhangcs/panini

Text off vectors are not included in package

FileNotFoundError: [Errno 2] No such file or directory: 'perceptor/losses/clip/vectors/textoff.json'

Add diffusers face model

https://github.com/huggingface/diffusers

from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline

model_id = "google/ddpm-celebahq-256"

# load model and scheduler
ddpm = DDPMPipeline.from_pretrained(model_id) # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference

Latent diffusion inpainting model

Three other models from this repo have already been added (super resolution, text2image, and text2image fine-tuned on images without watermarks):
https://github.com/CompVis/latent-diffusion

OWL-ViT perceptor

https://huggingface.co/docs/transformers/model_doc/owlvit

Stable diffusion model

https://github.com/pesser/stable-diffusion

Deep image prior drawer

https://github.com/crowsonkb/deep-image-prior

CLOOB perceptor

Choose between
https://github.com/crowsonkb/cloob-training
and
https://github.com/ml-jku/cloob

Optimize stable diffusion

Might be merged into diffusers
huggingface/diffusers#532

https://github.com/neonsecret/stable-diffusion

https://github.com/TheLastBen/fast-stable-diffusion

Inference only, 2.4x speedup:
https://github.com/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion

FLAVA model

https://huggingface.co/docs/transformers/model_doc/flava

from PIL import Image
import requests
from transformers import FlavaProcessor, FlavaModel

model = FlavaModel.from_pretrained("facebook/flava-full")
processor = FlavaProcessor.from_pretrained("facebook/flava-full")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.contrastive_logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities

DeFILIP perceptor

https://github.com/Sense-GVT/DeCLIP

FILIP and DeCLIP may as well be added at the same time.

Simulacra aesthetic model

https://github.com/crowsonkb/simulacra-aesthetic-models

CLIP style transfer

https://github.com/cyclomon/CLIPstyler

Investigate if decorator.py or wrapt help intellisense with cache decorator

https://github.com/micheles/decorator/

https://github.com/GrahamDumpleton/wrapt

UViM model

https://paperswithcode.com/paper/uvim-a-unified-modeling-approach-for-vision
https://github.com/google-research/big_vision

Improved VQ diffusion model

https://github.com/cientgu/VQ-Diffusion/tree/Improved_VQ-Diffusion

Use predicted noise as base when taking steps in latent diffusion

Otherwise we get numerical errors when alpha/sigma is close to zero.

Additionally

Refactor and rename eps to "predicted noise"
Refactor to use "prediction" class as output
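
A minimal sketch of the numerical problem, assuming the usual parameterization x_t = alpha * denoised + sigma * eps with alpha^2 + sigma^2 = 1. The function names are hypothetical and are not perceptor's API. Converting between a denoised prediction and a noise prediction divides by sigma or alpha, so errors are amplified when either is close to zero.

```python
import math


def eps_from_denoised(x_t, denoised, alpha, sigma):
    """Recover the noise from a denoised prediction (divides by sigma)."""
    return (x_t - alpha * denoised) / sigma


def denoised_from_eps(x_t, eps, alpha, sigma):
    """Recover the denoised value from a noise prediction (divides by alpha)."""
    return (x_t - sigma * eps) / alpha


def step_from_eps(eps, denoised, alpha_next, sigma_next):
    """Deterministic (DDIM-style) move to the next noise level."""
    return alpha_next * denoised + sigma_next * eps


# Near the end of sampling sigma is tiny, so a small error in the denoised
# prediction is amplified by roughly alpha / sigma when converted to noise.
sigma = 1e-3
alpha = math.sqrt(1.0 - sigma**2)
true_denoised, true_eps = 0.5, -1.2
x_t = alpha * true_denoised + sigma * true_eps

denoised_error = 1e-3
eps_error = abs(
    eps_from_denoised(x_t, true_denoised + denoised_error, alpha, sigma) - true_eps
)
amplification = eps_error / denoised_error  # roughly alpha / sigma
```

The same amplification happens in the other direction when alpha is close to zero, which is one reason to keep the step expressed in terms of whichever quantity the model actually predicted.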

LANCZOS default when downsampling

LANCZOS is supposed to be good for downsampling

Do not load stable diffusion via pipeline

Diff2x model

https://github.com/peterwilli/Diff2X

Background removal models

https://github.com/xuebinqin/U-2-Net
https://github.com/PeterL1n/RobustVideoMatting
https://colab.research.google.com/drive/1bLn5FOR6YSTl0ca3YL3RcDdI2rcvF9Ll

Dynamic thresholding

Compare thresholding implementation with majesty diffusion
https://github.com/multimodalart/majesty-diffusion
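
For reference when comparing, here is a minimal sketch of Imagen-style dynamic thresholding on a flat list of predicted pixel values. It is dependency-free for illustration only. The function name, the nearest-rank percentile, and the 0.995 default are assumptions and are not taken from perceptor or majesty-diffusion.

```python
def dynamic_threshold(pixels, percentile=0.995):
    """Clamp predicted values to [-s, s] and rescale by s.

    s is a high percentile of the absolute values, floored at 1.0 so that
    predictions already inside [-1, 1] are left unchanged.
    """
    magnitudes = sorted(abs(p) for p in pixels)
    # Nearest-rank percentile of the absolute values.
    rank = min(len(magnitudes) - 1, int(percentile * len(magnitudes)))
    s = max(1.0, magnitudes[rank])
    return [max(-s, min(s, p)) / s for p in pixels]


in_range = dynamic_threshold([0.5, -0.25])
saturated = dynamic_threshold([0.0, 1.0, 2.0, 8.0], percentile=0.5)
```

With tensors the same idea is usually applied per image, computing the percentile over all pixels of each sample.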

Replace stable diffusion decoder

https://huggingface.co/stabilityai/sd-vae-ft-mse

Migrate to transformers implementation of CLIP

https://github.com/huggingface/transformers

This allows returning the last hidden layer, which is what stable diffusion uses.

Composable stable diffusion

https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/

https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch

https://huggingface.co/spaces/Shuang59/Composable-Diffusion

Multilingual CLIP model

https://huggingface.co/M-CLIP/XLM-Roberta-Large-Vit-B-16Plus

Spherical loss for BLIP

I noticed the use of the spherical loss in the BLIP files. What is the gain you get from this loss?
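
For context, a minimal sketch of the squared spherical distance commonly used in CLIP-guided generation code, written here for plain Python lists. Whether the BLIP files in this repository use exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def spherical_distance_loss(x, y):
    """Squared spherical distance between two embeddings.

    Both vectors are normalized to unit length, so the result depends only
    on the angle between them, not on their magnitudes.
    """

    def normalize(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v]

    x, y = normalize(x), normalize(y)
    chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Half the angle between the vectors; the clamp guards against rounding.
    half_angle = math.asin(min(1.0, chord / 2))
    return 2 * half_angle**2


same = spherical_distance_loss([1.0, 2.0], [1.0, 2.0])
orthogonal = spherical_distance_loss([2.0, 0.0], [0.0, 3.0])
opposite = spherical_distance_loss([1.0, 0.0], [-1.0, 0.0])
```

The loss is zero for identical directions and grows with the square of the angle, reaching pi^2 / 2 for opposite directions.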

CLIP interrogator

Pulp sci-fi diffusion model

https://github.com/KaliYuga-ai/Pulp-Sci-Fi-Diffusion/blob/main/Pulp_Sci_Fi_Diffusion_v1_0.ipynb

Noised CLIP

Add the CLIP model from glide-text2im, which has been trained to handle noisy images:
https://github.com/openai/glide-text2im

StyleGAN XL 1024x1024 model

The 256x256 model is already added. The 512 and 1024 checkpoints for ImageNet and FFHQ should be added too.

https://github.com/autonomousvision/stylegan_xl

Conditioned prior model

https://huggingface.co/nousr/conditioned-prior
https://github.com/laion-ai/deep-image-diffusion-prior

Allows converting text embeddings to image embeddings

LAION aesthetic predictor models

https://github.com/LAION-AI/aesthetic-predictor

Handpainted CG diffusion model

https://github.com/FeiArt-Ai/Handpainted-CG-Diffusion/tree/474669d3fa42b0b9666e75eb716bb4efbef8caa9

CLOOB latent diffusion models

https://github.com/JD-P/cloob-latent-diffusion

Stable diffusion inpainting model

Prompt-to-prompt with stable diffusion / cross attention control

https://github.com/google/prompt-to-prompt

Add guided diffusion portrait model

alembics/disco-diffusion@d6d68b0#commitcomment-79429271
https://huggingface.co/felipe3dartist/portrait_generator_v1.5/blob/main/ema_0.9999_165000.pt

LiT perceptor

Port models from this JAX repo: https://github.com/google-research/vision_transformer#lit-models

OpenCLIP ViT-L/14 perceptor

Added in v1.3.0
https://github.com/mlfoundations/open_clip

Waifu stable diffusion

https://huggingface.co/hakurei/waifu-diffusion/tree/main

Another aesthetic predictor

https://github.com/christophschuhmann/improved-aesthetic-predictor

CLIPSeg

https://github.com/timojl/clipseg
