!pip install transformers  # (Colab)
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# BLIP-2 checkpoints live under the 'blip2-' prefix on the Hub, and the
# processor must be paired with the BLIP-2 (not BLIP-1) model class
processor = Blip2Processor.from_pretrained('Salesforce/blip2-flan-t5-xl')
model = Blip2ForConditionalGeneration.from_pretrained('Salesforce/blip2-flan-t5-xl')
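# Quick smoke test of the captioner. This is only a sketch: 'example.png'
# is a placeholder path, not a file from the competition dataset.
from PIL import Image

image = Image.open('example.png').convert('RGB')
inputs = processor(images=image, return_tensors='pt')
generated = model.generate(**inputs, max_new_tokens=32)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])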
!pip install open_clip_torch
!pip install clip-interrogator==0.6.0
import open_clip

# Pretrained tag uses underscores: 'laion2b_s32b_b79k'
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-H-14', pretrained='laion2b_s32b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-H-14')
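# clip-interrogator wraps this same ViT-H-14/laion2b checkpoint; a minimal
# sketch of calling it directly ('example.png' is again a placeholder):
from PIL import Image
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config(clip_model_name='ViT-H-14/laion2b_s32b_b79k'))
print(ci.interrogate(Image.open('example.png').convert('RGB')))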
import os

# Filenames are '<imgId>.png'; strip the extension to recover the ids
images = os.listdir('/kaggle/input/stable-diffusion-image-to-prompts/images/')
imgIds = [i.split('.')[0] for i in images]
import numpy as np

EMBEDDING_LENGTH = 384  # all-MiniLM-L6-v2 embeddings are 384-dimensional
eIds = list(range(EMBEDDING_LENGTH))

# One row per (image, embedding dimension) pair: '<imgId>_<eId>'
imgId_eId = [
    '_'.join(map(str, i)) for i in zip(
        np.repeat(imgIds, EMBEDDING_LENGTH),
        np.tile(range(EMBEDDING_LENGTH), len(imgIds)))]
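# The assert below checks a `submission` frame that is not built in this
# snippet. A minimal sketch of how it could look, assuming a hypothetical
# `predicted_prompts` list (one generated prompt per image, ordered like
# imgIds) and the competition's all-MiniLM-L6-v2 sentence encoder:
import pandas as pd
from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
prompt_embeddings = st_model.encode(predicted_prompts).flatten()
submission = pd.DataFrame({'imgId_eId': imgId_eId, 'val': prompt_embeddings})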
assert sorted(imgId_eId) == sorted(submission.imgId_eId)
ground_truth = pd.read_csv('/kaggle/input/stable-diffusion-image-to-prompts/prompts.csv')

# Align the ground-truth prompts with the file order of imgIds
ground_truth = pd.merge(pd.DataFrame(imgIds, columns=['imgId']), ground_truth,
                        on='imgId', how='left')
ground_truth_embeddings = st_model.encode(ground_truth.prompt.tolist()).flatten()
gte = pd.DataFrame(
    index=imgId_eId,
    data=ground_truth_embeddings,
    columns=['val']
).rename_axis('imgId_eId')
from scipy import spatial

# Both frames share the same imgId_eId ordering, so the flattened
# vectors can be compared position by position
vec1 = gte['val']
vec2 = submission['val']
cos_sim = 1 - spatial.distance.cosine(vec1, vec2)
print(cos_sim)
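# A single cosine over the flattened vectors only approximates the
# leaderboard metric, which averages cosine similarity per image embedding.
# Sketch of the per-image version, reusing the 384-dim layout from above:
gt = ground_truth_embeddings.reshape(-1, EMBEDDING_LENGTH)
pred = submission['val'].to_numpy().reshape(-1, EMBEDDING_LENGTH)
per_image = [1 - spatial.distance.cosine(g, p) for g, p in zip(gt, pred)]
print(np.mean(per_image))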