DSL-23-1-modeling-Stable-Diffusion

Members: 남승우, 신소연, 안세정, 정건우

Overview

Task

  • Reverse the typical direction of a generative text-to-image model (Stable Diffusion 2.0)
    • Build a model that predicts the text prompt behind an image, using (text, image) pairs generated by Stable Diffusion 2.0
    • Predict a prompt for each image and measure its cosine similarity to the prompt in the original (text, image) pair
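As a rough illustration of the scoring idea above, the sketch below computes the cosine similarity between predicted and ground-truth prompt embeddings and averages it over images. The function names and the toy 2-dimensional vectors are assumptions made for illustration; the averaging per image is also an assumption, not something this README states.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_cosine_similarity(predicted, truth):
    """Average the per-image cosine similarity (assumed scoring scheme)."""
    scores = [cosine_similarity(p, t) for p, t in zip(predicted, truth)]
    return sum(scores) / len(scores)

# Toy embeddings: the first pair points the same way, the second is orthogonal.
predicted = [[1.0, 0.0], [0.0, 2.0]]
truth = [[1.0, 0.0], [3.0, 0.0]]

score = mean_cosine_similarity(predicted, truth)
print(score)  # 0.5
```

Cosine similarity ignores vector length, so only the direction of each prompt embedding matters for the score.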

Model

BLIP-2

pretrained models(Colab) : blip2-flan-t5-xl

pretrained models(Kaggle) : blip2-opt-2.7b (used because of limited RAM capacity)

Usage

!pip install transformers  # (Colab)
from transformers import Blip2Processor, Blip2ForConditionalGeneration
# Load the BLIP-2 processor and captioning model from the Hugging Face Hub
processor = Blip2Processor.from_pretrained('Salesforce/blip2-flan-t5-xl')
model = Blip2ForConditionalGeneration.from_pretrained('Salesforce/blip2-flan-t5-xl')

CLIP

pretrained models : CLIP-ViT-H-14-laion2B-s32B-b79K

Usage

!pip install open_clip_torch
!pip install clip-interrogator==0.6.0
import open_clip
# Load the ViT-H-14 CLIP model with its LAION-2B weights, plus the matching tokenizer
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14', pretrained='laion2b_s32b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-H-14')

Results

  • Predicted prompt : there is a picture of a circular shaped object in the middle of a pictureconcept art, conceptual art, crater, studying a hell open rift portal, abstract holescape

  • Predicted prompt : drawing of a robot toy robot with a robot on it's sidea screenprint, art brut, ((robot)), robot cat, robot design
import os
import numpy as np
import pandas as pd
from scipy import spatial

# `submission` (DataFrame with imgId_eId and val columns) and `st_model`
# (the sentence-embedding model) are defined earlier in the notebook.
images = os.listdir('/kaggle/input/stable-diffusion-image-to-prompts/images/')
imgIds = [i.split('.')[0] for i in images]
EMBEDDING_LENGTH = 384
eIds = list(range(EMBEDDING_LENGTH))

# One id per (image, embedding component): '<imgId>_<eId>'
imgId_eId = [
    '_'.join(map(str, i)) for i in zip(
        np.repeat(imgIds, EMBEDDING_LENGTH),
        np.tile(range(EMBEDDING_LENGTH), len(imgIds)))]

assert sorted(imgId_eId) == sorted(submission.imgId_eId)

# Embed the ground-truth prompts in the same image order and flatten them
ground_truth = pd.read_csv('/kaggle/input/stable-diffusion-image-to-prompts/prompts.csv')
ground_truth = pd.merge(pd.DataFrame(imgIds, columns=['imgId']), ground_truth,
                        on='imgId', how='left')
ground_truth_embeddings = st_model.encode(ground_truth.prompt.tolist()).flatten()
gte = pd.DataFrame(
    index=imgId_eId,
    data=ground_truth_embeddings,
    columns=['val']
).rename_axis('imgId_eId')

# Align the submission rows with the ground truth before comparing
vec1 = gte['val']
vec2 = submission.set_index('imgId_eId').loc[gte.index, 'val']
cos_sim = 1 - spatial.distance.cosine(vec1, vec2)
print(cos_sim)
  • Cosine similarity with the dataset pairs : 0.5303403735160828
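The imgId_eId labels used in the evaluation pair every image id with every embedding component, image by image. The sketch below reproduces that layout in plain Python on toy data, to show that a row-major flatten of the embeddings lines up with those labels. The helper names, image ids, and 3-dimensional embeddings are hypothetical; the notebook itself uses np.repeat and np.tile with 384 components.

```python
def make_ids(img_ids, embedding_length):
    """Return one '<imgId>_<eId>' label per embedding component, image by image."""
    return [f"{img_id}_{e_id}"
            for img_id in img_ids
            for e_id in range(embedding_length)]

def flatten_rows(embeddings):
    """Row-major flatten: all components of image 0, then image 1, and so on."""
    return [value for row in embeddings for value in row]

img_ids = ["imgA", "imgB"]                        # hypothetical image ids
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # toy 3-dimensional embeddings

ids = make_ids(img_ids, embedding_length=3)
values = flatten_rows(embeddings)
table = dict(zip(ids, values))

print(ids)
print(table["imgB_1"])  # 0.5
```

Because labels and values are generated in the same order, component 1 of the second image ends up under "imgB_1". The same correspondence is what lets the flattened ground-truth vector be compared against the submission values.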

File Description

models

  1. data
  • images - example images from the Stable Diffusion competition
  • sample_submission.csv - example submission for the competition
  • prompts.csv - the prompts the images were generated from
  2. modules
  • BLIP2_CLIP_model_c.ipynb - for the Colab environment, with a choice of pretrained BLIP-2 and CLIP models
  • BLIP2_CLIP_model_k.ipynb - for the Kaggle competition

results

  • submission.csv - predictions produced with the modules above