castorini / w1kp


w1kp: Toolkit for analyzing perceptual variability in text-to-image generation.

Home Page: http://w1kp.com

License: MIT License

Python 100.00%
ai diffusion foundation-models genai ml t2i

W1KP: An Image Set Variability Metric


As proposed in our paper, the "Words of a Thousand Pictures" metric (W1KP) measures perceptual variability for sets of images in text-to-image generation, bootstrapped from existing perceptual distances such as DreamSim.
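To make the idea of a set-level variability score concrete, here is a minimal sketch that lifts a pairwise distance to a whole set by averaging over all unordered pairs. This is an illustrative assumption, not necessarily how W1KP aggregates distances: the helper names `l2_distance` and `mean_pairwise_distance` are hypothetical, and the toy 2-D vectors stand in for real perceptual embeddings.

```python
from itertools import combinations
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(embeddings, distance=l2_distance):
    """Average a pairwise distance over all unordered pairs in a set.

    Hypothetical aggregation for illustration; the paper's exact
    formulation may differ.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two items to measure variability")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


# Made-up 2-D "embeddings" standing in for image features.
tight_set = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
spread_set = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0)]

print(mean_pairwise_distance(tight_set))
print(mean_pairwise_distance(spread_set))
```

A set whose members sit close together yields a smaller value than a widely spread set, which is the intuition behind measuring variability over an image set.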

Getting Started

Installation

  1. Install PyTorch for your Python 3.10+ environment.

  2. Install W1KP: pip install w1kp

  3. Download the calibration data file.

  4. You're done!
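A quick way to confirm the steps above worked is a small standard-library check like the sketch below. It assumes the calibration file is saved as `cdf-xy.pt` in the current working directory (the name used in the sample code); `check_setup` is a hypothetical helper, not part of the W1KP package.

```python
import importlib.util
from pathlib import Path


def check_setup(package="w1kp", data_file="cdf-xy.pt"):
    """Report whether the package is importable and the data file exists."""
    return {
        "package_installed": importlib.util.find_spec(package) is not None,
        "data_file_present": Path(data_file).is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If the data file lives elsewhere, pass its path as `data_file`.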

Sample Library Usage

We recommend $\text{DreamSim}_{\ell_2}$, the best-performing perceptual distance backbone in our paper.

import asyncio

import torch
from w1kp import StableDiffusionXLImageGenerator, DreamSimDistanceMeasure, query_inverted_cdf


async def amain():
  # Generate 10 SDXL images for a prompt
  prompt = 'cat'
  images = []
  image_gen = StableDiffusionXLImageGenerator()

  for seed in range(10):
    ret = await image_gen.generate_image(prompt, seed=seed)
    images.append(ret['image'])

  # Compute and normalize the W1KP score
  dreamsim_l2 = DreamSimDistanceMeasure().to_listwise()
  cdf_x, cdf_y = torch.load('cdf-xy.pt')  # download this data file from the repo

  dist = dreamsim_l2.measure(prompt, images)
  dist = query_inverted_cdf(cdf_x, cdf_y, dist)  # normalize to U[0, 1]
  w1kp_score = 1 - dist  # invert for the W1KP score

  for im in images:
    im.show()

  print(f'The W1KP score for the images is {w1kp_score}')
  

if __name__ == '__main__':
  asyncio.run(amain())
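The normalization step in the sample maps a raw distance onto [0, 1] using calibration data, then inverts it. The sketch below illustrates that idea with a plain empirical CDF over made-up calibration distances. It is not the library's `query_inverted_cdf` implementation, which may interpolate or behave differently; `empirical_cdf_value` and `normalized_score` are hypothetical names.

```python
from bisect import bisect_right


def empirical_cdf_value(calibration, value):
    """Fraction of calibration distances that are <= value."""
    ordered = sorted(calibration)
    return bisect_right(ordered, value) / len(ordered)


def normalized_score(calibration, raw_distance):
    """Normalize a raw distance to [0, 1], then invert it (1 - value)."""
    return 1.0 - empirical_cdf_value(calibration, raw_distance)


# Made-up calibration distances, for illustration only.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

print(normalized_score(calibration, 0.25))
```

Normalizing against a calibration distribution makes scores comparable across prompts, since each raw distance is expressed as its rank among reference distances rather than in the backbone's native units.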

Citation

@article{tang2024w1kp,
  title={Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation},
  author={Tang, Raphael and Zhang, Xinyu and Xu, Lixinyu and Lu, Yao and Li, Wenyan and Stenetorp, Pontus and Lin, Jimmy and Ture, Ferhan},
  journal={arXiv:2210.04885},
  year={2024}
}
