
Aphantasia

Open In Colab

This is a text-to-image tool, part of the artwork of the same name.
It is based on the CLIP model, with the FFT parameterizer from the Lucent library as a generator.
Update: see also the Colabs below, with VQGAN and SIREN+FFM generators.
Tested on Python 3.7; requires PyTorch 1.7.1.
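
To give a rough idea of what CLIP+FFT means in practice, here is a minimal, simplified sketch of the approach (not this project's actual code: crop count, sizes, learning rate and step count are illustrative, it needs a reasonably recent PyTorch for complex/FFT autograd, and the real tool adds many refinements):

import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16/fp32 mismatches in this toy loop

# standard CLIP input normalization constants
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

with torch.no_grad():
    txt = model.encode_text(clip.tokenize(["aphantasia"]).to(device)).float()
    txt = txt / txt.norm(dim=-1, keepdim=True)

H, W = 720, 1280
# image parameterized by a real-valued view of its half-spectrum (FFT parameterization)
spec = (0.01 * torch.randn(1, 3, H, W // 2 + 1, 2, device=device)).requires_grad_()
opt = torch.optim.Adam([spec], lr=0.05)

for step in range(200):
    img = torch.fft.irfftn(torch.view_as_complex(spec), s=(H, W), dim=(-2, -1))
    img = torch.sigmoid(img)                      # squash to [0, 1]
    crops = []
    for _ in range(16):                           # random cuts ("samples")
        size = int(torch.randint(224, min(H, W), (1,)))
        y = int(torch.randint(0, H - size + 1, (1,)))
        x = int(torch.randint(0, W - size + 1, (1,)))
        c = img[:, :, y:y + size, x:x + size]
        crops.append(F.interpolate(c, 224, mode="bilinear", align_corners=False))
    batch = (torch.cat(crops) - mean) / std
    emb = model.encode_image(batch).float()
    emb = emb / emb.norm(dim=-1, keepdim=True)
    loss = -(emb @ txt.T).mean()                  # maximize cosine similarity to the text
    opt.zero_grad()
    loss.backward()
    opt.step()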

Aphantasia is the inability to visualize mental images, the deprivation of visual dreams.
The image in the header was generated by the tool from this word.

Features

  • generating massive detailed textures, a la DeepDream
  • fast convergence!
  • Full HD/4K resolutions and above
  • various CLIP models (including a multi-language one from SBERT)
  • complex queries:
    • text and/or image as main prompts
    • additional text prompts for fine details and to subtract (avoid) topics
    • criteria inversion (show "the opposite")
  • continuous mode to process phrase lists (e.g. illustrating lyrics)
  • saving/loading parameters to resume processing

Setup CLIP et cetera:

pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
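
Optionally, you can check that CLIP imports correctly (this one-liner is just a sanity check, not part of the project's scripts):

python -c "import torch, clip; print(torch.__version__, clip.available_models())"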

Operations

  • Generate an image from the text prompt (set the size as you wish):
python clip_fft.py -t "the text" --size 1280-720
  • Reproduce an image:
python clip_fft.py -i theimage.jpg --sync 0.4

If the --sync X argument is greater than 0, an LPIPS loss is added to keep the composition similar to the original image.

You can combine both text and image prompts.
For non-English languages, use either --multilang (a multi-language CLIP model, trained with ViT) or --translate (Google translation; works with any visual model).
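
For instance, a combined text-and-image run might look like this (the image name and --sync value are illustrative):
python clip_fft.py -t "the text" -i theimage.jpg --sync 0.3 --size 1280-720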

  • Set a more specific query like this:
python clip_fft.py -t "macro figures" -t2 "micro details" -t0 "avoid this" --size 1280-720 
  • Other options:
    --model M selects one of the released CLIP visual models: ViT-B/32 (default), RN50, RN50x4, RN101.
    --align XX controls composition (or the sampling distribution, to be more precise): uniform is probably the most adequate; overscan can make semi-seamless tileable textures.
    --overscan mode processes a double-padded image to produce more uniform (and probably seamlessly tileable) textures. Omit it if you need a more centered composition.
    --steps N sets the iteration count. 50-100 is enough for a start; 500-1000 elaborates the image more thoroughly.
    --samples N sets the number of image cuts (samples) processed at each step. With more samples you can use fewer iterations for a similar result (and vice versa). 200/200 is a good guess. NB: GPU memory is consumed mostly by this count (not by the resolution)!
    --fstep N saves every Nth frame (useful with high iteration counts; default is 1).
    --decay X (compositional softness), --sharp X (sharpness), --colors X (saturation) and --contrast X may be useful, especially for ResNet models (they tend to burn the colors). Try --decay 1.5 --colors 1.3 --contrast 1.1 or --decay 1.5 --colors 1.5 --contrast 0.9 --sharp 0.3 to see the difference.
    --sharp may also help if the image turns "myopic" after increasing decay; a good start is ~0.3. It affects the other color parameters, so better tweak them all together.
    --enhance X boosts training consistency (across simultaneous samples) and step-to-step progress. A good start is ~0.2.
    --notext X tries to remove "graffiti" by subtracting the plotted text prompt. A good start is ~0.1.
    --noise X adds some noise to the parameters, possibly making the composition less clogged (to a degree).
    --lrate controls the learning rate. The range is quite wide (tested at least within 0.001 to 10).
    --macro X (from 0 to 1) shifts generation towards bigger forms and a less dispersed composition. It should not be too close to 1, since quality depends on the variety of samples.
    --prog sets a progressive learning rate (from 0.1x to 2x of the one set by --lrate). It may boost macro-form creation in some cases (more here).
    --invert negates the whole criterion, if you fancy checking "the total opposite".
    --save_pt myfile.pt saves the FFT parameters, so you can resume a next query with --resume myfile.pt.
    --verbose ('on' by default) enables some printouts and a realtime image preview.
    A combined example using several of these options is shown below.
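
For example, a run combining several of the options above might look like this (the values are just illustrative starting points taken from the notes above, not recommended settings):
python clip_fft.py -t "the text" --size 1920-1080 --model RN101 --steps 300 --samples 200 --decay 1.5 --colors 1.3 --contrast 1.1 --fstep 10 --save_pt myfile.pt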

Continuous mode

Open In Colab

  • Make a video from a text file, processing it line by line in one shot:
python illustra.py -i mysong.txt --size 1280-720 --length 155

This will first generate and save images for every text line (with sequences and training videos, as in the single-image mode above), then render the final video from those images (mixing them in FFT space), with a duration of length seconds.

There is a --keep X parameter, controlling how closely each line/image generation follows the previous one. By default X = 0, and every frame is produced independently (i.e. randomly initialized). Setting it higher starts each generation closer to the average of the previous runs, effectively keeping the compositions more similar and the transitions smoother. Safe values are < 0.5 (higher numbers may cause the imagery to get stuck). This behaviour depends on the input, so test with your prompts and see what works better in your case.
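
For example (the --keep value here is just an illustrative choice within the safe range):
python illustra.py -i mysong.txt --size 1280-720 --length 155 --keep 0.3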

  • Make a video from a directory with saved *.pt snapshots (just interpolating them):
python interpol.py -i mydir --length 155
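
As a very rough illustration of what such interpolation could look like, here is a hypothetical sketch only: it assumes each *.pt file stores a single FFT parameter tensor shaped like the one in the sketch near the top, which may not match the actual snapshot format, and the file names are made up.

import torch

a = torch.load("mydir/line_000.pt")      # hypothetical snapshot files
b = torch.load("mydir/line_001.pt")
H, W = 720, 1280
frames = []
for t in torch.linspace(0, 1, 25):       # 25 in-between frames
    spec = (1 - t) * a + t * b           # linear blend in FFT space
    img = torch.fft.irfftn(torch.view_as_complex(spec), s=(H, W), dim=(-2, -1))
    frames.append(torch.sigmoid(img))    # decode each blend to an image tensor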

Other generators

  • VQGAN from Taming Transformers
    Limited resolution (~800x600 max on Colab), but good coloring with fine details; one of the best methods available.
    Open In Colab

Credits

Based on the CLIP model by OpenAI (see the paper).
Uses FFT encoding from the Lucent library.

Thanks to Ryan Murdock, Jonathan Fly, Hannu Toyryla, @eduwatch2, torridgristle for ideas.
