
dangeng / visual_anagrams

Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"

License: MIT License

Python 0.91% Shell 0.06% Jupyter Notebook 99.03%

visual_anagrams's People

Contributors: dangeng, jbochi, tmzh


visual_anagrams's Issues

Issue creating conda environment with freshly cloned repository

Hello, I'm running into an issue when cloning the repository and following the steps to install the conda environment. I've pasted the output below. Any help that you can provide is appreciated!

  • os: macos/arm64
  • conda version: 23.10.0
  • python: 3.11.5
conda env create -f environment.yml            
Channels:
 - pytorch
 - nvidia
 - conda-forge
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - _libgcc_mutex==0.1=main
  - _openmp_mutex==5.1=1_gnu
  - blas==1.0=mkl
  - bottleneck==1.3.5=py39h7deecbd_0
  - brotli==1.0.9=h9c3ff4c_4
  - brotlipy==0.7.0=py39h27cfd23_1003
  - bzip2==1.0.8=h7b6447c_0
  - ca-certificates==2023.7.22=hbcca054_0
  - cffi==1.15.1=py39h5eee18b_3
  - contourpy==1.0.5=py39hdb19cb5_0
  - cryptography==41.0.3=py39hdda0065_0
  - cuda-cudart==12.1.105=0
  - cuda-cupti==12.1.105=0
  - cuda-libraries==12.1.0=0
  - cuda-nvrtc==12.1.105=0
  - cuda-nvtx==12.1.105=0
  - cuda-opencl==12.2.140=0
  - cuda-runtime==12.1.0=0
  - cyrus-sasl==2.1.28=h52b45da_1
  - dbus==1.13.18=hb2f20db_0
  - expat==2.5.0=h6a678d5_0
  - ffmpeg==4.3=hf484d3e_0
  - fontconfig==2.14.1=h4c34cd2_2
  - freetype==2.12.1=h4a9f257_0
  - giflib==5.2.1=h5eee18b_3
  - glib==2.69.1=he621ea3_2
  - gmp==6.2.1=h295c915_3
  - gmpy2==2.1.2=py39heeb90bb_0
  - gnutls==3.6.15=he1e5248_0
  - gst-plugins-base==1.14.1=h6a678d5_1
  - gstreamer==1.14.1=h5eee18b_1
  - icu==58.2=hf484d3e_1000
  - idna==3.4=py39h06a4308_0
  - intel-openmp==2023.1.0=hdb19cb5_46305
  - jinja2==3.1.2=py39h06a4308_0
  - jpeg==9e=h5eee18b_1
  - kiwisolver==1.4.4=py39h6a678d5_0
  - krb5==1.20.1=h143b758_1
  - lame==3.100=h7b6447c_0
  - lcms2==2.12=h3be6417_0
  - ld_impl_linux-64==2.38=h1181459_1
  - lerc==3.0=h295c915_0
  - libclang==14.0.6=default_hc6dbbc7_1
  - libclang13==14.0.6=default_he11475f_1
  - libcublas==12.1.0.26=0
  - libcufft==11.0.2.4=0
  - libcufile==1.7.2.10=0
  - libcups==2.4.2=h2d74bed_1
  - libcurand==10.3.3.141=0
  - libcusolver==11.4.4.55=0
  - libcusparse==12.0.2.55=0
  - libdeflate==1.17=h5eee18b_1
  - libedit==3.1.20221030=h5eee18b_0
  - libevent==2.1.12=hdbd6064_1
  - libffi==3.4.4=h6a678d5_0
  - libgcc-ng==11.2.0=h1234567_1
  - libgfortran-ng==13.2.0=h69a702a_0
  - libgfortran5==13.2.0=ha4646dd_0
  - libgomp==11.2.0=h1234567_1
  - libiconv==1.16=h7f8727e_2
  - libidn2==2.3.4=h5eee18b_0
  - libjpeg-turbo==2.0.0=h9bf148f_0
  - libllvm14==14.0.6=hdb19cb5_3
  - libnpp==12.0.2.50=0
  - libnvjitlink==12.1.105=0
  - libnvjpeg==12.1.1.14=0
  - libpng==1.6.39=h5eee18b_0
  - libpq==12.15=hdbd6064_1
  - libstdcxx-ng==11.2.0=h1234567_1
  - libtasn1==4.19.0=h5eee18b_0
  - libtiff==4.5.1=h6a678d5_0
  - libunistring==0.9.10=h27cfd23_0
  - libuuid==1.41.5=h5eee18b_0
  - libwebp==1.3.2=h11a3e52_0
  - libwebp-base==1.3.2=h5eee18b_0
  - libxcb==1.15=h7f8727e_0
  - libxkbcommon==1.0.1=h5eee18b_1
  - libxml2==2.10.4=hcbfbd50_0
  - libxslt==1.1.37=h2085143_0
  - llvm-openmp==14.0.6=h9e868ea_0
  - lz4-c==1.9.4=h6a678d5_0
  - matplotlib==3.7.2=py39h06a4308_0
  - matplotlib-base==3.7.2=py39h1128e8f_0
  - mkl==2023.1.0=h213fc3f_46343
  - mkl-service==2.4.0=py39h5eee18b_1
  - mkl_fft==1.3.8=py39h5eee18b_0
  - mkl_random==1.2.4=py39hdb19cb5_0
  - mpc==1.1.0=h10f8cd9_1
  - mpfr==4.0.2=hb69a4c5_1
  - mpmath==1.3.0=py39h06a4308_0
  - mysql==5.7.24=h721c034_2
  - ncurses==6.4=h6a678d5_0
  - nettle==3.7.3=hbbd107a_1
  - networkx==3.1=py39h06a4308_0
  - numexpr==2.8.7=py39h85018f9_0
  - numpy==1.26.0=py39h5f9d8c6_0
  - numpy-base==1.26.0=py39hb5e798b_0
  - openh264==2.1.1=h4ff587b_0
  - openjpeg==2.4.0=h3ad879b_0
  - openssl==3.0.11=h7f8727e_2
  - pandas==2.1.1=py39h1128e8f_0
  - pcre==8.45=h9c3ff4c_0
  - pillow==10.0.1=py39ha6cbd5a_0
  - pip==23.2.1=py39h06a4308_0
  - pyopenssl==23.2.0=py39h06a4308_0
  - pyqt==5.15.7=py39h6a678d5_1
  - pyqt5-sip==12.11.0=py39h6a678d5_1
  - pysocks==1.7.1=py39h06a4308_0
  - python==3.9.18=h955ad1f_0
  - pytorch==2.1.0=py3.9_cuda12.1_cudnn8.9.2_0
  - pytorch-cuda==12.1=ha16c6d3_5
  - qt-main==5.15.2=h7358343_9
  - qt-webengine==5.15.9=hbbf29b9_6
  - qtwebkit==5.212=h3fafdc1_5
  - readline==8.2=h5eee18b_0
  - requests==2.31.0=py39h06a4308_0
  - scipy==1.11.3=py39h5f9d8c6_0
  - setuptools==68.0.0=py39h06a4308_0
  - sip==6.6.2=py39h6a678d5_0
  - sqlite==3.41.2=h5eee18b_0
  - statsmodels==0.14.0=py39ha9d4c09_0
  - tbb==2021.8.0=hdb19cb5_0
  - tk==8.6.12=h1ccaba5_0
  - torchaudio==2.1.0=py39_cu121
  - torchtriton==2.1.0=py39
  - torchvision==0.16.0=py39_cu121
  - tornado==6.1=py39hb9d737c_3
  - typing_extensions==4.7.1=py39h06a4308_0
  - wheel==0.41.2=py39h06a4308_0
  - xz==5.4.2=h5eee18b_0
  - yaml==0.2.5=h7b6447c_0
  - zlib==1.2.13=h5eee18b_0
  - zstd==1.5.5=hc292b87_0

Current channels:

  - https://conda.anaconda.org/pytorch/osx-arm64
  - https://conda.anaconda.org/nvidia/osx-arm64
  - https://conda.anaconda.org/conda-forge/osx-arm64
  - https://repo.anaconda.com/pkgs/main/osx-arm64
  - https://repo.anaconda.com/pkgs/r/osx-arm64

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
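The underlying problem is that environment.yml pins exact linux-64 build strings (plus CUDA-only packages) that simply don't exist for osx-arm64. One workaround, sketched below as a hypothetical helper (not part of the repo), is to strip the exact build pins so the solver can pick platform-appropriate builds; the CUDA/pytorch-cuda lines would still need removing by hand on Apple silicon:

```python
import re

def strip_build_pin(line: str) -> str:
    """Drop the exact build string from a conda spec line, e.g.
    '  - numpy==1.26.0=py39h5f9d8c6_0' -> '  - numpy==1.26.0'.
    Lines without a build pin are returned unchanged."""
    return re.sub(r"^(\s*- \S+==[^=\s]+)=\S+\s*$", r"\1", line)

def relax_environment(src: str, dst: str) -> None:
    # Rewrite environment.yml with build pins removed so conda can
    # resolve platform-appropriate builds on osx-arm64.
    with open(src) as f:
        lines = [strip_build_pin(l.rstrip("\n")) for l in f]
    with open(dst, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Running `conda env create -f` on the relaxed file lets the solver substitute osx-arm64 builds where they exist.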

animate.py fails in example case

Running locally on an NVIDIA 3080 Max-Q with 16 GB VRAM

python generate.py --name rotate_cw.village.horse --prompts "a snowy mountain village" "a horse" --style "an oil painting of" --views identity rotate_cw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0

runs as expected but

python animate.py --im_path results/rotate_cw.village.horse/0000/sample_256.png --metadata_path results/rotate_cw.village.horse/metadata.pkl

fails with the error

100%|████████████████████████████████████████████████████████████████████████| 45/45 [00:00<00:00, 352.56it/s]
Making video...
Traceback (most recent call last):
  File "/home/dan/anagram/visual_anagrams/animate.py", line 169, in <module>
    animate_two_view(
  File "/home/dan/anagram/visual_anagrams/animate.py", line 123, in animate_two_view
    imageio.mimsave(save_video_path, image_array, fps=30)
  File "/home/dan/miniconda3/envs/visual_anagrams/lib/python3.9/site-packages/imageio/v2.py", line 494, in mimwrite
    with imopen(uri, "wI", **imopen_args) as file:
  File "/home/dan/miniconda3/envs/visual_anagrams/lib/python3.9/site-packages/imageio/core/imopen.py", line 281, in imopen
    raise err_type(err_msg)
ValueError: Could not find a backend to open `results/rotate_cw.village.horse/0000/sample_256.mp4` with iomode `wI`.
Based on the extension, the following plugins might add capable backends:
  FFMPEG: pip install imageio[ffmpeg]
  pyav: pip install imageio[pyav]

despite having imageio with the ffmpeg and pyav plugins installed in the conda environment.
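Since imageio resolves video backends at runtime, a quick sanity check is whether the interpreter that runs animate.py can actually see the plugins. The helper below is a hypothetical diagnostic, not part of the repo; all three values coming back False usually means animate.py is running in a different environment than the one the plugins were installed into:

```python
import importlib.util
import shutil

def diagnose_video_backends() -> dict:
    """Report whether the plugins imageio needs for .mp4 output are
    importable from *this* interpreter, and whether an ffmpeg binary
    is on PATH."""
    return {
        "imageio_ffmpeg": importlib.util.find_spec("imageio_ffmpeg") is not None,
        "av": importlib.util.find_spec("av") is not None,
        "ffmpeg_binary": shutil.which("ffmpeg") is not None,
    }

print(diagnose_video_backends())
```

Running this with the visual_anagrams environment activated should show which backend is missing.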

NameError: name 'stage_1' is not defined

When running the cell:

image_64 = sample_stage_1(stage_1, prompt_embeds, negative_prompt_embeds, views, num_inference_steps=30, guidance_scale=10.0, reduction='mean', generator=None)
mp.show_images([im_to_np(view.view(image_64[0])) for view in views])

There is an error:

NameError Traceback (most recent call last)
in <cell line: 1>()
----> 1 image_64 = sample_stage_1(stage_1,
2 prompt_embeds,
3 negative_prompt_embeds,
4 views,
5 num_inference_steps=30,

NameError: name 'stage_1' is not defined
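This NameError typically means the earlier setup cell that creates `stage_1` was never run (or the runtime was restarted afterwards). A hedged sketch of what that setup roughly looks like, assuming the DeepFloyd IF stage-1 checkpoint from the paper; the exact model id, variant, and dtype in the notebook may differ:

```python
def load_stage_1(device: str = "cuda"):
    """Load a DeepFloyd IF stage-1 pipeline for use with sample_stage_1.

    Imports are deferred so this sketch can be defined even without the
    heavy dependencies installed.
    """
    import torch
    from diffusers import DiffusionPipeline

    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-M-v1.0",  # assumption: the notebook may use a different IF size
        variant="fp16",
        torch_dtype=torch.float16,
    )
    return stage_1.to(device)
```

After a runtime restart, every cell above the sampling cell has to be re-run in order before the cell above will work.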

requirement `clip==1.0` causing installation failure

When installing this repository with

conda env create -f environment.yml

I get the error

ERROR: Could not find a version that satisfies the requirement clip==1.0 (from versions: 0.0.1, 0.1.0, 0.2.0)

After removing the line `- clip==1.0` from environment.yml, everything works fine.

On PyPI, `clip` by default refers to an old clipboard manager: https://pypi.org/project/clip/
If OpenAI CLIP is intended, the line should probably reference git+https://github.com/openai/CLIP.git instead.

However, since the package doesn't seem to be used anywhere, I would suggest removing the line entirely.
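For reference, if CLIP were actually needed, the pip section of environment.yml could point at the GitHub source instead. A sketch of what that fragment might look like (assuming a standard conda `pip:` subsection; the file's actual layout may differ):

```yaml
dependencies:
  - pip
  - pip:
      - git+https://github.com/openai/CLIP.git
```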

Negative prompts?

Hi! Firstly, many thanks for sharing this project - it's fascinating! :)

I'm trying to understand how/if negative prompts can be added to better guide the generation of each view... but I'm having some issues with my understanding. As far as I can see, the generate.py script generates a list of matching positive and negative prompt embeddings from the supplied command line prompts:

prompts = [f'{args.style} {p}'.strip() for p in args.prompts]
prompt_embeds = [stage_1.encode_prompt(p) for p in prompts]
prompt_embeds, negative_prompt_embeds = zip(*prompt_embeds)
prompt_embeds = torch.cat(prompt_embeds)
negative_prompt_embeds = torch.cat(negative_prompt_embeds)  # These are just null embeds


The final comment suggests that the negative prompt embedding is null, but the diffusers library states:

negative_prompt_embeds (`torch.Tensor`, *optional*):
  Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
  weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
  argument.

https://github.com/huggingface/diffusers/blob/fdb1baa05c8da5b4ed3e7a62200f406dcb26ba79/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L333C1-L336C26

So... should I be providing a negative_prompt input to sample_stage_1 instead? Any advice, or an example of how to add negative prompts, would be greatly appreciated!
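For what it's worth, diffusers' IF pipeline `encode_prompt` does accept a `negative_prompt` argument, so one option is to encode real negative prompts up front and pass the resulting embeddings through unchanged. A hedged sketch (`encode_with_negative` is a hypothetical helper; it mirrors the generate.py snippet above but swaps the null embeds for real ones):

```python
def encode_with_negative(stage_1, prompts, negative_prompts):
    """Encode matching positive/negative prompt pairs.

    Mirrors the generate.py logic, but passes a real negative prompt to
    encode_prompt instead of accepting the default null embeds.
    """
    import torch  # deferred: heavy dependency

    pairs = [
        stage_1.encode_prompt(p, negative_prompt=n)
        for p, n in zip(prompts, negative_prompts)
    ]
    prompt_embeds, negative_prompt_embeds = zip(*pairs)
    return torch.cat(prompt_embeds), torch.cat(negative_prompt_embeds)
```

The returned tensors could then be fed to sample_stage_1 in place of the originals, with one negative prompt per view.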

Higher resolution output

256px seems to be the maximum output size. Is there any argument we can pass to get higher-resolution output?
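The two IF stages used here top out at 256px, but DeepFloyd's own pipeline has a third x4 super-resolution stage (stabilityai/stable-diffusion-x4-upscaler) that takes 256px output to 1024px. A hedged sketch, assuming the standard diffusers upscaler API; note this stage is not part of generate.py and knows nothing about the multi-view constraint, so the illusion may degrade:

```python
def upscale_to_1024(image, prompt: str, device: str = "cuda"):
    """Upscale a 256px sample to 1024px with the SD x4 upscaler.

    Hypothetical post-processing step, not part of this repo's scripts.
    """
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    stage_3 = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler",
        torch_dtype=torch.float16,
    ).to(device)
    return stage_3(prompt=prompt, image=image).images[0]
```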

Optimal parameters?

Hi!
First of all, thanks for open-sourcing this amazing work!

I am currently trying the Colab notebook, and the results obtained so far don't match the fidelity of those in the X announcement thread. Any tips or tweaks to get the same quality of results?


Dataset details

Interesting work!

Is there any plan to release the training dataset and code?
