dangeng / visual_anagrams
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
License: MIT License
I can run the example scripts, but I only get static dual-image results.
How can I generate the animated/rotating GIF examples shown in the repo and on the https://dangeng.github.io/visual_anagrams/ page?
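(A hedged pointer, inferred from a command that appears further down this page rather than from the repo docs: the animations seem to be produced by a separate animate.py script run on a finished sample, e.g.)
python animate.py --im_path results/rotate_cw.village.horse/0000/sample_256.png --metadata_path results/rotate_cw.village.horse/metadata.pkl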
Hi,
Is there an alternative we can use instead of DeepFloyd? Thanks!
Can you please share the code for the random orthogonal transformation in Figure 7 of the paper?
Many thanks!
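Not the authors' implementation, but here is a minimal sketch of one concrete family of random orthogonal transformations in pixel space: a random permutation of pixels. A permutation matrix is orthogonal, so the inverse view is simply the inverse permutation. All names below are hypothetical.

import torch

def make_random_permutation_view(size=64, seed=0):
    # Draw one fixed random permutation of all pixel locations.
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(size * size, generator=g)
    inv = torch.argsort(perm)  # inverse permutation: perm[inv[i]] == i

    def view(im):
        # im: (C, H, W) tensor; shuffle pixels, keeping channels aligned.
        c, h, w = im.shape
        return im.reshape(c, -1)[:, perm].reshape(c, h, w)

    def inverse_view(im):
        c, h, w = im.shape
        return im.reshape(c, -1)[:, inv].reshape(c, h, w)

    return view, inverse_view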
Hello, I'm running into an issue when cloning the repository and following the steps to install the conda environment. I've pasted the output below. Any help that you can provide is appreciated!
conda env create -f environment.yml
Channels:
- pytorch
- nvidia
- conda-forge
- defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- _libgcc_mutex==0.1=main
- _openmp_mutex==5.1=1_gnu
- blas==1.0=mkl
- bottleneck==1.3.5=py39h7deecbd_0
- brotli==1.0.9=h9c3ff4c_4
- brotlipy==0.7.0=py39h27cfd23_1003
- bzip2==1.0.8=h7b6447c_0
- ca-certificates==2023.7.22=hbcca054_0
- cffi==1.15.1=py39h5eee18b_3
- contourpy==1.0.5=py39hdb19cb5_0
- cryptography==41.0.3=py39hdda0065_0
- cuda-cudart==12.1.105=0
- cuda-cupti==12.1.105=0
- cuda-libraries==12.1.0=0
- cuda-nvrtc==12.1.105=0
- cuda-nvtx==12.1.105=0
- cuda-opencl==12.2.140=0
- cuda-runtime==12.1.0=0
- cyrus-sasl==2.1.28=h52b45da_1
- dbus==1.13.18=hb2f20db_0
- expat==2.5.0=h6a678d5_0
- ffmpeg==4.3=hf484d3e_0
- fontconfig==2.14.1=h4c34cd2_2
- freetype==2.12.1=h4a9f257_0
- giflib==5.2.1=h5eee18b_3
- glib==2.69.1=he621ea3_2
- gmp==6.2.1=h295c915_3
- gmpy2==2.1.2=py39heeb90bb_0
- gnutls==3.6.15=he1e5248_0
- gst-plugins-base==1.14.1=h6a678d5_1
- gstreamer==1.14.1=h5eee18b_1
- icu==58.2=hf484d3e_1000
- idna==3.4=py39h06a4308_0
- intel-openmp==2023.1.0=hdb19cb5_46305
- jinja2==3.1.2=py39h06a4308_0
- jpeg==9e=h5eee18b_1
- kiwisolver==1.4.4=py39h6a678d5_0
- krb5==1.20.1=h143b758_1
- lame==3.100=h7b6447c_0
- lcms2==2.12=h3be6417_0
- ld_impl_linux-64==2.38=h1181459_1
- lerc==3.0=h295c915_0
- libclang==14.0.6=default_hc6dbbc7_1
- libclang13==14.0.6=default_he11475f_1
- libcublas==12.1.0.26=0
- libcufft==11.0.2.4=0
- libcufile==1.7.2.10=0
- libcups==2.4.2=h2d74bed_1
- libcurand==10.3.3.141=0
- libcusolver==11.4.4.55=0
- libcusparse==12.0.2.55=0
- libdeflate==1.17=h5eee18b_1
- libedit==3.1.20221030=h5eee18b_0
- libevent==2.1.12=hdbd6064_1
- libffi==3.4.4=h6a678d5_0
- libgcc-ng==11.2.0=h1234567_1
- libgfortran-ng==13.2.0=h69a702a_0
- libgfortran5==13.2.0=ha4646dd_0
- libgomp==11.2.0=h1234567_1
- libiconv==1.16=h7f8727e_2
- libidn2==2.3.4=h5eee18b_0
- libjpeg-turbo==2.0.0=h9bf148f_0
- libllvm14==14.0.6=hdb19cb5_3
- libnpp==12.0.2.50=0
- libnvjitlink==12.1.105=0
- libnvjpeg==12.1.1.14=0
- libpng==1.6.39=h5eee18b_0
- libpq==12.15=hdbd6064_1
- libstdcxx-ng==11.2.0=h1234567_1
- libtasn1==4.19.0=h5eee18b_0
- libtiff==4.5.1=h6a678d5_0
- libunistring==0.9.10=h27cfd23_0
- libuuid==1.41.5=h5eee18b_0
- libwebp==1.3.2=h11a3e52_0
- libwebp-base==1.3.2=h5eee18b_0
- libxcb==1.15=h7f8727e_0
- libxkbcommon==1.0.1=h5eee18b_1
- libxml2==2.10.4=hcbfbd50_0
- libxslt==1.1.37=h2085143_0
- llvm-openmp==14.0.6=h9e868ea_0
- lz4-c==1.9.4=h6a678d5_0
- matplotlib==3.7.2=py39h06a4308_0
- matplotlib-base==3.7.2=py39h1128e8f_0
- mkl==2023.1.0=h213fc3f_46343
- mkl-service==2.4.0=py39h5eee18b_1
- mkl_fft==1.3.8=py39h5eee18b_0
- mkl_random==1.2.4=py39hdb19cb5_0
- mpc==1.1.0=h10f8cd9_1
- mpfr==4.0.2=hb69a4c5_1
- mpmath==1.3.0=py39h06a4308_0
- mysql==5.7.24=h721c034_2
- ncurses==6.4=h6a678d5_0
- nettle==3.7.3=hbbd107a_1
- networkx==3.1=py39h06a4308_0
- numexpr==2.8.7=py39h85018f9_0
- numpy==1.26.0=py39h5f9d8c6_0
- numpy-base==1.26.0=py39hb5e798b_0
- openh264==2.1.1=h4ff587b_0
- openjpeg==2.4.0=h3ad879b_0
- openssl==3.0.11=h7f8727e_2
- pandas==2.1.1=py39h1128e8f_0
- pcre==8.45=h9c3ff4c_0
- pillow==10.0.1=py39ha6cbd5a_0
- pip==23.2.1=py39h06a4308_0
- pyopenssl==23.2.0=py39h06a4308_0
- pyqt==5.15.7=py39h6a678d5_1
- pyqt5-sip==12.11.0=py39h6a678d5_1
- pysocks==1.7.1=py39h06a4308_0
- python==3.9.18=h955ad1f_0
- pytorch==2.1.0=py3.9_cuda12.1_cudnn8.9.2_0
- pytorch-cuda==12.1=ha16c6d3_5
- qt-main==5.15.2=h7358343_9
- qt-webengine==5.15.9=hbbf29b9_6
- qtwebkit==5.212=h3fafdc1_5
- readline==8.2=h5eee18b_0
- requests==2.31.0=py39h06a4308_0
- scipy==1.11.3=py39h5f9d8c6_0
- setuptools==68.0.0=py39h06a4308_0
- sip==6.6.2=py39h6a678d5_0
- sqlite==3.41.2=h5eee18b_0
- statsmodels==0.14.0=py39ha9d4c09_0
- tbb==2021.8.0=hdb19cb5_0
- tk==8.6.12=h1ccaba5_0
- torchaudio==2.1.0=py39_cu121
- torchtriton==2.1.0=py39
- torchvision==0.16.0=py39_cu121
- tornado==6.1=py39hb9d737c_3
- typing_extensions==4.7.1=py39h06a4308_0
- wheel==0.41.2=py39h06a4308_0
- xz==5.4.2=h5eee18b_0
- yaml==0.2.5=h7b6447c_0
- zlib==1.2.13=h5eee18b_0
- zstd==1.5.5=hc292b87_0
Current channels:
- https://conda.anaconda.org/pytorch/osx-arm64
- https://conda.anaconda.org/nvidia/osx-arm64
- https://conda.anaconda.org/conda-forge/osx-arm64
- https://repo.anaconda.com/pkgs/main/osx-arm64
- https://repo.anaconda.com/pkgs/r/osx-arm64
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
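For context: the packages that fail to resolve (cuda-*, libgcc-ng, mkl, and so on) are linux-64 builds, while the report shows Platform: osx-arm64, so conda cannot satisfy them on Apple Silicon. Below is a rough, untested sketch of a looser environment file one might try on macOS; the package set is an assumption, and the model itself will still want a GPU:

name: visual_anagrams
channels:
  - pytorch
  - conda-forge
dependencies:
  - python=3.9
  - pytorch
  - torchvision
  - pip
  - pip:
      - diffusers
      - transformers
      - imageio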
Running locally on an NVIDIA 3080 Max-Q with 16 GB VRAM
python generate.py --name rotate_cw.village.horse --prompts "a snowy mountain village" "a horse" --style "an oil painting of" --views identity rotate_cw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
runs as expected, but
python animate.py --im_path results/rotate_cw.village.horse/0000/sample_256.png --metadata_path results/rotate_cw.village.horse/metadata.pkl
fails with the error:
100%|████████████████████████████████████████████████████████████████████████| 45/45 [00:00<00:00, 352.56it/s]
Making video...
Traceback (most recent call last):
File "/home/dan/anagram/visual_anagrams/animate.py", line 169, in
animate_two_view(
File "/home/dan/anagram/visual_anagrams/animate.py", line 123, in animate_two_view
imageio.mimsave(save_video_path, image_array, fps=30)
File "/home/dan/miniconda3/envs/visual_anagrams/lib/python3.9/site-packages/imageio/v2.py", line 494, in mimwrite
with imopen(uri, "wI", **imopen_args) as file:
File "/home/dan/miniconda3/envs/visual_anagrams/lib/python3.9/site-packages/imageio/core/imopen.py", line 281, in imopen
raise err_type(err_msg)
ValueError: Could not find a backend to open `results/rotate_cw.village.horse/0000/sample_256.mp4`` with iomode `wI`.
Based on the extension, the following plugins might add capable backends:
FFMPEG: pip install imageio[ffmpeg]
pyav: pip install imageio[pyav]
despite having imageio with the ffmpeg and pyav plugins installed in the conda environment.
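Not a confirmed fix, but a common cause is the plugins being installed into a different Python environment than the one animate.py runs in. With the visual_anagrams env activated, the following should reinstall the backend and confirm it is importable:
python -m pip install "imageio[ffmpeg]"
python -c "import imageio_ffmpeg; print(imageio_ffmpeg.get_ffmpeg_exe())"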
When running the cell:
image_64 = sample_stage_1(stage_1,
                          prompt_embeds,
                          negative_prompt_embeds,
                          views,
                          num_inference_steps=30,
                          guidance_scale=10.0,
                          reduction='mean',
                          generator=None)
mp.show_images([im_to_np(view.view(image_64[0])) for view in views])
There is an error:
NameError Traceback (most recent call last)
in <cell line: 1>()
----> 1 image_64 = sample_stage_1(stage_1,
2 prompt_embeds,
3 negative_prompt_embeds,
4 views,
5 num_inference_steps=30,
NameError: name 'stage_1' is not defined
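The NameError just means the setup cell that defines stage_1 never ran in this session (e.g. after a runtime restart). For reference, a sketch of the kind of cell that has to run first; the checkpoint name and dtype are assumptions based on standard DeepFloyd IF usage in diffusers, not necessarily what this notebook uses:

import torch
from diffusers import DiffusionPipeline

# Stage 1 of DeepFloyd IF (the 64x64 base model). Checkpoint choice is hypothetical.
stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-M-v1.0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")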
When installing this repository with
conda env create -f environment.yml
I get the error
ERROR: Could not find a version that satisfies the requirement clip==1.0 (from versions: 0.0.1, 0.1.0, 0.2.0)
When I remove the line "- clip==1.0" from environment.yml, it works fine.
For pip, "clip" by default refers to an old clipboard manager: https://pypi.org/project/clip/
If you mean OpenAI CLIP, the line should perhaps be changed to reference git+https://github.com/openai/CLIP.git instead.
However, because the package doesn't seem to be needed, I would suggest removing the line entirely.
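If the dependency were kept, the pip section of environment.yml would presumably look something like this (the exact layout is an assumption):

  - pip:
      - git+https://github.com/openai/CLIP.git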
Hi! Firstly, many thanks for sharing this project - it's fascinating! :)
I'm trying to understand how/if negative prompts can be added to better guide the generation of each view... but I'm having some issues with my understanding. As far as I can see, the generate.py script generates a list of matching positive and negative prompt embeddings from the supplied command line prompts:
prompts = [f'{args.style} {p}'.strip() for p in args.prompts]
prompt_embeds = [stage_1.encode_prompt(p) for p in prompts]
prompt_embeds, negative_prompt_embeds = zip(*prompt_embeds)
prompt_embeds = torch.cat(prompt_embeds)
negative_prompt_embeds = torch.cat(negative_prompt_embeds) # These are just null embeds
(generate.py, lines 50 to 54 at commit 491b76b)
The final comment suggests that the negative prompt embedding is null, but the diffusers library states:
negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument.
So... should I be providing a negative_prompt input to sample_stage_1 instead? Any advice, or an example of how to add negative prompts, would be greatly appreciated!
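Not tested against this repo, but diffusers' IF pipeline does accept a negative_prompt argument in encode_prompt, so one sketch mirroring the generate.py lines quoted above would be (negative_prompts is a hypothetical list with one entry per view; stage_1, prompts, and torch come from the surrounding script):

negative_prompts = ['blurry, low quality'] * len(prompts)
embeds = [stage_1.encode_prompt(p, negative_prompt=n)
          for p, n in zip(prompts, negative_prompts)]
prompt_embeds, negative_prompt_embeds = zip(*embeds)
prompt_embeds = torch.cat(prompt_embeds)
negative_prompt_embeds = torch.cat(negative_prompt_embeds)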
Thanks for a very interesting paper.
When do you expect to release the code for Factorized Diffusion?
I am very much looking forward to it!
256px seems to be the maximum image size. Is there any argument we can pass for higher-resolution output?
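Not a documented repo feature as far as this page shows, but DeepFloyd IF is commonly paired with the Stable Diffusion x4 upscaler as a third stage, which would take a 256px sample to 1024px. A hedged sketch, with the caveat that the upscaler is a latent model and may not preserve the illusion under the second view:

import torch
from diffusers import DiffusionPipeline
from PIL import Image

stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

image_256 = Image.open("results/rotate_cw.village.horse/0000/sample_256.png")
image_1024 = stage_3(
    prompt="an oil painting of a snowy mountain village",
    image=image_256,
    noise_level=100,
).images[0]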
Interesting work!
Is there any plan to release the training dataset and code?