This repository is an unofficial, experimental implementation of Visual Style Prompting.
The method appears to extract the style of a reference image and apply it to the generated image by replacing the keys and values of self-attention with the keys and values computed from the reference image, after the 24th layer of UpBlock in the U-Net. In my experiments it reproduces some styles, but it can also over-apply the reference's color scheme or produce broken details. I may have missed something from the paper, so treat this as an experimental implementation only.
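The key/value swap described above can be illustrated with a toy, dependency-free sketch. This is not the code in this repository: the names `attention`, `self_attention`, `should_swap`, and `start_layer` are hypothetical, the projections are taken to be the identity, and treating layer 24 itself as included is an assumption. A real implementation would operate on multi-head PyTorch tensors with learned projections.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    # Scaled dot-product attention over lists of token vectors.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_tok in q:
        scores = [scale * sum(a * b for a, b in zip(q_tok, k_tok)) for k_tok in k]
        weights = softmax(scores)
        out.append([sum(w * v_tok[d] for w, v_tok in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

def self_attention(gen_tokens, ref_tokens, swap):
    # Assumption: identity projections, so Q, K and V are the tokens themselves.
    q = gen_tokens  # queries always come from the image being generated
    source = ref_tokens if swap else gen_tokens
    return attention(q, source, source)  # keys and values come from `source`

def should_swap(layer_index, start_layer=24):
    # Assumption: swap from `start_layer` onward (inclusive boundary is a guess).
    return layer_index >= start_layer

gen = [[1.0, 0.0], [0.0, 1.0]]              # tokens of the generated image
ref = [[5.0, 5.0], [7.0, 7.0], [6.0, 6.0]]  # tokens of the reference image
for layer in (23, 24):
    out = self_attention(gen, ref, swap=should_swap(layer))
    print(layer, [[round(x, 3) for x in row] for row in out])
```

With the swap active, every output token is a weighted average of reference values, while the queries (and therefore the number of output tokens) still come from the generated image. That is one way to read why the layout follows the prompt while colors and textures follow the reference.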
Sample result (reference image and generated image), prompt: "a cat sitting in a city"
Python 3.10.9, CUDA 12.2
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Command
python inference_sdxl.py --guidance_scale 7.0 --num_inference_steps 50 --reference_image sample/ref2.png --prompt "low-poly style cat, low-poly game art, polygon mesh, jagged blocky, wireframe edges, centered composition, simple" --resolution 768 --num_samples 5
Sample results: two reference images, each shown with generated images of a cat and a motorcycle.
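To try several prompts against the same reference image, the command shown earlier can be assembled from Python. This is a minimal sketch: `build_command` is a hypothetical helper, not part of this repository, and it only uses the flags that appear in that command, with the same values as defaults.

```python
import shlex

def build_command(prompt, reference_image, guidance_scale=7.0,
                  num_inference_steps=50, resolution=768, num_samples=5):
    # Return the argument list for one inference_sdxl.py run.
    return [
        "python", "inference_sdxl.py",
        "--guidance_scale", str(guidance_scale),
        "--num_inference_steps", str(num_inference_steps),
        "--reference_image", reference_image,
        "--prompt", prompt,
        "--resolution", str(resolution),
        "--num_samples", str(num_samples),
    ]

cmd = build_command("a cat sitting in a city", "sample/ref2.png")
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal

# To actually run it (requires this repository and its dependencies):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when a prompt contains commas, quotes, or spaces.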