A hot-pluggable tool for visualizing LLaVA's attention.
The demo shows the attention on the image for each of the first 5 words: "The image features a woman".
- Install LLaVA by following the instructions in its official repository.
- Put attention.py into:
LLaVA-main/llava/eval
- Run it with this command:
cd LLaVA-main
python -m llava.eval.attention \
--checkpoint path/to/llava/checkpoint \
--image path/to/image \
--layer 32 \
--output path/to/output/result \
--max-length 64
https://github.com/junyangwang0410/Attention-LLaVA/blob/main/Mobile-Agent.mp4
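For readers curious how per-word attention over an image can be rendered, here is a minimal sketch of the general idea. It is not the code in attention.py. The function names `attention_to_heatmap` and `overlay` are hypothetical, and the 24 x 24 patch grid (576 image tokens) is an assumption that matches a 336-pixel input with 14-pixel patches; adjust it to your checkpoint's vision encoder.

```python
import numpy as np


def attention_to_heatmap(attn, grid_size, image_hw):
    """Reshape a 1-D attention vector over image patches into an image-sized map.

    attn      : one weight per image patch token (length grid_size ** 2)
    grid_size : patches per side (assumed square grid, e.g. 24)
    image_hw  : (height, width) of the image to overlay on
    """
    attn = np.asarray(attn, dtype=np.float64)
    if attn.size != grid_size * grid_size:
        raise ValueError("attention length does not match the patch grid")
    grid = attn.reshape(grid_size, grid_size)

    # Min-max normalize to [0, 1] so maps from different words are comparable.
    span = grid.max() - grid.min()
    grid = (grid - grid.min()) / span if span > 0 else np.zeros_like(grid)

    # Nearest-neighbour upsampling: map every pixel to the patch that covers it.
    h, w = image_hw
    rows = np.arange(h) * grid_size // h
    cols = np.arange(w) * grid_size // w
    return grid[np.ix_(rows, cols)]


def overlay(image, heatmap, alpha=0.5):
    """Blend the heatmap into an RGB uint8 image as a red tint."""
    heat_rgb = np.zeros(image.shape, dtype=np.float64)
    heat_rgb[..., 0] = heatmap * 255.0  # red channel carries the attention
    blended = (1.0 - alpha) * image.astype(np.float64) + alpha * heat_rgb
    return np.clip(blended, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Toy input: all attention on the top-left patch of a black image.
    attn = np.zeros(24 * 24)
    attn[0] = 1.0
    image = np.zeros((336, 336, 3), dtype=np.uint8)
    heatmap = attention_to_heatmap(attn, 24, image.shape[:2])
    result = overlay(image, heatmap)
    print(result.shape, result[0, 0], result[-1, -1])
```

In a real run, `attn` would be the row of the chosen layer's attention matrix for one generated word, restricted to the image token positions and typically averaged over heads.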