s9roll7 / animatediff-cli-prompt-travel

This project forked from neggles/animatediff-cli


animatediff prompt travel

License: Apache License 2.0


animatediff-cli-prompt-travel's Introduction

AnimateDiff prompt travel

AnimateDiff with prompt travel + ControlNet + IP-Adapter

I added an experimental feature to animatediff-cli for changing the prompt partway through the frames.

It seems to work surprisingly well!

Example

  • context_schedule "composite"
  • pros : more stable animation
  • cons : ignores prompts that require compositional changes
  • options : "uniform" (default) / "composite"
composite_sample1.mp4

  • controlnet for region
  • controlnet_openpose for fg
  • controlnet_tile(0.7) for bg
controlnet_for_region_sample1.mp4

  • added a new controlnet, animatediff-controlnet
  • It works like ip2p and is very useful for replacing characters
  • (This sample was generated at high resolution using the gradual latent hires fix)
  • more examples here
sample_animatediff_controlnet1.mp4

  • gradual latent hires fix
  • sd15 512x856 / sd15 768x1280 / sd15 768x1280 with gradual latent hires fix
  • more examples here
gradual_latent_hires_fix_sample.mp4

sdxl_turbo_sample.mp4


Click here to see old samples.



Installation (for Windows)

Same as the original animatediff-cli.
Python 3.10 and a git client must be installed.

git clone https://github.com/s9roll7/animatediff-cli-prompt-travel.git
cd animatediff-cli-prompt-travel
py -3.10 -m venv venv
venv\Scripts\activate.bat
set PYTHONUTF8=1
python -m pip install --upgrade pip
# Torch installation must be modified to suit the environment. (https://pytorch.org/get-started/locally/)
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python -m pip install -e .

# If you want to use the 'stylize' command, you will also need
python -m pip install -e .[stylize]

# If you want to use dwpose as a preprocessor for controlnet_openpose, you will also need
python -m pip install -e .[dwpose]
# (DWPose is a more powerful version of Openpose)

# If you want to use the 'stylize create-mask' and 'stylize composite' command, you will also need
python -m pip install -e .[stylize_mask]
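
# Optional sanity check: if the install succeeded, the CLI help screen should print here
# (no models need to be downloaded for this step).
animatediff --help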

(A community guide to the Windows install: https://www.reddit.com/r/StableDiffusion/comments/157c0wl/working_animatediff_cli_windows_install/)

I found detailed tutorials here:
(https://www.reddit.com/r/StableDiffusion/comments/16vlk9j/guide_to_creating_videos_with/)
(https://www.youtube.com/watch?v=7_hh3wOD81s)

How To Use

Almost the same as the original animatediff-cli, but with a slight change to the config format.

{
  "name": "sample",
  "path": "share/Stable-diffusion/mistoonAnime_v20.safetensors",  # Specify Checkpoint as a path relative to /animatediff-cli/data
  "lcm_map":{     # lcm-lora
    "enable":false,
    "start_scale":0.15,
    "end_scale":0.75,
    "gradient_start":0.2,
    "gradient_end":0.75
  },
  "gradual_latent_hires_fix_map":{ # gradual latent hires fix
    # This option addresses the chaotic results that can appear when generating beyond the model's native resolution.
    # It also has the effect of speeding up generation.
    "enable": false,    # enable/disable
    "scale": {    # "DENOISE PROGRESS" : LATENT SCALE format
      # In this example, for the first 70% of denoising the latent is computed at half the specified size;
      # from 70% to the end it is computed at the full specified size.
      "0": 0.5,
      "0.7": 1.0
    },
    "reverse_steps": 5,          # Number of reversal steps at latent size switching timing
    "noise_add_count":3          # Additive amount of noise at latent size switching timing
  },
  "vae_path":"share/VAE/vae-ft-mse-840000-ema-pruned.ckpt",       # Specify vae as a path relative to /animatediff-cli/data
  "motion_module": "models/motion-module/mm_sd_v14.ckpt",         # Specify motion module as a path relative to /animatediff-cli/data
  "context_schedule":"uniform",          # "uniform" or "composite"
  "compile": false,
  "seed": [
    341774366206100,-1,-1         # -1 means random. If "--repeats 3" is specified with this setting, the first run uses 341774366206100 and the second and third use random seeds.
  ],
  "scheduler": "ddim",      # "ddim","euler","euler_a","k_dpmpp_2m", etc...
  "steps": 40,
  "guidance_scale": 20,     # cfg scale
  "clip_skip": 2,
  "prompt_fixed_ratio": 0.5,
  "head_prompt": "masterpiece, best quality, a beautiful and detailed portriat of muffet, monster girl,((purple body:1.3)),humanoid, arachnid, anthro,((fangs)),pigtails,hair bows,5 eyes,spider girl,6 arms,solo",
  "prompt_map": {           # "FRAME" : "PROMPT" format / ex. prompt for frame 32 is "head_prompt" + prompt_map["32"] + "tail_prompt"
    "0":  "smile standing,((spider webs:1.0))",
    "32":  "(((walking))),((spider webs:1.0))",
    "64":  "(((running))),((spider webs:2.0)),wide angle lens, fish eye effect",
    "96":  "(((sitting))),((spider webs:1.0))"
  },
  "tail_prompt": "clothed, open mouth, awesome and detailed background, holding teapot, holding teacup, 6 hands,detailed hands,storefront that sells pastries and tea,bloomers,(red and black clothing),inside,pouring into teacup,muffetwear",
  "n_prompt": [
    "(worst quality, low quality:1.4),nudity,simple background,border,mouth closed,text, patreon,bed,bedroom,white background,((monochrome)),sketch,(pink body:1.4),7 arms,8 arms,4 arms"
  ],
  "lora_map": {             # "PATH_TO_LORA" : STRENGTH format
    "share/Lora/muffet_v2.safetensors" : 1.0,                     # Specify lora as a path relative to /animatediff-cli/data
    "share/Lora/add_detail.safetensors" : 1.0                     # Lora support is limited. Not all formats can be used!!!
  },
  "motion_lora_map": {      # "PATH_TO_LORA" : STRENGTH format
    "models/motion_lora/v2_lora_RollingAnticlockwise.ckpt":0.5,   # Currently, the officially distributed lora seems to work only for v2 motion modules (mm_sd_v15_v2.ckpt).
    "models/motion_lora/v2_lora_ZoomIn.ckpt":0.5
  },
  "ip_adapter_map": {       # config for ip-adapter
      # enable/disable (important)
      "enable": true,
      # Specify the input image directory relative to /animatediff-cli/data (important! Frames do not need to be specified in the config file; the images affect generation by the same placement logic as the prompt_map.)
      "input_image_dir": "ip_adapter_image/test",
      "prompt_fixed_ratio": 0.5,
      # save input image or not
      "save_input_image": true,
      # Ratio of image prompt vs text prompt (important). Even at 1.0, where only the image prompt is emphasized, do not leave the prompt/negative prompt empty; specify generic text such as "best quality".
      "scale": 0.5,
      # IP-Adapter/IP-Adapter Full Face/IP-Adapter Plus Face/IP-Adapter Plus/IP-Adapter Light (important). Each variant gives a completely different outcome, and Plus is not always the better result.
      "is_full_face": false,
      "is_plus_face": false,
      "is_plus": true,
      "is_light": false
  },
  "img2img_map": {
      # enable/disable
      "enable": true,
      # Directory where the initial image is placed
      "init_img_dir": "..\\stylize\\2023-10-27T19-43-01-sample-mistoonanime_v20\\00_img2img",
      "save_init_image": true,
      # The smaller the value, the closer the result will be to the initial image.
      "denoising_strength": 0.7
  },
  "region_map": {
      # setting for region 0. You can also add regions if necessary.
      # Regions added later are drawn in front of earlier ones.
      "0": {
          # enable/disable
          "enable": true,
          # If you want to draw a separate object for each region, enter a value of 0.1 or higher.
          "crop_generation_rate": 0.1,
          # Directory where mask images are placed
          "mask_dir": "..\\stylize\\2023-10-27T19-43-01-sample-mistoonanime_v20\\r_fg_00_2023-10-27T19-44-08\\00_mask",
          "save_mask": true,
          # If true, the initial image will be drawn as is (inpaint)
          "is_init_img": false,
          # conditions for region 0
          "condition": {
              # text prompt for region 0
              "prompt_fixed_ratio": 0.5,
              "head_prompt": "",
              "prompt_map": {
                  "0": "(masterpiece, best quality:1.2), solo, 1girl, kusanagi motoko, looking at viewer, jacket, leotard, thighhighs, gloves, cleavage"
               },
              "tail_prompt": "",
              # image prompt(ip adapter) for region 0
              # It is not possible to change lora for each region, but you can do something similar using an ip adapter.
              "ip_adapter_map": {
                  "enable": true,
                  "input_image_dir": "..\\stylize\\2023-10-27T19-43-01-sample-mistoonanime_v20\\r_fg_00_2023-10-27T19-44-08\\00_ipadapter",
                  "prompt_fixed_ratio": 0.5,
                  "save_input_image": true,
                  "resized_to_square": false
              }
          }
      },
      # setting for background
      "background": {
          # If true, the initial image will be drawn as is (inpaint)
          "is_init_img": true,
          "hint": "background's condition refers to the one in root"
      }
  },
  "controlnet_map": {       # config for controlnet(for generation)
    "input_image_dir" : "controlnet_image/test",    # Specify input image directory relative to /animatediff-cli/data (important! Please refer to the directory structure of sample. No need to specify frames in the config file.)
    "max_samples_on_vram" : 200,    # If you specify a large number of images for controlnet and vram will not be enough, reduce this value. 0 means that everything should be placed in cpu.
    "max_models_on_vram" : 3,       # Number of controlnet models to be placed in vram
    "save_detectmap" : true,        # save preprocessed image or not
    "preprocess_on_gpu": true,      # run preprocess on gpu or not (It probably does not affect vram usage at peak, so it should always set true.)
    "is_loop": true,                # Whether controlnet effects consider loop

    "controlnet_tile":{    # config for controlnet_tile
      "enable": true,              # enable/disable (important)
      "use_preprocessor":true,      # Whether to use a preprocessor for each controlnet type
      "preprocessor":{     # If not specified, the default preprocessor is selected.(Most of the time the default should be fine.)
        # none/blur/tile_resample/upernet_seg/ or key in controlnet_aux.processor.MODELS
        # https://github.com/patrickvonplaten/controlnet_aux/blob/2fd027162e7aef8c18d0a9b5a344727d37f4f13d/src/controlnet_aux/processor.py#L20
        "type" : "tile_resample",
        "param":{
          "down_sampling_rate":2.0
        }
      },
      "guess_mode":false,
      # control weight (important)
      "controlnet_conditioning_scale": 1.0,
      # starting control step
      "control_guidance_start": 0.0,
      # ending control step
      "control_guidance_end": 1.0,
      # list of influences on neighboring frames (important)
      # This means the control has an influence of 0.5 on both immediately neighboring frames and 0.4 on the frames next to those. Try lengthening, shortening, or changing the values.
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1],
      # list of regions where controlnet works
      # In this example, it only affects region "0", but not "background".
      "control_region_list": ["0"]
    },
    "controlnet_ip2p":{
      "enable": true,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1],
      # In this example, all regions are affected
      "control_region_list": []
    },
    "controlnet_lineart_anime":{
      "enable": true,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1],
      # In this example, it only affects region "background", but not "0".
      "control_region_list": ["background"]
    },
    "controlnet_openpose":{
      "enable": true,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1],
      # In this example, all regions are affected (since these are the only two regions defined)
      "control_region_list": ["0", "background"]
    },
    "controlnet_softedge":{
      "enable": true,
      "use_preprocessor":true,
      "preprocessor":{
        "type" : "softedge_pidsafe",
        "param":{
        }
      },
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_ref": {
        "enable": false,            # enable/disable (important)
        "ref_image": "ref_image/ref_sample.png",     # path to reference image.
        "attention_auto_machine_weight": 1.0,
        "gn_auto_machine_weight": 1.0,
        "style_fidelity": 0.5,                # control weight-like parameter(important)
        "reference_attn": true,               # [attn=true , adain=false] means "reference_only"
        "reference_adain": false,
        "scale_pattern":[0.5]                 # Pattern for applying controlnet_ref to frames
    }                                         # ex. [0.5] means [0.5,0.5,0.5,0.5,0.5 .... ]. All frames are affected by 50%
                                              # ex. [1, 0] means [1,0,1,0,1,0,1,0,1,0,1 ....]. Only even frames are affected by 100%.
  },
  "upscale_config": {       # config for tile-upscale
    "scheduler": "ddim",
    "steps": 20,
    "strength": 0.5,
    "guidance_scale": 10,
    "controlnet_tile": {    # config for controlnet tile
      "enable": true,       # enable/disable (important)
      "controlnet_conditioning_scale": 1.0,     # control weight (important)
      "guess_mode": false,
      "control_guidance_start": 0.0,      # starting control step
      "control_guidance_end": 1.0         # ending control step
    },
    "controlnet_line_anime": {  # config for controlnet line anime
      "enable": false,
      "controlnet_conditioning_scale": 1.0,
      "guess_mode": false,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0
    },
    "controlnet_ip2p": {  # config for controlnet ip2p
      "enable": false,
      "controlnet_conditioning_scale": 0.5,
      "guess_mode": false,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0
    },
    "controlnet_ref": {   # config for controlnet ref
      "enable": false,             # enable/disable (important)
      "use_frame_as_ref_image": false,   # use original frames as ref_image for each upscale (important)
      "use_1st_frame_as_ref_image": false,   # use 1st original frame as ref_image for all upscale (important)
      "ref_image": "ref_image/path_to_your_ref_img.jpg",   # use specified image file as ref_image for all upscale (important)
      "attention_auto_machine_weight": 1.0,
      "gn_auto_machine_weight": 1.0,
      "style_fidelity": 0.25,       # control weight-like parameter(important)
      "reference_attn": true,       # [attn=true , adain=false] means "reference_only"
      "reference_adain": false
    }
  },
  "output":{   # output format 
    "format" : "gif",   # gif/mp4/webm
    "fps" : 8,
    "encode_param":{
      "crf": 10
    }
  }
}
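
For orientation, a trimmed-down config with only the most commonly edited keys might look like the sketch below (values are reused from the full example above; this is an illustration, not a guaranteed-minimal or runnable file):

{
  "name": "sample",
  "path": "share/Stable-diffusion/mistoonAnime_v20.safetensors",
  "motion_module": "models/motion-module/mm_sd_v14.ckpt",
  "seed": [341774366206100],
  "scheduler": "k_dpmpp_sde",
  "steps": 20,
  "guidance_scale": 10,
  "prompt_fixed_ratio": 0.5,
  "head_prompt": "masterpiece, best quality, solo",
  "prompt_map": {
    "0": "smile, standing",
    "32": "walking",
    "64": "running"
  },
  "n_prompt": ["(worst quality, low quality:1.4)"],
  "output": { "format": "gif", "fps": 8 }
}
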
cd animatediff-cli-prompt-travel
venv\Scripts\activate.bat

# With this setup, it took about a minute to generate in my environment (RTX 4090). VRAM usage was 6-7 GB.
# width 256 / height 384 / length 128 frames / context 16 frames
animatediff generate -c config/prompts/prompt_travel.json -W 256 -H 384 -L 128 -C 16
# 5min / 9-10GB
animatediff generate -c config/prompts/prompt_travel.json -W 512 -H 768 -L 128 -C 16

# upscale using controlnet (tile, line anime, ip2p, ref)
# specify the directory of the frames generated in the step above
# the default config path is 'frames_dir/../prompt.json'
# here width=512 is specified; even if the original size is already 512, this is effective at adding detail
animatediff tile-upscale PATH_TO_TARGET_FRAME_DIRECTORY -c config/prompts/prompt_travel.json -W 512

# upscale width to 768 (smoother than tile-upscale)
animatediff refine PATH_TO_TARGET_FRAME_DIRECTORY -W 768
# If generation takes an unusually long time, you are running out of vram.
# Either give up on the larger size or reduce the context size.
animatediff refine PATH_TO_TARGET_FRAME_DIRECTORY -W 1024 -C 6

# change lora and prompt to make minor changes to the video.
animatediff refine PATH_TO_TARGET_FRAME_DIRECTORY -c config/prompts/some_minor_changed.json
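
# Putting the steps together (the output directory name below is illustrative; use the
# frame directory actually printed by the generate step):
animatediff generate -c config/prompts/prompt_travel.json -W 256 -H 384 -L 128 -C 16
animatediff tile-upscale output/2023-08-25T20-00-00-sample-mistoonanime_v20/00-341774366206100 -c config/prompts/prompt_travel.json -W 512
animatediff refine output/2023-08-25T20-00-00-sample-mistoonanime_v20/00-341774366206100 -W 768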

Video Stylization

cd animatediff-cli-prompt-travel
venv\Scripts\activate.bat

# If you want to use the 'stylize' command, an additional installation is required
python -m pip install -e .[stylize]

# create config file from src video
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4

# create config file from src video (img2img)
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4 -i2i

# If you have less than 12GB of vram, specify low vram mode
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4 -lo

# Edit the config file by referring to the hint displayed in the log when the command finishes
# It is recommended to specify a short length for the test run

# generate(test run)
# 16 frames
animatediff stylize generate STYLYZE_DIR -L 16
# 16 frames from the 200th frame
animatediff stylize generate STYLYZE_DIR -L 16 -FO 200

# If generation takes an unusually long time, you are running out of vram.
# Either give up on the larger size or reduce the context size.

# generate
animatediff stylize generate STYLYZE_DIR

Video Stylization with region

cd animatediff-cli-prompt-travel
venv\Scripts\activate.bat

# If you want to use the 'stylize create-region' command, an additional installation is required
python -m pip install -e .[stylize_mask]

# [1] create config file from src video
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4
# for img2img
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4 -i2i

# If you have less than 12GB of vram, specify low vram mode
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4 -lo
# in prompt.json (generated in [1])
# [2] write the object you want to mask
# ex.) If you want to mask a person
    "stylize_config": {
        "create_mask": [
            "person"
        ],
        "composite": {
# [3] generate region
animatediff stylize create-region STYLYZE_DIR

# If you have less than 12GB of vram, specify low vram mode
animatediff stylize create-region STYLYZE_DIR -lo

("animatediff stylize create-region -h" for help)
# in prompt.json (generated in [1])
# [4] Edit the region_map, prompt, and controlnet settings. Put the images you want to reference in the ip adapter directory (for both background and region)
  "region_map": {
      "0": {
          "enable": true,
          "mask_dir": "..\\stylize\\2023-10-27T19-43-01-sample-mistoonanime_v20\\r_fg_00_2023-10-27T19-44-08\\00_mask",
          "save_mask": true,
          "is_init_img": false, # <----------
          "condition": {
              "prompt_fixed_ratio": 0.5,
              "head_prompt": "",  # <----------
              "prompt_map": {  # <----------
                  "0": "(masterpiece, best quality:1.2), solo, 1girl, kusanagi motoko, looking at viewer, jacket, leotard, thighhighs, gloves, cleavage"
               },
              "tail_prompt": "",  # <----------
              "ip_adapter_map": {
                  "enable": true,
                  "input_image_dir": "..\\stylize\\2023-10-27T19-43-01-sample-mistoonanime_v20\\r_fg_00_2023-10-27T19-44-08\\00_ipadapter",
                  "prompt_fixed_ratio": 0.5,
                  "save_input_image": true,
                  "resized_to_square": false
              }
          }
      },
      "background": {
          "is_init_img": false,  # <----------
          "hint": "background's condition refers to the one in root"
      }
  },
# [5] generate
animatediff stylize generate STYLYZE_DIR

Video Stylization with mask

cd animatediff-cli-prompt-travel
venv\Scripts\activate.bat

# If you want to use the 'stylize create-mask' command, an additional installation is required
python -m pip install -e .[stylize_mask]

# [1] create config file from src video
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4

# If you have less than 12GB of vram, specify low vram mode
animatediff stylize create-config YOUR_SRC_MOVIE_FILE.mp4 -lo
# in prompt.json (generated in [1])
# [2] write the object you want to mask
# ex.) If you want to mask a person
    "stylize_config": {
        "create_mask": [
            "person"
        ],
        "composite": {
# ex.) person, dog, cat
    "stylize_config": {
        "create_mask": [
            "person", "dog", "cat"
        ],
        "composite": {
# ex.) boy, girl
    "stylize_config": {
        "create_mask": [
            "boy", "girl"
        ],
        "composite": {
# [3] generate mask
animatediff stylize create-mask STYLYZE_DIR

# If you have less than 12GB of vram, specify low vram mode
animatediff stylize create-mask STYLYZE_DIR -lo

# The foreground is output to the following directory (FG_STYLYZE_DIR)
# STYLYZE_DIR/fg_00_timestamp_str
# The background is output to the following directory (BG_STYLYZE_DIR)
# STYLYZE_DIR/bg_timestamp_str

("animatediff stylize create-mask -h" for help)

# [4] generate foreground
animatediff stylize generate FG_STYLYZE_DIR

# Same as normal generate.
# The default is controlnet_tile, so if you want to make a big style change,
# such as changing the character, change to openpose, etc.

# Of course, you can also generate the background here.
# in prompt.json (generated in [1])
# [5] composite setup
# enter the directory containing the frames generated in [4] in "fg_list".
# In the "mask_prompt" field, write the object you want to extract from the generated foreground frame.
# If you prepared the mask yourself, specify it in mask_path; if a valid path is set, it will be used.
# If the shape has not changed when the foreground is generated, FG_STYLYZE_DIR/00_mask can be used
# enter the directory containing the background frames separated in [3] in "bg_frame_dir".
        "composite": {
            "fg_list": [
                {
                    "path": "FG_STYLYZE_DIR/time_stamp_str/00-341774366206100",
                    "mask_path": " absolute path to mask dir (this is optional) ",
                    "mask_prompt": "person"
                },
                {
                    "path": " absolute path to frame dir ",
                    "mask_path": " absolute path to mask dir (this is optional) ",
                    "mask_prompt": "cat"
                }
            ],
            "bg_frame_dir": "BG_STYLYZE_DIR/00_controlnet_image/controlnet_tile",
            "hint": ""
        },
# [6] composite
animatediff stylize composite STYLYZE_DIR

# By default, "sam hq" and "groundingdino" are used for cropping, but it is not always possible to crop the image well.
# In that case, you can try "rembg" or "anime-segmentation".
# However, when using "rembg" and "anime-segmentation", you cannot specify the target text to be cropped.
animatediff stylize composite STYLYZE_DIR -rem
animatediff stylize composite STYLYZE_DIR -anim

# See help for detailed options. (animatediff stylize composite -h)

Auto config generation for Stable-Diffusion-Webui-Civitai-Helper users

# This command parses the *.civitai.info files and automatically generates config files
# See "animatediff civitai2config -h" for details
animatediff civitai2config PATH_TO_YOUR_A111_LORA_DIR

Wildcard

  • You can pick up wildcards on civitai, then put them in /wildcards.
  • Usage is the same as a1111 (__WILDCARDFILENAME__ format, e.g. __animal__ for animal.txt, __background-color__ for background-color.txt).
  "prompt_map": {           # __WILDCARDFILENAME__
    "0":  "__character-posture__, __character-gesture__, __character-emotion__, masterpiece, best quality, a beautiful and detailed portriat of muffet, monster girl,((purple body:1.3)), __background__",

Recommended settings

  • checkpoint : mistoonAnime_v20 for anime, xxmix9realistic_v40 for photoreal
  • scheduler : "k_dpmpp_sde"
  • upscale : Enable controlnet_tile and controlnet_ip2p only.
  • lora and ip adapter

Recommended settings for 8-12 GB of vram

  • max_samples_on_vram : 0
  • max_models_on_vram : 0
  • Generate at a lower resolution, then upscale to a higher resolution with a lower context value.
  • In the latest version, the amount of vram used during generation has been reduced.
animatediff generate -c config/prompts/your_config.json -W 384 -H 576 -L 48 -C 16
animatediff tile-upscale output/2023-08-25T20-00-00-sample-mistoonanime_v20/00-341774366206100 -W 512
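
In the config, these two settings live at the top of "controlnet_map" (only the relevant keys are shown; everything else stays as in the full example above):

  "controlnet_map": {
    "max_samples_on_vram" : 0,    # 0 = keep all controlnet input samples on the cpu
    "max_models_on_vram" : 0,     # 0 = keep the controlnet models out of vram
    ...
  },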

Limitations

  • lora support is limited. Not all formats can be used!!!
  • It is not possible to specify lora in the prompt.

Related resources






Below is the original readme.


animatediff

pre-commit.ci status

animatediff refactor, because I can. with significantly lower VRAM usage.

Also, infinite generation length support! yay!

LoRA loading is ABSOLUTELY NOT IMPLEMENTED YET!

This can theoretically run on CPU, but it's not recommended. Should work fine on a GPU, nVidia or otherwise, but I haven't tested on non-CUDA hardware. Uses PyTorch 2.0 Scaled-Dot-Product Attention (aka builtin xformers) by default, but you can pass --xformers to force using xformers if you really want.

How To Use

  1. Lie down
  2. Try not to cry
  3. Cry a lot

but for real?

Okay, fine. But it's still a little complicated and there's no webUI yet.

git clone https://github.com/neggles/animatediff-cli
cd animatediff-cli
python3.10 -m venv .venv
source .venv/bin/activate
# install Torch. Use whatever your favourite torch version >= 2.0.0 is, but, good luck on non-nVidia...
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# install the rest of all the things (probably! I may have missed some deps.)
python -m pip install -e '.[dev]'
# you should now be able to
animatediff --help
# There's a nice pretty help screen with a bunch of info that'll print here.

From here you'll need to put whatever checkpoint you want to use into data/models/sd, copy one of the prompt configs in config/prompts, edit it with your choices of prompt and model (model paths in prompt .json files are relative to data/, e.g. models/sd/vanilla.safetensors), and off you go.

Then it's something like (for an 8GB card):

animatediff generate -c 'config/prompts/waifu.json' -W 576 -H 576 -L 128 -C 16

You may have to drop -C down to 8 on cards with less than 8GB VRAM, and you can raise it to 20-24 on cards with more. 24 is max.
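
For example (the same command as above, with only -C changed):

animatediff generate -c 'config/prompts/waifu.json' -W 576 -H 576 -L 128 -C 8    # card with < 8GB VRAM
animatediff generate -c 'config/prompts/waifu.json' -W 576 -H 576 -L 128 -C 24   # high-VRAM card (24 is the max)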

N.B. generating 128 frames is slow...

RiFE!

I have added experimental support for rife-ncnn-vulkan using the animatediff rife interpolate command. It has fairly self-explanatory help, and it has been tested on Linux, but I've no idea if it'll work on Windows.

Either way, you'll need ffmpeg installed on your system and present in PATH, and you'll need to download the rife-ncnn-vulkan release for your OS of choice from the GitHub repo (above). Unzip it, and place the extracted folder at data/rife/. You should have a data/rife/rife-ncnn-vulkan executable, or data\rife\rife-ncnn-vulkan.exe on Windows.

You'll also need to reinstall the repo/package with:

python -m pip install -e '.[rife]'

or just install ffmpeg-python manually yourself.
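
Once that's done, the subcommand's built-in help (which, as noted above, is fairly self-explanatory) lists the available options:

animatediff rife interpolate --help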

Default is to multiply each frame by 8, turning an 8fps animation into a 64fps one, then encode that to a 60fps WebM. (If you pick GIF mode, it'll be 50fps, because GIFs are cursed and encode frame durations as 1/100ths of a second).

Seems to work pretty well...

TODO:

In no particular order:

  • Infinite generation length support
  • RIFE support for motion interpolation (rife-ncnn-vulkan isn't the greatest implementation)
  • Export RIFE interpolated frames to a video file (webm, mp4, animated webp, hevc mp4, gif, etc.)
  • Generate infinite length animations on a 6-8GB card (at 512x512 with 8-frame context, but hey it'll do)
  • Torch SDP Attention (makes xformers optional)
  • Support for clip_skip in prompt config
  • Experimental support for torch.compile() (upstream Diffusers bugs slow this down a little but it's still zippy)
  • Batch your generations with --repeat! (e.g. --repeat 10 will repeat all your prompts 10 times)
  • Call the animatediff.cli.generate() function from another Python program without reloading the model every time
  • Drag remaining old Diffusers code up to latest (mostly)
  • Add a webUI (maybe, there are people wrapping this already so maybe not?)
  • img2img support (start from an existing image and continue)
  • Stop using custom modules where possible (should be able to use Diffusers for almost all of it)
  • Automatic generate-then-interpolate-with-RIFE mode

Credits:

see guoyww/AnimateDiff (very little of this is my work)

n.b. the copyright notice in COPYING is missing the original authors' names, solely because the original repo (as of this writing) has no name attached to the license. I have, however, used the same license they did (Apache 2.0).

animatediff-cli-prompt-travel's People

Contributors

jcbrouwer, neggles, pre-commit-ci[bot], s9roll7, skquark, threeal


animatediff-cli-prompt-travel's Issues

Any tips for upscaling?

Whenever I run an upscale (usually with controlnet_tile) it seems to introduce a lot of jittering and variance from frame to frame. Curious if anyone has had success with upscale settings to polish up generations.

[Feature Request] Options for advanced controlnet settings

The controlnet extension in webui allows the use of different control modes; from the repo wiki:
"control_mode" : see the related issue for usage. defaults to 0. Accepted values:

0 or "Balanced" : balanced, no preference between prompt and control model
1 or "My prompt is more important" : the prompt has more impact than the model
2 or "ControlNet is more important" : the controlnet model has more impact than the prompt

This would allow more control over how the controlnet influences the output.
The pixel perfect function is also handy, as it allows sizing the controlnet picture to the output size of the rendering.

New motion lora support?

Curious if anyone's taking a stab at implementing the new motion LoRAs for camera movements? It strikes me that this project may handle those movements even better than the base repo, since you could in theory transition from camera movement to camera movement.

[Test] Stylize Video

  1. Tile -> upscale
    motion_module : mm_sd_v14.ckpt
    steps : 20
    guidance_scale : 10
    [0]
    ip_adapter_plus ("is_plus_face": false, "is_plus": true) / scale : 0.5
    controlnet_tile / controlnet_conditioning_scale : 0.75
    size : 512x512
    context : 16
    [1]
    ip_adapter_plus / scale : 0.5
    controlnet_tile / controlnet_conditioning_scale : 1.0
    size : 1024x1024
    context : 8
style_tile_sample.mp4

  2. Lineart -> upscale
    motion_module : mm_sd_v15_v2.ckpt
    steps : 20
    guidance_scale : 10
    [0]
    ip_adapter_plus / scale : 0.5
    controlnet_lineart / controlnet_conditioning_scale : 1.0
    controlnet_ip2p / controlnet_conditioning_scale : 0.5
    size : 512x512
    context : 16
    [1]
    ip_adapter_plus / scale : 0.5
    controlnet_tile / controlnet_conditioning_scale : 1.0
    controlnet_ip2p / controlnet_conditioning_scale : 0.5
    size : 768x768
    context : 8
lineart_style_sample.mp4

  3. Openpose -> upscale
    motion_module : mm_sd_v15_v2.ckpt
    steps : 20
    guidance_scale : 10
    [0]
    ip_adapter_plus / scale : 0.5
    controlnet_openpose / controlnet_conditioning_scale : 1.0
    size : 512x512
    context : 16
    [1]
    ip_adapter_plus / scale : 0.5
    controlnet_tile / controlnet_conditioning_scale : 1.0
    size : 1024x1024
    context : 8
openpose_style_sample.mp4

  4. Softedge -> upscale
    motion_module : mm-Stabilized_high.pth
    steps : 20
    guidance_scale : 10
    [0]
    ip_adapter_plus / scale : 0.5
    controlnet_softedge / controlnet_conditioning_scale : 1.0
    controlnet_ip2p / controlnet_conditioning_scale : 0.5
    size : 512x760
    context : 16
    [1]
    ip_adapter_plus / scale : 0.5
    controlnet_tile / controlnet_conditioning_scale : 1.0
    size : 768x1136
    context : 8
softedge_style_sample.mp4

mm_sd_v15_v2.safetensors does not exist or is not a file!

In generate.py, I am getting this error

Motion module /home/foo/apps/animatediff-cli-prompt-travel/data/models/Motion_Module/mm_sd_v15_v2.safetensors does not exist or is not a file!

Coming from this line of code:

            if not (motion_module.exists() and motion_module.is_file()):
                # this should never happen, but just in case...
                raise FileNotFoundError(f"Motion module {motion_module} does not exist or is not a file!")

Looks like the logic is checking for the file to be safetensors but I have a ckpt file. Thanks

[Test] Guess mode test!

I tested the newly added Guess mode support!

When Guess mode is used, the effect on the background seems to be stronger. I guess more testing is needed.

26.mp4


Where should I put the controlnet model

I already have ControlNet 1.1 models in my WebUI project (the model extension is .pth). I don't see a setting for the model path in the config, and if I generate, it downloads controlnet models with the .safetensors extension. How do I use my own controlnet files?

pad_to_multiple_of

Using negative embeddings gives this in the terminal. I'm just ignoring it, but:
You are resizing the embedding layer without providing a pad_to_multiple_of parameter. This means that the new embedding dimension will be 49499. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

Color change problems when using controlnet inpaint

I tested using the inpaint controlnet. It was an attempt to duplicate the input image, not to inpaint. The attempt was somewhat successful, but the color changed. I tried various approaches, but I couldn't find the cause of the problem. Is there a way to solve it? Thanks!


Frozen when trying stylize command

Getting stuck after running the command:
animatediff stylize create-config stylize/snaptik.mp4

Looks similar to issue #31, however I don't have a '.' in my model name so I believe the path should be valid.

Running a standard txt2vid works fine, just seems to be a problem with stylize (vid2vid).

In the output folder are several controlnet folders with some generated frames (around 400 or so), but the number of files doesn't appear to increase, even though PowerShell > Python shows about 7% CPU usage and 0% GPU.

'animatediff' is an internal or external command, It is not recognized as an operable program or batch file.

Hi, perhaps this is a silly question, but please help.
I followed the installation instructions, but this error came out.
Do I have to combine this library with the original AnimateDiff to run it?

(venv) D:\animatediff-cli-prompt-travel>animatediff generate -c config/prompts/prompt_travel.json -W 256 -H 384 -L 128 -C 16
'animatediff' is an internal or external command,
It is not recognized as an operable program or batch file.

(venv) D:\animatediff-cli-prompt-travel>animatediff --help
'animatediff' is an internal or external command,
It is not recognized as an operable program or batch file.

First try, AttributeError: 'Attention' object has no attribute 'to_to_k'

Run command: animatediff generate -c config\prompts\prompt_travel.json
In prompt_travel.json I only changed the model and the motion paths. What else do I need to change?
Beyond the "win setup" readme instructions, I had an error at first which said to run pip install mediapipe
I have not copied over any controlnet models and haven't toggled any of the fields to disable, just looking for defaults that work.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\a\animatediff-cli-prompt-travel\src\animatediff\cli.py:322 in generate │
│ │
│ 319 │ global g_pipeline │
│ 320 │ global last_model_path │
│ 321 │ if g_pipeline is None or last_model_path != model_config.path.resolve(): │
│ ❱ 322 │ │ g_pipeline = create_pipeline( │
│ 323 │ │ │ base_model=base_model_path, │
│ 324 │ │ │ model_config=model_config, │
│ 325 │ │ │ infer_config=infer_config, │
│ │
│ C:\a\animatediff-cli-prompt-travel\src\animatediff\generate.py:321 in │
│ create_pipeline │
│ │
│ 318 │ │ logger.info(f"Loading weights from {model_path}") │
│ 319 │ │ if model_path.is_file(): │
│ 320 │ │ │ logger.debug("Loading from single checkpoint file") │
│ ❱ 321 │ │ │ unet_state_dict, tenc_state_dict, vae_state_dict = get_checkpoint_weights(mo │
│ 322 │ │ elif model_path.is_dir(): │
│ 323 │ │ │ logger.debug("Loading from Diffusers model directory") │
│ 324 │ │ │ temp_pipeline = StableDiffusionPipeline.from_pretrained(model_path) │
│ │
│ C:\a\animatediff-cli-prompt-travel\src\animatediff\utils\model.py:73 in │
│ get_checkpoint_weights │
│ │
│ 70 │
│ 71 def get_checkpoint_weights(checkpoint: Path): │
│ 72 │ temp_pipeline: StableDiffusionPipeline │
│ ❱ 73 │ temp_pipeline, _ = checkpoint_to_pipeline(checkpoint, save=False) │
│ 74 │ unet_state_dict = temp_pipeline.unet.state_dict() │
│ 75 │ tenc_state_dict = temp_pipeline.text_encoder.state_dict() │
│ 76 │ vae_state_dict = temp_pipeline.vae.state_dict() │
│ │
│ C:\a\animatediff-cli-prompt-travel\src\animatediff\utils\model.py:58 in │
│ checkpoint_to_pipeline │
│ │
│ 55 │ if target_dir is None: │
│ 56 │ │ target_dir = pipeline_dir.joinpath(checkpoint.stem) │
│ 57 │ │
│ ❱ 58 │ pipeline = StableDiffusionPipeline.from_single_file( │
│ 59 │ │ pretrained_model_link_or_path=str(checkpoint.absolute()), │
│ 60 │ │ local_files_only=True, │
│ 61 │ │ load_safety_checker=False, │
│ │
│ c:\a\animatediff-cli-prompt-travel\venv\lib\site-packages\diffusers\loaders.p │
│ y:1471 in from_single_file │
│ │
│ 1468 │ │ │ │ force_download=force_download, │
│ 1469 │ │ │ ) │
│ 1470 │ │ │
│ ❱ 1471 │ │ pipe = download_from_original_stable_diffusion_ckpt( │
│ 1472 │ │ │ pretrained_model_link_or_path, │
│ 1473 │ │ │ pipeline_class=cls, │
│ 1474 │ │ │ model_type=model_type, │
│ │
│ c:\a\animatediff-cli-prompt-travel\venv\lib\site-packages\diffusers\pipelines │
│ \stable_diffusion\convert_from_ckpt.py:1374 in download_from_original_stable_diffusion_ckpt │
│ │
│ 1371 │ │ │ vae = AutoencoderKL(**vae_config) │
│ 1372 │ │ │
│ 1373 │ │ for param_name, param in converted_vae_checkpoint.items(): │
│ ❱ 1374 │ │ │ set_module_tensor_to_device(vae, param_name, "cpu", value=param) │
│ 1375 │ else: │
│ 1376 │ │ vae = AutoencoderKL.from_pretrained(vae_path) │
│ 1377 │
│ │
│ c:\a\animatediff-cli-prompt-travel\venv\lib\site-packages\accelerate\utils\mo │
│ deling.py:269 in set_module_tensor_to_device │
│ │
│ 266 │ if "." in tensor_name: │
│ 267 │ │ splits = tensor_name.split(".") │
│ 268 │ │ for split in splits[:-1]: │
│ ❱ 269 │ │ │ new_module = getattr(module, split) │
│ 270 │ │ │ if new_module is None: │
│ 271 │ │ │ │ raise ValueError(f"{module} has no attribute {split}.") │
│ 272 │ │ │ module = new_module │
│ │
│ c:\a\animatediff-cli-prompt-travel\venv\lib\site-packages\torch\nn\modules\mo │
│ dule.py:1614 in __getattr__ │
│ │
│ 1611 │ │ │ modules = self.__dict__['_modules'] │
│ 1612 │ │ │ if name in modules: │
│ 1613 │ │ │ │ return modules[name] │
│ ❱ 1614 │ │ raise AttributeError("'{}' object has no attribute '{}'".format( │
│ 1615 │ │ │ type(self).__name__, name)) │
│ 1616 │ │
│ 1617 │ def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'Attention' object has no attribute 'to_to_k'

Some questions about controlNET and LoRAs.

I have 6 specific questions:

  1. Which LoRAs are compatible and which are incompatible with animatediff-cli-prompt-travel? So far some LoRAs i have tried work great with this.

  2. Do embeddings (textual inversion) work with this?

  3. The main page says I have to change the 999 to another number here; any suggestion as to which number I should choose?

 "controlnet_map": {
    "input_image_dir" : "controlnet_image/test",
    "max_samples_on_vram": 999,
    "save_detectmap": true,
    "preprocess_on_gpu": true,
  4. How do I make ControlNET work with this? I drag in the PNGs and rename them JUST like the example says, but they are not used in the generation.


And they are enabled:

   "controlnet_openpose":{
      "enable": true,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },

    "controlnet_canny": {
      "enable": true,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },

I am putting the PNGs in openpose and canny but they are completely ignored; they do not even appear in the output, while the examples have a subfolder in the output showing the controlNET used.

  1. How do i "order" the controlnet? I want canny as a background (so the entire animation has a stable background), and open pose as the pose of the subject, i suppose this is done automatically? Or i have to order the "layers" of controlNET?

  6. Any way to remove that "shutterstock" text at the bottom? I tried putting (watermark, text:1.5), logo, text in the negative prompt, but nothing; the text is still there.

RuntimeError: CUDA error: invalid configuration argument

How to solve this problem? Here is the error info:
17:01:40 INFO NumExpr defaulting to 6 threads. utils.py:160
INFO Using generation config: config/prompts/prompt_travel.json cli.py:279
INFO Using base model: runwayml/stable-diffusion-v1-5 cli.py:295
INFO Will save outputs to ./output/2023-09-16T17-01-40-animation-videos-dreamshaper_8 cli.py:303
Preprocessing images (controlnet_openpose) 0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/3 [ 0:00:00 < -:--:-- , ? it/s ] INFO Loading openpose_full processor.py:94
Preprocessing images (controlnet_openpose) 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/3 [ 0:00:08 < 0:00:00 , 1 it/s ]
Saving Preprocessed images (controlnet_openpose) 0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/3 [ 0:00:00 < -:--:-- , ? it/s ]
Preprocessing images (controlnet_softedge) 0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/3 [ 0:00:00 < -:--:-- , ? it/s ]17:01:48 INFO Loading softedge_pidsafe processor.py:94
Preprocessing images (controlnet_softedge) 33% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 [ 0:00:00 < -:--:-- , ? it/s ]
Saving Preprocessed images (controlnet_softedge) 0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/3 [ 0:00:00 < -:--:-- , ? it/s ]
17:01:49 INFO Checking motion module... generate.py:261
INFO Loading tokenizer... generate.py:275
INFO Loading text encoder... generate.py:277
17:01:50 INFO Loading VAE... generate.py:279
INFO Loading UNet... generate.py:281
17:01:59 INFO Loaded 453.20928M-parameter motion module unet.py:578
INFO Using scheduler "ddim" (DDIMScheduler) generate.py:293
INFO Loading weights from /home/pymo/animatediff-cli/data/models/sd/dreamshaper_8.safetensors generate.py:298
17:02:03 INFO Merging weights into UNet... generate.py:315
INFO Enabling xformers memory-efficient attention generate.py:330
17:02:04 INFO Creating AnimationPipeline... generate.py:342
INFO No TI embeddings found ti.py:102
INFO loading c='controlnet_openpose' model generate.py:371
17:02:05 INFO loading c='controlnet_softedge' model generate.py:371
17:02:06 INFO Sending pipeline to device "cuda" pipeline.py:22
INFO Selected data types: unet_dtype=torch.float16, tenc_dtype=torch.float16, vae_dtype=torch.bfloat16 device.py:90
INFO Using channels_last memory format for UNet and VAE device.py:111
17:02:09 INFO Saving prompt config to output directory cli.py:354
INFO Initialization complete! cli.py:362
INFO Generating 1 animations cli.py:363
INFO Running generation 1 of 1 cli.py:373
INFO Generation seed: 341774366206100 cli.py:383
0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/120 [ 0:00:00 < -:--:-- , ? it/s ]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/pymo/animatediff-cli/src/animatediff/cli.py:396 in generate │
│ │
│ 393 │ │ │ │ │ │
│ 394 │ │ │ │ │ prompt_map[int(k)]=pr │
│ 395 │ │ │ │
│ ❱ 396 │ │ │ output = run_inference( │
│ 397 │ │ │ │ pipeline=g_pipeline, │
│ 398 │ │ │ │ prompt="this is dummy string", │
│ 399 │ │ │ │ n_prompt=n_prompt, │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/generate.py:680 in run_inference │
│ │
│ 677 │ │
│ 678 │ seed_everything(seed) │
│ 679 │ │
│ ❱ 680 │ pipeline_output = pipeline( │
│ 681 │ │ prompt=prompt, │
│ 682 │ │ negative_prompt=n_prompt, │
│ 683 │ │ num_inference_steps=steps, │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in │
│ decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/pipelines/animation.py:2348 in __call__ │
│ │
│ 2345 │ │ │ │ │ # predict the noise residual │
│ 2346 │ │ │ │ │ │
│ 2347 │ │ │ │ │ stopwatch_record("normal unet start") │
│ ❱ 2348 │ │ │ │ │ pred = self.unet( │
│ 2349 │ │ │ │ │ │ latent_model_input.to(self.unet.device, self.unet.dtype), │
│ 2350 │ │ │ │ │ │ t, │
│ 2351 │ │ │ │ │ │ encoder_hidden_states=cur_prompt, │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/models/unet.py:427 in forward │
│ │
│ 424 │ │ down_block_res_samples = (sample,) │
│ 425 │ │ for downsample_block in self.down_blocks: │
│ 426 │ │ │ if hasattr(downsample_block, "has_cross_attention") and downsample_block.has │
│ ❱ 427 │ │ │ │ sample, res_samples = downsample_block( │
│ 428 │ │ │ │ │ hidden_states=sample, │
│ 429 │ │ │ │ │ temb=emb, │
│ 430 │ │ │ │ │ encoder_hidden_states=encoder_hidden_states, │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/models/unet_blocks.py:439 in forward │
│ │
│ 436 │ │ │ │ )[0] │
│ 437 │ │ │ │ # add motion module │
│ 438 │ │ │ │ hidden_states = ( │
│ ❱ 439 │ │ │ │ │ motion_module(hidden_states, temb, encoder_hidden_states=encoder_hid │
│ 440 │ │ │ │ │ if motion_module is not None │
│ 441 │ │ │ │ │ else hidden_states │
│ 442 │ │ │ │ ) │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/models/motion_module.py:67 in forward │
│ │
│ 64 │ │
│ 65 │ def forward(self, input_tensor, temb, encoder_hidden_states, attention_mask=None, an │
│ 66 │ │ hidden_states = input_tensor │
│ ❱ 67 │ │ hidden_states = self.temporal_transformer(hidden_states, encoder_hidden_states, │
│ 68 │ │ │
│ 69 │ │ output = hidden_states │
│ 70 │ │ return output │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/models/motion_module.py:148 in forward │
│ │
│ 145 │ │ │
│ 146 │ │ # Transformer Blocks │
│ 147 │ │ for block in self.transformer_blocks: │
│ ❱ 148 │ │ │ hidden_states = block( │
│ 149 │ │ │ │ hidden_states, encoder_hidden_states=encoder_hidden_states, video_length │
│ 150 │ │ │ ) │
│ 151 │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/models/motion_module.py:218 in forward │
│ │
│ 215 │ │ for attention_block, norm in zip(self.attention_blocks, self.norms): │
│ 216 │ │ │ norm_hidden_states = norm(hidden_states) │
│ 217 │ │ │ hidden_states = ( │
│ ❱ 218 │ │ │ │ attention_block( │
│ 219 │ │ │ │ │ norm_hidden_states, │
│ 220 │ │ │ │ │ encoder_hidden_states=encoder_hidden_states │
│ 221 │ │ │ │ │ if attention_block.is_cross_attention │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/pymo/animatediff-cli/src/animatediff/models/motion_module.py:296 in forward │
│ │
│ 293 │ │ │ raise NotImplementedError │
│ 294 │ │ │
│ 295 │ │ # attention processor makes this easy so that's nice │
│ ❱ 296 │ │ hidden_states = self.processor(self, hidden_states, encoder_hidden_states, atten │
│ 297 │ │ │
│ 298 │ │ if self.attention_mode == "Temporal": │
│ 299 │ │ │ hidden_states = rearrange(hidden_states, "(b d) f c -> (b f) d c", d=d) │
│ │
│ /home/pymo/miniconda3/envs/animatediff/lib/python3.10/site-packages/diffusers/models/attention_p │
│ rocessor.py:1046 in __call__ │
│ │
│ 1043 │ │ key = attn.head_to_batch_dim(key).contiguous() │
│ 1044 │ │ value = attn.head_to_batch_dim(value).contiguous() │
│ 1045 │ │ │
│ ❱ 1046 │ │ hidden_states = xformers.ops.memory_efficient_attention( │
│ 1047 │ │ │ query, key, value, attn_bias=attention_mask, op=self.attention_op, scale=att │
│ 1048 │ │ ) │
│ 1049 │ │ hidden_states = hidden_states.to(query.dtype) │
│ │
│ /home/pymo/miniconda3/envs/animatediff/lib/python3.10/site-packages/xformers/ops/fmha/__init__.p │
│ y:193 in memory_efficient_attention │
│ │
│ 190 │ │ and options. │
│ 191 │ :return: multi-head attention Tensor with shape [B, Mq, H, Kv]
│ 192 │ """ │
│ ❱ 193 │ return _memory_efficient_attention( │
│ 194 │ │ Inputs( │
│ 195 │ │ │ query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=scale │
│ 196 │ │ ), │
│ │
│ /home/pymo/miniconda3/envs/animatediff/lib/python3.10/site-packages/xformers/ops/fmha/__init__.p │
│ y:291 in _memory_efficient_attention │
│ │
│ 288 ) -> torch.Tensor: │
│ 289 │ # fast-path that doesn't require computing the logsumexp for backward computation │
│ 290 │ if all(x.requires_grad is False for x in [inp.query, inp.key, inp.value]): │
│ ❱ 291 │ │ return _memory_efficient_attention_forward( │
│ 292 │ │ │ inp, op=op[0] if op is not None else None │
│ 293 │ │ ) │
│ 294 │
│ │
│ /home/pymo/miniconda3/envs/animatediff/lib/python3.10/site-packages/xformers/ops/fmha/__init__.p │
│ y:311 in _memory_efficient_attention_forward │
│ │
│ 308 │ else: │
│ 309 │ │ ensure_op_supports_or_raise(ValueError, "memory_efficient_attention", op, inp) │
│ 310 │ │
│ ❱ 311 │ out, *_ = op.apply(inp, needs_gradient=False) │
│ 312 │ return out.reshape(output_shape) │
│ 313 │
│ 314 │
│ │
│ /home/pymo/miniconda3/envs/animatediff/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:2 │
│ 51 in apply │
│ │
│ 248 │ │ │ cu_seqlens_k, │
│ 249 │ │ │ max_seqlen_k, │
│ 250 │ │ ) = _convert_input_format(inp) │
│ ❱ 251 │ │ out, softmax_lse, rng_state = cls.OPERATOR( │
│ 252 │ │ │ inp.query, │
│ 253 │ │ │ inp.key, │
│ 254 │ │ │ inp.value, │
│ │
│ /home/pymo/.local/lib/python3.10/site-packages/torch/_ops.py:502 in __call__ │
│ │
│ 499 │ │ # is still callable from JIT │
│ 500 │ │ # We save the function ptr as the op attribute on │
│ 501 │ │ # OpOverloadPacket to access it here. │
│ ❱ 502 │ │ return self._op(*args, **kwargs or {}) │
│ 503 │ │
│ 504 │ # TODO: use this to make a __dir__ │
│ 505 │ def overloads(self): │
│ │
│ /home/pymo/miniconda3/envs/animatediff/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:79 in _flash_fwd │
│ │
│ 76 │ │ │ softmax_lse, │
│ 77 │ │ │ p, │
│ 78 │ │ │ rng_state, │
│ ❱ 79 │ │ ) = _C_flashattention.varlen_fwd( │
│ 80 │ │ │ query, │
│ 81 │ │ │ key, │
│ 82 │ │ │ value, │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

IndexError: list assignment index out of range

13:13:19 INFO     Creating AnimationPipeline...                                                                                                               generate.py:206
         INFO     No TI embeddings found                                                                                                                            ti.py:102
         INFO     Sending pipeline to device "cuda"                                                                                                            pipeline.py:23
         INFO     Selected data types: unet_dtype=torch.float16, tenc_dtype=torch.float16, vae_dtype=torch.bfloat16                                              device.py:90
         INFO     Using channels_last memory format for UNet and VAE                                                                                            device.py:109
         INFO     -> Selected data types: unet_dtype=torch.bfloat16,tenc_dtype=torch.bfloat16,vae_dtype=torch.bfloat16                                         pipeline.py:56
13:13:23 INFO     Saving prompt config to output directory                                                                                                         cli.py:328
         INFO     Initialization complete!                                                                                                                         cli.py:337
         INFO     Generating 1 animations from 1 prompts                                                                                                           cli.py:338
         INFO     Running generation 1 of 1 (prompt 1)                                                                                                             cli.py:347
         INFO     Generation seed: 341774366206100                                                                                                                 cli.py:357
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\src\animatediff\cli.py:364   │
│ in generate                                                                                      │
│                                                                                                  │
│   361 │   │   │   │   if int(k) < length:                                                        │
│   362 │   │   │   │   │   prompt_map[int(k)]=model_config.prompt_map[k]                          │
│   363 │   │   │                                                                                  │
│ ❱ 364 │   │   │   output = run_inference(                                                        │
│   365 │   │   │   │   pipeline=pipeline,                                                         │
│   366 │   │   │   │   prompt="this is dummy string",                                             │
│   367 │   │   │   │   n_prompt=n_prompt,                                                         │
│                                                                                                  │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\src\animatediff\generate.py: │
│ 401 in run_inference                                                                             │
│                                                                                                  │
│   398 │                                                                                          │
│   399 │   seed_everything(seed)                                                                  │
│   400 │                                                                                          │
│ ❱ 401 │   pipeline_output = pipeline(                                                            │
│   402 │   │   prompt=prompt,                                                                     │
│   403 │   │   negative_prompt=n_prompt,                                                          │
│   404 │   │   num_inference_steps=steps,                                                         │
│                                                                                                  │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\venv\lib\site-packages\torch │
│ \utils\_contextlib.py:115 in decorate_context                                                    │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\src\animatediff\pipelines\an │
│ imation.py:751 in __call__                                                                       │
│                                                                                                  │
│    748 │   │   │   │   │   }                                                                     │
│    749 │   │   │   │   │                                                                         │
│    750 │   │   │   │   │   for f in frames:                                                      │
│ ❱  751 │   │   │   │   │   │   controlnet_affected_list[f] = True                                │
│    752 │   │                                                                                     │
│    753 │   │                                                                                     │
│    754 │   │   def controlnet_is_affected( frame_index:int):                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list assignment index out of range

This happens when using the prompt_travel.json config.
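For context, the final IndexError ("list assignment index out of range") is the kind of failure that occurs when a list sized to the video length is written to at a frame index at or beyond that length. A minimal, self-contained sketch of the mechanism (the names below are illustrative, not the repo's actual code):

# Illustrative only -- not the actual animatediff code. A flag list sized to
# the video length blows up if a controlnet image is keyed to a frame index
# that is >= that length.
video_length = 16
controlnet_frames = [0, 8, 24]   # 24 lies outside the 16-frame video

controlnet_affected_list = [False] * video_length
for f in controlnet_frames:
    if f < video_length:         # without this guard, the assignment crashes
        controlnet_affected_list[f] = True
    # controlnet_affected_list[24] = True would raise:
    # IndexError: list assignment index out of range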

Please merge this fork into the main AnimateDiff-CLI

It would be great if you could submit a PR to the main repo so this work can be integrated upstream, letting us stay up to date with new features and fixes while still being able to use your wonderful work. 🙏

Question: Out of memory in Colab?

Hi, I'm running the example:

!animatediff generate -c /content/animatediff-cli-prompt-travel/config/prompts/prompt_travel.json -W 256 -H 384 -L 128 -C 16

and it stops by itself, maybe from running out of memory. The last log:

...
Downloading mm_sd_v15_v2.ckpt: 100% 1.82G/1.82G [00:17<00:00, 102MB/s]
Loading tokenizer...
Loading text encoder...
Loading VAE...
Loading UNet...
Loaded 417.1376M-parameter motion module
Using scheduler "k_dpmpp_sde" (DPMSolverSinglestepScheduler)
Loading weights from /content/animatediff-cli-prompt-travel/data/share/Stable-diffusion/mistoonAnime_v20.safetensors
^C

Is 12 GB of RAM not enough? I tried setting -L to 64 and still got the same result.

TIA

Traveling LoRA weights?

First, thanks for this amazing repo, the potential is amazing already. I find myself using prompts to steer AnimateDiff while using ControlNet to design the frames. With LoRA, would it be possible to also let LoRA weights travel? Say I have an inference with a scene change: I can use prompt travel to change the prompt to match, but being able to change which LoRA influences those frames would be huge.
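As an editorial aside: the project's config does not necessarily expose this, so the following is only a sketch of what per-frame LoRA scale scheduling could look like, interpolating a scale between user-defined keyframes the same way prompt travel interpolates between prompts. All names here are hypothetical, not animatediff-cli's API.

# Hypothetical illustration of "LoRA weight travel": linearly interpolate a
# per-LoRA scale between keyframes.

def lora_scale_at(frame: int, keyframes: dict) -> float:
    """Return the LoRA scale at `frame`, interpolated between keyframes."""
    keys = sorted(keyframes)
    if frame <= keys[0]:
        return keyframes[keys[0]]
    if frame >= keys[-1]:
        return keyframes[keys[-1]]
    for lo, hi in zip(keys, keys[1:]):
        if lo <= frame <= hi:
            t = (frame - lo) / (hi - lo)
            return keyframes[lo] * (1.0 - t) + keyframes[hi] * t

# Fade a "scene A" LoRA out across a scene change around frame 48.
scene_a = {0: 1.0, 40: 1.0, 48: 0.0}
print([round(lora_scale_at(f, scene_a), 2) for f in range(0, 64, 8)])
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]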

ImportError: cannot import name 'maybe_allow_in_graph' from 'diffusers.utils'

While trying to set up the animatediff-cli project following the installation steps, I got an ImportError when running the animatediff --help command.

I did the following:
Cloned the repository using git clone https://github.com/neggles/animatediff-cli
Created a virtual environment using python3.10 -m venv .venv
Activated the virtual environment with source .venv/bin/activate
Installed Torch and other dependencies as per the instructions.
Ran animatediff --help

Received an ImportError with the following traceback:

Traceback (most recent call last):
  File "/home/user/Deep-Learning/Stable Diffusion/animatediff-cli/.venv/bin/animatediff", line 7, in <module>
    from animatediff.cli import cli
  File "/home/user/Deep-Learning/Stable Diffusion/animatediff-cli/src/animatediff/cli.py", line 12, in <module>
    from animatediff.generate import create_pipeline, run_inference
  File "/home/user/Deep-Learning/Stable Diffusion/animatediff-cli/src/animatediff/generate.py", line 13, in <module>
    from animatediff.models.unet import UNet3DConditionModel
  File "/home/user/Deep-Learning/Stable Diffusion/animatediff-cli/src/animatediff/models/unet.py", line 18, in <module>
    from .unet_blocks import (
  File "/home/user/Deep-Learning/Stable Diffusion/animatediff-cli/src/animatediff/models/unet_blocks.py", line 9, in <module>
    from animatediff.models.attention import Transformer3DModel
  File "/home/user/Deep-Learning/Stable Diffusion/animatediff-cli/src/animatediff/models/attention.py", line 10, in <module>
    from diffusers.utils import BaseOutput, maybe_allow_in_graph
ImportError: cannot import name 'maybe_allow_in_graph' from 'diffusers.utils'

Environment:

OS: Ubuntu
Python Version: 3.10
Torch Version: 2.0.1+cu118

Would appreciate any help, thanks.

An error occurs when creating more than one video using a preprocessor.

I tested two preprocessor types, lineart_anime and softedge. A single video saves normally and no error occurs, but raising -r to 2 or higher results in an error.
The images processed by the preprocessor are also saved normally.


Preprocessing images (controlnet_lineart_anime) 0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/2 [ 0:00:00 < -:--:-- , ? it/s ]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\toyxy\animatediff-cli-prompt-travel\src\animatediff\cli.py:363 in generate │
│ │
│ 360 │ │ │ │ if int(k) < length: │
│ 361 │ │ │ │ │ prompt_map[int(k)]=model_config.prompt_map[k] │
│ 362 │ │ │ │
│ ❱ 363 │ │ │ output = run_inference( │
│ 364 │ │ │ │ pipeline=pipeline, │
│ 365 │ │ │ │ prompt="this is dummy string", │
│ 366 │ │ │ │ n_prompt=n_prompt, │
│ │
│ C:\Users\toyxy\animatediff-cli-prompt-travel\src\animatediff\generate.py:472 in run_inference │
│ │
│ 469 │ │ │ │ │ │ │ if frame_no < duration: │
│ 470 │ │ │ │ │ │ │ │ if frame_no not in controlnet_image_map: │
│ 471 │ │ │ │ │ │ │ │ │ controlnet_image_map[frame_no] = {} │
│ ❱ 472 │ │ │ │ │ │ │ │ controlnet_image_map[frame_no][c] = get_preprocessed_img │
│ 473 │ │ │ │ │ │ │ │ processed = True │
│ 474 │ │ │ │
│ 475 │ │ │ if save_detectmap and processed: │
│ │
│ C:\Users\toyxy\animatediff-cli-prompt-travel\src\animatediff\generate.py:172 in │
│ get_preprocessed_img │
│ │
│ 169 │ if type_str in ( "controlnet_tile", "controlnet_ip2p", "controlnet_inpaint"): │
│ 170 │ │ return img │
│ 171 │ else: │
│ ❱ 172 │ │ return get_preprocessor(type_str, device_str)(img) if use_preprocessor else img │
│ 173 │
│ 174 │
│ 175 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 'NoneType' object is not callable
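One generic way a TypeError like this arises is when a preprocessor lookup returns None and the result is called without a guard. A small, self-contained illustration (generic names, not the repo's code):

# Generic illustration of the failure mode, not animatediff's code: if the
# preprocessor lookup yields None (for example because it was released after
# the first video), calling it directly reproduces the error.
preprocessors = {"controlnet_lineart_anime": None}   # imagine this was cleared

def preprocess(type_str, img):
    pre = preprocessors.get(type_str)
    # pre(img) with pre == None raises:
    # TypeError: 'NoneType' object is not callable
    return pre(img) if pre is not None else img

print(preprocess("controlnet_lineart_anime", "raw-frame"))   # falls back to the raw frame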

Where are the inference images located?

When generating with more than one prompt, we cannot see the images until the entire generation process has completed. Where are the images produced during generation stored? Is there a temp folder or some place where we can see these intermediate images?

Questions about control_scale_list!

My understanding is as follows: "control_scale_list" is a parameter that adjusts the effect of an input controlnet image frame on each adjacent frame. If the list is [0.5, 0.4], a controlnet image inserted at frame 5 has a scale of 0.5 in frames 4 and 6, and a scale of 0.4 in frames 3 and 7.

And the two parameters are calculated as follows.

control_scale_list * controlnet_conditioning_scale = Result scale

Therefore, if controlnet_conditioning_scale is 0, all of the resulting scales are also 0. Is my understanding correct?
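Making that reading concrete (this only restates the question's arithmetic, it does not confirm how the code actually works):

# The understanding described above, spelled out: the image placed on
# key_frame gets controlnet_conditioning_scale on its own frame, and
# control_scale_list[offset - 1] * controlnet_conditioning_scale on the
# frames `offset` steps away on either side.
controlnet_conditioning_scale = 1.0
control_scale_list = [0.5, 0.4]
key_frame = 5

effective = {key_frame: controlnet_conditioning_scale}
for offset, s in enumerate(control_scale_list, start=1):
    effective[key_frame - offset] = s * controlnet_conditioning_scale
    effective[key_frame + offset] = s * controlnet_conditioning_scale

print(dict(sorted(effective.items())))
# {3: 0.4, 4: 0.5, 5: 1.0, 6: 0.5, 7: 0.4}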


I also wonder whether the control scale list wraps around between the first frame (0) and the last frame. I used controlnet_tile, placed two images at frames 0 and 15, then generated a 16-frame video with "control_scale_list":[1.0]. In frames 0 and 1, the image placed at frame 0 was almost duplicated, but frame 15 was clearly influenced by the frame-0 image as well, so the front and back views got mixed.


Next, I moved the frame-0 controlnet image to frame 2. Controlnet image 2 affects frames 1, 2, and 3, and controlnet image 15 affects frames 0, 14, and 15. Is this intended? If so, I think it would be nice to add an option to stop the scale list from looping. Thanks!


Using openpose and tile causes the result color to be strange

I used the same SD model.
I have 3 keyframes for tile:
(image)
and 3 keyframes for openpose:
(image)
but the resulting colors are strange, as if no VAE were being used, and the background is not black:
(image)

every frame looks like this:
00000027

How can I generate a result that matches the tile keyframes?
This is my prompt:
My command is: animatediff generate -c config/prompts/prompt.json -W 448 -H 640 -L 30 -C 16
prompt.zip

Cannot run offline?

Every time I run this program it tries to connect to 'raw.githubusercontent.com'.
I don't know if it's a problem with my settings or if that's just how the program works.
Sometimes the connection fails and the error is printed as "ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url:
/CompVis/stable-diffusion/main/configs/stable-diffusion/v1-inference.yaml (Caused by
NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001F8CD5B4CA0>: Failed to establish a new
connection: [Errno 11004] getaddrinfo failed'))"

Question: Is there a preview mode to review some keyframes?

I really enjoy making videos with this project, but sometimes I feel a bit lost when trying to preview specific keyframes. For example, I need to review the frames at 00:05, 00:10, 00:15… those moments are critical for ensuring the quality of the videos.

For now, I have to generate and upscale, then retry again and again until some frames look good.

Hello, what should I do if I encounter this problem?

PS F:\diff\animatediff-cli-prompt-travel-main> animatediff generate -c config/prompts/000sb.json -W 256 -H 384 -L 128 -C 16
Error in sitecustomize; set PYTHONVERBOSE for traceback:
TypeError: expected str, bytes or os.PathLike object, not NoneType
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in _run_module_as_main:196 │
│ │
│ in _run_code:86 │
│ │
│ in <module>:4 │
│ │
│ 1 # -*- coding: utf-8 -*- │
│ 2 import re │
│ 3 import sys │
│ ❱ 4 from animatediff.cli import cli │
│ 5 if __name__ == '__main__': │
│ 6 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ 7 │ sys.exit(cli()) │
│ │
│ F:\diff\animatediff-cli-prompt-travel-main\src\animatediff\cli.py:15 in <module> │
│ │
│ 12 from rich.logging import RichHandler │
│ 13 │
│ 14 from animatediff import __version__, console, get_dir │
│ ❱ 15 from animatediff.generate import (controlnet_preprocess, create_pipeline, │
│ 16 │ │ │ │ │ │ │ │ create_us_pipeline, ip_adapter_preprocess, │
│ 17 │ │ │ │ │ │ │ │ load_controlnet_models, run_inference, │
│ 18 │ │ │ │ │ │ │ │ run_upscale, save_output, │
│ │
│ F:\diff\animatediff-cli-prompt-travel-main\src\animatediff\generate.py:21 in <module> │
│ │
│ 18 │ │ │ │ │ StableDiffusionPipeline) │
│ 19 from PIL import Image │
│ 20 from tqdm.rich import tqdm │
│ ❱ 21 from transformers import (AutoImageProcessor, CLIPImageProcessor, │
│ 22 │ │ │ │ │ │ CLIPTextModel, CLIPTokenizer, │
│ 23 │ │ │ │ │ │ UperNetForSemanticSegmentation) │
│ 24 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: cannot import name 'UperNetForSemanticSegmentation' from 'transformers' (F:\SD\sd-webui-aki\1sd-webui-aki-v4\py310\lib\site-packages\transformers\__init__.py)

Controlnet seg preprocessor detect map result is different from webui extension

The detect map produced by the controlnet_seg preprocessor differs from the one produced by the webui ControlNet extension. The webui image assigns a fixed color to each object, but the prompt-travel image flickers and the colors are not fixed.

Webui extension result
ezgif com-gif-maker (1)

Generated AnimateDiff video
01_27748780903300_masterpiece_best-quality_no-humans_blue-sky_white-cloud_outdoors

animatediff-cli-prompt-travel result
ezgif com-gif-maker

Generated AnimateDiff video
00_24834798825700_masterpiece_best-quality_1girl_solo_blonde-hair_blue-eyes

json config
image

[Test] Upscale test!

I tried a few ways to upscale the video generated by AnimateDiff.

Webui ControlNet (tile + lineart + TemporalNet) batch img2img

Flickering is visible throughout the video, just like a normal batch img2img.

5-2x-RIFE-RIFE4.0-48fps.mp4

AnimateDiff ControlNet tile

Much cleaner than TemporalNet! However, due to high VRAM consumption, upscaling beyond 1024x1536 is not possible.

27.mp4

AnimateDiff ControlNet tile -> Webui Adetailer + NMKD YandereNeo (4x)

I upscaled the video to 1024x1536 with AnimateDiff, then used Adetailer to enhance the detail on the character's face, then upscaled it to 4K resolution with NMKD YandereNeo (4x). It became very sharp!

33.mp4

https://github.com/Bing-su/adetailer

https://openmodeldb.info/models/4x-NMKD-YandereNeo

https://huggingface.co/CiaraRowles/TemporalNet/tree/main

[Test] Multi ControlNet tests!

I prepared ControlNet image sequences (openpose / lineart) covering all 16 frames and used them to generate the video. It works very well!

8.mp4
6.mp4

Vae loading error


INFO     Loading vae from I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\data\data\models\Vae\kl-f8-anime2.ckpt                    generate.py:350
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\venv\lib\site-packages\diffu │
│ sers\configuration_utils.py:348 in load_config │
│ │
│ 345 │ │ else: │
│ 346 │ │ │ try: │
│ 347 │ │ │ │ # Load from URL or cache if already cached │
│ ❱ 348 │ │ │ │ config_file = hf_hub_download( │
│ 349 │ │ │ │ │ pretrained_model_name_or_path, │
│ 350 │ │ │ │ │ filename=cls.config_name, │
│ 351 │ │ │ │ │ cache_dir=cache_dir, │
│ │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\venv\lib\site-packages\huggingface_hub\utils\_validators.py:110 in inner_fn │
│ │
│ 107 │ │ │ kwargs.items(), # Kwargs values │
│ 108 │ │ ): │
│ 109 │ │ │ if arg_name in ["repo_id", "from_id", "to_id"]: │
│ ❱ 110 │ │ │ │ validate_repo_id(arg_value) │
│ 111 │ │ │ │
│ 112 │ │ │ elif arg_name == "token" and arg_value is not None: │
│ 113 │ │ │ │ has_token = True │
│ │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\venv\lib\site-packages\huggingface_hub\utils\_validators.py:164 in validate_repo_id │
│ │
│ 161 │ │ ) │
│ 162 │ │
│ 163 │ if not REPO_ID_REGEX.match(repo_id): │
│ ❱ 164 │ │ raise HFValidationError( │
│ 165 │ │ │ "Repo id must use alphanumeric chars or '-', '
', '.', '--' and '..' are" │
│ 166 │ │ │ " forbidden, '-' and '.' cannot start or end the name, max length is 96:" │
│ 167 │ │ │ f" '{repo_id}'." │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot
start or end the name, max length is 96:
'I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\data\data\models\Vae\kl-f8-anime2.ckpt'.

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\src\animatediff\stylize.py:4 │
│ 01 in generate │
│ │
│ 398 │ │ config_org = tmp_config_path │
│ 399 │ │
│ 400 │ │
│ ❱ 401 │ output_0_dir = generate( │
│ 402 │ │ config_path=config_org, │
│ 403 │ │ width=model_config.stylize_config["0"]["width"], │
│ 404 │ │ height=model_config.stylize_config["0"]["height"], │
│ │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\src\animatediff\cli.py:323 │
│ in generate │
│ │
│ 320 │ global g_pipeline │
│ 321 │ global last_model_path │
│ 322 │ if g_pipeline is None or last_model_path != model_config.path.resolve(): │
│ ❱ 323 │ │ g_pipeline = create_pipeline( │
│ 324 │ │ │ base_model=base_model_path, │
│ 325 │ │ │ model_config=model_config, │
│ 326 │ │ │ infer_config=infer_config, │
│ │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\src\animatediff\generate.py: │
│ 351 in create_pipeline │
│ │
│ 348 │ if model_config.vae_path: │
│ 349 │ │ vae_path = data_dir.joinpath(model_config.vae_path) │
│ 350 │ │ logger.info(f"Loading vae from {vae_path}") │
│ ❱ 351 │ │ vae = AutoencoderKL.from_pretrained(vae_path) │
│ 352 │ │
│ 353 │ │
│ 354 │ # enable xformers if available │
│ │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\venv\lib\site-packages\diffu │
│ sers\models\modeling_utils.py:511 in from_pretrained │
│ │
│ 508 │ │ } │
│ 509 │ │ │
│ 510 │ │ # load config │
│ ❱ 511 │ │ config, unused_kwargs, commit_hash = cls.load_config( │
│ 512 │ │ │ config_path, │
│ 513 │ │ │ cache_dir=cache_dir, │
│ 514 │ │ │ return_unused_kwargs=True, │
│ │
│ I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\venv\lib\site-packages\diffu │
│ sers\configuration_utils.py:384 in load_config │
│ │
│ 381 │ │ │ │ │ f" {pretrained_model_name_or_path}:\n{err}" │
│ 382 │ │ │ │ ) │
│ 383 │ │ │ except ValueError: │
│ ❱ 384 │ │ │ │ raise EnvironmentError( │
│ 385 │ │ │ │ │ f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load │
│ 386 │ │ │ │ │ f" in the cached files and it looks like {pretrained_model_name_or_p │
│ 387 │ │ │ │ │ f" directory containing a {cls.config_name} file.\nCheckout your int │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it
looks like I:\AnimateDiff\animatediff-cli-travel\animatediff-cli-prompt-travel\data\data\models\Vae\kl-f8-anime2.ckpt is
not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at
'https://huggingface.co/docs/diffusers/installation#offline-mode'.
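Two things stand out in this traceback. First, AutoencoderKL.from_pretrained() expects a directory (or a Hub repo id) containing a config.json, so handing it the path of a single kl-f8-anime2.ckpt file fails locally and then falls through to a Hub lookup, producing the errors above. Second, the doubled data\data in the path suggests the vae_path in the config may already start with data/, which then gets joined onto the data directory a second time. If the pinned diffusers version provides from_single_file(), a standalone VAE checkpoint can be loaded roughly like this (a sketch under that assumption; the path is an example, and this is not presented as the repo's fix):

# Sketch assuming the installed diffusers exposes AutoencoderKL.from_single_file().
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_single_file(
    "data/models/Vae/kl-f8-anime2.ckpt"   # example path to a standalone VAE checkpoint
)
print(vae.config.latent_channels)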

Usable LoRA

Sorry to bother you. Could you point me to a publicly available LoRA model? I have tried my own LoRA (trained on mistoonAnime_v20), but it doesn't work.
I'd like to confirm whether the problem is with my LoRA model or whether the code needs to be modified.

Help! How to stylize video?

Could you put together a more detailed tutorial on stylizing video? For some reason I can't connect to huggingface to download the models, so I would like to know which models need to be downloaded and where they should be stored. I'd also like to know the steps to stylize a video in more detail; the current steps seem a bit obscure. That doesn't stop this from being a wonderful piece of work, I'd just like to know more detail. Thanks!

Shape issues with controlnet_shuffle

Hello, I'm having some trouble getting the majority of ControlNets to work correctly.

So far I've tried depth, canny, lineart_anime, tile, and shuffle, but have only been able to generate a video with shuffle (but the results looked pretty weird, at least for the first part of the video).

Result GIF with shuffle

00_17276191161868842010_sci-fi-futuristic-digital-machinery-on-a-white-background_beautiful-cyberpunk-cartoon-illustration

The error I get is a shape mismatch in the ControlNet results:

Traceback
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /HUGE/Code/animatediff-cli-prompt-travel/src/animatediff/cli.py:363 in generate                  │
│                                                                                                  │
│   360 │   │   │   │   if int(k) < length:                                                        │
│   361 │   │   │   │   │   prompt_map[int(k)]=model_config.prompt_map[k]                          │
│   362 │   │   │                                                                                  │
│ ❱ 363 │   │   │   output = run_inference(                                                        │
│   364 │   │   │   │   pipeline=pipeline,                                                         │
│   365 │   │   │   │   prompt="this is dummy string",                                             │
│   366 │   │   │   │   n_prompt=n_prompt,                                                         │
│                                                                                                  │
│ /HUGE/Code/animatediff-cli-prompt-travel/src/animatediff/generate.py:443 in run_inference        │
│                                                                                                  │
│   440 │                                                                                          │
│   441 │   seed_everything(seed)                                                                  │
│   442 │                                                                                          │
│ ❱ 443 │   pipeline_output = pipeline(                                                            │
│   444 │   │   prompt=prompt,                                                                     │
│   445 │   │   negative_prompt=n_prompt,                                                          │
│   446 │   │   num_inference_steps=steps,                                                         │
│                                                                                                  │
│ /home/hans/.conda/envs/hans/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in       │
│ decorate_context                                                                                 │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ /HUGE/Code/animatediff-cli-prompt-travel/src/animatediff/pipelines/animation.py:958 in __call__  │
│                                                                                                  │
│    955 │   │   │   │   │                                                                         │
│    956 │   │   │   │   │   cur_prompt = get_current_prompt_embeds(context, latents.shape[2])     │
│    957 │   │   │   │   │                                                                         │
│ ❱  958 │   │   │   │   │   down_block_res_samples,mid_block_res_sample = get_controlnet_result(  │
│    959 │   │   │   │   │                                                                         │
│    960 │   │   │   │   │   # predict the noise residual                                          │
│    961 │   │   │   │   │   pred = self.unet(                                                     │
│                                                                                                  │
│ /HUGE/Code/animatediff-cli-prompt-travel/src/animatediff/pipelines/animation.py:899 in           │
│ get_controlnet_result                                                                            │
│                                                                                                  │
│    896 │   │   │   │   │   │   mod = torch.tensor(scales).to(device, dtype=cur_mid.dtype)        │
│    897 │   │   │   │   │   │                                                                     │
│    898 │   │   │   │   │   │   add = cur_mid * mod[None,None,:,None,None]                        │
│ ❱  899 │   │   │   │   │   │   _mid_block_res_samples[:, :, loc_index, :, :] = _mid_block_res_s  │
│    900 │   │   │   │   │   │                                                                     │
│    901 │   │   │   │   │   │   for ii in range(len(cur_down)):                                   │
│    902 │   │   │   │   │   │   │   add = cur_down[ii] * mod[None,None,:,None,None]               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: shape mismatch: value tensor of shape [2, 1280, 1, 8, 8] cannot be broadcast to indexing result of shape [2, 1280, 1, 1, 1]

When I take a look at the shapes, all of the ControlNets that don't work are indeed different from shuffle, for example:

1_controlnet_shuffle
cur_down = [torch.Size([2, 320, 1, 1, 1]), torch.Size([2, 320, 1, 1, 1]), torch.Size([2, 320, 1, 1, 1]), torch.Size([2, 320, 1, 1, 1]), torch.Size([2, 640, 1, 1, 1]), torch.Size([2, 640, 1, 1, 1]), torch.Size([2,
640, 1, 1, 1]), torch.Size([2, 1280, 1, 1, 1]), torch.Size([2, 1280, 1, 1, 1]), torch.Size([2, 1280, 1, 1, 1]), torch.Size([2, 1280, 1, 1, 1]), torch.Size([2, 1280, 1, 1, 1])]
cur_mid =  torch.Size([2, 1280, 1, 1, 1])

1_controlnet_tile
cur_down = [torch.Size([2, 320, 1, 64, 64]), torch.Size([2, 320, 1, 64, 64]), torch.Size([2, 320, 1, 64, 64]), torch.Size([2, 320, 1, 32, 32]), torch.Size([2, 640, 1, 32, 32]), torch.Size([2, 640, 1, 32, 32]), 
torch.Size([2, 640, 1, 16, 16]), torch.Size([2, 1280, 1, 16, 16]), torch.Size([2, 1280, 1, 16, 16]), torch.Size([2, 1280, 1, 8, 8]), torch.Size([2, 1280, 1, 8, 8]), torch.Size([2, 1280, 1, 8, 8])]
cur_mid =  torch.Size([2, 1280, 1, 8, 8])

I'm using the exact same images in data/controlnet_image/test/controlnet_tile and data/controlnet_image/test/controlnet_shuffle. They're all 512x512 which is the same resolution that I'm rendering at (I also tried making the ControlNet images 256, but got the exact same result).

The pre-processed images in output/.../00_detectmap look correct for all of the ControlNets, but still only shuffle has the right shape during generation.

Any idea what I might be doing wrong?
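For what it's worth, the mismatch in the traceback looks like a plain tensor-shape conflict: the shuffle ControlNet is, as far as I know, configured with global pooling, so its residuals collapse to 1x1 spatial maps, while tile/depth/lineart keep real spatial maps (8x8 at the mid block for 512x512). Writing one kind of residual into a buffer allocated from the other reproduces the error outside animatediff entirely, as in this standalone torch sketch (illustrative shapes taken from the log above):

# Standalone reproduction of the error message seen in the traceback.
import torch

mid_buffer = torch.zeros(2, 1280, 1, 1, 1)   # allocated from a shuffle residual (pooled to 1x1)
tile_mid = torch.randn(2, 1280, 1, 8, 8)     # tile residual at 512x512 keeps an 8x8 map

loc_index = [0]
mid_buffer[:, :, loc_index, :, :] = tile_mid
# RuntimeError: shape mismatch: value tensor of shape [2, 1280, 1, 8, 8]
# cannot be broadcast to indexing result of shape [2, 1280, 1, 1, 1]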

My config
{
  "name": "blobcube",
  "path": "models/sd/v1-5-pruned-emaonly.safetensors",
  "motion_module": "models/motion-module/mm_sd_v15.ckpt",
  "compile": false,
  "seed": [
    -1
  ],
  "scheduler": "k_dpmpp_sde",
  "steps": 40,
  "guidance_scale": 20,
  "clip_skip": 2,
  "prompt_map": {
    "0": "sci-fi futuristic digital machinery on a white background, beautiful cyberpunk cartoon illustration"
  },
  "n_prompt": [
    ""
  ],
  "lora_map": {},
  "controlnet_map": {
    "input_image_dir": "controlnet_image/test",
    "max_samples_on_vram": 999,
    "save_detectmap": true,
    "controlnet_shuffle": {
      "enable": true,
      "use_preprocessor": true,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list": []
    },
    "controlnet_tile": {
      "enable": true,
      "use_preprocessor": true,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list": []
    },
    "controlnet_depth": {
      "enable": false,
      "use_preprocessor": true,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list": []
    },
    "controlnet_lineart_anime": {
      "enable": true,
      "use_preprocessor": true,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list": []
    }
  },
  "upscale_config": {
    "scheduler": "k_dpmpp_sde",
    "steps": 20,
    "strength": 0.5,
    "guidance_scale": 10,
    "controlnet_tile": {
      "enable": true,
      "controlnet_conditioning_scale": 1.0,
      "guess_mode": false,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0
    },
    "controlnet_ref": {
      "enable": true,
      "use_frame_as_ref_image": false,
      "use_1st_frame_as_ref_image": true,
      "ref_image": "none",
      "attention_auto_machine_weight": 1.0,
      "gn_auto_machine_weight": 1.0,
      "style_fidelity": 0.25,
      "reference_attn": true,
      "reference_adain": false
    }
  }
}

The video program has frozen or become unresponsive

The images can be extracted, but the configuration file cannot be generated, and the program continues to run indefinitely without completing. I have to force close it.
Only the model and prompts were changed in the settings; everything else remained unchanged.

