from video-llama.
Should I fill the ckpt field with the pretrained checkpoint or the fine-tuned checkpoint?
I filled the two fields (ckpt and ckpt_2) with finetune-ziya13b-zh.pth and finetune_vicuna7b_audiobranch.pth.
My eval_config.yaml is as follows:
model:
  arch: video_llama
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: True
  max_txt_len: 512
  end_sym: "###"
  low_resource: False
  frozen_llama_proj: False
  llama_model: "vicuna-13b/"
  imagebind_ckpt_path: "imagebind/"
  fusion_head_layers: 2
  max_frame_pos: 32
  fusion_header_type: "seqTransf"
  ckpt: "finetune-ziya13b-zh.pth"
  ckpt_2: "finetune_vicuna7b_audiobranch.pth"
datasets:
  webvid:
    vis_processor:
      train:
        name: "alpro_video_eval"
        n_frms: 8
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"
run:
  task: video_text_pretrain
But with this config, loading the model fails with the following error:
audio encoder initialized.
Load first Checkpoint: /mnt/dolphinfs/hdd_pool/docker/user/hadoop-search/gongshuai06/mt_bert_docker_row/Big_model/Video_model/Video_LLaMa/pretrained_finetuned_weights/finetune-ziya13b-zh.pth
Traceback (most recent call last):
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-search/ganshu01/projects/Video-LLaMA-main/demo_audiovideo.py", line 66, in <module>
    model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-search/ganshu01/projects/Video-LLaMA-main/video_llama/models/video_llama.py", line 598, in from_config
    ckpt = torch.load(ckpt_path, map_location="cpu")
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-search/ganshu01/envs/llama-ganshu/lib/python3.9/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-search/ganshu01/envs/llama-ganshu/lib/python3.9/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted
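The traceback fails inside torch.load while opening the first checkpoint as a zip archive, so one thing worth ruling out is an incomplete or damaged download of that .pth file, independent of which checkpoint belongs in which field. Below is a minimal sketch of a check that does not need PyTorch at all. It assumes the checkpoint was saved in PyTorch's zip-based format; diagnose_checkpoint and its return strings are hypothetical names made up for this illustration and are not part of Video-LLaMA or PyTorch.

```python
import os
import zipfile

# Magic bytes at the start of every zip local file header.
ZIP_MAGIC = b"PK\x03\x04"


def diagnose_checkpoint(path):
    """Classify a checkpoint file without loading it (hypothetical helper).

    Assumption: the file is expected to be a zip-based PyTorch checkpoint.
    """
    if not os.path.isfile(path):
        return "missing"

    # Read only the first four bytes to see whether this is a zip at all.
    with open(path, "rb") as f:
        head = f.read(len(ZIP_MAGIC))
    if head != ZIP_MAGIC:
        return "not a zip archive"

    # zipfile.is_zipfile looks for the end-of-central-directory record,
    # which sits at the end of the file and is lost if a download is cut short.
    if not zipfile.is_zipfile(path):
        return "truncated or corrupted zip archive"

    return "complete zip archive"


if __name__ == "__main__":
    # File names taken from the config above; adjust to your local paths.
    for name in ("finetune-ziya13b-zh.pth", "finetune_vicuna7b_audiobranch.pth"):
        print(name, "->", diagnose_checkpoint(name))
```

If a file comes back as truncated or corrupted, comparing its size on disk with the size listed on the download page and downloading it again is a reasonable next step. A result of "complete zip archive" only says the container is intact; it does not confirm that the weights match the model in the config.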