Comments (20)
Thanks for your interest and suggestion. For now, you can try open-set recognition by loading the pretrained model and configuring your custom dataset, including the videos and candidate categories. We will also provide simple inference code later.
Hi,
I've got something which will make your life a bit easier ;) see the notebook at #61 (comment)
> Presumably all text provided to the model in this way is also going through the trained video-specific prompt generator before being scored against the video?
Yes, that's correct.
> For example, in this case I would expect models fully supervised on Kinetics-400 to expect only succinct text labels as input, and longer, more descriptive captions would be outside the training distribution.
Yes, but note that the authors of X-CLIP started from the weights of OpenAI's CLIP model, which has seen 400 million (image, text) pairs. This allows the model to also work on longer text descriptions. It's basically a sort of fine-tuning of CLIP.
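To illustrate this, here's a minimal sketch against the Hugging Face port used in this thread; the random dummy frames below stand in for a real 8-frame clip (swap in frames sampled as in the notebook), so only the mechanics matter, not the numbers:

```python
import numpy as np
import torch
from transformers import XCLIPProcessor, XCLIPModel

processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32")
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")

# dummy stand-in for 8 real (H, W, 3) frames
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))

texts = [
    "eating spaghetti",                         # succinct Kinetics-style label
    "a person is eating a plate of spaghetti",  # longer CLIP-style caption
]
inputs = processor(text=texts, videos=video, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_video.softmax(dim=1)
print(dict(zip(texts, probs[0].tolist())))
```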
> Thanks for your interest and suggestion. For now, you can try open-set recognition by loading the pretrained model and configuring your custom dataset, including the videos and candidate categories. We will also provide simple inference code later.

Hi, for open-set video recognition, should I load the "Zero-shot", "Few-shot", or "Fully supervised" model?
> Thanks for your interest and suggestion. For now, you can try open-set recognition by loading the pretrained model and configuring your custom dataset, including the videos and candidate categories. We will also provide simple inference code later.
>
> Hi, for open-set video recognition, should I load the "Zero-shot", "Few-shot", or "Fully supervised" model?
@fixedwater Hi, you should load the zero-shot pretrained model.
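If you are going through the Hugging Face port discussed in this thread rather than the original repo's checkpoints, loading the zero-shot model looks like this (a sketch; note this checkpoint expects 32 frames per clip):

```python
from transformers import XCLIPProcessor, XCLIPModel

model_name = "microsoft/xclip-base-patch16-zero-shot"
processor = XCLIPProcessor.from_pretrained(model_name)
model = XCLIPModel.from_pretrained(model_name)

# the zero-shot checkpoint was trained on 32 frames per clip
print(model.config.vision_config.num_frames)  # 32
```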
Thanks! I've got a task to label my own dataset with predefined categories (around 800 of them), so it might be necessary to train on my own dataset for transfer. Is it enough to follow the training guide in your README.md, or is there anything to change?
> Thanks! I've got a task to label my own dataset with predefined categories (around 800 of them), so it might be necessary to train on my own dataset for transfer. Is it enough to follow the training guide in your README.md, or is there anything to change?
According to my understanding, you can follow the README.md to prepare your dataset and train the model : )
> Thanks! I've got a task to label my own dataset with predefined categories (around 800 of them), so it might be necessary to train on my own dataset for transfer. Is it enough to follow the training guide in your README.md, or is there anything to change?
>
> According to my understanding, you can follow the README.md to prepare your dataset and train the model : )
Perfect! Thanks for your amazing project!
Thanks @NielsRogge - the colab notebook in that comment is super helpful!
I wanted to check if anyone could verify something about this result from the X-CLIP processor.

```python
# processor, model, and video are defined as in the notebook above
inputs = processor(text=["playing sports", "eating spaghetti", "go shopping"], videos=list(video), return_tensors="pt", padding=True)

# forward pass
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_video.softmax(dim=1)
```
Presumably all text provided to the model in this way is also going through the trained video-specific prompt generator before being scored against the video?
Just trying to make sure I understand the semantics of these pre-trained models correctly. For example, in this case I would expect models fully supervised on Kinetics-400 to expect only succinct text labels as input, and longer, more descriptive captions would be outside the training distribution.
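One way to check this yourself is to list the model's submodules and look for the prompt generator (a quick introspection sketch; the exact submodule names are whatever your installed transformers version defines):

```python
from transformers import XCLIPModel

model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")
# list every submodule whose name mentions "prompt"; the video-specific
# prompt generator should show up here if it is part of the model
print([name for name, _ in model.named_modules() if "prompt" in name.lower()])
```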
> Hi,
> I've got something which will make your life a bit easier ;) see the notebook at #61 (comment)
Hi @NielsRogge, thanks for the notebook.
I was playing around with it and noticed I cannot pass more than 8 frames to the model:
```python
import numpy as np
from decord import VideoReader, cpu

def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

# file_path points to a local video file
vr = VideoReader(file_path, num_threads=1, ctx=cpu(0))

# sample 17 frames
vr.seek(0)
indices = sample_frame_indices(clip_len=17, frame_sample_rate=1, seg_len=len(vr))
video = vr.get_batch(indices).asnumpy()
print(video.shape)  # (17, 360, 640, 3)
```
and pass the numpy array to:
```python
import torch
from transformers import XCLIPProcessor, XCLIPModel

model_name = "microsoft/xclip-base-patch32"
processor = XCLIPProcessor.from_pretrained(model_name)
model = XCLIPModel.from_pretrained(model_name)

inputs = processor(text=["reading", "writing", "inspecting"], videos=list(video), return_tensors="pt", padding=True)

# forward pass
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_video.softmax(dim=1)
probs
```
I get `RuntimeError: shape '[2, 8, 768]' is invalid for input of size 13056` (13056 = 17 × 768, i.e. 17 frame embeddings, which cannot be regrouped into the 8 frames per video this checkpoint expects).
Hi,
It depends on which model you're using. If you use https://huggingface.co/microsoft/xclip-base-patch16-zero-shot, then the number of frames should be 32 (as this model was trained on 32 frames per video as seen here).
You can also check this as follows:
```python
from transformers import XCLIPModel

model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")
print(model.config.vision_config.num_frames)  # 8 for this checkpoint
```
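The same check works across the checkpoints mentioned in this thread without downloading the full weights, by loading only the configs (a small sketch; the values in the comments are what the hub configs report):

```python
from transformers import AutoConfig

for name in [
    "microsoft/xclip-base-patch32",            # 8 frames
    "microsoft/xclip-base-patch32-16-frames",  # 16 frames
    "microsoft/xclip-base-patch16-zero-shot",  # 32 frames
]:
    config = AutoConfig.from_pretrained(name)
    print(name, config.vision_config.num_frames)
```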
Oh... somehow missed that. Thanks for pointing it out!
The GPU is not working; is there any other way?

`model = XCLIPModel.from_pretrained(model_name).to('cuda:0')`
What's the error message you're getting?
> Thanks for your interest and suggestion. For now, you can try open-set recognition by loading the pretrained model and configuring your custom dataset, including the videos and candidate categories. We will also provide simple inference code later.

Excuse me, is your simple inference code ready? :)
> What's the error message you're getting?

I got the error message `segmentation fault (core dumped)`.
This part doesn't work: `model = model.to(device)`.
I want to use the GPU; it's working fine on the CPU.
```python
np.random.seed(0)

def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

vr = VideoReader(file_path, num_threads=1, ctx=cpu(0))

# sample 8 frames
vr.seek(0)
indices = sample_frame_indices(clip_len=8, frame_sample_rate=1, seg_len=len(vr))
video = vr.get_batch(indices).asnumpy()

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

#model_name = "microsoft/xclip-base-patch32-16-frames"
#model_name = "microsoft/xclip-base-patch32"
#model_name = "microsoft/xclip-base-patch16-kinetics-600"
model_name = "microsoft/xclip-large-patch14-kinetics-600"
model = XCLIPModel.from_pretrained(model_name)
model = model.to(device)
print("model load")

processor = XCLIPProcessor.from_pretrained(model_name)
inputs = processor(text=k600_names, videos=list(video), return_tensors="pt", padding=True)

# forward pass
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_video.softmax(dim=1)

np.set_printoptions(suppress=True)
result = dict(zip(k600_names, probs[0].numpy().tolist()))
res_topk5 = sorted(result.items(), key=lambda item: item[1], reverse=True)[:5]
for i in res_topk5:
    print(i)
```
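As a side note, independent of the segfault itself: once the model lives on the GPU, the processor's output tensors have to be moved there too, and the probabilities moved back before calling `.numpy()`; otherwise the forward pass fails with a device mismatch. A sketch of the two missing steps:

```python
# move the processor outputs to the same device as the model
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# bring the result back to the CPU before converting to numpy
probs = outputs.logits_per_video.softmax(dim=1).cpu()
```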
Hi,
PyTorch moves your model to the device in-place, so there is no need to do `model = model.to(device)`; `model.to(device)` is enough.
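That is because `nn.Module.to()` works in-place (unlike `Tensor.to()`, which returns a new tensor); a quick standalone check:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
moved = model.to("cpu")  # nn.Module.to() modifies the module in place and returns self
print(moved is model)    # True

t = torch.zeros(2)               # float32
print(t.to(torch.float64) is t)  # False: Tensor.to() returns a new tensor
```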
> Hi,
> PyTorch moves your model to the device in-place, so there is no need to do `model = model.to(device)`; `model.to(device)` is enough.
Hi,
I used torch 1.11 + CUDA 10.2. I modified the code, but the same error message comes out. Is there any other reason?
```python
model = XCLIPModel.from_pretrained(model_name)
model.to(device)
print("model load")

processor = XCLIPProcessor.from_pretrained(model_name)
inputs = processor(text=k600_names, videos=list(video), return_tensors="pt", padding=True)

# forward pass
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_video.softmax(dim=1)
```
> I got the error message `segmentation fault (core dumped)`. This part doesn't work: `model = model.to(device)`. [...]
@NielsRogge thank you!
I just downgraded my torch version from 1.11.0 to 1.8.0, and GPU inference is working fine :)
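For anyone hitting the same thing: a segfault at `model.to(device)` is usually a mismatch between the torch build and the local CUDA driver rather than anything X-CLIP-specific. A generic sanity check:

```python
import torch

print(torch.__version__)          # installed torch version
print(torch.version.cuda)         # CUDA version the torch build was compiled against
print(torch.cuda.is_available())  # whether torch can see the GPU at all

# tiny end-to-end test of the GPU path; if this also crashes,
# the problem is the torch/CUDA install, not the model
print(torch.zeros(1).to("cuda:0") + 1)
```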
Dear author:
Thanks for your promising work. We have followed your code for zero-shot experiments on UCF-101, where the test set has only one category; however, as training progresses, the test performance gradually decreases from the 1st epoch (attached is our training log). We would like to ask for your help. Thank you!
log_rank0.txt