gsig / actor-observer
ActorObserverNet code in PyTorch from "Actor and Observer: Joint Modeling of First and Third-Person Videos", CVPR 2018
License: GNU General Public License v3.0
The command I used is python third_to_first_person.py, and I ran into the following problem:
cachefile ./caches/third_to_first_person//CharadesEgo_train.pkl
Loading cached result from './caches/third_to_first_person//CharadesEgo_train.pkl'
516965 samples loaded
cachefile ./caches/third_to_first_person//CharadesEgo_val.pkl
Loading cached result from './caches/third_to_first_person//CharadesEgo_val.pkl'
137321 samples loaded
cachefile ./caches/third_to_first_person//CharadesEgo_val_video.pkl
Loading cached result from './caches/third_to_first_person//CharadesEgo_val_video.pkl'
0 samples loaded
Thank you in advance!
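The log shows the validation-video set being read from an existing cache file with 0 samples, so one thing that may be worth ruling out is a stale or empty cache. A minimal sketch, assuming each cache is an ordinary pickle of an object that supports `len()` (the actual cache layout in the repository is not confirmed here, and `find_empty_caches` is a hypothetical helper):

```python
import glob
import os
import pickle


def find_empty_caches(cache_dir):
    """Return paths of .pkl files in cache_dir whose pickled object has length 0.

    Hypothetical helper: assumes every cache is a plain pickle of a sized
    object (list, dict, ...). Objects without a length are skipped.
    """
    empty = []
    for path in sorted(glob.glob(os.path.join(cache_dir, "*.pkl"))):
        with open(path, "rb") as f:
            obj = pickle.load(f)
        try:
            if len(obj) == 0:
                empty.append(path)
        except TypeError:
            # The object has no length, so nothing can be concluded about it.
            continue
    return empty


if __name__ == "__main__":
    # Cache directory taken from the log above.
    for path in find_empty_caches("./caches/third_to_first_person"):
        print("empty cache:", path)
```

If a file is reported as empty, removing it and rerunning should show whether the problem lies in the cache or in the data paths used to build it.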
Hi,
How can I fine-tune the model trained on the third-view dataset on the first-view dataset?
I tried to directly fine-tune the trained model from https://github.com/gsig/charades-algorithms using this script: https://github.com/gsig/actor-observer/blob/master/exp/baseline_resnet152imagenet.py.
However, the result is only 22, the same as for the model fine-tuned from ImageNet, which suggests that the third-view pre-training has no effect on first-view performance.
Could you please explain how to correctly fine-tune the model trained on the third-view dataset?
Thanks.
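When fine-tuning gives exactly the same number as ImageNet initialization, one possibility worth ruling out is that the third-view checkpoint's weights were never actually copied into the model, for example because the parameter names do not match. A minimal sketch of such a check, using plain lists of names in place of real state dicts. The `module.` prefix (which `torch.nn.DataParallel` adds to parameter names) is only an assumed cause, and `match_checkpoint_keys` is a hypothetical helper, not part of either repository:

```python
def match_checkpoint_keys(checkpoint_keys, model_keys, prefix="module."):
    """Compare parameter names from a checkpoint against a model's names.

    Strips an optional prefix from the checkpoint names, then returns
    (matched, missing_in_checkpoint, unexpected_in_checkpoint) as sorted lists.
    """
    stripped = set()
    for key in checkpoint_keys:
        if key.startswith(prefix):
            key = key[len(prefix):]
        stripped.add(key)
    model = set(model_keys)
    matched = sorted(stripped & model)
    missing = sorted(model - stripped)
    unexpected = sorted(stripped - model)
    return matched, missing, unexpected


if __name__ == "__main__":
    # Toy names only; real names would come from model.state_dict().keys()
    # and from the state dict stored in the loaded checkpoint.
    ckpt = ["module.conv1.weight", "module.fc.weight", "module.fc.bias"]
    model = ["conv1.weight", "fc.weight", "fc.bias", "extra.weight"]
    matched, missing, unexpected = match_checkpoint_keys(ckpt, model)
    print("matched:", len(matched), "missing:", missing, "unexpected:", unexpected)
```

If most names end up in the missing list, the model is effectively starting from whatever initialization it had before the checkpoint was loaded.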
Thank you!
Sorry to disturb you!
Should base_x be replaced with base_y in the following line?
w_y = self.third_fc(base_x).view(-1) * torch.exp(self.third_scale)
And what does this function (in tasks.py) do?
def best_one_sec_moment(mat, winsize=6):
Thank you!
Hi, thank you for sharing this amazing code.
How many GPUs did you use to train the model, and what type of GPUs were they?
Thank you in advance!
Hi Gunnar,
I was just wondering what "ResNet-152 Transfer" stands for.
The paper says: "It uses the Charades model to predict the activities in the third person video, and then uses those labels as supervision for the first-person video."
Does this mean you run the Charades model on the CharadesEgo_only3rd videos, transfer the predicted labels to the corresponding first-person videos, and use those labels as supervision to fine-tune the original Charades model?
Thank you.
Best,
Ziwei
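The reading described in this question can be sketched as below. This only illustrates the question's interpretation and is not a confirmed description of the paper's baseline; the pairing dictionary, the score threshold, the video ids, and the function name are all hypothetical.

```python
def transfer_labels(third_person_scores, pairs, threshold=0.5):
    """Copy thresholded third-person predictions onto paired first-person videos.

    third_person_scores: {third_video_id: {class_id: score}}
    pairs: {third_video_id: first_video_id}
    Returns {first_video_id: sorted list of class_ids with score >= threshold}.
    """
    pseudo_labels = {}
    for third_id, first_id in pairs.items():
        scores = third_person_scores.get(third_id, {})
        pseudo_labels[first_id] = sorted(
            c for c, s in scores.items() if s >= threshold
        )
    return pseudo_labels


if __name__ == "__main__":
    # Made-up predictions of a third-person model for one video.
    scores = {"video_third": {3: 0.9, 17: 0.2, 42: 0.6}}
    pairs = {"video_third": "video_first"}
    # The first-person video inherits the confident third-person classes.
    print(transfer_labels(scores, pairs))
```

Under this reading, the first-person videos never need their own annotations; the resulting pseudo-labels would simply stand in for ground truth during fine-tuning.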
Hi~
In the Charades and CharadesEgo datasets, one video always contains several actions. In the code, you divide each video into several clips according to the start and end times, but I have observed that one frame may belong to multiple action labels.
In this case, can the model still be trained normally with the loss function?
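A frame that carries several action labels is not a problem in itself if the loss treats each class as an independent yes/no decision. A minimal sketch, assuming a per-class sigmoid with binary cross-entropy (whether the repository's criterion is exactly this is not confirmed here); the class count of 4 and the logits are toy values:

```python
import math


def multi_hot(active_classes, num_classes):
    """Build a 0/1 target vector with a 1 for every active class."""
    target = [0.0] * num_classes
    for c in active_classes:
        target[c] = 1.0
    return target


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_bce(logits, target):
    """Mean binary cross-entropy over classes, each with its own sigmoid."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, target):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps))
    return total / len(logits)


if __name__ == "__main__":
    # One frame that belongs to two actions at once (classes 0 and 2).
    target = multi_hot([0, 2], num_classes=4)
    good = multilabel_bce([3.0, -3.0, 3.0, -3.0], target)
    bad = multilabel_bce([-3.0, 3.0, -3.0, 3.0], target)
    print(target, good < bad)
```

With a softmax over classes the same frame would be ambiguous, since only one class can be "the" answer; with independent sigmoids each label contributes its own term.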
Hi,
I see that you use version0 under the folder datasets/labels, but when I download the CharadesEgo dataset I get the version1 labels. Which version did you use to obtain the results in the paper?
Thanks.
The command I used is python baseline_resnet152charades.py.
Thank you in advance!
=> using pre-trained model 'resnet152'
loading pretrained-weights from /nfs.yoda/gsigurds/charades_pretrained/resnet_rgb.pth.tar
Traceback (most recent call last):
File "baseline_resnet152charades.py", line 38, in <module>
main()
File "./main.py", line 60, in main
model, criterion, optimizer = create_model(args)
File "./models/__init__.py", line 10, in create_model
model = load_architecture(args)
File "./models/utils.py", line 78, in load_architecture
model = generic_load(args.arch, args.pretrained, args.pretrained_weights, args)
File "./models/utils.py", line 61, in generic_load
model = model.__dict__[arch](args)
File "./models/ActorObserverBase.py", line 55, in __init__
model = load_sub_architecture(args)
File "./models/utils.py", line 73, in load_sub_architecture
model = generic_load(args.subarch, args.pretrained, args.pretrained_subweights, args)
File "./models/utils.py", line 65, in generic_load
chkpoint = torch.load(weights)
File "/home/csdept/anaconda3/envs/py27/lib/python2.7/site-packages/torch/serialization.py", line 301, in load
f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: '/nfs.yoda/gsigurds/charades_pretrained/resnet_rgb.pth.tar'
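The traceback ends in an `IOError` because the weights path (`/nfs.yoda/gsigurds/...`) does not exist on the local machine. A small pre-flight check can make that failure clearer before training starts. This is a sketch with a hypothetical helper name (`resolve_weights`); the local path in the usage example is a placeholder, and how the experiment scripts pass the path to the model is not shown in the log:

```python
import os


def resolve_weights(path):
    """Return the absolute path if it is an existing file, else raise IOError."""
    expanded = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(expanded):
        raise IOError(
            "Pretrained weights not found at '{}'. Download the checkpoint "
            "and point the experiment at the local copy.".format(expanded)
        )
    return expanded


if __name__ == "__main__":
    try:
        # Placeholder local path; replace with wherever the checkpoint is stored.
        print(resolve_weights("./pretrained/resnet_rgb.pth.tar"))
    except IOError as err:
        print(err)
```

Calling such a check before `torch.load` keeps the error message close to the configuration mistake instead of deep inside the model-loading code.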