alexandrosstergiou / progressive-action-prediction

[CVPR 2023] Code for action prediction from videos

Home Page: https://alexandrosstergiou.github.io/project_pages/TemPr/index.html

License: MIT License

Python 100.00%
early-action-prediction video-understanding

progressive-action-prediction's Introduction

Hi there 👋

I am an Assistant Professor in the University of Twente's Data Management & Biometrics (DMB) group, with research interests in video understanding. Previously, I was a Postdoc at VUB and a Research Associate at the University of Bristol, working with Dima Damen on video understanding. I obtained my PhD from Utrecht University, where I was lucky to be supervised by Ronald Poppe and Remco C. Veltkamp. My thesis was on human action and interaction recognition in everyday social settings.

Alex Stergiou: webpage | Google Scholar | UT webpage | Twitter/X | LinkedIn

progressive-action-prediction's People

Contributors

alexandrosstergiou

progressive-action-prediction's Issues

Are all test splits from UCF-101 used in evaluation?

Hi, thank you for the great work! I was wondering whether you use all three test splits provided with the UCF-101 dataset during evaluation. I read the paper but could not find this, so it would be great if you could advise us. Thank you!

Performance with full frames.

Hi, could you provide the performance on EPIC-Kitchens when there are no missing frames (observation ratio $\rho=1.0$)?
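In early action prediction, the observation ratio $\rho$ is the fraction of a video, counted from its start, that the model is allowed to see; $\rho=1.0$ means the whole video is observed. A minimal sketch, assuming the observed part is simply the first $\lfloor \rho T \rfloor$ of $T$ frames; the repository's own frame sampling may differ, and the function name is hypothetical.

```python
import math
from typing import List


def observed_frame_indices(num_frames: int, rho: float) -> List[int]:
    """Indices of the observed prefix of a video for observation ratio rho in (0, 1]."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    # The small epsilon guards against float error (e.g. 0.29 * 100 == 28.999...).
    observed = math.floor(rho * num_frames + 1e-9)
    # Keep at least one frame so very small ratios still produce an input.
    return list(range(max(1, observed)))
```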

Memory Leak

I tried training the model with every batch size and number of workers I could, but each time I ran into a memory leak. How can I solve this problem?
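One way to confirm a leak before changing training code is to measure how much memory stays allocated per iteration. The sketch below uses Python's standard `tracemalloc` module, so it only sees Python-heap allocations, not CUDA memory (for GPU memory, `torch.cuda.memory_allocated()` can be logged per iteration instead). A frequent culprit in PyTorch loops, offered here as a general pattern rather than a diagnosis of this repository, is accumulating tensors that still hold their autograd graph, for example `total += loss` instead of `total += loss.item()`. The helper name is hypothetical.

```python
import gc
import tracemalloc
from typing import Callable


def memory_growth_per_call(step: Callable[[], None], warmup: int = 5, iters: int = 50) -> float:
    """Average bytes of Python-heap memory retained per call of `step`.

    Allocations made before tracing starts are ignored, so the result only
    reflects memory that `step` allocates and never releases.
    """
    for _ in range(warmup):  # let caches and lazy initialisation settle first
        step()
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iters):
            step()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (after - before) / iters
```

Wrapping one training iteration in `step` and seeing a steady positive growth points to references being kept across iterations.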

This is a question about data

I don't understand why there is a problem with data reading.

C:\Users\YA\anaconda3\envs\pytorch\python.exe E:\yc2\progressive-action-prediction-main\train.py
2023-07-13 12:12:02 DESKTOP-R5U0C5C root[14144] INFO CUDA_VISIBLE_DEVICES set to 0
2023-07-13 12:12:02 DESKTOP-R5U0C5C root[14144] INFO Using pytorch version 1.12.1 (['C:\Users\YA\anaconda3\envs\pytorch\lib\site-packages\torch'])
2023-07-13 12:12:02 DESKTOP-R5U0C5C root[14144] INFO Start training with args:
{
"attn_dropout": 0.0,
"backbone": "MTNet_xs",
"batch_size": 16,
"config": null,
"cross_dim_head": 64,
"cross_heads": 1,
"data_dir": "data/UCF-101",
"dataset": "UCF-101",
"end_epoch": 60,
"ff_dropout": 0.0,
"frame_len": 16,
"frame_size": [
224,
224
],
"gpus": [
0
],
"head": "Tempr_h",
"label_dir": "labels/",
"latent_dim": 512,
"latent_dim_head": 64,
"latent_heads": 8,
"log_file": "logs/video_pred_at-DESKTOP-R5U0C5C_datetime_2023-7-13_with_observation_ratio_None_Tempr_h_MTNet_xs_ada.log",
"long_cycles": false,
"lr_base": 0.01,
"lr_factor": 0.1,
"lr_mult": {
"classifier": 1.0,
"head": 0.1,
"pool": 0.1
},
"lr_steps": [
14,
32,
44
],
"max_freq": 10.0,
"model_dir": "./results\observation_ratio_None\Tempr_h_MTNet_xs_ada",
"num_freq_bands": 10,
"num_latents": 256,
"num_samplers": 3,
"optimiser": "AdamW",
"pool": "ada",
"precision": "fp32",
"pretrained_dir": null,
"print_net": false,
"random_seed": 1,
"results_dir": "./results",
"resume_epoch": 0,
"save_frequency": 60,
"short_cycles": false,
"train_frame_interval": [
1,
2,
3,
4
],
"val_frame_interval": [
1,
2
],
"video_per": null,
"video_per_train": 0.4,
"video_per_val": 0.4,
"weight_decay": 1e-05,
"weight_tie_layers": false,
"workers": 0
}
2023-07-13 12:12:02 DESKTOP-R5U0C5C root[14144] INFO CUDA availability: True
2023-07-13 12:12:02 DESKTOP-R5U0C5C root[14144] INFO Preprocessing:: using default mean & std.

2023-07-13 12:12:11 DESKTOP-R5U0C5C root[14144] INFO VideoIter:: - Found: 9537/9536 videos from csv file
2023-07-13 12:12:11 DESKTOP-R5U0C5C root[14144] INFO VideoIter:: Found dict at: labels/dictionary.json

2023-07-13 12:12:11 DESKTOP-R5U0C5C root[14144] INFO VideoIter:: iterator initialized (phase: 'train', num: 9537)

2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO VideoIter:: - Found: 3783/3782 videos from csv file
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO VideoIter:: Found dict at: labels/dictionary.json

2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO VideoIter:: iterator initialized (phase: 'val', num: 3783)
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO Optimiser:: - classifier lr is set to 1.0e-2 for 4 params
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO Optimiser:: - head lr is set to 1.0e-3 for 81 params
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO Optimiser:: - pool lr is set to 1.0e-3 for 1 params
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO Optimiser:: - base lr is set to 1.0e-02 for 635 params

2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO IterScheduler:: Each epoch will have 596 iterations based on batch size 16
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO Iter 0: start with learning rate: 1.00000e-02 (next lr step: 8344)
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO LRScheduler: The learning rate will change at steps: [8344, 19072, 26224]
2023-07-13 12:12:13 DESKTOP-R5U0C5C root[14144] INFO No cycles selected
Traceback (most recent call last):
File "E:\yc2\progressive-action-prediction-main\data\video_iterator.py", line 299, in __getitem__
frames, label, vid_path = self.getitem_array_from_video(index)
File "E:\yc2\progressive-action-prediction-main\data\video_iterator.py", line 279, in getitem_array_from_video
sampled_frames.append(video.extract_frames(indices=sampled_indices).unsqueeze(0))
File "E:\yc2\progressive-action-prediction-main\data\video_iterator.py", line 88, in extract_frames
frames = self.extract_frames_fast(indices)
File "E:\yc2\progressive-action-prediction-main\data\video_iterator.py", line 124, in extract_frames_fast
t, h, w, _ = frames.shape
ValueError: not enough values to unpack (expected 4, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\yc2\progressive-action-prediction-main\train.py", line 581, in <module>
net.fit(train_iter=train_data,
File "E:\yc2\progressive-action-prediction-main\run\model.py", line 576, in fit
data,target = next(train_loader)
File "C:\Users\YA\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 681, in __next__
data = self._next_data()
File "C:\Users\YA\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 721, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\YA\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\YA\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "E:\yc2\progressive-action-prediction-main\data\video_iterator.py", line 317, in __getitem__
index = random.randrange(d_time % self.__len__())
File "C:\Users\YA\anaconda3\envs\pytorch\lib\random.py", line 306, in randrange
raise ValueError("empty range for randrange()")
ValueError: empty range for randrange()
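Reading the traceback: the first error says `frames.shape` had one dimension where `(T, H, W, C)` was expected, which suggests the decoder returned an empty or flat array (for example, no frames were found for that video). The second error comes from the fallback in `__getitem__`: `random.randrange(d_time % self.__len__())` raises whenever the modulo is 0, because `randrange(0)` is an empty range. The hypothetical helpers below sketch a more defensive version of both steps; they are not the repository's code.

```python
import random
from typing import Sequence


def check_frame_shape(shape: Sequence[int]) -> None:
    """Fail with a readable message unless decoded frames are (T, H, W, C) and non-empty."""
    if len(shape) != 4 or 0 in shape:
        raise ValueError(
            f"expected decoded frames of shape (T, H, W, C), got {tuple(shape)}; "
            "the video may be missing, empty or unreadable"
        )


def fallback_index(dataset_len: int, rng: random.Random) -> int:
    """Pick a replacement sample uniformly from [0, dataset_len).

    Unlike randrange(x % dataset_len), this never requests an empty range
    as long as the dataset itself is non-empty.
    """
    if dataset_len <= 0:
        raise ValueError("dataset is empty; check data_dir and the label files")
    return rng.randrange(dataset_len)
```

Checking the shape right after decoding turns the opaque unpacking error into a message that names the likely cause, and the fallback no longer depends on `d_time`.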

About Data processing

Hello, I would like to ask you some questions about video data processing. I am using the UCF-101 dataset; how do I use the scripts? I didn't understand the official instructions. The part I don't know how to handle is this:
................................................................................................................................................
A custom format is used for the train/val label files of each dataset:
This can be done through the scripts provided in labels/
(If I use the UCF-101 dataset, how do I generate labels using the scripts? What do time_start and time_end mean here?)
We have tested our code over the following datasets:
Conversion of videos to SQLite3

(How do I convert a video to SQLite3? Sorry if this is a silly question, but I really want to use your code.)
Instead of extracting video frames stored as image files (.png/.jpeg/etc.) that dramatically increase the number of inodes used, we use .db files for each video and store frames as BLOBs.
You can use the dataset2database PyPI package or repo to convert video files to SQL.
..............................................................................................................................................................................................................................
I don't know much about the code and would really appreciate your help. Looking forward to your reply! If possible, could you also share your contact information (WeChat, Instagram, etc.)?
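A minimal sketch of the idea in the README excerpt above, storing one video's encoded frames as BLOB rows in a per-video SQLite database, using only Python's standard `sqlite3` module. The `frames` table and its columns are hypothetical and will not match the layout that dataset2database produces. The last function is schema-agnostic and can be used to check whether a converted `.db` file actually contains rows.

```python
import sqlite3
from typing import Dict, Iterable, List

# Hypothetical schema for illustration only; dataset2database defines its own layout.
_SCHEMA = "CREATE TABLE IF NOT EXISTS frames (idx INTEGER PRIMARY KEY, data BLOB NOT NULL)"


def write_frames(conn: sqlite3.Connection, encoded_frames: Iterable[bytes]) -> None:
    """Store already-encoded frames (e.g. JPEG bytes) as one BLOB row each."""
    conn.execute(_SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO frames (idx, data) VALUES (?, ?)",
        enumerate(encoded_frames),  # yields (frame index, frame bytes) pairs
    )
    conn.commit()


def read_frames(conn: sqlite3.Connection) -> List[bytes]:
    """Return the stored frames in temporal order."""
    return [row[0] for row in conn.execute("SELECT data FROM frames ORDER BY idx")]


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Schema-agnostic check: number of rows in every table of the database."""
    names = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0] for name in names}
```

For a file on disk, open it with `sqlite3.connect("video.db")` and pass the connection in. A table that reports 0 rows is one possible cause of errors like the `frames.shape` unpacking failure reported in the previous issue.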

Poor performance of checkpoint models

Hi @alexandrosstergiou,

Thank you so much for this amazing work.

However, when I ran inference with the provided pre-trained checkpoints, both top-1 and top-5 accuracy were very low. Could you please help me understand whether I am missing something?

Thank you so much in advance.

The dataset used is UCF-101. Labels were processed with the scripts in the labels folder.

Here is the inference log for 90% video visibility:
2024-03-05 12:20:51 linux root[24581] INFO VideoIter:: iterator initialized (phase: 'val', num: 914)

2024-03-05 12:21:05 linux root[24581] WARNING Initialiser:: The following keys were missing: []
2024-03-05 12:21:05 linux root[24581] WARNING Initialiser:: The following keys were not expected: []
2024-03-05 12:21:05 linux root[24581] INFO Initialiser:: Only model state resumed from: /home/work/model/early_action_prediction/progressive_action_prediction/UCF/Tempr_h_movinet_ada_obs_09.pth
2024-03-05 12:21:05 linux root[24581] INFO Running inference
2024-03-05 12:22:30 linux root[24581] INFO Inference: average top-1 acc: 0.00000 average top-5 acc: 0.02037 average loss 21.21562
2024-03-05 12:22:30 linux root[24581] INFO Inference: >> Sampler 0 average top-1 acc: 0.00000 average top-5 acc: 0.00399 average loss 16.58495
2024-03-05 12:22:30 linux root[24581] INFO Inference: >> Sampler 1 average top-1 acc: 0.03367 average top-5 acc: 0.09589 average loss 26.51506
2024-03-05 12:22:30 linux root[24581] INFO Inference: >> Sampler 2 average top-1 acc: 0.00000 average top-5 acc: 0.00347 average loss 24.16939
2024-03-05 12:22:30 linux root[24581] INFO Inference: >> Sampler 3 average top-1 acc: 0.00308 average top-5 acc: 0.03084 average loss 21.43339
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: Basketball average accuracy: 0.00000 num:36
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: BasketballDunk average accuracy: 0.00000 num:37
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: Biking average accuracy: 0.00000 num:38
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: CliffDiving average accuracy: 0.00000 num:39
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: CricketBowling average accuracy: 0.00000 num:36
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: Diving average accuracy: 0.00000 num:45
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: Fencing average accuracy: 0.00000 num:34
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: FloorGymnastics average accuracy: 0.00000 num:36
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: GolfSwing average accuracy: 0.00000 num:39
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: HorseRiding average accuracy: 0.00000 num:49
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: IceDancing average accuracy: 0.00000 num:46
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: LongJump average accuracy: 0.00000 num:39
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: PoleVault average accuracy: 0.00000 num:40
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: RopeClimbing average accuracy: 0.00000 num:34
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: SalsaSpin average accuracy: 0.00000 num:43
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: SkateBoarding average accuracy: 0.00000 num:32
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: Skiing average accuracy: 0.00000 num:40
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: Skijet average accuracy: 0.00000 num:28
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: SoccerJuggling average accuracy: 0.00000 num:39
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: Surfing average accuracy: 0.00000 num:33
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: TennisSwing average accuracy: 0.00000 num:49
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: TrampolineJumping average accuracy: 0.00000 num:32
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: VolleyballSpiking average accuracy: 0.00000 num:35
2024-03-05 12:22:30 linux root[24581] INFO Inference: Label: WalkingWithDog average accuracy: 0.00000 num:35
2024-03-05 12:22:30 linux root[24581] INFO > Avg class accuracy: 0.00000
2024-03-05 12:22:30 linux root[24581] INFO >> Avg class accuracy for sampler 0: 0.00000
2024-03-05 12:22:30 linux root[24581] INFO >> Avg class accuracy for sampler 1: 0.01273
2024-03-05 12:22:30 linux root[24581] INFO >> Avg class accuracy for sampler 2: 0.00000
2024-03-05 12:22:30 linux root[24581] INFO >> Avg class accuracy for sampler 3: 0.00199
2024-03-05 12:22:30 linux root[24581] INFO --- Finished ---
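The log shows that the weights loaded with no missing or unexpected keys, and that the validation set has 914 videos over 24 classes rather than UCF-101's 101. One hypothesis worth ruling out is a label-index mismatch: if the class-to-index dictionary used at inference differs from the one the checkpoint was trained with, correct predictions are scored against the wrong indices and accuracy can fall to near zero. The sketch below compares two such dictionaries. It assumes that `dictionary.json` maps class names to integer indices, which is an assumption about the file format, and the function names are hypothetical.

```python
import json
from typing import Dict, List


def load_label_dict(path: str) -> Dict[str, int]:
    """Load an assumed {class_name: index} mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def label_mismatches(train_dict: Dict[str, int], eval_dict: Dict[str, int]) -> List[str]:
    """Describe every evaluation class whose index differs from training."""
    problems = []
    for name, idx in sorted(eval_dict.items()):
        if name not in train_dict:
            problems.append(f"{name}: absent from training dictionary")
        elif train_dict[name] != idx:
            problems.append(f"{name}: eval index {idx} != train index {train_dict[name]}")
    return problems
```

An empty list means the two mappings agree on every evaluated class; any entry points to a relabelling step to fix before comparing accuracy.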

Classes of SSsub21

Hi!

Could you specify the 21 classes of SSsub21 that you used in your experiments? The original Something-Something v1 paper [here] only mentions subsets containing 10, 40, or 174 action classes.

Thank you!

Error in forward/backward

I get an error when I start training:
INFO one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4096, 256]], which is output 0 of ViewBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
WARNING Error in forward/backward: forward executed: True , backward executed: False
WARNING Creating dataloader for batch of size (2,16,224,224)

After some research, I found that the suggested fix is to clone the data before passing it to the network, but I can't find where to add the clone.

Thank you!
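The error means that a tensor autograd saved for the backward pass was later changed in place. Below is a minimal, self-contained PyTorch reproduction (not the repository's code): `torch.sigmoid` saves its output, so modifying that output in place breaks `backward()`, while cloning first leaves the saved tensor untouched. To find the offending operation in the real model, running the training step under `torch.autograd.set_detect_anomaly(True)`, as the error message suggests, reports the forward operation whose output was modified; in-place operations such as `+=` or `nn.ReLU(inplace=True)` are common candidates. The function names are hypothetical.

```python
import torch


def inplace_breaks_backward() -> bool:
    """Return True if editing a saved tensor in place makes backward() fail."""
    x = torch.randn(4, 3, requires_grad=True)
    y = torch.sigmoid(x)  # sigmoid saves its output for the backward pass
    y.mul_(2)             # in-place edit bumps y's version counter
    try:
        y.sum().backward()
    except RuntimeError as err:
        return "modified by an inplace operation" in str(err)
    return False


def clone_fixes_backward() -> bool:
    """Same computation, but the in-place edit is applied to a clone."""
    x = torch.randn(4, 3, requires_grad=True)
    y = torch.sigmoid(x)
    z = y.clone()         # new storage; the saved sigmoid output stays intact
    z.mul_(2)
    z.sum().backward()
    return x.grad is not None
```

Cloning costs one extra copy of the tensor, so it is usually better to clone only at the single operation that anomaly detection points to rather than at the network input.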

Checkpoint request

I ran into some difficulties while reproducing your work. Would you be willing to share the pre-trained checkpoints? Thank you very much!
