epic-kitchens / C1-Action-Recognition-TSN-TRN-TSM
EPIC-Kitchens-100 Action Recognition baselines: TSN, TRN, TSM
License: Other
C1-Action-Recognition-TSN-TRN-TSM/src/transforms.py
Lines 60 to 76 in 2b7fa18
It looks like `ret` is never returned; instead the original images are returned. Could you check whether the flip is being applied at all?
Additionally, `is_flow` in the `__call__` method seems to be unused.
Thanks for sharing this great repo. I am looking at https://github.com/epic-kitchens/C1-Action-Recognition-TSN-TRN-TSM/blob/master/src/systems.py#L265, and I am trying to understand how the accuracy values were obtained. According to line 265, the accuracy values seem to be calculated over only a single batch rather than the entire validation dataset (I didn't find an accumulation step either). Can you confirm whether acc_1 and acc_5 are for a single batch or for the entire validation dataset?
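For illustration, whole-validation-set accuracy is usually computed by accumulating hits and counts across batches rather than averaging per-batch values. A sketch (the names here are mine, not the repo's):

```python
# Illustrative sketch (not the repo's code): computing top-k accuracy over the
# entire validation set by accumulating hits across batches instead of
# averaging per-batch values.

def topk_hit(scores, label, k):
    """Return 1 if `label` is among the k highest-scoring classes, else 0."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return int(label in ranked[:k])

def accumulate_accuracy(batches, k=1):
    """`batches` yields (score_vectors, labels) pairs, one pair per batch."""
    hits = total = 0
    for scores_batch, labels_batch in batches:
        for scores, label in zip(scores_batch, labels_batch):
            hits += topk_hit(scores, label, k)
            total += 1
    return 100.0 * hits / total
```

Averaging per-batch accuracies only matches this when every batch has the same size, which the last batch of an epoch usually breaks.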
I am trying to test the models on a personal egocentric dataset. Instead of creating a video dataset, I extract frames from the videos, stack them together (tested with frame_count 8 and 25), and feed them to the model (TSN and TSM).
This is my code below:

```python
import os

import torch
from omegaconf import OmegaConf
from PIL import Image
from torchvision import transforms

from systems import EpicActionRecognitionSystem  # from the repo's src/ directory

folder = '/data/sample/'
transform = transforms.Compose([transforms.CenterCrop(224),
                                transforms.ToTensor()])

ckpt = torch.load('/data/tsn_rgb.ckpt', map_location="cpu")
cfg = OmegaConf.create(ckpt["hyper_parameters"])
OmegaConf.set_struct(cfg, False)
cfg.data._root_gulp_dir = os.getcwd()  # set default root gulp dir to prevent
                                       # exceptions on instantiating the EpicActionRecognitionSystem
data_dir_key = "test_gulp_dir"
cfg.data[data_dir_key] = folder
cfg.trainer.accelerator = None

system = EpicActionRecognitionSystem(cfg)
system.load_state_dict(ckpt["state_dict"])

imgs = None
for i in range(1, 26):
    img = transform(Image.open(folder + 'img_' + str(i) + '.jpg')).unsqueeze(dim=0)
    imgs = img if imgs is None else torch.cat((imgs, img), dim=0)

imgs = imgs.unsqueeze(dim=0)
print(imgs.shape)  # torch.Size([1, 25, 3, 224, 224])
print(system)

out = system(imgs)
print(out.shape)
v, n = out[:, :97], out[:, 97:]
print(v.shape, n.shape)
print(torch.mean(v), torch.mean(n))
```
This is my config:

```python
{'modality': 'RGB', 'seed': 42,
 'data': {'frame_count': 8, 'test_frame_count': 25, 'segment_length': 1,
          'train_gulp_dir': '${data._root_gulp_dir}/rgb_train',
          'val_gulp_dir': '${data._root_gulp_dir}/rgb_validation',
          'test_gulp_dir': '/data/sample/',
          'worker_count': 40, 'pin_memory': True,
          'preprocessing': {'bgr': False, 'rescale': True, 'input_size': 224, 'scale_size': 256,
                            'mean': [0.485, 0.456, 0.406], 'std': [0.485, 0.456, 0.406]},
          'train_augmentation': {'multiscale_crop_scales': [1, 0.875, 0.75, 0.66]},
          'test_augmentation': {'rescale_size': 256},
          '_root_gulp_dir': '/home/sanketthakur/Documents/gaze_pred/C1-Action-Recognition-TSN-TRN-TSM'},
 'model': {'type': 'TSN', 'backbone': 'resnet50', 'pretrained': 'imagenet', 'dropout': 0.7, 'partial_bn': True},
 'learning': {'batch_size': 4,
              'optimizer': {'type': 'SGD', 'momentum': 0.9, 'weight_decay': 0.0005},
              'lr': 0.01,
              'lr_scheduler': {'type': 'StepLR', 'gamma': 0.1, 'epochs': [20, 40]}},
 'trainer': {'gradient_clip_val': 20, 'max_epochs': 80, 'weights_summary': 'full',
             'benchmark': True, 'terminate_on_nan': True, 'distributed_backend': 'dp',
             'gpus': 0, 'accumulate_grad_batches': 2, 'accelerator': None}}
```
The network always predicts verb_id 0 and noun_id 1. I am not sure if I am doing something wrong here. Any help is appreciated.
Thanks.
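One possible cause (an assumption on my part, not a confirmed diagnosis): the snippet above applies only CenterCrop and ToTensor, while the checkpoint config lists `rescale`, `scale_size`, and per-channel `mean`/`std`, so the model may be seeing un-normalised inputs. A minimal sketch of the missing normalisation step for a stacked clip:

```python
import torch

# Sketch: normalising a stacked clip the way the training pipeline appears to
# (assumption: the model expects channel-normalised inputs; the exact
# transform lives in src/transforms.py). `clip` stands in for the stacked
# frames built in the snippet above, with values already in [0, 1].
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 1, 3, 1, 1)

def normalise_clip(clip: torch.Tensor) -> torch.Tensor:
    """clip: (batch, time, channel, height, width) floats in [0, 1]."""
    return (clip - MEAN) / STD
```

Note the std used here is the standard ImageNet one; the checkpoint's config stores the mean twice, which another thread below flags as a possible bug.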
Hi!
I was testing the pre-trained model, TSM RGB, and I got odd results in the validation set.
For action@1, I got 28.23 while you reported 35.75
```
all_action_accuracy_at_1: 28.237484484898633
all_action_accuracy_at_5: 47.6934215970211
all_noun_accuracy_at_1: 39.68762929251138
all_noun_accuracy_at_5: 65.98055440628879
all_verb_accuracy_at_1: 57.03351261894911
all_verb_accuracy_at_5: 86.38808440215143
tail_action_accuracy_at_1: 12.045088566827697
tail_noun_accuracy_at_1: 20.157894736842106
tail_verb_accuracy_at_1: 28.40909090909091
```
commit: d58e695
Steps
Hi,
I downloaded the TSN (RGB) checkpoint to test it. Looking at the configuration attributes, I noticed that the mean and std values for data pre-processing are identical. Is that on purpose?
If I do:

```python
import os
import torch
from omegaconf import OmegaConf

ckpt = torch.load("path/to/tsn_rgb.ckpt", map_location=lambda storage, loc: storage)
cfg = OmegaConf.create(ckpt["hyper_parameters"])
OmegaConf.set_struct(cfg, False)
cfg.data._root_gulp_dir = os.getcwd()
print(cfg.data.preprocessing.mean)
print(cfg.data.preprocessing.std)
```

then I get:

```
'mean': [0.485, 0.456, 0.406]
'std': [0.485, 0.456, 0.406]
```
Actually, after looking at the TRN and TSM checkpoints I saw that they have this problem too.
Thanks.
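If the duplicated std is indeed a bug, one hedged workaround (assuming the models were meant to use the standard ImageNet statistics; only the maintainers can confirm which std training actually used) is to patch the loaded config before building the transforms:

```python
# Sketch of a workaround, NOT a confirmed fix: force the preprocessing stats
# back to the usual ImageNet values on the loaded config. If training really
# did use the stored (duplicated) std, overriding it would change behaviour.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def patch_preprocessing(preprocessing: dict) -> dict:
    """Return a copy of the preprocessing section with mean/std replaced."""
    fixed = dict(preprocessing)
    fixed["mean"] = IMAGENET_MEAN
    fixed["std"] = IMAGENET_STD
    return fixed
```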
Hi,
I was wondering if anyone has reproduced the PIL-SIMD patching of the conda environment recently?
Thanks!
When running src/convert_rgb_to_flow_frame_idxs.py I get a pandas warning and am not sure whether it is actually causing a problem:

```
~/GIT/C1-Action-Recognition-TSN-TRN-TSM(master*) » ./run_flow_convert.sh smc@x86_64-conda-linux-gnu
src/convert_rgb_to_flow_frame_idxs.py:41: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  year_df[col] = convert_rgb_frame_to_flow_frame_idx(year_df[col], stride)
(epic100-models)
```
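The warning itself points at the standard fix: operate on an explicit copy of the slice (or assign via `.loc`) so the write lands on a real DataFrame rather than a view. A sketch, with an illustrative stand-in for `convert_rgb_frame_to_flow_frame_idx` (the real conversion formula lives in the script):

```python
import pandas as pd

# Illustrative stand-in for convert_rgb_frame_to_flow_frame_idx; the actual
# RGB->flow index mapping in the repo may differ.
def convert(series, stride=2):
    return (series - 1) // stride + 1

df = pd.DataFrame({"year": [2018, 2020], "start_frame": [1, 9]})

# Taking .copy() makes year_df an independent DataFrame, so the assignment
# below no longer triggers SettingWithCopyWarning.
year_df = df[df["year"] == 2020].copy()
year_df["start_frame"] = convert(year_df["start_frame"])
```

Whether the warning indicates an actual bug depends on whether the script later relies on the slice writing back into the original DataFrame; if it only uses `year_df` afterwards, the output is unaffected.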
Thanks for the dataset and the PyTorch code. I have a question about how to change the frame rate during training/evaluation.
I am looking at tsn_rgb.yaml right now. Which attribute in this config file controls the frame rate? For example, if I want to lower the frame rate for action recognition (predict based on fewer frames for a segment), how should I modify the yaml file or python code?
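As far as I can tell (an assumption based on the config dumps in this thread, not a confirmed answer), there is no frame-rate attribute as such: `data.frame_count` and `data.test_frame_count` set how many frames are sampled per action segment, so lowering them gives a coarser effective rate. A sketch of the relevant fragment of tsn_rgb.yaml:

```yaml
# Assumption: these keys control the number of frames sampled per clip.
data:
  frame_count: 4        # train-time frames per segment (default in the config is 8)
  test_frame_count: 4   # test-time frames per segment
```

Since the repo uses Hydra, the same override should also work on the command line, e.g. `python src/train.py ... data.frame_count=4`.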
Does anyone want a patch to enable or disable ipdb in train.py from the config file? Here you go:

```python
from contextlib import nullcontext

context_manager = nullcontext
if cfg.debug:
    import ipdb
    context_manager = ipdb.launch_ipdb_on_exception

with context_manager():
    ...  # existing training code goes here
```
Thanks for sharing these great resources. I tried to run the code on our server, but it randomly crashed when I set the num_workers to be larger than 0 (sometimes, it crashed at epoch 1, while other times it crashed after 6 epochs). When num_workers is 0, it didn't crash but it was extremely slow.
The error messages look like this:
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f19cd7288b2 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f19cd97af20 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f19cd713b7d in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5f9e52 (0x7f1a4c9dfe52 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/_internal/utils.py", line 356, in <lambda>
lambda: hydra.run(
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
return run_job(
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/core/utils.py", line 125, in run_job
ret.return_value = task_function(task_cfg)
File "src/train.py", line 53, in main
trainer.fit(system, datamodule=data_module)
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
results = self.accelerator_backend.train()
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
return self.train_or_test()
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
results = self.trainer.train()
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 561, in train
self.train_loop.run_training_epoch()
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 542, in run_training_epoch
for batch_idx, (batch, is_last_batch) in train_dataloader:
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/profiler/profilers.py", line 85, in profile_iterable
value = next(iterator)
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 46, in _with_is_last
last = next(it)
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
idx, data = self._get_data()
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1024, in _get_data
success, data = self._try_get_data()
File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 885, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 25051) exited unexpectedly
Could someone provide some guidance on how to get around this error?
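One workaround worth trying (an assumption: this crash pattern often comes from CUDA being initialised in the parent process and then inherited by forked DataLoader workers): switch the worker start method from the default `fork` to `spawn`. A minimal sketch:

```python
import multiprocessing as mp

# Sketch of a common workaround (assumption about the root cause): 'fork'ed
# workers can inherit an already-initialised CUDA context, which leads to
# "CUDA error: initialization error" crashes. A 'spawn' context starts each
# worker in a fresh interpreter instead.
def get_spawn_context():
    return mp.get_context("spawn")

ctx = get_spawn_context()
```

The context can be handed to PyTorch via `DataLoader(..., multiprocessing_context="spawn")` (available since PyTorch 1.3); other things worth checking are shared-memory limits (`/dev/shm`) and reducing `worker_count` from 40 to something closer to the number of physical cores.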
Hi,
I got this error trying to launch the training. Could you please provide a pointer on how to debug it?
I'm new to PyTorch Lightning; it modularizes the code quite heavily, and I couldn't find the function or class returning the wrong data type.
Details

The output of my shell:

```
$ ls _root_gulp_dir
flow_test flow_validation flow_train rgb_test rgb_validation rgb_train
```

By the way, I have already visualized the data.
Hi!
First, thank you for the nice repository, really helpful. I have a question regarding the sampling strategy you used to train TSM architecture using just RGB frames, the one from the Pretrained Models table.
From the config file, I see that you use 8 frames. However, I have checked your EPIC Kitchens paper and the original TSM paper, and I have not been able to find how these 8 frames are sampled from the complete video sequence for a given action.
Thank you!
Alex.
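For what it's worth, the original TSN/TSM papers sample one frame per equal-length segment: a random offset within each segment at train time, and the segment centre at test time. A sketch of that strategy (my own illustration, under the assumption that the repo follows the papers; it is not the repo's code):

```python
import random

# TSN-style segment sampling: split the clip into num_segments equal chunks
# and pick one frame index per chunk -- randomly within the chunk when
# training, the chunk centre when testing.
def sample_frame_idxs(num_frames, num_segments=8, train=True):
    seg_len = num_frames / num_segments
    idxs = []
    for s in range(num_segments):
        start = int(s * seg_len)
        end = max(start, int((s + 1) * seg_len) - 1)
        idxs.append(random.randint(start, end) if train else (start + end) // 2)
    return idxs
```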
Dear Will,
Could you please confirm that the frame values in record.metadata are 1-indexed? I have that impression.
Thanks in advance!
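For reference, the on-disk frame names that appear elsewhere in this thread (e.g. frame_0000000008.jpg) suggest 1-indexed, zero-padded numbering, in which case metadata frame values would map to filenames directly. A tiny illustration (assumption: 10-digit padding with indices starting at 1):

```python
# Assumption: EPIC-KITCHENS frames are named frame_0000000001.jpg onwards,
# so a 1-indexed metadata value maps straight to its filename.
def frame_filename(frame_idx: int) -> str:
    return f"frame_{frame_idx:010d}.jpg"
```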
Hi!
I'm trying to download the RGB features extracted with TSM, but the link seems broken.
Do you have an alternative one or should I just wait?
Thanks in advance.
Kind regards,
Alessandro
Dear Will,
Is there a way to pickle the EpicVideoDataset object?
I was trying to use that class in code using Distributed Data Parallel (multiprocessing), and I got an error along the lines of `Can't pickle object EpicVideoDataset`.
Is that related to the GULP reader? Do you know a workaround?
Any thoughts or pointers would be appreciated.
Thanks!
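A common workaround for unpicklable dataset members (assumption: the failure comes from an open gulp file handle held by the dataset) is to drop the handle when pickling and reopen it lazily in each worker process. The class below is purely illustrative, not the repo's actual class:

```python
import pickle

# Sketch: make a dataset picklable by excluding its reader from the pickled
# state and reopening it on first access in the new process.
class LazyReaderDataset:
    def __init__(self, gulp_dir):
        self.gulp_dir = gulp_dir
        self._reader = None  # opened on first access, once per process

    @property
    def reader(self):
        if self._reader is None:
            # Stand-in for opening the real reader, e.g. GulpDirectory(...)
            self._reader = ("open-reader", self.gulp_dir)
        return self._reader

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_reader"] = None  # never send the handle across processes
        return state
```

With DDP/DataLoader workers, the same idea can be applied via `__getstate__`/`__setstate__` on the dataset or by constructing the reader inside `worker_init_fn`.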
I've forked gulpio to gulpio2 to use faster JPEG decoding from simplejpeg. It should be a drop-in replacement, but we need to check that gulping the full dataset still works before merging this change in.
I downloaded the pretrained models and loaded them from the checkpoints in the models folder:

```
python src/test.py \
  models/trn_rgb.ckpt \
  results/trn_rgb.pt \
  --split val
```

But I get this error: `TypeError: __init__() got an unexpected keyword argument 'row_log_interval'`
How should I load a pretrained model and test it?
Thanks
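A hedged workaround sketch (assumption: the TypeError comes from old Trainer arguments such as `row_log_interval` stored in the checkpoint's hyper_parameters, which newer pytorch-lightning versions no longer accept): strip the stale keys from the loaded config before instantiating. The key set below is illustrative, not exhaustive:

```python
# Assumption: the checkpoint's hyper_parameters carry a 'trainer' section with
# arguments that were removed from the pytorch-lightning Trainer API.
STALE_TRAINER_KEYS = {"row_log_interval", "log_save_interval"}

def strip_stale_keys(hparams: dict) -> dict:
    """Return hparams with known-removed Trainer arguments dropped."""
    trainer_cfg = dict(hparams.get("trainer", {}))
    for key in STALE_TRAINER_KEYS:
        trainer_cfg.pop(key, None)
    return {**hparams, "trainer": trainer_cfg}
```

The simpler alternative is to install the exact pytorch-lightning version pinned in the repo's environment file, since (as noted below) the code targets the old v0.9.0 API.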
Hi, I found that in the RGB gulp adaptor the stop_frame is inclusive, whereas in the flow adaptor the stop_frame is exclusive.
I know it makes little difference, but I wanted to point it out because I hit an error trying to read one frame too many when adapting the code to my training pipeline.
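The off-by-one described above can be sketched as follows (my own illustration of inclusive vs exclusive stop_frame handling, not the adaptors' actual code):

```python
# Inclusive stop_frame: the RGB adaptor reads frames start..stop_frame.
def rgb_frame_range(start, stop):
    return list(range(start, stop + 1))

# Exclusive stop_frame: the flow adaptor reads frames start..stop_frame-1.
def flow_frame_range(start, stop):
    return list(range(start, stop))
```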
PTL v0.9.0 is quite old now and they've changed their API quite a lot.
When running src/gulp_data.py I always get an error on the same frame, and the process stops:

```
raise ImageNotFound("Image is None from path:{}".format(img_path))
gulpio.utils.ImageNotFound: Image is None from path:/run/media/local_admin/ESMI MD II/EPIC-KITCHENS/P01/rgb_frames/P01/P01_01/frame_0000000008.jpg
```
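A quick pre-flight check can help confirm whether that frame is truncated or empty on disk (an assumption about the cause, which is common when extraction was interrupted; `min_bytes` is an arbitrary illustrative threshold):

```python
from pathlib import Path

# Sketch: list frames that are suspiciously small and therefore likely to be
# unreadable by the gulp ingestion. Re-extracting just these frames is
# usually cheaper than re-running the whole pipeline.
def find_bad_frames(frame_dir, min_bytes=100):
    bad = []
    for jpg in sorted(Path(frame_dir).glob("*.jpg")):
        if jpg.stat().st_size < min_bytes:
            bad.append(jpg)
    return bad
```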