
epic-kitchens / c1-action-recognition-tsn-trn-tsm


EPIC-Kitchens-100 Action Recognition baselines: TSN, TRN, TSM

License: Other

Python 9.31% Shell 0.05% Jupyter Notebook 90.65%

c1-action-recognition-tsn-trn-tsm's Issues

Horizontal flip not returning flipped images?

class GroupRandomHorizontalFlip:
    """Randomly horizontally flips the given PIL.Image with a probability of 0.5"""

    def __init__(self, is_flow=False):
        self.is_flow = is_flow

    @profile
    def __call__(self, img_group, is_flow=False):
        v = random.random()
        if v < 0.5:
            ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]
            if self.is_flow:
                for i in range(0, len(ret), 2):
                    ret[i] = ImageOps.invert(
                        ret[i]
                    )  # invert flow pixel values when flipping
        return img_group

It looks like ret is never returned; instead, the original images are returned. Can you check whether the code is applying the flip at all?

Additionally, the is_flow argument of the __call__ method appears to be unused.
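For reference, a minimal sketch of what the intended behaviour presumably is (returning the flipped group when the coin flip succeeds, the original group otherwise); this is only an illustration, not a confirmed fix from the maintainers:

import random

from PIL import Image, ImageOps


class GroupRandomHorizontalFlip:
    """Randomly horizontally flips the given PIL.Images with a probability of 0.5."""

    def __init__(self, is_flow=False):
        self.is_flow = is_flow

    def __call__(self, img_group):
        if random.random() < 0.5:
            ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]
            if self.is_flow:
                # Invert the horizontal (u) flow channels when flipping; assumes
                # the group alternates u, v, u, v, ... frames.
                for i in range(0, len(ret), 2):
                    ret[i] = ImageOps.invert(ret[i])
            return ret  # return the flipped images
        return img_group  # no flip applied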

Is the accuracy based on each batch or the entire val dataset?

Thanks for sharing this great repo. I am looking at https://github.com/epic-kitchens/C1-Action-Recognition-TSN-TRN-TSM/blob/master/src/systems.py#L265, and I am trying to understand how the accuracy values were obtained. According to line 265, it seems the accuracy values are calculated on only a single batch rather than the entire validation dataset (I didn't find an accumulation operation, either). Can you confirm whether acc_1 and acc_5 are for a single batch or for the entire validation dataset?
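For context, the two interpretations in question, sketched in plain PyTorch (an illustration only, not the repo's confirmed behaviour; logits, labels, and val_loader are hypothetical names):

import torch

def topk_correct(logits, labels, k):
    """Number of samples whose true label is among the top-k predictions."""
    topk = logits.topk(k, dim=1).indices                      # (batch, k)
    return (topk == labels.unsqueeze(1)).any(dim=1).sum().item()

# Per-batch accuracy: computed and logged independently for every batch.
# Dataset-level accuracy: accumulate counts over the whole loader first.
correct_1 = correct_5 = total = 0
for logits, labels in val_loader:   # hypothetical loader yielding (logits, labels)
    correct_1 += topk_correct(logits, labels, 1)
    correct_5 += topk_correct(logits, labels, 5)
    total += labels.size(0)

acc_1 = 100.0 * correct_1 / total
acc_5 = 100.0 * correct_5 / total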

Action models always giving same output

I am trying to test the models on a personal egocentric dataset. Instead of creating a video dataset, I extract frames from the videos, stack them together (tested with frame_count 8 and 25), and feed them to the model (TSN and TSM).
My code is below:
import os

import torch
from omegaconf import OmegaConf
from PIL import Image
from torchvision import transforms

# Import path assumes running from the repo's src/ directory (src/systems.py).
from systems import EpicActionRecognitionSystem

folder = '/data/sample/'
transforms = transforms.Compose([transforms.CenterCrop(224),
                                 transforms.ToTensor()])

dict = torch.load('/data/tsn_rgb.ckpt', map_location="cpu")
cfg = OmegaConf.create(dict["hyper_parameters"])
OmegaConf.set_struct(cfg, False)

cfg.data._root_gulp_dir = os.getcwd()  # set default root gulp dir to prevent
# exceptions on instantiating the EpicActionRecognitionSystem
data_dir_key = f"test_gulp_dir"
cfg.data[data_dir_key] = folder
cfg.trainer.accelerator = None

system = EpicActionRecognitionSystem(cfg)
system.load_state_dict(dict["state_dict"])

Img = None
for i in range(1, 26):
    img = transforms(Image.open(folder + 'img_' + str(i) + '.jpg')).unsqueeze(dim=0)
    if i > 1:
        Img = torch.cat((Img, img), dim=0)
    else:
        Img = img

Img = Img.unsqueeze(dim=0)
print(Img.shape)  # torch.Size([1, 25, 3, 224, 224])
print(system)
out = system(Img)
print(out.shape)
v, n = out[:, :97], out[:, 97:]
print(v.shape, n.shape)
print(torch.mean(v), torch.mean(n))

This is my config:
{'modality': 'RGB', 'seed': 42, 'data': {'frame_count': 8, 'test_frame_count': 25, 'segment_length': 1, 'train_gulp_dir': '${data._root_gulp_dir}/rgb_train', 'val_gulp_dir': '${data._root_gulp_dir}/rgb_validation', 'test_gulp_dir': '/data/sample/', 'worker_count': 40, 'pin_memory': True, 'preprocessing': {'bgr': False, 'rescale': True, 'input_size': 224, 'scale_size': 256, 'mean': [0.485, 0.456, 0.406], 'std': [0.485, 0.456, 0.406]}, 'train_augmentation': {'multiscale_crop_scales': [1, 0.875, 0.75, 0.66]}, 'test_augmentation': {'rescale_size': 256}, '_root_gulp_dir': '/home/sanketthakur/Documents/gaze_pred/C1-Action-Recognition-TSN-TRN-TSM'}, 'model': {'type': 'TSN', 'backbone': 'resnet50', 'pretrained': 'imagenet', 'dropout': 0.7, 'partial_bn': True}, 'learning': {'batch_size': 4, 'optimizer': {'type': 'SGD', 'momentum': 0.9, 'weight_decay': 0.0005}, 'lr': 0.01, 'lr_scheduler': {'type': 'StepLR', 'gamma': 0.1, 'epochs': [20, 40]}}, 'trainer': {'gradient_clip_val': 20, 'max_epochs': 80, 'weights_summary': 'full', 'benchmark': True, 'terminate_on_nan': True, 'distributed_backend': 'dp', 'gpus': 0, 'accumulate_grad_batches': 2, 'accelerator': None}}

The network always predicts verb_id 0 and noun_id 1. I am not sure if I am doing something wrong here. Any help is appreciated.
Thanks.
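One hedged guess, for illustration only: the snippet above applies only CenterCrop and ToTensor, while cfg.data.preprocessing lists a scale_size, input_size, mean and std. A minimal sketch of a pipeline that also rescales and normalises (standard torchvision transforms; whether this is actually the cause of the constant predictions is unconfirmed):

from torchvision import transforms

preproc = cfg.data.preprocessing  # loaded from the checkpoint as above
transform = transforms.Compose([
    transforms.Resize(preproc.scale_size),      # 256 in this config
    transforms.CenterCrop(preproc.input_size),  # 224 in this config
    transforms.ToTensor(),
    transforms.Normalize(mean=list(preproc.mean), std=list(preproc.std)),
])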

Confirm results of pretrained models

Hi!

I was testing the pre-trained model, TSM RGB, and I got odd results in the validation set.

For action@1, I got 28.23 while you reported 35.75

all_action_accuracy_at_1: 28.237484484898633
all_action_accuracy_at_5: 47.6934215970211
all_noun_accuracy_at_1: 39.68762929251138
all_noun_accuracy_at_5: 65.98055440628879
all_verb_accuracy_at_1: 57.03351261894911
all_verb_accuracy_at_5: 86.38808440215143
tail_action_accuracy_at_1: 12.045088566827697
tail_noun_accuracy_at_1: 20.157894736842106
tail_verb_accuracy_at_1: 28.40909090909091

commit: d58e695

Steps

  • I generated the results on the validation set with this repo.
  • Then, I evaluated them with the corresponding code.

mean and std have the same values in the RGB checkpoints

Hi,
I downloaded the TSN (RGB) checkpoint to test it. After looking at the configuration attributes, I saw that the mean and std values for data pre-processing are actually the same. Is this on purpose?

If I do:
ckpt = torch.load("path/to/tsn_rgb.ckpt", map_location=lambda storage, loc: storage)
cfg = OmegaConf.create(ckpt["hyper_parameters"])
OmegaConf.set_struct(cfg, False)
cfg.data._root_gulp_dir = os.getcwd()
print(cfg.data.preprocessing.mean)
print(cfg.data.preprocessing.std)

Then I have:
'mean': [0.485, 0.456, 0.406]
'std': [0.485, 0.456, 0.406]

Actually, after looking at the TRN and TSM checkpoints, I saw that they also have this problem.
Thanks.
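If the identical values are indeed a copy-paste slip, a hedged workaround is to override the std in the loaded config before building the system; the values below are the usual ImageNet statistics, not something confirmed by the maintainers:

# Override the (apparently duplicated) std with the standard ImageNet values.
cfg.data.preprocessing.std = [0.229, 0.224, 0.225]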

Reproducing PIL-SIMD

Hi,

I was wondering if anyone has reproduced the PIL-SIMD patching of the conda environment recently?

Thanks!

Converting RGB idx to Flow Pandas SettingWithCopyWarning

When running src/convert_rgb_to_flow_frame_idxs.py I get a Pandas SettingWithCopyWarning and am not sure whether it is actually causing a problem.

~/GIT/C1-Action-Recognition-TSN-TRN-TSM(master*) » ./run_flow_convert.sh smc@x86_64-conda-linux-gnu
src/convert_rgb_to_flow_frame_idxs.py:41: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
year_df[col] = convert_rgb_frame_to_flow_frame_idx(year_df[col], stride)
(epic100-models)
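The warning itself is often harmless but can hide silent no-ops; a common way to make it go away (a sketch, assuming year_df is a slice of a larger frame; df and the "year" column are illustrative names) is to take an explicit copy or assign through .loc:

# Option 1: make year_df an explicit copy so writes no longer target a view.
year_df = df[df["year"] == year].copy()
year_df[col] = convert_rgb_frame_to_flow_frame_idx(year_df[col], stride)

# Option 2: write through .loc on the original frame instead.
df.loc[df["year"] == year, col] = convert_rgb_frame_to_flow_frame_idx(
    df.loc[df["year"] == year, col], stride
)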

How can I change the frame rate for training and evaluation?

Thanks for the dataset and the PyTorch code. I have a question about how to change the frame rate during training/evaluation.
I am looking at tsn_rgb.yaml right now. Which attribute in this config file controls the frame rate? For example, if I want to lower the frame rate for action recognition (i.e. predict based on fewer frames per segment), how should I modify the yaml file or the Python code?
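For reference, the checkpoint hyper-parameters shown in an earlier issue include data.frame_count and data.test_frame_count, which presumably control how many frames are sampled per action segment. A hedged sketch of overriding them (the config path and the exact CLI exposure are assumptions):

# Hydra-style command-line override (assuming train.py exposes these keys):
#   python src/train.py ... data.frame_count=4 data.test_frame_count=4

# Or override the loaded OmegaConf config in Python:
from omegaconf import OmegaConf

cfg = OmegaConf.load("src/config/tsn_rgb.yaml")  # path is illustrative
cfg.data.frame_count = 4        # frames sampled per segment at train time
cfg.data.test_frame_count = 4   # frames sampled per segment at test time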

Enable ipdb via config

Does anyone want a patch to enable or disable ipdb in train.py from the config file?

Here you go:

from contextlib import nullcontext

context_manager = nullcontext
if cfg.debug:
    import ipdb

    context_manager = ipdb.launch_ipdb_on_exception

with context_manager():
    ...  # e.g. wrap the trainer.fit(...) call here

Random crash when num_workers is larger than 0

Thanks for sharing these great resources. I tried to run the code on our server, but it randomly crashes when I set num_workers larger than 0 (sometimes it crashed at epoch 1, other times after 6 epochs). With num_workers set to 0 it didn't crash, but training was extremely slow.

The error messages are like these:

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: initialization error
Exception raised from insert_events at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f19cd7288b2 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f19cd97af20 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f19cd713b7d in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5f9e52 (0x7f1a4c9dfe52 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/_internal/utils.py", line 356, in <lambda>
    lambda: hydra.run(
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/core/utils.py", line 125, in run_job
    ret.return_value = task_function(task_cfg)
  File "src/train.py", line 53, in main
    trainer.fit(system, datamodule=data_module)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
    results = self.accelerator_backend.train()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
    return self.train_or_test()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 561, in train
    self.train_loop.run_training_epoch()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 542, in run_training_epoch
    for batch_idx, (batch, is_last_batch) in train_dataloader:
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/profiler/profilers.py", line 85, in profile_iterable
    value = next(iterator)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 46, in _with_is_last
    last = next(it)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
    idx, data = self._get_data()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1024, in _get_data
    success, data = self._try_get_data()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 885, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 25051) exited unexpectedly
  

Could someone provide some guidance on how to get around this error?
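Not an official answer, but "CUDA error: initialization error" raised inside DataLoader workers is a classic symptom of CUDA being touched in forked worker processes. Two generic things people try, sketched below with no guarantee they apply here, are forcing the 'spawn' start method and keeping all CUDA calls out of the Dataset:

import torch.multiprocessing as mp

# Use 'spawn' instead of 'fork' so workers start with a clean CUDA state.
# Call this once, before any DataLoader or Trainer is created.
if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)

# Also make sure the Dataset's __getitem__ returns CPU tensors only;
# moving data to the GPU inside worker processes can trigger this error.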

Error launching training

Hi,

I got this error trying to launch the training. Could you please provide a pointer on how to debug that?

• I'm new to PyTorch Lightning. It seems that it modularizes the code too much.

  • I couldn't reach the function or class returning the wrong data type.

Details

The output of my shell

$ ls _root_gulp_dir
flow_test  flow_validation  flow_train  rgb_test  rgb_validation  rgb_train

BTW, I have already visualized the data.

Sampling strategy for training the TSM architecture

Hi!

First, thank you for the nice repository, really helpful. I have a question regarding the sampling strategy you used to train TSM architecture using just RGB frames, the one from the Pretrained Models table.

From the config file, I see that you use 8 frames. However, I have been checking your EPIC Kitchens paper, and also the original TSM paper, and I have not been able to find how these 8 frames are sampled from the complete video sequence for a given action.

  • Are those frames consecutive?
  • Are they uniformly sampled from the complete video sequence? E.g. if the sequence has 300 frames, we select frames [0, 43, 86, 129, 171, 214, 257, 300] (see the sketch at the end of this issue).
  • Any other sampling strategy?

Thank you!

Alex.
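For illustration, the second option (uniform, segment-based sampling as described in the original TSN paper) can be sketched as follows; whether this is exactly what the repo does is not confirmed here:

import numpy as np

def uniform_segment_sample(num_frames: int, num_segments: int = 8, train: bool = True):
    """TSN-style sampling: split the clip into num_segments equal chunks and
    take one frame per chunk (random offset when training, centre otherwise)."""
    seg_len = num_frames / num_segments
    if train:
        offsets = np.random.randint(0, max(int(seg_len), 1), size=num_segments)
    else:
        offsets = np.full(num_segments, seg_len / 2)
    idxs = (np.arange(num_segments) * seg_len + offsets).astype(int)
    return np.clip(idxs, 0, num_frames - 1)

# e.g. uniform_segment_sample(300, 8, train=False) -> array([ 18,  56,  93, 131, 168, 206, 243, 281])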

Download link for TSM features not working

Hi!

I'm trying to download the RGB features extracted with TSM, but the link seems broken.
Do you have an alternative one or should I just wait?

Thanks in advance.

Kind regards,
Alessandro

Pickle EpicVideoDataset object for distributed processing training

Dear Will,

Is there a way to pickle the EpicVideoDataset object?
I was trying to use that class in code that uses Distributed Data Parallel with multiprocessing, and I got an error along the lines of "Can't pickle object EpicVideoDataset".

Is that related to the GULPReader? Do you know a workaround for that?

Any thoughts or pointers would be appreciated.

Thanks!
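Not a confirmed answer, but the usual pattern when a dataset holds an unpicklable reader handle is to drop the handle from the pickled state and re-open it lazily inside each worker. A generic sketch (open_reader and the wrapper class are hypothetical names, not the repo's API):

class LazyReaderDataset:
    """Wraps a reader-backed dataset so it can cross process boundaries."""

    def __init__(self, reader_path):
        self.reader_path = reader_path
        self._reader = None  # opened lazily in the worker process

    def _get_reader(self):
        if self._reader is None:
            self._reader = open_reader(self.reader_path)  # hypothetical factory
        return self._reader

    def __getstate__(self):
        # Drop the live handle so pickling (DDP / multiprocessing) succeeds.
        state = self.__dict__.copy()
        state["_reader"] = None
        return state

    def __getitem__(self, idx):
        return self._get_reader()[idx]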

src/convert_rgb_to_flow_frame_idxs.py:41: SettingWithCopyWarning

When running src/convert_rgb_to_flow_frame_idxs.py I get a Pandas SettingWithCopyWarning and am not sure whether it is actually causing a problem.

~/GIT/C1-Action-Recognition-TSN-TRN-TSM(master*) » ./run_flow_convert.sh smc@x86_64-conda-linux-gnu
src/convert_rgb_to_flow_frame_idxs.py:41: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

year_df[col] = convert_rgb_frame_to_flow_frame_idx(year_df[col], stride)

Switch to gulpio2

I've forked gulpio to gulpio2 to use faster JPEG decoding from simplejpeg. It should be a drop-in replacement, but we need to check that gulping the full dataset still works before merging this change in.
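If gulpio2 really is a drop-in replacement, the change on the consumer side should amount to swapping the import, sketched here under the assumption that gulpio2 mirrors gulpio's interface:

# Before:
# from gulpio import GulpDirectory
# After (assuming gulpio2 keeps the same API):
from gulpio2 import GulpDirectory

gulp_dir = GulpDirectory("/path/to/rgb_train")  # path is illustrative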

How to test a pretrained model

I downloaded the pretrained models and loaded them from checkpoints in the models folder:

python src/test.py \
    models/trn_rgb.ckpt \
    results/trn_rgb.pt \
    --split val 

But I get this error: TypeError: __init__() got an unexpected keyword argument 'row_log_interval'

How should I load a pretrained model and test it?

Thanks

Minor inconsistency in the gulp adaptor.

Hi, I found that in the RGB gulp adaptor, the stop_frame is inclusive.

for idx in range(meta["start_frame"], meta["stop_frame"] + 1)

whereas in the flow adaptor, the stop_frame is excluded.

for idx in range(start_frame, stop_frame)

I know that it makes little difference, but I just wanted to point it out: I ran into it while adapting the code to my training pipeline and got an error from trying to read one frame too many.

gulpio.utils.ImageNotFound

When running src/gulp_data.py I always get an error on the same frame, and the process stops.

raise ImageNotFound("Image is  None from path:{}".format(img_path))
gulpio.utils.ImageNotFound: Image is  None from path:/run/media/local_admin/ESMI MD II/EPIC-KITCHENS/P01/rgb_frames/P01/P01_01/frame_0000000008.jpg
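The exception just means the image loader returned None for that path, so it is worth checking whether the file is missing or corrupt before gulping. A small standalone check (the frame path is copied from the error above; adapt as needed):

from pathlib import Path

from PIL import Image

frame = Path("/run/media/local_admin/ESMI MD II/EPIC-KITCHENS/P01/rgb_frames/"
             "P01/P01_01/frame_0000000008.jpg")

if not frame.exists():
    print("file is missing entirely")
else:
    try:
        with Image.open(frame) as img:
            img.verify()  # raises if the JPEG is truncated or corrupt
        print("file opens fine; the problem is likely elsewhere")
    except Exception as exc:
        print(f"file is corrupt: {exc}")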
