The EGTEA Gaze+ (EGTEA+) dataset includes three different train/test splits for action recognition, named train/test_split(1-3). This model uses a small subset of the EGTEA Gaze+ dataset. The subset belongs to split 1 and consists of the cropped videos from the folder P17-R04-ContinentalBreakfast under the ContinentalBreakfast category.
Because of limited resources, I trained the model with a custom CNN architecture similar to VGGNet. The model currently covers only one stream, the single-frame stream.
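To give a feel for what "similar to VGGNet" implies for a small single-frame network, here is a minimal sketch that traces feature-map shapes and convolution parameter counts through VGG-style blocks (stacked 3x3 convolutions followed by 2x2 max-pooling). The input size of 224x224, the block widths `[32, 64, 128]`, and the helper name `vgg_style_plan` are assumptions chosen for illustration. They are not the configuration used for this model.

```python
def vgg_style_plan(input_hw, in_channels, block_channels, convs_per_block=2):
    """Trace shapes and conv parameter counts for a VGG-style stack.

    Each block applies `convs_per_block` 3x3 convolutions (stride 1, padding 1,
    so height and width are preserved) followed by a 2x2 max-pool with stride 2
    (which halves height and width, rounding down).
    """
    h, w = input_hw
    c = in_channels
    total_params = 0
    shapes = []
    for out_c in block_channels:
        for _ in range(convs_per_block):
            total_params += 3 * 3 * c * out_c + out_c  # weights + biases
            c = out_c
        h, w = h // 2, w // 2  # effect of the 2x2 max-pool
        shapes.append((c, h, w))
    return shapes, total_params


# Hypothetical configuration: RGB frame, three blocks.
shapes, conv_params = vgg_style_plan((224, 224), 3, [32, 64, 128])
for channels, height, width in shapes:
    print(f"{channels} x {height} x {width}")
print("conv parameters:", conv_params)
```

Keeping the number of blocks and the channel widths small is one way to fit a VGG-like design into a tight compute budget, since the parameter count grows with the product of input and output channels in each layer.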
The dataset used is available here.
The logs of the training session are available here, and the model weights can be downloaded from here. The plots from the training session are as follows: