ai4bharat / openhands

👐 OpenHands: Making Sign Language Recognition Accessible

Home Page: https://openhands.readthedocs.io

License: Apache License 2.0

Python 100.00%

openhands's People

Contributors

gokulnc, grohith327, manideepladi, prem-kumar27, sumit-negi-github


openhands's Issues

Convert Holistic to BlazePose+Hands

Holistic appears to be just BlazePose + Hands + FaceMesh.
For faster keypoint generation, it is better to use only the first two, as in the INCLUDE repo.
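
A minimal sketch of this, assuming MediaPipe's Python solutions API (file name illustrative):

import cv2
import mediapipe as mp

# Run BlazePose and Hands separately instead of Holistic, skipping FaceMesh
pose = mp.solutions.pose.Pose(static_image_mode=False)
hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=2)

cap = cv2.VideoCapture("video.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    body = pose.process(rgb)    # 33 body landmarks
    palms = hands.process(rgb)  # up to 2 x 21 hand landmarks
    # ...collect body.pose_landmarks and palms.multi_hand_landmarks here
cap.release()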

Support for UniformSampling and Frame-Skipping

This is important for models that accept only a fixed number of timesteps as input.

1. Support for UniformSampling

  • Rename TemporalSubsample to TemporalSample
  • Add a param: subsample_prob
    • If 0 < subsample_prob < 1, randomly choose between sub-sampling and uniform sampling
    • If subsample_prob == 0, always uniform-sample (for test sets)
    • If subsample_prob == 1, always sub-sample (to reproduce results of papers that use only sub-sampling)

2. Support for FrameSkipping

  • Create a new augmentation class with a param: skip_frames
  • Keep one frame per skip_frames frames and return the result (see the sketch below)
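
A minimal sketch of both proposals, assuming pose tensors shaped (C, T, V); the class and parameter names follow this issue, the rest is illustrative:

import torch

class TemporalSample:
    def __init__(self, num_frames, subsample_prob=0.5):
        self.num_frames = num_frames
        self.subsample_prob = subsample_prob

    def __call__(self, x):  # x: (C, T, V)
        T = x.shape[1]
        if T <= self.num_frames:
            return x
        if torch.rand(1).item() < self.subsample_prob:
            # random sub-sampling: frames without replacement, temporal order kept
            idx = torch.randperm(T)[: self.num_frames].sort().values
        else:
            # uniform sampling: evenly spaced frames (always taken when prob == 0)
            idx = torch.linspace(0, T - 1, self.num_frames).long()
        return x[:, idx, :]

class FrameSkipping:
    def __init__(self, skip_frames):
        self.skip_frames = skip_frames

    def __call__(self, x):  # x: (C, T, V)
        # keep one frame per skip_frames frames
        return x[:, :: self.skip_frames, :]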

Wrong st_gcn checkpoint files provided for GSL

import omegaconf
from openhands.apis.inference import InferenceModel

cfg = omegaconf.OmegaConf.load("GSL/gsl/st_gcn/config.yaml")
model = InferenceModel(cfg=cfg)
model.init_from_checkpoint_if_available()
if cfg.data.test_pipeline.dataset.inference_mode:
    model.test_inference()
else:
    model.compute_test_accuracy()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_6585/2983784194.py in <module>
      4 cfg = omegaconf.OmegaConf.load("GSL/gsl/st_gcn/config.yaml")
      5 model = InferenceModel(cfg=cfg)
----> 6 model.init_from_checkpoint_if_available()
      7 if cfg.data.test_pipeline.dataset.inference_mode:
      8     model.test_inference()

~/OpenHands/openhands/apis/inference.py in init_from_checkpoint_if_available(self, map_location)
     47         print(f"Loading checkpoint from: {ckpt_path}")
     48         ckpt = torch.load(ckpt_path, map_location=map_location)
---> 49         self.load_state_dict(ckpt["state_dict"], strict=False)
     50         del ckpt
     51 

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1050         if len(error_msgs) > 0:
   1051             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1052                                self.__class__.__name__, "\n\t".join(error_msgs)))
   1053         return _IncompatibleKeys(missing_keys, unexpected_keys)
   1054 

RuntimeError: Error(s) in loading state_dict for InferenceModel:
	size mismatch for model.encoder.A: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.st_gcn_networks.0.gcn.conv.weight: copying a param with shape torch.Size([128, 2, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 2, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.0.gcn.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for model.encoder.st_gcn_networks.1.gcn.conv.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 64, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.1.gcn.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for model.encoder.st_gcn_networks.2.gcn.conv.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 64, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.2.gcn.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for model.encoder.st_gcn_networks.3.gcn.conv.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 64, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.3.gcn.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]).
	size mismatch for model.encoder.st_gcn_networks.4.gcn.conv.weight: copying a param with shape torch.Size([256, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 64, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.4.gcn.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for model.encoder.st_gcn_networks.5.gcn.conv.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 128, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.5.gcn.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for model.encoder.st_gcn_networks.6.gcn.conv.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 128, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.6.gcn.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for model.encoder.st_gcn_networks.7.gcn.conv.weight: copying a param with shape torch.Size([512, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 128, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.7.gcn.conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for model.encoder.st_gcn_networks.8.gcn.conv.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 256, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.8.gcn.conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for model.encoder.st_gcn_networks.9.gcn.conv.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 256, 1, 1]).
	size mismatch for model.encoder.st_gcn_networks.9.gcn.conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for model.encoder.edge_importance.0: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.1: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.2: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.3: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.4: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.5: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.6: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.7: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.8: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
	size mismatch for model.encoder.edge_importance.9: copying a param with shape torch.Size([2, 27, 27]) from checkpoint, the shape in current model is torch.Size([3, 27, 27]).
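
The adjacency tensors differ in their first dimension (2 in the checkpoint vs 3 in the model), which usually corresponds to the number of adjacency subsets produced by the ST-GCN graph partitioning strategy, so the released checkpoint appears to have been trained with a different partition setting than the provided config builds. A minimal diagnostic sketch (checkpoint file name illustrative):

import torch

ckpt = torch.load("GSL/gsl/st_gcn/model.ckpt", map_location="cpu")
for name, param in ckpt["state_dict"].items():
    if "encoder.A" in name or "edge_importance" in name:
        # e.g. (2, 27, 27) here vs (3, 27, 27) in the freshly built model
        print(name, tuple(param.shape))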

Question about 'Config-based training'

I tried the code from 'Config-based training', as below.

import omegaconf
from openhands.apis.classification_model import ClassificationModel
from openhands.core.exp_utils import get_trainer
import os 

os.environ["CUDA_VISIBLE_DEVICES"]="2,3"
cfg = omegaconf.OmegaConf.load("examples/configs/lsa64/decoupled_gcn.yaml")
trainer = get_trainer(cfg)


model = ClassificationModel(cfg=cfg, trainer=trainer)
model.init_from_checkpoint_if_available()
model.fit()
/raid/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:747: UserWarning: You requested multiple GPUs but did not specify a backend, e.g. `Trainer(accelerator="dp"|"ddp"|"ddp2")`. Setting `accelerator="ddp_spawn"` for you.
  "You requested multiple GPUs but did not specify a backend, e.g."
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
/raid/xxx/OpenHands/openhands/apis/inference.py:21: LightningDeprecationWarning: The `LightningModule.datamodule` property is deprecated in v1.3 and will be removed in v1.5. Access the datamodule through using `self.trainer.datamodule` instead.
  self.datamodule.setup(stage=stage)
Found 64 classes in train splits
Found 64 classes in test splits
Train set size: 2560
Valid set size: 320
/raid/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/core/datamodule.py:424: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
  f"DataModule.{name} has already been called, so it will not be called again. "
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2,3]
Traceback (most recent call last):
  File "study_train.py", line 15, in <module>
    model.fit()
  File "/raid/xxx/OpenHands/openhands/apis/classification_model.py", line 104, in fit
    self.trainer.fit(self, self.datamodule)
  File "/raid/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/raid/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 917, in _run
    self._dispatch()
  File "/raid/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 985, in _dispatch
    self.accelerator.start_training(self)
  File "/raid/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/raid/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 158, in start_training
    mp.spawn(self.new_process, **self.mp_spawn_kwargs)
  File "/raid/xxx/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/raid/xxx/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 148, in start_processes
    process.start()
  File "/raid/xxx/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/raid/xxx/anaconda3/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/raid/xxx/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/raid/xxx/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/raid/xxx/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/raid/xxx/anaconda3/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'DecoupledGCN_TCN_unit.__init__.<locals>.<lambda>'
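
A likely cause: PyTorch Lightning fell back to the ddp_spawn plugin, which pickles the LightningModule to spawn worker processes, and a lambda defined inside __init__ cannot be pickled. Below is a minimal sketch of a spawn-safe rewrite (the class skeleton is illustrative, not the repository's actual code); alternatively, exposing a single GPU via CUDA_VISIBLE_DEVICES avoids the spawn path entirely.

import torch.nn as nn

def zero_residual(x):
    # module-level function: picklable, unlike a lambda created in __init__
    return 0

class DecoupledGCN_TCN_unit(nn.Module):  # illustrative skeleton only
    def __init__(self, residual=True):
        super().__init__()
        # instead of: self.residual = lambda x: x  (breaks pickling under ddp_spawn)
        self.residual = nn.Identity() if residual else zero_residual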

Lower accuracy when inferring a single video

Hello,

When I supply the inference model with multiple videos, it predicts all of them correctly. But if I supply only one video, the prediction is wrong. What could cause this? Can anyone please explain?

Thank you!

Consistent Dataset Handling

Very nice repo and documentation!

I think this repository can benefit from using https://github.com/sign-language-processing/datasets as data loaders.

It is fast, consistent across datasets, and allows loading videos/poses from multiple datasets.
If a dataset you are using is not there, you can ask for it or add it yourself; it is a breeze.

The repo supports many datasets, multiple pose estimation formats, binary pose files, fps and resolution manipulations, and dataset disk mapping.

Finally, this would make this repo less complex: this repo does pre-training and fine-tuning, the other repo does datasets, and they could be used together.

Please consider :)
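
A minimal usage sketch, following the tfds-based pattern from that repository's README (dataset name illustrative; some datasets need extra download configuration):

import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # registers the sign language datasets with tfds

autsl = tfds.load("autsl", split="train")
for example in autsl.take(1):
    print(example.keys())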

Question about GSL dataset

I cannot figure out how to get the isolated gloss sign language recognition (GSL isol.) data (xxx_signerx_repx_glosses); I can only find the continuous sign language recognition data (xxx_signerx_repx_sentences) at https://zenodo.org/record/3941811.

Thank you very much for any information about this.

Add augmentations for viewpoint shift (camera angle and distance)

As in:
Contrastive Self-Supervised Learning for Skeleton Action Recognition


Viewpoint Augmentations:

  • Angle shift
    • Implemented as a rotation transform of the original coordinate system
    • Currently, the implemented RotationTransform only rotates about one axis; extend it to two axes (or, if possible, full 3D). See the sketch after this list.
  • Distance shift
    • (Since we apply ScaleNormalization before sending data to the network, is this necessary?)
    • (Also, shouldn't ScaleTransform handle this?)
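
A minimal sketch of the angle shift, assuming keypoints shaped (T, V, 3); function and parameter names are illustrative:

import numpy as np

def rotate_viewpoint(kps, max_angle=np.radians(15)):
    # sample small random rotations about the y- and x-axes to mimic a camera shift
    ay, ax = np.random.uniform(-max_angle, max_angle, size=2)
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    # rotate every (x, y, z) keypoint: row vectors times the transposed rotation
    return kps @ (Ry @ Rx).T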

Retraining support by replacing final FC classifier

For any model, add support for retraining from a given checkpoint that was trained on some other dataset.

Suggestions:

  • Make FC the base class for LSTM and BERT
  • If a backbone is passed in the config, load the given checkpoint except for the final layer (see the sketch below).
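
A minimal sketch of the checkpoint surgery, assuming `model` is a freshly built ClassificationModel; the "model.decoder.fc" prefix is an assumption, not confirmed from the repository:

import torch

ckpt = torch.load("pretrained_other_dataset.ckpt", map_location="cpu")
backbone_state = {
    k: v for k, v in ckpt["state_dict"].items()
    if not k.startswith("model.decoder.fc")  # skip the final classifier
}
# strict=False leaves the new FC layer randomly initialized for the new classes
model.load_state_dict(backbone_state, strict=False)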

Pose Interpolation for failed frames

Add an augmentation that repairs missing keypoints in intermediate frames by interpolating between the previous good frame and the next good frame.
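
A minimal sketch, assuming keypoints shaped (T, V, C) and that failed frames are all-zero (the failure test is illustrative):

import numpy as np

def interpolate_missing_frames(kps):
    T = kps.shape[0]
    good = ~np.all(kps == 0, axis=(1, 2))  # frames with valid detections
    if good.all() or not good.any():
        return kps
    t = np.arange(T)
    out = kps.copy()
    for v in range(kps.shape[1]):
        for c in range(kps.shape[2]):
            # linear interpolation; leading/trailing bad frames copy the nearest good frame
            out[~good, v, c] = np.interp(t[~good], t[good], kps[good, v, c])
    return out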

ST-GCN does not work for mediapipe

Currently, the OpenPose layout seems to be hardcoded in graph_utils.py's Graph class.
Should we also add a layout for MediaPipe, or pass the joints via YAML? (One possible shape for the config-driven option is sketched below.)
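
A minimal sketch of the config-driven option; the class shape and YAML keys are illustrative, not the repository's actual graph_utils code:

import numpy as np

class Graph:
    """Skeleton graph built from a configurable edge list."""

    def __init__(self, num_nodes, edges):
        self.num_nodes = num_nodes
        A = np.zeros((num_nodes, num_nodes))
        for i, j in edges:
            A[i, j] = A[j, i] = 1
        np.fill_diagonal(A, 1)  # self-links
        self.A = A

# driven from YAML, e.g.:
# graph:
#   num_nodes: 27
#   edges: [[0, 1], [0, 2], ...]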

Installation issue

Hello, thank you for providing such a great framework, but I get an error when I import the module. Could you please help me?
Code:

import omegaconf
from openhands.apis.classification_model import ClassificationModel
from openhands.core.exp_utils import get_trainer

cfg = omegaconf.OmegaConf.load("1.yaml")
trainer = get_trainer(cfg)

model = ClassificationModel(cfg=cfg, trainer=trainer)
model.init_from_checkpoint_if_available()
model.fit()

ERROR:

Traceback (most recent call last):
  File "/home/hxz/project/pose_SLR/main.py", line 3, in <module>
    from openhands.apis.classification_model import ClassificationModel
ModuleNotFoundError: No module named 'openhands.apis'

Using `pose-format` for consistent `.pose` files

It seems that for pose data you are using pkl and h5.
Also, you have a custom MediaPipe Holistic script.

Personally I believe it would be more shareable, and faster, to use a binary format like https://github.com/AmitMY/pose-format
Every pose file also declares its content, so you can transfer files between projects or convert them to different formats with relative ease.

Besides the fact that it has a Holistic loading script and supports multiple OpenPose formats, it is a binary format that is faster to load, supports loading into NumPy, PyTorch, and TensorFlow, and can perform several operations on poses.

It also allows visualizing pose files, separately or on top of videos. While admittedly that repository is not perfect, in my opinion it is better than having JSON or pkl files.
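
A minimal reading sketch, assuming pose-format's Pose API (file name illustrative):

from pose_format import Pose

with open("example.pose", "rb") as f:
    pose = Pose.read(f.read())

# a masked NumPy array shaped (frames, people, points, dims)
print(pose.body.data.shape)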
