GithubHelp home page GithubHelp logo

bsespede / image-and-video-understanding Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 0.0 5.43 MB

Space-time volume instances for platform diving classification

Python 70.28% QML 11.00% TeX 18.72%
spatio-temporal-analysis classification

image-and-video-understanding's Introduction

Potential Topics

  • detect computer keyboard key hits from video; this is probably only feasible for a reduced set of allowed keys
    • possible approaches:
      • using space time volumes (Filip Ilic PhD) of the hand
      • directly using image segmentation (detecting shape of the hands) and learning individual key presses from these (stationary) images with NN
    • data source:
      • self made
    • potential issues:
      • distinguish pressing/touching keys
  • detecting different routines from platfom/cliff diving videos
    • possible approaches:
      • using space time volumes (Filip Ilic PhD)
    • data source:
      • videos from red bull cliff diving world series competitions
    • potential issues:
      • ?
  • cloud forecast based on satellite image sequence or rain forecast based on weather radar image sequence
    • possible approach:
      • R(C)NN
    • data source:
      • one of the datasets from kaggle
      • ZAMG, or equivalent ...
  • If we want to keep the effort low, we could do something similar to the examples in Chapters 11-14 from this book: Deep Learning Cookbook, Douwe Osinga

Space-Time Volume Gesture Classification

Image Preprocessing

  • morphological operations if necessary

Feature Extraction

Maybe do a principal component analysis on the potential features to find most descriptive features.

  1. extract contours
    • perform pointPolygonTest to detect overlapping/occluding volumes
    • contour tracking? from lecture slides: Kass, Witkin, Terzopoulos, 1st ICCV, London, 1987
  2. fit shape
    • fitEllipse
    • fitLine
    • minAreaRect
  3. compute scalar feature values
    • for 2-3 largest shapes:
      • absolute orientation angle of two biggest objects
      • relative orientation angle
      • relative distance
      • distance to center of bounding box
      • relative size w.r.t. bounding box
      • elongation/compactness
    • number of shapes in bounding box
      • with relative size/area w.r.t. bounding box greater than some threshold

Feature normalization/centering: could become interesting with the absolute angle as 0=pi=2pi -> maybe rough discretization makes sense or (co)sine

Feature/Token Selection - Outlier Suppression

  • minimum relative area of fitted shape w.r.t. boundingbox
  • maybe ransac

Feature Filtering

  • matchShapes for STV contours over time:
    • if there are discontinuities in the matching value -> interpolate
    • or: use it for RANSAC
  • median filtering for scalar feature values

ML

  • CNN with 1D convolution layers and pooling layers
  • fully connected NN on entire feature-over-time sequences
  • RNN ?
  • Cost function: cross entropy

Code structure

  • Data preprocessing:
    • classifying-tool.py
    • visualization-tool.py
  • Feature extraction:
    • batch_extract_features.py
  • Training:
    • svm_train_test.py
  • Evaluation/visualisation:
    • test.py: run feature extraction on individual video
    • evaluate.py: visualize feature extraction & classification result of individual video
    • box_plot_accuracy.py

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.