GithubHelp home page GithubHelp logo

vaishwi / capsum-joint-video-summarization-and-video-captioing Goto Github PK

View Code? Open in Web Editor NEW
3.0 1.0 1.0 31.98 MB

AI based Video summarizer along with captioning.

License: MIT License

Dockerfile 0.27% Python 58.10% HTML 1.10% JavaScript 11.19% CSS 29.33%
video-summarization video-captioning gcp-storage gcp-sql gcp-app-engine

capsum-joint-video-summarization-and-video-captioing's Introduction

CapSum: Joint Video Summarization and Video Captioning

Introduction

In the present day the goal of every application is to provide a method which reduces time and provides accurate results. The idea of the project is to provide an application which combines video summarization and video captioning. The software as services which are to be provided allows user to move through the application with the option to get summarized video and view previously summarized video. The goal is to provide a summarized video along with captions added to the summarized video. The services which are provided are depended on Google Cloud Platform (GCP) for storage and database both. The application firstly saves the time by reducing video length and secondly the captions provides information in written format. Video summarization and video captioning are considered two separate tasks in existing studies.

Dataset

The dataset benchmark TVSum consist of labeled keyshots video summarization, albeit still small for training deep models. The model is trained using frame-level scores, while the evaluation is performed with the binary keyshot summaries.In the augmented setting we also maintain the 5-fold cross validation with the 80/20 train/test, but add the other datasets to the training split.

Microsoft Video Description Corpus, is a collection of Youtube clips which is collected on Mechanical Turk by requesting workers to pick short clips depicting a single activity. The videos were then used to elicit single sentence descriptions from annotators. Also this dataset has more than one description for individual clips. While prepossessing there is division of the total dataset into three parts- Training, Validation and Testing as 60%, 20% and 20% respectively.

Models

Architecture Diagram

Architecture

CapSum

CapSum

Summarization

VASNet

Captioning

S2VT model

Results

Video Summarization

For demonstration 4 videos of resolution 854x480, frame-rate 24 and duration varying from 02:00 to 06:00 are used.

Link to the results: Summarization Result

Video Captioning

For testing use of clips with short time duration is done. This clips contain single activities which may include more than one person.

Link to the results: Captioning Result

CapSum (Combine)

After the summarized and captioning model is created, they are deployed. The website then receives the values of the generated which is updated for so.

Link to the results: CapSum Demo

Conclusions

After applying these models, for captioning we have been able to generate results which provided near accurate results. Whereas for summarization the reduction of frames was desirable in terms of providing the frames of high importance.

Reference

  1. Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes, “Tvsum: Summarizingweb videos using titles,” inProceedings of the IEEE conference oncomputer vision and pattern recognition, 2015, pp. 5179–5187.
  2. D. Chen and W. B. Dolan, “Collecting highly parallel data for paraphraseevaluation,” inProceedings of the 49th Annual Meeting of the Associ-ation for Computational Linguistics: Human Language Technologies,2011, pp. 190–200.
  3. J. Fajtl, H. Sokeh, V. Argyriou, D. Monekosso, and P. Remagnino,“Summarizing videos with attention,” 12 2018.
  4. S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, andK. Saenko, “Sequence to sequence-video to text,” inProceedings ofthe IEEE international conference on computer vision, 2015, pp. 4534–4542.

capsum-joint-video-summarization-and-video-captioing's People

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

Forkers

arpitpatel1501

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.