
A Node app that uses Watson Visual Recognition, Speech to Text, Natural Language Understanding, and Tone Analyzer to enrich media files.

Home Page: https://developer.ibm.com/code/patterns/enrich-multi-media-files-using-ibm-watson/

License: Apache License 2.0


watson-multimedia-analyzer's Introduction


Using IBM Watson to enrich audio and visual files.

WARNING: This repository is no longer maintained.

This repository will not be updated. The repository will be kept available in read-only mode.

In this developer journey we will use Watson services to showcase how media (both audio and video) can be enriched on a timeline basis. Credit goes to Scott Graham for providing the initial application.

Flow

  1. Media file is passed into the Media Processor enrichment process.
  2. The Watson Speech to Text Service transcribes audio to text. The text is broken up into scenes based on a timer, a change in speaker, or a significant pause in speech.
  3. The Watson Natural Language Understanding Service pulls out keywords, entities, concepts, and taxonomy for each scene.
  4. The Watson Tone Analyzer Service extracts top emotions, social and writing tones for each scene.
  5. The Watson Visual Recognition Service takes a screen capture every 10 seconds and creates a 'moment'. Classifications, faces, and words are extracted from each screen capture.
  6. All scenes and 'moments' are stored in the Watson Cloudant NoSQL DB.
  7. The app UI displays stored scenes and 'moments'.
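
The flow above ultimately stores one document per scene (with its nested 'moments') in Cloudant. Below is a minimal sketch of what such a document might look like; the field names are illustrative assumptions, not the app's exact schema.

// Illustrative only -- the real schema is defined by the enrichment code in this repo.
const exampleScene = {
  episode_guid: 'grid-breakers',          // hypothetical identifier for the source media
  start_time: 30.0,                       // scene boundaries in seconds
  end_time: 55.2,
  transcript: '...text from Speech to Text...',
  nlu: { keywords: [], entities: [], concepts: [], categories: [] },
  tone: { emotion: [], social: [], writing: [] },
  moments: [                              // one 'moment' per screen capture (every 10 seconds by default)
    { time: 40.0, classifications: [], faces: [], words: [] }
  ]
};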

Watson Accelerators

Visit the Watson Accelerators portal to see more live patterns in action.

Included components

  • Watson Natural Language Understanding: An IBM Cloud service that can analyze text to extract metadata from content, such as concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles (see the sketch after this list).
  • Watson Speech-to-Text: A service that converts human voice into written text.
  • Watson Tone Analyzer: Uses linguistic analysis to detect communication tones in written text.
  • Watson Visual Recognition: Understands the contents of images: it tags visual concepts in an image, finds human faces, approximates age and gender, and finds similar images in a collection.
  • Cloudant NoSQL DB: A fully managed data layer designed for modern web and mobile applications that leverages a flexible JSON schema.
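
As one concrete example, the Natural Language Understanding enrichment could be invoked roughly like this with the watson-developer-cloud Node SDK. This is a hedged sketch: the repository wraps these calls in its own enrichment code, and the exact SDK version and option names may differ.

// Sketch only -- assumes the watson-developer-cloud SDK and username/password credentials.
const NaturalLanguageUnderstandingV1 = require('watson-developer-cloud/natural-language-understanding/v1');

const nlu = new NaturalLanguageUnderstandingV1({
  username: process.env.NATURAL_LANGUAGE_UNDERSTANDING_USERNAME,
  password: process.env.NATURAL_LANGUAGE_UNDERSTANDING_PASSWORD,
  version_date: '2017-02-27'
});

const sceneTranscript = 'Text produced by Speech to Text for a single scene...';

// Extract keywords, entities, concepts, and categories for one scene.
nlu.analyze({
  text: sceneTranscript,
  features: { keywords: {}, entities: {}, concepts: {}, categories: {} }
}, (err, analysis) => {
  if (err) {
    console.error('NLU enrichment failed:', err);
    return;
  }
  console.log(JSON.stringify(analysis, null, 2));
});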

Featured Technologies

  • Node.js: An asynchronous, event-driven JavaScript runtime designed to build scalable applications.
  • AngularJS: A JavaScript framework for building dynamic web applications.

Watch the Video

Steps

This journey contains two apps: the app server, which communicates with the Watson services and renders the UI, and the process media app, which enriches multimedia files. Both need to be run locally to enrich media files. Once media files are enriched, the app server can be deployed to IBM Cloud so that the UI can be run remotely.

NOTE: To enrich multimedia files, both the app server and enrichment process must be run locally.

For convenience, we recommend that you use the Deploy to IBM Cloud button to initially create the Watson services and deploy the Watson Multimedia Analyzer application. Using this feature will provide the following benefits:

  • All Watson services are automatically created and associated with the deployed app.
  • Watson service credentials will be centrally located and easily accessible.
  • Once you have completed this journey, all of the Watson services can be automatically deleted along with the deployed app.

Deploy to IBM Cloud

Deploy to IBM Cloud

  1. Press the above Deploy to IBM Cloud button and then click on Deploy.

  2. In Toolchains, click on Delivery Pipeline to watch while the app is deployed. Once deployed, the app can be viewed by clicking 'View app'.

  3. To see the app and services created and configured for this journey, use the IBM Cloud dashboard. The app is named watson-multimedia-analyzer with a unique suffix. The following services are created and easily identified by the wma- prefix:
    • wma-natural-language-understanding
    • wma-speech-to-text
    • wma-tone-analyzer
    • wma-visual-recognition
    • wma-cloudant

Note: Even though the watson-multimedia-analyzer app has been deployed to IBM Cloud and can be accessed remotely, it will not display correctly until the following steps are completed.

  1. Clone the repo
  2. Configure the Watson Multimedia Analyzer application
  3. Configure credentials
  4. Run application
  5. Enrich multimedia files
  6. View results in UI

1. Clone the repo

Clone the watson-multimedia-analyzer repo locally. In a terminal, run:

$ git clone https://github.com/ibm/watson-multimedia-analyzer

2. Configure the Watson Multimedia Analyzer application

Install package managers

Download and install Node.js and npm on your local system.

Install the Bower package manager:

npm install -g bower

Install dependencies

cd watson-multimedia-analyzer
npm install
bower install

3. Configure credentials

The credentials for IBM Cloud services (Visual Recognition, Speech to Text, Tone Analyzer, Natural Language Understanding, and Cloudant NoSQL DB), can be found in the Services menu in Bluemix, by selecting the Service Credentials option for each service.

Or, all of the credentials can be conveniently accessed by visiting the Connections IBM Cloud panel for the deployed app.

Copy the env.sample to .env.

$ cp env.sample .env

Edit the .env file with the necessary settings.

env.sample:

# Replace the credentials here with your own.
# Rename this file to .env before starting the app.

# Cloudant Credentials
# The name of your database (created when the app starts). You can leave this as the default below.
DB_NAME=video_metadata_db

# Cloudant NoSQL DB Credentials and Config options (Required)
DB_USERNAME=<add_db_username>
DB_PASSWORD=<add_db_password>
DB_HOST=<add_db_host_name>
DB_PORT=<add_db_port_num>
DB_URL=<add_db_url>

# Tone Analyzer Credentials
TONE_ANALYZER_USERNAME=<add_tone_username>
TONE_ANALYZER_PASSWORD=<add_tone_password>

# SpeechToText Credentials
SPEECH_TO_TEXT_USERNAME=<add_stt_username>
SPEECH_TO_TEXT_PASSWORD=<add_stt_password>

# Visual Recognition Key
VR_KEY=<add_vr_recognition_key>

# Natural Language Understanding Credentials
NATURAL_LANGUAGE_UNDERSTANDING_USERNAME=<add_nlu_username>
NATURAL_LANGUAGE_UNDERSTANDING_PASSWORD=<add_nlu_password>
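
The app reads these values from the environment when it starts. Below is a minimal sketch, assuming the dotenv package is used to load .env into process.env; the repository's actual loading code may differ.

// Sketch only: load .env and assemble the Cloudant connection settings.
require('dotenv').config();

const dbConfig = {
  url: process.env.DB_URL,
  host: process.env.DB_HOST,
  port: process.env.DB_PORT,
  username: process.env.DB_USERNAME,
  password: process.env.DB_PASSWORD
};

const dbName = process.env.DB_NAME || 'video_metadata_db';
console.log('Using Cloudant database:', dbName);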

4. Run application

npm start
  • Take note of the successful creation and deployment of the Cloudant NoSQL DB
watson-multimedia-analyzer $ npm start

> [email protected] start /test/watson-multimedia-analyzer
> node app.js | node_modules/.bin/pino

[2017-06-13T21:17:14.333Z] INFO (50150 on TEST-MBP.attlocal.net): AppEnv is: {"app":{},"services":{},"isLocal":true,"name":"test-multimedia-enrichment","port":6007,"bind":"localhost","urls":["http://localhost:6007"],"url":"http://localhost:6007"}
[2017-06-13T21:17:14.335Z] INFO (50150 on TEST-MBP.attlocal.net): cloudant_credentials null
[2017-06-13T21:17:14.336Z] INFO (50150 on TEST-MBP.attlocal.net): dbConfig  {"url":"https://65e02d54-e2d1-4ccb-a5db-72064d16f76d-bluemix:19f3a0601a8992be63e4a6cb449172a6ef3f1533e52669e96de93eb31e0115f2@65e02d54-e2d1-4ccb-a5db-72064d16f76d-bluemix.cloudant.com","host":"65e02d54-e2d1-4ccb-a5db-72064d16f76d-bluemix.cloudant.com","port":"443","username":"xxx","password":"xxx"}
[2017-06-13T21:17:14.368Z] INFO (50150 on TEST-MBP.attlocal.net): AppEnv is: {"app":{},"services":{},"isLocal":true,"name":"test-multimedia-enrichment","port":6007,"bind":"localhost","urls":["http://localhost:6007"],"url":"http://localhost:6007"}
[2017-06-13T21:17:14.368Z] INFO (50150 on TEST-MBP.attlocal.net): cloudant_credentials null
server starting on http://localhost:6007
[2017-06-13T21:17:15.053Z] INFO (50150 on TEST-MBP.attlocal.net): video_metadata_db_status Database already created!
[2017-06-13T21:17:15.058Z] INFO (50150 on TEST-MBP.attlocal.net): video_metadata_db Database already created!
[2017-06-13T21:17:15.058Z] INFO (50150 on TEST-MBP.attlocal.net): Successfully created database:  video_metadata_db
[2017-06-13T21:17:15.136Z] INFO (50150 on TEST-MBP.attlocal.net): Successfully Created views in database
[2017-06-13T21:17:15.136Z] INFO (50150 on TEST-MBP.attlocal.net): Views already exist.

5. Enrich multimedia files

To enrich media files, they need to be processed by the bin/processMedia script.

To run Speech to Text (STT) and Visual Recognition (VR) enrichment from the command line, you need to install ffmpeg and ffprobe.

# Install ffmpeg with the libopus audio codec enabled
# On OSX
brew install ffmpeg --with-opus
npm install node-ffprobe

# On Ubuntu (the packaged ffmpeg typically includes libopus support)
sudo apt-get install ffmpeg
npm install node-ffprobe

Enrichment is initiated via the command line using bin/processMedia. The usage for the command is as follows:

bin/processMedia --help

Usage: processMedia [options]

Options:

-h, --help output usage information
-d, --save-to-db save to db
-o, --save-to-file save to file
-S, --use-stt use STT
-V, --use-vr Use Visual Recognition
-r, --vr-rate <i> Visual Recognition Rate (default 10 seconds)
-m, --enrichment-model GAP|TIMED Enrichment Model
-g, --time-gap  Time Gap for GAP model
-f, --media-file filename Media File
-x, --xml-file filename XML URI or filename

Note: Using Visual Recognition will take significantly longer. It is worth testing your setup without the -V option first. Once the -S option or the subtitles are working correctly, add the -V option. There is a limit on your Visual Recognition account (250 images/day), so proceed with caution.

Enrich a local MP4/WAV file (Using STT)

If you have an MP4 or WAV file locally on your machine, you can enrich it directly. The file is copied to public/media_files automatically so you can use the UI to browse the results.

For convenience, use the supplied sample mp4 file:

# STT Only
bin/processMedia -S -f public/media_files/grid-breakers.mp4

# STT & VR (Will take a lot longer)
bin/processMedia -S -V -f public/media_files/grid-breakers.mp4

Enrich from a URL pointing to an MP4/WAV file (Using STT)

If you have an MP4 or WAV file at a URL or on YouTube, you can enrich it as follows:

# STT & VR (Will take a lot longer)
bin/processMedia -S -f http://someurl.com/somefilename.mp4

# (Youtube) STT & VR (Will take a lot longer)
bin/processMedia -S -V -r 10000 -f https://www.youtube.com/watch?v=_aGCpUeIVZ4

Note: Remember that the VR rate can QUICKLY eat up your 250 images, so choose wisely!

Enrich from a URL Feed:

If you have a remote URL that references an XML file in the 'schema/media' or 'mrss' format, you can enrich it by pointing to that URL:

bin/processMedia -V -x http://some.url.com/some_mrss.xml

Enrich a Media+Transcript file via an XML

Open the XML Template file (samples/episode_template.xml) and fill it out as noted. You MUST give it a GUID/Title/media:content and media:subTitle to make this work.

Save this file under a new name somewhere (for example, in feeds/):

bin/processMedia -V -x feeds/new_feed.xml

6. View results in UI

Point your browser to the URL specified when the server was started. For example:

http://localhost:6007/

The username and password are defined by the users object in app.js. The default username/password credentials are enrich/enrichit.

Note that the default credentials must NOT be removed. You can, however, add additional credentials.
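
For reference, here is a hedged sketch of how an additional login might be added; the actual shape of the users object in app.js may differ, so check the file before editing.

// Illustrative only -- keep the default entry and append new ones.
const users = {
  enrich: 'enrichit',            // default credentials: do NOT remove
  reviewer: 'choose-a-password'  // hypothetical additional user
};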

Deploy the Application to IBM Cloud

After you have enriched your media files, you can deploy the application to IBM Cloud so that you can view the UI remotely.

Note: If you already have the application deployed, you will either need to delete it (taking care not to also delete any associated services), or modify the manifest.yml to change the name of the application. The default name is watson-multimedia-analyzer.

  • Download and install the Cloud Foundry CLI tool.
  • Login to the Cloud Foundry service.
  • From the root directory of this project run the following command:
cf push
  • You should see a lot of activity as the application is deployed to IBM Cloud. At the end of the activity, the application should be 'Running'.
  • Access the application using the following url:
http://{BLUEMIX_APPLICATION_NAME}.mybluemix.net
  • When prompted for a username and password, use the credentials stored in app.js. The default username/password credentials are enrich/enrichit.

Note: If you enrich additional media files with Visual Recognition, you will need to re-deploy the application to IBM Cloud to view the new content.

Sample Output

Troubleshooting

  • ffmpeg reports error that "audio codec libopus is not available"

    Ensure that the audio codec libopus is included in the version of ffmpeg that you install. To check this, make sure it is listed using this command:

ffmpeg -encoders | grep opus
  • ffprobe reports error

    Ensure you are on at least version 3.3.1

  • Enrichment does not complete or reports errors

    Note that there are several IBM Cloud trial version limitations that you may run into if you attempt to enrich multiple or large MP4 files.

    Watson Tone Analyzer: max of 2,500 API calls.
    Solution: delete the service instance and create a new one.

    Watson Visual Recognition: max of 250 API calls per day.
    Solution: wait 24 hours to run again.

Links

Learn more

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
  • With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ

watson-multimedia-analyzer's People

Contributors

dolph, imgbot[bot], jamaya2001, kant, ljbennett62, markstur, rhagarty, scottdangelo, stevemart, swgraham, tomcli


watson-multimedia-analyzer's Issues

Create blogs

Some ideas for blogs:

  • define episodes, segments, scenes and moments, and how they are generated
  • how to utilize samples/episode_template.xml
  • the UI. How to navigate, define fields and values...

Documentation: YouTube video URL has a bad parameter value

See the code segment in the 'Enrich from a URL pointing to an MP4/WAV file (Using STT)' section above.
The YouTube video URL points to a video, v=_aGCpUeIVZ4.
This causes the following error when trying to store the document in the Cloudant DB.

_aGCpUeIVZ4 - failed to write to DB { Error: Only reserved document ids may start with underscore.
    at Request._callback (/Users/prafulll/code/personal/direkshanProjects/ibm-dw/watson-multimedia-analyzer/node_modules/cloudant-nano/lib/nano.js:248:15)
    at Request.self.callback (/Users/prafulll/code/personal/direkshanProjects/ibm-dw/watson-multimedia-analyzer/node_modules/request/request.js:186:22)
    at Request.emit (events.js:160:13)
    at Request.<anonymous> (/Users/prafulll/code/personal/direkshanProjects/ibm-dw/watson-multimedia-analyzer/node_modules/request/request.js:1163:10)
    at Request.emit (events.js:160:13)
    at IncomingMessage.<anonymous> (/Users/prafulll/code/personal/direkshanProjects/ibm-dw/watson-multimedia-analyzer/node_modules/request/request.js:1085:12)
    at Object.onceWrapper (events.js:255:19)
    at IncomingMessage.emit (events.js:165:20)
    at endReadableNT (_stream_readable.js:1101:12)
    at process._tickCallback (internal/process/next_tick.js:152:19)
  name: 'Error',
  error: 'illegal_docid',
  reason: 'Only reserved document ids may start with underscore.',
  scope: 'couch',
  statusCode: 400,
...
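
The failure occurs because the YouTube video id (_aGCpUeIVZ4) starts with an underscore, and Cloudant reserves document ids that begin with an underscore. One possible workaround (a sketch, not the repository's fix) is to sanitize the id before writing the document:

// Sketch only: prefix ids that begin with an underscore so Cloudant accepts them.
function safeDocId(id) {
  return id.charAt(0) === '_' ? 'yt' + id : id;
}

// safeDocId('_aGCpUeIVZ4') === 'yt_aGCpUeIVZ4'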

Bower packages break the right-side moment viewer

Doing a straight bower install, the bower packages will break the moment viewer. Launch an instance, start the GUI, and click "Show moment"; the following error is thrown:

angular.js:15018 TypeError: Failed to set the 'currentTime' property on 'HTMLMediaElement': The provided double value is non-finite. at Object.seekTime (videogular.js:405) at VideoController.js:66 at Scope.$broadcast (angular.js:19165) at MomentController.vm.gotoMoment (watson.moment.directive.js:68)

This is confirmed on Windows 10 and on Ubuntu 18.04 and 16.04.
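
One defensive workaround (a sketch only, not a confirmed fix for the underlying bower version mismatch) is to validate the seek time before it is handed to videogular, so a non-finite value never reaches HTMLMediaElement.currentTime:

// Sketch only: 'API' is assumed to be videogular's media API object exposing seekTime().
function gotoMomentSafely(API, time) {
  if (Number.isFinite(time)) {
    API.seekTime(time);
  } else {
    console.warn('Skipping seek: moment time is not a finite number:', time);
  }
}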

Application won't start after issuing npm start; it produces an error

Hi,

I followed all the steps and configured the services. Upon trying to run "npm start", it gives the following errors. Please help.

C:\My Projects\Watson\Sand\watson-multimedia-analyzer>npm start

[email protected] start C:\My Projects\Watson\Sand\watson-multimedia-analyzer
node app.js | node_modules/.bin/pino

'node_modules' is not recognized as an internal or external command,
operable program or batch file.

npm ERR! Windows_NT 6.1.7601
npm ERR! argv "c:\Program Files\nodejs\node.exe" "c:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js" "start"
npm ERR! node v4.5.0
npm ERR! npm v2.15.9
npm ERR! code ELIFECYCLE
npm ERR! [email protected] start: node app.js | node_modules/.bin/pino
npm ERR! Exit status 255
npm ERR!
npm ERR! Failed at the [email protected] start script 'node app.js | node_modules/.bin/pino '.
npm ERR! This is most likely a problem with the watson-multimedia-analyzer package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node app.js | node_modules/.bin/pino
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs watson-multimedia-analyzer
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!
npm ERR! npm owner ls watson-multimedia-analyzer
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR! C:\My Projects\Watson\Sand\watson-multimedia-analyzer\npm-debug.log

Thanks,
Vadiraj

Using the Deploy to Bluemix button, some files aren't running in strict mode.

Some files in lib/enricher do not enable strict mode when deployed with the Deploy to Bluemix button, which produces the following errors.

2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR /home/vcap/app/lib/enricher/index.js:32
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR   let text = null;
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR   ^^^
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR SyntaxError: Block-scoped declarations (let, const, function, class) not yet supported outside strict mode
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at exports.runInThisContext (vm.js:53:16)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Module._compile (module.js:373:25)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Object.Module._extensions..js (module.js:416:10)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Module.load (module.js:343:32)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Function.Module._load (module.js:300:12)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Module.require (module.js:353:17)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at require (internal/module.js:12:17)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Object.<anonymous> (/home/vcap/app/app.js:37:16)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Module._compile (module.js:409:26)
2017-09-18T10:03:00.37-0700 [APP/PROC/WEB/0] ERR     at Object.Module._extensions..js (module.js:416:10)
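
A straightforward mitigation, assuming the affected files simply lack the directive, is to add 'use strict' at the top of each file under lib/enricher so that older Node.js runtimes accept block-scoped declarations:

'use strict';

// With the directive in place, declarations like the one at lib/enricher/index.js:32
// are accepted even on Node.js versions that only support let/const in strict mode.
let text = null;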

Determine VR rate for documentation

  • processMedia command help lists default VR rate at 10 seconds.
  • processMedia command help states that the '-r' option takes msecs.

Verify and update help accordingly.

New authentication token-based Identity and Access Management (IAM)

The service has migrated to token-based Identity and Access Management (IAM) authentication for all locations. All IBM Cloud services now use IAM authentication. The Speech to Text service migrated in each location on the following dates:

Dallas (us-south): October 30, 2018
Frankfurt (eu-de): October 30, 2018
Washington, DC (us-east): June 12, 2018
Sydney (au-syd): May 15, 2018

Username and password credentials are deprecated for all of these services.

https://console.bluemix.net/docs/services/speech-to-text/release-notes.html#October2018b
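
For service instances created after these dates, the username/password fields shown in env.sample no longer apply. Below is a hedged sketch of IAM-style initialization with the watson-developer-cloud Node SDK; the exact option names depend on the SDK version, and the environment variables used here are hypothetical (they are not in env.sample).

// Sketch only -- IAM API key authentication instead of username/password.
const SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');

const speechToText = new SpeechToTextV1({
  iam_apikey: process.env.SPEECH_TO_TEXT_IAM_APIKEY, // hypothetical variable
  url: process.env.SPEECH_TO_TEXT_URL                // hypothetical variable
});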

Could not enrich the video in windows

Hello,

Everything seems to be OK except the video enrichment.
The command "bin/processMedia -S -V -f public/media_files/grid-breakers.mp4" does not work on windows.
Could you pls help me ?

Thanks in advance and BR,
Yassine.
