justinmealey / whisper-node

This project forked from ariym/whisper-node

Home Page: https://npmjs.com/whisper-node

License: MIT License

JavaScript 25.29% TypeScript 74.71%

whisper-node

Node.js bindings for OpenAI's Whisper. Transcription is done locally.

Features

  • Output transcripts to JSON (also .txt .srt .vtt)
  • Optimized for CPU (Including Apple Silicon ARM)
  • Timestamp precision down to a single word

Installation

  1. Add dependency to project
npm install whisper-node
  2. Download whisper model of choice [OPTIONAL]
npx whisper-node download

Requirement for Windows: the make command must be installed.

Usage

import whisper from 'whisper-node';

const transcript = await whisper("example/sample.wav");

console.log(transcript); // output: [ {start,end,speech} ]

Output (JSON)

[
  {
    "start":  "00:00:14.310", // time stamp begin
    "end":    "00:00:16.480", // time stamp end
    "speech": "howdy"         // transcription
  }
]
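
The entries above are plain objects, so they can be post-processed without any extra tooling. The following is a minimal sketch, assuming timestamps always use the "HH:MM:SS.mmm" form shown in the sample output. The names TranscriptLine, timestampToSeconds and transcriptToText are hypothetical helpers for illustration and are not part of whisper-node; the second sample entry is made up.

```typescript
// Shape of one transcript entry, taken from the JSON output shown above.
interface TranscriptLine {
  start: string;  // "HH:MM:SS.mmm"
  end: string;    // "HH:MM:SS.mmm"
  speech: string; // transcribed text
}

// Hypothetical helper: convert an "HH:MM:SS.mmm" timestamp to seconds.
function timestampToSeconds(timestamp: string): number {
  const parts = timestamp.split(":").map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error(`unexpected timestamp format: ${timestamp}`);
  }
  const [hours, minutes, seconds] = parts;
  return hours * 3600 + minutes * 60 + seconds;
}

// Hypothetical helper: join all entries into one plain-text string.
function transcriptToText(lines: TranscriptLine[]): string {
  return lines.map((line) => line.speech.trim()).join(" ");
}

// Illustrative data: the first entry matches the sample output,
// the second entry is invented for this example.
const sampleTranscript: TranscriptLine[] = [
  { start: "00:00:14.310", end: "00:00:16.480", speech: "howdy" },
  { start: "00:00:16.480", end: "00:00:17.000", speech: "partner" },
];

const first = sampleTranscript[0];
const durationSeconds = timestampToSeconds(first.end) - timestampToSeconds(first.start);

console.log(transcriptToText(sampleTranscript));
console.log(`first entry lasts about ${durationSeconds.toFixed(2)} s`);
```

In a real project the array would come from the whisper() call instead of the hard-coded sample.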

Full Options List

import whisper from 'whisper-node';

const filePath = "example/sample.wav"; // required

const options = {
  modelName: "base.en",       // default
  // modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
  whisperOptions: {
    language: 'auto',         // default (use 'auto' for auto detect)
    gen_file_txt: false,      // outputs .txt file
    gen_file_subtitle: false, // outputs .srt file
    gen_file_vtt: false,      // outputs .vtt file
    word_timestamps: true     // timestamp for every word
    // timestamp_size: 0      // cannot use along with word_timestamps:true
  }
}

const transcript = await whisper(filePath, options);
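
The comments in the options list mention two combinations that cannot be used together. A small pre-flight check can catch them before calling whisper(). This is only a sketch under stated assumptions: the WhisperNodeOptions interface is transcribed from the list above rather than imported from the package, and findOptionConflicts is a hypothetical helper that whisper-node does not provide.

```typescript
// Option shape transcribed from the list above (an assumption, not the
// package's own type definition). All fields are optional here.
interface WhisperNodeOptions {
  modelName?: string;
  modelPath?: string;
  whisperOptions?: {
    language?: string;
    gen_file_txt?: boolean;
    gen_file_subtitle?: boolean;
    gen_file_vtt?: boolean;
    word_timestamps?: boolean;
    timestamp_size?: number;
  };
}

// Hypothetical helper: returns a description of each conflicting combination.
function findOptionConflicts(options: WhisperNodeOptions): string[] {
  const problems: string[] = [];

  // 'modelPath' cannot be used along with 'modelName'.
  if (options.modelName !== undefined && options.modelPath !== undefined) {
    problems.push("modelName and modelPath cannot be used together");
  }

  // 'timestamp_size' cannot be used along with word_timestamps: true.
  const whisperOptions = options.whisperOptions;
  if (whisperOptions?.word_timestamps === true && whisperOptions?.timestamp_size !== undefined) {
    problems.push("timestamp_size cannot be used along with word_timestamps: true");
  }

  return problems;
}

const conflicts = findOptionConflicts({
  modelName: "base.en",
  whisperOptions: { word_timestamps: true, timestamp_size: 0 },
});
console.log(conflicts);
```

Returning a list instead of throwing lets the caller decide whether a conflict is fatal or only worth a warning.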

Input File Format

Files must be .wav with a 16 kHz sample rate (16000 Hz)

Example .mp3 file converted with an FFmpeg command: ffmpeg -i input.mp3 -ar 16000 output.wav
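
Before transcribing, it can help to confirm that a file really has the expected sample rate. The sketch below reads the rate from the "fmt " chunk of a RIFF/WAVE header using Node's Buffer API. It assumes a standard little-endian WAV layout; readWavSampleRate and buildMinimalWavHeader are hypothetical helpers for illustration and are not part of whisper-node.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: return the sample rate stored in a WAV file's "fmt " chunk.
function readWavSampleRate(wav: Buffer): number {
  const isRiffWave =
    wav.length >= 12 &&
    wav.toString("ascii", 0, 4) === "RIFF" &&
    wav.toString("ascii", 8, 12) === "WAVE";
  if (!isRiffWave) {
    throw new Error("not a RIFF/WAVE file");
  }

  // Walk the chunk list: each chunk is a 4-byte id, a 4-byte size, then data.
  let offset = 12;
  while (offset + 8 <= wav.length) {
    const chunkId = wav.toString("ascii", offset, offset + 4);
    const chunkSize = wav.readUInt32LE(offset + 4);
    if (chunkId === "fmt ") {
      // fmt data: audio format (2 bytes), channels (2 bytes), sample rate (4 bytes).
      if (offset + 16 > wav.length) {
        throw new Error("truncated fmt chunk");
      }
      return wav.readUInt32LE(offset + 12);
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + chunkSize + (chunkSize % 2);
  }
  throw new Error("fmt chunk not found");
}

// Demo-only helper: build a 44-byte header for an empty 16-bit mono PCM file.
function buildMinimalWavHeader(sampleRate: number): Buffer {
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36, 4);             // remaining file size
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);            // fmt chunk size
  header.writeUInt16LE(1, 20);             // PCM
  header.writeUInt16LE(1, 22);             // mono
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * 2, 28); // byte rate
  header.writeUInt16LE(2, 32);             // block align
  header.writeUInt16LE(16, 34);            // bits per sample
  header.write("data", 36, "ascii");
  header.writeUInt32LE(0, 40);             // no audio data
  return header;
}

const detectedRate = readWavSampleRate(buildMinimalWavHeader(16000));
console.log(detectedRate === 16000 ? "sample rate ok" : `unexpected rate: ${detectedRate}`);
```

For a file on disk, the buffer could come from fs.readFileSync on the .wav path; a mismatch suggests the file should be converted with the FFmpeg command above first.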

Roadmap

  • Support projects not using TypeScript
  • Allow custom directory for storing models
  • Config files as alternative to model download cli
  • Remove path, shelljs and prompt-sync package for browser, react-native expo, and webassembly compatibility
  • fluent-ffmpeg to automatically convert to 16 kHz .wav files as well as support separating audio from video
  • pyannote diarization for speaker names
  • Implement WhisperX as optional alternative model for diarization and higher precision timestamps (as alternative to C++ version)
  • Add option for viewing detected language as described in Issue 16
  • Include TypeScript types in d.ts file
  • Add support for language option
  • Add support for transcribing audio streams as already implemented in whisper.cpp

Modifying whisper-node

npm run dev - runs nodemon and tsc on '/src/test.ts'

npm run build - runs tsc, outputs to '/dist' and gives sh permission to 'dist/download.js'

Contributors

ariym, justinmealey, kaivinc, casperwarnich
