nikdanilov / whisper-obsidian-plugin

Speech-to-text in Obsidian using OpenAI Whisper

License: MIT License

TypeScript 87.93% JavaScript 9.12% CSS 2.70% Shell 0.25%

obsidian whisper openai-whisper speech-to-text stt transcribe voice

whisper-obsidian-plugin's Introduction

Speech-to-text in Obsidian using OpenAI Whisper 🗣️📝

Obsidian Whisper is a plugin that effortlessly turns your speech into written notes. Just speak your mind, and let Whisper from OpenAI do the rest!

🚀 Getting Started

This plugin can be installed from "Community Plugins" inside Obsidian.
For this plugin to work, you will need to provide your OpenAI API key. See the Settings section of this README file for more information.

🎯 How to Use

Access Recording Controls

Click on the ribbon button to open the recording controls interface.

Record Audio

Use the "Start" button to begin recording. You can pause and resume the recording using the "Pause/Resume" button. Click the "Stop" button once you're done. After stopping the recording, the plugin will automatically transcribe the audio and create a new note with the transcribed content and linked audio file in the specified folder.

You can quickly start or stop recording using the Alt + Q shortcut.

Upload Existing Audio File

You can also transcribe an existing audio file:

Open the command palette with Ctrl/Cmd + P.
Search for "Upload Audio File" and select it.
A file dialog will appear. Choose the audio file you want to transcribe.
The plugin will transcribe the selected file and create a new note with the content and linked audio file in the specified folder.

Command Palette for Quick Actions

Both "Start/Stop recording" and "Upload Audio File" actions can also be accessed quickly through the command palette.

For further explanation of using this plugin, check out the article "Speech-to-text in Obsidian using OpenAI Whisper Service" by TfT Hacker

⚙️ Settings

API Key: Input your OpenAI API key to unlock the advanced transcription capabilities of the Whisper API. You can obtain a key from OpenAI at this link. If you are not familiar with the concept of an API key, you can learn more about this at this link.
API URL: Specify the endpoint that will be used to make requests to the Whisper API. This should not be changed unless you have a specific reason to use a different endpoint.
Model: Choose the machine learning model to use for generating text transcriptions. This should not be changed unless you have a specific reason to use a different model.
Language: Specify the language spoken in the recording. For a list of languages and codes, consult this link.
Save recording: Toggle this option to save the audio file after sending it to the Whisper API. When enabled, you can specify the path in the vault where the audio files should be saved.
Recordings folder: Specify the path in the vault where to save the audio files. Example: folder/audio. This option is only available if "Save recording" is enabled.
Save transcription: Toggle this option to create a new file for each recording, or leave it off to add transcriptions at your cursor. When enabled, you can specify the path in the vault where the transcriptions should be saved.
Transcriptions folder: Specify the path in the vault where to save the transcription files. Example: folder/note. This option is only available if "Save transcription" is enabled.
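
As a rough illustration of how the API Key, API URL, Model, and Language settings could map onto a transcription request, here is a minimal sketch. The `WhisperSettings` shape, the `buildRequestParts` helper, and all field names are hypothetical and not taken from the plugin source; the example values in the comments are assumptions based on OpenAI's public transcription endpoint.

```typescript
// Hypothetical settings shape mirroring the options listed above;
// the field names are illustrative, not taken from the plugin source.
interface WhisperSettings {
  apiKey: string;
  apiUrl: string; // e.g. "https://api.openai.com/v1/audio/transcriptions" (assumed)
  model: string; // e.g. "whisper-1" (assumed)
  language: string; // language code such as "en"; empty string means "not set"
}

interface RequestParts {
  url: string;
  headers: Record<string, string>;
  fields: Record<string, string>; // text fields sent alongside the audio file
}

function buildRequestParts(settings: WhisperSettings): RequestParts {
  const fields: Record<string, string> = { model: settings.model };
  const language = settings.language.trim();
  // Only send a language when one is configured.
  if (language !== "") {
    fields["language"] = language;
  }
  return {
    url: settings.apiUrl,
    headers: { Authorization: `Bearer ${settings.apiKey}` },
    fields,
  };
}
```

The audio itself would be attached as a multipart `file` field next to these text fields. Leaving `language` out of the request, rather than sending an empty value, is a design choice in this sketch.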

🤝 Contributing

We welcome and appreciate contributions, issue reports, and feature requests from the community! Feel free to visit the Issues page to share your thoughts and suggestions.

💬 Whisper API

For additional information, including limitations and pricing related to using the Whisper API, check out the OpenAI Whisper FAQ.
For a high-level overview of the Whisper API, check out this information from OpenAI.

⚒️ Manual Installation

If you want to install this plugin manually, use the following steps:

Download manifest.json, main.js, styles.css from the GitHub repository into the plugins/whisper folder within your Obsidian vault.
Click on Reload plugins button inside Settings > Community plugins.
Locate the "Whisper" plugin and enable it.
In the plugin settings include your OpenAI API key.

🤩 Say Thank You

Are you finding value in this plugin? Great! You can fuel my coding sessions and share your appreciation by buying me a coffee here.

Help others discover the magic of the Obsidian Whisper Plugin! I'd be thrilled if you could share your experiences on Twitter, Reddit, or your preferred social media platform!

You can find me on Twitter @nikdanilov_.

whisper-obsidian-plugin's Issues

Enhancement request: access to the Whisper parameters

Expose the Whisper API parameters in the options.

URI handler for the recording functionality

This is an awesome plugin. I'd like to create an iOS shortcut to open the mobile app and go directly to the recording interface. I think the easiest way to do this is to create a URI for the plugin, something like: obsidian://whisper that triggers the opening of the recording pane directly.

I think this would be done using registerObsidianProtocolHandler, of which there are some examples here.
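
A minimal sketch of what such a handler registration might look like. `ProtocolHost` is a local stand-in for the part of Obsidian's `Plugin` class that provides `registerObsidianProtocolHandler`, and `openControls` is a hypothetical callback for showing the recording controls; neither name comes from the plugin source.

```typescript
// Minimal stand-in for the part of Obsidian's Plugin class used here.
// In a real plugin, the Plugin subclass instance would be passed as the host.
interface ProtocolHost {
  registerObsidianProtocolHandler(
    action: string,
    handler: (params: Record<string, string>) => void
  ): void;
}

// `openControls` is a hypothetical callback that shows the recording controls.
function registerWhisperUri(host: ProtocolHost, openControls: () => void): void {
  // Registers the "whisper" action, i.e. links of the form obsidian://whisper
  host.registerObsidianProtocolHandler("whisper", () => openControls());
}
```

With this in place, an iOS shortcut that opens `obsidian://whisper` would invoke `openControls`, assuming the plugin is enabled in the vault that receives the link.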

Thanks again for the cool plugin.

Feature request: Toggle start/stop recording by assigned hardware button

Hi,
Is there a way to assign the start/stop command to a universal hardware button? I think on Android it's possible to customize such a button.
That would be great for convenient use!!! :)
thanks for that great plugin and best regards,
Silias

Extend to chrome extension and android keyboard

It has been over a year since Whisper was released, and it seems crazy that nobody has developed a Chrome extension or an Android keyboard that works through the API key as well as this Obsidian plugin does.
It would be another project, but it would be great if you could do for a Chrome extension and/or an Android keyboard what you did here with the Obsidian plugin.

Error parsing audio: Request failed with status code 429?

I have this issue.

Error 400 with Whisper plugin

After the plugin is configured, the recording tries to connect to OpenAI and then I receive an error message with error code 400.
Please help. Thank you.

Feature Request: Retain Audio File

Would it be possible to add a toggle to keep the audio file, maybe linked in the document?

Transcript on the same note

Is it possible to make it transcribe directly into the note where I am already writing, without creating a new note somewhere else for the transcript?

Feature Request: Offer Options to Embed or Link to Audio File

First, I am so excited to see that you added the option to save the audio file!!! Thank you! 🎉

I love that I have the option to insert the transcription at my cursor! I would really like to have the option to add an embed link to the saved audio file above (though I suppose some people might prefer below 🤷🏽‍♀️) the transcription.

For me, the ideal result would look something like this:

![[Audio/2023-06-18T16-13-51-312Z.webm|2023-06-18T16-13-51-312Z]]
Here is my fabulous audio transcription.

Problem with microphone remains turned on

Hi, I'm writing about a problem with the Whisper plugin. After I use it for the first time, my headphones' microphone remains on. Is there any way to fix this? If you need more information, please let me know. I'm looking forward to hearing from you!
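
One common cause of a microphone staying active in browser-based recorders is that stopping a `MediaRecorder` does not stop the tracks of the underlying stream. Whether that is the cause here is an assumption; the sketch below only shows the general release pattern. `StoppableTrack`, `TrackSource`, and `releaseMicrophone` are hypothetical names, declared locally as structural subsets of the browser's `MediaStreamTrack` and `MediaStream` types.

```typescript
// Structural subsets of the browser's MediaStreamTrack and MediaStream types,
// declared here so the sketch is self-contained.
interface StoppableTrack {
  stop(): void;
}
interface TrackSource {
  getTracks(): StoppableTrack[];
}

// Stops every track of the stream and returns how many were stopped.
function releaseMicrophone(stream: TrackSource): number {
  const tracks = stream.getTracks();
  // Stopping each track signals that the capture device is no longer in use.
  tracks.forEach((track) => track.stop());
  return tracks.length;
}
```

In a browser, the stream returned by `navigator.mediaDevices.getUserMedia` could be passed in directly once recording has finished.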

Recording in Canvas doesn't respect "Create new file after recording" setting

Scenario

I have the "create new file after recording" setting turned off.
I am creating a diagram using Obsidian Canvas.
I have a text box in the diagram open for editing, with the cursor blinking, the same as it does inside a normal Obsidian note.
I activate the Whisper plugin and I speak some stuff.

Expected Behavior

The Whisper ASR result is inserted into the Canvas text box at my cursor.

Actual, Unwanted Behavior

A new time-stamped file opens with the ASR result, as if I had the "create new file after recording" setting turned on.

I am pretty busy with work and school, but I might be able to contribute a fix myself. (I don't know how to add the bug label or else I would.)

Doesn't work on iOS (mime type not supported)

When trying to run on iOS, recording can't start because of the error "mime type not supported". Appears to be because webm isn't available on iOS. See also https://stackoverflow.com/questions/67874713/mediarecorder-ios-14-6-mimetype-not-supported
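
A possible workaround is to probe for a supported container before creating the recorder. This is a sketch, not the plugin's code: `pickMimeType` is a hypothetical helper, and the support check is injected so the function can be exercised outside a browser.

```typescript
// Returns the first candidate MIME type accepted by `isSupported`,
// or undefined when none of the candidates is accepted.
function pickMimeType(
  candidates: string[],
  isSupported: (mimeType: string) => boolean
): string | undefined {
  return candidates.find((mimeType) => isSupported(mimeType));
}

// In a browser the check would typically be MediaRecorder.isTypeSupported:
//   const mimeType = pickMimeType(["audio/webm", "audio/mp4"],
//     (t) => MediaRecorder.isTypeSupported(t));
```

Listing `audio/webm` first keeps the existing behavior where it works and falls back to `audio/mp4` elsewhere; the candidate list itself is an assumption.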

Creates a link to a nonexistent file

In iOS version 1.5.5, if save transcript is on and save recording is off, then it creates a new note that starts with a link to a nonexistent MP4 file.

E.g

![[2024-03-08T03-55-00-475Z.mp4]]
This is a little bit of a test.

The referenced file does not exist. The note probably should not include the link at all.
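
The expected behavior could be sketched as follows: emit the embed line only when an audio file was actually saved. `formatTranscription` and its parameters are hypothetical names, not the plugin's own.

```typescript
// Builds the note body. The audio embed is added only when a saved
// audio path is provided, so no link to a missing file is produced.
function formatTranscription(text: string, savedAudioPath?: string): string {
  if (!savedAudioPath) {
    return text;
  }
  return `![[${savedAudioPath}]]\n${text}`;
}
```

With "Save recording" off, the caller would pass no path and the note would contain only the transcribed text.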

cancel recording

Is it possible to cancel the recording so that it is not sent to OpenAI and I do not get charged?
At the moment I only see start/stop.

Transcribe from file?

This is a really fantastic plugin! Thanks for this. Would you have any idea how I can transcribe longer recordings (<25 MB of course) from a saved file such as an mp4 file?

Can this be adjusted to use whisper.cpp for offline transcription?

I'm looking for something that works offline / local-only and whisper.cpp seems great for that.
I can probably work out some basic piping to clipboard, but perhaps there's an easier way.

Feature Request: Optional notification to see when recording has started

On my Android tablet (Onyx Boox) it is hard to see when the device is recording. A quick (optional) pop-up notification showing that the plugin has started recording would be useful.

I wrote a one-line addition to this plugin to do this for myself, but I do not trust myself to really contribute it here.

How to use it on mobile app?

I can't find a way to activate it on mobile?

Enhancement request: keyboard shortcut for an immediate, unimportant recording that is not saved afterwards

Have a separate keyboard shortcut to record without saving the audio and to insert the text at the cursor, even if saving the audio and saving as a note are enabled in the settings (i.e. for short, unimportant text that should be inserted right here, right now).

Feature Request: Language Switching Feature in Plugin

Currently, there is no easy way to switch languages while using the plugin. It would be highly convenient to have a feature that allows users to switch languages either through a hotkey or by clicking on the language displayed in the bottom-right panel of the plugin name (Whisper idle). This would greatly enhance the user experience and streamline the language-switching process. Thank you for considering this suggestion.

FR: Allow use of local whisper instance

I am using the local version of whisper: https://github.com/ahmetoner/whisper-asr-webservice/

Is there, or could there be, a setting to use the local endpoint? I might have missed it!

Defect: An error occurs when attempting to use the plugin.

Download and configure the plugin (add a key and specify the folder)
Click on the plugin icon in the left menu of the main Obsidian window
Press the record button and start recording for some amount of time, 4-5 seconds will suffice
Press the stop button

Expected result: A file with the result appears in the folder specified in the first step
Actual result: An error message appears, "Error sending audio data: Request failed with status code 429"

README.md installation instructions out-of-date

In Installation, the first instruction is to download three files into the obsidian-whisper-plugin folder.
However, during install a whisper folder is auto-created instead, and those three files are copied into it.

Hotkeys for Recorder Controls

Is it possible to assign hotkeys to the Recorder Controls? If not can that option be added.

FR: Support command

Thank you for the great plugin!
I don't know if anyone else does this, but I always keep the ribbon closed.
So please add command support to the plugin.

File Size Limited to 20 MB?

Hello, thank you for this very useful tool. However, I am wondering what setup would it require to send large files?

I think the current limit is 20 MB? Every time I send files exceeding that limit, it returns a 301 status code.

Error sending audio data: Parent folder doesn't exist

Hi,

I'm very new to all of this and can't seem to figure out how to set up my iCloud directory path. I've included three screenshots from my iPhone. The first one shows where the folder is, with the path that I copied; the second shows the settings page where I saved the path; and the third is the error that I receive. Could someone clue me in on what I should do?

I'd really like to get this working.

Sincerely,
Anish

iPad OS?

Would love to use this on my iPad, but sending audio file fails with error code 400.

Error Code 413

After attempting to upload an audio file, I get error code 413. "error sending audio data: request failed with status code 413"
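
HTTP 413 means the request body was larger than the server accepts. A size check before upload could surface this earlier; the sketch below assumes a 25 MB limit (the figure mentioned in the "Transcribe from file?" issue), and `checkUploadSize` is a hypothetical helper.

```typescript
// Assumed upload limit of 25 MB; adjust if the endpoint documents another value.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Returns null when the file fits, otherwise a human-readable explanation.
function checkUploadSize(
  sizeInBytes: number,
  limit: number = MAX_UPLOAD_BYTES
): string | null {
  if (sizeInBytes <= limit) {
    return null;
  }
  const sizeMb = (sizeInBytes / (1024 * 1024)).toFixed(1);
  const limitMb = (limit / (1024 * 1024)).toFixed(0);
  return `File is ${sizeMb} MB, above the ${limitMb} MB upload limit`;
}
```

Running such a check before sending would let the plugin show a specific message instead of a bare status code.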

Add prompting to the settings

The whisper API allows you to provide some text to "prompt" the voice recognition. Would it be possible to add this to the plugin. Details on prompting can be found at prompting. This gives some control over punctuation and specific terminology that you use often. I think it makes sense to use the same prompt for all submitted audio.

"Sending/parsing" popups on mobile cover significant portion of screen

In the desktop version of Obsidian, there are notification pop-ups which give feedback that the Whisper plugin is operating correctly.

On desktop, these do not get in the way. They are just small bubbles in the top right of Obsidian. However, on mobile, the large, bubble-shaped notifications take up about 1/3 of the visible typing area above the keyboard. This means that I cannot see any of the text that I am editing, or that the plugin has just inserted, if it falls in that area. Since I use the Whisper plugin every 10 or 15 seconds to create more text, this is a significant barrier to my use of the plugin on mobile.

Would it be possible to add one or more of the following options to the plugin?

use a more minimal format for these pop-ups, e.g. condensing the three popups to two or one
Removing the second popup entirely, since "an audio file with such and such computer-generated name is being sent" is probably not useful to users (though definitely useful for plugin debugging!)
remove the first pop-up when the second one fires, and remove the second one when the third one fires, so there's only one on screen at once
entirely disable the pop-ups

Nik D, thank you for all this work on this plugin. As someone suffering from RSI that limits my ability to type, Whisper is the secret weapon which will allow me to turn in my master's thesis on time. Your plugin is the easiest interface to Whisper that I have yet tried, and has replaced almost entirely my use of the extremely crappy iPhone dictation, which has been a huge struggle for academic and professional use the last three years.

it says: failed to save record: error code 429

No 'Access-Control-Allow-Origin' header is present on the requested resource.

Hi!

I have my own openai compatible endpoint that works fine in all my programs
i.e. in python:

import openai

# Client pointed at the self-hosted, OpenAI-compatible endpoint
client = openai.OpenAI(
    api_key="my-key",
    base_url="http://***.111:80/v1",
)

# Send the audio file for transcription and request SRT output
with open("./tracks/test.ogg", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="srt",
    )
print(transcript)

So endpoint works fine.

But when I try to set it as the API URL, I keep getting a network error message in the logs:

Access to XMLHttpRequest at 'http://***111/v1/audio/transcriptions' from origin 'app://obsidian.md' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

What confuses me is that the error message contains no port information. Is that normal?

I've tried all of these versions:

http://***.111:80/v1/audio/transcriptions
http://***.111:80/v1
http://***.111:80
None of them worked.

I saw in other issues (e.g. #2) that changing the endpoint works well.
What am I doing wrong?
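
On the port question in the report above: port 80 is the default for `http`, so URL parsers drop it when serializing, which would explain why the error message shows no port. The sketch below demonstrates this with the standard `URL` class; `describeEndpoint` is a hypothetical helper and `192.0.2.111` is a placeholder documentation address, not the reporter's host.

```typescript
// Parses an endpoint and reports how it serializes after normalization.
function describeEndpoint(raw: string): { href: string; port: string } {
  const url = new URL(raw);
  // `port` is the empty string when the URL uses its scheme's default port.
  return { href: url.href, port: url.port };
}

// Placeholder address from the documentation range, not a real endpoint.
const normalized = describeEndpoint("http://192.0.2.111:80/v1/audio/transcriptions");
console.log(normalized.href); // "http://192.0.2.111/v1/audio/transcriptions"
```

The CORS message itself concerns the response headers the server returns to the browser-like Obsidian environment. A Python client is not subject to those checks, which may be why the same endpoint works there.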

Does Whisper store the audio data?

I used the Obsidian command “Whisper: Upload audio file”. Does Whisper store the audio data?

FR: Support other languages

Please support other languages!

GPT Correction

Hi,

Would it be possible to add the ability to correct text with ChatGPT-4, for example, to avoid misspelling certain technical terms that might be misspelled and make no sense in the context, or possibly with punctuation? I've noticed that when I say aloud "open parentheses" or "open quotation marks," the model often struggles to know whether to write what I say or to incorporate it as punctuation. This should normally be easy for GPT-4. Integrating the ability to use a review by GPT with an appropriate and possibly customizable prompt would be a plus. In general, adding this feature would also allow for translation or even asking for a response to what one has said, etc. The use cases are endless and useful.

Thanks for the amazing work,

Jean

Fix Github Tags [MINOR]

Among your tags, you have:
speach-to-text. The correct is speech-to-text.
Other useful keywords:

STT
transcription / transcribe
voice / audio

Can we have an option to leave the text directly inside the file we are in rather than outputting to a different destination file?

Is there a way whisper can auto-detect the language?

I am on a use case where two languages could be used during recording and whisper defaults to the language that is set in the settings.

Transcription hallucinations

I guess this is more of an issue with Whisper itself than with this plug-in, but I thought I'd report it anyway.

Many thanks to the team who created this plug-in. I do find that the transcription sometimes comes back with hallucinations; in other words, words that I clearly didn't say, not just mishearings. It's fairly common for my transcriptions to come back with "thank you" at the end. I've checked the recordings over and over again and I never actually said thank you. It was just added.

I've had several occasions where it's added references to other companies, specifically transcription companies (I got one mentioning Otter.ai for example a couple of days ago). I've got to imagine this is Whisper getting mixed up or something in a cache somewhere, but has anybody else seen this specific to the plug-in? Is there anything we can do?

Again, many thanks for this work. It's incredibly useful.

FR: Shortcut toggle between cursor and new file

FR: Add text at cursor location in active leaf.

It would be nice to have text added at the cursor location in the active leaf instead of creating a new file every time.

Audio not saved if any downstream error

Thanks for making this plugin, I like the recorder feature with end-to-end transcription and auto note generation. Works great when it works, but sometimes when internet is poor, it has connection issues (nothing to do with the plugin, mind you). The only change I would make is to save the audio as soon as it is created, because at the moment, if there is a network issue and transcription doesn’t happen, the audio is not saved and goes to waste. It would be much better if the audio remained to allow further attempts at transcription.
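
The ordering proposed above could be sketched like this: persist the audio first, then attempt transcription, so a network failure leaves the recording available for a retry. `saveThenTranscribe`, `PipelineResult`, and the two callbacks are hypothetical names, not the plugin's actual functions.

```typescript
interface PipelineResult {
  audioSaved: boolean;
  text: string | null;
  error: string | null;
}

// Saves the audio before transcribing, so a transcription failure
// cannot cause the recording to be lost.
async function saveThenTranscribe(
  saveAudio: () => Promise<void>,
  transcribe: () => Promise<string>
): Promise<PipelineResult> {
  await saveAudio(); // persist first
  try {
    const text = await transcribe();
    return { audioSaved: true, text, error: null };
  } catch (err) {
    // Report the failure but keep the saved audio for a later attempt.
    const message = err instanceof Error ? err.message : String(err);
    return { audioSaved: true, text: null, error: message };
  }
}
```

A failure in `saveAudio` itself still rejects, which seems reasonable since there would then be nothing to retry with.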

Release Zip does not contain `main.js`

Apologies if this is not in a great format.

Environment

OS Windows 10, latest service pack
Obsidian Version v1.1.16
Installer Version v0.14.6

Issue

After installation, I cannot enable the plugin in Obsidian.

Process

Following the instructions from the README.md file, I did the following:

Followed the link to the latest release of the repository, and downloaded the zip file attached (as I had read ahead slightly to "...downloaded zip file...").
Extracted the zip into (vault)/.obsidian/plugins directory
Restarted Obsidian
Located the plugin in the Installed Plugins list

However, I could not enable it. I received only a very generic "Plugin could not be enabled" error from Obsidian, and was in the process of locating any internal log files. It occurred to me to check the plugin folder for one (fruitless, of course). This led me to check the other plugin folders and notice each had a main.js and manifest.json basically. Checking the whisper-obsidian-plugin-0.1.0 directory however, only main.*ts* was present.

Quick Fix

Returning to the Releases page, one of the additional listed files was main.js, which when placed in the plugin folder (and restarting Obsidian again) solved the problem and allowed me to enable the plugin.

Solution

I considered opening a pull request to suggest removing main.js from the ignore list, the "poor practice" simply being a hazard of simplicity in releases. However, there are several approaches that could all solve the slight instruction issue, so it was best left to your preference. I'll share a few options I briefly considered in case they are useful.

Additional Instruction

Add additional clarification to Steps 1 and 2 in README.md.

...the latest release of Whisper plugin (both the source zip and main.js) from the...

...within your Obsidian vault, and then move the main.js file you downloaded inside of it.

Update Release method

I'll be honest that I'm not overly familiar with Release structures (more experience in monorepos, etc) but it seems as though it may be possible to compress main.js along with the manifest and styles and add that to the release list as well. This would allow a simpler instruction set with only a minor clarification as to which zip file to download.

Remove `main.js` from the `.gitignore` file

A bit of a compromise on code practice, but by far the least effort.