deepgram / deepgram-python-captions

This package is the Python implementation of Deepgram's WebVTT and SRT formatting. Given a Deepgram transcription, it returns a string that can be saved as a valid WebVTT or SRT caption file.

License: MIT License

Python 100.00%

deepgram-python-captions's People


nikitoshina

deepgram-python-captions's Issues

Converter does not accept a response unless it is a dictionary

What is the current behavior?

When a response from the Python SDK is passed in directly, the captions converter does not work. The SDK returns a PrerecordedResponse object, which must be converted to a dictionary before the converter can use it. Users can currently do this themselves:

transcription = DeepgramConverter(json.loads(response.to_json()))

However, the captions package should be fixed so that it accepts that type and converts it to a dictionary itself.
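A minimal sketch of how the converter could normalize its input, assuming the SDK response exposes to_json() as in the workaround above. The helper name to_response_dict is hypothetical and not part of deepgram_captions; it accepts a dict, a JSON string, or any object with a to_json() method:

```python
import json
from collections.abc import Mapping
from typing import Any


def to_response_dict(response: Any) -> dict:
    """Normalize a Deepgram response into a plain dict (hypothetical helper).

    Accepts a mapping, a JSON string/bytes, or any object exposing a
    to_json() method that returns a JSON string.
    """
    # Already a dict-like object: copy it into a plain dict.
    if isinstance(response, Mapping):
        return dict(response)
    # Raw JSON text, e.g. the result of response.to_json().
    if isinstance(response, (str, bytes, bytearray)):
        return json.loads(response)
    # SDK response objects: serialize to JSON, then parse back into a dict.
    to_json = getattr(response, "to_json", None)
    if callable(to_json):
        return json.loads(to_json())
    raise TypeError(f"Unsupported response type: {type(response).__name__}")
```

With a helper like this, DeepgramConverter(to_response_dict(response)) would work for both a PrerecordedResponse and a plain dictionary, so users would no longer need the json.loads call.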

Steps to reproduce

In the following code, change json.loads(response.to_json()) to response or response.to_json() (that is, remove the json.loads call).

import json
from deepgram import (
    DeepgramClient,
    PrerecordedOptions,
)
from deepgram_captions import DeepgramConverter, srt

AUDIO_URL = {
    "url": "https://dpgr.am/spacewalk.wav"
}

deepgram = DeepgramClient("")  # pass your Deepgram API key here
options = PrerecordedOptions(
    model="nova",
    smart_format=True,
    utterances=True
)
response = deepgram.listen.prerecorded.v("1").transcribe_url(AUDIO_URL, options)
if isinstance(json.loads(response.to_json()), dict):
    print("Response is a dictionary.")
else:
    print("Response is not a dictionary.")

transcription = DeepgramConverter(json.loads(response.to_json()))

captions = srt(transcription)
print(captions)

Expected behavior

The converter should accept the SDK response directly, without needing json.loads.

Alternatively, update the README to explain how to do this conversion.

See the Discord conversation where this was originally raised: https://discord.com/channels/1108042150941294664/1204723190044295238/1204723190044295238

deepgram.extra.to_SRT(response, line_length=1)

It would be great if you could implement a line_length option in your code.
I use the official code deepgram.extra.to_SRT(response, line_length=1), but I can't save the result to an SRT file.
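The request combines two things: grouping words into caption cues of at most line_length words, and writing the result to an .srt file. A minimal sketch of both, assuming each word is a dict with word, start, and end keys (and optionally punctuated_word), as in Deepgram's JSON output. words_to_srt, srt_timestamp, and save_srt are hypothetical helpers, not functions of deepgram_captions or the Deepgram SDK:

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], line_length: int = 8) -> str:
    """Group words into numbered SRT cues of at most line_length words (hypothetical)."""
    if line_length < 1:
        raise ValueError("line_length must be at least 1")
    cues = []
    for i in range(0, len(words), line_length):
        chunk = words[i : i + line_length]
        # Prefer the smart-formatted word when it is present.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    # Cues are separated by a blank line; an empty input yields an empty string.
    return "\n\n".join(cues) + ("\n" if cues else "")


def save_srt(captions: str, path: str) -> None:
    """Write a caption string to disk as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(captions)
```

Saving is just writing the returned string to a file; the same save_srt call would work for any caption string, including one produced by this package's srt function.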

Fails on empty transcription with IndexError: list index out of range

What's happening that seems wrong?

It raises an IndexError.

Steps to reproduce

import deepgram_captions

mm = {
    "metadata": {
        "transaction_key": "deprecated",
        "request_id": "1ddf6fb2-703a-4d22-b4be-2d1c2eac1c02",
        "sha256": "c3595443b4c3b0950919e613f065983c5d5d8538ee2565ec985a990e8eef8d53",
        "created": "2024-07-03T10:15:54.608Z",
        "duration": 5.02,
        "channels": 1,
        "models": ["30089e05-99d1-4376-b32e-c263170674af"],
        "model_info": {
            "30089e05-99d1-4376-b32e-c263170674af": {
                "name": "2-general-nova",
                "version": "2024-01-09.29447",
                "arch": "nova-2",
            }
        },
    },
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "",
                        "confidence": 0.0,
                        "words": [],
                        "paragraphs": {"transcript": "\n", "paragraphs": []},
                    }
                ],
                "detected_language": "en",
                "language_confidence": 0.15507619,
            }
        ]
    },
}

transcription = deepgram_captions.DeepgramConverter(mm)
print(transcription.response)
captions_text = deepgram_captions.webvtt(transcription)

Traceback (most recent call last):
  File "/home/mymymy.py", line 40, in <module>
    captions_text = deepgram_captions.webvtt(transcription)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.venv/lib/python3.12/site-packages/deepgram_captions/webvtt.py", line 25, in webvtt
    speaker_labels = "speaker" in lines[0][0]
                                  ~~~~~~~~^^^
IndexError: list index out of range

To make it faster to diagnose the root problem, tell us how we can reproduce the bug.

Expected behavior

What would you expect to happen when following the steps above?

It should return empty captions instead of raising an error.
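A minimal sketch of the guard, assuming (from the traceback) that lines is a list of caption lines, each a list of word dicts. webvtt_from_lines, vtt_timestamp, and the "Speaker N" voice-tag layout are hypothetical and not the package's actual implementation; the point is to check for empty input before reading lines[0][0]:

```python
from typing import Dict, List

# A WebVTT header with no cues, returned for an empty transcription.
EMPTY_WEBVTT = "WEBVTT\n\n"


def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def webvtt_from_lines(lines: List[List[Dict]]) -> str:
    """Render caption lines as WebVTT, tolerating empty input (hypothetical)."""
    # Drop empty lines first, so lines[0][0] below is guaranteed to exist.
    lines = [line for line in lines if line]
    if not lines:
        return EMPTY_WEBVTT
    speaker_labels = "speaker" in lines[0][0]
    cues = []
    for line in lines:
        start = vtt_timestamp(line[0]["start"])
        end = vtt_timestamp(line[-1]["end"])
        text = " ".join(w.get("punctuated_word", w["word"]) for w in line)
        if speaker_labels:
            # WebVTT voice span naming the speaker of the cue's first word.
            text = f"<v Speaker {line[0]['speaker']}>{text}"
        cues.append(f"{start} --> {end}\n{text}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Filtering out empty lines before the speaker check handles both an empty words list (as in the example response above) and a transcription that yields an empty first line.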

Please tell us about your environment

We want to make sure the problem isn't specific to your operating system or programming language.

Operating System/Version: Ubuntu 22.04
Language: Python 3.12

Add configurable max pause to subtitles

Proposed changes

It would be good to have the ability to split subtitles based on the amount of time between words.

Context

Today, if there's a big gap in time between two words, the subtitle lingers on the screen, which can be distracting. It would be better for the subtitle to end where the gap starts.

Possible Implementation

This should be similar to the existing line_length-based splitting: compare the end time of the most recent word with the start time of the current word, and start a new subtitle when the gap exceeds the configured maximum.
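A minimal sketch of the proposed splitting, assuming word dicts with start and end times in seconds. split_into_lines and its max_pause parameter are hypothetical names, not part of the current package; a new line starts when the current line already holds line_length words or when the gap since the previous word's end exceeds max_pause:

```python
from typing import Dict, List


def split_into_lines(
    words: List[Dict], line_length: int = 8, max_pause: float = 1.0
) -> List[List[Dict]]:
    """Split words into caption lines by word count and by pause length (hypothetical)."""
    lines: List[List[Dict]] = []
    current: List[Dict] = []
    for word in words:
        if current:
            # Silence between the end of the previous word and the start of this one.
            gap = word["start"] - current[-1]["end"]
            if len(current) >= line_length or gap > max_pause:
                lines.append(current)
                current = []
        current.append(word)
    if current:
        lines.append(current)
    return lines
```

Because each cue then ends at the last word before the pause, the subtitle disappears when the gap begins instead of lingering until the next word.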

Subtitles Out Of Order

What is the current behavior?

On some transcriptions, the produced SRT file has subtitles out of order.

Steps to reproduce

I transcribed a small .ogg file using nova-2 and passed the result to the captions library, which produced the out-of-order subtitles. For convenience, I included the JSON output (it's in Japanese), the final SRT, and the sound file (to bypass GitHub's upload limitations, I gave it a .txt extension, but it is originally an .ogg file).
IMAX PR Video.json
sub.txt
sound.txt

Expected behavior

The produced subtitle file should have in-order subtitles.

Please tell us about your environment

We want to make sure the problem isn't specific to your operating system or programming language.

Operating System/Version: Ubuntu 24.04
Language: Python
Browser: Chrome

Other information

Looking at the JSON output, the words appear to be returned out of order, but the converter assumes they are in order. Looking at the subtitles themselves, it almost looks as if Deepgram transcribed the audio several times in some odd way. My first thought was to sort the words by start time, but when I convert the sorted result, some overlap remains. It is almost as if Deepgram is returning multiple alternative transcriptions at once, or perhaps different streams, as though diarization were happening even though I did not enable it through the API.
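The sort-by-start-time workaround described above can be sketched as follows, together with a check that reports any overlap left after sorting. This assumes word dicts with start and end keys; sort_words and find_overlaps are hypothetical helpers, and sorting only hides the symptom if the response really contains overlapping word sequences:

```python
from typing import Dict, List, Tuple


def sort_words(words: List[Dict]) -> List[Dict]:
    """Return the words ordered by start time, then end time."""
    return sorted(words, key=lambda w: (w["start"], w["end"]))


def find_overlaps(words: List[Dict]) -> List[Tuple[int, int]]:
    """Return index pairs (i, i + 1) of adjacent words whose time ranges overlap."""
    return [
        (i, i + 1)
        for i in range(len(words) - 1)
        if words[i + 1]["start"] < words[i]["end"]
    ]
```

If find_overlaps returns pairs after sorting, the JSON likely contains overlapping spans of words, which would match the remaining overlap described above.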

Here's my code:

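import json
import os

from deepgram import DeepgramClient, FileSource, PrerecordedOptions
from deepgram_captions import DeepgramConverter, srt

# Placeholders: these values were defined elsewhere in the original script.
API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
AUDIO_FILE = "sound.ogg"
LANG = "ja"
basename = os.path.splitext(AUDIO_FILE)[0]


def main():
    try:
        # STEP 1: Create a Deepgram client using the API key
        deepgram = DeepgramClient(API_KEY)

        with open(AUDIO_FILE, "rb") as file:
            buffer_data = file.read()

        payload: FileSource = {
            "buffer": buffer_data,
        }

        # STEP 2: Configure Deepgram options for audio analysis
        options = PrerecordedOptions(
            model="nova-2",
            smart_format=True,
            language=LANG,
        )

        print("Transcribing")
        # STEP 3: Call the transcribe_file method with the audio payload and options
        response = deepgram.listen.prerecorded.v("1").transcribe_file(payload, options)

        # STEP 4: Convert the transcription to SRT.
        # The SDK response is converted to a plain dict first (see the first issue above).
        transcription = DeepgramConverter(json.loads(response.to_json()))
        captions = srt(transcription)
        with open(f"{basename}.{LANG}.srt", "w", encoding="utf-8") as f:
            f.write(captions)
    except Exception as e:
        print(f"Exception: {e}")


if __name__ == "__main__":
    main()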
