
elevenlabs-docs's Introduction

Mintlify Starter Kit

Used to create our text-to-speech API documentation.

Click on Use this template to copy the Mintlify starter kit. The starter kit contains examples including:

  • Guide pages
  • Navigation
  • Customizations
  • API Reference pages
  • Use of popular components

👩‍💻 Development

Install the Mintlify CLI to preview the documentation changes locally. To install, use the following command:

npm install -g mintlify

Run the following command at the root of your documentation (where mint.json is):

mintlify dev

😎 Publishing Changes

Changes will be deployed to production automatically after pushing to the default branch.

You can also preview changes using PRs, which generates a preview link of the docs.

Troubleshooting

  • mintlify dev isn't running - Run mintlify install to re-install dependencies.
  • Page loads as a 404 - Make sure you are running in a folder with mint.json.

elevenlabs-docs's People

Contributors

am-holt, armandobelardo, cahyosubroto, codebuddy-developer, crypblizz8, devman0129, dlovric2, drummerjolev, dunky11, edwarderelt, elevenlabsmark, erayalakese, felixwaweru, j-elevenlabs, jitendra2603, lagercat, lharries, limo1996, lookquad, marcelthomas5, mateusz-kopec, maxilevi, nechita, rayan-saleh, samsklar11, talexgeorge, vktrbr


elevenlabs-docs's Issues

Prompting emotion without having the prompt read aloud?

Path: /speech-synthesis/prompting

The AI performs the prompted emotion well, but the prompting text itself is included in the spoken output. How does one prompt a particular emotion without having "he shouted angrily" read aloud? Is there any kind of hidden markup that can be used?

Issue on docs

Path: /speech-synthesis/voice-settings

Can you please add a voice speed option? The voice is too fast.

Issue on docs

Path: /api-reference/history-download

It's not downloading. Or do I have to do something with the response.text?
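
For what it's worth, a minimal sketch of saving the download, assuming the endpoint returns binary data; response.text decodes the bytes as text and corrupts them, while response.content keeps the raw bytes. YOUR_API_KEY and the history item IDs are placeholders:

import requests

url = "https://api.elevenlabs.io/v1/history/download"
headers = {"xi-api-key": "YOUR_API_KEY"}
payload = {"history_item_ids": ["HISTORY_ITEM_ID_1", "HISTORY_ITEM_ID_2"]}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

# The endpoint returns raw bytes (a single audio file, or a zip for
# multiple items), so write response.content, not response.text.
with open("history_download.zip", "wb") as f:
    f.write(response.content)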

JSON returned by https://api.elevenlabs.io/v1/voices: a label key ("description") sometimes has an extra trailing space

   {
      "voice_id": "flq6f7yk4E4fJM5XTYuZ",
      "name": "Michael",
.....
      "labels": {
        "accent": "american",
        "age": "old",
        "gender": "male",
        "use case": "audiobook",
        "description ": "orotund"
      },
......
      ]
    },

Note the extra trailing space in the key:

"description ": "orotund"

The Daniel voice doesn't have the extra space:

   "labels": {
        "accent": "british",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "news presenter"
      },
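
Until the upstream data is fixed, a minimal client-side workaround sketch: strip whitespace from label keys when consuming /v1/voices, so "description " and "description" read the same (YOUR_API_KEY is a placeholder):

import requests

resp = requests.get(
    "https://api.elevenlabs.io/v1/voices",
    headers={"xi-api-key": "YOUR_API_KEY"},
)
resp.raise_for_status()

for voice in resp.json()["voices"]:
    labels = voice.get("labels") or {}
    # Normalize keys such as "description " -> "description".
    voice["labels"] = {key.strip(): value for key, value in labels.items()}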

Issue on docs

Path: /api-reference/text-to-speech-websockets
// 5. Handle server responses

socket.onmessage = function (event) {
    const response = JSON.parse(event.data);

    if (response.audio) {
        // handle the audio data (e.g., play it)
    }

    if (response.is_final) {
        // the generation is complete
    }

    if (response.normalized_alignment) {
        // use the alignment info if needed
    }
};

It is supposed to be response.isFinal and response.normalizedAlignment.
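
A minimal receive-loop sketch using the camelCase field names the responses actually carry, per this report; it also tolerates normalizedAlignment being null. The URI follows the shape on the websockets docs page, with VOICE_ID as a placeholder:

import asyncio
import json

import websockets  # pip install websockets

async def receive(uri: str) -> None:
    async with websockets.connect(uri) as ws:
        async for message in ws:
            response = json.loads(message)
            if response.get("audio"):
                pass  # handle the base64 audio chunk (e.g., decode and play)
            if response.get("normalizedAlignment"):  # may be null
                pass  # use the alignment info if needed
            if response.get("isFinal"):  # not response["is_final"]
                break  # the generation is complete

asyncio.run(receive(
    "wss://api.elevenlabs.io/v1/text-to-speech/VOICE_ID/stream-input"
    "?model_id=eleven_monolingual_v1"))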

NormalizedAlignment is sometimes null, sometimes populated?

I've noticed that on some of my JSON responses normalizedAlignment is null.

Documentation Reference - Streaming Output Audio

{
  "audio": "SKcYUakhZRVxEUVItVti7....1e0=",
  "isFinal": null,
  "normalizedAlignment": null
}

Why is this? In other cases it's non-null; is there information I'm missing about why normalizedAlignment is sometimes null and sometimes not?

{"audio":"//uQxAAACm07F4GEezKHtKAo9hm4CaSjlTfc+cEIru7oiJXd3T/4E3d3d3cyREL/RET93EAAAHcHYx1/5jGN+AAALjvmMAAAWQBj/jGMYxj/+7nxC/dwMH3wQ...XuSdB32nkrBZSPgcVQ+ZSQdgUBCkJz6SKnJ5VNC6GoLCk9oenLL5OKp+H4ofQTOzhcLEzxUViwga+fjdKjSzRuGRizC2pKGdPpTxyqsZQROm1KLrvVb6h5DCm1U5HKNLQY2lkjYusxFTU4Q3//15766m3aBCNmUG+krOuy4UfaIP7tfySRG+w==",
  "isFinal":null,
  "normalizedAlignment":{
    "chars":[" ","...", ...," "],
    "charStartTimesMs":[0,46,81,139,186,209,267,313,348,430,522,580,627,673,731,766,813,836,894,964,1010,1068,1138,1254,1370,1544,1602,1637,1660,1707,1741,1776,1811,1834,1869,1916,1997,2113,2194,2218],
    "charDurationsMs":[46,35,58,47,23,58,46,35,82,92,58,47,46,58,35,47,23,58,70,46,58,70,116,116,174,58,35,23,47,34,35,35,23,35,47,81,116,81,24,243]
  }
}

Issue on docs

Path: /api-reference/voices-get

Why is this the way to get an id for a voice you've created? Please just show it in the interface somewhere.
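
In the meantime, a short sketch of looking the ID up via the API, assuming only the documented GET /v1/voices endpoint (YOUR_API_KEY is a placeholder):

import requests

resp = requests.get(
    "https://api.elevenlabs.io/v1/voices",
    headers={"xi-api-key": "YOUR_API_KEY"},
)
resp.raise_for_status()

# Print the ID next to each voice name, including voices you've created.
for voice in resp.json()["voices"]:
    print(voice["voice_id"], voice["name"])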

Issue on docs

Path: /api-reference/text-to-speech

How can we define the language of the text? I give the text in French but I always get the pronunciation in English.

Clarity in docs on input streaming parameters

Path: /api-reference/text-to-speech-websockets

When you are sending input updates, is the text field supposed to be the delta since the previous update or the complete text?

e.g.
1 "Hello "
2 "world, "
3 "how are you? "

or
1 "Hello "
2 "Hello world, "
3 "Hello world, how are you? "

Similarly, in the server response, are you returning partial audio with each message, e.g. 0s-0.5s, 0.5s-1s, 1s-1.5s, or are you returning the full audio each time up to the generated point, e.g. 0s-0.5s, 0s-1s, 0s-1.5s?
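
For reference, a sketch of the sending side under the first interpretation (each message carries only the new delta), which is how I read the websockets docs; the begin-of-stream and end-of-stream messages follow the documented protocol, but treat the delta assumption as exactly that, an assumption to confirm:

import asyncio
import json

import websockets  # pip install websockets

async def stream_text(uri: str) -> None:
    async with websockets.connect(uri) as ws:
        # Begin-of-stream message (a single space), per the websockets docs.
        await ws.send(json.dumps({"text": " ", "xi_api_key": "YOUR_API_KEY"}))
        # Assumption: each update is a delta appended to the stream,
        # not the cumulative text so far.
        for chunk in ["Hello ", "world, ", "how are you? "]:
            await ws.send(json.dumps({"text": chunk}))
        # End-of-stream: an empty string flushes and closes the generation.
        await ws.send(json.dumps({"text": ""}))

Calling stream_text requires the documented stream-input URL for the chosen voice.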

Emphasis

Is there any kind of emphasis control? What do I need to do to correct a mistake in the emphasis of a word?

The project composition page is no longer accessible to screen readers

Path: /projects/overview

Hi,
Until a few weeks ago the project composition page was accessible to screen readers; I had even complimented it on discord.
Today, however, upon returning to the page, I notice that things are back to the way they were. The buttons are unlabeled again and the button to add chapters is no longer announced.
I would also suggest adding keyboard shortcuts so that composing the project is faster and more efficient:
alt+d for the divider
alt+p for play selected block
alt+v for voice selection
alt+s for voice settings
alt+o for enable/disable continuous generation
alt+r for regenerating selected block
alt+1-6 for headings
alt+n for adding new chapter
alt+c to convert the entire project.
These are my suggestions.
Germano

Issue: Project | Versions Ordered Listing

Path: /projects/overview/

There appears to be an issue with the ordering of versions within Projects. Upon conversion, the versions are listed out of order by generation (not chronologically).

Fields Shown:
Versions
Title

Play
Download

The issue is that the order shown is 47, 33, 25, 44, 26 (not ordered by time).

Upon pressing play, each shows the same (most recent) created timestamp rather than the actual time of generation.

Download also appears to be malfunctioning, as no downloads start from this pop-up screen, though they are available after clicking play at the bottom. (No download starts from chapter > convert > download after chapter conversion.)

Please also update the documentation to discuss this screen and versions within chapters/projects.


Issue on docs

Please remove these from the online docs, as they come up in the search results.

Path: /voicelab/scripts/news-article

openapi.json not matching endpoint for get /v1/voices/:voice_id

I'm auto-generating a JS client SDK based on the openapi.json using https://www.zodios.org/ and I got the following error:

Invalid response from endpoint 'get /v1/voices/:voice_id'
status: 200 OK
cause:
[
  {
    "code": "invalid_type",
    "expected": "string",
    "received": "null",
    "path": [
      "fine_tuning",
      "model_id"
    ],
    "message": "Expected string, received null"
  },
  {
    "code": "invalid_type",
    "expected": "string",
    "received": "null",
    "path": [
      "fine_tuning",
      "language"
    ],
    "message": "Expected string, received null"
  },
  {
    "code": "invalid_type",
    "expected": "array",
    "received": "null",
    "path": [
      "fine_tuning",
      "verification_attempts"
    ],
    "message": "Expected array, received null"
  },
  {
    "code": "invalid_type",
    "expected": "array",
    "received": "null",
    "path": [
      "fine_tuning",
      "slice_ids"
    ],
    "message": "Expected array, received null"
  },
  {
    "code": "invalid_type",
    "expected": "object",
    "received": "null",
    "path": [
      "fine_tuning",
      "manual_verification"
    ],
    "message": "Expected object, received null"
  },
  {
    "code": "invalid_type",
    "expected": "string",
    "received": "null",
    "path": [
      "preview_url"
    ],
    "message": "Expected string, received null"
  },
  {
    "code": "invalid_type",
    "expected": "object",
    "received": "null",
    "path": [
      "settings"
    ],
    "message": "Expected object, received null"
  },
  {
    "code": "invalid_type",
    "expected": "object",
    "received": "null",
    "path": [
      "sharing"
    ],
    "message": "Expected object, received null"
  }
]
received:
{
  "voice_id": "9YXpMb9pAsnObE1AEL6A",
  "name": "My test voice",
  "samples": [
    {
      "sample_id": "GCCGpI1KQkfdYbXQFR1A",
      "file_name": "1689357774442.mp3",
      "mime_type": "audio/mpeg",
      "size_bytes": 104557,
      "hash": "6f0cd1654547c301d14901eb047dbe10"
    },
    {
      "sample_id": "MsWPzZaq87g9jAabeUcB",
      "file_name": "1689357786420.mp3",
      "mime_type": "audio/mpeg",
      "size_bytes": 109873,
      "hash": "6fccac4015d75ee5d9a106918214aa56"
    },
    {
      "sample_id": "cAFaDZFm7HYPQvI0y94T",
      "file_name": "1689357766138.mp3",
      "mime_type": "audio/mpeg",
      "size_bytes": 100102,
      "hash": "d99ce9dbdb5ad9d3d62f1cb82f376e23"
    }
  ],
  "category": "cloned",
  "fine_tuning": {
    "model_id": null,
    "language": null,
    "is_allowed_to_fine_tune": false,
    "fine_tuning_requested": false,
    "finetuning_state": "not_started",
    "verification_attempts": null,
    "verification_failures": [],
    "verification_attempts_count": 0,
    "slice_ids": null,
    "manual_verification": null,
    "manual_verification_requested": false
  },
  "labels": {},
  "description": "sdfsdfsdf",
  "preview_url": null,
  "available_for_tiers": [],
  "settings": null,
  "sharing": null
}

This makes sense, as the openapi.json file contains the following:

{
      ...,
      "VoiceResponseModel": {
        "title": "VoiceResponseModel",
        "required": [
          "voice_id",
          "name",
          "samples",
          "category",
          "fine_tuning",
          "labels",
          "description",
          "preview_url",
          "available_for_tiers",
          "settings",
          "sharing"
        ],
        "type": "object",
        "properties": {
          "voice_id": {
            "title": "Voice Id",
            "type": "string"
          },
          "name": {
            "title": "Name",
            "type": "string"
          },
          "samples": {
            "title": "Samples",
            "type": "array",
            "items": {
              "$ref": "#/components/schemas/SampleResponseModel"
            }
          },
          "category": {
            "title": "Category",
            "type": "string"
          },
          "fine_tuning": {
            "$ref": "#/components/schemas/FineTuningResponseModel"
          },
          "labels": {
            "title": "Labels",
            "type": "object",
            "additionalProperties": {
              "type": "string"
            }
          },
          "description": {
            "title": "Description",
            "type": "string"
          },
          "preview_url": {
            "title": "Preview Url",
            "type": "string"
          },
          "available_for_tiers": {
            "title": "Available For Tiers",
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "settings": {
            "$ref": "#/components/schemas/VoiceSettingsResponseModel"
          },
          "sharing": {
            "$ref": "#/components/schemas/VoiceSharingResponseModel"
          }
        }
      },
      "FineTuningResponseModel": {
        "title": "FineTuningResponseModel",
        "required": [
          "model_id",
          "language",
          "is_allowed_to_fine_tune",
          "fine_tuning_requested",
          "finetuning_state",
          "verification_attempts",
          "verification_failures",
          "verification_attempts_count",
          "slice_ids",
          "manual_verification",
          "manual_verification_requested"
        ],
        "type": "object",
        "properties": {
          "model_id": {
            "title": "Model Id",
            "type": "string"
          },
          "language": {
            "title": "Language",
            "type": "string"
          },
          "is_allowed_to_fine_tune": {
            "title": "Is Allowed To Fine Tune",
            "type": "boolean"
          },
          "fine_tuning_requested": {
            "title": "Fine Tuning Requested",
            "type": "boolean"
          },
          "finetuning_state": {
            "title": "Finetuning State",
            "enum": [
              "not_started",
              "is_fine_tuning",
              "fine_tuned"
            ],
            "type": "string"
          },
          "verification_attempts": {
            "title": "Verification Attempts",
            "type": "array",
            "items": {
              "$ref": "#/components/schemas/VerificationAttemptResponseModel"
            }
          },
          "verification_failures": {
            "title": "Verification Failures",
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "verification_attempts_count": {
            "title": "Verification Attempts Count",
            "type": "integer"
          },
          "slice_ids": {
            "title": "Slice Ids",
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "manual_verification": {
            "$ref": "#/components/schemas/ManualVerificationResponseModel"
          },
          "manual_verification_requested": {
            "title": "Manual Verification Requested",
            "type": "boolean"
          }
        }
      },
}
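
A sketch of the schema-side fix, assuming the intent is that these fields may be null: mark them nullable (OpenAPI 3.0) so generated clients such as zodios accept the responses above. The field lists mirror the validation errors; note that $ref properties need an allOf wrapper, since siblings of $ref are ignored in OpenAPI 3.0:

import json

with open("openapi.json") as f:
    spec = json.load(f)

schemas = spec["components"]["schemas"]

# Fields the API actually returns as null, per the validation errors above.
nullable_fields = {
    "VoiceResponseModel": ["preview_url", "settings", "sharing"],
    "FineTuningResponseModel": [
        "model_id",
        "language",
        "verification_attempts",
        "slice_ids",
        "manual_verification",
    ],
}

for model, fields in nullable_fields.items():
    properties = schemas[model]["properties"]
    for field in fields:
        prop = properties[field]
        if "$ref" in prop:
            # Siblings of $ref are ignored in OpenAPI 3.0, so wrap the ref.
            properties[field] = {"allOf": [prop], "nullable": True}
        else:
            prop["nullable"] = True

with open("openapi.json", "w") as f:
    json.dump(spec, f, indent=2)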

Feedback and Suggestion for Mandarin Voice Cloning

Dear Elevenlabs Team,

I hope this message finds you well. I am a frequent user of your voice cloning software and I am genuinely impressed by the innovation and hard work you have put into developing this remarkable tool. As a developer myself, I truly appreciate the technical prowess that has gone into making this software.

I am reaching out today to provide some constructive feedback and a suggestion that I believe could further enhance the user experience and the overall functionality of the software. Being a native English speaker, I have found the software's ability to clone voices in multiple languages to be particularly impressive. However, when it comes to the Chinese language (Mandarin), I have observed a potential area for improvement.

In the context of Chinese voice cloning, it would be beneficial if the software could allow users to adjust the tonal inflections of certain Chinese characters by using prompts. This would, in turn, improve the accuracy of the Mandarin tonality produced by the AI. As you are aware, the tonal nature of the Chinese language plays a significant role in conveying the correct meaning of words and sentences. Therefore, the ability to fine-tune these tonal aspects would be a valuable addition to your software, enhancing its effectiveness and authenticity in voice cloning for Mandarin and other tonal languages.

I understand that implementing this feature may require significant effort and resources. However, I believe that this enhancement could substantially improve the user experience and the overall quality of voice cloning, particularly for Mandarin speakers.

Thank you for taking the time to consider my suggestion. Your commitment to continual improvement and customer satisfaction is one of the many reasons why I hold your software in high regard. I am looking forward to seeing how Elevenlabs continues to evolve and innovate in the future.

Best regards,

Issue on docs: Incomplete and inaccurate references to a speech component for React

Path: /api-reference/integration-guides/react-text-to-speech-guide

Problem 1: This page mentions an available React component. But it calls it by two different names in the same documentation: "AudioStream" component and "SpeechStreamComponent" component. Which is correct?

Problem 2: The documentation doesn't actually explain how to download/install this React component. The only install command it provides in the "Leveraging ElevenLabs’ AudioStream React Component" section is the following which doesn't actually result in that component being installed...

run npm install react react-dom @types/react @types/react-dom axios

Problem 3: Finally, the install command above should not start with run. That will cause an error. Instead, the command should start with npm install.

Issue on docs

Path: /api-reference/history-download

Zip file no longer contains files named by their history id.

Phoneme Tags don't work as outlined in FAQ

Path: /speech-synthesis/prompting

Hi. The FAQ mentions it's possible to embed phoneme tags with English Multilingual v1, but the tags produce only silence. They don't work. Even the exact examples from the FAQ, pasted verbatim into the speech box, do not work. At all. Either the feature was removed, or something crucial is missing from the FAQ.

There's an alien name in my audiobook, "Nopileos", and whatever I do, the model will always try to put emphasis on the "e", which is incorrect. I tried a dozen different alternative spellings, nothing helps. Phonemes would be the only remaining option, like so:

<phoneme alphabet="ipa" ph="nˈɔpiːl-ˈeːɔs">Nopileos</phoneme>
<phoneme alphabet="cmu-arpabet" ph="N AY1 P AH0 L Z">Nopileos</phoneme>

But that produces only silence, just like the examples from your FAQ. I tried it with English v1, English v2, and German, and the phoneme tags are always ignored. That makes it really difficult, if not impossible, to produce an audiobook whose protagonist is pronounced NOPILEOS and whose name is said hundreds of times throughout the book...

Issue on docs

Path: /welcome/introduction

Not sure if it's only on my end, but I believe the Join Discord button link has expired.

Issue on docs

Path: /welcome/getting-started
The following paragraph is not clear:
“Please note that custom voices and cloned voices have different meanings. Custom voices are all voices that are not automatically included with the accounts; this includes cloned voices.”
Do custom voices include cloned voices? Are cloned voices part of custom voices, or something different? What is the meaning of cloned voices?
As written, it is perplexing and does not explain the different meanings.

Can't access voices() using code from the docs

Using the docs: /api-reference/voices

code:

from elevenlabs import voices

print(voices())

produces the error:

Exception has occurred: UnboundLocalError
cannot access local variable 'voices' where it is not associated with a value
  File "G:\GitRepos\coda\utils\speak_response.py", line 14, in speak_response
    print(voices())
          ^^^^^^
  File "G:\GitRepos\coda\commands\connected.py", line 26, in run
    speak.speak_response(response)
  File "G:\GitRepos\coda\utils\on_command.py", line 25, in on_command
    if not commands[cmd].run(args):
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\GitRepos\coda\utils\on_command.py", line 8, in run
    on_command(message, commands)
  File "G:\GitRepos\coda\utils\voice_recognizer.py", line 39, in run
    command.run(str(message), commands)
  File "G:\GitRepos\coda\main.py", line 104, in <module>
    voice_recognizer.run(wakewords, commands, type='normal')
UnboundLocalError: cannot access local variable 'voices' where it is not associated with a value

I presume the docs are out of date, but there's a high chance I'm missing something.
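
For what it's worth, this traceback points at speak_response rather than the library: an UnboundLocalError usually means a later assignment somewhere in the same function makes Python treat voices as a local name for the whole function body. A hypothetical illustration (response.voices is made up for the example):

from elevenlabs import voices

def speak_response(response):
    print(voices())            # UnboundLocalError: "voices" is local here...
    voices = response.voices   # ...because this later assignment makes it so
                               # (response.voices is a hypothetical attribute)

# The fix is to rename the local variable (or alias the import):
def speak_response_fixed(response):
    print(voices())
    available_voices = response.voices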

Your site says Check Your Mail Box

Why don't you send me the email, or give me a resend option? There is nothing in my mailbox from you.

I was about to spend money, but I can't, so thank you.

Issue on docs

Path: /dubbing/overview
Is there an API for dubbing?

unusual activity detected

I signed up for ElevenLabs and tried the speech synthesis. It was very good, but not for long; now it shows this whenever I try to use it: (Unusual activity detected. Free Tier usage disabled. If you are using proxy/VPN you might need to purchase a Paid Plan to not trigger our abuse detectors. Free Tier only works if users do not abuse it, for example by creating multiple free accounts. If we notice that many people try to abuse it, we will need to reconsider Free Tier altogether. Please play fair. Please purchase any Paid Subscription to continue. Update subscription?) I did nothing suspicious and did not use any proxy or VPN.

Issue on docs

Path: /api-reference/text-to-speech-websockets

The example provided doesn't work - the initial message "Hello World " is too short, so we never get a response from the websocket server. It works if you provide a longer message. Can you update the docs so that the example includes a message that is long enough?

Issue on docs

Path: /voicelab/instant-voice-cloning

Hello ElevenLabs Technical Team,

I am planning to use your ElevenLabs platform for voice cloning processes. In this context, I have some uncertainties and need your assistance.

Specifically, I want to perform voice cloning by reading Turkish texts. Is there any issue with my provided voice samples being in Turkish for the cloning process? Once the cloning process is completed, I plan to use this cloned voice to narrate Turkish texts. Do you have any restrictions or recommendations regarding this on your platform?

Thank you,

Issue on docs: websockets example

Path: /api-reference/text-to-speech-websockets

I copied the end-to-end example for Python websockets, put in my own API key and voice ID, and the code just gets stuck on the first response = await websocket.recv() line, awaiting forever.
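
A guess, based on the message-length buffering described a couple of issues above: the server holds very short input, so recv() has nothing to return until you either send more text, ask for eager generation, or flush with the end-of-stream message. A minimal sketch (the URI and API key are placeholders):

import asyncio
import json

import websockets  # pip install websockets

async def speak(uri: str) -> None:
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"text": " ", "xi_api_key": "YOUR_API_KEY"}))
        await ws.send(json.dumps({
            "text": "Hello World ",
            "try_trigger_generation": True,  # ask for eager generation
        }))
        await ws.send(json.dumps({"text": ""}))  # end-of-stream: flush audio
        async for message in ws:  # recv()/iteration should now yield audio
            response = json.loads(message)
            if response.get("isFinal"):
                break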

Issue on docs

Path: /api-reference/history

Only the Python code reflects the changes made when pagination was added.
The description of the endpoint itself lacks the "page_size" and "start_after_history_item_id" query parameters, and the description of the response is missing the "last_history_item_id" and "has_more" values alongside the "history" array.
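
A sketch of the missing endpoint description in code form, using only the parameters and response fields named above (YOUR_API_KEY is a placeholder):

import requests

headers = {"xi-api-key": "YOUR_API_KEY"}
params = {"page_size": 100}
items = []

while True:
    resp = requests.get("https://api.elevenlabs.io/v1/history",
                        headers=headers, params=params)
    resp.raise_for_status()
    page = resp.json()
    items.extend(page["history"])
    if not page["has_more"]:
        break
    # Resume the next page after the last item seen.
    params["start_after_history_item_id"] = page["last_history_item_id"]

print(f"Fetched {len(items)} history items")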

Payment problem

Hello,

I recently wanted to take out a subscription, but my card doesn't work, even though I have a Visa card that can make online purchases.
I would like an answer to my problem.

Regards,

Debosschere Florian

Issue on docs

Path: /api-reference/text-to-speech

I am trying to set the language in an API call to Portuguese (Portugal).

I have tried the code:

data = {
    "text": "Eles jogam futebol aos domingos.",
    "voice": "Bella",
    "lang": "pt-PT",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {"stability": 0.5, "similarity_boost": 0.5}
}

But the response I am getting is always in Brazilian Portuguese.

How can I force the language to Portuguese from Portugal?
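
For context, a sketch of the documented request shape: the voice is chosen by ID in the URL path, and as far as I can tell the body has no "lang" or "voice" fields (unknown fields are silently ignored); the multilingual models infer language and accent from the text itself, so choosing a voice whose samples are European Portuguese may be the more reliable lever. VOICE_ID and YOUR_API_KEY are placeholders:

import requests

url = "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID"
headers = {"xi-api-key": "YOUR_API_KEY"}
payload = {
    "text": "Eles jogam futebol aos domingos.",
    "model_id": "eleven_multilingual_v2",
    # Note: no "lang" field; the model infers the language from the text.
    "voice_settings": {"stability": 0.5, "similarity_boost": 0.5},
}

resp = requests.post(url, json=payload, headers=headers)
resp.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(resp.content)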
