
xtts-streaming-server's Introduction

XTTS streaming server

Warning: XTTS-streaming-server doesn't support concurrent streaming requests, it's a demo server, not meant for production.

(Demo video: movie.mp4)

1) Run the server

Use a pre-built image

CUDA 12.1:

$ docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121

CUDA 11.8 (for older cards):

$ docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest

CPU (not recommended):

$ docker run -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest-cpu

Run with a fine-tuned model:

Make sure the model folder /path/to/model/folder contains the following files:

  • config.json
  • model.pth
  • vocab.json

$ docker run -v /path/to/model/folder:/app/tts_models --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest

Setting the COQUI_TOS_AGREED environment variable to 1 indicates you have read and agreed to the terms of the CPML license. (Fine-tuned XTTS models are also under the CPML license.)
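
Before mounting, you can sanity-check the folder with a quick script. This is a minimal sketch only; the folder path is a placeholder you replace with your own.

import os

model_dir = "/path/to/model/folder"  # placeholder: your fine-tuned model folder
required = ["config.json", "model.pth", "vocab.json"]

# Report any of the three required files that are missing before starting the container.
missing = [f for f in required if not os.path.isfile(os.path.join(model_dir, f))]
if missing:
    raise SystemExit(f"Model folder is missing: {', '.join(missing)}")
print("Model folder looks complete.")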

Build the image yourself

To build the Docker container (PyTorch 2.1, CUDA 11.8):

DOCKERFILE may be Dockerfile, Dockerfile.cpu, Dockerfile.cuda121, or your own custom Dockerfile.

$ git clone git@github.com:coqui-ai/xtts-streaming-server.git
$ cd xtts-streaming-server/server
$ docker build -t xtts-stream . -f DOCKERFILE
$ docker run --gpus all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 xtts-stream

Setting the COQUI_TOS_AGREED environment variable to 1 indicates you have read and agreed to the terms of the CPML license. (Fine-tuned XTTS models are also under the CPML license.)

2) Testing the running server

Once your Docker container is running, you can test that it's working properly. You will need to run the following code from a fresh terminal.

Clone xtts-streaming-server if you haven't already

$ git clone git@github.com:coqui-ai/xtts-streaming-server.git

Using the gradio demo

$ cd xtts-streaming-server
$ python -m pip install -r test/requirements.txt
$ python demo.py

Using the test script

$ cd xtts-streaming-server/test
$ python -m pip install -r requirements.txt
$ python test_streaming.py
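
If you would rather hit the endpoint directly, here is a minimal sketch of a streaming client. It assumes the JSON request body shown in the curl example further down this page (speaker_embedding, gpt_cond_latent, text, language, add_wav_header, stream_chunk_size) and a default_speaker.json like the one loaded by the test script; adjust paths to your setup.

import json
import requests

SERVER = "http://localhost:8000"

# Load a precomputed speaker (speaker_embedding + gpt_cond_latent), as the test script does.
with open("default_speaker.json") as f:
    speaker = json.load(f)

payload = {
    **speaker,
    "text": "Hello from the streaming server.",
    "language": "en",
    "add_wav_header": True,
    "stream_chunk_size": "20",
}

# Stream the WAV response chunk by chunk and save it to disk.
with requests.post(f"{SERVER}/tts_stream", json=payload, stream=True) as res:
    res.raise_for_status()
    with open("out.wav", "wb") as out:
        for chunk in res.iter_content(chunk_size=512):
            if chunk:
                out.write(chunk)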

xtts-streaming-server's People

Contributors

gorkemgoknar · reuben · weberjulian


xtts-streaming-server's Issues

Request for Integration of RunPod Serverless Template

Hello Coqui AI Team,

I hope this message finds you well. I am reaching out to express my interest in the xtts-streaming-server project. It's an impressive repository, and I appreciate the hard work that has gone into it.

I would like to suggest the integration of a RunPod serverless template into the xtts-streaming-server. This addition would significantly enhance the project's accessibility and usability for users who prefer or require serverless environments.

RunPod is gaining popularity for serverless deployments, and having a ready-to-use template would not only streamline the setup process but also expand the potential user base of xtts-streaming-server. It would allow users to quickly deploy your excellent TTS solution in a serverless environment, making it more accessible to a broader audience.

I understand that integrating a new deployment option requires time and resources, and I appreciate any consideration you can give to this suggestion. The addition of a RunPod serverless template would be a valuable enhancement to your already fantastic project.

Thank you for your time and all the great work on xtts-streaming-server. Looking forward to potentially seeing this integration in the future.

Best regards

CPU usage

It would be helpful if the README stated whether CPU usage is possible and, if so, how to enable it.

docker container crashes at start

Hi,
when trying to start your Docker container, it crashes after all components have been loaded:

Loading default model
Downloading XTTS Model: tts_models/multilingual/multi-dataset/xtts_v2

Downloading model to /root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
Model's license - CPML
Check https://coqui.ai/cpml.txt for more info.
XTTS Model downloaded
Loading XTTS
Traceback (most recent call last):
File "/opt/conda/bin/uvicorn", line 8, in
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1128, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/uvicorn/main.py", line 416, in main
run(
File "/opt/conda/lib/python3.10/site-packages/uvicorn/main.py", line 587, in run
server.run()
File "/opt/conda/lib/python3.10/site-packages/uvicorn/server.py", line 61, in run
return asyncio.run(self.serve(sockets=sockets))
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/opt/conda/lib/python3.10/site-packages/uvicorn/server.py", line 68, in serve
config.load()
File "/opt/conda/lib/python3.10/site-packages/uvicorn/config.py", line 467, in load
self.loaded_app = import_from_string(self.app)
File "/opt/conda/lib/python3.10/site-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/opt/conda/lib/python3.10/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/app/main.py", line 39, in
config.load_json(os.path.join(model_path, "config.json"))
File "/opt/conda/lib/python3.10/site-packages/coqpit/coqpit.py", line 728, in load_json
dump_dict = json.loads(input_str)
File "/opt/conda/lib/python3.10/json/init.py", line 346, in loads
return _default_decoder.decode(s)
File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/conda/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The same error occurs with both CUDA versions.
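
A hedged way to narrow this down (an assumption, not a confirmed fix): a JSONDecodeError at line 1, column 1 usually means the config.json the server tried to load is empty or truncated, so checking the downloaded file inside the container tells you whether the model download failed. The path below is taken from the download log above.

import json
import os

cfg = "/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json"

print("size (bytes):", os.path.getsize(cfg))
with open(cfg) as f:
    json.load(f)   # raises JSONDecodeError if the file is empty or corrupt
print("config.json parses fine")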

How to Enable Concurrent Streaming Requests in XTTS-Streaming-Server?

Hi there,
I am using the XTTS-streaming-server for a text-to-speech application, and I encountered the following warning: "XTTS-streaming-server doesn't support concurrent streaming requests."
My application requires handling multiple concurrent streaming requests, but it seems that the current implementation of XTTS-streaming-server does not support this. Could you please provide guidance on how to modify or configure the XTTS-streaming-server to handle concurrent streaming requests? Alternatively, is there a recommended approach or best practice for achieving this, either by modifying the existing server or by using additional tools and technologies?
Thank you for your assistance!

Handling Concurrent Streaming Requests

I'm working on a project that requires efficient handling of multiple concurrent streaming requests. I have some specific requirements and challenges that I'd like advice on:

  • Scalability: I need to scale the number of concurrent streams efficiently.
  • Resource Management: I need to manage GPU and memory usage effectively when handling multiple streams.

Any advice or recommendations on these issues would be greatly appreciated.
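
For reference, one common stop-gap (not a feature of this server, only an illustrative sketch) is to serialize access to the single model with a lock so overlapping requests queue up instead of interleaving; real concurrency still requires one model instance per worker, or several containers behind a load balancer. The sketch wraps the server's predict_streaming_generator, the function named in the tracebacks elsewhere on this page.

import threading

# Hypothetical guard around the existing streaming generator in main.py.
model_lock = threading.Lock()

def serialized_stream(text, language, add_wav_header, stream_chunk_size):
    # Only one request generates audio at a time; others block here until it finishes.
    with model_lock:
        yield from predict_streaming_generator(
            text, language, add_wav_header, stream_chunk_size
        )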

Error with torch.isin() in Docker Container with transformers Library

Describe the bug

When running the application inside a Docker container, an error occurs related to the torch.isin() method within the transformers library. The error does not occur when running the application locally (outside of the container), suggesting a possible incompatibility or issue with the dependencies inside the Docker container.

To Reproduce

Build the Docker image using the provided Dockerfile.

Dockerfile:

FROM python:3.11.8-slim

ENV PYTHONUNBUFFERED=1

# Install system dependencies and Rust
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    libsndfile1 \
    libgomp1 \
    pkg-config \
    libssl-dev && \
    curl https://sh.rustup.rs -sSf | sh -s -- -y

ENV PATH="/root/.cargo/bin:${PATH}"
ENV COQUI_TOS_AGREED=1

# Update pip to the latest version
RUN pip install --upgrade pip

# Install Python dependencies
RUN pip install --no-cache-dir fastapi uvicorn torch==2.2.0 torchaudio==2.2.0 transformers==4.43.1 numpy==1.24.3 TTS==0.22.0 sudachipy cutlet
RUN pip install --upgrade transformers

# Copy the FastAPI application code
COPY main.py /app/main.py

WORKDIR /app

EXPOSE 8001

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]

main.py:


import io
import os
import wave
import torch
import numpy as np
from fastapi import FastAPI, Request, Header, Body
from fastapi.responses import StreamingResponse
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
from TTS.utils.generic_utils import get_user_data_dir
from TTS.utils.manage import ModelManager

# Set the number of threads and device
torch.set_num_threads(int(os.environ.get("NUM_THREADS", os.cpu_count())))
device = torch.device("cuda" if torch.cuda.is_available() and os.environ.get("USE_CPU", "0") == "0" else "cpu")

# Load custom model if available, otherwise download the default model
custom_model_path = os.environ.get("CUSTOM_MODEL_PATH", "/app/tts_models")
if os.path.exists(custom_model_path) and os.path.isfile(custom_model_path + "/config.json"):
    model_path = custom_model_path
    print("Loading custom model from", model_path, flush=True)
else:
    print("Loading default model", flush=True)
    model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
    print("Downloading XTTS Model:", model_name, flush=True)
    ModelManager().download_model(model_name)
    model_path = os.path.join(get_user_data_dir("tts"), model_name.replace("/", "--"))
    print("XTTS Model downloaded", flush=True)

# Load model configuration and model
print("Loading XTTS", flush=True)
config = XttsConfig()
config.load_json(os.path.join(model_path, "config.json"))
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=model_path, eval=True, use_deepspeed=True if device == "cuda" else False)
model.to(device)
print("XTTS Loaded.", flush=True)

# Initialize FastAPI
app = FastAPI(
    title="XTTS Streaming server",
    description="XTTS Streaming server",
    version="0.0.1",
    docs_url="/",
)

# Helper functions
def postprocess(wav):
    if isinstance(wav, list):
        wav = torch.cat(wav, dim=0)
    wav = wav.clone().detach().cpu().numpy()
    wav = wav[None, : int(wav.shape[0])]
    wav = np.clip(wav, -1, 1)
    wav = (wav * 32767).astype(np.int16)
    return wav

def wav_data_generator(frame_input, sample_rate=24000, sample_width=2, channels=1):
    wav_buf = io.BytesIO()
    with wave.open(wav_buf, "wb") as vfout:
        vfout.setnchannels(channels)
        vfout.setsampwidth(sample_width)
        vfout.setframerate(sample_rate)
        vfout.writeframes(frame_input)

    wav_buf.seek(0)
    return wav_buf.read()

# Streaming generator
def predict_streaming_generator(text, language, add_wav_header, stream_chunk_size):

    speaker_name = "Alison Dietlinde"
    speaker_raw = model.speaker_manager.speakers[speaker_name]["speaker_embedding"].cpu().squeeze().half().tolist()
    gpt_raw = model.speaker_manager.speakers[speaker_name]["gpt_cond_latent"].cpu().squeeze().half().tolist()

    speaker_embedding = torch.tensor(speaker_raw).unsqueeze(0).unsqueeze(-1)
    gpt_cond_latent = torch.tensor(gpt_raw).reshape((-1, 1024)).unsqueeze(0)

    chunks = model.inference_stream(
        text,
        language,
        gpt_cond_latent,
        speaker_embedding,
        stream_chunk_size=int(stream_chunk_size),
        enable_text_splitting=True
    )

    for i, chunk in enumerate(chunks):
        chunk = postprocess(chunk)
        if i == 0 and add_wav_header:
            yield wav_data_generator(b"")
            yield chunk.tobytes()
        else:
            yield chunk.tobytes()

# FastAPI endpoint for streaming
@app.post("/tts_stream")
async def predict_streaming_endpoint(
    text: str = Header(...),
    language: str = Header(...),
    add_wav_header: bool = Header(True),
    stream_chunk_size: str = Header("20")
):
    try:
        return StreamingResponse(
            predict_streaming_generator(text,language, add_wav_header, stream_chunk_size),
            media_type="audio/wav"
        )
    except Exception as e:
        raise


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8001)

Start the Docker container.
Make a POST request to the /tts_stream endpoint with the appropriate headers and data.
test.py:


import argparse
import json
import shutil
import subprocess
import sys
import time
from typing import Iterator

import requests


def is_installed(lib_name: str) -> bool:
    lib = shutil.which(lib_name)
    if lib is None:
        return False
    return True


def save(audio: bytes, filename: str) -> None:
    with open(filename, "wb") as f:
        f.write(audio)


def stream_ffplay(audio_stream, output_file, save=True):
    if not save:
        ffplay_cmd = ["ffplay", "-nodisp", "-probesize", "1024", "-autoexit", "-"]
    else:
        print("Saving to ", output_file)
        ffplay_cmd = ["ffmpeg", "-probesize", "1024", "-i", "-", output_file]

    ffplay_proc = subprocess.Popen(ffplay_cmd, stdin=subprocess.PIPE)
    for chunk in audio_stream:
        if chunk is not None:
            ffplay_proc.stdin.write(chunk)

    # close on finish
    ffplay_proc.stdin.close()
    ffplay_proc.wait()


def tts(text, language, server_url, stream_chunk_size) -> Iterator[bytes]:
    start = time.perf_counter()

    headers = {
        "text": text,
        "language": language,
        "add_wav_header": "False",
        "stream_chunk_size": stream_chunk_size,
    }

    res = requests.post(
        f"{server_url}/tts_stream",
        headers=headers, 
        stream=True
    )
    end = time.perf_counter()
    print(f"Time to make POST: {end-start}s", file=sys.stderr)

    if res.status_code != 200:
        print("Error:", res.text)
        sys.exit(1)

    first = True
    for chunk in res.iter_content(chunk_size=512):
        if first:
            end = time.perf_counter()
            print(f"Time to first chunk: {end-start}s", file=sys.stderr)
            first = False
        if chunk:
            yield chunk

    print("⏱️ response.elapsed:", res.elapsed)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--text",
        default="It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
        help="text input for TTS"
    )
    parser.add_argument(
        "--language",
        default="en",
        help="Language to use default is 'en'  (English)"
    )
    parser.add_argument(
        "--output_file",
        default=None,
        help="Save TTS output to given filename"
    )
    parser.add_argument(
        "--ref_file",
        default=None,
        help="Reference audio file to use, when not given will use default"
    )
    parser.add_argument(
        "--server_url",
        default="http://localhost:8000",
        help="Server url http://localhost:8000 default, change to your server location "
    )
    parser.add_argument(
        "--stream_chunk_size",
        default="20",
        help="Stream chunk size , 20 default, reducing will get faster latency but may degrade quality"
    )
    args = parser.parse_args()

    with open("./default_speaker.json", "r") as file:
        speaker = json.load(file)

    if args.ref_file is not None:
        print("Computing the latents for a new reference...")

    audio = stream_ffplay(
        tts(
            args.text,
            args.language,
            args.server_url,
            args.stream_chunk_size
        ), 
        args.output_file,
        save=bool(args.output_file)
    )

CMD:
python test.py --text "This is a Test." --language en --server_url "http://localhost:8001" --stream_chunk_size 145

Expected behavior

No response

Logs

TypeError: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
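
For context, the overloads printed above expect the elements first and the test elements as either a Tensor or a plain number. A minimal sketch of the accepted calls (this only illustrates the torch.isin signatures, it is not a fix for the transformers code path that raised the error):

import torch

elements = torch.tensor([1, 2, 3, 4])

# Accepted: Tensor test_elements.
mask = torch.isin(elements, torch.tensor([2, 4]))
print(mask)   # tensor([False,  True, False,  True])

# Also accepted: a plain number as the second positional argument.
print(torch.isin(elements, 3))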

Environment

transformers: 4.43.1
torch: 2.2.0
torchaudio: 2.2.0
TTS: 0.22.0
Platform: Docker

Additional context

No response

parameter values for inference

Hello,

What are the exact parameter values that need to be passed for 1. speaker_embedding and 2. gpt_cond_latent?

(screenshot omitted)

Thanks,
santhosh
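
For what it's worth, the expected shapes can be read off the server code quoted elsewhere on this page: speaker_embedding is sent as a flat list of floats (the server reshapes it to (1, D, 1)), and gpt_cond_latent as a list of 1024-dimensional vectors (reshaped to (1, N, 1024)). A hedged sketch of extracting both from a loaded model, mirroring the speaker_manager lookup and the model-loading code shown in the issue above (the cache path assumes the default XTTS v2 download):

import os
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Assumes the default XTTS v2 model has already been downloaded by the server.
model_path = "/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2"
config = XttsConfig()
config.load_json(os.path.join(model_path, "config.json"))
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=model_path, eval=True)

# Mirror the speaker_manager lookup used in the issue's main.py above.
speaker = model.speaker_manager.speakers["Alison Dietlinde"]
speaker_embedding = speaker["speaker_embedding"].cpu().squeeze().half().tolist()
gpt_cond_latent = speaker["gpt_cond_latent"].cpu().squeeze().half().tolist()

print(len(speaker_embedding))                          # a flat vector of floats
print(len(gpt_cond_latent), len(gpt_cond_latent[0]))   # N rows of 1024 values each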

Not working properly. RuntimeError: shape '[-1, 1024]' is invalid for input of size 1

curl -X 'POST' \
  'http://localhost1:8000/tts_stream' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "speaker_embedding": [
    0
  ],
  "gpt_cond_latent": [
    [
      0
    ]
  ],
  "text": "this is a test.",
  "language": "en",
  "add_wav_header": true,
  "stream_chunk_size": "20"
}'
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/responses.py", line 273, in wrap
    |     await func()
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/responses.py", line 262, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/concurrency.py", line 63, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, iterator)
    |   File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 49, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2103, in run_sync_in_worker_thread
    |     return await future
    |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 823, in run
    |     result = context.run(func, *args)
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/concurrency.py", line 53, in _next
    |     return next(iterator)
    |   File "/app/main.py", line 134, in predict_streaming_generator
    |     torch.tensor(parsed_input.gpt_cond_latent).reshape((-1, 1024)).unsqueeze(0)
    | RuntimeError: shape '[-1, 1024]' is invalid for input of size 1
    +------------------------------------

asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ff72233f610

@app.post("/api/xtts_stream")
def gpt_xtts_stream(inputs: xtts_stream_inputs):
    try:
        speaker["text"] = inputs.text
        speaker['language'] = inputs.language
        speaker['stream_chunk_size'] = inputs.stream_chunk_size

        res = requests.post(
            f"http://0.0.0.0:8004/tts_stream",
            json=speaker,
            stream=True,
        )

        def iterms_gene(res):
            if res.status_code != 200:
                print("Error:", res.text)
                sys.exit(1)
            first = True
            for chunk in res.iter_content(chunk_size=512):
                if first:
                    first = False
                if chunk:
                    yield chunk
        # Return the audio stream as a streaming response
        return StreamingResponse(iterms_gene(res), media_type="audio/wav")
    except Exception as e:
        logging.error(f"Error: {e}")
        return {"error": str(e)}

I want to call main.py's /tts_stream from the endpoint above, but I get the following error:

ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/responses.py", line 259, in call
await wrap(partial(self.listen_for_disconnect, receive))
File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/responses.py", line 255, in wrap
await func()
File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/responses.py", line 232, in listen_for_disconnect
message = await receive()
File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 538, in receive
await self.message_event.wait()
File "/home/search/miniconda3/envs/XTTS/lib/python3.9/asyncio/locks.py", line 226, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ff72233f610

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in call
    | return await self.app(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in call
    | await super().call(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/applications.py", line 116, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in call
    | raise exc
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in call
    | await self.app(scope, receive, _send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in call
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    | raise exc
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    | raise exc
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    | await app(scope, receive, sender)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/routing.py", line 746, in call
    | await route.handle(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/routing.py", line 288, in handle
    | await self.app(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/routing.py", line 75, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    | raise exc
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    | await app(scope, receive, sender)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/routing.py", line 73, in app
    | await response(scope, receive, send)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/responses.py", line 259, in call
    | await wrap(partial(self.listen_for_disconnect, receive))
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 678, in aexit
    | raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/responses.py", line 255, in wrap
    | await func()
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/responses.py", line 244, in stream_response
    | async for chunk in self.body_iterator:
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/concurrency.py", line 57, in iterate_in_threadpool
    | yield await anyio.to_thread.run_sync(_next, iterator)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    | return await get_async_backend().run_sync_in_worker_thread(
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    | return await future
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    | result = context.run(func, *args)
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/starlette/concurrency.py", line 47, in _next
    | return next(iterator)
    | File "/data/search/bei/TTS/xtts-streaming-server/server/main.py", line 131, in predict_streaming_generator
    | for i, chunk in enumerate(chunks):
    | File "/home/search/miniconda3/envs/XTTS/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    | response = gen.send(None)
    | File "/data/search/bei/TTS/TTS/tts/models/xtts.py", line 633, in inference_stream
    | gpt_cond_latent = gpt_cond_latent.to(self.device)
    | RuntimeError: CUDA error: device-side assert triggered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
    |
    +------------------------------------

main.py was not changed!
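
The reshape error in the first traceback follows directly from the payload: a gpt_cond_latent of [[0]] contains a single value, which cannot be viewed as (-1, 1024), and [0] is not a usable speaker embedding either. A quick sketch that reproduces the check the server performs (the 32-row latent below is random and only illustrates a valid shape):

import torch

# The server does: torch.tensor(parsed_input.gpt_cond_latent).reshape((-1, 1024))
try:
    torch.tensor([[0]]).reshape((-1, 1024))
except RuntimeError as e:
    print(e)   # shape '[-1, 1024]' is invalid for input of size 1

# Any latent whose total size is a multiple of 1024 reshapes cleanly.
ok = torch.rand(32 * 1024).reshape((-1, 1024)).unsqueeze(0)
print(ok.shape)   # torch.Size([1, 32, 1024])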

Error while adding new voice

When I try to create a new voice from a WAV file...

.../xtts-streaming-server/.venv/lib/python3.10/site-packages/gradio/components/dropdown.py:176: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. return gr.Dropdown(...) instead of return gr.Dropdown.update(...).
warnings.warn(

pip freeze:
...
gradio==3.50.2
gradio_client==0.6.1
...
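
For reference, the warning itself describes the change: with gradio 3.50+ an event handler should return a new component rather than calling the deprecated .update(). A hedged sketch of that pattern (the handler name and arguments here are placeholders, not the demo's actual variables):

import gradio as gr

def add_voice(new_name, existing_names):
    # Hypothetical handler: append the cloned voice and rebuild the dropdown.
    updated = existing_names + [new_name]
    # Old style (triggers the deprecation warning):
    #   return gr.Dropdown.update(choices=updated, value=new_name)
    # New style: return a fresh component instead.
    return gr.Dropdown(choices=updated, value=new_name)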

Streaming input to streaming TTS

Hello Team,

Is it possible to run TTS streaming with streaming text input while keeping the same output file name?

Example:

import openai

def llm_write(prompt: str):
    # Stream partial completions from the OpenAI chat API.
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk

text_stream = llm_write("Hello, what is LLM?")

audio = stream_ffplay(
    tts(
        args.text,
        speaker,
        args.language,
        args.server_url,
        args.stream_chunk_size
    ), 
    args.output_file,
    save=bool(args.output_file)
)

With a minimum number of words sent to the TTS API at a time.

Thanks,
Santhosh
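
This is not something the server provides out of the box, but one common pattern (a sketch only, reusing the tts() and stream_ffplay() helpers from the test script quoted above and the llm_write() generator from this issue) is to buffer the LLM stream into sentences and hand each completed sentence to the streaming TTS endpoint as soon as it is ready:

import re

def sentences(text_stream):
    # Accumulate LLM chunks and emit whenever a sentence boundary appears.
    buf = ""
    for piece in text_stream:
        buf += piece
        while True:
            m = re.search(r"[.!?]\s", buf)
            if not m:
                break
            yield buf[: m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()

def tts_stream_of(text_stream, language, server_url, stream_chunk_size):
    # Chain the audio of every completed sentence into one byte stream.
    for sentence in sentences(text_stream):
        yield from tts(sentence, language, server_url, stream_chunk_size)

# Usage sketch: play everything through the existing ffplay helper.
# stream_ffplay(tts_stream_of(llm_write("Hello, what is LLM?"), "en",
#                             "http://localhost:8000", "20"),
#               output_file=None, save=False)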
