We are losing whole sentences or the end of sentences when we generate the audio. At f

Losing whole sentences when generating the wav file (xtts-webui, open issue)

daswer123 commented on August 17, 2024

from xtts-webui.

Comments (4)

78Alpha commented on August 17, 2024

That is a byproduct of the sentence splitter. It will just drop things here and there.

The painful alternative is to generate it sentence by sentence. An automated alternative would be to split the text beforehand (for example, by actual sentence), but once the text is phonemized the model might still miss parts of the audio at the end.

Example:

The old sentence splitter splits up until it hits the phoneme limit.

    if text_split_length is not None and len(text) >= text_split_length:
        text_splits.append("")
        nlp = get_spacy_lang(lang)
        nlp.add_pipe("sentencizer")
        doc = nlp(text)
        for sentence in doc.sents:
            if len(text_splits[-1]) + len(str(sentence)) <= text_split_length:
                # if the last sentence + the current sentence is less than the text_split_length
                # then add the current sentence to the last sentence
                text_splits[-1] += " " + str(sentence)
                text_splits[-1] = text_splits[-1].lstrip()
            elif len(str(sentence)) > text_split_length:
                # if the current sentence is greater than the text_split_length
                for line in textwrap.wrap(
                    str(sentence),
                    width=text_split_length,
                    drop_whitespace=True,
                    break_on_hyphens=False,
                    tabsize=1,
                ):
                    text_splits.append(str(line))
            else:
                text_splits.append(str(sentence))

        if len(text_splits) > 1:
            if text_splits[0] == "":
                del text_splits[0]
    else:
        text_splits = [text.lstrip()]

    return text_splits
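
To see where the chunk boundaries fall without installing spaCy, the same packing logic can be run on sentences that are already split. This is only a sketch, not the library's code: `pack_sentences` and its parameters are names made up for illustration, and the input is assumed to be a plain list of sentence strings.

```python
import textwrap


def pack_sentences(sentences, text_split_length=250):
    """Greedily pack pre-split sentences into chunks (hypothetical helper).

    Mirrors the packing logic quoted above, minus the spaCy sentence detection.
    """
    chunks = [""]
    for sentence in sentences:
        if len(chunks[-1]) + len(sentence) <= text_split_length:
            # Still room: append the sentence to the current chunk.
            chunks[-1] = (chunks[-1] + " " + sentence).lstrip()
        elif len(sentence) > text_split_length:
            # The sentence alone is too long: hard-wrap it on whitespace.
            chunks.extend(
                textwrap.wrap(
                    sentence,
                    width=text_split_length,
                    drop_whitespace=True,
                    break_on_hyphens=False,
                    tabsize=1,
                )
            )
        else:
            # Does not fit in the current chunk: start a new one.
            chunks.append(sentence)
    if len(chunks) > 1 and chunks[0] == "":
        del chunks[0]
    return chunks


if __name__ == "__main__":
    demo = [
        "One short sentence.",
        "Another short one.",
        "A third sentence that is a bit longer.",
    ]
    for i, chunk in enumerate(pack_sentences(demo, text_split_length=40)):
        print(i, len(chunk), repr(chunk))
```

One detail this makes visible: the length check does not count the joining space, so a packed chunk can end up one character longer than `text_split_length`.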

I edited mine to fit my needs, and it seems to work, but the text has to be in a very particular format (every sentence must end in ". ").

    if text_split_length is not None and len(text) >= text_split_length:
        #text_splits.append("")
        nlp = get_spacy_lang(lang)
        nlp.add_pipe("sentencizer")
        doc = nlp(text)
        for sentence in doc.sents:
            # mark every ". " boundary, then split on the marker
            sentence = str(sentence).replace(". ", ". <>")
            frags = sentence.split("<>")
            text_splits += frags
            #text_splits.append(str(sentence))
            print(sentence)
    else:
        text_splits = [text.lstrip()]

    return text_splits

So it might introduce some abnormalities or behave unusually.
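
For text that does not end every sentence in ". ", one option is to split on any sentence-ending punctuation beforehand and confirm that the pieces still add up to the input before sending them to the model. A rough sketch under my own assumptions: `presplit` and `lost_nothing` are hypothetical helper names, the regular expression is a simple heuristic that will also split after abbreviations such as "Dr.", and none of this is part of xtts-webui or the TTS library.

```python
import re


def presplit(text):
    """Split after '.', '!' or '?' when whitespace follows (heuristic, hypothetical helper)."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]


def _squash(s):
    """Remove all whitespace so only the visible characters are compared."""
    return "".join(s.split())


def lost_nothing(text, pieces):
    """Return True if the pieces contain every non-whitespace character of text, in order."""
    return _squash(text) == _squash("".join(pieces))


if __name__ == "__main__":
    sample = "First sentence. Is this the second? Yes! And a trailing fragment without a period"
    parts = presplit(sample)
    for part in parts:
        print(repr(part))
    print("nothing lost:", lost_nothing(sample, parts))
```

The check only covers the text side. It can tell you that the splitter did not drop anything, but not whether the model skips audio during synthesis.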

GamingDaveUk commented on August 17, 2024

Interesting. I may give that a go. I don't think the developer is too bothered by this issue, or perhaps they cannot replicate it, so I have been looking for a reliable alternative... not having any luck, so if this fixes it I will be very happy.

efh8fh8h commented on August 17, 2024

This happens when you use a finetuned model with some bad training data. With the base 2.0.2 model everything works as expected. After manually curating all the wav files and the Whisper transcripts, my finetuned models did not have that issue any more. Give it a try :)
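
A quick way to find candidates for manual review is to compare each clip's duration with the length of its transcript, since a transcript that is far too long or too short for the audio may point to a bad transcription. A minimal sketch, assuming you can list (file name, duration in seconds, transcript) for each clip: `wav_duration`, `flag_suspect_clips`, the thresholds, and the characters-per-second heuristic are my own assumptions, not something xtts-webui provides.

```python
import wave


def wav_duration(path):
    """Return the duration of a PCM wav file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def flag_suspect_clips(clips, min_cps=5.0, max_cps=25.0):
    """Return (name, reason) for clips whose characters-per-second rate looks off.

    clips: iterable of (name, duration_seconds, transcript) tuples.
    The thresholds are illustrative guesses; tune them for your language and speaker.
    """
    suspects = []
    for name, duration, transcript in clips:
        text = transcript.strip()
        if duration <= 0 or not text:
            suspects.append((name, "empty transcript or zero-length audio"))
            continue
        cps = len(text) / duration
        if cps < min_cps:
            suspects.append((name, f"{cps:.1f} chars/s: transcript may be missing words"))
        elif cps > max_cps:
            suspects.append((name, f"{cps:.1f} chars/s: transcript may contain extra text"))
    return suspects


if __name__ == "__main__":
    demo = [
        ("ok.wav", 4.0, "This clip has a plausible amount of text for its length."),
        ("short_text.wav", 10.0, "Hi."),
        ("long_text.wav", 1.0, "Far too many words for a clip that lasts only one second."),
        ("empty.wav", 3.0, "   "),
    ]
    for name, reason in flag_suspect_clips(demo):
        print(name, "->", reason)
```

This only narrows down which clips to listen to. It does not replace checking the audio and transcript by hand.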

cwmcd commented on August 17, 2024

I'm having the same issue with the Colab version. Not only is it losing entire blocks of text, it is also mixing up and repeating text, hallucinating, producing "demon" voices, and morphing the voice into another voice or gender.
