Comments (4)

78Alpha commented on August 17, 2024

That is a byproduct of the sentence splitter. It will just drop things here and there.

The painful alternative is to do it sentence by sentence. An automated alternative would be to split the text beforehand (e.g. by actual sentence), but once it is phonemized it might still miss parts of the audio at the end.

Example:

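The old sentence splitter keeps merging sentences into one chunk until the chunk hits the limit (text_split_length, which the code below compares against character counts).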

    if text_split_length is not None and len(text) >= text_split_length:
        text_splits.append("")
        nlp = get_spacy_lang(lang)
        nlp.add_pipe("sentencizer")
        doc = nlp(text)
        for sentence in doc.sents:
            if len(text_splits[-1]) + len(str(sentence)) <= text_split_length:
                # if the last sentence + the current sentence is less than the text_split_length
                # then add the current sentence to the last sentence
                text_splits[-1] += " " + str(sentence)
                text_splits[-1] = text_splits[-1].lstrip()
            elif len(str(sentence)) > text_split_length:
                # if the current sentence is greater than the text_split_length
                for line in textwrap.wrap(
                    str(sentence),
                    width=text_split_length,
                    drop_whitespace=True,
                    break_on_hyphens=False,
                    tabsize=1,
                ):
                    text_splits.append(str(line))
            else:
                text_splits.append(str(sentence))

        if len(text_splits) > 1:
            if text_splits[0] == "":
                del text_splits[0]
    else:
        text_splits = [text.lstrip()]

    return text_splits
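To see where the chunk boundaries end up, the same greedy packing can be reproduced in a small standalone sketch. This is only an approximation of the snippet above: `pack_sentences` is a hypothetical name, and a simple regex stands in for spaCy's sentencizer, so the sentence boundaries may differ from what the real splitter produces.

```python
import re
import textwrap


def pack_sentences(text, limit):
    """Greedily merge sentences into chunks of roughly `limit` characters."""
    # Assumption: a regex replaces spaCy's sentencizer for this sketch.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = [""]
    for sentence in sentences:
        if len(chunks[-1]) + len(sentence) <= limit:
            # The sentence still fits, so append it to the current chunk.
            chunks[-1] = (chunks[-1] + " " + sentence).lstrip()
        elif len(sentence) > limit:
            # A single over-long sentence is hard-wrapped at word boundaries.
            chunks.extend(
                textwrap.wrap(sentence, width=limit, break_on_hyphens=False)
            )
        else:
            # The sentence does not fit, so it starts a new chunk.
            chunks.append(sentence)
    return [c for c in chunks if c]


if __name__ == "__main__":
    demo = "One two. Three four. Five six seven eight nine ten."
    for chunk in pack_sentences(demo, limit=20):
        print(repr(chunk))
```

Printing the chunks for a passage that misbehaves shows which sentences get merged and which long sentence gets cut in the middle by textwrap.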

I edited mine to fit my needs, and it seems to work, but the text has to be in a very particular format (all lines end in ". ").

    if text_split_length is not None and len(text) >= text_split_length:
        #text_splits.append("")
        nlp = get_spacy_lang(lang)
        nlp.add_pipe("sentencizer")
        doc = nlp(text)
        for sentence in doc.sents:
            sentence = str(sentence).replace(". ", ". <>")
            frags = sentence.split("<>")
            text_splits += frags
            #text_splits.append(str(sentence))
            print(sentence)
    else:
        text_splits = [text.lstrip()]

    return text_splits

This version no longer merges short sentences or wraps long ones to text_split_length, so it might introduce some abnormalities or behave unusually.
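If the ". " requirement is too strict, a rough way to pre-split text before handing it over is sketched below. It is not part of xtts-webui: `split_per_sentence` is a hypothetical helper that splits after ".", "!" or "?" followed by any whitespace and drops empty fragments.

```python
import re


def split_per_sentence(text):
    """Return one fragment per sentence.

    Splits after '.', '!' or '?' when followed by whitespace, so the text
    does not need every sentence to end in exactly ". ".
    """
    fragments = re.split(r"(?<=[.!?])\s+", text)
    # Strip stray whitespace and drop fragments that end up empty.
    return [f.strip() for f in fragments if f.strip()]


if __name__ == "__main__":
    sample = "Hello there.  How are you?\nFine! Done"
    for fragment in split_per_sentence(sample):
        print(repr(fragment))
```

This will still split on abbreviations such as "Dr. Smith", and nothing caps the fragment length, so a very long sentence is passed through as a single fragment.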

from xtts-webui.

GamingDaveUk commented on August 17, 2024

Interesting, I may give that a go. I don't think the developer is too bothered by this issue, or perhaps they can't replicate it, so I've been looking for a reliable alternative... not having any luck so far, so if this fixes it I will be very happy.

efh8fh8h commented on August 17, 2024

This happens when you use a fine-tuned model with some bad training data. With the base 2.0.2 model everything works as expected. After manually curating all the wav files and the Whisper transcripts, my fine-tuned models no longer had this issue. Give it a try :)
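One way to narrow down which clips to check by hand is to flag pairs where the transcript length looks implausible for the clip duration. The sketch below is an assumption-heavy illustration, not something from xtts-webui: `flag_suspect_clips` is a hypothetical helper, the (name, duration, transcript) tuples would have to be built from your own dataset, and the characters-per-second thresholds are arbitrary starting points.

```python
def flag_suspect_clips(clips, min_cps=5.0, max_cps=25.0):
    """Flag clips whose transcript length does not match their duration.

    clips: iterable of (name, duration_seconds, transcript) tuples.
    min_cps / max_cps: assumed plausible characters-per-second range.
    Returns a list of (name, reason) tuples for manual review.
    """
    suspects = []
    for name, duration, transcript in clips:
        text = transcript.strip()
        if duration <= 0 or not text:
            suspects.append((name, "empty transcript or zero-length audio"))
            continue
        cps = len(text) / duration
        if cps < min_cps or cps > max_cps:
            suspects.append((name, f"{cps:.1f} chars/sec"))
    return suspects


if __name__ == "__main__":
    demo = [
        ("a.wav", 4.0, "This is a normal sentence of speech."),
        ("b.wav", 10.0, "Hi."),
    ]
    for name, reason in flag_suspect_clips(demo):
        print(name, reason)
```

A clip flagged this way is only a candidate for a bad transcript or bad audio; it still needs to be listened to.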

cwmcd commented on August 17, 2024

I'm having the same issue with the Colab version. Not only is it losing entire blocks of text, it is also mixing up and repeating text, hallucinating, and producing demonic-sounding voices or morphing into another voice or gender.
