Comments (15)

SergiJuanola commented on May 25, 2024

Funny thing, I rolled one small line back from your 1.1.3 version. In token.py, I replaced line 25:

d = list(bytearray(text.encode('UTF-8')))

With:

d = list(bytearray(text))

It will clearly stop working in the near future, but it worked perfectly for me in tests with both Arabic and Spanish.
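
As a rough illustration of why the rolled-back line is fragile, here is a minimal sketch, assuming Python 3 and a made-up sample string (neither comes from token.py). On Python 3, `bytearray` refuses a `str` that has no explicit encoding, whereas the encoded form converts cleanly:

```python
# Sample text is an assumption for illustration only.
text = "año"

# On Python 3, bytearray(str) without an encoding raises TypeError.
try:
    bytearray(text)
    accepted = True
except TypeError:
    accepted = False

# Encoding first gives the byte values of the UTF-8 representation.
encoded = list(bytearray(text.encode("utf-8")))

print(accepted)
print(encoded)
```

On Python 2 the un-encoded form is accepted when `text` is a `str`, because that type already holds bytes.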

from gtts.

pndurette commented on May 25, 2024

Hmm. Non-ASCII text was breaking some new token-generation code, and the resulting requests would yield 403s. Thanks for this; the encoding bits have been moving a lot lately. I'll have to poke around and figure out a good solution.

Boudewijn26 commented on May 25, 2024

I think the problem is that the text from the argument is incorrectly encoded. I actually wrote a test for this in gTTS-token. Perhaps decoding the args to ASCII and then encoding to UTF-8 might work, although I'm not sure this will work on all platforms.

chrisliu529 commented on May 25, 2024

https://github.com/pndurette/gTTS/blob/master/gtts/token.py#L25 doesn't work with Chinese text:
d = list(bytearray(text.encode('UTF-8')))

A fix can be:
d = list(bytearray(unicode(text, 'UTF-8').encode('UTF-8')))

A very simple demo:
>>> s='你好'
>>> s.encode('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> unicode(s, 'UTF-8').encode('UTF-8')
'\xe4\xbd\xa0\xe5\xa5\xbd'

Boudewijn26 commented on May 25, 2024

@chrisliu529 The problem isn't with the code. It is with your demo. In Python 2, writing s = '你好' gives you a str, which is a byte string, here holding the UTF-8 bytes of the text. Calling encode on it makes Python first decode those bytes implicitly with the default ASCII codec, and that step fails because the bytes are not valid ASCII. If you write s = u'你好' you get a unicode (unicode is actually a type), which holds the characters themselves, so encoding it to UTF-8 will succeed.

This is even demonstrated by the error message. ASCII only covers 0 to 127, whereas 0xe4 is 228.

So Python 2 assumes a str holds ASCII bytes whenever it has to convert one implicitly, while a unicode holds text independent of any encoding. You can see the two types with type('è') and type(u'è'). (This only applies to Python 2; in Python 3 every str is Unicode text and bytes is a separate type.)

However, you are right that unicode(text, 'UTF-8') would probably fix this. UTF-8 is backwards compatible with ASCII (every ASCII byte sequence is also valid UTF-8), so both ASCII and UTF-8 input should get processed properly.
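
The same distinction can be reproduced explicitly on Python 3, where the implicit ASCII decode no longer happens and has to be written out. This is only a sketch of the mechanism, not code from gTTS; the variable names are made up for illustration:

```python
# Text (str) versus its UTF-8 byte representation (bytes).
text = "你好"
raw = text.encode("utf-8")
print(raw)

# Decoding those bytes as ASCII fails at the first byte, 0xe4 (228),
# which is the step Python 2 performed implicitly in the demo above.
try:
    raw.decode("ascii")
    failure = None
except UnicodeDecodeError as err:
    failure = (err.start, raw[err.start])
print(failure)

# Pure ASCII text encodes to identical bytes under ASCII and UTF-8.
ascii_compatible = "hello".encode("ascii") == "hello".encode("utf-8")
print(ascii_compatible)
```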

chrisliu529 commented on May 25, 2024

@Boudewijn26
Your reply about string type makes sense.
However, it does not seem possible to specify u'你好' on the command line or in a text file, so I reproduced the UnicodeDecodeError in the way shown above.
I just found that my fix above crashes the unit test. I'll provide another fix and then open a pull request later today.

Boudewijn26 commented on May 25, 2024

Please make the pull request to gTTS-token. That project was made to avoid code duplication across other Python projects and I expect that gTTS will depend on it in the near future.

chrisliu529 commented on May 25, 2024

Fine. I'll make the pull request to gTTS-token :)

Boudewijn26 commented on May 25, 2024

Also, I was able to specify text = u'你好' in the CLI, as can be seen in this test. Python 3.2 doesn't support this syntax (the u prefix only returned in Python 3.3), which is why support for Python 3.2 has been dropped.

chrisliu529 commented on May 25, 2024

@Boudewijn26
That's a good point, but the CLI I mentioned was actually the gtts-cli command, for example:
gtts-cli -l 'zh' -o 'test.mp3' '你好'

SergiJuanola commented on May 25, 2024

Yeah, @Boudewijn26, that's actually been the point the whole time. As @chrisliu529 said, you can't tell from the command line whether the string you're passing is Unicode or ASCII.
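
One way to sidestep that question is to accept both kinds of input and normalise them before computing the byte values. The sketch below is an assumption for illustration, not the fix that went into gTTS-token: the helper name `to_byte_values` is hypothetical, and it assumes that any byte-string input is UTF-8 encoded (in practice that depends on the terminal's encoding).

```python
def to_byte_values(text, encoding="utf-8"):
    """Return the UTF-8 byte values of text given as bytes or as Unicode.

    Hypothetical helper: byte-string input is assumed to be UTF-8 already.
    """
    # On Python 2 `bytes` is an alias for `str`, so CLI arguments land here;
    # on Python 3 only genuine bytes objects do.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    return list(bytearray(text.encode(encoding)))


from_text = to_byte_values(u"你好")
from_bytes = to_byte_values(u"你好".encode("utf-8"))
print(from_text)
print(from_text == from_bytes)
```

Decoding first and then re-encoding also validates the input: bytes that are not valid UTF-8 raise a UnicodeDecodeError up front instead of producing a bad token later.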

chrisliu529 commented on May 25, 2024

I just made a pull request, but it doesn't work on Python 3.
Are we supposed to make gtts work on Python 3?

Boudewijn26 commented on May 25, 2024

@chrisliu529 Sorry, my bad. I misunderstood.

Yes, Python 3 compatibility is required. I think I have a fix. I will implement it this afternoon.

Boudewijn26 commented on May 25, 2024

This should be fixed in gTTS-token 1.1.0.

pndurette commented on May 25, 2024

Closing this, as gTTS 1.1.4 now does its token calculations from @Boudewijn26's gTTS-token.
