Comments (15)
Funny thing, I rolled one small line back from your 1.1.3 version. In token.py, I changed line 25:
d = list(bytearray(text.encode('UTF-8')))
With:
d = list(bytearray(text))
Clearly it looks like it's going to stop working in the near future, but it worked perfectly for me in tests with both Arabic and Spanish.
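A quick sketch of why the rolled-back line is fragile, assuming Python 3 (where every str is Unicode text): bytearray refuses text without an explicit encoding, so only the encode-first variant works there. Both function names are hypothetical and exist only for this illustration.

```python
def token_bytes(text):
    """Encode text as UTF-8 first, then list the byte values (the 1.1.3 line)."""
    return list(bytearray(text.encode('UTF-8')))


def token_bytes_rolled_back(text):
    """The rolled-back line: works for Python 2 byte strings only."""
    # On Python 3 this raises TypeError for any str argument,
    # because bytearray() needs an encoding to turn text into bytes.
    return list(bytearray(text))
```

On Python 2 the rolled-back variant accepts a plain str because that type already holds bytes, which would explain why it worked in the Arabic and Spanish tests.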
from gtts.
Hmm. The non-ASCII was breaking some new code to generate tokens that would yield 403s. Thanks for this, the encoding etc bits have been moving a lot lately. I'll have to poke and figure out a good solution.
I think the problem is that the text from the argument is incorrectly encoded. I actually wrote a test for this in gTTS-token. Perhaps decoding the args to ASCII and then encoding to UTF-8 might work, although I'm not sure this will work on all platforms.
https://github.com/pndurette/gTTS/blob/master/gtts/token.py#L25 doesn't work with Chinese text:
d = list(bytearray(text.encode('UTF-8')))
A fix can be:
d = list(bytearray(unicode(text, 'UTF-8').encode('UTF-8')))
A very simple demo:
>>> s='你好'
>>> s.encode('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> unicode(s, 'UTF-8').encode('UTF-8')
'\xe4\xbd\xa0\xe5\xa5\xbd'
@chrisliu529 The problem isn't with the code, it's with your demo. Writing s = '你好' gives you a str, which in Python 2 is a byte string holding the UTF-8 bytes. Calling encode('UTF-8') on it makes Python first decode those bytes implicitly with the ASCII codec, and that fails on non-ASCII bytes. If you write s = u'你好', you get a unicode (unicode is actually a type), which holds text rather than raw bytes, so encoding it to UTF-8 succeeds.
This is even demonstrated by the error message. ASCII encoding only ranges from 0 - 127, whereas e4 is actually 228.
So Python 2 assumes ASCII whenever it has to convert a str implicitly, while unicode holds already-decoded text. You can check the two types with type('è') and type(u'è'). (This only applies to Python 2; in Python 3 all strings are unicode.)
However, you are right that unicode(text, 'UTF-8') would probably fix this for byte strings. UTF-8 is backwards compatible with ASCII, so both ASCII and UTF-8 encoded input should be processed properly.
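As a rough sketch of the idea discussed above, a helper could accept either text or bytes and always hand UTF-8 bytes to the token code. The name to_utf8_bytes and the assumed_encoding parameter are hypothetical, not part of gTTS or gTTS-token, and the sketch assumes byte input is UTF-8 unless told otherwise.

```python
import sys

# Pick the text type for the running interpreter.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)


def to_utf8_bytes(text, assumed_encoding='UTF-8'):
    """Return UTF-8 bytes for text given as either text or bytes."""
    if isinstance(text, text_type):
        # Already decoded text: encoding cannot hit the implicit ASCII step.
        return text.encode('UTF-8')
    # Byte input: decode explicitly (assumption: it is in assumed_encoding),
    # then re-encode so the result is always valid UTF-8.
    return bytes(text).decode(assumed_encoding).encode('UTF-8')


# Same shape as the line in token.py, using the escaped form of the demo text.
d = list(bytearray(to_utf8_bytes(u'\u4f60\u597d')))
```

Checking the type first avoids calling unicode(text, 'UTF-8') on something that is already unicode, which Python 2 rejects with a TypeError.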
@Boudewijn26
Your reply about string type makes sense.
However, it seems it is not possible to specify u'你好' in a CLI or in a text file, so I reproduced the UnicodeDecodeError in the way above.
I just found that my fix above crashed the unit test. I'll provide another fix and then issue a pull request later today.
Please make the pull request to gTTS-token. That project was made to avoid code duplication across other Python projects and I expect that gTTS will depend on it in the near future.
Fine. I'll make the pull request to gTTS-token :)
Also, I was able to specify text = u'你好' in the CLI, as can be seen in this test. Only Python 3.2 doesn't support this syntax (which is why support for Python 3.2 has been dropped).
@Boudewijn26
That's a good point, but the CLI I mentioned was the gtts-cli interface, as in:
gtts-cli -l 'zh' -o 'test.mp3' '你好'
Yeah, @Boudewijn26, that's been the point the whole time. As @chrisliu529 said, you can't tell from the command line whether the string you're passing is Unicode or ASCII.
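One possible way to handle this at the command-line boundary, sketched under assumptions: on Python 2 the arguments arrive as byte strings in the terminal's encoding, so they can be decoded once, up front. The function decode_cli_arg is hypothetical and is not how gtts-cli is known to work; the fallback order for guessing the encoding is also an assumption.

```python
import locale
import sys


def decode_cli_arg(arg, encoding=None):
    """Return a command-line argument as text, decoding it if it is bytes."""
    if not isinstance(arg, bytes):
        # Python 3 already hands over text, so there is nothing to do.
        return arg
    # Assumption: the terminal's encoding is the best available guess.
    if encoding is None:
        encoding = (getattr(sys.stdin, 'encoding', None)
                    or locale.getpreferredencoding())
    return arg.decode(encoding)
```

After this step the rest of the program only ever sees text, so a single encode('UTF-8') before the token calculation behaves the same on both Python versions.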
I just made a pull request, but it didn't work on Python 3.
Are we supposed to make gTTS work on Python 3?
@chrisliu529 Sorry, my bad. I misunderstood.
Yes, Python 3 compatibility is required. I think I have a fix. I will implement it this afternoon.
This should be fixed in gTTS-token 1.1.0.
Closing this, as gTTS 1.1.4 now does its token calculations from @Boudewijn26's gTTS-token.