Somehow you've landed here 🤔
I am probably building something or learning some stuff. Feel free to look around, check my contributions below, website and LinkedIn profile.
BabelFish is a Python library to work with countries and languages
License: BSD 3-Clause "New" or "Revised" License
babelfish currently implements __hash__ to override the default behavior, which makes it possible to use its objects as dictionary keys and set members.
While this is a good thing, the way it is implemented makes it prone to some weird errors, as explained in this Lyft blog post:
>>> import babelfish
>>> fr = babelfish.Language("fra")
>>> fr_fr = babelfish.Language("fra", "FR")
>>> s = set([fr])
>>> fr in s
True
>>> fr_fr in s
False
All that is great, but if we modify the objects, things get weird because Python expects the result of hash not to change:
>>> fr.country = babelfish.Country("FR")
>>> fr
<Language [fr-FR]>
>>> fr in s
False
>>> list(s)[0]
<Language [fr-FR]>
>>> fr_fr in s
False
I want true immutability for babelfish objects, either by building them on tuples (and derivatives) or at least by faking it, perhaps with frozen dataclasses.
This will surely be a breaking change.
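A minimal sketch of the frozen dataclass option, assuming a simplified stand-in class. FrozenLanguage and its two fields are hypothetical and are not the babelfish API; the point is only that a frozen instance cannot be mutated after it has been put in a set, so its hash stays stable:

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class FrozenLanguage:
    """Hypothetical stand-in for babelfish.Language (not the real class)."""

    alpha3: str
    country: Optional[str] = None


fr = FrozenLanguage("fra")
fr_fr = FrozenLanguage("fra", "FR")
s = {fr}

# frozen=True makes attribute assignment raise instead of silently
# changing the value that __hash__ and __eq__ are derived from.
try:
    fr.country = "FR"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Membership is still consistent because the object could not change.
still_member = fr in s
```

With this approach, changing the country means creating a new object, for example with dataclasses.replace(fr, country="FR").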
Because resource_stream may return a StringIO that does not support the with statement, we need to call close explicitly.
See: http://ridingpython.blogspot.fr/2011/08/stream-from-pkgresourcesresourcestream.html
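One way to keep the with statement while still guaranteeing the explicit close is contextlib.closing, which only requires the object to have a close method. A small sketch, using an in-memory io.StringIO as a stand-in for whatever resource_stream returns (the sample data is made up):

```python
import io
from contextlib import closing

# Stand-in for the stream returned by resource_stream (assumption: any
# object with read() and close() behaves the same way here).
stream = io.StringIO("fra\tfre\tfr\n")

# closing() calls stream.close() on exit, even if the body raises,
# without requiring the stream itself to be a context manager.
with closing(stream) as handle:
    content = handle.read()

was_closed = stream.closed
```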
Friendly reminder that pkg_resources is deprecated in favor of importlib.resources.
https://setuptools.pypa.io/en/latest/pkg_resources.html
https://docs.python.org/3/library/importlib.resources.html
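For reference, a hedged sketch of what reading a packaged file looks like with importlib.resources (Python 3.9 or later). The helper name read_package_text is hypothetical, and the standard library json package is used only so the example runs anywhere; babelfish's own data files are not assumed here:

```python
from importlib.resources import files


def read_package_text(package: str, resource: str) -> str:
    """Return the text of a file shipped inside an importable package."""
    # files() returns a Traversable; open() on it supports the with statement.
    with files(package).joinpath(resource).open("r", encoding="utf-8") as stream:
        return stream.read()


text = read_package_text("json", "__init__.py")
```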
Hi @Diaoul,
I've fixed an issue in guessit (guessit-io/guessit#183) and found that babelfish is also affected. In newer setuptools versions, load(require=False) emits a deprecation warning in babelfish/__init__.py.
I'll make a PR to fix this deprecation warning the same way it was done in guessit.
Useful to differentiate the writing of the same language
I'm looking for a way to better identify languages in media tracks (mainly audio and subtitle tracks). The default tags on media tracks are usually not precise. Very rarely do you get an audio or subtitle track with the correct IETF tag for pt-BR, es-MX or other languages. 99% of the time they are just marked as pt or es, and it's very common to have 2 or more tracks with the same language code:
{
"codec": "SubRip/SRT",
"id": 19,
"properties": {
"codec_id": "S_TEXT/UTF8",
"codec_private_length": 0,
"default_track": false,
"enabled_track": true,
"encoding": "UTF-8",
"forced_track": false,
"language": "por",
"language_ietf": "pt",
"number": 20,
"text_subtitles": true,
"track_name": "Português",
"uid": 1602227994484803173
},
"type": "subtitles"
},
{
"codec": "SubRip/SRT",
"id": 20,
"properties": {
"codec_id": "S_TEXT/UTF8",
"codec_private_length": 0,
"default_track": false,
"enabled_track": true,
"encoding": "UTF-8",
"forced_track": false,
"language": "por",
"language_ietf": "pt",
"number": 21,
"text_subtitles": true,
"track_name": "Português (Brasil)",
"uid": 17784914655403220205
},
"type": "subtitles"
},
To solve this, an approach like guessit is most likely needed. In a large dataset of audio and subtitle tracks that I analysed, some of the tracks use the official English language name together with the country demonym:
Brazilian Portuguese
British English
American English
French Canadian
I know babelfish is a very concise library that does one thing and does it well. To solve this issue I'll need to create extensions (language and country converters) that are outside babelfish's scope.
But this little piece related to country demonyms seems like a nice feature to include in babelfish. Maybe something like this:
>>> import babelfish
>>> babelfish.Country.fromname('France')
<Country [FR]>
>>> babelfish.Country.fromdemonym('French')
<Country [FR]>
>>> import babelfish
>>> babelfish.Language.fromname('Portuguese')
<Language [pt]>
>>> babelfish.Language.fromname('Brazilian Portuguese')
<Language [pt-BR]>
>>> babelfish.Language.fromname('Swiss German')
<Language [de-CH]>
I believe babelfish could have at least the demonyms in English and use that to parse the language.
I could try to contribute with this part if you think it makes sense to be part of babelfish.
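To make the idea concrete, here is a rough sketch of how a name such as "Brazilian Portuguese" could be split into a language and a country. Everything in it is hypothetical: the two tiny lookup tables, the parse_language_name function and the string return value are illustrations only, and none of this exists in babelfish:

```python
# Hypothetical, deliberately tiny lookup tables (English only).
DEMONYM_TO_COUNTRY = {
    "brazilian": "BR",
    "british": "GB",
    "american": "US",
    "canadian": "CA",
    "swiss": "CH",
}
LANGUAGE_NAME_TO_ALPHA2 = {
    "portuguese": "pt",
    "english": "en",
    "french": "fr",
    "german": "de",
}


def parse_language_name(name: str) -> str:
    """Return an IETF-like tag for names such as 'Brazilian Portuguese'."""
    language = None
    country = None
    for word in name.lower().split():
        # The first word that is a language name wins; the word order
        # varies ("French Canadian" vs "Brazilian Portuguese").
        if language is None and word in LANGUAGE_NAME_TO_ALPHA2:
            language = LANGUAGE_NAME_TO_ALPHA2[word]
        elif word in DEMONYM_TO_COUNTRY:
            country = DEMONYM_TO_COUNTRY[word]
    if language is None:
        raise ValueError(name)
    return language if country is None else f"{language}-{country}"
```

A real implementation would need a complete demonym list (see the references below) and a rule for words that are both a language name and a demonym, such as "French".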
Some references:
https://en.wikipedia.org/wiki/List_of_adjectival_and_demonymic_forms_for_countries_and_nations
https://github.com/porimol/countryinfo#demonym
https://gist.github.com/consti/e2c7ddc64f0aa044a8b4fcd28dba0700
https://github.com/mledoze/countries/blob/master/countries.json
(sub)hadim boromir ~ $ subliminal /media/hadim/MediaHadi2/movies/Un.Prophete.2009/ -l fr -v
INFO:subliminal.video:Scanning directory u'/media/hadim/MediaHadi2/movies/Un.Prophete.2009/'
INFO:subliminal.video:Scanning video u'Un.Prophete.2009.mkv' in u'/media/hadim/MediaHadi2/movies/Un.Prophete.2009'
INFO:enzyme.mkv:Reading Segment element
INFO:enzyme.parsers.ebml.core:MasterElement EBML ignored
INFO:enzyme.parsers.ebml.core:Maximum level 0 reached for children of MasterElement Segment
INFO:enzyme.mkv:Reading SeekHead element
INFO:enzyme.mkv:Processing element Info from SeekHead at position 4410
INFO:enzyme.mkv:Processing element Tracks from SeekHead at position 4485
Traceback (most recent call last):
File "/home/hadim/local/virtualenvs/sub/bin/subliminal", line 9, in <module>
load_entry_point('subliminal==0.7.1', 'console_scripts', 'subliminal')()
File "build/bdist.linux-x86_64/egg/subliminal/cli.py", line 70, in subliminal
File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 279, in scan_videos
File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 219, in scan_video
File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 219, in <setcomp>
File "/home/hadim/local/virtualenvs/sub/local/lib/python2.7/site-packages/babelfish/language.py", line 51, in fromcode
return cls(*CONVERTERS[converter].reverse(code))
File "/home/hadim/local/virtualenvs/sub/local/lib/python2.7/site-packages/babelfish/converters/alpha3b.py", line 34, in reverse
raise ReverseError(alpha3b)
babelfish.exceptions.ReverseError: u'fra'
Shouldn't this error be caught systematically? I guess MKV files can contain many possible language codes, so an unknown code should only be reported as an error rather than abort the scan.
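A sketch of the behavior being asked for, written against stand-ins so it runs on its own. The ReverseError class, the ALPHA3B table and both functions below are hypothetical simplifications, not the babelfish or subliminal code; the idea is simply to log an unknown code and continue:

```python
import logging

logger = logging.getLogger(__name__)


class ReverseError(Exception):
    """Stand-in for babelfish.exceptions.ReverseError."""


# Hypothetical excerpt of an alpha3b -> alpha3 table.
ALPHA3B = {"fre": "fra", "ger": "deu", "dut": "nld"}


def reverse_alpha3b(code):
    try:
        return ALPHA3B[code]
    except KeyError:
        raise ReverseError(code)


def parse_track_languages(codes):
    """Convert every code that can be converted; log and skip the rest."""
    languages = set()
    for code in codes:
        try:
            languages.add(reverse_alpha3b(code))
        except ReverseError:
            logger.error("Unsupported language code %r in track metadata", code)
    return languages


found = parse_track_languages(["fre", "fra", "dut"])
```

Here "fra" is rejected by the alpha3b table, as in the traceback above, but the other tracks are still processed.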
In ISO 639-2, dut is used as an alternative to nld for the Dutch language.
Can that be added to babelfish?
I can't get anything in the babelfish module to work. From Python:
from nltk.misc import babelfish  # this part works, did pip install babelfish
print babelfish.translate('cookbook', 'english', 'spanish')
AttributeError Traceback (most recent call last)
in ()
----> 1 print babelfish.translate('cookbook', 'english', 'spanish')
AttributeError: 'module' object has no attribute 'translate'
import inspect
inspect.getmembers(babelfish, predicate=inspect.ismethod)
[] # no modules loaded from babelfish.
A partial IETF representation of a Language should be possible: fr-FR, en-US, de
It is currently used in __str__ with alpha3
Once #3 is implemented, the script subtag can be represented as well: be-Cyrl, fr-Latn
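As an illustration of the target format, the subtags can be joined in the order language, script, region, skipping the ones that are missing. The to_ietf function below is a hypothetical helper for illustration, not a babelfish method:

```python
def to_ietf(language, script=None, country=None):
    """Join the available subtags in IETF order: language-script-region."""
    return "-".join(part for part in (language, script, country) if part)


examples = [
    to_ietf("fr", country="FR"),
    to_ietf("de"),
    to_ietf("be", script="Cyrl"),
]
```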
Language.fromname('Greek') fails. While 'Greek' by itself is not listed specifically, without qualifiers it would generally be assumed to refer to modern Greek. The current way to get the language from the name is to use the string 'Modern Greek (1453-)', which is fairly cumbersome and counterintuitive.
Steps to recreate:
from babelfish import Language
Language.fromname('Greek')
Tested on Win 10 / Python 3.5 x64
Is there any plan for babelfish to support IETF regions?
https://en.wikipedia.org/wiki/IETF_language_tag
An optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;
The main point here is to be able to handle es-419 (Spanish (Latin America)), which is a very common way to tag Latin American Spanish media tracks (in movies or series).
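A small sketch of what accepting both kinds of region subtag could look like when splitting a tag. The split_language_region function is hypothetical and only checks the shape of the subtag (two letters or three digits, as in the quote above); it does not validate the code against ISO 3166-1 or UN M.49:

```python
import re

# Two letters (ISO 3166-1 alpha-2) or three digits (UN M.49).
REGION_SUBTAG = re.compile(r"[A-Za-z]{2}|[0-9]{3}")


def split_language_region(tag):
    """Return (language, region) for tags such as 'pt-BR' or 'es-419'."""
    parts = tag.split("-")
    language = parts[0].lower()
    region = None
    for subtag in parts[1:]:
        # Script subtags have four letters, so they never match here.
        if REGION_SUBTAG.fullmatch(subtag):
            region = subtag.upper()
            break
    return language, region
```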
Using your subliminal backend, I construct the languages (with babelfish) ahead of time. But I had one report from a user on Python 2.7.3 who gets an import error (for alpha2) by the time execution reaches babelfish's sub-libraries.
I don't expect you to support my script at all... I was just curious about how to interpret the last part of the exception being thrown. Maybe it's a bug in babelfish? Maybe it isn't? Have you seen something like that? Perhaps there is an obvious issue I'm overlooking or a syntax error on my part?
This syntax works for me, so I can't reproduce it:
import babelfish
babelfish.Language.fromietf('EN')
Here is the output passed along to me in its entirety:
./Subliminal.py -S /media/data/Movies/ -l EN -f
2015-10-09 09:26:55,195 - 7241 - INFO - Found 43 matched file(s).
2015-10-09 09:26:55,196 - 7241 - INFO - Using advanced search mode
2015-10-09 09:26:55,201 - 7241 - ERROR - Fatal Exception:
Traceback (most recent call last):
File "./Subliminal/nzbget/ScriptBase.py", line 2398, in run
exit_code = main_function(*args, **kwargs)
File "./Subliminal.py", line 1478, in main
use_nzbheaders=False,
File "./Subliminal.py", line 955, in subliminal_fetch
lang = set( babelfish.Language.fromietf(l) for l in lang )
File "./Subliminal.py", line 955, in <genexpr>
lang = set( babelfish.Language.fromietf(l) for l in lang )
File "./Subliminal/babelfish/language.py", line 124, in fromietf
language = cls.fromalpha2(language_subtag)
File "./Subliminal/babelfish/language.py", line 110, in fromcode
return cls(*language_converters[converter].reverse(code))
File "./Subliminal/babelfish/converters/__init__.py", line 250, in __getitem__
plugin = ep.load(require=False)
File "./Subliminal/pkg_resources.py", line 1948, in load
entry = __import__(self.module_name, globals(),globals(), ['__name__'])
ImportError: No module named alpha2
Thoughts?
Calling some inspect functions on babelfish objects with Python 3.4 crashes (maximum recursion depth exceeded). See guessit-io/guessit#109
>>> from babelfish.language import Language
>>> from inspect import ismethoddescriptor
>>> lang = Language("fra")
>>> ismethoddescriptor(lang)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\x64\Python34\Lib\inspect.py", line 113, in ismethoddescriptor
return hasattr(tp, "__get__") and not hasattr(tp, "__set__")
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
[....]
RuntimeError: maximum recursion depth exceeded while calling a Python object
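The traceback shows a __getattr__ that keeps calling getattr on a name that is never found. A general way to avoid this kind of loop is to make __getattr__ raise AttributeError for names it does not handle, in particular dunder names probed by inspect. The class below is a hypothetical, simplified stand-in written to show that pattern; it is not the babelfish implementation or its actual fix:

```python
from inspect import ismethoddescriptor

# Hypothetical converter table: attribute name -> {alpha3: value}.
CONVERTERS = {"alpha2": {"fra": "fr", "eng": "en"}}


class Language:
    """Simplified stand-in for a class with dynamic converter attributes."""

    def __init__(self, alpha3):
        self.alpha3 = alpha3

    def __getattr__(self, name):
        # __getattr__ only runs after normal lookup has failed, so it
        # must end in a value or an AttributeError, never in another
        # getattr() call for the same missing name.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        try:
            return CONVERTERS[name][self.alpha3]
        except KeyError:
            raise AttributeError(name)


lang = Language("fra")
result = ismethoddescriptor(lang)
```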
I just installed SCALe on an Ubuntu 17.10 VM, I think. I say that because when I run:
cd $SCALE_HOME/scale.app
bundle exec thin start --port 8080
on the VM and go to http://127.0.0.1 on the same VM (in Firefox), I get a prompt that says:
----->http://127.0.0.1 is requesting your username and password. The site says: "Application"
I can't tell if this is coming from SCALe or something else. If it is from SCALe, what username and password should I give this prompt?
I have a syntax error with Python 2.6
File "/home/travis/virtualenv/python2.6/lib/python2.6/site-packages/babelfish/converters/alpha3b.py", line 14
SYMBOLS = {iso_language.alpha3: iso_language.alpha3b for iso_language in LANGUAGE_MATRIX if iso_language.alpha3b}
^
SyntaxError: invalid syntax
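Dict comprehensions were added in Python 2.7, which is why this line is a syntax error on 2.6. The same mapping can be built with dict() and a generator expression, which works on both. A sketch with a made-up LANGUAGE_MATRIX (the IsoLanguage tuple and its three rows are assumptions for illustration):

```python
from collections import namedtuple

# Hypothetical stand-in for babelfish's LANGUAGE_MATRIX rows.
IsoLanguage = namedtuple("IsoLanguage", ["alpha3", "alpha3b"])
LANGUAGE_MATRIX = [
    IsoLanguage("fra", "fre"),
    IsoLanguage("nld", "dut"),
    IsoLanguage("eng", ""),  # no alpha3b code, so it is skipped below
]

# dict() over (key, value) pairs is valid on Python 2.6 as well.
SYMBOLS = dict(
    (iso_language.alpha3, iso_language.alpha3b)
    for iso_language in LANGUAGE_MATRIX
    if iso_language.alpha3b
)
```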
Hi Diaoul, I'm trying to play with the babelfish code for a future import into sb.
I'm trying the code from the docs and I have this problem:
language = babelfish.Language('por', 'BR')
Evaluating language returns:
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
language
File "C:\Python27\lib\site-packages\babelfish\language.py", line 150, in __repr__
return '<Language [%s]>' % self
File "C:\Python27\lib\site-packages\babelfish\language.py", line 154, in __str__
s = self.alpha2
File "C:\Python27\lib\site-packages\babelfish\language.py", line 132, in __getattr__
raise AttributeError(name)
AttributeError: alpha2
I've installed babelfish without using setup, by copying the babelfish directory into my Python site-packages directory. I know this is not the best way to install, but this way I can test whether the library works without problems in sickbeard.
thanks.
Mr_Orange.
As per subject. Can they be included?
Language.fromname('Divehi') fails. 'Divehi' is a valid alternate spelling (per ISO 639-2) of 'Dhivehi', which works correctly.
Steps to recreate:
from babelfish import Language
Language.fromname('Divehi')
Tested on Win 10 / Python 3.5 x64
Language.fromname('Pashto') fails. 'Pashto' is a valid alternate spelling (per ISO 639-2) of 'Pushto', which works correctly.
Steps to recreate:
from babelfish import Language
Language.fromname('Pashto')
Tested on Win 10 / Python 3.5 x64
With Python 3.10 on the horizon and #29 implemented to fix the issue of importing from collections instead of from collections.abc, PyPI could use a new release that reflects this fix.
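For context, the abstract base classes were only ever aliased in collections and the aliases were removed in Python 3.10, so they have to be imported from collections.abc. A generic sketch of the compatible import; Mapping is used as an example here, and which names babelfish actually imports is not stated above:

```python
try:
    # Python 3.3+: the ABCs live in collections.abc.
    from collections.abc import Mapping
except ImportError:
    # Python 2 fallback, where collections.abc does not exist.
    from collections import Mapping

is_mapping = isinstance({}, Mapping)
```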
I have an error in guessit with Python3 (3.3.3 x86 on windows). It runs without any problem on Python 2.7.
For: Movies/Persepolis (2007)/[XCT] Persepolis [H264+Aac-128(Fr-Eng)+ST(Fr-Eng)+Ind].mkv
Traceback (most recent call last):
File "D:\devel\workspace\babelfish\babelfish\converters\__init__.py", line 156, in convert
return self.to_symbol[alpha3]
KeyError: 'xct'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Python\x86\Python33\Lib\runpy.py", line 160, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "D:\Python\x86\Python33\Lib\runpy.py", line 73, in _run_code
exec(code, run_globals)
File ".\guessit\__main__.py", line 160, in <module>
main()
File ".\guessit\__main__.py", line 154, in main
advanced=options.advanced)
File ".\guessit\__main__.py", line 37, in detect_filename
print('GuessIt found:', guess_file_info(filename, filetype, info).nice_string(advanced))
File ".\guessit\__init__.py", line 134, in guess_file_info
result.append(_guess_filename(filename, filetype))
File ".\guessit\__init__.py", line 105, in _guess_filename
mtree = IterativeMatcher(filename, filetype=filetype)
File ".\guessit\matcher.py", line 120, in __init__
self._apply_transfo(transformer)
File ".\guessit\matcher.py", line 132, in _apply_transfo
transformer.process(self.match_tree, *all_args, **all_kwargs)
File ".\guessit\transfo\guess_language.py", line 118, in process
SingleNodeGuesser(self.guess_language, None, self.log, *args, **kwargs).process(mtree)
File ".\guessit\transfo\__init__.py", line 151, in process
find_and_split_node(node, strategy, self.skip_nodes, self.logger)
File ".\guessit\transfo\__init__.py", line 80, in find_and_split_node
matcher_result = matcher(*all_args)
File ".\guessit\transfo\guess_language.py", line 37, in guess_language
guess = search_language(string)
File ".\guessit\language.py", line 366, in search_language
if language != 'mul' and not hasattr(language, 'alpha2'):
File ".\guessit\language.py", line 221, in alpha2
return self.lang.alpha2
File "D:\devel\workspace\babelfish\babelfish\language.py", line 130, in __getattr__
return get_language_converter(name).convert(alpha3, country, script)
File "D:\devel\workspace\babelfish\babelfish\converters\__init__.py", line 158, in convert
raise LanguageConvertError(alpha3, country, script)
babelfish.exceptions.LanguageConvertError: xct
As of Python 3.8 this will no longer work.
Calling some inspect functions on babelfish objects with Python 3.4 crashes with a stack overflow. See guessit-io/guessit#109
>>> from babelfish.language import Language
>>> from inspect import ismethoddescriptor
>>> lang = Language("fra")
>>> ismethoddescriptor(lang)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\x64\Python34\Lib\inspect.py", line 113, in ismethoddescriptor
return hasattr(tp, "__get__") and not hasattr(tp, "__set__")
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
return getattr(cls, name)
There are some tricks with babelfish currently.
I'm not sure it is the right thing to do to load the entry points during import.
Workflow:
import babelfish loads the entry point subliminal.converter.addic7ed:Addic7edConverter
subliminal.converter.addic7ed:Addic7edConverter triggers import subliminal
import subliminal triggers import babelfish, if there is this statement in subliminal/__init__.py
Boom.
Should we use lazy loading?
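A rough sketch of what lazy loading could look like: the registry only stores "module:attribute" strings at import time and imports the target the first time a converter is requested. The LazyConverterRegistry class is hypothetical, and standard library targets are used in place of real converter entry points so the example is self-contained:

```python
from importlib import import_module


class LazyConverterRegistry:
    """Resolve 'module:attribute' targets on first access, not at import."""

    def __init__(self, entry_points):
        self._entry_points = dict(entry_points)
        self._loaded = {}

    def __getitem__(self, name):
        if name not in self._loaded:
            module_name, _, attribute = self._entry_points[name].partition(":")
            # The import happens here, after the importing package has
            # finished initializing, which avoids the circular import.
            self._loaded[name] = getattr(import_module(module_name), attribute)
        return self._loaded[name]

    def loaded_names(self):
        return sorted(self._loaded)


# Stand-in targets from the standard library (assumption for the demo).
registry = LazyConverterRegistry(
    {"decoder": "json:JSONDecoder", "fraction": "fractions:Fraction"}
)
before = registry.loaded_names()
decoder_class = registry["decoder"]
after = registry.loaded_names()
```

The trade-off is that a broken entry point is only reported when it is first used instead of when babelfish is imported.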