xwmx / iso-639

Ruby gem with ISO 639-1 and ISO 639-2 language code entries and convenience methods.

License: MIT License

ISO 639

A Ruby gem that provides the ISO 639-2 and ISO 639-1 data sets, along with convenience methods for accessing entries and entry fields. The data comes from the Library of Congress (LOC) ISO 639-2 UTF-8 data set.

The ISO 639-1 specification uses a two-letter code to identify a language and is often the recommended way to identify languages in computer applications. It covers most developed and widely used languages.

The ISO 639-2 specification uses a three-letter code. It is used primarily in bibliography and terminology, and it covers many more languages than ISO 639-1.

Installation

To install from RubyGems:

gem install iso-639

To install with Bundler, add the following to your Gemfile:

gem 'iso-639'

Then run bundle install.

Usage

require 'iso-639'

To find a language entry:

# by alpha-2 or alpha-3 code
ISO_639.find_by_code("en")
# or
ISO_639.find("en")
# by English name
ISO_639.find_by_english_name("Russian")
# by French name
ISO_639.find_by_french_name("français")

The ISO_639.search class method searches across all fields and will match names in cases where a record has multiple names. This method always returns an array of 0 or more results. For example:

ISO_639.search("spanish")
# => [["spa", "", "es", "Spanish; Castilian", "espagnol; castillan"]]

Entries are arrays with convenience methods for accessing fields:

@entry = ISO_639.find("slo")
# => ["slo", "slk", "sk", "Slovak", "slovaque"]
@entry.alpha3_bibliographic
# => "slo"
@entry.alpha3 # shortcut for #alpha3_bibliographic
# => "slo"
@entry.alpha3_terminologic
# => "slk"
@entry.alpha2
# => "sk"
@entry.english_name
# => "Slovak"
@entry.french_name
# => "slovaque"

The full data set is available through the ISO_639::ISO_639_1 and ISO_639::ISO_639_2 constants.

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, but do not change the Rakefile, version, or history. (If you want your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2010 William Melody. See LICENSE for details.

iso-639's People

Contributors

dependabot[bot], merlos, msievers, xwmx

iso-639's Issues

Support ISO-639-3

Hi,

Could you support the ISO 639-3 standard? Many codes are currently missing (for example, those for Mandarin and Cantonese). Here is a list of the missing codes I found:

arz
yue
swh
wuu
zsm
qya
pes
nan
acm
lvs
orv
lzh
sjn
nov
ksh
tpw
lld
pms
pnb
npi
avk
prg
crs
ckt
zlm
cbk
lkt
arq
pcd
bar
mhr
mrj
osx
pfl
mgm
dng
liv
vro
apc
jdt
pdc
ppl
shs
mnw
ngt
hif
lzz
oar
brx
mww
hak
nlv
ngu
vec
lou
fuc
gag
lfn
kjh
cyo
urh
kzj
lmo
egl
dtp
max
fuv
nch
hoc
gbm
mvv
ary
kxi
rif
kek
aii
mfe
bvy
bcl
hnj
nst
afb
quc
tmw
bjn
cjy
hsn
gan
tzl
dws
ldn
sgs
vep
rue
tly
ext
swg
izh
jam
cmo
kpv
koi

`qaa-qtz` causes issues

The list at https://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt has one language row as follows:

qaa-qtz|||Reserved for local use|réservée à l'usage local

Running ISO_639.find('qaa-qtz') returns nil. The same goes for 'qaa' and 'qtz'.

I would expect ISO_639.find('qaa-qtz') to return a match, since it appears to be a valid (albeit irregular) alpha-3 code. I don't quite understand what this code is used for in practice, but it shouldn't break things.

I believe the bug is here: https://github.com/xwmx/iso-639/blob/master/lib/iso-639.rb#L75. It should be 3 or 7, not just 3.
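The suggested change can be sketched without the gem. This is a minimal illustration, not the gem's actual source: the method name find_by_code_sketch and the sample rows are hypothetical, and the only assumption carried over from the report is that the lookup dispatches on the length of the code.

```ruby
# Sample rows in the data set's field order:
# [alpha3_bibliographic, alpha3_terminologic, alpha2, english_name, french_name]
entries = [
  ["eng", "", "en", "English", "anglais"],
  ["slo", "slk", "sk", "Slovak", "slovaque"],
  ["qaa-qtz", "", "", "Reserved for local use", "réservée à l'usage local"]
]

# Hypothetical lookup that dispatches on code length. Accepting a length
# of 7 lets the range code "qaa-qtz" go through the alpha-3 branch.
def find_by_code_sketch(entries, code)
  case code.length
  when 3, 7
    entries.find { |e| e[0] == code || e[1] == code }
  when 2
    entries.find { |e| e[2] == code }
  end
end

p find_by_code_sketch(entries, "qaa-qtz")
p find_by_code_sketch(entries, "sk")
```

Note that this only makes the literal string 'qaa-qtz' findable. Matching an individual code inside the reserved range, such as 'qaa', would need separate range handling.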

Make a case-insensitive method and split on semicolon

Thanks for making this gem. I am using it to parse code from the web that I didn't write. Unfortunately, the data arrives in all caps, like this:

ENGLISH
GERMAN

I could have downcased it and then upcased the first letter, but I made a proper case-insensitive method instead. I know there is the ISO_639.search() method, but I only want it to match if exact.

Also, Spanish is stored as "Spanish; Castilian". Therefore, find_by_english_name("Spanish") will never find it, so I made this method split on the semicolon.

The following will work:
get_lang_entry("ENGLISH")
get_lang_entry("SPANISH")

require 'iso-639'

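def get_lang_entry(lang)
  # Normalize the query: strip all whitespace and ignore case.
  lang = lang.gsub(/[[:space:]]+/, '').downcase

  ISO_639::ISO_639_2.each do |entry|
    # An entry may list several names separated by semicolons,
    # e.g. "Spanish; Castilian", so compare against each one.
    entry.english_name.split(';').each do |name|
      return entry if name.gsub(/[[:space:]]+/, '').downcase == lang
    end
  end

  # No entry has a matching English name.
  nil
end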

Lookup fails for 'Dutch' and 'Spanish'

Both lookups return nil, although one would expect the input to be acceptable. They fail because the input is not the exact string stored in the entry: the English names in the data set are "Dutch; Flemish" and "Spanish; Castilian".

ISO_639.find_by_english_name("Dutch")
=> nil
ISO_639.find_by_english_name("Spanish")
=> nil
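The behavior can be reproduced on a few rows of the data without loading the gem. The two helper methods below are hypothetical and only illustrate the difference between comparing against the whole name field and comparing against each semicolon-separated name. They are not the gem's implementation.

```ruby
# Rows in the data set's field order:
# [alpha3_bibliographic, alpha3_terminologic, alpha2, english_name, french_name]
entries = [
  ["dut", "nld", "nl", "Dutch; Flemish", "néerlandais; flamand"],
  ["spa", "", "es", "Spanish; Castilian", "espagnol; castillan"],
  ["rus", "", "ru", "Russian", "russe"]
]

# Compares the query with the whole English name field.
def exact_name_match(entries, name)
  entries.find { |e| e[3] == name }
end

# Compares the query with each individual name in the field.
def any_name_match(entries, name)
  entries.find { |e| e[3].split(";").map(&:strip).include?(name) }
end

p exact_name_match(entries, "Dutch") # nil, because the field is "Dutch; Flemish"
p any_name_match(entries, "Dutch")
```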

Freeze

There are a lot of duplicate string objects (especially empty strings) being allocated.

These are really meant to be read-only lookup tables, so freezing makes sense in that regard.

I currently use the following monkey-patch. Would it be worthwhile to freeze everything in the gem itself as an optimization?

require 'iso-639'

ISO_639::ISO_639_2.each do |entry|
  entry.each do |str|
    str.freeze
  end
  entry.freeze
end
ISO_639::ISO_639_2.freeze
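A related option, sketched here on stand-in rows rather than the gem's constants, is to build the frozen table with String#-@. It returns a frozen string and may reuse an existing identical one, which addresses the duplicate empty strings as well as mutability. The variable names are illustrative only.

```ruby
# Stand-in rows; "".dup guarantees separate, unfrozen empty strings,
# similar to what a parser would produce for empty fields.
rows = [
  ["spa", "".dup, "es", "Spanish; Castilian", "espagnol; castillan"],
  ["rus", "".dup, "ru", "Russian", "russe"]
]

# -str returns a frozen, possibly deduplicated, copy of str.
# Each row array and the outer array are frozen as well.
frozen_rows = rows.map { |row| row.map { |str| -str }.freeze }.freeze

p frozen_rows.frozen?
p frozen_rows.all? { |row| row.frozen? && row.all?(&:frozen?) }
```

Unlike the monkey-patch, this builds new arrays instead of freezing the existing ones in place, so it fits best at the point where the table is first constructed.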

Feature idea: 1:1 mapping of code to native name

I tried this gem out briefly for use in a language switcher widget, where I want to take a list of available locales (the language codes from 639-1 or 639-2) and turn it into a select widget showing the native names of the languages.

I was hoping for a method to get that directly, like:

ISO_639.find_native_name('am') # => 'አማርኛ'

The iso-639-1 npm module does something similar.
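Until something like this exists, the mapping has to come from outside the gem, because the LOC data set only carries English and French names. A minimal sketch, assuming a hand-maintained table: the native_names hash and both helper methods are hypothetical, and the table here contains only three illustrative entries.

```ruby
# Hand-maintained table of alpha-2 code => native name (illustrative subset).
native_names = {
  "am" => "አማርኛ",
  "de" => "Deutsch",
  "fr" => "français"
}

# Returns the native name for a code, or nil when the table has no entry.
def find_native_name(table, code)
  table[code]
end

# Builds [label, value] pairs for a select widget, falling back to the
# code itself when no native name is known.
def select_options(table, codes)
  codes.map { |code| [table.fetch(code, code), code] }
end

p find_native_name(native_names, "am")
p select_options(native_names, %w[de fr xx])
```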

missing file in latest release!

lib/data/ISO-639-2_utf-8.txt is missing from gem release 0.3.0, so the gem does not work after updating. Please include the file and re-release.

unknown encoding name - bom|utf-8

When upgrading to iso-639 0.3.1 (from 0.2.8) I get this error:

  Gem Load Error is: unknown encoding name - bom|utf-8
  Backtrace for gem load error is:
  /home/travis/.rvm/rubies/ruby-2.5.3/lib/ruby/2.5.0/csv.rb:1532:in `find'
  /home/travis/.rvm/rubies/ruby-2.5.3/lib/ruby/2.5.0/csv.rb:1532:in `initialize'
  /home/travis/.rvm/rubies/ruby-2.5.3/lib/ruby/2.5.0/csv.rb:1280:in `new'
  /home/travis/.rvm/rubies/ruby-2.5.3/lib/ruby/2.5.0/csv.rb:1280:in `open'
  /home/travis/.rvm/rubies/ruby-2.5.3/lib/ruby/2.5.0/csv.rb:1141:in `foreach'
  /home/travis/build/sul-dlss/gis-robot-suite/vendor/bundle/ruby/2.5.0/gems/iso-639-0.3.1/lib/iso-639.rb:24:in `each'
  /home/travis/build/sul-dlss/gis-robot-suite/vendor/bundle/ruby/2.5.0/gems/iso-639-0.3.1/lib/iso-639.rb:24:in `block in <class:ISO_639>'
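The backtrace shows the encoding name being rejected inside csv.rb on Ruby 2.5.3. One way to avoid depending on how a given csv version handles that name is to let File.open strip the byte order mark and split the pipe-delimited lines directly. The sketch below is an illustration of that approach under those assumptions, not the fix the gem adopted, and read_pipe_rows is a hypothetical helper.

```ruby
require "tempfile"

# Reads pipe-delimited rows from a UTF-8 file. The "BOM|UTF-8" mode makes
# Ruby's IO layer drop a leading byte order mark if one is present.
def read_pipe_rows(path)
  File.open(path, "r:BOM|UTF-8") do |file|
    # A limit of -1 keeps trailing empty fields.
    file.each_line.map { |line| line.chomp.split("|", -1) }
  end
end

# Write a one-line sample file that starts with a BOM, then read it back.
rows = Tempfile.create("iso-639-sample") do |tmp|
  tmp.write("\uFEFFaar||aa|Afar|afar\n")
  tmp.flush
  read_pipe_rows(tmp.path)
end

p rows.first
# => ["aar", "", "aa", "Afar", "afar"]
```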

Add csv dependency

I see this warning in my app:

/opt/app/purl/purl/shared/bundle/ruby/3.3.0/gems/zeitwerk-2.6.15/lib/zeitwerk/kernel.rb:34: warning: csv was loaded from the standard library, but will no longer be part of the default gems since Ruby 3.4.0. Add csv to your Gemfile or gemspec. Also contact author of iso-639-0.3.6 to add csv into its gemspec.
