ltgoslo / simple_elmo

Simple library to work with pre-trained ELMo models in TensorFlow

Home Page: https://pypi.org/project/simple-elmo/

License: GNU General Public License v3.0

Python 100.00%

elmo embeddings nlp tensorflow

simple_elmo's Issues

KeyError: "There is no item named 'vocab.txt' in the archive"

Thanks for the great work!
I downloaded model 170 from the NLPL word embeddings repository (Russian CoNLL17 corpus, ELMo). The archive does not contain vocab.txt, and loading it into the model raised KeyError: "There is no item named 'vocab.txt' in the archive", although the docs describe the vocabulary file as optional. Any hints on how to resolve this?

```
     64                 )
     65             zf = zipfile.ZipFile(directory)
---> 66             vocab_file = zf.open("vocab.txt")
     67             options_file = zf.open("options.json")
     68             weight_file = zf.open("model.hdf5")

~/.pyenv/versions/3.7.3/lib/python3.7/zipfile.py in open(self, name, mode, pwd, force_zip64)
   1465         else:
   1466             # Get info object for name
-> 1467             zinfo = self.getinfo(name)
   1468 
   1469         if mode == 'w':

~/.pyenv/versions/3.7.3/lib/python3.7/zipfile.py in getinfo(self, name)
   1393         if info is None:
   1394             raise KeyError(
-> 1395                 'There is no item named %r in the archive' % name)
   1396 
   1397         return info

KeyError: "There is no item named 'vocab.txt' in the archive"
```

read model data directly from archive file

To spare users from copying files out of the NLPL vectors repository, how much effort would it take to make the code read its data directly from the zip archive?
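
As a rough illustration of what reading directly from an archive can look like, the sketch below uses only the standard-library `zipfile` and `json` modules. The member name `options.json` appears in the tracebacks on this page; the helper name `read_options_from_zip` is hypothetical and is not part of simple_elmo.

```python
import json
import zipfile


def read_options_from_zip(archive_path):
    """Parse options.json straight from a model archive without extracting it.

    Hypothetical helper for illustration; not part of simple_elmo.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # ZipFile.read returns the raw bytes of the named member.
        raw = zf.read("options.json")
    return json.loads(raw.decode("utf-8"))
```

Small text members can be read this way in one call; whether a large weight file can be consumed from the archive without extraction depends on the library that reads it.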

Not having any "vocab.txt" doesn't work when loading

In the README file, it says the following:

One can also provide a vocab.txt/vocab.txt.gz file in the same directory: a one-word-per-line vocabulary of words to be cached (as character id representations) before inference. Even if it is not present at all, ELMo will still process all words normally. However, providing the vocabulary file can slightly increase inference speed when working with very large corpora (by reducing the amount of word to char ids conversions).

However, when I tried to load the model from the zip file, I got the following error:

KeyError Traceback (most recent call last)
Cell In[57], line 1
----> 1 model.load('./elmo-english.zip')

File ~\anaconda3\envs\python_old\lib\site-packages\simple_elmo\elmo_helpers.py:84, in ElmoModel.load(self, directory, max_batch_size, limit, full)
80 raise SystemExit(
81 "Error: loading models from ZIP archives requires Python >= 3.7."
82 )
83 zf = zipfile.ZipFile(directory)
---> 84 vocab_file = zf.read("vocab.txt").decode("utf-8")
85 options_file = zf.read("options.json").decode("utf-8")
86 weight_file = zf.open("model.hdf5")

File ~\anaconda3\envs\python_old\lib\zipfile.py:1475, in ZipFile.read(self, name, pwd)
1473 def read(self, name, pwd=None):
1474 """Return file bytes for name."""
-> 1475 with self.open(name, "r", pwd) as fp:
1476 return fp.read()

File ~\anaconda3\envs\python_old\lib\zipfile.py:1514, in ZipFile.open(self, name, mode, pwd, force_zip64)
1511 zinfo._compresslevel = self.compresslevel
1512 else:
1513 # Get info object for name
-> 1514 zinfo = self.getinfo(name)
1516 if mode == 'w':
1517 return self._open_to_write(zinfo, force_zip64=force_zip64)

File ~\anaconda3\envs\python_old\lib\zipfile.py:1441, in ZipFile.getinfo(self, name)
1439 info = self.NameToInfo.get(name)
1440 if info is None:
-> 1441 raise KeyError(
1442 'There is no item named %r in the archive' % name)
1444 return info

KeyError: "There is no item named 'vocab.txt' in the archive"

I am using Python 3.8.16 in a Jupyter notebook. The function model.load() fails when the archive does not contain 'vocab.txt'.
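
One way to treat an archive member as optional is to check for it before reading. The following is a minimal sketch using the standard-library `zipfile` module; the helper name `read_optional_vocab` is hypothetical and this is not the library's actual loading code.

```python
import zipfile


def read_optional_vocab(archive_path, member="vocab.txt"):
    """Return the vocabulary as a list of words, or None if the member is absent.

    Hypothetical helper for illustration; not part of simple_elmo.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # namelist() returns every member name, so checking it first avoids
        # the KeyError that ZipFile.read/open raise for a missing name.
        if member not in zf.namelist():
            return None
        text = zf.read(member).decode("utf-8")
    return text.splitlines()
```

A caller could then skip vocabulary caching whenever the function returns `None`, which matches the README's statement that the file is optional.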

TensorFlow version

Please specify which version of TensorFlow is required.
Thank you!

Loading of ELMoForManyLangs models

It would be good to load ELMoForManyLangs models transparently.

Model loading error under Windows

Good afternoon,

When loading an ELMo model (using a slightly modified get_elmo_vectors.py) in Anaconda under Windows with Python 3.7.1, I ran into an error at bilm/data.py:29

Traceback (most recent call last):
  File "run_elmo1.py", line 17, in <module>
    batcher, sentence_character_ids, elmo_sentence_input = load_elmo_embeddings(elmo_dir)
  File "E:\github\simple_elmo\elmo_helpers.py", line 81, in load_elmo_embeddings
    batcher = Batcher(vocab_file, 50)
  File "E:\github\simple_elmo\bilm\data.py", line 207, in __init__
    lm_vocab_file, max_token_length
  File "E:\github\simple_elmo\bilm\data.py", line 118, in __init__
    super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
  File "E:\github\simple_elmo\bilm\data.py", line 29, in __init__
    for line in f:
  File "C:\Users\eek\Anaconda3\envs\tf21\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 300: character maps to <undefined>

After I manually added encoding='utf-8' to the open() call on line 27, the model loaded and inference ran successfully.
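
The idea behind this fix can be sketched in isolation: passing an explicit encoding to `open()` makes reading independent of the system locale. The function name `read_vocab_lines` is hypothetical and the code is not taken from bilm/data.py.

```python
def read_vocab_lines(path):
    """Read a one-word-per-line vocabulary file as UTF-8, whatever the OS locale is."""
    words = []
    # Without an explicit encoding, open() in text mode uses the locale's
    # preferred encoding by default (cp1251 in the traceback), which cannot
    # decode every UTF-8 byte sequence.
    with open(path, encoding="utf-8") as f:
        for line in f:
            words.append(line.rstrip("\n"))
    return words
```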

FileNotFoundError: [Errno 2] No such file or directory: '*\options.json'

Hi, thank you for your great work. I installed simple-elmo and downloaded the Arabic model. After running this code:

```
from simple_elmo import ElmoModel

model = ElmoModel()
model.load("*/136")
model.get_elmo_vectors(sents)
```

I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/admin/Desktop/Data/136\\options.json'

Thank you.
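
A quick way to narrow down an error like this is to check which expected files are actually present in the model folder before loading. The sketch below is a hypothetical diagnostic helper, not part of simple_elmo; the default file names `options.json` and `model.hdf5` are taken from the tracebacks on this page and are an assumption about what a model folder must contain.

```python
from pathlib import Path


def missing_model_files(directory, required=("options.json", "model.hdf5")):
    """Return the names from `required` that are absent from `directory`.

    Hypothetical helper; the default file names are an assumption based on
    the tracebacks in these issues.
    """
    folder = Path(directory)
    return [name for name in required if not (folder / name).is_file()]
```

If the returned list includes `options.json`, the downloaded archive may not have been extracted into that folder, or the files may sit in a nested subfolder.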

Questions about integrating your models & code

Thanks for this repo. I am interested in integrating some of your ELMo models (e.g., ID 162 for Latin [http://vectors.nlpl.eu/repository/20/162.zip]) into the CLTK (an NLP framework for ancient/dead languages).

Would you be the right person to answer a few questions about loading these models? I have some questions that may seem very simple to you :)

You have pinned your code to TensorFlow 1.15.2. Is there a specific reason for this? If I include TF for my users, I would prefer a somewhat newer version. Also, do you have any idea whether Keras could load these files?
What other libraries are available to load these models? I ask because there are multiple Python projects capable of using (and sometimes fine-tuning) ELMo, but their file-naming conventions differ. For example, the ELMo directory from NLPL has (config.json, meta.json, word.dic, char.dic, encoder.pkl, token_embedder.pkl), yet I do not see such file types when looking at the "big" ELMo libraries like https://github.com/allenai/allennlp/ .
