studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings

License: Apache License 2.0

Languages: Python 28.62%, Jupyter Notebook 71.38%

luke's People

Contributors: chantera, ikuyamada, raabia-asif, ryokan0123

luke's Issues

QA task mismatch issues

Hi, I ran into a problem when running the code. Do you have time to give me some suggestions? Thank you very much!

The error is as follows:

Traceback (most recent call last):
  File "/data/zhangxy/condenv/torch12/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/zhangxy/condenv/torch12/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/zhangxy/LUKE/luke-master/examples/cli.py", line 132, in <module>
    cli()
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/zhangxy/LUKE/luke-master/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/data/zhangxy/LUKE/luke-master/examples/reading_comprehension/main.py", line 123, in run
    model.load_state_dict(torch.load(args.checkpoint_file, map_location="cpu"))
  File "/data/zhangxy/LUKE/luke-master/luke/model.py", line 234, in load_state_dict
    super(LukeEntityAwareAttentionModel, self).load_state_dict(new_state_dict, *args, **kwargs)
  File "/data/zhangxy/condenv/torch12/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for LukeForReadingComprehension:
        size mismatch for embeddings.word_embeddings.weight: copying a param with shape torch.Size([50268, 1024]) from checkpoint, the shape in current model is torch.Size([50265, 1024]).
        size mismatch for entity_embeddings.entity_embeddings.weight: copying a param with shape torch.Size([2, 256]) from checkpoint, the shape in current model is torch.Size([500000, 256]).

I used the following command to run the code:
python -m examples.cli --model-file=./resources/luke_large_500k.tar.gz --output-dir=./output/ reading-comprehension run --data-dir=./resources/SQuAD11/ --checkpoint-file=./resources/pytorch_model.bin --no-negative --wiki-link-db-file=./resources/luke_squad_wikipedia_data/enwiki_20160305.pkl --model-redirects-file=./resources/luke_squad_wikipedia_data/enwiki_20181220_redirects.pkl --link-redirects-file=./resources/luke_squad_wikipedia_data/enwiki_20160305_redirects.pkl --no-train

I have tried many times without success and cannot find the problem.
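
A mismatch like this can often be diagnosed before load_state_dict is called by printing the shapes stored in the checkpoint. A minimal diagnostic sketch, assuming the checkpoint file from the command above is a flat state dict (name -> tensor):

import torch

# Load the fine-tuned checkpoint on CPU and inspect the embedding shapes
# without building the model (path taken from the command above).
checkpoint = torch.load("./resources/pytorch_model.bin", map_location="cpu")

for name, tensor in checkpoint.items():
    if "embeddings" in name:
        print(name, tuple(tensor.shape))

# In this issue: embeddings.word_embeddings.weight is [50268, 1024] in the
# checkpoint but [50265, 1024] in the model (three extra rows, suggesting
# extra special tokens were added), and entity_embeddings.entity_embeddings.weight
# is [2, 256] vs [500000, 256] (the fine-tuned checkpoint likely keeps only
# two special entities instead of the full pre-training entity vocabulary).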

Size mismatches on base model SQuAD experiment

Hello,
I was attempting to run the SQuAD experiment from the checkpoint file. I quickly ran out of GPU memory when using the large model, so I tried the base model instead. Unfortunately, after a while the command failed with a huge list of size mismatches on the model layers, mostly between 1024 and 768. I am unclear where the error originates and how to debug it. Any assistance is greatly appreciated.

I am currently working on an EC2 machine with Tesla T4 GPUs. The error is below:

Traceback (most recent call last):
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dl/mughil/luke/examples/cli.py", line 132, in <module>
    cli()
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/dl/mughil/luke/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/dl/mughil/luke/examples/reading_comprehension/main.py", line 123, in run
    model.load_state_dict(torch.load(args.checkpoint_file, map_location="cpu"))
  File "/home/dl/mughil/luke/luke/model.py", line 234, in load_state_dict
    super(LukeEntityAwareAttentionModel, self).load_state_dict(new_state_dict, *args, **kwargs)
  File "/usr/local/finra/anaconda/anaconda3/envs/LUKE_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for LukeForReadingComprehension:
        size mismatch for encoder.layer.0.attention.self.query.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for encoder.layer.0.attention.self.query.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.self.key.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for encoder.layer.0.attention.self.key.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.self.value.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for encoder.layer.0.attention.self.value.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.self.w2e_query.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for encoder.layer.0.attention.self.w2e_query.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.self.e2w_query.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for encoder.layer.0.attention.self.e2w_query.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.self.e2e_query.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for encoder.layer.0.attention.self.e2e_query.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.output.dense.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for encoder.layer.0.attention.output.dense.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.output.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.attention.output.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.intermediate.dense.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
        size mismatch for encoder.layer.0.intermediate.dense.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for encoder.layer.0.output.dense.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
        size mismatch for encoder.layer.0.output.dense.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.output.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.layer.0.output.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        [... identical size mismatches repeat for encoder.layer.1 through encoder.layer.11 ...]
        size mismatch for pooler.dense.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for pooler.dense.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for embeddings.word_embeddings.weight: copying a param with shape torch.Size([50265, 1024]) from checkpoint, the shape in current model is torch.Size([50265, 768]).
        size mismatch for embeddings.position_embeddings.weight: copying a param with shape torch.Size([514, 1024]) from checkpoint, the shape in current model is torch.Size([514, 768]).
        size mismatch for embeddings.token_type_embeddings.weight: copying a param with shape torch.Size([1, 1024]) from checkpoint, the shape in current model is torch.Size([1, 768]).
        size mismatch for embeddings.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for embeddings.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for entity_embeddings.entity_embedding_dense.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([768, 256]).
        size mismatch for entity_embeddings.position_embeddings.weight: copying a param with shape torch.Size([514, 1024]) from checkpoint, the shape in current model is torch.Size([514, 768]).
        size mismatch for entity_embeddings.token_type_embeddings.weight: copying a param with shape torch.Size([1, 1024]) from checkpoint, the shape in current model is torch.Size([1, 768]).
        size mismatch for entity_embeddings.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for entity_embeddings.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for qa_outputs.weight: copying a param with shape torch.Size([2, 1024]) from checkpoint, the shape in current model is torch.Size([2, 768]).

Does this mean model evaluation won't work with the base model, and that I must use the large model?
Thanks!

releasing running steps and pre-trained model

Really great work~
By adding entity linking as a pretraining task alongside the masked-language-model task, LUKE obtains better performance than previous BERT-related models.
I was wondering when the running steps, evaluation parts, and pre-trained models are going to be released?

Two questions: 1. Release the entity vocab's Wikipedia pageids? 2. Does [mask] occupy BERT's 512 input?

  1. Right now, some titles in the entity vocab cannot be aligned to a unique Wikipedia pageid or Wikidata entity id: some are missing, and some titles refer to the same pageid. Could you release the mapping between the entity vocab's titles and Wikipedia pageids / Wikidata entity ids?
  2. It seems that the [mask]s used for span representations don't occupy BERT's 512-token input? For example, if I have a sequence with 512 tokens and I want to use LUKE to extract 10 spans, can I input 512 tokens + 10 [mask]s, rather than 502 tokens + 10 [mask]s (as long as the [mask] position embeddings are correctly aligned to the 10 spans)?

NER labels problem

I want to know the purpose of the "NIL" label, because the CoNLL03 dataset doesn't contain it.

ONNX the model?

The base roberta-large model is, well, large. It takes a lot of GPU memory to use this model. Is it possible to export it to ONNX so that we can run it on CPU?

I believe RoBERTa is supported by ONNX, but due to the complicated nature of the LUKE model, I found it difficult to simply follow the instructions outlined by the HF team.
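
For reference, here is a minimal torch.onnx.export sketch. Every input name, shape, and argument order below is an assumption about LUKE's forward signature (inferred from the tracebacks elsewhere in these issues), not the project's actual export recipe, so it will likely need adjusting:

import torch

def export_luke_to_onnx(model, path="luke.onnx", seq_len=32, n_entities=2):
    # Dummy inputs standing in for LUKE's word and entity tensors (assumed names/shapes).
    model.eval()
    dummy_inputs = (
        torch.zeros(1, seq_len, dtype=torch.long),        # word_ids
        torch.zeros(1, seq_len, dtype=torch.long),        # word_segment_ids
        torch.ones(1, seq_len, dtype=torch.long),         # word_attention_mask
        torch.zeros(1, n_entities, dtype=torch.long),     # entity_ids
        torch.zeros(1, n_entities, 1, dtype=torch.long),  # entity_position_ids
        torch.zeros(1, n_entities, dtype=torch.long),     # entity_segment_ids
        torch.ones(1, n_entities, dtype=torch.long),      # entity_attention_mask
    )
    torch.onnx.export(model, dummy_inputs, path, opset_version=11)

The entity-aware attention may hit unsupported operators during tracing, which would explain why the standard HF instructions don't carry over directly.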

Apex and Pyramid version question

Hi. May I ask what versions of Apex and Pyramid you used when you ran your model? Right now my Apex version is 0.9.10.dev0 and Pyramid is 2.0.

Whenever I try to import Apex I get the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)

Judging by my impression that Apex isn't really actively maintained, I've tried downgrading Pyramid to a compatible version (i.e., > 1.1.2), but that returns a different error.

I'm just curious what versions you guys used in order to try and match that. Thanks.
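
One hedged observation: the pyramid.session import in the traceback above suggests the installed package is the unrelated PyPI project named "apex", not NVIDIA's apex (which is what mixed-precision training presumably expects, and which is installed from the NVIDIA/apex GitHub repo rather than from PyPI). A quick check:

try:
    from apex import amp  # NVIDIA apex exposes `amp`
    print("NVIDIA apex appears to be installed")
except ImportError as exc:
    # The PyPI "apex" depends on Pyramid, matching the traceback above.
    print("this does not look like NVIDIA apex:", exc)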

Base size model

Hi,

Thanks for releasing the great code and model. Do you have plans to release a roberta-base-sized model for low-computation-resource scenarios?

Best
Deming

Assertion error with CONLL03

Hi, here I met another problem when using LUKE on the NER dataset CoNLL03...
When creating features from examples, the variable entity_labels is empty for some examples, such as train-945:

guid=train-945
words=['SOCCER', '-', 'ENGLISH', 'SOCCER', 'RESULTS', '.', 'LONDON', '1996-08-30', 'Results', 'of', 'English', 'league', 'matches', 'on', 'Friday', ':', 'Division', 'two', 'Plymouth', '2', 'Preston', '1', 'Division', 'three', 'Swansea', '1', 'Lincoln', '2']
labels=['O', 'O', 'B-MISC', 'O', 'O', 'O', 'B-LOC', 'O', 'O', 'O', 'B-MISC', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'B-ORG', 'O', 'O', 'O', 'B-ORG', 'O', 'B-ORG', 'O']

and the code here throws an AssertionError:

assert not entity_labels

Do you have any idea what's wrong with these examples?

The result of conll2003

Hello, what are your specific hyperparameters for the entity recognition task? I can't reproduce the results in the paper. Thank you so much.

Benchmark on FewRel dataset

Let me start off by saying that this is one of the cleanest repositories from any published paper I have ever seen.

Standing ovation for the maintainers 👏👏👏👏👏👏

Now my question: could you also provide instructions for fine-tuning and evaluating on the FewRel dataset (relation classification)? TACRED is hidden behind a paywall and has complicated procedures to obtain for any commercial usage.

Unable to reproduce NER numbers using the released checkpoint

Hi, I was trying to evaluate your NER checkpoint on the CoNLL-2003 test set and am getting close to 0.72 F1. Did you by chance face this issue? I tried the test sets from here and here; both evaluations give similar F1. Could someone please look into this issue? Thank you!

              precision    recall  f1-score   support

         PER      0.9639    0.4338    0.5984      1602
        MISC      0.8680    0.5720    0.6896       701
         LOC      0.9596    0.7407    0.8360      1666
         ORG      0.9418    0.5792    0.7173      1647

   micro avg      0.9431    0.5848    0.7219      5616
   macro avg      0.9442    0.5848    0.7151      5616

Poetry Issue

Hi, I'm new to using poetry, but when I run $ poetry install, several packages fail to install, including h5py, scipy, and grpcio. Here is the beginning of the error message for the h5py failure, along with the poetry and system versions:

[screenshot: beginning of the h5py error message]

and here is the end of that error message:

[screenshot: end of the h5py error message]

I've updated Poetry's Python version to 3.9 and installed pip 21.0, but I still get similar issues.

Also, here is the beginning of the scipy error:

[screenshot: beginning of the scipy error message]

and its end:

[screenshot: end of the scipy error message]

About the version of transformers

Hi, respected authors

I want to know what version of transformers is used for this project.

I met some issues when I ran the code, such as "ModuleNotFoundError: No module named 'transformers.modeling_bert'" and "ImportError: cannot import name 'BertLayerNorm'". So I guess it was because I used the wrong version of transformers, which is 4.5.0. Thank you!
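
For what it's worth, those two errors are consistent with code written against an older transformers release: transformers.modeling_bert was moved to transformers.models.bert.modeling_bert in 4.x, and BertLayerNorm (an alias of torch.nn.LayerNorm in older releases) was removed. A compatibility shim along these lines may help, though the exact version boundaries here are assumptions:

try:
    # Older transformers (e.g. 2.x), which this repo appears to target.
    from transformers.modeling_bert import BertLayerNorm
except ImportError:
    # Newer transformers removed BertLayerNorm; it aliased torch.nn.LayerNorm.
    from torch.nn import LayerNorm as BertLayerNorm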

Extract contextualized word representations for Bi-LSTM-CRF NER models?

Hi, thanks for your code and instructions! I want to use LUKE to get contextualized word representations (like BERT) and apply them as distributed input representations in my word-level sequence labeling model. However, the performance of my model dropped significantly (on the CoNLL-2003 test set, the F1 score dropped from 90%+ to 20%+). My method of extracting contextualized word representations from LUKE is as follows:
I downloaded the "luke_large_500k.tar.gz" pre-trained model and used the NER code to get word representations for the input.
Specifically, I gather the "word_hidden_states" tensor with a self-constructed index to obtain the representation of each word in the sentence.
I have checked the entire process and found no errors. What I am confused about is why using the word representations extracted from LUKE degrades the performance of the model so much. Is it inappropriate to use LUKE to obtain contextualized word representations and apply them to other sequence labeling models?
Thank you very much!
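
For concreteness, here is a small self-contained sketch of the gathering step described above; the index values are hypothetical and simply pick each word's first subword position from word_hidden_states:

import torch

batch, seq_len, hidden = 2, 8, 768
word_hidden_states = torch.randn(batch, seq_len, hidden)
# Hypothetical index of each word's first subword token, shape (batch, n_words).
first_subword_idx = torch.tensor([[0, 1, 3, 6], [0, 2, 4, 5]])

index = first_subword_idx.unsqueeze(-1).expand(-1, -1, hidden)
word_reprs = word_hidden_states.gather(1, index)  # (batch, n_words, hidden)
print(word_reprs.shape)  # torch.Size([2, 4, 768])

If the indexing itself is correct, one thing worth double-checking is whether the downstream model expects representations from the last layer or from a mix of layers.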

How to evaluate using the saved checkpoint?

Hi, I wonder how to evaluate using the saved checkpoint after training.
I also hit a TypeError when training named entity recognition on the CoNLL-2003 dataset:

python -m examples.cli --model-file=luke_large_500k.tar.gz --output-dir=<OUTPUT_DIR> ner run --data-dir=<DATA_DIR> --fp16 --train-batch-size=2 --gradient-accumulation-steps=2 --learning-rate=1e-5 --num-train-epochs=5

The error is:

Eval: 100%|##########| 181/181 [04:43<00:00,  1.56s/it]
Traceback (most recent call last):
  File "/data1/zekangli/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data1/zekangli/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/apdcephfs/share_47076/zekangli/multilingual/luke/examples/cli.py", line 135, in <module>
    cli()
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/apdcephfs/share_47076/zekangli/multilingual/luke/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/apdcephfs/share_47076/zekangli/multilingual/luke/examples/ner/main.py", line 90, in run
    results.update({f"dev_{k}": v for k, v in evaluate(args, model, "dev", dev_output_file).items()})
  File "/apdcephfs/share_47076/zekangli/multilingual/luke/examples/ner/main.py", line 155, in evaluate
    print(seqeval.metrics.classification_report(final_labels, final_predictions, digits=4))
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/seqeval/metrics/sequence_labeling.py", line 697, in classification_report
    suffix=suffix
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/seqeval/metrics/sequence_labeling.py", line 139, in precision_recall_fscore_support
    extract_tp_actual_correct=extract_tp_actual_correct
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/seqeval/metrics/v1.py", line 122, in _precision_recall_fscore_support
    check_consistent_length(y_true, y_pred)
  File "/data1/zekangli/anaconda3/lib/python3.7/site-packages/seqeval/metrics/v1.py", line 97, in check_consistent_length
    raise TypeError('Found input variables without list of list.')
TypeError: Found input variables without list of list.
The number of labels: 51362

Entity Disambiguation Supporting Files

I saw that there is a folder for entity disambiguation, but it seems to require more files, such as the "person names" file. Could you add some instructions on how to run the entity disambiguation system, please? Thanks!

Question Regarding Entity Embeddings

Hi. I just had a question regarding the entity embeddings used in the model. Specifically, I'm trying to run the LUKE model on relation extraction.

I've noticed that entity_vocab_size is set to 3 and the entity_ids are hard-coded to [1, 2]. I'm just curious why this is? My impression is that the TACRED dataset assumes there are only two entities in a given text, and therefore the entity IDs would always be the same, but please let me know if I'm incorrect.

Also, do you think it would be possible to run LUKE on datasets where a data sample may have a variable number of entity mentions? My impression is that the implementation wouldn't be straightforward, because I get the feeling that utilizing the EntityEmbeddings may be a bit tricky.

Thanks for your help.
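
A possible batching sketch for the variable-mention case, under the assumption that id 0 is a padding entity and that the attention mask zeroes it out (both are assumptions about this codebase):

import torch

max_entities = 4
mentions = [1, 2, 2]  # hypothetical per-example entity ids

entity_ids = torch.zeros(max_entities, dtype=torch.long)
entity_ids[: len(mentions)] = torch.tensor(mentions)
entity_attention_mask = (entity_ids != 0).long()
print(entity_ids)             # tensor([1, 2, 2, 0])
print(entity_attention_mask)  # tensor([1, 1, 1, 0])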

Pretraining on multiple GPUs

Hi

First off, thanks for the amazing work. However, I have some trouble getting pretraining to work on multiple GPUs.

I am trying to use the pretraining code to train LUKE in another language. To start pretraining, I run python -m luke.cli pretrain with the --parallel flag. On this line, a command starting with luke is constructed, but it fails when passed to subprocess.Popen. Does luke refer to something I need in my path, a shell script not in the repo, some result of click, or something entirely different?

Wiki data for pretraining

Hi, I'm very interested in your nice work and am planning to reproduce the model results.
Could you please provide a download link for the wiki data used for LUKE pretraining?
Thanks in advance.

Entity type embedding

Thank you for your impressive work!

I have a question about the "entity type embedding", e. I'm wondering why the entity type embedding is necessary. LUKE has two position embeddings for words and entities, Ci and Di, respectively.

To my understanding, Di can contain the information of e: we can create a logically equivalent model by changing the parameter values as follows: Di <- Di + e, e <- 0. In other words, Di and e seem to be redundant.

Why is entity type embedding necessary for LUKE? Does it help to make training more stable?
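
A tiny numeric check of the reparameterization argument above (toy sizes; this only shows that the summed embedding is unchanged, not that training behaves the same):

import torch

D = torch.randn(5, 768)  # entity position embeddings D_i
e = torch.randn(768)     # entity type embedding

original = D + e                      # D_i + e
reparam = (D + e) + torch.zeros(768)  # D_i <- D_i + e, e <- 0
assert torch.allclose(original, reparam)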

About the results of CoNLL 2003 NER

Hi, I'm very interested in your great work! In the paper, results are reported based on document-level context, following BERT. Since a lot of previous work (including some of the results in the paper) used sentence-level context only, I'm wondering how LUKE performs on the CoNLL NER dataset with sentence-level context. It would be very helpful if you could report results based on sentence-level context.

Model training duration

Hello! Great paper and really cool idea overall!
Could you please report how long pretraining took (and with how many GPUs), as well as the training duration for the NER task on CoNLL-2003?

Two questions about CONLL2003

I get a 94.3 F1 score on the CONLL2003 test set, but I have some questions:

  1. The label 'O' doesn't participate in the evaluation process, right?
  2. How are the label prefixes handled, such as 'B-PER' and 'I-PER', since I can only see the result for 'PER'?

Thanks a lot~

A problem with evaluation on CoNLL dataset

Hi, thanks for the great work and the very good documentation.

I came across the following error:

The number of labels: 51362
Traceback (most recent call last):
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/ha1/czuk/nlp/eclipse/workspace_deepner/luke/luke/examples/cli.py", line 135, in <module>
    cli()
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/ha1/czuk/nlp/eclipse/workspace_deepner/luke/luke/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/ha1/czuk/nlp/eclipse/workspace_deepner/luke/luke/examples/ner/main.py", line 90, in run
    results.update({f"dev_{k}": v for k, v in evaluate(args, model, "dev", dev_output_file).items()})
  File "/ha1/czuk/nlp/eclipse/workspace_deepner/luke/luke/examples/ner/main.py", line 155, in evaluate
    print(seqeval.metrics.classification_report(final_labels, final_predictions, digits=4))
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/seqeval/metrics/sequence_labeling.py", line 697, in classification_report
    suffix=suffix
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/seqeval/metrics/sequence_labeling.py", line 139, in precision_recall_fscore_support
    extract_tp_actual_correct=extract_tp_actual_correct
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/seqeval/metrics/v1.py", line 122, in _precision_recall_fscore_support
    check_consistent_length(y_true, y_pred)
  File "/home/czuk/anaconda3/envs/luke/lib/python3.6/site-packages/seqeval/metrics/v1.py", line 97, in check_consistent_length
    raise TypeError('Found input variables without list of list.')
TypeError: Found input variables without list of list.

I guess it might be related to the version of seqeval. I am using seqeval==1.2.2. Is that OK?
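
A hedged workaround matching the "list of list" message: newer seqeval releases require a list of label sequences rather than a flat list, so wrapping the flat outputs may be enough (final_labels / final_predictions below are hypothetical stand-ins for the script's variables):

import seqeval.metrics

final_labels = ["B-PER", "O", "B-LOC"]  # stand-in for the script's flat output
final_predictions = ["B-PER", "O", "O"]

if final_labels and not isinstance(final_labels[0], list):
    final_labels = [final_labels]
    final_predictions = [final_predictions]

print(seqeval.metrics.classification_report(final_labels, final_predictions, digits=4))

Pinning seqeval to whatever version the repo's lock file specifies would be the safer fix.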

Adding LUKE to HuggingFace Transformers

Hi,
Is there a way to reproduce the NER results on CPU instead of the default GPU configuration? I am unable to find any resource for this in the repo.

I am using the following command, but there seems to be no flag/argument available to switch between CPU and GPU:

python -m examples.cli --model-file=luke_large_500k.tar.gz --output-dir=<OUTPUT_DIR> ner run --data-dir=<DATA_DIR> --fp16 --train-batch-size=2 --gradient-accumulation-steps=2 --learning-rate=1e-5 --num-train-epochs=5

Thanks in advance!
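
One possible workaround, assuming the example trainer falls back to CPU whenever torch.cuda.is_available() returns False (an assumption about this codebase, not a documented flag): hide the GPUs before launching, and drop --fp16, since apex amp requires CUDA.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # must be set before CUDA is initialized

import torch
print(torch.cuda.is_available())  # False, so the trainer should select CPU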

Redirect entity titles in entity disambiguation

Hi, nice work!
I am interested in how LUKE performs on entity disambiguation tasks. However, I find there is a gap in the entity titles between these benchmark datasets and the pre-training Wikipedia dump, and I see that there may be a redirects file:

valid_titles = None
if wikipedia_titles_file:
    with open(wikipedia_titles_file) as f:
        valid_titles = frozenset([t.rstrip() for t in f])

redirects = {}
if wikipedia_redirects_file:
    with open(wikipedia_redirects_file) as f:
        for line in f:
            (src, dest) = line.rstrip().split("\t")
            redirects[src] = dest

Could you please share these files? Thanks a lot !

Is there an easy way to perform relationship inference on a single example?

The results from the EMNLP paper look really promising, so I would like to manually try the model on a couple of examples of my own data. Is there an easy way to do this? I can easily run my data through spaCy to find entities, but do I need to dump each example to a file in the TACRED format, then run the command-line script, etc.?

Pretraining instruction

Hi authors,

Awesome work! Thanks for your code and instructions. Recently, I have wanted to pretrain a new LUKE model on my own dataset. Could you write pretraining instructions so I can learn? Thank you!

ImportError: cannot import name 'amp' from 'apex' (unknown location)

Hi,

First and foremost - very exciting work; I've been waiting for the release.

I am trying to run:

python -m examples.cli --model-file=luke_large_500k.tar.gz --output-dir=RelationClassification relation-classification run --data-dir=data/tacred --fp16 --train-batch-size=4 --gradient-accumulation-steps=8 --learning-rate=1e-5 --num-train-epochs=5 

But I get an ImportError: ImportError: cannot import name 'amp' from 'apex' (unknown location)

The error comes from line 53 (luke/examples/utils/trainer.py). I cannot figure out which package provides the module needed here.

the result of squad

I used poetry to build the experiment environment and tried to reproduce your paper's performance following your advice, but failed.

                 EM      F1
paper            90.2    95.4
your checkpoint  89.76   94.97
your finetune    89.04   94.69

Do you know the reason?

Running provided RE code returns `RuntimeError: pytorch_model.bin is a zip archive` error

Hi. I downloaded the model and checkpoint you provided and am trying to run relation extraction. Per a previous Issue, I did as you suggested and have managed to set up the packages as you specify.

Specifically, I'm running the command:

python -m examples.cli \
--model-file=/hdd1/user/research/luke/luke_large_500k.tar.gz \
--output-dir=/hdd1/user/research/luke \
relation-classification run \
--data-dir=/hdd1/user/data/TACRED/data/json \
 --checkpoint-file=/hdd1/user/research/luke/pytorch_model.bin \
--no-train

but this returns a RuntimeError: /hdd1/user/research/luke/pytorch_model.bin is a zip archive (did you mean to use torch.jit.load()?). According to this PyTorch discussion forum answer, it seems this may stem from using differing PyTorch versions when saving and loading checkpoints. I'm not sure how to address this issue, though. The PyTorch version that I've installed using requirements.txt is 1.2.0.

Would you have any idea what may be causing this issue? Thanks.


Here's the entire traceback in case it provides some more info:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 187, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'r_v2\nq\x02('

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 1095, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 1037, in frombuf
    chksum = nti(buf[148:156])
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 189, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/torch/serialization.py", line 555, in _load
    return legacy_load(f)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/torch/serialization.py", line 466, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 1593, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 1623, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 1486, in __init__
    self.firstmember = self.next()
  File "/home/user/anaconda3/envs/luke/lib/python3.7/tarfile.py", line 2301, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/luke/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/user/github/luke/examples/cli.py", line 132, in <module>
    cli()
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/user/github/luke/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/user/github/luke/examples/relation_classification/main.py", line 110, in run
    model.load_state_dict(torch.load(args.checkpoint_file, map_location="cpu"))
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/user/anaconda3/envs/luke/lib/python3.7/site-packages/torch/serialization.py", line 559, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: /hdd1/user/research/luke/pytorch_model.bin is a zip archive (did you mean to use torch.jit.load()?)
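
A hedged explanation and possible fix: PyTorch 1.6+ saves checkpoints in a zip-based format by default, which PyTorch 1.2 cannot read. If the released checkpoint was written by a newer PyTorch, re-saving it in the legacy format from an environment with PyTorch >= 1.6 should make it loadable (assuming it is a plain state dict):

import torch  # run this with PyTorch >= 1.6

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
torch.save(state_dict, "pytorch_model_legacy.bin",
           _use_new_zipfile_serialization=False)  # legacy format for older loaders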

Unable to run NER example on Colabs.

While trying to run the CoNLL-2003 example on Google Colab, I faced the following error:

ModuleNotFoundError: No module named 'apex'

While trying to run this command:

python -m examples.cli --model-file=luke_large_500k.tar.gz --output-dir=/content/output ner run --data-dir=/content/upload --fp16 --train-batch-size=2 --gradient-accumulation-steps=2 --learning-rate=1e-5 --num-train-epochs=5

I used poetry to set up LUKE, but it seems that it does not install apex.
Is there any way to get around this other than fiddling with the poetry lock?
Is this the intended behaviour? If so, why?

The code to reproduce this easily can be found here.

I also have a copy of a portion of the CoNLL-2003 dataset in the same repo, which should be uploaded to Colab before the notebook can function properly.

P.S. There was another issue with the following line (line 70) in luke/examples/ner/utils.py:
assert sentence_boundaries[0] == 0, which failed because the sentence_boundaries variable was initialized without 0 (which marks the first character location); I worked around it by initializing it as sentence_boundaries = [0].

Memory leak in training?

Hi, thanks for your great work!
I was using LUKE for relation classification on the TACRED dataset, but training always aborts with an OOM error partway through. This seems odd, as memory usage should stay roughly constant during training. Is there any possibility of a memory leak?

[2020-12-09 15:33:29,940] [INFO] Loading features from cached file data/tacred/cached_roberta_30_train.pkl ([email protected]:188)
epoch: 0 loss: 1.7077439:   1%|▉                                                                                        | 1703/170310 [11:36<19:09:14,  2.45it/s]
Traceback (most recent call last):
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/llq/homework/luke/examples/cli.py", line 135, in <module>
    cli()
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/llq/homework/luke/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/llq/homework/luke/examples/relation_classification/main.py", line 94, in run
    trainer.train()
  File "/home/llq/homework/luke/examples/utils/trainer.py", line 93, in train
    outputs = model(**inputs)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/llq/homework/luke/examples/relation_classification/model.py", line 38, in forward
    entity_attention_mask,
  File "/home/llq/homework/luke/luke/model.py", line 213, in forward
    return self.encoder(word_embeddings, entity_embeddings, attention_mask)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/llq/homework/luke/luke/model.py", line 342, in forward
    word_hidden_states, entity_hidden_states, attention_mask
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/llq/homework/luke/luke/model.py", line 325, in forward
    word_hidden_states, entity_hidden_states, attention_mask
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/llq/homework/luke/luke/model.py", line 308, in forward
    word_self_output, entity_self_output = self.self(word_hidden_states, entity_hidden_states, attention_mask)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/llq/homework/luke/luke/model.py", line 287, in forward
    attention_probs = self.dropout(attention_probs)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/dropout.py", line 54, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/home/llq/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 807, in dropout
    else _VF.dropout(input, p, training))
RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 10.91 GiB total capacity; 8.55 GiB already allocated; 8.50 MiB free; 8.75 GiB reserved in total by PyTorch)
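
Not specific to this repo, but one common cause of memory that grows over training is accumulating loss tensors (which keep their computation graphs alive) instead of Python floats. A self-contained illustration of the safe pattern:

import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0
for _ in range(100):
    x = torch.randn(4, 10)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    running_loss += loss.item()  # .item() frees the graph; `+= loss` would retain it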

Entity Linking Result?

Hi authors,
I notice that

Global Entity Disambiguation with Pretrained Contextualized Embeddings of Words and Entities
https://arxiv.org/abs/1909.00426

is also your work, and it is nearly the same as LUKE except for several slight differences. It is also awesome and is the entity linking SOTA, so why don't you show the entity linking performance of LUKE?

Unable to reproduce SQuAD1.1 using the released checkpoint

Hi, thanks for your code and all the instructions.
I downloaded the checkpoint from the README's SQuAD 1.1 section, but I can't get a similar score.

Here is the result.
{"exact": 76.16840113528855, "f1": 84.46851789225647, "total": 10570, "HasAns_exact": 76.16840113528855, "HasAns_f1": 84.46851789225647, "HasAns_total": 10570}

I fine-tuned your pre-trained model and got about an 87 exact match score at that time (without fp16).
Could you tell me anything I should check?

Thanks a lot!

Pretraining Problem

Hi @ikuyamada,

Thanks for your amazing work on this entity-aware language model. I am interested in building a LUKE model for the Indonesian language. Since I couldn't find any documentation about how to train the model, I did the following steps:

  1. Build the dump DB (build-dump-db)
  2. Build the entity vocab (build-entity-vocab)
  3. Build Wiki pretraining dataset (build-wikipedia-pretraining-dataset)
  4. Do the pretraining

However, when starting to do the pretraining, I got some errors:

Traceback (most recent call last):
  File "/usr/playground/luke/luke/pretraining/train.py", line 353, in run_pretraining
    result = model(**batch)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/playground/luke/luke/pretraining/model.py", line 81, in forward
    entity_attention_mask,
  File "/usr/playground/luke/luke/model.py", line 109, in forward
    entity_embedding_output = self.entity_embeddings(entity_ids, entity_position_ids, entity_segment_ids)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/playground/luke/luke/model.py", line 60, in forward
    entity_embeddings = self.entity_embedding_dense(entity_embeddings)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/torch/nn/functional.py", line 1612, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
[2020-12-09 00:43:15,490] [ERROR] Consecutive errors have been observed. Exiting... ([email protected]:379)
Traceback (most recent call last):
  File "/usr/playground/luke/luke/pretraining/train.py", line 352, in run_pretraining
    batch = {k: torch.from_numpy(v).to(device) for k, v in batch.items()}
  File "/usr/playground/luke/luke/pretraining/train.py", line 352, in <dictcomp>
    batch = {k: torch.from_numpy(v).to(device) for k, v in batch.items()}
RuntimeError: CUDA error: an illegal memory access was encountered
Traceback (most recent call last):
  File "./luke/cli.py", line 67, in <module>
    cli()
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/.pyenv/versions/luke/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/playground/luke/luke/pretraining/train.py", line 82, in pretrain
    run_pretraining(Namespace(**kwargs))
  File "/usr/playground/luke/luke/pretraining/train.py", line 352, in run_pretraining
    batch = {k: torch.from_numpy(v).to(device) for k, v in batch.items()}
  File "/usr/playground/luke/luke/pretraining/train.py", line 352, in <dictcomp>
    batch = {k: torch.from_numpy(v).to(device) for k, v in batch.items()}
RuntimeError: CUDA error: an illegal memory access was encountered

Meanwhile, when trying to run the code on CPU, I got this error:
IndexError: index out of range in self

Is it a CUDA error, or maybe a problem with the tensors?

Thank you in advance for your help!

Best,
Oryza

Question about input feature entity type.

Hello!
I am a new researcher working on NER.
I have read your paper and the code, and I have a question about the entity type feature.
In your work, the input entity type feature comes from the labeled data.
But when we do prediction, we can't get the entity type from unlabeled data, so how do we create the input feature?

Fine-tuned models ckpts

Thank you for the well-organized code and instructions!! I wonder if it is possible to get the fine-tuned model checkpoints for the downstream tasks (e.g., TACRED, CoNLL-2003)?

No module named 'torch' after installing with Poetry

Hi. I ran RE with your code without installing LUKE using Poetry and wasn't able to reproduce the reported results. After searching around I found a similar closed issue here regarding NER and decided to install LUKE using Poetry.

After running poetry install and subsequently poetry shell, however, I get the import error for PyTorch despite already having downloaded it.

I tried running poetry add torch but it claims PyTorch is already present.

Is this normal, and would you by any chance have any idea how I should fix this? Thanks.

How should I run experiments without entity embeddings (i.e., ablation study)?

Hi. I'm trying to reimplement the ablation study regarding the entity inputs shown in section 5.1 and am wondering how you did it. Did you simply set the relevant variables (i.e., entity_hidden_states) to 0, or is there some other technique that you used?

Edit

After closer inspection I've noticed that line 108 of luke.model.LukeModel has an if entity_ids is not None conditional statement. I'm under the impression that if I didn't want to use entity inputs, I could simply set entity_ids = None when feeding the input into the model. Is my interpretation correct? Thanks.
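
A sketch of that interpretation (the argument names and ordering are assumptions about LukeModel.forward, so treat this as illustrative only):

def word_only_forward(model, word_ids, word_segment_ids, word_attention_mask):
    # Assumption: LukeModel.forward skips the entity branch entirely when
    # entity_ids is None, per the guard on line 108 mentioned above.
    return model(word_ids, word_segment_ids, word_attention_mask, entity_ids=None)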

Getting RuntimeError for LukeRelationClassification

While trying to replicate results using the pre-trained model for relation classification, I am getting the following error. I looked at the function load_state_dict(); the strict argument is set to False.

Traceback (most recent call last):


  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)

  File "/home/akshay/re_rc/luke/examples/cli.py", line 132, in <module>
    cli()

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)

  File "/home/akshay/re_rc/luke/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)

  File "/home/akshay/re_rc/luke/examples/relation_classification/main.py", line 110, in run
    model.load_state_dict(torch.load(args.checkpoint_file, map_location="cpu"))

  File "/home/akshay/re_rc/luke/luke/model.py", line 236, in load_state_dict
    super(LukeEntityAwareAttentionModel, self).load_state_dict(new_state_dict, *args, **kwargs)

  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))

RuntimeError: Error(s) in loading state_dict for LukeForRelationClassification:
	size mismatch for embeddings.word_embeddings.weight: copying a param with shape torch.Size([50266, 1024]) from checkpoint, the shape in current model is torch.Size([50267, 1024]).

	size mismatch for entity_embeddings.entity_embeddings.weight: copying a param with shape torch.Size([2, 256]) from checkpoint, the shape in current model is torch.Size([3, 256]).

I cannot understand the reason behind this. Can somebody please explain?
