Comments (6)

monologg commented on June 4, 2024

Unfortunately, the data used for pretraining can't be released because of copyright and several other issues.

For switching to nsmc, I think you can refer to https://github.com/monologg/KoBERT-nsmc.

from distilkobert.

monologg commented on June 4, 2024

As for the data, that was something I used at the time to download temporarily onto a training instance. I forgot to remove it.

The "Can't load" error seems to be a separate issue. The code was written against transformers==2.9.1, but the library has been updated a lot since then, so tokenization_kobert.py no longer seems to be compatible with recent versions.

I've pinned the library versions to address this, so please refer to that!
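The thread only says that the versions were pinned and that the code was written against transformers==2.9.1; it does not show the pin itself. As a hedged sketch, a script could fail fast with a clearer message than a tokenizer load error when a different transformers version is installed. The function name `check_transformers_version` and its error message are hypothetical, not part of DistilKoBERT.

```python
# Minimal sketch (hypothetical helper, not part of DistilKoBERT):
# stop early when the installed transformers version differs from the
# version the thread says the code was written against.

PINNED_TRANSFORMERS_VERSION = "2.9.1"


def check_transformers_version(installed: str,
                               pinned: str = PINNED_TRANSFORMERS_VERSION) -> None:
    """Raise RuntimeError unless `installed` exactly matches `pinned`."""
    if installed.strip() != pinned:
        raise RuntimeError(
            f"transformers=={installed} is installed, but this code was written "
            f"against transformers=={pinned}; tokenization_kobert.py may fail to load. "
            f"Try: pip install transformers=={pinned}"
        )
```

In a script it would be called as `check_transformers_version(transformers.__version__)` right after `import transformers` and before any tokenizer is loaded. The exact string match is deliberately strict, since the thread only vouches for 2.9.1.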

gongtigigi commented on June 4, 2024

06/16/2021 16:14:16 - INFO - __main__ - Loading Tokenizer (kobert)
06/16/2021 16:14:16 - INFO - transformers.tokenization_utils - Model name 'kobert' not found in model shortcut name list (). Assuming 'kobert' is a path, a model identifier, or url to a directory containing tokenizer files.
Traceback (most recent call last):
  File "scripts/binarized_data.py", line 95, in <module>
    main()
  File "scripts/binarized_data.py", line 58, in main
    tokenizer = KoBertTokenizer.from_pretrained('kobert')
  File "/DistilKoBERT/.env/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 902, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/DistilKoBERT/.env/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 1007, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'kobert' was not found in tokenizers model name list (). We assumed 'kobert' was a path, a model identifier, or url to a directory containing vocabulary files named ['tokenizer_78b3253a26.model', 'vocab.txt'] but couldn't find such vocabulary files at this path or url.

When I downgrade to 2.9.1 and run binarize.sh, I get the error above.

monologg commented on June 4, 2024

I think changing this part to monologg/kobert should fix it:

tokenizer = KoBertTokenizer.from_pretrained('kobert')

I did a lot of local testing on the trainer code before publishing kobert to the Hugging Face Hub, so some hardcoded parts like this seem to have been left in.
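Per the INFO log line in the traceback above, `from_pretrained` treats 'kobert' as a local path because it is not a known shortcut name, and no directory with the listed vocabulary files exists. The fix in the thread is simply to pass 'monologg/kobert' instead. As a hypothetical alternative for code that is still tested locally, a small helper could use a local directory only when it contains both vocabulary files named in the OSError, and otherwise fall back to the Hub identifier. `resolve_tokenizer_source` and its defaults are illustrative assumptions, not part of the repo.

```python
import os

# Vocabulary file names copied from the OSError in the traceback.
KOBERT_VOCAB_FILES = ("tokenizer_78b3253a26.model", "vocab.txt")
# Hub identifier suggested in the thread as the replacement for 'kobert'.
KOBERT_HUB_ID = "monologg/kobert"


def resolve_tokenizer_source(name: str,
                             hub_id: str = KOBERT_HUB_ID,
                             vocab_files=KOBERT_VOCAB_FILES) -> str:
    """Hypothetical helper: return `name` if it is a local directory that
    contains every expected vocab file, otherwise return the Hub identifier."""
    if os.path.isdir(name) and all(
        os.path.isfile(os.path.join(name, fname)) for fname in vocab_files
    ):
        return name
    return hub_id
```

Usage, assuming `KoBertTokenizer` is imported from the repo's tokenization_kobert.py: `tokenizer = KoBertTokenizer.from_pretrained(resolve_tokenizer_source('kobert'))`. Checking for the vocabulary files, not just the directory, avoids picking up an unrelated folder that happens to be named `kobert`.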

monologg commented on June 4, 2024

I've pushed the update in commit 5a2b750.

gongtigigi commented on June 4, 2024

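Thank you.
However, I couldn't find the data or the prediction code,
so I think I'll switch to nsmc, which is similar to my data.

If I later want to change nsmc from classification to regression, I would need to modify this repo and register it on Hugging Face, right?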