I have a model that uses BloomTokenizerFast, which does not have properties like byte_

How can I get the mapping relationship between byte values and Unicode characters of the fast tokenizer?

LuoKaiGSW commented on August 21, 2024

How can I get the mapping relationship between byte values and Unicode characters of the fast tokenizer?

Comments (5)

ArthurZucker commented on August 21, 2024

Hey! I suppose you are using python and can't see what's inside your tokenizer! #1542 should help you with this 🤗

LuoKaiGSW commented on August 21, 2024

> Hey! I suppose you are using python and can't see what's inside your tokenizer! #1542 should help you with this 🤗

Thank you for your reply, but I didn't fully understand what you meant. After using tokenizer._tokenizer.model, I got a BPE object, but I didn't see the attribute I wanted in it - that is, the mapping from byte values to Unicode. Could you explain it a bit more clearly, please?

ArthurZucker commented on August 21, 2024

You cannot see any attributes because both __repr__ and __str__ are not implemented

LuoKaiGSW commented on August 21, 2024

> You cannot see any attributes because both __repr__ and __str__ are not implemented

So, is it impossible to read this mapping relationship from the fast tokenizer?

ArthurZucker commented on August 21, 2024

It is coming with the PR that I linked 😉
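
Until that lands, one possible workaround is to rebuild the table by hand. The sketch below is not the API from the linked PR. It assumes the fast tokenizer uses the same byte-to-character table as GPT-2 style byte-level BPE, which should be checked against your own tokenizer's vocabulary. The names bytes_to_unicode, byte_encoder, and byte_decoder are chosen here for illustration.

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable-character table of GPT-2 style byte-level BPE."""
    # Bytes that are already printable, non-whitespace characters map to themselves.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in keep}
    # Every remaining byte (control characters, space, etc.) is assigned
    # the next free code point starting at 256, in increasing byte order.
    offset = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + offset)
            offset += 1
    return mapping


byte_encoder = bytes_to_unicode()                          # int byte -> str character
byte_decoder = {ch: b for b, ch in byte_encoder.items()}   # str character -> int byte

# Example: view the UTF-8 bytes of a string as vocabulary-style characters.
visible = "".join(byte_encoder[b] for b in " hello".encode("utf-8"))
print(visible)
```

If the assumption holds, a token string from the vocabulary can be turned back into raw bytes with bytes(byte_decoder[ch] for ch in token), and decoded as UTF-8 once a complete character sequence is available.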
