Comments (19)
Could you please paste the characters you typed here as text? I can't investigate the problem otherwise, since I can't type those characters!
from llamasharp.
The output was: 你好--你好。我今天能为?????????么? (the "?" characters are where Chinese text was lost). I used the UTF-8 character set.
Are you using a Baichuan model?
I used ggml-vic13b-q5_1.bin.
Hello, have you fixed this problem yet? I'm hitting the same issue. Thanks!
Some time ago I hit a case where the response went wrong when Chinese and English were mixed in the prompt, but it was okay with a pure-Chinese prompt. Could you please change the prompt and try again?
This should be fixed in LLamaSharp 0.7.0 for the StatelessExecutor (only the stateless executor; the others need more work). Can you confirm whether it's fixed for you?
InteractiveExecutor: I have the same problem. Is there a solution?
Unfortunately no, not yet. Someone needs to go in and modify the InteractiveExecutor to use the new StreamingTokenDecoder.
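For anyone curious about the idea behind a streaming token decoder, here is a minimal sketch in Python (illustration only; LLamaSharp's StreamingTokenDecoder is C# and its actual API may differ). The key point is that raw token bytes are fed into an incremental decoder, which holds back a trailing partial multi-byte sequence until the rest of it arrives:

```python
import codecs

class StreamingDecoder:
    """Accumulates token byte pieces; emits only complete UTF-8 characters."""
    def __init__(self):
        # The incremental decoder keeps incomplete byte sequences in an
        # internal buffer instead of emitting U+FFFD replacement characters.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def add(self, token_bytes: bytes) -> str:
        return self._decoder.decode(token_bytes)

dec = StreamingDecoder()
# "你" is 3 UTF-8 bytes; a tokenizer may split them across two tokens.
out = dec.add(b"\xe4") + dec.add(b"\xbd\xa0")
print(out)  # the complete character "你", not two broken pieces
```

Decoding each token's bytes in isolation is exactly what produces the "?" output reported above; buffering across tokens avoids it.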
I've identified the problem as GB2312 encoding. I'll fix it soon with a partial refactor of the executor using StreamingTokenDecoder, as Martin mentioned. I'd appreciate any help with it (I'll have some busy days next week). :)
PR #293 has just added the new decoder into the base executor. Can anyone here please pull the master branch and test if this is fixed with all executors now? Thanks.
I tested yesterday, but unfortunately it still doesn't work for Chinese. Windows uses GB2312 encoding by default for Chinese, and we can get the correct string by adding GB2312 to System.Text.Encoding. However, the model's output is a total mess. I'm not sure whether the tokenizer or the detokenizer is to blame, but I suspect the detokenizer, because the output is all meaningless Chinese rather than other characters.
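"Meaningless Chinese" is the classic symptom of decoding UTF-8 bytes with a GB2312/GBK codec. A quick Python illustration (Windows' "GB2312" code page is effectively GBK/cp936; this is a sketch of the symptom, not LLamaSharp code):

```python
# Valid UTF-8 bytes misinterpreted as GBK come out as real-looking
# but meaningless Chinese characters rather than "?" placeholders.
utf8_bytes = "你好".encode("utf-8")          # b'\xe4\xbd\xa0\xe5\xa5\xbd'
wrong = utf8_bytes.decode("gbk", errors="replace")
right = utf8_bytes.decode("utf-8")
print(wrong)  # garbage characters
print(right)  # 你好
```

So if the detokenized output is plausible-looking but nonsensical Chinese, an encoding mismatch between the model's byte output and the decoding step is a likely culprit.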
Ah ok, I thought it was the multi-token-single-character problem, but I guess not :(
The same problem still exists with FlagAlpha-Llama2-Chinese-13b-Chat.Q2_K.gguf: Chinese characters are decomposed and recombined into strange symbols.
Update: I've found a way to deal with Chinese decoding, though it's still not robust. Please wait a while; we'll include the fix in the next release. :)
This problem has been fixed by #326, and an example has been added. Could you please give the master branch a try? Note that in our tests some Chinese models, such as Baichuan, showed strange behaviour while others didn't. If you're not sure which model to use, please consider llama2-chinese-alpaca.
Closing this issue as completed. Please feel free to reopen it if there's any problem.