
abacaj / replit-3b-inference

152 stars · 3 watchers · 28 forks · 17 KB

Run inference on replit-3B code instruct model using CPU

License: MIT License

Python 100.00%
Topics: ctransformers, ggml, replit, replit-code
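
As a quick orientation, here is a minimal sketch of the kind of CPU inference this project performs with ctransformers. The model repo and file name are taken from the issue logs further down this page; this is not the repo's actual inference.py.

    # Minimal sketch, not the repo's actual code: load the ggml model from the
    # Hugging Face Hub and generate on CPU via ctransformers.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "teknium/Replit-v2-CodeInstruct-3B",             # HF repo seen in the logs below
        model_file="replit-v2-codeinstruct-3b.q4_1.bin",  # quantized ggml weights
        model_type="replit",                              # ggml backend to use
    )
    print(llm("def fibonacci(n):", max_new_tokens=64))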


replit-3b-inference's Issues

Compliments to the chef

I've already added a star, but I also just wanted to say thanks and well done.

I've tested around 20 inference/transformer libraries for running LLMs, particularly with a focus on low-resource setups, but also generally, to make fair comparisons and get an understanding of hardware requirements.

This is without a doubt the simplest, cleanest, clearest and most concise (least confusing) project I've come across. And yes, I get how basic this project is, but because of all the dependencies required in more ambitious projects, getting those working can be an absolute nightmare.

You could probably extend the README a little to make it clearer how useful this project is, e.g. how easy it is to change the model by amending a couple of lines in the inference and download_model .py files (a sketch of that kind of change follows below). Not because this is a stumbling block, but so people know it the second they land on the page. It's also worth adding that it works with .bin files, which it downloads automatically (along with the config), without having to manually mess around with wget or quantising (though that remains an option if personalised quantised models from other libraries are needed).
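
A hypothetical sketch of the kind of two-line swap described above; the identifiers and structure are illustrative, not the repo's actual code.

    # Illustrative only: fetch a different ggml model by changing the repo id
    # and file name in the download step; the loader's model_type must then
    # match the new model's ggml backend.
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(
        repo_id="teknium/Replit-v2-CodeInstruct-3B",    # change this line for another model
        filename="replit-v2-codeinstruct-3b.q4_1.bin",  # and this one for its .bin file
        local_dir="models",
    )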

Also, while the blurb focuses on CPU, any hardware benefits from a lightweight approach, so it might be worth highlighting that this is a lightweight, non-bloated tool for CPU and low-memory GPU use. That is really important from a cost perspective if you are running a virtual GPU on AWS or similar.

Anyways, totally loving your work and hope it continues to grow. :)

Model type 'replit' is not supported. hmm >>> FIXED!!!

(replit) PS H:\ia\replit-3B-inference-main> python.exe .\inference.py
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1001.74it/s]
Model type 'replit' is not supported.
Traceback (most recent call last):
  File "H:\ia\replit-3B-inference-main\inference.py", line 46, in <module>
    llm = AutoModelForCausalLM.from_pretrained(
  File "C:\Users\ultim\miniconda3\envs\replit\lib\site-packages\ctransformers\hub.py", line 157, in from_pretrained
    return LLM(
  File "C:\Users\ultim\miniconda3\envs\replit\lib\site-packages\ctransformers\llm.py", line 214, in __init__
    raise RuntimeError(
RuntimeError: Failed to create LLM 'replit' from 'H:\ia\replit-3B-inference-main\models\replit-v2-codeinstruct-3b.q4_1.bin'.
(replit) PS H:\ia\replit-3B-inference-main>

Runtime error in inference

On Windows 10 (AMD Ryzen 5 5600), I get the runtime error "Failed to create LLM 'replit' from ......\models\replit-v2-codeinstruct-3b.q4_1.bin" during inference (python inference.py). It says "Model type 'replit' is not supported." Any ideas/pointers on fixing this?
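
A hedged pointer, not from the thread: "Model type 'replit' is not supported" is usually a sign that the installed ctransformers build predates support for the replit backend, so the installed package version is worth checking first.

    # Print the installed ctransformers version; an old pin is the likely
    # culprit for a missing model type (an assumption, not confirmed here).
    import importlib.metadata

    print(importlib.metadata.version("ctransformers"))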

It does not seem to work offline

It seems it does not work offline: when I put my wifi into airplane mode, I receive this error:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/teknium/Replit-v2-CodeInstruct-3B/revision/main (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x00000228F9769DE0>: Failed to resolve 'huggingface.co' ([Errno 11001] getaddrinfo failed)"))
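
A possible offline workaround (an assumption, not confirmed in the thread): once download_model.py has already fetched the .bin, ctransformers can be pointed at the local file directly, which skips the huggingface.co lookup entirely.

    # Sketch: load from the already-downloaded local file instead of a Hub
    # repo id, so no network request to huggingface.co is made.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "models/replit-v2-codeinstruct-3b.q4_1.bin",  # local path to ggml weights
        model_type="replit",
    )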

questions and suggestions about the model

Some questions about the model, asked from ignorance.
Is there a way to set the length of the generated text?
It has occasionally happened that an answer seems to stop halfway.

I ask without any idea of how this particular model works. I've seen that others usually add other kinds of response-length parameters; here, the only thing that occurs to me is to change the number of tokens?

     temperature=0.2,
     top_k=50,
     top_p=0.9,
     repetition_penalty=1.0,
     max_new_tokens=512,  # adjust as needed
     seed=42,  # RNG seed
     reset=True,  # reset history (cache)
     stream=True,  # streaming per word/token
     threads=int(os.cpu_count() / 6),  # adjust for your CPU
     stop=["<|endoftext|>"],
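
For what it's worth, max_new_tokens is indeed the knob for longer answers; generation also ends early whenever the stop string is produced. A hedged example of a call with a larger budget, assuming an llm object loaded as in the sketch near the top of this page:

    # Raise the token budget for longer completions; the stop sequence still
    # ends generation early if the model emits it.
    text = llm(
        "Write a Python function that reverses a string.",
        max_new_tokens=1024,  # larger budget than the 512 above
        temperature=0.2,
        stop=["<|endoftext|>"],
    )
    print(text)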

I also saw a 10 GB model mentioned in a video:
https://huggingface.co/replit/replit-code-v1-3b/tree/main
Is it possible to use it? Is it better, the same, or worse? Will it work if I download it?

Assuming I wanted to take full advantage of the hardware: it uses so few resources that I can't tell whether it's running on the GPU or the CPU (I love that), though I'm curious what the limit may be for what it can generate.
I use a 12 GB RTX 2060, with 32 GB of RAM, on a Ryzen 3600X.
Is there a way to use the GPU if it is not being used?
Is there a way to save the prompt and the generated response to a log, such as query0001.txt? (See the sketch at the end of this issue.)
Is there a way to paste, for example, already-written code into the input?
I have tried to copy something in, to compare results with things I ask SAGE, for example,
but the paste was fragmented into separate lines, each handled as its own input, so the result lacked overall meaning.
Thank you very much in advance if you can answer my questions.
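
On the prompt-log question, a minimal sketch; the query0001.txt naming is the asker's own convention, and nothing like this exists in the project.

    # Hypothetical helper: write each prompt/response pair to a numbered file
    # such as logs/query0001.txt, logs/query0002.txt, ...
    import itertools
    import os

    _counter = itertools.count(1)

    def log_interaction(prompt: str, response: str, log_dir: str = "logs") -> None:
        os.makedirs(log_dir, exist_ok=True)
        path = os.path.join(log_dir, f"query{next(_counter):04d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"PROMPT:\n{prompt}\n\nRESPONSE:\n{response}\n")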

Open discussion tab

Enabling the Discussions tab on GitHub would help communication within the community.

finetune

Is there a way to finetune this model?
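
A hedged note rather than a confirmed answer: the quantized ggml .bin used here is not directly finetunable; the usual route would be to finetune the original Hugging Face checkpoint and re-quantize afterwards. A sketch of loading that checkpoint with transformers (replit/replit-code-v1-3b requires trust_remote_code=True):

    # Sketch only: load the unquantized checkpoint for finetuning with the
    # standard transformers stack; actual training code (Trainer / LoRA) omitted.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "replit/replit-code-v1-3b", trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "replit/replit-code-v1-3b", trust_remote_code=True
    )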

Error while running on colab

Traceback (most recent call last):
  File "/content/replit-3B-inference/ctransformers/../inference.py", line 3, in <module>
    from ctransformers import AutoModelForCausalLM, AutoConfig
ImportError: cannot import name 'AutoModelForCausalLM' from 'ctransformers' (unknown location)
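
A hedged diagnosis, not confirmed by the poster: the path in the traceback (/content/replit-3B-inference/ctransformers/../inference.py) suggests a local directory named ctransformers is shadowing the installed package, which is what "unknown location" typically indicates. A quick check:

    # If the installed package is shadowed by a local 'ctransformers' directory,
    # __file__ will be None or point under /content instead of site-packages.
    # Run this from the same working directory as inference.py.
    import ctransformers

    print(getattr(ctransformers, "__file__", None))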
