
abacaj / replit-3b-inference

152 stars · 3 watchers · 28 forks · 17 KB

Run inference on replit-3B code instruct model using CPU

License: MIT License

Python 100.00%
Topics: ctransformers, ggml, replit, replit-code
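
As a quick orientation, here is a minimal sketch of the kind of CPU inference this project performs with ctransformers. The model repo and file name are taken from the issue logs further down this page; this is not the repo's actual inference.py.

    # Minimal sketch, not the repo's actual code: load the ggml model from the
    # Hugging Face Hub and generate on CPU via ctransformers.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "teknium/Replit-v2-CodeInstruct-3B",             # HF repo seen in the logs below
        model_file="replit-v2-codeinstruct-3b.q4_1.bin",  # quantized ggml weights
        model_type="replit",                              # ggml backend to use
    )
    print(llm("def fibonacci(n):", max_new_tokens=64))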


replit-3b-inference's Issues

Compliments to the chef

I've already added a star, but I also just wanted to say thanks and well done.

I've tested around 20 inference/transformer libraries for running LLMs, particularly with a focus on low-resource setups, but also generally, to make fair comparisons and get an understanding of hardware requirements.

This is without a doubt the simplest, cleanest, clearest and most concise (least confusing) project I've come across. And yes, I get how basic this project is, but because of all the dependencies required in more ambitious projects, getting those working can be an absolute nightmare.

You could probably extend the README a little to make it clearer how useful this project is, e.g. how easy it is to change the model by amending a couple of lines in the inference and download_model .py files (a sketch of that kind of change follows below). Not because this is a stumbling block, but so people know it the second they land on the page. It's also worth adding that it works with .bin files, which it downloads automatically (along with the config), without having to manually mess around with wget or quantising (though that remains an option if personalised quantised models from other libraries are needed).
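
A hypothetical sketch of the kind of two-line swap described above; the identifiers and structure are illustrative, not the repo's actual code.

    # Illustrative only: fetch a different ggml model by changing the repo id
    # and file name in the download step; the loader's model_type must then
    # match the new model's ggml backend.
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(
        repo_id="teknium/Replit-v2-CodeInstruct-3B",    # change this line for another model
        filename="replit-v2-codeinstruct-3b.q4_1.bin",  # and this one for its .bin file
        local_dir="models",
    )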

Also, while the blurb focuses on CPU, any hardware benefits from a lightweight approach, so it might be worth highlighting that this is a lightweight, non-bloated tool for CPU and low-memory GPU use. That is really important from a cost perspective if you are running a virtual GPU on AWS or similar.

Anyways, totally loving your work and hope it continues to grow. :)

Model type 'replit' is not supported. hmm >>> FIXED!!!

(replit) PS H:\ia\replit-3B-inference-main> python.exe .\inference.py
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1001.74it/s]
Model type 'replit' is not supported.
Traceback (most recent call last):
  File "H:\ia\replit-3B-inference-main\inference.py", line 46, in <module>
    llm = AutoModelForCausalLM.from_pretrained(
  File "C:\Users\ultim\miniconda3\envs\replit\lib\site-packages\ctransformers\hub.py", line 157, in from_pretrained
    return LLM(
  File "C:\Users\ultim\miniconda3\envs\replit\lib\site-packages\ctransformers\llm.py", line 214, in __init__
    raise RuntimeError(
RuntimeError: Failed to create LLM 'replit' from 'H:\ia\replit-3B-inference-main\models\replit-v2-codeinstruct-3b.q4_1.bin'.
(replit) PS H:\ia\replit-3B-inference-main>

Runtime error in inference

On Windows 10 (AMD Ryzen 5 5600), I get the runtime error "Failed to create LLM 'replit' from ......\models\replit-v2-codeinstruct-3b.q4_1.bin" during inference (python inference.py). It says "Model type 'replit' is not supported." Any ideas/pointers on fixing this?
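
A hedged pointer, not from the thread: "Model type 'replit' is not supported" is usually a sign that the installed ctransformers build predates support for the replit backend, so the installed package version is worth checking first.

    # Print the installed ctransformers version; an old pin is the likely
    # culprit for a missing model type (an assumption, not confirmed here).
    import importlib.metadata

    print(importlib.metadata.version("ctransformers"))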

It does not seem to work offline

It seems it does not work offline: when I put my wifi into airplane mode, I receive this error:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/teknium/Replit-v2-CodeInstruct-3B/revision/main (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x00000228F9769DE0>: Failed to resolve 'huggingface.co' ([Errno 11001] getaddrinfo failed)"))
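
A possible offline workaround (an assumption, not confirmed in the thread): once download_model.py has already fetched the .bin, ctransformers can be pointed at the local file directly, which skips the huggingface.co lookup entirely.

    # Sketch: load from the already-downloaded local file instead of a Hub
    # repo id, so no network request to huggingface.co is made.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "models/replit-v2-codeinstruct-3b.q4_1.bin",  # local path to ggml weights
        model_type="replit",
    )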

questions and suggestions about the model

Some questions about the model, asked from ignorance.
Is there a way to set the length of the generated text?
It has occasionally happened that an answer seems to stop halfway.

I ask without any idea of how this particular model works. I've seen that others usually add other kinds of response-length parameters; here, the only thing that occurs to me is to change the number of tokens?

     temperature=0.2,
     top_k=50,
     top_p=0.9,
     repetition_penalty=1.0,
     max_new_tokens=512,  # adjust as needed
     seed=42,  # RNG seed
     reset=True,  # reset history (cache)
     stream=True,  # streaming per word/token
     threads=int(os.cpu_count() / 6),  # adjust for your CPU
     stop=["<|endoftext|>"],
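
For what it's worth, max_new_tokens is indeed the knob for longer answers; generation also ends early whenever the stop string is produced. A hedged example of a call with a larger budget, assuming an llm object loaded as in the sketch near the top of this page:

    # Raise the token budget for longer completions; the stop sequence still
    # ends generation early if the model emits it.
    text = llm(
        "Write a Python function that reverses a string.",
        max_new_tokens=1024,  # larger budget than the 512 above
        temperature=0.2,
        stop=["<|endoftext|>"],
    )
    print(text)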

I also saw a 10 GB model mentioned in a video:
https://huggingface.co/replit/replit-code-v1-3b/tree/main
Is it possible to use it? Is it better, the same, or worse? Will it work if I download it?

Assuming I wanted to take full advantage of the hardware: it uses so few resources that I can't tell whether it's running on the GPU or the CPU (I love that), though I'm curious what the limit may be for what it can generate.
I use a 12 GB RTX 2060, with 32 GB of RAM, on a Ryzen 3600X.
Is there a way to use the GPU if it is not being used?
Is there a way to save the prompt and the generated response to a log, such as query0001.txt? (See the sketch at the end of this issue.)
Is there a way to paste, for example, already-written code into the input?
I have tried to copy something in, to compare results with things I ask SAGE, for example,
but the paste was fragmented into separate lines, each handled as its own input, so the result lacked overall meaning.
Thank you very much in advance if you can answer my questions.
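
On the prompt-log question, a minimal sketch; the query0001.txt naming is the asker's own convention, and nothing like this exists in the project.

    # Hypothetical helper: write each prompt/response pair to a numbered file
    # such as logs/query0001.txt, logs/query0002.txt, ...
    import itertools
    import os

    _counter = itertools.count(1)

    def log_interaction(prompt: str, response: str, log_dir: str = "logs") -> None:
        os.makedirs(log_dir, exist_ok=True)
        path = os.path.join(log_dir, f"query{next(_counter):04d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"PROMPT:\n{prompt}\n\nRESPONSE:\n{response}\n")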

Open discussion tab

Enabling the Discussions tab on GitHub would help communication within the community.

finetune

Is there a way to finetune this model?
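
A hedged note rather than a confirmed answer: the quantized ggml .bin used here is not directly finetunable; the usual route would be to finetune the original Hugging Face checkpoint and re-quantize afterwards. A sketch of loading that checkpoint with transformers (replit/replit-code-v1-3b requires trust_remote_code=True):

    # Sketch only: load the unquantized checkpoint for finetuning with the
    # standard transformers stack; actual training code (Trainer / LoRA) omitted.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "replit/replit-code-v1-3b", trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "replit/replit-code-v1-3b", trust_remote_code=True
    )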

Error while running on colab

Traceback (most recent call last):
  File "/content/replit-3B-inference/ctransformers/../inference.py", line 3, in <module>
    from ctransformers import AutoModelForCausalLM, AutoConfig
ImportError: cannot import name 'AutoModelForCausalLM' from 'ctransformers' (unknown location)
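
A hedged diagnosis, not confirmed by the poster: the path in the traceback (/content/replit-3B-inference/ctransformers/../inference.py) suggests a local directory named ctransformers is shadowing the installed package, which is what "unknown location" typically indicates. A quick check:

    # If the installed package is shadowed by a local 'ctransformers' directory,
    # __file__ will be None or point under /content instead of site-packages.
    # Run this from the same working directory as inference.py.
    import ctransformers

    print(getattr(ctransformers, "__file__", None))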
