
xnul / code-llama-for-vscode

511 stars · 6 watchers · 28 forks · 7 KB

Use Code Llama with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot.

License: MIT License

Python 100.00%
Topics: code-llama, code, llama, studio, visual, vscode, llm, local, continue, copilot

code-llama-for-vscode's Introduction

Code Llama for VSCode

An API which mocks llama.cpp to enable support for Code Llama with the Continue Visual Studio Code extension.

As of the time of writing and to my knowledge, this is the only way to use Code Llama with VSCode locally without having to sign up or get an API key for a service. The only exception to this is Continue with Ollama, but Ollama doesn't support Windows or Linux. On the other hand, Code Llama for VSCode is completely cross-platform and will run wherever Meta's own codellama code will run.

Now let's get started!

Setup

Prerequisites:

  • The Continue extension installed and working in Visual Studio Code.
  • Meta's codellama repository downloaded and set up so that you can run Code Llama Instruct locally.

After you are able to use both independently, we will glue them together with Code Llama for VSCode.

Steps:

  1. Move llamacpp_mock_api.py to your codellama folder and install Flask to your environment with pip install flask.
  2. Run llamacpp_mock_api.py with your Code Llama Instruct torchrun command. For example:
torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
  3. Type /config in VSCode with Continue and make changes to config.py so it looks like this (see the example config entry sketched below).
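
Newer versions of Continue use a config.json instead of config.py. As a rough guide (the title, model name, and port below are assumptions taken from one of the issues further down, so adjust them to your setup), an entry pointing Continue's "openai" provider at this local server looks roughly like:

    {
        "title": "LocalServer",
        "provider": "openai",
        "model": "codellama-7b-Instruct",
        "apiBase": "http://localhost:8000/v1/"
    }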

Restart VSCode or reload the Continue extension and you should now be able to use Code Llama for VSCode!

TODO: Response streaming
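
For orientation, the mock API is essentially a small Flask server wrapped around Meta's codellama generator, exposing an OpenAI-style chat completions endpoint that Continue can call. The sketch below is a simplified, hedged reconstruction based on details that appear in the issues further down (the /chat/completions route, the port, and the "onesix" response prefix); it is not the actual llamacpp_mock_api.py.

    # Hedged sketch of what llamacpp_mock_api.py roughly does; not the actual file.
    import fire
    from flask import Flask, jsonify, request
    from llama import Llama  # Meta's codellama package

    app = Flask(__name__)
    generator = None

    @app.route("/v1/chat/completions", methods=["POST"])
    def chat_completions():
        body = request.get_json()
        # Map the OpenAI-style messages into codellama's dialog format.
        dialog = [{"role": m["role"], "content": m["content"]} for m in body["messages"]]
        results = generator.chat_completion([dialog], temperature=0.2, top_p=0.95)
        response = results[0]["generation"]["content"]
        # The "onesix" prefix matches the payload shown in the issues below.
        return "onesix" + jsonify(
            {"choices": [{"delta": {"role": "assistant", "content": response}}]}
        ).get_data(as_text=True)

    def main(ckpt_dir, tokenizer_path, max_seq_len=512, max_batch_size=4):
        global generator
        generator = Llama.build(
            ckpt_dir=ckpt_dir,
            tokenizer_path=tokenizer_path,
            max_seq_len=max_seq_len,
            max_batch_size=max_batch_size,
        )
        app.run(port=8000)  # port is an assumption; match it to apiBase in your Continue config

    if __name__ == "__main__":
        fire.Fire(main)  # lets torchrun pass --ckpt_dir etc. through to main()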

code-llama-for-vscode's People

Contributors: teticio, xnul


code-llama-for-vscode's Issues

When I execute “torchrun --nproc_per_node 1 llamacpp_mock_api.py”, the following error occurs.

torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 16713) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
llamacpp_mock_api.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-04_12:12:41
  host      : 13edd873e909
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 16713)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 16713
============================================================

If I use 13b or 34b, do I just download the model and change the command?

For example, if I want to use the 13b version, the command should be:

torchrun --nproc_per_node 2 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-13b-Instruct/ \
    --tokenizer_path CodeLlama-13b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

Import Error with 'jinja2' Package

I followed your instructions and managed to fulfill the prerequisites of downloading and running CodeLlama using Meta's repo. Trying to run the command you provided:

[my userpath]/codellama$ torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

Yields the following error for me:

  File "/home/fabian/Desktop/AI/Domains/NLP/CodeLlama_vsc/codellama/llamacpp_mock_api.py", line 4, in <module>
    from flask import Flask, jsonify, request
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
ImportError: cannot import name 'escape' from 'jinja2' (/home/fabian/anaconda3/lib/python3.9/site-packages/jinja2/__init__.py)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 9086) of binary: /home/fabian/anaconda3/bin/python
Traceback (most recent call last):
  File "/home/fabian/anaconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
llamacpp_mock_api.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-27_08:18:29
  host      : lenovo-legion-7.lan
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 9086)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Continue can't recognize the content of the JSON file

I downloaded codellama-7B and configured Continue's config.json like this:

    {
        "title": "LocalServer",
        "provider": "openai",
        "model": "codellama-7b-Instruct",
        "apiBase": "http://localhost:8000/v1/"
    }

Then I run llamacpp_mock_api.py. Code Llama runs correctly on my computer: it receives the POST JSON from Continue and generates the LLM content correctly. But when I return the JSON, Continue can't recognize the format and shows nothing. How do you know the JSON format Continue expects? I see the code adds "onesix" to the front of the JSON, but I can't find a JSON format definition in Continue's docs. Is it possible that the Continue plugin updated the format? The current JSON-generating code is:

    "onesix" + jsonify({"choices": [{"delta": {"role": "assistant", "content": response}}]}).get_data(as_text=True)

How can I generate JSON that Continue can display?
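
For reference, Continue's "openai" provider generally expects responses in the standard OpenAI chat-completions shape. A hedged sketch of that shape (based on the OpenAI format, not on Continue's source):

    # Hedged sketch of an OpenAI-style (non-streaming) chat completion body.
    response_body = {
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "generated text here"},
                "finish_reason": "stop",
            }
        ],
    }
    # Streaming responses instead send a series of "chat.completion.chunk" objects
    # whose choices carry a "delta" (as in the snippet above) rather than a "message".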

Continue side panel: TypeError: fetch failed

After deployment, when I write any message in Continue, it reports:
Error handling message from Continue side panel: TypeError: fetch failed
which does not happen when GPT-4 or GPT-3.5-turbo is used.
How can I fix it?

missing requirements.txt

As the title says, this repository is missing an official requirements.txt to guide developers in installing dependencies. Will one be added later?
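
Until an official one exists, a minimal requirements.txt would roughly mirror the dependencies of Meta's codellama repository plus Flask for the mock API. The list below is an assumption, not taken from this repository:

    # Hypothetical minimal requirements.txt (versions left unpinned on purpose)
    torch
    fairscale
    fire
    sentencepiece
    flask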

Link for config doesn't work

Hey,

The link for the edited config file doesn't work. Can you update it, or just upload a config file as an example?

Thank you

It seems like there might be a bug?

When running the 13b version, I added a function like this:

    def run_text_completion(prompts):
        generator.text_completion(...)

It gets stuck in a loop somewhere before self.generator in the Llama generation method, but if I use generator.chat_completion instead it works fine. I'm very confused.
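
For comparison, in Meta's codellama repo text_completion takes a flat list of prompt strings (rather than the chat-style dialogs that chat_completion takes). A hedged sketch of how it is typically called, based on the codellama completion example:

    # Hedged sketch based on Meta's codellama example_completion.py.
    prompts = ["def fibonacci(n):"]
    results = generator.text_completion(
        prompts,
        max_gen_len=128,
        temperature=0.2,
        top_p=0.95,
    )
    for prompt, result in zip(prompts, results):
        print(prompt + result["generation"])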

How do I enable DEBUG mode? (to view the error and solve the 502 error)

Hello,

I have a GGML API server (running llamacpp_mock_api.py) and a continuedev-server on the same Linux machine.

When I use the continuedev-server to send a request to the Ollama API, the continuedev-server returns "Error calling /chat/completions endpoint: 502".

I'm not sure what request was sent to the GGML server; I think that may be the cause of this problem.

I want to see the GGML API log to find out, but I don't know where the log is, so I came here to ask.

I only have the continuedev-server stdout, which says "Debug mode: off". I assume that if DEBUG were enabled, the log would be shown.

(codellama) root@********# torchrun --nproc_per_node 1 llamacpp_mock_api.py     --ckpt_dir CodeLlama-7b-Instruct/     --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model     --max_seq_len 1024 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/root/anaconda3/envs/codellama/lib/python3.10/site-packages/torch/__init__.py:614: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
  _C._set_default_tensor_type(t)
Loaded in 7.50 seconds
 * Serving Flask app 'llamacpp_mock_api'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:8888
Press CTRL+C to quit


The continue-server log is below:

[2023-11-02 16:56:53] [ERROR] Error while running step: 
Traceback (most recent call last):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/core/autopilot.py", line 218, in _run_singular_step
    async for update in step.run(self.sdk):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/plugins/steps/chat.py", line 50, in run
    async for chunk in generator:

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/libs/llm/base.py", line 475, in stream_chat
    async for chunk in self._stream_complete(prompt=prompt, options=options):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/libs/llm/ggml.py", line 271, in _stream_complete
    async for chunk in self._raw_stream_complete(prompt, options):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/libs/llm/ggml.py", line 134, in _raw_stream_complete
    raise Exception(

Exception: Error calling /chat/completions endpoint: 502

Error calling /chat/completions endpoint: 502

I think there should be a way to enable DEBUG mode; because DEBUG mode is off, the log isn't being displayed.

I have searched Google, this GitHub repository, and elsewhere, but haven't found anything helpful.

Thanks to everyone, hope you all have a good and nice day and life!

BTW, my issue about deploying the continue server is issues#570; the newest reply there was sent by me and contains the logs etc., so I haven't added them here.
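
For what it's worth, the "Debug mode: off" line comes from Flask's development server, and that flag is controlled where the app is started. A minimal sketch, assuming the mock API starts Flask via app.run() (host and port are placeholders; match them to your llamacpp_mock_api.py):

    # Enable Flask's debug mode so the dev server logs tracebacks for failing requests.
    app.run(host="127.0.0.1", port=8888, debug=True)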
