
xnul / code-llama-for-vscode

511 stars · 6 watchers · 28 forks · 7 KB

Use Code Llama with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot.

License: MIT License

Python 100.00%
Topics: code-llama, code, llama, studio, visual, vscode, llm, local, continue, copilot

code-llama-for-vscode's Introduction

Code Llama for VSCode

An API which mocks llama.cpp to enable support for Code Llama with the Continue Visual Studio Code extension.

As of the time of writing and to my knowledge, this is the only way to use Code Llama with VSCode locally without having to sign up or get an API key for a service. The only exception to this is Continue with Ollama, but Ollama doesn't support Windows or Linux. On the other hand, Code Llama for VSCode is completely cross-platform and will run wherever Meta's own codellama code will run.

Now let's get started!

Setup

Prerequisites:

  • The Continue extension installed and working in Visual Studio Code.
  • Meta's codellama repository downloaded and set up so that you can run Code Llama Instruct locally.

After you are able to use both independently, we will glue them together with Code Llama for VSCode.

Steps:

  1. Move llamacpp_mock_api.py to your codellama folder and install Flask to your environment with pip install flask.
  2. Run llamacpp_mock_api.py with your Code Llama Instruct torchrun command. For example:
torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
  3. Type /config in VSCode with Continue and make changes to config.py so it looks like this (see the example config entry sketched below).
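
Newer versions of Continue use a config.json instead of config.py. As a rough guide (the title, model name, and port below are assumptions taken from one of the issues further down, so adjust them to your setup), an entry pointing Continue's "openai" provider at this local server looks roughly like:

    {
        "title": "LocalServer",
        "provider": "openai",
        "model": "codellama-7b-Instruct",
        "apiBase": "http://localhost:8000/v1/"
    }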

Restart VSCode or reload the Continue extension and you should now be able to use Code Llama for VSCode!

TODO: Response streaming
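
For orientation, the mock API is essentially a small Flask server wrapped around Meta's codellama generator, exposing an OpenAI-style chat completions endpoint that Continue can call. The sketch below is a simplified, hedged reconstruction based on details that appear in the issues further down (the /chat/completions route, the port, and the "onesix" response prefix); it is not the actual llamacpp_mock_api.py.

    # Hedged sketch of what llamacpp_mock_api.py roughly does; not the actual file.
    import fire
    from flask import Flask, jsonify, request
    from llama import Llama  # Meta's codellama package

    app = Flask(__name__)
    generator = None

    @app.route("/v1/chat/completions", methods=["POST"])
    def chat_completions():
        body = request.get_json()
        # Map the OpenAI-style messages into codellama's dialog format.
        dialog = [{"role": m["role"], "content": m["content"]} for m in body["messages"]]
        results = generator.chat_completion([dialog], temperature=0.2, top_p=0.95)
        response = results[0]["generation"]["content"]
        # The "onesix" prefix matches the payload shown in the issues below.
        return "onesix" + jsonify(
            {"choices": [{"delta": {"role": "assistant", "content": response}}]}
        ).get_data(as_text=True)

    def main(ckpt_dir, tokenizer_path, max_seq_len=512, max_batch_size=4):
        global generator
        generator = Llama.build(
            ckpt_dir=ckpt_dir,
            tokenizer_path=tokenizer_path,
            max_seq_len=max_seq_len,
            max_batch_size=max_batch_size,
        )
        app.run(port=8000)  # port is an assumption; match it to apiBase in your Continue config

    if __name__ == "__main__":
        fire.Fire(main)  # lets torchrun pass --ckpt_dir etc. through to main()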

code-llama-for-vscode's People

Contributors: teticio, xnul


code-llama-for-vscode's Issues

When I execute “torchrun --nproc_per_node 1 llamacpp_mock_api.py”, the following error occurs.

torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 16713) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
llamacpp_mock_api.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-04_12:12:41
  host      : 13edd873e909
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 16713)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 16713
============================================================

If I use 13b or 34b, do I just download the model and change the command?

For example, if I want to use the 13b version, the command should be:

torchrun --nproc_per_node 2 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-13b-Instruct/ \
    --tokenizer_path CodeLlama-13b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

Import Error with 'jinja2' Package

I followed your instructions and managed to fulfill the prerequisites of downloading and running CodeLlama using Meta's repo. Trying to run the command you provided:

[my userpath]/codellama$ torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

Yields the following error for me:

  File "/home/fabian/Desktop/AI/Domains/NLP/CodeLlama_vsc/codellama/llamacpp_mock_api.py", line 4, in <module>
    from flask import Flask, jsonify, request
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
ImportError: cannot import name 'escape' from 'jinja2' (/home/fabian/anaconda3/lib/python3.9/site-packages/jinja2/__init__.py)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 9086) of binary: /home/fabian/anaconda3/bin/python
Traceback (most recent call last):
  File "/home/fabian/anaconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/fabian/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
llamacpp_mock_api.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-27_08:18:29
  host      : lenovo-legion-7.lan
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 9086)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Continue can't recognize the content of the JSON file

I downloaded codellama-7B and configured Continue's config.json like this:

    {
        "title": "LocalServer",
        "provider": "openai",
        "model": "codellama-7b-Instruct",
        "apiBase": "http://localhost:8000/v1/"
    }

Then I run llamacpp_mock_api.py. Code Llama runs correctly on my computer: it receives the POST JSON from Continue and generates the LLM content correctly. But when I return the JSON, Continue can't recognize the format and shows nothing. How do you know the JSON format Continue expects? I see the code adds "onesix" to the front of the JSON, but I can't find a JSON format definition in Continue's docs. Is it possible that the Continue plugin updated the format? The current JSON-generating code is:

    "onesix" + jsonify({"choices": [{"delta": {"role": "assistant", "content": response}}]}).get_data(as_text=True)

How can I generate JSON that Continue can display?
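
For reference, Continue's "openai" provider generally expects responses in the standard OpenAI chat-completions shape. A hedged sketch of that shape (based on the OpenAI format, not on Continue's source):

    # Hedged sketch of an OpenAI-style (non-streaming) chat completion body.
    response_body = {
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "generated text here"},
                "finish_reason": "stop",
            }
        ],
    }
    # Streaming responses instead send a series of "chat.completion.chunk" objects
    # whose choices carry a "delta" (as in the snippet above) rather than a "message".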

Continue side panel: TypeError: fetch failed

After deployment, when I write any message in Continue, it reports:
Error handling message from Continue side panel: TypeError: fetch failed
which does not happen when GPT-4 or GPT-3.5-turbo is used.
How can I fix it?

missing requirements.txt

As the title says, this repository is missing an official requirements.txt to guide developers in installing dependencies. Will one be added later?
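
Until an official one exists, a minimal requirements.txt would roughly mirror the dependencies of Meta's codellama repository plus Flask for the mock API. The list below is an assumption, not taken from this repository:

    # Hypothetical minimal requirements.txt (versions left unpinned on purpose)
    torch
    fairscale
    fire
    sentencepiece
    flask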

Link for config doesn't work

Hey,

The link for the edited config file doesn't work. Can you update it, or just upload a config file as an example?

Thank you

It seems like there might be a bug?

When running the 13b version, I added a function like this:

    def run_text_completion(prompts):
        generator.text_completion(...)

It gets stuck in a loop somewhere before self.generator in the Llama generation method, but if I use generator.chat_completion instead it works fine. I'm very confused.
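
For comparison, in Meta's codellama repo text_completion takes a flat list of prompt strings (rather than the chat-style dialogs that chat_completion takes). A hedged sketch of how it is typically called, based on the codellama completion example:

    # Hedged sketch based on Meta's codellama example_completion.py.
    prompts = ["def fibonacci(n):"]
    results = generator.text_completion(
        prompts,
        max_gen_len=128,
        temperature=0.2,
        top_p=0.95,
    )
    for prompt, result in zip(prompts, results):
        print(prompt + result["generation"])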

How do I enable DEBUG mode? (to view the error and solve the 502 error)

Hello,

I have a GGML API server (running llamacpp_mock_api.py) and a continuedev-server on the same Linux machine.

When I use the continuedev-server to send a request to the Ollama API, the continuedev-server returns "Error calling /chat/completions endpoint: 502".

I'm not sure what request was sent to the GGML server; I think that may be the cause of this problem.

I want to see the GGML API log to find out, but I don't know where the log is, so I came here to ask.

I only have the continuedev-server stdout, which says "Debug mode: off". I assume that if DEBUG were enabled, the log would be shown.

(codellama) root@********# torchrun --nproc_per_node 1 llamacpp_mock_api.py     --ckpt_dir CodeLlama-7b-Instruct/     --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model     --max_seq_len 1024 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/root/anaconda3/envs/codellama/lib/python3.10/site-packages/torch/__init__.py:614: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
  _C._set_default_tensor_type(t)
Loaded in 7.50 seconds
 * Serving Flask app 'llamacpp_mock_api'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:8888
Press CTRL+C to quit


The continue-server log is below:

[2023-11-02 16:56:53] [ERROR] Error while running step: 
Traceback (most recent call last):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/core/autopilot.py", line 218, in _run_singular_step
    async for update in step.run(self.sdk):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/plugins/steps/chat.py", line 50, in run
    async for chunk in generator:

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/libs/llm/base.py", line 475, in stream_chat
    async for chunk in self._stream_complete(prompt=prompt, options=options):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/libs/llm/ggml.py", line 271, in _stream_complete
    async for chunk in self._raw_stream_complete(prompt, options):

  File "/root/anaconda3/envs/continue-dev/lib/python3.10/site-packages/continuedev/libs/llm/ggml.py", line 134, in _raw_stream_complete
    raise Exception(

Exception: Error calling /chat/completions endpoint: 502

Error calling /chat/completions endpoint: 502

I think there should be a way to enable DEBUG mode; because DEBUG mode is off, the log isn't being displayed.

I have searched Google, this GitHub repository, and elsewhere, but haven't found anything helpful.

Thanks to everyone, hope you all have a good and nice day and life!

BTW, my issue about deploying the continue server is issues#570; the newest reply there was sent by me and contains the logs etc., so I haven't added them here.
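
For what it's worth, the "Debug mode: off" line comes from Flask's development server, and that flag is controlled where the app is started. A minimal sketch, assuming the mock API starts Flask via app.run() (host and port are placeholders; match them to your llamacpp_mock_api.py):

    # Enable Flask's debug mode so the dev server logs tracebacks for failing requests.
    app.run(host="127.0.0.1", port=8888, debug=True)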
