huggingface / llm-ls

LSP server leveraging LLMs for code completion (and more?)

License: Apache License 2.0

Languages: Rust 100.00%
Topics: ai, code-generation, huggingface, ide, llamacpp, llm, lsp, lsp-server, openai, self-hosted

llm-ls's Introduction

llm-ls

Important

This is currently a work in progress, expect things to be broken!

llm-ls is an LSP server leveraging LLMs to make your development experience smoother and more efficient.

The goal of llm-ls is to provide a common platform for IDE extensions to be built on. llm-ls takes care of the heavy lifting of interacting with LLMs so that extension code can be as lightweight as possible.

Features

Prompt

Uses the current file as context to generate the prompt. Can use "fill in the middle" (FIM) or not, depending on your needs.

It also tokenizes the prompt to make sure the request stays within the model's context window.
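
For illustration, a minimal sketch of the idea in Rust (the FIM markers and the whitespace-based token counter are stand-ins, not llm-ls's actual tokenizer or prompt code):

/// Sketch of fill-in-the-middle prompt assembly. `count_tokens` is a
/// placeholder; llm-ls uses a real tokenizer for the configured model.
fn build_fim_prompt(
    prefix: &str,
    suffix: &str,
    fim: Option<(&str, &str, &str)>, // (prefix_marker, middle_marker, suffix_marker)
    context_window: usize,
    count_tokens: impl Fn(&str) -> usize,
) -> Option<String> {
    let prompt = match fim {
        // "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>" style prompt
        Some((p, m, s)) => format!("{p}{prefix}{s}{suffix}{m}"),
        // No FIM support: only send the code before the cursor
        None => prefix.to_string(),
    };
    // Refuse to send a prompt that would overflow the model's context window.
    (count_tokens(&prompt) <= context_window).then_some(prompt)
}

fn main() {
    let whitespace_tokens = |s: &str| s.split_whitespace().count(); // crude stand-in
    let prompt = build_fim_prompt(
        "def add(a, b):\n    ",
        "\n\nprint(add(1, 2))",
        Some(("<fim_prefix>", "<fim_middle>", "<fim_suffix>")),
        4096,
        whitespace_tokens,
    );
    println!("{prompt:?}");
}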

Telemetry

Gathers information about requests and completions that can enable retraining.

Note that llm-ls does not export any data anywhere (other than setting a user agent when querying the model API), everything is stored in a log file (~/.cache/llm_ls/llm-ls.log) if you set the log level to info.

Completion

llm-ls parses the AST of the code to determine whether completions should be multi-line, single-line, or empty (no completion).
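
For illustration, a minimal sketch of this kind of decision with tree-sitter (using the 0.20 API that appears in llm-ls's dependency tree); this is a simplified stand-in, not the actual should_complete logic:

// Cargo deps (assumed): tree-sitter = "0.20", tree-sitter-python = "0.20"
use tree_sitter::{Parser, Point};

#[derive(Debug)]
enum CompletionType {
    Empty,
    SingleLine,
    MultiLine,
}

/// Simplified stand-in for the heuristic: look at the AST node under the
/// cursor and decide how much to complete.
fn completion_type(source: &str, row: usize, column: usize) -> CompletionType {
    let mut parser = Parser::new();
    parser
        .set_language(tree_sitter_python::language())
        .expect("grammar version mismatch");
    let Some(tree) = parser.parse(source, None) else {
        return CompletionType::Empty;
    };
    let root = tree.root_node();
    let point = Point { row, column };
    match root.descendant_for_point_range(point, point) {
        // The cursor is not inside any smaller node, i.e. we are at the end of
        // the current scope: a multi-line completion is reasonable.
        Some(node) if node == root => CompletionType::MultiLine,
        // The cursor sits inside an existing statement or expression: complete
        // a single line only.
        Some(_) => CompletionType::SingleLine,
        None => CompletionType::Empty,
    }
}

fn main() {
    let src = "def test():\n  if ";
    println!("{:?}", completion_type(src, 1, 5));
}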

Multiple backends

llm-ls is compatible with Hugging Face's Inference API, Hugging Face's text-generation-inference (TGI), Ollama, and OpenAI-compatible APIs, like the Python llama.cpp server bindings.

Compatible extensions

Roadmap

  • support getting context from multiple files in the workspace
  • add suffix_percent setting that determines the ratio of # of tokens for the prefix vs the suffix in the prompt (see the sketch after this list)
  • add context window fill percent or change context_window to max_tokens
  • filter bad suggestions (repetitive, same as below, etc)
  • OTLP traces?
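
Regarding the suffix_percent roadmap item above, a hypothetical sketch of how such a setting could split a token budget between prefix and suffix (the option does not exist yet):

/// Hypothetical illustration of a suffix_percent setting: split a token
/// budget between the prefix and the suffix of a FIM prompt.
fn split_budget(context_window: usize, reserved_for_completion: usize, suffix_percent: f32) -> (usize, usize) {
    let budget = context_window.saturating_sub(reserved_for_completion);
    let suffix_tokens = (budget as f32 * suffix_percent.clamp(0.0, 1.0)).round() as usize;
    (budget.saturating_sub(suffix_tokens), suffix_tokens) // (prefix budget, suffix budget)
}

fn main() {
    // 4096-token window, keep 128 tokens for the generated completion,
    // give 20% of the remainder to the suffix.
    let (prefix, suffix) = split_budget(4096, 128, 0.20);
    println!("prefix: {prefix} tokens, suffix: {suffix} tokens");
}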

llm-ls's People

Contributors

hennerm, jeremyelalouf, johan12345, mcpatate, noahbald, ovich, rojas-diego


llm-ls's Issues

Use as backend for chat-style UI

Could this also be used as a backend for a chat-style UI such as the one CopilotChat.nvim provides?

It lets you interact with an LLM by asking questions about selected parts of the code in the open buffer, and easily apply suggestions as diffs, avoiding the manual copy-pasting between chat and editor that you might do when using ChatGPT in the browser. The issue with CopilotChat.nvim in particular, though, is that it's not backend agnostic and only supports GitHub's Copilot.

Maybe this could be something for llm.nvim as well.

Quick demo of what it looks like:

copilotnvim.mov

Deepseek Coder not working

When trying to use DeepSeek Coder (via Ollama) with its tokenizer and FIM tokens, the result seems completely irrelevant (or maybe cut off). However, when I send the prompt I would expect llm-ls to produce directly to the model in Ollama, everything works fine:

(screenshots: the llm-ls completion vs. the output of the same prompt sent directly to Ollama)

Here is my config for llm.nvim:

require("llm").setup({
    model = "deepseek-coder:1.3b-base",
	enable_suggestions_on_startup = true,
	accept_keymap = "<C-M-j>",
	dismiss_keymap = "<C-M-k>",
	tokens_to_clear = {
        "<|endoftext|>",
    },
	fim = {
		enabled = true,
                prefix = "<|fim▁begin|>",
                middle = "<|fim▁hole|>",
                suffix = "<|fim▁end|>"
	},
	backend = "ollama",
	debounce_ms = 0,
	url = "http://localhost:11434/api/generate",
	context_window = 240,
	-- cf https://github.com/ollama/ollama/blob/main/docs/api.md#parameters
	request_body = {
		-- Modelfile options for the model you use
		options = {
			num_predict = 4,
			temperature = 0.2,
			top_p = 0.95,
		},
	},
	lsp = {
		bin_path = vim.api.nvim_call_function("stdpath", { "data" }) .. "/mason/bin/llm-ls",
	},
	tokenizer = {
                repository = "deepseek-ai/deepseek-vl-1.3b-base", -- not working for some reason
	},
})

I believe it is a problem with how llm-ls handles it, but if I am wrong, I will open an issue on the llm.nvim GitHub.

[Suggestion] Metrics support

First of all, amazing project!

We've started experimenting with the project on an on-premise offline environment, so far it works great!

We need our extensions to send metrics and events to a centralized backend in order to have usage statistics for our company users.

Do you think this fits in llm-ls, or should it go in the extension itself?

Are you planning to add support for other (optional and opt-in) telemetry events?

Thanks! 🙃

Cannot build testbed on Windows

https://github.com/huggingface/llm-ls/blob/2a433cdf75dc0a225e95753256f2601161bc6747/crates/testbed/src/main.rs#L346C24-L346C24

The linked statement results in the following error.

error[E0425]: cannot find function `symlink` in module `fs`
   --> crates\testbed\src\main.rs:346:21
    |
346 |                 fs::symlink(link_target, dst_path.clone()).await?;
    |                     ^^^^^^^ not found in `fs`
    |
note: found an item that was configured out
   --> C:\Users\noahw\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.32.0\src\fs\mod.rs:114:9
    |
114 |     mod symlink;
    |         ^^^^^^^
note: found an item that was configured out
   --> C:\Users\noahw\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.32.0\src\fs\mod.rs:115:28
    |
115 |     pub use self::symlink::symlink;
    |

This function is only available on Unix:
https://docs.rs/tokio/latest/tokio/fs/fn.symlink.html
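
tokio only exposes fs::symlink on Unix; on Windows it provides fs::symlink_file and fs::symlink_dir instead. A sketch of a cfg-gated wrapper that could make the testbed compile on both platforms (an untested sketch, not a confirmed fix):

// Cargo dep (assumed): tokio = { version = "1", features = ["fs", "macros", "rt-multi-thread"] }
use std::io;
use std::path::Path;

/// Cross-platform wrapper around the testbed's symlink call. On Unix, tokio
/// exposes `fs::symlink`; on Windows it only has `symlink_file` and
/// `symlink_dir`, so pick one based on what the target is.
async fn symlink_any(link_target: &Path, dst_path: &Path) -> io::Result<()> {
    #[cfg(unix)]
    {
        tokio::fs::symlink(link_target, dst_path).await
    }
    #[cfg(windows)]
    {
        if tokio::fs::metadata(link_target).await?.is_dir() {
            tokio::fs::symlink_dir(link_target, dst_path).await
        } else {
            tokio::fs::symlink_file(link_target, dst_path).await
        }
    }
}

#[tokio::main]
async fn main() -> io::Result<()> {
    // Example: link "repo" -> "repo-link" in the current directory.
    symlink_any(Path::new("repo"), Path::new("repo-link")).await
}

Note that creating symlinks on Windows additionally requires elevated privileges or developer mode.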

Unauthenticated warning should not show when a custom backend is used

When supplying a custom backend for the model parameter along with no access token (such as a self-hosted TGI instance), the following message is shown:

You are currently unauthenticated and will get rate limited. To reduce rate limiting, login with your API Token and consider subscribing to PRO: https://huggingface.co/pricing#pro

This should not show when a custom backend is being used.

Relevant code:

if params.api_token.is_none() {
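
A minimal sketch of the suggested guard (the function shape, field names, and the Inference API URL check are assumptions, not llm-ls's actual code): only warn when the request targets the hosted Hugging Face Inference API.

/// Only warn about a missing token when the request actually targets the
/// hosted Hugging Face Inference API, not a self-hosted backend such as TGI.
fn should_warn_unauthenticated(api_token: Option<&str>, url: Option<&str>) -> bool {
    let targets_hf_api =
        url.map_or(true, |u| u.starts_with("https://api-inference.huggingface.co"));
    api_token.is_none() && targets_hf_api
}

fn main() {
    // Self-hosted TGI with no token: no warning expected.
    assert!(!should_warn_unauthenticated(None, Some("https://tgi.internal.example/generate")));
    // Default hosted API with no token: warn.
    assert!(should_warn_unauthenticated(None, None));
    println!("ok");
}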

Too much logging

Hey @McPatate,

There is way too much logging going on:
(screenshot of the flooded log output)

I set

   "llm.lsp.logLevel": "warn",

but it still logs multiple times a second, which is not necessary. How do I stop this? Setting "llm.lsp.logLevel": "warn" does nothing to change things.

[BUG] Server doesn't start after reboot

I'm using an Intel-based Mac (Ventura 13.4) with VS Code (1.83.1) and the llm-vscode extension (v0.1.6), which I believe uses llm-ls 0.4.0.

If I shut down my Mac with VS Code running and the extension enabled, the next time I start VS Code the llm-ls server fails to start with the following error: 2023-10-23 17:15:31.351 [info] thread 'main' panicked at crates/llm-ls/src/main.rs:796:18:.

It seems the error is related to the time check, so I added the Instant::now() value to the error message, recompiled the binary, and pointed the llm-vscode extension at it. I got the following error: 2023-10-23 17:16:40.711 [info] thread 'main' panicked at 'Expected instant to be in bounds. Value of Instant.now is: Instant { t: 190814174805 }, tried to subtract a duration of 3600s', crates/llm-ls/src/main.rs:796:36

The time is way off for some reason, causing the check to fail. I then deleted lines 795 and 796, recompiled, and now llm-vscode works across reboots. This is not a proper solution and I'm not a Rust developer, but I wanted to bring the issue to your attention.

When the backend is 'tgi', `build_url(...)` should append `/generate` to the URL

When the backend is tgi, the build_url(...) function simply returns the supplied URL parameter. When a user passes in the base URL of their TGI server, the result is that the request is made against the root path and is routed to /compat_generate. Most users would not expect to have to pass in ${TGI_BASE_URL}/generate. In addition, llm-ls doesn't appear to be compatible with the /generate_stream endpoint, so there is no value in allowing the user to choose between the two.

Suggestion: detect and append /generate to the URL (or build out some more robust logic).
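
A minimal sketch of the suggested behavior (the signature and backend string are assumptions; llm-ls's real build_url differs):

/// Append `/generate` for TGI unless the caller already did; leave other
/// backends' URLs untouched.
fn build_url(backend: &str, base_url: &str) -> String {
    match backend {
        "tgi" if !base_url.ends_with("/generate") => {
            format!("{}/generate", base_url.trim_end_matches('/'))
        }
        _ => base_url.to_string(),
    }
}

fn main() {
    assert_eq!(build_url("tgi", "http://tgi.local:8080"), "http://tgi.local:8080/generate");
    assert_eq!(build_url("tgi", "http://tgi.local:8080/generate"), "http://tgi.local:8080/generate");
    assert_eq!(build_url("ollama", "http://localhost:11434/api/generate"), "http://localhost:11434/api/generate");
    println!("ok");
}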

[LLM] missing field `request_params`

I'm using Neovim with llm.nvim and I'm getting this error when calling the LLMSuggestion command:
[LLM] missing field `request_params`

llm.nvim config:

require('llm').setup({
  backend = "ollama",
  model = "llama3:text",
  url = "http://localhost:11434/api/generate",
  request_body = {
    parameters = {
      temperature = 0.2,
      top_p = 0.95,
    }
  },
})

[BUG] Server doesn't start on NixOS

The plugin v0.1.0 installs fine on NixOS (23.05) and VSCodium v1.82.2.23257 but crashes on launch. This is the error I get:

[Error - 2:52:13 PM] LLM VS Code client: couldn't create connection to server.
Launching server using command /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls failed. Error: spawn /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls ENOENT

Trying to run the server from a terminal results in "No such file or directory", and the problem doesn't seem to be execution rights:

# The file and containing folder exists
erkkimon@nixos:~/ > ls /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server
llm-ls

# Calling the binary doesn't compute
erkkimon@nixos:~/ > /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls 
zsh: no such file or directory: /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls

# Making sure the binary is executable
erkkimon@nixos:~/ > chmod a+x /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls

# Still no compute
erkkimon@nixos:~/ > /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls          
zsh: no such file or directory: /home/erkkimon/.vscode-oss/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls

Any ideas? If I figure out something, I will definitely let the internet know.

Help: Starting

Is there a tutorial on how to start the server with a downloaded model from hugging face?

So far, I have figured out how to build:

cargo build -r

Though I have downloaded WizardCoder-Python-34B-V1.0, I can't start the server. I've tried:

❯ ./target/release/llm-ls /vault/models/WizardCoder/

Content-Length: 75

{"jsonrpc":"2.0","error":{"code":-32700,"message":"Parse error"},"id":null}%                                                                 
❯ ./target/release/llm-ls /vault/models/WizardCoder/WizardCoder-Python-34B-V1.0.bin

Content-Length: 75

{"jsonrpc":"2.0","error":{"code":-32700,"message":"Parse error"},"id":null}%                                                                 

Add support for properly interpreting `context.selectedCompletionInfo`

When VS Code shows a popup completion item (i.e. what used to be called IntelliSense: a regular language construct or function that VS Code knows about), any inline completion is supposed to start with that completion item. That is to say, the completion item's text should be appended to the end of the prefix. Take the following Python example:

file_path = '/tmp/my-file'
with open(file_path, "r") as handle:
   # imagine the developer is in the middle of typing the period below
   obj = json.
   if obj.myField:
       print('my field is present')

So imagine the developer is typing the . in the line obj = json.; VS Code will pop up possible completions for json, and the method loads will likely be the top completion. The prefix that is sent to the LLM should use a value of obj = json.loads for that line. The suffix that comes after should also be included as normal.

The range returned for the vscode.InlineCompletionItem should be properly adjusted for this as well. However, that portion is probably not related to this project.
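
A minimal sketch of the requested prefix handling (the helper below is hypothetical, not existing llm-ls code):

/// Hypothetical helper: splice the selected completion item into the prefix.
/// `already_typed` is the part of the item the user has typed so far (the text
/// covered by selectedCompletionInfo.range); it gets replaced by the full item.
fn prefix_with_selected_item(prefix: &str, already_typed: &str, selected_item: &str) -> String {
    let base = prefix.strip_suffix(already_typed).unwrap_or(prefix);
    format!("{base}{selected_item}")
}

fn main() {
    let prefix = "with open(file_path, \"r\") as handle:\n   obj = json.";
    // The popup suggests `loads`; nothing of it has been typed after the dot yet.
    println!("{}", prefix_with_selected_item(prefix, "", "loads"));
}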

On Windows, this will sometimes cause a bug.

.checked_sub(MAX_WARNING_REPEAT)

Instant::now() may return a value smaller than MAX_WARNING_REPEAT (3600s) on my Windows 11 machine, which causes checked_sub to return None; the program then shows the error message "instant to be in bounds" and exits abnormally.

I think this could be the reason: https://doc.rust-lang.org/std/time/struct.Instant.html#underlying-system-calls

Windows uses QueryPerformanceCounter as the underlying system call.

My OS is Windows 11 Home Edition, version 23H2.
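
A minimal sketch of a panic-free version of the check (assumed logic, not the actual llm-ls code): treat a None result from checked_sub as "the warning is due" instead of panicking.

use std::time::{Duration, Instant};

const MAX_WARNING_REPEAT: Duration = Duration::from_secs(3_600);

/// Panic-free variant: if `now - 1h` cannot be computed (as happens shortly
/// after boot on Windows) or we never warned before, consider the warning due
/// instead of calling `expect` on the `checked_sub` result.
fn warning_is_due(last_warning: Option<Instant>) -> bool {
    match (Instant::now().checked_sub(MAX_WARNING_REPEAT), last_warning) {
        (Some(threshold), Some(last)) => last < threshold,
        _ => true,
    }
}

fn main() {
    println!("warn? {}", warning_is_due(None));
}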

Cannot use Ollama in a Docker container. ERROR: [LLM] http error

It is a great plugin and I love it, but I found an error:

[LLM] http error: error sending request for url (http://localhost:11434/api/generate): connection closed before message completed

I'm following the config in the README:

{
    "huggingface/llm.nvim",
    opts = {
        -- cf Setup
    },
    config = function()
        local llm = require("llm")
        llm.setup({
            api_token = nil, -- cf Install paragraph
            -- for ollama backend
            backend = "ollama", -- backend ID, "huggingface" | "ollama" | "openai" | "tgi"
            model = "starcoder2:7b",
            url = "http://localhost:11434/api/generate",
            tokens_to_clear = { "<|endoftext|>" }, -- tokens to remove from the model's output
            -- parameters that are added to the request body, values are arbitrary, you can set any field:value pair here and it will be passed as is to the backend
            request_body = {
                parameters = {
                    max_new_tokens = 60,
                    temperature = 0.2,
                    top_p = 0.95,
                },
            },
            -- set this if the model supports fill in the middle
            fim = {
                enabled = true,
                prefix = "<fim_prefix>",
                middle = "<fim_middle>",
                suffix = "<fim_suffix>",
            },
            debounce_ms = 150,
            accept_keymap = "<C-y>",
            dismiss_keymap = "<C-n>",
            tls_skip_verify_insecure = false,
            -- llm-ls configuration, cf llm-ls section
            lsp = {
                bin_path = nil,
                host = nil,
                port = nil,
                version = "0.5.2",
            },
            tokenizer = {
                repository = "bigcode/starcoder2-7b",
            }, -- cf Tokenizer paragraph
            -- tokenizer = nil, -- cf Tokenizer paragraph
            context_window = 4096, -- max number of tokens for the context window
            enable_suggestions_on_startup = true,
            enable_suggestions_on_files = "*", -- pattern matching syntax to enable suggestions on specific files, either a string or a list of strings
        })
    end,
}

The weirdest thing is that I can curl the same model and API URL and get an answer, and my VS Code Continue plugin can communicate with this Ollama instance running in a Docker container, but this plugin cannot!

Thank you for your time and reply!

won't work on GLIBC==2.31

I've tried on Linux with ldd --version: 2.31

But VS Code reports that the client is not running.

Then I set the log level to debug and found that llm-ls says GLIBC should be 2.32-2.34, which is not installed on my OS.

How to fix this?

GLIBC_2.32 not found when running under Ubuntu 20.04.6

Getting this:

[Error - 1:44:15 PM] The LLM VS Code server crashed 5 times in the last 3 minutes. The server will not be restarted. See the output for more information.
/home/user/.vscode-server/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/user/.vscode-server/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls)
/home/user/.vscode-server/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /home/user/.vscode-server/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls)
/home/user/.vscode-server/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /home/user/.vscode-server/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls)

Proposal: Launching LLM server as a daemon

Proof-of-concept: https://github.com/blmarket/llm-ls

Hi,

I'm wondering whether llm-ls could incorporate a dedicated LLM server provider within the LSP server, preferably as a shared instance via daemonization. The idea was inspired by the Bazel client/server model.

It works as follows:

  • There's a lock file to ensure only one LLM server is running
  • If the LLM is not used for 10 seconds (i.e. all LSP instances have exited), the LLM server is terminated
    • There is a heartbeat to let the LLM process know a client is still there
  • Multiple LSP instances share a single LLM server

Many things are still hardcoded (such as the LLM path, various file paths, and the model path), but it's usable with existing llm.XXX editor plugins.

I'd like to know whether this is something llm-ls would want to support as one of its backends, or whether it is better kept as a separate project.
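
A minimal sketch of the idle-shutdown part of this idea (an assumed design, not the proof-of-concept's code); the lock file and model handling are omitted:

// Cargo dep (assumed): tokio = { version = "1", features = ["full"] }
use std::time::Duration;
use tokio::{sync::mpsc, time};

/// Every connected LSP client sends heartbeats; if none arrive for 10 seconds,
/// the shared LLM server exits.
async fn idle_watchdog(mut heartbeats: mpsc::Receiver<()>) {
    loop {
        match time::timeout(Duration::from_secs(10), heartbeats.recv()).await {
            Ok(Some(())) => continue,   // a client is still alive
            Ok(None) | Err(_) => break, // all senders dropped, or 10 s of silence
        }
    }
    println!("no clients for 10 s, shutting the LLM server down");
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(8);
    // Simulate one client that heartbeats three times, then disappears.
    tokio::spawn(async move {
        for _ in 0..3 {
            tx.send(()).await.ok();
            time::sleep(Duration::from_secs(1)).await;
        }
    });
    idle_watchdog(rx).await;
}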

Respect XDG environment variables

After installing the LS, it seems that it ignores XDG environment variables, specifically ${XDG_CACHE_HOME}; the ~/.cache directory is hard-coded. I suggest adding functionality that checks for set XDG variables.
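
A minimal sketch of what respecting ${XDG_CACHE_HOME} could look like (assumed behavior, not current llm-ls code):

use std::env;
use std::path::PathBuf;

/// Resolve the cache directory, preferring $XDG_CACHE_HOME and falling back
/// to ~/.cache (the current behavior hard-codes the latter).
fn cache_dir() -> Option<PathBuf> {
    if let Some(xdg) = env::var_os("XDG_CACHE_HOME").filter(|v| !v.is_empty()) {
        return Some(PathBuf::from(xdg).join("llm_ls"));
    }
    env::var_os("HOME").map(|home| PathBuf::from(home).join(".cache").join("llm_ls"))
}

fn main() {
    println!("{:?}", cache_dir());
}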

Can't process response from llamacpp server

I have llm.nvim working with Ollama, which uses llm-ls-x86_64-unknown-linux-gnu-0.5.3. I tried to switch the config to use the OpenAI API to connect to a llama.cpp server, because it supports my AMD GPU, which Ollama does not.

I can see in Wireshark that the request is sent and the llama.cpp server sends back a successful response. However, I don't get any completion in nvim, which is probably caused by llm-ls failing to process the response.

I get this error in nvim: [LLM] serde json error: data did not match any variant of untagged enum OpenAIAPIResponse

request:

{
  "model": "models/codellama-7b.Q4_K_M.gguf",
  "options": {
    "temperature": 0.2,
    "top_p": 0.95
  },
  "parameters": {
    "max_new_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95
  },
  "prompt": "<PRE> #include <stdio.h>\n\nfloat multiply(float a, float b)\n{\n     <SUF>\n}\n\nint main(int argc, char *argv[])\n{\n    return 0;\n}\n\n\n <MID>",
  "stream": false
}

response:

{
  "content": "return a * b; <EOT>",
  "id_slot": 0,
  "stop": true,
  "model": "models/codellama-7b.Q4_K_M.gguf",
  "tokens_predicted": 6,
  "tokens_evaluated": 54,
  "generation_settings": {
    "n_ctx": 512,
    "n_predict": -1,
    "model": "models/codellama-7b.Q4_K_M.gguf",
    "seed": 4294967295,
    "temperature": 0.800000011920929,
    "dynatemp_range": 0,
    "dynatemp_exponent": 1,
    "top_k": 40,
    "top_p": 0.949999988079071,
    "min_p": 0.05000000074505806,
    "tfs_z": 1,
    "typical_p": 1,
    "repeat_last_n": 64,
    "repeat_penalty": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "penalty_prompt_tokens": [],
    "use_penalty_prompt_tokens": false,
    "mirostat": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.10000000149011612,
    "penalize_nl": false,
    "stop": [],
    "n_keep": 0,
    "n_discard": 0,
    "ignore_eos": false,
    "stream": false,
    "logit_bias": [],
    "n_probs": 0,
    "min_keep": 0,
    "grammar": "",
    "samplers": [
      "top_k",
      "tfs_z",
      "typical_p",
      "top_p",
      "min_p",
      "temperature"
    ]
  },
  "prompt": "<PRE> #include <stdio.h>\n\nfloat multiply(float a, float b)\n{\n     <SUF>\n}\n\nint main(int argc, char *argv[])\n{\n    return 0;\n}\n\n\n <MID>",
  "truncated": false,
  "stopped_eos": true,
  "stopped_word": false,
  "stopped_limit": false,
  "stopping_word": "",
  "tokens_cached": 59,
  "timings": {
    "prompt_n": 54,
    "prompt_ms": 601.562,
    "prompt_per_token_ms": 11.140037037037038,
    "prompt_per_second": 89.76630837719138,
    "predicted_n": 6,
    "predicted_ms": 315.451,
    "predicted_per_token_ms": 52.57516666666667,
    "predicted_per_second": 19.020386684461293
  }
}

I hope you can fix this, or tell me what I did wrong if it's my fault.
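
For reference, the error suggests that no variant of the untagged OpenAIAPIResponse enum matches llama.cpp's native /completion payload, which returns the generated text in a top-level content field. A hypothetical sketch of widening such an enum (the type and field names here are assumptions, not llm-ls's actual definitions):

// Cargo deps (assumed): serde = { version = "1", features = ["derive"] }, serde_json = "1"
use serde::Deserialize;

/// Hypothetical widened enum: the extra variant accepts llama.cpp's native
/// /completion payload, which puts the generated text in a `content` field.
#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum OpenAIAPIResponse {
    OpenAI { choices: Vec<Choice> },
    LlamaCpp { content: String },
}

#[derive(Debug, Deserialize)]
struct Choice {
    text: String,
}

fn main() -> serde_json::Result<()> {
    let llama_cpp = r#"{ "content": "return a * b; <EOT>", "stop": true }"#;
    let openai = r#"{ "choices": [ { "text": "return a * b;" } ] }"#;
    println!("{:?}", serde_json::from_str::<OpenAIAPIResponse>(llama_cpp)?);
    println!("{:?}", serde_json::from_str::<OpenAIAPIResponse>(openai)?);
    Ok(())
}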

feat: add support for self-signed certificates

First of all: very cool project! I think this intermediate layer between plugin and llm backend makes a lot of sense.

Now regarding this issue: There is currently no way (that I am aware of) to let llm-ls trust custom self-signed certificates. This makes it impossible to communicate securely with backends that use such certificates.

After looking into this a bit, it seems that this could be alleviated relatively easily by not using the rustls-tls feature when importing reqwest:

reqwest = { version = "0.11", default-features = false, features = ["json", "rustls-tls"] }

By default, reqwest will then use system-native TLS (see: https://docs.rs/reqwest/latest/reqwest/#tls), which would allow the user to simply trust self-signed certificates within the OS and llm-ls should then also automatically trust this certificate.
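
If switching to system-native TLS is not desirable, an alternative sketch is to keep rustls and explicitly register the self-signed CA with reqwest (the certificate path below is hypothetical):

// Cargo dep (assumed): reqwest = { version = "0.11", features = ["json", "rustls-tls"] }
use std::{error::Error, fs};

/// Keep rustls but trust an explicitly provided CA certificate in addition to
/// the built-in roots.
fn build_client(ca_pem_path: &str) -> Result<reqwest::Client, Box<dyn Error>> {
    let pem = fs::read(ca_pem_path)?;
    let cert = reqwest::Certificate::from_pem(&pem)?;
    let client = reqwest::Client::builder()
        .add_root_certificate(cert)
        .build()?;
    Ok(client)
}

fn main() {
    // Hypothetical path to the self-signed backend's CA certificate.
    match build_client("/etc/llm-ls/self-signed-ca.pem") {
        Ok(_) => println!("client built"),
        Err(e) => eprintln!("failed to build client: {e}"),
    }
}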

Completions not displaying in some cases

Hello,
I'm having completions that do not display, and I've managed to track this down to the should_complete function.
Here's what happens:

def test():
  if {cursor_position}

In this case, tree.root_node().descendant_for_point_range and tree.root_node() are equal, and a CompletionType::MultiLine is returned.
However, if I try to complete this:

for a in range(5):
  {cursor_position}
  if a == 2:
    break

Then tree.root_node().descendant_for_point_range evaluates to the for_node, and CompletionType::SingleLine is returned. Because most of my completions start with a \n, nothing is displayed (due to the way SingleLine completions are handled).

I'm having trouble understanding the logic behind the should_complete function; is there some documentation I could find on the expected output?
Thanks a lot!

Can't accept completions

I have the plugin installed and configured. I see ghost text followed by "^M", but I can't figure out how to accept the suggestion.
I am running nvim in WSL2 on Windows...

cargo install error after updating to Rust 1.80.0

After updating to Rust 1.80.0, I can no longer build llm-ls.

Command:

cargo install --locked --git https://github.com/huggingface/llm-ls llm-ls

Error:

➜  ~ cargo install --locked --git https://github.com/huggingface/llm-ls llm-ls
    Updating git repository `https://github.com/huggingface/llm-ls`
  Installing llm-ls v0.5.3 (https://github.com/huggingface/llm-ls#59febfea)
    Updating crates.io index
   Compiling libc v0.2.147
   Compiling proc-macro2 v1.0.66
   Compiling unicode-ident v1.0.11
   Compiling memchr v2.6.3
   Compiling autocfg v1.1.0
   Compiling regex-syntax v0.7.5
   Compiling cfg-if v1.0.0
   Compiling serde v1.0.188
   Compiling once_cell v1.18.0
   Compiling itoa v1.0.9
   Compiling pin-project-lite v0.2.13
   Compiling aho-corasick v1.1.2
   Compiling syn v1.0.109
   Compiling futures-core v0.3.28
   Compiling regex-automata v0.3.8
   Compiling quote v1.0.33
   Compiling cc v1.0.83
   Compiling syn v2.0.31
   Compiling tracing-core v0.1.31
   Compiling slab v0.4.9
   Compiling futures-sink v0.3.28
   Compiling crossbeam-utils v0.8.16
   Compiling regex v1.9.5
   Compiling tree-sitter v0.20.10
   Compiling futures-task v0.3.28
   Compiling futures-channel v0.3.28
   Compiling bytes v1.4.0
   Compiling ring v0.16.20
   Compiling serde_json v1.0.105
   Compiling ryu v1.0.15
   Compiling futures-util v0.3.28
   Compiling scopeguard v1.2.0
   Compiling socket2 v0.5.3
   Compiling serde_derive v1.0.188
   Compiling tracing-attributes v0.1.26
   Compiling futures-macro v0.3.28
   Compiling tokio-macros v2.1.0
   Compiling mio v0.8.8
   Compiling num_cpus v1.16.0
   Compiling memoffset v0.9.0
   Compiling log v0.4.20
   Compiling futures-io v0.3.28
   Compiling tracing v0.1.37
   Compiling smallvec v1.11.0
   Compiling tinyvec_macros v0.1.1
   Compiling version_check v0.9.4
   Compiling pin-utils v0.1.0
   Compiling tinyvec v1.6.0
   Compiling tokio v1.32.0
   Compiling crossbeam-epoch v0.9.15
   Compiling percent-encoding v2.3.0
   Compiling untrusted v0.7.1
   Compiling strsim v0.10.0
   Compiling ident_case v1.0.1
   Compiling fnv v1.0.7
   Compiling unicode-normalization v0.1.22
   Compiling darling_core v0.14.4
   Compiling form_urlencoded v1.2.0
   Compiling getrandom v0.2.10
   Compiling indexmap v1.9.3
   Compiling unicode-bidi v0.3.13
   Compiling httparse v1.8.0
   Compiling idna v0.4.0
   Compiling darling_macro v0.14.4
   Compiling http v0.2.9
   Compiling rand_core v0.6.4
   Compiling proc-macro-error-attr v1.0.4
   Compiling rustls v0.21.7
   Compiling tokio-util v0.7.8
   Compiling ppv-lite86 v0.2.17
   Compiling rayon-core v1.12.0
   Compiling hashbrown v0.12.3
   Compiling lazy_static v1.4.0
   Compiling bitflags v1.3.2
   Compiling pkg-config v0.3.27
   Compiling tower-service v0.3.2
   Compiling rand_chacha v0.3.1
   Compiling onig_sys v69.8.1
   Compiling crossbeam-deque v0.8.3
   Compiling url v2.4.1
   Compiling darling v0.14.4
   Compiling rustls-webpki v0.101.4
   Compiling sct v0.7.0
   Compiling proc-macro-error v1.0.4
   Compiling lock_api v0.4.10
   Compiling try-lock v0.2.4
   Compiling utf8parse v0.2.1
   Compiling parking_lot_core v0.9.8
   Compiling paste v1.0.14
   Compiling regex-syntax v0.6.29
   Compiling either v1.9.0
   Compiling anstyle-parse v0.2.2
   Compiling want v0.3.1
   Compiling derive_builder_core v0.12.0
   Compiling h2 v0.3.21
   Compiling regex-automata v0.1.10
   Compiling rand v0.8.5
   Compiling http-body v0.4.5
   Compiling pin-project-internal v1.1.3
   Compiling serde_repr v0.1.16
   Compiling socket2 v0.4.9
   Compiling colorchoice v1.0.0
   Compiling anstyle-query v1.0.0
   Compiling thiserror v1.0.50
   Compiling httpdate v1.0.3
   Compiling anstyle v1.0.4
   Compiling minimal-lexical v0.2.1
   Compiling async-trait v0.1.73
   Compiling esaxx-rs v0.1.10
   Compiling overload v0.1.1
   Compiling nu-ansi-term v0.46.0
   Compiling nom v7.1.3
   Compiling anstream v0.6.4
   Compiling hyper v0.14.27
   Compiling pin-project v1.1.3
   Compiling lsp-types v0.94.1
   Compiling tokio-rustls v0.24.1
   Compiling matchers v0.1.0
   Compiling derive_builder_macro v0.12.0
   Compiling rayon v1.8.0
   Compiling itertools v0.11.0
   Compiling sharded-slab v0.1.4
   Compiling tracing-log v0.1.3
   Compiling tracing-serde v0.1.3
   Compiling monostate-impl v0.1.9
   Compiling thiserror-impl v1.0.50
   Compiling tree-sitter-go v0.20.0
   Compiling tree-sitter-java v0.20.2
   Compiling tree-sitter-cpp v0.20.3
   Compiling tree-sitter-scala v0.20.2
   Compiling tree-sitter-objc v3.0.0
   Compiling tree-sitter-rust v0.20.4
   Compiling tree-sitter-bash v0.20.3
   Compiling tree-sitter-r v0.19.5
   Compiling tree-sitter-c v0.20.6
   Compiling tree-sitter-kotlin v0.3.1
   Compiling tree-sitter-erlang v0.4.0
   Compiling tree-sitter-javascript v0.20.1
   Compiling tree-sitter-swift v0.4.0
   Compiling tree-sitter-c-sharp v0.20.0
   Compiling tree-sitter-json v0.20.1
   Compiling tree-sitter-lua v0.0.19
   Compiling tree-sitter-elixir v0.1.0
   Compiling tree-sitter-ruby v0.20.0
   Compiling tree-sitter-typescript v0.20.3
   Compiling tree-sitter-md v0.1.5
   Compiling tree-sitter-python v0.20.4
   Compiling tree-sitter-html v0.20.0
   Compiling thread_local v1.1.7
   Compiling clap_lex v0.6.0
   Compiling base64 v0.13.1
   Compiling heck v0.4.1
   Compiling time-core v0.1.1
   Compiling deranged v0.3.8
   Compiling unicode-segmentation v1.10.1
   Compiling base64 v0.21.3
   Compiling macro_rules_attribute-proc_macro v0.2.0
   Compiling tower-layer v0.3.2
   Compiling hashbrown v0.14.0
   Compiling time v0.3.28
   Compiling macro_rules_attribute v0.2.0
   Compiling tower v0.4.13
   Compiling spm_precompiled v0.1.4
error[E0282]: type annotations needed for `Box<_>`
  --> /Users/anon/.cargo/registry/src/index.crates.io-6f17d22bba15001f/time-0.3.28/src/format_description/parse/mod.rs:83:9
   |
83 |     let items = format_items
   |         ^^^^^
...
86 |     Ok(items.into())
   |              ---- type must be known at this point
   |
help: consider giving `items` an explicit type, where the placeholders `_` are specified
   |
83 |     let items: Box<_> = format_items
   |              ++++++++

   Compiling dashmap v5.5.3
   Compiling rustls-pemfile v1.0.3
   Compiling clap_derive v4.4.7
For more information about this error, try `rustc --explain E0282`.
error: could not compile `time` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
error: failed to compile `llm-ls v0.5.3 (https://github.com/huggingface/llm-ls#59febfea)`, intermediate artifacts can be found at `/var/folders/l7/q2h2fyb5279gggx0f758mzr40000gn/T/cargo-install4kGtfm`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.

codellama unusable with llm-ls 0.5.1

I'm not sure if this is an issue for this repository or llm-ls, but it is unusable with the latest update. The <PRE>, <SUF>, etc. are not being properly stripped out anymore.

Things were working before with 0.4.0. I updated llm-ls with Mason and am at 38d6724f868211dc9a68a2a87a3c8caf3d1dbe65 with this repository (updated by Packer).

My config is as follow:

    require("llm").setup({
        tokens_to_clear = { "<EOT>" },
        fim = {
            enabled = true,
            prefix = "<PRE> ",
            middle = " <MID>",
            suffix = " <SUF>",
        },
        model = "codellama/CodeLlama-13b-hf",
        context_window = 4096,
        tokenizer = {
            repository = "codellama/CodeLlama-13b-hf",
        },
        lsp = {
            bin_path = vim.api.nvim_call_function("stdpath", { "data" })
                .. "/mason/bin/llm-ls",
        },
        accept_keymap = "<S-CR>",
        dismiss_keymap = "<C-CR>",
    })

I went a few lines into a file and ran :LLMSuggestion, and this is what I get. The <PRE> to <SUF> section is what was above the line where the cursor is.

})
<PRE> require("mason").setup()
require("mason-lspconfig").setup({
    ensure_installed = {
        "gopls",
        "jedi_language_server",
        "jsonls",
        "lua_ls",
        "rust_analyzer",
    },
})
 <SUF>
local null_ls = require("null-ls")

null_ls.setup({
    sources = {
        null_ls.builtins.formatting.beautysh.with({
            extra_args = { "--force-function-style", "fnpar" },
        }),
        null_ls.builtins.formatting.black.with({
            extra_args = { "-l", "79", "--preview" },
        }),
        null_ls.builtins.formatting.clang_format,
        null_ls.builtins.formatting.gofmt,
        null_ls.builtins.formatting.isort.with({
            extra_args = { "--profile", "black" },
        }),
        null_ls.builtins.formatting.prettier.with({
            extra_args = { "--print-width=80", "--prose-wrap=always" },
        }),
        null_ls.builtins.formatting.rustfmt,
        null_ls.builtins.formatting.stylua.with({
            extra_args = { "--column-width", "80" },
        }),

        null_ls.builtins.diagnostics.cppcheck,
        null_ls.builtins.diagnostics.flake8.with({
            extra_args = { "--extend-ignore", "E203,E501" },
        }),
        null_ls.builtins.diagnostics.markdownlint.with({
            extra_args = { "--disable", "line-length", "--" },
        }),
        null_ls.builtins.diagnostics.mypy.with({
            extra_args = { "--strict" },
        }),
        null_ls.builtins.diagnostics.shellcheck,
        null_ls.builtins.diagnostics.zsh,

        null_ls.builtins.code_actions.shellcheck,
    },
})

vim.keymap.set("n", "gd", vim.lsp.buf.definition, {})
vim.keymap.set("n", "gt", vim.lsp.buf.type_definition, {})
vim.keymap.set("n", "gr", vim.lsp.buf.references, {})
vim.keymap.set({ "n", "i" }, "<c-k>", vim.lsp.buf.signature_help, {})
vim.keymap.set("n", "K", vim.lsp.buf.hover, {})
vim.keymap.set("n", "<leader>cr", vim.lsp.buf.rename, {})
vim.keymap.set("n", "<leader>cf", vim.lsp.buf.format, {})
vim.keymap.set("n", "<leader>ca", vim.lsp.buf.code_action, {})
vim.keymap.set("n", "<leader>ce", function()
    vim.diagnostic.open_float({ border = "rounded" })
end, {})

vim.keymap.set("n", "<leader>cm", "<cmd>Make<cr>", {})

-- Change style of LSP borders.
vim.lsp.handlers["textDocument/hover"] = vim.lsp.with(vim.lsp.handlers.hover, {
    border = "rounded",
})
vim.lsp.handlers["textDocument/signatureHelp"] =
    vim.lsp.with(vim.lsp.handlers.signature_help, {
        border = "rounded",
    })

vim.cmd([[highlight! link FloatBorder Comment]])
 <MID>
require("nvim-treesitter.configs").setup({
    ensure_installed = {
        "bash",
        "c",
        "cmake",
        "comment",
        "cpp",
        "css",
        "dockerfile",
       
