Comments (12)
Is there any particular PR that we're waiting for before cutting the release?
The model support for Phi and Deepseek
from vllm.
I am going to try to get these in
Probably will not make it, but tracking for v0.4.4:
Sounds like we may want to include #4894 @rkooo567?
- Fix for mistral-v0.3: #5005
With the patch, like you, I'm running fp16 models (Mistral 7B, for example) with no issues.
Not only fp16, but AQLM works well too (#5058)
P40 requires building with the patch.
Hi, is it possible to include the following PRs?
Thanks for bringing these up @sasha0552!
#4167 is unlikely to be finished in time.
#4409 might need a little bit more discussion given what features are supported for Pascal GPUs and whether building from source might be a better option.
#4638 can be included if it gets merged in time.
We do commit to a biweekly release cadence, so don't worry: many of these will get in soon enough!
re: #4409 --> I did not have any issues running an fp16 model on a P40 when I installed from source.
Yeah +1 on that PR @njhill
re: #4409 --> I did not have any issues running an fp16 model on a P40 when I installed from source.
Hi @robertgshaw2-neuralmagic - was this without the patch? I couldn't get a source build to run on P100s without the patch from #4409. With the patch, like you, I'm running fp16 models (Mistral 7B, for example) with no issues.