Name: Xidong Wang
Type: User
Company: PhD @ The Chinese University of Hong Kong, Shenzhen; BA @ Beijing Institute of Technology
Bio: Towards (Medical) LLMs’ interpretability and interactivity
Location: [email protected]
Blog: https://scholar.google.com/citations?user=WJeSzQMAAAAJ&hl=en
Xidong Wang's Projects
Repository for the ACL 2023 conference website
Basic Linear Algebra Subprograms testbench
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Repository containing the website for the EMNLP 2023 conference
Firefly (流萤): a Chinese conversational large language model (full-parameter fine-tuning + QLoRA), supporting fine-tuning of Baichuan2, CodeLlama, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya, Bloom, and other large models
Fast and memory-efficient exact attention
Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton
Inference code for Mistral and Mixtral hacked up into original Llama implementation
Port of Facebook's LLaMA model in C/C++
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Scripts and code for various SFT acceleration frameworks
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Best practice for training LLaMA models in Megatron-LM
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
Homework and Notes of CS224N
Use the OpenAI API stably and quickly
OpenCompass is an LLM evaluation platform, supporting a wide range of models (LLaMA, Llama 2, ChatGLM2, ChatGPT, Claude, etc.) over 50+ datasets.
A Ray-based High-performance RLHF framework (for 7B on RTX4090 and 34B on A100)
Optimized LLM .cpp codebases (llama.cpp, bloomz.cpp, whisper.cpp) with matrix multiplication implemented via BLIS
Memory management for AI applications and AI agents
NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
The repository for the code of the UltraFastBERT paper