tom-caozh Goto Github PK
Name: Zhang Cao
Type: User
Bio: Interested in KV Stores, Cache, Disaggregated Memory(RDMA and CXL) and LLM.
Twitter: tomcaottt
Location: China
My solutions to CMU 15-213 (updating)
My sections and homework for the Stanford class
To record my study in the cs50_ai class
Curve is a high-performance, lightweight-operation, cloud-native open source distributed storage system. Curve can be applied to: 1) mainstream cloud-native infrastructure platforms OpenStack and Kubernetes; 2) high-performance storage for cloud-native databases; 3) cloud storage middleware using S3-compatible object storage as a data storage backend.
Contains some materials about CXL.
This is the implementation repository of our SOSP'23 paper: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System.
To store my config files
Running large language models on a single GPU for throughput-oriented scenarios.
Record my daily progress when learning os-comp2022-winter
My solutions to some LeetCode problems
To record some notes from reading the leveldb source code
Unify Efficient Fine-Tuning of 100+ LLMs
LLM inference in C/C++
Memkind is an easy-to-use, general-purpose allocator which helps to fully utilize various kinds of memory available in the system, including DRAM, NVDIMM, and HBM
To record my study of MIT 6.824
OpenDAL: Access data freely, painlessly, and efficiently
Keep track of the papers I have read and to be read
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
To record some notes from reading the rocksdb source code
Implement an RPC framework using Golang, just for practice
CLI tool for spawning and running containers according to the OCI specification
My Rust study based on the CS110L course
NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Something interesting about the web. A new door to the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Something interesting about visualization, using data art
Something interesting about games, to make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents code.
China Tencent open source team.