yiakwy-xpu-ml-framework-team's Projects
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
HIP: C++ Heterogeneous-Compute Interface for Portability
ROCm BLAS marshalling library
AMD's Machine Intelligence Library
Dockerfiles for the various software layers defined in the ROCm software platform
AMD ROCm™ Software - GitHub Home
TensorFlow ROCm port
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
A library for efficient similarity search and clustering of dense vectors.
Fast Segment Anything
A distributed deep learning framework.
Gaussian Belief Propagation for Bundle adjustment and pose graph estimation.
TensorFlow for the IPU
Model parallel transformers in JAX and Haiku
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)
GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing those programs on GroqChip™ processors.
Useful tutorials and recipes for developers doing low-level work with the Graphcore IPU
Best practices for HPC with an IPU backend for scientific/AI (deep learning framework) algorithm and software development
DOOM (1993) on IPU 👿
IPU programming in Julia
Experimental JAX for Graphcore IPUs
NVIDIA NCCL Tests for Distributed Training
libavif - Library for encoding and decoding .avif files
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
Python bindings for llama.cpp
Port of Facebook's LLaMA model in C/C++