weixiongust

followers: 105 following: 20 repos: 24 gists: 0

Name: Wei Xiong

Type: User

Bio: Ph.D. Student in computer science at UIUC; machine learning theory and RLHF.

Blog: https://weixiongust.github.io/WeiXiongUST/index.html

Hi there 👋

I am Wei Xiong, currently a first-year Ph.D. student in computer science at UIUC. I work on RLHF for aligning language models.

Previously, I worked on the mathematical foundations of RL, where I was fortunate to collaborate with many great senior mentors and talented peers. I also spent time on deep RL at Microsoft Research Asia.

You can find more information about me at:

Wei Xiong's Projects

awesome-offline-rl

An index of algorithms for offline reinforcement learning (offline-rl)

awesome-rlhf

A curated list of reinforcement learning with human feedback resources (continually updated)

banditlib

Library of contextual bandits algorithms

decentralized-proximal-algorithm-with-variance-reduction

This is the code used for the preprint "PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction".

functionary

Chat language model that can use tools and interpret the results

iterative-rlhf

lmflow_raft_dev

This is a sub-branch for developing the RAFT algorithm.

markdown4zhihu

One-click fix for image and formula problems when importing Markdown files into Zhihu.

math6913u

mpmab_beacon

This is the official implementation for the paper "Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization" in NeurIPS 2021.

multi-armed-bandit-test-framework

This is the multi-armed bandit code used for my undergraduate thesis.

multi_player_multi_armed_bandit_algorithms

Implementation of state-of-the-art multi-player multi-armed bandit problem algorithms.

nemo-skills

A pipeline to improve skills of large language models

observe_then_incentivize

This is the official implementation for the paper "(Almost) Free Incentivized Exploration from Decentralized Learning Agents" in NeurIPS 2021.

online-rlhf

reward-bench

RewardBench: the first evaluation tool for reward models.

rlhf-reward-modeling-dev

Recipes to train reward model for RLHF.

sample-efficient-bayesian-rl

Source for the sample efficient tabular RL submission to the 2019 NIPS workshop on Biological and Artificial RL

tora

ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].

ultrafeedback

A large-scale, fine-grained, diverse preference dataset (and models).

vllm_eval

weixiongust

xwin-lm

Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment

zhihu

My Zhihu content
