Wei Xiong's Projects
An index of algorithms for offline reinforcement learning (offline-rl)
A curated list of reinforcement learning with human feedback resources (continually updated)
Library of contextual bandits algorithms
This is the code used for the preprint paper "PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction".
Chat language model that can use tools and interpret the results
This is a sub-branch for developing the RAFT algorithm.
One-click fix for images, formulas, and other issues when importing Markdown files into Zhihu.
This is the official implementation for the paper "Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization" in NeurIPS 2021.
This is the multi-armed bandit code used for my undergraduate thesis.
Implementations of state-of-the-art algorithms for the multi-player multi-armed bandit problem.
A pipeline to improve skills of large language models
This is the official implementation for the paper "(Almost) Free Incentivized Exploration from Decentralized Learning Agents" in NeurIPS 2021.
RewardBench: the first evaluation tool for reward models.
Recipes to train reward models for RLHF.
Source for the sample-efficient tabular RL submission to the 2019 NIPS workshop on Biological and Artificial RL
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
A large-scale, fine-grained, diverse preference dataset (and models).
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
My Zhihu content