Topic: rlhf (Goto Github)
Something interesting about RLHF.
rlhf,Rewarded soups official implementation
User: alexrame
rlhf,Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction
User: aligner2024
Home Page: https://aligner2024.github.io/
rlhf,RewardBench: the first evaluation tool for reward models.
Organization: allenai
Home Page: https://huggingface.co/spaces/allenai/reward-bench
rlhf,Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.
Organization: argilla-io
Home Page: https://docs.argilla.io
rlhf,Distilabel is a framework for synthetic data and AI feedback, for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
Organization: argilla-io
Home Page: https://distilabel.argilla.io
rlhf,pykoi: Active learning in one unified interface
User: cambioml
Home Page: https://www.cambioml.com
rlhf,Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)
Organization: cogment
Home Page: https://cogment.ai/cogment_verse
rlhf,A library with extensible implementations of DPO, KTO, PPO, and other human-aware loss functions (HALOs).
Organization: contextualai
Home Page: https://arxiv.org/abs/2402.01306
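The human-aware losses this library covers share a simple pairwise core. As an illustration (a sketch of the published DPO objective, not this repository's code), the per-pair DPO loss can be written as:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the trained policy or the frozen reference model; beta scales the
    implicit KL penalty. Sketch only; the library's API differs.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)): small when the policy prefers the
    # chosen response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

KTO and other HALO variants replace the pairwise sigmoid term with different value functions over the same policy/reference log-ratio.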
rlhf,Preference Transformer: Modeling Human Preferences using Transformers for RL (accepted at ICLR 2023)
User: csmile-1006
Home Page: https://sites.google.com/view/preference-transformer
rlhf,A Doctor for your data
Organization: docta-ai
rlhf,Aligning Large Language Models with Human: A Survey
User: garyyufei
Home Page: https://arxiv.org/abs/2307.12966
rlhf,A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
User: glgh
rlhf,Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
User: hiyouga
rlhf,Unify Efficient Fine-tuning of 100+ LLMs
User: hiyouga
rlhf,Robust recipes to align language models with human and AI preferences
Organization: huggingface
Home Page: https://huggingface.co/HuggingFaceH4
rlhf,Official release of InternLM2 7B and 20B base and chat models, with 200K context support
Organization: internlm
Home Page: https://internlm.intern-ai.org.cn/
rlhf,A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
User: jackaduma
rlhf,A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
User: jackaduma
rlhf,A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
User: jackaduma
rlhf,聚宝盆 (Cornucopia): a series of open-source, commercially usable Chinese financial LLMs, with an efficient LLM training framework for the finance vertical (pretraining, SFT, RLHF, quantization, etc.)
User: jerry1993-tech
Home Page: https://zhuanlan.zhihu.com/p/633736418
rlhf,The open-source implementation of ChatGPT, Alpaca, Vicuna, and an RLHF pipeline; building a ChatGPT from scratch.
User: jianzhnie
Home Page: https://jianzhnie.github.io/machine-learning-wiki/#/deep-rl/papers/RLHF
rlhf,LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
User: joyce94
rlhf,Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
User: l294265421
Home Page: https://88aeeb3aef5040507e.gradio.live/
rlhf,Reproduce alpaca
User: l294265421
Home Page: https://88aeeb3aef5040507e.gradio.live/
rlhf,OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Organization: laion-ai
Home Page: https://open-assistant.io
rlhf,Chain-of-Hindsight: a scalable RLHF method
User: lhao499
rlhf,Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
User: liziniu
rlhf,Python client library for managing your LLM data in one place
Organization: log10-io
Home Page: https://log10.io
rlhf,MindSpore online courses: Step into LLM
Organization: mindspore-courses
rlhf,Directly using RLHF to raise or lower the probability of target outputs from ChatGLM | Modify ChatGLM output with only RLHF
User: miraclemarvel55
rlhf,Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
User: nlp-uoregon
rlhf,A curated list of reinforcement learning with human feedback resources (continually updated)
Organization: opendilab
rlhf,Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Organization: opening-up-chatgpt
Home Page: https://opening-up-chatgpt.github.io/
rlhf,BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Organization: pku-alignment
Home Page: https://sites.google.com/view/pku-beavertails
rlhf,Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Organization: pku-alignment
Home Page: https://pku-beaver.github.io
rlhf,The official GitHub page for the survey paper "A Survey of Large Language Models".
Organization: rucaibox
Home Page: https://arxiv.org/abs/2303.18223
rlhf,Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
Organization: snu-mllab
rlhf,An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Organization: tatsu-lab
Home Page: https://tatsu-lab.github.io/alpaca_eval/
rlhf,[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Organization: thudm
rlhf,Code accompanying the paper Pretraining Language Models with Human Preferences
User: tomekkorbak
Home Page: https://arxiv.org/abs/2302.08582
rlhf,ZYN: Zero-Shot Reward Models with Yes-No Questions
User: vicgalle
rlhf,Implementation of ChatGPT-style RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
User: voidful
rlhf,🛰️ Fine-tuning ChatGLM with LoRA, P-Tuning V2, Freeze, RLHF, etc. on real medical dialogue data; our ambitions go beyond medical Q&A
User: wangrongsheng
Home Page: https://www.wangrs.co/MedQA-ChatGLM/
rlhf,A recipe to train reward models for RLHF.
User: weixiongust
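Most reward-model recipes for RLHF train on preference pairs with a Bradley-Terry objective. As a sketch of that standard objective (not this repository's code):

```python
import math

def reward_model_loss(pairs):
    """Bradley-Terry negative log-likelihood, averaged over preference pairs.

    `pairs` is a list of (r_chosen, r_rejected) scalar rewards emitted by
    the reward model for the preferred and dispreferred responses; the
    loss pushes r_chosen above r_rejected. Sketch only.
    """
    total = 0.0
    for r_chosen, r_rejected in pairs:
        # -log sigmoid(r_chosen - r_rejected), written in a numerically direct form
        total += math.log(1.0 + math.exp(-(r_chosen - r_rejected)))
    return total / len(pairs)
```

The trained scalar reward then serves as the optimization target for the RL stage (PPO or a variant).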
rlhf,Implementation of Reinforcement Learning from Human Feedback (RLHF)
User: xrsrke
Home Page: https://xrsrke.github.io/instructGOOSE/
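RLHF implementations like this typically optimize the policy with PPO's clipped surrogate objective against the reward model. As an illustration of that core objective for a single token or action (a sketch of standard PPO, not this repository's code):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective for one action (to be maximized).

    `logp_new` / `logp_old` are the action's log-probabilities under the
    current and rollout-time policies; `advantage` is the estimated
    advantage (in RLHF, derived from reward-model scores minus a KL
    penalty against the reference model). Sketch only.
    """
    ratio = math.exp(logp_new - logp_old)
    # Clipping the ratio keeps the updated policy close to the rollout policy.
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum of the unclipped and clipped terms removes any incentive to move the probability ratio outside the `[1 - eps, 1 + eps]` band.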
rlhf,Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
Organization: xtreme1-io
Home Page: https://www.basic.ai
rlhf,Chinese LLaMA-2 & Alpaca-2 LLMs, phase-two project, with 64K long-context models
User: ymcui