e-xperiments / datawarden

This repository is dedicated to providing cutting-edge tools and methodologies to evaluate and curate datasets specifically designed for Large Language Models (LLMs). Leveraging the capabilities of LLMs themselves, combined with programmatic best practices, our toolkit ensures a robust evaluation and refinement process for your datasets.

License: Apache License 2.0

Python 100.00%

datawarden's People

Contributors

tokenbender, pulkitmishra

Stargazers

Pushkar Patel, Pratik Desai

datawarden's Issues

QA Pair Interruption Detection

Avoiding Abrupt Interruptions in QA Pairs

Develop a mechanism to detect and flag QA pairs with abrupt interruptions. This will improve the quality of QA pairs by ensuring they are coherent and contextually complete.
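
One possible starting point is a purely heuristic check, sketched below. It is not the project's implementation: the function names, the `answer` key, and the punctuation rule are all assumptions made for illustration. It flags an answer as interrupted when it is empty, leaves a fenced code block unclosed, or stops without terminal punctuation.

```python
# Hypothetical sketch: heuristic detection of answers that stop abruptly.
# The record layout (dicts with an "answer" key) is an assumption.

FENCE = "`" * 3  # Markdown code-fence marker
TERMINAL_PUNCTUATION = (".", "!", "?", '"', "'", ")")


def is_interrupted(answer: str) -> bool:
    """Return True if the answer looks cut off mid-thought."""
    text = answer.rstrip()
    if not text:
        return True
    # An odd number of fence markers means a code block was never closed.
    if text.count(FENCE) % 2 == 1:
        return True
    # An answer that ends with a properly closed code block is complete.
    if text.endswith(FENCE):
        return False
    return not text.endswith(TERMINAL_PUNCTUATION)


def flag_interrupted(pairs):
    """Return the indices of QA pairs whose answer looks interrupted."""
    return [i for i, pair in enumerate(pairs) if is_interrupted(pair.get("answer", ""))]
```

A rule like this will misjudge some answers (for example, ones that legitimately end in a list item), so it is probably best used to shortlist candidates for review, possibly by an LLM, rather than to drop records outright.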

Code Snippet Syntax Validation

Create a validator to check the syntactical correctness of code snippets in answer fields, ensuring that they are valid code.
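
As a rough sketch of what such a validator could look like for Python snippets only, the example below pulls fenced code blocks out of an answer and parses them with the standard-library `ast` module. The function names and the assumption that snippets are wrapped in Markdown fences tagged `python` or `py` are illustrative, not taken from the repository.

```python
# Hypothetical sketch: syntax-check Python snippets found in an answer field.
import ast
import re

FENCE = "`" * 3  # Markdown code-fence marker
# Captures the optional language tag and the body of each fenced block.
CODE_BLOCK = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(answer: str):
    """Return (language, code) tuples for every fenced block in the answer."""
    return [(lang.lower(), code) for lang, code in CODE_BLOCK.findall(answer)]


def python_syntax_error(code: str):
    """Return a short error message, or None if the code parses cleanly."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def invalid_python_snippets(answer: str):
    """Return the syntax errors found in blocks tagged as Python."""
    errors = []
    for lang, code in extract_code_blocks(answer):
        if lang in ("python", "py"):
            error = python_syntax_error(code)
            if error is not None:
                errors.append(error)
    return errors
```

Parsing only confirms that a snippet is syntactically valid for the interpreter running the check; it does not execute the code or verify that it does what the answer claims. Other languages would need their own parsers.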

Standardize QA Pair Format

Define and implement a consistent format for QA pairs to improve dataset readability and maintainability.
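
The issue does not specify the target format, so the following is only one way it might look: a small dataclass with `question` and `answer` fields, plus a normalizer that maps a few commonly seen key names onto it. The alias lists are assumptions and would need to match the datasets actually being curated.

```python
# Hypothetical sketch: normalize heterogeneous records into one QA format.
from dataclasses import asdict, dataclass

# Assumed aliases for each field; adjust to the source datasets.
QUESTION_KEYS = ("question", "instruction", "prompt")
ANSWER_KEYS = ("answer", "output", "response")


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def _first_present(record: dict, keys) -> str:
    """Return the first non-empty string value found under any of the keys."""
    for key in keys:
        value = record.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    raise ValueError(f"record has none of the keys {keys}")


def normalize(record: dict) -> QAPair:
    """Convert a raw record into the standard QAPair form."""
    return QAPair(
        question=_first_present(record, QUESTION_KEYS),
        answer=_first_present(record, ANSWER_KEYS),
    )
```

Calling `asdict(normalize(record))` yields a plain dictionary with exactly the two standard keys, which is convenient for writing the cleaned dataset back out as JSON lines.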

Set Up CI/CD to Package and Distribute as a PyPI Package

Set up a setup.py file with package metadata.
Create a requirements.txt file for dependencies.
Prepare documentation on how to install and use the library.
Test the package installation process locally.
Create a GitHub release for the first version.
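
The first two steps could be covered by a `setup.py` along the lines of the sketch below. The version number and description string are placeholders, and the helper name `parse_requirements` is made up for this example; only the package name and the Apache 2.0 license come from the repository page.

```python
# Hypothetical setup.py sketch; metadata values marked as placeholders
# are assumptions, not the project's real settings.
from pathlib import Path


def parse_requirements(text: str):
    """Return dependency specifiers, skipping blank lines and comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


METADATA = {
    "name": "datawarden",
    "version": "0.1.0",  # placeholder version
    "description": "Tools to evaluate and curate datasets for LLMs",  # placeholder
    "license": "Apache-2.0",
}

if __name__ == "__main__":
    from setuptools import find_packages, setup

    requirements_file = Path(__file__).parent / "requirements.txt"
    requirements = (
        parse_requirements(requirements_file.read_text())
        if requirements_file.exists()
        else []
    )
    setup(packages=find_packages(), install_requires=requirements, **METADATA)
```

Reading `requirements.txt` from `setup.py` keeps the dependency list in one place, at the cost of not supporting pip-only lines such as `-r other.txt`; those would need to be filtered out or declared separately.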

Register the package on PyPI
Configure PyPI credentials for automated uploads.
Create a workflow to automate the deployment process.
Publish the library on PyPI using the workflow.
Verify that the library is accessible via pip install.

Choose a CI/CD service (e.g., GitHub Actions).
Create a CI pipeline that runs tests on every push.
Configure code quality checks (e.g., linting, code formatting).
Create a CD pipeline to deploy new releases automatically.
Ensure CD pipeline publishes releases to PyPI.

QA Pair Relevance to Coding Tasks

Implement a filter to ensure that all questions in the dataset are related to coding or programming tasks, making the dataset highly relevant to the fine-tuning task.
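
The issue leaves the filtering method open. As a minimal baseline, the sketch below uses a keyword heuristic rather than an LLM-based classifier; the term list, function names, and the `question` key are assumptions chosen for illustration.

```python
# Hypothetical sketch: keyword-based relevance filter for coding questions.
import re

# Assumed vocabulary; a real filter would need a much broader list.
CODING_TERMS = {
    "python", "java", "javascript", "c++", "c#", "sql", "code", "function",
    "class", "variable", "loop", "array", "string", "compile", "debug",
    "algorithm", "api", "regex", "exception", "recursion",
}


def is_coding_question(question: str, min_hits: int = 1) -> bool:
    """Return True if the question mentions at least min_hits coding terms."""
    tokens = set(re.findall(r"[a-z+#]+", question.lower()))
    return len(tokens & CODING_TERMS) >= min_hits


def filter_coding_pairs(pairs, min_hits: int = 1):
    """Keep only the QA pairs whose question looks coding-related."""
    return [p for p in pairs if is_coding_question(p.get("question", ""), min_hits)]
```

Keyword matching is cheap but blunt: it misses coding questions phrased without any listed term and can admit off-topic ones that happen to mention a word like "class". It is likely more useful as a first pass before a stronger classifier than as the final filter.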
