e-xperiments / datawarden

This repository provides tools and methodologies for evaluating and curating datasets intended for Large Language Models (LLMs). The toolkit combines LLM-based evaluation with programmatic checks to assess and refine your datasets.

License: Apache License 2.0

Python 100.00%

datawarden's Introduction

datawarden

datawarden's People

Contributors

Stargazers

Forkers

pulkitmishra joey00072

datawarden's Issues

QA Pair Interruption Detection

Avoiding Abrupt Interruptions in QA Pairs

Develop a mechanism to detect and flag QA pairs with abrupt interruptions. This will improve the quality of QA pairs by ensuring they are coherent and contextually complete.
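One possible starting point for such a mechanism is a purely textual heuristic. The sketch below is an assumption, not code from this repository: the function name looks_interrupted and the punctuation set are hypothetical. It flags answers that are empty, leave a code fence unclosed, or stop without sentence-ending punctuation.

```python
# Hypothetical heuristic; the names and rules here are illustrative assumptions.
FENCE = "`" * 3  # Markdown code fence marker
TERMINAL_PUNCTUATION = (".", "!", "?", ")", "]", '"', "'")


def looks_interrupted(answer: str) -> bool:
    """Return True when an answer appears to stop mid-thought."""
    text = answer.strip()
    if not text:
        return True
    # An odd number of fence markers means a code block was never closed.
    if text.count(FENCE) % 2 == 1:
        return True
    # An answer that ends with a closed code block is treated as complete.
    if text.endswith(FENCE):
        return False
    # Otherwise require sentence-ending punctuation at the end.
    return not text.endswith(TERMINAL_PUNCTUATION)
```

A heuristic like this will produce false positives (for example, answers that end in a bare list item), so flagged pairs are probably best routed to a review step or an LLM-based check rather than dropped outright.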

Code Snippet Syntax Validation

Create a validator that checks code snippets in answer fields for syntactic correctness.
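For Python snippets, a minimal validator can rely on the standard library's ast.parse, which raises SyntaxError on invalid source. The sketch below assumes the snippet has already been extracted from the answer and is Python; the function name python_syntax_error is hypothetical, and other languages would need their own parsers.

```python
import ast
from typing import Optional


def python_syntax_error(snippet: str) -> Optional[str]:
    """Return a short error description, or None if the snippet parses."""
    try:
        # ast.parse only checks syntax; it does not execute the snippet.
        ast.parse(snippet)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None
```

Because the snippet is parsed rather than executed, this check is safe to run on untrusted dataset content, but it cannot catch runtime errors such as undefined names.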

Code Snippet Best Practices and Updates

Establish guidelines for code snippets to follow best practices and stay up-to-date with industry standards.

Implement Descriptive Answer Validation

Create a validation process to ensure that answers provided in the QA pairs are descriptive and provide comprehensive information.
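A cheap programmatic pre-filter could run before any LLM-based judgment. The sketch below is an assumption about what "descriptive" might mean in practice: the function name is_descriptive and both thresholds are hypothetical and would need tuning against real data. It rejects answers that are very short or that mostly repeat the same words.

```python
def is_descriptive(
    answer: str,
    min_words: int = 15,            # assumed threshold, not from the project
    min_unique_ratio: float = 0.5,  # assumed threshold, not from the project
) -> bool:
    """Heuristically decide whether an answer is more than a terse reply."""
    # Normalize words so that "File" and "file." count as the same token.
    words = [w.lower().strip(".,;:!?") for w in answer.split()]
    words = [w for w in words if w]
    if len(words) < min_words:
        return False
    # A low ratio of distinct words suggests padding or repetition.
    return len(set(words)) / len(words) >= min_unique_ratio
```

Word counts say nothing about correctness or relevance, so this only screens out obviously thin answers; judging comprehensiveness would still need a stronger check.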

Standardize QA Pair Format

Define and implement a consistent format for QA pairs to improve dataset readability and maintainability.
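As an illustration of what a consistent format might look like, the sketch below maps records with differing key names onto a single QAPair type. The class name, the function name normalize_qa, and the accepted key aliases ("q", "instruction", "a", "output") are all assumptions chosen for the example, not a format defined by the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """Hypothetical canonical form of one question/answer record."""
    question: str
    answer: str


def normalize_qa(record: dict) -> QAPair:
    """Map a raw record with assumed key aliases onto QAPair."""
    question = record.get("question") or record.get("q") or record.get("instruction") or ""
    answer = record.get("answer") or record.get("a") or record.get("output") or ""
    question, answer = str(question).strip(), str(answer).strip()
    # Reject records that cannot form a complete pair.
    if not question or not answer:
        raise ValueError("QA pair needs a non-empty question and answer")
    return QAPair(question, answer)
```

For example, normalize_qa({"q": " What is X? ", "a": "X is Y."}) yields a QAPair with whitespace trimmed from both fields, while a record missing its answer raises ValueError.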

Set Up CI/CD to Package and Distribute as a PyPI Package

Set up a setup.py file with package metadata.
Create a requirements.txt file for dependencies.
Prepare documentation on how to install and use the library.
Test the package installation process locally.
Create a GitHub release for the first version.

Register the package on PyPI
Configure PyPI credentials for automated uploads.
Create a workflow to automate the deployment process.
Publish the library on PyPI using the workflow.
Verify that the library is accessible via pip install.

Choose a CI/CD service, such as GitHub Actions.
Create a CI pipeline that runs tests on every push.
Configure code quality checks (e.g., linting, code formatting).
Create a CD pipeline to deploy new releases automatically.
Ensure CD pipeline publishes releases to PyPI.
