firattamur / llmdantic

Structured Output Is All You Need!

License: MIT License

langchain langchain-python llm llms pydantic pydantic-v2

llmdantic's Introduction



LLMdantic is a powerful and efficient Python library that simplifies the integration of Large Language Models (LLMs) into your projects. Built on top of the incredible Langchain package and leveraging the power of Pydantic models, LLMdantic provides a seamless and structured approach to working with LLMs.

Features 🚀

  • 🌐 Wide range of LLM support through Langchain integrations
  • 🛡️ Ensures data integrity with Pydantic models for input and output validation
  • 🧩 Modular and extensible design for easy customization
  • 💰 Cost tracking and optimization for OpenAI models
  • 🚀 Efficient batch processing for handling multiple data points
  • 🔄 Robust retry mechanism for a smooth and uninterrupted experience

Getting Started 🌟

Requirements

Before using LLMdantic, make sure you have set the required API keys for the LLMs you plan to use. For example, if you're using OpenAI's models, set the OPENAI_API_KEY environment variable:

export OPENAI_API_KEY="your-api-key"

If you're using other LLMs, follow the instructions provided by the respective providers in Langchain's documentation.

Installation

pip install llmdantic

Usage

1. Define input and output schemas using Pydantic:

  • Use Pydantic to define input and output models with custom validation rules.

Important

Add docstrings to validation rules to provide prompts for the LLM. This helps the LLM understand the validation rules and produce better results.

from pydantic import BaseModel, field_validator

class SummarizeInput(BaseModel):
    text: str

class SummarizeOutput(BaseModel):
    summary: str

    @field_validator("summary")
    def summary_must_not_be_empty(cls, v: str) -> str:
        """Summary cannot be empty"""  # The docstring doubles as a prompt rule for the LLM.
        if not v.strip():
            raise ValueError("Summary cannot be empty")
        return v

    @field_validator("summary")
    def summary_must_be_short(cls, v: str) -> str:
        """Summary must be less than 100 words"""  # The docstring doubles as a prompt rule for the LLM.
        if len(v.split()) > 100:
            raise ValueError("Summary must be less than 100 words")
        return v
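The docstring-as-prompt idea can be illustrated with plain Python. This is a sketch, not LLMdantic's actual implementation: `SummarizeRules` is a stripped-down stand-in for the Pydantic model, and `collect_rules` is a hypothetical helper.

```python
import inspect

# Hypothetical stand-in for the Pydantic model above: only the
# validator docstrings matter for this illustration.
class SummarizeRules:
    def summary_must_not_be_empty(cls, v):
        """Summary cannot be empty"""
        return v

    def summary_must_be_short(cls, v):
        """Summary must be less than 100 words"""
        return v

def collect_rules(model_cls):
    """Gather validator docstrings and upper-case them as prompt rules."""
    rules = []
    for _name, fn in inspect.getmembers(model_cls, inspect.isfunction):
        doc = inspect.getdoc(fn)
        if doc:
            rules.append(doc.upper())
    return rules

print(collect_rules(SummarizeRules))
# → ['SUMMARY MUST BE LESS THAN 100 WORDS', 'SUMMARY CANNOT BE EMPTY']
```

Note that `inspect.getmembers` sorts members alphabetically, which is why the rules come out in that order here.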

2. Create an LLMdantic client:

  • Provide input and output models, objective, and configuration.

Tip

The objective is a prompt that will be used to generate the actual prompt sent to the LLM. It should be a high-level description of the task you want the LLM to perform.

The inp_schema and out_schema are the input and output models you defined in the previous step.

The retries parameter is the number of times LLMdantic will retry the request in case of failure.

from llmdantic import LLMdantic, LLMdanticConfig  
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text", 
    inp_schema=SummarizeInput,
    out_schema=SummarizeOutput, 
    retries=3,
)

llmdantic = LLMdantic(llm=llm, config=config)
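What retries=3 implies can be pictured as a simple loop. This is a minimal sketch under stated assumptions, not LLMdantic's internal mechanism; `invoke_with_retries` and `flaky_call` are hypothetical names, with a fake flaky call standing in for the LLM.

```python
def invoke_with_retries(call, retries=3):
    """Call `call()` up to `retries` times; return the first success or None."""
    for _attempt in range(retries):
        try:
            return call()
        except ValueError:  # e.g. the output failed a validation rule
            continue
    return None  # all attempts exhausted

# A fake LLM call that fails twice, then succeeds on the third attempt.
attempts = iter([ValueError("empty summary"), ValueError("too long"), "A short summary."])

def flaky_call():
    item = next(attempts)
    if isinstance(item, Exception):
        raise item
    return item

print(invoke_with_retries(flaky_call, retries=3))  # → A short summary.
```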

Here's the prompt template generated based on the input and output models:

Objective: Summarize the text

Input 'SummarizeInput': 
{input}

Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:
SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS

{format_instructions}
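Filling the placeholders is plain string templating. The sketch below mirrors the template above with str.format; `PROMPT_TEMPLATE` and the keyword names are illustrative, not LLMdantic's internals (the real rendering happens inside LLMdantic via Langchain).

```python
# Illustrative template, mirroring the one LLMdantic generates above.
PROMPT_TEMPLATE = """Objective: {objective}

Input '{inp_name}':
{input}

Output '{out_name}''s fields MUST FOLLOW the RULES:
{rules}

{format_instructions}"""

prompt = PROMPT_TEMPLATE.format(
    objective="Summarize the text",
    inp_name="SummarizeInput",
    out_name="SummarizeOutput",
    input={"text": "A long article about natural language processing..."},
    rules=(
        "SummarizeOutput.summary:\n"
        "- SUMMARY CANNOT BE EMPTY\n"
        "- SUMMARY MUST BE LESS THAN 100 WORDS"
    ),
    format_instructions="The output should be formatted as a JSON instance ...",
)
print(prompt)
```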

3. Generate output using the LLMdantic:

Tip

The invoke method is used for single requests, while the batch method is used for batch processing.

The invoke method returns an instance of LLMdanticResult, which contains the generated text, the parsed output, and other useful information such as the cost and the number of input and output tokens. Check out the LLMdanticResult model for more details.

from typing import Optional

from llmdantic import LLMdanticResult

data = SummarizeInput(text="A long article about natural language processing...")
result: LLMdanticResult = llmdantic.invoke(data)

output: Optional[SummarizeOutput] = result.output

if output:
    print(output.summary)

Here's the actual prompt sent to the LLM based on the input data:

Objective: Summarize the text

Input 'SummarizeInput': 
{'text': 'A long article about natural language processing...'}

Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:
SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
{"properties": {"summary": {"title": "Summary", "type": "string"}}, "required": ["summary"]}
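Turning the model's reply back into the schema boils down to JSON decoding plus a required-keys check. A minimal stdlib sketch, where `parse_output` is a hypothetical helper (the real parsing is handled by Langchain's output parser together with Pydantic validation):

```python
import json

# The output schema from the prompt above.
schema = {
    "properties": {"summary": {"title": "Summary", "type": "string"}},
    "required": ["summary"],
}

def parse_output(raw, schema):
    """Decode the LLM's raw text and verify the required keys are present."""
    obj = json.loads(raw)
    missing = [key for key in schema["required"] if key not in obj]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return obj

result = parse_output('{"summary": "NLP condensed into one line."}', schema)
print(result["summary"])  # → NLP condensed into one line.
```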

  • For batch processing, pass a list of input data.

Important

The batch method returns a list of LLMdanticResult instances, each containing the generated text, the parsed output, and other useful information such as the cost and the number of input and output tokens. Check out the LLMdanticResult model for more details.

The concurrency parameter is the number of concurrent requests to be made. Please check the usage limits of the LLM provider before setting this value.

from typing import List

data: List[SummarizeInput] = [
    SummarizeInput(text="A long article about natural language processing..."),
    SummarizeInput(text="A long article about computer vision...")  
]
results: List[LLMdanticResult] = llmdantic.batch(data, concurrency=2)

for result in results:
    if result.output:
        print(result.output.summary)
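The effect of the concurrency parameter can be pictured with a worker pool. A stdlib sketch, where `fake_invoke` is a hypothetical stand-in for the real llmdantic.invoke call (a real call would hit the LLM API):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_invoke(text):
    """Hypothetical stand-in for llmdantic.invoke."""
    return text.upper()

inputs = [
    "A long article about natural language processing...",
    "A long article about computer vision...",
]

# concurrency=2 bounds the number of in-flight requests,
# much like max_workers bounds this pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_invoke, inputs))

print(len(results))  # → 2
```

pool.map preserves input order, so each result lines up with its input even when requests finish out of order.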

4. Monitor usage and costs:

Important

The cost tracking feature is currently available for OpenAI models only.

The usage attribute returns an instance of LLMdanticUsage, which contains the number of input and output tokens, successful requests, cost, and successful outputs. Check out the LLMdanticUsage model for more details.

Please note that the usage is tracked for the entire lifetime of the LLMdantic instance.

  • Use the cost attribute of the LLMdanticResult to track the cost of the request (currently available for OpenAI models).

  • Use the usage attribute of the LLMdantic to track the usage stats overall.

from llmdantic import LLMdanticResult

data: SummarizeInput = SummarizeInput(text="A long article about natural language processing...")  
result: LLMdanticResult = llmdantic.invoke(data)

if result.output:
    print(result.output.summary)

# Track the cost of the request (OpenAI models only)
print(f"Cost: {result.cost}")  

# Track the usage stats
print(f"Usage: {llmdantic.usage}")

Cost: 0.0003665
Usage: LLMdanticUsage(
  inp_tokens=219,
  out_tokens=19,
  total_tokens=238,
  successful_requests=1,
  cost=0.000367,
  successful_outputs=1
)
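The cost figure above is plain token arithmetic. The sketch below reproduces it assuming gpt-3.5-turbo pricing at the time ($0.0015 per 1K input tokens, $0.002 per 1K output tokens); the prices are an assumption for illustration, not values read from LLMdantic.

```python
inp_tokens, out_tokens = 219, 19  # token counts from the usage report above

# Assumed per-1K-token prices (gpt-3.5-turbo era);
# check your provider's current rates.
PRICE_IN_PER_1K = 0.0015
PRICE_OUT_PER_1K = 0.002

cost = inp_tokens * PRICE_IN_PER_1K / 1000 + out_tokens * PRICE_OUT_PER_1K / 1000
print(f"Cost: {cost:.7f}")  # → Cost: 0.0003665
```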

Advanced Usage 🛠

LLMdantic is built on top of the langchain package, which provides a modular and extensible framework for working with LLMs. You can easily switch between different LLMs and customize your experience.

Switching LLMs

Important

Make sure to set the required API keys for the new LLM you plan to use.

The llm parameter of the LLMdantic class should be an instance of BaseLanguageModel from the langchain package.

Tip

You can use the langchain_community package to access a wide range of LLMs from different providers.

You may need to provide model_name, api_key, and other parameters based on the LLM you want to use. Check out the documentation of the respective LLM provider for more details.

from llmdantic import LLMdantic, LLMdanticConfig
from langchain_community.llms.ollama import Ollama
from langchain_core.language_models import BaseLanguageModel

llm: BaseLanguageModel = Ollama()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text",
    inp_schema=SummarizeInput, 
    out_schema=SummarizeOutput,
    retries=3,
)

llmdantic = LLMdantic(
    llm=llm,
    config=config
)

Contributing 🤝

Contributions are welcome! Whether you're fixing bugs, adding new features, or improving documentation, your help makes LLMdantic better for everyone. Feel free to open an issue or submit a pull request.

License 📄

LLMdantic is released under the MIT License. Feel free to use it, contribute, and spread the word!

llmdantic's People

Contributors

firattamur

Forkers

hbcbh1999

llmdantic's Issues

[FEAT] Improve README.

  • Add references to langchain and pydantic.
  • Add prompt template and example prompts generated with input.
  • Add use cases such as Translation, Summarization, etc.
  • Add a flow diagram of how the package works.

[BUG] Output field is None while text contains the filled dict

Description

Hey, this seems like a fun library. I tried an example similar to the one in the README (it would be nice to have a script/notebook to try the library out). I used AzureChatOpenAI as the LLM.

The result (LLMdanticResult) has its output field set to None, while its text field contains the filled dict:
text: {"rewritten": "Miód to ulubiony przysmak Kubusia Puchatka."} (Honey is Winnie the Pooh's favorite treat.)
output: rewritten=None

To Reproduce
Steps to reproduce the behavior:

# setup
llm = AzureChatOpenAI(
    api_key=AZURE_API_KEY,
    api_version=AZURE_API_VERSION,
    azure_endpoint=RESOURCE_ENDPOINT,
    azure_deployment=AZURE_DEPLOY,
    temperature=0,
    verbose=True,
)
class RewriteInput(BaseModel):
    text: str


class RewriteOutput(BaseModel):
    rewritten: str

    @field_validator("rewritten")
    def rewritten_text_must_be_in_polish(cls, v) -> bool:
        """ Rewritten text must be in polish language """
        ...

    @field_validator("rewritten")
    def rewritten_text_must_not_be_empty(cls, v) -> bool:
        """Rewritten text cannot be empty """
        ...

    @field_validator("rewritten")
    def rewritten_text_must_be_short(cls, v) -> bool:
        """Rewritten must be less than 100 words """
        ...

config: LLMdanticConfig = LLMdanticConfig(
    objective="Rewrite the polish text and convey the same meaning but with different phrasing and structure to ensure it is fresh and engaging.",
    inp_schema=RewriteInput,
    out_schema=RewriteOutput, 
    retry=3,
    verbose=True,
)

llmdantic = LLMdantic(llm=llm, config=config)

# polish for Winnie the Pooh likes honey
data = RewriteInput(text="Kubuś Puchatek lubi miód.")
result = llmdantic.invoke(data)
print(result.text)
# output: {"rewritten": "Miód to ulubiony przysmak Kubusia Puchatka."} - Honey is Winnie the Pooh's favorite treat.
print(result.output)
# output: rewritten=None

Expected behavior
The output should have its fields filled from the LLM response.

[FEAT] Integration with Autogen and CrewAI for structuring the outputs of agents

Is your feature request related to a problem? Please describe.
The tool is fantastic but it would be great to integrate its functionality with Agent frameworks like CrewAI or Autogen to control the outputs of the agents within the framework. This could potentially be game changing.

Describe the solution you'd like
Autogen and CrewAI allow you to create tools specific to an agent. We should be able to easily create an agent tool that checks outputs and ensures the agent completes its tasks consistently.

Describe alternatives you've considered
Currently my attempts are to use pydantic structures to achieve this but it doesn't seem to work well.
