firattamur / llmdantic

Structured Output Is All You Need!

License: MIT License

langchain langchain-python llm llms pydantic pydantic-v2

llmdantic's Introduction



LLMdantic is a powerful and efficient Python library that simplifies the integration of Large Language Models (LLMs) into your projects. Built on top of the incredible Langchain package and leveraging the power of Pydantic models, LLMdantic provides a seamless and structured approach to working with LLMs.

Features 🚀

  • 🌐 Wide range of LLM support through Langchain integrations
  • 🛡️ Ensures data integrity with Pydantic models for input and output validation
  • 🧩 Modular and extensible design for easy customization
  • 💰 Cost tracking and optimization for OpenAI models
  • 🚀 Efficient batch processing for handling multiple data points
  • 🔄 Robust retry mechanism for a smooth and uninterrupted experience

Getting Started 🌟

Requirements

Before using LLMdantic, make sure you have set the required API keys for the LLMs you plan to use. For example, if you're using OpenAI's models, set the OPENAI_API_KEY environment variable:

export OPENAI_API_KEY="your-api-key"

If you're using other LLMs, follow the instructions provided by the respective providers in Langchain's documentation.

Installation

pip install llmdantic

Usage

1. Define input and output schemas using Pydantic:

  • Use Pydantic to define input and output models with custom validation rules.

Important

Add docstrings to validation rules to provide prompts for the LLM. This helps the LLM understand the validation rules and produce better results.

from pydantic import BaseModel, field_validator

class SummarizeInput(BaseModel):
    text: str

class SummarizeOutput(BaseModel):
    summary: str

    @field_validator("summary")
    def summary_must_not_be_empty(cls, v: str) -> str:
        """Summary cannot be empty"""  # The docstring doubles as a prompt rule for the LLM.
        if not v.strip():
            raise ValueError("Summary cannot be empty")
        return v

    @field_validator("summary")
    def summary_must_be_short(cls, v: str) -> str:
        """Summary must be less than 100 words"""  # The docstring doubles as a prompt rule for the LLM.
        if len(v.split()) > 100:
            raise ValueError("Summary must be less than 100 words")
        return v
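The docstring-as-prompt idea can be illustrated with plain Python. This is a sketch, not LLMdantic's actual implementation: `SummarizeRules` is a stripped-down stand-in for the Pydantic model, and `collect_rules` is a hypothetical helper.

```python
import inspect

# Hypothetical stand-in for the Pydantic model above: only the
# validator docstrings matter for this illustration.
class SummarizeRules:
    def summary_must_not_be_empty(cls, v):
        """Summary cannot be empty"""
        return v

    def summary_must_be_short(cls, v):
        """Summary must be less than 100 words"""
        return v

def collect_rules(model_cls):
    """Gather validator docstrings and upper-case them as prompt rules."""
    rules = []
    for _name, fn in inspect.getmembers(model_cls, inspect.isfunction):
        doc = inspect.getdoc(fn)
        if doc:
            rules.append(doc.upper())
    return rules

print(collect_rules(SummarizeRules))
# → ['SUMMARY MUST BE LESS THAN 100 WORDS', 'SUMMARY CANNOT BE EMPTY']
```

Note that `inspect.getmembers` sorts members alphabetically, which is why the rules come out in that order here.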

2. Create an LLMdantic client:

  • Provide input and output models, objective, and configuration.

Tip

The objective is a prompt that will be used to generate the actual prompt sent to the LLM. It should be a high-level description of the task you want the LLM to perform.

The inp_schema and out_schema are the input and output models you defined in the previous step.

The retries parameter is the number of times LLMdantic will retry the request in case of failure.

from llmdantic import LLMdantic, LLMdanticConfig  
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text", 
    inp_schema=SummarizeInput,
    out_schema=SummarizeOutput, 
    retries=3,
)

llmdantic = LLMdantic(llm=llm, config=config)
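What retries=3 implies can be pictured as a simple loop. This is a minimal sketch under stated assumptions, not LLMdantic's internal mechanism; `invoke_with_retries` and `flaky_call` are hypothetical names, with a fake flaky call standing in for the LLM.

```python
def invoke_with_retries(call, retries=3):
    """Call `call()` up to `retries` times; return the first success or None."""
    for _attempt in range(retries):
        try:
            return call()
        except ValueError:  # e.g. the output failed a validation rule
            continue
    return None  # all attempts exhausted

# A fake LLM call that fails twice, then succeeds on the third attempt.
attempts = iter([ValueError("empty summary"), ValueError("too long"), "A short summary."])

def flaky_call():
    item = next(attempts)
    if isinstance(item, Exception):
        raise item
    return item

print(invoke_with_retries(flaky_call, retries=3))  # → A short summary.
```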

Here's the prompt template generated based on the input and output models:

Objective: Summarize the text

Input 'SummarizeInput': 
{input}

Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:
SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS

{format_instructions}
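Filling the placeholders is plain string templating. The sketch below mirrors the template above with str.format; `PROMPT_TEMPLATE` and the keyword names are illustrative, not LLMdantic's internals (the real rendering happens inside LLMdantic via Langchain).

```python
# Illustrative template, mirroring the one LLMdantic generates above.
PROMPT_TEMPLATE = """Objective: {objective}

Input '{inp_name}':
{input}

Output '{out_name}''s fields MUST FOLLOW the RULES:
{rules}

{format_instructions}"""

prompt = PROMPT_TEMPLATE.format(
    objective="Summarize the text",
    inp_name="SummarizeInput",
    out_name="SummarizeOutput",
    input={"text": "A long article about natural language processing..."},
    rules=(
        "SummarizeOutput.summary:\n"
        "- SUMMARY CANNOT BE EMPTY\n"
        "- SUMMARY MUST BE LESS THAN 100 WORDS"
    ),
    format_instructions="The output should be formatted as a JSON instance ...",
)
print(prompt)
```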

3. Generate output using the LLMdantic:

Tip

The invoke method is used for single requests, while the batch method is used for batch processing.

The invoke method returns an instance of LLMdanticResult, which contains the generated text, the parsed output, and other useful information such as the cost and the number of input and output tokens. Check out the LLMdanticResult model for more details.

from typing import Optional

from llmdantic import LLMdanticResult

data = SummarizeInput(text="A long article about natural language processing...")
result: LLMdanticResult = llmdantic.invoke(data)

output: Optional[SummarizeOutput] = result.output

if output:
    print(output.summary)

Here's the actual prompt sent to the LLM based on the input data:

Objective: Summarize the text

Input 'SummarizeInput': 
{'text': 'A long article about natural language processing...'}

Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:
SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
{"properties": {"summary": {"title": "Summary", "type": "string"}}, "required": ["summary"]}
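Turning the model's reply back into the schema boils down to JSON decoding plus a required-keys check. A minimal stdlib sketch, where `parse_output` is a hypothetical helper (the real parsing is handled by Langchain's output parser together with Pydantic validation):

```python
import json

# The output schema from the prompt above.
schema = {
    "properties": {"summary": {"title": "Summary", "type": "string"}},
    "required": ["summary"],
}

def parse_output(raw, schema):
    """Decode the LLM's raw text and verify the required keys are present."""
    obj = json.loads(raw)
    missing = [key for key in schema["required"] if key not in obj]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return obj

result = parse_output('{"summary": "NLP condensed into one line."}', schema)
print(result["summary"])  # → NLP condensed into one line.
```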

  • For batch processing, pass a list of input data.

Important

The batch method returns a list of LLMdanticResult instances, each containing the generated text, the parsed output, and other useful information such as the cost and the number of input and output tokens. Check out the LLMdanticResult model for more details.

The concurrency parameter is the number of concurrent requests to be made. Please check the usage limits of the LLM provider before setting this value.

from typing import List

data: List[SummarizeInput] = [
    SummarizeInput(text="A long article about natural language processing..."),
    SummarizeInput(text="A long article about computer vision...")  
]
results: List[LLMdanticResult] = llmdantic.batch(data, concurrency=2)

for result in results:
    if result.output:
        print(result.output.summary)
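The effect of the concurrency parameter can be pictured with a worker pool. A stdlib sketch, where `fake_invoke` is a hypothetical stand-in for the real llmdantic.invoke call (a real call would hit the LLM API):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_invoke(text):
    """Hypothetical stand-in for llmdantic.invoke."""
    return text.upper()

inputs = [
    "A long article about natural language processing...",
    "A long article about computer vision...",
]

# concurrency=2 bounds the number of in-flight requests,
# much like max_workers bounds this pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_invoke, inputs))

print(len(results))  # → 2
```

pool.map preserves input order, so each result lines up with its input even when requests finish out of order.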

4. Monitor usage and costs:

Important

The cost tracking feature is currently available for OpenAI models only.

The usage attribute returns an instance of LLMdanticUsage, which contains the number of input and output tokens, successful requests, cost, and successful outputs. Check out the LLMdanticUsage model for more details.

Please note that the usage is tracked for the entire lifetime of the LLMdantic instance.

  • Use the cost attribute of the LLMdanticResult to track the cost of the request (currently available for OpenAI models).

  • Use the usage attribute of the LLMdantic to track the usage stats overall.

from llmdantic import LLMdanticResult

data: SummarizeInput = SummarizeInput(text="A long article about natural language processing...")  
result: LLMdanticResult = llmdantic.invoke(data)

if result.output:
    print(result.output.summary)

# Track the cost of the request (OpenAI models only)
print(f"Cost: {result.cost}")  

# Track the usage stats
print(f"Usage: {llmdantic.usage}")

Cost: 0.0003665
Usage: LLMdanticUsage(
  inp_tokens=219,
  out_tokens=19,
  total_tokens=238,
  successful_requests=1,
  cost=0.000367,
  successful_outputs=1
)
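The cost figure above is plain token arithmetic. The sketch below reproduces it assuming gpt-3.5-turbo pricing at the time ($0.0015 per 1K input tokens, $0.002 per 1K output tokens); the prices are an assumption for illustration, not values read from LLMdantic.

```python
inp_tokens, out_tokens = 219, 19  # token counts from the usage report above

# Assumed per-1K-token prices (gpt-3.5-turbo era);
# check your provider's current rates.
PRICE_IN_PER_1K = 0.0015
PRICE_OUT_PER_1K = 0.002

cost = inp_tokens * PRICE_IN_PER_1K / 1000 + out_tokens * PRICE_OUT_PER_1K / 1000
print(f"Cost: {cost:.7f}")  # → Cost: 0.0003665
```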

Advanced Usage 🛠

LLMdantic is built on top of the langchain package, which provides a modular and extensible framework for working with LLMs. You can easily switch between different LLMs and customize your experience.

Switching LLMs

Important

Make sure to set the required API keys for the new LLM you plan to use.

The llm parameter of the LLMdantic class should be an instance of BaseLanguageModel from the langchain package.

Tip

You can use the langchain_community package to access a wide range of LLMs from different providers.

You may need to provide model_name, api_key, and other parameters based on the LLM you want to use. Check out the documentation of the respective LLM provider for more details.

from llmdantic import LLMdantic, LLMdanticConfig
from langchain_community.llms.ollama import Ollama
from langchain_core.language_models import BaseLanguageModel

llm: BaseLanguageModel = Ollama()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text",
    inp_schema=SummarizeInput, 
    out_schema=SummarizeOutput,
    retries=3,
)

llmdantic = LLMdantic(
    llm=llm,
    config=config
)

Contributing 🤝

Contributions are welcome! Whether you're fixing bugs, adding new features, or improving documentation, your help makes LLMdantic better for everyone. Feel free to open an issue or submit a pull request.

License 📄

LLMdantic is released under the MIT License. Feel free to use it, contribute, and spread the word!

llmdantic's People

Contributors

firattamur

Forkers

hbcbh1999

llmdantic's Issues

[FEAT] Improve README.

  • Add references to langchain and pydantic.
  • Add prompt template and example prompts generated with input.
  • Add use cases such as Translation, Summarization, etc.
  • Add a flow diagram of how the package works.

[BUG] Output field is None while text contains the filled dict

Description

Hey, this seems like a fun library. I tried an example similar to the one in the README (it would be nice to have a script/notebook to try the library out). I used AzureChatOpenAI as the LLM.

The result (LLMdanticResult) has its output field set to None, while its text field contains the filled dict:
text: {"rewritten": "Miód to ulubiony przysmak Kubusia Puchatka."} (Honey is Winnie the Pooh's favorite treat.)
output: rewritten=None

To Reproduce
Steps to reproduce the behavior:

# setup
llm = AzureChatOpenAI(
    api_key=AZURE_API_KEY,
    api_version=AZURE_API_VERSION,
    azure_endpoint=RESOURCE_ENDPOINT,
    azure_deployment=AZURE_DEPLOY,
    temperature=0,
    verbose=True,
)
class RewriteInput(BaseModel):
    text: str


class RewriteOutput(BaseModel):
    rewritten: str

    @field_validator("rewritten")
    def rewritten_text_must_be_in_polish(cls, v) -> bool:
        """ Rewritten text must be in polish language """
        ...

    @field_validator("rewritten")
    def rewritten_text_must_not_be_empty(cls, v) -> bool:
        """Rewritten text cannot be empty """
        ...

    @field_validator("rewritten")
    def rewritten_text_must_be_short(cls, v) -> bool:
        """Rewritten must be less than 100 words """
        ...

config: LLMdanticConfig = LLMdanticConfig(
    objective="Rewrite the polish text and convey the same meaning but with different phrasing and structure to ensure it is fresh and engaging.",
    inp_schema=RewriteInput,
    out_schema=RewriteOutput, 
    retry=3,
    verbose=True,
)

llmdantic = LLMdantic(llm=llm, config=config)

# polish for Winnie the Pooh likes honey
data = RewriteInput(text="Kubuś Puchatek lubi miód.")
result = llmdantic.invoke(data)
print(result.text)
# output: {"rewritten": "Miód to ulubiony przysmak Kubusia Puchatka."} - Honey is Winnie the Pooh's favorite treat.
print(result.output)
# output: rewritten=None

Expected behavior
The output should have its fields filled from the LLM response.

[FEAT] Integration with Autogen and CrewAI for structuring the outputs of agents

Is your feature request related to a problem? Please describe.
The tool is fantastic but it would be great to integrate its functionality with Agent frameworks like CrewAI or Autogen to control the outputs of the agents within the framework. This could potentially be game changing.

Describe the solution you'd like
Autogen and CrewAI allow you to create tools specific to an agent. We should be able to easily create an agent tool that checks outputs and ensures the agent completes its tasks consistently.

Describe alternatives you've considered
Currently my attempts are to use pydantic structures to achieve this but it doesn't seem to work well.
