
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

Home Page: https://docs.parea.ai/sdk/python

License: Apache License 2.0


parea-sdk-py's Introduction

Test, evaluate & monitor your AI application



๐Ÿฆ Twitter/X ย ย โ€ขย ย  ๐Ÿ“ข Discord ย ย โ€ขย ย  Parea AI ย ย โ€ขย ย  ๐Ÿ“™ Documentation

Parea AI provides an SDK to evaluate & monitor your AI applications. Below you will find quickstarts for evaluating your LLM app, logging & observability, and deploying prompts.

Our full docs are here.

Installation

pip install -U parea-ai

or install with Poetry

poetry add parea-ai

Evaluating Your LLM App

Testing your AI app means executing it over a dataset and scoring it with an evaluation function. In Parea, this is done by defining & running experiments. Below is an example of how to test a greeting bot with the Levenshtein distance metric.

from parea import Parea, trace
from parea.evals.general import levenshtein

p = Parea(api_key="<<PAREA_API_KEY>>")  # replace with Parea AI API key

# use the trace decorator to score the output with the Levenshtein distance  
@trace(eval_funcs=[levenshtein])
def greeting(name: str) -> str:
    return f"Hello {name}"

data = [
    {"name": "Foo", "target": "Hi Foo"},
    {"name": "Bar", "target": "Hello Bar"},
]

p.experiment(
    name="Greeting",
    data=data,
    func=greeting,
).run()

In the snippet above, we used the trace decorator to capture the inputs & outputs of the function. This decorator also enables scoring the output by executing the levenshtein eval in the background. Then, we defined an experiment via p.experiment to evaluate our function (greeting) over a dataset (here a list of dictionaries). Finally, calling run will execute the experiment and create a report of outputs, scores & traces for each sample of the dataset. You can find a link to the executed experiment here. (todo: fill-in experiment)
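For intuition, a Levenshtein-based eval like the one above scores an output by its edit distance to the target. A minimal sketch of such a metric (an illustration only, not Parea's actual implementation) normalizes the distance to a 0–1 similarity:

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions & substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,            # insertion
                previous[j] + 1,               # deletion
                previous[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        previous = current
    return previous[-1]


def levenshtein_score(output: str, target: str) -> float:
    """Normalize edit distance to a 0-1 similarity (1.0 = exact match)."""
    if not output and not target:
        return 1.0
    return 1.0 - levenshtein_distance(output, target) / max(len(output), len(target))


# The classic example: "kitten" -> "sitting" takes 3 edits.
print(levenshtein_distance("kitten", "sitting"))   # 3
print(levenshtein_score("Hello Bar", "Hello Bar"))  # 1.0
```

A perfect match scores 1.0, so in the experiment above the "Bar" sample would score 1.0 while the "Foo" sample (output "Hello Foo" vs. target "Hi Foo") would score lower.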

More Resources

Read more about how to write, run & analyze experiments in our docs.

Logging & Observability

By wrapping the respective clients, you can automatically log all your LLM calls to OpenAI & Anthropic. Additionally, the trace decorator lets you create hierarchical traces of your LLM application, e.g. to associate LLM calls with the retrieval step of a RAG pipeline. You can see the full observability documentation here and our integrations with LangChain, Instructor, DSPy, LiteLLM & more here.

Automatically log all your OpenAI calls

To automatically log any OpenAI call, you can wrap the OpenAI client with the Parea client using the wrap_openai_client method.

from openai import OpenAI
from parea import Parea

client = OpenAI(api_key="OPENAI_API_KEY")

# All you need to do is add these two lines
p = Parea(api_key="PAREA_API_KEY")  # replace with your API key
p.wrap_openai_client(client)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Write a Hello World program in Python using FastAPI.",
        }
    ],
)
print(response.choices[0].message.content)

Automatically log all your Anthropic calls

To automatically log any Anthropic call, you can wrap the Anthropic client with the Parea client using the wrap_anthropic_client method.

import anthropic
from parea import Parea

p = Parea(api_key="PAREA_API_KEY")  # replace with your API key

client = anthropic.Anthropic()
p.wrap_anthropic_client(client)

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Write a Hello World program in Python using FastAPI.",
        }
    ],
)
print(message.content[0].text)

Nested traces

By using the trace decorator, you can create hierarchical traces of your LLM application.

from openai import OpenAI
from parea import Parea, trace

client = OpenAI(api_key="OPENAI_API_KEY")  # replace with your API key

p = Parea(api_key="PAREA_API_KEY")  # replace with your API key
p.wrap_openai_client(client)


# We generally recommend creating a helper function to make LLM API calls.
def llm(messages: list[dict[str, str]]) -> str:
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content


# This will give the Span the name of the function.
# Without the decorator the default name for all LLM call logs is `llm-openai`
@trace
def hello_world(lang: str, framework: str):
    return llm([{"role": "user", "content": f"Write a Hello World program in {lang} using {framework}."}])

@trace
def critique_code(code: str):
    return llm([{"role": "user", "content": f"How can we improve this code: \n {code}"}])

# Our top-level function is called chain. By adding the trace decorator here,
# all sub-functions will automatically be logged and associated with this trace.
# Note that you can also add metadata to the trace; we'll revisit this functionality later.
@trace(metadata={"purpose": "example"}, end_user_identifier="John Doe")
def chain(lang: str, framework: str) -> str:
    return critique_code(hello_world(lang, framework))


print(chain("Python", "FastAPI"))

Deploying Prompts

Deployed prompts enable collaboration with non-engineers such as product managers & subject-matter experts. Users can iterate, refine & test prompts in Parea's playground. After tinkering, you can deploy a prompt, which exposes it via an API endpoint so you can integrate it into your application. Check out our full docs here.

from parea import Parea
from parea.schemas.models import Completion, UseDeployedPrompt, CompletionResponse, UseDeployedPromptResponse


p = Parea(api_key="<PAREA_API_KEY>")

# You will find this deployment_id in the Parea dashboard
deployment_id = '<DEPLOYMENT_ID>'

# Assuming your deployed prompt's message is:
# {"role": "user", "content": "Write a hello world program using {{x}} and the {{y}} framework."}
inputs = {"x": "Golang", "y": "Fiber"}

# You can easily unpack a dictionary into an attrs class
test_completion = Completion(
  **{
    "deployment_id": deployment_id,
    "llm_inputs": inputs,
    "metadata": {"purpose": "testing"}
  }
)

# By passing in the inputs, in addition to the raw message with unfilled variables {{x}} and {{y}},
# you will also get the filled-in prompt:
# {"role": "user", "content": "Write a hello world program using Golang and the Fiber framework."}
test_get_prompt = UseDeployedPrompt(deployment_id=deployment_id, llm_inputs=inputs)


def main():
  completion_response: CompletionResponse = p.completion(data=test_completion)
  print(completion_response)
  deployed_prompt: UseDeployedPromptResponse = p.get_prompt(data=test_get_prompt)
  print("\n\n")
  print(deployed_prompt)


if __name__ == "__main__":
  main()

🛡 License

This project is licensed under the terms of the Apache License 2.0. See LICENSE for more details.

📃 Citation

@misc{parea-sdk,
  author = {joel-parea-ai and joschkabraun},
  title = {Parea python sdk},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/parea-ai/parea-sdk}}
}

parea-sdk-py's People

Contributors

dependabot[bot], fergusonrae, jalexanderii, joschkabraun, ss108


parea-sdk-py's Issues

Parea wrapper not re-raising root exception

๐Ÿ› Bug Report

The Parea wrapper code returns in the finally block, which swallows the exception raised when actually calling the OpenAI models. This makes things very difficult to log/monitor/debug, as the root exception is swallowed by Parea and failures occur downstream.

Link to offending code.

return self._cleanup_trace(trace_id, start_time, error, cache_hit, args, kwargs, response)

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. I'm not sure how to repro an OpenAI failure but it can be monkey patched if needed. Looking at the offending code probably provides all the context necessary

Code sample

Try running this function: 1 is returned and the exception isn't raised.

def run():
    try:
        raise Exception("bad")
    except Exception as e:
        print(e)
        raise e
    finally:
        return 1
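A minimal illustration of the difference (a hypothetical restructuring for this issue, not the actual patch): perform the cleanup in finally without returning there, so the in-flight exception propagates to the caller.

```python
def cleanup() -> int:
    # stand-in for the wrapper's trace-cleanup work
    return 1


def run_buggy():
    try:
        raise Exception("bad")
    finally:
        return cleanup()  # `return` inside `finally` swallows the in-flight exception


def run_fixed():
    result = None
    try:
        raise Exception("bad")
    finally:
        result = cleanup()  # cleanup still runs, but no `return` here
    return result  # only reached when no exception was raised


print(run_buggy())  # 1 -- the exception silently disappears
# run_fixed() raises Exception("bad") as expected
```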

Environment

  • OS: macOS (Apple Silicon)
  • Python version: 3.9

📈 Expected behavior

The error isn't swallowed by Parea and is surfaced to the consumer of the OpenAI call.

📎 Additional context

I ran into this using Langchain with the following (abbreviated) code

    llm = ChatOpenAI(
        openai_api_key=openai_api_key,
        temperature=0,
        model_name=model,
        model_kwargs=llm_kwargs,
        max_retries=3
    )
    response = llm([HumanMessage(content="model query here...any will work")])

This became relevant during the OpenAI outage on 2023-10-19.
