
Prompt testing (pezzo, open issue)

ericflecher commented on May 21, 2024
Prompt testing

from pezzo.

Comments (5)

arielweinberger commented on May 21, 2024

Hi @ericflecher, really appreciate your involvement and issues. It really helps.

I can clearly see the value in automated tests. The question is, what exactly do we want to test? The nature of the AI models is that the response might change from time to time.

I can think of the ability to define a strict JSON response schema (e.g. this field should be a number, this field should be a string, and so on) and validate it. Would that be useful?

Can you try to think of a concrete example?
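
The schema idea above could be sketched roughly like this. It is a minimal illustration using only the Python standard library, not something pezzo provides; the `EXPECTED_FIELDS` shape and the `validate_response` helper are hypothetical names made up for the example.

```python
import json

# Hypothetical expected shape: field name -> required Python type.
EXPECTED_FIELDS = {"city": str, "population": int}


def validate_response(raw, expected):
    """Return a list of problems; an empty list means the response matches."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for name, expected_type in expected.items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it unless asked for.
        is_stray_bool = isinstance(value, bool) and expected_type is not bool
        if is_stray_bool or not isinstance(value, expected_type):
            problems.append(f"wrong type for field: {name}")
    return problems
```

A test would then assert that `validate_response(model_output, EXPECTED_FIELDS)` is empty, which checks the structure of the response without pinning down its exact wording.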


evereq commented on May 21, 2024

@arielweinberger "The nature of the AI models is that the response might change from time to time." Yes, unless you construct the prompt so that it should always return the same answer, leaving only an extremely low probability of getting something different.

Of course, OpenAI (and some other companies) do not guarantee that, but sending a simple request like:

What was the capital of France in 2022? 
Please answer just a city name, capitalized, and don't add anything else. 
One word, please!

Will return "Paris" with a probability of 100% minus infinitesimal ;)

So I think such prompts can easily be used in basic tests (but I could be wrong haha)
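
A basic test along these lines might compare the model's answer to the expected string after light normalization. This is only a sketch: `normalize` and `matches_expected` are hypothetical helpers, and the answer string would come from whatever client actually calls the model.

```python
def normalize(answer):
    """Trim whitespace, surrounding quotes, and trailing '.' or '!'."""
    return answer.strip().strip("\"'").rstrip(".!").strip()


def matches_expected(answer, expected):
    """True when the normalized answer equals the expected string exactly."""
    return normalize(answer) == expected
```

For the prompt above, a test would assert `matches_expected(answer, "Paris")`. The normalization tolerates a stray newline or trailing period but still fails on a full-sentence answer.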


evereq commented on May 21, 2024

P.S. Or, more simply, ask "What is the value of 1+1?" haha. But such deterministic math questions might not actually be good test cases for systems that rely on general knowledge rather than just basic math...


arielweinberger commented on May 21, 2024

I've been hit pretty hard by this in one of my AI-powered products, even when giving perfectly precise instructions with temperature 0.

An example is "return the following text's language in ISO-639-2 format".

I'd get a different format sometimes.

I think this highlights the relevance of this testing feature, so we'll add it to our roadmap.
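
A format check for the ISO 639-2 case could look something like the sketch below. It assumes the expected output is a bare three-letter lowercase code (such as "eng" or "fra"); it only checks the shape of the answer, not whether the code exists in the ISO 639-2 registry, and `looks_like_iso_639_2` is a hypothetical name.

```python
import re

# ISO 639-2 codes are three letters; lowercase is assumed here.
ISO_639_2_SHAPE = re.compile(r"[a-z]{3}")


def looks_like_iso_639_2(answer):
    """True when the answer is exactly three lowercase ASCII letters."""
    return ISO_639_2_SHAPE.fullmatch(answer) is not None
```

A test asserting `looks_like_iso_639_2(answer)` would catch the drift described above, for example a two-letter "en" or a spelled-out "English" coming back instead of "eng".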


Maxim-Filimonov commented on May 21, 2024

The approach we came up with is kinda two-step and includes a special proxy.
Maybe pezzo can stand in for that proxy.

What we do at the moment when we run our unit tests is run them ONCE against the real LLM through a special proxy. That proxy hashes all the parameters and the prompt, and when the parameters and prompt are exactly the same it won't call the URL and instead returns the stored value from the proxy.
The proxy also helps speed up our integration tests and avoids paying OpenAI every time we push a commit.

When we change either the parameters or the prompt itself, the proxy will call the real LLM and the tests will be able to verify that the changes didn't break anything.
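
The hashing behaviour described above might be sketched as follows. This is an in-memory illustration under assumptions, not the actual proxy: `CachingLLMProxy` and the `call_llm` callable are hypothetical, and a real setup would presumably persist the cache between test runs and sit in front of an HTTP endpoint.

```python
import hashlib
import json


class CachingLLMProxy:
    """Calls the real LLM only for unseen (prompt, params) combinations."""

    def __init__(self, call_llm):
        # call_llm is a hypothetical callable: (prompt, params) -> response text.
        self._call_llm = call_llm
        self._cache = {}

    @staticmethod
    def _key(prompt, params):
        # sort_keys makes the hash independent of parameter ordering.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt, params):
        key = self._key(prompt, params)
        if key not in self._cache:
            # Cache miss: the prompt or a parameter changed, so hit the real LLM.
            self._cache[key] = self._call_llm(prompt, params)
        return self._cache[key]
```

With this shape, repeated test runs with an unchanged prompt and parameters reuse the stored response, while any change to either produces a new hash and triggers a real call.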

