Proposal I would love to be able to add prompt test data and expec


Prompt testing about pezzo HOT 5 OPEN

ericflecher commented on May 21, 2024

Prompt testing

from pezzo.

Comments (5)

arielweinberger commented on May 21, 2024

Hi @ericflecher, really appreciate your involvement and issues. It really helps.

I can clearly see the value in automated tests. The question is, what exactly do we want to test? The nature of the AI models is that the response might change from time to time.

I can think of the ability to define a strict JSON response schema (e.g. this field should be a number, this field should be a string, and so on) and validate it. Would that be useful?

Can you try to think of a concrete example?
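As a rough illustration of the strict-schema idea above, here is a minimal Python sketch. It is not pezzo functionality; the `SCHEMA` mapping, the field names, and `validate_response` are hypothetical, and a real implementation would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"city": str, "population": int}


def validate_response(raw: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the response conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for field, expected in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it unless bool is expected.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"{field}: expected {expected.__name__}")
    # Strict mode: fields outside the schema are also reported.
    for field in sorted(data.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems
```

The check only looks at structure and types, so it stays stable even when the wording of the model's answer varies between runs.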


evereq commented on May 21, 2024

@arielweinberger "The nature of the AI models is that the response might change from time to time." Yes, unless you construct the prompt so that it should always return the same answer, leaving only an extremely low probability of getting something different.

Of course, OpenAI (and some other companies) do not guarantee that, but sending a simple request like:

What was the capital of France in 2022? 
Please answer just a city name, capitalized, and don't add anything else. 
One word, please!

will return "Paris" with a probability of 100% minus an infinitesimal ;)

So I think such prompts can easily be used in basic tests (but I could be wrong haha)
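A basic exact-match test along those lines might look like the sketch below. The `check_exact` helper and the `fake_complete` stand-in are hypothetical; in a real test `complete` would wrap an actual model call, which is not shown here.

```python
from typing import Callable

PROMPT = (
    "What was the capital of France in 2022?\n"
    "Please answer just a city name, capitalized, and don't add anything else.\n"
    "One word, please!"
)


def check_exact(complete: Callable[[str], str], prompt: str, expected: str) -> bool:
    """Send the prompt through `complete` and compare the trimmed answer."""
    answer = complete(prompt).strip()
    return answer == expected


def fake_complete(prompt: str) -> str:
    # Stand-in for a real model call (assumption: the model answers "Paris").
    return "Paris\n"


if __name__ == "__main__":
    print(check_exact(fake_complete, PROMPT, "Paris"))
```

Only surrounding whitespace is trimmed before comparing, so an answer like "The capital is Paris." would still fail the test.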


evereq commented on May 21, 2024

P.S. Or, more simply, ask "What is the value of 1+1?" haha. But such deterministic math questions might not actually be good test cases for systems with general knowledge, not just basic math knowledge...


arielweinberger commented on May 21, 2024

I've been hit pretty hard in one of my AI-powered products, even when giving perfectly accurate instructions with 0 temperature.

An example is "return the following text's language in ISO-639-2 format".

I'd get a different format sometimes.

I think this highlights the relevance of this testing feature, so we'll add it to our roadmap.
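For the ISO-639-2 example above, a format assertion could be sketched roughly as follows. It assumes the test only cares about the shape of the answer (ISO 639-2 codes are three lowercase letters) and does not check that the code is actually assigned to a language; `looks_like_iso_639_2` is a hypothetical helper name.

```python
import re

# Shape of an ISO 639-2 code: exactly three lowercase ASCII letters.
ISO_639_2_SHAPE = re.compile(r"[a-z]{3}")


def looks_like_iso_639_2(answer: str) -> bool:
    """True if the raw model answer has the shape of an ISO 639-2 code."""
    # fullmatch rejects extra text, trailing newlines, and two-letter codes.
    return ISO_639_2_SHAPE.fullmatch(answer) is not None


if __name__ == "__main__":
    for candidate in ["eng", "en", "English", "ENG"]:
        print(candidate, looks_like_iso_639_2(candidate))
```

A check like this would flag the drift described here, for example a two-letter ISO 639-1 code or a full language name coming back instead.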


Maxim-Filimonov commented on May 21, 2024

The approach we came up with is a two-step one that includes a special proxy.
Maybe pezzo can stand in for that proxy.

When we run our unit tests, we currently run them ONCE against the real LLM through a special proxy. The proxy hashes the prompt and all parameters; when a request has exactly the same parameters and prompt, it doesn't call the URL and instead returns the stored value.
The proxy also speeds up our integration tests and avoids paying OpenAI every time we push a commit.

When we change either the parameters or the prompt itself, the proxy calls the real LLM, and the tests can verify that the changes didn't break anything.
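The caching behaviour described above could be sketched roughly like this. It is not the actual proxy from this comment: `CachingProxy`, its in-memory dict, and the `call_llm` callable are assumptions made for illustration, and a real setup would presumably persist the cache between test runs.

```python
import hashlib
import json
from typing import Callable


class CachingProxy:
    """Replays stored responses for identical prompt + parameter combinations."""

    def __init__(self, call_llm: Callable[[str, dict], str]):
        self._call_llm = call_llm          # hypothetical function hitting the real LLM
        self._cache: dict[str, str] = {}   # in-memory only, for the sketch
        self.upstream_calls = 0

    @staticmethod
    def _key(prompt: str, params: dict) -> str:
        # sort_keys makes the hash independent of parameter ordering.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt: str, **params) -> str:
        key = self._key(prompt, params)
        if key not in self._cache:
            # New prompt or parameters: call the real LLM once and store the result.
            self.upstream_calls += 1
            self._cache[key] = self._call_llm(prompt, params)
        return self._cache[key]
```

Any change to the prompt text or to a parameter such as the temperature produces a new hash, which is what forces a fresh call to the real model.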
