Comments (5)
Hi @ericflecher, I really appreciate your involvement and the issues you've opened. It really helps.
I can clearly see the value in automated tests. The question is, what exactly do we want to test? The nature of the AI models is that the response might change from time to time.
One option is the ability to define a strict JSON response schema (e.g. this field must be a number, that field must be a string, and so on) and validate each response against it. Would that be useful?
Can you try to think of a concrete example?
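To make the schema idea a bit more tangible, here is a minimal sketch of what such a check could look like. It is not a Pezzo feature: `EXPECTED_SCHEMA`, its field names, and `validate_response` are all hypothetical, and it uses only the Python standard library.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_SCHEMA = {"city": str, "population": int}


def validate_response(raw: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the response matches."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for field, expected in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # Strict type comparison, so that e.g. True is not accepted as an int.
        elif type(data[field]) is not expected:
            problems.append(f"{field}: expected {expected.__name__}")
    return problems
```

A test would then assert on the shape of the response rather than on its exact wording, which sidesteps the fact that the text itself may vary between runs.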
from pezzo.
@arielweinberger "The nature of the AI models is that the response might change from time to time." Yes, unless you construct the prompt so that it should always return the same answer, leaving only an extremely low probability of getting something different.
Of course, OpenAI (and some other companies) do not guarantee that, but sending a simple request like:
What was the capital of France in 2022?
Please answer just a city name, capitalized, and don't add anything else.
One word, please!
will return "Paris" with a probability of 100% minus an infinitesimal ;)
So I think such prompts can easily be used in basic tests (but I could be wrong haha)
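A rough sketch of how such a basic test might look, assuming some function that sends a prompt and returns the model's text. Here that function is a stub (`fake_complete`); `normalize` and `check_exact_answer` are made-up helper names, not part of any library.

```python
from typing import Callable

PROMPT = (
    "What was the capital of France in 2022?\n"
    "Please answer just a city name, capitalized, and don't add anything else.\n"
    "One word, please!"
)


def normalize(answer: str) -> str:
    """Drop surrounding whitespace, quotes and trailing punctuation."""
    return answer.strip().strip("\"'").rstrip(".!").strip()


def check_exact_answer(complete: Callable[[str], str], prompt: str, expected: str) -> bool:
    """Call the model once and compare the normalized answer with `expected`."""
    return normalize(complete(prompt)) == expected


def fake_complete(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch runs without network access.
    return ' "Paris."\n'


assert check_exact_answer(fake_complete, PROMPT, "Paris")
```

The normalization step is there because even a "one word" answer can come back with quotes or a trailing period.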
P.S. Or, more simply, ask "What is the value of 1+1?" haha. But such deterministic math questions might not actually be good test cases for systems that are about general knowledge rather than just basic math...
I've been hit pretty hard by this in one of my AI-powered products, even when giving perfectly precise instructions with temperature 0.
An example is "return the following text's language in ISO-639-2 format".
I'd get a different format sometimes.
I think this highlights the relevance of this testing feature, so we'll add it to our roadmap.
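For the language-code case, a format check could be as small as the sketch below. It only verifies the shape of the answer (ISO 639-2 codes are three lowercase letters, such as "eng" or "fra"), not whether the code is actually registered. The function name is made up for illustration.

```python
import re

# Shape of an ISO 639-2 code: exactly three lowercase ASCII letters.
ISO_639_2_SHAPE = re.compile(r"[a-z]{3}")


def looks_like_iso_639_2(answer: str) -> bool:
    """True if the model's answer has the shape of an ISO 639-2 code."""
    return ISO_639_2_SHAPE.fullmatch(answer.strip()) is not None
```

This would catch answers like "en" (the two-letter ISO 639-1 form) or "English", which are the kind of format drift described above.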
The approach we came up with is a kind of two-step process that involves a special proxy.
Maybe Pezzo can stand in for that proxy.
What we do at the moment when we run our unit tests is run them ONCE against the real LLM through a special proxy. That proxy hashes all the parameters and the prompt, and when they are exactly the same as in an earlier request, it doesn't call the URL and instead returns the stored value from the proxy.
The proxy also helps speed up our integration tests and avoids paying OpenAI every time we push a commit.
When we change either the parameters or the prompt itself, the proxy calls the real LLM, and the tests can verify that the changes didn't break anything.
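A minimal in-memory sketch of that idea, under the assumption that the proxy keys its cache on a hash of the prompt plus parameters. This is not the actual proxy described above: `CachingProxy`, `call_llm`, and `upstream_calls` are hypothetical names, and a real version would persist the cache between test runs.

```python
import hashlib
import json
from typing import Callable


class CachingProxy:
    """Replays stored responses for requests it has already seen."""

    def __init__(self, call_llm: Callable[[str, dict], str]):
        self._call_llm = call_llm          # hypothetical function that hits the real LLM
        self._cache: dict[str, str] = {}   # request hash -> stored response
        self.upstream_calls = 0            # how many times the real LLM was called

    @staticmethod
    def _key(prompt: str, params: dict) -> str:
        # sort_keys makes the hash independent of parameter ordering.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt: str, params: dict) -> str:
        key = self._key(prompt, params)
        if key not in self._cache:
            # New prompt or parameters: go to the real LLM and remember the answer.
            self.upstream_calls += 1
            self._cache[key] = self._call_llm(prompt, params)
        return self._cache[key]
```

With this shape, an unchanged prompt and parameter set never reaches the upstream model, while any edit to either produces a new hash and forces a fresh call.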