mshumer / gpt-prompt-engineer

License: MIT License

Jupyter Notebook 100.00%

gpt-prompt-engineer's Introduction

gpt-prompt-engineer

Overview

Prompt engineering is kind of like alchemy. There's no clear way to predict what will work best. It's all about experimenting until you find the right prompt. gpt-prompt-engineer is a tool that takes this experimentation to a whole new level.

Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best.

New 3/20/24: The Claude 3 Opus Version

I've added a new version of gpt-prompt-engineer that takes full advantage of Anthropic's Claude 3 Opus model. This version auto-generates test cases and allows for the user to define multiple input variables, making it even more powerful and flexible. Try it out with the claude-prompt-engineer.ipynb notebook in the repo!

New 3/20/24: Claude 3 Opus -> Haiku Conversion Version

This notebook enables you to build lightning-fast, performant AI systems at a fraction of the typical cost. By using Claude 3 Opus to establish the latent space and Claude 3 Haiku for the actual generation, you can achieve amazing results. The process works by leveraging Opus to produce a collection of top-notch examples, which are then used to guide Haiku in generating output of comparable quality while dramatically reducing both latency and cost per generation. Try it out with the opus-to-haiku-conversion.ipynb notebook in the repo!

Features

Prompt Generation: Using GPT-4, GPT-3.5-Turbo, or Claude 3 Opus, gpt-prompt-engineer can generate a variety of possible prompts based on a provided use-case and test cases.
Prompt Testing: The real magic happens after the generation. The system tests each prompt against all the test cases, comparing their performance and ranking them using an ELO rating system.

ELO Rating System: Each prompt starts with an ELO rating of 1200. As they compete against each other in generating responses to the test cases, their ELO ratings change based on their performance. This way, you can easily see which prompts are the most effective.
Classification Version: The gpt-prompt-engineer -- Classification Version notebook is designed to handle classification tasks. It evaluates the correctness of a test case by matching it to the expected output ('true' or 'false') and provides a table with scores for each prompt.

Claude 3 Version: The claude-prompt-engineer notebook is designed to work with Anthropic's Claude 3 Opus model. It auto-generates test cases and allows for multiple input variables, making it even more powerful and flexible.
Claude 3 Opus -> Haiku Conversion Version: Designed to preserve Opus' quality for your use-case while getting the speed + cost benefits of using Haiku.
Weights & Biases Logging: Optional logging to Weights & Biases of your configs such as temperature and max tokens, the system and user prompts for each part, the test cases used and the final ranked ELO rating for each candidate prompt. Set use_wandb to True to use.
Portkey: Optional tool to log and trace all the prompt chains and their responses. Set use_portkey to True to use.
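
The rating update described under ELO Rating System above can be sketched as follows. This is a minimal illustration of the standard Elo formula, assuming a K-factor of 32 and the starting rating of 1200; the function names are hypothetical and not necessarily those used in the notebook.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return the updated (rating_a, rating_b) after one comparison.

    score_a is 1 if A won, 0 if B won, and 0.5 for a draw.
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = expected_score(rating_b, rating_a)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - expected_b)
    return new_a, new_b


# Two prompts start level at 1200; the first one wins a comparison.
a, b = update_elo(1200, 1200, score_a=1)
print(a, b)  # 1216.0 1184.0
```

Because the two updates are equal and opposite, the total rating across all prompts stays constant, so a rating above 1200 always means a prompt won more than its opponents' ratings predicted.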

Setup

Open the notebook in Google Colab or in a local Jupyter notebook. For classification, use the gpt-prompt-engineer -- Classification Version notebook. For the Claude 3 version, use the claude-prompt-engineer.ipynb notebook.
Add your OpenAI API key to the line openai.api_key = "ADD YOUR KEY HERE". If you're using the Claude 3 version, add your Anthropic API key to the line ANTHROPIC_API_KEY = "ADD YOUR KEY HERE".

How to Use

If you are using the GPT-4 version, define your use-case and test cases. The use-case is a description of what you want the AI to do. Test cases are specific prompts that you would like the AI to respond to. For example:

description = "Given a prompt, generate a landing page headline." # this style of description tends to work well

test_cases = [
    {
        'prompt': 'Promoting an innovative new fitness app, Smartly',
    },
    {
        'prompt': 'Why a vegan diet is beneficial for your health',
    },
    {
        'prompt': 'Introducing a new online course on digital marketing',
    },
    {
        'prompt': 'Launching a new line of eco-friendly clothing',
    },
    {
        'prompt': 'Promoting a new travel blog focusing on budget travel',
    },
    {
        'prompt': 'Advertising a new software for efficient project management',
    },
    {
        'prompt': 'Introducing a new book on mastering Python programming',
    },
    {
        'prompt': 'Promoting a new online platform for learning languages',
    },
    {
        'prompt': 'Advertising a new service for personalized meal plans',
    },
    {
        'prompt': 'Launching a new app for mental health and mindfulness',
    }
]

For the classification version, your test cases should be in the format:

test_cases = [
    {
        'prompt': 'I had a great day!',
        'output': 'true'
    },
    {
        'prompt': 'I am feeling gloomy.',
        'output': 'false'
    },
    # add more test cases here
]
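
As a rough sketch of how test cases in this format could be scored, the snippet below counts the fraction of cases where the model's answer matches the expected 'true' or 'false' exactly. The `classify` callable stands in for the real model call and is hypothetical, as is the keyword-based stand-in used for the demo.

```python
from typing import Callable


def score_prompt(system_prompt: str,
                 test_cases: list,
                 classify: Callable[[str, str], str]) -> float:
    """Fraction of test cases whose answer matches the expected output."""
    correct = 0
    for case in test_cases:
        # Normalize so that "True" or " true\n" still counts as a match.
        answer = classify(system_prompt, case['prompt']).strip().lower()
        if answer == case['output']:
            correct += 1
    return correct / len(test_cases)


def keyword_classifier(system_prompt: str, text: str) -> str:
    """Hypothetical stand-in for a model call, for demonstration only."""
    return 'true' if 'great' in text else 'false'


demo_cases = [
    {'prompt': 'I had a great day!', 'output': 'true'},
    {'prompt': 'I am feeling gloomy.', 'output': 'false'},
]
print(score_prompt("Is the sentiment positive?", demo_cases, keyword_classifier))
```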

For the Claude 3 version, you can define input variables in addition to the use-case description:

description = "Given a prompt, generate a personalized email response."

input_variables = [
    {"variable": "SENDER_NAME", "description": "The name of the person who sent the email."},
    {"variable": "RECIPIENT_NAME", "description": "The name of the person receiving the email."},
    {"variable": "TOPIC", "description": "The main topic or subject of the email. One to two sentences."}
]

The test cases will be auto-generated based on the use-case description and input variables.

Choose how many prompts to generate. Keep in mind, this can get expensive if you generate many prompts. 10 is a good starting point.
Call generate_optimal_prompt(description, test_cases, number_of_prompts) to generate a list of potential prompts, and test and rate their performance. For the classification version, just run the last cell. For the Claude 3 version, call generate_optimal_prompt(description, input_variables, num_test_cases, number_of_prompts, use_wandb).
The final ELO ratings will be printed in a table, sorted in descending order. The higher the rating, the better the prompt.
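
The generate, test, and rank steps above can be pictured with the sketch below. It assumes every pair of prompts is compared once on every test case and that a `judge` callable (hypothetical, standing in for the ranking model) returns 1, 0, or 0.5 for the first prompt; the real notebook may schedule comparisons differently.

```python
from itertools import combinations


def run_tournament(prompts, test_cases, judge, k=32, start=1200):
    """Round-robin comparison of prompts, returning (prompt, rating) pairs
    sorted from highest to lowest rating."""
    ratings = {prompt: float(start) for prompt in prompts}
    for a, b in combinations(prompts, 2):
        for case in test_cases:
            score_a = judge(a, b, case)  # 1 = a wins, 0 = b wins, 0.5 = draw
            expected_a = 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
            delta = k * (score_a - expected_a)
            ratings[a] += delta
            ratings[b] -= delta  # Elo updates are zero-sum
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


def prefer_shorter(a, b, case):
    """Hypothetical judge for the demo: the shorter prompt always wins."""
    return 1 if len(a) < len(b) else 0


demo_prompts = ["short", "a much longer prompt", "medium one"]
demo_cases = [{'prompt': 'case 1'}, {'prompt': 'case 2'}]
for prompt, rating in run_tournament(demo_prompts, demo_cases, prefer_shorter):
    print(f"{rating:8.1f}  {prompt}")
```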

For the classification version, the scores for each prompt will be printed in a table (see the image above).

Contributions are welcome! Some ideas:

have a number of different system prompt generators that create different styles of prompts, to cover more ground (ex. examples, verbose, short, markdown, etc.)
automatically generate the test cases
expand the classification version to support more than two classes using tiktoken

License

This project is MIT licensed.

Contact

Matt Shumer - @mattshumer_

Project Link: https://github.com/mshumer/gpt-prompt-engineer

Lastly, if you want to try something even cooler than this, sign up for HyperWrite Personal Assistant (most of my time is spent on this). It's basically an AI with access to real-time information that a) is incredible at writing naturally, and b) can operate your web browser to complete tasks for you.


gpt-prompt-engineer's Issues

Continue to Get GPT-4 Auth Error after Changing Model to GPT-3.5-Turbo

I keep getting the below authentication error even after I've changed the model to gpt-3.5-turbo:

`---------------------------------------------------------------------------
AuthenticationError Traceback (most recent call last)
in <cell line: 1>()
----> 1 generate_optimal_prompt(description, test_cases, NUMBER_OF_PROMPTS, use_wandb)

6 frames
in generate_optimal_prompt(description, test_cases, number_of_prompts, use_wandb)
108
109 def generate_optimal_prompt(description, test_cases, number_of_prompts=10, use_wandb=False):
--> 110 prompts = generate_candidate_prompts(description, test_cases, number_of_prompts)
111 prompt_ratings = test_candidate_prompts(test_cases, description, prompts)
112

in generate_candidate_prompts(description, test_cases, number_of_prompts)
1 def generate_candidate_prompts(description, test_cases, number_of_prompts):
----> 2 outputs = openai.ChatCompletion.create(
3 model=CANDIDATE_MODEL, # change this to gpt-3.5-turbo if you don't have GPT-4 access
4 messages=[
5 {"role": "system", "content": system_gen_system_prompt},

/usr/local/lib/python3.10/dist-packages/openai/api_resources/chat_completion.py in create(cls, *args, **kwargs)
23 while True:
24 try:
---> 25 return super().create(*args, **kwargs)
26 except TryAgain as e:
27 if timeout is not None and time.time() > start + timeout:

/usr/local/lib/python3.10/dist-packages/openai/api_resources/abstract/engine_api_resource.py in create(cls, api_key, api_base, api_type, request_id, api_version, organization, **params)
151 )
152
--> 153 response, _, api_key = requestor.request(
154 "post",
155 url,

/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py in request(self, method, url, params, headers, files, stream, request_id, request_timeout)
296 request_timeout=request_timeout,
297 )
--> 298 resp, got_stream = self._interpret_response(result, stream)
299 return resp, got_stream, self.api_key
300

/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py in _interpret_response(self, result, stream)
698 else:
699 return (
--> 700 self._interpret_response_line(
701 result.content.decode("utf-8"),
702 result.status_code,

/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
761 stream_error = stream and "error" in resp.data
762 if stream_error or not 200 <= rcode < 300:
--> 763 raise self.handle_error_response(
764 rbody, rcode, resp.data, rheaders, stream_error=stream_error
765 )

AuthenticationError: You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.`

Here is the cell that sets the models:

`# K is a constant factor that determines how much ratings change
K = 32

CANDIDATE_MODEL = 'gpt-3.5-turbo'
CANDIDATE_MODEL_TEMPERATURE = 0.9

GENERATION_MODEL = 'gpt-3.5-turbo'
GENERATION_MODEL_TEMPERATURE = 0.8
GENERATION_MODEL_MAX_TOKENS = 60

N_RETRIES = 3 # number of times to retry a call to the ranking model if it fails
RANKING_MODEL = 'gpt-3.5-turbo'
RANKING_MODEL_TEMPERATURE = 0.5

NUMBER_OF_PROMPTS = 10 # this determines how many candidate prompts to generate... the higher, the more expensive, but the better the results will be`
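
The error above says no API key reached the client, which is independent of the model setting. One way to catch this before any call is made is a small check like the sketch below; it assumes the key is kept in the OPENAI_API_KEY environment variable, and the helper name is hypothetical.

```python
import os


def require_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key, or fail early with a readable message."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    # Also reject the placeholder text from the notebook's setup line.
    if not key or key == "ADD YOUR KEY HERE":
        raise RuntimeError(f"{env_var} is not set; set it before calling the API.")
    return key


# Demo with an explicit mapping so the example does not depend on the
# real environment. In the notebook this would be something like:
#   openai.api_key = require_api_key()
print(require_api_key(environ={"OPENAI_API_KEY": "sk-example"}))
```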

requests not sufficiently defined in opus-to-haiku part

The opus-to-haiku notebook fails because "requests" is not defined.
I think it is line 60, but that is a guess. Sorry if this is not enough information.

response = requests.post("https://api.anthropic.com/v1/messages", headers=headers, json=data)

Integrate with R2R to generate optimized RAG pipelines?

R2R supports very seamless RAG pipeline deployment and is a great candidate for integrating with gpt-prompt-engineer to create optimizable pipelines.

suggestion

Wouldn't it be nice if a user could start with a primary suggestion and then have ChatGPT generate permutations of it, instead of having to think of each one manually? Maybe include an option to choose how many permutations to generate.

Add LiteLLM support so that we can use different LLM for prompt generation

Instead of supporting only OpenAI and Claude, please add support for LiteLLM (a FOSS solution for multi-LLM support). That would let us point the project at a local proxy server API endpoint backed by a self-hosted open-source LLM, a hosted open-source model such as Mistral or Llama on Groq, or a proprietary LLM such as Google Gemini, and use it to generate prompts as needed. I know performance might not be as good as GPT-4, but open-source models can still often give much better results than GPT-3.5.

Add a "plan" mode, to estimate the cost of the job

Refuel's Autolabel has a plan mode, which uses tiktoken to estimate the total cost of the job. It would be great to know in advance how much a gpt-prompt-engineer job will cost.
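
Even before counting tokens, a plan mode could report how many API calls a run will make. The sketch below rests on assumptions that are not read from the notebook's code: every pair of prompts is compared on every test case, and each comparison costs a fixed number of generation and ranking calls (both exposed as hypothetical parameters).

```python
from math import comb


def estimate_calls(number_of_prompts, number_of_test_cases,
                   generations_per_comparison=2, rankings_per_comparison=2):
    """Estimate call counts for a round-robin run under the stated assumptions."""
    comparisons = comb(number_of_prompts, 2) * number_of_test_cases
    return {
        "comparisons": comparisons,
        "generation_calls": comparisons * generations_per_comparison,
        "ranking_calls": comparisons * rankings_per_comparison,
    }


print(estimate_calls(number_of_prompts=10, number_of_test_cases=10))
```

The number of comparisons grows quadratically with the number of prompts, which is why generating many prompts gets expensive quickly. Multiplying the call counts by measured tokens per call and the provider's current prices would turn this into a cost estimate.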

Fresh clone, can't npm install

Tried Node.js versions 20, 18, and 16 to see if it's a dependency compatibility issue, but the problem persists:

$ npm install
npm ERR! code ERESOLVE
npm ERR! ERESOLVE could not resolve
npm ERR!
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/vue
npm ERR!   vue@"^3.3.4" from [email protected]
npm ERR!   node_modules/nuxt
npm ERR!     dev nuxt@"^3.6.2" from the root project
npm ERR!     peer nuxt@"^3.6.1" from @nuxt/[email protected]
npm ERR!     node_modules/@nuxt/devtools
npm ERR!       dev @nuxt/devtools@"latest" from the root project
npm ERR!     1 more (@vueuse/nuxt)
npm ERR!   peer vue@">=3.0.0" from @vueuse/[email protected]
npm ERR!   node_modules/@vueuse/motion
npm ERR!     dev @vueuse/motion@"^2.0.0" from the root project
npm ERR!
npm ERR! Could not resolve dependency:
npm ERR! vue-echarts@"^6.6.0" from the root project
npm ERR!
npm ERR! Conflicting peer dependency: [email protected]
npm ERR! node_modules/vue
npm ERR!   peer vue@">= 2.5 < 2.7" from @vue/[email protected]
npm ERR!   node_modules/@vue/composition-api
npm ERR!     peerOptional @vue/composition-api@"^1.0.5" from [email protected]
npm ERR!     node_modules/vue-echarts
npm ERR!       vue-echarts@"^6.6.0" from the root project
npm ERR!
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
npm ERR!

Getting KeyError: 'content' at Step 4

I've set my anthropic key, executed steps 1,2,3. At step 4:

result = run_haiku_conversion_process(task, prompt_example, response_example)
I'm getting an error:


KeyError                                  Traceback (most recent call last)
<ipython-input-4-76ae4865d687> in <cell line: 1>()
----> 1 result = run_haiku_conversion_process(task, prompt_example, response_example)

1 frames
<ipython-input-2-a6720f5aded8> in generate_candidate_prompts(task, prompt_example, response_example)
     60     response = requests.post("https://api.anthropic.com/v1/messages", headers=headers, json=data)
     61 
---> 62     response_text = response.json()['content'][0]['text']
     63 
     64     # Parse out the prompts and responses

KeyError: 'content'
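
A KeyError on 'content' usually means the API returned an error body rather than a message, and the indexing on line 62 hides what that error was. The sketch below surfaces it instead. It assumes error responses carry an "error" object with "type" and "message" fields; the helper name is hypothetical.

```python
def extract_text(payload: dict) -> str:
    """Return the first text block of a Messages API response,
    or raise with the API's own error details if there is none."""
    if "content" not in payload:
        error = payload.get("error", {})
        raise RuntimeError(
            "Anthropic API returned no content: "
            f"{error.get('type', 'unknown')}: {error.get('message', payload)}"
        )
    return payload["content"][0]["text"]


# In the notebook, this would replace the direct indexing:
#   response_text = extract_text(response.json())
ok_payload = {"content": [{"type": "text", "text": "Hello"}]}
print(extract_text(ok_payload))  # Hello
```

With this in place, a missing permission, an invalid key, or an exhausted quota shows up as a readable message rather than a KeyError.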

Is it because my Claude API key doesn't have permission?

Generating the prompts / responses...

KeyError Traceback (most recent call last)

in <cell line: 1>()
----> 1 result = run_haiku_conversion_process(task, prompt_example, response_example)

1 frames

in generate_candidate_prompts(task, prompt_example, response_example)
60 response = requests.post("https://api.anthropic.com/v1/messages", headers=headers, json=data)
61
---> 62 response_text = response.json()['content'][0]['text']
63
64 # Parse out the prompts and responses

KeyError: 'content'

Got error when running the sample notebook

I got the following error when running the sample gpt_prompt_engineer.ipynb notebook:

APIRemovedInV1:

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.
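
For reference, in openai>=1.0.0 the call goes through a client object as client.chat.completions.create(...) instead of openai.ChatCompletion.create(...). The sketch below wraps that call in a helper that takes the client as an argument, so it can be tried without the library installed; the helper name and the stand-in client are hypothetical.

```python
from types import SimpleNamespace


def chat_completion_text(client, model, system_prompt, user_prompt, **params):
    """Send one system + user message pair and return the reply text,
    using the openai>=1.0.0 client interface."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        **params,
    )
    # In the 1.x interface the reply is an object with attributes, not a dict.
    return response.choices[0].message.content


# With the real library (requires `pip install openai` and an API key):
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   chat_completion_text(client, "gpt-3.5-turbo", "You are helpful.", "Hello")

# Stand-in client so the example runs offline.
def _fake_create(model, messages, **params):
    reply = SimpleNamespace(content=f"echo: {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
print(chat_completion_text(fake_client, "gpt-3.5-turbo", "You are helpful.", "Hello"))
```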

MODEL GPT-3.5 IS NOT FOUND

Openai ChatCompletion no longer supported?

I am getting an error and I see no one else getting the error. Does anyone have any idea how to fix this?

I have tried the `pip install openai==0.28` suggestion from the error message, and it didn't work. I've also tried it in a local Jupyter notebook, but it seemed to do worse there, so I stayed on Google Colab.

Thoughts on why this tool is useful

After reading the code, my understanding is that this tool has GPT generate multiple candidate prompts and then compares the results those prompts produce. The results are still judged by GPT, so GPT is both athlete and referee. Why, then, is it effective? My guess is that there are two reasons:

De-randomization: this process selects, to some extent, the prompt with the highest probability, which is essentially a form of de-randomization.
Different contexts: GPT is given a different context as athlete and as referee, so it applies different logic in each role, and that makes the judging effective.

I want to ask if my understanding is correct, and if there are other reasons in it @mshumer

Can we add Gemini api support

I only have the free GPT-3.5 API key, and its rate limit is 3 requests per minute, so I can't finish running the last command:
RateLimitError: Rate limit reached for gpt-3.5-turbo in organization org-11tkZ8wacX4TyEJYhIFD378M on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by
Could we have an option to use Gemini 1.0 Pro? I just want to experiment. Thanks!
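
Independently of which provider is used, a low requests-per-minute limit can be worked around by waiting and retrying. The sketch below is one way to do that, assuming a rate-limit failure can be recognized from the exception text and taking the 20-second wait from the error message above as the base delay; the helper name and parameters are hypothetical.

```python
import time


def call_with_backoff(fn, max_retries=5, base_delay=20.0,
                      is_rate_limit=lambda exc: "rate limit" in str(exc).lower(),
                      sleep=time.sleep):
    """Call fn(), retrying with exponentially growing waits on rate-limit errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on unrelated errors or once the retries are exhausted.
            if attempt == max_retries or not is_rate_limit(exc):
                raise
            sleep(base_delay * (2 ** attempt))


# Demo: a call that is rate-limited twice before succeeding. The waits are
# recorded instead of slept so the example finishes immediately.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("Rate limit reached for gpt-3.5-turbo")
    return "done"

waits = []
print(call_with_backoff(flaky_call, sleep=waits.append), waits)
```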

Running generate_optimal_prompt crashes when a variable can hold a null value

I was trying to generate a prompt whose output is a JSON object in which null is a possible value. When the test cases are generated, it looks like None is used for that value and ends up inside var_dict, which causes the crash. Since null is a valid output for the JSON, replacing it with an empty string only makes the final output use an empty string instead of null.
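
One possible way to handle this, sketched under assumptions: render non-string values with json.dumps so that None becomes the JSON literal null rather than crashing or turning into an empty string. The {{NAME}} placeholder style and both function names are hypothetical and may not match how the notebook substitutes var_dict.

```python
import json


def render_value(value) -> str:
    """Render a variable for substitution: strings as-is, everything else as JSON."""
    if isinstance(value, str):
        return value
    return json.dumps(value)  # None -> "null", True -> "true", 3 -> "3"


def fill_template(template: str, var_dict: dict) -> str:
    """Replace each {{NAME}} placeholder with its rendered value."""
    for name, value in var_dict.items():
        template = template.replace("{{" + name + "}}", render_value(value))
    return template


print(fill_template("Middle name: {{MIDDLE_NAME}}", {"MIDDLE_NAME": None}))
# Middle name: null
```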

Using Anthropic's Claude API

Could you include the use of Claude from Anthropic?

Here is the API documentation:
https://docs.anthropic.com/claude/reference/getting-started-with-the-api

they have a python library as per the link below. Would it be possible with this information?

https://docs.anthropic.com/claude/reference/client-libraries
https://github.com/anthropics/anthropic-sdk-python

RATE LIMIT EXCEEDED WITH GPT-3.5-TURBO

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

What are "test_cases"?

Hi there

I think this tool could be very useful! Thx for building it.
However, when starting to use it, I did not fully understand what test_cases are and how exactly they relate to the task for which I want to create the optimal prompt.

Could someone elaborate on what to enter here?

To make it a bit more explicit, here are two examples for which I would love to try out this tool:

I would like to find the optimal prompt to generate a catchy headline for a text-based social media app called Threads
I would like to find the optimal prompt to summarize a document such that it retains a lot of the details of the original document and does not turn it into something very generic.

Thanks a lot for the help!

Structured data extraction/known results for test cases

Hi there!

First off, thanks for this - it's great and as-is it's given me some ideas for prompt design 🙏

I'm working on trying to extract dates from arbitrary text and to produce JSON, so that given an optimised system prompt, I can pass GPT-3.5 an arbitrary string and it'll produce an array that corresponds to this TypeScript schema:

type Result =
    {"Millennium": {"year": number, "metadata"?: string}} |
    {"Century": {"year": number, "metadata"?: string}} |
    {"Decade": {"year": number, "metadata"?: string}} |
    {"Year": {"year": number, "metadata"?: string}} |
    {"Month": {"year": number, "month": number, "metadata"?: string}} |
    {"Day": {"year": number, "month": number, "day": number, "metadata"?: string}} |
    {"Range": {"start": Result, "end": Result, "metadata"?: string}} |
    {"Ambiguous": Result[]} |
    {"Present": {"metadata"?: string}}

My existing prompt is a many-shot prompt where I specify how a given date string should be parsed into JSON. This prompt works pretty well, but the prompt itself is over a thousand tokens, making evaluation costly.

If you're curious, here's a subset of the examples:

2024: `[{"Year":{"year":2024}}]`
c. 2016: `[{"Year":{"year":2016}}]`
1930-1937 1942-1945: `[{"Range":{"start":{"Year":{"year":1930}},"end":{"Year":{"year":1937}}}},{"Range":{"start":{"Year":{"year":1942}},"end":{"Year":{"year":1945}}}}]`
7–12 June 1967: `[{"Range":{"start":{"Day":{"year":1967,"month":6,"day":7}},"end":{"Day":{"year":1967,"month":6,"day":12}}}}]`
16, 20-27 March 1924: `[{"Day":{"year":1924,"month":3,"day":16}},{"Range":{"start":{"Day":{"year":1924,"month":3,"day":20}},"end":{"Day":{"year":1924,"month":3,"day":27}}}}]`
12 June 1723 - 26 September 1726: `[{"Range":{"start":{"Day":{"year":1723,"month":6,"day":12}},"end":{"Day":{"year":1726,"month":9,"day":26}}}}]`
14th century: `[{"Century":{"year":1300}}]`

I was hoping to use GPE to find a more optimised prompt by using my existing many-shot examples as test cases, and presenting what they should evaluate to, and then letting GPE/its GPT instances find a prompt that satisfies those test cases and evaluates to the same value without actually specifying each example.

Unfortunately, at the time of writing, GPE only comes in two flavours - the test cases with GPT evaluation and classification with multiple-choice answers.

Using the former, I was able to find a slightly more optimal prompt prelude, but the many-shot cases are still required. The classification flavour appears to be relatively coupled to multiple-choice evaluation, so it wouldn't work for me.

For my use case, I'd like a flavour in between: test cases with known solutions, where each prompt is graded in its ability to match the solution. I was considering hacking up the classification flavour, but I wasn't sure how best to adapt the prompt to handle this.

Is this something that you think would be feasible? I figure that this might come up in other contexts, too - being able to pass a set of input/output pairs to GPE and have it optimise for the best prompt would be wonderful!

More concretely: I'd like to pass in

test_cases = [
{ 'prompt': "The Bank at Burbank", 'output': '[]' },
{ 'prompt': "Red Bull Studios, AWOLSTUDIO, Avatar Studios, Main and Market, Gymnasium, Fireside Sound Studio", 'output': '[]' },
{ 'prompt': "1980s", 'output': '[{"Decade":{"year":1980}}]' },
{ 'prompt': "3000 BC", 'output': '[{"Year":{"year":-3000}}]' },
# ...
]

and have GPE optimise a prompt that produces the given output for a prompt.
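
A sketch of what grading against known outputs could look like: parse both the model output and the expected output as JSON and compare the parsed values, so whitespace and key order do not matter. The `generate` callable stands in for running a candidate prompt on an input and is hypothetical, as is the toy extractor in the demo.

```python
import json


def outputs_match(model_output: str, expected_output: str) -> bool:
    """True if both strings parse to the same JSON value."""
    try:
        return json.loads(model_output) == json.loads(expected_output)
    except json.JSONDecodeError:
        return False  # unparseable output counts as a miss


def score_prompt(generate, test_cases) -> float:
    """Fraction of test cases where the generated output matches the expected one."""
    hits = sum(outputs_match(generate(case['prompt']), case['output'])
               for case in test_cases)
    return hits / len(test_cases)


def toy_extractor(text: str) -> str:
    """Hypothetical stand-in for a model call: only understands '1980s'."""
    return '[ {"Decade": {"year": 1980}} ]' if text == "1980s" else "[]"


demo_cases = [
    {'prompt': "The Bank at Burbank", 'output': '[]'},
    {'prompt': "1980s", 'output': '[{"Decade":{"year":1980}}]'},
    {'prompt': "3000 BC", 'output': '[{"Year":{"year":-3000}}]'},
]
print(score_prompt(toy_extractor, demo_cases))
```

The resulting score could then rank prompts directly, in the same way the classification version does, without asking a model to judge the outputs.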

Multi-line prompts are difficult to retrieve from the result table

While working on the problem described in #15, I completed one run and had a table produced of the results.

Unfortunately, the resulting prompts are multi-line and quite long, which makes the table rather unwieldy:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Prompt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |       Rating       |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
|                                                                                                                                                                                                                                                                                                                              You have to analyze a given string to identify any dates it may contain and categorize the dates according to their granularity - millennium, century, decade, year, month, or day. Once you identify these dates, you need to convert them into a structured JSON array. Each element of the array should follow the given TypeScript schema, and should contain the identified date and optionally any relevant metadata from the string.                                                                                                                                                                                                                                                                                                                              | 1297.3925847169814 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                              Your task is to process the string in a way that considers all types of date representations. Certain dates may be ranges, for instance "1920-1973", which need to be separated into "start" and "end" and categorized respectively. For dates specified in multiple calendar systems, you should prioritize and return the date in the Gregorian (New Style) calendar. If a date only specifies one half of a range (e.g., "pre-1730"), you should ignore it. When the string does not contain any dates, you should return an empty array.                                                                                                                                                                                                                                                                             |                    |
|                                                                                                                                                                               Using the given description of the use-case, please parse and convert a string that may contain dates into a specific JSON array. Each element of the array should match the structure of the TypeScript schema provided: A string can be identified as either "Millennium", "Century", "Decade", "Year", "Month", "Day", "Range", "Ambiguous", or "Present". Each category should be followed by the corresponding year, month or day, as well as an optional metadata string. If the string contains no dates, you should return an empty array. Exclude dates that form one half of a range (for example "pre-1730"). If a date is given in multiple calendar systems, select the Gregorian/NS/New Style date and disregard the others.                                                                                                                                                                              | 1243.026647667641  |
|                                                                                                                                                                                                                                                                                                                                                             Your task is to analyze strings that may contain dates of various formats, such as single days, months, years, decades, centuries, millennia, ranges, or ambiguously defined time periods, and possibly non-date elements. You should return a JSON array with elements representing each date you recognize in the string. Each element must conform to a TypeScript schema:                                                                                                                                                                                                                                                                                                                                                             | 1234.4011803833216 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - "Millennium": { "year": number, "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          - "Century": { "year": number, "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          - "Decade": { "year": number, "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           - "Year": { "year": number, "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  - "Month": { "year": number, "month": number, "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            - "Day": { "year": number, "month": number, "day": number, "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   - "Range": { "start": Result, "end": Result, "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - "Ambiguous": Result[]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  - "Present": { "metadata"?: string }                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                    Always prioritize the Gregorian/NS/New Style interpretation when multiple date systems are used. If there are no dates in the string, return an empty array. If a date is used to indicate one end of a range but the other end is not specified (e.g., "pre-2000"), do not create a "Range" object for it. Also, remember to handle ambiguous dates by creating an "Ambiguous" object with a list of plausible interpretations. Always remember to add additional non-date information in the string, if present, as metadata to the respective dating element.                                                                                                                                                                                                                                                                   |                    |
|                                                                                                                                                                                    Your task is to accurately interpret and convert a diverse set of date formats from a given string into structured JSON data. The JSON array you generate should consist of individual elements that follow specific schemas such as "Millennium", "Century", "Decade", "Year", "Month", "Day", "Range", "Ambiguous", and "Present". All of these schemas are associated with a particular year, and some additionally require month, day, or metadata. In the case of "Range", you'll be handling a start and end date which can be any of the other date schemas. If the input string has periods like "pre-700" only one part of which can be considered a valid date, ignore such instances and do not generate a 'Result'.                                                                                                                                                                                    | 1217.2035701506197 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                               For example, if a string has an ambiguous date, you should categorize it under the "Ambiguous" schema. If the string contains multiple representations of a date, stick to the Gregorian/New Style (N.S) version and discard the others.                                                                                                                                                                                                                                                                                                                                                                                                                                |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                            If the string doesn't contain any interpretable dates, return an empty array []. Treat non-date information in the input string as metadata associated with the date. Your challenge here is to use the entity representation to the best extent whilst maintaining accuracy, comprehensibility, and consistency in extracted date information.                                                                                                                                                                                                                                                                                                                                                                            |                    |
|                                                                                                                                                             Your task is to parse various string inputs for possible dates and time periods. You have to produce a JSON array where each element is categorized into either 'Millennium', 'Century', 'Decade', 'Year', 'Month', 'Day', 'Range', or 'Ambiguous'. Each category should have 'year' as a key and may include an optional 'metadata' key. The 'Range' category contains two 'Result' type objects in 'start' and 'end' keys, and an optional 'metadata'. 'Ambiguous' contains an array of 'Result' type objects. If the input has no date-like information, produce an empty array. If a date forms only part of a range, like 'pre-2020', neglect it. If a date is expressed in different calendars, use the Gregorian or New Style (NS) system and ignore the other systems.                                                                                                                                                            | 1214.5539587548917 |
|                                                      Your task involves interpreting various forms of date references from a given string and transcribing these into a specific JSON array format. The array components can be of distinct types: "Millennium", "Century", "Decade", "Year", "Month", "Day", "Range", "Ambiguous", and "Present", each with an associated 'year' attribute and an optional 'metadata'. For "Month" and "Day" types, additional 'month' and 'day' attributes are required. If the string contains a time span, represent it as a "Range" type, with 'start' and 'end' components, each having their own respective date type. If the date is unclear, encode it as "Ambiguous" with all possible interpretations. However, if a date is only part of a range (like "pre-1730"), do not convert it into the 'Result' format. For dates presented in multiple systems, focus on the Gregorian/New Style dates, disregarding the others. If there are no dates in the string, your output should be an empty array.                                                      | 1201.7252458028424 |
|                                                                                                                                                                                                                                                                                                                                                                                        Your task is to interpret a provided string which may contain dates. You have to generate a JSON array, where each element represents a date entity indicated in the string and fits into one of the defined TypeScript schema categories: Millennium, Century, Decade, Year, Month, Day, Range, Ambiguous or Present.                                                                                                                                                                                                                                                                                                                                                                                         | 1167.763957429434  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                           You have to provide year, month, or day depending on the category. Some dates might be associated with some non-date information (metadata), capture that too. However, if the input string contains no dates, you should return an empty array.                                                                                                                                                                                                                                                                                                                                                                                                                            |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                  In case of a date range, a 'start' and 'end' date should be provided, but ignore ranges that have only one end specified (i.e., "pre-1730"). If a date is presented in multiple calendar systems, rely on the Gregorian/New Style system while ignoring the others.                                                                                                                                                                                                                                                                                                                                                                                                                  |                    |
|                                                                                                                                                                                                                                                                                                                                                                                             Your task is to convert a given string into a JSON array consisting of elements that represent different timelines such as Millennium, Century, Decade, Year, Month, Day, and Range. The string can contain dates, time periods, or simply names of places or events, which you need to interpret correctly.                                                                                                                                                                                                                                                                                                                                                                                              | 1163.6446257993205 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                            For instance, if it's a specific year, you must identify it as a 'Year' with the corresponding 'year' detail. If a range of years is given, you should categorize it as 'Range' and provide 'start' and 'end' details respectively. Same goes for Millennium, Century, Decade, Month, and Day based on the information available in the string.                                                                                                                                                                                                                                                                                                                                                                            |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                                               If the string has no dates, you are expected to return an empty array. Ignore any incomplete date range information like 'pre-1730'. In case the date is mentioned in different systems - like Old Style (O.S), New Style (N.S), or Gregorian - you should always prefer the Gregorian/NS/New Style date.                                                                                                                                                                                                                                                                                                                                                                                               |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                  Remember to include any other relevant metadata in the 'metadata' field if it's available in the string (like names of places). All your outputs should conform to the schema provided in the task: 'type Result' which is a combination of various timeline types.                                                                                                                                                                                                                                                                                                                                                                                                                  |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Note: The task does not require you to calculate or infer dates, just categorize and arrange the dates provided in the string into the appropriate format.                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                    |
| As an AI, your task is to analyze a provided string which may contain dates and convert it into a structured JSON array. The dates could be in various forms, such as a specific day, month, year, decade, century, millennium, or a range of these. Each detected date should be converted into a JSON object fitting into one of the types defined in the TypeScript schema: "Millennium", "Century", "Decade", "Year", "Month", "Day", "Range", "Ambiguous", or "Present". Each type has a "year" key for the year of the date and may have a "metadata" key for extra information included in the string. "Month" and "Day" types also have "month" and "day" keys respectively, and the "Range" type has "start" and "end" keys each holding a JSON object of a date type. If a date is given in multiple date systems, only consider the Gregorian/New Style date, and ignore the others. If the string contains a date that is one half of a range such as 'pre-1730', it should not be included in the final output. In the end, if no date is detected in the string, return an empty array. | 1160.2919660816647 |
|                                                              Analyze the provided string and identify any potential date or time period information it may contain. For the found dates, classify them based on their granularity, such as "Millennium", "Century", "Decade", "Year", "Month", "Day", or a "Range" between two dates. Depending on the level of details, you should create a JSON object with the corresponding structure from the mentioned options. If the date could fall under multiple categories, create an "Ambiguous" entry with all possible interpretations. If the input string contains no identifiable date, output an empty array. For any string data associated with a date, include it as the "metadata" attribute of the corresponding JSON object. If a date is part of a range, but the other end of the range is missing (like "pre-1730"), do not generate an entry for it. When dates are provided in multiple calendar systems, prioritize the Gregorian/NS/New Style date and disregard others.                                                              | 1099.996263213283  |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+

(apologies for pasting the whole thing in, but I wanted to demonstrate how unwieldy it was)

Instead of using a table for the prompts, I'd suggest printing them out separately, so that users can easily copy and paste from the result without having to manually remove the whitespace to extricate it from the table structure.

Alternatively, I'd suggest left-aligning the table's contents. Multi-line prompts would still be annoying (you'd have to remove the table formatting yourself), but at least you'd only have to remove whitespace from one side.
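
As a rough sketch of the first suggestion, the ranked prompts could be printed one after another instead of in a table. The `ratings` dictionary (prompt text mapped to its Elo rating) and the function name `print_ranked_prompts` are assumptions for illustration, not the notebook's actual variables.

```python
def print_ranked_prompts(ratings):
    """Print each prompt as its own block, highest rating first.

    `ratings` is assumed to map prompt text to a numeric Elo rating.
    Returns the printed text so it can also be saved to a file.
    """
    ranked = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
    blocks = []
    for rank, (prompt, rating) in enumerate(ranked, start=1):
        # Rating goes on a header line; the prompt follows with no padding,
        # so it can be copied without stripping table borders or whitespace.
        blocks.append(f"#{rank} (Elo {rating:.2f})\n{prompt}")
    output = "\n\n".join(blocks)
    print(output)
    return output


# Hypothetical ratings, only to show the call.
print_ranked_prompts({"First candidate prompt": 1100.0, "Second candidate prompt": 1160.3})
```

Because every prompt starts at the left margin, multi-line prompts can be copied as-is.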

InvalidRequestError: The model: `gpt-4` does not exist

NOTE: I have a gpt-4 (paid) account

The final cell returned:

generate_optimal_prompt(description, test_cases, NUMBER_OF_PROMPTS, use_wandb)

InvalidRequestError Traceback (most recent call last)
in <cell line: 1>()
----> 1 generate_optimal_prompt(description, test_cases, NUMBER_OF_PROMPTS, use_wandb)

6 frames
/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
761 stream_error = stream and "error" in resp.data
762 if stream_error or not 200 <= rcode < 300:
--> 763 raise self.handle_error_response(
764 rbody, rcode, resp.data, rheaders, stream_error=stream_error
765 )

InvalidRequestError: The model: gpt-4 does not exist

Is it possible to reduce the computational complexity?

The current complexity is fairly high: $O(m \cdot n^2)$ for $m$ test cases and $n$ prompts, and only one optimal prompt is selected from the $n$ candidates.

total_rounds = len(test_cases) * len(prompts) * (len(prompts) - 1) // 2

Is there a way to iterate on prompts faster to get them into good shape, such as using randomization, generative adversarial methods, or genetic algorithms?
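
To make the cost concrete and to try the randomization idea, here is a minimal sketch. `total_rounds` restates the formula above; `sample_matchups` is a hypothetical helper (not part of the notebook) that plays only a random subset of the pairings, trading some ranking accuracy for fewer API calls.

```python
import itertools
import random


def total_rounds(num_test_cases, num_prompts):
    """Number of comparisons when every pair of prompts meets on every test case."""
    return num_test_cases * num_prompts * (num_prompts - 1) // 2


def sample_matchups(prompts, test_cases, budget, seed=0):
    """Return at most `budget` (prompt_a, prompt_b, test_case) matchups, chosen at random."""
    all_matchups = [
        (a, b, case)
        for a, b in itertools.combinations(prompts, 2)
        for case in test_cases
    ]
    if budget >= len(all_matchups):
        return all_matchups
    # A seeded generator keeps the sampled schedule reproducible.
    return random.Random(seed).sample(all_matchups, budget)


print(total_rounds(3, 10))  # full schedule for 3 test cases and 10 prompts
print(len(sample_matchups(list("abcdefghij"), [1, 2, 3], budget=40)))
```

With a fixed budget the cost no longer grows quadratically in the number of prompts, though ratings based on fewer games will be noisier.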

Shall we add a function to calculate Elo scores for both its generated prompts and the prompts we write ourselves?

I'd like to compare the prompts from gpt-prompt-engineer with my handwritten prompts, or just test the Elo scores of what I have written. But when I try to modify the function test_candidate_prompts, something seems to go wrong. Has anyone else had the same problem?
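
For reference, a standard Elo update looks like the sketch below. The 1200 starting rating comes from the project README; the K-factor of 32 and the function names are assumptions, and the notebook's own implementation may differ. Handwritten prompts could be rated by giving them the same starting rating as the generated ones and letting them play the same matchups.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return updated (rating_a, rating_b).

    score_a is 1 for a win by A, 0 for a loss, 0.5 for a draw.
    k=32 is an assumed K-factor.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = expected_score(rating_b, rating_a)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1 - score_a) - exp_b)
    return new_a, new_b


# Seed generated and handwritten prompts with the same starting rating.
generated = ["generated prompt 1", "generated prompt 2"]
handwritten = ["my handwritten prompt"]
ratings = {prompt: 1200 for prompt in generated + handwritten}

# Example: the handwritten prompt wins one comparison.
a, b = "my handwritten prompt", "generated prompt 1"
ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a=1)
print(ratings)
```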

NameError: name 'wandb' is not defined

Hi, very interesting, thanks for sharing! I keep running into the error below:

NameError Traceback (most recent call last)
in <cell line: 1>()
----> 1 generate_optimal_prompt(description, test_cases, 2, use_wandb)

in generate_optimal_prompt(description, test_cases, number_of_prompts, use_wandb)
119 wandb_table.add_data(prompt, rating)
120
--> 121 wandb.log({"prompt_ratings": wandb_table})
122 print(table)

NameError: name 'wandb' is not defined
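
One possible cause, assuming the notebook only imports `wandb` when logging is enabled, is that `wandb.log` is still reached when `use_wandb` is False. A guard along these lines would avoid the NameError; `log_if_enabled` is a hypothetical helper, not a function from the notebook.

```python
def log_if_enabled(payload, use_wandb):
    """Log to Weights & Biases only when requested.

    Returns True if the payload was logged, False if logging was skipped.
    """
    if not use_wandb:
        # Nothing is imported or called, so no NameError is possible here.
        return False
    import wandb  # imported lazily so the package is only needed when used

    # Assumes wandb.init() has already started a run.
    wandb.log(payload)
    return True


print(log_if_enabled({"prompt_ratings": []}, use_wandb=False))
```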

Also, could you clarify how to use the 'test cases'?

Does setting logit_bias to 100 make the model's choice between 'A' and 'B' essentially random?

logit_bias={
'32': 100, # 'A' token
'33': 100, # 'B' token
},
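
A small worked example may help here. Assuming the bias is simply added to each token's logit before sampling (as the OpenAI API documentation describes `logit_bias`), adding the same value to both tokens leaves their relative odds unchanged and only suppresses every other token. The logit values below are made up for illustration.

```python
import math


def softmax(logits):
    """Convert a dict of token logits into probabilities (numerically stable)."""
    peak = max(logits.values())
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical logits: the model mildly prefers 'A' over 'B',
# but would rather emit some other token entirely.
logits = {"A": 2.0, "B": 1.0, "other": 5.0}

# Apply the same +100 bias to 'A' and 'B' only.
biased = {tok: value + (100.0 if tok in ("A", "B") else 0.0)
          for tok, value in logits.items()}

before = softmax(logits)
after = softmax(biased)

ratio_before = before["A"] / before["B"]
ratio_after = after["A"] / after["B"]
print(ratio_before, ratio_after)   # the A-to-B odds are the same in both cases
print(after["A"] + after["B"])     # nearly all probability mass is now on A or B
```

Under that assumption the choice is not random: the model's preference between 'A' and 'B' is preserved, and the output is just restricted to those two tokens.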
I asked GPT-4 about this issue; the following is its answer:

Use prompt errors to improve the prompting

Let's say a generated prompt gives an error on a test label.

What I sometimes do manually is ask GPT-4 to explain its reasoning, and then I argue that the correct label should have been FOO.

When it concedes that it was wrong, I then ask it to rewrite the prompt so that it would have gotten this test label correct.

That would be another approach, one that is iterative rather than just randomly picking 10 prompts.

This could be extended further, a little bit based upon the idea of genetic algorithms:

Take the initial prompt, run it, find the errors.
Pick 10 random errors and ask it to rewrite the prompt based upon each.
Ask it to combine different prompts.
Iterate
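
The loop above could be sketched roughly as follows. `run_prompt` and `rewrite_prompt` are hypothetical callables standing in for the model calls (the first returns a predicted label, the second returns a revised prompt for one failed case), and the test-case format is an assumption. The "combine different prompts" step is left out to keep the sketch short; it simply keeps the rewrite with the fewest errors.

```python
import random


def count_errors(prompt, test_cases, run_prompt):
    """Number of test cases where the prompt's output differs from the label."""
    return sum(run_prompt(prompt, case["input"]) != case["label"] for case in test_cases)


def refine_prompt(prompt, test_cases, run_prompt, rewrite_prompt,
                  max_iterations=5, errors_per_round=10, seed=0):
    """Iteratively rewrite a prompt based on the test cases it gets wrong."""
    rng = random.Random(seed)
    for _ in range(max_iterations):
        errors = [case for case in test_cases
                  if run_prompt(prompt, case["input"]) != case["label"]]
        if not errors:
            break  # every test case passes
        # Pick up to `errors_per_round` random failures and request one rewrite per failure.
        sampled = rng.sample(errors, min(errors_per_round, len(errors)))
        candidates = [rewrite_prompt(prompt, case) for case in sampled]
        best = min(candidates, key=lambda p: count_errors(p, test_cases, run_prompt))
        if count_errors(best, test_cases, run_prompt) >= len(errors):
            break  # no rewrite improved on the current prompt
        prompt = best
    return prompt


# Stub callables so the sketch runs without any API access.
def fake_run(prompt, text):
    return "x" if "fixed" in prompt else "y"


def fake_rewrite(prompt, failed_case):
    return prompt + " fixed"


print(refine_prompt("base", [{"input": "t", "label": "x"}], fake_run, fake_rewrite))
```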

KeyError when accessing 'content' in API response at 12% progress

File:
claude-prompt-engineer.ipynb

Description:
When executing a specific part of the code that involves processing API responses, a KeyError is encountered, specifically when trying to access a 'content' key in the response dictionary. This issue consistently occurs approximately at 12% progress in our data processing workflow, indicating it may be related to the data returned by the API at that point.

Steps to Reproduce:

Execute the code as per the normal workflow.
Observe the progress until it reaches around 12%.
The KeyError is thrown, indicating the absence of the 'content' key in one of the API response dictionaries.

Expected Behavior:
The expectation is that the API response will always include a 'content' key, as the code relies on this key for further processing.

Actual Behavior:
At about 12% progress, the API response apparently lacks the 'content' key, leading to a KeyError. This suggests either an inconsistency in the API's response structure or an unhandled edge case in the data.

Troubleshooting Steps Attempted:

Verified that the issue consistently occurs at the same point in the data processing pipeline.
Checked for any conditional logic that could lead to this issue, but found none that applies.
Considered implementing a check for the 'content' key before accessing it, but I am seeking clarification on whether the API's response structure is guaranteed to include this key in all cases.

Questions/Clarifications Sought:

Is the 'content' key expected to be present in all API responses under normal circumstances?
Could there be specific conditions under which the API might not return this key in the response?
Would it be recommended to implement additional error handling to account for the possibility of this key's absence?

Additional Context:
This issue impacts a critical part of our data processing pipeline, and resolving it is crucial for the continuity of our operations. Any insights or recommendations on how to handle such cases or confirmations on the API's expected behavior would be greatly appreciated.

Thank you for your assistance and looking forward to your guidance on resolving this issue.
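
On the last question, a defensive check is cheap to add. One plausible explanation (an assumption, not confirmed here) is that the API occasionally returns an error payload, for example when rate limited or overloaded, and such a payload has an `error` object instead of `content`. A minimal sketch, with `extract_text` as a hypothetical helper:

```python
def extract_text(message):
    """Return the first text block of a parsed API response.

    Raises RuntimeError with the API's own error message when 'content' is
    missing, instead of failing later with a bare KeyError.
    """
    content = message.get("content")
    if not content:
        error = message.get("error") or {}
        detail = error.get("message", "no error message in response")
        raise RuntimeError(f"API response has no 'content': {detail}")
    return content[0]["text"]


# Hand-written example payloads, not captured API output.
ok = {"content": [{"type": "text", "text": "hello"}]}
print(extract_text(ok))
```

Surfacing the error message this way should also show whether retrying after a short delay would be enough.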

steam table

make steam table

JSONDecodeError

When I try to run the notebook, I keep getting this error:

"JSONDecodeError Traceback (most recent call last)
Cell In[10], line 1
----> 1 generate_optimal_prompt(description, input_variables, NUMBER_OF_TEST_CASES, NUMBER_OF_PROMPTS, use_wandb)

Cell In[7], line 182
179 if wandb.run is None:
180 start_wandb_run()
--> 182 test_cases = generate_test_cases(description, input_variables, num_test_cases)
183 prompts = generate_candidate_prompts(description, input_variables, test_cases, number_of_prompts)
184 print('Here are the possible prompts:', prompts)

Cell In[7], line 231
227 message = response.json()
229 response_text = message['content'][0]['text']
--> 231 test_cases = json.loads(response_text)
233 print('Here are the test cases:', test_cases)
235 return test_cases

File c:\Users\ABDUL RAHMAN\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
341 s = s.decode(detect_encoding(s), 'surrogatepass')
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
...
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
355 raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Unterminated string starting at: line 165 column 29 (char 5609)"

The only things I changed were the description and my API key; that's it.
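
An "Unterminated string" error deep into the response often means the JSON was cut off before it finished. One thing worth checking, as an assumption rather than a confirmed diagnosis, is whether the response hit the token limit, which the Anthropic Messages API reports as `stop_reason: "max_tokens"`. `parse_test_cases` below is a hypothetical helper:

```python
import json


def parse_test_cases(message):
    """Parse generated test cases from a response, flagging truncated output."""
    text = message["content"][0]["text"]
    if message.get("stop_reason") == "max_tokens":
        # The model ran out of tokens mid-output, so the JSON is likely incomplete.
        raise ValueError(
            "Response was cut off at max_tokens; raise max_tokens or "
            "request fewer test cases."
        )
    return json.loads(text)


# Hand-written example payloads, not captured API output.
complete = {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": '[{"input": "a"}]'}]}
print(parse_test_cases(complete))
```

If the stop reason turns out to be `max_tokens`, a longer description may simply be producing longer test cases than the default limit allows.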