This is a misunderstanding of the way the model works. Models are 'stateless', similar to HTTP requests.
From https://platform.openai.com/docs/guides/gpt/chat-completions-api
Because the models have no memory of past requests, all relevant information must be supplied as part of the conversation history in each request. If a conversation cannot fit within the model's token limit, it will need to be shortened in some way.
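To make the statelessness concrete, here is a minimal sketch (not an actual API call) of what "supplying the conversation history in each request" looks like: the messages list a client would send grows with every turn, because nothing is remembered server-side.

```python
# Sketch: the messages list that must accompany EVERY request.
# The model is stateless, so the whole history is resent each turn.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def next_request(history, user_text, assistant_reply):
    """Record one turn; the payload for the request is the WHOLE
    history so far, not just the newest message."""
    history.append({"role": "user", "content": user_text})
    request_payload = list(history)   # everything so far gets resent
    history.append({"role": "assistant", "content": assistant_reply})
    return request_payload

r1 = next_request(history, "hello", "Hi there!")
r2 = next_request(history, "tell me more", "Sure...")
# r2 contains every message from r1 plus the new turn.
```

The second request's payload is a strict superset of the first, which is exactly why per-turn cost keeps climbing.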
If you're going to use the API, it definitely pays to study the documentation and understand both how to interact with it and what parameters are available to affect its behavior.
Your ability to have endless conversations with ChatGPT is just that app managing the conversation history for you. How it does that is internal to the app -- it could just be silently cutting off messages, or collecting older messages and rewriting them to a single summary message, or something else.
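One of those hidden strategies, "silently cutting off messages", can be sketched in a few lines. This is an illustration of the idea, not LWE's or ChatGPT's actual code, and `word_count` below is a stand-in for a real tokenizer such as tiktoken:

```python
def truncate_history(messages, max_tokens, count_tokens):
    """Naive strategy a chat app might use: drop the oldest
    messages until the remaining history fits the token budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m["content"]) for m in kept) > max_tokens:
        kept.pop(0)  # silently discard the oldest message
    return kept

# Stand-in "tokenizer" for the example; a real app would count
# actual model tokens, not words.
word_count = lambda text: len(text.split())

history = [
    {"role": "user", "content": "one two three four"},   # 4 "tokens"
    {"role": "assistant", "content": "five six seven"},  # 3 "tokens"
    {"role": "user", "content": "eight nine"},           # 2 "tokens"
]
trimmed = truncate_history(history, 6, word_count)
# The oldest message is dropped: 3 + 2 = 5 fits within the budget of 6.
```

The point is that the user never sees the drop happen; from inside the app, the conversation still looks endless.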
LWE provides you visibility into token consumption as a conversation progresses:
The first number is the current 'temperature', the second number is the token limit of the current model (both input and output), and the third number is the total number of tokens in the current conversation's history.
In that example pictured above, if I were to simply send 'hello' as the next message, that request would consume approximately 2649 tokens (2648 to pass in the conversation history, 1 for the word 'hello')
This is an important thing to keep in mind if you care about cost. As you can see, token consumption in a conversation is NOT linear: because each request resends the entire history, cumulative token use over a conversation grows roughly quadratically with the number of turns, not linearly.
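The quadratic growth follows from simple arithmetic. If each turn adds roughly t new tokens, turn n must resend all n-1 prior turns as history, so the total billed over n turns is t*(1 + 2 + ... + n) = t*n*(n+1)/2:

```python
def cumulative_tokens(turns, tokens_per_turn):
    """Total tokens billed across a conversation where turn k
    resends all k-1 prior turns plus its own tokens."""
    return sum(tokens_per_turn * k for k in range(1, turns + 1))

# 10 turns of ~100 tokens each bills 5500 tokens total, not 1000:
total = cumulative_tokens(10, 100)
```

A simplified model (it ignores the system prompt and variation in message length), but it captures why long conversations get expensive much faster than intuition suggests.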
The worst thing to do if you care about cost is to have a lot of very long conversations. So in your example, when you kept continuing the conversation, and LWE kept truncating messages, each turn was costing you nearly 8000 tokens!
Let's look at that warning message you got as an example:
Conversation exceeded max submission tokens (8192), stripped out 5 oldest messages before sending, sent 8171 tokens instead
So to send that one question, however long it was, cost 8171 tokens of input. The response from GPT-4 was cut off because there were almost no tokens left in its context window for the response (8192 - 8171 = 21 tokens).
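The arithmetic behind that cutoff, spelled out (the 8K figure is the context window of the GPT-4 model being discussed, shared between input and output):

```python
context_window = 8192   # GPT-4 8K model: input AND output share this
input_tokens = 8171     # the truncated history that was actually sent

# Whatever is left is the hard ceiling on the response length.
tokens_left_for_response = context_window - input_tokens  # 21 tokens
```

21 tokens is a sentence fragment at best, which is why the reply appeared truncated.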
So yes, with the ChatGPT app, ignorance is bliss. Not so with the API: you have to remain aware of usage and how it works.
However, I can tell you that I ONLY use the API for conversation with GPT-4. I remain aware of conversation length, keep my individual conversations focused, summarize when necessary, etc. It's really not that hard once you understand how it operates. This last month I was even running some automated testing using GPT-4, and my bill was still under $40 for the month. And for that I have no message limits, and a HIGH degree of control over the model's output.
Hope this explanation helps.
from llm-workflow-engine.
Thank you.
I appreciate the quality of your explanation -- your answer helped me understand.
I am planning a series of multi-hour conversations with GPT-4.
If I do what I think I was going to do with the OpenAI API, it will cost many hundreds if not thousands of dollars per month.
Is it possible to access the ChatGPT Plus application via lwe?
The 50 messages per 3 hours cap is okay with me as long as the cost per month is fixed.
Currently, the only way I know of to do this is within the browser.
Is it possible to do it with the command line?
I am planning a series of multi-hour conversations with GPT-4.
My previous explanation was meant to highlight that this is an illusion. The context window is 8192 tokens for input plus output. When you get beyond that, you are losing context. Just because the ChatGPT app hides it from you does not mean it's not happening.
Two side comments about my last claim:
- It's theoretically possible the GPT-4 that the ChatGPT Plus app is using is their 32K context model, but that seems extremely unlikely.
- It's also possible that you're satisfied with however the ChatGPT app is handling things when you go beyond the context window. But you also have to be OK not knowing what it's doing, or exactly what context you're losing and how.
My strong advice for you is to rethink what you're trying to accomplish now that you better understand the limitations you're working with.
Is it possible to access the ChatGPT Plus application via lwe?
See here
There also may be other open source apps that wrap the ChatGPT app. Many of the original ones were broken by OpenAI's security changes, including, finally, ours. They don't want people programmatically accessing their web app -- it's probably against their terms of service, too.
Thanks for this explanation -- it helped me understand much more deeply.
I appreciate your help. All the best!