
Comments (4)

thehunmonkgroup avatar thehunmonkgroup commented on June 12, 2024

This is a misunderstanding of the way the model works. Models are 'stateless', similar to HTTP requests.

From https://platform.openai.com/docs/guides/gpt/chat-completions-api

Because the models have no memory of past requests, all relevant information must be supplied as part of the conversation history in each request. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way.

If you're going to use the API, it definitely pays to study the documentation and understand both how to interact with it and what parameters are available to affect its behavior.

Your ability to have endless conversations with ChatGPT is just that app managing the conversation history for you. How it does that is internal to the app -- it could just be silently cutting off messages, or collecting older messages and rewriting them to a single summary message, or something else.
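A minimal sketch of what that client-side bookkeeping looks like (the role/content message shape matches the Chat Completions API; the `build_request` helper and the sample replies are purely illustrative):

```python
# Illustrative sketch: the client, not the model, carries the memory.
# Every request must include the full conversation so far.

history = []

def build_request(user_message, history):
    """Append the new user turn and assemble the full payload."""
    history.append({"role": "user", "content": user_message})
    # The payload contains ALL prior turns -- the model sees only this.
    return {"model": "gpt-4", "messages": list(history)}

req1 = build_request("hello", history)
history.append({"role": "assistant", "content": "Hi there!"})
req2 = build_request("tell me more", history)

# The second request carries the first exchange along with the new message.
print(len(req1["messages"]))  # 1
print(len(req2["messages"]))  # 3
```

If the history outgrows the context window, the app has to shrink that `messages` list somehow before sending, which is exactly where the silent truncation or summarization happens.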

LWE provides you visibility into token consumption as a conversation progresses:

[Screenshot: LWE prompt line showing temperature, model token limit, and conversation token count]

The first number is the current 'temperature', the second number is the token limit of the current model (both input and output), and the third number is the total number of tokens in the current conversation's history.

In the example pictured above, if I were to simply send 'hello' as the next message, that request would consume approximately 2649 tokens (2648 to pass in the conversation history, plus 1 for the word 'hello').

This is an important thing to keep in mind if you care about cost. As you can see, token consumption in a conversation is NOT linear. My limited mathematical understanding is that total token use in a conversation grows more quadratically than linearly.

The worst thing to do if you care about cost is to have a lot of very long conversations. So in your example, when you kept continuing the conversation, and LWE kept truncating messages, each turn was costing you nearly 8000 tokens!
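The quadratic growth is easy to see with a toy cost model. The per-turn token count below is an assumed round number, not a real measurement; the point is that each request re-sends the entire history, so cumulative usage scales with the square of the turn count:

```python
TOKENS_PER_TURN = 200  # hypothetical average for one user+assistant exchange

def cumulative_tokens(turns, per_turn=TOKENS_PER_TURN):
    """Total tokens billed across a conversation of the given length."""
    total = 0
    history = 0
    for _ in range(turns):
        total += history + per_turn  # re-send the history, plus the new turn
        history += per_turn          # the history grows every turn
    return total

print(cumulative_tokens(10))  # 11000
print(cumulative_tokens(20))  # 42000 -- 2x the turns, ~4x the tokens
```

Doubling the conversation length roughly quadruples the total spend, which is why many short, focused conversations are far cheaper than one long one.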

Let's look at that warning message you got as an example:

Conversation exceeded max submission tokens (8192), stripped out 5 oldest messages before sending, sent 8171 tokens instead

So to send that one question, however long it was, cost 8171 tokens for the input. The response from GPT-4 was cut off because there were almost no available tokens left in its context window for the response (8192 - 8171 = 21 tokens approximately).
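The arithmetic behind that cut-off response, spelled out (these numbers come straight from the warning message above):

```python
CONTEXT_WINDOW = 8192   # GPT-4 8K model: input + output share this budget
input_tokens = 8171     # tokens actually sent, per the warning message

# Whatever the input does not consume is all the room the response gets.
available_for_response = CONTEXT_WINDOW - input_tokens
print(available_for_response)  # 21 -- the reply is truncated at this point
```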

So yes, with the ChatGPT app, ignorance is bliss. Not so with the API, you have to remain aware of usage, and how it works.

However, I can tell you that I ONLY use the API for conversation with GPT-4. I remain aware of conversation length, keep my individual conversations focused, summarize when necessary, etc. It's really not that hard once you understand how it operates. This last month I was even running some automated testing using GPT-4, and my bill was still under $40 for the month. And for that I have no message limits, and a HIGH degree of control over the model's output.

Hope this explanation helps.

from llm-workflow-engine.

klebs6 avatar klebs6 commented on June 12, 2024

Thank you.

I appreciate the quality of your explanation -- your answer helped me understand.

I am planning a series of multi-hour conversations with GPT-4.

If I do what I was planning to do with the OpenAI API, it will cost many hundreds if not thousands of dollars per month.

Is it possible to access the ChatGPT Plus Application via lwe?

The 50 messages per 3 hours cap is okay with me as long as the cost per month is fixed.

Currently, the only way I know of to do this is within the browser.

Is it possible to do it with the command line?


thehunmonkgroup avatar thehunmonkgroup commented on June 12, 2024

I am planning a series of multi-hour conversations with GPT-4.

My previous explanation was meant to highlight that this is an illusion. The context window is 8192 tokens for input plus output. When you get beyond that, you are losing context. Just because the ChatGPT app hides it from you does not mean it's not happening.

Two side comments on my last claim:

  1. It's theoretically possible the GPT-4 that the ChatGPT Plus app is using is their 32K context model, but that seems extremely unlikely.
  2. It's also possible that you're satisfied with however the ChatGPT app is handling things when you go beyond the context window. But you also have to be OK not knowing what it's doing, or exactly what context you're losing and how.

My strong advice for you is to rethink what you're trying to accomplish now that you better understand the limitations you're working with.

Is it possible to access the ChatGPT Plus Application via lwe?

See here

There also may be other open source apps that wrap the ChatGPT app. Many of the original ones were broken by OpenAI's security changes, including, finally, ours. They don't want people programmatically accessing their web app -- it's probably against their terms of service, too.


klebs6 avatar klebs6 commented on June 12, 2024

Thanks for this explanation -- it helped me understand much more deeply.
I appreciate your help 🌿🌿 All the best πŸ˜ƒ

