forestwanglin / openai-java

OpenAI GPT API for Java. Includes every OpenAI API except the deprecated ones, in particular a stream client and jtokkit-based token counting that covers functions. Also includes Baidu AI.

License: MIT License

Java 100.00%
baidu chatgpt gpt-35 gpt-4 java openai openai-api openai-chatgpt

openai-java's Issues

AudioTranscription throws Unrecognized file format

Hello,
According to the documentation, the Audio API is compatible. However, upon utilizing the Java API for Audio Transcription, I encounter an error stating "Unrecognized file format." I attempted this with various file formats such as m4a, mp3, mp4, and wav.

CreateAudioTranscriptionRequest request = CreateAudioTranscriptionRequest.builder()
        .model("whisper-1")
        .filePath(filePath)
        .language("en")
        .responseFormat(ResponseFormat.TEXT)
        .build();
AudioResponse response = new OpenAiService(token, Duration.ZERO).createAudioTranscription(request);

Exception in thread "main" xyz.felh.openai.OpenAiHttpException: Unrecognized file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']
at xyz.felh.openai.OpenAiService.execute(OpenAiService.java:128)
at xyz.felh.openai.OpenAiService.createAudioTranscription(OpenAiService.java:461)
at openai.audio.SpeechToText.main(SpeechToText.java:40)
Caused by: retrofit2.adapter.rxjava3.HttpException: HTTP 400
at retrofit2.adapter.rxjava3.BodyObservable$BodyObserver.onNext(BodyObservable.java:57)
at retrofit2.adapter.rxjava3.BodyObservable$BodyObserver.onNext(BodyObservable.java:38)
at retrofit2.adapter.rxjava3.CallEnqueueObservable$CallCallback.onResponse(CallEnqueueObservable.java:62)
at retrofit2.OkHttpCall$1.onResponse(OkHttpCall.java:161)
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:519)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)

Is there any resolution available for this problem? Even a temporary workaround would be appreciated.
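A possible temporary workaround, offered as an assumption rather than a confirmed diagnosis: this error is often reported when the multipart `file` part reaches the server without a filename that ends in a supported extension. The sketch below builds the multipart body by hand so the filename is explicit. The class and method names (`MultipartSketch`, `buildBody`) are hypothetical and not part of openai-java.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of openai-java.
class MultipartSketch {

    // Builds a multipart/form-data body with a "model" field and a "file" part.
    // The file part carries an explicit filename, extension included.
    static byte[] buildBody(String boundary, String model, String fileName, byte[] audio) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"model\"\r\n\r\n"
                + model + "\r\n"
                + "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);
        byte[] tailBytes = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(audio, 0, audio.length);   // raw audio bytes, unmodified
        out.write(tailBytes, 0, tailBytes.length);
        return out.toByteArray();
    }
}
```

The resulting bytes can be sent with `java.net.http.HttpClient` via `HttpRequest.BodyPublishers.ofByteArray(body)` to `https://api.openai.com/v1/audio/transcriptions`, with the headers `Content-Type: multipart/form-data; boundary=<boundary>` and `Authorization: Bearer <token>`. If that request succeeds with the same file, the problem is likely in how the library names the uploaded part.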

Token Count Difference (Followup)

The token count for tools is still off by 1.
Request Body:

{
	"model": "gpt-4",
	"messages": [
		{
			"role": "assistant",
			"content": null,
			"tool_calls": [
				{
					"id": "call_Id8ycVMsW8gdsf7kSXfgAcf1",
					"type": "function",
					"function": {
						"name": "get_current_weather",
						"arguments": "{\n  \"location\": \"Boston, MA\"\n}"
					}
				}
			]
		},
		{
			"role": "tool",
			"tool_call_id": "call_Id8ycVMsW8gdsf7kSXfgAcf1",
			"name": "get_current_weather",
			"content": "29 degree celcius"
		}
	]
}

My count: 34
OpenAI Count: 35

My guess is that even a message with the "tool" role requires 3 tokens.

Thanks
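For reference, the per-message accounting described in the OpenAI cookbook for gpt-3.5-turbo-0613 and gpt-4 style models can be sketched as below: 3 tokens of overhead per message, 1 extra when a `name` field is present, and 3 tokens priming the reply. How `tool` messages and `tool_calls` are counted is not documented there, so applying the same 3-token overhead to them is only the hypothesis from this issue. The class name and the pluggable `tokenizer` are hypothetical; in practice the tokenizer would be a real BPE encoder.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch, not the library's implementation.
class ChatTokenEstimate {
    static final int TOKENS_PER_MESSAGE = 3; // overhead for every message
    static final int TOKENS_PER_NAME = 1;    // extra when a "name" field is present
    static final int REPLY_PRIMING = 3;      // every reply is primed with an assistant header

    // Each message is a flat map of string fields (role, content, name, ...).
    static int estimate(List<Map<String, String>> messages, ToIntFunction<String> tokenizer) {
        int total = REPLY_PRIMING;
        for (Map<String, String> message : messages) {
            total += TOKENS_PER_MESSAGE; // assumed to apply to "tool" messages as well
            for (Map.Entry<String, String> field : message.entrySet()) {
                if (field.getValue() == null) {
                    continue; // e.g. "content": null on an assistant tool call
                }
                total += tokenizer.applyAsInt(field.getValue());
                if ("name".equals(field.getKey())) {
                    total += TOKENS_PER_NAME;
                }
            }
        }
        return total;
    }
}
```

Running the same messages through this estimate with and without the per-message overhead on the `tool` message is a quick way to check whether that overhead explains the 1-token gap.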

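Parallel function calling: token count is wrong when multiple tool messages are returned

With parallel function calling, multiple messages with role tool are returned.
The model called is gpt-4-1106-preview.

I will leave the first tool calling request aside for now, since most people do not use streaming for the first tool calls request.

For the second tool calls request, the prompt token count is 1 higher than the official one.
Example: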

{
    "messages": [
        {
            "role": "system",
            "content": "你是一个数学达人"
        },
        {
            "role": "user",
            "content": "两道计算题 20+70; 30x100"
        },
        {
            "role": "assistant",
            "content": null,
            "tool_calls": [
                {
                    "id": "call_ouziXPZBrGmtdxh5BnBooKI3",
                    "type": "function",
                    "function": {
                        "name": "plus",
                        "arguments": "{\"numbers\": [20, 70]}"
                    }
                },
                {
                    "id": "call_jxP70EbkqL5L78EMVK76jFDh",
                    "type": "function",
                    "function": {
                        "name": "product",
                        "arguments": "{\"numbers\": [30, 100]}"
                    }
                }
            ]
        },
        {
            "tool_call_id": "call_ouziXPZBrGmtdxh5BnBooKI3",
            "role": "tool",
            "name": "plus",
            "content": "{\"numbers\": [20, 70], \"result\": \"90\"}"
        },
        {
            "tool_call_id": "call_jxP70EbkqL5L78EMVK76jFDh",
            "role": "tool",
            "name": "product",
            "content": "{\"numbers\": [30, 100], \"result\": \"30000\"}"
        }
    ]
}

OpenAI returns:

"usage": {
    "prompt_tokens": 121,
    "completion_tokens": 29,
    "total_tokens": 150
  }

This library returns:
prompt_tokens 122

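This library has many errors in its token counting for chat models

Comparing against the token counts returned by OpenAI, I found that almost every chat model has either a wrong calculation method or a deviating result, so I modeled and wrote my own token counting tool from scratch.

Parts of your token counting code have serious errors. Here are some of them:

  • reading vision base64 images
  • vision image scaling
  • function call, functions
  • tool call, tools
  • differences in calculation between the 0301/0314 and 0613 versions
  • the effect on the token count of combining multiple parameters (model, functions)
  • encodeOrdinary should be used so that special tokens are skipped

Because my time is limited, I cannot submit changes to the open source code. This issue is only meant to let you know that the vast majority of the token calculations are wrong; please look into it yourself when you have the time.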

Java 8 is not supported by jtokkit

This is a great project, but I can't use it because I am on Java 8:

java: cannot access xyz.felh.openai.chat.ChatMessage
bad class file: lib/jtokkit/core-4.0.2024080901.jar!/xyz/felh/openai/chat/ChatMessage.class
class file has wrong version 65.0, should be 52.0
Please remove or make sure it appears in the correct subdirectory of the classpath.
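The two numbers in the compiler message are class file major versions: 52 corresponds to Java 8 and 65 to Java 21, so the jar was compiled for Java 21. A minimal sketch for checking any class file yourself, assuming you have read its first 8 bytes (for example after extracting the `.class` file from the jar); the class name `ClassVersion` is hypothetical:

```java
import java.nio.ByteBuffer;

// Hypothetical helper for inspecting a class file header.
class ClassVersion {

    // Header layout: 4-byte magic 0xCAFEBABE, 2-byte minor version, 2-byte major version.
    static int majorVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like class files
        if (buf.getInt(0) != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        return buf.getShort(6) & 0xFFFF;
    }

    // Major version 49 is Java 5, 52 is Java 8, 65 is Java 21.
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }
}
```

A class file built for a newer release cannot be loaded by an older JVM, so the options are to run on a newer JDK or to use a build of the library that targets Java 8, if one exists.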

Token Count Difference

Thanks for implementing token count logic in Java, this is very helpful!

I found a few cases in which the count calculated was different from the actual OpenAI count. While I know this is an estimation, it would be good if you could check the logic.

  1. When an integer enum is passed as a function parameter, the count is 1 token too low.
  2. When the message contains role assistant with tool_calls and a subsequent message with role tool, 6 extra tokens are counted.

If you want, I can provide examples for the above.
Thanks!

Problem with the token counter

When I count the number of tokens, this message is cut off after formatting and is not fully counted:

{
            "role": "tool",
            "content": "{\"items\":[{\"title\":\"#MINSK NIGHTLIFE TOUR / BELARUS AFTER SANCTIONS JUNE ...\", \"link\":\"https://www.youtube.com/watch?v=d6te-xePaj0\", \"snippet\":\"Jun 26, 2022 ... Anfisa BELARUS•66K views · 16:51 · Go to channel · MANHATTAN NIGHTLIFE AREAS - PACKED BARS \u0026 CLUBS Summer Update【ENTIRE TOUR】Best ...\"}, {\"title\":\"THE 10 BEST Nightlife Activities in Minsk (Updated 2024) - Tripadvisor\", \"link\":\"https://www.tripadvisor.com/Attractions-g294448-Activities-c20-Minsk.html\", \"snippet\":\"Results 1 - 30 of 77 ... These places are best for nightlife in Minsk: RetravelMe Belarus · Party Bus · HookahPlace Yakuba Kolasa · Nuahule Krasnaya · Dictator Bar.\"}, {\"title\":\"MINSK AT NIGHT WITH @IrishPartizan - YouTube\", \"link\":\"https://www.youtube.com/watch?v=1siveAZHym0\", \"snippet\":\"Aug 5, 2023 ... scene, this video is your ultimate guide to the hottest night spots in Minsk ... MINSK NIGHTLIFE AFTER SANCTIONS / BEST 5 NIGHT BARS IN MINSK.\"}]}",
            "name": "http",
            "tool_call_id": "call_ZPSUPvPgBhNMSzI2mXDW36XZ"
        }

public static String formatArguments(String arguments) {
    List<String> lines = new ArrayList<>();
    lines.add("{");
    JSONObject jsonObject = JSONObject.parseObject(arguments);
    List<String> properties = new ArrayList<>();
    // Each top-level field is rendered as "name":value via formatValue
    for (String fieldName : jsonObject.keySet()) {
        properties.add(String.format("\"%s\":%s", fieldName, formatValue(jsonObject.get(fieldName))));
    }
    lines.add(String.join(",\n", properties));
    lines.add("}");
    return String.join("\n", lines);
}

Output:
"{
"items":["","",""]
}"
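The output above suggests that objects nested inside an array are rendered as empty strings. `formatValue` is not shown in this issue, so the following is only a sketch of a recursive version that keeps nested content. It assumes the parsed JSON is exposed as `java.util.Map` and `java.util.List` values, and the class name `ArgumentFormatter` is hypothetical. String escaping is deliberately left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the library's formatValue.
class ArgumentFormatter {

    // Recursively renders maps, lists, strings and scalars so nested objects are kept.
    static String formatValue(Object value) {
        if (value instanceof Map) {
            List<String> properties = new ArrayList<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                properties.add("\"" + entry.getKey() + "\":" + formatValue(entry.getValue()));
            }
            return "{" + String.join(",", properties) + "}";
        }
        if (value instanceof List) {
            List<String> items = new ArrayList<>();
            for (Object item : (List<?>) value) {
                items.add(formatValue(item)); // recurse instead of flattening to ""
            }
            return "[" + String.join(",", items) + "]";
        }
        if (value instanceof String) {
            return "\"" + value + "\""; // note: no escaping of quotes or backslashes
        }
        return String.valueOf(value); // numbers, booleans, null
    }
}
```

With this version, an `items` array of objects keeps its `title`, `link` and `snippet` fields, so the tokenizer sees the full tool message content.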
