
jesselau76 / ebook-gpt-translator

Stars: 1.6K · Watchers: 12 · Forks: 205 · Size: 653 KB

Enjoy reading with your favorite style.

Home Page: https://jesselau.com

License: MIT License

Python 100.00%
epub pdf python translation translator docx mobi

ebook-gpt-translator's People

Contributors

ac1982, jesselau76, kagangtuya-star, mefengl, tanquan

ebook-gpt-translator's Issues

openai.error.APIError

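When openai.error.APIError: HTTP code 502 from API occurred, the book had already been translated up to chapter six. How should I proceed so that the translation does not start over from the beginning?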
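One possible way to avoid starting over is to checkpoint each translated chunk to disk, so that a rerun skips everything already done. The sketch below is an assumption for illustration, not the project's actual mechanism: `translate_with_checkpoint`, the `translate` callback, and the JSON cache file are all hypothetical names.

```python
import hashlib
import json
import os


def translate_with_checkpoint(chunks, translate, cache_path):
    """Translate chunks in order, saving each result so a rerun can resume.

    `translate` is a hypothetical callback (str -> str) standing in for the
    real API call; `cache_path` is a JSON file used as the checkpoint.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)

    results = []
    for chunk in chunks:
        # Key by content hash so unchanged chunks are recognised on rerun.
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = translate(chunk)
            # Persist after every chunk; a crash loses at most one chunk.
            with open(cache_path, "w", encoding="utf-8") as f:
                json.dump(cache, f, ensure_ascii=False)
        results.append(cache[key])
    return results
```

If the run dies with an API error partway through, calling the function again with the same `cache_path` only sends the chunks that are not yet in the cache.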

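Runs normally but nothing appears

I installed everything normally and configured setting.cfg, then ran text_translation.py in the corresponding directory. After that nothing happened: no file was generated and no error was reported. Can anyone tell me what is going on?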

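Discussion of prompts for translation style

To the author: the translation screenshot in your introduction shows English translated into Classical Chinese. What prompt did you use? The prompt in the source code only seems to ask the model to act as GPT-4 and translate, so there must be something more, right?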

Error when importing ChatCompletion

File "text_translation.py", line 229, in translate_text
completion = openai.ChatCompletion.create(
AttributeError: module 'openai' has no attribute 'ChatCompletion'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "text_translation.py", line 363, in <module>
translated_short_text = translate_and_store(short_text)
File "text_translation.py", line 281, in translate_and_store
translated_text = translate_text(text)
File "text_translation.py", line 253, in translate_text
completion = openai.ChatCompletion.create(
AttributeError: module 'openai' has no attribute 'ChatCompletion'

How can I fix this?

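Option for a larger split length

GPT-3.5 now has a 16k model. Since individual users' API access is limited to 3 requests per minute, a larger split length would speed up translation.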

Does this code need to be changed?

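Line 300 of text_translation.py:
def return_text(text): text = text.replace(".", ".\n")

Should text.replace(".", ".\n") be changed to text.replace(". ", ".\n")?
A . is not necessarily an English full stop; it can also appear in numbers (such as 3.14) or in code (such as text.replace).
Only a . followed by a space reliably corresponds to the end of an English sentence.

I'm not sure whether my reasoning is correct and would appreciate a reply.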
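The suggestion above can be sketched with a regular expression that breaks only after a period followed by whitespace. This is a minimal illustration, not the project's code; `break_after_sentences` is a hypothetical name, and abbreviations such as "Dr. Smith" would still be split.

```python
import re


def break_after_sentences(text: str) -> str:
    """Insert a newline after each period that is followed by spaces or tabs.

    Periods inside numbers (3.14) or identifiers (text.replace) are not
    followed by whitespace, so they are left untouched.
    """
    return re.sub(r"\.[ \t]+", ".\n", text)


print(break_after_sentences("Pi is 3.14. Call text.replace now. Done."))
```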

Keeps raising an error

Traceback (most recent call last):
File "/Users/cellier/ebook-GPT-translator/text_translation.py", line 121, in <module>
config_text = f.read()
UnicodeDecodeError: 'gb2312' codec can't decode byte 0x81 in position 167: illegal multibyte sequence
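A decode failure like this usually means the file was read with the wrong codec. One hedged workaround is to read the raw bytes and try a short list of encodings in order. The sketch assumes the config file is really UTF-8 or GB18030; `read_text_with_fallback` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8-sig", "gb18030")):
    """Decode a file by trying each candidate encoding in turn.

    The encoding list is an assumption; adjust it to the files you expect.
    "utf-8-sig" also accepts plain UTF-8 and strips a BOM if present.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")
```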

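No content after translating a txt file to epub

Hello, I used this program to translate a txt file containing only English text. The final output looks like this:
[image]

There is no content here. I will try debugging it myself to see what the problem is. The text is indeed long, more than thirty thousand characters.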

Missing deps

I got ModuleNotFoundError: No module named 'pdfminer', so I ran pip install pdfminer.
Then I got ModuleNotFoundError: No module named 'pdfminer.high_level'.
Have you tested it on a fresh machine that doesn't have any Python modules installed?

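Error: too many words sent in a single translation request

Error message:
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 29968 tokens. Please reduce the length of the messages.
[image]

I counted: the failing request covered lines 1952 to 7285 of the text, 5333 lines in total.
I looked at the code, which says each request is limited to a length of 1024, but the handling of short text still has a problem and the length is not actually limited. Why are short lines merged and translated together instead of iterating over each line and translating it?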
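A middle ground between merging all short lines and translating line by line is to pack lines into batches that never exceed a fixed size. The sketch below is an assumption for illustration: `batch_lines` is a hypothetical helper, and it counts characters as a rough proxy, whereas the API limit is in tokens.

```python
def batch_lines(lines, max_chars=1024):
    """Group lines into newline-joined batches of at most max_chars characters."""
    batches, current, size = [], [], 0
    for line in lines:
        # Hard-split any single line that is longer than the limit by itself.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)] or [""]
        for piece in pieces:
            added = len(piece) + (1 if current else 0)  # +1 for the joining newline
            if current and size + added > max_chars:
                # Adding this piece would overflow: close the current batch.
                batches.append("\n".join(current))
                current, size = [], 0
                added = len(piece)
            current.append(piece)
            size += added
    if current:
        batches.append("\n".join(current))
    return batches
```

With this shape, thousands of short lines are still sent in few requests, but no single request grows beyond the chosen limit.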

How do I use the GPT-4 model?

My API account has been approved on the GPT-4 whitelist. Where can I change the default gpt-3.5 model to gpt-4?

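Invalid cross-reference (XRef) table encountered when processing a PDF file

Parsing this optimized PDF raises an error, although DeepL can process it without problems.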
https://assets.ctfassets.net/95kuvdv8zn1v/44FqPJmYPZRwiZN2socdOK/14f5eb025d87a452100d80f513567f2a/Cruise_Impact_Report_-_2022-optimized.pdf

Converting PDF to text:   0% 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 722, in __init__
    self.read_xref_from(parser, pos, self.xrefs)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 1000, in read_xref_from
    xref.load(parser)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 282, in load
    raise PDFNoValidXRef("Invalid PDF stream spec.")
pdfminer.pdfdocument.PDFNoValidXRef: Invalid PDF stream spec.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/drive/MyDrive/ebook-GPT-translator/text_translation.py", line 347, in <module>
    text = convert_pdf_to_text(filename,startpage,endpage)
  File "/content/drive/MyDrive/ebook-GPT-translator/text_translation.py", line 221, in convert_pdf_to_text
    end_page = get_total_pages(pdf_filename)
  File "/content/drive/MyDrive/ebook-GPT-translator/text_translation.py", line 217, in get_total_pages
    document = PDFDocument(parser)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 727, in __init__
    newxref.load(parser)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 241, in load
    (_, obj) = parser.nextobject()
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/psparser.py", line 609, in nextobject
    (pos, token) = self.nexttoken()
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/psparser.py", line 526, in nexttoken
    self.fillbuf()
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/psparser.py", line 239, in fillbuf
    raise PSEOF("Unexpected EOF")
pdfminer.psparser.PSEOF: Unexpected EOF

No module named 'chardet': should chardet be added to requirements.txt?

Python version 3.10.1
python pip install -r requirements.txt
The first run failed with the following error:

Traceback (most recent call last):
  File "C:\Users\ebook-GPT-translator\text_translation.py", line 114, in <module>
    import chardet
ModuleNotFoundError: No module named 'chardet'

So I then ran python -m pip install chardet, and it worked. It looks like requirements.txt needs to be updated.

However, running it still produces some warnings:

C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (5.1.0)/charset_normalizer (2.0.10) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "

This does not seem to affect usage.

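PyCharm reports an error: Unsupported file type

[image]
This is what the console currently shows.
All the packages are installed as well.
The API key is from the official OpenAI site, and the proxy address is one I picked at random. I don't know what went wrong.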

Token length problem

Traceback (most recent call last):
File "/content/ebook-GPT-translator/ebook-GPT-translator/pdf-epub-GPT-translator/text_translation.py", line 515, in <module>
translated_short_text = translate_and_store(short_text)
File "/content/ebook-GPT-translator/ebook-GPT-translator/pdf-epub-GPT-translator/text_translation.py", line 355, in translate_and_store
translated_text = translate_text(text)
File "/content/ebook-GPT-translator/ebook-GPT-translator/pdf-epub-GPT-translator/text_translation.py", line 335, in translate_text
completion = create_chat_completion(prompt, text)
File "/content/ebook-GPT-translator/ebook-GPT-translator/pdf-epub-GPT-translator/text_translation.py", line 154, in create_chat_completion
return openai.ChatCompletion.create(
File "/usr/local/lib/python3.10/dist-packages/openai/api_resources/chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 298, in request
resp, got_stream = self._interpret_response(result, stream)
File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 700, in _interpret_response
self._interpret_response_line(
File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4291 tokens. Please reduce the length of the messages.

Can this parameter be made configurable?

Improvement for Connection aborted

For long documents, such as a 600-page PDF, every run hits

openai.error.APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

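For Connection aborted, could a parameter be added to retry automatically after a disconnect and set the retry's startpage to the page where the previous run stopped?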
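The requested behaviour could look roughly like the loop below, which retries the failing page with exponential backoff and never goes back to earlier pages. This is a sketch under stated assumptions: `translate_pages_with_retry` and the `translate_page` callback are hypothetical, and `retry_on` would need to be set to whatever exception class the API client actually raises for dropped connections.

```python
import time


def translate_pages_with_retry(pages, translate_page, start_page=0,
                               max_retries=3, base_delay=1.0,
                               retry_on=(ConnectionError,), sleep=time.sleep):
    """Translate pages from start_page onward, retrying only the failing page.

    Returns a dict mapping page index to translated text.
    """
    results = {}
    page = start_page
    attempts = 0
    while page < len(pages):
        try:
            results[page] = translate_page(pages[page])
        except retry_on:
            attempts += 1
            if attempts > max_retries:
                raise  # give up; the caller can resume later from `page`
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempts - 1))
            continue  # retry the same page, not the whole document
        attempts = 0
        page += 1
    return results
```

Passing `sleep` as a parameter is only a convenience for testing; in normal use the default `time.sleep` applies.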

Fail with invalid url

It fails with the following error. Is this a configuration issue?

Invalid URL (POST /v1/chat/completions) will sleep  60 seconds
  0%|          | 0/3 [01:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/xxxx/workspace/ebook-GPT-translator/text_translation.py", line 229, in translate_text
    completion = openai.ChatCompletion.create(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_requestor.py", line 682, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: Invalid URL (POST /v1/chat/completions)

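Feature request: a possible way to implement a glossary?

I would like a glossary feature. When translating text that contains specialized terminology, GPT translates proper nouns according to its own understanding, and they then have to be corrected by hand.

Here are two possible ways to implement a glossary.
The first is to add a parameter that specifies a glossary. After splitting and before calling the API, the text to be translated is pre-processed by replacing terms according to the glossary, and one extra sentence is added to the prompt, such as "You must not replace proper nouns in [language]". The second is to leave everything else unchanged and append supplementary instructions of the form "xxx should be translated as xxx" to the prompt.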
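Both proposals can be sketched in a few lines. The code below is an illustration only: `apply_glossary`, `glossary_instructions`, and the glossary dict format are hypothetical, and the matching is plain case-sensitive substring matching.

```python
import re


def apply_glossary(text, glossary):
    """Approach 1: replace source terms with fixed translations before the API call."""
    if not glossary:
        return text
    # Longest terms first so that a longer term wins over a shorter one it contains.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    # A single pass avoids re-replacing text that was already substituted.
    return pattern.sub(lambda match: glossary[match.group(0)], text)


def glossary_instructions(text, glossary):
    """Approach 2: build extra prompt lines for the terms that occur in this chunk."""
    return "\n".join(
        f'"{source}" should be translated as "{target}".'
        for source, target in glossary.items()
        if source in text
    )


glossary = {"transformer": "Transformer模型", "attention head": "注意力头"}
print(apply_glossary("Each attention head in the transformer.", glossary))
print(glossary_instructions("The transformer is large.", glossary))
```

The second approach keeps the source text intact and only lengthens the prompt, so it costs a few extra tokens per request; the first changes the input itself and relies on the model leaving the substituted terms alone.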
