
opencc-python's Introduction

Open Chinese Convert (OpenCC) in pure Python.

Introduction

opencc-python is written in pure Python and uses the dictionary files from OpenCC, which is developed by BYVoid.

opencc-python supports Python 2.7 and Python 3.x.

Installation

Copy the opencc folder into your project, or run the following (administrator privileges required):

python setup.py install

The package can also be installed from PyPI:

pip install opencc-python-reimplemented

Usage

Code

from opencc import OpenCC
cc = OpenCC('s2t')  # convert from Simplified Chinese to Traditional Chinese
# the conversion can also be changed later with set_conversion, e.g.
# cc.set_conversion('s2tw')
to_convert = '开放中文转换'
converted = cc.convert(to_convert)
print(converted)  # 開放中文轉換

Command Line

usage: python -m opencc [-h] [-i <file>] [-o <file>] [-c <conversion>]
                        [--in-enc <encoding>] [--out-enc <encoding>]

optional arguments:
  -h, --help            show this help message and exit
  -i <file>, --input <file>
                        Read original text from <file>. (default: None = STDIN)
  -o <file>, --output <file>
                        Write converted text to <file>. (default: None = STDOUT)
  -c <conversion>, --config <conversion>
                        Conversion (default: None)
  --in-enc <encoding>   Encoding for input (default: UTF-8)
  --out-enc <encoding>  Encoding for output (default: UTF-8)

Example with a UTF-8 encoded file:

  python -m opencc -c s2t -i my_simplified_input_file.txt -o my_traditional_output_file.txt

See https://docs.python.org/3/library/codecs.html#standard-encodings for the list of supported encodings.

Conversions

  • hk2s: Traditional Chinese (Hong Kong standard) to Simplified Chinese

  • s2hk: Simplified Chinese to Traditional Chinese (Hong Kong standard)

  • s2t: Simplified Chinese to Traditional Chinese

  • s2tw: Simplified Chinese to Traditional Chinese (Taiwan standard)

  • s2twp: Simplified Chinese to Traditional Chinese (Taiwan standard, with phrases)

  • t2hk: Traditional Chinese to Traditional Chinese (Hong Kong standard)

  • t2s: Traditional Chinese to Simplified Chinese

  • t2tw: Traditional Chinese to Traditional Chinese (Taiwan standard)

  • tw2s: Traditional Chinese (Taiwan standard) to Simplified Chinese

  • tw2sp: Traditional Chinese (Taiwan standard) to Simplified Chinese (with phrases)

Issues

When a word or character has more than one possible conversion, only the first one is used.
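OpenCC's dictionary text files map each source entry to one or more candidates, one entry per line. A minimal sketch of the "first candidate wins" rule above, assuming the line layout `source<TAB>candidate1 candidate2 ...`; `parse_dictionary` and `first_candidates` are hypothetical helpers, not part of opencc-python's API, and the sample lines are illustrative rather than copied from a specific dictionary file:

```python
def parse_dictionary(lines):
    """Parse 'source<TAB>candidate1 candidate2 ...' lines into {source: [candidates]}."""
    table = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        source, _, targets = line.partition("\t")
        table[source] = targets.split()  # candidates are space-separated
    return table


def first_candidates(table):
    """Keep only the first candidate of each entry, i.e. the one that gets used."""
    return {source: cands[0] for source, cands in table.items() if cands}


sample = ["干\t幹 乾 干\n", "来\t來\n"]  # hypothetical sample entries
print(first_candidates(parse_dictionary(sample)))  # {'干': '幹', '来': '來'}
```

With an entry shaped like the first sample line, a lone 干 always becomes 幹, even where 乾 or 干 would fit the context better.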

opencc-python's People

Contributors

chetien, cologler, david30907d, eugene-work, hopkins1, lchris314, seanwu1105, urain39, yichen0831


opencc-python's Issues

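One-to-many conversions

I tried to use opencc to convert a document that mixes Traditional and Simplified Chinese entirely into Traditional Chinese, and ran into a one-to-many problem.
For example, the character  is itself meaningful in Traditional Chinese,
so converting it to either  or  can be reasonable, depending on the context.
However, whenever this character appears on its own, opencc always converts it to .
Could a mode be added that, in one-to-many cases, converts only when a phrase matches and otherwise leaves the character unchanged?
For example: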

来,我買了"Whoo 漢方精萃純露甦活洗髮精 SPA Essence Shampoo"

could be converted to:

來,我買了"Whoo 漢方精萃純露甦活洗髮精 SPA Essence Shampoo"
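The mode requested above could look roughly like the following sketch. It is a hypothetical illustration, not an opencc-python feature: phrases of two or more characters are converted by longest match, single characters are converted only when they have exactly one candidate, and ambiguous characters are left as they are. The toy dictionaries are made up for the example.

```python
def convert_conservative(text, phrases, chars):
    """Convert text, leaving one-to-many single characters untouched.

    phrases: {source phrase (2+ chars): target phrase}
    chars:   {source char: [candidate, ...]}
    """
    max_phrase = max((len(k) for k in phrases), default=0)
    out = []
    pos = 0
    while pos < len(text):
        # 1) Try the longest phrase starting at pos (length >= 2).
        for length in range(min(max_phrase, len(text) - pos), 1, -1):
            key = text[pos:pos + length]
            if key in phrases:
                out.append(phrases[key])
                pos += length
                break
        else:
            # 2) No phrase matched: convert the character only if unambiguous.
            ch = text[pos]
            candidates = chars.get(ch, [])
            out.append(candidates[0] if len(candidates) == 1 else ch)
            pos += 1
    return "".join(out)


phrases = {"头发": "頭髮"}                  # hypothetical phrase entry
chars = {"来": ["來"], "发": ["發", "髮"]}  # 发 is one-to-many
print(convert_conservative("来了头发发", phrases, chars))  # 來了頭髮发
```

The trade-off of this design is that ambiguous characters outside known phrases stay unconverted, so the output may still contain Simplified characters.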

MemoryError

What is causing this error? Thanks.
python -m opencc -i wiki.zh.text -o wiki.zh.text.jian -c t2s
Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\Anaconda3\lib\site-packages\opencc\__main__.py", line 41, in <module>
    sys.exit(main())
  File "D:\ProgramData\Anaconda3\lib\site-packages\opencc\__main__.py", line 31, in main
    input_str = f.read()
  File "D:\ProgramData\Anaconda3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError
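The traceback shows the command-line tool reading the whole input with `f.read()` before converting it. A possible workaround, sketched under the assumption that the text can be converted one line at a time (phrases that span a line break will then not be matched), is to stream the file through the converter. `convert_stream` is a hypothetical helper; any `str -> str` function can be passed as `convert`:

```python
import io
from typing import Callable, TextIO


def convert_stream(src: TextIO, dst: TextIO, convert: Callable[[str], str]) -> int:
    """Convert src into dst one line at a time; return the number of lines written."""
    count = 0
    for line in src:  # iterating a text file reads it lazily, line by line
        dst.write(convert(line))
        count += 1
    return count


# Example with in-memory streams and a stand-in converter:
src = io.StringIO("来了\n来\n")
dst = io.StringIO()
convert_stream(src, dst, lambda s: s.replace("来", "來"))
print(dst.getvalue())
```

With opencc-python installed, `convert` would be `OpenCC('t2s').convert`, and `src`/`dst` would be files opened with `open(..., encoding='utf-8')` (in `'w'` mode for the output).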

pip install fails

Thanks for the project!
However, installing it with pip fails.
The error message is below:

root@b1dcb7389a1f:/code# pip3 install opencc-python-reimplemented
Collecting opencc-python-reimplemented
  Using cached opencc-python-reimplemented-0.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-a4wg0psa/opencc-python-reimplemented/setup.py", line 7, in <module>
        with open(path.join(cwd, 'README.md'), encoding='utf-8') as f:
      File "/usr/lib/python3.5/codecs.py", line 895, in open
        file = builtins.open(filename, mode, buffering)
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-build-a4wg0psa/opencc-python-reimplemented/README.md'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-a4wg0psa/opencc-python-reimplemented/
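The failure comes from `setup.py` opening `README.md`, which was missing from the downloaded source archive. One defensive pattern, sketched here as an assumption rather than the project's actual fix, is to fall back to an empty long description when the file is absent; the usual root-cause fix is to ship `README.md` in the source distribution (for example with an `include README.md` line in `MANIFEST.in`). `read_long_description` is a hypothetical helper:

```python
import io
import os


def read_long_description(base_dir, filename="README.md"):
    """Return the README contents, or '' if the file is not present."""
    path = os.path.join(base_dir, filename)
    if not os.path.exists(path):
        return ""  # e.g. README.md was not included in the source archive
    with io.open(path, encoding="utf-8") as f:  # io.open works on 2.7 and 3.x
        return f.read()


# In setup.py this would be used roughly as:
#   long_description=read_long_description(path.dirname(path.abspath(__file__)))
print(repr(read_long_description(os.getcwd(), "no-such-file.md")))
```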

opencc-python Conversion Does Not Match OpenCC

When running a conversion of "s2twp", the results for opencc-python do not always match those for OpenCC. For example:
OpenCC: "一干 " -> "一干 "
opencc-python: "一干 " -> "一幹 "

Note:
It appears that the opencc-python conversion chain does not honor the "group" tag in the configuration file.
The chain is [TWVariantsRevPhrases.txt, TWVariantsRev.txt, TWPhrasesRev.txt, TSPhrases.txt, TSCharacters.txt]
The chain should be [[TWVariantsRevPhrases.txt, TWVariantsRev.txt], TWPhrasesRev.txt, [TSPhrases.txt, TSCharacters.txt]]

I've made changes to example.py and opencc.py that appear to fix the problem. The new implementation is also about 6x faster. Because the changes are large, I've attached the modified files rather than creating a branch.

example.py.zip

opencc.py.zip
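A minimal sketch of why grouping matters for the 一干 example, assuming that a group is applied as a single pass in which, at each position, the dictionaries are tried in order and the first one with a (longest) match wins, whereas a flat chain runs each dictionary as its own pass over the previous output. The two toy dictionaries are hypothetical and only reproduce the reported symptom; they are not the contents of the real dictionary files.

```python
def convert_group(text, group):
    """One pass over text with a group of dictionaries.

    At each position the dictionaries are tried in order; the first one that has
    a (longest) match wins. Unmatched characters are copied unchanged.
    """
    max_lens = [max((len(k) for k in d), default=0) for d in group]
    out = []
    pos = 0
    while pos < len(text):
        for table, max_len in zip(group, max_lens):
            match = None
            for length in range(min(max_len, len(text) - pos), 0, -1):
                if text[pos:pos + length] in table:
                    match = text[pos:pos + length]
                    break
            if match is not None:
                out.append(table[match])
                pos += len(match)
                break
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)


def convert_chain(text, chain):
    """Apply each group in turn, feeding the output of one group into the next."""
    for group in chain:
        text = convert_group(text, group)
    return text


PHRASES = {"一干": "一干"}  # hypothetical: the phrase keeps 干
CHARACTERS = {"干": "幹"}   # hypothetical: first candidate for a lone 干

print(convert_chain("一干", [[PHRASES], [CHARACTERS]]))  # 一幹 (flat chain)
print(convert_chain("一干", [[PHRASES, CHARACTERS]]))    # 一干 (grouped)
```

In the flat chain the character pass re-converts the 干 that the phrase pass deliberately kept; inside a group, the phrase match consumes both characters, so the character dictionary never sees them.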

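Converting 内存 to Traditional Chinese

No matter which conversion I use, the result is always 内存.

I would expect at least some of them to give '記憶體'.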

RecursionError: maximum recursion depth exceeded while calling a Python object

What could cause the following recursion limit error when executing opencc? Thank you.

/usr/local/anaconda3/bin/python -m opencc -c s2t -i testset/zhidao.test.json -o zhidao.test_tw.jsonn
Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/__main__.py", line 41, in <module>
    sys.exit(main())
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/__main__.py", line 32, in main
    output_str = cc.convert(input_str)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 72, in convert
    result.append(self._convert(split_string_list[i], self._dict_chain_data))
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 102, in _convert
    tree = StringTree(self._convert("".join(tree.inorder()), c_dict, True))
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 94, in _convert
    tree.convert_tree(c_dict)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 221, in convert_tree
    self.right.convert_tree(test_dict)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 221, in convert_tree
    self.right.convert_tree(test_dict)
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 221, in convert_tree
    self.right.convert_tree(test_dict)
  [Previous line repeated 986 more times]
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 220, in convert_tree
    self.right = StringTree(self.string[i+test_len:])
  File "/usr/local/anaconda3/lib/python3.7/site-packages/opencc_python_reimplemented-0.1.5-py3.7.egg/opencc/opencc.py", line 189, in __init__
    self.string_len = len(string)
RecursionError: maximum recursion depth exceeded while calling a Python object
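The repeated `self.right.convert_tree` frames suggest that the recursion depth grows with the length of each segment handed to `_convert`, so input containing long runs of text that opencc-python does not split (which may be the case for the JSON file above) can exceed Python's default recursion limit. A possible workaround, sketched as an assumption rather than a tested fix, is to call `convert` on bounded pieces of the text. Cutting a line in the middle can split a phrase across two pieces and change its conversion, and `max_len=500` is an arbitrary choice. `iter_chunks` and `convert_in_chunks` are hypothetical helpers:

```python
from typing import Callable, Iterator


def iter_chunks(text: str, max_len: int = 500) -> Iterator[str]:
    """Yield consecutive pieces of text, each at most max_len characters, breaking at line ends first."""
    if max_len < 1:
        raise ValueError("max_len must be positive")
    for line in text.splitlines(keepends=True):  # keep line breaks so pieces rejoin exactly
        for start in range(0, len(line), max_len):
            yield line[start:start + max_len]


def convert_in_chunks(text: str, convert: Callable[[str], str], max_len: int = 500) -> str:
    """Apply convert to each piece separately and join the results."""
    return "".join(convert(piece) for piece in iter_chunks(text, max_len))


print(list(iter_chunks("abcdef\nxyz", max_len=4)))  # ['abcd', 'ef\n', 'xyz']
```

With opencc-python this would be called as `convert_in_chunks(text, OpenCC('s2t').convert)`. Raising the limit with `sys.setrecursionlimit` is another option, but the Python documentation warns that too high a limit can crash the interpreter.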
