
cjh0613 / userdefinedphraser


This project forked from kyan001/userdefinedphraser


Converts predefined User Defined Phrases (UDP) into the formats supported by Win10 Pinyin IME, macOS Pinyin IME (+iOS/iPadOS) and QQPinyin. Also generates HTML and JSON files for further use.

License: MIT License

Python 92.14% HTML 7.86%


UserDefinedPhraser

Doc Version: 1.1.1-20200531



Basics

  • Uses .json files as the input and converts them to the other formats.

Usage

# Quick Start
python3 run_parser.py

|-- Phrasers/  # Parser classes that decode the target format into a Python dict and encode a Python dict into the target format.
    |-- phraser.py  # Base class for all the phraser classes.
    |-- macphraser.py  # Parse macOS `.plist` file.
    |-- jsonphraser.py  # Parse `.json` file.
    |-- msphraser.py  # Parse Win10 Pinyin IME `.dat` file.
    |-- txtphraser.py  # Parse QQPinyin `.ini` file.
    |-- htmlpharser.py  # Generate `.html` file.
    |-- htmlphraser_tpl.py  # Template for `.html` file generation.
|-- Phrases/  # User Defined Phrases in JSON format, as the input to conversions.
    |-- UDP-*.json
|-- GeneratedUDP/  # Holds the generated files. They are not important and can be deleted at any time.
|-- run_parser.py  # Main entry point of the program. Converts `.json` files to other formats.

  • All Python dicts and JSON entries use the format: { 'phrase': "<PHRASE>", 'shortcut': "<SHORTCUT>" }
  • Each *Phraser class provides to_file(), from_file(), to_format*() and from_format*() methods, used to read/write files and to read/write formatted strings respectively.
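The shared interface described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's phraser.py: the class bodies are assumptions, and the to_format*() / from_format*() family is simplified here to a single pair named to_format() / from_format(). Only the {'phrase', 'shortcut'} dict shape comes from this README.

```python
import json
from pathlib import Path


class Phraser:
    """Hypothetical base class holding phrases as [{'phrase': ..., 'shortcut': ...}]."""

    def __init__(self, phrases=None):
        self.phrases = list(phrases or [])

    def to_file(self, path):
        # Encode self.phrases with the subclass format and write it out.
        Path(path).write_text(self.to_format(), encoding="utf-8")

    def from_file(self, path):
        # Read a file and decode it back into the common dict list.
        self.phrases = self.from_format(Path(path).read_text(encoding="utf-8"))
        return self.phrases

    def to_format(self):
        raise NotImplementedError  # each subclass encodes its own format

    def from_format(self, text):
        raise NotImplementedError  # each subclass decodes its own format


class JsonPhraser(Phraser):
    def to_format(self):
        # ensure_ascii=False keeps Chinese phrases readable in the output.
        return json.dumps(self.phrases, ensure_ascii=False, indent=4)

    def from_format(self, text):
        # Keep only the two documented keys of each entry.
        return [{"phrase": d["phrase"], "shortcut": d["shortcut"]} for d in json.loads(text)]


if __name__ == "__main__":
    udp = JsonPhraser([{"phrase": "你好", "shortcut": "nh"}])
    print(udp.to_format())
```

Because every format round-trips through the same list of dicts, converting between any two formats only needs one decoder and one encoder. A binary format such as the Win10 `.dat` file would need to read and write bytes instead of text in to_file() / from_file().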

Microsoft Pinyin IME

Operations

Delete

  1. System Settings → Time and Languages → Region and Languages → Chinese → Preferences → Microsoft Pinyin → Preferences
  2. Lexicon and self-learning → Add or Edit User Defined Phrases → Clear

Add

  1. System Settings → Time and Languages → Region and Languages → Chinese → Preferences → Microsoft Pinyin → Preferences
  2. Lexicon and self-learning → Add or Edit User Defined Phrases → Import
  3. Select UserDefinedPhrase.dat.

Format

  • File suffix: .dat or .lex.
  • The file uses the mschxudp binary format (named after its 8-byte signature); its layout may change with Windows updates.


File Example

# win10 1703
#           proto8                   unknown_X   version
# 00000000  6d 73 63 68 78 75 64 70  02 00 60 00 01 00 00 00  |mschxudp..`.....|
#           phrase_offset_start
#                       phrase_start phrase_end  phrase_count
# 00000010  40 00 00 00 48 00 00 00  98 00 00 00 02 00 00 00  |@...H...........|
#           timestamp
# 00000020  49 4e 06 59 00 00 00 00  00 00 00 00 00 00 00 00  |IN.Y............|
# 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
#                                                      candidate2
#           phrase_offsets[]         magic_X     phrase_offset2
# 00000040  00 00 00 00 24 00 00 00  10 00 10 00 18 00 06 06  |....$...........|
#           phrase_unknown8_X        pinyin
# 00000050  00 00 00 00 96 0a 99 20  61 00 61 00 61 00 00 00  |....... a.a.a...|
#           phrase                               magic_X
# 00000060  61 00 61 00 61 00 61 00  61 00 00 00 10 00 10 00  |a.a.a.a.a.......|
#                       phrase_unknown8_X
#                 candidate2
#           offset2                        pinyin
# 00000070  1a 00 07 06 00 00 00 00  a6 0a 99 20 62 00 62 00  |........... b.b.|
#                             phrase
# 00000080  62 00 62 00 00 00 62 00  62 00 62 00 62 00 62 00  |b.b...b.b.b.b.b.|
# 00000090  62 00 62 00 62 00 00 00                           |b.b.b...|
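  • proto8: the 8-byte signature 'mschxudp'.
  • phrase_offset_start + 4 * phrase_count == phrase_start (here 0x40 + 4 * 2 == 0x48).
  • Entry N starts at phrase_start + phrase_offsets[N] and begins with the magic bytes (10 00 10 00, i.e. 0x00100010 as a little-endian uint32, in this dump).
  • pinyin & phrase: NUL-terminated UTF-16LE strings.
  • hanzi_offset (offset2 in the dump) = 16 + len(pinyin), i.e. the 16-byte entry header plus the byte length of the pinyin including its NUL terminator (0x18 = 16 + 8 for "aaa").
  • phrase_offsets[N] + hanzi_offset + len(phrase) == phrase_offsets[N+1], with len(phrase) also counted in bytes including the NUL terminator (0 + 0x18 + 12 == 0x24).
  • candidate2: the first byte gives the phrase's candidate position.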
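The layout notes above are enough to decode the example dump. The following is a minimal read-only sketch, not the project's msphraser.py. It assumes the header fields sit at 0x10 as annotated, that each entry header is 16 bytes (magic, hanzi_offset, candidate2, 8 unknown bytes), and that other Windows versions keep this layout. The function name parse_mschxudp and the extra 'position' key are hypothetical.

```python
import struct

# The 0x98-byte example dump shown above (Win10 1703), as raw bytes.
SAMPLE = bytes.fromhex("""
6d 73 63 68 78 75 64 70 02 00 60 00 01 00 00 00
40 00 00 00 48 00 00 00 98 00 00 00 02 00 00 00
49 4e 06 59 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 24 00 00 00 10 00 10 00 18 00 06 06
00 00 00 00 96 0a 99 20 61 00 61 00 61 00 00 00
61 00 61 00 61 00 61 00 61 00 00 00 10 00 10 00
1a 00 07 06 00 00 00 00 a6 0a 99 20 62 00 62 00
62 00 62 00 00 00 62 00 62 00 62 00 62 00 62 00
62 00 62 00 62 00 00 00
""")


def parse_mschxudp(data: bytes) -> list:
    """Decode a mschxudp file into [{'phrase', 'shortcut', 'position'}]."""
    if data[:8] != b"mschxudp":
        raise ValueError("missing 'mschxudp' signature")
    # phrase_offset_start, phrase_start, phrase_end, phrase_count: uint32 LE at 0x10.
    offset_start, phrase_start, phrase_end, count = struct.unpack_from("<4I", data, 0x10)
    offsets = struct.unpack_from(f"<{count}I", data, offset_start)
    entries = []
    for i, rel in enumerate(offsets):
        entry = phrase_start + rel
        # An entry ends where the next one starts (or at phrase_end for the last one).
        end = phrase_start + offsets[i + 1] if i + 1 < count else phrase_end
        # Skip the 4-byte magic, then read hanzi_offset (uint16) and the position byte.
        hanzi_offset, position = struct.unpack_from("<4xHB", data, entry)
        # The pinyin follows the 16-byte entry header; the phrase starts at hanzi_offset.
        pinyin = data[entry + 16 : entry + hanzi_offset].decode("utf-16-le").rstrip("\x00")
        phrase = data[entry + hanzi_offset : end].decode("utf-16-le").rstrip("\x00")
        entries.append({"phrase": phrase, "shortcut": pinyin, "position": position})
    return entries


if __name__ == "__main__":
    for item in parse_mschxudp(SAMPLE):
        print(item)
```

Run against the sample, this yields the two entries "aaa" → "aaaaa" (position 6) and "bbbb" → "bbbbbbbb" (position 7). Writing a .dat file would additionally need values for the unknown and timestamp fields, which this sketch does not attempt.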

macOS Pinyin IME

Operations

Delete

  1. System Preferences → Keyboard → Text
  2. Select any entry, press ⌘A, then click - or press ⌫/delete.

Add

  1. System Preferences → Keyboard → Text
  2. Drag the *.plist files into the window, one at a time.
  • Existing phrases are not duplicated on import.

Format

  • A .plist file in XML format.

File Example

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <array>
        <dict>
            <key>phrase</key>
            <string>[word]</string>
            <key>shortcut</key>
            <string>[spell]</string>
        </dict>
        <dict>
            <key>phrase</key>
            <string>[word]</string>
            <key>shortcut</key>
            <string>[spell]</string>
        </dict>
    </array>
</plist>
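A file with this structure can be produced and read back with Python's standard plistlib module. This is a sketch of one way to do it, not necessarily how macphraser.py works; the helper names to_plist and from_plist are hypothetical.

```python
import plistlib


def to_plist(phrases) -> bytes:
    """Serialize [{'phrase': ..., 'shortcut': ...}] as an XML property list (root <array>)."""
    items = [{"phrase": p["phrase"], "shortcut": p["shortcut"]} for p in phrases]
    # sort_keys=True (the default) writes "phrase" before "shortcut", as in the example.
    return plistlib.dumps(items, fmt=plistlib.FMT_XML)


def from_plist(data: bytes):
    """Parse the XML plist back into the common dict list."""
    return [{"phrase": d["phrase"], "shortcut": d["shortcut"]} for d in plistlib.loads(data)]


if __name__ == "__main__":
    print(to_plist([{"phrase": "你好", "shortcut": "nh"}]).decode("utf-8"))
```

plistlib emits the XML declaration and the Apple DOCTYPE itself, so the header stays well-formed; its indentation uses tabs rather than the spaces shown in the example, which does not change the parsed content.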

QQ Pinyin

Operations

Delete

  1. QQPinyin → Settings → Lexicon → User Defined Phrases::Settings
  2. To select multiple entries, hold Ctrl and click them one by one.
  3. Click Delete.

Add

  1. QQPinyin → Settings → Lexicon → User Defined Phrases::Settings
  2. Click "Import" and select the *.txt file.

Format

  • A plain-text .txt file, one phrase per line.

File Example

[spell]=[position],[word]
[spell]=[position],[word]
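The line format above is simple enough to encode and decode with string operations. The sketch below is illustrative only: the helper names, the default position of 1 and the choice to skip malformed lines are assumptions, and it works on strings, leaving the file encoding QQPinyin expects unspecified.

```python
def to_qq_txt(phrases, position=1):
    """Encode [{'phrase', 'shortcut'}] as lines of '[spell]=[position],[word]'."""
    # position=1 is an assumed default; the README does not say which value to use.
    return "".join(f"{p['shortcut']}={position},{p['phrase']}\n" for p in phrases)


def from_qq_txt(text):
    """Decode '[spell]=[position],[word]' lines, skipping blank or malformed ones."""
    phrases = []
    for line in text.splitlines():
        spell, eq, rest = line.strip().partition("=")
        # Split on the first comma only, so the word itself may contain commas.
        pos, comma, word = rest.partition(",")
        if not eq or not comma or not pos.strip().isdigit():
            continue
        phrases.append({"phrase": word, "shortcut": spell, "position": int(pos)})
    return phrases


if __name__ == "__main__":
    text = to_qq_txt([{"phrase": "你好", "shortcut": "nh"}])
    print(text, from_qq_txt(text))
```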

Contributors

  • kyan001
