Comments (24)
@xuan-w @sunlin7 I've added a customization variable; give it a try.
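Right. The current rule is that the first candidate is always the first character listed in the dictionary, and the remaining candidates are reordered dynamically by usage frequency. I'm not sure whether this rule is reasonable.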
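A minimal sketch of the rule just described, as a simplified model rather than pyim's actual implementation. The function name `my-pyim-sketch-current-order` is hypothetical; it assumes candidates arrive as a list in dictionary order and that usage counts live in a hash table keyed by word.

```elisp
(defun my-pyim-sketch-current-order (dict-words counts)
  "Hypothetical sketch (not part of pyim) of the current ordering rule.
DICT-WORDS is the candidate list in dictionary order; COUNTS is a
hash table mapping a word to its usage count.  The first entry of
DICT-WORDS always stays in front; the rest are sorted by descending
count, and ties keep dictionary order because `sort' is stable."
  (when dict-words
    (cons (car dict-words)
          ;; Copy first: `sort' is destructive on lists.
          (sort (copy-sequence (cdr dict-words))
                (lambda (a b)
                  (> (gethash a counts 0) (gethash b counts 0)))))))
```

With counts of 5 for "王" and 1 for "与", the input ("一" "与" "王") becomes ("一" "王" "与"): "一" is pinned, the rest follow frequency.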
For xingma (shape-based) input methods, a fixed character order is best; I think the word order can be adjusted.
Or could you expose this sorting method, so that xingma schemes can be handled specially?
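Xingma involves both characters and words, as well as the personal and the common dictionaries, so it gets confusing. Ideally we'd find a general algorithm; if we can't, I'll split out the relevant functions so you can override them.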
Or add an option to control it.
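Personally: if all candidates are characters, follow the dictionary order. If characters and words are mixed, order the characters by the dictionary and the words by frequency. If all candidates are words, order by frequency. I remember this was discussed once before.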
Since the current version was rewritten with cl-lib, I can't really follow it and no longer know how to hack it.
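> If characters and words are mixed, order the characters by the dictionary and the words by frequency.

Do you mean characters first, then words? Or words first, then characters?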
Characters first, in the standard dictionary's order.
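The rule settled on here (characters first in dictionary order, then words by frequency) can be sketched as below. This is a hypothetical helper, not part of pyim; it assumes the same inputs as a plain list in dictionary order plus a hash table of usage counts. The three cases from the earlier comment fall out of it: an all-character list keeps dictionary order, and an all-word list is sorted purely by frequency.

```elisp
(require 'cl-lib)

(defun my-pyim-sketch-chars-then-words (dict-words counts)
  "Hypothetical sketch: single characters first, then multi-character words.
DICT-WORDS is in dictionary order and COUNTS maps a word to its usage
count.  Characters keep dictionary order; words are sorted by
descending count, with ties keeping dictionary order (`sort' is stable)."
  (let ((chars (cl-remove-if-not (lambda (w) (= (length w) 1)) dict-words))
        (words (cl-remove-if (lambda (w) (= (length w) 1)) dict-words)))
    (append chars
            ;; `cl-remove-if' may share structure with DICT-WORDS,
            ;; so copy before the destructive `sort'.
            (sort (copy-sequence words)
                  (lambda (a b)
                    (> (gethash a counts 0) (gethash b counts 0)))))))
```

For example, with counts of 9 for "工作" and 2 for "工人", the list ("工" "工人" "式" "工作") becomes ("工" "式" "工作" "工人").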
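> Since the current version was rewritten with cl-lib, I can't really follow it and no longer know how to hack it.

Basically, you add code like the following to your config: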
```elisp
(cl-defmethod pyim-candidates-create
  :extra "lld2001hack" (imobjs (scheme pyim-scheme-xingma))
  "Get candidates from IMOBJS according to SCHEME.
Used for shape-based (xingma) input methods such as Wubi and Cangjie."
  (let (result)
    (dolist (imobj imobjs)
      (let* ((codes (pyim-codes-create imobj scheme))
             (last-code (car (last codes)))
             ;; `butlast' instead of `remove': when the same code is
             ;; typed twice, `remove' would drop every copy of it.
             (other-codes (butlast codes))
             output prefix)
        ;; If wubi/aaaa -> 工 㠭; wubi/bbbb -> 子 子子孙孙; wubi/cccc -> 又 叕;
        ;; and the user types: aaaabbbbcccc
        ;; then:
        ;; 1. codes => ("wubi/aaaa" "wubi/bbbb" "wubi/cccc")
        ;; 2. last-code => "wubi/cccc"
        ;; 3. other-codes => ("wubi/aaaa" "wubi/bbbb")
        ;; 4. prefix => 工子
        (when other-codes
          ;; Join the chief (first) candidate of every code but the last.
          (setq prefix (mapconcat
                        (lambda (code)
                          (pyim-candidates-get-chief
                           scheme
                           (pyim-dcache-get code '(icode2word))
                           (pyim-dcache-get code '(code2word))))
                        other-codes "")))
        ;; 5. output => 工子又 工子叕
        (setq output
              (let* ((personal-words (pyim-dcache-get last-code '(icode2word)))
                     (personal-words (pyim-candidates--sort personal-words))
                     (common-words (pyim-dcache-get last-code '(code2word)))
                     (chief-word (pyim-candidates-get-chief
                                  scheme personal-words common-words))
                     (common-words (pyim-candidates--sort common-words))
                     (other-words (pyim-dcache-get last-code '(shortcode2word))))
                ;; Prepend the prefix to every candidate of the last code.
                (mapcar (lambda (word)
                          (concat prefix word))
                        `(,chief-word
                          ,@personal-words
                          ,@common-words
                          ,@other-words))))
        (setq output (remove "" (or output (list prefix))))
        (setq result (append result output))))
    (when (car result)
      (delete-dups result))))
```
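How do I use this? Do I put it after (require 'pyim)?
I searched around and roughly understand how overriding works, but I don't know how to write this logic. Could you help me out?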
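I've made some adjustments; please try again:

```elisp
(defun pyim-candidates--xingma-words (code)
  "Search CODE according to the rules of a xingma scheme and return the matching entries.
The current ordering rules are:
1. First, the characters from the common dictionary.
2. Then the words from all dictionaries, reordered dynamically by frequency."
  (let* ((common-words (pyim-dcache-get code '(code2word)))
         ;; Single characters from the common dictionary, in dictionary order.
         (common-chars (pyim-candidates--get-chars common-words))
         (personal-words (pyim-dcache-get code '(icode2word)))
         (other-words (pyim-dcache-get code '(shortcode2word)))
         ;; Everything else with single characters removed, sorted by frequency.
         (words-without-chars
          (pyim-candidates--sort
           (pyim-candidates--remove-chars
            (delete-dups
             `(,@personal-words
               ,@common-words
               ,@other-words))))))
    `(,@common-chars
      ,@words-without-chars)))
```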
Okay, thanks.
Also, while debugging I found that the order has already changed by the time common-words is fetched:
(pyim-dcache-get "wubi/g" '(code2word))
This outputs:
("与" "一" "王")
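My understanding is that code2word is the dictionary itself, so its order shouldn't change by default, right? By default "一" should come first (it is a level-1 short code).
After I deleted the dcache and restarted, the output was: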
("一" "与" "王")
Right, this order doesn't change: it matches the order in the dictionary file exactly, unless you have added multiple dictionaries.
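But on my machine the order really did change, and I don't know why. Could it be related to how I use pyim?
I have three machines that sync their personal dictionaries with each other. I periodically export with pyim-export-words-and-counts to an external dict file, and when Emacs starts I import into each of the three dictionaries with pyim-import-words-and-counts. Now a large number of date-stamped cache files are being generated: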
pyim-dhashcache-icode2word-backup-20220704084810
pyim-dhashcache-icode2word-backup-20220704113050
pyim-dhashcache-icode2word-backup-20220705081521
pyim-dhashcache-icode2word-backup-20220705082011
pyim-dhashcache-icode2word-backup-20220706091706
pyim-dhashcache-icode2word-backup-20220706144411
pyim-dhashcache-icode2word-backup-20220706171605
The word order is as expected now.
But backup files like the pyim-dhashcache-icode2word-backup-* ones listed above keep appearing, and I don't know why.
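This is pyim's protection mechanism for the personal dictionary cache: if the size of the personal dictionary cache changes by more than a threshold, pyim makes a backup to prevent data loss from a corrupted cache.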
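> I have three machines that sync their personal dictionaries with each other. I periodically export with pyim-export-words-and-counts to an external dict file, and when Emacs starts I import into each of the three dictionaries with pyim-import-words-and-counts.

Could this make the dictionary change a lot?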
Not sure. Generally, when the dictionary's hash-table-count changes by more than 20%, a backup is made automatically.
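To make the 20% rule concrete, here is a hypothetical check written against two `hash-table-count` values (the old and new size of the personal cache). pyim's real implementation may compute the threshold differently; this only models the behavior as described in the comment above, and treating an empty old cache as "no backup needed" is an assumption.

```elisp
(defun my-pyim-sketch-backup-needed-p (old-count new-count &optional ratio)
  "Hypothetical sketch of the backup check described above.
Return non-nil when NEW-COUNT differs from OLD-COUNT by more than
RATIO (default 0.2, i.e. 20%) of OLD-COUNT.  Both counts would come
from `hash-table-count' on the old and new cache tables."
  (let ((ratio (or ratio 0.2)))
    ;; Assumption: an empty old cache has nothing worth backing up.
    (and (> old-count 0)
         (> (abs (- new-count old-count)) (* ratio old-count)))))
```

Under this model, an import that grows a 1000-entry cache to 1300 entries (or shrinks it to 700) crosses the threshold, which may explain why a three-machine import workflow keeps producing backups.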
Okay, thanks. I'll just delete them manually from time to time.
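If manual cleanup gets tedious, a small helper can prune old backups. This is a hypothetical sketch, not a pyim command: it assumes the backups follow the `pyim-dhashcache-icode2word-backup-YYYYMMDDHHMMSS` naming shown earlier in the thread, and that you pass the directory they live in (for example `pyim-dcache-directory`, if that is where your setup writes them).

```elisp
(defun my-pyim-delete-old-backups (dir &optional keep)
  "Delete pyim icode2word backup files in DIR, keeping the newest KEEP.
Hypothetical helper: assumes names of the form
pyim-dhashcache-icode2word-backup-YYYYMMDDHHMMSS, so sorting the
names also sorts them by date.  KEEP defaults to 3.
Return the list of deleted file names."
  (let* ((keep (or keep 3))
         ;; Full paths of matching files; `directory-files' sorts them
         ;; alphabetically, which for this timestamp format is oldest first.
         (files (directory-files
                 dir t "\\`pyim-dhashcache-icode2word-backup-[0-9]+\\'"))
         ;; Everything except the newest KEEP files.
         (old (butlast files keep)))
    (mapc #'delete-file old)
    old))
```

For example, `(my-pyim-delete-old-backups pyim-dcache-directory 3)` keeps the three most recent backups and leaves the live cache file untouched, since its name does not match the pattern.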
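> If characters and words are mixed, order the characters by the dictionary and the words by frequency.
> Do you mean characters first, then words? Or words first, then characters?

As you said, different xingma users really do have different needs. For example, as a Zhengma user, I'd like the input method to ignore the character/word distinction entirely and sort only by word frequency or by dictionary order.
Zhengma itself has an extremely low collision rate, which leaves many code slots for words. For Zhengma users, 99.9% of the time someone typing nyll wants the word "自己" rather than the single character "翺", because the latter is almost always used in the word "翱翔", whose code nguy is unique and has no collisions.
Since users' needs differ, could you leave an option, at least offering "sort strictly by dictionary file order" as one of the choices?
Thanks!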
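@xuan-w I think people with special needs should just advise the function below; that's more flexible than an option.

```elisp
(defun pyim-candidates--xingma-words (code)
  "Search the xingma CODE and return the matching entries.
The current ordering rules are:
1. First, the characters from the common dictionary.
2. Then the words from all dictionaries, reordered dynamically by frequency."
  (let* ((common-words (pyim-dcache-get code '(code2word)))
         ;; Single characters from the common dictionary, in dictionary order.
         (common-chars (pyim-candidates--get-chars common-words))
         (personal-words (pyim-dcache-get code '(icode2word)))
         (other-words (pyim-dcache-get code '(shortcode2word)))
         ;; Everything else with single characters removed, sorted by frequency.
         (words-without-chars
          (pyim-candidates--sort
           (pyim-candidates--remove-chars
            (delete-dups
             `(,@personal-words
               ,@common-words
               ,@other-words))))))
    `(,@common-chars
      ,@words-without-chars)))
```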
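For the "sort strictly by dictionary file order" request, an override along these lines should work, assuming `pyim-candidates--xingma-words` keeps the one-argument signature shown above (it is an internal, double-dash function, so it may change between pyim versions). `my-pyim-xingma-words-dict-order` is a hypothetical name, and putting common-dictionary entries before personal ones is a choice, not something the thread specifies; the override only reuses the `pyim-dcache-get` calls from the function above.

```elisp
(defun my-pyim-xingma-words-dict-order (code)
  "Hypothetical override: return the candidates for CODE in dictionary order.
No character/word distinction and no frequency-based reordering:
common-dictionary entries (code2word) come first, in file order,
followed by personal (icode2word) and shortcode entries."
  ;; The trailing nil makes `append' copy every list, so the
  ;; destructive `delete-dups' never touches pyim's cached lists.
  (delete-dups
   (append (pyim-dcache-get code '(code2word))
           (pyim-dcache-get code '(icode2word))
           (pyim-dcache-get code '(shortcode2word))
           nil)))

;; Install the override once pyim is loaded.
(with-eval-after-load 'pyim
  (advice-add 'pyim-candidates--xingma-words :override
              #'my-pyim-xingma-words-dict-order))
```

Because of `with-eval-after-load`, the snippet can sit anywhere in your init file, before or after `(require 'pyim)`. To go back to the default ordering, run `(advice-remove 'pyim-candidates--xingma-words #'my-pyim-xingma-words-dict-order)`.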
@tumashu Could you put the above into the documentation? 🙏
It took me ages to find this. 🥲