zh-lx / pinyin-pro

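Chinese-to-pinyin conversion: tone marks, initials, finals, heteronym (polyphonic character) readings, surname readings, and pinyin matching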

Home Page: https://pinyin-pro.cn

License: MIT License

JavaScript 7.69% TypeScript 92.31%

pinyin-pro pinyin hanzi hanzi-pinyin hanzi2pinyin js-pinyin pinyin-match

pinyin-pro's Issues

Feature description

Key feature points

pinyin('汉语2拼音11', {pattern: 'first', toneType: 'none'}); // expected: h y p y, with the digits 11 and 2 removed

Bug description

Incorrect output: 哼 h ng hng

pinyin-pro version

"pinyin-pro": "^3.10.0",

(Please tell us the fastest way to reproduce this bug)

let wd = "哼";
let sheng = pinyin(wd, { pattern: "initial", toneType: "none", v: true }),
  yun = pinyin(wd, { pattern: "final", toneType: "none", v: true });
console.log(wd, sheng, yun, pinyin(wd));

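Feature description

## Install the 🇬🇧 Baidu input method 🐾 application
## https://srf.baidu.com/?c=j&e=d&from=1000e&platform=ubuntu&ref=index_entrance_android_click
sudo gdebi fcitx-baidupinyin.deb
Baidu Input Method - 礼逆袭🐧
Originally posted by @englianhu in scibrokes/mytools#1 (comment)
The shaman slays the ape and the ape destroys the shaman; I am 雷欧 of the Chinese nation, not the Monkey King or the foreigner 雷猴.
npm install pinyin-pro
# or
yarn add pinyin-pro
Author: 德育处主任
Link: 前端中文汉字转拼音 (converting Chinese characters to pinyin on the front end)
Source: 慕课网 (imooc)
This article was originally published on imooc; please credit the source when reposting. Thank you.
Originally posted by @englianhu in englianhu/binary.com-interview-question#7 (comment)
sudo apt-get install *scim* *gcin* *hime* *fcitx* -y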
《清(qīng)明(míng)》

清(qīng)明(míng)时(shí)节(jié)雨(yǔ)纷(fēn)纷(fēn)，路(lù)上(shàng)行(xíng)人(rén)欲(yù)断(duàn)魂(hún)；

借(jiè)问(wèn)酒(jiǔ)家(jiā)何(hé)处(chù)有(yǒu)，牧(mù)童(tóng)遥(yáo)指(zhǐ)杏(xìng)花(huā)村(cūn)。

Source: 小学必背古诗《清明》杜牧借问酒家何处有？牧童遥指杏花村 (the must-memorize primary school poem 《清明》 by 杜牧)

《寻(xún)隐(yǐn)者(zhě)不(bú)遇(yù)》

松(sōng)下(xià)问(wèn)童(tóng)子(zǐ)，言(yán)师(shī)採(cǎi)药(yào)去(qù)；

只(zhǐ)在(zài)此(cǐ)山(shān)中(zhōng)，云(yún)深(shēn)不(bù)知(zhī)处(chù)。

Source: 《寻隐者不遇》

Originally posted by @englianhu in scibrokes/r-world#1 (comment)

Key feature points

A **话 input method annotated with Hanyu Pinyin.

Bug description

When the input consists entirely of special characters and contains no Chinese, the error "Maximum call stack size exceeded" is reported, for example with aaa.

pinyin-pro version

3.6.0

Problem description

For example, the output is "a ya". Can it be output as "aya" instead?

Feature description

Please add support for the pinyin of classical poetry.
For example, the classical poem title 鹿柴 is read lu zhai, not lu chai.

Problem description

How can I get hao-hao-xue-xi, with the syllables separated by -?
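One possible approach, sketched under the assumption that the syllables are already available as an array (for example from the type: 'array' option used elsewhere in these issues), is to join them with the desired separator. joinSyllables is a hypothetical helper, not part of pinyin-pro.

```javascript
// Hypothetical helper: join an array of syllables with a custom separator.
function joinSyllables(syllables, separator = '-') {
  // Drop empty entries so that no doubled separators appear.
  return syllables.filter((s) => s !== '').join(separator);
}

// The array is written by hand here; in practice it would come from the library.
console.log(joinSyllables(['hao', 'hao', 'xue', 'xi'])); // hao-hao-xue-xi
```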

Bug description

Calling let nameList = pinyin("三九二",{ pattern: 'initial', type: 'array' }); returns [ 's', 'j', '' ]; the initial of the last character is wrong.

Expected behavior

[ 's', 'j', 'e' ]

pinyin-pro version

@3.3.1

Import method

const { pinyin } = require('pinyin-pro');
let nameList = pinyin("三九二",{ pattern: 'initial', type: 'array' });

Minimal reproduction steps

Step 1
const { pinyin } = require('pinyin-pro');
Step 2
let nameList = pinyin("三九二",{ pattern: 'initial', type: 'array' });

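Feature description

Suggestion: enhance the match method.

Key feature points

1. Add an option that allows matching without word segmentation. For example, match('汉语拼音', 'hanpin') should return null when segmentation is disabled.

2. Add matching against Chinese characters. For example, match('汉语拼音', '语拼') should return [1, 2]. This can be implemented with plain string matching, but it would be good if match supported it directly.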
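The plain string matching mentioned in point 2 could look roughly like the sketch below. matchHanzi is a hypothetical stand-alone helper, not the library's match function; it returns the character indices of a contiguous occurrence, or null.

```javascript
// Hypothetical helper: return the character indices of `query` inside `text`,
// or null when `query` does not occur as a contiguous substring.
function matchHanzi(text, query) {
  const chars = Array.from(text); // split by code point, not by UTF-16 unit
  const needle = Array.from(query);
  if (needle.length === 0) return null;
  for (let start = 0; start + needle.length <= chars.length; start++) {
    if (needle.every((ch, i) => chars[start + i] === ch)) {
      return needle.map((_, i) => start + i);
    }
  }
  return null;
}

console.log(matchHanzi('汉语拼音', '语拼')); // [ 1, 2 ]
```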

About the final "ü"

With pattern: final, the final returned for characters such as 序 (xu), 局 (ju), and 与 (yu) is "u".

However, the actual final of these characters is "ü".
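Until the library offers this, a caller could post-process the result. The sketch below relies on the spelling convention that a written u after j, q, x, or y stands for ü; restoreUmlaut is a hypothetical helper and assumes toneless initial and final strings.

```javascript
// Hypothetical post-processing step: after j, q, x and y, a written "u"
// stands for the sound ü, so the final can be rewritten accordingly.
// Assumes toneless input such as ('x', 'u') or ('j', 'uan').
function restoreUmlaut(initial, final) {
  if (['j', 'q', 'x', 'y'].includes(initial) && final.startsWith('u')) {
    return '\u00fc' + final.slice(1); // \u00fc is ü
  }
  return final;
}

console.log(restoreUmlaut('x', 'u'));   // ü
console.log(restoreUmlaut('j', 'uan')); // üan
console.log(restoreUmlaut('l', 'u'));   // u
```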

Speed benchmark is not comparing apples to apples

Since pinyin-pro doesn't support a segmentation feature, it seems you shouldn't compare pinyin-pro against pinyin and @napi-rs/pinyin with their segment options enabled.

The results from my computer without segment enabled in pinyin and @napi-rs/pinyin:

Hardware

OS: macOS 12.3.1 21E258 arm64
Host: MacBookPro18,2
Kernel: 21.4.0
Shell: zsh 5.8
CPU: Apple M1 Max
GPU: Apple M1 Max
Memory: 9539MiB / 65536MiB

Result

node speed.js
pinyin-pro 转换 5000 字数时间: 3.193ms
pinyin 转换 5000 字数时间: 1.835ms
@napi-rs/pinyin 转换 5000 字数时间: 0.584ms

node speed.js
pinyin-pro 转换 10000 字数时间: 9.691ms
pinyin 转换 10000 字数时间: 5.644ms
@napi-rs/pinyin 转换 10000 字数时间: 1.093ms

node speed.js
pinyin-pro 转换 1000000 字数时间: 313.71ms
pinyin 转换 1000000 字数时间: 185.946ms
@napi-rs/pinyin 转换 1000000 字数时间: 135.24ms

These results match the benchmark results published by @napi-rs/pinyin.

I have doubts about these pinyin readings

查岗: 'zhā gǎng',
查核: 'zhā hé',
查缉: 'zhā jī',

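English input stays English after conversion, but spaces appear between the letters

Use case: the input may be Chinese or English. Chinese input should be converted to pinyin joined with "_", while English input should stay unchanged. For example, "负责人" becomes "fu_ze_ren", and "changeBy" should remain "changeBy". Instead, the conversion result is c h a n g e B y, with spaces inserted automatically. When the front end then replaces the spaces with "_" using a regular expression, the result becomes "c_h_a_n_g_e_B_y". Could the conversion rules be adapted for English input?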
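A possible workaround on the caller's side is to convert only the runs of Chinese characters and pass everything else through untouched. In the sketch below, convertHanRuns is a hypothetical helper and convertHan is whatever conversion the caller supplies (for instance a wrapper around the library's pinyin function that joins syllables with "_"); a lookup table stands in for it here so the example runs on its own.

```javascript
// Hypothetical helper: apply `convertHan` only to runs of Han characters and
// leave every other run (English words, digits, punctuation) untouched.
function convertHanRuns(text, convertHan) {
  // \p{Script=Han} needs the "u" flag; the capture group keeps the Han runs
  // in the array returned by split.
  return text
    .split(/(\p{Script=Han}+)/u)
    .map((part) => (/^\p{Script=Han}+$/u.test(part) ? convertHan(part) : part))
    .join('');
}

// Stand-in converter for demonstration only.
const fakeTable = { '负责人': 'fu_ze_ren' };
const fakeConvert = (run) => fakeTable[run] ?? run;

console.log(convertHanRuns('负责人', fakeConvert));   // fu_ze_ren
console.log(convertHanRuns('changeBy', fakeConvert)); // changeBy
```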

The @napi-rs/pinyin esm support is marked as ❌

@napi-rs/pinyin does support ESM projects:

`package.json`

{
  "type": "module",
  "dependencies": {
    "@napi-rs/pinyin": "^1.7.3"
  }
}

`index.js`

import { pinyin } from '@napi-rs/pinyin'

console.log(pinyin("你好👋")) // [ 'ni', 'hao', '👋' ]

Cannot convert string of "一丁点儿\n" to Pinyin.

bug

Cannot convert the string "一丁点儿\n" to pinyin.
I would appreciate it if you could tell me the cause or solve it. 🙂

script

const { pinyin } = require('pinyin-pro')

const res = pinyin("一丁点儿\n")

error result

TypeError: Cannot read properties of undefined (reading 'result')
    at {workspace}/node_modules/pinyin-pro/dist/index.js:1:286390
    at E ({workspace}/node_modules/pinyin-pro/dist/index.js:1:286561)

The character 将 is missing the reading jiàng

feat: support HTML pinyin string output

Support HTML pinyin string output

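Bug when parsing Unicode extension Chinese characters

Bug description

String.split cannot correctly split multi-byte Unicode extension characters into an array. Each one is treated as two characters, so two garbled characters are returned.

Expected behavior

Return the original character, as happens in the normal case when no pinyin is found.

pinyin-pro version

v3.7.2

Import method

Not relevant

Minimal reproduction steps

pinyinPro.pinyin('龥') // two-byte character \u9fa5 that is not in the dictionary; returns "龥"
pinyinPro.pinyin('𧒽') // four-byte character \ud85d\udcbd that is not in the dictionary; returns "� �"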
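The difference between splitting by UTF-16 code unit and by code point can be seen in isolation. The sketch below is not the library's code; toCodePoints is a hypothetical helper that uses Array.from, which iterates by code point.

```javascript
const extended = '\ud85d\udcbd'; // the four-byte character from the report

console.log(extended.split('').length);   // 2: two lone surrogates
console.log(Array.from(extended).length); // 1: one code point

// Hypothetical helper: split text into whole characters (code points).
function toCodePoints(text) {
  return Array.from(text);
}
```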

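Suggestion: when returning heteronym readings for multiple characters, output them as arrays instead of limiting this to a single character

Feature description

As the title says, similar to https://www.npmjs.com/package/pinyin

console.log(pinyin("中心", {
  heteronym: true               // enable heteronym mode
}));                            // [ [ 'zhōng', 'zhòng' ], [ 'xīn' ] ]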

Bug: in non-full-pinyin modes, non-Chinese characters are removed

Bug description

const { pinyin } = require("pinyin-pro");

console.log(pinyin('foo/汉字/bar/123', {
	pattern: 'pinyin',
	toneType: 'none',
	type: 'array',
	nonZh: 'spaced'
})); 
/*
This is correct
[
  'f',   'o',  'o', '/',
  'han', 'zi', '/', 'b',
  'a',   'r',  '/', '1',
  '2',   '3'
]
*/
console.log(pinyin('foo/汉字/bar/123', {
	pattern: 'initial',
	toneType: 'none',
	type: 'array',
	nonZh: 'spaced'
}));
/*
[
  'f', '', '',  '', 'h',
  'z', '', 'b', '', 'r',
  '',  '', '',  ''
]
*/

console.log(pinyin('foo/汉字/bar/123', {
	pattern: 'first',
	toneType: 'none',
	type: 'array',
	nonZh: 'spaced'
}));

/*
[
'f', 'o', 'o', '/',
'h', 'z', '/', 'b',
'a', 'r', '/', '1',
'2', '3'
]
*/

Many thanks to the author. Your implementation is very efficient, but it has a few small flaws.

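Error when requesting heteronym readings for a single letter

For example, the following code

pinyin("a", { multiple: true, type: "array" })

throws

Cannot read property 'split' of undefined

Suggestion: when no pinyin is found in dict, return the original string directly.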

When multiple: true and toneType: none are combined, duplicate readings appear and should be deduplicated

Bug description

Expected behavior

Output only one hao
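Deduplication after tone removal can also be done on the caller's side. The following is a minimal sketch, not the library's implementation: stripTone and uniqueToneless are hypothetical helpers that remove only the four tone marks (so ü keeps its diaeresis) and then drop repeated readings.

```javascript
// Combining marks for the four tones: macron, acute, caron, grave.
const TONE_COMBINING = /[\u0304\u0301\u030c\u0300]/g;

// Decompose, remove only the tone marks, and recompose.
function stripTone(syllable) {
  return syllable.normalize('NFD').replace(TONE_COMBINING, '').normalize('NFC');
}

// Strip tones from every reading and keep each toneless form once.
function uniqueToneless(readings) {
  return [...new Set(readings.map(stripTone))];
}

console.log(uniqueToneless(['h\u01ceo', 'h\u00e0o'])); // [ 'hao' ]
```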

Add exports to package.json

Feature description

This package currently does not declare an exports field in package.json, so importing it from ESM produces an error

similar to the following:

SyntaxError: Named export 'xx' not found. The requested module 'xx' is a CommonJS module, which may not support all module.exports as named exports.
CommonJS modules can always be imported via the default export, for example using:

import pkg from 'xx';
const { b, a } = pkg;

This is the error I get in Nuxt, which uses Vite by default

@acme/nuxt:dev:  ERROR  [worker reload] [worker init] The requested module 'file:///Users/baboon/iduo/iduo-scheme-store/node_modules/pinyin-pro/dist/index.js' does not provide an export named 'pinyin'
@acme/nuxt:dev: 
@acme/nuxt:dev:   import { pinyin } from '/Users/baboon/iduo/iduo-scheme-store/node_modules/pinyin-pro/dist/index.js';
@acme/nuxt:dev:   ^^^^^^
@acme/nuxt:dev:   SyntaxError: The requested module '/Users/baboon/iduo/iduo-scheme-store/node_modules/pinyin-pro/dist/index.js' does not provide an export named 'pinyin'
@acme/nuxt:dev:   at ModuleJob._instantiate (node:internal/modules/esm/module_job:124:21)
@acme/nuxt:dev:   at async ModuleJob.run (node:internal/modules/esm/module_job:190:5)
@acme/nuxt:dev:

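This is also the mainstream approach for libraries that ship multiple output formats.
Vue does this: https://github.com/vuejs/core/blob/main/packages/vue/package.json#L22-L70
and so do some other libraries.

If you do not have time, I am willing to help with this.

Finally, thank you for maintaining this library. It has saved many developers a lot of time 😄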

Please add support for the following characters and fix a bug

1. Characters to add
'乥':'hol',
'乲':'cɑ lo',
'兙':'shi2 ke3',
'兛':'qian1 ke4',
'兝':'gong1 fen1',
'兞':'hao2 ke4',
'兡':'bai3 ke4',
'兣':'gong1 li3',
'兺':'bun1',
'叾':'du1',
'唜':'mas5',
'嗧':'jia1 lun2',
'怾':'gi0',
'朩':'pin4',
'烪':'zhen4',
'瓧':'shi2 wa3',
'瓩':'qian1 wa3',
'瓰':'fen1 wa3',
'瓱':'mao2 wa3',
'瓲':'tun2 wa3',
'瓼':'li3 wa3',
'甅':'li2 wa3',
'硛':'ceok0',
'莻':'neus0',
'襨':'e0',
'覅':'fiao4',
'迲':'ke0',
'黁':'nun2',
'龥':'yu4',

2. Bug: for the character 呣, num mode outputs the wrong result. It should be 'm2', but the actual result is 'ḿ'
pinyinPro.pinyin('呣', { toneType: 'num' }) // ḿ

Wrong pinyin for an idiom

Bug description

pinyin('价廉物美') returns jià lián wù měi, which is correct.

But pinyin('物美价廉') also returns jià lián wù měi, which is incorrect (wrong order).

Expected behavior

pinyin('物美价廉') should return wù měi jià lián.

pinyin-pro version

v3.7.1

Import method

Tested in both browser and Node environments. For the browser, ViteJS is used.

Minimal reproduction steps

var {pinyin} = require("pinyin-pro")
console.log(pinyin('物美价廉') + ' -- ' + pinyin('价廉物美'))

Here's a RunKit: https://runkit.com/cheeaun/6205478538a7f900084c18ca

Notes: I'm not sure if this affects other idioms too. I found that this idiom is inside main/data/dict3.ts, and the library seems to only check whether the characters exist without looking at their indices.
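To illustrate the reporter's point about indices, here is a minimal sketch of an order-sensitive phrase lookup. It is not taken from the library, whose actual matching logic is not shown in this report; phraseDict and findPhrases are hypothetical.

```javascript
// Hypothetical phrase dictionary with a single entry.
const phraseDict = { '价廉物美': 'jià lián wù měi' };

// Return the phrases that occur in `text` as contiguous substrings,
// together with the index where each one starts.
function findPhrases(text, dict) {
  const hits = [];
  for (const phrase of Object.keys(dict)) {
    const index = text.indexOf(phrase);
    if (index !== -1) hits.push({ phrase, index, pinyin: dict[phrase] });
  }
  return hits;
}

console.log(findPhrases('价廉物美', phraseDict).length); // 1
console.log(findPhrases('物美价廉', phraseDict).length); // 0: same characters, different order
```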

Test results on 600+ entries: performance may be a concern, about 3-4x slower than pinyin

Problem description

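Is Traditional Chinese fully supported?

Problem description

I tried it and Traditional characters are supported. Other libraries (such as https://github.com/hotoo/pinyin ) mention "simple Traditional Chinese support". Does pinyin-pro support Traditional characters "completely"?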

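Add a way to get the medial and the ending of a final separately

For example, with pattern: final the result for the character 状 is "uang", which consists of the medial "u" and the ending "ang". I would like a new feature that separates these two parts of the final.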
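A rough sketch of such a split, assuming a toneless final as input and the simplified rule that a leading i, u, or ü counts as a medial only when another vowel follows it. splitFinal is a hypothetical helper, not an existing library function.

```javascript
const MEDIALS = ['i', 'u', '\u00fc']; // \u00fc is ü
const VOWELS = ['a', 'e', 'i', 'o', 'u', '\u00fc'];

// Hypothetical helper: split a toneless final into its medial and the rest.
function splitFinal(final) {
  const first = final[0];
  const second = final[1];
  if (final.length >= 2 && MEDIALS.includes(first) && VOWELS.includes(second)) {
    return { medial: first, rest: final.slice(1) };
  }
  return { medial: '', rest: final };
}

console.log(splitFinal('uang')); // { medial: 'u', rest: 'ang' }
console.log(splitFinal('ang'));  // { medial: '', rest: 'ang' }
```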

Could a "surname" option be added?

Feature description

Some surname characters are heteronyms, for example:

const pinyin = require('pinyin-pro');
pinyin.pinyin('曾', { multiple: true, type: 'array' });
[ 'céng', 'zēng' ]

Could a feature like the following be provided?

const pinyin = require('pinyin-pro');
pinyin.pinyin('曾', { multiple: true, type: 'array', kind: 'lastname' });
[ 'zēng' ]

Key feature points

Support returning the surname reading

BUG: customPinyin has no effect when used together with the multiple option

Bug description

customPinyin has no effect when used together with the multiple option:

pinyinPro.customPinyin({
  嗯: 'en',
});
console.log(
  pinyinPro.pinyin('嗯', {
    multiple: true,
    type: 'array',
    nonZh: 'removed',
    toneType: 'num',
  })
);

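Custom pinyin has no effect in surname mode

Bug description

In surname mode, a custom pinyin entry that contains a surname does not take effect.

Expected behavior

Custom pinyin should be checked before surnames.

pinyin-pro version

3.6.1

Import method

esmodule

Minimal reproduction steps

customPinyin({
  乐嘉: 'lè jiā',
  乐毅: 'yuè yì'
})
pinyin('乐嘉') // outputs "lè jiā"
pinyin('乐嘉', { mode: 'surname' }) // outputs "yuè jiā"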

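Some strings containing erhua (儿) phrases may throw during conversion

Bug description

For strings containing any of the erhua phrases below, conversion always throws if the phrase is followed by any character (including a space or a newline).
(With mode: "surname", 那阵儿 and 那会儿 do not throw because they contain a surname.)

有点儿
压根儿
那会儿
自个儿
好好儿
挨个儿
那阵儿
大婶儿
个头儿
一丁点儿

pinyin-pro version

"pinyin-pro": "^3.13.0"

Import method

CommonJS (Node require)

Minimal reproduction steps

https://pinyin-pro.cn/run/run.html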

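Can getting the pinyin first letter directly be supported?

I want to build a city list that can be searched by first letter. Currently I get the pinyin and take the first character of the string. Could getting the first letter directly be supported?

Output first letters

Problem description

Can the spaces be removed from the output letters?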

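Handling of heteronyms

Thanks to the author for the contribution 🌹🌹🌹🌹

pinyin('李乐', { toneType: 'none' }); // 'li le'

● I would also like to get the other possible readings, for example both 'li le' and 'li yue' for "李乐"
● The project I work on has a quick search by pinyin first letters. With my previous approach it often failed to match the "会计" feature, because the program assumed the reading was hui ji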
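If the candidate readings of each character are available as lists (for example, collected character by character with the multiple option mentioned in other issues), every full reading can be built as a Cartesian product. The sketch below assumes that input shape; readingCombinations is a hypothetical helper.

```javascript
// Hypothetical helper: build every combination from per-character reading lists.
function readingCombinations(readingsPerChar) {
  return readingsPerChar
    .reduce(
      (acc, readings) => acc.flatMap((prefix) => readings.map((r) => [...prefix, r])),
      [[]],
    )
    .map((syllables) => syllables.join(' '));
}

// Reading lists written by hand for 李 and 乐.
console.log(readingCombinations([['li'], ['le', 'yue']])); // [ 'li le', 'li yue' ]
```

The number of combinations grows multiplicatively with the number of heteronyms, so for long strings it may be preferable to index only first letters.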

Cannot be used in WeChat Mini Programs

When used in a WeChat Mini Program, the whole program crashes.

Error when the input contains "嗯"

Bug description

pinyin("阿斯蒂芬嗯", {pattern: 'first', toneType: 'none',})
Even with toneType set to none, the result still contains a tone mark: a s d f ň

pinyin-pro version

3.6.1

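About the final `ü`

Is there any plan to support an option that converts the final ü to v, or even to convert ü to v by default? The character ü causes problems in many environments.
For example

pinyin('吕布', { toneType: 'none' }) // 'lü bu'

Of course, this can obviously be done with a string replacement.
It would be even better if it were supported directly.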
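The string replacement mentioned above could be as small as the following sketch. umlautToV is a hypothetical helper and assumes toneless output such as 'lü bu'; toned forms like ǚ are left untouched.

```javascript
// Hypothetical helper: replace ü with v (and Ü with V) in toneless pinyin.
function umlautToV(pinyinText) {
  return pinyinText
    .normalize('NFC') // make sure ü is a single character before replacing
    .replace(/\u00fc/g, 'v')  // ü
    .replace(/\u00dc/g, 'V'); // Ü
}

console.log(umlautToV('l\u00fc bu')); // lv bu
```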

Suggestion: add conversion from letters plus tone numbers to pinyin

For example
nv3, hai2, er -> nǚ, hái, ér
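A minimal sketch of such a conversion, assuming lowercase syllables with an optional trailing tone digit and "v" standing in for ü. numberedToMarked is a hypothetical helper; it uses the usual placement rule (a, then e, then the o of "ou", otherwise the last vowel) and leaves syllables without a tone number unmarked, so 'er' stays 'er'.

```javascript
// Tones 1-4 as combining marks: macron, acute, caron, grave.
const TONE_MARK = ['\u0304', '\u0301', '\u030c', '\u0300'];

// Hypothetical helper: convert e.g. "hai2" to "hái".
function numberedToMarked(syllable) {
  const match = /^([a-z\u00fc]+)([0-5])?$/.exec(syllable.toLowerCase());
  if (!match) return syllable; // not a syllable this sketch understands
  const base = match[1].replace(/v/g, '\u00fc'); // "v" is a common stand-in for ü
  const tone = Number(match[2] ?? 0);
  if (tone < 1 || tone > 4) return base; // neutral tone or no number: no mark
  // Placement rule: a, then e, then the o of "ou", otherwise the last vowel.
  let index = base.search(/[ae]/);
  if (index === -1) index = base.indexOf('ou');
  if (index === -1) {
    for (let i = base.length - 1; i >= 0; i--) {
      if ('iou\u00fc'.includes(base[i])) {
        index = i;
        break;
      }
    }
  }
  if (index === -1) return base; // no vowel found
  // Insert the combining mark after the vowel and compose the result.
  return (base.slice(0, index + 1) + TONE_MARK[tone - 1] + base.slice(index + 1)).normalize('NFC');
}

console.log(['nv3', 'hai2', 'er'].map(numberedToMarked).join(', '));
```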

The CDN no longer works; suggest switching to another one

Problem description

Error when converting English text

test case:
console.log(pinyin('xxxxx', { toneType: 'none', type: 'array' }));
error:
TypeError: Cannot read property 'xxxxx' of undefined

略 (lue)

Bug description

For "略" we type "lue", but the only output is "lüe".

皇甫: surname pinyin is incorrect

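match is too lenient

Take the four characters "百度网盘". Only "度" is a heteronym, so the full pinyin can be baiduwangpan or baiduowangpan, and the first letters are bdwp. Typing just the two letters oa also matches, which is puzzling. The reading duo does exist, but at the very least it should match only after I have typed duo.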

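Handling of heteronym words

Thanks to the project contributors for their work! I think some features of this project could be improved.
There are some small bugs in how heteronym words are handled, and according to the documentation the multiple option cannot solve this.
For example

pinyin("朝阳");
// outputs "zhāo cháo yáng"

Here two readings are automatically detected for the character "朝".
If multiple is set to false, the output is still zhāo cháo yáng: three syllables appear on one line.
The word "朝阳" has two correct readings: both zhao yang and chao yang are correct. The same problem may occur with other heteronym words.
I think all readings of a heteronym word could be returned in an array.

The second problem is that the project's heteronym word dictionary seems incomplete. For example, "增长" has two readings, zeng zhang and zeng chang, but the program only returns zeng zhang. Likewise, "大夫" can be da fu (an ancient official title) or dai fu (a form of address for a doctor), but the project only returns dai fu.
I think updating the data from an existing authoritative pinyin database could solve this.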

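Capitalize the first letter when romanizing names

Feature description

There are three text boxes: the first takes the surname, the second takes the given name, and the third automatically shows the romanized name in pinyin. For example, with surname 刘 and given name 德华, the output is Liu Dehua.

How can this be implemented?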
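One way to get this effect, sketched under the assumption that toneless syllables for the surname and the given name are already available as arrays (for example from toneType: 'none' with type: 'array'). capitalize and formatLatinName are hypothetical helpers.

```javascript
// Upper-case the first letter of a word.
function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

// Hypothetical helper: join the syllables of each part into one word,
// capitalize it, and separate surname and given name with a space.
function formatLatinName(surnameSyllables, givenNameSyllables) {
  return [surnameSyllables, givenNameSyllables]
    .map((syllables) => capitalize(syllables.join('')))
    .filter((part) => part !== '')
    .join(' ');
}

console.log(formatLatinName(['liu'], ['de', 'hua'])); // Liu Dehua
```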

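When the first letter is requested together with toneType num, only the first letter is returned

Bug description

When the first letter is requested together with toneType num, only the first letter is returned

Expected behavior

The first letter together with its tone number

Import method

Node require

Minimal reproduction steps

pinyin.pinyin('山西',{pattern:'first',toneType:'num'})
pinyin.pinyin('陕西',{pattern:'first',toneType:'num'})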

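Feature description

Please add a bundled ESM file.
In some environments bundlers such as webpack cannot be used, and neither can requirejs. I just want to load the library directly with an ESM dynamic import, for example:

import('/lib/pinyin-pro.js').then(exports=>{
  let pinyin=exports.pinyin('测试');
});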