Comments (28)
@medcl, how can this pinyin suggestion feature be combined with Chinese word segmentation? Thanks.
from elasticsearch-analysis-pinyin.
@medcl Same question here; I'd also like to know how to configure this.
An update for ES 2.0 is coming in the next few days. I'll write up an example while I'm at it, so stay tuned.
@medcl Hi, I've been watching for your new example. When will it be updated? I'm wondering when we'll get to see the pinyin suggestion demo.
Sorry, I've been too busy lately. I'll definitely post the update when I find the time.
(Sent by email in reply to domyway's comment of 21 November 2015, 8:42 PM.)
@medcl OK, looking forward to it :)
The pinyin plugin still needs some changes and a few more features before it can fully meet the first requirement. For now, here is one example that works:
curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "custom_pinyin_analyzer" : {
                    "tokenizer" : "ik_smart",
                    "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"]
                }
            },
            "filter" : {
                "full_pinyin_no_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : ""
                },
                "full_pinyin_with_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                },
                "first_letter_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'
I tried the example you gave, but nothing happens.
Please paste the error.
(Sent by email in reply to i6448038's comment of 31 December 2015, 4:04 PM.)
Thanks for your reply. My environment is ElasticSearch-rtf, and my index settings look like this (PHP code):
......
"analysis" => [
    "filter" => [
        "name_ngrams" => [
            "side" => "front",
            "max_gram" => 40,
            "min_gram" => 1,
            "type" => "edgeNGram"
        ],
        "full_pinyin_no_space" => [
            "type" => "pinyin",
            "first_letter" => "none",
            "padding_char" => ""
        ],
        "full_pinyin_with_space" => [
            "type" => "pinyin",
            "first_letter" => "none",
            "padding_char" => " "
        ],
        "first_letter_pinyin" => [
            "type" => "pinyin",
            "first_letter" => "only",
            "padding_char" => ""
        ]
    ],
    "analyzer" => [
        "custom_pinyin_analyzer" => [
            "type" => "custom",
            "tokenizer" => "ik_smart",
            "filter" => ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"]
        ],
        "default" => [
            "type" => "custom",
            "tokenizer" => "smartcn_sentence",
            "filter" => ["asciifolding", "smartcn_word", "snowball", "shingle"],
            "char_filter" => ["ph" => "f", "qu" => "k"]
        ],
        "keyword2" => [
            "type" => "custom",
            "tokenizer" => "keyword",
            "filter" => ["lowercase"]
        ],
        "prefix" => [
            "type" => "custom",
            "tokenizer" => "keyword",
            "filter" => ["lowercase", "name_ngrams"]
        ]
    ]
],
'mappings' => [
    self::TYPE => [
        'properties' => [
            ......
            'name' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "name" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            .....
            'hospital_name' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "hospital_name" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            'disease' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "disease" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            'hospital_alias' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "hospital_alias" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
Then my query looks like this:
$param = [
    'index' => ElasticSearch::INDEX,
    'type' => ElasticSearch::TYPE,
    'body' => [
        'size' => 9000,
        'from' => 0,
        'query' => [
            "multi_match" => [
                "query" => 'xh',
                "type" => "best_fields",
                "fields" => ["hospital_name", "hospital_alias", "disease", "name"],
                "tie_breaker" => 0.3,
                "minimum_should_match" => "10%"
            ]
        ]
    ]
];
$response = $this->client->search($param);
Yet searching by pinyin finds nothing.
Result returned:
[hits] => Array
    (
        [total] => 0
        [max_score] =>
        [hits] => Array
            (
            )
hits is zero.
Thanks to the author. I'm following this issue, as I'd like to use the two together.
The query does not name the pinyin field. Taking disease as an example, the fields list has to include disease.name_pinyin explicitly, not just disease.
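To make the point above concrete, here is a minimal Python sketch that builds the same multi_match body with the pinyin sub-fields listed explicitly. The helper names `with_pinyin_fields` and `build_query` are made up for this illustration, and the `<field>.name_pinyin` form simply follows the disease.name_pinyin example in the comment above. Whether the `path => just_name` setting changes how the sub-field has to be addressed is not covered in the thread and should be checked against the real mapping.

```python
import json

def with_pinyin_fields(base_fields, pinyin_suffix="name_pinyin"):
    """Return each base field followed by its '<field>.<suffix>' sub-field."""
    fields = []
    for name in base_fields:
        fields.append(name)
        fields.append(f"{name}.{pinyin_suffix}")
    return fields

def build_query(text, base_fields):
    """Build a multi_match body that searches the pinyin sub-fields as well."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",
                "fields": with_pinyin_fields(base_fields),
                "tie_breaker": 0.3,
                "minimum_should_match": "10%",
            }
        }
    }

body = build_query("xh", ["hospital_name", "hospital_alias", "disease", "name"])
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the 'body' entry passed to the PHP client in the earlier comment; only the fields list differs.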
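Results from nGram seem a lot less accurate, but without it only a search for the complete string returns anything. @medcl, have you considered tokenizing the pinyin itself, i.e. configuring the analyzer so that "liudehua" is split into "liu de hua"? That should solve the various pinyin problems. Or is it that I don't understand this well enough to configure such a tokenizer?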
@bsll When you define your custom pinyin analysis there is a parameter, padding_char, which is what separates the pinyin syllables. Set it to a space and you get that.
@i6448038 Could you fetch the actual mapping through the _mapping API and check whether your mapping really took effect? This is a common problem.
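A small Python sketch of that check, under stated assumptions: it expects the response of GET /{index}/_mapping to be nested as index, "mappings", type, "properties", which is the layout used by the 1.x and 2.x releases discussed in this thread. The function name `field_analyzer` and the `sample` dictionary are hypothetical; the sample is hand-written for illustration and is not real server output.

```python
def field_analyzer(mapping_response, index, doc_type, field, subfield=None):
    """Return the analyzer recorded for a field in a _mapping response.

    Returns None when the field, the sub-field, or the analyzer is absent,
    which would suggest the intended mapping was never applied.
    """
    props = (
        mapping_response.get(index, {})
        .get("mappings", {})
        .get(doc_type, {})
        .get("properties", {})
    )
    node = props.get(field)
    if node is not None and subfield is not None:
        node = node.get("fields", {}).get(subfield)
    if node is None:
        return None
    return node.get("analyzer")

# Hand-written sample in the assumed response layout (not real output).
sample = {
    "my": {
        "mappings": {
            "folks": {
                "properties": {
                    "name": {
                        "type": "string",
                        "analyzer": "ik",
                        "fields": {
                            "pinyin": {
                                "type": "string",
                                "analyzer": "custom_pinyin_analyzer",
                            }
                        },
                    }
                }
            }
        }
    }
}

print(field_analyzer(sample, "my", "folks", "name", "pinyin"))
print(field_analyzer(sample, "my", "folks", "title"))
```

If the lookup comes back empty for the pinyin sub-field, the index was most likely created without the intended mapping.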
@medcl Thanks a lot, it works now and it's really handy!
@medcl
Hi, I followed your setup above with "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"], but searching by first letters finds nothing.
For the tokenizer in the analyzer I used ik_max_word.
Then the mapping:
PUT my/folks/_mapping
{
    "folks": {
        "properties": {
            "name": {
                "type": "string",
                "term_vector" : "with_positions_offsets",
                "analyzer": "ik",
                "search_analyzer": "ik",
                "fields": {
                    "pinyin": {
                        "type": "string",
                        "analyzer": "custom_pinyin_analyzer",
                        "search_analyzer": "custom_pinyin_analyzer"
                    }
                }
            }
        }
    }
}
PUT my/folks/david
{
    "name": "郭富城"
}
Then the search:
GET my/folks/_search
{
    "query": {
        "match": {
            "name.pinyin": "fucheng"
        }
    }
}
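Only guo fu cheng and guofucheng return a hit; fucheng, guofu and gfc return nothing.
What puzzles me now is whether the three filters in a custom analyzer run sequentially or in parallel. I went through the official documentation but could not find an explicit statement.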
@Morriaty-The-Murderer They run sequentially.
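What sequential execution implies can be sketched with a toy model in Python. None of this is the plugin's code: the `PINYIN` table, `make_filter`, `run_chain` and `run_per_field` are made up for illustration, and the rule that a token without Chinese characters passes through unchanged is an assumption of the toy, not documented plugin behavior. Only the three filter names and their options are taken from the settings earlier in the thread.

```python
# Toy syllable table; the real plugin uses its own dictionary.
PINYIN = {"郭": "guo", "富": "fu", "城": "cheng"}

def make_filter(first_letter_only, padding_char):
    """Build a toy token filter that converts Chinese characters to pinyin."""
    def apply(token):
        syllables = [PINYIN[ch] for ch in token if ch in PINYIN]
        if not syllables:
            # Assumption of this toy: nothing to convert, pass the token through.
            return token
        if first_letter_only:
            syllables = [s[0] for s in syllables]
        return padding_char.join(syllables)
    return apply

FILTERS = {
    "full_pinyin_no_space": make_filter(False, ""),
    "full_pinyin_with_space": make_filter(False, " "),
    "first_letter_pinyin": make_filter(True, ""),
}

def run_chain(tokens, names):
    """Sequential execution: each filter receives the previous filter's output."""
    for name in names:
        tokens = [FILTERS[name](t) for t in tokens]
    return tokens

def run_per_field(tokens, names):
    """One filter per analyzer: each filter receives the original tokens."""
    return {name: [FILTERS[name](t) for t in tokens] for name in names}

names = list(FILTERS)
print(run_chain(["郭富城"], names))
print(run_per_field(["郭富城"], names))
```

In this toy the chain yields a single form, because the second and third filters only ever see text that is already romanized, while applying each filter to the original token yields all three forms. That is at least consistent with first-letter queries such as gfc finding nothing when the three filters share one analyzer.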
@medcl Then what is wrong with my setup above? How can I make guofucheng, fucheng, guo fu cheng and gfc all return a hit?
The only thing I can think of right now is to add several multi_fields, one each for no_space, with_space and first_letter.
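The multi_fields idea could look roughly like the Python sketch below, which builds an index body with one analyzer and one sub-field per pinyin filter. This is an untested illustration and nobody in the thread confirms that it solves the problem. The function `build_index_body` and the "<filter>_analyzer" naming are hypothetical; the filter definitions, the ik_smart tokenizer, the ik analyzer and the 2.x-style "string" type are copied from earlier comments.

```python
import json

# Filter definitions as posted earlier in the thread.
PINYIN_FILTERS = {
    "full_pinyin_no_space": {"type": "pinyin", "first_letter": "none", "padding_char": ""},
    "full_pinyin_with_space": {"type": "pinyin", "first_letter": "none", "padding_char": " "},
    "first_letter_pinyin": {"type": "pinyin", "first_letter": "only", "padding_char": ""},
}

def build_index_body(field="name", doc_type="folks", tokenizer="ik_smart"):
    """Build settings and mappings with one pinyin sub-field per filter."""
    analyzers = {}
    subfields = {}
    for filter_name in PINYIN_FILTERS:
        analyzer_name = filter_name + "_analyzer"  # hypothetical naming scheme
        analyzers[analyzer_name] = {
            "type": "custom",
            "tokenizer": tokenizer,
            "filter": [filter_name],  # exactly one pinyin filter per analyzer
        }
        subfields[filter_name] = {"type": "string", "analyzer": analyzer_name}
    return {
        "settings": {"analysis": {"analyzer": analyzers, "filter": PINYIN_FILTERS}},
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "string", "analyzer": "ik", "fields": subfields}
                }
            }
        },
    }

print(json.dumps(build_index_body(), indent=2, ensure_ascii=False))
```

A query would then have to list each sub-field explicitly, for example name.first_letter_pinyin for first-letter input. Substring queries such as fucheng would still depend on how the tokenizer splits the name.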
@Morriaty-The-Murderer: I'm working on this pinyin analysis as well and have been stuck on it for a long time. How did you get it working? By adding several multi_fields?
This example (the index settings from medcl's comment above, created with the same curl -XPUT http://localhost:9200/medcl/ command) configures three filters, but when I query http://localhost:9200/medcl/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=custom_pinyin_analyzer none of the analysis takes effect. Could it be a problem with my version (1.7.1)?
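For readers puzzling over the percent-encoded text parameter in the _analyze URL in the comment above, the short Python sketch below rebuilds that URL from its parts and decodes the parameter. The helper name `analyze_url` is made up for this example; the host, index and analyzer names are the ones from the comment.

```python
from urllib.parse import quote, unquote, urlencode

def analyze_url(base, index, analyzer, text):
    """Build an _analyze URL with the text percent-encoded as UTF-8."""
    query = urlencode({"text": text, "analyzer": analyzer}, quote_via=quote)
    return f"{base}/{index}/_analyze?{query}"

url = analyze_url("http://localhost:9200", "medcl", "custom_pinyin_analyzer", "刘德华")
print(url)

# Decoding the parameter from the comment gives back the original characters.
print(unquote("%E5%88%98%E5%BE%B7%E5%8D%8E"))
```

Decoding confirms that the test string is the name 刘德华.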
Please try the latest version; if there are problems, feel free to reopen this issue.
@medcl Could you give some guidance on how to handle the problem above?
How can guofucheng, fucheng, guo fu cheng and gfc all be made to return a hit?
Going a step further, I'd like typing "锅富城" to find "郭富城" as well.
My current version is 2.1.1.
Many thanks.
Pinyin matching works purely on the pinyin, so whenever the pinyin is the same a hit is to be expected. Usually you combine it with a separate Chinese field, give the two fields different weights, and prefer the Chinese one.
(Sent by email in reply to levylll's comment of Sunday, 9 October 2016, 1:05 PM.)
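@medcl Yes, I understand what you mean. My setup also has both a pinyin field and a Chinese field. In theory, handling this kind of wrong-character input should be ik tokenization first and pinyin conversion second, but the results never feel quite satisfactory...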
@medcl I'd like to ask: with ik + pinyin analysis, how can Chinese terms be matched first and pinyin terms only after that?
@levylll @JsonShare
A single field isn't realistic. Use multiple fields, one for Chinese and one for pinyin, via multi-field.
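One possible reading of this advice, sketched in Python: a bool query with two should clauses, one on the Chinese field and one on its pinyin sub-field, where the Chinese clause gets the higher boost. The function name `build_search` and the boost values 3.0 and 1.0 are arbitrary choices for illustration; the field names name and name.pinyin come from the mapping posted earlier in the thread.

```python
import json

def build_search(text, field="name", pinyin_subfield="pinyin",
                 chinese_boost=3.0, pinyin_boost=1.0):
    """Search the Chinese field and its pinyin sub-field, preferring Chinese."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact Chinese matches score higher (illustrative boost).
                    {"match": {field: {"query": text, "boost": chinese_boost}}},
                    # Same-pronunciation matches still count, with less weight.
                    {"match": {f"{field}.{pinyin_subfield}": {"query": text, "boost": pinyin_boost}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }

print(json.dumps(build_search("锅富城"), indent=2, ensure_ascii=False))
```

With a body like this, a document matching on both fields should outrank one that matches on pinyin alone, though the actual ranking depends on the analyzers and on the data.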