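The problem I ran into is as described in the title. My idea is to build the index from the full pinyin only, so I configured the pinyin analyzer following the first approach in the documentation: curl -XPUT …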

Why do I have to configure the "nGram" filter when building the index in order to get results when searching by pinyin? about elasticsearch-analysis-pinyin HOT 18 CLOSED

medcl commented on June 10, 2024

Why do I have to configure the "nGram" filter when building the index in order to get results when searching by pinyin?

from elasticsearch-analysis-pinyin.

Comments (18)

medcl commented on June 10, 2024

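The first approach creates the analyzer under the index named medcl, but your index is searchshowindex_v3. The two are separate. Switch to your own index, configure it again, and try once more.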

from elasticsearch-analysis-pinyin.

ganxiaomao commented on June 10, 2024

Thank you very much. I still have not figured out what the various configuration fields mean; I will give it a try.

from elasticsearch-analysis-pinyin.

ganxiaomao commented on June 10, 2024

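I remember now: when I created the analyzer, the index I specified was searchshowindex_v3. What I pasted in the post was copied straight from the tutorial. So I still cannot work out why this happens. After configuration the analyzer test output looks fine, but indexing does not seem to succeed. The log shows no errors at all; the search simply finds no data. My search request is:
http://localhost:9200/searchshowindex_v3/searchshowtype/_search?q=dianshiji
My database holds a lot of data about "电视机" (television sets), but none of it is returned. As soon as I add nGram to the analyzer configuration I do get results, but then the search returns practically every document in the database. Here is my complete configuration process:
Step 1: create the index
curl -XPUT localhost:9200/searchshowindex_v3 -d'{
"index":{
"analysis":{
"analyzer":{"pinyin_analyzer":{"tokenizer":"my_pinyin","filter":["standard","lowercase"]}},
"tokenizer":{"my_pinyin":{"type":"pinyin","first_letter":"append","padding_char":""}}
}}}'
Step 2: configure the mapping
curl -XPOST localhost:9200/searchshowindex_v3/searchshowtype/_mapping -d'
{
"searchshowtype":{
"_all":{"analyzer":"pinyin_analyzer","term_vector":"no","store":false}}}'
Step 3: create the MongoDB river
curl -XPUT localhost:9200/_river/ssmongo2/_meta -d'{
"type":"mongodb",
"mongodb":{
"host":"192.168.0.10", "port":22222,
"db":"verticalsearch",
"collection":"searchshow"
},
"index":{"name":"searchshowindex_v3","type":"searchshowtype"}
}'
I tried the same configuration process with IK and it worked without problems; the issue only appears with pinyin. I do not know where the mistake is.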

from elasticsearch-analysis-pinyin.

medcl commented on June 10, 2024

Please also post your test data and the query.

from elasticsearch-analysis-pinyin.

ganxiaomao commented on June 10, 2024

The data in my MongoDB database looks like this:
{
'url' => 'http://gdxrz.com/',
'title' => '电视挂架|电视机吊架|电视机支架|显示器支架|液晶电视机挂架|液晶...',
'info' => ' tcl 电视挂架 nb 电视挂架投影仪吊架投影机支架显示器推车投影机吊架 lg 电视挂架电视架厂家电视机吊架红叶支架幕红叶支架幕投影仪支架 ',
}
The query is:
http://localhost:9200/searchshowindex_v3/searchshowtype/_search?q=dianshiji

from elasticsearch-analysis-pinyin.

medcl commented on June 10, 2024

The query needs to name a field, for example title:dianshiji

from elasticsearch-analysis-pinyin.

ganxiaomao commented on June 10, 2024

Thank you very much for the reply; I was off work for the past two days and did not see it. The problem is still there: I named the field in the query as you suggested, but it still does not work. From the start I configured _all in the mapping with the pinyin analyzer and configured no other field. In my experience with IK, a plain q=keyword query returns results; otherwise I would not get results as soon as I add an extra nGram filter to the pinyin analyzer. Following your point about the field, I tried again with these steps:
1. Create the index:
curl -XPUT localhost:9200/searchshowindex_v1 -d'
{
"index":{
"analysis":{
"analyzer":{"pinyin_analyzer":{"tokenizer":"my_pinyin","filter":["standard"]}},
"tokenizer":{"my_pinyin":{"type":"pinyin","first_letter":"none","padding_char":""}}
}
}
}'
2. Configure the mapping:
curl -XPOST localhost:9200/searchshowindex_v1/searchshowtype/_mapping -d'
{
"searchshowtype":{
"properties":{
"title":{
"type":"string",
"store":"no",
"term_vector":"with_positions_offsets",
"analyzer":"pinyin_analyzer",
"boost":5}
}
}
}'
3. Create the _river:
curl -XPUT localhost:9200/_river/ssmongo1/_meta -d'{
"type":"mongodb",
"mongodb":{
"host":"192.168.0.10", "port":22222,
"db":"verticalsearch",
"collection":"searchshow"
},
"index":{"name":"searchshowindex_v1","type":"searchshowtype"}
}'
4. Query:
http://localhost:9200/searchshowindex_v1/searchshowtype/_search?q=title:dianshiji
The result is:
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
5. Analyzer test:
http://localhost:9200/searchshowindex_v1/_analyze?text=电视机&analyzer=pinyin_analyzer
The result is:
{"tokens":[{"token":"dianshiji","start_offset":0,"end_offset":3,"type":"word","position":1}]}

That is the outcome of configuring ES for one specific field and testing it. It did not produce the result I wanted, so I still do not understand what is going on.
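
To make the two responses in steps 4 and 5 easier to compare, here is a small Python sketch that parses them. The JSON strings are copied from the post; the helper name `summarize` is only illustrative and is not part of Elasticsearch or the plugin.

```python
import json

# Responses copied verbatim from steps 4 and 5 of the post.
search_response = json.loads(
    '{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},'
    '"hits":{"total":0,"max_score":null,"hits":[]}}'
)
analyze_response = json.loads(
    '{"tokens":[{"token":"dianshiji","start_offset":0,"end_offset":3,'
    '"type":"word","position":1}]}'
)


def summarize(search, analyze):
    """Return (number of hits, list of tokens the analyzer emitted)."""
    hit_count = search["hits"]["total"]
    tokens = [t["token"] for t in analyze["tokens"]]
    return hit_count, tokens


hit_count, tokens = summarize(search_response, analyze_response)
print(hit_count, tokens)
```

The analyzer emits exactly one token that spans the whole three-character input (offsets 0 to 3). If longer text is handled the same way, which is an assumption and not something these two responses show, a title that contains more than 电视机 would be indexed as one longer term that the query term dianshiji does not equal.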

from elasticsearch-analysis-pinyin.

medcl commented on June 10, 2024

Is there any data in the index?

http://localhost:9200/searchshowindex_v1/searchshowtype/_search?q=*

from elasticsearch-analysis-pinyin.

ganxiaomao commented on June 10, 2024

I ran the query you gave. There is data; here is an excerpt:
{"took":48,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":92,"max_score":1.0,"hits":[{"_index":"searchshowindex_v1","_type":"searchshowtype","_id":"53857e17f2136f31f083c5cd","_score":1.0, "_source" : {"title_suggest":"电视--家电--人民网","_id":"53857e17f2136f31f083c5cd","title":"电视--家电--人民网","url":"http://homea.people.com.cn/GB/41406/index.html","info":" 选电视上人民网家电频道。权威实力源自人民!"}},{"_index":"searchshowindex_v1","_type":"searchshowtype","_id":"53857e17f2136f31f083c5da","_score":1.0, "_source" : {"title_suggest":"7788电视网___**最大的电视机拍卖、交易网站","_id":"53857e17f2136f31f083c5da","title":"7788电视网___**最大的电视机拍卖、交易网站","url":"http://www.7788ds.com/","info":" 7788 电视网是黑白电视、显像管电视机、旧电视机等的收藏、投资、交易平台。"}},{"_index":"searchshowindex_v1","_type":"searchshowtype","_id":"53857e17f2136f31f083c601","_score":1.0, "_source" : {"title_suggest":"康佳集团 - 精致产品,美妙生活","_id":"53857e17f2136f31f083c601","title":"康佳集团 - 精致产品,美妙生活","url":"http://www.konka.com/","info":" 电视白色家电手机机顶盒生活电器厨卫电器视讯房地产商用电视商用视讯商用机顶盒 www.konka.com/ 2014-05-17 - 百度快照"}},{"_index":"searchshowindex_v1","_type":"searchshowtype","_id":"53857e17f2136f31f083c5bb","_score":1.0, "_source" : {"title_suggest":"【液晶电视】液晶电视报价及图片大全-ZOL中关村在线","_id":"53857e17f2136f31f083c5bb","title":"【液晶电视】液晶电视报价及图片大全-ZOL中关村在线","url":"http://detail.zol.com.cn/digital_tv/","info":" ZOL中关村在线提供液晶电视最新价格及经销商报价,包括液晶电视大全,液晶电视参数,液晶电视评测,液晶电视图片,液晶电视论坛等详细内容,为您购买液晶电视提供最全面参考 detail.zol.com.c"}},{"_index":"searchshowindex_v1","_type":"searchshowtype","_id":"53857e17f2136f31f083c5c2","_score":1.0, "_source" : {"title_suggest":"【液晶电视频道】液晶电视排行榜|行情|评测-万维家电网","_id":"53857e17f2136f31f083c5c2","title":"【液晶电视频道】液晶电视排行榜|行情|评测-万维家电网","url":"http://tv.ea3w.com/","info":" 作为国内最专业的液晶电视,等离子电视频道,本频道提供液晶电视,等离子电视报价,行情、评测、导购、调研等相关资讯 tv.ea3w.com/ 2014-05-17 - 百度快照"}},

from elasticsearch-analysis-pinyin.

liukaitj commented on June 10, 2024

Has this problem been solved? I am running into the same issue now.

from elasticsearch-analysis-pinyin.

medcl commented on June 10, 2024

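The pinyin plugin does only one thing: it turns "中文" into "zhongwen". By default the output is the complete pinyin, as one whole token. If you need fuzzy matching, you have to tokenize further by configuring a filter. An ngram filter can split "zhongwen" further, for example into "zh", "ho", "on" and so on, and then fuzzy search works.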
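
A toy Python sketch of the difference described here, not the plugin's implementation: the pinyin strings are written by hand, and the title "中文网站" is a hypothetical example document. It shows why a whole-token term only matches when the query equals the entire token, while 2-grams make partial overlap searchable.

```python
def whole_pinyin_token(syllables):
    """Join per-character pinyin into one token (empty padding char)."""
    return "".join(syllables)


def ngrams(token, n=2):
    """Return every contiguous substring of length n, in order."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


# Hand-written pinyin; a real analyzer would derive these from the text.
query_token = whole_pinyin_token(["zhong", "wen"])                  # 中文
title_token = whole_pinyin_token(["zhong", "wen", "wang", "zhan"])  # 中文网站 (hypothetical)

# Without further splitting, the index holds one term per field value,
# so the query only matches if it equals that whole term.
exact_match = query_token == title_token

# With a 2-gram filter both sides are cut into short pieces that overlap.
shared_grams = set(ngrams(query_token)) & set(ngrams(title_token))

print(exact_match)
print(ngrams(query_token))
print(sorted(shared_grams))
```

Here `exact_match` is false even though the title starts with the query's pinyin, while every 2-gram of the query also occurs in the title.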

from elasticsearch-analysis-pinyin.

liukaitj commented on June 10, 2024

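But the ngram filter then matches every document that contains "zh" or "on", which is clearly not what one usually wants. Is there a filter somewhere in between, for example one that splits into "zhong" and "wen"?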
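
The over-matching can be illustrated with a short Python sketch. It is only a simulation under stated assumptions: "hongse" stands in for the hand-written pinyin of an unrelated word (红色), and matching is modelled as "any shared term", which is how a default OR query over the grams would behave.

```python
def bigrams(token):
    """Return the set of all 2-character substrings of a token."""
    return {token[i:i + 2] for i in range(len(token) - 1)}


query = "zhongwen"    # pinyin of 中文
unrelated = "hongse"  # hypothetical: hand-written pinyin of 红色

# 2-gram terms: the unrelated word shares several grams with the query.
shared_bigrams = bigrams(query) & bigrams(unrelated)

# Syllable-level terms: nothing is shared, so no false match.
shared_syllables = {"zhong", "wen"} & {"hong", "se"}

print(sorted(shared_bigrams))
print(shared_syllables)
```

With 2-grams the unrelated word shares "ho", "on" and "ng" with the query; with one term per syllable the two words have no term in common.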

from elasticsearch-analysis-pinyin.

liukaitj commented on June 10, 2024

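I looked into it further. It seems the my_pinyin tokenizer has a problem: for example, the four characters "全国首发" are tokenized inside ES as the single token "quan guo shou fa" rather than as the four tokens "quan", "guo", "shou", "fa". I set the padding_char parameter of my_pinyin to a space, so why does the tokenization not take effect? Very strange.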

from elasticsearch-analysis-pinyin.

medcl commented on June 10, 2024

Use the whitespace or standard filter, set the pinyin padding char to a space, and split on spaces.

from elasticsearch-analysis-pinyin.

medcl commented on June 10, 2024

The padding only inserts a separator between the individual pinyin syllables; it does not split them into tokens.
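
A minimal Python sketch of the two separate steps, padding and splitting. The function names are illustrative only and do not correspond to plugin or Elasticsearch APIs; the syllables are written by hand.

```python
def pinyin_with_padding(syllables, padding_char=" "):
    """Join syllables with the padding char; the result is still ONE string."""
    return padding_char.join(syllables)


def whitespace_tokenize(text):
    """Split on whitespace, which is what actually produces separate tokens."""
    return text.split()


# Hand-written pinyin for 全国首发.
padded = pinyin_with_padding(["quan", "guo", "shou", "fa"])
tokens = whitespace_tokenize(padded)

print(repr(padded))
print(tokens)
```

The padded value is still a single string with spaces inside it; only the second step yields four separate terms.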

from elasticsearch-analysis-pinyin.

ganxiaomao commented on June 10, 2024

Thank you very much for the reply. I will try it.

from elasticsearch-analysis-pinyin.

liukaitj commented on June 10, 2024

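I have stopped using ngram; the matching is far too imprecise. I wrote an extension, a pinyin plugin based on IK segmentation, and put it here: https://github.com/liukaitj/elasticsearch-analysis-ik-pinyin . It matches pinyin against the phrases produced by IK segmentation, which avoids the over-matching problem.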

from elasticsearch-analysis-pinyin.

medcl commented on June 10, 2024

Actually, you can simply use an ik tokenizer and then add a pinyin filter.
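
The idea of a word-level tokenizer followed by a pinyin filter can be simulated in Python. This is a sketch under stated assumptions: the segmentation `["电视机", "吊架"]` is assumed output of a word-level tokenizer such as IK and was not produced by running IK, and `PINYIN` is a hypothetical lookup table just large enough for the example.

```python
# Hypothetical per-character pinyin table, only large enough for this example.
PINYIN = {"电": "dian", "视": "shi", "机": "ji", "吊": "diao", "架": "jia"}


def pinyin_filter(tokens):
    """Convert each already-segmented token to its full pinyin, one term per token."""
    return ["".join(PINYIN[ch] for ch in token) for token in tokens]


# Assumed word-level segmentation of "电视机吊架" (not produced by running IK).
segmented = ["电视机", "吊架"]
terms = pinyin_filter(segmented)

print(terms)
print("dianshiji" in terms)
```

Because each word becomes its own pinyin term, a query for the full pinyin of one word can match exactly without cutting the text into short n-grams.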

from elasticsearch-analysis-pinyin.
