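For example, suppose I want to search for 重庆酸菜鱼(长城店). For pinyin search, the usual approach is a Chinese tokenizer followed by a pinyin filter. Using ik_s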


A suggestion about pinyin search! about elasticsearch-analysis-pinyin HOT 28 CLOSED

medcl commented on June 2, 2024

A suggestion about pinyin search!

from elasticsearch-analysis-pinyin.

Comments (28)

dsonet commented on June 2, 2024

@medcl How can this be combined with Chinese word segmentation to implement pinyin suggestions? Thanks.

domyway commented on June 2, 2024

@medcl Same question. I would also like to know how to configure this.

medcl commented on June 2, 2024

An update for ES 2.0 is coming in the next few days, and I will write an example while I am at it. Stay tuned.

domyway commented on June 2, 2024

@medcl Hi, I have been watching for your new example. When will it be updated? I am wondering when the pinyin suggestion demo will be available.

medcl commented on June 2, 2024

Sorry, I have been too busy lately. I will definitely update it as soon as I find the time.

(Email reply to domyway's message of November 21, 2015, 8:42 PM.)

domyway commented on June 2, 2024

@medcl OK, looking forward to it :)

medcl commented on June 2, 2024

The pinyin plugin needs some changes and a few added features before it can fully meet the first requirement. For now, here is one example that works:

curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "custom_pinyin_analyzer" : {
                    "tokenizer" : "ik_smart",
                    "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"]
                }
            },
            "filter" : {
                "full_pinyin_no_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : ""
                },
                "full_pinyin_with_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                },
                "first_letter_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'

i6448038 commented on June 2, 2024

I tried the example you gave, but nothing happens.

medcl commented on June 2, 2024

Please post the error.

(Email reply to i6448038's message of December 31, 2015, 4:04 PM.)

i6448038 commented on June 2, 2024

Thanks for your reply. My environment is ElasticSearch-rtf, and my index settings look like this (PHP code):
......
"analysis" => [
    "filter" => [
        "name_ngrams" => [
            "side" => "front",
            "max_gram" => 40,
            "min_gram" => 1,
            "type" => "edgeNGram"
        ],
        "full_pinyin_no_space" => [
            "type" => "pinyin",
            "first_letter" => "none",
            "padding_char" => ""
        ],
        "full_pinyin_with_space" => [
            "type" => "pinyin",
            "first_letter" => "none",
            "padding_char" => " "
        ],
        "first_letter_pinyin" => [
            "type" => "pinyin",
            "first_letter" => "only",
            "padding_char" => ""
        ]
    ],
    "analyzer" => [
        "custom_pinyin_analyzer" => [
            "type" => "custom",
            "tokenizer" => "ik_smart",
            "filter" => ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"]
        ],
        "default" => [
            "type" => "custom",
            "tokenizer" => "smartcn_sentence",
            "filter" => ["asciifolding", "smartcn_word", "snowball", "shingle"],
            "char_filter" => ["ph" => "f", "qu" => "k"]
        ],
        "keyword2" => [
            "type" => "custom",
            "tokenizer" => "keyword",
            "filter" => ["lowercase"]
        ],
        "prefix" => [
            "type" => "custom",
            "tokenizer" => "keyword",
            "filter" => ["lowercase", "name_ngrams"]
        ]
    ]
],
'mappings' => [
    self::TYPE => [
        'properties' => [
            ......
            'name' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "name" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            .....
            'hospital_name' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "hospital_name" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            'disease' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "disease" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            'hospital_alias' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "hospital_alias" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]

My query looks like this:
$param = [
    'index' => ElasticSearch::INDEX,
    'type' => ElasticSearch::TYPE,
    'body' => [
        'size' => 9000,
        'from' => 0,
        'query' => [
            "multi_match" => [
                "query" => 'xh',
                "type" => "best_fields",
                "fields" => ["hospital_name", "hospital_alias", "disease", "name"],
                "tie_breaker" => 0.3,
                "minimum_should_match" => "10%"
            ]
        ]
    ]
];

$response = $this->client->search($param);

But searching by pinyin finds nothing.
The response:

[hits] => Array
    (
        [total] => 0
        [max_score] =>
        [hits] => Array
            (
            )

The number of hits is zero.

bro0k commented on June 2, 2024

Thanks to the author. I am following this issue because I would like to use the two together.

medcl commented on June 2, 2024

The query does not name the pinyin field. For disease, for example, fields must explicitly include disease.name_pinyin rather than only disease.
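The fix described in the reply above could be sketched as follows. This is a minimal sketch, assuming the sub-field name `name_pinyin` from the posted mapping; the helper names `with_pinyin_fields` and `build_query` are hypothetical. Whether the sub-field is addressed as `disease.name_pinyin` or by its short name depends on the `path` setting and the Elasticsearch version, so the result should be checked against the live mapping.

```python
import json


def with_pinyin_fields(fields, subfield="name_pinyin"):
    """Return each base field followed by its pinyin sub-field (hypothetical helper)."""
    expanded = []
    for field in fields:
        expanded.append(field)
        expanded.append(f"{field}.{subfield}")
    return expanded


def build_query(text, fields):
    """Build a multi_match body that also searches the pinyin sub-fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",
                "fields": with_pinyin_fields(fields),
                "tie_breaker": 0.3,
            }
        }
    }


body = build_query("xh", ["hospital_name", "hospital_alias", "disease", "name"])
print(json.dumps(body, indent=2))
```

The body can then be passed as the `body` of the search request, in place of the original field list.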

i6448038 commented on June 2, 2024


bsll commented on June 2, 2024

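Results obtained with nGram seem much less accurate, but without it only a query for the full string returns anything. @medcl, have you considered tokenizing the pinyin itself, that is, configuring an analyzer so that "liudehua" is split into "liu de hua"? That should solve most of the pinyin problems. Or do I simply not understand well enough how to configure such a tokenizer?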

medcl commented on June 2, 2024

@bsll When you define your custom pinyin analysis, there is a padding_char parameter that is used to separate the pinyin syllables. Set it to a space and you get exactly that.
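To illustrate what the reply above says about padding_char, here is a toy model in Python. It is not the plugin's code: the function `join_syllables` is hypothetical and the syllables are hard-coded. It only shows the effect of the separator on the pinyin produced from Chinese input, and it does not split Latin input such as "liudehua".

```python
def join_syllables(syllables, padding_char):
    """Toy model of padding_char: join per-character pinyin with a separator."""
    return padding_char.join(syllables)


# Pinyin of 刘德华, hard-coded for illustration only.
syllables = ["liu", "de", "hua"]

no_space = join_syllables(syllables, "")     # like "padding_char" : ""
with_space = join_syllables(syllables, " ")  # like "padding_char" : " "

print(no_space)
print(with_space)
```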

medcl commented on June 2, 2024

@i6448038 Could you fetch the actual mapping through the _mapping API and check whether your mapping really took effect? This is a common problem.
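A check like the one suggested above could be sketched as follows, assuming the typed `_mapping` response layout of the Elasticsearch versions discussed in this thread (index, then `mappings`, then type, then `properties`). The helper `find_field` and the sample response are hypothetical; in practice the dictionary would come from a GET on the index's `_mapping` endpoint.

```python
def find_field(mapping_response, index, doc_type, path):
    """Return the definition of a dotted field path in a _mapping response, or None."""
    node = mapping_response.get(index, {}).get("mappings", {}).get(doc_type, {})
    for part in path.split("."):
        # A field's children live under "properties" (objects) or "fields" (multi-fields).
        children = {}
        children.update(node.get("properties", {}))
        children.update(node.get("fields", {}))
        if part not in children:
            return None
        node = children[part]
    return node


# Hypothetical sample response, shaped like a typed-era _mapping result.
sample = {
    "my": {
        "mappings": {
            "folks": {
                "properties": {
                    "name": {
                        "type": "string",
                        "analyzer": "ik",
                        "fields": {
                            "pinyin": {
                                "type": "string",
                                "analyzer": "custom_pinyin_analyzer",
                            }
                        },
                    }
                }
            }
        }
    }
}

pinyin_def = find_field(sample, "my", "folks", "name.pinyin")
missing_def = find_field(sample, "my", "folks", "name.name_pinyin")
print(pinyin_def, missing_def)
```

If the lookup returns None, or the analyzer is not the expected one, the mapping sent by the client did not take effect as intended.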

i6448038 commented on June 2, 2024

@medcl Thank you, it works now. Very handy!

Morriaty-The-Murderer commented on June 2, 2024

@medcl
Hello, I followed the "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"] setup you gave above, but searching by first letters finds nothing.

I used ik_max_word as the tokenizer in the analyzer.

Then the mapping:

PUT my/folks/_mapping
{
  "folks": {
    "properties": {
      "name": {
        "type": "string",  
        "term_vector" : "with_positions_offsets",  
        "analyzer": "ik",  
        "search_analyzer": "ik",  
        "fields": {  
          "pinyin": {  
            "type": "string",  
            "analyzer": "custom_pinyin_analyzer",  
            "search_analyzer": "custom_pinyin_analyzer"  
          }  
        }  
      }
    }
  }
}

PUT my/folks/david
{
  "name": "郭富城"
}

Then the search:

GET my/folks/_search
{
  "query": {
    "match": {
        "name.pinyin": "fucheng"
    }
  }
}

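Only queries such as guo fu cheng and guofucheng find the document.
fucheng, guofu, and gfc find nothing.

What puzzles me now is whether the three filters of a custom analyzer run sequentially or in parallel. I went through the official documentation but could not find a specific statement.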

medcl commented on June 2, 2024

@Morriaty-The-Murderer They run sequentially.
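Why sequential execution matters here can be shown with a toy model. The sketch below is not the plugin's implementation: the three-character pinyin table and `toy_pinyin_filter` are hypothetical, and it assumes that a token without Chinese characters passes through a pinyin filter unchanged. Under that assumption, the second and third filters in a chain only ever see the output of the first one, whereas applying each filter to the original token separately (as separate sub-fields would) yields all three forms.

```python
# Hypothetical pinyin table, just large enough for the example.
PINYIN = {"郭": "guo", "富": "fu", "城": "cheng"}


def toy_pinyin_filter(token, first_letter, padding_char):
    """Toy stand-in for a pinyin token filter; NOT the plugin's real logic."""
    if not any(ch in PINYIN for ch in token):
        return token  # assumption: tokens without Chinese characters are left as-is
    parts = [PINYIN.get(ch, ch) for ch in token]
    if first_letter == "only":
        parts = [p[0] for p in parts]
    return padding_char.join(parts)


# (first_letter, padding_char) pairs in the order used by custom_pinyin_analyzer.
FILTERS = [
    ("none", ""),   # full_pinyin_no_space
    ("none", " "),  # full_pinyin_with_space
    ("only", ""),   # first_letter_pinyin
]


def chained(token):
    """Each filter consumes the previous filter's output."""
    for first_letter, padding_char in FILTERS:
        token = toy_pinyin_filter(token, first_letter, padding_char)
    return token


def independent(token):
    """Each filter is applied to the original token on its own."""
    return [toy_pinyin_filter(token, fl, pad) for fl, pad in FILTERS]


chain_result = chained("郭富城")
separate_results = independent("郭富城")
print(chain_result)
print(separate_results)
```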

Morriaty-The-Murderer commented on June 2, 2024

@medcl Then what is wrong with my setup above? How can I make guofucheng, fucheng, guo fu cheng, and gfc all match?

The only thing I can think of right now is to add several multi_fields, one each for no_space, with_space, and first_letter.

commented on June 2, 2024

@Morriaty-The-Murderer: I am working on pinyin analysis too and have not got it working after a long time. How did you do it in the end? Did you add several multi_fields?

commented on June 2, 2024

This example configures three filters, but when I query http://localhost:9200/medcl/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=custom_pinyin_analyzer, none of the analysis takes effect. Could it be a problem with my version (1.7.1)?
curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "custom_pinyin_analyzer" : {
                    "tokenizer" : "ik_smart",
                    "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"]
                }
            },
            "filter" : {
                "full_pinyin_no_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : ""
                },
                "full_pinyin_with_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                },
                "first_letter_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'
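The _analyze request quoted above can be rebuilt programmatically, which avoids hand-encoding the Chinese text. This is a small sketch using only the Python standard library; the helper name `analyze_url` is hypothetical, and the query-string form of `_analyze` is taken from the URL in the comment (newer Elasticsearch versions may expect a JSON body instead).

```python
from urllib.parse import quote, urlencode


def analyze_url(base, index, text, analyzer):
    """Build an _analyze URL with the text percent-encoded as UTF-8."""
    query = urlencode({"text": text, "analyzer": analyzer}, quote_via=quote)
    return f"{base}/{index}/_analyze?{query}"


url = analyze_url("http://localhost:9200", "medcl", "刘德华", "custom_pinyin_analyzer")
print(url)
```

Fetching this URL shows the tokens the analyzer actually emits, which is the quickest way to see whether the filters took effect.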

medcl commented on June 2, 2024

Please try the latest version. If there are still problems, feel free to reopen this issue.

levylll commented on June 2, 2024

@medcl Could you give some pointers on how to solve the question above?
How can I make guofucheng, fucheng, guo fu cheng, and gfc all match?
Even better, typing “锅富城” should also find “郭富城”.
My current version is 2.1.1.
Thanks a lot.

medcl commented on June 2, 2024

Pinyin matching works purely on the pinyin, so whenever the pinyin is identical it is expected that the document is returned. Usually you combine it with a separate Chinese field, give the two fields different weights, and prioritize the Chinese one.

(Email reply to levylll's message of Sunday, October 9, 2016, 1:05 PM.)
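The weighting idea in the reply above could look roughly like this. It is a sketch under assumptions: the field names `name` and `name.pinyin` follow the mapping posted earlier in the thread, the boost values 3 and 1 are arbitrary placeholders, and `build_weighted_query` is a hypothetical helper. It relies on the `field^boost` syntax accepted in the `fields` list of a `multi_match` query.

```python
import json


def build_weighted_query(text, chinese_field="name", pinyin_field="name.pinyin",
                         chinese_boost=3, pinyin_boost=1):
    """Search the Chinese and pinyin fields together, weighting Chinese higher."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": [
                    f"{chinese_field}^{chinese_boost}",
                    f"{pinyin_field}^{pinyin_boost}",
                ],
            }
        }
    }


weighted = build_weighted_query("锅富城")
print(json.dumps(weighted, indent=2, ensure_ascii=False))
```

With such a query, a document matching the Chinese text exactly should tend to score above one that matches only through identical pinyin, although the actual ranking depends on the data and the analyzers.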

levylll commented on June 2, 2024

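@medcl Yes, I understand what you mean. My setup also has both a pinyin field and a Chinese field. In theory, handling this kind of wrong-character typo should mean tokenizing with ik first and then applying pinyin, but the results still do not feel ideal...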

JsonShare commented on June 2, 2024

@medcl One question: with ik + pinyin analysis, how can I match Chinese terms first and only then pinyin terms?

medcl commented on June 2, 2024

@levylll @JsonShare
A single field is not realistic. Use multiple fields, one for Chinese and one for pinyin, via multi-field.
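One possible shape for the multi-field setup suggested above is sketched below: a Chinese main field plus one pinyin sub-field per variant, each with its own single-filter analyzer. All analyzer, filter, and sub-field names are hypothetical, the `string` type and the `first_letter` / `padding_char` options follow the syntax used earlier in this thread, and whether partial input such as fucheng matches still depends on how the tokenizer splits the text.

```python
import json

# Hypothetical variant names mapped to the filter options used in this thread.
VARIANTS = {
    "pinyin_no_space": {"first_letter": "none", "padding_char": ""},
    "pinyin_with_space": {"first_letter": "none", "padding_char": " "},
    "pinyin_first_letter": {"first_letter": "only", "padding_char": ""},
}


def build_index_body(doc_type="folks", field="name", tokenizer="ik_smart"):
    """Build settings and mappings with one pinyin sub-field per variant."""
    filters = {
        f"{name}_filter": {"type": "pinyin", **options}
        for name, options in VARIANTS.items()
    }
    analyzers = {
        f"{name}_analyzer": {
            "type": "custom",
            "tokenizer": tokenizer,
            "filter": [f"{name}_filter"],  # exactly one pinyin filter per analyzer
        }
        for name in VARIANTS
    }
    subfields = {
        name: {"type": "string", "analyzer": f"{name}_analyzer"}
        for name in VARIANTS
    }
    return {
        "settings": {"analysis": {"filter": filters, "analyzer": analyzers}},
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "string", "analyzer": "ik", "fields": subfields}
                }
            }
        },
    }


index_body = build_index_body()
print(json.dumps(index_body, indent=2, ensure_ascii=False))
```

Because every sub-field has its own analyzer, no pinyin filter ever receives the output of another one, and a query can target or weight each sub-field separately.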
