dsonet commented on June 2, 2024

@medcl, how can this be combined with Chinese word segmentation to implement pinyin suggestions? Thanks.

from elasticsearch-analysis-pinyin.

domyway commented on June 2, 2024

@medcl Same question here. I'd also like to know how to configure this.

medcl commented on June 2, 2024

The ES 2.0 version will be updated in the next few days, and I'll write an example along with it. Stay tuned.

domyway commented on June 2, 2024

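@medcl Hi, I've been following along waiting for your new example. When will it be updated? I'd like to know when the pinyin suggestion demo will be available.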

medcl commented on June 2, 2024

Sorry, I've been too busy lately. I'll definitely update it when I find the time.

(Sent from my phone, in reply to domyway's comment of November 21, 2015, 8:42 PM.)

domyway commented on June 2, 2024

@medcl OK, looking forward to it :)

medcl commented on June 2, 2024

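The pinyin plugin needs some changes and additional features before it can fully meet the first request. For now, here is one example that works: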

curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "custom_pinyin_analyzer" : {
                    "tokenizer" : "ik_smart",
                    "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"]
                }
            },
            "filter" : {
                "full_pinyin_no_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : ""
                },
                "full_pinyin_with_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                },
                "first_letter_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'
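If you'd rather assemble these settings from a script than from a shell one-liner, here is a minimal Python sketch that builds the same request body as a dict and serializes it. It uses only the standard library, and the helper names `pinyin_filter` and `pinyin_index_settings` are hypothetical. The order of the analyzer's `filter` list is the order in which the filters are applied.

```python
import json

def pinyin_filter(first_letter: str, padding_char: str) -> dict:
    """One `pinyin` token-filter definition, using the parameters from the example above."""
    return {"type": "pinyin", "first_letter": first_letter, "padding_char": padding_char}

def pinyin_index_settings() -> dict:
    """Hypothetical helper: the index settings from the curl example, as a Python dict."""
    filters = {
        "full_pinyin_no_space": pinyin_filter("none", ""),
        "full_pinyin_with_space": pinyin_filter("none", " "),
        "first_letter_pinyin": pinyin_filter("only", ""),
    }
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "custom_pinyin_analyzer": {
                        "tokenizer": "ik_smart",
                        # Token filters run in list order, each on the previous filter's output.
                        "filter": list(filters),
                    }
                },
                "filter": filters,
            }
        }
    }

# JSON body for: PUT http://localhost:9200/medcl/
print(json.dumps(pinyin_index_settings(), indent=4, ensure_ascii=False))
```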

i6448038 commented on June 2, 2024

I used your example, but it didn't do anything.

medcl commented on June 2, 2024

Please post the error.

(Sent from my phone, in reply to i6448038's comment of December 31, 2015, 4:04 PM.)

i6448038 commented on June 2, 2024

Thanks for your reply. My environment is ElasticSearch-rtf, and my index settings look like this (PHP code):
......
"analysis" => [
    "filter" => [
        "name_ngrams" => [
            "side" => "front",
            "max_gram" => 40,
            "min_gram" => 1,
            "type" => "edgeNGram"
        ],
        "full_pinyin_no_space" => [
            "type" => "pinyin",
            "first_letter" => "none",
            "padding_char" => ""
        ],
        "full_pinyin_with_space" => [
            "type" => "pinyin",
            "first_letter" => "none",
            "padding_char" => " "
        ],
        "first_letter_pinyin" => [
            "type" => "pinyin",
            "first_letter" => "only",
            "padding_char" => ""
        ]
    ],
    "analyzer" => [
        "custom_pinyin_analyzer" => [
            "type" => "custom",
            "tokenizer" => "ik_smart",
            "filter" => ["full_pinyin_no_space", "full_pinyin_with_space", "first_letter_pinyin"]
        ],
        "default" => [
            "type" => "custom",
            "tokenizer" => "smartcn_sentence",
            "filter" => ["asciifolding", "smartcn_word", "snowball", "shingle"],
            "char_filter" => ["ph" => "f", "qu" => "k"]
        ],
        "keyword2" => [
            "type" => "custom",
            "tokenizer" => "keyword",
            "filter" => ["lowercase"]
        ],
        "prefix" => [
            "type" => "custom",
            "tokenizer" => "keyword",
            "filter" => ["lowercase", "name_ngrams"]
        ]
    ]
],
'mappings' => [
    self::TYPE => [
        'properties' => [
            ......
            'name' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "name" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            .....
            'hospital_name' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "hospital_name" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            'disease' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "disease" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]
                ]
            ],
            'hospital_alias' => [
                'type' => 'multi_field',
                'path' => 'just_name',
                "fields" => [
                    "hospital_alias" => ["type" => "string", "index" => "analyzed", "boost" => "5.0"],
                    "name_pinyin" => ["type" => "string", "index" => "analyzed", "analyzer" => "custom_pinyin_analyzer", "boost" => "10"],
                    "name_untouched" => ["type" => "string", "index" => "analyzed", "analyzer" => "keyword2", "boost" => "9.0"],
                    "name_prefix" => ["type" => "string", "index" => "analyzed", "search_analyzer" => "keyword2", "index_analyzer" => "prefix", "boost" => "4.0"]

My query looks like this:
$param = [
    'index' => ElasticSearch::INDEX,
    'type' => ElasticSearch::TYPE,
    'body' => [
        'size' => 9000,
        'from' => 0,
        'query' => [
            "multi_match" => [
                "query" => 'xh',
                "type" => "best_fields",
                "fields" => ["hospital_name", "hospital_alias", "disease", "name"],
                "tie_breaker" => 0.3,
                "minimum_should_match" => "10%"
            ]
        ]
    ]
];

$response = $this->client->search($param);

But searching by pinyin finds nothing. The response:

[hits] => Array
    (
        [total] => 0
        [max_score] =>
        [hits] => Array
            (
            )

hits is zero.

bro0k commented on June 2, 2024

Thanks to the author. Following this; I'd like to use the two together.

medcl commented on June 2, 2024

Your query doesn't name the pinyin fields. For disease, for example, `fields` must explicitly include disease.name_pinyin, not just disease.
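A minimal Python sketch of that fix: it adds the pinyin subfield next to each base field in the `multi_match` body. It assumes, as the advice above does, that each subfield is addressable as `<field>.name_pinyin`; the function name `pinyin_multi_match` is hypothetical, and keeping the base fields alongside the pinyin ones is a design choice rather than a requirement.

```python
def pinyin_multi_match(text, base_fields, pinyin_subfield="name_pinyin"):
    """Build a multi_match body that searches each base field and its pinyin subfield."""
    fields = []
    for field in base_fields:
        fields.append(field)                         # Chinese-analyzed field
        fields.append(f"{field}.{pinyin_subfield}")  # pinyin subfield, e.g. disease.name_pinyin
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",
                "fields": fields,
                "tie_breaker": 0.3,
                "minimum_should_match": "10%",
            }
        }
    }

body = pinyin_multi_match("xh", ["hospital_name", "hospital_alias", "disease", "name"])
print(body["query"]["multi_match"]["fields"])
```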

i6448038 commented on June 2, 2024

(Four screenshots, not preserved.)

bsll commented on June 2, 2024

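The results I get with nGram seem far less accurate... but without it, I only get results when I type the whole thing... @medcl, have you considered segmenting the pinyin itself? That is, an analyzer that can be configured to split "liudehua" into "liu de hua". That should solve all sorts of pinyin problems. Or do I just not understand it well enough to configure such an analyzer?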

medcl commented on June 2, 2024

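@bsll When you define a custom pinyin filter, there is a parameter, padding_char, that separates the pinyin syllables. Set it to a space and you get exactly that.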
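To make the effect of `padding_char` concrete, here is a toy Python illustration. It is not the plugin's code: the three-character pinyin table is hard-coded and hypothetical, and the function only mimics the two modes discussed in this thread (full pinyin joined by `padding_char`, and first letters only).

```python
# Hypothetical, hard-coded pinyin table for this illustration only.
TOY_PINYIN = {"刘": "liu", "德": "de", "华": "hua"}

def toy_pinyin(text, padding_char="", first_letter_only=False):
    """Convert each known character to pinyin and join the syllables with padding_char."""
    syllables = [TOY_PINYIN[ch] for ch in text if ch in TOY_PINYIN]
    if first_letter_only:
        syllables = [s[0] for s in syllables]
    return padding_char.join(syllables)

print(toy_pinyin("刘德华"))                         # liudehua
print(toy_pinyin("刘德华", padding_char=" "))        # liu de hua
print(toy_pinyin("刘德华", first_letter_only=True))  # ldh
```

With an empty `padding_char` the syllable boundaries disappear; with a space they are kept in the output.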

medcl commented on June 2, 2024

@i6448038 Could you fetch the actual mapping through the _mapping API and check whether your mapping really took effect? That's a common problem.
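A minimal Python sketch of that check. It assumes an Elasticsearch 2.x-style `GET /<index>/_mapping` response shaped like `{index: {"mappings": {type: {"properties": ...}}}}` and a node at `localhost:9200`. `fetch_mapping` and `field_analyzer` are hypothetical helpers, and the sample response (index `my_index`, type `my_type`) is hand-written, loosely modeled on the `disease` field above, so the lookup can be tried without a running node.

```python
import json
import urllib.request

def fetch_mapping(index, host="http://localhost:9200"):
    """GET /<index>/_mapping and return the decoded JSON (requires a running node)."""
    with urllib.request.urlopen(f"{host}/{index}/_mapping") as resp:
        return json.load(resp)

def field_analyzer(mapping, index, doc_type, path):
    """Return the analyzer configured for a dotted field path, or None if the field is absent."""
    node = mapping[index]["mappings"][doc_type]
    for part in path.split("."):
        # Object fields nest under "properties"; multi-field subfields nest under "fields".
        props = node.get("properties") or node.get("fields") or {}
        if part not in props:
            return None
        node = props[part]
    return node.get("analyzer")

# Hand-written sample response, for illustration only.
sample = {
    "my_index": {"mappings": {"my_type": {"properties": {
        "disease": {
            "type": "string",
            "fields": {"name_pinyin": {"type": "string", "analyzer": "custom_pinyin_analyzer"}},
        }
    }}}}
}
print(field_analyzer(sample, "my_index", "my_type", "disease.name_pinyin"))
```

Against a live cluster, you would pass `fetch_mapping("<your index>")` instead of `sample`. A `None` result means the subfield or its analyzer did not make it into the mapping.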

i6448038 commented on June 2, 2024

@medcl Thank you, it works now. Very handy!

Morriaty-The-Murderer commented on June 2, 2024

@medcl
Hi, I followed your "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"] setup above, but searching by first letters finds nothing.

For the analyzer's tokenizer I used ik_max_word.

Then the mapping:

PUT my/folks/_mapping
{
  "folks": {
    "properties": {
      "name": {
        "type": "string",  
        "term_vector" : "with_positions_offsets",  
        "analyzer": "ik",  
        "search_analyzer": "ik",  
        "fields": {  
          "pinyin": {  
            "type": "string",  
            "analyzer": "custom_pinyin_analyzer",  
            "search_analyzer": "custom_pinyin_analyzer"  
          }  
        }  
      }
    }
  }
}

PUT my/folks/david
{
  "name": "郭富城"
}

Then I search:

GET my/folks/_search
{
  "query": {
    "match": {
        "name.pinyin": "fucheng"
    }
  }
}

Only queries like "guo fu cheng" and "guofucheng" find it;
"fucheng", "guofu", and "gfc" find nothing.

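Now I'm wondering: are the three filters in a custom analyzer executed sequentially or in parallel? I checked the official documentation but couldn't find an explicit explanation.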

medcl commented on June 2, 2024

@Morriaty-The-Murderer They are executed sequentially.
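Because the filters run in sequence, each one sees the previous filter's output, not the original Chinese. The toy Python model below shows how that can leave no first-letter token. It is a simplification built on loudly stated assumptions: the tokenizer output for 郭富城 is hypothetical (a stand-in for what `ik_max_word` might emit), the pinyin table is hard-coded, and each toy filter passes tokens containing no Chinese characters through unchanged, which may not match the plugin version you run.

```python
# Hypothetical pinyin table and tokenizer output, for illustration only.
TOY_PINYIN = {"郭": "guo", "富": "fu", "城": "cheng"}
TOY_TOKENS = ["郭富城", "郭", "富", "城"]

def make_pinyin_filter(padding_char="", first_letter_only=False):
    """Toy pinyin filter: converts Chinese tokens; passes other tokens through unchanged (assumption)."""
    def apply(token):
        if not any(ch in TOY_PINYIN for ch in token):
            return token  # already Latin text: nothing left to convert
        syllables = [TOY_PINYIN[ch] for ch in token if ch in TOY_PINYIN]
        if first_letter_only:
            syllables = [s[0] for s in syllables]
        return padding_char.join(syllables)
    return apply

def analyze(tokens, filters):
    """Run the filters sequentially: each one rewrites the previous filter's output."""
    out = list(tokens)
    for f in filters:
        out = [f(tok) for tok in out]
    return out

chain = [
    make_pinyin_filter(padding_char=""),         # full_pinyin_no_space
    make_pinyin_filter(padding_char=" "),        # full_pinyin_with_space
    make_pinyin_filter(first_letter_only=True),  # first_letter_pinyin
]
indexed = analyze(TOY_TOKENS, chain)
print(indexed)  # ['guofucheng', 'guo', 'fu', 'cheng'], with no 'gfc' token
```

In this model, running the first-letter filter directly on the tokenizer output, `analyze(TOY_TOKENS, [chain[2]])`, does produce `gfc`. That is the motivation for giving each filter its own field.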

Morriaty-The-Murderer commented on June 2, 2024

@medcl Then what is wrong with my setup above? How can I make "guofucheng", "fucheng", "guo fu cheng", and "gfc" all match?

The only approach I can think of right now is to create several multi_fields, one each for no_space, with_space, and first_letter.
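Here is one way to sketch that multi-field idea in Python. It builds an Elasticsearch 2.x-style mapping in which each pinyin variant gets its own subfield and its own single-filter analyzer, so every filter sees the tokenizer's Chinese output. The analyzer and subfield names (`pinyin_full`, `pinyin_spaced`, `pinyin_initials`) are hypothetical. The filter names are the ones defined in medcl's example, and the analyzers would sit under `index.analysis.analyzer` next to those filter definitions. Treat this as an untested sketch. In particular, which `search_analyzer` suits typed pinyin queries is left open.

```python
import json

# Hypothetical analyzer names, each mapped to the single pinyin filter it uses.
VARIANTS = {
    "pinyin_full": "full_pinyin_no_space",
    "pinyin_spaced": "full_pinyin_with_space",
    "pinyin_initials": "first_letter_pinyin",
}

def analyzers(tokenizer="ik_max_word"):
    """One custom analyzer per pinyin variant, each with exactly one pinyin filter."""
    return {name: {"type": "custom", "tokenizer": tokenizer, "filter": [flt]}
            for name, flt in VARIANTS.items()}

def name_mapping():
    """Chinese main field plus one pinyin subfield per variant (subfield named after its analyzer)."""
    subfields = {name: {"type": "string", "analyzer": name} for name in VARIANTS}
    return {"folks": {"properties": {"name": {
        "type": "string", "analyzer": "ik", "fields": subfields}}}}

def search_body(text):
    """Search the Chinese field and every pinyin subfield at once."""
    fields = ["name"] + [f"name.{name}" for name in VARIANTS]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

print(json.dumps(name_mapping(), indent=2))
```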

(unknown user) commented on June 2, 2024

@Morriaty-The-Murderer: I'm also working on pinyin analysis and have spent ages on it without getting it to work. How did you do it? Did you create several multi_fields?

(unknown user) commented on June 2, 2024

This example configures three filters, but when I query http://localhost:9200/medcl/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=custom_pinyin_analyzer, none of the analysis takes effect. Could it be my version (1.7.1)?
curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "custom_pinyin_analyzer" : {
                    "tokenizer" : "ik_smart",
                    "filter" : ["full_pinyin_no_space","full_pinyin_with_space","first_letter_pinyin"]
                }
            },
            "filter" : {
                "full_pinyin_no_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : ""
                },
                "full_pinyin_with_space" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                },
                "first_letter_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'
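To reproduce that `_analyze` check, you can build the URL-encoded `text` parameter with Python's standard library instead of encoding it by hand. This sketch only constructs the request URL used above. Sending it (for example with `urllib.request.urlopen`) needs a running node, and the helper name `analyze_url` is hypothetical.

```python
from urllib.parse import urlencode

def analyze_url(index, text, analyzer, host="http://localhost:9200"):
    """Build a GET _analyze URL; urlencode percent-encodes the Chinese text as UTF-8."""
    query = urlencode({"text": text, "analyzer": analyzer})
    return f"{host}/{index}/_analyze?{query}"

print(analyze_url("medcl", "刘德华", "custom_pinyin_analyzer"))
# http://localhost:9200/medcl/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=custom_pinyin_analyzer
```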

medcl commented on June 2, 2024

Please try the latest version. If you still have problems, feel free to reopen this issue.

levylll commented on June 2, 2024

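@medcl Could you give some guidance on how to solve the problem above?
How can I make guofucheng, fucheng, guo fu cheng, and gfc all match?
Going further, I'd like typing "锅富城" to also find "郭富城".
My current version is 2.1.1.
Many thanks.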

medcl commented on June 2, 2024

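Pinyin fields match on pinyin, so whenever the pinyin is the same, a hit is to be expected. Usually you combine the pinyin field with a separate Chinese field, give the two fields different weights, and favor the Chinese one.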
(Sent by email on issue #19, in reply to levylll's comment of Sunday, October 9, 2016, 1:05 PM.)
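Here is a minimal Python sketch of that weighting. It assumes a Chinese field `name` with a pinyin subfield `name.pinyin`, as in the mapping earlier in this thread. The boost values 3 and 1 are arbitrary placeholders to tune, and `weighted_query` is a hypothetical helper. A `bool` query with two `should` clauses adds up the scores of the clauses that match, so documents that also match the Chinese clause generally rank above pinyin-only matches.

```python
def weighted_query(text, chinese_field="name", pinyin_field="name.pinyin",
                   chinese_boost=3.0, pinyin_boost=1.0):
    """Match on both fields; the Chinese clause carries the larger boost."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {chinese_field: {"query": text, "boost": chinese_boost}}},
                    {"match": {pinyin_field: {"query": text, "boost": pinyin_boost}}},
                ]
            }
        }
    }

body = weighted_query("锅富城")
print(body)
```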

levylll commented on June 2, 2024

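@medcl Yes, I understand what you mean. My current setup also has both a pinyin field and a Chinese field. In theory, handling typos like this should mean IK segmentation first and pinyin conversion second, but the results never feel quite right...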

JsonShare commented on June 2, 2024

@medcl I'd like to ask: with IK + pinyin analysis, how can I match Chinese phrases first and pinyin phrases second?

medcl commented on June 2, 2024

@levylll @JsonShare
A single field isn't realistic. Use multiple fields, one for Chinese and one for pinyin, via multi-fields.
