
Pinyin Analysis for Elasticsearch and OpenSearch

This Pinyin Analysis plugin facilitates the conversion between Chinese characters and Pinyin. It supports major versions of Elasticsearch and OpenSearch. Maintained and supported with ❤️ by INFINI Labs.

The plugin comprises an analyzer named pinyin, a tokenizer named pinyin, and a token filter named pinyin.

Optional Parameters

  • keep_first_letter: When enabled, retains only the first letter of each Chinese character. For example, 刘德华 becomes ldh. Default: true.

  • keep_separate_first_letter: When enabled, keeps the first letter of each Chinese character as a separate term. For example, 刘德华 becomes l,d,h. Default: false. Note: this can increase query fuzziness, because single-letter terms occur very frequently.

  • limit_first_letter_length: Sets the maximum length of the first letter result. Default: 16.

  • keep_full_pinyin: When enabled, preserves the full Pinyin of each Chinese character. For example, 刘德华 becomes [liu,de,hua]. Default: true.

  • keep_joined_full_pinyin: When enabled, joins the full Pinyin of each Chinese character. For example, 刘德华 becomes [liudehua]. Default: false.

  • keep_none_chinese: Keeps non-Chinese letters or numbers in the result. Default: true.

  • keep_none_chinese_together: Keeps non-Chinese letters together. Default: true. For example, DJ音乐家 becomes DJ,yin,yue,jia. When set to false, DJ音乐家 becomes D,J,yin,yue,jia. Note: keep_none_chinese should be enabled first.

  • keep_none_chinese_in_first_letter: Keeps non-Chinese letters in the first letter. For example, 刘德华AT2016 becomes ldhat2016. Default: true.

  • keep_none_chinese_in_joined_full_pinyin: Keeps non-Chinese letters in joined full Pinyin. For example, 刘德华2016 becomes liudehua2016. Default: false.

  • none_chinese_pinyin_tokenize: Breaks non-Chinese letters into separate Pinyin terms if they are Pinyin. Default: true. For example, liudehuaalibaba13zhuanghan becomes liu,de,hua,a,li,ba,ba,13,zhuang,han. Note: keep_none_chinese and keep_none_chinese_together should be enabled first.

  • keep_original: When enabled, keeps the original input as well. Default: false.

  • lowercase: Lowercases non-Chinese letters. Default: true.

  • trim_whitespace: Trims leading and trailing whitespace. Default: true.

  • remove_duplicated_term: When enabled, removes duplicated terms to save index space. For example, de的 becomes de. Default: false. Note: Position-related queries may be influenced.

  • ignore_pinyin_offset: Since version 6.0, offsets are strictly constrained and overlapping tokens are not allowed. With this parameter enabled, overlapping tokens are allowed by ignoring their offsets. Note that all position-related queries and highlighting will then be incorrect; you should use multi-fields and specify different settings for different query purposes. If you need offsets, set it to false. Default: true.
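As an illustration of how these parameters combine (the index and analyzer names below are made up for this example), the following settings keep only the joined full pinyin and the first letters, so analyzing 刘德华 should yield tokens like liudehua and ldh instead of liu, de, hua:

```
PUT /pinyin_demo/
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "joined_pinyin_analyzer" : {
                    "tokenizer" : "joined_pinyin"
                }
            },
            "tokenizer" : {
                "joined_pinyin" : {
                    "type" : "pinyin",
                    "keep_full_pinyin" : false,
                    "keep_joined_full_pinyin" : true,
                    "keep_first_letter" : true,
                    "keep_original" : false
                }
            }
        }
    }
}

GET /pinyin_demo/_analyze
{
  "text": ["刘德华"],
  "analyzer": "joined_pinyin_analyzer"
}
```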

How to Install

You can download the packaged plugins from https://release.infinilabs.com/,

or install them with the plugin CLI like this:

For Elasticsearch

bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-pinyin/8.4.1

For OpenSearch

bin/opensearch-plugin install https://get.infini.cloud/opensearch/analysis-pinyin/2.12.0

Tip: replace the version number in the URL with the one matching your Elasticsearch or OpenSearch version.

Getting Started

1. Create an index with a custom pinyin analyzer

PUT /medcl/ 
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin"
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "keep_separate_first_letter" : false,
                    "keep_full_pinyin" : true,
                    "keep_original" : true,
                    "limit_first_letter_length" : 16,
                    "lowercase" : true,
                    "remove_duplicated_term" : true
                }
            }
        }
    }
}

2. Test the analyzer by analyzing a Chinese name, such as 刘德华

GET /medcl/_analyze
{
  "text": ["刘德华"],
  "analyzer": "pinyin_analyzer"
}
{
  "tokens" : [
    {
      "token" : "liu",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "de",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "hua",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "刘德华",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ldh",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 4
    }
  ]
}

3. Create a mapping

POST /medcl/_mapping 
{
    "properties": {
        "name": {
            "type": "keyword",
            "fields": {
                "pinyin": {
                    "type": "text",
                    "store": false,
                    "term_vector": "with_offsets",
                    "analyzer": "pinyin_analyzer",
                    "boost": 10
                }
            }
        }
    }
}

4. Index a document

POST /medcl/_create/andy
{"name":"刘德华"}

5. Let's search

curl http://localhost:9200/medcl/_search?q=name:%E5%88%98%E5%BE%B7%E5%8D%8E
curl http://localhost:9200/medcl/_search?q=name.pinyin:%e5%88%98%e5%be%b7
curl http://localhost:9200/medcl/_search?q=name.pinyin:liu
curl http://localhost:9200/medcl/_search?q=name.pinyin:ldh
curl http://localhost:9200/medcl/_search?q=name.pinyin:de+hua
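The URL-style queries above can equally be written with the query DSL; for example, the name.pinyin:liu search corresponds to a match query (a sketch, assuming the index and mapping created in the previous steps):

```
GET /medcl/_search
{
  "query": {
    "match": {
      "name.pinyin": "liu"
    }
  }
}
```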

6. Using the pinyin token filter

PUT /medcl1/ 
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "user_name_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : "pinyin_first_letter_and_full_pinyin_filter"
                }
            },
            "filter" : {
                "pinyin_first_letter_and_full_pinyin_filter" : {
                    "type" : "pinyin",
                    "keep_first_letter" : true,
                    "keep_full_pinyin" : false,
                    "keep_none_chinese" : true,
                    "keep_original" : false,
                    "limit_first_letter_length" : 16,
                    "lowercase" : true,
                    "trim_whitespace" : true,
                    "keep_none_chinese_in_first_letter" : true
                }
            }
        }
    }
}

Token test: 刘德华 张学友 郭富城 黎明 四大天王

GET /medcl1/_analyze
{
  "text": ["刘德华 张学友 郭富城 黎明 四大天王"],
  "analyzer": "user_name_analyzer"
}
{
  "tokens" : [
    {
      "token" : "ldh",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "zxy",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "gfc",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "lm",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "sdtw",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 4
    }
  ]
}
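To put this filter-based analyzer to work, you would attach it to a field and then search by first letters. The sketch below assumes a field named user_name in the medcl1 index (the field name and document are illustrative, not part of the original example):

```
POST /medcl1/_mapping
{
    "properties": {
        "user_name": {
            "type": "text",
            "analyzer": "user_name_analyzer"
        }
    }
}

POST /medcl1/_create/1
{"user_name": "刘德华 张学友"}

GET /medcl1/_search
{
  "query": {
    "match": {
      "user_name": "ldh"
    }
  }
}
```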

7. Using pinyin in phrase queries

  • Option 1
PUT /medcl2/
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin"
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "keep_first_letter":false,
                    "keep_separate_first_letter" : false,
                    "keep_full_pinyin" : true,
                    "keep_original" : false,
                    "limit_first_letter_length" : 16,
                    "lowercase" : true
                }
            }
        }
    }
}
GET /medcl2/_search
{
  "query": {"match_phrase": {
    "name.pinyin": "刘德华"
  }}
}

  • Option 2
 
PUT /medcl3/
{
   "settings" : {
       "analysis" : {
           "analyzer" : {
               "pinyin_analyzer" : {
                   "tokenizer" : "my_pinyin"
               }
           },
           "tokenizer" : {
               "my_pinyin" : {
                   "type" : "pinyin",
                   "keep_first_letter":true,
                   "keep_separate_first_letter" : true,
                   "keep_full_pinyin" : true,
                   "keep_original" : false,
                   "limit_first_letter_length" : 16,
                   "lowercase" : true
               }
           }
       }
   }
}
   
POST /medcl3/_mapping 
{
  "properties": {
      "name": {
          "type": "keyword",
          "fields": {
              "pinyin": {
                  "type": "text",
                  "store": false,
                  "term_vector": "with_offsets",
                  "analyzer": "pinyin_analyzer",
                  "boost": 10
              }
          }
      }
  }
}
  
   
GET /medcl3/_analyze
{
   "text": ["刘德华"],
   "analyzer": "pinyin_analyzer"
}
 
POST /medcl3/_create/andy
{"name":"刘德华"}

GET /medcl3/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "刘德h"
 }}
}

GET /medcl3/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "刘dh"
 }}
}

GET /medcl3/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "liudh"
 }}
}

GET /medcl3/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "liudeh"
 }}
}

GET /medcl3/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "liude华"
 }}
}

8. That's all, have fun.

Community

Feel free to join the Discord server to discuss anything around this project:

https://discord.gg/4tKTMkkvVX

License

Copyright ©️ INFINI Labs.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

elasticsearch-analysis-pinyin's People

Contributors

abbi031892, abia321, adrianocrestani, ariesy, blueshen, crzidea, doabit, icode, lrsec, medcl, sweetstreet, yatming-mo


elasticsearch-analysis-pinyin's Issues

About prefix search with pinyin tokenization, part 2

Hi, it's me again. Prefix search does work now, but there's a practical problem.

http://localhost:9200/tag/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=pinyin_analyzer

The result is:

{"tokens":[{"token":"ldh liu de hua ","start_offset":0,"end_offset":3,"type":"word","position":1}]}

A prefix search for ldh does return results, but searching for liu returns nothing. How can I handle this?

What I want is prefix search where both the first letters and the full pinyin produce results.

About prefix search with pinyin tokenization

Hi medcl, I'm back with another pinyin question. How should I set things up to do prefix search via pinyin? For example, with content like "我们来自**" and "**是个大国", typing 'zg' should find them. In other words, I want to match from the beginning of the content, like LIKE 'zg%' in SQL.

Additionally, my analysis settings are as follows:

############################ Pinyin

index:
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: my_pinyin
        filter: [standard,nGram]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prifix"
        padding_char: " "

The mapping is configured as follows:


POST /tag/keywords/_mapping
{
    "keywords": {
        "properties": {
            "kwname": {
                "type": "multi_field",
                "fields": {
                    "kwname": {
                        "type": "string",
                        "store": "no",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "pinyin_analyzer",
                        "boost": 10
                    },
                    "primitive": {
                        "type": "string",
                        "store": "yes",
                        "analyzer": "keyword"
                    }
                }
            }
        }
    }
}


http://localhost:9200/tag/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=pinyin_analyzer

The result is:


{"tokens":[{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ld","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"dh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hl","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"li","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"i","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"iu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ud","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"e","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"eh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"a","start_offset":0,"end_offset":3,"type":"word","position":1}]}

This is a bit too fine-grained. How can I make it coarser? For example, I only want:
{"token":"liu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ldh","start_offset":0,"end_offset":3,"type":"word","position":1},

How should I configure this?

Thanks a lot!

Why must I configure the nGram filter at index time to get results when searching by pinyin?

My problem is exactly as the title says. I only want to index by full pinyin, so I configured the pinyin analyzer following the first approach in the docs:
curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["standard"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                }
            }
        }
    }
}'
The mapping is configured as follows:
curl -XPOST localhost:9200/searchshowindex_v3/searchshowtype/_mapping -d'
{
    "searchshowtype": {
        "_all": {"analyzer": "pinyin_analyzer", "term_vector": "no", "store": false}
    }
}'
But after indexing the data, no matter what pinyin I type in the address bar, even a single letter, I get no results at all; there are only results after adding nGram to the filter. That's not what I want, though: with nGram configured, almost any pinyin input matches nearly all the documents, which is a poor experience.
So I'd like to ask medcl why this happens. Did I misunderstand something or misconfigure it? If convenient, I'd appreciate help resolving this confusion.

When will polyphonic characters be supported?

Pinyin tokenization currently doesn't support polyphonic characters; it always takes the first pinyin returned by pinyin4j. When will polyphonic characters be supported?

try {
    String[] strs = PinyinHelper.toHanyuPinyinStringArray(c, format);
    if (strs != null) {
        //get first result by default
        String first_value = strs[0];
        //TODO more than one pinyin

What is the purpose of setting first_letter to prefix in PinyinTokenFilter?

Hello, I've been reading the code.
One question: in PinyinTokenFilter, if first_letter is set to prefix, each character's pinyin is prefixed with its first letter. For example, 刘 is converted to liu, and the final termAttribute produced is "lliu". What is the purpose of this operation?
This is my first contact with soundex search; any pointers appreciated.

Ran into a strange problem

Configuration:

pinyin_ik_first_letter:
        type: custom
        tokenizer: ik
        filter:
        - pinyin_first_letter
        - lowercase
        - unique

Test:
GET /test/_analyze?text=abc北京***&analyzer=pinyin_ik_first_letter&pretty

{
  "tokens" : [ {
    "token" : "",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "bj",
    "start_offset" : 3,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "j",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "tam",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "ta",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "m",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "CN_CHAR",
    "position" : 6
  } ]
}

abc is gone. What's the reason?

plugin-descriptor.properties

Installing on ES 2.0 requires a plugin-descriptor.properties file with its parameters properly set.
In your master branch, plugin-descriptor.properties is placed under src, and its parameters are not configured.
Could you explain how to configure the parameters?

Why does the analyzer return different results for "宝宝巴士" and "baobaobashi"?

{
  "py_v2": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1466507976098",
        "analysis": {
          "filter": {
            "autocomplete_filter": {
              "type": "edge_ngram",
              "min_gram": "2",
              "max_gram": "20"
            }
          },
          "analyzer": {
            "pinyin_analyzer": {
              "filter": [
                "word_delimiter",
                "autocomplete_filter",
                "length"
              ],
              "tokenizer": "my_pinyin"
            }
          },
          "tokenizer": {
            "my_pinyin": {
              "padding_char": " ",
              "type": "pinyin",
              "first_letter": "none"
            }
          }
        },
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "nIa_gLHvRIe2HZtGC4e8Yg",
        "version": {
          "created": "2020199"
        }
      }
    },
    "warmers": {}
  }
}

Analyzer [pinyin_analyzer] must have a type associated

Caused by: java.lang.IllegalArgumentException: Analyzer [pinyin_analyzer] must have a type associated with it
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:313)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:61)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:233)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:105)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:143)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:159)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)

Traditional Chinese is not supported?

http://localhost:9200/users/_analyze?text=%E8%B2%8C%E7%BE%8E%E5%A6%82%E8%AA%AE&analyzer=pinyin_analyzer

It returns:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[q_nVrMt][172.17.0.2:9300][indices:admin/analyze[s]]"}],"type":"string_index_out_of_bounds_exception","reason":"String index out of range: 0"},"status":500}

The server returns a 500 error.

Version: the latest Elasticsearch 5.

How can I use pinyin for search suggestions that match from the very beginning of the text?

First of all, thanks medcl!

Let me describe the requirement: we use this plugin to provide suggestions when users search for products in our store.

(demo screenshot)

  • This is currently implemented with the completion suggester. The problem is that after a document is deleted, it can still be suggested. I searched the official ES issues and it seems unresolved.

elastic/elasticsearch-js#117

  • I then tried the context suggester, adding a field to flag whether a document is disabled, but after updating a document, the results returned through the context suggester were not the updated ones.
    • For example, with state: 1/0 flagging whether a document is disabled: after updating a doc from 0 to 1, a context suggester query with state=0 still returns it.
  • So I plan to replace the suggester with search.

The general requirements:
1. When typing Chinese characters or pinyin, match product names from the first character, by both first letters and full pinyin.

  • For example, typing "刘得" should return "刘德华".

2. Special characters such as ()*/ should be filtered out.

Following the examples in the README, I tried creating an index:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "only_pinyin_analyzer" : {
                    "tokenizer" : "only_pinyin"
                }
                ,"none_pinyin_analyzer" : {
                    "tokenizer" : "none_pinyin"
                }
            }
            ,"tokenizer" : {
                "only_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
                ,"none_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : ""
                }
            }
        }
    }
}

Then I tested:

_analyze?analyzer=only_pinyin_analyzer&pretty&text=(散)奇趣蛋功夫熊猫版20g*3粒/组

The result is as follows:

{
    "tokens": [
        {
            "token": "s",
            "start_offset": 1,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "qqdgfxmb20g",
            "start_offset": 3,
            "end_offset": 14,
            "type": "word",
            "position": 1
        },
        {
            "token": "3l",
            "start_offset": 15,
            "end_offset": 17,
            "type": "word",
            "position": 2
        },
        {
            "token": "z",
            "start_offset": 18,
            "end_offset": 19,
            "type": "word",
            "position": 3
        }
    ]
}

The text gets split into terms. But when using the none tokenizer:

_analyze?analyzer=none_pinyin_analyzer&pretty&text=(散)奇趣蛋功夫熊猫版20g*3粒/组

The result:

{
    "tokens": [
        {
            "token": "sanqiqudangongfuxiongmaoban20g 3li zu",
            "start_offset": 0,
            "end_offset": 19,
            "type": "word",
            "position": 0
        }
    ]
}

It is not split at all.

I'd like to know whether these two can be used this way, or how they should be used.

One more thing: how do I match from the beginning? For example, typing "德华" should only match docs that start with "德华", not "刘德华".

Using the pinyin tokenizer with Elasticsearch 1.5.2

Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to find analyzer type [null] or tokenizer for [pinyin_analyzer]
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:383)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:328)
... 8 more
[2015-05-01 23:39:31,625][WARN ][cluster.action.shard ] [Brynocki] [medcl][1] received shard failed for [medcl][1], node[aUJOyVgfTzmo3P-okd9M3A], [P], s[INITIALIZING], indexUUID [jRV45HekR4Og1tlDYEgs1Q], reason [shard failure [failed to create index][IndexCreationException[[medcl] failed to create index]; nested: ElasticsearchIllegalArgumentException[failed to find analyzer type [null] or tokenizer for [pinyin_analyzer]]; ]]

Is this warning message a problem?

[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.elasticsearch:elasticsearch-analysis-pinyin:jar:1.7.4
[WARNING] 'dependencies.dependency.systemPath' for net.sourceforge.pinyin4j:pinyin4j:jar should not point at files within the project directory, ${basedir}/lib/pinyin4j-2.5.0.jar will be unresolvable by dependent projects @ line 64, column 25
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]

A small suggestion

curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : "word_delimiter
]                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                }
            }
        }
    }
}'

The square bracket here should actually be a quotation mark. It's in the demo on the homepage.

default_operator no longer takes effect after upgrading Elasticsearch 0.90.2 to 0.90.7

curl -XPUT 'http://localhost:9200/medcl/' -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["standard","nGram"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "prefix",
                    "padding_char" : ""
                }
            }
        }
    }
}'

curl -XPOST 'http://localhost:9200/medcl/folks/_mapping' -d'
{
    "folks": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type": "string",
                        "store": "no",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "pinyin_analyzer",
                        "boost": 10
                    },
                    "primitive": {
                        "type": "string",
                        "store": "yes",
                        "analyzer": "keyword"
                    }
                }
            }
        }
    }
}'

curl -XPOST 'http://localhost:9200/medcl/folks/andy' -d'{"name":"刘德华"}'
curl -XPOST 'http://localhost:9200/medcl/folks/xiaom' -d'{"name":"刘小明"}'

curl 'http://localhost:9200/medcl/folks/_search?pretty' -d '{
  "query": {
    "query_string":{
      "query": "name:dehua",
      "default_operator": "AND"
    }
  }
}'

On 0.90.7, the query matches both 小明 and 德华!
On 0.90.2, only 德华 is matched.

es-pinyin 1.7.2 fails to start under ES 2.3.2

I couldn't find installation instructions, so I searched around and compiled the pinyin jars following http://my.oschina.net/xiaohui249/blog/214505

This produced two jars (elasticsearch-analysis-pinyin-1.7.2.jar and pinyin4j-2.5.0.jar), which I placed under [ES_HOME]/plugins/pinyin.

Then service elasticsearch start reports the error below; it looks like a version problem.

Reinstalling ES 2.3.2 didn't help either, so I'm asking for help.

Starting elasticsearch: Exception in thread "main" java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [pinyin]. Was the plugin built before 2.0?
Likely root cause: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/plugins/pinyin/plugin-descriptor.properties
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.Files.newByteChannel(Files.java:361)
        at java.nio.file.Files.newByteChannel(Files.java:407)
        at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
        at java.nio.file.Files.newInputStream(Files.java:152)
        at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:87)
        at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:378)
        at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:128)
        at org.elasticsearch.node.Node.<init>(Node.java:158)
        at org.elasticsearch.node.Node.<init>(Node.java:140)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:178)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Refer to the log for complete error details.
                                                           [FAILED]

request to add pinyin filter

Hi medcl,
The pinyin analyzer alone cannot meet our production requirements: we first need to tokenize the input sentence and then convert each term to pinyin for querying. I can help add a pinyin token filter to this module, but I don't know how to contribute.

The 'name' parameter in plugin-descriptor.properties may make the plugin unusable

Today, while testing the new ik and pinyin plugins on ES 2.1, I found this problem: plugin-descriptor.properties uses name=${elasticsearch.plugin.name}, and when starting the ES service you see:
[node-1] loaded [license, marvel-agent, ${elasticsearch.plugin.name}], sites [kopf, sense, head]
If you then create an index that uses the analyzer, the index may end up in an abnormal state with shards that cannot be allocated.
After changing this to name=ik and name=pinyin and restarting the ES service:
[node-1] loaded [license, marvel-agent,ik,pinyin], sites [kopf, sense, head]
Recreating the index made the problem disappear.

Could you support ES 1.2.x or 1.3.x?

Thank you for your hard work; the plugin is very useful.
But I can't get it to install on ES 1.2.2 no matter what I try. Could you take a look?

Message:
   Error while installing plugin, reason: IllegalArgumentException: Plugin installation assumed to be site plugin, but contains source code, aborting installation.

Sorry for the trouble, and thanks!

No pinyin output when applying pinyin processing after Chinese word segmentation

I want to first segment text with elasticsearch-analysis-ansj and then convert it with elasticsearch-analysis-pinyin. The command:
curl -XGET 'localhost:9200/_analyze?tokenizer=index_ansj&filter=pinyin&pretty' -d '你好,我是小明的同学小强'
But the segmented terms are not converted to pinyin. The result is:
{
  "tokens" : [ {
    "token" : "你好",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "l",
    "position" : 0
  }, {
    "token" : ",",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "w",
    "position" : 1
  }, {
    "token" : "我",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "r",
    "position" : 2
  }, {
    "token" : "是",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "v",
    "position" : 3
  }, {
    "token" : "小明",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "nr",
    "position" : 4
  }, {
    "token" : "的",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "uj",
    "position" : 5
  }, {
    "token" : "同学",
    "start_offset" : 8,
    "end_offset" : 10,
    "type" : "n",
    "position" : 6
  }, {
    "token" : "小强",
    "start_offset" : 10,
    "end_offset" : 12,
    "type" : "nr",
    "position" : 7
  } ]
}

On ES 2.1.1, installing the pinyin and ik plugins together causes ES startup failure

With only ik or only pinyin installed, ES 2.1.1 starts normally, but as soon as both are installed, ES fails to start.

[2015-12-30 03:07:04,397][INFO ][node                     ] [Eleggua] version[2.1.1], pid[1], build[40e2c53/2015-12-15T13:05:55Z]
[2015-12-30 03:07:04,399][INFO ][node                     ] [Eleggua] initializing ...
Exception in thread "main" java.lang.IllegalStateException: failed to load bundle [file:/usr/share/elasticsearch/plugins/pinyin/elasticsearch-analysis-pinyin-1.5.2.jar, file:/usr/share/elasticsearch/plugins/pinyin/pinyin4j-2.5.0.jar, file:/usr/share/elasticsearch/plugins/ik/commons-logging-1.2.jar, file:/usr/share/elasticsearch/plugins/ik/httpclient-4.4.1.jar, file:/usr/share/elasticsearch/plugins/ik/httpcore-4.4.1.jar, file:/usr/share/elasticsearch/plugins/ik/elasticsearch-analysis-ik-1.6.2.jar, file:/usr/share/elasticsearch/plugins/ik/commons-codec-1.9.jar] due to jar hell
Likely root cause: java.lang.IllegalStateException: jar hell!
class: org.elasticsearch.analysis.PinyinIndicesAnalysis$3
jar1: /usr/share/elasticsearch/plugins/pinyin/elasticsearch-analysis-pinyin-1.5.2.jar
jar2: /usr/share/elasticsearch/plugins/ik/elasticsearch-analysis-ik-1.6.2.jar
        at org.elasticsearch.bootstrap.JarHell.checkClass(JarHell.java:280)
        at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:186)
        at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:336)
        at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:109)
        at org.elasticsearch.node.Node.<init>(Node.java:146)
        at org.elasticsearch.node.Node.<init>(Node.java:128)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:178)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:285)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Refer to the log for complete error details.

README Create Mapping step fails: failed to find analyzer type [null] or tokenizer for [pinyin_analyzer]

The Elasticsearch version is 1.6.0.
After git-cloning the project, I ran mvn package and then installed the plugin as follows:
plugin --install analysis-pinyin --url file:///#{path}/elasticsearch-analysis-pinyin/target/releases/elasticsearch-analysis-pinyin-1.3.0.zip

Then, following the steps in the README, the Create Mapping step keeps failing with:
{"error":"IndexCreationException[[medcl] failed to create index]; nested: ElasticsearchIllegalArgumentException[failed to find analyzer type [null] or tokenizer for [pinyin_analyzer]]; ","status":400}
The server-side error log:
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to find analyzer type [null] or tokenizer for [pinyin_analyzer]
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:383)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:336)

Am I missing some key step?

Pinyin search granularity is too fine

For example, when I search for ceshi (测试, etc.), results that have nothing to do with ceshi also come back. How should I change this mapping/ngram? I am using multi_match with the operator set to and.

Is there a way to get ik+pinyin-style tokenization?

The "nGram" in the pinyin analyzer tokenizes too finely, so basically everything matches; after removing it, only the complete pinyin or characters match. Is there a good solution?
This is my configuration, following the tutorial:
{
  "index": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin",
          "filter": ["word_delimiter", "nGram"]
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "first_letter": "prefix",
          "padding_char": ""
        }
      }
    }
  }
}
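One way to coarsen the granularity, sketched here as a suggestion rather than the plugin's documented recommendation, is to swap the `nGram` filter for an `edge_ngram` filter with `min_gram` of 2 or more, so only prefixes of each pinyin term are indexed instead of every substring:

```json
{
  "index": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin",
          "filter": ["word_delimiter", "pinyin_edge"]
        }
      },
      "filter": {
        "pinyin_edge": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "first_letter": "prefix",
          "padding_char": ""
        }
      }
    }
  }
}
```

With this, a query for `ce` still prefix-matches `ceshi`, but single stray letters no longer match everything.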

Is there a bug in padding_char?

Why is the separator also added between the first letters?
For example, "刘德华" => "liu de hua l d h". Why is it not "liu de hua ldh"?

if (this.padding_char.length() > 0) {
    if (stringBuilder.length() > 0) stringBuilder.append(this.padding_char);
    if (firstLetters.length() > 0) firstLetters.append(this.padding_char);
}

Looking at the code, should firstLetters really have padding_char appended? Or would it be more reasonable to configure the separators for stringBuilder and firstLetters independently?

Also, has polyphone (多音字) support been considered? pinyin4j itself supports polyphones...

thx...

Why do the returned results contain all the documents?

My example:

curl -XPOST http://localhost:9200/medcl/_close


curl -XPUT http://localhost:9200/medcl/_settings -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["standard","nGram"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "prefix",
                    "padding_char" : ""
                }
            }
        }
    }
}'

curl -XPOST http://localhost:9200/medcl/_open

curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
    "folks": {
        "properties": {
            "name": {
                "type": "string",
                "term_vector": "with_positions_offsets",
                "analyzer": "pinyin_analyzer"
            }
        }
    }
}'


curl -XPOST http://localhost:9200/medcl/folks/1 -d'{"name":"你好,我是abc公司"}'

curl -XPOST http://localhost:9200/medcl/folks/2 -d'{"name":"你好,我是abc"}'

curl http://localhost:9200/medcl/folks/_search?q=name:gongsi

But the search returns both document 1 and document 2.
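A plausible explanation: the `nGram` filter runs at both index and search time here, so `gongsi` is broken into single-letter grams that also occur in the pinyin of document 2. One sketch of a fix (field and analyzer names taken from the example above; treat this as an assumption, not documented plugin behavior) is to keep the ngram analyzer for indexing but use a plain analyzer at search time so the query terms stay whole:

```json
{
  "folks": {
    "properties": {
      "name": {
        "type": "string",
        "term_vector": "with_positions_offsets",
        "analyzer": "pinyin_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```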

How can Completion Suggest support Chinese characters, full pinyin, and first-letter abbreviations?

@medcl Hi. On es 2.2.1 I am building keyword search suggestions and want suggest to work with Chinese characters, full pinyin, and first-letter abbreviations, but nothing I tried works. Please help me check which settings are wrong. Note: an earlier thread mentions setting the field type to multi_field, but a suggest field cannot be a multi_field.
1. The index settings:
curl -X PUT 192.168.1.45:9200/musiczz/ -d '{
  "index": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "my_pinyin",
          "filter": [
            "simple_pinyin",
            "word_delimiter"
          ]
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "first_letter": "none",
          "padding_char": ""
        }
      },
      "filter": {
        "simple_pinyin": {
          "type": "pinyin",
          "first_letter": "only",
          "padding_char": ""
        }
      }
    }
  }
}'

2. The mapping:
curl -X PUT 192.168.1.45:9200/musiczz/song/_mapping -d '{
  "song": {
    "properties": {
      "name": { "type": "string" },
      "suggest": {
        "type": "completion",
        "analyzer": "pinyin_analyzer",
        "search_analyzer": "pinyin_analyzer",
        "payloads": true
      }
    }
  }
}'
3. Index some data:
curl -X PUT '192.168.1.45:9200/musiczz/song/4?refresh=true' -d '{
"name" : "刘德华",
"suggest" : {
"input": [ "刘德华"],
"output": "刘德华",
"payload" : { "artistId" : 2325 },
"weight" : 34
}
}'

4. Query suggest
1) This returns results:
curl -X POST '192.168.1.45:9200/musiczz/_suggest?pretty' -d '{
"song-suggest" : {
"text" : "liudehua",
"completion" : {
"field" : "suggest"
}
}
}'
The result:
{
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"song-suggest" : [ {
"text" : "liudehua",
"offset" : 0,
"length" : 8,
"options" : [ {
"text" : "刘德华",
"score" : 34.0,
"payload" : {
"artistId" : 2325
}
} ]
} ]
}
2) This also returns results:
curl -X POST '192.168.1.45:9200/musiczz/_suggest?pretty' -d '{
"song-suggest" : {
"text" : "刘德华",
"completion" : {
"field" : "suggest"
}
}
}'
3) First-letter abbreviations return no results:
curl -X POST '192.168.1.45:9200/musiczz/_suggest?pretty' -d '{
"song-suggest" : {
"text" : "ldh",
"completion" : {
"field" : "suggest"
}
}
}'

How should the index be set up so that first-letter abbreviations also return results?
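In newer versions of this plugin, the first-letter behavior is controlled by the tokenizer parameters documented above rather than `first_letter`/`padding_char`. A sketch (assuming such a version) of a tokenizer that indexes the full pinyin, the joined full pinyin, and the joined first letters, so that `刘德华` yields `liu`, `de`, `hua`, `liudehua`, and `ldh` as completion inputs:

```json
{
  "index": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": true,
          "keep_first_letter": true,
          "keep_separate_first_letter": false
        }
      }
    }
  }
}
```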

Incorrect search results

Experiments show that any input returns all of the records, regardless of the keyword. What is going on?

A question about searching

At the third step:

curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'
curl -XPOST http://localhost:9200/medcl/folks/andy2 -d'{"name":"李小明"}'

Then:

curl http://localhost:9200/medcl/folks/_search\?q\=name:%e5%88%98\&pretty\=1
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 2.8713017,
    "hits" : [ {
      "_index" : "medcl",
      "_type" : "folks",
      "_id" : "andy",
      "_score" : 2.8713017, "_source" : {"name":"刘德华"}
    }, {
      "_index" : "medcl",
      "_type" : "folks",
      "_id" : "andy2",
      "_score" : 0.89630747, "_source" : {"name":"李小明"}
    } ]
  }
}

Searching for "刘", why does "李小明" also match?

curl http://localhost:9200/medcl/folks/_search?q=name:%e5%88%98%e5%be%b7
curl http://localhost:9200/medcl/folks/_search?q=name:liu
curl http://localhost:9200/medcl/folks/_search?q=name:ldh
curl http://localhost:9200/medcl/folks/_search?q=name:dehua

The queries below behave the same way. What is going on here? Thanks.

A suggestion about pinyin search!

Say I want to search for 重庆酸菜鱼(长城店). Pinyin search usually uses a Chinese tokenizer followed by a pinyin filter. After ik_smart segmentation the result is `重庆酸菜鱼`, `长城店`. Searching with any of the keywords (chongqingsuancaiyu), (chong qing suan cai yu), (cqscy), (changchengdian), (chang cheng dian), (ccd), (重庆酸菜鱼), or (长城店) should find it!
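With the parameters documented above, this combination can be approximated by feeding ik_smart tokens through a `pinyin` token filter that keeps the full pinyin, the joined full pinyin, and the first letters. A sketch (the filter name is made up; whether the space-separated forms like `chong qing suan cai yu` match depends on how the query is analyzed):

```json
{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin_filter"]
        }
      },
      "filter": {
        "my_pinyin_filter": {
          "type": "pinyin",
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": true,
          "keep_first_letter": true,
          "keep_none_chinese": true
        }
      }
    }
  }
}
```

Each ik_smart token then produces per-character pinyin (`chong`, `qing`, ...), the joined form (`chongqingsuancaiyu`), and the abbreviation (`cqscy`).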

Polyphones and first-letter search ordering

Hi all.
For example: 银行 (yin xing) and 香桔 (xiang jie).
If the input is yinhang, the query results are completely wrong.
Having read the questions above, I will ignore that issue.
---------------------The following----------
First letters: 香桔市 (xjs) vs 锦绣世界 (jxsj). Here 锦绣世界 gets ranked ahead of 香桔市, because its term frequency has one extra j.

How can I search first letters using the complete ordered sequence (xjs)? I could not find a way.

Please help.

Following your configuration for ik+pinyin analysis, pinyin search returns no results

My configuration:
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1,
      "analysis": {
        "analyzer": {
          "custom_pinyin_analyzer": {
            "tokenizer": "ik_max_word",
            "filter": ["full_pinyin_no_space", "full_pinyin_with_space", "first_letter_pinyin"]
          }
        },
        "filter": {
          "full_pinyin_no_space": { "type": "pinyin", "first_letter": "none", "padding_char": "" },
          "full_pinyin_with_space": { "type": "pinyin", "first_letter": "none", "padding_char": " " },
          "first_letter_pinyin": { "type": "pinyin", "first_letter": "only", "padding_char": "" }
        }
      }
    }
  },
  "mappings": {
    "books": {
      "properties": {
        "title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "custom_pinyin_analyzer",
          "search_analyzer": "custom_pinyin_analyzer",
          "boost": 10
        }
      }
    }
  }
}

PUT library/books/1
{
"title": "联想召回笔记本电源线"
}
Searching the Chinese text directly works, but pinyin does not:
GET library/books/_search
{
  "size": 10,
  "from": 0,
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "联想",
      "fields": ["title"],
      "tie_breaker": 0.3,
      "minimum_should_match": "10%"
    }
  }
}

about elasticsearch.yml pinyin analyzer config

analysis configuration (elasticsearch.yml)

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      mmseg:
        alias: [mmseg_analyzer]
        type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider
      pinyin:
        alias: [pinyin_analyzer]
        filter: [standard]
        type: org.elasticsearch.index.analysis.PinyinAnalyzerProvider
        tokenizer: my_pinyin
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: [none]
        padding_char: [ ]

index.analysis.analyzer.default.type : "mmseg"

It does not run.

Please give me some guidance.

Pinyin somehow does not come out; here are my index and mapping

They are exactly the same as the examples, but queries return nothing... Could someone please tell me what is wrong? Thank you very much.

'settings' => [
    'goods'=> [
       "analysis" => [
           "analyzer" => [
           "pinyin_analyzer" => [
               "tokenizer" => "my_pinyin",
               "filter" => ["standard"]
            ],
        ],
       ],
       "tokenizer" => [
           "my_pinyin" => [
           "type" => "pinyin",
           "first_letter" => "none",
           "padding_char" => " "
           ],
        ],
    ],
],
'mappings' => [
    'goods' => [
        "properties" => [
        "name" => [
            "type" => "multi_field",
            "fields" => [
            'name'=>[
                "type" => "string",
                "store" => "no",
                "term_vector" => "with_positions_offsets",
                "analyzer" => "pinyin_analyzer",
                "boost" => 10
            ],
            "primitive" => [
                "type" => "string",
                "store" => "yes",
                "analyzer" => "keyword"
            ],
            ],
           ]
        ]
    ]
]

Why is there no v1.7.5 release?

I wanted to try this pinyin plugin. My ES version is 2.3.5, and the docs say it requires the v1.7.5 release, but that version is the only one missing. What happened?
