Comments (13)

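medcl commented on June 2, 2024

Remove the nGram filter from the analyzer configuration and try again.

On August 12, 2015, at 8:50 AM, starckgates wrote:

Hi medcl, I'm back looking at the pinyin plugin again and have a question. How should I handle forward (prefix-style) matching on pinyin? For example, suppose I want to search “我们来自**” and “**是个大国”: when I type ‘zg’, the former is what comes up. Do you see what I mean? I want entries that begin with the match to rank first, or something like LIKE 'zg%' in SQL.

Also, my analyzer settings are as follows: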

################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: my_pinyin
        filter: [standard, nGram]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "

The mapping is set up as follows:

POST /tag/keywords/_mapping
{
  "keywords": {
    "properties": {
      "kwname": {
        "type": "multi_field",
        "fields": {
          "kwname": {
            "type": "string",
            "store": "no",
            "term_vector": "with_positions_offsets",
            "analyzer": "pinyin_analyzer",
            "boost": 10
          },
          "primitive": {
            "type": "string",
            "store": "yes",
            "analyzer": "keyword"
          }
        }
      }
    }
  }
}

http://localhost:9200/tag/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=pinyin_analyzer

The result is:

{"tokens":[{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ld","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"dh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hl","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"li","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"i","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"iu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ud","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"e","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"eh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"a","start_offset":0,"end_offset":3,"type":"word","position":1}]}

This is too fine-grained. How can I make the tokens coarser?
For example, I would like just:
{"token":"liu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ldh","start_offset":0,"end_offset":3,"type":"word","position":1},

How should I configure this?

Thanks a lot!

from elasticsearch-analysis-pinyin.
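
The token list in the question above looks like the output of a character n-gram filter, which would explain why removing nGram is the first suggestion. The following is a minimal Python sketch of that behaviour, not the plugin's or Lucene's actual code. It assumes gram sizes of 1 and 2 (believed to be the nGram filter's defaults) and assumes the filter receives the unbroken string "ldhliudehua", which is inferred from the posted tokens rather than stated in the thread. `char_ngrams` is a hypothetical helper.

```python
def char_ngrams(text, min_gram=1, max_gram=2):
    """Return every substring of length min_gram..max_gram, ordered by start position."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            # Skip grams that would run past the end of the text.
            if start + size <= len(text):
                tokens.append(text[start:start + size])
    return tokens


# Assumed input: first letters "ldh" followed by "liu", "de", "hua", with no separators.
tokens = char_ngrams("ldhliudehua")
print(tokens)
print(len(tokens))  # 11 single characters + 10 pairs = 21
```

Under those assumptions the sketch yields the same 21 tokens, in the same order, as the _analyze output quoted in the question.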

starckgates commented on June 2, 2024

With that change the output becomes:

{"tokens":[{"token":"ldh liu de hua ","start_offset":0,"end_offset":3,"type":"word","position":1}]}

so the pinyin syllables are no longer indexed as separate tokens.

starckgates commented on June 2, 2024

Now the search returns no results:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

medcl commented on June 2, 2024

Try a prefix query. Isn't that exactly what you need?

On August 12, 2015, at 9:21 AM, starckgates wrote that the search now returned no results.
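
A prefix query of the kind suggested here might look like the sketch below. It only builds the request body and imitates the matching locally; nothing is sent to a server. The field name `kwname` comes from the mapping in the question and the indexed term is the single token reported earlier in the thread, while `build_prefix_query` and `term_matches_prefix` are hypothetical helpers introduced for illustration.

```python
import json


def build_prefix_query(field, prefix):
    """Build a search body that matches indexed terms starting with `prefix`."""
    return {"query": {"prefix": {field: prefix}}}


def term_matches_prefix(term, prefix):
    """Local stand-in for prefix matching: compare against the start of the indexed term."""
    return term.startswith(prefix)


body = build_prefix_query("kwname", "ld")
print(json.dumps(body))

# Token reported in the thread once nGram was removed.
indexed_term = "ldh liu de hua "
print(term_matches_prefix(indexed_term, "ld"))  # leading first letters match
print(term_matches_prefix(indexed_term, "dh"))  # letters in the middle do not
```

Because the whole value is indexed as one term that begins with the first letters, only input that matches from the start of that term is found, which is the LIKE 'zg%' behaviour asked for in the question.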

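starckgates commented on June 2, 2024

Hmm, that looks complicated. I'll give it a try, thanks!

Oh, one more thing: I see that the pinyin plugin's examples configure the analyzer through index settings at index-creation time rather than in the configuration file. Can it be written directly in the configuration file instead, with the analysis section defined in the .yml file the way it is done for the ik analyzer?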

medcl commented on June 2, 2024

Did you remove the standard filter as well?

On August 12, 2015, at 9:19 AM, starckgates wrote that the analyzer output had become the single token "ldh liu de hua " rather than one token per pinyin syllable.

starckgates commented on June 2, 2024

No, I kept the standard filter.
I put the configuration in the .yml file, as follows:

################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: my_pinyin
        filter: [standard]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "

medcl commented on June 2, 2024

The pinyin analyzer's type is not set to custom.

On August 12, 2015, at 10:00 AM, starckgates wrote that the standard filter was still in place and posted the .yml configuration shown in the previous comment.

starckgates commented on June 2, 2024

################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        type: custom
        tokenizer: my_pinyin
        filter: [standard]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "

Like this?

medcl commented on June 2, 2024

Yes. Mind the formatting.
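
Since YAML nesting depends entirely on indentation, it can help to see the same definition with explicit braces. The sketch below restates the final configuration from the thread as a Python dict and prints it as JSON. The nesting is an assumption based on how analysis settings are usually laid out (analyzer and tokenizer as siblings under analysis); the values are copied from the comment above, and whether this exact body suits a given Elasticsearch version is not confirmed in the thread.

```python
import json

# Assumed nesting of the final .yml configuration; values copied from the thread.
settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "pinyin_analyzer": {
                    "type": "custom",          # the setting that was missing
                    "tokenizer": "my_pinyin",
                    "filter": ["standard"],
                }
            },
            "tokenizer": {
                "my_pinyin": {
                    "type": "pinyin",
                    "first_letter": "prefix",
                    "padding_char": " ",
                }
            },
        }
    }
}

# Printing with indentation makes the parent/child structure explicit.
print(json.dumps(settings, indent=2))
```

In the .yml form, each level of braces here corresponds to one extra level of indentation.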

starckgates commented on June 2, 2024

Right, it seems that any code pasted in here gets its alignment flattened...
I'll give it a try.

starckgates commented on June 2, 2024

It works now. Thank you!

medcl commented on June 2, 2024

cool~
