sing1ee / elasticsearch-jieba-plugin
jieba analysis plugin for elasticsearch 7.0.0, 6.4.0, 6.0.0, 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2, 5.1.2, 5.1.1
License: MIT License
I fed Traditional Chinese documents into this plugin for analysis. It runs, but the results look a bit off. Does this plugin currently support Traditional Chinese?
test analyzer:
GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=**的伟大时代来临了,欢迎参观北京大学PKU
For this step I tested with the Postman tool: GET does not work; you have to use POST.
1. Download and build
gradle pz
2. Copy files
Copy the build\distributions\elasticsearch-jieba-plugin-6.0.0.zip produced in step 1 into elasticsearch-6.0.0\plugins, unzip it, then delete the zip file.
3. Create the dictionary files
Under elasticsearch-6.0.0\config, create stopwords/stopwords.txt and synonyms/synonyms.txt.
4. Start ES
start elasticsearch
5. Test analyzer
In Postman: GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=测试结巴分词看看结果出乎意料
Error response:
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "request body or source parameter is required"
      }
    ],
    "type": "parse_exception",
    "reason": "request body or source parameter is required"
  },
  "status": 400
}
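This error is expected in newer Elasticsearch versions: the query-string form of `_analyze` (with `analyzer` and `text` as URL parameters) was deprecated in 5.x and removed in 6.0, so the parameters must go in a JSON request body (Postman drops bodies on GET, hence the need for POST). A minimal sketch, reusing the index and analyzer names from the steps above:

```
POST http://localhost:9200/jieba_index/_analyze
{
  "analyzer": "my_ana",
  "text": "测试结巴分词看看结果出乎意料"
}
```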
Could you suggest an approach for hot-reloading the dictionary, i.e. updating it dynamically at runtime?
Custom word: 学区房
Before adding 学区房 to user.txt, a search hits the two words 学区 and 房.
After adding `学区房 10` (or 100/10000) to user.txt, searches no longer hit 学区房, and not even 学区 or 房.
Yet when I test the analyzer, it does produce the token 学区房.
Where could the problem be?
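For reference, a jieba user dictionary is one entry per line: the word, an optional frequency, and an optional POS tag. How this plugin locates and loads user.txt is an assumption based on the jieba convention; a sketch of an entry that strongly favors keeping 学区房 whole:

```
学区房 10000 n
```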
Elasticsearch 6.6.2, using the jieba 6.4.1 plugin.
When creating the index, I define a custom analyzer as follows:
"jieba_syno_search": {
  "type": "custom",
  "tokenizer": "jieba_search",
  "filter": [
    "jieba_stop",
    "my_synonym_filter"
  ]
},
After sending the request, I get the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: 美国和伊拉克 analyzed to a token (伊拉克) with position increment != 1 (got: 2)"
      }
    }
  },
  "status": 400
}
Any pointers would be appreciated, thanks!
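The failure comes from how the synonym filter parses its rules: in Elasticsearch 6.x it analyzes each rule with the tokenizer and the token filters that precede it in the chain, and here `jieba_stop` removes 和 from 美国和伊拉克, leaving a position gap that the synonym parser rejects. One common workaround, a sketch rather than a documented fix for this plugin, is to move the synonym filter ahead of the stop filter so rules are parsed without gaps:

```
"jieba_syno_search": {
  "type": "custom",
  "tokenizer": "jieba_search",
  "filter": [
    "my_synonym_filter",
    "jieba_stop"
  ]
}
```

Alternatively, Elasticsearch 6.4+ supports `lenient: true` on the synonym filter to skip invalid rules instead of failing, or the offending rules can be rewritten so they survive the stop filter.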
Environment: es 5.3.0, jieba 5.3.0
Followed the README step by step up to
========= OK ==========
test analyzer:
GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=**的伟大时代来临了,欢迎参观北京大学PKU
========================
================ Adding synonyms fails ========
Pay attention to *jieba_synonym; same as jieba_stop, the format of synonyms.txt is:
北京大学,北大,pku
清华大学,清华,Tsinghua University
===> At this step, after I edit the corresponding synonyms.txt and restart es, it throws an error. Log below (an empty file does not trigger the error):
[2018-08-10T15:48:55,535][WARN ][o.e.g.Gateway ] [7ixgH36] recovering index [welink_index/7MYv9V1JSs-DAF8CBEKULA] failed - recovering as closed
java.lang.IllegalArgumentException: failed to build synonyms
at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.(SynonymTokenFilterFactory.java:97) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.lambda$buildTokenFilterFactories$1(AnalysisRegistry.java:169) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry$1.get(AnalysisRegistry.java:265) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:342) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:171) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:155) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.IndexService.(IndexService.java:145) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:363) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:427) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:460) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.gateway.Gateway.performStateRecovery(Gateway.java:135) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.gateway.GatewayService$1.doRun(GatewayService.java:229) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:613) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.0.jar:5.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281) ~[?:1.8.0_171]
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339) ~[?:?]
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:?]
at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_171]
at java.io.BufferedReader.read1(BufferedReader.java:210) ~[?:1.8.0_171]
at java.io.BufferedReader.read(BufferedReader.java:286) ~[?:1.8.0_171]
at java.io.BufferedReader.fill(BufferedReader.java:161) ~[?:1.8.0_171]
at java.io.BufferedReader.readLine(BufferedReader.java:324) ~[?:1.8.0_171]
at java.io.LineNumberReader.readLine(LineNumberReader.java:201) ~[?:1.8.0_171]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:82) ~[lucene-analyzers-common-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:44:09]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:44:09]
at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.(SynonymTokenFilterFactory.java:92) ~[elasticsearch-5.3.0.jar:5.3.0]
... 16 more
====================================
Is synonyms.txt supposed to contain entries in the following format?
北京大学,北大,pku
清华大学,清华,Tsinghua University
========================
Pay attention to *jieba_synonym; same as jieba_stop, the format of synonyms.txt is:
北京大学,北大,pku
清华大学,清华,Tsinghua University
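The root cause in the stack trace is `java.nio.charset.MalformedInputException`: Elasticsearch reads `synonyms_path` as UTF-8, so a synonyms file saved in a local encoding such as GBK fails as soon as it contains Chinese text (an empty file has no bytes to decode, which is why it appears to work). A sketch of a fix, assuming the file was saved as GBK:

```shell
# Simulate a synonyms file saved in GBK (as a Windows editor might do).
printf '北京大学,北大,pku\n' | iconv -f UTF-8 -t GBK > synonyms.gbk.txt

# Re-encode to UTF-8, which is what Elasticsearch expects for synonyms_path.
iconv -f GBK -t UTF-8 synonyms.gbk.txt > synonyms.txt
cat synonyms.txt
```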
I don't know Java very well. I changed the version in build.gradle to 6.5.4 and rebuilt the plugin, but it still reports: org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Plugin [analysis-jieba] was built for Elasticsearch version 6.4.0 but version 6.5.4 is running.
How should the configuration be changed?
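The version check reads `plugin-descriptor.properties` inside the plugin zip: its `elasticsearch.version` must exactly match the running node. Changing only `version` in build.gradle updates the plugin's own version number; whether this repo's build script also derives the descriptor's target ES version from it is an assumption, so check the generated descriptor after building. For ES 6.5.4 the unpacked plugin should end up containing:

```
# plugins/analysis-jieba/plugin-descriptor.properties
# (descriptor keys follow the standard Elasticsearch plugin format; the path is illustrative)
elasticsearch.version=6.5.4
```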
As the title says: IK relies entirely on its segmentation dictionary, which is becoming hard to sustain.
Testing shows that jieba-7.4.2 and jieba-6.4.1 produce different tokenization results, mainly different token positions.
If I upgrade from es 6.4.1 to es 7.4.2, does this between-version difference in jieba require reindexing the data?
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [elasticsearch-jieba-plugin-5.3.0]. Was the plugin built before 2.0?
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:58) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.3.0.jar:5.3.0]
Caused by: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [elasticsearch-jieba-plugin-5.3.0]. Was the plugin built before 2.0?
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:295) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.plugins.PluginsService.(PluginsService.java:131) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.(Node.java:302) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.(Node.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.0.jar:5.3.0]
... 6 more
Caused by: java.nio.file.NoSuchFileException: E:\elasticsearch-5.3.0\elasticsearch-5.3.0\plugins\elasticsearch-jieba-plugin-5.3.0\plugin-descriptor.properties
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:79) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[?:?]
at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(WindowsFileSystemProvider.java:230) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_102]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_102]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_102]
at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_102]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:292) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.plugins.PluginsService.(PluginsService.java:131) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.(Node.java:302) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.(Node.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.0.jar:5.3.0]
... 6 more
How can stopwords be added to config/stopwords.txt without restarting ES?
Does the jieba plugin currently support hot-reloading dictionaries?
If so, is there documentation on how to configure it?
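The plugin itself has no documented hot reload, but for file-based `stop` and `synonym` filters (`stopwords_path` / `synonyms_path`) Elasticsearch re-reads the files whenever the index's analyzers are rebuilt. A common workaround, sketched below, is to close and reopen the index after editing the files; note this briefly takes the index offline:

```
POST http://localhost:9200/jieba_index/_close
POST http://localhost:9200/jieba_index/_open
```

Changes to the tokenizer's own dictionary files (e.g. user.txt) are loaded by the plugin itself and likely still require a node restart.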
Due to company requirements we can't upgrade our Elasticsearch version for now... does this work with version 5.0?
How can cut_all (full mode) be used for segmentation with jieba in elasticsearch?
I added the string 16:9 to the dict, but it still gets split. Is there a setting to keep the ':' character from triggering a split?
The huaban jieba-analysis docs mention token filters for full-width-to-half-width conversion, lowercasing, and character-level tokenization, but I couldn't find how to use them or what these filters are named. Some pointers would be appreciated.
I've been using plugin version 6.4.0 and plan to upgrade ES to 7.6.0 soon; looking forward to a compatible release!
@sing1ee Hi, are there any plans to update soon? Thanks.
Would there be any problem if I port the external-dictionary code from the latest version over myself?
Doesn't work with elasticsearch 6.4.0 (and probably all versions > 6.0.0)
Plugin version 6.0.1
[2018-09-13T15:11:28,885][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Unknown properties in plugin descriptor: [jvm, site, isolated]
type: "parse_exception",
reason: "request body or source parameter is required"
PUT jieba_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "jieba_analyzer": {
          "tokenizer": "jieba_index"
        }
      }
    }
  },
  "mappings": {
    "fulltext": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "jieba_analyzer"
        }
      }
    }
  }
}
POST jieba_index/fulltext
{"content":"李海林"}
ES returns:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=3,lastStartOffset=1 for field 'content'"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=3,lastStartOffset=1 for field 'content'"
  },
  "status": 400
}
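The rejection comes from index-time offset checks in Lucene 7+ (ES 6.x and later): an index-mode tokenizer that emits overlapping tokens for 李海林 can produce a token whose startOffset goes backwards relative to the previous one. Until the tokenizer is fixed, one workaround some users apply, a sketch rather than a confirmed fix for this plugin, is to index the field with the non-overlapping search-mode tokenizer:

```
PUT jieba_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "jieba_analyzer": {
          "tokenizer": "jieba_search"
        }
      }
    }
  }
}
```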
Following the README steps
up to
test analyzer:
GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=**的伟大时代来临了,欢迎参观北京大学PKU
===== at this step the result matches the README, and the synonyms are recognized =====
But at the next step:
search
POST http://localhost:9200/jieba_index/fulltext/_search
Request body:
{
  "query": { "match": { "content": "pku" } },
  "highlight": {
    "pre_tags": ["", ""],
    "post_tags": ["", ""],
    "fields": {
      "content": {}
    }
  }
}
The result shows no synonym effect: 北京大学 is not marked up, only pku. Why?
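One thing to check first: in the request above, `pre_tags` and `post_tags` are empty strings (possibly stripped when the issue was rendered), so even a successful highlight inserts nothing visible. A sketch with visible tags:

```
{
  "query": { "match": { "content": "pku" } },
  "highlight": {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "fields": {
      "content": {}
    }
  }
}
```

If 北京大学 is still not marked, check that the `my_ana` analyzer (with its synonym filter) is applied to the field at both index and search time.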
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Could not load plugin descriptor for plugin directory [plugin.xml]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-7.0.0.jar:7.0.0]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.0.0.jar:7.0.0]
Caused by: java.lang.IllegalStateException: Could not load plugin descriptor for plugin directory [plugin.xml]
at org.elasticsearch.plugins.PluginsService.readPluginBundle(PluginsService.java:401) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.findBundles(PluginsService.java:386) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:379) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.(PluginsService.java:151) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.(Node.java:306) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.(Node.java:251) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap$5.(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:325) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.0.0.jar:7.0.0]
... 6 more
Caused by: java.nio.file.FileSystemException: /home/work/fnrd/elastic-search/elasticsearch-7.0.0/plugins/plugin.xml/plugin-descriptor.properties: 不是目录
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_181]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_181]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_181]
at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_181]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:156) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.readPluginBundle(PluginsService.java:398) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.findBundles(PluginsService.java:386) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:379) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.(PluginsService.java:151) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.(Node.java:306) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.(Node.java:251) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap$5.(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:325) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.0.0.jar:7.0.0]
After installing the plugin as described, it doesn't appear to be working.
curl -X GET "localhost:9200/_cat/plugins?v&s=component&h=name,component,version,description"
doesn't show the plugin either.
Creating an index returns:
{
  "error": "IndexCreationException[[jieba_index] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [my_ana] failed to find tokenizer under name [jieba_index]]; ",
  "status": 400
}
Create the index
DELETE /jieba_test
PUT /jieba_test
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}
Error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: 北京大学 analyzed to a token (北京大学) with position increment != 1 (got: 0)"
      }
    }
  },
  "status": 400
}
The synonyms file is as follows:
北京大学,北大,pku
清华大学,清华,Tsinghua University
I've been busy, but recently found time to go through the plugin's open issues.
Feel free to raise any requirements; I'll prioritize development according to importance.
Or is it still only updated to version 6.0?
curl -XGET "http://172.3.0.89:9200/_analyze" -H 'Content-Type: application/json' -d'{"analyzer": "jieba_index","text": "中华人民共和国成立"}'
{"tokens":[{"token":"中华人民共和国成立","start_offset":0,"end_offset":9,"type":"word","position":0}]}
Segmenting with the official jieba in accurate mode yields
[中华人民共和国, 成立]
Both use the default dictionary. What is the reason for the difference?
In branch 5.2
With elasticsearch 6.7.2 and the jieba 6.4.1 plugin I get the error below. How can I fix it?
Hi,
After running gradle pz, it gets stuck at: Download http://maven.aliyun.com/nexus/content/groups/public/com/huaban/jieba-analysis/1.0.2/jieba-analysis-1.0.2.jar
startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=93,endOffset=96,lastStartOffset=94
Also: the synonyms/synonyms.txt file referenced in the example below is not in the packaged distribution. Was it left out of the upload?
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}
{'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'failed to build synonyms'}], 'type': 'illegal_argument_exception', 'reason': 'failed to build synonyms', 'caused_by': {'type': 'parse_exception', 'reason': 'Invalid synonym rule at line 1', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'term: 北京大学 analyzed to a token (北京大学) with position increment != 1 (got: 0)'}}}, 'status': 400}
Hi, when running gradle gz I hit the following error.
How can it be resolved?
FAILURE: Build failed with an exception.
* What went wrong:
Task 'gz' not found in root project 'elasticsearch'.
* Try:
Run gradle tasks to get a list of available tasks. Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
Total time: 0.837 secs
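For what it's worth, the build steps earlier in this document use the `pz` task, so `gz` looks like a typo; presumably the intended command is:

```
gradle pz
```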
reason=startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=74,endOffset=78,lastStartOffset=76 for field 'content'
Following the README documentation:
1: Download es v6.0.0, jieba v6.0.1
2: Build jieba and place it under the es plugins directory
3: Copy the stopwords file into the specified directory, and create my own synonyms.txt (I couldn't find a synonyms file bundled with jieba) in the specified directory
4: create index succeeds: PUT http://localhost:9200/jieba_index 。。。
==========
5: test analyzer: this step fails
Running the following:
GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=**的伟大时代来临了,欢迎参观北京大学PKU
It returns the result below. What is causing this?
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "request body or source parameter is required"
      }
    ],
    "type": "parse_exception",
    "reason": "request body or source parameter is required"
  },
  "status": 400
}
If possible, could I add you on QQ to discuss? Thanks.
GET http://localhost:9200/jieba_index
Response:
{
  "jieba_index": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "jieba_index",
        "creation_date": "1532942171024",
        "analysis": {
          "filter": {
            "jieba_synonym": {
              "type": "synonym",
              "synonyms_path": "synonyms/synonyms.txt"
            },
            "jieba_stop": {
              "type": "stop",
              "stopwords_path": "stopwords/stopwords.txt"
            }
          },
          "analyzer": {
            "my_ana": {
              "filter": [
                "lowercase",
                "jieba_stop",
                "jieba_synonym"
              ],
              "tokenizer": "jieba_index"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "1pDMeWYJQDmRAlD4P7KfJA",
        "version": {
          "created": "6000099"
        }
      }
    }
  }
}
], spins? [no], types [ext4]
[2017-02-06T09:57:31,526][INFO ][o.e.e.NodeEnvironment ] [o-IYnu8] heap size [123.7mb], compressed ordinary object pointers [true]
[2017-02-06T09:57:31,672][INFO ][o.e.n.Node ] node name [o-IYnu8] derived from node ID [o-IYnu8pRMe6qi1j9t2bXQ]; set [node.name] to override
[2017-02-06T09:57:31,675][INFO ][o.e.n.Node ] version[5.1.2], pid[6585], build[c8c4c16/2017-01-11T20:18:39.146Z], OS[Linux/2.6.32-220.4.1.el6.centos.plus.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_91/25.91-b14]
[2017-02-06T09:57:32,980][ERROR][o.e.b.Bootstrap ] Exception
java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [jieba]. Was the plugin built before 2.0?
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:295) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.node.Node.<init>(Node.java:294) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.node.Node.<init>(Node.java:229) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:306) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.1.2.jar:5.1.2]
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/plugins/jieba/plugin-descriptor.properties
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_91]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_91]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_91]
at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_91]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:292) ~[elasticsearch-5.1.2.jar:5.1.2]
... 13 more
[2017-02-06T09:57:32,988][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [jieba]. Was the plugin built before 2.0?
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) ~[elasticsearch-5.1.2.jar:5.1.2]
Caused by: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [jieba]. Was the plugin built before 2.0?
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:295) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.node.Node.<init>(Node.java:294) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.node.Node.<init>(Node.java:229) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:306) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.1.2.jar:5.1.2]
... 6 more
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/plugins/jieba/plugin-descriptor.properties
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_91]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_91]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_91]
at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_91]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:292) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.node.Node.<init>(Node.java:294) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.node.Node.<init>(Node.java:229) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:306) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.1.2.jar:5.1.2]
... 6 more
Which plugin version works with elasticsearch 6.3.0, or is it not supported?
Could a TokenFilter also be provided for combined segmentation?
Sentence: 为什么通知我可以成功申请,还不批给我
Segmentation with jieba
Plugin [jieba] was built for Elasticsearch version 7.0.0 but version 7.3.0 is running
Does this plugin work on ES 5.6.5?
Thanks.
Requirement:
Extract the keywords that appear in a custom dictionary from a piece of text.
I have sketched a few candidate approaches:
1. Feed in the text, get the jieba segmentation result, and write code that compares the tokens against the custom dictionary, outputting the words present in both the text and the dictionary.
2. Feed in the text and write code that checks, entry by entry, whether each dictionary word occurs in the text, outputting the words present in both.
3. Use jieba's POS tagging: tag every custom word in the custom dictionary with one special POS, then use jieba's POS-based keyword extraction to pull out words of that POS from the input text.
4. Use jieba's custom-dictionary feature so that segmentation strictly follows the specified dictionary: feed in the text, load the dictionary, and output the segmentation result.
5. Use jieba's term weighting: boost the weight of words from the custom dictionary, lower everything else, and take the top-weighted words from the output.
Questions:
Which approach can meet the requirement?
Does jieba have a built-in way to extract keywords directly from a custom dictionary?
I couldn't find any material that directly covers this kind of requirement, hence the question; please share your advice!
If you know this area, an outline of the approach would help; a tutorial link or some sample code would be even better. Many thanks!
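Approaches 1 and 2 above both boil down to an intersection between the segmented text and the dictionary. A minimal sketch in plain Python, with a toy token list standing in for real jieba output (the tokens and dictionary here are illustrative assumptions):

```python
def extract_dictionary_hits(tokens, user_dict):
    """Return dictionary words that occur among the segmented tokens,
    preserving the order in which they first appear in the text."""
    vocab = set(user_dict)
    seen = set()
    hits = []
    for tok in tokens:
        if tok in vocab and tok not in seen:
            seen.add(tok)
            hits.append(tok)
    return hits

# Toy example: tokens as a segmenter might produce them (assumption).
tokens = ["为什么", "通知", "我", "可以", "成功", "申请"]
user_dict = ["申请", "通知", "学区房"]
print(extract_dictionary_hits(tokens, user_dict))  # ['通知', '申请']
```

This keeps the matching logic independent of the segmenter, so the same function works whether the tokens come from jieba's accurate mode, full mode, or the Elasticsearch `_analyze` output.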
http://localhost:9200/jieba_index/fulltext/_search
{
  "query": { "match": { "content": "大学" } },
  "highlight": {
    "pre_tags": ["", ""],
    "post_tags": ["", ""],
    "fields": {
      "content": {}
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.68324494,
    "hits": [
      {
        "_index": "jieba_index",
        "_type": "fulltext",
        "_id": "2",
        "_score": 0.68324494,
        "_source": {
          "content": "**的伟大时代来临了,欢迎参观北京大学PKU"
        },
        "highlight": {
          "content": [
            "**的伟大时代来临了,欢迎参观北京大学PKU"
          ]
        }
      }
    ]
  }
}
What I wanted was:
"**的伟大时代来临了,欢迎参观北京大学PKU"