smartschat / cort
A toolkit for coreference resolution and error analysis.
License: MIT
Hi! I get an error from CoreNLP when trying to predict coreference on raw text using cort-predict-raw with the standard parameters described in the manual. This is what I get:
2015-11-05 10:37:23,639 INFO Loading model.
2015-11-05 10:37:55,518 INFO Reading in and preprocessing data.
2015-11-05 10:37:55,519 INFO Starting java subprocess, and waiting for signal it's ready, with command: exec java -Xmx4g -XX:ParallelGCThreads=1 -cp '/Library/Python/2.7/site-packages/stanford_corenlp_pywrapper/lib/:/Users/yuliagrishina/Documents/Software/CoreNLP' corenlp.SocketServer --outpipe /tmp/corenlp_pywrap_pipe_pypid=2721_time=1446716275.52 --configfile /Library/Python/2.7/site-packages/cort/config_files/corenlp.ini
INFO:CoreNLP_JavaServer: Using CoreNLP configuration file: /Library/Python/2.7/site-packages/cort/config_files/corenlp.ini
Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/pipeline/StanfordCoreNLP
at corenlp.JsonPipeline.initializeCorenlpPipeline(JsonPipeline.java:206)
at corenlp.SocketServer.main(SocketServer.java:102)
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
Do you have any idea why this happens?
I was using the model-pair-train.obj model provided in your repository. On running the command provided for coreference resolution on raw data, I got the following error.
Traceback (most recent call last):
  File "/usr/local/bin/cort-predict-raw", line 132, in <module>
    testing_corpus = p.run_on_docs("corpus", args.input_filename)
  File "/usr/local/lib/python3.4/dist-packages/cort/preprocessing/pipeline.py", line 38, in run_on_docs
    codecs.open(doc, "r", "utf-8")
  File "/usr/local/lib/python3.4/dist-packages/cort/preprocessing/pipeline.py", line 82, in run_on_doc
    pdeprel=None
TypeError: __new__() missing 1 required positional argument: 'extra'
This got resolved when I added the line extra = None after line 82 in cort/preprocessing/pipeline.py.
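The failure can be reproduced with the standard library alone. This is a hypothetical reconstruction (the field names are invented; the real token type in pipeline.py may differ): a namedtuple that grew an extra field rejects older call sites that omit it.

```python
from collections import namedtuple

# Hypothetical stand-in for the token tuple: it gained an 'extra' field,
# but the call site in run_on_doc does not pass one.
Token = namedtuple("Token", ["form", "pdeprel", "extra"])

msg = ""
try:
    Token(form="word", pdeprel=None)  # 'extra' missing
except TypeError as err:
    msg = str(err)
print(msg)  # mentions the missing positional argument 'extra'

# Supplying extra=None, as in the workaround above, satisfies the signature:
token = Token(form="word", pdeprel=None, extra=None)
assert token.extra is None
```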
I find this behaviour counter-intuitive: if you read a corpus (using Corpus.from_file) and write it out right away (using write_to_file), all set IDs are lost, i.e. the last column contains only minus signs.
I attached a test script and a sample document.
Hello,
I encountered a few problems while trying to train a model on the gold-standard version of the CoNLL-2012 training set (*_gold_conll).
The first issue occurs during the conversion of certain trees, when some tree nodes are deleted but accessed later:
File "$HOME/.local/bin/cort-train", line 132, in <module>
"r", "utf-8"))
File "$HOME/.local/lib/python2.7/site-packages/cort/core/corpora.py", line 79, in from_file
document_as_strings]))
File "$HOME/.local/lib/python2.7/site-packages/cort/core/corpora.py", line 14, in from_string
return documents.CoNLLDocument(string)
File "$HOME/.local/lib/python2.7/site-packages/cort/core/documents.py", line 401, in __init__
[parse.replace("NOPARSE", "S") for parse in parses]#, include_erased=True
File "$HOME/.local/lib/python2.7/site-packages/StanfordDependencies/StanfordDependencies.py", line 116, in convert_trees
for ptb_tree in ptb_trees)
File "$HOME/.local/lib/python2.7/site-packages/StanfordDependencies/StanfordDependencies.py", line 116, in <genexpr>
for ptb_tree in ptb_trees)
File "$HOME/.local/lib/python2.7/site-packages/StanfordDependencies/JPypeBackend.py", line 141, in convert_tree
sentence.renumber()
File "$HOME/.local/lib/python2.7/site-packages/StanfordDependencies/CoNLL.py", line 111, in renumber
for token in self]
KeyError: 18
This happens for several sentences in the training data set (e.g., document bn/cnn/04/cnn_0432, sentence on lines 272-296). One way to avoid the exception is to set include_erased=True.
The second issue is caused by one sentence in the training set (document mz/sinorama/10/ectb_1005, lines 980-1012):
Traceback (most recent call last):
File "$HOME/.local/bin/cort-train", line 132, in <module>
"r", "utf-8"))
File "$HOME/.local/lib/python2.7/site-packages/cort/core/corpora.py", line 79, in from_file
document_as_strings]))
File "$HOME/.local/lib/python2.7/site-packages/cort/core/corpora.py", line 14, in from_string
return documents.CoNLLDocument(string)
File "$HOME/.local/lib/python2.7/site-packages/cort/core/documents.py", line 414, in __init__
super(CoNLLDocument, self).__init__(identifier, sentences, coref)
File "$HOME/.local/lib/python2.7/site-packages/cort/core/documents.py", line 97, in __init__
self.annotated_mentions = self.__get_annotated_mentions()
File "$HOME/.local/lib/python2.7/site-packages/cort/core/documents.py", line 111, in __get_annotated_mentions
span, self, first_in_gold_entity=set_id not in seen
File "$HOME/.local/lib/python2.7/site-packages/cort/core/mentions.py", line 174, in from_document
mention_property_computer.compute_gender(attributes)
File "$HOME/.local/lib/python2.7/site-packages/cort/core/mention_property_computer.py", line 89, in compute_gender
if __wordnet_lookup_gender(" ".join(attributes["head"])):
TypeError: sequence item 0: expected string, ParentedTree found
The problems seem to be data-related, as none of them occur when using the *_auto_conll version of the conll-2012 training data.
I noticed some discrepancies between the two different traversal orders. reversed() on line 108 returns an iterator, and if it gets exhausted in the first iteration of the loop on line 111, the subsequent result is incorrect.
cort/cort/core/head_finders.py
Lines 107 to 112 in 2349f03
I suggest changing
to_traverse = reversed(tree)
to
to_traverse = list(reversed(tree))
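The one-shot nature of the iterator is easy to demonstrate in isolation (the list below is just a stand-in for the tree's children):

```python
# reversed() returns a one-shot iterator, so a second traversal over the
# same object yields nothing.
tree = ["NP", "VP", "PP"]  # stand-in for the tree's children

to_traverse = reversed(tree)
first_pass = list(to_traverse)   # consumes the iterator
second_pass = list(to_traverse)  # already exhausted
assert first_pass == ["PP", "VP", "NP"]
assert second_pass == []

# Materializing the result, as suggested, makes repeated traversal safe:
to_traverse = list(reversed(tree))
assert list(to_traverse) == ["PP", "VP", "NP"]
assert list(to_traverse) == ["PP", "VP", "NP"]
```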
Hi,
errors_by_type.visualize() is giving a problem, since the escape function is not part of the html library in Python 2.7; html.escape is only available in Python 3.x.
It would be great if there were a fix for this.
Thanks,
Joe
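A possible compatibility shim (a sketch only, not cort's actual code): html.escape exists on Python 3.2+, while Python 2.7 ships the equivalent cgi.escape, so a guarded import covers both.

```python
# html.escape is Python 3 only; cgi.escape is the Python 2.7 equivalent
# for the basic &, <, > replacements used here.
try:
    from html import escape  # Python 3.2+
except ImportError:
    from cgi import escape   # Python 2.7 fallback

escaped = escape("<span>")
assert escaped == "&lt;span&gt;"
```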
I was trying to use cort straight out of the box to predict coreference chains on raw text, but was unable to get it running. Here's what I did: I created a virtualenv called cort and installed cort using pip. The GitHub repo was in another folder called cort_tool, and the Stanford CoreNLP tools were in a folder called stanford-corenlp. I downloaded model-train-pair.obj and placed it in the cort_tool folder, and created an input.txt file with a single sentence. Then I ran:
$ cd cort_tool
$ cort-predict-raw -in ~/input.txt -model model-pair-train.obj -extractor cort.coreference.approaches.mention_ranking.extract_substructures -perceptron cort.coreference.approaches.mention_ranking.RankingPerceptron -clusterer cort.coreference.clusterer.all_ante -corenlp ~/stanford-corenlp -suffix out 2>&1 | tee ~/output.txt
I got the following output-
2016-10-03 17:17:55,338 INFO Loading model.
In file included from /home/cil/cort/lib/python3.4/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
from /home/cil/cort/lib/python3.4/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/cil/cort/lib/python3.4/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /home/cil/.pyxbld/temp.linux-x86_64-3.4/pyrex/cort/coreference/perceptrons.c:274:
/home/cil/cort/lib/python3.4/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by " \
^
In file included from /home/cil/cort/lib/python3.4/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /home/cil/cort/lib/python3.4/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /home/cil/.pyxbld/temp.linux-x86_64-3.4/pyrex/cort/coreference/perceptrons.c:274:
/home/cil/cort/lib/python3.4/site-packages/numpy/core/include/numpy/__multiarray_api.h:1448:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
_import_array(void)
^
2016-10-03 17:18:02,512 INFO Reading in and preprocessing data.
2016-10-03 17:18:02,513 INFO Starting java subprocess, and waiting for signal it's ready, with command: exec java -Xmx4g -XX:ParallelGCThreads=1 -cp '/home/cil/cort/lib/python3.4/site-packages/stanford_corenlp_pywrapper/lib/*:/home/cil/stanford-corenlp/*' corenlp.SocketServer --outpipe /tmp/corenlp_pywrap_pipe_pypid=20030_time=1475486282.512968 --configfile /home/cil/cort/lib/python3.4/site-packages/cort/config_files/corenlp.ini
INFO:CoreNLP_JavaServer: Using CoreNLP configuration file: /home/cil/cort/lib/python3.4/site-packages/cort/config_files/corenlp.ini
Exception in thread "main" java.lang.UnsupportedClassVersionError: edu/stanford/nlp/pipeline/StanfordCoreNLP : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:803)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at corenlp.JsonPipeline.initializeCorenlpPipeline(JsonPipeline.java:206)
at corenlp.SocketServer.main(SocketServer.java:102)
At this point, the memory usage of the cort-predict-raw task dropped to zero, so I did a keyboard interrupt and tried again, but got the same result.
I'm on Ubuntu 16.04.
Can you please help me out?
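For reference, the "Unsupported major.minor version 52.0" in the stack trace means the CoreNLP classes were compiled for a newer JDK than the JVM running them: class-file major versions are offset from the JDK release by 44, so 52 corresponds to Java 8, and upgrading the local Java runtime to 8 or later should clear this particular error. The mapping, as a sketch:

```python
# Class-file major versions are offset from the JDK release by 44:
# 52 -> Java 8, 51 -> Java 7, 50 -> Java 6, and so on.
def jdk_release_for_class_major(major):
    return major - 44

assert jdk_release_for_class_major(52) == 8
print("major version 52.0 requires at least Java",
      jdk_release_for_class_major(52))
```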
cort currently needs a lot of RAM: predicting with the latent ranking model on the CoNLL-2012 development data takes ~8 GB, mainly due to multiprocessing during feature extraction.
Hi,
the adjust_head_for_nam function in cort.core.head_finders crashes whenever it encounters a named entity type of DURATION. This entity type is sometimes generated by the latest version of CoreNLP.
I guess it shouldn't be too difficult to add a pattern for it, but I don't know what would make sense.
/Christian
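One hypothetical way to make the lookup robust (a sketch only; the real handling and label set in head_finders may differ) is to fall back to a default instead of raising on NER labels the code has no pattern for:

```python
# Sketch: map unseen NER labels (e.g. DURATION, emitted by newer CoreNLP
# versions) to a default instead of raising an exception. The label set
# below is illustrative, not cort's actual list.
HANDLED_NER_TYPES = {"PERSON", "ORGANIZATION", "GPE", "DATE", "NONE"}

def ner_type_or_default(ner_type, default="NONE"):
    # Unknown annotations degrade gracefully rather than crashing
    # mention extraction for the whole document.
    return ner_type if ner_type in HANDLED_NER_TYPES else default

assert ner_type_or_default("PERSON") == "PERSON"
assert ner_type_or_default("DURATION") == "NONE"  # no crash
```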
2016-04-09 05:04:42,285 INFO Preprocessing en/ep-00-06-15.xml.gz.
2016-04-09 05:20:08,227 INFO Extracting system mentions from en/ep-00-06-15.xml.gz.
2016-04-09 05:20:11,552 ERROR Discarding document en/ep-00-06-15.xml.gz
2016-04-09 05:20:11,619 ERROR Traceback (most recent call last):
File "/home/staff/ch/PycharmProjects/cort/extra/annot-wmt.py", line 197, in <module>
doc.system_mentions = mention_extractor.extract_system_mentions(doc)
File "/home/staff/ch/PycharmProjects/cort/cort/core/mention_extractor.py", line 36, in extract_system_mentions
for span in __extract_system_mention_spans(document)]
File "/home/staff/ch/PycharmProjects/cort/cort/core/mention_extractor.py", line 36, in <listcomp>
for span in __extract_system_mention_spans(document)]
File "/home/staff/ch/PycharmProjects/cort/cort/core/mentions.py", line 153, in from_document
mention_property_computer.compute_head_information(attributes)
File "/home/staff/ch/PycharmProjects/cort/cort/core/mention_property_computer.py", line 248, in compute_head_information
attributes["ner"][head_index])
File "/home/staff/ch/PycharmProjects/cort/cort/core/head_finders.py", line 214, in adjust_head_for_nam
raise Exception("Unknown named entity annotation: " + ner_type)
Exception: Unknown named entity annotation: DURATION
When I read this document from CoNLL-2012 into cort, a TypeError is thrown. A ParentedTree enters "head" in mention_property_computer.py around line 241 (head = [head_tree[0]]). The value can be traced back to the head finder, but I stopped there because there are a lot of alternative rules.
>>> from cort.core.corpora import Corpus
>>> with open('output/debug.conll') as f:
... Corpus.from_file('test', f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/corpora.py", line 79, in from_file
documents.append(from_string("".join(current_document)))
File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/corpora.py", line 14, in from_string
return documents.CoNLLDocument(string)
File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/documents.py", line 414, in __init__
super(CoNLLDocument, self).__init__(identifier, sentences, coref)
File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/documents.py", line 97, in __init__
self.annotated_mentions = self.__get_annotated_mentions()
File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/documents.py", line 111, in __get_annotated_mentions
span, self, first_in_gold_entity=set_id not in seen
File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/mentions.py", line 174, in from_document
mention_property_computer.compute_gender(attributes)
File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/mention_property_computer.py", line 91, in compute_gender
if __wordnet_lookup_gender(" ".join(attributes["head"])):
TypeError: sequence item 0: expected str instance, ParentedTree found
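The failure mode can be reproduced without cort or NLTK: str.join refuses any non-string item, which is exactly what happens when a tree object ends up inside attributes["head"] (FakeTree below is a hypothetical stand-in for ParentedTree):

```python
# str.join requires every item to be a string; a tree object in the
# head list triggers exactly this TypeError.
class FakeTree:  # hypothetical stand-in for nltk.tree.ParentedTree
    pass

msg = ""
try:
    " ".join([FakeTree()])
except TypeError as err:
    msg = str(err)
print(msg)  # sequence item 0: expected str instance, FakeTree found

# With plain token strings, the same call is fine:
assert " ".join(["the", "picture"]) == "the picture"
```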
I am running cort-train when these errors happen. My setup is Ubuntu 16.04.3, 64 GB RAM, 4 CPUs.
Process ForkPoolWorker-9:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/pool.py", line 125, in worker
put((job, i, result))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
self._writer.send_bytes(obj)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 130, in worker
put((job, i, (False, wrapped)))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 349, in put
obj = ForkingPickler.dumps(obj)
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a string larger than 4GiB
Process ForkPoolWorker-10:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/pool.py", line 125, in worker
put((job, i, result))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
self._writer.send_bytes(obj)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 130, in worker
put((job, i, (False, wrapped)))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 349, in put
obj = ForkingPickler.dumps(obj)
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a string larger than 4GiB
Process ForkPoolWorker-11:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/pool.py", line 125, in worker
put((job, i, result))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
self._writer.send_bytes(obj)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 130, in worker
put((job, i, (False, wrapped)))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 349, in put
obj = ForkingPickler.dumps(obj)
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a string larger than 4GiB
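Both tracebacks point at hard limits in Python 3.5's multiprocessing transport: the connection frames each pickled payload with a signed 32-bit length header, and pickle protocols below 4 refuse strings larger than 4 GiB. The header limit can be demonstrated with the standard library alone:

```python
import struct

# multiprocessing frames each message with a signed 32-bit length header,
# so any pickled payload over 2**31 - 1 bytes cannot be sent.
msg = ""
try:
    struct.pack("!i", 2**31)  # one byte past the header's maximum
except struct.error as err:
    msg = str(err)
print(msg)  # the same struct.error as in the tracebacks above
```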
I was trying to train a model on data derived from the CoNLL-2012 training set when I got this error.
These are the training options for the model:
('-extractor', 'cort.coreference.approaches.mention_ranking.extract_substructures', '-perceptron', 'cort.coreference.approaches.mention_ranking.RankingPerceptron', '-cost_function', 'cort.coreference.cost_functions.cost_based_on_consistency', '-cost_scaling', '100')
This is the error:
2018-09-10 19:57:49,116 INFO Started epoch 1
Traceback (most recent call last):
File "output/cort/venv/bin/cort-train", line 4, in <module>
__import__('pkg_resources').run_script('cort==0.2.4.5', 'cort-train')
File "/Users/minh/EvEn/output/cort/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 658, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Users/minh/EvEn/output/cort/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1438, in run_script
exec(code, namespace, namespace)
File "/Users/minh/EvEn/output/cort/venv/lib/python3.7/site-packages/cort-0.2.4.5-py3.7.egg/EGG-INFO/scripts/cort-train", line 141, in <module>
perceptron
File "/Users/minh/EvEn/output/cort/venv/lib/python3.7/site-packages/cort-0.2.4.5-py3.7.egg/cort/coreference/experiments.py", line 43, in learn
perceptron.fit(substructures, arc_information)
File "output/cort/venv/lib/python3.7/site-packages/cort-0.2.4.5-py3.7.egg/cort/coreference/perceptrons.pyx", line 182, in cort.coreference.perceptrons.Perceptron.fit
self.__update(cons_arcs,
File "output/cort/venv/lib/python3.7/site-packages/cort-0.2.4.5-py3.7.egg/cort/coreference/perceptrons.pyx", line 331, in cort.coreference.perceptrons.Perceptron.__update
arc_information[arc][0]
KeyError: None
Could you please have a look?
I was trying to load a file composed of all gold sentences in the CoNLL-2012 dev set when this error occurred. Below is the full stack trace:
In [2]: reference = corpora.Corpus.from_file("reference", open("output/Thu-Jan-12-17-22-15-CET-2017.gold.txt"))
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-2-57d8e778731d> in <module>()
----> 1 reference = corpora.Corpus.from_file("reference", open("output/Thu-Jan-12-17-22-15-CET-2017.gold.txt"))
/Users/cumeo/anaconda/lib/python2.7/site-packages/cort/core/corpora.pyc in from_file(description, coref_file)
77
78 return Corpus(description, sorted([from_string(doc) for doc in
---> 79 document_as_strings]))
80
81
/Users/cumeo/anaconda/lib/python2.7/site-packages/cort/core/corpora.pyc in from_string(string)
12
13 def from_string(string):
---> 14 return documents.CoNLLDocument(string)
15
16
/Users/cumeo/anaconda/lib/python2.7/site-packages/cort/core/documents.pyc in __init__(self, document_as_string)
399 sd = StanfordDependencies.get_instance()
400 dep_trees = sd.convert_trees(
--> 401 [parse.replace("NOPARSE", "S") for parse in parses],
402 )
403 sentences = []
/Users/cumeo/.local/lib/python2.7/site-packages/StanfordDependencies/StanfordDependencies.pyc in convert_trees(self, ptb_trees, representation, universal, include_punct, include_erased, **kwargs)
114 include_erased=include_erased)
115 return Corpus(self.convert_tree(ptb_tree, **kwargs)
--> 116 for ptb_tree in ptb_trees)
117
118 @abstractmethod
/Users/cumeo/.local/lib/python2.7/site-packages/StanfordDependencies/StanfordDependencies.pyc in <genexpr>((ptb_tree,))
114 include_erased=include_erased)
115 return Corpus(self.convert_tree(ptb_tree, **kwargs)
--> 116 for ptb_tree in ptb_trees)
117
118 @abstractmethod
/Users/cumeo/.local/lib/python2.7/site-packages/StanfordDependencies/JPypeBackend.pyc in convert_tree(self, ptb_tree, representation, include_punct, include_erased, add_lemmas, universal)
85 self._raise_on_bad_input(ptb_tree)
86 self._raise_on_bad_representation(representation)
---> 87 tree = self.treeReader(ptb_tree)
88 if tree is None:
89 raise ValueError("Invalid Penn Treebank tree: %r" % ptb_tree)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)
The data looks like this:
Minhs-MacBook-Pro:EvEn cumeo$ head output/Thu-Jan-12-17-22-15-CET-2017.gold.txt
#begin document (bc/cctv/00/cctv_0000); part 000
bc/cctv/00/cctv_0000 0 0 In IN (TOP(S(PP* - - - Speaker#1 * * * *-
bc/cctv/00/cctv_0000 0 1 the DT (NP(NP* - - - Speaker#1 (DATE* * * * -
bc/cctv/00/cctv_0000 0 2 summer NN *) summer - 1 Speaker#1 * * * * -
bc/cctv/00/cctv_0000 0 3 of IN (PP* - - - Speaker#1 * * * * -
bc/cctv/00/cctv_0000 0 4 2005 CD (NP*)))) - - - Speaker#1 *) * * *-
bc/cctv/00/cctv_0000 0 5 , , * - - - Speaker#1 * * * * -
bc/cctv/00/cctv_0000 0 6 a DT (NP(NP* - - - Speaker#1 * (ARG0* * * -
bc/cctv/00/cctv_0000 0 7 picture NN *) picture - 8 Speaker#1 * *) * * -
bc/cctv/00/cctv_0000 0 8 that WDT (SBAR(WHNP*) - - - Speaker#1 * (R-ARG0*) ** -
Does anyone have any ideas how to fix this?
Best regards,
Minh
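The byte 0xef opens a multi-byte UTF-8 sequence (a UTF-8 BOM, ef bb bf, is one common source), so the file evidently contains non-ASCII content that Python 2's default ascii codec cannot handle; opening the file with an explicit UTF-8 codec, e.g. codecs.open(path, "r", "utf-8"), sidesteps this. A minimal reproduction with made-up bytes:

```python
# A UTF-8 BOM starts with 0xef, which the ascii codec rejects.
raw = "\ufeff#begin document".encode("utf-8")  # hypothetical file content

msg = ""
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    msg = str(err)
print(msg)  # 'ascii' codec can't decode byte 0xef in position 0: ...

# Decoding as UTF-8 (what codecs.open(path, "r", "utf-8") does) works:
assert raw.decode("utf-8") == "\ufeff#begin document"
```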
I'm trying to run cort-predict-raw OOTB using the following setup:
cort-predict-raw -in ~/data/test1/*.txt \
-model models/model-pair-train.obj \
-extractor cort.coreference.approaches.mention_ranking.extract_substructures \
-perceptron cort.coreference.approaches.mention_ranking.RankingPerceptron \
-clusterer cort.coreference.clusterer.all_ante \
-corenlp ~/systems/stanford/stanford-corenlp-full-2016-10-31 \
#-features my_features.txt \
For some reason it throws an exception for the string "SEC" (with quotation marks) in:
Hello my name is "SEC".
If I replace SEC or remove the quotation marks, the file passes through.
The exception:
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/cort-predict-raw", line 136, in <module>
doc.system_mentions = mention_extractor.extract_system_mentions(doc)
File "/home/ubuntu/.local/lib/python3.5/site-packages/cort/core/mention_extractor.py", line 36, in extract_system_mentions
for span in __extract_system_mention_spans(document)]
File "/home/ubuntu/.local/lib/python3.5/site-packages/cort/core/mention_extractor.py", line 36, in <listcomp>
for span in __extract_system_mention_spans(document)]
File "/home/ubuntu/.local/lib/python3.5/site-packages/cort/core/mentions.py", line 153, in from_document
mention_property_computer.compute_head_information(attributes)
File "/home/ubuntu/.local/lib/python3.5/site-packages/cort/core/mention_property_computer.py", line 248, in compute_head_information
attributes["ner"][head_index])
File "/home/ubuntu/.local/lib/python3.5/site-packages/cort/core/head_finders.py", line 214, in adjust_head_for_nam
raise Exception("Unknown named entity annotation: " + ner_type)
Exception: Unknown named entity annotation: DURATION
Is it possible to visualize only files that contain at least one error? I have a large corpus, and after filtering there are only a couple hundred errors, so I find myself looking at clean files most of the time (i.e., files with no annotations besides mention spans).
Hello,
I encounter a problem when trying to visualize a system's recall errors by type, as described in the documentation. My reference and system files are in CoNLL format, and no errors are reported when running the code, but the resulting HTML file doesn't display the document text or any fields in the left panel except for "Documents". When the jquery and jquery.jsPlumb imports in the HTML file are commented out, everything is displayed correctly (document text, left panel, and gold/system mention boundaries), but without the possibility to interact. Reproduced in the latest Firefox and Chrome; Python 2.7. The visualization of a document processed with cort-predict-raw seems to work fine.
Thanks!
I visualised coreference errors (errors_by_type.visualize()), but it is not possible to scroll the left part of the visualisation (the right part with the text works fine).
I am still using macOS and Safari; I know it hasn't been tested, I just thought you might be interested to know.
I was trying to run cort-predict-raw with following command:
python3.5 /usr/local/bin/cort-predict-raw -in ~/data/pilot_44_docs/*.txt \
-model models/model-pair-train.obj \
-extractor cort.coreference.approaches.mention_ranking.extract_substructures \
-perceptron cort.coreference.approaches.mention_ranking.RankingPerceptron \
-clusterer cort.coreference.clusterer.all_ante \
-corenlp ~/systems/stanford/stanford-corenlp-full-2016-10-31
and got the following error message:
Traceback (most recent call last):
File "/usr/local/bin/cort-predict-raw", line 136, in <module>
doc.system_mentions = mention_extractor.extract_system_mentions(doc)
File "/usr/local/lib/python3.5/dist-packages/cort/core/mention_extractor.py", line 36, in extract_system_mentions
for span in __extract_system_mention_spans(document)]
File "/usr/local/lib/python3.5/dist-packages/cort/core/mention_extractor.py", line 36, in <listcomp>
for span in __extract_system_mention_spans(document)]
File "/usr/local/lib/python3.5/dist-packages/cort/core/mentions.py", line 126, in from_document
i, sentence_span = document.get_sentence_id_and_span(span)
TypeError: 'NoneType' object is not iterable
2017-04-27 09:17:06,058 WARNING Killing subprocess 14154
2017-04-27 09:17:06,395 INFO Subprocess seems to be stopped, exit code -9
It works without a problem with Python 2, though. I'm running this on Ubuntu 16.04.
Is it possible to retrain models (for example, the ones from https://github.com/smartschat/cort/blob/master/COREFERENCE.md#model-downloads) with new data?
I tried training using:
cort-train -in new_retraining_data.conll \
-out pretrained_model.obj \
-extractor cort.coreference.approaches.mention_ranking.extract_substructures \
-perceptron cort.coreference.approaches.mention_ranking.RankingPerceptron \
-cost_function cort.coreference.cost_functions.cost_based_on_consistency \
-n_iter 5 \
-cost_scaling 100 \
-random_seed 23
but I think this trains from scratch and simply overwrites the pretrained model.