noahs-ark / semafor

http://www.ark.cs.cmu.edu/SEMAFOR

License: GNU General Public License v3.0

Shell 2.20% HTML 0.27% Perl 5.30% Scala 0.35% Java 83.58% Python 7.89% R 0.41%

semafor's People

Contributors

nschneid, sammthomson, simonsuster, vanatteveldt

semafor's Issues

Cannot find and load class

hi,

I am having this problem while running semafor.

"Converting postagged input to conll.
Error: Could not find or load main class edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat"

Anyone know how to solve this?

Thanks

Issue with tokenizer.sed

While running semafor v3, I get the following error:
sed: couldn't open file $/home/tony/semafor-master/scripts/tokenizer.sed: No such file or directory

Execution time of semafor (server mode)

Hello,
Parsing one sentence in server mode takes 15 seconds on average, while the web demo takes only 1 second.
Is there a way to reduce the execution time?
Thanks

Problem with Maven in Installation

Hi!
I have been trying to install Semafor for two weeks now, and I have a problem with the Maven compilation:

[INFO] Building Semafor 3.0-alpha-04
[INFO] ------------------------------------------------------------------------
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-source-plugin/2.2.1/maven-source-plugin-2.2.1.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.772 s
[INFO] Finished at: 2018-05-31T10:14:16+02:00
[INFO] Final Memory: 8M/34M
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-source-plugin:2.2.1 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-source-plugin:jar:2.2.1: Could not transfer artifact org.apache.maven.plugins:maven-source-plugin:pom:2.2.1 from/to central (https://repo.maven.apache.org/maven2): java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

Does anyone know what the issue is?
Thanks in advance
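
The "trustAnchors parameter must be non-empty" message usually means that the JVM running Maven sees an empty or missing default trust store (cacerts), so no HTTPS download can be validated. Below is a minimal sketch for checking this, assuming it is run with the same JVM that Maven uses. The class name TrustAnchorCheck is hypothetical and not part of SEMAFOR.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical diagnostic class; not part of SEMAFOR or Maven.
final class TrustAnchorCheck {
    /** Counts the CA certificates the JVM trusts by default. */
    static int countDefaultTrustAnchors() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init((KeyStore) null); // null selects the JVM's default trust store
            int count = 0;
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    count += ((X509TrustManager) tm).getAcceptedIssuers().length;
                }
            }
            return count;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load the default trust store", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("default trust anchors = " + countDefaultTrustAnchors());
    }
}
```

If this prints 0, the problem is likely in the JDK installation (its CA certificates need to be reinstalled or regenerated) rather than in SEMAFOR's build files.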

not working with maven 3.1

Hi,

Just letting you know and documenting for other people that the build is not working with the maven 3.1 series but it is working fine with maven 3.0.5.

Java version:

java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

Using 3.1, running mvn package gives:

[ERROR] Number of foreign imports: 1
[ERROR] import: Entry[import  from realm ClassRealm[maven.api, parent: null]]
[ERROR]
[ERROR] -----------------------------------------------------: org.sonatype.aether.graph.Dependency
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound
java.lang.NoClassDefFoundError: org/sonatype/aether/graph/Dependency
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2484)
    at java.lang.Class.getDeclaredMethods(Class.java:1827)
    at com.google.inject.spi.InjectionPoint.getInjectionPoints(InjectionPoint.java:674)
    at com.google.inject.spi.InjectionPoint.forInstanceMethodsAndFields(InjectionPoint.java:366)
    at com.google.inject.internal.ConstructorBindingImpl.getInternalDependencies(ConstructorBindingImpl.java:165)
    at com.google.inject.internal.InjectorImpl.getInternalDependencies(InjectorImpl.java:609)
    at com.google.inject.internal.InjectorImpl.cleanup(InjectorImpl.java:565)
    at com.google.inject.internal.InjectorImpl.initializeJitBinding(InjectorImpl.java:551)
    at com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:865)
    at com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:790)
    at com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:278)
    at com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:210)
    at com.google.inject.internal.InjectorImpl.getProviderOrThrow(InjectorImpl.java:986)
    at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1019)
    at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:982)
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1032)
    at org.eclipse.sisu.space.AbstractDeferredClass.get(AbstractDeferredClass.java:48)
    at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
    at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
    at com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:70)
    at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:100)
    at org.eclipse.sisu.plexus.PlexusLifecycleManager.onProvision(PlexusLifecycleManager.java:133)
    at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:109)
    at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:55)
    at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:68)
    at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1054)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:59)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:997)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1047)
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:993)
    at org.eclipse.sisu.inject.LazyBeanEntry.getValue(LazyBeanEntry.java:82)
    at org.eclipse.sisu.plexus.LazyPlexusBean.getValue(LazyPlexusBean.java:51)
    at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:260)
    at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:240)
    at org.apache.maven.shared.dependency.graph.internal.DefaultDependencyGraphBuilder.buildDependencyGraph(DefaultDependencyGraphBuilder.java:60)
    at org.apache.maven.plugins.shade.mojo.ShadeMojo.updateExcludesInDeps(ShadeMojo.java:965)
    at org.apache.maven.plugins.shade.mojo.ShadeMojo.createDependencyReducedPom(ShadeMojo.java:938)
    at org.apache.maven.plugins.shade.mojo.ShadeMojo.execute(ShadeMojo.java:544)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:317)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:152)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:555)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:214)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: java.lang.ClassNotFoundException: org.sonatype.aether.graph.Dependency
    at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:50)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:259)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:242)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:227)

This is actually a known issue when moving from Maven 3.0 to 3.1, as I found on Google: http://mail-archives.apache.org/mod_mbox/hbase-issues/201308.mbox/%3CJIRA.12663004.1376095060640.37694.1376095307925@arcas%3E

I am not much of a Java person so I am not sure how to fix it.

Thanks.

Semirings.java: "name clash, have the same erasure" problem

So I just checked out semafor and tried to build it using Maven:

jiehan@tpx1c /m/d/j/D/w/semafor> git log -1
commit a25f817027463923ea21166b2f43464722273fe8
Author: Sam Thomson <[email protected]>
Date:   Fri Jul 26 16:44:39 2013 -0400

    better workaround for Morpha bug (turns "null" into "")
jiehan@tpx1c /m/d/j/D/w/semafor> mvn package
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Semafor 3.0-alpha-04
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ Semafor ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 8 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.0:compile (default-compile) @ Semafor ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 164 source files to /mnt/docs/jiehan/Dropbox/workspace/semafor/target/classes
[INFO] -------------------------------------------------------------
[WARNING] COMPILATION WARNING : 
[INFO] -------------------------------------------------------------
[WARNING] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/XmlUtils.java:[43,41] com.sun.org.apache.xpath.internal.XPathAPI is internal proprietary API and may be removed in a future release
[WARNING] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/XmlUtils.java:[107,45] com.sun.org.apache.xpath.internal.XPathAPI is internal proprietary API and may be removed in a future release
[WARNING] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/XmlUtils.java:[145,45] com.sun.org.apache.xpath.internal.XPathAPI is internal proprietary API and may be removed in a future release
[WARNING] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/XmlUtils.java:[167,45] com.sun.org.apache.xpath.internal.XPathAPI is internal proprietary API and may be removed in a future release
[WARNING] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/Semirings.java: Some input files use unchecked or unsafe operations.
[WARNING] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/Semirings.java: Recompile with -Xlint:unchecked for details.
[INFO] 6 warnings 
[INFO] -------------------------------------------------------------
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/Semirings.java:[196,56] name clash: $(java.util.Collection<java.util.Set<edu.cmu.cs.lti.ark.util.ds.Pair<java.lang.Double,edu.cmu.cs.lti.ark.util.ds.path.Path<T>>>>) and $(java.util.Collection<edu.cmu.cs.lti.ark.util.ds.Pair<java.lang.Double,edu.cmu.cs.lti.ark.util.ds.path.Path<T>>>) have the same erasure
[ERROR] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/Semirings.java:[183,51] name clash: $(java.util.Collection<edu.cmu.cs.lti.ark.util.ds.Pair<java.lang.Double,edu.cmu.cs.lti.ark.util.ds.path.Path<T>>>) in edu.cmu.cs.lti.ark.util.Semirings.MaxPath and $(java.util.Collection<V>) in edu.cmu.cs.lti.ark.util.Semirings.Operation have the same erasure, yet neither overrides the other
[INFO] 2 errors 
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.245s
[INFO] Finished at: Mon Jul 29 01:52:47 EDT 2013
[INFO] Final Memory: 16M/205M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) on project Semafor: Compilation failure: Compilation failure:
[ERROR] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/Semirings.java:[196,56] name clash: $(java.util.Collection<java.util.Set<edu.cmu.cs.lti.ark.util.ds.Pair<java.lang.Double,edu.cmu.cs.lti.ark.util.ds.path.Path<T>>>>) and $(java.util.Collection<edu.cmu.cs.lti.ark.util.ds.Pair<java.lang.Double,edu.cmu.cs.lti.ark.util.ds.path.Path<T>>>) have the same erasure
[ERROR] /mnt/docs/jiehan/Dropbox/workspace/semafor/src/main/java/edu/cmu/cs/lti/ark/util/Semirings.java:[183,51] name clash: $(java.util.Collection<edu.cmu.cs.lti.ark.util.ds.Pair<java.lang.Double,edu.cmu.cs.lti.ark.util.ds.path.Path<T>>>) in edu.cmu.cs.lti.ark.util.Semirings.MaxPath and $(java.util.Collection<V>) in edu.cmu.cs.lti.ark.util.Semirings.Operation have the same erasure, yet neither overrides the other
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Could this be a JDK incompatibility?

jiehan@tpx1c /m/d/j/D/w/semafor> java -version
java version "1.7.0_40"
OpenJDK Runtime Environment (IcedTea 2.4.1) (ArchLinux build 7.u40_2.4.1-1-x86_64)
OpenJDK 64-Bit Server VM (build 24.0-b50, mixed mode)

I am new to Java, so maybe I am doing something wrong here. If anyone could point out the problem (or share a compiled JAR with me for now first) that would be wonderful! Thanks!
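
The two errors above say that methods named $ differ only in their generic type arguments. javac erases those arguments, so both signatures end up as $(java.util.Collection). Below is a minimal sketch of the clash and one common workaround (distinct method names). The names ErasureClashSketch, sumFlat and sumNested are hypothetical and are not taken from the actual Semirings code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration; not the Semirings.java source.
final class ErasureClashSketch {
    // These two overloads would NOT compile together, because after type
    // erasure both become sum(Collection):
    //   static double sum(Collection<Double> xs)
    //   static double sum(Collection<List<Double>> xss)

    /** Sums a flat collection of numbers. */
    static double sumFlat(Collection<Double> xs) {
        double total = 0.0;
        for (double x : xs) {
            total += x;
        }
        return total;
    }

    /** Sums a collection of lists; the distinct name avoids the clash. */
    static double sumNested(Collection<List<Double>> xss) {
        double total = 0.0;
        for (List<Double> xs : xss) {
            total += sumFlat(xs);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumFlat(Arrays.asList(1.0, 2.0)));
        System.out.println(sumNested(Arrays.asList(Arrays.asList(1.0), Arrays.asList(2.0, 3.0))));
    }
}
```

Another report on this page builds successfully with Java 1.6, so compiling with a JDK 6 toolchain may be worth trying as a stopgap. That the JDK version is the cause here is an assumption, not something this report confirms.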

Preformat to CoNLL for server

Are there any plans to have the server job accept plain text? If not, is there a way to preformat to CoNLL? The link within Token.java to the specific format for this project no longer works, so I do not know how to map specific dependencies to output from CoreNLP or some other preprocessing step.

incomplete JSON output

Hello, I am trying out semafor. I was able to run it and get JSON output, but the JSON and XML outputs differ: "annotationSets -> frameElements" does not show all the labels present in the XML. Please help me get the complete JSON, as shown in the web demo.

Test set split

Hi, is there a way to obtain the test set annotation example IDs (or the names of the 23 documents) you used for your experiments on FrameNet 1.5, especially for the paper "Semi-Supervised Frame-Semantic Parsing for Unknown Predicates", where you used 2,420 sentences from the fulltext annotations?

ERROR: Exception in thread "main" java.lang.NumberFormatException: null

hi,

I received this error when parsing sentences:

Performing frame-semantic parsing.
input-file:/tmp/semafor.1v1uWbbdQ9/conll
output-file:/home/wendy/semafor/bin/review_gold_output.txt
model-dir:/home/wendy/models/semafor_malt_model_20121129
numthreads:

Exception in thread "main" java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:542)
at java.lang.Integer.parseInt(Integer.java:615)
at edu.cmu.cs.lti.ark.util.CommandLineOptions$IntOption.set(CommandLineOptions.java:187)
at edu.cmu.cs.lti.ark.util.CommandLineOptions.init(CommandLineOptions.java:267)
at edu.cmu.cs.lti.ark.fn.utils.FNModelOptions.<init>(FNModelOptions.java:40)
at edu.cmu.cs.lti.ark.fn.utils.FNModelOptions.<init>(FNModelOptions.java:36)
at edu.cmu.cs.lti.ark.fn.Semafor.main(Semafor.java:94)

Anyone know how to solve this?

Many thanks.
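
The log above shows an empty numthreads: value, and the trace ends in Integer.parseInt, which suggests that no thread count reached the option parser. Below is a minimal sketch of a defensive parse with a default value. ThreadCountOption and parseOrDefault are hypothetical names and are not part of SEMAFOR's CommandLineOptions class.

```java
// Hypothetical helper; not part of SEMAFOR.
final class ThreadCountOption {
    /**
     * Parses an integer option, returning the fallback when the value is
     * missing (null) or blank instead of throwing NumberFormatException.
     */
    static int parseOrDefault(String raw, int fallback) {
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String raw = args.length > 0 ? args[0] : null;
        System.out.println("numthreads: " + parseOrDefault(raw, 1));
    }
}
```

Without changing any code, supplying an explicit thread count when invoking the run script (if it accepts one) is probably the simpler workaround.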

error in large sentences

When I run the runSemafor.sh script on a very large input text (e.g., 458,584 lines), I get an error on line 9389. The error message is:

Problem. Count of line 0 (10) not equal to zeroth line (14).

Is there any way to handle this by skipping the lines that cause errors?
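
One generic way to tolerate bad lines is to process the input line by line, record the numbers of the lines that fail, and carry on. Below is a minimal sketch of that pattern. It assumes each line can be processed independently, which may not hold for the real pipeline because it runs several whole-file preprocessing stages. SkipBadLines, Result and processAll are hypothetical names, not SEMAFOR APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical wrapper; the per-line step stands in for the real parser.
final class SkipBadLines {
    /** Outputs for lines that succeeded, plus 1-based numbers of lines that failed. */
    static final class Result {
        final List<String> outputs = new ArrayList<>();
        final List<Integer> failedLines = new ArrayList<>();
    }

    static Result processAll(List<String> lines, Function<String, String> step) {
        Result result = new Result();
        for (int i = 0; i < lines.size(); i++) {
            try {
                result.outputs.add(step.apply(lines.get(i)));
            } catch (RuntimeException e) {
                // Remember the failing line and keep going with the rest.
                result.failedLines.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy step: doubling an integer fails on non-numeric lines.
        Result r = processAll(Arrays.asList("1", "oops", "3"),
                s -> String.valueOf(Integer.parseInt(s) * 2));
        System.out.println("outputs = " + r.outputs + ", failed lines = " + r.failedLines);
    }
}
```

The recorded line numbers can then be used to inspect or remove the offending sentences before a full run.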

NumberFormatException at Alphabet creation step during training

A NumberFormatException occurs at step 1 of the Alphabet creation script (training/trainIdModel.sh) when it tries to process some of the lines of cv.train.sentences.frame.elements. The problem is that numbers separated by : are found where simple integers are expected.

Example line:
4 Economy economy.n 7 economy 1 Political_region 3:4 Descriptor 5:6 Economy 7

https://github.com/Noahs-ARK/semafor/blob/master/src/main/java/edu/cmu/cs/lti/ark/fn/identification/training/AlphabetCreationThreaded.java#L199

Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "3:4"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.createAlphabet(AlphabetCreationThreaded.java:163)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.main(AlphabetCreationThreaded.java:106)
Caused by: java.lang.NumberFormatException: For input string: "3:4"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.processLine(AlphabetCreationThreaded.java:201)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.access$100(AlphabetCreationThreaded.java:51)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded$1.call(AlphabetCreationThreaded.java:179)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded$1.call(AlphabetCreationThreaded.java:175)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

What does the 3:4 format mean and how should one proceed in order to work around this problem?
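
One plausible reading, stated here as an assumption rather than confirmed behaviour, is that a field such as 3:4 is an inclusive token span (start:end), while a bare 7 is a single-token span. Below is a minimal sketch of a parser that accepts both forms. SpanField and parse are hypothetical names and are not the code in AlphabetCreationThreaded.

```java
import java.util.Arrays;

// Hypothetical helper; assumes "start:end" denotes an inclusive token span.
final class SpanField {
    /** Returns {start, end}; a single index such as "7" is treated as the span 7..7. */
    static int[] parse(String field) {
        // The -1 limit keeps trailing empty parts, so "3:" is rejected below.
        String[] parts = field.split(":", -1);
        if (parts.length == 1) {
            int index = Integer.parseInt(parts[0]);
            return new int[] {index, index};
        }
        if (parts.length == 2) {
            return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
        }
        throw new NumberFormatException("Unexpected span field: " + field);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("3:4")));
        System.out.println(Arrays.toString(parse("7")));
    }
}
```

Under that reading, the example line would assign Political_region to tokens 3 through 4, Descriptor to tokens 5 through 6, and Economy to token 7.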

include FrameNet IDs in output

There has been a request to include the FrameNet IDs (frame, frame element, and lexical unit where applicable) in SEMAFOR's output.

Command-line interface

A user has asked how to have SEMAFOR output JSON rather than XML, and I realized that I can't tell from the README what the correct flag is for this. Where is the CLI implemented/documented?

Clean up

Wish list: Semafor creates temporary files in directories named something like

        /tmp/semafor.zFGtD9Onbd

It would be nice if it could clean up after itself and delete these directories upon completion.

Maybe with a debug option that would optionally allow us to keep it, for debugging purposes.

Cheers,
David
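
A minimal sketch of the requested behaviour: create the temporary directory, run the work, and delete the directory afterwards unless a keep flag is set. The names TempDirCleanup, runWithTempDir and keepTemp are hypothetical. The sketch also assumes the directory would be created from Java, whereas SEMAFOR's scripts may create it elsewhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical helper; not part of SEMAFOR.
final class TempDirCleanup {
    /** Runs work inside a fresh temp directory and removes it afterwards unless keepTemp is set. */
    static Path runWithTempDir(boolean keepTemp, Consumer<Path> work) {
        Path dir;
        try {
            dir = Files.createTempDirectory("semafor.");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            work.accept(dir);
        } finally {
            if (!keepTemp) {
                deleteRecursively(dir);
            }
        }
        return dir;
    }

    static void deleteRecursively(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean keepTemp = args.length > 0 && args[0].equals("--keep-temp");
        Path dir = runWithTempDir(keepTemp, d -> System.out.println("working in " + d));
        System.out.println("still exists: " + Files.exists(dir));
    }
}
```

The finally block means the directory is removed even when the work fails, and the keep flag covers the debugging case.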

NoSuchElementException is thrown for some weird sentences

Exception in thread "main" java.util.NoSuchElementException
	at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
	at edu.cmu.cs.lti.ark.fn.utils.DataPoint.buildParsesForLine(DataPoint.java:171)
	at edu.cmu.cs.lti.ark.fn.utils.DataPointWithFrameElements.<init>(DataPointWithFrameElements.java:86)
	at edu.cmu.cs.lti.ark.fn.parsing.DataPrep.load(DataPrep.java:176)
	at edu.cmu.cs.lti.ark.fn.parsing.DataPrep.<init>(DataPrep.java:103)
	at edu.cmu.cs.lti.ark.fn.parsing.CreateAlphabet.getDataPoints(CreateAlphabet.java:124)
	at edu.cmu.cs.lti.ark.fn.parsing.CreateAlphabet.run(CreateAlphabet.java:76)
	at edu.cmu.cs.lti.ark.fn.parsing.ParserDriver.identifyArguments(ParserDriver.java:307)
	at edu.cmu.cs.lti.ark.fn.parsing.ParserDriver.runParser(ParserDriver.java:238)
	at edu.cmu.cs.lti.ark.fn.parsing.ParserDriver.main(ParserDriver.java:108)

This happens for the attached file, whose contents are merely:

| origin = 

When running bin/fnParserDriverMalt.sh (specifying xml format)

failure.txt

Error: could not match input

I'm running Semafor-3.0-alpha-04 and it's generally working great.

In this instance, the conll file gets created correctly (I can include it if that would help in debugging), but the frame-semantic parsing fails with

Error: could not match input

What could be going wrong?

Cheers,
David

2014-09-01_1000_US_MSNBC_Meet_the_Press.conll


Performing frame-semantic parsing.
input-file:/tmp/semafor.ajYsvERezG/conll
output-file:/sweep/2014/2014-09/2014-09-01/2014-09-01_1000_US_MSNBC_Meet_the_Press.json
model-dir:/mnt/tvspare/software/java/semafor-experimental/semafor/models/semafor_malt_model_20121129
numthreads:5
Initializing frame identification model...
Reading serialized required data
Done reading serialized required data
Reading graph from: /mnt/tvspare/software/java/semafor-experimental/semafor/models/semafor_malt_model_20121129/sparsegraph.gz...
Read graph successfully.
Reading model parameters...
100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 1100000 1200000 1300000 1400000 1500000 1600000 1700000 1800000 1900000 2000000 2100000 2200000 2300000 2400000 2500000 2600000 2700000 2800000 2900000 3000000 Done reading model parameters.
Initializing alphabet for argument identification..
0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 1100000 1200000 1300000 1400000 1500000 1600000 1700000 1800000 1900000 2000000 2100000 2200000 2300000 2400000 2500000 2600000 2700000 2800000 2900000 3000000
parsed sentence 4 in 606 millis.
parsed sentence 1 in 1009 millis.
parsed sentence 5 in 531 millis.
parsed sentence 0 in 1248 millis.
parsed sentence 6 in 346 millis.
parsed sentence 2 in 1366 millis.
parsed sentence 8 in 262 millis.
parsed sentence 10 in 191 millis.
parsed sentence 11 in 89 millis.
parsed sentence 9 in 377 millis.
parsed sentence 7 in 595 millis.
parsed sentence 15 in 69 millis.
parsed sentence 14 in 95 millis.
parsed sentence 16 in 311 millis.
parsed sentence 13 in 617 millis.
parsed sentence 12 in 673 millis.
parsed sentence 20 in 2 millis.
parsed sentence 18 in 136 millis.
parsed sentence 21 in 36 millis.
parsed sentence 22 in 23 millis.
parsed sentence 17 in 552 millis.
parsed sentence 24 in 126 millis.
parsed sentence 26 in 22 millis.
parsed sentence 23 in 159 millis.
parsed sentence 27 in 68 millis.
parsed sentence 29 in 23 millis.
parsed sentence 25 in 177 millis.
parsed sentence 28 in 241 millis.
parsed sentence 19 in 1440 millis.
parsed sentence 3 in 4081 millis.
parsed sentence 33 in 8 millis.
parsed sentence 31 in 20 millis.
parsed sentence 30 in 30 millis.
parsed sentence 34 in 29 millis.
parsed sentence 32 in 84 millis.
parsed sentence 35 in 88 millis.
parsed sentence 40 in 0 millis.
parsed sentence 39 in 12 millis.
parsed sentence 42 in 1 millis.
parsed sentence 43 in 0 millis.
parsed sentence 44 in 4 millis.
parsed sentence 45 in 61 millis.
parsed sentence 36 in 101 millis.
parsed sentence 41 in 120 millis.
parsed sentence 46 in 52 millis.
parsed sentence 37 in 146 millis.
parsed sentence 50 in 2 millis.
parsed sentence 38 in 0 millis.
parsed sentence 47 in 442 millis.
parsed sentence 53 in 11 millis.
parsed sentence 49 in 442 millis.
parsed sentence 52 in 77 millis.
parsed sentence 56 in 1 millis.
parsed sentence 57 in 55 millis.
parsed sentence 48 in 518 millis.
parsed sentence 51 in 497 millis.
parsed sentence 58 in 54 millis.
parsed sentence 54 in 155 millis.
parsed sentence 59 in 89 millis.
parsed sentence 62 in 33 millis.
parsed sentence 61 in 55 millis.
parsed sentence 65 in 20 millis.
parsed sentence 66 in 6 millis.
parsed sentence 55 in 207 millis.
parsed sentence 64 in 51 millis.
parsed sentence 69 in 9 millis.
parsed sentence 63 in 119 millis.
parsed sentence 67 in 89 millis.
parsed sentence 68 in 93 millis.
parsed sentence 72 in 34 millis.
parsed sentence 70 in 93 millis.
parsed sentence 74 in 16 millis.
parsed sentence 76 in 7 millis.
parsed sentence 75 in 29 millis.
parsed sentence 77 in 21 millis.
parsed sentence 73 in 67 millis.
parsed sentence 79 in 100 millis.
parsed sentence 71 in 223 millis.
parsed sentence 81 in 40 millis.
parsed sentence 83 in 48 millis.
parsed sentence 82 in 58 millis.
parsed sentence 84 in 49 millis.
parsed sentence 86 in 7 millis.
parsed sentence 85 in 63 millis.
parsed sentence 78 in 294 millis.
parsed sentence 80 in 405 millis.
parsed sentence 60 in 941 millis.
parsed sentence 90 in 0 millis.
parsed sentence 89 in 4 millis.
parsed sentence 93 in 0 millis.
parsed sentence 88 in 6 millis.
parsed sentence 87 in 13 millis.
parsed sentence 92 in 17 millis.
parsed sentence 97 in 11 millis.
parsed sentence 94 in 36 millis.
parsed sentence 95 in 36 millis.
parsed sentence 99 in 0 millis.
parsed sentence 100 in 10 millis.
parsed sentence 101 in 17 millis.
parsed sentence 98 in 44 millis.
parsed sentence 102 in 26 millis.
parsed sentence 96 in 105 millis.
parsed sentence 104 in 42 millis.
parsed sentence 103 in 58 millis.
parsed sentence 107 in 0 millis.
parsed sentence 106 in 6 millis.
parsed sentence 108 in 49 millis.
parsed sentence 112 in 5 millis.
parsed sentence 113 in 7 millis.
parsed sentence 91 in 218 millis.
parsed sentence 111 in 55 millis.
parsed sentence 116 in 17 millis.
parsed sentence 114 in 88 millis.
parsed sentence 109 in 150 millis.
parsed sentence 117 in 49 millis.
parsed sentence 118 in 52 millis.
parsed sentence 121 in 9 millis.
parsed sentence 122 in 0 millis.
parsed sentence 120 in 60 millis.
parsed sentence 119 in 163 millis.
parsed sentence 115 in 233 millis.
parsed sentence 123 in 126 millis.
parsed sentence 126 in 9 millis.
parsed sentence 125 in 96 millis.
parsed sentence 128 in 70 millis.
parsed sentence 129 in 15 millis.
parsed sentence 130 in 23 millis.
parsed sentence 124 in 215 millis.
parsed sentence 131 in 23 millis.
parsed sentence 127 in 121 millis.
parsed sentence 105 in 956 millis.
java.util.concurrent.ExecutionException: java.lang.Error: Error: could not match input
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at edu.cmu.cs.lti.ark.fn.Semafor$2.run(Semafor.java:176)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.Error: Error: could not match input
at uk.ac.susx.informatics.Morpha.zzScanError(Morpha.java:52702)
at uk.ac.susx.informatics.Morpha.next(Morpha.java:54582)
at edu.cmu.cs.lti.ark.util.nlp.MorphaLemmatizer.getLemma(MorphaLemmatizer.java:23)
at edu.cmu.cs.lti.ark.util.nlp.Lemmatizer$1.apply(Lemmatizer.java:17)
at edu.cmu.cs.lti.ark.util.nlp.Lemmatizer$1.apply(Lemmatizer.java:15)
at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:491)
at java.util.AbstractList$Itr.next(AbstractList.java:358)
at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
at com.google.common.collect.ImmutableList.copyFromCollection(ImmutableList.java:284)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:253)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.Sentence.<init>(Sentence.java:25)
at edu.cmu.cs.lti.ark.util.nlp.Lemmatizer.addLemmas(Lemmatizer.java:15)
at edu.cmu.cs.lti.ark.fn.Semafor.addLemmas(Semafor.java:330)
at edu.cmu.cs.lti.ark.fn.Semafor.parseSentence(Semafor.java:225)
at edu.cmu.cs.lti.ark.fn.Semafor$3.call(Semafor.java:199)
at edu.cmu.cs.lti.ark.fn.Semafor$3.call(Semafor.java:195)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Exception in thread "Thread-2" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Error: Error: could not match input
at edu.cmu.cs.lti.ark.fn.Semafor$2.run(Semafor.java:182)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.util.concurrent.ExecutionException: java.lang.Error: Error: could not match input
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at edu.cmu.cs.lti.ark.fn.Semafor$2.run(Semafor.java:176)
... 1 more
Caused by: java.lang.Error: Error: could not match input
at uk.ac.susx.informatics.Morpha.zzScanError(Morpha.java:52702)
at uk.ac.susx.informatics.Morpha.next(Morpha.java:54582)
at edu.cmu.cs.lti.ark.util.nlp.MorphaLemmatizer.getLemma(MorphaLemmatizer.java:23)
at edu.cmu.cs.lti.ark.util.nlp.Lemmatizer$1.apply(Lemmatizer.java:17)
at edu.cmu.cs.lti.ark.util.nlp.Lemmatizer$1.apply(Lemmatizer.java:15)
at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:491)
at java.util.AbstractList$Itr.next(AbstractList.java:358)
at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
at com.google.common.collect.ImmutableList.copyFromCollection(ImmutableList.java:284)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:253)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.Sentence.<init>(Sentence.java:25)
at edu.cmu.cs.lti.ark.util.nlp.Lemmatizer.addLemmas(Lemmatizer.java:15)
at edu.cmu.cs.lti.ark.fn.Semafor.addLemmas(Semafor.java:330)
at edu.cmu.cs.lti.ark.fn.Semafor.parseSentence(Semafor.java:225)
at edu.cmu.cs.lti.ark.fn.Semafor$3.call(Semafor.java:199)
at edu.cmu.cs.lti.ark.fn.Semafor$3.call(Semafor.java:195)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
parsed sentence 135 in 8 millis.
parsed sentence 136 in 44 millis.
parsed sentence 132 in 75 millis.
parsed sentence 134 in 109 millis.
parsed sentence 133 in 594 millis.

runMalt.sh tweak

Should bin/runMalt.sh have a path for java on line 71?

time ${JAVA_HOME_BIN}/java -Xmx2g -jar maltparser-1.7.2.jar ...

BTW we're experimenting with -XX:+UseNUMA -- would you expect a speed-up? We're not seeing any improvements in median speed, but the occasional slower processing runs vanish. I don't know if there's a simple way to test for a NUMA-capable machine.

Cheers,
David
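
On the question of testing for a NUMA-capable machine: on Linux, one rough check is to count the node directories under /sys/devices/system/node, where a count above one suggests the kernel exposes more than one NUMA node. The sketch below assumes that sysfs layout and Java 7 or later; NumaCheck and countNumaNodes are hypothetical names and are not part of SEMAFOR.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class NumaCheck {
    /** Counts entries named node<N> in the given directory; returns 0 if the directory is missing. */
    public static int countNumaNodes(Path sysNodeDir) throws IOException {
        if (!Files.isDirectory(sysNodeDir)) {
            return 0; // not Linux, or sysfs does not expose NUMA information
        }
        int count = 0;
        // The glob matches node0, node1, ... but not files such as "possible" or "online".
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(sysNodeDir, "node[0-9]*")) {
            for (Path ignored : entries) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the NUMA topology on Linux.
        int nodes = countNumaNodes(Paths.get("/sys/devices/system/node"));
        System.out.println("NUMA nodes visible: " + nodes);
    }
}
```

A result of 0 or 1 would suggest that -XX:+UseNUMA has little to work with on that machine.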

Results different from online demo

I'm getting different results with this code than with the online demo at http://demo.ark.cs.cmu.edu/parse

For the sentence "she is working on making a wonderful cake" the demo gives me the frames Working_on, Manufacturing, Desirability, and Food.

When I run that same sentence through the code here I get the frames Working_on, Causality, Desirability, and Food.

I've seen differences on a lot of sentences, and I much prefer the results from the demo. Is there any way you can tell me how to get those results with this code?

In config.sh there is a line export TURBO_MODEL_DIR="{BASE_DIR}/models/turbo_20130606".

Is the Turbo model what I need instead of the Malt model? If so, can you tell me where I can get it and how to make sure the code uses it?

Thanks!

Exception in thread "main" java.util.concurrent.RejectedExecutionException

I was running Semafor on a dataset with 404291 sentences.

However it stopped at sentence 12339 with the following errors.

Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@277c0f21 rejected from java.util.concurrent.ThreadPoolExecutor@6073f712[Shutting down, pool size = 50, active threads = 50, queued tasks = 17, completed tasks = 12335]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at edu.cmu.cs.lti.ark.fn.Semafor.runParser(Semafor.java:195)
at edu.cmu.cs.lti.ark.fn.Semafor.main(Semafor.java:100)

Thanks for any advice and help.
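
The message above reports the pool as "Shutting down" at the moment the task was submitted. A ThreadPoolExecutor with the default AbortPolicy rejects any task submitted after shutdown() has been called. The sketch below reproduces only that behavior in isolation; RejectedAfterShutdown is a hypothetical class, not SEMAFOR code, and the trace does not show why the pool was shut down in the original run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class RejectedAfterShutdown {
    /** Returns true if submitting to an already shut-down pool is rejected. */
    public static boolean submitAfterShutdownIsRejected() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.shutdown(); // from here on the pool accepts no new tasks
        try {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    // no-op task; it should never be accepted
                }
            });
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + submitAfterShutdownIsRejected());
    }
}
```

If something similar happens in the real run, the thing to look for is whatever triggers the shutdown before all sentences have been submitted, for example an earlier failure in a worker thread.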

Part of speech tagger is broken

**********************************************************************
Part-of-speech tagging tokenized data....
/mnt/d/Programs/semafor/master3/scripts/jmx /mnt/d/Programs/semafor/master3
Read 11692 items from tagger.project/word.voc
Read 45 items from tagger.project/tag.voc
Read 42680 items from tagger.project/tagfeatures.contexts
Read 42680 contexts, 117558 numFeatures from tagger.project/tagfeatures.fmap
Read model tagger.project/model : numPredictions=45, numParams=117558
Read tagdict from tagger.project/tagdict
*This is MXPOST (Version 1.0)*
*Copyright (c) 1997 Adwait Ratnaparkhi*
I/O Error during tokenize: java.io.IOException: Incorrect function

Converting postagged input to conll

Hello,
when I try to run SEMAFOR, it stops at the "Converting postagged input to conll" phase.

Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129
TEMP_DIR: /tmp/semafor.oHswfdoPiw

Tokenizing file: Data/Cause.txt

real 0m0.039s
user 0m0.000s
sys 0m0.000s
Finished tokenization.

Part-of-speech tagging tokenized data....
/opt/semafor/scripts/jmx /opt/semafor/bin
Read 11692 items from tagger.project/word.voc
Read 45 items from tagger.project/tag.voc
Read 42680 items from tagger.project/tagfeatures.contexts
Read 42680 contexts, 117558 numFeatures from tagger.project/tagfeatures.fmap
Read model tagger.project/model : numPredictions=45, numParams=117558
Read tagdict from tagger.project/tagdict
This is MXPOST (Version 1.0)
Copyright (c) 1997 Adwait Ratnaparkhi
Sentence: 0 Length: 1 Elapsed Time: 0.024 seconds.
Sentence: 1 Length: 0 Elapsed Time: 0.0 seconds.

real 0m1.937s
user 0m0.800s
sys 0m0.048s
/opt/semafor/bin
Finished part-of-speech tagging tokenized data.

Converting postagged input to conll.
Exception in thread "main" java.lang.IllegalArgumentException:
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:83)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:115)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:100)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.convertStream(ConvertFormat.java:94)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.main(ConvertFormat.java:76)
Caused by: java.lang.IllegalArgumentException: PosToken must have 2 "_"-separated fields
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.Token.fromPosTagged(Token.java:248)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$2.decodeToken(SentenceCodec.java:28)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:79)
... 6 more

Any help you can give will be greatly appreciated.
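
The root cause line says each POS-tagged token must have 2 "_"-separated fields, and the tagger log reports a sentence of length 0, which may point to a blank line in the input. A minimal sketch for checking a POS-tagged line before conversion follows. It assumes tokens are whitespace-separated and split on every "_"; PosTaggedCheck is a hypothetical helper and is not SEMAFOR's Token.fromPosTagged, whose exact splitting rule is not shown in the trace.

```java
import java.util.ArrayList;
import java.util.List;

final class PosTaggedCheck {
    /** True if the token splits into exactly two non-empty fields on "_". */
    public static boolean hasTwoFields(String token) {
        String[] fields = token.split("_", -1);
        return fields.length == 2 && !fields[0].isEmpty() && !fields[1].isEmpty();
    }

    /** Returns the tokens on a line that fail the check; a blank line yields one empty token. */
    public static List<String> badTokens(String line) {
        List<String> bad = new ArrayList<String>();
        for (String token : line.trim().split("\\s+")) {
            if (!hasTwoFields(token)) {
                bad.add(token);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Example lines: a well-formed one, a blank one, and one with an untagged word.
        String[] lines = { "The_DT dog_NN", "", "The_DT dog" };
        for (int i = 0; i < lines.length; i++) {
            System.out.println("line " + i + ": " + badTokens(lines[i]).size() + " bad token(s)");
        }
    }
}
```

Running a check like this over the tagger output, or removing blank lines from the input file first, could help narrow down which line triggers the exception.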

ArrayIndexOutOfBoundsException

On the current git/master version, I get an ArrayIndexOutOfBoundsException:

java.lang.ArrayIndexOutOfBoundsException: 20
    at edu.cmu.cs.lti.ark.util.nlp.parse.DependencyParse.getHeuristicHead(DependencyParse.java:383)

I call it with the CoreNLP parse of the sentence:

"If I had to rely on a traditional pension, I would not have had the flexibility that I currently enjoy."

The whole session, files, and stack trace are at https://gist.github.com/vanatteveldt/9180176
