chalk's People

Contributors

bethard, dlwh, hamnis, jasonbaldridge

chalk's Issues

Input file path is misinterpreted by MascSlab

The code below is meant to print all sentences in data.txt, but it fails with an exception (stack trace attached). From the trace, the input file path appears to be misinterpreted as:

/home/codeWorld/workspaces/scala-nlp/target/scala-2.10/classes/data-s.xml

object ChalkDemo extends App {
  val url = this.getClass.getResource("/data.txt")
  val slab = MascSlab(url)
  val sentSlab = MascSlab.s(slab)
}

Exception in thread "main" java.io.FileNotFoundException: /home/codeWorld/workspaces/scala-nlp/target/scala-2.10/classes/data-s.xml (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:613)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:189)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:812)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333)
at scala.xml.factory.XMLLoader$class.loadXML(XMLLoader.scala:40)
at scala.xml.XML$.loadXML(XML.scala:57)
at scala.xml.factory.XMLLoader$class.load(XMLLoader.scala:54)
at scala.xml.XML$.load(XML.scala:57)
at chalk.corpora.MascSlab$.s(MascUtil.scala:325)
at com.prassee.chalk.ChalkDemo$delayedInit$body.apply(ChalkDemo.scala:10)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at com.prassee.chalk.ChalkDemo$.main(ChalkDemo.scala:6)
at com.prassee.chalk.ChalkDemo.main(ChalkDemo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
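
The trace shows MascSlab.s trying to load data-s.xml from the same directory as data.txt, so the call seems to expect a sentence annotation file next to the text file. As a minimal sketch, assuming that naming rule is exactly what the trace suggests, the check below derives the sibling path and tests whether it exists before MascSlab.s is called. MascPaths, sentenceAnnotationUrl and existsOnDisk are hypothetical names used for illustration; they are not part of chalk.

```scala
import java.io.File
import java.net.{URI, URL}

object MascPaths {
  // Hypothetical helper, not part of chalk: mirrors the naming visible in the
  // stack trace, where data.txt was resolved to data-s.xml.
  def sentenceAnnotationUrl(textUrl: URL): URL = {
    val text = textUrl.toString
    require(text.endsWith(".txt"), s"expected a .txt URL, got $text")
    new URI(text.stripSuffix(".txt") + "-s.xml").toURL
  }

  // True only for file: URLs that point at an existing regular file.
  def existsOnDisk(url: URL): Boolean =
    url.getProtocol == "file" && new File(url.toURI).isFile
}
```

Calling MascPaths.existsOnDisk(MascPaths.sentenceAnnotationUrl(url)) before MascSlab.s(slab) would turn the FileNotFoundException into an explicit check that the annotation file is missing.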

Transient NullPointerException in sentence detection

When trying sentenceDetector.sentDetect(someText), I sometimes get:

NullPointerException: null (GISModel.java:127)
[error] chalk.learn.model.IndexHashTable.get(IndexHashTable.java:118)
[error] chalk.learn.maxent.GISModel.eval(GISModel.java:127)
[error] chalk.learn.maxent.GISModel.eval(GISModel.java:107)
[error] chalk.learn.maxent.GISModel.eval(GISModel.java:99)
[error] chalk.tools.sentdetect.SentenceDetectorME.sentPosDetect(SentenceDetectorME.java:185)
[error] chalk.tools.sentdetect.SentenceDetectorME.sentDetect(SentenceDetectorME.java:134)
...

If I rerun the test, the error usually goes away immediately. Has anyone else experienced this?

Is this project active?

Just a question. From the wiki I understand that this project was spawned from Apache OpenNLP. The last commit I can see in this project is from 2014, but Apache OpenNLP seems pretty active.

I am just curious whether development has halted or has moved to another project.

We are a full Scala shop, so we are looking for a reliable NLP library in Scala.

Remove build script from bin/

Since the driver script for chalk (bin/chalk) is intended to be included in the users' PATH, the bin/build script and sbt jar probably should not be in this location.

Remove subpackages that are now in Nak.

When moving breeze.learn to Nak, it was necessary to pull the serialization subpackage and some classes in chalk.data into Nak. These should be removed from Chalk, which can depend on Nak for them (as planned).

Cannot build

On commit f073dd1, executing ./build update compile results in an error. Here's what I did to reproduce this error:

git clone git@github.com:scalanlp/chalk.git
cd chalk
./build update compile

Here's the error message:

chalk [master] % ./build update compile
[info] Loading project definition from /home/malcolm/local/development/pure/scala/chalk/project
[info] Set current project to chalk (in build file:/home/malcolm/local/development/pure/scala/chalk/)
[info] Updating {file:/home/malcolm/local/development/pure/scala/chalk/}default-230b82...
[info] Resolving org.scalanlp#breeze-core_2.10;0.4-SNAPSHOT ...
[warn] module not found: org.scalanlp#breeze-core_2.10;0.4-SNAPSHOT
[warn] ==== local: tried
[warn] /home/malcolm/.ivy2/local/org.scalanlp/breeze-core_2.10/0.4-SNAPSHOT/ivys/ivy.xml
[warn] ==== sonatype snapshots: tried
[warn] https://oss.sonatype.org/content/repositories/snapshots/org/scalanlp/breeze-core_2.10/0.4-SNAPSHOT/breeze-core_2.10-0.4-SNAPSHOT.pom
[warn] ==== sonatype releases: tried
[warn] https://oss.sonatype.org/content/repositories/releases/org/scalanlp/breeze-core_2.10/0.4-SNAPSHOT/breeze-core_2.10-0.4-SNAPSHOT.pom
[warn] ==== public: tried
[warn] http://repo1.maven.org/maven2/org/scalanlp/breeze-core_2.10/0.4-SNAPSHOT/breeze-core_2.10-0.4-SNAPSHOT.pom
[info] Resolving org.scalanlp#breeze-math_2.10;0.4-SNAPSHOT ...
[warn] module not found: org.scalanlp#breeze-math_2.10;0.4-SNAPSHOT
[warn] ==== local: tried
[warn] /home/malcolm/.ivy2/local/org.scalanlp/breeze-math_2.10/0.4-SNAPSHOT/ivys/ivy.xml
[warn] ==== sonatype snapshots: tried
[warn] https://oss.sonatype.org/content/repositories/snapshots/org/scalanlp/breeze-math_2.10/0.4-SNAPSHOT/breeze-math_2.10-0.4-SNAPSHOT.pom
[warn] ==== sonatype releases: tried
[warn] https://oss.sonatype.org/content/repositories/releases/org/scalanlp/breeze-math_2.10/0.4-SNAPSHOT/breeze-math_2.10-0.4-SNAPSHOT.pom
[warn] ==== public: tried
[warn] http://repo1.maven.org/maven2/org/scalanlp/breeze-math_2.10/0.4-SNAPSHOT/breeze-math_2.10-0.4-SNAPSHOT.pom
[info] Resolving org.scala-lang#scala-actors;2.10.1 ...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.scalanlp#breeze-core_2.10;0.4-SNAPSHOT: not found
[warn] :: org.scalanlp#breeze-math_2.10;0.4-SNAPSHOT: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.scalanlp#breeze-core_2.10;0.4-SNAPSHOT: not found
unresolved dependency: org.scalanlp#breeze-math_2.10;0.4-SNAPSHOT: not found
at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:214)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:122)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:121)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:117)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:117)
at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:105)
at sbt.IvySbt.liftedTree1$1(Ivy.scala:52)
at sbt.IvySbt.action$1(Ivy.scala:52)
at sbt.IvySbt$$anon$3.call(Ivy.scala:61)
at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:98)
at xsbt.boot.Locks$GlobalLock.withChannelRetries$1(Locks.scala:81)
at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:102)
at xsbt.boot.Using$.withResource(Using.scala:11)
at xsbt.boot.Using$.apply(Using.scala:10)
at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:62)
at xsbt.boot.Locks$GlobalLock.liftedTree1$1(Locks.scala:52)
at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:52)
at xsbt.boot.Locks$.apply0(Locks.scala:31)
at xsbt.boot.Locks$.apply(Locks.scala:28)
at sbt.IvySbt.withDefaultLogger(Ivy.scala:61)
at sbt.IvySbt.withIvy(Ivy.scala:102)
at sbt.IvySbt.withIvy(Ivy.scala:98)
at sbt.IvySbt$Module.withModule(Ivy.scala:117)
at sbt.IvyActions$.update(IvyActions.scala:121)
at sbt.Classpaths$$anonfun$work$1$1.apply(Defaults.scala:955)
at sbt.Classpaths$$anonfun$work$1$1.apply(Defaults.scala:953)
at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$58.apply(Defaults.scala:976)
at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$58.apply(Defaults.scala:974)
at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:35)
at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:978)
at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:973)
at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:45)
at sbt.Classpaths$.cachedUpdate(Defaults.scala:981)
at sbt.Classpaths$$anonfun$47.apply(Defaults.scala:858)
at sbt.Classpaths$$anonfun$47.apply(Defaults.scala:855)
at sbt.Scoped$$anonfun$hf10$1.apply(Structure.scala:586)
at sbt.Scoped$$anonfun$hf10$1.apply(Structure.scala:586)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
at sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$12.apply(Structure.scala:311)
at sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$12.apply(Structure.scala:311)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:41)
at sbt.std.Transform$$anon$5.work(System.scala:71)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
at sbt.Execute.work(Execute.scala:238)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:30)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
[error] sbt.ResolveException: unresolved dependency: org.scalanlp#breeze-core_2.10;0.4-SNAPSHOT: not found
[error] unresolved dependency: org.scalanlp#breeze-math_2.10;0.4-SNAPSHOT: not found
[error] Total time: 2 s, completed Sep 30, 2013 9:56:07 PM

It appears as if there's a problem downloading the SNAPSHOT jars...

Add optimizations for maximum entropy tokenization

Depending on the version of OpenNLP and whichever fork of it you use (in your language of choice), tokenization can be extended with pre-optimizations such as preserving hashtags, URLs, @mentions, email addresses, emoticons, etc. It would be nice if chalk supported these kinds of features.
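
To make the request concrete, here is a rough sketch of the kind of pre-tokenization meant above: a regex pass that keeps URLs, email addresses, @mentions, hashtags and a few emoticons as single tokens before anything else splits the text. This is not chalk or OpenNLP API; PreservingTokenizer and its patterns are assumptions for illustration, and the patterns are deliberately simple (for example, a URL keeps any trailing punctuation attached to it).

```scala
object PreservingTokenizer {
  // Patterns are tried in order; each protects one token type from the list above.
  private val Protected = List(
    """https?://\S+""",                 // URLs
    """[\w.+-]+@[\w-]+(?:\.[\w-]+)+""", // email addresses
    """[@#]\w+""",                      // @mentions and hashtags
    """[:;=][-o]?[)(DPp]"""             // a few common emoticons
  ).mkString("|")

  // Fallback: a run of word characters, or any single non-space symbol.
  private val Token = (Protected + """|\w+|\S""").r

  def tokenize(text: String): List[String] = Token.findAllIn(text).toList
}
```

For example, PreservingTokenizer.tokenize("see http://example.org #nlp @dlwh :-)") keeps the URL, the hashtag, the mention and the emoticon intact. A maximum entropy tokenizer could then be run only on the spans that were not protected.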

Remove OpenNLP code from Chalk

I had thought to do an adaptation of the OpenNLP Tools code as part of the process, but have decided instead to wipe that out and go with the breeze.process->chalk transition directly.
