berkeleyparser's Issues

Unable to load model from jar resource

Currently the load method requires a file name, so a model cannot be loaded
from a resource inside a jar. It could be minimally refactored to accept an
input stream as well, for example:
public static ParserData Load(InputStream inStream) {
    ParserData pData = null;
    try {
      GZIPInputStream gzis = new GZIPInputStream(inStream); // Compressed
      ObjectInputStream in = new ObjectInputStream(gzis); // Load objects
      pData = (ParserData)in.readObject(); // Read the mix of grammars
      in.close(); // And close the stream.
    } catch (IOException e) {
      System.out.println("IOException\n"+e);
      return null;
    } catch (ClassNotFoundException e) {
      System.out.println("Class not found!");
      return null;
    }
    return pData;
  }

  public static ParserData Load(String fileName) {
    FileInputStream fis = null;
    try {
      fis = new FileInputStream(fileName); // Load from file
      return Load(fis);
    } catch (IOException e) {
      System.out.println("IOException\n"+e);
      return null;
    }
    finally {
      try {
        if (fis != null) fis.close();
      }
      catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
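
The stream-based overload above could be exercised roughly as follows. This is only a sketch: StreamLoadSketch, loadGzippedObject and gzipSerialize are hypothetical names that are not part of the Berkeley parser, a plain String stands in for ParserData, and the resource path in the comment is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for the stream-based Load overload proposed above. */
class StreamLoadSketch {

  /** Reads one gzip-compressed, Java-serialized object from any InputStream. */
  static Object loadGzippedObject(InputStream inStream)
      throws IOException, ClassNotFoundException {
    // try-with-resources closes all three streams, even if reading fails.
    try (InputStream raw = inStream;
         GZIPInputStream gzis = new GZIPInputStream(raw);
         ObjectInputStream in = new ObjectInputStream(gzis)) {
      return in.readObject();
    }
  }

  /** Demo helper only: serializes and gzips an object into memory. */
  static byte[] gzipSerialize(Object value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
      out.writeObject(value);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    // In an application the stream would typically come from the classpath, e.g.
    //   InputStream in = SomeClass.class.getResourceAsStream("/eng_sm5.gr");
    // ("/eng_sm5.gr" is an assumed resource path). An in-memory stream stands in here.
    byte[] payload = gzipSerialize("stand-in for ParserData");
    Object loaded = loadGzippedObject(new ByteArrayInputStream(payload));
    System.out.println(loaded);
  }
}
```

Unlike the snippet in the report, this sketch propagates exceptions instead of printing them and returning null. That is a design choice of the sketch, not something the issue asks for.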


Original issue reported on code.google.com by [email protected] on 24 Jul 2009 at 4:36

Setting binarization type of a parser

What steps will reproduce the problem?

public Parser getParser(String grammarFile, Options opts) {
    double threshold = 1.0;
    ParserData pData = ParserData.Load(grammarFile);
    Grammar grammar = pData.getGrammar();
    Numberer.setNumberers(pData.getNumbs());
    Parser parser = new CoarseToFineMaxRuleParser(grammar,
        pData.getLexicon(), threshold, -1, opts.viterbi, opts.substates,
        opts.scores, opts.accurate, false, true, true);
    // parser.binarization = pData.getBinarization(); // HERE LIES THE ISSUE
    return parser;
}

What is the expected output? What do you see instead?

Since the 'binarization' attribute of the parser is package-level
protected, code outside the parser's package seems to have no way of
setting the binarization type.

Suggestion: create a setter for the binarization attribute.
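
A minimal sketch of what such a setter might look like. BinarizationKind and ParserSketch are hypothetical stand-ins for the parser's real types, and the enum values and the default are assumptions made for illustration only.

```java
/** Hypothetical stand-in for the parser's binarization type. */
enum BinarizationKind { LEFT, RIGHT }

/** Hypothetical stand-in for the parser class discussed in this issue. */
class ParserSketch {
  // Package-private field, mirroring the restricted visibility described above.
  // The default value is an assumption for this sketch.
  BinarizationKind binarization = BinarizationKind.RIGHT;

  /** Proposed public setter, usable from outside the parser's package. */
  public void setBinarization(BinarizationKind binarization) {
    this.binarization = binarization;
  }

  public BinarizationKind getBinarization() {
    return binarization;
  }
}

class SetterDemo {
  public static void main(String[] args) {
    ParserSketch parser = new ParserSketch();
    parser.setBinarization(BinarizationKind.LEFT);
    System.out.println(parser.getBinarization());
  }
}
```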

Original issue reported on code.google.com by [email protected] on 21 Jul 2009 at 10:00

No 5 split-merge cycle grammar for English in Downloads

The README advises using a grammar with 5 split-merge cycles when parsing
non-WSJ text. Since most of the text in the universe is actually not from
the WSJ, it would be most useful if this 5 split-merge grammar were
available in the downloads.

Cheers

Original issue reported on code.google.com by [email protected] on 23 Mar 2012 at 7:02

multiple spaces in input without -tokenize

What steps will reproduce the problem?

Put two or more spaces between words in the input.

example: echo "a  b" | java -jar berkeleyParser.jar -gr eng_sm5.gr
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:687)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getCachedSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.score(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.initializeChart(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.doPreParses(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.BerkeleyParser.main(BerkeleyParser.java:190)


If there is only one space, one obtains a parse tree.
echo "a b" | java -jar berkeleyParser2.jar -gr eng_sm5.gr 
( (NP (DT a) (X (SYM b))) )

If you run the parser with tokenization (-tokenize), it works fine.

Suggestion: track the line number in the input and show it when printing
the trace. This would make debugging easier.
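
The trace fails in charAt at index 0, which would be consistent with an empty token reaching the lexicon. Assuming (this is not confirmed by the report) that the untokenized path splits each line on single spaces, the sketch below shows how splitting on runs of whitespace avoids empty tokens, and how a line counter could be reported on failure. WhitespaceSplitSketch and splitWords are hypothetical names, not part of the parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the parser's actual input handling. */
class WhitespaceSplitSketch {

  /** Splits on runs of whitespace, so repeated spaces never yield empty tokens. */
  static List<String> splitWords(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty()) {
      return new ArrayList<>();
    }
    return Arrays.asList(trimmed.split("\\s+"));
  }

  public static void main(String[] args) throws IOException {
    // Splitting on a single space keeps an empty string between the two spaces.
    System.out.println(Arrays.asList("a  b".split(" ")));

    BufferedReader reader = new BufferedReader(new StringReader("a b\na  b\n"));
    String line;
    int lineNumber = 0;
    while ((line = reader.readLine()) != null) {
      lineNumber++;
      try {
        // A real caller would hand the words to the parser at this point.
        System.out.println(lineNumber + ": " + splitWords(line));
      } catch (RuntimeException e) {
        // Report which input line failed, as the suggestion above proposes.
        System.err.println("Failed on input line " + lineNumber + ": " + e);
      }
    }
  }
}
```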

Original issue reported on code.google.com by [email protected] on 11 Feb 2009 at 9:15
