jiangfeng1124 / berkeleyparser

Automatically exported from code.google.com/p/berkeleyparser

Java 100.00%

berkeleyparser's Issues

Unable to load model from jar resource

Currently the Load method requires a file name, so a model packaged as a
jar resource cannot be loaded. It could be minimally refactored to also
accept an input stream, e.g.:
public static ParserData Load(InputStream inStream) {
    ParserData pData = null;
    try {
      GZIPInputStream gzis = new GZIPInputStream(inStream); // Compressed
      ObjectInputStream in = new ObjectInputStream(gzis); // Load objects
      pData = (ParserData)in.readObject(); // Read the mix of grammars
      in.close(); // And close the stream.
    } catch (IOException e) {
      System.out.println("IOException\n"+e);
      return null;
    } catch (ClassNotFoundException e) {
      System.out.println("Class not found!");
      return null;
    }
    return pData;
  }

  public static ParserData Load(String fileName) {
    FileInputStream fis = null;
    try {
      fis = new FileInputStream(fileName); // Load from file
      return Load(fis);
    } catch (IOException e) {
      System.out.println("IOException\n"+e);
      return null;
    }
    finally {
      try {
        if (fis != null) fis.close();
      }
      catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
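
For illustration, the stream-based loading pattern proposed above can be tried without the parser classes. The sketch below is only an assumption-laden stand-in: StreamLoadSketch, loadGzipped and saveGzipped are hypothetical names, and a plain String takes the place of ParserData. It shows that a gzip-compressed serialized object can be read from any InputStream, not just a FileInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in class; not part of berkeleyparser.
public class StreamLoadSketch {

    // Reads one gzip-compressed serialized object from any stream.
    // try-with-resources closes the stream even if readObject fails.
    public static Object loadGzipped(InputStream inStream)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new GZIPInputStream(inStream))) {
            return in.readObject();
        }
    }

    // Writes one object in the same gzip + Java serialization layout,
    // so the example has something to load.
    public static byte[] saveGzipped(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // A String stands in for the ParserData object (assumption).
        byte[] data = saveGzipped("stand-in for ParserData");
        Object loaded = loadGzipped(new ByteArrayInputStream(data));
        System.out.println(loaded);
    }
}
```

With a Load(InputStream) overload like the one proposed, a grammar bundled in a jar could presumably be opened with Class.getResourceAsStream (for example a hypothetical resource path such as "/eng_sm5.gr") and passed straight in.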

Original issue reported on code.google.com by [email protected] on 24 Jul 2009 at 4:36

Setting binarization type of a parser

What steps will reproduce the problem?

public Parser getParser(String grammarFile, Options opts) {
    double threshold = 1.0;
    ParserData pData = ParserData.Load(grammarFile);
    Grammar grammar = pData.getGrammar();
    Numberer.setNumberers(pData.getNumbs());
    Parser parser = new CoarseToFineMaxRuleParser(grammar, pData.getLexicon(),
        threshold, -1, opts.viterbi, opts.substates, opts.scores,
        opts.accurate, false, true, true);
    // parser.binarization = pData.getBinarization(); // HERE LIES THE ISSUE
    return parser;
}

What is the expected output? What do you see instead?

Since the 'binarization' attribute of the parser has package-level access,
there seems to be no way of setting the binarization type from outside the
package.

Suggestion: create a setter for the binarization attribute.
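
A minimal sketch of what such a setter might look like, assuming the field is package-private. The Parser class and Binarization enum below are hypothetical stand-ins written for this example, not the actual berkeleyparser types.

```java
// Hypothetical stand-in classes; not the real berkeleyparser code.
public class BinarizationSetterSketch {

    // Stand-in for the library's binarization type (assumed values).
    public enum Binarization { LEFT, RIGHT }

    public static class Parser {
        // Package-private field: not assignable from other packages.
        Binarization binarization = Binarization.RIGHT;

        // Public setter gives callers outside the package a way in.
        public void setBinarization(Binarization binarization) {
            if (binarization == null) {
                throw new IllegalArgumentException("binarization must not be null");
            }
            this.binarization = binarization;
        }

        public Binarization getBinarization() {
            return binarization;
        }
    }

    public static void main(String[] args) {
        Parser parser = new Parser();
        parser.setBinarization(Binarization.LEFT);
        System.out.println(parser.getBinarization()); // prints LEFT
    }
}
```

With a setter like this, the commented-out line in getParser could become parser.setBinarization(pData.getBinarization()).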

Original issue reported on code.google.com by [email protected] on 21 Jul 2009 at 10:00

No 5 split-merge cycle grammar for English in Downloads

The README advises using a grammar with 5 split-merge cycles when parsing
non-WSJ text. Since most of the text in the universe is actually not from
the WSJ, it would be most useful if this 5 split-merge grammar were
available in the downloads.

Cheers

Original issue reported on code.google.com by [email protected] on 23 Mar 2012 at 7:02

multiple spaces in input without -tokenize

What steps will reproduce the problem?

Have two spaces or more between words in input

example: echo "a  b" | java -jar berkeleyParser.jar -gr eng_sm5.gr
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:687)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getCachedSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.score(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.initializeChart(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.doPreParses(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.BerkeleyParser.main(BerkeleyParser.java:190)


If there is only one space, one obtains a parse tree.
echo "a b" | java -jar berkeleyParser2.jar -gr eng_sm5.gr 
( (NP (DT a) (X (SYM b))) )

If you run the parser with tokenization (-tokenize), it works fine.
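
The trace is consistent with an empty token reaching charAt(0). Assuming the untokenized path splits each line on a single space (this is a guess; the parser source is not shown here), the sketch below shows how a double space yields an empty string and how splitting on whitespace runs avoids it. WhitespaceSplitSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of berkeleyparser.
public class WhitespaceSplitSketch {

    // Splitting on a single space keeps an empty token between two spaces.
    public static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Trimming and splitting on runs of whitespace never yields empty tokens.
    public static List<String> safeSplit(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(naiveSplit("a  b")); // prints [a, , b]
        System.out.println(safeSplit("a  b"));  // prints [a, b]
    }
}
```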

Suggestion: track the line number in the input and show it when printing
the trace. Makes debugging easier.
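
One way the line-number suggestion might be implemented is with java.io.LineNumberReader, sketched below. The parseLine method is a hypothetical stand-in that merely fails on empty tokens the way the reported crash does; it is not the real parser call.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of berkeleyparser.
public class LineNumberSketch {

    // Stand-in for the parser: throws on an empty token, like the crash above.
    static String parseLine(String line) {
        for (String word : line.split(" ")) {
            word.charAt(0); // StringIndexOutOfBoundsException for ""
        }
        return "(parsed " + line + ")";
    }

    // Processes every input line and reports the 1-based line number on failure.
    public static List<String> process(Reader input) throws IOException {
        List<String> results = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                try {
                    results.add(parseLine(line));
                } catch (RuntimeException e) {
                    // getLineNumber() counts the lines read so far.
                    results.add("ERROR at input line " + reader.getLineNumber() + ": " + e);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        for (String result : process(new StringReader("a b\na  b\n"))) {
            System.out.println(result);
        }
    }
}
```

Catching the exception per line also lets the remaining input be processed instead of aborting the whole run, which may or may not be the desired behavior.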

Original issue reported on code.google.com by [email protected] on 11 Feb 2009 at 9:15
