hypenet's Issues

What is the format of Wikipedia? XML, JSON, or text?

Thanks. I just used the XML but it failed. It says: "ValueError: Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')). Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].sent_start."
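For reference, a minimal sketch of the fix the error message itself suggests (spacy 2.x API; the model name and the disabled components are assumptions about how the pipeline is loaded, not the repository's exact code):

```python
# Minimal sketch, not the repository's code: add a rule-based sentencizer so
# that doc.sents works even when the dependency parser is not in the pipeline.
import spacy

# Assumption: an English model loaded without the parser, as the error implies.
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
nlp.add_pipe(nlp.create_pipe('sentencizer'))   # sets sentence boundaries

doc = nlp(u"Wikipedia is a free online encyclopedia. It was launched in 2001.")
for sent in doc.sents:
    print(sent.text)
```

Alternatively, keeping the dependency parser enabled also sets sentence boundaries, at the cost of slower processing.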

parse_wikipedia.py produces a very large file with a newer version of spacy

The original corpus in the paper was processed using spacy version 0.99. Using a newer spacy version creates a much larger triplet file (over 11TB, while the original file was ~900GB). For now the possible solutions are:

  1. Use spacy version 0.99 - install using:
    pip install spacy==0.99
    python -m spacy.en.download all --force

  2. Limit parse_wikipedia.py to a specific vocabulary, as in LexNET (a minimal sketch of this kind of filtering follows below).
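For option 2, a rough sketch of the intended filtering (the vocabulary file name and the tab-separated triple format are assumptions, not the repository's exact interface): keep only (x, path, y) triples whose terms appear in a given vocabulary.

```python
# Hypothetical sketch of limiting extracted triples to a fixed vocabulary,
# in the spirit of LexNET; file names and the triple format are assumptions.
def load_vocabulary(vocab_file):
    with open(vocab_file) as f_in:
        return set(line.strip().lower() for line in f_in)

def filter_triples(triples_file, out_file, vocab):
    with open(triples_file) as f_in, open(out_file, 'w') as f_out:
        for line in f_in:
            x, path, y = line.rstrip('\n').split('\t')
            # Keep a triple only if both terms are in the vocabulary.
            if x.lower() in vocab and y.lower() in vocab:
                f_out.write(line)

vocab = load_vocabulary('vocabulary.txt')
filter_triples('wiki_triples.txt', 'wiki_triples_filtered.txt', vocab)
```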

I'm working on figuring out what changed in the newer spacy version and on writing a memory-efficient version of parse_wikipedia.py, in case the older spacy version is the buggy one and the number of paths should in fact be much larger.

Thanks @christos-c for finding this bug!

wikipedia dump file

Hey Vered,
I am very interested in trying your code too, but I don't know the format of the Wikipedia dump file. Is it XML or JSON?

dynet version

Hi, I got an error like this:
terminate called after throwing an instance of 'std::invalid_argument'
what(): Attempting to define parameters before initializing DyNet. Be sure to call dynet::initialize() before defining your model.
Aborted (core dumped)
What's your dynet version? How can I fix it?
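The issue doesn't say which DyNet version the repository targets, but for reference, a minimal sketch of the usual pattern (dimensions and names are placeholders): in the Python bindings, DyNet is initialized when the dynet module is imported, so the import has to happen before any model or parameters are created.

```python
# Minimal sketch, not the repository's code; dimensions are placeholders.
# In the Python bindings, DyNet is initialized at import time, so importing
# dynet must precede any parameter definitions.
import dynet as dy            # initialization happens here

model = dy.Model()                         # alias of ParameterCollection in dynet >= 2.0
W1 = model.add_parameters((60, 120))       # placeholder dimensions
b1 = model.add_parameters((60,))
builder = dy.LSTMBuilder(1, 50, 60, model)
```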

False Negatives in the dataset

Hello, upon experimenting with the dataset I came across several examples where a hypernym relationship exists but is labelled as False (mostly novels).
Here are a few examples from the test dataset (lexical split) -

saraswatichandra novel False
pollyanna novel False
jurassic park novel False
makamisa novel False
the hunger games novel False
the secret novel False
...

You mention in the paper that the dataset was created via distant supervision and that only the positives were manually audited. Would it be accurate to say that the dataset is noisy and needs some cleaning, or are these, in your view, genuinely False annotations?
Thank you

Bug in saving the model

Currently, only the NN parameters are saved (lookup tables, W1, b1, etc.), but the LSTM parameters are not.
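A rough sketch of the intended behavior (dynet >= 2.0 API; names and dimensions are placeholders, not the repository's exact code): if the LSTM builder is created from the same parameter collection as the rest of the network, saving that collection also saves the LSTM weights, and populating a collection built in the same creation order restores them.

```python
# Hypothetical sketch (dynet >= 2.0 API); names and dimensions are placeholders.
import dynet as dy

model = dy.ParameterCollection()
builder = dy.LSTMBuilder(1, 50, 60, model)    # LSTM parameters live in `model`
W1 = model.add_parameters((60, 120))
b1 = model.add_parameters((60,))

model.save('hypenet.model')                   # saves W1, b1 *and* the LSTM parameters

# Loading: recreate the parameters in the same order, then populate.
model2 = dy.ParameterCollection()
builder2 = dy.LSTMBuilder(1, 50, 60, model2)
W1_2 = model2.add_parameters((60, 120))
b1_2 = model2.add_parameters((60,))
model2.populate('hypenet.model')
```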

KeyError: '\xf0\x93\x86\x8e\xf0\x93\x85\x93\xf0\x93\x8f\x8f\xf0\x93\x8a\x96'

Hello, I need to reproduce the results on a subset of your dataset, and I ran into several problems: the parsing process being killed, an ASCII error in create_*_1.py, and a KeyError in create_*_2.py. Some of them are the same as @wt123u reported in another issue.

I deleted the & before line 40 in create_*.sh to solve the killed-process problem.

I added sys.setdefaultencoding('utf-8') to solve the ASCII error.

Then I hit the KeyError in create_*_2.py. I tried to work around it by moving x_id, y_id, path_id = term_to_id_db[x], term_to_id_db[y], path_to_id_db.get(path, -1) into the try block, and I ended up with a db file of nearly 70GB. When I train the model, it prints Pairs without paths: 1549, all dataset: 20314. Continuing to train could damage the results, so the comparison would be unfair.

I am using the 20181201 version of the wiki dump and spacy 1.9.0. Could the different versions or the above changes be the cause of the KeyError? What can I do to get fair results? Thanks!
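Not the repository's code, but a minimal sketch of the kind of guard described above (term_to_id_db and path_to_id_db are the stores named in this issue; the function signature and the (x, y, path) iterable are assumptions): look both terms up with .get() and skip pairs whose terms are missing instead of letting the KeyError abort the run.

```python
# Hypothetical sketch; term_to_id_db / path_to_id_db are the stores named above,
# the function signature and the (x, y, path) iterable are assumptions.
def ids_for_triples(triples, term_to_id_db, path_to_id_db):
    ids, skipped = [], 0
    for x, y, path in triples:
        x_id = term_to_id_db.get(x)
        y_id = term_to_id_db.get(y)
        path_id = path_to_id_db.get(path, -1)     # -1 for unknown paths, as above
        if x_id is None or y_id is None:
            skipped += 1                          # count instead of crashing
            continue
        ids.append((x_id, y_id, path_id))
    return ids, skipped
```

Logging how many pairs are skipped also makes it easier to tell whether the missing keys come from the different dump/spacy versions or from terms that were never extracted.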
