deep-named-entity-recognition's Issues

The training data origin

I am writing my diploma thesis, and while searching for a dataset I found this project, which helped me tremendously (I have since seen the same dataset in several other projects too). Thank you for your work.
Could you tell me the origin of the dataset? I am using it as well, but I cannot find any mention of it on the internet, so I cannot cite it properly in my work.

Memory overflow

Hi!

1.) I replaced your wordvec.txt with a new one (trained with Google's word2vec on 2.5 M tokens, 100 features).
2.) I changed the hardcoded 300 values to 100.
3.) I replaced the annotated corpus with a new one containing 9576 words.
4.) I started ner_train.py.
I get a massive memory overflow at this line in ner_train.py:
reader = DataUtil(WORDVEC_FILEPATH, TAGGED_NEWS_FILEPATH)
In data_util.py, this line consumes a huge amount of memory (after 4000 rows of raw_data it fills 120 GB of RAM, including swap):
self.wordvecs = np.vstack((self.wordvecs, new_wv))
I get the same error when I limit the word-vector file to 100000 or even 1000 rows.
Can you help me?
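
For reference, a minimal sketch of the pattern in question. Calling np.vstack inside a loop copies the whole accumulated array on every iteration, so a common alternative is to collect the new rows in a Python list and stack them once at the end. The function name build_wordvecs, the random vectors, and the 100-feature dimension are assumptions made for illustration, not the code in data_util.py, and this may or may not be the cause of the memory growth reported here.

```python
import numpy as np

def build_wordvecs(base_wordvecs, n_new, dim=100, seed=0):
    """Append n_new vectors to base_wordvecs with a single stack.

    Hypothetical helper: the random vectors stand in for whatever
    new_wv holds in the real data_util.py.
    """
    rng = np.random.default_rng(seed)
    rows = [base_wordvecs]  # start from the existing matrix
    for _ in range(n_new):
        # One new row per unseen word, kept in the same dtype as the base
        new_wv = rng.uniform(-0.25, 0.25, size=(1, dim)).astype(base_wordvecs.dtype)
        rows.append(new_wv)
    # A single copy at the end instead of one full copy per iteration
    return np.vstack(rows)

base = np.zeros((1000, 100), dtype=np.float32)
wordvecs = build_wordvecs(base, n_new=50)
print(wordvecs.shape)  # (1050, 100)
```

If memory still grows with only 1000 rows of word vectors, it may be worth checking how often this line runs and the shape of new_wv on each pass.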

Value Error

ValueError: Improper config format: {u'l2': 0.0, u'name': u'WeightRegularizer', u'l1': 0.0}

Any idea what causes this?

Value Error

Using TensorFlow backend.
Traceback (most recent call last):
File "ner_train.py", line 16, in <module>
layer_arg = int(sys.argv[2])
ValueError: invalid literal for int() with base 10: 'news_tagged_data.txt'

This happens when I run: python3 ner_train.py wordvecs.txt news_tagged_data.txt
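
The traceback shows that ner_train.py converts its second command-line argument with int(), so a filename in that position fails. Below is a minimal sketch of that failure with a more descriptive check. parse_layer_arg is a hypothetical helper, not part of the repository, and what the script expects for its other arguments is not stated in this thread.

```python
def parse_layer_arg(argv):
    """Return argv[2] as an int, or exit with a descriptive message.

    Hypothetical helper mirroring `layer_arg = int(sys.argv[2])`.
    """
    if len(argv) < 3:
        raise SystemExit("expected an integer as the second argument")
    try:
        return int(argv[2])
    except ValueError:
        raise SystemExit(f"second argument must be an integer, got {argv[2]!r}")

# An integer in the second position parses cleanly
print(parse_layer_arg(["ner_train.py", "wordvecs.txt", "2"]))
```

Called with the arguments from the command above, the helper exits with a message naming 'news_tagged_data.txt' instead of raising a bare ValueError.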
