kpsychas / word_etymologist


Classifier of words based on their origin

License: MIT License

Python 100.00%

linguistic-annotation-framework

word_etymologist's Issues

Create better validation function

The output of the NN is a sequence of classification labels (here True/False),
but it is unlikely, if possible at all, for only one or two consecutive letters to be
classified as True (or False) while the rest of the letters are classified as the opposite.

There is a need for a function, which can be thought of as an extra layer, that makes
the output "smoother".
A smoother output satisfies the following requirements:

- It should be as close as possible to the original output sequence.
- Given a parameter n and a classification sequence S,
  for every label in S there is a contiguous subsequence of length
  n that contains the label, and every label in that subsequence is
  the same.
- If the sequence is shorter than n, all labels in the output sequence
  should be the same.
- Assume that the later labels carry more weight.
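
The requirements above can be sketched as a small dynamic program. This is only one possible reading of them, not the project's implementation: the name smooth_labels, the recency parameter, and the linear weighting of later positions are all assumptions made for illustration. It looks for the sequence whose runs of equal labels all have length at least n and that differs from the input at the lowest total weight.

```python
from typing import List, Sequence


def smooth_labels(labels: Sequence[bool], n: int, recency: float = 1.0) -> List[bool]:
    """Return the closest sequence in which every run of equal labels has length >= n.

    Closeness is a weighted count of changed positions; the weight grows
    linearly with the position (assumed weighting), so later labels are
    more expensive to change.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    m = len(labels)
    if m == 0:
        return []
    weights = [1.0 + recency * i / m for i in range(m)]

    # prefix[c][i] = cost of relabelling the first i positions as c
    prefix = {c: [0.0] for c in (False, True)}
    for label, weight in zip(labels, weights):
        for c in (False, True):
            prefix[c].append(prefix[c][-1] + (weight if label != c else 0.0))

    # Sequence shorter than n: a single label for everything.
    if m < n:
        winner = min((True, False), key=lambda c: prefix[c][m])
        return [winner] * m

    inf = float("inf")
    # best[c][j] = cheapest valid labelling of the first j positions
    # whose last run has label c and ends exactly at j.
    best = {c: [inf] * (m + 1) for c in (False, True)}
    back = {c: [0] * (m + 1) for c in (False, True)}
    best[False][0] = best[True][0] = 0.0
    for j in range(n, m + 1):
        for c in (False, True):
            for i in range(0, j - n + 1):  # last run covers positions i..j-1
                previous = best[not c][i]
                if previous == inf:
                    continue
                cost = previous + prefix[c][j] - prefix[c][i]
                if cost < best[c][j]:
                    best[c][j] = cost
                    back[c][j] = i

    # Walk the back-pointers from the end to rebuild the runs.
    c = min((False, True), key=lambda label: best[label][m])
    smoothed: List[bool] = [False] * m
    j = m
    while j > 0:
        i = back[c][j]
        smoothed[i:j] = [c] * (j - i)
        j, c = i, not c
    return smoothed
```

With n = 3, an isolated False inside a run of True labels is flipped, while a sequence that already consists of runs of length 3 is returned unchanged. The search is quadratic in the word length, which should be acceptable for single words.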

Improve Error Handling

The project currently skips invalid words without printing any details about the error.
Better error handling should include:

- The error type.
- For every foreseen error, a decision on whether to interrupt the program and print a traceback, or to continue training if the error is not fatal.
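
A minimal sketch of this split between fatal and non-fatal errors is shown below. The names InvalidWordError, NON_FATAL, train_all and train_step are hypothetical and do not come from the project; the choice of which exception types count as non-fatal is also an assumption.

```python
import logging

logger = logging.getLogger("word_etymologist")


class InvalidWordError(ValueError):
    """Hypothetical error for a word the model cannot encode."""


# Assumed set of errors that should not stop training.
NON_FATAL = (InvalidWordError, UnicodeError)


def train_all(words, train_step):
    """Call train_step on every word; report skipped words with their error type."""
    skipped = []
    for word in words:
        try:
            train_step(word)
        except NON_FATAL as exc:
            # Foreseen, non-fatal: log the error type and keep training.
            logger.warning("skipping %r: %s: %s", word, type(exc).__name__, exc)
            skipped.append((word, type(exc).__name__))
        except Exception:
            # Anything else is treated as fatal: log the traceback and stop.
            logger.exception("fatal error while training on %r", word)
            raise
    return skipped
```

Returning the list of skipped words together with the error type makes it possible to print a summary at the end of a run.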

Add option of systematic training that uses full dataset

Add an option to train the network over the whole dataset one or more times.
Currently words are picked randomly and some words may never be picked.
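
One way to guarantee that every word is visited is to shuffle the full dataset once per pass instead of sampling words independently. The sketch below illustrates the idea; the function name epoch_order and its parameters are assumptions, not part of train.py.

```python
import random


def epoch_order(words, epochs, seed=None):
    """Yield (epoch, word) pairs so that each word appears exactly once per epoch."""
    rng = random.Random(seed)
    for epoch in range(epochs):
        order = list(words)
        rng.shuffle(order)  # new random order for every pass over the dataset
        for word in order:
            yield epoch, word
```

A training loop could iterate over epoch_order(words, epochs=3) and call its existing single-word training step on each word.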

Train Dataset on Custom Word File

The name of the annotated word file should be a parameter of the train.py script.
Optional extra features:

- Multiple input files can be loaded in the same script and merged into one structure.
- Duplicate words are removed.
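
The sketch below shows how this could look with argparse. The file format is not specified here, so the code assumes one annotated word per line with the word as the first tab-separated field; the argument name word_files and the helper names are hypothetical as well.

```python
import argparse
from pathlib import Path


def build_parser():
    """Command-line parser that accepts one or more annotated word files."""
    parser = argparse.ArgumentParser(description="Train on custom word files")
    parser.add_argument("word_files", nargs="+", type=Path,
                        help="one or more annotated word files")
    return parser


def load_words(paths):
    """Merge all files into one list, keeping the first entry for each word."""
    merged = {}
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                # Assumed format: the word is the first tab-separated field.
                word = line.split("\t", 1)[0]
                merged.setdefault(word, line)
    return list(merged.values())
```

In train.py this might be used as load_words(build_parser().parse_args().word_files).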

Create GUI for interactive session

The following project can be a starting point.

https://github.com/PySimpleGUI/PySimpleGUI

Allow Interactive training of model

This can be a separate script that prompts the user to input a word and, optionally, its expected output.
The model should respond with the predicted output.

Bonus:
Keep a history of training inputs and allow user to reuse them.
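
A rough sketch of such a session loop follows. The predict and train callables stand in for the model and are hypothetical, as are the prompts; input and output are passed in as functions so the loop can also be driven by a script.

```python
def interactive_session(predict, train=None, read=input, write=print):
    """Prompt for words until an empty line is entered; return the training history."""
    history = []
    while True:
        word = read("word (empty to quit): ").strip()
        if not word:
            break
        write(f"predicted: {predict(word)}")
        expected = read("expected output (empty to skip): ").strip()
        if expected:
            # Remember the pair so that it can be reused later.
            history.append((word, expected))
            if train is not None:
                train(word, expected)
    return history
```

The returned history could be saved to a file and replayed through train in a later session.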

Enable training words in batches

Currently words are offered to the network one by one. A better framework would allow multiple words to be trained at the same time. To do that we either need to pad all words in a batch to have the same size, or group words by length and sample words from one group at a time to make a batch.
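
Both options can be sketched in plain Python, independent of the network framework. The helper names, the padding character, and sampling a length group in proportion to its size are assumptions made for illustration.

```python
import random
from collections import defaultdict


def pad_batch(words, pad="\0"):
    """Option 1: pad every word to the longest length and return a validity mask."""
    width = max(len(word) for word in words)
    padded = [word + pad * (width - len(word)) for word in words]
    # mask[i][j] is True where position j of word i holds a real letter
    mask = [[j < len(word) for j in range(width)] for word in words]
    return padded, mask


def length_buckets(words):
    """Option 2, step 1: group words by length."""
    buckets = defaultdict(list)
    for word in words:
        buckets[len(word)].append(word)
    return dict(buckets)


def sample_batch(buckets, batch_size, rng=random):
    """Option 2, step 2: draw a batch from one length group.

    The group is chosen with probability proportional to its size, so
    rare lengths are not over-represented.
    """
    lengths = list(buckets)
    sizes = [len(buckets[length]) for length in lengths]
    length = rng.choices(lengths, weights=sizes, k=1)[0]
    group = buckets[length]
    return rng.sample(group, min(batch_size, len(group)))
```

Padding keeps batches full but needs the mask so that padded positions do not contribute to the loss; grouping by length avoids padding at the cost of smaller batches for rare lengths.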

python train.py --program 1
python train.py --program 2

Create model that handles words of arbitrary length and is bidirectional

Implement Batch Training of Bidirectional LSTM

Add CI/CD support
