- corpus
  - Repository of texts
- models
  - Storage for models
- normalized
  - Storage for normalized texts
- build
  - Code for the project
From the root of the project, run:

p2v [queue] OPTIONS

- queue
  - Optional. Submits the job to run on cloud resources.
- word2vec ARGS
  - Runs the word2vec command directly.
- pre corpus/x/x
  - Normalizes the text.
- raw COMMANDS
  - Runs raw commands in the environment.
- sync_up
  - Syncs local data to cloud archives for processing.
- sync_down
  - Syncs changes on the cloud back to the local file system.
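
As a rough illustration of the `p2v [queue] OPTIONS` synopsis, the sketch below builds the argument list for a call from a script. The helper names `p2v_argv` and `run_p2v` are hypothetical and not part of the project, and it is an assumption that `queue` always goes directly after `p2v` and can prefix any subcommand.

```python
import subprocess

# Subcommands listed in this README.
SUBCOMMANDS = {"word2vec", "pre", "raw", "sync_up", "sync_down"}


def p2v_argv(subcommand, args=(), queue=False):
    """Return the argument list for `p2v [queue] OPTIONS` (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown p2v subcommand: {subcommand}")
    argv = ["p2v"]
    if queue:
        # Assumption: `queue` is placed right after `p2v`, as in the synopsis.
        argv.append("queue")
    argv.append(subcommand)
    argv.extend(args)
    return argv


def run_p2v(subcommand, args=(), queue=False):
    """Run p2v from the current directory, which should be the project root."""
    return subprocess.run(p2v_argv(subcommand, args, queue), check=True)
```

For example, `p2v_argv("pre", ["corpus/x/x"])` gives the same arguments as typing `p2v pre corpus/x/x`, and passing `queue=True` adds `queue` before the subcommand.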
We will add more options for particular functions as we need them, such as:
- train /corpus/x/x/ /model/x/x/
- etc
The corpus is bundled in /corpus in the container. It is also mounted over the top of the container, so you will not need to rebuild if you make changes to it.
Models are bundled in /models in the container. They are also mounted over the top of the container, so you will not need to rebuild if you make changes to them.
Normalized text files are bundled in /normalized in the container. They are also mounted over the top of the container, so you will not need to rebuild if you make changes to them.
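
The mount-over-bundle arrangement can be sketched as follows. This assumes Docker is the container runtime, which this README does not state. The image tag `p2v:latest` and the helper `docker_run_argv` are hypothetical. It also assumes the local corpus, models and normalized directories sit at the project root. Only the container paths /corpus, /models and /normalized come from the text above.

```python
from pathlib import Path

# Local directory name -> path inside the container.
MOUNTS = {
    "corpus": "/corpus",
    "models": "/models",
    "normalized": "/normalized",
}


def docker_run_argv(project_root, image="p2v:latest", command=()):
    """Build a `docker run` argument list that bind-mounts each data directory.

    Assumptions: Docker is the runtime and `image` is a hypothetical tag.
    """
    root = Path(project_root).resolve()
    argv = ["docker", "run", "--rm"]
    for host_dir, container_path in MOUNTS.items():
        # A bind mount hides the copy bundled in the image at the same path,
        # so local edits are visible without rebuilding the image.
        argv += ["-v", f"{root / host_dir}:{container_path}"]
    argv.append(image)
    argv.extend(command)
    return argv
```

Because each bind mount shadows the bundled copy at the same path, the bundled data only acts as a fallback when the container runs without the mounts.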