This is an interesting endeavor - we want to move data from the hosted Machine API to

S3 hosted machine and ClearML integration (machine issue, closed)

johnml1135 commented on June 2, 2024

Comments (4)

johnml1135 commented on June 2, 2024

So, how is each job structured? We need generic corpora (we have files with the same name but different languages, which may end up conflicting because multiple corpora are selected...)
Assume two folders:

S3:/jobs/<my_job_id>/src
S3:/jobs/<my_job_id>/trg
These files are added:
For each file in a corpus, the file is named <corpus_id>.<normal_name>.txt (it has already been converted to .txt by the time it gets here)
- Since the files mirror in each folder, that is how
- The text files may start with an "ID\t" prefix for Bible references
For keyterms, add files with a ".keyterm" or similar extension
Add the config.yaml file in the root, specifying the source and target text, the keywords and the parent model.
After the whole process is done, the translated files would be at:
S3:/jobs/<my_job_id>/translations/<normal_name>.txt
This could be built in a few stages:

- Update machine.py to use the S3 buckets
- Update machine.py to look for these specific files in this specific way
- Update machine to use S3
- Update machine to push the specific files in the specific way
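
The proposed layout can be sketched as a few key-building helpers. This is only an illustration of the naming scheme described above, not code from machine or machine.py: the function names are hypothetical, and treating any line that contains a tab as carrying a reference ID is an assumption based on the "ID\t" note.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def job_prefix(job_id: str) -> PurePosixPath:
    # Root "folder" for one job: jobs/<my_job_id>
    return PurePosixPath("jobs") / job_id


def corpus_file_key(job_id: str, side: str, corpus_id: str, normal_name: str) -> str:
    # Prefixing with the corpus id keeps same-named files from different
    # corpora from colliding inside the src/ and trg/ folders.
    if side not in ("src", "trg"):
        raise ValueError(f"side must be 'src' or 'trg', got {side!r}")
    return str(job_prefix(job_id) / side / f"{corpus_id}.{normal_name}.txt")


def translation_key(job_id: str, normal_name: str) -> str:
    # Where the translated output would be written once the job is done.
    return str(job_prefix(job_id) / "translations" / f"{normal_name}.txt")


def split_reference(line: str) -> Tuple[Optional[str], str]:
    # Assumption: a line with a tab is "<reference ID>\t<text>";
    # a line without a tab is plain text with no reference.
    ref, sep, text = line.rstrip("\n").partition("\t")
    return (ref, text) if sep else (None, ref)
```

For example, `corpus_file_key("job1", "src", "c1", "MAT")` gives `jobs/job1/src/c1.MAT.txt`, and the matching target file differs only in the `trg` folder.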

ddaspit commented on June 2, 2024

Most of the job code has already been implemented. I found a bug in the .NET library that I am using to access S3. I am working on fixing that now. Once that is fixed, we just need S3 buckets to point at. Here is the current structure:

s3://bucket_name/builds/build_id/
- train.src.txt: source sentences from all files in all corpora
- train.trg.txt: target sentences from all files in all corpora
- pretranslate.src.json: source sentences to pretranslate
- pretranslate.trg.json: pretranslated target sentences
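
A minimal sketch of how a client might address these four files, assuming only the structure listed above. The dictionary keys and the function name are hypothetical; the file names and the `builds/` prefix come from the listing.

```python
# File names copied from the build structure above; the short "kind"
# labels used as dictionary keys are made up for this example.
BUILD_FILES = {
    "train_src": "train.src.txt",
    "train_trg": "train.trg.txt",
    "pretranslate_src": "pretranslate.src.json",
    "pretranslate_trg": "pretranslate.trg.json",
}


def build_file_uri(bucket: str, build_id: str, kind: str) -> str:
    # Returns s3://<bucket>/builds/<build_id>/<file name> for a known kind.
    try:
        file_name = BUILD_FILES[kind]
    except KeyError:
        raise ValueError(f"unknown build file kind: {kind!r}") from None
    return f"s3://{bucket}/builds/{build_id}/{file_name}"
```

Keeping the names in one table means the job that writes the files and the job that reads them cannot drift apart on spelling.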

ddaspit commented on June 2, 2024

Downloading parent models and uploading child models have not been implemented yet. Here is the planned structure:

s3://bucket_name/parent_models/lang/: parent model for target language lang
s3://bucket_name/models/engine_id/: child model for engine engine_id
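
The planned model locations could be built and taken apart along these lines. This is a sketch under the assumption that the two prefixes stay exactly as listed; the helper names are hypothetical and nothing here reflects how the download or upload will actually be implemented.

```python
from typing import Tuple
from urllib.parse import urlparse


def parent_model_uri(bucket: str, lang: str) -> str:
    # Parent model for a target language.
    return f"s3://{bucket}/parent_models/{lang}/"


def child_model_uri(bucket: str, engine_id: str) -> str:
    # Child model produced for one engine.
    return f"s3://{bucket}/models/{engine_id}/"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    # Splits "s3://bucket/key/prefix/" into (bucket, key prefix), which is
    # the form most S3 clients expect as separate arguments.
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```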

johnml1135 commented on June 2, 2024

So, to get the secrets, etc. for machine and machine.py, would we do this?

For Machine, use Rancher secrets and pull them into a configuration file as environment variables
For Machine.py, we should be able to use user-based secrets. But then we would also need to register the ClearML user in machine with Rancher secrets as well.

Does this match your understanding?
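
To make the environment-variable option concrete, here is a rough sketch of reading S3 settings from the environment, written in Python for illustration. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS variable names; `MACHINE_S3_BUCKET`, the `S3Settings` class and `load_s3_settings` are hypothetical names invented for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Settings:
    access_key_id: str
    secret_access_key: str
    bucket: str


def load_s3_settings(env: Optional[Mapping[str, str]] = None) -> S3Settings:
    # Reads from os.environ by default; a plain dict can be passed in tests.
    env = os.environ if env is None else env
    # MACHINE_S3_BUCKET is a made-up variable name, not an existing setting.
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "MACHINE_S3_BUCKET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early and name the missing variables, never their values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return S3Settings(
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        bucket=env["MACHINE_S3_BUCKET"],
    )
```

Whatever injects the variables (Rancher secrets or something else) stays outside the code, so the same loader would work in either setup.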
