
Comments (4)

johnml1135 commented on June 2, 2024

So, how is each job structured? We need generic corpora (we have files with the same name in different languages, which may end up conflicting when multiple corpora are selected).
Assume two folders:

  • S3:/jobs/<my_job_id>/src
  • S3:/jobs/<my_job_id>/trg
These files are added:
  • For each file in a corpus, the files are named <corpus_id>.<normal_name>.txt (already converted to .txt by the time they get here)
    • Since the files mirror in each folder, that is how
    • The text files may start with an "ID\t" prefix for Bible references
  • For keyterms, add files with a ".keyterm" (or similar) extension
  • Add the config.yaml file in the root, specifying the source and target text, the keywords, and the parent model.
After the whole process is done, the translated files would be at:
  • S3:/jobs/<my_job_id>/translations/<normal_name>.txt
This could be built in a few stages:
  • Update machine.py to use the S3 buckets
  • Update machine.py to look for these specific files in this specific way
  • Update Machine to use S3
  • Update Machine to push the specific files in the specific way
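
To make the proposed naming concrete, here is a minimal Python sketch of how the keys could be derived. The function name `job_file_keys` and its parameters are hypothetical; only the folder names and the `<corpus_id>.<normal_name>.txt` pattern come from the list above.

```python
from pathlib import PurePosixPath


def job_file_keys(job_id: str, corpus_id: str, normal_name: str) -> dict[str, str]:
    """Return the object keys for one corpus file under the proposed job layout.

    Hypothetical helper: it only encodes the naming proposed in this comment.
    """
    root = PurePosixPath("jobs") / job_id
    # Prefixing with the corpus id keeps same-named files from different
    # corpora from colliding inside the shared src/ and trg/ folders.
    file_name = f"{corpus_id}.{normal_name}.txt"
    return {
        "src": str(root / "src" / file_name),
        "trg": str(root / "trg" / file_name),
        # The translations path is written without the corpus prefix above.
        "translation": str(root / "translations" / f"{normal_name}.txt"),
    }


keys = job_file_keys("my_job_id", "corpus1", "MAT")
print(keys["src"])
print(keys["trg"])
print(keys["translation"])
```

The same file name is used under both src and trg, so a pair can be matched by name alone.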

from machine.

ddaspit commented on June 2, 2024

Most of the job code has already been implemented. I found a bug in the .NET library that I am using to access S3. I am working on fixing that now. Once that is fixed, we just need S3 buckets to point at. Here is the current structure:

  • s3://bucket_name/builds/build_id/
    • train.src.txt: source sentences from all files in all corpora
    • train.trg.txt: target sentences from all files in all corpora
    • pretranslate.src.json: source sentences to pretranslate
    • pretranslate.trg.json: pretranslated target sentences
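
For reference, the layout above could be captured in a small Python helper along these lines. This is only a sketch: the class name `BuildLayout` is made up for illustration, and `bucket_name` and `build_id` are the placeholders from the list, not real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildLayout:
    """S3 URIs for one build, following the structure listed above."""

    bucket_name: str  # placeholder, as in the list above
    build_id: str

    @property
    def prefix(self) -> str:
        return f"s3://{self.bucket_name}/builds/{self.build_id}/"

    @property
    def train_src(self) -> str:
        # Source sentences from all files in all corpora.
        return self.prefix + "train.src.txt"

    @property
    def train_trg(self) -> str:
        # Target sentences from all files in all corpora.
        return self.prefix + "train.trg.txt"

    @property
    def pretranslate_src(self) -> str:
        # Source sentences to pretranslate.
        return self.prefix + "pretranslate.src.json"

    @property
    def pretranslate_trg(self) -> str:
        # Pretranslated target sentences.
        return self.prefix + "pretranslate.trg.json"


layout = BuildLayout("bucket_name", "build_id")
print(layout.train_src)
print(layout.pretranslate_trg)
```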


ddaspit commented on June 2, 2024

Downloading parent models and uploading child models have not been implemented yet. Here is the planned structure:

  • s3://bucket_name/parent_models/lang/: parent model for target language lang
  • s3://bucket_name/models/engine_id/: child model for engine engine_id
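
A rough Python sketch of how the two planned locations could be built is below. The function names are hypothetical, and since neither download nor upload is implemented yet, this only formats the URIs from the list above.

```python
def parent_model_uri(bucket_name: str, lang: str) -> str:
    """URI of the parent model for target language `lang` (planned layout)."""
    return f"s3://{bucket_name}/parent_models/{lang}/"


def child_model_uri(bucket_name: str, engine_id: str) -> str:
    """URI of the child model for engine `engine_id` (planned layout)."""
    return f"s3://{bucket_name}/models/{engine_id}/"


print(parent_model_uri("bucket_name", "lang"))
print(child_model_uri("bucket_name", "engine_id"))
```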


johnml1135 commented on June 2, 2024

So, to get the secrets, etc. for Machine and Machine.py, would we do this?

  • For Machine, use Rancher secrets and pull them into a configuration file as environment variables
  • For Machine.py, we should be able to use user-based secrets. But then we would also need to register the ClearML user in Machine with Rancher secrets as well.

Does this match your understanding?
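
As a rough illustration of the environment-variable route, here is a minimal Python sketch that reads S3 credentials from the process environment. It assumes the secrets end up exposed under the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; how they get injected (Rancher or otherwise) is outside the sketch, and `load_s3_credentials` is a hypothetical helper, not existing code.

```python
import os
from typing import Mapping, Optional

# Assumption: the secrets are exposed under the standard AWS variable names.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_s3_credentials(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read the S3 credentials from the environment, failing fast if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dict as `env` makes it easy to check the behavior without touching the real environment.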

