Comments (4)
Could we also provide default values for `replicas` (1) and `tfReplicaType` (master)? That would make using TfJob for single-VM training a breeze.
from training-operator.
Yes. I think we should try to provide sensible defaults for as much as possible, to make it as easy as possible to get started. I just want to be specific in the GitHub issue.
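To illustrate the defaulting being discussed, here is a minimal Go sketch. The names `TFReplicaSpec`, `TFReplicaType`, and `SetDefaults` are hypothetical stand-ins, not the training-operator's actual API. The only assumption taken from the thread is that an unset replica count falls back to 1 and an unset replica type falls back to master.

```go
package main

import "fmt"

// TFReplicaType is a hypothetical stand-in for the TfJob replica type field.
type TFReplicaType string

const (
	Master TFReplicaType = "MASTER"
	Worker TFReplicaType = "WORKER"
	PS     TFReplicaType = "PS"
)

// TFReplicaSpec is a hypothetical, simplified replica spec.
// A nil Replicas or an empty TFReplicaType means "not set by the user".
type TFReplicaSpec struct {
	Replicas      *int32
	TFReplicaType TFReplicaType
}

// SetDefaults fills in only the fields the user left unset:
// one replica of type MASTER, which is enough for single-VM training.
func SetDefaults(spec *TFReplicaSpec) {
	if spec.Replicas == nil {
		one := int32(1)
		spec.Replicas = &one
	}
	if spec.TFReplicaType == "" {
		spec.TFReplicaType = Master
	}
}

func main() {
	// An empty spec picks up both defaults.
	spec := &TFReplicaSpec{}
	SetDefaults(spec)
	fmt.Println(*spec.Replicas, spec.TFReplicaType) // prints "1 MASTER"

	// Explicitly set values are left untouched.
	three := int32(3)
	worker := &TFReplicaSpec{Replicas: &three, TFReplicaType: Worker}
	SetDefaults(worker)
	fmt.Println(*worker.Replicas, worker.TFReplicaType) // prints "3 WORKER"
}
```

Using a pointer for `Replicas` in this sketch lets the defaulting step tell "not specified" apart from an explicit value, so user-provided settings are never overwritten.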
#31 is out for review.
Merged