Comments (3)
To install the training-operator and the mpi-operator in the same cluster, you need to disable MPIJob in the training-operator, as shown in #1972 (comment).
Also, I would recommend using MPIJob v2 (mpi-operator), since we are preparing to deprecate and remove MPIJob v1 (#1906).
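As a rough illustration of what disabling MPIJob can look like, here is a minimal sketch that builds the container arguments for the training-operator Deployment. It assumes the operator's `--enable-scheme` flag is the mechanism used (the linked comment is not quoted in this thread, so treat #1972 as the authoritative reference), and the list of job kinds in `ALL_SCHEMES` is a hypothetical example that should be adjusted to whatever your release supports.

```python
# Assumed set of job kinds; not taken from this thread. Adjust to your release.
ALL_SCHEMES = ["tfjob", "pytorchjob", "xgboostjob", "paddlejob", "mpijob"]


def enable_scheme_args(schemes, disabled):
    """Return one --enable-scheme argument per job kind that stays enabled.

    Matching is case-insensitive, so "MPIJob" and "mpijob" are equivalent.
    """
    disabled_lower = {name.lower() for name in disabled}
    return [
        f"--enable-scheme={name}"
        for name in schemes
        if name.lower() not in disabled_lower
    ]


# Leave MPIJob to the mpi-operator; keep every other kind in the training-operator.
args = enable_scheme_args(ALL_SCHEMES, disabled=["MPIJob"])
print(args)
```

The resulting list would go into the `args` of the training-operator container, so that the mpi-operator is the only controller reconciling MPIJob resources.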
from training-operator.
/kind support
@tenzen-y: The label(s) kind/support cannot be applied, because the repository doesn't have them.
In response to this:
/kind support
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.