This project forked from piersharding/metacontroller-mpi-operator

MPI Operator based on MetaController

License: Apache License 2.0

MPI Operator

Developed with MetaController and based on https://github.com/everpeace/kube-openmpi and https://github.com/kubeflow/mpi-operator.

This MPI Kubernetes Operator provides a Kubernetes-native interface for building MPI clusters and running jobs.

Deploy

First, you must have MetaController installed:

make metacontroller

Next deploy the Operator:

make deploy

Test

An MPI cluster relies on a base image that encapsulates the MPI application dependencies and facilitates the MPI communication. An example of this is the included mpibase image, which can be built using:

make build_mpibase && make push_mpibase

You can use the default images on Docker Hub; otherwise, configure your own Docker registry details by setting appropriate values for:

PULL_SECRET = "gitlab-registry"
GITLAB_USER = you
REGISTRY_PASSWORD = your-registry-password
GITLAB_USER_EMAIL = "[email protected]"
CI_REGISTRY = gitlab.somewhere.com
CI_REPOSITORY = repository/uri
MPIBASE_IMAGE = $(CI_REGISTRY)/$(CI_REPOSITORY)/mpibase:latest

Set these in PrivateRules.mak.

Launch the helloworld job:

make test

Once everything starts, the logs are available in the launcher pod.

Scheduling modes

The CRD for MPIJobs has two parameters: replicas (int) and daemons (boolean). Specifying only replicas leaves it to the scheduler to decide where to place the worker pods on the cluster. If daemons is also set to true (see mpi-test-demons.yaml), Pod AntiAffinity rules are applied and the Kubernetes scheduler forces the workers onto individual nodes, if available. initContainers check the availability of the workers before the launcher is executed, so any Pods stuck in Pending are dropped from the worker list.
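
The two behaviours described above can be sketched in Python as follows. This is an illustrative sketch only, not the operator's actual code: the function names, the "mpi-job" label key, the choice of a required (rather than preferred) anti-affinity rule, and treating only Running pods as available are all assumptions. The affinity field names and the kubernetes.io/hostname topology key are standard Kubernetes pod-spec fields.

```python
from typing import Any, Dict, List


def worker_affinity(job_name: str, daemons: bool) -> Dict[str, Any]:
    """Return a pod-spec `affinity` stanza for the workers of one job.

    Hypothetical helper. With daemons=False no constraint is added, so the
    scheduler places workers freely. With daemons=True an anti-affinity
    rule keeps two workers of the same job off the same node.
    """
    if not daemons:
        return {}
    return {
        "podAntiAffinity": {
            # Assumption: a hard ("required") rule; the real operator
            # may use a different rule type.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Assumption: the "mpi-job" label key is made up.
                    "labelSelector": {"matchLabels": {"mpi-job": job_name}},
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


def available_workers(pods: List[Dict[str, Any]]) -> List[str]:
    """Return the names of worker pods that are Running.

    Pods still in Pending (for example, because no free node satisfied
    the anti-affinity rule) are left out of the worker list.
    """
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") == "Running"
    ]


if __name__ == "__main__":
    pods = [
        {"metadata": {"name": "worker-0"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "worker-1"}, "status": {"phase": "Pending"}},
    ]
    print(worker_affinity("helloworld", daemons=True))
    print(available_workers(pods))  # prints ['worker-0']
```

With three replicas and daemons set to true on a two-node cluster, a rule of this kind would leave the third worker in Pending, and a check like available_workers would hand only the two running workers to the launcher.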
