psiayn / goheiko

Heiko reinvented in go!

Go 100.00%

goheiko's Introduction

Heiko

Heiko rewritten in go!

Heiko is a lightweight distributed node manager (or at least aims to be).

Installation

Using Go Get

go get github.com/psiayn/heiko

From Source

git clone https://github.com/psiayn/heiko.git
cd heiko
go install .

Usage

General overview.

Usage:
  heiko [command]

Available Commands:
  help        Help about any command
  init        Runs initialization of Jobs
  start       Start a new heiko job
  stop        Stops a running heiko daemon

Flags:
      --config string   config file (default is $PWD/.heiko/config.yaml)
  -h, --help            help for heiko
  -n, --name string     Unique name to give (or given) to this heiko job

Use "heiko [command] --help" for more information about a command.

Heiko uses a config.yml to store information about the jobs and nodes of the cluster. A sample config is provided in examples/sample-config.yml. The default config path is .heiko/config.yml in the directory you start heiko from. You can also specify a different config path with the --config flag.

Authentication

By default, Heiko uses SSH keys for authentication. If no key path is specified, Heiko will attempt to generate a keypair at ~/.ssh/heiko/ and transfer it to the node (the user will be prompted for auth in this case).

If, on the other hand, keys are specified, Heiko will attempt to establish a connection directly using the key (the user is responsible for having transferred the keys to the node beforehand).

Finally, heiko does support SSH passwords for authentication. However, using passwords is not advised, as they are stored as plain text in the config file.

Basic Usage

heiko start/init --config path/to/config

You can initialize heiko, which for now runs the init jobs from your config.yml. More about the config can be found in the Wiki.

heiko init -n <name you want to give>

Starting heiko in normal mode

heiko start -n <name you want to give>

Starting heiko in daemon mode

heiko start -n <name you want to give> -d

Once you're in daemon mode, you can stop the daemon as follows.

heiko stop -n <name of the daemon you gave earlier>

goheiko's Issues

More schedulers

A scheduler in heiko is the part that chooses which node to run a given task on. Tasks are sent over a task channel. The scheduler receives tasks over this channel and runs that task on a node (in a new goroutine).

Currently (as of fb8b464) heiko has just one scheduler - a random scheduler that selects a node at random.

A few ideas for new schedulers:

Cyclic scheduler: cycle through the list of nodes. For example, if a task runs on node 1, the next task will run on node 2 and so on until it wraps around to node 1 and continues.
Least loaded scheduler: run on the node which is running the least number of tasks. Will need to keep track of tasks running per node. Maybe a priority queue too.
The heiko scheduler: this was the scheduler used in the python implementation of heiko (TODO: update link once it's moved). It uses a bunch of information from the nodes such as number of CPUs, amount of memory, amount of free memory, current load, etc. and prioritizes nodes based on them. Doesn't require keeping track of number of running tasks, but will need to run a set of commands repeatedly on the nodes to get all of this information on a regular basis.
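
As a rough illustration of the least loaded idea, the sketch below keeps a per-node count of running tasks and picks the node with the smallest count. The LoadTracker type and its Acquire and Release methods are hypothetical names for this example, not part of heiko. A linear scan is used instead of a priority queue to keep the sketch short.

```go
package main

import (
	"fmt"
	"sync"
)

// LoadTracker is a hypothetical helper that counts running tasks per node.
type LoadTracker struct {
	mu      sync.Mutex
	running []int // running[i] is the number of tasks currently on node i
}

// NewLoadTracker creates a tracker for numNodes nodes, all idle.
func NewLoadTracker(numNodes int) *LoadTracker {
	return &LoadTracker{running: make([]int, numNodes)}
}

// Acquire picks the node with the fewest running tasks, counts the new
// task against it and returns its index. Ties go to the lowest index.
// It returns -1 when there are no nodes.
func (l *LoadTracker) Acquire() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	best := -1
	for i, n := range l.running {
		if best == -1 || n < l.running[best] {
			best = i
		}
	}
	if best >= 0 {
		l.running[best]++
	}
	return best
}

// Release records that a task on the given node has finished.
func (l *LoadTracker) Release(node int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if node >= 0 && node < len(l.running) && l.running[node] > 0 {
		l.running[node]--
	}
}

func main() {
	lt := NewLoadTracker(3)
	a := lt.Acquire() // all nodes idle, so node 0 is chosen
	b := lt.Acquire() // node 0 is busy, so node 1 is chosen
	lt.Release(a)     // the task on node 0 finishes
	c := lt.Acquire() // node 0 is idle again
	fmt.Println(a, b, c) // prints: 0 1 0
}
```

The mutex is there because tasks run in their own goroutines and would call Release concurrently. With many nodes, container/heap from the standard library could replace the linear scan.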

Before implementing new schedulers it would be a good idea to have an interface for a scheduler. If you look at the random scheduler, only one line of it actually does the job of choosing a node to run the task on: https://github.com/psiayn/heiko/blob/fb8b464ec2d4308fde9a4d6cfbcc2762d2851783/internal/scheduler/scheduler.go#L41
Everything else is boilerplate that will have to be repeated if a new scheduler is added with the current code structure.

Tasks to complete before (or along with) implementing a new scheduler:

Refactor the scheduler code to be modular, to reduce boilerplate. Maybe using interfaces, embedded fields (composition), a combination of them or something else.
Add a flag in the CLI (under cmd/, uses cobra) to select a scheduler
Map strings to scheduler functions/structs for the above
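
One possible shape for the refactor listed above is sketched here. Everything in it is an assumption made for illustration: the Scheduler interface, the Pick method, the NewScheduler lookup, the Run loop and the names "random" and "cyclic" are hypothetical and do not reflect heiko's actual code. The point is that each scheduler only implements node selection, while the channel loop is written once.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Scheduler is a hypothetical interface. Pick returns the index of the
// node that should run the next task. numNodes must be greater than zero.
type Scheduler interface {
	Pick(numNodes int) int
}

// RandomScheduler picks a node uniformly at random.
type RandomScheduler struct{}

func (RandomScheduler) Pick(numNodes int) int { return rand.Intn(numNodes) }

// CyclicScheduler cycles through the nodes in order.
type CyclicScheduler struct{ next int }

func (c *CyclicScheduler) Pick(numNodes int) int {
	i := c.next % numNodes
	c.next = i + 1
	return i
}

// schedulers maps the value of a (hypothetical) CLI flag to a constructor.
var schedulers = map[string]func() Scheduler{
	"random": func() Scheduler { return RandomScheduler{} },
	"cyclic": func() Scheduler { return &CyclicScheduler{} },
}

// NewScheduler looks up a scheduler by name.
func NewScheduler(name string) (Scheduler, error) {
	ctor, ok := schedulers[name]
	if !ok {
		return nil, fmt.Errorf("unknown scheduler %q", name)
	}
	return ctor(), nil
}

// Run holds the shared boilerplate: it receives tasks from the channel and
// hands each one to the node chosen by s. It calls runOn synchronously to
// keep the example deterministic; real code would likely use "go runOn(...)".
func Run(s Scheduler, numNodes int, tasks <-chan string, runOn func(node int, task string)) {
	for task := range tasks {
		runOn(s.Pick(numNodes), task)
	}
}

func main() {
	s, err := NewScheduler("cyclic")
	if err != nil {
		fmt.Println(err)
		return
	}
	tasks := make(chan string, 3)
	for _, t := range []string{"build", "test", "deploy"} {
		tasks <- t
	}
	close(tasks)
	Run(s, 2, tasks, func(node int, task string) {
		fmt.Printf("task %s -> node %d\n", task, node)
	})
}
```

With this layout, a CLI flag only needs to pass its string value to NewScheduler, and adding a scheduler means adding one type and one map entry.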

Infinite loop when all nodes are down

The issue

When all nodes are down (or you have only one node and it's down), heiko keeps trying to connect repeatedly. This leads to high CPU usage (6-10% out of 800% on my laptop, which isn't very high but is abnormal).

Expected behavior

heiko should slow down its connection attempts when connections to a node keep breaking, by waiting for some time between attempts.

Possible solution(s)

One idea I could think of:

Store a boolean for each node indicating whether the last connection attempt failed. It defaults to false (no failure).
Before connecting, if this is true, sleep for a while first.
If the connection fails, set this to true. If it succeeds, reset it to false.

One improvement would be to store the last 3 (or 5, or N) failures and increase the sleep time as more failures occur.

This puts the responsibility of choosing an appropriate node on the scheduler. If it chooses a node that fails constantly, it will lead to delayed execution.
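
A minimal sketch of the failure-tracking idea, assuming a count of consecutive failures per node rather than a single boolean. The Backoff type and the doubling of the delay are choices made for this example: the issue only asks for the sleep time to increase, so another growth rule would work just as well.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff is a hypothetical per-node record of consecutive connection failures.
type Backoff struct {
	Base     time.Duration // delay after the first failure
	Max      time.Duration // upper bound on the delay
	failures int
}

// Delay returns how long to wait before the next connection attempt:
// zero after a success, then Base, 2*Base, 4*Base, ... capped at Max.
func (b *Backoff) Delay() time.Duration {
	if b.failures == 0 {
		return 0
	}
	d := b.Base
	for i := 1; i < b.failures; i++ {
		d *= 2
		if d >= b.Max {
			return b.Max
		}
	}
	if d > b.Max {
		return b.Max
	}
	return d
}

// Failure records a failed connection attempt.
func (b *Backoff) Failure() { b.failures++ }

// Success resets the failure count after a successful connection.
func (b *Backoff) Success() { b.failures = 0 }

func main() {
	b := &Backoff{Base: time.Second, Max: 30 * time.Second}
	for attempt := 1; attempt <= 7; attempt++ {
		fmt.Printf("attempt %d: wait %v\n", attempt, b.Delay())
		// Real code would call time.Sleep(b.Delay()) here and then try
		// to connect. This example pretends every attempt fails.
		b.Failure()
	}
}
```

The connection loop would keep one Backoff per node, calling Failure or Success after each attempt.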

Init command improvements.

As of right now, the init command only runs the commands specified in the config file.
Init should, in theory, do the following:

set up connections (solved by PR #5)
sync data across nodes

Separate out node and job configuration

The Problem

Currently, all of the heiko configuration is in one file, .heiko/config.yml. This includes both nodes and jobs. This makes it difficult to share the config (in a git repository, for example) without exposing details of the user's nodes, which can possibly include passwords. The idea is this:

Allow projects to have a .heiko directory (like here) that only has details of what tasks are needed to run this project using heiko. Similar to a Dockerfile or docker-compose.yml. Having a .heiko directory in different projects also increases exposure to heiko (to some extent).
Allow users to have a common set of nodes that they can use for any project.

Proposed Solution

Break up config.yml into two files jobs.yml and nodes.yml.

jobs.yml will live in .heiko/jobs.yml and be local to each project. It only contains the jobs part of the configuration. This allows it to be added to version control.
nodes.yml can be in .heiko/nodes.yml (local to the project, but not checked into version control) or if that does not exist, the user's global ~/.heiko/nodes.yml will be used. This way, the same nodes can be used for multiple projects.
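
The lookup order for nodes.yml described above could be sketched as follows. ResolveNodesFile and fileExists are hypothetical names for this example. The existence check is passed in as a function only so that the lookup order can be exercised without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ResolveNodesFile returns the nodes.yml to use: the project-local
// .heiko/nodes.yml if it exists, otherwise the user's global one.
// The second return value is false when neither file exists.
func ResolveNodesFile(projectDir, homeDir string, exists func(string) bool) (string, bool) {
	candidates := []string{
		filepath.Join(projectDir, ".heiko", "nodes.yml"), // local to the project
		filepath.Join(homeDir, ".heiko", "nodes.yml"),    // shared across projects
	}
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether path exists and is not a directory.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	if path, ok := ResolveNodesFile(".", home, fileExists); ok {
		fmt.Println("using nodes from", path)
	} else {
		fmt.Println("no nodes.yml found")
	}
}
```

jobs.yml needs no such fallback, since it always lives in the project's own .heiko directory.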

Issues in implementation

Some possible issues:

We might have to keep two separate configuration variables, which can become difficult to manage and might change the signatures of some functions.

PID File is supposed to be deleted on heiko stop.

The PID file generated in daemon mode is supposed to be deleted via the context.Release() call. As of right now, any form of interrupt (SIGTERM / SIGINT) doesn't seem to change its behaviour.

Also go-daemon seems to be pretty old, could that be the issue?

Use ssh keys instead of password based auth

Copying of keys (if not done already) should probably be done in the init phase.

Tilde (~) not recognized in key file path

The issue

Heiko allows a custom SSH private key to be used for auth, instead of the one generated by heiko. This can be set in the configuration as shown:
https://github.com/psiayn/heiko/blob/fb8b464ec2d4308fde9a4d6cfbcc2762d2851783/examples/sample-config.yml#L9

Generally, these keys are stored in ~/.ssh (which is $HOME/.ssh). A path containing the ~ (tilde) is not recognized by heiko, which exits with this error:

init: SSH Key ~/.ssh/somekey.pub for node somenode does not exist: stat ~/.ssh/somekey.pub: no such file or directory

The solution

When the paths are read on this line:
https://github.com/psiayn/heiko/blob/fb8b464ec2d4308fde9a4d6cfbcc2762d2851783/internal/config/sshSetup.go#L110-L112

If it starts with ~/, it must be expanded appropriately. Possible approaches for this are here and here.
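
One way to do the expansion with only the standard library is sketched below. ExpandTilde is a hypothetical helper, not existing heiko code. It handles only "~" and "~/...", and leaves forms such as "~user/..." untouched.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ExpandTilde replaces a leading "~" or "~/" in path with home.
// Any other path, including "~user/...", is returned unchanged.
func ExpandTilde(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") {
		return filepath.Join(home, path[2:])
	}
	return path
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	fmt.Println(ExpandTilde("~/.ssh/somekey.pub", home))
}
```

The home directory is passed in as a parameter so that the function stays easy to test. os.UserHomeDir reads $HOME on Unix systems.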
