lilypad-tech / lilypad

Run AI workloads easily in a decentralized GPU network. https://www.youtube.com/watch?v=zeG2F-JANjI

Home Page: https://lilypad.tech

License: Apache License 2.0

Go 56.34% Dockerfile 1.26% Solidity 14.69% Shell 1.70% TypeScript 24.18% Cuda 1.84%
ai blockchain compute crypto mistral-7b sdxl stable-diffusion wasm web3 depin

lilypad's Introduction

LIKE THIS PROJECT?

PLEASE STAR US AND HELP US GROW! <3

Lilypad 🍃

Lilypad enables users to easily run containerised AI workloads on a decentralized GPU network, where anyone can get paid to connect their compute nodes to the network and run container jobs. Users can easily run jobs such as Stable Diffusion XL and cutting-edge open-source LLMs on-chain, from the CLI, and via Lilypad AI Studio on the web.

Visit the Lilypad Docs site for a more comprehensive overview of getting up and running, including a Quick Start Guide.

Getting started running container jobs on Lilypad

Jobs (containers) can be run on Lilypad using the installable CLI, which is also available through the Go toolchain. After setting up the necessary prerequisites, the CLI enables users to run jobs as described below:

lilypad run cowsay:v0.0.4 -i Message="moo"

Watch the video

The current list of modules can be found in the following repositories:

Containerised job modules can be built and added to the available module list; for more details visit the building a job documentation. If you would like to contribute, open a pull request on this repository to add your link to the list above.

Getting started running a Node on Lilypad Network

As a distributed network, Lilypad also brings with it the ability to run a node and contribute GPU and compute capacity. See the documentation on running a node, which contains more detailed instructions and an overview for getting set up.

The Lilypad Community

Read our Blog

Join the Discord

Follow us on Twitter/X

Check out our videos on YouTube

lilypad's People

Contributors

alvin-reyes, apquinit, aquigorka, arsen3d, bgins, binocarlos, developerally, developersteve, eltociear, hollygrimm, hunjixin, lilypad-releases[bot], lukemarsden, narbs91, noryev, rhochmayr, richardbremner, taoshengshi, walkah, zorlin


lilypad's Issues

Error: No solver service specified - please use SERVICE_SOLVER or --service-solver

Using the Mac M1 binary provided by Hiro Hamada here: https://bacalhauproject.slack.com/archives/C055K39J9QW/p1696537153091329?thread_ts=1696438693.499459&cid=C055K39J9QW.

Running: lilypad run github.com/username/repo:tag -i Message=moo
Error: No solver service specified - please use SERVICE_SOLVER or --service-solver
export SERVICE_SOLVER="0x3C44CdDdB6a900fa2b585dd299e03d12FA4293BC" solves this.

Running lilypad run github.com/username/repo:tag -i Message=moo again:
Error: No mediators services specified - please use SERVICE_MEDIATORS or --service-mediators
export SERVICE_MEDIATORS="0x90F79bf6EB2c4f870365E785982E1f101E93b906"

Works fine.

Headless CLI

Request/suggestion from someone at the ETHGlobal hackathon is to add a --silent flag to the lilypad run command that suppresses all output except the returned results directory. This would allow the CLI to be built into a server-side/code workflow more seamlessly.

Lilypad module local template file referencing

As a module builder it would be handy to include a -f /directory/lilypad_module.json.tmpl during module build testing, referencing instead from a local development directory to speed up the module development process. This would remove the need to run lilypad jobs from tagged github deployments.

export private key bash history dangers

https://protos.com/brazilian-crypto-streamer-loses-60k-after-showing-private-keys-recovers-it/#:~:text=A%20Brazilian%20crypto%20streamer%20lost,but%20it%20was%20too%20late.

In the docs one is asked to export their Ethereum private key into a bash variable; this exposes it in bash history and thus reverse search.

If it's possible, I would change this setup to use Docker Compose with the env_file option, so one can create a .env file and run the Docker container that way.
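A sketch of that suggestion, with a hypothetical service name, image, and key name (the real compose file would differ):

```yaml
# docker-compose.yml — hypothetical service/image names for illustration
services:
  resource-provider:
    image: lilypad:latest
    env_file:
      - .env   # contains WEB3_PRIVATE_KEY=...; never typed into the shell
```

The key then never appears in `~/.bash_history`, and `.env` can be chmod 600 and gitignored.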

Local network usage of Arbitrum

Arbitrum

The current infra has a Docker instance for the chain; it exposes two ports, one for HTTP and one for WS.
Once the chain container starts we need to 1) fund the admin account and 2) boot (fund accounts, compile and deploy contracts).

Let's keep the same processes/api/flow and if possible the same addresses for accounts and contracts.

Fix error: module does not exist

This happens because the code at pkg/module/utils.go checks whether the directory exists, but not whether the actual code inside the repo is present (it also checks that .git exists).

Potential fix: check for the template file and, if it does not exist, trigger the process to clone the repo.

The current workaround (remove /tmp/lilypad/data/repos) works, because it removes the empty dir and the next execution clones the repo.

cc/ @noryev

Schedule jobs to run with Lilypad

We want to be ready to load up the Lilypad network with jobs that RPs can run when the incentivized testnet is launched. A simple way of doing this is using cron or systemd timers as a JC.

To start just get the scheduling running with any job (like the cowsay example). Later, we can aim to get it working with something more useful (like a data processing flow that uses LLMs).

"JC has already agreed"

(screenshot of the error message)

I was testing cancelling the spinner on SIGINT, which meant hitting ^C, up-arrow, and enter a bunch of times on lilypad run. After doing this 3-4 times, I started getting the error above. Once in this state, it seems never to recover. An issue in the smart contracts perhaps?

Once lilypad has pulled a module's git repo, it never updates it

Lilypad clones and uses the contents of git repos when using modules.

The code for this is here:
https://github.com/bacalhau-project/lilypad/blob/main/pkg/module/shortcuts/shortcuts.go
https://github.com/bacalhau-project/lilypad/blob/main/pkg/module/utils.go

The repos get cloned both in the CLI/client (job creator) and the server (resource provider).

Currently, if you're a module developer and you push more changes to a repo after any lilypad process (client or server) has cloned it, the client or server will look in the existing cached checkout for the new refs (e.g. tags), fail to find them, and produce errors like #9.

One solution would be to always pull the git repo before trying to find a ref in it (repo.ResolveRevision), but the problem with that is that it makes github.com (or wherever people are hosting their modules) into a single point of failure for our nice decentralized system. It would be better to only pull updates into the git repo if finding the ref fails. This way, if nodes have already run modules (which will normally be the case) and even if github is down, they'll still be able to proceed from their local cached state.

So:

Ideally write a test to cover this case. A fix without a test is acceptable though!

Add Automatic Pricing Update Feature to Solver Control Loop

Currently, the pricing for the onchain job manager is set manually by administrators, which is not ideal for maintaining a dynamic and efficient marketplace. To address this, we propose adding an automatic pricing update feature to the solver control loop. This will enable the solver to dynamically adjust the pricing based on the current market conditions and resource offers.

Proposed Changes

  • Implement a mechanism in the solver control loop that periodically queries the current pool of resource offers to determine the current market price.
  • Utilize this information to automatically update the pricing by calling the setRequiredDeposit function on the contract.
  • Repeat this process at regular intervals to ensure that the pricing remains up-to-date and responsive to changing market conditions.

Lilypad-ify four new modules for SDXL & Mistral-7B fine-tuning and inference

We want four new Lilypad modules:

  • sdxl-finetune
  • sdxl-inference
  • mistral-finetune
  • mistral-inference

Dockerfiles:

Docker images:

They should copy the formula of https://github.com/bacalhau-project/lilypad-module-lora-training and https://github.com/bacalhau-project/lilypad-module-lora-inference, which do the same for Stable Diffusion 1.5.

Here are the commands you need to run inside the containers:

sdxl-finetune

bind-mount /config.toml, /input and /output
config.toml should contain

# for sdxl fine tuning

[general]
enable_bucket = true                        # Whether to use Aspect Ratio Bucketing

[[datasets]]
resolution = 1024                           # Training resolution
batch_size = 4                              # Batch size

  [[datasets.subsets]]
  image_dir = '/input' # Specify the folder containing the training images
  caption_extension = '.txt'                # Caption file extension
  num_repeats = 10                          # Number of repetitions for training images
Then run:

accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py \
	  --pretrained_model_name_or_path=./sdxl/sd_xl_base_1.0.safetensors \
  	--dataset_config=/config.toml \
  	--output_dir=./output \
  	--output_name=lora \
  	--save_model_as=safetensors \
  	--prior_loss_weight=1.0 \
  	--max_train_steps=400 \
  	--vae=madebyollin/sdxl-vae-fp16-fix \
  	--learning_rate=1e-4 \
  	--optimizer_type=AdamW8bit \
  	--xformers \
  	--mixed_precision=fp16 \
  	--cache_latents \
  	--gradient_checkpointing \
  	--save_every_n_epochs=1 \
  	--network_module=networks.lora

The input should be a folder of images with matching caption text files, e.g. foo.jpg should have a foo.txt containing its caption.
Based on https://github.com/kohya-ss/sd-scripts

sdxl-inference

Given an input lora in bind-mounted /input directory, inference is then just:

accelerate launch --num_cpu_threads_per_process 1 sdxl_minimal_inference.py \
	--ckpt_path=sdxl/sd_xl_base_1.0.safetensors \
	--lora_weights=/input/lora.safetensors \
	--prompt="cj hole for sale sign in front of a posh house with a tesla in winter with snow" \
	--output_dir=/output

mistral-finetune

accelerate launch -m axolotl.cli.train examples/mistral/qlora-instruct.yml

mistral-inference

accelerate launch -m axolotl.cli.inference examples/mistral/qlora-instruct.yml

Mac and ARM builds

We should ship builds for Mac and for ARM.

Go is good at cross-compiling binaries, so this might be as simple as:

GOOS=darwin GOARCH=arm64 go build -o build/Darwin-arm64/lilypad
GOOS=darwin GOARCH=amd64 go build -o build/Darwin-x86_64/lilypad
GOOS=linux GOARCH=arm64 go build -o build/Linux-arm64/lilypad
GOOS=linux GOARCH=amd64 go build -o build/Linux-x86_64/lilypad

Maybe even do Windows for extra points? lilypad.exe 😎

[core] Define release strategy

We want to define the release strategy. Currently, we have CI+CD pipelines on changes to main for devnet, with the intention of testing things there and then releasing to testnet (manually). We may want to take a look at Release please for the strategy.

We also want to add

  • semver to releases (that are in sync or work ok with go versioning)

cc @AquiGorka @bgins

Specs: core automations

  • How to update contracts?
  • How to deploy? Manual? Setting the pk in CI seems too risky
  • Does it make sense to have CI+CD for upgrading contracts?
  • Tests

[core] Add tracing to the lilypad job lifecycle (use open telemetry standard)

We would like to trace the events in the job lifecycle to determine where jobs fail or execute slowly. In a broad view, the job lifecycle starts with a job offer and resource offer that are sent to the solver. The solver matches the offers to create a deal. The resource provider executes the job, and sends the job creator a reference to the results.

We can break this task into a few pieces:

  • Job execution on the resource provider (#293)
  • Sending the job offer
  • Sending the resource offer
  • Delivering results
  • On-chain deal events
  • Likely more

Co-authored by @bgins

Specs: how to correctly match Jobs to GPU sizes

Current workaround: RAM hack (use RAM to determine job matching).

The goal of this task is to define how to do the matching correctly, so that when work gets started, whoever takes on the task has clarity on how to do it.

Escaped json values are awkward

We json encode the string inputs to modules (i.e. the -i in lilypad run cowsay:v0.0.1 -i Message=moo) here: https://github.com/bacalhau-project/lilypad/blob/main/pkg/module/utils.go#L191-L199

Users are not trusted by nodes. Nodes should be able to trust module authors. If user input can make arbitrary changes to the job spec, this breaks the trust relationship.

Hence we want to avoid users doing nefarious things like including newlines and quotes in the untrusted user input.

This means that values in the json template, like https://github.com/bacalhau-project/lilypad-module-cowsay/blob/main/lilypad_module.json.tmpl#L16

like {{.Message}}, don't need to be escaped by the user with double quotes, but instead get automatically quoted for them. (Because the json serialization of a string has double quotes around it).

However, this makes it very awkward to substitute a value into an existing string, which many module authors want to do. In fact, we want to do it ourselves e.g. here: https://github.com/bacalhau-project/lilypad-module-sdxl/blob/main/lilypad_module.json.tmpl#L24

It would be much nicer to write

                    "PROMPT={{if .Prompt}}{{.Prompt}}{{else}}question mark floating in space{{end}}",

instead of

                    {{if .PromptEnv}}{{.PromptEnv}}{{else}}"PROMPT=question mark floating in space"{{end}},

And then the user would be able to specify the prompt like lilypad run sdxl:v0.9-lilypad1 -i Prompt="hoo haw" instead of lilypad run sdxl:v0.9-lilypad1 -i PromptEnv="PROMPT=hoo haw" which is sad and annoying. But of course that would be insecure, because the user could write unquoted newlines and double quotes into PromptEnv and mess up the template.

To solve this, we need a way to securely substitute strings into other strings while still guaranteeing correct JSON escaping.

In text/template, which we use to parse the template you can define custom functions: https://pkg.go.dev/text/template#Template.Funcs

So,

  • define a custom function which allows, given an untrusted user input string and, say, a printf style template string, for the module author to write:
                    {{subst "PROMPT=%s" .Prompt}},

Or something like that, where subst would:

  • JSON decode .Prompt (we still want it encoded by default, for security reasons)
  • printf it into the first given argument
  • JSON encode the resulting string for security

[core] Specs: Market matching

Need to put in research here to figure out how to do something like this, find someone that's done this before and copy them (hopefully clone someone that's been audited)

  • Spec allowing prices to float for jobs instead of being fixed by the network.
  • Once the spec is ready, start development on floating prices

[infra] Unit of effort

Part of: #56

Enable different billing for different size jobs within an individual module (steps = 500 vs steps = 50 - maps one-to-one to execution time)

Keep in mind:

  • Abusable (setting infinite steps on all nodes)
  • Important to prevent abuse as this lets us know how much to pay for executing jobs
