Comments (7)

richardliaw commented on June 26, 2024

Will keep this issue open to make sure we write up a full example though!

from tune-sklearn.

richardliaw commented on June 26, 2024

Great question :) It is entirely possible to deploy tune-sklearn on spot instances, just as you would with standard Ray code.

We can put together a full example later (or simply add it to the tune documentation). I think failure handling should work out of the box.

In your script:

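# tune_search.py (avoid naming the script sklearn.py, which would shadow the sklearn package)
import ray
from tune_sklearn import TuneSearchCV

ray.init(address="auto")  # connect to the already-running Ray cluster
clf = TuneSearchCV(...)
clf.fit(X, y)  # X, y: your training data

and

ray up <yaml>
ray submit <yaml> tune_search.py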
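
The two commands above can also be driven from a small Python launcher. This is only a sketch under assumptions: the helper names `build_commands` and `launch` are made up for illustration, the file names in the usage line are placeholders, and it assumes the `ray` CLI is on your PATH. By default it does a dry run and only prints what it would execute.

```python
import shlex
import subprocess


def build_commands(cluster_yaml, script):
    """Return the two CLI invocations from the comment above as argument lists."""
    return [
        ["ray", "up", cluster_yaml],              # start (or update) the cluster
        ["ray", "submit", cluster_yaml, script],  # copy the script to the head node and run it
    ]


def launch(cluster_yaml, script, dry_run=True):
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    commands = build_commands(cluster_yaml, script)
    for cmd in commands:
        print("$ " + shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the command exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: only prints the commands, nothing is launched.
launch("ray_cluster.yaml", "your_script.py")
```

Passing `dry_run=False` would actually run the commands; `ray up` may then ask for confirmation on the terminal.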

rohan-gt commented on June 26, 2024

Ah amazing!

rohan-gt commented on June 26, 2024

Reopened the issue for the full example. I'm still not completely sure how checkpointing and fault tolerance would be implemented using the tune-sklearn API.

richardliaw commented on June 26, 2024

Ah, I think tune-sklearn should actually handle that for you automatically.

richardliaw commented on June 26, 2024

I just looked into this, and I think some Ray Tune parameters are not yet supported by tune-sklearn (e.g., tune.run(max_retries=X)), so I'll submit a PR to add them.

Hypothetically, you can do something like this:

  1. Start with a YAML
# Save as `ray_cluster.yaml`
cluster_name: ray-tune-cluster
provider: {type: aws, region: us-west-2}
auth: {ssh_user: ubuntu}
min_workers: 3
max_workers: 3

# Deep Learning AMI (Ubuntu) Version 21.0
head_node:
  InstanceType: c5.xlarge
  ImageId: ami-0b294f219d14e6a82
  InstanceMarketOptions:
    MarketType: spot

worker_nodes:
  InstanceType: c5.xlarge
  ImageId: ami-0b294f219d14e6a82
  InstanceMarketOptions:
    MarketType: spot
  2. In any Tune script, you should be able to change 2 lines:
+ray.init(address="auto")
-search = TuneSearchCV(...)
+search = TuneSearchCV(..., max_retries=3)
  3. Run this on a Ray cluster. You'll be able to leverage multiple nodes for your search:
$ ray submit ray_cluster.yaml <your_script.py>

The above submit command is equivalent to:

$ ray attach ray_cluster.yaml  # ssh
ubuntu@ip-172-31-24-53:~$ python <your_script.py>

It will also (1) retry failed trials and (2) restart failed nodes. @rohan-gt is this what you're thinking about?
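
To make "retry failed trials" concrete, here is a minimal pure-Python sketch of retry-up-to-N semantics. It is not how Ray Tune or tune-sklearn implement this: `run_with_retries` and `FlakyTrial` are hypothetical names, the score is made up, and the simulated failure stands in for a lost spot node. Ray Tune's own recovery additionally restores a trial from its latest checkpoint when one exists, which this sketch does not model.

```python
def run_with_retries(trial, max_retries=3):
    """Call trial() until it succeeds or max_retries retries are used up.

    Returns (result, attempts). Re-raises the last error if every attempt fails.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return trial(), attempts
        except Exception:
            # One initial attempt plus max_retries retries are allowed.
            if attempts > max_retries:
                raise


class FlakyTrial:
    """Hypothetical trial that fails a fixed number of times before succeeding."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success

    def __call__(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("node was preempted")  # simulated spot interruption
        return 0.93  # made-up score, for illustration only


result, attempts = run_with_retries(FlakyTrial(2), max_retries=3)
print(result, attempts)  # prints: 0.93 3
```

A trial that fails more often than `max_retries` allows would surface its last error to the caller instead of returning a result.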

rohan-gt commented on June 26, 2024

@richardliaw yes this is fine
