
Comments (2)

ncclementi commented on July 26, 2024

Thank you @scharlottej13 for opening this issue and starting the conversation.

Currently the tutorial is a bit long: it takes between 1.5 and 2 hours to complete, and some topics can probably be removed to make the material easier to digest.

Since this tutorial is meant to be introductory, I think we can make the following cuts:

  • Cut the Futures section
    • This is a more advanced topic, so we might want to just remove the section.
  • Skip the Single Machine vs Distributed Schedulers explanation, since we recommend that people use the distributed scheduler
    • This implies redesigning the notebooks to use the distributed scheduler from the beginning, which will also help expose the dashboard earlier.
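As a rough sketch of what "distributed scheduler from the beginning" could look like in a notebook's first cell: the snippet below starts a local `Client`, prints the dashboard link, and runs a tiny array computation. The `processes=False` argument and the array sizes are assumptions chosen only to keep the example lightweight; they are not taken from the existing notebooks.

```python
import dask.array as da
from dask.distributed import Client

# Assumption for this sketch: processes=False keeps the scheduler and
# workers as threads inside the current process, so startup is quick.
client = Client(processes=False)

# Show the dashboard URL right away so attendees can open it early.
dashboard_url = client.dashboard_link
print("Dashboard:", dashboard_url)

# With a Client active, compute() runs on the distributed scheduler.
x = da.ones((1000, 1000), chunks=(250, 250))
total = x.sum().compute()
print("Sum:", total)

client.close()
```

Starting every notebook this way would let the scheduler comparison be dropped without leaving attendees unsure which scheduler they are running on.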

cc: @pavithraes @rrpelgrim, as you have taught similar tutorials: what's your experience with these topics? Is there anything else that might be confusing or too advanced for beginners?

from dask-mini-tutorial.

avriiil commented on July 26, 2024

Thanks for starting this conversation @scharlottej13!

I agree with @ncclementi. I think the tutorial atm tries to be too exhaustive and complete (lots of task graphs, starting off with Delayed, etc.), which can be intimidating for novice users. I think we should move towards wowing people with the power of Dask first, and only then explain how it works.

The analogy we're using in evangelism atm is that we want to show people a shiny race car, get them to step in and take it for a test drive (no mechanic skills or understanding of the inner workings of the engine needed here), and then be super impressed by the results. If at that point people are like - "Hey, how does this actually work?" or "Hey, can I take out the engine and build my own car/hovercraft/spaceship?"...then we can dive into that.

With that in mind, what I've been doing is:

  1. Start with a no-code slide deck to build intuition and excitement around what Dask is and the problems it can solve for you -- ~10 minutes

-- move to notebooks --

  1. Start with a quick, flashy 'showing off' of the various Dask race cars: dask.dataframe to scale pandas, dask.array to scale NumPy, Dask-ML to scale scikit-learn, and a very quick sneak peek into the engine with a simple dask.delayed example (to tease any intermediate/expert users in the room) -- ~10 minutes

  2. Then jump into the dask.dataframe car and take it for a test drive. Show them how to move from pandas to Dask and how to control that car (API, etc.) -- ~20 minutes

  3. Then jump into the Dask-ML car and take it for a drive -- ~15 minutes

  4. Then say - "Cool stuff, right? Do you want to know how this works?" and talk a little more about delayed -- ~10 minutes

  5. Skip Schedulers and Futures

  6. Q&A

My alternative layout of the notebooks lives here for now, but I would like to combine efforts and end up with a set of 'master' notebooks and slides in this coiled/dask-mini-tutorial repo that we can then fork whenever we give a presentation.
https://github.com/coiled/coiled-resources/tree/main/dask-tutorial/notebooks

My non-code slides live here -- these need iteration; I'm not totally convinced by my own narrative line on this one:
https://docs.google.com/presentation/d/1BMhxuTuOg1jRYFANDvbb-GNszpyH-JKPKnbGEsO5GtQ/edit?usp=sharing

We should also refresh the longer data-science-at-scale tutorial with some of these messaging strategies in mind.

Curious what @MrPowers thinks based on his meetup experiences.

