ibis-project / ibis-tutorial

Ibis tutorial repository

Python 9.40% Dockerfile 0.53% Shell 0.17% Jupyter Notebook 89.90%
analytics data database duckdb ibis postgres pycon2024 python sql tutorial

ibis-tutorial's Issues

Add quick poll at the start to gather details about tutorial attendees

We should gather some data about attendees the next time we give the tutorial. This would be useful both for steering the tutorial toward topics that interest users and for better understanding our potential user base.

Questions like:

  • Job role (data scientist, data engineer, etc...)
  • Field or industry (geospatial, finance, ...)
  • Why did you sign up for this tutorial?
  • What are you hoping to learn?

We might use a real-time polling service (post a link, then watch the results come in live). Here's one option, but there are certainly others.

Tutorial Feedback from PyCon US

Participant feedback

  • Introduce joins better and before we first use them
  • Maybe improve join documentation in ibis itself
  • Add mapping to Table.rename docstring
  • Don't use JupyterLab; just use Codespaces directly
  • More ibis context at the beginning
    • Why ibis?
    • Ibis is open source
    • How to install
    • Where does ibis fit in the landscape of data tools?
      • How does it differ from other tools like SQLAlchemy?
    • What pain points does it solve?
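For the feedback about introducing joins before first using them, here is a minimal sketch of what a join does. It uses plain SQL through Python's built-in sqlite3 so it runs without ibis installed; that substitution is an assumption for illustration, and ibis's Table.join expresses the same idea against a real backend. The tables and values are made up.

```python
import sqlite3

# In-memory database standing in for a tutorial backend (illustrative only).
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE packages (name TEXT, author_id INTEGER);
    CREATE TABLE authors (id INTEGER, author TEXT);
    INSERT INTO packages VALUES ('pkg_a', 1), ('pkg_b', 2), ('pkg_c', 99);
    INSERT INTO authors VALUES (1, 'author_1'), (2, 'author_2');
    """
)

# Inner join: only rows whose key appears in both tables survive.
inner = con.execute(
    "SELECT p.name, a.author FROM packages p "
    "JOIN authors a ON p.author_id = a.id ORDER BY p.name"
).fetchall()

# Left join: every package is kept; a missing match becomes NULL (None in Python).
left = con.execute(
    "SELECT p.name, a.author FROM packages p "
    "LEFT JOIN authors a ON p.author_id = a.id ORDER BY p.name"
).fetchall()

print(inner)  # [('pkg_a', 'author_1'), ('pkg_b', 'author_2')]
print(left)   # [('pkg_a', 'author_1'), ('pkg_b', 'author_2'), ('pkg_c', None)]
```

The unmatched `pkg_c` row is the detail attendees tend to trip over: whether it appears depends entirely on the join type.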

Our Feedback

  • Design slides/intro better to cover setup issues/late arrivals
    so by the time we start executing things everyone should be
    ready.
  • Maybe move to using postgres for notebook 1 instead of
    duckdb to better differentiate client & backend architecture.
  • Hammer more on how performant duckdb is
  • Notebook 1 also shows the to_* methods; maybe de-emphasize
    them there in favor of notebook 2.
  • Introduction of _ in notebook 2 feels a bit off. It works, but
    it's distracting.
    • Maybe add at the end of notebook 1 after doing a complex
      compound expression (motivation) to close out section on
      expressions.
  • Perhaps we want to redesign notebook 2 to be more about data
    import/export rather than "python ecosystem" things? Could talk
    more here about the ibis.read_* methods as well as
    create_table maybe.
  • Notebook 3 still downloads the data when that's already done in
    notebook 0.
  • All downloads in notebook 0 should be scripted so they don't
    need to be repeated once they've been done.
  • Better mix exercises in notebook 1
  • Don't use %load for solutions; instead use HTML
    <details>/<summary> tags to make dropdowns in a Markdown cell.
  • Better contrast on slides (no darkmode for projectors)
  • At end of notebook 1 it might be good to poke around live at tab
    completion/existence of other methods on ibis.Table
    • Maybe have an exercise where they poke around and try a
      method we didn't talk about?
  • Add more "optional extras" to exercises to better occupy speedy
    students.
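For scripting the notebook 0 downloads, a minimal sketch of an idempotent download step. `ensure_downloaded` is a hypothetical helper, not something in the repository; the actual fetching is passed in as a callable so the skip-if-present logic can be checked without network access.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_downloaded(dest: Path, fetch: Callable[[Path], None]) -> bool:
    """Call fetch() only if dest is missing; return True if it downloaded."""
    if dest.exists():
        # Already fetched on a previous run: skip the download.
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name, then rename, so a half-written file
    # is never mistaken for a finished one.
    tmp = dest.with_name(dest.name + ".part")
    fetch(tmp)
    tmp.replace(dest)
    return True


# Usage with a fake fetcher that records calls instead of hitting the network.
calls = []


def fake_fetch(path: Path) -> None:
    calls.append(path)
    path.write_text("placeholder data")


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "data" / "example.parquet"  # hypothetical file name
    first = ensure_downloaded(target, fake_fetch)   # downloads
    second = ensure_downloaded(target, fake_fetch)  # skipped, file exists
```

In a notebook, `fetch` could wrap `urllib.request.urlretrieve(url, path)` with the real data URL (not given here). Writing to a `.part` file first means an interrupted download is retried on the next run instead of being treated as done.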

refactor: things to mention in intro

Some common themes that have come up during previous presentations that we could address directly at the beginning:

  1. For nearly all backends, memory is effectively no longer a consideration, so there is no nbytes to check: the limiting factor is your hard drive (or the hard drive of the system you are connecting to)
  2. Assumptions we all make when we think of dataframes, because of the implicit ordering in in-memory data structures:
    a. positional joining -- I just want to smash these two tables together (you can, but it's trickier than you expect)
    b. why can't I operate across these two columns in different tables?
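To illustrate point 2a, a sketch of a positional join in plain SQL, using Python's built-in sqlite3 (window functions need SQLite 3.25 or newer). Because tables carry no implicit row order, you first have to choose an ordering and number the rows, then join on that number; once the rows are paired, an expression can span columns from both tables (point 2b). The tables are made up, and in ibis the analogous building block would be `ibis.row_number()`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE left_t (x TEXT);
    CREATE TABLE right_t (y INTEGER);
    INSERT INTO left_t VALUES ('a'), ('b'), ('c');
    INSERT INTO right_t VALUES (10), (20), (30);
    """
)

# "Row 1 with row 1" is undefined until we pick an explicit ordering and
# number the rows ourselves with ROW_NUMBER(); then we join on that number.
rows = con.execute(
    """
    WITH l AS (SELECT x, ROW_NUMBER() OVER (ORDER BY x) AS rn FROM left_t),
         r AS (SELECT y, ROW_NUMBER() OVER (ORDER BY y) AS rn FROM right_t)
    SELECT l.x, r.y, l.x || ':' || r.y AS combined  -- spans both tables
    FROM l JOIN r ON l.rn = r.rn
    ORDER BY l.rn
    """
).fetchall()

print(rows)  # [('a', 10, 'a:10'), ('b', 20, 'b:20'), ('c', 30, 'c:30')]
```

Changing either `ORDER BY` changes which rows get paired, which is exactly why "just smash them together" is trickier than it looks.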

More TODOS:

  • Add a note about Zulip
  • Add a QR code linking to viewable copy of slides for those that want them
  • Add a rough schedule to intro slides

refactor(tutorial): pycon 2024 todos

TODO:

Postgres (Phillip)

Set up a codespace with recent-ish PyPI data in a postgres instance inside the codespace

We could plan to let users use either the codespace or their local machine for the first few notebooks (with limited support from us)
Codespace required for PyPI exercises b/c a postgres instance is required

Intro QMD for how to navigate the codespace with links to the notebooks

Stop installing the entirety of the dev tools; mostly just ibis + duckdb, postgres, polars, altair

PyPI (Gil)

Update to work with ibis 9.0
Get textblob with recursive CTE for .sql example

Ibis is not an island (Jim)

  • multiple in-memory formats as inputs
  • multiple in-memory formats as outputs
  • __array__ and __dataframe__ protocols
  • tools with direct downstream integration

Move deferred operator (Gil)

motivated by chaining longer expressions, also peek at internals
intermediate variable vs. chained expression: same op tree
huzzah

leads into extra syntactic sugar of selectors
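A toy sketch for the "same op tree" point, assuming nothing about ibis internals: every method returns a new immutable node wrapping the previous one, and the `_` placeholder here only records column names. Building a query through an intermediate variable or through one chain yields equal trees. Real ibis `_` expressions are resolved against the table they are passed to, which this toy skips; `ColRef`, `Deferred`, and `Table` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ColRef:
    """Unbound reference to a column by name (toy stand-in, not ibis code)."""
    name: str

    def __gt__(self, other: Any) -> tuple:
        # Record the comparison as a tree node instead of evaluating it.
        return ("gt", self.name, other)


class Deferred:
    """Toy `_` placeholder: attribute access yields a ColRef."""

    def __getattr__(self, name: str) -> ColRef:
        return ColRef(name)


_ = Deferred()


@dataclass(frozen=True)
class Table:
    """Immutable expression node; each method wraps the previous tree."""
    op: tuple

    def filter(self, predicate: tuple) -> Table:
        return Table(("filter", self.op, predicate))

    def select(self, *names: str) -> Table:
        return Table(("select", self.op, names))


t = Table(("source", "t"))

# Chained form.
chained = t.filter(_.a > 1).select("a", "b")

# Same query built through an intermediate variable.
filtered = t.filter(_.a > 1)
stepwise = filtered.select("a", "b")

print(chained == stepwise)  # True
print(chained.op)
```

Because nodes are immutable and compared structurally, naming an intermediate step changes nothing about the resulting expression, which is the motivation for showing `_` alongside longer chains.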

Collecting previous questions for intro front-matter in tutorial (Naty)

Good questions from PyData NYC 2023

UDFs

move UDFs to end or out entirely, or just sprinkle in PyPI exercises as necessary

quarto convert '01 - Getting Started.qmd'
quarto convert '01 - Getting Started.ipynb'

Convert intro slides to Quarto (Gil)

Scipy 2024 - swap pypi data

Opening this issue to discuss changing the PyPI dataset for SciPy, since in our tutorial proposal for SciPy we said:

We are looking at swapping out the PyPI data exploration exercise for a more applicable set of problems for the SciPy audience and are currently vetting available datasets. The purpose of those exercises is to bring together all of the various methodologies covered in the previous sections and to demonstrate more realistic end-to-end data analysis problems. Our goal is that even if the particular problem set isn't a perfect match with an attendee's field of study, that the lessons learned will be easily transferable to other data domains.

We also promise a small section showcasing some of the geospatial features.

For the geospatial part we can use some of the datasets that we use in the geospatial blog posts. But I'm not sure what would be a good replacement for the PyPI part.
