tonyfast / restartable

some thoughts on restart and run all

Home Page: http://nbviewer.jupyter.org/format/slides/github/tonyfast/restartable/blob/master/readme.ipynb

Procedural Notebooks

Typically a notebook's author begins an idea from a blank document in an editable state. Through cycles of interactive computing the author transforms the notebook's data by adding narrative, code, and metadata. The notebook's cells are parts of a whole computable document described by the notebook format.

The interactive in-memory editing mode is a critical, but fleeting, stage in the life of a computable document. Notebooks spend most of their existence as whole, static files on disk. The static state of a notebook is reusable; and for notebooks to be reusable they must be reused.

Procedural notebooks are readable and reusable literate documents that can be executed successfully in other contexts like documentation, module development, or external jobs. This notebook explores the reusability of procedural notebooks that successfully Restart and Run All.
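On disk a notebook is plain JSON, so treating it as data needs nothing beyond the standard library. A minimal sketch (the cell contents here are invented for illustration):

```python
import json

# A minimal notebook document as stored on disk (nbformat v4).
nb_source = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# A title"},
        {"cell_type": "code", "metadata": {}, "source": "print('hi')",
         "execution_count": None, "outputs": []},
    ],
})

# The static file is just data: cells are dictionaries we can inspect.
nb = json.loads(nb_source)
code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
print(len(nb["cells"]), len(code_cells))  # 2 cells, 1 of them code
```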

This literate document can be viewed as a notebook, as a presentation, or on GitHub, and can be run on Binder.

Motivation

Procedural notebooks are inspired by Paco Nathan's Oriole: a new learning medium based on Jupyter + Docker given at Jupyter Day Atlanta 2016. In Paco's unofficial styleguide for authoring Jupyter notebooks he suggests:

clear all output then "Run All" -- or it didn't happen

Procedural notebooks

  • ... restart and run all, or they don't. Their reusability can be tested in different contexts.
  • ... change over time
  • ... encapsulate cycles of non-structured, structured, and literate programming actions.
  • ... can be executed in other contexts like testing, document conversion, or compute...
  • ... can be tested as parts in interactive mode
  • ... can be tested as a whole in a procedural mode
  • ... may be used to create sophisticated software projects.

This notebook is a procedural notebook

Its cells Restart and Run All to create a module and python package called particles:

  • Create, describe, and test source code for a project we call particles.
  • Copy the source code to a notebook called particles.ipynb
  • Convert particles.ipynb to particles.py and a Python package called particles.

particles is inspired by the New York Times R&D The Future of News is Not an Article. particles treat elements of computable documents as data and modular components.

Procedurally create the particles module

readme.ipynb generates the particles module either in interactive mode, or procedurally from a converted Python script.

attach is a callable used by readme to append the most recent input as cell source to particles.ipynb; it is extraneous to the particles module itself. If the readme.ipynb cells are run out of order, then particles.ipynb could be created incorrectly.

from nbformat import v4, NotebookNode
nb, particles = 'particles.ipynb', v4.new_notebook();
def attach(nb:NotebookNode=particles)->None:
    """attach an input to another notebook removing attach statements.
    >>> nb = v4.new_notebook();
    >>> assert attach(nb) or ('doctest' in nb.cells[-1].source)"""
    'In' in globals() and nb.cells.append(v4.new_code_cell('\n'.join(
        str for str in In[-1].splitlines() if not str.startswith('attach'))))

%%file requirements.txt
pandas
matplotlib
Overwriting requirements.txt
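The filtering attach performs can be reproduced without IPython: given a cell's source, drop the lines that invoke attach before storing the cell. A stdlib-only sketch:

```python
def strip_attach(source: str) -> str:
    """Drop lines invoking attach() so the bookkeeping call
    is not copied into the generated notebook."""
    return '\n'.join(
        line for line in source.splitlines()
        if not line.startswith('attach')
    )

cell = "attach(particles)\nx = 1\nprint(x)"
print(strip_attach(cell))  # the attach(...) line is dropped
```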

build particles.ipynb

Many cells in readme.ipynb have lived and died before you read this line.

The code cell below will be appended to particles.ipynb. It __import__s tools into readme.ipynb's interactive mode, making it easy to iteratively develop and test parts of the procedural document.

attach(particles)
"""particles treat notebooks as data"""
'particles treat notebooks as data'
attach(particles)
from nbformat import reads, v4 
from pandas import concat, DataFrame, to_datetime 
from pathlib import Path 

callables in particles

Create two main functions for particles to export

attach()
def read_notebooks(dir:str='.')->DataFrame:
    """Read a directory of notebooks into a pandas.DataFrame
    >>> df = read_notebooks('.')
    >>> assert len(df) and isinstance(df, DataFrame)"""
    return concat({
        file: DataFrame(reads(file.read_text(), 4)['cells'])
        for file in Path(dir).glob('*.ipynb')
    }).reset_index(-1, drop=True)
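Under the DataFrame, read_notebooks is a glob plus one JSON parse per file; a stdlib-only approximation, demonstrated against a throwaway notebook in a temporary directory:

```python
import json
import tempfile
from pathlib import Path

def read_cells(dir: str = '.') -> dict:
    """Map each notebook path to its list of cell dictionaries,
    roughly what read_notebooks concatenates into a DataFrame."""
    return {
        file: json.loads(file.read_text())['cells']
        for file in Path(dir).glob('*.ipynb')
    }

# Demonstrate on a throwaway notebook file.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"nbformat": 4, "nbformat_minor": 5, "metadata": {},
            "cells": [{"cell_type": "code", "metadata": {}, "source": "1 + 1",
                       "execution_count": None, "outputs": []}]}
    (Path(tmp) / 'demo.ipynb').write_text(json.dumps(demo))
    cells = read_cells(tmp)
    print(len(cells), sum(len(c) for c in cells.values()))  # 1 notebook, 1 cell
```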

Each entry of the read_notebooks index is a pathlib.Path object containing extra metadata. files_to_data extracts the stat properties for each file.

attach()
def files_to_data(df:DataFrame)->DataFrame:
    """Transform an index of Path's to a dataframe of os_stat.
    >>> df = files_to_data(read_notebooks())
    """
    stats, index = [], df.index.unique()
    for file in index:
        stat = file.stat() 
        stats.append({
            key: to_datetime(
                getattr(stat, key), unit=key.endswith('s') and key.rsplit('_')[-1] or 's'
            ) if 'time' in key else getattr(stat, key)
            for key in dir(stat) if not key.startswith('_') and not callable(getattr(stat, key))})
    # Append the change in time to the dataframe.
    return DataFrame(stats, index).pipe(lambda df: df.join((df.st_mtime - df.st_birthtime).rename('dt')))
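The conversion files_to_data applies to the st_*time fields can be sketched with the standard library alone (this sketch handles only the second-resolution fields, not the *_ns variants the original also converts):

```python
import os
from datetime import datetime, timezone

def stat_times(path: str) -> dict:
    """Extract the st_*time fields of os.stat as timezone-aware datetimes,
    the same conversion files_to_data performs with pandas.to_datetime."""
    stat = os.stat(path)
    return {
        key: datetime.fromtimestamp(getattr(stat, key), tz=timezone.utc)
        for key in dir(stat)
        if key.startswith('st_') and key.endswith('time')
    }

times = stat_times('.')
print(sorted(times))  # e.g. ['st_atime', 'st_ctime', 'st_mtime']
```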

Control Flow in Procedural Notebooks

A procedural notebook will use clues from its namespace to decide which statements to execute in different contexts.

if __name__ != '__main__': assert __name__+'.py' == __file__

In Jupyter mode

__name__ == '__main__', but nothing is known about the python object __file__.

In Setup mode

__name__ == '__main__' and __file__ is defined.

In Python package mode

__name__ + '.py' == __file__.
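The three modes can be told apart with one function; a sketch of the dispatch the document relies on (the mode names are this sketch's own):

```python
def execution_mode(name: str, namespace: dict) -> str:
    """Classify the context a procedural notebook runs in,
    from __name__ and the presence of __file__."""
    has_file = '__file__' in namespace
    if name == '__main__' and not has_file:
        return 'jupyter'   # interactive kernel: no __file__
    if name == '__main__':
        return 'setup'     # run as a script, e.g. python readme.py
    return 'module'        # imported: __name__ + '.py' == __file__

print(execution_mode('__main__', {}))                             # jupyter
print(execution_mode('__main__', {'__file__': 'readme.py'}))      # setup
print(execution_mode('particles', {'__file__': 'particles.py'}))  # module
```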

get_ipython

The get_ipython context must be manually imported to use magics in converted notebooks.

    from IPython import get_ipython

Controlling value assignment

Introspect the interactive Jupyter namespace to control expressions in procedural notebooks.

thing = get_ipython().user_ns.get('thing', 42)
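The interactive namespace get_ipython().user_ns is an ordinary dict, so the default-on-missing pattern can be sketched with a plain dict standing in for it:

```python
# A stand-in for get_ipython().user_ns; in a live kernel this dict
# holds every name defined interactively.
user_ns = {}

# Take the interactive value if one was assigned, else the default.
thing = user_ns.get('thing', 42)
assert thing == 42

user_ns['thing'] = 7   # as if a user had run `thing = 7` in a cell
thing = user_ns.get('thing', 42)
assert thing == 7
```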

readme procedures to make particles

Below are the procedures to test and create the particles package.

  • doctests were declared in each of our functions. doctest can be run in an interactive notebook session, unittest cannot.

    doctest catches a lot of errors when it is in the Restart and Run All pipeline. It is a great place to stash repeatedly typed statements.

  • When the tests pass write the particles.ipynb notebook.

if __name__ == '__main__':
    print(__import__('doctest').testmod())
    Path(nb).write_text(__import__('nbformat').writes(particles))
TestResults(failed=0, attempted=5)
  • Transform both readme.ipynb and the newly minted particles.ipynb to python scripts.
  • Autopep it because we can.
  • Rerun the same tests on particles.py
if __name__ == '__main__' and '__file__' not in globals():
    !jupyter nbconvert --to python --TemplateExporter.exclude_input_prompt=True particles.ipynb readme.ipynb
    !autopep8 --in-place --aggressive readme.py particles.py
    !python -m doctest particles.py && echo "success"
    !jupyter nbconvert --to markdown --TemplateExporter.exclude_input_prompt=True readme.ipynb
[NbConvertApp] Converting notebook particles.ipynb to python
[NbConvertApp] Writing 1234 bytes to particles.py
[NbConvertApp] Converting notebook readme.ipynb to python
[NbConvertApp] Writing 10488 bytes to readme.py
success
[NbConvertApp] Converting notebook readme.ipynb to markdown
[NbConvertApp] Writing 10407 bytes to readme.md
  • setuptools will install the particles package using the conditions for setup mode.

    Install the particles package

    python readme.py develop

    if __name__ == '__main__' and '__file__' in globals():
        __import__('setuptools').setup(
            name="particles", 
            py_modules=['particles'], 
            install_requires=['notebook', 'pandas'])

Reusing particles

A notebook that can be imported is reusable.

particles can now be imported into the current scope; it allows the user to explore notebooks and their cells as data.

import particles
assert particles.__file__.endswith('.py')
%matplotlib inline
from matplotlib import pyplot as plt
df = particles.read_notebooks()
df.sample(5)
                 cell_type  execution_count  metadata                            outputs  source
readme.ipynb     code       NaN              {}                                  []       if __name__ == '__main__':\n print(__import...
readme.ipynb     markdown   NaN              {'slideshow': {'slide_type': '-'}}  NaN      ### In Jupyter mode\n\n> **`__name__`** == **`...
readme.ipynb     code       NaN              {'collapsed': True}                 []       from IPython import get_ipython
particles.ipynb  code       NaN              {}                                  []       def files_to_data(df:DataFrame)->DataFrame:\n ...
readme.ipynb     markdown   NaN              {}                                  NaN      > __particles__ is inspired by the New York T...

Quantifying lines of code

df.source.str.split('\n').apply(len).groupby([df.index, df.cell_type]).sum().to_frame('lines of ...').unstack(-1)
                 lines of ...
cell_type        code   markdown
particles.ipynb  26.0   NaN
readme.ipynb     66.0   108.0

The distribution of markdown and code cells in the particles project.

    
    df.cell_type.groupby(df.index).value_counts().unstack('cell_type').apply(lambda df: df.plot.pie() and plt.show());

(pie charts of the cell-type distribution for particles.ipynb and readme.ipynb)

Summary

This document must Restart and Run All to achieve the goal of creating the particles module.

  • Procedural notebooks Restart and Run All or they don't; either way, they can be tested.
  • Not all notebooks survive; the lucky ones become procedural notebooks.
  • Literate procedural notebooks add readability and reusability to reproducible work.
  • Procedural notebooks tend to have a longer shelf life than exploratory notebooks.
  • Interactive programming is complex, and an author will rely on multiple styles of programming to achieve a procedural document.

