Typically a notebook's author will begin an idea from a blank document in an editable state. Through cycles of interactive computing an author will transform the notebook's data by adding narrative, code, and metadata. The notebook's cells are parts of a whole computable document described by the notebook format.
The interactive in-memory editing mode is a critical but fleeting stage in the life of a computable document. Notebooks spend most of their existence as whole, static files on disk. The static state of a notebook is reusable; and for notebooks to be reusable they must be reused.
Procedural notebooks are readable and reusable literate documents that can be executed successfully in other contexts like documentation, module development, or external jobs. This notebook explores the reusability of procedural notebooks that successfully Restart and Run All.
This literate document can be viewed as a notebook or as a presentation.
Procedural notebooks are inspired by Paco Nathan's Oriole: a new learning medium based on Jupyter + Docker given at Jupyter Day Atlanta 2016. In Paco's unofficial styleguide for authoring Jupyter notebooks he suggests:
clear all output then "Run All" -- or it didn't happen
- ... restart and run all, or they don't. Their reusability can be tested in different contexts.
- ... change over time
- ... encapsulate cycles of non-structured, structured, and literate programming actions.
- ... can be executed in other contexts like testing, document conversion, or compute...
- ... can be tested as parts in interactive mode
- ... can be tested as a whole in a procedural mode
- ... may be used to create sophisticated software projects.
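The "restart and run all, or they don't" test in the list above can be sketched with the standard library alone. This is a minimal illustration, not how Jupyter executes notebooks: it assumes a notebook is JSON with a `cells` list, ignores magics and outputs, and the helper name `run_all` and the two-cell demo notebook are hypothetical.

```python
import json


def run_all(notebook_json: str) -> dict:
    """Execute every code cell of a notebook, in order, in a fresh namespace."""
    notebook = json.loads(notebook_json)
    # A "restarted" kernel starts with an empty namespace.
    namespace = {'__name__': '__main__'}
    for cell in notebook['cells']:
        if cell['cell_type'] != 'code':
            continue
        source = cell['source']
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = ''.join(source)
        # Any exception here means the notebook does not Restart and Run All.
        exec(compile(source, '<cell>', 'exec'), namespace)
    return namespace


# A hypothetical notebook; only the fields used above are included.
demo = json.dumps({'cells': [
    {'cell_type': 'markdown', 'source': 'Define a value, then use it.'},
    {'cell_type': 'code', 'source': 'x = 20'},
    {'cell_type': 'code', 'source': ['y = x + 1\n', 'z = y * 2']},
]})
namespace = run_all(demo)
```

If the two code cells were swapped, the second would raise a `NameError`, which is the kind of hidden ordering dependency this check is meant to expose.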
Its cells Restart and Run All to create a module and Python package called particles:
- Create, describe, and test source code for a project we call particles.
- Copy the source code to a notebook called particles.ipynb
- Convert particles.ipynb to particles.py and a Python package called particles.
particles is inspired by the New York Times R&D The Future of News is Not an Article. particles treat elements of computable documents as data and modular components.
readme.ipynb generates the particles module either in interactive mode, or procedurally from a converted Python script.
attach is a callable used by readme to append the most recent Input as cell source to particles.ipynb; it is not part of the particles module. If the readme.ipynb cells are run out of order then particles.ipynb could be created incorrectly.
from nbformat import v4, NotebookNode
nb, particles = 'particles.ipynb', v4.new_notebook();
def attach(nb:NotebookNode=particles)->None:
    """attach an input to another notebook removing attach statements.
    >>> nb = v4.new_notebook();
    >>> assert attach(nb) or ('doctest' in nb.cells[-1].source)"""
    'In' in globals() and nb.cells.append(v4.new_code_cell('\n'.join(
        str for str in In[-1].splitlines() if not str.startswith('attach'))))
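To see what the line filter inside attach does to a cell's input, here is a small standalone sketch. The helper name `strip_attach` and the sample input are hypothetical; only the filtering rule (drop lines that start with `attach`) comes from the code above.

```python
def strip_attach(source: str) -> str:
    """Drop every line that starts with 'attach' from a cell's source."""
    return '\n'.join(
        line for line in source.splitlines() if not line.startswith('attach'))


# A hypothetical cell input: the attach() call is removed, the function stays.
cell_input = 'attach()\ndef f():\n    return 1'
cleaned = strip_attach(cell_input)
```

Only lines that begin with `attach` at column zero are removed; an indented call would be copied into particles.ipynb as-is.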
%%file requirements.txt
pandas
matplotlib
Overwriting requirements.txt
Many cells in readme.ipynb have lived and died before you read this line.
The code cell below will be appended to particles.ipynb. It __import__s tools into readme.ipynb's interactive mode. It now becomes quite easy to iteratively develop and test parts of the procedural document.
attach(particles)
"""particles treat notebooks as data"""
'particles treat notebooks as data'
attach(particles)
from nbformat import reads, v4
from pandas import concat, DataFrame, to_datetime
from pathlib import Path
Create two main functions for particles to export
attach()
def read_notebooks(dir:str='.')->DataFrame:
    """Read a directory of notebooks into a pandas.DataFrame
    >>> df = read_notebooks('.')
    >>> assert len(df) and isinstance(df, DataFrame)"""
    return concat({
        file: DataFrame(reads(file.read_text(), 4)['cells'])
        for file in Path(dir).glob('*.ipynb')
    }).reset_index(-1, drop=True)
The read_notebooks index is a pathlib object containing extra metadata. files_to_data extracts the stat properties for each file.
attach()
def files_to_data(df:DataFrame)->DataFrame:
    """Transform an index of Paths to a dataframe of os_stat.
    >>> df = files_to_data(read_notebooks())
    """
    stats, index = [], df.index.unique()
    for file in index:
        stat = file.stat()
        stats.append({
            # Time fields become timestamps; *_ns fields are in nanoseconds.
            key: to_datetime(
                getattr(stat, key), unit=key.endswith('s') and key.rsplit('_')[-1] or 's'
            ) if 'time' in key else getattr(stat, key)
            for key in dir(stat) if not key.startswith('_') and not callable(getattr(stat, key))})
    # Append the change in time to the dataframe.
    # Note: st_birthtime is not available on every platform (Linux does not provide it).
    return DataFrame(stats, index).pipe(lambda df: df.join((df.st_mtime - df.st_birthtime).rename('dt')))
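files_to_data relies on `st_birthtime`, which is not present on every platform. A hedged standard-library sketch of the same "time between creation and last modification" idea follows. The helper name `modification_lag` and the fallback to `st_ctime` are assumptions, not part of particles; on Linux `st_ctime` is the last metadata change rather than the creation time, so the result is only an approximation there.

```python
import os
import tempfile
from datetime import timedelta


def modification_lag(path: str) -> timedelta:
    """Return the time between a file's creation and its last modification."""
    stat = os.stat(path)
    # Assumption: fall back to st_ctime where st_birthtime is unavailable.
    created = getattr(stat, 'st_birthtime', stat.st_ctime)
    return timedelta(seconds=stat.st_mtime - created)


# Demonstrate on a throwaway file standing in for a notebook.
with tempfile.NamedTemporaryFile(suffix='.ipynb') as handle:
    lag = modification_lag(handle.name)
```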
A procedural notebook will use clues from a namespace to decide what statements to execute in different contexts.
if __name__ != '__main__': assert __name__+'.py' == __file__
In Jupyter mode, __name__ == '__main__', but nothing is known about the Python object __file__.
In setup mode, __name__ == '__main__' and __file__ is defined.
In module mode, __name__ + '.py' == __file__.
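The namespace clues described above can be collected into one small function. This is a sketch under the conditions used elsewhere in this document (`__name__` and the presence of `__file__`); the function name `detect_mode` and the returned labels are hypothetical.

```python
def detect_mode(namespace: dict) -> str:
    """Classify the execution context from __name__ and __file__."""
    if namespace.get('__name__') != '__main__':
        # Imported by another program.
        return 'module'
    # Run as a script when __file__ is defined, otherwise interactively.
    return 'setup' if '__file__' in namespace else 'jupyter'


# Hypothetical namespaces standing in for globals() in each context.
interactive = detect_mode({'__name__': '__main__'})
script = detect_mode({'__name__': '__main__', '__file__': 'readme.py'})
imported = detect_mode({'__name__': 'particles', '__file__': 'particles.py'})
```

In a real notebook or script the call would be `detect_mode(globals())`.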
The get_ipython context must be manually imported to use magics in converted notebooks.
from IPython import get_ipython
Introspect the interactive Jupyter namespace to control expressions in procedural notebooks.
thing = get_ipython().user_ns.get('thing', 42)
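Outside an IPython session `get_ipython()` returns `None`, so the lookup above would fail in a plain Python process. A defensive variant might look like the following sketch; the helper name `from_user_ns` is hypothetical.

```python
def from_user_ns(name: str, default=None):
    """Look up a name in the interactive namespace, or return a default."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so there is no interactive namespace.
        return default
    shell = get_ipython()
    if shell is None:
        # IPython is installed but this process is not an IPython session.
        return default
    return shell.user_ns.get(name, default)


thing = from_user_ns('thing', 42)
```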
Below are the procedures to test and create the particles package.
- doctests were declared in each of our functions. doctest can be run in an interactive notebook session, unittest cannot. doctest catches a lot of errors when it is in the Restart and Run All pipeline. It is a great place to stash repeatedly typed statements.
- When the tests pass, write the particles.ipynb notebook.
if __name__ == '__main__':
    print(__import__('doctest').testmod())
    Path(nb).write_text(__import__('nbformat').writes(particles))
TestResults(failed=0, attempted=5)
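While iterating on a single function it can be handy to run just that function's doctests rather than the whole module. The sketch below uses the standard `doctest` finder and runner; the function `double` and the helper `run_doctests` are hypothetical examples, not part of particles.

```python
import doctest


def double(x: int) -> int:
    """Double a number.
    >>> double(21)
    42
    """
    return 2 * x


def run_doctests(obj, name: str):
    """Run the doctests found in one object's docstring and return the tally."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # globs supplies the names that the examples are allowed to reference.
    for test in finder.find(obj, name, globs={name: obj}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctests(double, 'double')
```

The returned value has the same `failed` and `attempted` fields as the `TestResults` printed by `doctest.testmod()`.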
- Transform both readme.ipynb and the newly minted particles.ipynb to python scripts.
- Autopep it because we can.
- Rerun the same tests on particles.py
if __name__ == '__main__' and '__file__' not in globals():
    !jupyter nbconvert --to python --TemplateExporter.exclude_input_prompt=True particles.ipynb readme.ipynb
    !autopep8 --in-place --aggressive readme.py particles.py
    !python -m doctest particles.py && echo "success"
    !jupyter nbconvert --to markdown --TemplateExporter.exclude_input_prompt=True readme.ipynb
[NbConvertApp] Converting notebook particles.ipynb to python
[NbConvertApp] Writing 1234 bytes to particles.py
[NbConvertApp] Converting notebook readme.ipynb to python
[NbConvertApp] Writing 10488 bytes to readme.py
success
[NbConvertApp] Converting notebook readme.ipynb to markdown
[NbConvertApp] Writing 10407 bytes to readme.md
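As a rough picture of what converting a notebook to a script involves, the sketch below joins code cells and turns markdown cells into comments using only the standard library. It is an approximation for illustration, not nbconvert's implementation: magics and shell escapes are left untouched, and the helper name `notebook_to_script` and the demo notebook are hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Flatten a notebook into Python source, commenting out markdown."""
    chunks = []
    for cell in json.loads(notebook_json)['cells']:
        source = cell['source']
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = ''.join(source)
        if cell['cell_type'] == 'code':
            chunks.append(source)
        elif cell['cell_type'] == 'markdown':
            chunks.append(
                '\n'.join('# ' + line for line in source.splitlines()))
    return '\n\n'.join(chunks) + '\n'


# A hypothetical notebook; only the fields used above are included.
demo = json.dumps({'cells': [
    {'cell_type': 'markdown', 'source': 'Add two numbers.'},
    {'cell_type': 'code', 'source': ['total = 1 + 2\n', 'print(total)']},
]})
script = notebook_to_script(demo)
```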
- setuptools will install the particles package using the conditions for setup mode.
Install the particles package:
python readme.py develop
if __name__ == '__main__' and '__file__' in globals():
    __import__('setuptools').setup(
        name="particles",
        py_modules=['particles'],
        install_requires=['notebook', 'pandas'])
A notebook that can be imported is reusable.
Particles can now be imported into the current scope. particles allow the user to explore notebooks and their cells as data.
import particles
assert particles.__file__.endswith('.py')
%matplotlib inline
from matplotlib import pyplot as plt
df = particles.read_notebooks()
df.sample(5)
| | cell_type | execution_count | metadata | outputs | source |
|---|---|---|---|---|---|
| readme.ipynb | code | NaN | {} | [] | if __name__ == '__main__':\n print(__import... |
| readme.ipynb | markdown | NaN | {'slideshow': {'slide_type': '-'}} | NaN | ### In Jupyter mode\n\n> **`__name__`** == **`... |
| readme.ipynb | code | NaN | {'collapsed': True} | [] | from IPython import get_ipython |
| particles.ipynb | code | NaN | {} | [] | def files_to_data(df:DataFrame)->DataFrame:\n ... |
| readme.ipynb | markdown | NaN | {} | NaN | > __particles__ is inspired by the New York T... |
df.source.str.split('\n').apply(len).groupby([df.index, df.cell_type]).sum().to_frame('lines of ...').unstack(-1)
| | lines of ... | lines of ... |
|---|---|---|
| cell_type | code | markdown |
| particles.ipynb | 26.0 | NaN |
| readme.ipynb | 66.0 | 108.0 |
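The line counts in the table above come from splitting each cell's source on newlines and summing per cell type. The same tally for a single notebook can be sketched without pandas; the helper name `lines_per_cell_type` and the demo notebook are hypothetical.

```python
import json
from collections import Counter


def lines_per_cell_type(notebook_json: str) -> Counter:
    """Count source lines per cell type in one notebook."""
    counts = Counter()
    for cell in json.loads(notebook_json)['cells']:
        source = cell['source']
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = ''.join(source)
        # Mirrors df.source.str.split('\n').apply(len) in the pandas version.
        counts[cell['cell_type']] += len(source.split('\n'))
    return counts


# A hypothetical notebook; only the fields used above are included.
demo = json.dumps({'cells': [
    {'cell_type': 'markdown', 'source': 'Title\n\nBody'},
    {'cell_type': 'code', 'source': 'a = 1\nb = 2'},
    {'cell_type': 'code', 'source': ['c = 3']},
]})
counts = lines_per_cell_type(demo)
```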
df.cell_type.groupby(df.index).value_counts().unstack('cell_type').apply(lambda df: df.plot.pie() and plt.show());
This document must Restart and Run All to achieve the goal of creating the particles module.
- Procedural notebooks Restart and Run All or they don't; they can be tested.
- Not all notebooks survive, the lucky ones become procedural notebooks.
- Literate procedural notebooks add readability and reusability to reproducible work.
- Procedural notebooks tend to have a longer shelf life than exploratory notebooks.
- Interactive programming is complex and an author will rely on multiple styles of programming to achieve a procedural document.