kynan / nbstripout

strip output from Jupyter and IPython notebooks

License: Other

Python 62.47% Shell 0.90% Jupyter Notebook 36.63%
filter git hacktoberfest hooks ipython ipython-notebook jupyter jupyter-notebook

nbstripout's Introduction

nbstripout: strip output from Jupyter and IPython notebooks

Reads a notebook from a file or stdin, strips output and some metadata, and writes the "cleaned" version of the notebook to the original file or stdout.

Intended to be used as a Git filter or pre-commit hook for users who don't want to track output in Git.

Roughly equivalent to the "Clear All Output" command in the notebook UI, but only "visible" to Git: keep your output in the file on disk, but don't commit the output to Git. This helps minimize diffs and reduce file size.
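The effect on a notebook's JSON can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not nbstripout's actual implementation: the helper name clear_outputs is hypothetical, and the sketch assumes the nbformat 4 layout in which code cells carry "outputs" and "execution_count" fields.

```python
import copy
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 style notebook dict without cell outputs."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop everything the cell displayed
            cell["execution_count"] = None  # reset the prompt number
    return cleaned


notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_outputs(notebook)
print(json.dumps(cleaned["cells"][0], indent=1))
```

Used as a Git filter, a transformation like this is applied only to what Git sees, so the copy on disk keeps its output.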

Originally based on https://gist.github.com/minrk/6176788.

Python 3 only

As of version 0.4.0, nbstripout supports Python 3 only. If you need to use Python 2.7, install nbstripout 0.3.10:

pip install nbstripout==0.3.10

Screencast

This screencast demonstrates the use and working principles behind the nbstripout utility and how to use it as a Git filter:

image

Installation

You can download and install the latest version of nbstripout from the Python package index PyPI as follows:

pip install --upgrade nbstripout

When using the Anaconda Python distribution, install nbstripout via the conda package manager from conda-forge:

conda install -c conda-forge nbstripout

Usage

Strip output from IPython / Jupyter / Zeppelin notebook (modifies the file in-place):

nbstripout FILE.ipynb [FILE2.ipynb ...]
nbstripout FILE.zpln

Force processing of non .ipynb files:

nbstripout -f FILE.ipynb.bak

To use Zeppelin mode when processing files with other extensions:

nbstripout -m zeppelin -f <file.ext>

Write to stdout e.g. to use as part of a shell pipeline:

cat FILE.ipynb | nbstripout > OUT.ipynb
cat FILE.zpln | nbstripout -m zeppelin > OUT.zpln

or

nbstripout -t FILE.ipynb | other-command

Set up the git filter and attributes as described in the manual installation instructions below:

nbstripout --install

Set up the git filter using .gitattributes:

nbstripout --install --attributes .gitattributes

Specify a different path to the Python interpreter to be used for the git filters (default is the path to the Python interpreter used when nbstripout is installed). This is useful if you have Python installed in different or unusual locations across machines, e.g. /usr/bin/python3 on your machine vs /usr/local/bin/python3 in a container or elsewhere.

nbstripout --install --python python3

Using just python3 lets each machine find its Python itself. However, keep in mind that depending on your setup this might not be the Python version you want, or it might even fail because an absolute path is required.

Set up the git filter in your global ~/.gitconfig:

nbstripout --install --global

Set up the git filter in your system-wide $(prefix)/etc/gitconfig (most installations will require you to sudo):

[sudo] nbstripout --install --system

Remove the git filter and attributes:

nbstripout --uninstall

Remove the git filter from your global ~/.gitconfig and attributes:

nbstripout --uninstall --global

Remove the git filter from your system-wide $(prefix)/etc/gitconfig and attributes:

[sudo] nbstripout --uninstall --system

Remove the git filter and attributes from .gitattributes:

nbstripout --uninstall --attributes .gitattributes

Check if nbstripout is installed in the current repository (exits with code 0 if installed, 1 otherwise):

nbstripout --is-installed

Print status of nbstripout installation in the current repository and configuration summary of filter and attributes if installed (exits with code 0 if installed, 1 otherwise):

nbstripout --status

Do a dry run and only list which files would have been stripped:

nbstripout --dry-run FILE.ipynb [FILE2.ipynb ...]

Print the version:

nbstripout --version

Show help and usage instructions:

nbstripout --help

Configuration files

The following table shows the files to which the nbstripout filter and attribute configuration are written for the given extra flags to --install and --uninstall:

flags                                  filters                  attributes
none                                   .git/config              .git/info/attributes
--global                               ~/.gitconfig             ~/.config/git/attributes
--system                               $(prefix)/etc/gitconfig  $(prefix)/etc/gitattributes
--attributes=.gitattributes            .git/config              .gitattributes
--global --attributes=.gitattributes   ~/.gitconfig             .gitattributes

Install globally

Usually, nbstripout is installed per repository so you can choose where to use it. You can set the attributes in .gitattributes and commit this file to your repository; however, there is no way to have git set up the filters automatically when someone clones a repository. This is by design, to prevent you from executing arbitrary and potentially malicious code when cloning a repository.

To install nbstripout for all your repositories such that you no longer need to run the installation once per repository, install as follows:

mkdir -p ~/.config/git  # This folder may not exist
nbstripout --install --global --attributes=~/.config/git/attributes

This will set up the filters and diff driver in your ~/.gitconfig and instruct git to apply them to any .ipynb file in any repository.

Note that you need to uninstall with the same flags:

nbstripout --uninstall --global --attributes=~/.config/git/attributes

Install system-wide

To install nbstripout system-wide so that it applies to all repositories for all users, install as follows (most installations will require you to sudo):

[sudo] nbstripout --install --system

This will set up the filters and diff driver in $(prefix)/etc/gitconfig and instruct git to apply them to any .ipynb file in any repository for any user.

Note that you need to uninstall with the same flags:

[sudo] nbstripout --uninstall --system

Apply retroactively

nbstripout can be used to rewrite an existing Git repository using git filter-repo to strip output from existing notebooks. This invocation operates on all ipynb files in the repo:

    #!/usr/bin/env bash
    # get lint-history with callback from https://github.com/newren/git-filter-repo/pull/542
    ./lint-history.py --relevant 'return filename.endswith(b".ipynb")' --callback '
    import json, warnings, nbformat
    from nbstripout import strip_output
    from nbformat.reader import NotJSONError
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=UserWarning)
            notebook = nbformat.reads(blob.data, as_version=nbformat.NO_CONVERT)
        # customize to your needs
        strip_output(notebook, keep_output=False, keep_count=False, keep_id=False, extra_keys=["metadata.widgets", "metadata.execution", "cell.attachments"], drop_empty_cells=True, drop_tagged_cells=[], strip_init_cells=False, max_size=0)
        old_len = len(blob.data)
        blob.data = (nbformat.writes(notebook) + "\n").encode("utf-8")
        if old_len != len(blob.data):
            print(change.blob_id, change.filename, old_len, len(blob.data))
    except NotJSONError as e:
        print("ERROR", type(e), change.blob_id, filename)
    '

Removing empty cells

Drop empty cells i.e. cells where source is either empty or only contains whitespace:

nbstripout --drop-empty-cells

Removing init cells

By default nbstripout keeps the output of cells with init_cell: true metadata. To disable this behavior use:

nbstripout --strip-init-cells

Removing entire cells

In certain conditions it might be handy to remove not only the output, but the entire cell, e.g. when developing exercises.

To drop all cells tagged with "solution" run:

nbstripout --drop-tagged-cells="solution"

The option accepts a list of tags separated by whitespace.
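What dropping tagged cells amounts to can be sketched as follows. This is an illustration only, not the tool's code: drop_tagged_cells here is a hypothetical helper, and it assumes that tags are stored as a list under each cell's metadata["tags"], as in nbformat 4.

```python
def drop_tagged_cells(notebook: dict, tags: str) -> dict:
    """Return a shallow copy of the notebook without cells carrying any given tag."""
    unwanted = set(tags.split())  # whitespace-separated, like the CLI option
    kept = [
        cell
        for cell in notebook.get("cells", [])
        if not unwanted & set(cell.get("metadata", {}).get("tags", []))
    ]
    return {**notebook, "cells": kept}


exercise = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Compute 2 + 2."]},
        {"cell_type": "code", "metadata": {"tags": ["solution"]}, "source": ["2 + 2"]},
    ]
}

student_version = drop_tagged_cells(exercise, "solution hidden")
print(len(student_version["cells"]))
```

Building a new cell list rather than deleting in place keeps the original notebook dict available, which is convenient when both a solution and a student version are needed.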

Keeping some output

Do not strip the execution count/prompt number:

nbstripout --keep-count

Do not strip outputs that are smaller than a given max size (useful for removing only large outputs like images):

nbstripout --max-size 1k
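The idea behind a size threshold can be sketched like this. Several details are assumptions made for illustration and may differ from what nbstripout does: the suffix k is taken to mean 1000, an output's size is approximated by the length of its serialized JSON, and an output exactly at the limit is stripped. The helper names parse_size and keep_small_outputs are hypothetical.

```python
import json


def parse_size(text: str) -> int:
    """Parse sizes such as '512', '1k' or '2M' (assumed decimal multipliers)."""
    multipliers = {"k": 10**3, "m": 10**6, "g": 10**9}
    suffix = text[-1].lower()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


def keep_small_outputs(cell: dict, max_size: int) -> dict:
    """Return a copy of a code cell that keeps only outputs below max_size."""
    small = [
        output
        for output in cell.get("outputs", [])
        if len(json.dumps(output)) < max_size  # serialized length as a size proxy
    ]
    return {**cell, "outputs": small}


cell = {
    "cell_type": "code",
    "outputs": [
        {"output_type": "stream", "name": "stdout", "text": ["42\n"]},
        {"output_type": "display_data", "data": {"image/png": "A" * 5000}, "metadata": {}},
    ],
}

trimmed = keep_small_outputs(cell, parse_size("1k"))
print([output["output_type"] for output in trimmed["outputs"]])
```

With a 1k limit, the short text output survives while the large image payload is dropped.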

Do not strip the output, only metadata:

nbstripout --keep-output

Do not reassign the cell ids to be sequential (which is the default behavior):

nbstripout --keep-id

To mark special cells so that the output is not stripped, you can either:

  1. Set the keep_output tag on the cell. To do this, enable the tags toolbar (View > Cell Toolbar > Tags) and then add the keep_output tag for each cell you would like to keep the output for.

  2. Set the "keep_output": true metadata on the cell. To do this, select the "Edit Metadata" Cell Toolbar, and then use the "Edit Metadata" button on the desired cell to enter something like:

    {
      "keep_output": true
    }
    

You can also keep output for an entire notebook. This is useful if you want to strip output by default in an automated environment (e.g. CI pipeline), but want to be able to keep outputs for some notebooks. To do so, add the option above to the notebook metadata instead. (You can also explicitly remove outputs from a particular cell in these notebooks by adding a cell-level metadata entry.)

Another use-case is to preserve initialization cells that might load customized CSS etc. critical for the display of the notebook. To support this, we also keep output for cells with:

{
  "init_cell": true
}

This is the same metadata used by the init_cell nbextension.
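The rules in this section can be summarized as a small decision function. This is a sketch of the behavior described above, not nbstripout's code: should_keep_output is a hypothetical name, and the precedence chosen here (an explicit cell-level keep_output value first, then the tag, then init_cell, then the notebook-level default) is an assumption.

```python
def should_keep_output(cell: dict, notebook_metadata: dict) -> bool:
    """Decide whether a cell's output survives stripping (assumed precedence)."""
    metadata = cell.get("metadata", {})
    if "keep_output" in metadata:
        # An explicit cell-level value wins, so false can opt a single cell out
        # of a notebook-wide keep_output setting.
        return bool(metadata["keep_output"])
    if "keep_output" in metadata.get("tags", []):
        return True
    if metadata.get("init_cell", False):
        return True
    # Otherwise fall back to the notebook-level setting, if any.
    return bool(notebook_metadata.get("keep_output", False))


keep_all = {"keep_output": True}
print(should_keep_output({"metadata": {"tags": ["keep_output"]}}, {}))
print(should_keep_output({"metadata": {"keep_output": False}}, keep_all))
```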

Stripping metadata

The following metadata is stripped by default:

  • Notebook metadata: signature, widgets
  • Cell metadata: ExecuteTime, collapsed, execution, heading_collapsed, hidden, scrolled

Additional metadata to be stripped can be configured via either

  • git config (--global/--system) filter.nbstripout.extrakeys, e.g.:

    git config --global filter.nbstripout.extrakeys '
      metadata.celltoolbar
      metadata.kernelspec
      metadata.language_info.codemirror_mode.version
      metadata.language_info.pygments_lexer
      metadata.language_info.version
      metadata.toc
      metadata.notify_time
      metadata.varInspector
      cell.metadata.heading_collapsed
      cell.metadata.hidden
      cell.metadata.code_folding
      cell.metadata.tags
      cell.metadata.init_cell'
    
  • the --extra-keys flag, which takes a space-delimited string as an argument, e.g.:

    --extra-keys="metadata.celltoolbar cell.metadata.heading_collapsed"
    

Note: Only notebook and cell metadata are currently supported. Every key specified via filter.nbstripout.extrakeys or --extra-keys must start with metadata. for notebook metadata or cell.metadata. for cell metadata.
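How dotted keys of this form map onto the notebook structure can be sketched as follows. The helpers pop_dotted and strip_extra_keys are hypothetical and only illustrate the two prefixes named in the note; the real option handling may differ.

```python
def pop_dotted(mapping: dict, dotted_key: str) -> None:
    """Delete a nested key such as 'language_info.version' if it exists."""
    *parents, leaf = dotted_key.split(".")
    node = mapping
    for name in parents:
        node = node.get(name)
        if not isinstance(node, dict):
            return  # the path does not exist, so there is nothing to strip
    node.pop(leaf, None)


def strip_extra_keys(notebook: dict, extra_keys: str) -> None:
    """Strip space-delimited extra keys from a notebook dict in place."""
    for key in extra_keys.split():
        if key.startswith("cell.metadata."):
            for cell in notebook.get("cells", []):
                pop_dotted(cell.get("metadata", {}), key[len("cell.metadata."):])
        elif key.startswith("metadata."):
            pop_dotted(notebook.get("metadata", {}), key[len("metadata."):])
        else:
            raise ValueError(f"unsupported key prefix: {key}")


notebook = {
    "metadata": {"language_info": {"name": "python", "version": "3.11.4"}},
    "cells": [{"cell_type": "code", "metadata": {"tags": ["x"], "collapsed": False}}],
}

strip_extra_keys(notebook, "metadata.language_info.version cell.metadata.tags")
print(notebook["metadata"], notebook["cells"][0]["metadata"])
```

Missing paths are skipped silently, so one key list can be shared across notebooks that do not all carry the same metadata.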

You can keep certain metadata that would be stripped by default with either

  • git config (--global/--system) filter.nbstripout.keepmetadatakeys, e.g.:

    git config --global filter.nbstripout.keepmetadatakeys '
      cell.metadata.collapsed
      cell.metadata.scrolled'
    
  • the --keep-metadata-keys flag, which takes a space-delimited string as an argument, e.g.:

    --keep-metadata-keys="cell.metadata.collapsed cell.metadata.scrolled"
    

Note: Previous versions of Jupyter used metadata.kernel_spec for kernel metadata. Prefer stripping kernelspec entirely: only stripping some attributes inside kernelspec may lead to errors when opening the notebook in Jupyter (see #141).

Excluding files and folders

To exclude specific files or folders from being processed by the nbstripout filters, add the path and exception to your filter specifications defined in .git/info/attributes or .gitattributes:

docs/** filter= diff=

This will disable nbstripout for any file in the docs directory.

notebooks/Analysis.ipynb filter= diff=

This will disable nbstripout for the file Analysis.ipynb located in the notebooks directory.

To check which attributes a given file has with the current config, run:

git check-attr -a -- path/to/file

For a file to which the filter applies you will see the following:

$ git check-attr -a -- foo.ipynb
foo.ipynb: diff: ipynb
foo.ipynb: filter: nbstripout

For a file in your excluded folder you will see the following:

$ git check-attr -a -- docs/foo.ipynb
docs/foo.ipynb: diff:
docs/foo.ipynb: filter:
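If this check needs to be scripted, the output shown above is simple to parse. The sketch below assumes the "path: attribute: value" line layout seen in these examples and uses hypothetical helper names; it works on captured text, so obtaining the output from git is left to the caller.

```python
def parse_check_attr(output: str) -> dict:
    """Map path -> {attribute: value} from `git check-attr -a` style output."""
    attributes: dict = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        if line.endswith(":"):
            line += " "  # an empty value may lose its trailing space when copied
        path, name, value = line.rsplit(": ", 2)
        attributes.setdefault(path, {})[name] = value
    return attributes


def filter_applies(attributes: dict, path: str) -> bool:
    """True if the nbstripout filter is set for the given path."""
    return attributes.get(path, {}).get("filter") == "nbstripout"


captured = (
    "foo.ipynb: diff: ipynb\n"
    "foo.ipynb: filter: nbstripout\n"
    "docs/foo.ipynb: diff:\n"
    "docs/foo.ipynb: filter:\n"
)

parsed = parse_check_attr(captured)
print(filter_applies(parsed, "foo.ipynb"), filter_applies(parsed, "docs/foo.ipynb"))
```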

Manual filter installation

Set up a git filter and diff driver using nbstripout as follows:

git config filter.nbstripout.clean '/path/to/nbstripout'
git config filter.nbstripout.smudge cat
git config filter.nbstripout.required true
git config diff.ipynb.textconv '/path/to/nbstripout -t'

This will add a section to the .git/config file of the current repository.

If you want the filter to be installed globally for your user, add the --global flag to the git config invocations above to have the configuration written to your ~/.gitconfig and apply to all repositories.

If you want the filter to be installed system-wide, add the --system flag to the git config invocations above to have the configuration written to $(prefix)/etc/gitconfig and apply to all repositories for all users.

Create a file .gitattributes (if you want it versioned with the repository) or .git/info/attributes (to apply it only to the current repository) with the following content:

*.ipynb filter=nbstripout
*.ipynb diff=ipynb

This instructs git to use the filter named nbstripout and the diff driver named ipynb set up in the git config above for every .ipynb file in the repository.
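When the attributes file is maintained by a script rather than by hand, it helps to add the two lines only if they are missing and to end the file with a newline. The following is a minimal sketch of that idea; ensure_attributes is a hypothetical helper and is not how nbstripout --install is implemented.

```python
import tempfile
from pathlib import Path

ATTRIBUTE_LINES = ("*.ipynb filter=nbstripout", "*.ipynb diff=ipynb")


def ensure_attributes(attributes_file: Path) -> bool:
    """Append the attribute lines if missing; return True if the file changed."""
    existing = attributes_file.read_text() if attributes_file.exists() else ""
    present = {line.strip() for line in existing.splitlines()}
    missing = [line for line in ATTRIBUTE_LINES if line not in present]
    if not missing:
        return False
    if existing and not existing.endswith("\n"):
        existing += "\n"  # never glue new lines onto an unterminated last line
    attributes_file.parent.mkdir(parents=True, exist_ok=True)
    attributes_file.write_text(existing + "\n".join(missing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".gitattributes"
    print(ensure_attributes(target))  # first call writes both lines
    print(ensure_attributes(target))  # second call finds nothing to add
```

Because the function is idempotent, it can be run repeatedly, for example from a project setup script, without duplicating entries.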

If you want the attributes to be set for .ipynb files in any of your git repositories, add those two lines to ~/.config/git/attributes. Note that this file and the ~/.config/git directory may not exist.

If you want the attributes to be set for .ipynb files in any git repository on your system, add those two lines to $(prefix)/etc/gitattributes. Note that this file may not exist.

Using nbstripout as a pre-commit hook

pre-commit is a framework for managing git pre-commit hooks.

Once you have pre-commit installed, add the following to the .pre-commit-config.yaml in your repository:

repos:
- repo: https://github.com/kynan/nbstripout
  rev: 0.7.1
  hooks:
    - id: nbstripout

Then run pre-commit install to activate the hook.

Warning

In this mode, nbstripout is used as a git hook to strip any .ipynb files before committing. This also modifies your working copy!

In its regular mode, nbstripout acts as a filter and only modifies what git gets to see for committing or diffing. The working copy stays intact.

Troubleshooting

Known issues

Certain Git workflows are not well supported by nbstripout:

  • Local changes to notebook files that are made invisible to Git due to the nbstripout filter do still cause conflicts when attempting to sync upstream changes (git pull, git merge etc.). This is because Git has no way of resolving a conflict caused by a non-stripped local file being merged with a stripped upstream file. Addressing this issue is out of scope for nbstripout. Read more and find workarounds in #108.

Show files processed by nbstripout filter

Git has no builtin support for listing files a clean or smudge filter operates on. As a workaround, change the setup of your filter in .git/config, ~/.gitconfig or $(prefix)/etc/gitconfig as follows to see the filenames either filter operates on:

[filter "nbstripout"]
    clean  = "f() { echo >&2 \"clean: nbstripout $1\"; nbstripout; }; f %f"
    smudge = "f() { echo >&2 \"smudge: cat $1\"; cat; }; f %f"
    required = true

nbstripout's People

Contributors

ankitrokdeonsns, arobrien, baldwint, boun, casperdcl, ibressler, janosh, jdriordan, jluttine, jonashaag, jpeacock29, kynan, mforbes, minrk, nobodyinperson, ohjeah, oogali, ooim, paw-lu, plpeeters, psthomas, pugio, rpytel1, scottcode, simonbiggs, thijss, tnilanon, utsekaj42, wpbonelli, zertrin

nbstripout's Issues

Apply retroactively?

Hi, sorry if this is already possible, but it would be great to apply this retroactively to a repository where I have already committed notebook files with output. I would think this would involve some kind of filter-branch wizardry. Is it already possible? Is this a feature you would want added?

Screencast

I've had a go at recording a screencast. Not quite happy with it yet, I think it could be much shorter. Any suggestions welcome!

nbstripout install fails

Am I using it wrong?

(py35)
klay6683 at macd2860 in ~PYTHONPATH/planet4 (master●)
$ nbstripout install
Traceback (most recent call last):
  File "/Users/klay6683/miniconda3/envs/py35/bin/nbstripout", line 9, in <module>
    load_entry_point('nbstripout==0.2.0', 'console_scripts', 'nbstripout')()
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout-0.2.0-py3.5.egg/nbstripout.py", line 127, in main
    sys.exit(install())
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout-0.2.0-py3.5.egg/nbstripout.py", line 111, in install
    attrfile = path.join(git_dir, 'info', 'attributes')
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/posixpath.py", line 89, in join
    genericpath._check_arg_types('join', a, *p)
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/genericpath.py", line 145, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
(py35)

Can we have a `--is_installed` check?

I'm losing track of where I have nbstripout in use and where not.
Could we have a little check run like:
nbstripout --is_installed
that simply returns "nbstripout is installed in this repo" or "nbstripout is not installed in this repo" respectively?
Would be great, thanks!

Creating a branch with notebook outputs stripped

Is there a way of using nbstripout that would allow me to create a branch of cleaned notebooks from a branch that contains notebooks with populated output cells (eg ones with output cells populated that can be used for testing with nbval).

I'm thinking of a private github repo workflow where there is a testing-master branch containing executed notebooks with populated test output cells that begets a release branch containing notebooks that can be zipped and distributed to students.

Presumably, a variant of nbstripout could also be used to add a git filter that would automatically run a notebook when committing it to a repository to ensure that all its output cells are populated?

nbstripout --uninstall fails

$ nbstripout --uninstall
Traceback (most recent call last):
  File "/lscr_paper/allan/miniconda3/envs/msth/bin/nbstripout", line 11, in <module>
    sys.exit(main())
  File "/lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py", line 248, in main
    sys.exit(uninstall(args.attributes))
  File "/lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py", line 191, in uninstall
    f.write(''.join(lines))
ValueError: I/O operation on closed file.

I've added a few comments to the file, and they disappeared along with the line added by nbstripout --install, so I guess the file was overwritten with a blank file.

The problem happens on Linux running Bash and on OS X running Zsh.

I'm using the Anaconda Python Distribution, Python 3.5.2, inside a conda environment, and I installed using pip install nbstripout.

nbstripout installation causes incorrect 'git status' results

I seem to see changes for notebooks I haven't touched since I installed nbstripout. Interestingly, when I uninstall nbstripout and install it again, the changes are gone. Besides the general performance issue, this prevents us from adopting nbstripout as a team.

 $ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
... list of notebooks I haven't touched ...

----(after uninstall nbstripout and install it again)---

 $ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

Piping not working on Python 3.5

I'm trying to clean the test notebooks by using a pipe but it doesn't work on Python 3.5. I just get an empty result:

$ cat tests/test_metadata.ipynb | nbstripout 

Running nbstripout tests/test_metadata.ipynb correctly cleans the notebook inplace.

I tried with Python 2.7 and then it outputs the normal cleaned notebook when piping. Any ideas what could be wrong?

$ pip freeze
decorator==4.0.11
docopt==0.6.2
docutils==0.13.1
ipython==6.0.0
ipython-genutils==0.2.0
jedi==0.10.2
jsonschema==2.5.1
jupyter-core==4.3.0
nbformat==4.3.0
nbstripout==0.3.0
path.py==10.1
pexpect==4.2.1
pickleshare==0.7.4
prompt-toolkit==1.0.14
ptyprocess==0.5
Pygments==2.2.0
simplegeneric==0.8.1
six==1.10.0
testpath==0.3
traitlets==4.3.2
wcwidth==0.1.6

`nbstripout --install` fails in Windows Command Prompt

I'm running Enthought Canopy Python on Windows 10, and when running nbstripout --install on a newly created git repo, I'm getting the following error:

Traceback (most recent call last):
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "c:\users\bdforbes\appdata\local\enthought\canopy\user\scripts\nbstripout.exe\__main__.py", line 9, in <module>
  File "c:\users\bdforbes\appdata\local\enthought\canopy\user\lib\site-packages\nbstripout.py", line 254, in main
    sys.exit(install(args.attributes))
  File "c:\users\bdforbes\appdata\local\enthought\canopy\user\lib\site-packages\nbstripout.py", line 160, in install
    git_dir = check_output(['git', 'rev-parse', '--git-dir']).strip()
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

Everything works fine in the MINGW64 git bash terminal that is installed with git on Windows. This may be an issue for people who use the Windows Command Prompt for their git work and would like to use this helpful package.

Reference:

https://stackoverflow.com/a/35670418/336001

Install full path for diff.ipynb.textconv

Currently nbstripout installs the full path for filter.nbstripout.clean but not for diff.ipynb.textconv shouldn't it use the full path to nbstripout.py for both? This is especially relevant on windows with conda where nbstripout is not necessarily in path.

Metadata python version attribute

The metadata["language_info"]["version"] = ... causes merge conflicts when using different Python versions. Since this has no effect on the workings of the notebook, can we also strip out this attribute?

it doesn't work automatically and show nothing when running "git diff"

I followed the demo from YouTube as follows.

(one_shot_face_recognition_env)  zane@LZANELI-MB1  ~/Desktop/test  git init
Initialized empty Git repository in /Users/zane/Desktop/test/.git/
(one_shot_face_recognition_env)  zane@LZANELI-MB1  ~/Desktop/test   master  git add Untitled.ipynb
(one_shot_face_recognition_env)  zane@LZANELI-MB1  ~/Desktop/test   master ✚  git commit -a
[master (root-commit) 8bac162] qq
 1 file changed, 51 insertions(+)
 create mode 100644 Untitled.ipynb
(one_shot_face_recognition_env)  zane@LZANELI-MB1  ~/Desktop/test   master  nbstripout --install
(one_shot_face_recognition_env)  zane@LZANELI-MB1  ~/Desktop/test   master  nbstripout --status
nbstripout is installed in repository b'/Users/zane/Desktop/test'

Filter:
  clean = b'"/Users/zane/Desktop/one_shot_face_recognition_env/bin/python3" "/Users/zane/Desktop/one_shot_face_recognition_env/lib/python3.5/site-packages/nbstripout.py"'
  smudge = b'cat'
  required = b'true'
  diff= b'nbstripout -t'

Attributes:
  b'*.ipynb: filter: nbstripout'

Diff Attributes:
  b'*.ipynb: diff: ipynb'
(one_shot_face_recognition_env)  zane@LZANELI-MB1  ~/Desktop/test   master ●  git add Untitled.ipynb
(one_shot_face_recognition_env)  zane@LZANELI-MB1  ~/Desktop/test   master ✚  git diff --cached

But after running git diff --cached I cannot see anything, it is just empty. After running nbstripout --uninstall, git diff --cached works fine as usual.

It does not strip output until I run nbstripout Untitled.ipynb explicitly.

Not working across OS X and Linux?

Doesn't seem to be working across OS X and Linux? Path to Python seems to be hardcoded – any suggested workarounds will be greatly appreciated :)

I often work on a remote server, sometimes issuing commands in the ssh terminal (Linux, Bash), and at other times in my local terminal (OS X, Zsh), since I also mount the relevant folders using sshfs.

nbstripout seems to be working fine on the platform on which I run nbstripout --install, but crashes on the other (and crashes commands such as git status and git diff). The behavior is the same whether I run the install command on Linux and test on OS X, or vice versa.

Example output (from Linux, install-command on OS X):

$ git status
/Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py: 1: /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py: /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python: not found
error: external filter /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py failed -1
error: external filter /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py failed
fatal: tryLoadData/tryLoadData.ipynb: clean filter 'nbstripout' failed

Example output (from OS X, install-command on Linux):

(py35) ❯ git status
/lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py: /lscr_paper/allan/miniconda3/envs/msth/bin/python: No such file or directory
error: external filter /lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py failed -1
error: external filter /lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py failed
fatal: tryLoadData/tryLoadData.ipynb: clean filter 'nbstripout' failed

Consider stripping out only IMAGES as option

Frequently the output is useful as informal "testing" of results,
and there is very little overhead in keeping non-graphical results.
Images, however, add considerable unwanted bulk to a commit
for any version control system (unless those images are very
expensive to reproduce, or are historical for some reason).

Proposal: provide an option to strip out only output cells with images.

nbstripout crashes when input is empty

Steps to reproduce

$ cat /dev/null | nbstripout

Expected results

Zero bytes of output; exit code 0.

Actual results

Stack trace; exit code 1. E.g.:

Traceback (most recent call last):
  [...]
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  [...]
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: ''...

Comments

Version 0.3.1 on OS X, installed with Conda.

This makes it harder to recover from bad Git commits, e.g. caused by #55, because Git won't do anything unless its filters exit successfully. One must disable the filter, untangle things, and then re-enable it.

Thank you!

Add stripping for associated .py files

Could you please add a feature for stripping associated py files?
In py files there are lines
# In[XXX]:
which should be stripped back to
# In[ ]:

It would be immensely helpful to have this feature!

Thanks,
Andrew
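The requested rewrite of prompt comments in exported .py files can be sketched with a regular expression. This is an illustration of the request only, not a feature the README describes; reset_prompts is a hypothetical helper, and it assumes the markers look exactly like "# In[12]:" at the start of a line.

```python
import re

# Matches an exported prompt marker such as "# In[12]:" at the start of a line.
PROMPT = re.compile(r"^# In\[\d+\]:", flags=re.MULTILINE)


def reset_prompts(source: str) -> str:
    """Replace numbered prompt markers with the blank form '# In[ ]:'."""
    return PROMPT.sub("# In[ ]:", source)


exported = "# In[12]:\nx = 1\n\n# In[13]:\nprint(x)\n"
print(reset_prompts(exported))
```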

Only output should be kept for keep_output and init_cell

Currently, nbstripout ignores keep_output and init_cell cells entirely. However, this means that lots of stuff beyond the output stays, such as execution_count. I think keeping this does not meet the design goals, and it can cause Git conflicts.

Feature request: Treat these two types of cells normally, except keep the output.

Needs uninstall

Uninstall would be helpful for people just trying things out.

Performance issue

Every time I go into a repo with lots of notebooks, it takes several seconds before I get my prompt back, which can be annoying...
I'm wondering if the official PreProcessor of the notebook tools is faster than your manual filtering?

Stripping other metadata (execute time)

I'm using the (hugely useful) Execute Time nbextension, which produces cell metadata like this:

{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
  "ExecuteTime": {
   "end_time": "2016-08-15T11:49:40.736068",
   "start_time": "2016-08-15T11:49:40.145965"
  },
  "code_folding": [],
  "collapsed": false
 },
 "outputs": [],
 "source": []
}

But naturally I end up with a load of changes seen by git after running nbstripout because of re-running the cells and updating the ExecuteTime field. Would you accept a pull request explicitly stripping this out, or alternatively one which strips all but whitelisted metadata entries?

encoding issues

Hi,

When stripping a notebook that contains Unicode, I get an error if I use nbstripout as a filter piping to a file, e.g. cat test.ipynb | nbstripout > out.ipynb. Git also fails.

File "/Users/gregor/anaconda/lib/python2.7/site-packages/nbstripout-0.2.2-py2.7.egg/nbstripout.py", line 176, in main
    write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)
  File "/Users/gregor/anaconda/lib/python2.7/site-packages/nbformat/__init__.py", line 169, in write
    fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 20464: ordinal not in range(128)

In that case sys.stdout.encoding is None (which defaults to the 'ascii' codec).

As a workaround, I modified the last few lines of nbstripout.py:

        import codecs
        sys.stdout = codecs.getwriter('utf8')(sys.stdout)
        write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)

(This follows a Stack Overflow entry, but I am not sure whether it is really a correct solution.)
I am on OS X, Anaconda Python 2.7 with nbformat 4.0.1.

Gregor

PS thanks for your work, I really like nbstripout install
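For comparison, a sketch of the same idea on Python 3: encode explicitly as UTF-8 and write bytes, rather than relying on whatever sys.stdout.encoding happens to be. The helper name `write_notebook_utf8` is hypothetical, and plain json is used here instead of nbformat to keep the example self-contained.

```python
import io
import json


def write_notebook_utf8(notebook, binary_stream):
    """Serialize a notebook dict to JSON and write it as UTF-8 bytes."""
    # ensure_ascii=False keeps characters such as U+00F6 as-is in the text,
    # so the explicit encode below is what decides the output encoding.
    text = json.dumps(notebook, ensure_ascii=False, indent=1) + "\n"
    binary_stream.write(text.encode("utf-8"))


# Demonstration with an in-memory binary stream standing in for stdout.
buffer = io.BytesIO()
write_notebook_utf8({"source": ["sch\u00f6n"]}, buffer)
```

When used as a filter, the binary stream would be sys.stdout.buffer, which accepts bytes regardless of the locale.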

Pip install on Anaconda doesn't work

Running on Anaconda for Windows results in the following:
C:\Anaconda>pip install nbstripoutput
Collecting nbstripoutput
  Could not find a version that satisfies the requirement nbstripoutput (from versions: )
No matching distribution found for nbstripoutput
Any idea how I could get this to work? Thanks!

Missing trailing newline

Line 170 of nbstripout.py reads f.write('\n*.ipynb filter=nbstripout'). With the context manager on line 169, this appends the filter to .gitattributes without a trailing newline character.

This might be a matter of opinion, but it would be nice to have a trailing newline character, especially because this line is being appended to the end of the file. Also, since the script includes from __future__ import print_function, this line could be rewritten as print('\n*.ipynb filter=nbstripout', file=f) to accomplish this.
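A slightly more defensive sketch of the same fix: add the leading newline only if the existing file does not already end with one, and always terminate the appended line. The function name `append_attribute` is hypothetical; only the attribute line itself comes from the report.

```python
import os
import tempfile

FILTER_LINE = "*.ipynb filter=nbstripout"


def append_attribute(path, line=FILTER_LINE):
    """Append `line` to `path`, leaving the file newline-terminated."""
    needs_separator = False
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
        # Only add a leading newline if the existing file lacks a final one.
        needs_separator = bool(content) and not content.endswith("\n")
    with open(path, "a") as f:
        if needs_separator:
            f.write("\n")
        f.write(line + "\n")


# Usage: an attributes file whose last line has no trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    attributes = os.path.join(tmp, "attributes")
    with open(attributes, "w") as f:
        f.write("*.txt text")
    append_attribute(attributes)
    with open(attributes) as f:
        result = f.read()
```

This also avoids the blank line that an unconditional leading '\n' inserts when the file already ends with a newline.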

Programmatically run nbstripout and output new file.

I am trying to run nbstripout programmatically within a Python script and create a new file from the input file (and not in-place). The documentation doesn't have this use case covered. Is there a way I can do it?
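One possible approach, sketched with the standard library only: load the notebook as JSON, clear the outputs, and write the result to a different path. This does not call nbstripout's own functions (their names are not given here), strips outputs and execution counts only, and assumes the nbformat 4 layout with a top-level "cells" list; `strip_outputs` and `strip_to_new_file` are hypothetical names.

```python
import json
import os
import tempfile


def strip_outputs(notebook):
    """Clear outputs and execution counts of all code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_to_new_file(src, dst):
    """Read the notebook at `src` and write a stripped copy to `dst`."""
    with open(src, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(notebook), f, indent=1, ensure_ascii=False)
        f.write("\n")


# Usage with a tiny throwaway notebook.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.ipynb")
    dst = os.path.join(tmp, "out.ipynb")
    sample = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 3,
                "metadata": {},
                "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
                "source": ["print('hi')"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 0,
    }
    with open(src, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    strip_to_new_file(src, dst)
    with open(src, encoding="utf-8") as f:
        original = json.load(f)
    with open(dst, encoding="utf-8") as f:
        stripped = json.load(f)
```

Since nbstripout reads from stdin and writes to stdout when no file is given, piping through the command line tool (nbstripout < in.ipynb > out.ipynb) is another way to leave the input file untouched.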

The clean filter fails if there is whitespace in the paths

The clean filter fails if there is whitespace in the paths to python and/or the nbstripout.py script.

C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py: C:/Program: No such file or directory
error: external filter 'C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py' failed -1
error: external filter 'C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py' failed
fatal: example.ipynb: clean filter 'nbstripout' failed
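The error suggests the filter command is split at the space in "Program Files". A sketch of one way to avoid this is to wrap each path in double quotes when building the command string; `build_filter_command` is a hypothetical helper, not the function nbstripout uses.

```python
import shlex


def build_filter_command(python_exe, script_path):
    """Return a filter command with each path wrapped in double quotes."""
    # Assumes neither path itself contains a double quote character.
    return '"{}" "{}"'.format(python_exe, script_path)


command = build_filter_command(
    "C:/Program Files/Anaconda3/python.exe",
    "C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py",
)
# A POSIX-style shell split now yields exactly two words.
words = shlex.split(command)
print(command)
```

Git passes filter commands to a shell, so the quotes keep each path together as a single word.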

git diff includes output when used in combination with nbdime

I'd like to combine nbdime and nbstripout when working with Git repositories: (in some cases) I don't want the output to be committed, but I would still like to use the nicer diff of the input cells provided by nbdime. Simply installing both does not work: nbdime always seems to take over, meaning that the output is no longer stripped when nbdime is active.

Maybe there is a simple configuration solution to that, but I am not familiar enough with the Git plugin architecture to come up with any.

I am using nbstripout 0.2.9, nbdime 0.1.0, and Git 2.11.0 on a Mac.

TypeError with python 3.4 and nbformat 4.0.1

$ nbstripout < notebooks/example.ipynb 
Traceback (most recent call last):
  File "/nix/store/n3mk6yx1gcsjxf8y8y3c4pinwdmysqj5-python3.4-nbstripout-0.2.4/bin/.nbstripout-wrapped", line 12, in <module>
    sys.exit(main())
  File "/nix/store/n3mk6yx1gcsjxf8y8y3c4pinwdmysqj5-python3.4-nbstripout-0.2.4/lib/python3.4/site-packages/nbstripout.py", line 184, in main
    write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)
  File "/nix/store/bimqljy1xyw9j1rfn09dn53avjkzdrzj-python3-3.4.4-env/lib/python3.4/site-packages/nbformat/__init__.py", line 169, in write
    fp.write(s)
  File "/nix/store/bimqljy1xyw9j1rfn09dn53avjkzdrzj-python3-3.4.4-env/lib/python3.4/codecs.py", line 374, in write
    self.stream.write(data)
TypeError: must be str, not bytes

EDIT: Seems like this might be fixed by f0056f5?

uninstall fails

v: 0.2.8 from pip
Python 3.5 using most recent miniconda on Mac 10.11.6

(stable) $ nbstripout --uninstall
Traceback (most recent call last):
  File "/Users/klay6683/miniconda3/envs/stable/bin/nbstripout", line 11, in <module>
    sys.exit(main())
  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/nbstripout.py", line 248, in main
    sys.exit(uninstall(args.attributes))
  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/nbstripout.py", line 191, in uninstall
    f.write(''.join(lines))
ValueError: I/O operation on closed file.
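The traceback shows a write on a file handle that has already been closed. The surrounding code in nbstripout.py is not shown here, so the following is only a sketch of the general pattern that avoids this error: read inside one `with` block, then reopen the file for writing in a second one. The function name `remove_filter_lines` and the marker string are assumptions.

```python
import os
import tempfile


def remove_filter_lines(path, marker="filter=nbstripout"):
    """Rewrite `path` without the lines that contain `marker`."""
    with open(path) as f:
        kept = [line for line in f if marker not in line]
    # The read handle is closed at this point, so open a fresh one to write.
    with open(path, "w") as f:
        f.write("".join(kept))


# Usage on a throwaway attributes file.
with tempfile.TemporaryDirectory() as tmp:
    attributes = os.path.join(tmp, "attributes")
    with open(attributes, "w") as f:
        f.write("*.txt text\n*.ipynb filter=nbstripout\n")
    remove_filter_lines(attributes)
    with open(attributes) as f:
        remaining = f.read()
```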

Installed filter fails on Windows

Both
sys.executable
and
path.abspath(__file__)
give strings with '\' in the path. Git (version 1.9.4), however, only interprets paths with '/' correctly, so when adding an ipynb file one encounters a " command not found" error. I have locally patched this with a
.replace('\\', '/')
but I'm not convinced that this is a safe transformation for path and file names on Linux.
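On Linux a backslash is a legal character in a file name, so an unconditional replacement could indeed alter a valid path. A sketch that restricts the replacement to Windows follows; `git_friendly_path` is a hypothetical helper, and its `platform` parameter exists only so that both branches can be exercised on any machine.

```python
import sys


def git_friendly_path(path, platform=sys.platform):
    """Use forward slashes on Windows; leave paths on other systems alone."""
    if platform.startswith("win"):
        return path.replace("\\", "/")
    return path


windows_path = git_friendly_path("C:\\Anaconda\\python.exe", platform="win32")
linux_path = git_friendly_path("/home/user/odd\\name.py", platform="linux")
print(windows_path)
print(linux_path)
```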

Support installing to .gitattributes?

Hi, this is a great tool, thanks! I suppose currently nbstripout install installs to .git/info/attributes. Would it be possible to support installing to .gitattributes with some flag? For instance, nbstripout install --attributes=.gitattributes or something better. Of course, one can just run mv .git/info/attributes .gitattributes but still it'd be just a nice little polishing to support both installing options. What do you think?

Differences in handling of "collapsed" metadata?

I have something weird going on with the status of the "collapsed": true (and false) lines in a git diff of a .ipynb file.

I did a commit, nbstripout did its work, and I can see that what was committed contains "metadata": { "collapsed": false }, (as seen with git diff HEAD^ as well as on the remote Gitlab repo).

I pulled this commit on another computer. Now when I do a git diff, I get this:

    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
    "outputs": [],
    "source": [

So the current state of the file is correct, but it thinks the repository contains this line as "collapsed": true, even though when I do git diff HEAD^ I can see that the committed line contains "false". Doing git checkout . does not change anything.

Is it possible that nbstripout's handling is not symmetric in committing/checkout vs. status and diff?

P.S. On both computers I'm using Anaconda (Python 3.5.2), with nbstripout 0.2.9 from conda-forge. Config:

$ nbstripout --status

nbstripout is installed in repository b'/path/to/my/repo'

Filter:
  clean = b'/path/to/anaconda3/bin/python /path/to/anaconda3/lib/python3.5/site-packages/nbstripout.py'
  smudge = b'cat'
  required = b'true'

Attributes:
  b'*.ipynb: filter: nbstripout'

Usage vs. Jupyter save hook

What is the advantage of doing this vs. just letting Jupyter do it as described in the docs?

Maybe the fact that one can switch it on per repo and not generally for all Jupyter notebook activities?

If that is the reason, how can one reliably also version-control the Python script version of the notebook when using nbstripout? The trouble I'm seeing is that if I let Jupyter generate the Python script and have nbstripout working as a pre-commit hook, the script conversion creates git diff noise, because the pre-commit hook runs later, not when I save the notebook. So I guess the script generation would have to be included in nbstripout somehow for this to work well?

ENH: Provide way of marking cells to keep output.

In many of my notebooks I have an initialization cell whose output provides important formatting information (such as MathJax macro definitions). It would be nice to be able to somehow signal nbstripout to keep the output of certain cells.

This should probably use some sort of cell metadata, but perhaps could use some sort of pragma:

# nbstripout: keep

for example.
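A sketch of how both signals could be checked before stripping a cell. The pragma text is the one proposed above; the metadata key name `keep_output` and the function names are assumptions made for illustration.

```python
KEEP_PRAGMA = "# nbstripout: keep"  # pragma text proposed in this request
KEEP_KEY = "keep_output"  # hypothetical cell metadata key


def should_keep_output(cell):
    """Return True if the cell is marked via metadata or via the pragma."""
    if cell.get("metadata", {}).get(KEEP_KEY, False):
        return True
    source = cell.get("source", "")
    if isinstance(source, list):  # notebooks store source as a list of lines
        source = "".join(source)
    return any(line.strip() == KEEP_PRAGMA for line in source.splitlines())


def strip_unless_kept(cells):
    """Clear outputs of code cells, skipping the cells marked to keep."""
    for cell in cells:
        if cell.get("cell_type") == "code" and not should_keep_output(cell):
            cell["outputs"] = []
            cell["execution_count"] = None
    return cells


cells = [
    {
        "cell_type": "code",
        "metadata": {},
        "execution_count": 1,
        "source": ["# nbstripout: keep\n", "define_macros()"],
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["macros"]}],
    },
    {
        "cell_type": "code",
        "metadata": {},
        "execution_count": 2,
        "source": ["1 + 1"],
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2"]}],
    },
]
strip_unless_kept(cells)
```

A metadata flag survives edits to the cell source, while a pragma is visible and editable directly in the cell; supporting both is one possible design.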

Add tests

Some simple unit tests to make sure we don't accidentally break stuff.

pip 0.3.0 version is outdated

I found that the pip version doesn't have the -t switch, so I had to use the Git version.

$ nbstripout -t my_notebook.ipynb
usage: nbstripout [-h] [--install] [--uninstall] [--is-installed] [--status]
[--attributes FILEPATH] [--version] [--force]
[files [files ...]]
nbstripout: error: unrecognized arguments: -t
$ pip install --upgrade nbstripout
Requirement already up-to-date: nbstripout in /Users/jinyoung.kim/Package/Conda36/anaconda/envs/py27/lib/python2.7/site-packages
$ nbstripout --version
0.3.0
