GithubHelp home page GithubHelp logo

judahrand / soorgeon Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ploomber/soorgeon

0.0 0.0 0.0 552 KB

Convert monolithic Jupyter notebooks ๐Ÿ“™ into maintainable Ploomber pipelines. ๐Ÿ“Š

Home Page: https://ploomber.io

License: Apache License 2.0

Python 100.00%

soorgeon's Introduction

Soorgeon

Join our community | Newsletter | Contact us | Blog | Website | YouTube

header

Convert monolithic Jupyter notebooks into Ploomber pipelines.

soorgeon.mp4

3-minute video tutorial.

Try the interactive demo:

Open JupyerLab

Note: Soorgeon is in alpha, help us make it better.

Install

Compatible with Python 3.7 and higher.

pip install soorgeon

Usage

[Optional] Testing if the notebook runs

Before refactoring, you can optionally test if the original notebook or script runs without exceptions:

# works with ipynb files
soorgeon test path/to/notebook.ipynb

# and notebooks in percent format
soorgeon test path/to/notebook.py

Optionally, set the path to the output notebook:

soorgeon test path/to/notebook.ipynb path/to/output.ipynb

soorgeon test path/to/notebook.py path/to/output.ipynb

Refactoring

To refactor your notebook:

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py

# use alternative serializer (cloudpickle or dill) if notebook 
# contains variables that cannot be serialized using pickle 
soorgeon refactor nb.ipynb --serializer cloudpickle
soorgeon refactor nb.ipynb --serializer dill

To learn more, check out our guide.

Cleaning

Soorgeon has a clean command that applies black for .ipynb and .py files:

soorgeon clean path/to/notebook.ipynb

or

soorgeon clean path/to/script.py

Linting

Soorgeon has a lint command that can apply [flake8]:

soorgeon lint path/to/notebook.ipynb

or

soorgeon lint path/to/script.py

Examples

git clone https://github.com/ploomber/soorgeon

Exploratory data analysis notebook:

cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

To learn more, check out our guide.

Community

About Ploomber

Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.

Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!

Click here to know how you can contribute to Ploomber.

Telemetry

We collect anonymous statistics to understand and improve usage. For details, see here

soorgeon's People

Contributors

edublancas avatar 94rain avatar wxl19980214 avatar idomic avatar e1ha avatar dependabot[bot] avatar edblancas avatar neelasha23 avatar jramirez857 avatar rrhg avatar grnnja avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.