
jaumeamoresds / nbmodular

Convert notebooks to modular code

Home Page: https://jaumeamoresds.github.io/nbmodular/

License: MIT License

Python 33.33% Jupyter Notebook 66.52% CSS 0.13% Shell 0.02%
data-science ipython-magic modularization nbdev notebooks software-engineering

nbmodular's Issues

changing the code of a previous function cell in place

Currently, redefining a function in a new cell appends another code cell to the end of the list and keeps the previous ones. If --merge is not passed, the previous ones are simply marked as not valid. Let's see this with an example:

%%function
def add1 (x):
    y = x + 1
    return y

%%function --merge
def add1 (x, x2):
    y2 = x2 + 1
    return y, y2

%%function
def add1 (x, x2):
    y = x + 1
    y2 = x2 + 1
    print (f'{x} + 1 = {y}')
    print (f'{x2} + 1 = {y2}')
    return y, y2

The result is a list with three code cells: the first two with valid=False, and the last one with valid=True.
Now imagine that the third definition had been written in the first cell instead, replacing its code. We could add a flag --replace-cell that replaces an existing cell rather than appending a new one. It would also need an index indicating which cell to replace.

We could also add two magic lines:

  • %list_cells function_name: lists the existing code cells for function_name, along with their indexes, so that we know which one needs to be replaced.
  • %delete function_cell function_name idx: deletes the function cell at position idx in the list of function cells self.code_cells[function_name].
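The current append-and-invalidate behaviour and the three proposed operations (replace, list, delete) can be sketched on a plain list of cells per function name. This is a minimal sketch. `CodeCell`, `CodeCellRegistry`, `add_cell`, `list_cells`, `delete_cell` and the `replace_idx` parameter are hypothetical names, not nbmodular's API. The replace path only overwrites the cell and leaves open how the validity of the other cells should change.

```python
from dataclasses import dataclass, field


@dataclass
class CodeCell:
    """Hypothetical record for one %%function cell."""
    source: str
    valid: bool = True


@dataclass
class CodeCellRegistry:
    """Hypothetical container mirroring self.code_cells[function_name]."""
    code_cells: dict = field(default_factory=dict)

    def add_cell(self, function_name, source, merge=False, replace_idx=None):
        cells = self.code_cells.setdefault(function_name, [])
        if replace_idx is not None:
            # Proposed --replace-cell: overwrite the cell at replace_idx in place.
            # (How the other cells' validity should change is left open.)
            cells[replace_idx] = CodeCell(source)
            return
        if not merge:
            # Current behaviour without --merge: earlier cells become invalid.
            for cell in cells:
                cell.valid = False
        cells.append(CodeCell(source))

    def list_cells(self, function_name):
        """Sketch of %list_cells: (index, valid, first line) for each cell."""
        return [
            (idx, cell.valid, (cell.source.splitlines() or [""])[0])
            for idx, cell in enumerate(self.code_cells.get(function_name, []))
        ]

    def delete_cell(self, function_name, idx):
        """Sketch of the proposed delete magic: remove the cell at position idx."""
        del self.code_cells[function_name][idx]


# Replaying the three cells from the example above.
cell_store = CodeCellRegistry()
cell_store.add_cell("add1", "def add1 (x):\n    y = x + 1\n    return y")
cell_store.add_cell("add1", "def add1 (x, x2):\n    y2 = x2 + 1\n    return y, y2", merge=True)
cell_store.add_cell("add1", "def add1 (x, x2):\n    y = x + 1\n    y2 = x2 + 1\n    return y, y2")
for row in cell_store.list_cells("add1"):
    print(row)
```

Replaying the example leaves the first two cells with valid=False and the last with valid=True, matching the behaviour described above.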

Explain the advantage of using a logger

We just initialize it before the function and use it inside the function; it will then be added automatically to the function arguments.
With a visitor we might not even need to initialize it beforehand, provided we don't run the function.
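One way a logger could "automatically be added to function arguments" is static analysis of the function source. The sketch below assumes a simple rule: a name becomes an extra argument if the body reads it, it is neither a parameter nor assigned inside the function, and it already exists in the notebook namespace. That rule is an assumption, and `infer_extra_arguments` is a hypothetical helper. The namespace is simulated with a plain dict.

```python
import ast
import logging


def infer_extra_arguments(source, namespace):
    """Names read in the function body that are not parameters, are not
    assigned inside the function, and already exist in `namespace`.
    Hypothetical helper; the rule is an assumption, not nbmodular's code.
    """
    func = next(n for n in ast.parse(source).body if isinstance(n, ast.FunctionDef))
    # Every parameter (positional, keyword-only, *args, **kwargs) is an ast.arg node.
    params = {a.arg for a in ast.walk(func.args) if isinstance(a, ast.arg)}
    stored, loaded = set(), set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            (stored if isinstance(node.ctx, ast.Store) else loaded).add(node.id)
    return sorted(n for n in loaded - stored - params if n in namespace)


logger = logging.getLogger("nbmodular_demo")

source = '''
def add1 (x):
    y = x + 1
    logger.info ("y = %s", y)
    return y
'''

print(infer_extra_arguments(source, {"logger": logger}))  # ['logger']
```

The membership check against the namespace is what still requires the logger to be initialized first. The source itself is only parsed, never executed.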

short-term issues (pool of issues)

  • Exclude function from pipeline
  • Explain --data separately later: it simply allows having test=False as an argument; see examples where this can be useful.
  • (?) Magic that updates pipeline function (evaluates it)
  • (?) Remove names that are calls: there is no need to evaluate those objects. See the defined function myfunc (I don't remember what this is about).

I think these are already covered:

  • Set the logging level with a magic line
  • Indicate the name of the input/output to be used in the pipeline, in case it is not the same as the names in arguments/return_values
  • Indicate the name of the Python file and its path (relative to lib_path) as a magic line.

Do a linear scan for previous_variables

  • Names that are stored before they are first loaded cannot be previous variables.
  • This requires an ordered scan using a tree visitor. See ast_examples.ipynb in the nbs/dev_tutorials folder.
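An ordered scan matters for two reasons. First, the Python documentation says `ast.walk` yields nodes "in no specified order". Second, `ast.NodeVisitor.generic_visit` follows each node's field order, and `ast.Assign` lists `targets` before `value`. As a result, `x = x + 1` would see the store of `x` before its load. The visitor below is a minimal sketch that evaluates right-hand sides first. `PreviousVariablesVisitor` and `previous_variables` are hypothetical names. Constructs such as `for` loops, annotated assignments and comprehensions would need similar special-casing.

```python
import ast


class PreviousVariablesVisitor(ast.NodeVisitor):
    """Collect names that are loaded before they are stored (hypothetical sketch)."""

    def __init__(self):
        self.stored = set()
        self.previous_variables = []

    def _load(self, name):
        # A load only counts if no earlier statement has stored the name.
        if name not in self.stored and name not in self.previous_variables:
            self.previous_variables.append(name)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):
            self._load(node.id)
        elif isinstance(node.ctx, ast.Store):
            self.stored.add(node.id)

    def visit_Assign(self, node):
        # The right-hand side is evaluated before the targets are bound,
        # so visit `value` first (generic_visit would visit `targets` first).
        self.visit(node.value)
        for target in node.targets:
            self.visit(target)

    def visit_AugAssign(self, node):
        # `x += 1` reads x before writing it.
        self.visit(node.value)
        if isinstance(node.target, ast.Name):
            self._load(node.target.id)
        self.visit(node.target)


def previous_variables(code):
    visitor = PreviousVariablesVisitor()
    visitor.visit(ast.parse(code))  # Module.body is visited in source order
    return visitor.previous_variables


print(previous_variables("y = x + 1\nz = y * 2\nw = w + z\n"))  # ['x', 'w']
```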

debug function

Add a magic debug_function that, instead of running the function, debugs it. It could use the variables in memory or values passed as arguments:

%%debug_function --input-values a=1 b=2
def add_and_print(a, b):
    c = a + b
    print (c)
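The argument-passing path could be sketched with the standard library alone. The magic line is parsed into keyword arguments, the `def` is executed without calling the function, and `pdb.runcall` steps into it. The `--input-values name=value` format is taken from the example above. `parse_input_values`, `load_function` and `debug_function` are hypothetical names. Values that are not Python literals are kept as plain strings, which is an assumption. Falling back to variables in memory is not covered here.

```python
import ast
import pdb
import shlex


def parse_input_values(line):
    """Turn '--input-values a=1 b=2' into {'a': 1, 'b': 2} (assumed format)."""
    tokens = shlex.split(line)
    if tokens and tokens[0] == "--input-values":
        tokens = tokens[1:]
    values = {}
    for token in tokens:
        name, _, raw = token.partition("=")
        try:
            values[name] = ast.literal_eval(raw)  # numbers, lists, quoted strings...
        except (ValueError, SyntaxError):
            values[name] = raw  # assumption: keep anything else as a plain string
    return values


def load_function(cell, namespace=None):
    """Execute the cell's `def` (without calling it) and return the function."""
    namespace = {} if namespace is None else namespace
    exec(cell, namespace)
    func_def = next(n for n in ast.parse(cell).body if isinstance(n, ast.FunctionDef))
    return namespace[func_def.name]


def debug_function(line, cell, namespace=None):
    """Sketch of %%debug_function: step through the function with pdb
    instead of just running it."""
    func = load_function(cell, namespace)
    return pdb.runcall(func, **parse_input_values(line))
```

In a notebook, `debug_function` could be registered with IPython's `register_cell_magic` decorator from `IPython.core.magic`, which passes the magic line and the cell body as the first two arguments. Passing the notebook's user namespace as `namespace` would let the function see notebook globals.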

indicate pipeline name

set_pipe (function_name, pipeline_name)

if pipeline_name != 'None' and pipeline_name not in self.pipelines:
    self.pipelines.append (pipeline_name)

for pipeline in self.pipelines:
    self.create_pipeline (pipeline)

def create_pipeline (self, pipeline=None):
    if pipeline is None:
        pipeline = self.file_name_without_extension
    function_list = self.get_function_list (pipeline=pipeline)

def get_function_list (self, ..., pipeline=None):
    if pipeline is not None:
        function_list = (... and f.pipeline==pipeline)
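The pseudocode above can be turned into a small runnable sketch. It rests on several assumptions not stated in the issue. Every function starts in a default pipeline named after the file. Setting the pipeline to the string 'None' excludes a function from all pipelines. A "pipeline" is represented only as the ordered list of its function names. `FunctionInfo`, `PipelineRegistry` and `add_function` are hypothetical names, and the original's elided filter condition (`...`) is not reproduced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionInfo:
    name: str
    pipeline: Optional[str] = None  # None: excluded from every pipeline


@dataclass
class PipelineRegistry:
    file_name_without_extension: str
    functions: List[FunctionInfo] = field(default_factory=list)
    pipelines: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Assumption: the default pipeline is named after the file.
        self.pipelines.append(self.file_name_without_extension)

    def add_function(self, name):
        # Assumption: new functions start in the default pipeline.
        self.functions.append(FunctionInfo(name, self.file_name_without_extension))

    def set_pipe(self, function_name, pipeline_name):
        for f in self.functions:
            if f.name == function_name:
                # The magic receives strings, so 'None' arrives as text.
                f.pipeline = None if pipeline_name == "None" else pipeline_name
        if pipeline_name != "None" and pipeline_name not in self.pipelines:
            self.pipelines.append(pipeline_name)
        # Rebuild every pipeline, as the pseudocode's loop does.
        return {p: self.create_pipeline(p) for p in self.pipelines}

    def get_function_list(self, pipeline=None):
        if pipeline is None:
            return list(self.functions)
        return [f for f in self.functions if f.pipeline == pipeline]

    def create_pipeline(self, pipeline=None):
        if pipeline is None:
            pipeline = self.file_name_without_extension
        return [f.name for f in self.get_function_list(pipeline=pipeline)]


pipes = PipelineRegistry("my_module")
for name in ["load_data", "train_model", "plot_results"]:
    pipes.add_function(name)
pipes.set_pipe("plot_results", "reporting")
print(pipes.create_pipeline())             # ['load_data', 'train_model']
print(pipes.create_pipeline("reporting"))  # ['plot_results']
```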

long-term issues (pool of issues)

  • Hierarchical objects with current values as attributes
  • Copy the previous values and, after running the function, restore the variables to those values.
  • Optional: show a warning when the return statement contains a function call instead of a variable name.
  • Allow excluding or including local variables from being stored in the object, to avoid issues such as excessive memory use.
    • Possible ways to do this:
      • Delete the variable (del), with the disadvantage that we won't be able to inspect it later on.
      • Delete the variable only when a new cell magic is executed, so that we can still inspect the variables created in the last cell, and then move on to execute the next cell, at which point we remove previous variables that were memory-consuming.
      • Further in the future, we might also delete variables based on how much memory they consume, using some threshold parameter.
  • Allow storing previous_values in the _info object as follows:
    1. Store the values in locals(), using the same code that is currently used for storing current_values in info_object. This code runs before the cell's code, so that locals() holds the values from before the cell, i.e., the previous values.
    2. Use the same trick as in keep_variables_in_memory: introduce a boolean flag created_previous_values in the dict => although it should work here.
    3. If the flag does not exist, run a second piece of code that stores the values on disk instead of in locals => although this should not happen here.
  • Write an IPython script in which the magic functions are run as explained in https://stackoverflow.com/questions/10361206/how-to-run-ipython-magic-from-a-script
  • Using the AST, check whether the first time a variable is stored, it is also loaded in the same statement. If so, or if the variable was loaded before the current statement, it should be in the previous_variables list; otherwise, it shouldn't. Check whether ast.walk preserves the order of statements when traversing the tree (its documentation says it yields nodes in no specified order).
  • Attach the function_info object to the function object, i.e., <my_function>.info = current function info, and attach the current values directly to the function object.
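Three items in this list fit together in one sketch: copying previous values and restoring them after the run, storing previous and current values in an info object, and attaching that object to the function. The sketch assumes the cell's code is executed with `exec` in a dict standing in for the notebook namespace, and that the tracked values can be deep-copied. `run_cell_with_snapshot` is a hypothetical helper, and `SimpleNamespace` stands in for nbmodular's function info object.

```python
import copy
from types import SimpleNamespace


def run_cell_with_snapshot(code, namespace, names):
    """Run `code` in `namespace`, record previous/current values of `names`,
    then restore the previous values (hypothetical helper, not nbmodular's API).
    """
    # 1. Copy the previous values before the cell's code runs
    #    (deepcopy, so in-place mutations by the cell do not alter the snapshot).
    previous_values = {n: copy.deepcopy(namespace[n]) for n in names if n in namespace}

    # 2. Run the cell's code in the notebook namespace.
    exec(code, namespace)
    current_values = {n: namespace[n] for n in names if n in namespace}

    # 3. Restore: put previous values back and drop names the cell created.
    for n in names:
        if n in previous_values:
            namespace[n] = copy.deepcopy(previous_values[n])
        else:
            namespace.pop(n, None)

    return SimpleNamespace(previous_values=previous_values, current_values=current_values)


def add1(x):
    return x + 1


notebook_namespace = {"x": 1, "add1": add1}
# Attach the info object directly to the function object.
add1.info = run_cell_with_snapshot("x = add1(x)\ny = x * 2", notebook_namespace, ["x", "y"])
print(add1.info.previous_values, add1.info.current_values)
```

Objects that cannot be deep-copied, such as generators, would need the disk-based fallback or exclusion discussed above.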

Allow different values for arguments that use defaults

  • When using keyword arguments, do not initialize variables that already exist in memory.
  • Allow indicating the values of arguments when calling the cell magic:
%%function --input-values a=1 b=2
def add_and_print (a, b=10):
    c = a + b
    print (c)
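Both bullets amount to choosing a value for each parameter by priority. The sketch assumes this order: values from --input-values first, then variables already in memory, and the parameter's default only as a last resort. `inspect.signature` supplies the parameters and defaults. `resolve_arguments` is a hypothetical helper, and `*args`/`**kwargs` are skipped.

```python
import inspect


def resolve_arguments(func, namespace, input_values=None):
    """Choose a value for each parameter of `func`, in this (assumed) order:
    1. values passed with --input-values,
    2. variables that already exist in memory (`namespace`),
    3. the parameter's default value.
    Hypothetical helper, not nbmodular's implementation.
    """
    input_values = input_values or {}
    kwargs = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs are left out of this sketch
        if name in input_values:
            kwargs[name] = input_values[name]
        elif name in namespace:
            kwargs[name] = namespace[name]  # keep what is already in memory
        elif param.default is not inspect.Parameter.empty:
            kwargs[name] = param.default
        else:
            raise NameError(f"no value available for argument {name!r}")
    return kwargs


def add_and_print(a, b=10):
    c = a + b
    print(c)


print(resolve_arguments(add_and_print, {}, {"a": 1}))        # {'a': 1, 'b': 10}
print(resolve_arguments(add_and_print, {"b": 2}, {"a": 1}))  # {'a': 1, 'b': 2}
```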

TO CHECK

  • keyword arguments

  • avoid re-running previous code when merging! This can be very inefficient, and wrong if the code has side effects.

    • option 1) join the AST trees from different cells, but only run the last cell
    • option 2) use current_values: if a variable in previous_values is among the current_values of previous cells, it cannot be a previous value
    • experiment with cases: x loaded in cell 1 (prev value), y created in cell 1, z loaded in cell 2 (new prev value), y loaded in cell 2 (not prev value, since it is in created_values of prev cell)
  • magic line %set allows to set any param.
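Option 2 can be checked against the experiment described in the last sub-bullet without re-running any earlier cell. The sketch assumes each cell is summarized by its own previous values (for example from an ordered AST scan) and its created values, both stored when that cell ran. `merged_previous_values` is a hypothetical helper.

```python
def merged_previous_values(cells):
    """cells: list of (previous_values, created_values) per cell, in order.
    A name is a previous value of the merged function only if no earlier
    cell created it (option 2). Hypothetical helper."""
    created_so_far = set()
    previous = []
    for cell_previous, cell_created in cells:
        for name in cell_previous:
            if name not in created_so_far and name not in previous:
                previous.append(name)
        created_so_far.update(cell_created)
    return previous


cells = [
    (["x"], ["y"]),    # cell 1: x loaded (previous value), y created
    (["z", "y"], []),  # cell 2: z loaded (new previous value), y loaded again
]
print(merged_previous_values(cells))  # ['x', 'z']
```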
