vallettea / koala

Transpose your Excel calculations into Python for better performance and scaling.

License: GNU General Public License v3.0

Python 100.00%

koala's Issues

Evaluation inconsistency

Evaluation does not always give the same result.

Example:
With InputData!G14 as 2018 in the .XLS,

print 'First evaluation', sp.evaluate('Cashflow!G187') # => outputs -2966.25862693
sp.set_value('InputData!G14', 0) # this is to avoid direct evaluation
sp.set_value('InputData!G14', 2025)
print 'Second evaluation', sp.evaluate('Cashflow!G187') # => outputs -3719.5504961

With InputData!G14 as 2025 in the .XLS,

print 'First evaluation', sp.evaluate('Cashflow!G187') # => outputs -2582.30664008
sp.set_value('InputData!G14', 0) # this is to avoid direct evaluation
sp.set_value('InputData!G14', 2025)
print 'Second evaluation', sp.evaluate('Cashflow!G187') # => outputs -2582.30663952

Cellmap inconsistency

When you prune your graph, the cellmap of the reduced graph has fewer cells than the original cellmap. But Ranges have been created with the original cellmap, so they might hold a reference to a cell that doesn't exist in the reduced cellmap.
This problem gets solved by dumping and loading the graph, since Ranges are then recreated from the reduced cellmap.

Still, this inconsistency should be addressed.
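A minimal sketch of a consistency check for this situation, in Python 3. The `Range` class, `prune()` and `dangling_references()` below are hypothetical stand-ins written for illustration; they are not koala's actual classes or API.

```python
# Hypothetical stand-ins for illustration; not koala's real Range or cellmap.
class Range:
    def __init__(self, addresses, cellmap):
        self.addresses = list(addresses)
        self.cellmap = cellmap  # the Range keeps the cellmap it was built with


def prune(cellmap, keep):
    """Return a reduced cellmap containing only the addresses in `keep`."""
    return {addr: cell for addr, cell in cellmap.items() if addr in keep}


def dangling_references(rng, cellmap):
    """List the addresses a Range refers to that are absent from `cellmap`."""
    return [addr for addr in rng.addresses if addr not in cellmap]


original = {"Sheet1!A1": 1, "Sheet1!A2": 2, "Sheet1!A3": 3}
rng = Range(["Sheet1!A1", "Sheet1!A2", "Sheet1!A3"], original)
reduced = prune(original, keep={"Sheet1!A1", "Sheet1!A3"})

# The Range was built before pruning, so it still points at Sheet1!A2.
stale = dangling_references(rng, reduced)
```

Running a check like this right after pruning would at least make the inconsistency visible before an evaluation fails on a missing cell.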

Offset doesn't work with Ranges

When OFFSET is used with height and width arguments so that its output is a Range rather than a single Cell, that Range most likely doesn't exist in the cellmap, which leads to errors.

Set up a detailed Benchmark

Related to #17.

We need to understand exactly where we gain performance and where we simplify the graphs.
A detailed benchmark is therefore needed.

The main 3 options we've added are:

  • volatile cleaning
  • pruning (inputs selection)
  • outputs selection

For each of these options, we want to know:

  • what is the size reduction of the graph (nodes, edges)?
  • what is the time reduction of gen_graph?
  • what is the time reduction of set_value?
  • what is the time reduction of evaluate?
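The measurements listed above could be collected with a small harness along these lines. This is only a sketch in Python 3: `time_call()`, `reduction()` and `compare()` are hypothetical helpers, and the callables passed to them would have to wrap the real gen_graph, set_value and evaluate calls.

```python
import time


def time_call(func, *args, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` calls to func(*args)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def reduction(baseline, optimized):
    """Relative reduction of `optimized` versus `baseline`, as a fraction of baseline."""
    return (baseline - optimized) / baseline if baseline else 0.0


def compare(label, baseline_func, optimized_func):
    """Time both callables and report the relative time reduction."""
    base = time_call(baseline_func)
    opt = time_call(optimized_func)
    return {
        "step": label,
        "baseline_s": base,
        "optimized_s": opt,
        "reduction": reduction(base, opt),
    }
```

The same `reduction()` helper works for graph sizes, for example `reduction(nodes_before, nodes_after)`. Taking the best of several runs rather than the mean is a design choice that limits the influence of unrelated system noise.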

Use a single Tokenizer

Currently, 2 different tokenizers are used in Koala:

  • the main tokenizer, taken from Pycel, is used when constructing the graph (in koala/ast/tokenizer.pyx)
  • a secondary tokenizer, taken from openpyxl, is used when reading cells of type range, in order to translate their formulas (in koala/openpyxl/tokenizer.py).

We need to merge the 2 into one to avoid complexity.

No need to remove all index

We don't need to remove every INDEX: only those that return an address need to be removed, not those that return a value. For the moment, we remove them all.

Is the clean_volatiles() cache a source of bad evaluations ?

There is a cache dictionary in the Spreadsheet.clean_volatiles() function, whose purpose is to reduce the number of expressions evaluated when a formula is the same as one previously found.
The problem is that the same formula, called from a different cell, will sometimes evaluate differently.

This might lead to bad evaluations, and might explain #44.
Removing the cache, however, might hurt performance.
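The suspected failure mode can be reproduced with a toy example. This is a sketch in Python 3 and not the actual clean_volatiles() code: `evaluate()` is a hypothetical evaluator that only understands `ROW()`, chosen because its result depends on the calling cell.

```python
def evaluate(formula, row):
    """Toy evaluator: only understands 'ROW()', which depends on the calling cell."""
    if formula == "ROW()":
        return row
    raise ValueError("unsupported formula: %s" % formula)


def cached_by_formula(cache, formula, row):
    # The key ignores the calling cell, so a later caller can receive
    # the result computed for an earlier caller.
    if formula not in cache:
        cache[formula] = evaluate(formula, row)
    return cache[formula]


def cached_by_formula_and_cell(cache, formula, row):
    # The key includes the calling cell's row, so context-dependent
    # results are stored separately.
    key = (formula, row)
    if key not in cache:
        cache[key] = evaluate(formula, row)
    return cache[key]


naive, keyed = {}, {}
naive_results = [cached_by_formula(naive, "ROW()", r) for r in (1, 2)]
keyed_results = [cached_by_formula_and_cell(keyed, "ROW()", r) for r in (1, 2)]
# naive_results is [1, 1]: the second caller got the first caller's value.
# keyed_results is [1, 2].
```

Keying on the calling cell as well gives correct results but fewer cache hits, which is the performance trade-off mentioned above.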

Fix_cell() bug

After some experiments, fixing a cell in the middle of a calculation chain has proven to output fixed results. More investigation is needed.

Set up automated tests before commit

Not urgent at all.
In the future, we will need to launch tests automatically before committing.
Before that, we need to give our testing procedure a bit more structure.

Open the possibility to clean_volatiles() from Spreadsheet

Currently, it is necessary to call ExcelCompiler.clean_volatiles(), which will call the Spreadsheet equivalent.

But calling Spreadsheet.clean_volatiles() directly won't generate a new graph.
Opening this possibility requires rethinking how ast.__init__() works.

Are string values flattened?

excelutils.py, line 424, flatten method for cell values:
if isinstance(el, collections.Iterable) and not isinstance(el, basestring):

@iOiurson How do you feel about that?
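To see what that condition does, here is a small standalone flatten written around the same test. It is a sketch, not the excelutils.py function itself, and it uses the Python 3 spellings (`collections.abc.Iterable` and `str`) in place of the Python 2 names quoted above.

```python
from collections.abc import Iterable


def flatten(items):
    """Yield the leaves of a nested iterable, treating strings as single values."""
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, str):
            # Lists, tuples and other containers are walked recursively.
            yield from flatten(el)
        else:
            # Strings are excluded by the second test, so they stay whole.
            yield el


values = list(flatten([1, ["abc", [2.5, "d e"]], "xyz"]))
# values is [1, 'abc', 2.5, 'd e', 'xyz']
```

With this condition, strings are kept whole rather than split into characters. Without the string exclusion, a one-character string would keep yielding itself and the recursion would never terminate.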

Rename Volatiles

Volatile functions in Excel are functions that always trigger evaluation (see: http://www.decisionmodels.com/calcsecretsi.htm)

What we have called "volatiles" in our code are actually functions that output a reference to a cell, which is not the same thing.

For the sake of clarity, we need to rename what we call volatiles in our code.

Should we clean white spaces in formulas ?

White spaces in formulas are a problem:

  • if you clean them up, text values that include white spaces are corrupted
  • if you don't, the clean_volatiles() function might fail to replace parts of a formula, since revert_rpn() (which outputs the part of the formula to replace) returns a formula without white spaces.

The current setup does not remove white spaces.
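One possible middle ground is to remove white spaces only outside quoted text. The helper below is a hypothetical sketch in Python 3, not part of koala; it assumes double quotes delimit text values and single quotes delimit sheet names.

```python
def strip_spaces_outside_quotes(formula):
    """Remove whitespace from a formula, except inside quoted segments."""
    out = []
    quote = None  # the quote character we are currently inside, if any
    for ch in formula:
        if quote:
            # Inside quotes: keep everything, and leave on the closing quote.
            out.append(ch)
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)
    return "".join(out)


cleaned = strip_spaces_outside_quotes('=IF( A1 > 0 ; "hello world" ; B1 )')
# cleaned is '=IF(A1>0;"hello world";B1)'
```

A doubled quote inside a text value is handled naturally, since the state leaves and re-enters the quoted segment. One caveat: a space between two references is Excel's intersection operator, so a formula relying on it would change meaning under this cleanup.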

VDB function with partials

excellib.vdb() doesn't output exactly the same result as Excel when start_period or end_period is partial (that is, a non-integer float).

RangeCore.apply_all on Range with different sizes

Our current strategy is not to fill Ranges with empty cells.
But this might lead to apply_all operations on Ranges with different sizes, raising an Exception.

We might need to consider filling the missing cells values with zeros on such occasions.
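A minimal sketch of the zero-filling idea, in Python 3. It models a Range as a plain dict from position to value, which is an assumption made for illustration and not how RangeCore stores its cells; `apply_all()` here is a standalone function, not the RangeCore method.

```python
import operator


def apply_all(op, first, second, fill=0):
    """Apply `op` position by position, using `fill` where a cell is missing."""
    positions = sorted(set(first) | set(second))
    return {pos: op(first.get(pos, fill), second.get(pos, fill)) for pos in positions}


costs = {0: 10.0, 1: 20.0, 2: 30.0}
rates = {0: 0.5, 2: 0.25}  # position 1 was an empty cell, so it was never stored
total = apply_all(operator.add, costs, rates)
# total is {0: 10.5, 1: 20.0, 2: 30.25}
```

Filling with zero is safe for addition and subtraction, but a missing divisor would then raise a ZeroDivisionError, so the fill value probably needs to depend on the operation.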

Authorize ":" tokenizer when you have inputs that influence 'INDEX' or 'OFFSET' formula

When you have inputs that can modify cells with formulas containing INDEX or OFFSET, you don't want to pre-parse your formulas to clean the volatiles.
So you need to be able to calculate your workbook in its entirety (even if it takes a great amount of time).

Currently, this is not possible and leads to evaluation errors due to bad parsing of ":" characters.
A generic mode addressing this case needs to be available.

False circular references

Formulas like:
=(totalDecom-SUM(INDEX(FA_RecCostsDecom;1;1):INDEX(FA_RecCostsDecom;1;CA_Periods-1)))*Deprec_UOPRates
trigger an infinite loop when calculated on a cell referenced as FA_RecCostsDecom.

This is because our koala algorithm currently reevaluates a range each time it sees it in a formula.
A good way to handle this would be to store Ranges (in the koala sense) in a Spreadsheet.range_dict object, so that when koala encounters a Range it already knows, it can use the values directly without reevaluating the Range (thus avoiding the infinite loop).

2 problems though:

  • this means the way to initialize Ranges must be adjusted so that a Range is created in the dict on the first element inserted (otherwise the previous formula wouldn't work either)
  • this might be a lot of effort for a few cases, since this might not happen that many times
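The range_dict idea can be sketched as follows, in Python 3. `RangeStore` is a hypothetical class, not koala code, and the formula is simplified to "(total minus the sum of the cells known so far) times a rate", which only loosely mirrors the INDEX-based formula above.

```python
class RangeStore:
    """Hypothetical registry of ranges whose cells are each evaluated once."""

    def __init__(self, formulas):
        self.formulas = formulas  # name -> list of callables taking the store
        self.range_dict = {}      # name -> list of values computed so far

    def values(self, name):
        # Return what is known so far; never trigger a re-evaluation from here.
        return self.range_dict.setdefault(name, [])

    def evaluate(self, name):
        # The range is registered before its first cell is computed, so a
        # formula that refers back to its own range sees the partial values.
        cells = self.values(name)
        for formula in self.formulas[name][len(cells):]:
            cells.append(formula(self))
        return cells


total, rate = 100.0, 0.5
formulas = {"FA": [lambda s: (total - sum(s.values("FA"))) * rate] * 3}
store = RangeStore(formulas)
result = store.evaluate("FA")
# result is [50.0, 25.0, 12.5]
```

Registering the range on its first element, as the first bullet above requires, is what lets the self-referencing formula read the earlier cells instead of recursing into a fresh evaluation of the whole range.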
