opendilab / treevalue

The most awesome tree structure computing solution, making your life easier. (Currently the best-performing solution for tree structure computation.)

Home Page: https://opendilab.github.io/treevalue/

License: Apache License 2.0

Makefile 0.54% Python 70.87% Shell 0.48% Cython 28.11%
data-structures framework python3 tree tree-structure nested-structures

treevalue's Introduction


TreeValue is a generalized tree-based data structure mainly developed by OpenDILab Contributors.

Almost all operations can be applied to whole trees in a convenient way, which simplifies structure processing when the computation is tree-based.

Overview

When we build a complex nested structure, we need to model it as a tree, and Python's native list and dict are often used for this. However, this approach takes a lot of code and some complex, non-intuitive calculation logic, which makes the related code and data hard to modify and extend, and it rules out parallelization.

Therefore, we need a more suitable data container, named TreeValue. It is designed to solve the following problems:

  • Ease of Use: When existing operations are applied to tree structures such as dict, the code becomes almost unrecognizable, with very low readability and maintainability.
  • Diversity of Data: In tree-structure operations, various abnormal conditions (structure mismatch, missing key-value pairs, type mismatch, etc.) occur from time to time, and handling them properly makes the code more complicated.
  • Scalability and Parallelization: Whenever a multivariate operation is performed, the calculation logic has to be redesigned in native Python code, the processing becomes more complicated and confusing, and the code quality is difficult to control.
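To make these problems concrete, here is a minimal sketch of what adding two nested dicts leaf by leaf looks like in plain Python. The helper name tree_apply is hypothetical and is not part of treevalue; it only illustrates the recursion and error handling that have to be written by hand.

```python
from typing import Any, Callable


def tree_apply(func: Callable[[Any, Any], Any], left: Any, right: Any) -> Any:
    """Apply a binary function leaf by leaf over two nested dicts."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Both sides are subtrees: their keys must match exactly.
        if left.keys() != right.keys():
            raise KeyError(f"structure mismatch: {sorted(left)} vs {sorted(right)}")
        return {key: tree_apply(func, left[key], right[key]) for key in left}
    if isinstance(left, dict) or isinstance(right, dict):
        # One side is a subtree and the other is a leaf.
        raise TypeError("a subtree cannot be combined with a leaf value")
    return func(left, right)


t1 = {'a': 1, 'x': {'c': 3, 'd': 4}}
t2 = {'a': 10, 'x': {'c': 30, 'd': 40}}
total = tree_apply(lambda p, q: p + q, t1, t2)
print(total)  # {'a': 11, 'x': {'c': 33, 'd': 44}}
```

Every new operation (unary functions, more than two operands, default values for missing keys) needs another helper of this kind, which is the maintenance burden described above.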

Getting Started

Prerequisite

treevalue has been fully tested on Linux, macOS and Windows with multiple Python versions, and it works properly on all of these platforms.

However, treevalue currently does not support PyPy, so keep this in mind when using it.

Installation

You can install it with pip from the official PyPI site.

pip install treevalue

Or install it from the source code on GitHub:

pip install git+https://github.com/opendilab/treevalue.git@main

For more information about installation, you can refer to the installation guide.

After this, you can check whether the installation succeeded with the following code:

from treevalue import __version__
print('TreeValue version is', __version__)

Quick Usage

You can easily create a tree value object with FastTreeValue.

from treevalue import FastTreeValue

if __name__ == '__main__':
    t = FastTreeValue({
        'a': 1,
        'b': 2.3,
        'x': {
            'c': 'str',
            'd': [1, 2, None],
            'e': b'bytes',
        }
    })
    print(t)

The result should be

<FastTreeValue 0x7f6c7df00160 keys: ['a', 'b', 'x']>
├── 'a' --> 1
├── 'b' --> 2.3
└── 'x' --> <FastTreeValue 0x7f6c81150860 keys: ['c', 'd', 'e']>
    ├── 'c' --> 'str'
    ├── 'd' --> [1, 2, None]
    └── 'e' --> b'bytes'

Beyond a visible tree structure, a rich set of operations is supported. You can put objects (such as torch.Tensor, or any other type) into the tree and call their methods directly, like this:

import torch

from treevalue import FastTreeValue

t = FastTreeValue({
    'a': torch.rand(2, 5),
    'x': {
        'c': torch.rand(3, 4),
    }
})

print(t)
# <FastTreeValue 0x7f8c069346a0>
# ├── a --> tensor([[0.3606, 0.2583, 0.3843, 0.8611, 0.5130],
# │                 [0.0717, 0.1370, 0.1724, 0.7627, 0.7871]])
# └── x --> <FastTreeValue 0x7f8ba6130f40>
#     └── c --> tensor([[0.2320, 0.6050, 0.6844, 0.3609],
#                       [0.0084, 0.0816, 0.8740, 0.3773],
#                       [0.6523, 0.4417, 0.6413, 0.8965]])

print(t.shape)  # property access
# <FastTreeValue 0x7f8c06934ac0>
# ├── a --> torch.Size([2, 5])
# └── x --> <FastTreeValue 0x7f8c069346d0>
#     └── c --> torch.Size([3, 4])
print(t.sin())  # method call
# <FastTreeValue 0x7f8c06934b80>
# ├── a --> tensor([[0.3528, 0.2555, 0.3749, 0.7586, 0.4908],
# │                 [0.0716, 0.1365, 0.1715, 0.6909, 0.7083]])
# └── x --> <FastTreeValue 0x7f8c06934b20>
#     └── c --> tensor([[0.2300, 0.5688, 0.6322, 0.3531],
#                       [0.0084, 0.0816, 0.7669, 0.3684],
#                       [0.6070, 0.4275, 0.5982, 0.7812]])
print(t.reshape((2, -1)))  # method with arguments
# <FastTreeValue 0x7f8c06934b80>
# ├── a --> tensor([[0.3606, 0.2583, 0.3843, 0.8611, 0.5130],
# │                 [0.0717, 0.1370, 0.1724, 0.7627, 0.7871]])
# └── x --> <FastTreeValue 0x7f8c06934b20>
#     └── c --> tensor([[0.2320, 0.6050, 0.6844, 0.3609, 0.0084, 0.0816],
#                       [0.8740, 0.3773, 0.6523, 0.4417, 0.6413, 0.8965]])
print(t[:, 1:-1])  # index operator
# <FastTreeValue 0x7f8ba5c8eca0>
# ├── a --> tensor([[0.2583, 0.3843, 0.8611],
# │                 [0.1370, 0.1724, 0.7627]])
# └── x --> <FastTreeValue 0x7f8ba5c8ebe0>
#     └── c --> tensor([[0.6050, 0.6844],
#                       [0.0816, 0.8740],
#                       [0.4417, 0.6413]])
print(1 + (t - 0.8) ** 2 * 1.5)  # math operators
# <FastTreeValue 0x7fdfa5836b80>
# ├── a --> tensor([[1.6076, 1.0048, 1.0541, 1.3524, 1.0015],
# │                 [1.0413, 1.8352, 1.2328, 1.7904, 1.0088]])
# └── x --> <FastTreeValue 0x7fdfa5836880>
#     └── c --> tensor([[1.1550, 1.0963, 1.3555, 1.2030],
#                       [1.0575, 1.4045, 1.0041, 1.0638],
#                       [1.0782, 1.0037, 1.5075, 1.0658]])

Tutorials

For more examples, explanations and further usage, take a look at the documentation (https://opendilab.github.io/treevalue/).

External

We provide an official treevalue-based wrapper for numpy and torch called DI-treetensor, since treevalue is often used with libraries like numpy and torch. It is especially helpful when working in AI-related fields.

Speed Performance

Here is the speed performance of the operations in FastTreeValue. The following table compares it with dm-tree. (In dm-tree, the unflatten operation differs from the one in TreeValue; see Comparison Between TreeValue and DM-Tree for more details.)

| | flatten | flatten(with path) | mapping | mapping(with path) |
|---|---|---|---|---|
| treevalue | --- | 511 ns ± 6.92 ns | 3.16 µs ± 42.8 ns | 1.58 µs ± 30 ns |
| | flatten | flatten_with_path | map_structure | map_structure_with_path |
| dm-tree | 830 ns ± 8.53 ns | 11.9 µs ± 358 ns | 13.3 µs ± 87.2 ns | 62.9 µs ± 2.26 µs |

The following two tables compare the performance with jax pytree.

| | mapping | mapping(with path) | flatten | unflatten | flatten_values | flatten_keys |
|---|---|---|---|---|---|---|
| treevalue | 2.21 µs ± 32.2 ns | 2.16 µs ± 123 ns | 515 ns ± 7.53 ns | 601 ns ± 5.99 ns | 301 ns ± 12.9 ns | 451 ns ± 17.3 ns |
| | tree_map | (Not Implemented) | tree_flatten | tree_unflatten | tree_leaves | tree_structure |
| jax pytree | 4.67 µs ± 184 ns | --- | 1.29 µs ± 27.2 ns | 742 ns ± 5.82 ns | 1.29 µs ± 22 ns | 1.27 µs ± 16.5 ns |

| | flatten + all | flatten + reduce | flatten + reduce(with init) | rise(given structure) | rise(automatic structure) |
|---|---|---|---|---|---|
| treevalue | 425 ns ± 9.33 ns | 702 ns ± 5.93 ns | 793 ns ± 13.4 ns | 9.14 µs ± 129 ns | 11.5 µs ± 182 ns |
| | tree_all | tree_reduce | tree_reduce(with init) | tree_transpose | (Not Implemented) |
| jax pytree | 1.47 µs ± 37 ns | 1.88 µs ± 27.2 ns | 1.91 µs ± 47.4 ns | 10 µs ± 117 ns | --- |

This is the comparison between dm-tree, jax-libtree and treevalue for the flatten and mapping operations (a lower value means less time and a faster run).

[Figure: Time cost of flatten operation]

[Figure: Time cost of mapping operation]

The following table compares the performance with tianshou Batch.

| | get | set | init | deepcopy | stack | cat | split |
|---|---|---|---|---|---|---|---|
| treevalue | 51.6 ns ± 0.609 ns | 64.4 ns ± 0.564 ns | 750 ns ± 14.2 ns | 88.9 µs ± 887 ns | 50.2 µs ± 771 ns | 40.3 µs ± 1.08 µs | 62 µs ± 1.2 µs |
| tianshou Batch | 43.2 ns ± 0.698 ns | 396 ns ± 8.99 ns | 11.1 µs ± 277 ns | 89 µs ± 1.42 µs | 119 µs ± 1.1 µs | 194 µs ± 1.81 µs | 653 µs ± 17.8 µs |

And this is the comparison between Tianshou Batch and treevalue for the cat, stack and split operations (a lower value means less time and a faster run).

[Figure: Time cost of cat operation]

[Figure: Time cost of stack operation]

[Figure: Time cost of split operation]

Test benchmark code can be found here:
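For readers who want to produce numbers in the same "mean ± deviation" shape on their own machine, here is a rough sketch using only the standard library. It is not the project's benchmark code: the flatten function below is a plain-Python stand-in written for this example, and the sample tree and run counts are arbitrary assumptions.

```python
import statistics
import timeit
from typing import Any, Iterator, Tuple

Path = Tuple[str, ...]


def flatten(tree: dict, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from flatten(value, path + (key,))
        else:
            yield path + (key,), value


sample = {'a': 1, 'b': 2, 'x': {'c': 3, 'd': 4}}
flat = list(flatten(sample))

runs = 10_000
# Each entry of `totals` is the total time in seconds for `runs` calls.
totals = timeit.repeat(lambda: list(flatten(sample)), repeat=5, number=runs)
per_call_ns = [total / runs * 1e9 for total in totals]
mean_ns = statistics.mean(per_call_ns)
std_ns = statistics.stdev(per_call_ns)
print(f"flatten: {mean_ns:.0f} ns ± {std_ns:.1f} ns per call")
```

Absolute values depend heavily on hardware and interpreter version, so only comparisons made on the same machine are meaningful.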

Change Log

Version History
  • 2022-05-03 1.3.1: Change definition of getitem, setitem and delitem; add pop method for TreeValue class.
  • 2022-03-15 1.3.0: Add getitem, setitem and delitem for adding, editing and removing items in TreeValue class.
  • 2022-02-22 1.2.2: Optimize union function; add walk utility method.
  • 2022-01-26 1.2.1: Update tree printing; add keys, values, items on TreeValue; add comparison to facebook nest library.
  • 2022-01-04 1.2.0: Add flatten_values and flatten_keys; fix problem in mapping function; add support for potc.
  • 2021-12-03 1.1.0: Add version information; fix bug of default value; add flatten and unflatten; optimize speed performance.
  • 2021-10-24 1.0.0: Greatly optimize the speed performance using Cython; overhead has been reduced to a negligible level.

Feedback and Contribute

Welcome to OpenDILab community - treevalue!

If you run into a problem or have a brilliant idea, you can file an issue.

Or just contact us with slack or email ([email protected]).

Please check Contributing Guidances.

Thanks to the following contributors!

Citation

@misc{treevalue,
    title={{TreeValue} - Tree-Structure Computing Solution},
    author={TreeValue Contributors},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/opendilab/treevalue}},
    year={2021},
}

License

treevalue is released under the Apache 2.0 license. See the LICENSE file for details.

treevalue's People

Contributors

hansbug, paparazz1

treevalue's Issues

[BUG] When function returns a treevalue

from treevalue import FastTreeValue


def func(x):
    return FastTreeValue({
        'x': x, 'y': x ** 2,
    })


f = FastTreeValue({
    'x': func,
    'y': {
        'z': func,
    }
})
v = FastTreeValue({'x': 2, 'y': {'z': 34}})
r1 = f(v)
print(r1)

The output is

<FastTreeValue 0x7f13ff6491f0>
├── 'x' --> <FastTreeValue 0x7f13ff6491c0>
│           ├── 'x' --> 2
│           └── 'y' --> 4
└── 'y' --> <FastTreeValue 0x7f13ff649220>
    └── 'z' --> <FastTreeValue 0x7f13ff649130>
                ├── 'x' --> 34
                └── 'y' --> 1156

But the correct output should be

<FastTreeValue 0x7f13ff6498b0>
├── 'x' --> <FastTreeValue 0x7f13ff649a00>
│   ├── 'x' --> 2
│   └── 'y' --> 4
└── 'y' --> <FastTreeValue 0x7f13ff6499a0>
    └── 'z' --> <FastTreeValue 0x7f13ff649910>
        ├── 'x' --> 34
        └── 'y' --> 1156

ASCII output when printing a tree.

As the title says, Unicode output is sometimes not acceptable (for example in LaTeX documentation), so fully ASCII output should be supported.

Like this

FORK = u'+'
LAST = u'+'
VERTICAL = u'|'
HORIZONTAL = u'-'
NEWLINE = u''

in treevalue/utils/formattree.py.
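As an illustration of the request, here is a minimal sketch of a tree printer whose branch characters are configurable. It works on plain nested dicts; the render function and the two style dicts are hypothetical names for this example and do not reflect the actual contents of treevalue/utils/formattree.py.

```python
from typing import List

# Branch characters, mirroring the FORK / LAST / VERTICAL / HORIZONTAL idea above.
UNICODE_STYLE = {'fork': '├', 'last': '└', 'vertical': '│', 'horizontal': '─'}
ASCII_STYLE = {'fork': '+', 'last': '+', 'vertical': '|', 'horizontal': '-'}


def render(tree: dict, style: dict, prefix: str = '') -> List[str]:
    """Render a nested dict as indented lines using the given branch characters."""
    lines = []
    items = list(tree.items())
    for index, (key, value) in enumerate(items):
        is_last = index == len(items) - 1
        branch = style['last'] if is_last else style['fork']
        head = f"{prefix}{branch}{style['horizontal'] * 2} {key!r} --> "
        if isinstance(value, dict):
            lines.append(head + '<subtree>')
            # Children are indented by a fixed width, independent of the key length.
            child_prefix = prefix + (' ' if is_last else style['vertical']) + '   '
            lines.extend(render(value, style, child_prefix))
        else:
            lines.append(head + repr(value))
    return lines


tree = {'x': {'x': 2, 'y': 4}, 'y': {'z': {'x': 34, 'y': 1156}}}
ascii_lines = render(tree, ASCII_STYLE)
print('\n'.join(ascii_lines))
# +-- 'x' --> <subtree>
# |   +-- 'x' --> 2
# |   +-- 'y' --> 4
# +-- 'y' --> <subtree>
#     +-- 'z' --> <subtree>
#         +-- 'x' --> 34
#         +-- 'y' --> 1156
```

Passing UNICODE_STYLE instead produces the box-drawing variant with the same layout.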

New features in version 0.1.0

Implemented in #5 .

  • Dump of the tree value graph.
  • Support pickle.dumps of TreeValue objects
  • Binary-based full data dump
  • Binary-based full data load
  • Native-python-based full data load
  • Self calculation with original address
  • Self operator with original address
  • Add more features in general_treevalue, such as NotImplement
  • CLI support (dump/load, graph generation, graphviz code dump, etc)

Error occurred when raising TypeError in mapping function.

from treevalue import TreeValue, mapping


def f(x):
    raise TypeError


if __name__ == '__main__':
    t = TreeValue({'a': 1, 'b': 2, 'x': {'c': 3, 'd': 4}})
    t1 = mapping(t, f)

It currently raises the following misleading error:

Traceback (most recent call last):
  File "test_p.py", line 10, in <module>
    t1 = mapping(t, f)
  File "treevalue/tree/tree/functional.pyx", line 47, in treevalue.tree.tree.functional.mapping
    cpdef TreeValue mapping(TreeValue tree, object func):
  File "treevalue/tree/tree/functional.pyx", line 77, in treevalue.tree.tree.functional.mapping
    return type(tree)(_c_mapping(tree._detach(), _ValuePathFuncWrapper(func), ()))
  File "treevalue/tree/tree/functional.pyx", line 42, in treevalue.tree.tree.functional._c_mapping
    _d_res[k] = func(v, curpath)
  File "treevalue/tree/tree/functional.pyx", line 20, in treevalue.tree.tree.functional._ValuePathFuncWrapper.__call__
    return self.func()
TypeError: f() missing 1 required positional argument: 'x'
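The traceback suggests that the wrapper retries the call with fewer arguments after a TypeError, so a TypeError raised inside the user function is mistaken for an arity mismatch. That reading is an assumption based only on the traceback. One way to avoid the ambiguity is to inspect the signature up front, as in this sketch on plain nested dicts; adapt and map_tree are hypothetical names and not treevalue's implementation, and functions taking *args are not handled.

```python
import inspect
from typing import Any, Callable, Tuple

Path = Tuple[str, ...]


def adapt(func: Callable) -> Callable[[Any, Path], Any]:
    """Return a wrapper that is always called as wrapped(value, path)."""
    positional = [
        p for p in inspect.signature(func).parameters.values()
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    # 0 -> func(), 1 -> func(value), 2 or more -> func(value, path)
    arity = min(len(positional), 2)
    return lambda value, path: func(*(value, path)[:arity])


def map_tree(tree: dict, func: Callable) -> dict:
    """Map func over every leaf; exceptions from func propagate unchanged."""
    wrapped = adapt(func)

    def walk(node: dict, path: Path) -> dict:
        return {
            key: walk(value, path + (key,)) if isinstance(value, dict)
            else wrapped(value, path + (key,))
            for key, value in node.items()
        }

    return walk(tree, ())


def broken(x):
    raise TypeError('raised inside user code')


message = None
try:
    map_tree({'a': 1, 'x': {'c': 3}}, broken)
except TypeError as err:
    message = str(err)
print(message)  # raised inside user code
```

Because the arity is decided before the call, no exception from the user function is ever caught or reinterpreted.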

Documentation for the new features

  • Dump of the tree value graph.
  • Support pickle.dumps of TreeValue objects
  • Self calculation with original address
  • Self operator with original address
  • Add more features in general_treevalue, such as NotImplement
  • Binary-based full data dump and load
  • CLI support (dump/load, graph generation, graphviz code dump, etc)

Add constraint within treevalue structure

For example, when treevalue is used in DI-treetensor, some tensors may have the same shape (all dimensions are the same) or a partly identical shape (e.g. the 1st dimension is the same).

Maybe a constraint could be provided to simplify the result, such as

from treevalue import TreeValue
import torch

t = TreeValue({'a': torch.randn(2, 3), 'b': {'x': torch.randn(2, 4)}})
t.shape[0]  # should be a tree in current treevalue

# define some constraint
t.shape[0]  # should be 2 after defining

Add static tree support for treevalue

Although this is also called a "tree", I tend to treat it as another kind of data structure, like a segment tree.

Its properties:

  • You can store (or manage) something on the non-leaf nodes
  • The structure is fixed, including tree structure and max capacity of nodes

Based on this, pickle.loads and pickle.dumps can be optimized, because serialization can be performed directly on the data of the root node.

treevalue graph cli cannot import the trees

As the title says, it imports nothing and no error is displayed. After setting PYTHONPATH=.${PYTHONPATH}, everything is okay. It seems to have something to do with the hbutils library.

# test_main.py
from treevalue import TreeValue

t1 = TreeValue({
    'a': 2, 'b': 3,
    'x': {
        'c': 5, 'd': 7,
    }
})
t2 = TreeValue({
    't1': t1, 'a': 4, 'x': {'c': 5}
})
t2.x.d = t2

# command line
treevalue graph -t test_main.t1 -o test_graph.gv

Error when setting a raw-wrapped value

>>> from treevalue import FastTreeValue, raw
>>> f = FastTreeValue({})
>>> f['a'] = raw({'a': 1})
>>> f
<FastTreeValue 0x7f17e8153080>
└── 'a' --> <treevalue.tree.common.base.RawWrapper object at 0x7f17e8146fd0>

This is wrong; the correct result should be

>>> from treevalue import FastTreeValue, raw
>>> f = FastTreeValue({})
>>> f['a'] = raw({'a': 1})
>>> f
<FastTreeValue 0x7f17e8153080>
└── 'a' --> {'a': 1}
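The expected behaviour can be sketched with a toy container: a wrapper marks a dict as a leaf, and the container unwraps it on assignment instead of storing the wrapper object. RawBox and MiniTree are hypothetical classes written for this illustration; they are not treevalue's RawWrapper or FastTreeValue.

```python
class RawBox:
    """Hypothetical marker: store the wrapped value as a leaf, even if it is a dict."""

    def __init__(self, value):
        self.value = value


class MiniTree:
    """Toy tree container that converts plain dicts into subtrees."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        if isinstance(value, RawBox):
            # Unwrap: keep the inner object as a leaf, not the wrapper itself.
            self._data[key] = value.value
        elif isinstance(value, dict):
            # A plain dict becomes a nested subtree.
            child = MiniTree()
            for child_key, child_value in value.items():
                child[child_key] = child_value
            self._data[key] = child
        else:
            self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]


f = MiniTree()
f['a'] = RawBox({'a': 1})
f['b'] = {'c': 2}
print(f['a'])       # {'a': 1}
print(f['b']['c'])  # 2
```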

Something is wrong when the inherit value is missing.

  • I have marked all applicable categories:
    • installation bug
    • exception-raising bug
    • data model bug
    • tree utils bug
    • function treelize bug (including method and classmethod)
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
import sys

import treevalue

print(treevalue.__version__, sys.version, sys.platform)

Version 1.0.0, python 3.9.6 (should be independent of python's version)

The following code

from treevalue import func_treelize, FastTreeValue


@func_treelize(mode='outer', missing=lambda: [])
def append(arr: list, *args):
    print(arr, args)
    for item in args:
        if item:
            arr.append(item)
    return arr


t0 = FastTreeValue({})
t1 = FastTreeValue({'a': 2, 'x': {'c': 4, 'd': 9}})
t2 = FastTreeValue({'a': 4, 'b': 48, 'x': {'d': 54}})
t3 = FastTreeValue({'b': -12, 'x': 7, 'y': {'e': 3, 'f': 4}})

tr = append(t0, t1, t2, t3)

treevalue graph's dumped graph is

[Figure: test_graph_10]

tr.x's value is obviously incorrect.
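For reference, here is a sketch of what an outer join over keys with a missing-value factory could compute on plain nested dicts. It is an illustration only: outer_apply is a hypothetical helper, the choice to raise when a key is a subtree in one tree and a leaf in another is an assumption of this example rather than treevalue's documented behaviour, and the third input is simplified so that this mixed case does not occur.

```python
from typing import Any, Callable


def outer_apply(func: Callable, missing: Callable[[], Any], *trees: dict) -> dict:
    """Apply func over the union of keys, filling absent leaves with missing()."""
    keys = []
    for tree in trees:
        for key in tree:
            if key not in keys:
                keys.append(key)

    result = {}
    for key in keys:
        present = [tree[key] for tree in trees if key in tree]
        if any(isinstance(value, dict) for value in present):
            if not all(isinstance(value, dict) for value in present):
                raise TypeError(f"key {key!r} is a subtree in some trees and a leaf in others")
            # Absent subtrees are treated as empty trees.
            children = [tree.get(key, {}) for tree in trees]
            result[key] = outer_apply(func, missing, *children)
        else:
            # A fresh default is created for every absent leaf.
            args = [tree[key] if key in tree else missing() for tree in trees]
            result[key] = func(*args)
    return result


def append(arr: list, *args):
    for item in args:
        if item:
            arr.append(item)
    return arr


t0 = {}
t1 = {'a': 2, 'x': {'c': 4, 'd': 9}}
t2 = {'a': 4, 'b': 48, 'x': {'d': 54}}
t3 = {'b': -12, 'y': {'e': 3}}

tr = outer_apply(append, list, t0, t1, t2, t3)
print(tr)
# {'a': [2, 4], 'x': {'c': [4], 'd': [9, 54]}, 'b': [48, -12], 'y': {'e': [3]}}
```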
