alpha-unito / pico

A C++ framework for data analytics pipelines

License: GNU Lesser General Public License v3.0

C++ 98.69% Shell 0.30% CMake 1.01%
high-performance data-analytics pipelines multi-core

pico's People

Contributors

armartinelli, clamis, cmisale, mdrocco, tremblay-guy

pico's Issues

ut/flatmap

  • [filter] read -> pass if even -> write
  • [expander] read -> tokenize -> write

Special Cases

  • kernel: never adding to collector
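
The two pipelines and the special case above can be sketched in plain C++, independently of PiCo's own API. The names `flat_map`, `Collector`, `pass_if_even`, `tokenize` and `drop_all` are hypothetical stand-ins chosen for illustration; the sketch only shows the expected input/output relation that such a unit test might check.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a flatmap collector (not PiCo's type):
// the kernel appends zero or more outputs per input item.
template <typename Out>
using Collector = std::vector<Out>;

// Applies `kernel` to every input item; whatever the kernel adds to the
// collector becomes the output, in input order.
template <typename Out, typename In, typename Kernel>
std::vector<Out> flat_map(const std::vector<In>& input, Kernel kernel) {
    Collector<Out> out;
    for (const In& item : input) kernel(item, out);
    return out;
}

// [filter] emits the item only if it is even.
inline void pass_if_even(const int& x, Collector<int>& c) {
    if (x % 2 == 0) c.push_back(x);
}

// [expander] emits one output per whitespace-separated token.
inline void tokenize(const std::string& line, Collector<std::string>& c) {
    std::istringstream iss(line);
    std::string token;
    while (iss >> token) c.push_back(token);
}

// Special case: a kernel that never adds anything to the collector.
inline void drop_all(const int&, Collector<int>&) {}
```

Under these assumptions, `flat_map<int>(items, drop_all)` yields an empty result for any input, which is the behaviour the special case is meant to exercise.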

ut/reduce-by-key

read -> add-key -> sum-by-key -> write

Special Cases

  • one item for each key
  • all items with same key
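
As a reference for what the test could assert, here is a minimal sketch of the sum-by-key step in standard C++. It does not use PiCo's API; `KV` and `sum_by_key` are hypothetical names, and string keys with integer values are an assumption.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed item shape after the add-key stage: (key, value).
using KV = std::pair<std::string, int>;

// Sums the values of all items that share a key.
inline std::map<std::string, int> sum_by_key(const std::vector<KV>& items) {
    std::map<std::string, int> totals;
    for (const KV& kv : items) {
        totals[kv.first] += kv.second;  // operator[] starts a new key at 0
    }
    return totals;
}
```

With one item per key the output has as many entries as the input; with a single shared key it collapses to one entry holding the total.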

ut/w-reduce

stdin -> windowing sum -> stdout

Special Cases

  • fewer items than window width
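
The special case depends on what happens to an incomplete window, which the issue does not specify. The sketch below assumes count-based, non-overlapping (tumbling) windows and makes the end-of-stream policy an explicit parameter; `windowed_sum` and `flush_partial` are hypothetical names, not PiCo's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums consecutive, non-overlapping windows of `width` items.
// If `flush_partial` is true, a trailing incomplete window is emitted at
// end of stream; otherwise it is dropped. Both policies are assumptions.
inline std::vector<long> windowed_sum(const std::vector<int>& input,
                                      std::size_t width,
                                      bool flush_partial = true) {
    assert(width > 0);
    std::vector<long> sums;
    long current = 0;
    std::size_t filled = 0;
    for (int x : input) {
        current += x;
        if (++filled == width) {  // window complete: emit and reset
            sums.push_back(current);
            current = 0;
            filled = 0;
        }
    }
    if (flush_partial && filled > 0) sums.push_back(current);
    return sums;
}
```

With fewer items than the window width, the result is either a single partial sum or empty, depending on the chosen policy; the test should pin down whichever one the framework intends.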

word-count fails with assertion error: Assertion `rend > rbegin' failed

root@DESKTOP-HUPC6JQ:~/repositories/PiCo/examples/word-count# make
make: Nothing to be done for 'all'.
root@DESKTOP-HUPC6JQ:~/repositories/PiCo/examples/word-count# echo "hello world" > test
root@DESKTOP-HUPC6JQ:~/repositories/PiCo/examples/word-count# ./pico_wc test test_out
=== Semantic Graph
[SEMGRAPH] adjacency [operator]=>[operators]:
        ac83ec0[0xc83ec0] => ac84270[0xc84270]
        ac84270[0xc84270] => ac84290[0xc84290]
        ac84290[0xc84290] => ac85080[0xc85080]
        ac85080[0xc85080] =>
[SEMGRAPH] first operator: ac83ec0
[SEMGRAPH] last operator: ac85080

pico_wc: ../../pico/Operators/InOut/../../ff_implementation/OperatorsFFNodes/InOut/ReadFromFileFFNode.hpp:275: virtual void ReadFromFileFFNode_par::Partitioner::begin_callback(): Assertion `rend > rbegin' failed.
Aborted (core dumped)

Am I doing it wrong?

ut/join-flatmap-bykey

read -> pair(read, map, filter) -> write

  • both read operators read the same file (containing pairs)
  • map applies some processing to input pairs
  • filter duplicates some pairs and filters out the others

ut/merge

read -> merge(-, read) -> write

read-from-stdin ignores part of the input

It seems that some bytes at the end of the stream are ignored, probably because the result of the last call to read is discarded when the buffer is only partially filled.
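
A read loop that avoids this class of bug could look like the sketch below, assuming the POSIX `read` call is what is being used. `read_all` is a hypothetical helper, not the code in PiCo; the point is that the byte count returned by every call, including the last short one, decides how much of the buffer is consumed.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Reads from `fd` until end of stream and returns everything read.
// Only the first `n` bytes of the buffer are valid after each call, so a
// partially filled buffer on the final read is still appended.
inline std::string read_all(int fd, std::size_t buf_size = 4096) {
    std::string data;
    std::string buf(buf_size, '\0');
    for (;;) {
        ssize_t n = ::read(fd, &buf[0], buf.size());
        if (n > 0) {
            data.append(buf.data(), static_cast<std::size_t>(n));
        } else if (n == 0) {
            break;  // end of stream
        } else if (errno != EINTR) {
            throw std::runtime_error("read failed");
        }
        // n < 0 with EINTR: interrupted before any data arrived, retry
    }
    return data;
}
```

Calling `read_all(STDIN_FILENO)` with a small `buf_size` is a simple way to force a short final read in a test.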

ut/map

  • read -> +1 -> write

Special Cases

  • empty input
  • void kernel function

API + code style

  • function naming (Camel vs underscores etc.)
  • file naming
  • match include guards with paths

better error reporting

Design some mechanism (e.g., exception-based?) to report errors of different kinds:

  • typing (data/structure) errors
  • generic errors (e.g., non-existing file)
  • ...
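
One possible shape for the exception-based option is a small hierarchy rooted in `std::runtime_error`, so callers can catch either a specific kind or all framework errors. Every name below (`PipelineError`, `TypingError`, `IoError`, `open_input`) is hypothetical and only illustrates the idea; none of it is existing PiCo API.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical common base for all errors reported by the framework.
class PipelineError : public std::runtime_error {
public:
    explicit PipelineError(const std::string& what) : std::runtime_error(what) {}
};

// Typing (data/structure) errors, e.g. incompatible operator types.
class TypingError : public PipelineError {
public:
    explicit TypingError(const std::string& what) : PipelineError(what) {}
};

// Generic runtime errors tied to I/O, e.g. a non-existing input file.
class IoError : public PipelineError {
public:
    explicit IoError(const std::string& what) : PipelineError(what) {}
};

// Example of reporting a generic error: opening a missing file throws
// IoError instead of failing later with an assertion.
inline std::ifstream open_input(const std::string& path) {
    std::ifstream in(path);
    if (!in) throw IoError("cannot open input file: " + path);
    return in;
}
```

A caller that only wants a readable message can catch `const PipelineError&` and print `what()`, while tests can catch the specific subclass.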

ut/iterate

  • variant A: read | iterate(flatmap) | write
  • variant B: read | iterate(flatmap-join(empty, read)) | write

sketch: flatmap could filter out some items and produce modified duplicates of other items

ut/iterate

  • read -> iterate(random filter) -> write
  • [page-rank kernel] read -> iterate(pair(-, read, join-by-key+map(sum))) -> write

Special Cases

  • 0 iterations
  • more iterations than input elements (or as many as needed until all items are filtered out)

ut/multi-to

  • read -> to(write, write)
  • read -> to(write, +1) -> write
  • read -> to(+1, +2) -> write

ut/w-reduce-by-key

socket -> add key -> windowing sum -> stdout

Special Cases

  • for some key, fewer items (with that key) than window width

ut/reduce

  • read -> sum -> write

Corner Cases

  • single input item
