
hkpeaks / peaks-consolidation


The Peaks Consolidation is equipped with state-of-the-art algorithms and data structures that support high-performance databending exercises. It specializes in management accounting and consolidation, with some special topics in machine learning and bioinformatics.

Home Page: https://www.linkedin.com/in/max01/recent-activity/all/

License: MIT License

Go 94.79% Python 5.21%
benchmark csv dataframe etl framework golang in-memory jointable sql streaming

peaks-consolidation's Introduction

New Cross-Platform App

Instant File Preview and Validation for Giant CSV Files

Because a CSV file does not have to use a comma as its delimiter, this app detects the delimiter automatically, on the assumption that every row contains the same number of delimiters. The app divides the file into 100 partitions and validates the first row of each partition, so it writes 100 sample rows to disk and displays the first 20 rows on screen. A comment in the source code explains how to change the number of partitions from 100 to 1000.
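The detection rule described above (every row must contain the same number of delimiters) can be sketched as follows. This is a minimal illustration and not the code shipped in main.go: the function name detectDelimiter and the candidate list are assumptions, quoted fields are ignored, and when several candidates qualify the first one in the list wins.

```go
package main

import (
	"fmt"
	"strings"
)

// detectDelimiter returns the first candidate delimiter whose count is
// non-zero and identical on every sample row. The second result is false
// when no candidate qualifies. The candidate list is an assumption made
// for this sketch; quoted fields are not handled.
func detectDelimiter(rows []string) (rune, bool) {
	if len(rows) == 0 {
		return 0, false
	}
	candidates := []rune{',', ';', '\t', '|'}
	for _, d := range candidates {
		want := strings.Count(rows[0], string(d))
		if want == 0 {
			continue // delimiter does not occur in the first row
		}
		consistent := true
		for _, row := range rows[1:] {
			if strings.Count(row, string(d)) != want {
				consistent = false
				break
			}
		}
		if consistent {
			return d, true
		}
	}
	return 0, false
}

func main() {
	// Sample rows, e.g. the first row of each partition.
	rows := []string{
		"Date;Shop;Amount",
		"2023-01-01;S15;10,5",
		"2023-01-02;S16;20,0",
	}
	d, ok := detectDelimiter(rows)
	fmt.Printf("%q %v\n", d, ok) // prints ';' true
}
```

The comma is rejected here because it does not appear in the header row, while the semicolon appears exactly twice in every row.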

Download URL: https://github.com/hkpeaks/peaks-consolidation/tree/main/Documents/PreviewFile

  • main.go is a Golang version
  • main.rs is a Rust version
  • Peaks.py is a Python version

Demo video: https://lnkd.in/gCTFR9rh

New Query Statement for File, In-memory Table and Network Stream

Note: The leading "." that marks a command as a member of your user-defined function is optional. The first line defines the data extraction (source) and the data load (destination). There are three possible forms:

UserDefineFunctionName = from Extraction to Load

Or

UserDefineFunctionName = from Extraction, Extraction, Extraction to Load

Or

UserDefineFunctionName = from Extraction to Load, Load, Load

From the second line onward, you define the query and data transformation commands.
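To make the shape of the first line concrete, here is a minimal sketch of a parser for the three forms shown above. It is not the Peaks parser: the definition type and the parseDefinition and splitList functions are hypothetical names, and the sketch assumes that file names contain neither commas nor the substring " to ".

```go
package main

import (
	"fmt"
	"strings"
)

// definition holds the parts of a first line such as
// "JoinTable = from A.csv, B.csv to Out.csv".
type definition struct {
	Name    string
	Sources []string
	Targets []string
}

// splitList splits a comma-separated list and trims each item.
func splitList(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// parseDefinition parses "Name = from Extraction[, ...] to Load[, ...]".
func parseDefinition(line string) (definition, error) {
	name, rest, ok := strings.Cut(line, "=")
	if !ok {
		return definition{}, fmt.Errorf("missing '=' in %q", line)
	}
	rest = strings.TrimSpace(rest)
	if !strings.HasPrefix(rest, "from ") {
		return definition{}, fmt.Errorf("missing 'from' in %q", line)
	}
	src, dst, ok := strings.Cut(strings.TrimPrefix(rest, "from "), " to ")
	if !ok {
		return definition{}, fmt.Errorf("missing 'to' in %q", line)
	}
	d := definition{
		Name:    strings.TrimSpace(name),
		Sources: splitList(src),
		Targets: splitList(dst),
	}
	if d.Name == "" || len(d.Sources) == 0 || len(d.Targets) == 0 {
		return definition{}, fmt.Errorf("incomplete definition %q", line)
	}
	return d, nil
}

func main() {
	d, err := parseDefinition("ExpandFile = from Fact.csv to 1BillionRows.csv")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", d)
	// prints {Name:ExpandFile Sources:[Fact.csv] Targets:[1BillionRows.csv]}
}
```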

Examples:

ExpandFile = from Fact.csv to 1BillionRows.csv

.ExpandFactor: 123

JoinTable = from 1BillionRows.csv to Test1Results.csv

.Filter: Saleman(Mary,Peter,John)

.JoinTable: Product, Category => InnerJoin(Master.csv)

.AddColumn: Quantity, Unit_Price => Multiply(Amount)

.Filter: Amount(Float20000..29999)

.GroupBy: Saleman, Shop, Product => Count() Sum(Quantity) Sum(Amount)

.OrderBy: Saleman(A) Product(A) Date(D)

SplitFile = from Test1Results.csv to FolderLake

.CreateFolderLake: Shop

FilterFolder = from Outbox/FolderLake/S15/*.csv to Result-FilterFolderLake.csv

.Filter: Product(222..888) Style(=F)

ReadSample2View = from Outbox/Result-FilterFolderLake.csv to SampleTable

.ReadSample: StartPosition%(0) ByteLength(100000)

.View

Command List

AddColumn{Column, Column => Math(NewColName)}

    where Math includes Add, Subtract, Multiply & Divide

BuildKeyValue{Column, Column ~ KeyValueTableName}

CurrentSetting{StreamMB(Number) Thread(Number)}

Distinct{Column, Column}

Filter{Column(CompareOperator Value) Column(CompareOperator Value)}

FilterUnmatch{Column(CompareOperator Value) Column(CompareOperator Value)}

    where CompareOperator is one of >, <, >=, <=, =, != or a range, e.g. 100..200
          To compare integer or float values, use the Float prefix, e.g. Float > Number, Float100..200
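As an illustration of how a Float condition could be evaluated, the sketch below checks a single value against a condition string such as Float20000..29999 or Float>=100. The function name matchFloat is hypothetical, and treating both ends of a range as inclusive is an assumption of this sketch, not something the command list confirms.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// matchFloat reports whether value satisfies a condition such as
// "Float20000..29999" or "Float>=100". Only the Float form is handled.
// Range bounds are treated as inclusive (an assumption of this sketch).
func matchFloat(cond string, value float64) (bool, error) {
	if !strings.HasPrefix(cond, "Float") {
		return false, fmt.Errorf("unsupported condition %q", cond)
	}
	expr := strings.TrimSpace(strings.TrimPrefix(cond, "Float"))

	// Range form: lo..hi
	if loStr, hiStr, ok := strings.Cut(expr, ".."); ok {
		lo, err := strconv.ParseFloat(strings.TrimSpace(loStr), 64)
		if err != nil {
			return false, err
		}
		hi, err := strconv.ParseFloat(strings.TrimSpace(hiStr), 64)
		if err != nil {
			return false, err
		}
		return value >= lo && value <= hi, nil
	}

	// Operator form. Two-character operators come first so that ">="
	// is not read as ">" followed by "=100".
	for _, op := range []string{">=", "<=", "!=", ">", "<", "="} {
		if !strings.HasPrefix(expr, op) {
			continue
		}
		n, err := strconv.ParseFloat(strings.TrimSpace(expr[len(op):]), 64)
		if err != nil {
			return false, err
		}
		switch op {
		case ">=":
			return value >= n, nil
		case "<=":
			return value <= n, nil
		case "!=":
			return value != n, nil
		case ">":
			return value > n, nil
		case "<":
			return value < n, nil
		case "=":
			return value == n, nil
		}
	}
	return false, fmt.Errorf("no operator found in %q", cond)
}

func main() {
	for _, amount := range []float64{19999.5, 25000, 30000} {
		ok, err := matchFloat("Float20000..29999", amount)
		fmt.Println(amount, ok, err)
	}
}
```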

GroupBy{Column, Column => Count() Sum(Column) Max(Column) Min(Column)}
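For readers unfamiliar with the operation, the sketch below shows what a statement like GroupBy: Saleman, Shop => Count() Sum(Quantity) Sum(Amount) computes, using an ordinary Go map. It only illustrates the semantics; the sale, groupKey and aggregate types are hypothetical, and the sketch says nothing about how Peaks implements grouping internally.

```go
package main

import "fmt"

// sale is a hypothetical in-memory row used for this illustration.
type sale struct {
	Saleman  string
	Shop     string
	Quantity float64
	Amount   float64
}

// groupKey identifies one group: the GroupBy columns.
type groupKey struct {
	Saleman string
	Shop    string
}

// aggregate holds the results of Count(), Sum(Quantity) and Sum(Amount).
type aggregate struct {
	Count       int
	SumQuantity float64
	SumAmount   float64
}

// groupBy accumulates one aggregate per distinct (Saleman, Shop) pair.
func groupBy(rows []sale) map[groupKey]aggregate {
	out := make(map[groupKey]aggregate)
	for _, r := range rows {
		k := groupKey{Saleman: r.Saleman, Shop: r.Shop}
		a := out[k] // zero value when the key is new
		a.Count++
		a.SumQuantity += r.Quantity
		a.SumAmount += r.Amount
		out[k] = a
	}
	return out
}

func main() {
	rows := []sale{
		{"Mary", "S15", 2, 100},
		{"Mary", "S15", 3, 150},
		{"Peter", "S16", 1, 40},
	}
	for k, a := range groupBy(rows) {
		fmt.Printf("%s %s count=%d quantity=%g amount=%g\n",
			k.Saleman, k.Shop, a.Count, a.SumQuantity, a.SumAmount)
	}
}
```

Map iteration order in Go is not deterministic, so the printed groups may appear in any order; an OrderBy step would fix the order.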

JoinKeyValue{Column, Column => JoinType(KeyValueTableName)}

    where JoinType includes AllMatch, Filter & FilterUnmatch

JoinTable{Column, Column => JoinType(KeyValueTableName)}

    where JoinType includes AllMatch & InnerJoin

OrderBy{PrimaryCol(Sorting Order) SecondaryCol(Sorting Order)}

OrderBy{SecondaryCol(Sorting Order) => CreateFolderLake(PrimaryCol) ~ FolderName or FileName.csv}

    where Sorting Order is A (ascending) or D (descending); to sort real numbers, use FloatA or FloatD

Read{FileName.csv ~ TableName}

ReadSample{StartPosition%(Number) ByteLength(Number)}

ReadSample{Repeat(Number) ByteLength(Number)}

Select{Column, Column}

SelectUnmatch{Column, Column}

SplitFile{FileName.csv ~ NumberOfSplit}

CreateFolderLake{Column, Column ~ SplitFolderName}

View{TableName}

Write{TableName ~ FileName.csv or %ExpandBy100Time.csv}

peaks-consolidation's People

Contributors

hkpeaks


peaks-consolidation's Issues

unable to run

/Users/peter/Desktop/peaks-consolidation>./do Input/0.1MillionRows.csv
2023/06/29 22:53:00 unable to read file: open Input/0.1MillionRows.csv.txt: no such file or directory
/Users/peter/Desktop/peaks-consolidation>ls -l Input/0.1MillionRows.csv

-rw-r--r-- 1 peter staff 7217187 Jun 29 22:49 Input/0.1MillionRows.csv

Add more examples, from simple to more detailed

I suggest adding more examples to the README and/or to a dedicated folder.

Read/select/write
Read/select/orderby/write
Read/select/filter/write
Read/select/filter/groupby/write
Read/select/join/write
Read/select/join/filter/groupby/orderby/write
Read/select/split/write
....

Make the syntax compatible with pandas

Thanks for the good work!

To grow the user base and ease user migration, please make the syntax match pandas one to one:

pd.read_csv(…)

Check the Koalas library.
