winebarrel / jlsort

Sort ndjson/JSON Lines using External merge sort

License: MIT License

Rust 88.81% Makefile 11.19%
rust sort jsonlines json ndjson

jlsort

Sort ndjson/JSON Lines files using an external merge sort, so inputs larger than available memory can still be sorted.
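
For readers unfamiliar with the technique, here is a minimal Python sketch of an external merge sort over JSON Lines. It is not jlsort's Rust implementation: the names `external_sort`, `_spill`, and `chunk_chars` are hypothetical, the key is compared as text, and the chunk limit counts characters rather than bytes.

```python
import heapq
import json
import tempfile


def _spill(chunk, sort_key):
    """Sort one in-memory chunk and write it to a temporary file (a "run")."""
    chunk.sort(key=sort_key)
    run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    run.writelines(chunk)
    run.seek(0)  # rewind so the merge phase can read the run back
    return run


def external_sort(lines, key, chunk_chars=10 * 1024 * 1024):
    """Yield JSON Lines ordered by `key`, keeping roughly one chunk in memory.

    Assumption: values are compared as text; chunk_chars is an approximate
    limit measured in characters, not bytes.
    """
    def sort_key(line):
        return str(json.loads(line)[key])

    runs, chunk, size = [], [], 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        line = line.rstrip("\n") + "\n"
        chunk.append(line)
        size += len(line)
        if size >= chunk_chars:
            # Phase 1: the chunk is full, so sort it and spill it to disk.
            runs.append(_spill(chunk, sort_key))
            chunk, size = [], 0
    if chunk:
        runs.append(_spill(chunk, sort_key))

    try:
        # Phase 2: k-way merge that reads the sorted runs lazily.
        yield from heapq.merge(*runs, key=sort_key)
    finally:
        for run in runs:
            run.close()


if __name__ == "__main__":
    sample = [
        '{"id":10,"name":"Carol"}',
        '{"id":2,"name":"Alice"}',
        '{"id":13,"name":"Bob"}',
    ]
    # A tiny chunk size forces more than one run even for three lines.
    for line in external_sort(sample, "name", chunk_chars=30):
        print(line, end="")
```

A larger chunk means fewer runs to merge at the cost of more memory, which is the same trade-off the `-c` option exposes.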

Installation

brew install winebarrel/jl/jlsort

Usage

Usage: jlsort [OPTIONS] [FILE]

Options:
    -k, --key KEY       JSON key to sort
    -c, --capacity SIZE chunk capacity (default: 10M)
    -n, --numeric-sort  sort fields numerically
    -r, --reverse       sort in reverse order
    -v, --version       print version and exit
    -h, --help          print usage and exit

% cat users.ndjson
{"id":10,"name":"Carol"}
{"id":2,"name":"Alice"}
{"id":13,"name":"Bob"}

% jlsort -k id users.ndjson
{"id":10,"name":"Carol"}
{"id":13,"name":"Bob"}
{"id":2,"name":"Alice"}

% jlsort -k name users.ndjson
{"id":2,"name":"Alice"}
{"id":13,"name":"Bob"}
{"id":10,"name":"Carol"}

% jlsort -k id -n users.ndjson
{"id":2,"name":"Alice"}
{"id":10,"name":"Carol"}
{"id":13,"name":"Bob"}

% jlsort -k name -r users.ndjson
{"id":10,"name":"Carol"}
{"id":13,"name":"Bob"}
{"id":2,"name":"Alice"}
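
The difference between `-k id` and `-k id -n` above comes down to text versus numeric comparison. The following Python sketch reproduces the orderings shown, assuming the default compares the key's value as text; `sort_lines` is a hypothetical helper, not part of jlsort.

```python
import json

LINES = [
    '{"id":10,"name":"Carol"}',
    '{"id":2,"name":"Alice"}',
    '{"id":13,"name":"Bob"}',
]


def sort_lines(lines, key, numeric=False, reverse=False):
    """Sort JSON Lines by `key`, as text by default or as numbers if numeric."""
    def sort_key(line):
        value = json.loads(line)[key]
        # Text comparison puts "10" and "13" before "2"; numeric does not.
        return float(value) if numeric else str(value)

    return sorted(lines, key=sort_key, reverse=reverse)


if __name__ == "__main__":
    for label, result in [
        ("-k id", sort_lines(LINES, "id")),
        ("-k id -n", sort_lines(LINES, "id", numeric=True)),
        ("-k name -r", sort_lines(LINES, "name", reverse=True)),
    ]:
        print(label)
        for line in result:
            print("  " + line)
```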

Benchmark

# salaries.ndjson: from https://github.com/datacharmer/test_db
% head salaries.ndjson
{"emp_no":10001,"salary":60117,"from_date":"1986-06-26","to_date":"1987-06-26"}
{"emp_no":10001,"salary":62102,"from_date":"1987-06-26","to_date":"1988-06-25"}
{"emp_no":10001,"salary":66074,"from_date":"1988-06-25","to_date":"1989-06-25"}
{"emp_no":10001,"salary":66596,"from_date":"1989-06-25","to_date":"1990-06-25"}
{"emp_no":10001,"salary":66961,"from_date":"1990-06-25","to_date":"1991-06-25"}
{"emp_no":10001,"salary":71046,"from_date":"1991-06-25","to_date":"1992-06-24"}
{"emp_no":10001,"salary":74333,"from_date":"1992-06-24","to_date":"1993-06-24"}
{"emp_no":10001,"salary":75286,"from_date":"1993-06-24","to_date":"1994-06-24"}
{"emp_no":10001,"salary":75994,"from_date":"1994-06-24","to_date":"1995-06-24"}
{"emp_no":10001,"salary":76884,"from_date":"1995-06-24","to_date":"1996-06-23"}

% ls -lah salaries.ndjson
-rw-r--r--  1 sugawara  staff   219M  8 15 00:02 salaries.ndjson

% wc salaries.ndjson
 2844047 2844047 229607343 salaries.ndjson

% time -f "Time:%E, Memory:%M KB" jlsort -k to_date salaries.ndjson > /dev/null
Time:0:28.01, Memory:86324 KB
# cf. https://www.gnu.org/software/time/

% cat salaries.ndjson salaries.ndjson > 2xsalaries.ndjson
% cat 2xsalaries.ndjson 2xsalaries.ndjson > 4xsalaries.ndjson
% cat 4xsalaries.ndjson 4xsalaries.ndjson > 8xsalaries.ndjson
% cat 8xsalaries.ndjson 8xsalaries.ndjson > 16xsalaries.ndjson

% ls -lah 16xsalaries.ndjson
-rw-r--r--  1 sugawara  staff   3.4G  8 22 16:07 16xsalaries.ndjson

% time -f "Time:%E, Memory:%M KB" jlsort -k to_date 16xsalaries.ndjson > /dev/null
Time:12:05.33, Memory:114444 KB

% time -f "Time:%E, Memory:%M KB" jlsort -k to_date -c 100m 16xsalaries.ndjson > /dev/null
Time:8:53.62, Memory:239324 KB

% time -f "Time:%E, Memory:%M KB" jlsort -k to_date -c 1g 16xsalaries.ndjson > /dev/null
Time:4:47.05, Memory:2087236 KB

% time -f "Time:%E, Memory:%M KB" jlsort -k to_date -c 4g 16xsalaries.ndjson > /dev/null
Time:2:42.51, Memory:7729932 KB
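
To make the time/memory trade-off easier to compare, this Python sketch derives throughput and speedup from the figures reported above. It assumes the 16x file is exactly 16 times the byte count from `wc` (it was built by repeated `cat`) and that `%M` is GNU time's peak resident set size in KB; the helper names are hypothetical.

```python
BASE_BYTES = 229_607_343       # byte count from `wc salaries.ndjson`
INPUT_BYTES = 16 * BASE_BYTES  # 16xsalaries.ndjson, built by repeated `cat`

# (capacity option, elapsed time as m:ss.ss, peak memory in KB), copied from above
RUNS = [
    ("default (10M)", "12:05.33", 114_444),
    ("100m", "8:53.62", 239_324),
    ("1g", "4:47.05", 2_087_236),
    ("4g", "2:42.51", 7_729_932),
]


def seconds(elapsed):
    """Convert GNU time's m:ss.ss elapsed format to seconds."""
    minutes, secs = elapsed.split(":")
    return int(minutes) * 60 + float(secs)


def summarize(runs=RUNS, input_bytes=INPUT_BYTES):
    """Return throughput, speedup over the first run, and peak memory per run."""
    baseline = seconds(runs[0][1])
    rows = []
    for capacity, elapsed, peak_kb in runs:
        t = seconds(elapsed)
        rows.append({
            "capacity": capacity,
            "seconds": t,
            "mib_per_s": input_bytes / t / 2**20,
            "speedup": baseline / t,
            "peak_mib": peak_kb / 1024,
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f'{row["capacity"]:>14}  {row["seconds"]:7.2f} s  '
            f'{row["mib_per_s"]:5.1f} MiB/s  {row["speedup"]:4.2f}x  '
            f'{row["peak_mib"]:7.0f} MiB peak'
        )
```

On these numbers, `-c 4g` finishes roughly 4.5 times faster than the default capacity while using roughly 67 times the peak memory.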
