ashimapanjwani / tablecloth

This project forked from scicloj/tablecloth

Dataset manipulation library built on top of tech.ml.dataset

Home Page: https://scicloj.github.io/tablecloth/index.html

License: MIT License

Clojure 100.00%

Versions

tech.ml.dataset 5.x (master branch)

tech.ml.dataset 4.x (4.0 branch)

[scicloj/tablecloth "4.04"]

Introduction

tech.ml.dataset is a great and fast library which brings columnar datasets to Clojure. Chris Nuernberger has been working on this library for the last year as part of the bigger tech.ml stack.

I started by testing the library and helping to fix the bugs that surfaced. My main goal was to compare its functionality with the standard tools on other platforms. I focused on R solutions: dplyr, tidyr and data.table.

While converting the examples, I worked out how to reorganize the existing tech.ml.dataset functions into a simple-to-use API. The main goals were:

  • Focus on dataset manipulation functionality, leaving aside other parts of tech.ml such as pipelines, datatypes, readers, ML, etc.
  • A single entry point for common operations: one function dispatching on the given arguments.
  • group-by returns a special kind of dataset: a dataset that holds the subsets created by grouping as a column.
  • Most operations recognize both a regular dataset and a grouped dataset and process the data accordingly.
  • One function form, to enable thread-first (->) on a dataset.
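
The grouped-dataset goals above can be sketched as follows. This is a minimal illustration, not taken from the tablecloth documentation: the dataset values and the names ds and grouped are made up, and api/grouped?, api/column-names, api/ungroup and api/row-count are assumed to be available in the installed tablecloth version, so check them against its API docs.

```clojure
(require '[tablecloth.api :as api]
         '[tech.v3.datatype.functional :as dfn])

;; A tiny in-memory dataset; the values are invented for illustration.
(def ds (api/dataset {:symbol ["AAPL" "AAPL" "MSFT"]
                      :price  [10.0 20.0 30.0]}))

;; Grouping returns a dataset describing the groups; the subsets
;; created by grouping are stored in its :data column.
(def grouped (api/group-by ds :symbol))

(api/grouped? grouped)      ; truthy for a grouped dataset
(api/column-names grouped)  ; includes :data

;; The same call accepts a regular and a grouped dataset:
(api/aggregate ds      #(dfn/mean (% :price)))  ; one overall mean
(api/aggregate grouped #(dfn/mean (% :price)))  ; one mean per group

;; ungroup concatenates the subsets back into a regular dataset.
(api/row-count (api/ungroup grouped))           ; 3 rows, as in ds
```

Because the dataset is always the first argument, each of these calls can also be chained with ->.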

Important! This library is neither a replacement for tech.ml.dataset nor a separate library. It should be considered an addition on top of tech.ml.dataset.

If you want to know more about tech.ml.dataset and dtype-next, please refer to their documentation.

Join the discussion on Zulip

Documentation

Please refer to the detailed documentation with examples.

Usage example

(require '[tablecloth.api :as api]
         '[tech.v3.datatype.datetime :as dtype-dt]
         '[tech.v3.datatype.functional :as dfn])

;; Mean stock price per symbol and year; show the first 10 rows.
(-> "https://raw.githubusercontent.com/techascent/tech.ml.dataset/master/test/data/stocks.csv"
    (api/dataset {:key-fn keyword})
    ;; Group by a map; its keys become columns of the aggregated result.
    (api/group-by (fn [row]
                    {:symbol (:symbol row)
                     :year (dtype-dt/long-temporal-field :years (:date row))}))
    (api/aggregate #(dfn/mean (% :price)))
    (api/order-by [:symbol :year])
    (api/head 10))

_unnamed [10 3]:

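| :symbol | :year |     :summary |
|---------|-------|--------------|
|    AAPL |  2000 |  21.74833333 |
|    AAPL |  2001 |  10.17583333 |
|    AAPL |  2002 |   9.40833333 |
|    AAPL |  2003 |   9.34750000 |
|    AAPL |  2004 |  18.72333333 |
|    AAPL |  2005 |  48.17166667 |
|    AAPL |  2006 |  72.04333333 |
|    AAPL |  2007 | 133.35333333 |
|    AAPL |  2008 | 138.48083333 |
|    AAPL |  2009 | 150.39333333 |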

Contributing

Tablecloth is open to contributions. The best way to start is a discussion on Zulip.

Development tools for documentation

The documentation is written in RMarkdown, which means that you need R to create the html/md/pdf files. It contains around 600 code snippets, which are run during the build. There are two source files:

  • README.Rmd
  • docs/index.Rmd

Set up the following software and render the documents:

  1. Install R
  2. Install rep, an nREPL client
  3. Install pandoc
  4. Start an nREPL server
  5. Run R and install the R packages: install.packages(c("rmarkdown","knitr"), dependencies=T)
  6. Load rmarkdown: library(rmarkdown)
  7. Render the readme: render("README.Rmd","md_document")
  8. Render the documentation: render("docs/index.Rmd","all")

Guideline

  1. Before committing changes, please run the tests. I usually do: lein do clean, check, test and then build the documentation as described above (which also tests the whole library).
  2. Keep the API as simple as possible:
    • the first argument should be a dataset
    • if the parametrization is complex, the last argument should accept a map of optional function arguments
    • avoid variadic associative destructuring for function arguments
    • a function should usually work on a grouped dataset as well; in that case accept a parallel? argument (if applicable).
  3. Follow the potemkin pattern and import functions into the API namespace using the tech.v3.datatype.export-symbols/export-symbols function.
  4. Functions which are composed of API functions to cover specific case(s) should go to the tablecloth.utils namespace.
  5. Always update README.Rmd, CHANGELOG.md and docs/index.Rmd; tests and function docs are highly welcome.
  6. Always discuss changes and PRs first.
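
As an illustration of guideline 2, here is a minimal sketch of a function shaped that way. The helper top-by and its :n option are hypothetical and not part of tablecloth; the sketch also assumes that api/order-by accepts :desc as its comparator argument, which should be checked against the installed version.

```clojure
(require '[tablecloth.api :as api])

;; Hypothetical helper, for illustration only (not part of tablecloth).
(defn top-by
  "Orders `ds` by `column` in descending order and keeps the first `n`
  rows. The dataset comes first; optional arguments go in a trailing map."
  ([ds column] (top-by ds column nil))
  ([ds column {:keys [n] :or {n 5}}]
   (-> ds
       (api/order-by column :desc)
       (api/head n))))

;; The dataset-first form threads cleanly with ->
(def top2
  (-> (api/dataset {:x [3 1 2]})
      (top-by :x {:n 2})))
```

Taking the options as a single trailing map, rather than as variadic key/value pairs, keeps the call easy to compose and lets callers build the options programmatically.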

TODO

  • tests
  • tutorials

Licence

Copyright (c) 2020 Scicloj

The MIT Licence

Contributors

genmeblog, ashimapanjwani, daslu
