
data-documentR

Lifecycle: experimental

An idea for an R package to help you write metadata for .csv files and data.frames.

The core of the package is a function that prompts the user for metadata about data sets including a general description and column level details that depend on the data type (numeric, factor, date, etc.).
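As a rough illustration of "column level details that depend on the data type", the sketch below maps a column to the metadata fields a prompt might ask about. The function name `meta_fields()` and the field names are assumptions made for this example, not the package's actual API.

```r
# Hypothetical helper: which metadata fields to ask about for one column.
# Field names are illustrative assumptions, not the package's real prompts.
meta_fields <- function(x) {
  if (is.factor(x)) {
    c("description", "levels")
  } else if (inherits(x, c("Date", "POSIXt"))) {
    c("description", "format", "timezone")
  } else if (is.numeric(x)) {
    c("description", "units")
  } else {
    "description"
  }
}

# Apply to every column of the built-in `trees` data frame
fields <- lapply(trees, meta_fields)
str(fields)
```

An interactive version could loop over these fields and collect an answer for each with `readline()`.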

This package needs a better name and a convention for function names!

You can play with this development version at your own risk:

devtools::install_github("Aariq/data-documentR")

Try something like this:

write_with_meta(trees, here::here("trees.csv"))

This metadata can then be written as markdown or text alongside the .csv file(s). Here's where I see this project going right now:

Features/roadmap:

  • Nags you every time you read or write a file to document the data (via wrappers to read.csv, read_csv, write.csv, write_csv, etc.?)
  • Allows documentation of R data.frames as you save them (i.e. a write_and_document_csv() type thing that prompts user for metadata and writes .csv AND matching .md)
  • Allows documentation of .csv's or folders of .csv's (i.e. a document_csv() that reads in csv's and prompts the user for metadata then writes matching .md's)
    • Ideally one single METADATA.md per folder, with all .csv's documented. Need the ability to append to this document rather than overwriting it.
  • Memoisation? Don't prompt the user unless the data object or .csv has changed since it was last documented? This might be beyond my abilities and may not be necessary.
  • RStudio plugin that writes a data dictionary for a data.frame in .Rmd (similar to the remedy package)
  • A function that checks the project code for any files read in or written out and makes sure you've documented everything?
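The memoisation idea in the roadmap may not need much machinery. One possible approach, sketched here under the assumption that a checksum is stored in a sidecar file next to each .csv, uses `tools::md5sum()` from base R. The function names and the `.md5` sidecar convention are hypothetical.

```r
# Hypothetical sketch: has this .csv changed since it was last documented?
# `hash_file` is an assumed sidecar file holding the last documented checksum.
csv_changed <- function(csv, hash_file = paste0(csv, ".md5")) {
  current <- unname(tools::md5sum(csv))
  if (!file.exists(hash_file)) {
    return(TRUE)  # never documented, so prompt the user
  }
  previous <- readLines(hash_file, n = 1, warn = FALSE)
  !identical(current, previous)
}

# Record the checksum once the user has finished documenting the file
mark_documented <- function(csv, hash_file = paste0(csv, ".md5")) {
  writeLines(unname(tools::md5sum(csv)), hash_file)
  invisible(hash_file)
}
```

A wrapper around `write.csv()` could call `csv_changed()` first and only prompt for metadata when it returns `TRUE`, then call `mark_documented()` afterwards.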

Example output markdown

File: dataset1.csv

Description:

Plant growth data collected between June 2011 and July 2012 at the Boston Area Climate Experiment in Waltham, MA.

Columns:

  • species <fct>: The plant species used.
    Levels:
    • AM: Achillea millefolium
    • PL: Plantago lanceolata
  • height <dbl>: Plant height from ground to longest leaf
    • Units: cm
  • flnum <int>: Number of inflorescences
  • date <Date>: Date of measurement
    • Format: ISO (yyyy-mm-dd)
    • Timezone: EDT
  • plot <chr>: A plot ID to be used as a blocking factor

File: dataset2.csv

...
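Output in the style of the example above could be assembled from a data frame plus user-supplied descriptions. The following is a minimal sketch, not the package's implementation: `type_abbr()` and `render_meta_md()` are hypothetical names, and the `descriptions` argument stands in for the answers a user would type at the prompts.

```r
# Hypothetical helper: short type label for a column, as in the example above
type_abbr <- function(x) {
  if (is.factor(x)) {
    "fct"
  } else if (inherits(x, "Date")) {
    "Date"
  } else if (is.integer(x)) {
    "int"
  } else if (is.double(x)) {
    "dbl"
  } else if (is.character(x)) {
    "chr"
  } else {
    class(x)[1]
  }
}

# Hypothetical sketch: build the markdown lines for one data frame.
# `descriptions` is a named character vector, one entry per column.
render_meta_md <- function(df, file, description, descriptions) {
  cols <- vapply(names(df), function(nm) {
    sprintf("  - %s <%s>: %s", nm, type_abbr(df[[nm]]), descriptions[[nm]])
  }, character(1))
  c(paste("File:", file), "",
    "Description:", "", description, "",
    "Columns:", "", unname(cols))
}

md <- render_meta_md(
  trees, "trees.csv",
  description = "Measurements of felled black cherry trees.",
  descriptions = c(Girth = "Tree diameter",
                   Height = "Tree height",
                   Volume = "Timber volume")
)
writeLines(md)
```

To add a file's entry to a shared METADATA.md without overwriting earlier entries, the lines could be written with `cat(md, file = "METADATA.md", sep = "\n", append = TRUE)`.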

