isabella232 / meltr

This project forked from r-lib/meltr

Read Non-Rectangular Text Data

Home Page: https://r-lib.github.io/meltr/

License: Other

R 31.01% C++ 68.99%

meltr

The goal of ‘meltr’ is to provide a fast and friendly way to read non-rectangular data (like ragged forms of ‘csv’, ‘tsv’, and ‘fwf’).

Standard tools like readr::read_csv() can cope to some extent with unusual inputs, like files with empty rows or newlines embedded in strings. But some files are so wacky that standard tools don’t work at all. Instead, you have to take the file to pieces and reassemble the pieces into structured data you can work with.

The meltr package provides tools to do this.

Installation

You can install the released version of meltr from CRAN with:

install.packages("meltr")

Or you can install the development version with:

# install.packages("devtools")
devtools::install_github("r-lib/meltr")

The problem with non-rectangular data

Here’s a contrived example that breaks two assumptions made by common tools like readr::read_csv().

  1. There are more cells in some rows than others.
  2. There are mixed data types within each column.
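Assumption 1 is easy to check before reaching for meltr. As a small base-R sketch (not part of meltr), utils::count.fields() counts the comma-separated fields on each of the three lines used in the example below. Setting quote = "" is needed here because the apostrophe in I'm would otherwise be read as the start of a quoted field.

```r
# Count the fields on each line of the example data.
# quote = "" stops the apostrophe in "I'm" from being treated as a quote.
messy_lines <- c("Help,,007,I'm",
                 "1960-09-30,FALSE,trapped in,7,1.21",
                 "non-rectangular,data,NA")
con <- textConnection(messy_lines)
n_fields <- count.fields(con, sep = ",", quote = "")
close(con)
n_fields
#> [1] 4 5 3
```

The differing counts are exactly the ragged rows that trip up tools expecting a rectangle.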

In contrast to those tools, melt_csv() reads the file one cell at a time, turning each cell of the file into one row of the resulting data frame.

writeLines("Help,,007,I'm
1960-09-30,FALSE,trapped in,7,1.21
non-rectangular,data,NA", "messy.csv")

library(meltr)

melt_csv("messy.csv")
#> # A tibble: 12 × 4
#>      row   col data_type value          
#>    <dbl> <dbl> <chr>     <chr>          
#>  1     1     1 character Help           
#>  2     1     2 missing   <NA>           
#>  3     1     3 character 007            
#>  4     1     4 character I'm            
#>  5     2     1 date      1960-09-30     
#>  6     2     2 logical   FALSE          
#>  7     2     3 character trapped in     
#>  8     2     4 integer   7              
#>  9     2     5 double    1.21           
#> 10     3     1 character non-rectangular
#> 11     3     2 character data           
#> 12     3     3 missing   <NA>
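The cell-at-a-time idea behind that output can be sketched in a few lines of base R. This is not how meltr works internally (meltr is written largely in C++, handles quoting, and guesses data types); the hypothetical naive_melt() below simply splits each line on commas, and assumes there are no quoted fields, embedded commas, or embedded newlines.

```r
# Hypothetical helper, for illustration only: one output row per input cell.
# Assumes no quoting; strsplit() also drops a trailing empty field ("a,b," -> 2 cells).
naive_melt <- function(text_lines) {
  cells <- strsplit(text_lines, ",", fixed = TRUE)
  n <- lengths(cells)                       # number of cells on each line
  value <- unlist(cells, use.names = FALSE)
  # Treat empty strings and the literal "NA" as missing, mirroring the output above
  value[value %in% c("", "NA")] <- NA_character_
  data.frame(
    row = rep(seq_along(cells), times = n),
    col = unlist(lapply(n, seq_len), use.names = FALSE),
    value = value,
    stringsAsFactors = FALSE
  )
}

messy_lines <- c("Help,,007,I'm",
                 "1960-09-30,FALSE,trapped in,7,1.21",
                 "non-rectangular,data,NA")
melted <- naive_melt(messy_lines)
melted
```

Like melt_csv(), this yields 12 rows, one per cell, but without the data_type guesses.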

The output of melt_csv() gives us:

  • A data frame of results: structured data about unstructured data!
  • One row of output for each cell of the input data.
  • Empty cells, such as the one in row 1, column 2, but not the cells missing from the ends of rows 1 and 3.
  • The raw, unconverted data. No type conversion is attempted: every value is imported as a string, and the data_type column merely gives meltr’s best guess of what each value’s type ought to be.
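Because conversion is left to you, the data_type guess works as a guide for which cells to convert. The base-R sketch below hand-copies the melt_csv("messy.csv") output shown above into a data frame called cells (an illustrative name), so it runs without meltr; with meltr installed, the data frame returned by melt_csv() can be used the same way.

```r
# Hand-copied from the melt_csv("messy.csv") output above, for illustration.
cells <- data.frame(
  row = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  col = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3),
  data_type = c("character", "missing", "character", "character",
                "date", "logical", "character", "integer", "double",
                "character", "character", "missing"),
  value = c("Help", NA, "007", "I'm", "1960-09-30", "FALSE", "trapped in",
            "7", "1.21", "non-rectangular", "data", NA),
  stringsAsFactors = FALSE
)

# Every value is still a string; data_type is only a guess.
class(cells$value)
#> [1] "character"

# Convert each kind of cell yourself, using the guess to pick the cells.
numbers <- as.numeric(cells$value[cells$data_type %in% c("integer", "double")])
dates <- as.Date(cells$value[cells$data_type == "date"])
flags <- as.logical(cells$value[cells$data_type == "logical"])
numbers
#> [1] 7.00 1.21
dates
#> [1] "1960-09-30"
flags
#> [1] FALSE
```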

What are some ways you can use this? To begin with, you can do some simple manipulations with ordinary functions.

For example, you could extract the cells that meltr guessed are text.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data <- melt_csv("messy.csv")

data %>%
  filter(data_type == "character")
#> # A tibble: 6 × 4
#>     row   col data_type value          
#>   <dbl> <dbl> <chr>     <chr>          
#> 1     1     1 character Help           
#> 2     1     3 character 007            
#> 3     1     4 character I'm            
#> 4     2     3 character trapped in     
#> 5     3     1 character non-rectangular
#> 6     3     2 character data

Or find any missing entries.

data %>%
  filter(data_type == "missing")
#> # A tibble: 2 × 4
#>     row   col data_type value
#>   <dbl> <dbl> <chr>     <chr>
#> 1     1     2 missing   <NA> 
#> 2     3     3 missing   <NA>
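Once you have picked the pieces apart, you can also put them back together. As a base-R sketch (not a meltr function), the cells can be dropped into a character matrix padded with NA, using each cell’s row and col as a matrix index. The data frame below is again hand-copied from the melt_csv("messy.csv") output above, so the example runs without meltr.

```r
# Hand-copied from the melt_csv("messy.csv") output above, for illustration.
cells <- data.frame(
  row = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  col = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3),
  value = c("Help", NA, "007", "I'm", "1960-09-30", "FALSE", "trapped in",
            "7", "1.21", "non-rectangular", "data", NA),
  stringsAsFactors = FALSE
)

# Start with an all-NA grid big enough for the longest row,
# then place each value at its (row, col) position.
grid <- matrix(NA_character_, nrow = max(cells$row), ncol = max(cells$col))
grid[cbind(cells$row, cells$col)] <- cells$value
grid
```

Note that the grid loses information the long format keeps: the empty cell at row 1, column 2 and the absent cell at the end of row 1 both become NA.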

Contributors

jimhester, nacnudus
