Introduction

multiline is an R package for reading data from multiline fixed-width-formatted (FWF) files. This format is like that of typical FWF files, except that data for a given observation wraps after some number of columns to span a fixed number of rows.

Digitized punch card data are often found in multiline FWF format. If the data for each observation exceeded the horizontal space on a card (conventionally 80 columns), additional decks of cards were used. When digitized, their rows were often interleaved so that the data for each observation would appear in consecutive rows, one for each card.

Installation

Install from GitHub with devtools:

if (!require(devtools, quietly = TRUE)) install.packages("devtools")
devtools::install_github("jamesdunham/multiline")

Background

Consider the following multiline FWF (MFWF) data. As with FWF data, parsing requires the column positions of each field (i.e., variable). In addition, it requires the line position of each field.

123456789
789      
987654321
987      

Parsing requires:

  • The column positions of each field, as with FWF data;
  • The number of lines per observation; and
  • The line position of each field.

Suppose there are 2 lines per observation in the data; field1 occupies columns 1-4 of line 1; field2 columns 5-9 of line 1; and field3 columns 1-3 of line 2.

123456789  [line 1, obs. 1]
789        [line 2, obs. 1]
987654321  [line 1, obs. 2]
987        [line 2, obs. 2]

multiline reads data like this into a tidy table:

obs  field1  field2  field3
  1    1234   56789     789
  2    9876   54321     987
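To make the mapping from lines and columns to fields concrete, here is a minimal base-R sketch that parses the example by hand. It is not how multiline is implemented; the `spec` table and the names `raw`, `lines_per_obs`, and `parsed` are assumptions made up for this illustration.

```r
# Raw MFWF lines from the example: 2 lines per observation
raw <- c("123456789", "789      ", "987654321", "987      ")
lines_per_obs <- 2

# Hypothetical field specification: line within the observation,
# plus one-based start and end columns
spec <- data.frame(
  name  = c("field1", "field2", "field3"),
  line  = c(1, 1, 2),
  start = c(1, 5, 1),
  end   = c(4, 9, 3),
  stringsAsFactors = FALSE
)

n_obs <- length(raw) / lines_per_obs

# For each field, select every line at that field's line position,
# then cut out its columns and convert to integer
fields <- lapply(seq_len(nrow(spec)), function(i) {
  idx <- seq(spec$line[i], by = lines_per_obs, length.out = n_obs)
  as.integer(substr(raw[idx], spec$start[i], spec$end[i]))
})
names(fields) <- spec$name

parsed <- data.frame(obs = seq_len(n_obs), fields)
parsed
```

Each row of `parsed` corresponds to one observation, with the three fields drawn from the two lines that belong to it.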

Usage

Specify the column and line positions of each field in a table or list of tables. multiline imports the fwf_ functions from readr to help with this task.

As a list:

library(multiline)

positions <- list(
  fwf_positions(start = c(1, 5), end = c(4, 9), col_names = c('field1', 'field2')),
  fwf_positions(start = 1, end = 3, col_names = 'field3'))
positions
#> [[1]]
#> # A tibble: 2 x 3
#>   begin   end col_names
#>   <dbl> <dbl>     <chr>
#> 1     0     4    field1
#> 2     4     9    field2
#> 
#> [[2]]
#> # A tibble: 1 x 3
#>   begin   end col_names
#>   <dbl> <dbl>     <chr>
#> 1     0     3    field3

The line position of each field is implicit in the list order. Here, field1 and field2 are in line 1 and field3 is in line 2.
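The following base-R sketch makes that implicit line position explicit by tagging each table with its index in the list. The data frames are stand-ins that mirror the printed output above, and `positions_demo`, `tagged`, and `flat` are hypothetical names; the result is only an illustration and is not claimed to be an input format that multiline accepts.

```r
# Stand-ins for the tables printed above (begin is zero-based in that output)
positions_demo <- list(
  data.frame(begin = c(0, 4), end = c(4, 9),
             col_names = c("field1", "field2")),
  data.frame(begin = 0, end = 3, col_names = "field3")
)

# Tag each table with its position in the list, then stack the tables
tagged <- Map(
  function(tbl, i) cbind(line = i, tbl),
  positions_demo,
  seq_along(positions_demo)
)
flat <- do.call(rbind, tagged)
flat
```

In `flat`, field1 and field2 carry line 1 and field3 carries line 2, which is the same information the list order encodes.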

Given the data:

d <- "123456789\n789\n987654321\n9871"
d
#> [1] "123456789\n789\n987654321\n9871"

read_multiline() returns a tidy table with observations in rows and fields in columns. Note that read_multiline() requires that the number of items in the list of positions exactly match the number of lines per observation in the MFWF data (here, lines = 2).

tidy <- read_multiline(d, lines = 2, positions)
tidy
#> # A tibble: 2 x 3
#>   field1 field2 field3
#>    <int>  <int>  <int>
#> 1   1234  56789    789
#> 2   9876  54321    987
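Before reading a file, it can help to confirm that the inputs are consistent. The base-R sketch below checks the requirement described above and also that the data divide evenly into observations. The second check is a suggestion of ours, not documented behavior of read_multiline(), and the names `lines_per_obs`, `n_position_tables`, and `raw` are assumptions for the example.

```r
d <- "123456789\n789\n987654321\n9871"
lines_per_obs <- 2
n_position_tables <- 2  # stands in for length(positions) in the example above

# Split the input into its physical lines
raw <- strsplit(d, "\n", fixed = TRUE)[[1]]

stopifnot(
  # one table of positions per line of an observation
  n_position_tables == lines_per_obs,
  # the total number of lines is a whole number of observations
  length(raw) %% lines_per_obs == 0
)

n_obs <- length(raw) / lines_per_obs
n_obs
```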
