
gd015pa's Introduction

Explanation of transformations in run_analysis.

Unzip the UCI HAR Dataset into your working directory. This creates a ./UCI HAR Dataset directory in the working directory, which should contain two subdirectories, /train and /test.

The description of run_analysis starts here.

Step 0: initialize and load data

  • load the dplyr package
  • load the activity labels, using the column names 'act_id' and 'activity'
  • load features.txt
  • load the x data: use the names in features as column names while loading the x-train and x-test data
  • load the subjects: subj-train and subj-test, using 'subject' as column name
  • load the y data: y-train and y-test, using 'act_id' as column name
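The loading step above could be sketched roughly as follows. This is a minimal illustration, not the actual script: to stay runnable without the dataset it writes tiny stand-in files to a temporary directory, and the object names (act_labels, features, x_train, subj_train, y_train) and the 'feat_id'/'feature' column names are assumptions. Only the train files are shown; the test files would be loaded the same way.

```r
library(dplyr)  # not used in this step yet; loaded here as in step 0

# Hypothetical miniature stand-ins for the real files, so the sketch runs anywhere
tmp <- tempfile("har_")
dir.create(tmp)
writeLines(c("1 WALKING", "2 SITTING"), file.path(tmp, "activity_labels.txt"))
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X"), file.path(tmp, "features.txt"))
writeLines(c("0.28 -0.99", "0.27 -0.98"), file.path(tmp, "X_train.txt"))
writeLines(c("1", "1"), file.path(tmp, "subject_train.txt"))
writeLines(c("1", "2"), file.path(tmp, "y_train.txt"))

# Activity labels with the column names named in the text
act_labels <- read.table(file.path(tmp, "activity_labels.txt"),
                         col.names = c("act_id", "activity"))

# Feature names (column names here are assumed for illustration)
features <- read.table(file.path(tmp, "features.txt"),
                       col.names = c("feat_id", "feature"),
                       stringsAsFactors = FALSE)

# read.table makes the names syntactically valid, so "-", "(" and ")"
# each become ".", e.g. "tBodyAcc-mean()-X" turns into "tBodyAcc.mean...X"
x_train    <- read.table(file.path(tmp, "X_train.txt"), col.names = features$feature)
subj_train <- read.table(file.path(tmp, "subject_train.txt"), col.names = "subject")
y_train    <- read.table(file.path(tmp, "y_train.txt"), col.names = "act_id")
```

The name conversion done by read.table is what produces the dotted column names that the later steps select on.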

Step 1: merge train and test data

1a

  • join x-train and x-test into x-complete using rbind (each column is unique)
  • join subj-train and subj-test into subj-complete using rbind
  • join y-train and y-test into y-complete using rbind
  • remove obsolete data

1b

  • combine x-complete, subj-complete and y-complete using cbind (each row is unique)
  • remove obsolete data
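As an illustration of this step, the sketch below stacks toy train and test frames with rbind and then binds the three results side by side with cbind. The data values and the exact object names are made up for the example; the column order in total is also an assumption.

```r
# Toy stand-ins for the loaded data (values are invented)
x_train    <- data.frame(tBodyAcc.mean...X = c(0.28, 0.27))
x_test     <- data.frame(tBodyAcc.mean...X = 0.26)
subj_train <- data.frame(subject = c(1L, 3L))
subj_test  <- data.frame(subject = 2L)
y_train    <- data.frame(act_id = c(1L, 2L))
y_test     <- data.frame(act_id = 1L)

# 1a: stack train on top of test; both parts share the same columns
x_complete    <- rbind(x_train, x_test)
subj_complete <- rbind(subj_train, subj_test)
y_complete    <- rbind(y_train, y_test)
rm(x_train, x_test, subj_train, subj_test, y_train, y_test)

# 1b: put the three pieces side by side; row i of each piece
# describes the same observation, so no key is needed
total <- cbind(subj_complete, y_complete, x_complete)
rm(x_complete, subj_complete, y_complete)
```

Because cbind matches purely by row position, it has to happen before any operation that reorders rows.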

Step 2: select mean() and std()

Only the 66 variables with mean... or std... in their names are selected, plus subject and act_id (see Codebook.md for an explanation of why only these variables were selected). total is created using the select statement from the dplyr package. The selected columns are subject, the activity id (act_id) and every column whose name contains mean... or std... (contains is used twice in this statement).
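A selection of this kind might look like the sketch below. The exact strings passed to contains() are not given in the text, so the patterns ".mean.." and ".std.." are assumptions, chosen so that names such as meanFreq are left out. The toy columns are invented for the example.

```r
library(dplyr)

# Toy frame with two wanted and two unwanted measurement columns
total <- data.frame(
  subject               = 1:2,
  act_id                = c(1L, 2L),
  tBodyAcc.mean...X     = c(0.28, 0.27),
  tBodyAcc.std...X      = c(-0.99, -0.98),
  fBodyAcc.meanFreq...X = c(0.10, 0.20),
  tBodyAcc.max...X      = c(0.50, 0.60)
)

# contains() matches literal substrings, so the dots are not regex wildcards;
# the two patterns below are assumed, not taken from the original script
total <- select(total, subject, act_id,
                contains(".mean.."), contains(".std.."))

names(total)
```

Note that contains() ignores case by default, which matters for names that include "Mean" with a capital letter.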

Step 3: descriptive activity names

Descriptive activity names are added by merging total with act-label, joining on 'act_id'. The now obsolete act_id column is then removed.
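The join described here can be sketched with base merge followed by a dplyr select, using invented toy data and assumed object names:

```r
library(dplyr)

act_labels <- data.frame(act_id = 1:2,
                         activity = c("WALKING", "SITTING"))
total <- data.frame(subject = c(1L, 1L, 2L),
                    act_id  = c(2L, 1L, 1L),
                    tBodyAcc.mean...X = c(0.10, 0.28, 0.27))

# Look up the activity name for every row via the shared 'act_id' key
total <- merge(total, act_labels, by = "act_id")

# The numeric id is redundant once the name is present
total <- select(total, -act_id)
```

merge sorts its result by the key column by default, so the row order changes here. That is harmless at this point because subject, activity id and measurements were already bound into one frame in step 1.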

Step 4: descriptive labels

Most of the labels are created while importing the data, using the column names from features.txt (see step 0). In the provided labels, the prefix fBodyBody (double "Body"), which I found confusing, is replaced by a single fBody in all existing occurrences, using find and replace with gsub.
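A small sketch of the rename, applied to a hypothetical vector of column names (in the script it would presumably be applied to names(total)):

```r
# Example column names; the first one carries the doubled "Body"
cols <- c("fBodyBodyAccJerkMag.mean..", "fBodyAcc.mean...X")

# fixed = TRUE treats the pattern as a literal string rather than a regex
cols <- gsub("fBodyBody", "fBody", cols, fixed = TRUE)

cols
```

Names without the doubled prefix pass through unchanged.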

Step 5: create tidy dataset

I chose to create a wide dataset with subject and activity name as unique identifiers. For each column, the average (mean) is calculated per subject/activity combination. Using a chain statement, the following actions are completed:

  • group by subject and activity
  • calculate the mean per column for each subject/activity combination; this is done using the summarise_each function
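The chain above could be sketched as follows. summarise_each is deprecated in current dplyr, so this sketch swaps in summarise(across(...)), which should give the same per-group means; it assumes dplyr 1.0 or later. The toy data and the name tidyMeans_wide are invented for the example.

```r
library(dplyr)

total <- data.frame(
  subject  = c(1L, 1L, 1L, 2L),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  tBodyAcc.mean...X = c(0.2, 0.4, 0.1, 0.3),
  tBodyAcc.std...X  = c(-0.9, -0.7, -0.8, -0.6)
)

tidyMeans_wide <- total %>%
  group_by(subject, activity) %>%
  # everything() covers all non-grouping columns
  summarise(across(everything(), mean), .groups = "drop")
```

The result has one row per subject/activity combination and one column per measurement, which is the wide layout described above.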

Export tidyMeans-wide to tidyMeans-wide.txt using write.table.
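A minimal sketch of the export, with a read-back check. It writes to a temporary directory rather than the working directory; the object name tidyMeans_wide, the toy data and the row.names = FALSE argument are assumptions, since the text does not say which arguments were used.

```r
tidyMeans_wide <- data.frame(
  subject  = 1:2,
  activity = c("WALKING", "SITTING"),
  tBodyAcc.mean...X = c(0.3, 0.1)
)

out_file <- file.path(tempdir(), "tidyMeans-wide.txt")

# row.names = FALSE (assumed) keeps the row index out of the file
write.table(tidyMeans_wide, out_file, row.names = FALSE)

# Reading the file back with header = TRUE should restore the same columns
check <- read.table(out_file, header = TRUE)
```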
