cyang-2014 / gettingandcleaningdata

This project forked from chinsengn/gettingandcleaningdata

Coursera R Project for "Getting And Cleaning Data" module

ReadMe for run_analysis.R

This .R script consists of 5 parts.

Part 1

  • load the dplyr package
  • make a lookup data table (ac_llk) with 2 columns - activity (numbers 1 to 6) and activity_nm (descriptive names) - by reading the file activity_labels.txt
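
As a rough sketch of Part 1, the lookup table can be built with read.table. To keep the example runnable without the data set, the file contents are supplied inline through the text argument; the six label names below are an illustrative stand-in for activity_labels.txt, not copied from the script.

```r
library(dplyr)  # loaded here because Part 1 of the script loads it

# Illustrative stand-in for activity_labels.txt: one "<number> <name>" pair per line.
labels_txt <- "1 WALKING
2 WALKING_UPSTAIRS
3 WALKING_DOWNSTAIRS
4 SITTING
5 STANDING
6 LAYING"

# With the real file this would be read.table("activity_labels.txt", ...).
ac_llk <- read.table(text = labels_txt,
                     col.names = c("activity", "activity_nm"),
                     stringsAsFactors = FALSE)

print(ac_llk)
```

Naming the first column activity is what lets the later join match it against the activity column of the train and test tables.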

Part 2

Processing the files from the Train folder,

  • read y_train.txt into a - this contains the list of activities performed by each volunteer (identified by subject_id)
  • read subject_train.txt into s - this contains the list of subject ids associated with the activities performed
  • use cbind to merge them into 1 data table (train) with 2 columns - activity and subject_id
  • use mutate to add a 3rd column called data_typ that identifies this as the "Training" set
  • use left_join to add the activity_nm column by matching the activity columns in train and ac_llk. Note that left_join does not re-sort the rows in the data table, so they stay aligned with the measurement rows read later
  • read the descriptive names for the measurement features from features.txt into a data table (features)
  • using grep, I get the positions and full names of the mean() and std() features and assign them to the vectors meanStdColumnPos and meanStdColumnNames
  • read the variable values from the file X_train.txt into a data table (v). The columns in v are not named, but their names are listed in features
  • select the mean and std columns from v using the column positions in meanStdColumnPos. This gives the data table v_meanstd_only, whose columns I then rename with the extracted names in meanStdColumnNames
  • merge the train table with v_meanstd_only
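
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the script itself: the data frames stand in for the real files, the values are invented, and the grep pattern "mean|std" is an assumption (the README does not give the exact pattern). Base indexing is used for the column selection, and cbind for the final merge, as one plausible reading of those steps.

```r
library(dplyr)

# Toy stand-ins for the real files (values invented for illustration).
ac_llk   <- data.frame(activity = 1:3,
                       activity_nm = c("WALKING", "SITTING", "LAYING"),
                       stringsAsFactors = FALSE)
a        <- data.frame(activity = c(3L, 1L, 2L, 1L))    # stands in for y_train.txt
s        <- data.frame(subject_id = c(1L, 1L, 2L, 2L))  # stands in for subject_train.txt
features <- data.frame(pos = 1:4,
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                "tBodyAcc-mad()-X", "tBodyAcc-max()-X"),
                       stringsAsFactors = FALSE)        # stands in for features.txt
# Stands in for X_train.txt; the columns get default names V1..V4.
v        <- as.data.frame(matrix(1:16 / 10, nrow = 4))

# Labels: bind activity and subject columns, tag the set, add descriptive names.
# left_join keeps the row order of its left-hand table.
train <- cbind(a, s) %>%
  mutate(data_typ = "Training") %>%
  left_join(ac_llk, by = "activity")

# Features: locate the mean/std columns by name (the pattern is an assumption).
meanStdColumnPos   <- grep("mean|std", features$name)
meanStdColumnNames <- features$name[meanStdColumnPos]

# Keep only those columns of v and give them their descriptive names.
v_meanstd_only <- v[, meanStdColumnPos, drop = FALSE]
names(v_meanstd_only) <- meanStdColumnNames

# Column-bind labels and measurements; rows are assumed to be in the same order.
train <- cbind(train, v_meanstd_only)
print(train)
```

In this toy case the result has 4 label columns plus 2 matched feature columns.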

The outcome of Part 2 is a data table - train (7352 x 83)

Part 3

Part 3 works like Part 2, except it processes corresponding files from the Test folder instead.

The outcome of Part 3 is a data table - test (2947 x 83)

Part 4

Taking the train and test data tables,

  • use rbind_list to merge their rows into one data table - all_data (10299 x 83)

Note that the columns in all_data are now: activity, subject_id, data_typ, activity_nm and all the measurement fields related to mean() or std().
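
A small sketch of the row merge in Part 4. rbind_list has since been deprecated in dplyr in favour of bind_rows, so the example uses bind_rows instead; the toy tables and the "Test" label are assumptions for illustration.

```r
library(dplyr)

# Toy stand-ins for the train and test tables (only the label columns shown).
train <- data.frame(activity = c(1L, 2L), subject_id = c(1L, 3L),
                    data_typ = "Training", stringsAsFactors = FALSE)
test  <- data.frame(activity = 2L, subject_id = 2L,
                    data_typ = "Test", stringsAsFactors = FALSE)  # label assumed

# bind_rows() matches columns by name and stacks train first, then test.
all_data <- bind_rows(train, test)

print(dim(all_data))
```

The row count of the result is the sum of the two inputs, which matches the figures above: 7352 + 2947 = 10299.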

Part 5

With all_data,

  • since we are not using them, drop the data_typ and activity columns by passing -data_typ, -activity to select,
  • then group_by dataset by activity_nm and subject_id,
  • then summarise_each of the measurement variables by computing their mean
  • write the summarised data set into external txt file - summ_data.txt in the same path as this .R script

Note that summarise_each leaves the column names of the measurement variables unchanged, although their values are now the per-group means.

The output of Part 5 is a text file - summ_data.txt (180 x 81)
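
Part 5 can be sketched on toy data as below. summarise_each is deprecated in current dplyr, so this example uses summarise with across, which assumes dplyr 1.0 or later. The data values are invented, and the file is written to a temporary directory here rather than next to the script.

```r
library(dplyr)

# Toy stand-in for all_data with two measurement columns.
all_data <- data.frame(
  activity    = c(1L, 1L, 2L, 2L),
  subject_id  = c(1L, 1L, 1L, 2L),
  data_typ    = c("Training", "Training", "Training", "Test"),
  activity_nm = c("WALKING", "WALKING", "SITTING", "SITTING"),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.1, 0.3),
  `tBodyAcc-std()-X`  = c(-0.9, -0.7, -0.5, -0.3),
  check.names = FALSE, stringsAsFactors = FALSE
)

summ_data <- all_data %>%
  select(-data_typ, -activity) %>%            # drop the unused columns
  group_by(activity_nm, subject_id) %>%       # one group per activity/subject pair
  summarise(across(everything(), mean),       # mean of every measurement column
            .groups = "drop")

# The script writes summ_data.txt beside the .R file; a temp dir is used here.
out_file <- file.path(tempdir(), "summ_data.txt")
write.table(summ_data, out_file, row.names = FALSE)

print(summ_data)
```

The result has one row per (activity_nm, subject_id) pair and two fewer columns than all_data, which is consistent with the stated 83 - 2 = 81 columns.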
