ReadMe for run_analysis.R

This .R script consist of 5 parts.

Part 1

load the dplyr package
make a lookup data table (ac_llk) wtih 2 columns - activity (number 1 - 6) and activity_nm (descriptive names) by reading the file activity_labels.txt

Part 2

Processing the files from the Train folder,

read y_train.txt into a - this contains the list of activities performed by each volunteer (identified by subject_id)
read subject_train.txt into s - this contains the list of subject ids associated with the activities performed
use cbind to merge them into 1 data table (train) with 2 columns - activity and subject_id
use mutate to add a 3rd column called data_typ that identifies this as the "Training" set
use left join to add activity_nm column by matching activity columns in train and ac_llk. Note that left join does not re-sort the rows in the data table
read the descriptive names for the measurement features from 'features.txt' into data table (features).
using grep, I get the positions and full names of the Mean() and Std() features, assign them to vectors meanStdColumnPos and meanStdColumnNames
read in the variables values from file X_train.txt into data table (v). The columns in v are not named but their names are listed in features.
select the mean and std columns from v using the col positions from meanStdColumnPos, this is data table v_meanstd_only which I then rename its columns with extracted names in meanStdColumnsNames
merge the train table with v_meanstd_only

The outcome of Part 2 is a data table - train (7352 x 83)

Part 3

Part 3 works like Part 2, except it processes corresponding files from the Test folder instead.

The outcome of Part 3 is a data table - test (2947 x 83)

Part 4

Taking the train and test data tables,

use rbind_list to merge their rows into one data table - all_data (10299 x 83)

Note that the columns in all_data are now: activity, subject_id, data_typ, activity_nm and all the measurment fields related to mean() or std().

Part 5

With all_data,

since we are not using them, throw out the data_typ, activity col by using -dat_typ, -activity in the select,
then group_by dataset by activity_nm and subject_id,
then summarise_each of the measurement variables by computing their mean
write the summarised data set into external txt file - summ_data.txt in the same path as this .R script

Note that col names of the measurement variables were not changed after the summarise_each function but their values are.

The output of Part 5 is a text file - summ_data.txt (180 x 81)

chinsengn / gettingandcleaningdata Goto Github PK

gettingandcleaningdata's Introduction

ReadMe for run_analysis.R

Part 1

Part 2

Part 3

Part 4

Part 5

gettingandcleaningdata's People

Contributors

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs