Project for "Getting and Cleaning Data" Coursera course.
The R script run_analysis.R combines the "test" and "train" data sets and creates a tidy data set summarizing the values they contain.
-
First, the script loads the "features.txt" file that contains the labels for the 500+ measurements contained in each line of the data files.
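
A minimal sketch of this loading step, not the actual contents of run_analysis.R. It assumes "features.txt" holds one index and one label per line; the three labels written to the temporary file are toy stand-ins, and the column names `index` and `name` are chosen here for illustration.

```r
# Toy stand-in for features.txt: an index and a label on each line.
features_file <- tempfile(fileext = ".txt")
writeLines(c("1 tBodyAcc-mean()-X",
             "2 tBodyAcc-std()-X",
             "3 tBodyAcc-mad()-X"), features_file)

# Read the labels as plain character strings, not factors.
features <- read.table(features_file,
                       col.names = c("index", "name"),
                       stringsAsFactors = FALSE)
feature_names <- features$name
```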
-
Next, the script reads the test and train raw data files, and selects only those columns corresponding to mean or standard deviation measurements.
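
One way the column selection could look, shown on a toy table. The regular expression is an assumption: it keeps labels containing "mean()" or "std()", and the real script may match a different pattern.

```r
# Toy labels and a two-row stand-in for one raw data file.
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")
raw <- data.frame(matrix(c(0.1, 0.2, 0.3,
                           0.4, 0.5, 0.6), nrow = 2, byrow = TRUE))
names(raw) <- feature_names

# Positions of the labels that mention mean() or std() (assumed pattern).
keep <- grep("mean\\(\\)|std\\(\\)", feature_names)

# Keep only those columns; drop = FALSE preserves the data frame shape.
selected <- raw[, keep, drop = FALSE]
```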
-
Then, the script reads the activity type numbers (each taking a value from 1 to 6) for each row of the tables constructed in the previous step. These numbers are then replaced by the descriptive labels found in "activity_labels.txt".
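
The number-to-label replacement can be sketched with a factor. The lookup table below is built by hand as an illustrative stand-in for the contents of "activity_labels.txt"; the variable names are hypothetical.

```r
# Illustrative stand-in for activity_labels.txt: one id and one label per row.
activity_labels <- data.frame(
  id    = 1:6,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
            "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy activity numbers, one per row of the measurement table.
activity_ids <- c(5, 1, 6, 1)

# Map each number to its descriptive label.
activity <- factor(activity_ids,
                   levels = activity_labels$id,
                   labels = activity_labels$label)
```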
-
In the same way as the activity numbers, the subject ID numbers are read and merged into a single vector.
-
The script then merges all of the data together: the test and train data are combined, and two new columns are added, one for the activity and one for the subject ID.
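
A small sketch of the merge, assuming the test rows are stacked above the train rows and the activity and subject vectors are concatenated in the same order. All values and column names here are toy data.

```r
# Toy test and train tables with the same columns.
test_data  <- data.frame(mean_x = c(0.1, 0.2), std_x = c(0.01, 0.02))
train_data <- data.frame(mean_x = c(0.3, 0.4, 0.5), std_x = c(0.03, 0.04, 0.05))

# Matching activity labels and subject IDs, one per row.
test_activity  <- c("WALKING", "SITTING")
train_activity <- c("WALKING", "LAYING", "SITTING")
test_subject   <- c(2, 2)
train_subject  <- c(1, 1, 3)

# Stack the rows, then append the two new columns in the same order.
combined <- rbind(test_data, train_data)
combined$activity <- c(test_activity, train_activity)
combined$subject  <- c(test_subject, train_subject)
```

The order matters: the vectors must be concatenated test first, train second, to line up with the stacked rows.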
-
Next, the melt and dcast commands are used to reformat and summarize the data. There are 30 subject IDs and 6 activities, so each of the two data sets (one for mean, one for standard deviation) has 30 × 6 = 180 rows.
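
The reshaping can be sketched as follows. Several points are assumptions: that melt and dcast come from the reshape2 package, that the summary statistic is the average within each subject/activity pair, and that the two data sets are obtained by splitting the columns by name. The toy table has 2 subjects and 2 activities, so it yields 2 × 2 = 4 rows instead of 180.

```r
library(reshape2)  # assumed source of melt() and dcast()

combined <- data.frame(
  subject  = c(1, 1, 1, 2, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "SITTING", "SITTING"),
  mean_x   = c(0.1, 0.3, 0.5, 0.2, 0.4, 0.6),
  std_x    = c(0.01, 0.03, 0.05, 0.02, 0.04, 0.06)
)

# Long format: one row per (subject, activity, measurement) value.
long <- melt(combined, id.vars = c("subject", "activity"))

# Wide format again, averaging within each subject/activity pair
# (the use of mean here is an assumption about the summary).
tidy <- dcast(long, subject + activity ~ variable,
              fun.aggregate = mean, value.var = "value")

# Split into a "mean" table and a "standard deviation" table by column name.
id_cols   <- c("subject", "activity")
tidy_mean <- tidy[, c(id_cols, grep("mean", names(tidy), value = TRUE))]
tidy_std  <- tidy[, c(id_cols, grep("std", names(tidy), value = TRUE))]
```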
-
Finally, these tidy data sets are written to two separate files.
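
Writing the two tables could look like the sketch below. The file names "tidy_mean.txt" and "tidy_std.txt" are hypothetical, and a temporary directory is used so the example leaves no files behind in the working directory.

```r
# Toy stand-ins for the two summary tables.
tidy_mean <- data.frame(subject = c(1, 2),
                        activity = c("WALKING", "SITTING"),
                        mean_x = c(0.2, 0.5))
tidy_std  <- data.frame(subject = c(1, 2),
                        activity = c("WALKING", "SITTING"),
                        std_x = c(0.02, 0.05))

# Hypothetical output paths inside a temporary directory.
out_dir   <- tempdir()
mean_path <- file.path(out_dir, "tidy_mean.txt")
std_path  <- file.path(out_dir, "tidy_std.txt")

# One file per table, without row names.
write.table(tidy_mean, mean_path, row.names = FALSE)
write.table(tidy_std, std_path, row.names = FALSE)

# Reading a file back confirms that the header and rows survive.
check <- read.table(mean_path, header = TRUE)
```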