The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.
One of the most exciting areas in all of data science right now is wearable computing - see for example this article. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data used in this project was collected from the accelerometers of the Samsung Galaxy S smartphone.
A full description is available at the site where the data was obtained.
This repo contains the run_analysis.R script that does the following:
1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
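The five steps can be illustrated on a toy example. The following is a minimal sketch in base R, not the code in run_analysis.R (which uses dplyr); the `train`, `test` and `activity_labels` objects, the `ActivityId` column and all values are made up for illustration.

```r
# Toy stand-ins for the training and test sets (values are made up).
train <- data.frame(
  Subject = c(1L, 1L, 3L),
  ActivityId = c(1L, 2L, 1L),
  "tBodyAcc-mean()-X" = c(0.2, 0.4, 0.6),
  "tBodyAcc-std()-X" = c(-0.9, -0.7, -0.5),
  "tBodyAcc-max()-X" = c(0.9, 0.8, 0.7),
  check.names = FALSE
)
test <- data.frame(
  Subject = c(2L, 2L),
  ActivityId = c(1L, 1L),
  "tBodyAcc-mean()-X" = c(0.1, 0.3),
  "tBodyAcc-std()-X" = c(-0.8, -0.6),
  "tBodyAcc-max()-X" = c(0.5, 0.6),
  check.names = FALSE
)

# Step 1: merge the training and test sets.
merged <- rbind(train, test)

# Step 2: keep only the mean() and std() measurements.
keep <- grepl("-(mean|std)\\(\\)", names(merged))
merged <- merged[, c("Subject", "ActivityId", names(merged)[keep])]

# Step 3: replace activity ids with descriptive names.
activity_labels <- c("WALKING", "SITTING")  # hypothetical id -> name lookup
merged$Activity <- factor(activity_labels[merged$ActivityId],
                          levels = activity_labels)
merged$ActivityId <- NULL

# Step 4: the variable names are already descriptive in this toy example.

# Step 5: average each variable for each activity and each subject.
measure_cols <- setdiff(names(merged), c("Subject", "Activity"))
tidy <- aggregate(merged[measure_cols],
                  by = list(Subject = merged$Subject,
                            Activity = merged$Activity),
                  FUN = mean)
print(tidy)
```

The result has one row per subject/activity combination that occurs in the data, which is the shape step 5 asks for.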
Unless specified otherwise, all commands were run from within an R CLI session (on GNU/Linux).
All data transformations are documented in the form of comments in run_analysis.R.
Note: All data files are kept in ./data.
CodeBook.md describes the variables, data and any transformations or work that were performed to clean up the data for this project.
We followed these steps to download and extract the raw data:
> download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip", destfile="./data/getdata_projectfiles_UCI_HAR_Dataset.zip", method="curl")
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 59.6M 100 59.6M 0 0 339k 0 0:03:00 0:03:00 --:--:-- 378k
>
> unzip ("./data/getdata_projectfiles_UCI_HAR_Dataset.zip", exdir = "./data")
> list.files("./data")
[1] "getdata_projectfiles_UCI_HAR_Dataset.zip"
[2] "UCI HAR Dataset"
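Running the commands above a second time downloads the archive again, and download.file() fails if ./data does not exist yet. A minimal sketch of a variant that only does the work when needed is shown below; `fetch_raw_data` is a hypothetical helper that is not part of run_analysis.R, and `method = "curl"` assumes curl is installed, as in the session above.

```r
# Hypothetical helper: download and extract the raw data only when needed.
fetch_raw_data <- function(
    data_dir = "./data",
    url = "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip") {
  zip_path <- file.path(data_dir, "getdata_projectfiles_UCI_HAR_Dataset.zip")
  extracted <- file.path(data_dir, "UCI HAR Dataset")

  # download.file() does not create the destination directory.
  if (!dir.exists(data_dir)) dir.create(data_dir, recursive = TRUE)

  if (!dir.exists(extracted)) {
    # Skip the download if the archive is already present.
    if (!file.exists(zip_path)) {
      download.file(url, destfile = zip_path, method = "curl")
    }
    unzip(zip_path, exdir = data_dir)
  }

  # Return the path of the extracted data set directory.
  extracted
}
```

Calling `fetch_raw_data()` from the project directory should leave the same two entries in ./data as the list.files() output above.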
The raw data file is fairly large (about 60 MB), so an exclusion entry was added to the project's .gitignore file to avoid pushing the file to GitHub.
To run the analysis you simply need the run_analysis.R script and the extracted data (see Save/Extract Raw Data) in the ./data directory.
From the working directory where you saved the script (and assuming you extracted the data in ./data), open an R console session and run:
source("run_analysis.R")
This will take a while as the script performs the data processing and analysis set out in steps 1-5 above. You will most likely see the following messages in the console; they are informational and can be safely ignored:
Attaching package: ‘dplyr’
The following object is masked from ‘package:MASS’:
select
The following object is masked from ‘package:stats’:
filter
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
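The notes above are printed when dplyr is attached. If you prefer a quieter console, one option (a sketch, not what run_analysis.R currently does) is to wrap the library() call in suppressPackageStartupMessages():

```r
# Attach dplyr without printing the masking notes.
# The requireNamespace() guard is only here so that the snippet does not
# stop with an error when dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
}
```

The masking itself still happens; only the notes are hidden. Writing dplyr::select() or MASS::select() makes it explicit which function is meant.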
Once the script finishes, the tidy data set produced by steps 1-4 is stored in the totalData data frame:
> str(totalData)
'data.frame': 10299 obs. of 69 variables:
$ Subject : int 1 1 1 1 1 1 1 1 1 1 ...
$ DataType : Factor w/ 2 levels "Training","Test": 1 1 1 1 1 1 1 1 1 1 ...
$ Activity : Factor w/ 6 levels "LAYING","SITTING",..: 3 3 3 3 3 3 3 3 3 3 ...
$ tBodyAcc-mean()-X : num 0.289 0.278 0.28 0.279 0.277 ...
$ tBodyAcc-mean()-Y : num -0.0203 -0.0164 -0.0195 -0.0262 -0.0166 ...
$ tBodyAcc-mean()-Z : num -0.133 -0.124 -0.113 -0.123 -0.115 ...
$ tBodyAcc-std()-X : num -0.995 -0.998 -0.995 -0.996 -0.998 ...
$ tBodyAcc-std()-Y : num -0.983 -0.975 -0.967 -0.983 -0.981 ...
$ tBodyAcc-std()-Z : num -0.914 -0.96 -0.979 -0.991 -0.99 ...
$ tGravityAcc-mean()-X : num 0.963 0.967 0.967 0.968 0.968 ...
$ tGravityAcc-mean()-Y : num -0.141 -0.142 -0.142 -0.144 -0.149 ...
$ tGravityAcc-mean()-Z : num 0.1154 0.1094 0.1019 0.0999 0.0945 ...
$ tGravityAcc-std()-X : num -0.985 -0.997 -1 -0.997 -0.998 ...
$ tGravityAcc-std()-Y : num -0.982 -0.989 -0.993 -0.981 -0.988 ...
$ tGravityAcc-std()-Z : num -0.878 -0.932 -0.993 -0.978 -0.979 ...
$ tBodyAccJerk-mean()-X : num 0.078 0.074 0.0736 0.0773 0.0734 ...
$ tBodyAccJerk-mean()-Y : num 0.005 0.00577 0.0031 0.02006 0.01912 ...
$ tBodyAccJerk-mean()-Z : num -0.06783 0.02938 -0.00905 -0.00986 0.01678 ...
$ tBodyAccJerk-std()-X : num -0.994 -0.996 -0.991 -0.993 -0.996 ...
$ tBodyAccJerk-std()-Y : num -0.988 -0.981 -0.981 -0.988 -0.988 ...
$ tBodyAccJerk-std()-Z : num -0.994 -0.992 -0.99 -0.993 -0.992 ...
$ tBodyGyro-mean()-X : num -0.0061 -0.0161 -0.0317 -0.0434 -0.034 ...
$ tBodyGyro-mean()-Y : num -0.0314 -0.0839 -0.1023 -0.0914 -0.0747 ...
$ tBodyGyro-mean()-Z : num 0.1077 0.1006 0.0961 0.0855 0.0774 ...
$ tBodyGyro-std()-X : num -0.985 -0.983 -0.976 -0.991 -0.985 ...
$ tBodyGyro-std()-Y : num -0.977 -0.989 -0.994 -0.992 -0.992 ...
$ tBodyGyro-std()-Z : num -0.992 -0.989 -0.986 -0.988 -0.987 ...
$ tBodyGyroJerk-mean()-X : num -0.0992 -0.1105 -0.1085 -0.0912 -0.0908 ...
$ tBodyGyroJerk-mean()-Y : num -0.0555 -0.0448 -0.0424 -0.0363 -0.0376 ...
$ tBodyGyroJerk-mean()-Z : num -0.062 -0.0592 -0.0558 -0.0605 -0.0583 ...
$ tBodyGyroJerk-std()-X : num -0.992 -0.99 -0.988 -0.991 -0.991 ...
$ tBodyGyroJerk-std()-Y : num -0.993 -0.997 -0.996 -0.997 -0.996 ...
$ tBodyGyroJerk-std()-Z : num -0.992 -0.994 -0.992 -0.993 -0.995 ...
$ tBodyAccMag-mean() : num -0.959 -0.979 -0.984 -0.987 -0.993 ...
$ tBodyAccMag-std() : num -0.951 -0.976 -0.988 -0.986 -0.991 ...
$ tGravityAccMag-mean() : num -0.959 -0.979 -0.984 -0.987 -0.993 ...
$ tGravityAccMag-std() : num -0.951 -0.976 -0.988 -0.986 -0.991 ...
$ tBodyAccJerkMag-mean() : num -0.993 -0.991 -0.989 -0.993 -0.993 ...
$ tBodyAccJerkMag-std() : num -0.994 -0.992 -0.99 -0.993 -0.996 ...
$ tBodyGyroMag-mean() : num -0.969 -0.981 -0.976 -0.982 -0.985 ...
$ tBodyGyroMag-std() : num -0.964 -0.984 -0.986 -0.987 -0.989 ...
$ tBodyGyroJerkMag-mean() : num -0.994 -0.995 -0.993 -0.996 -0.996 ...
$ tBodyGyroJerkMag-std() : num -0.991 -0.996 -0.995 -0.995 -0.995 ...
$ fBodyAcc-mean()-X : num -0.995 -0.997 -0.994 -0.995 -0.997 ...
$ fBodyAcc-mean()-Y : num -0.983 -0.977 -0.973 -0.984 -0.982 ...
$ fBodyAcc-mean()-Z : num -0.939 -0.974 -0.983 -0.991 -0.988 ...
$ fBodyAcc-std()-X : num -0.995 -0.999 -0.996 -0.996 -0.999 ...
$ fBodyAcc-std()-Y : num -0.983 -0.975 -0.966 -0.983 -0.98 ...
$ fBodyAcc-std()-Z : num -0.906 -0.955 -0.977 -0.99 -0.992 ...
$ fBodyAccJerk-mean()-X : num -0.992 -0.995 -0.991 -0.994 -0.996 ...
$ fBodyAccJerk-mean()-Y : num -0.987 -0.981 -0.982 -0.989 -0.989 ...
$ fBodyAccJerk-mean()-Z : num -0.99 -0.99 -0.988 -0.991 -0.991 ...
$ fBodyAccJerk-std()-X : num -0.996 -0.997 -0.991 -0.991 -0.997 ...
$ fBodyAccJerk-std()-Y : num -0.991 -0.982 -0.981 -0.987 -0.989 ...
$ fBodyAccJerk-std()-Z : num -0.997 -0.993 -0.99 -0.994 -0.993 ...
$ fBodyGyro-mean()-X : num -0.987 -0.977 -0.975 -0.987 -0.982 ...
$ fBodyGyro-mean()-Y : num -0.982 -0.993 -0.994 -0.994 -0.993 ...
$ fBodyGyro-mean()-Z : num -0.99 -0.99 -0.987 -0.987 -0.989 ...
$ fBodyGyro-std()-X : num -0.985 -0.985 -0.977 -0.993 -0.986 ...
$ fBodyGyro-std()-Y : num -0.974 -0.987 -0.993 -0.992 -0.992 ...
$ fBodyGyro-std()-Z : num -0.994 -0.99 -0.987 -0.989 -0.988 ...
$ fBodyAccMag-mean() : num -0.952 -0.981 -0.988 -0.988 -0.994 ...
$ fBodyAccMag-std() : num -0.956 -0.976 -0.989 -0.987 -0.99 ...
$ fBodyBodyAccJerkMag-mean() : num -0.994 -0.99 -0.989 -0.993 -0.996 ...
$ fBodyBodyAccJerkMag-std() : num -0.994 -0.992 -0.991 -0.992 -0.994 ...
$ fBodyBodyGyroMag-mean() : num -0.98 -0.988 -0.989 -0.989 -0.991 ...
$ fBodyBodyGyroMag-std() : num -0.961 -0.983 -0.986 -0.988 -0.989 ...
$ fBodyBodyGyroJerkMag-mean(): num -0.992 -0.996 -0.995 -0.995 -0.995 ...
$ fBodyBodyGyroJerkMag-std() : num -0.991 -0.996 -0.995 -0.995 -0.995 ...
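A few basic checks can confirm that the data frame has the shape shown above. The following is a minimal sketch; `check_total_data` is a hypothetical helper that is not part of run_analysis.R, and the toy data frame with made-up values only stands in for totalData so that the snippet runs on its own.

```r
# Hypothetical helper: basic sanity checks on the merged data frame.
check_total_data <- function(df) {
  id_cols <- c("Subject", "DataType", "Activity")
  measure_cols <- setdiff(names(df), id_cols)
  stopifnot(
    all(id_cols %in% names(df)),                            # identifiers present
    all(vapply(df[measure_cols], is.numeric, logical(1))),  # measurements numeric
    !any(is.na(df))                                         # no missing values
  )
  invisible(TRUE)
}

# Toy data frame with made-up values, standing in for totalData.
toy <- data.frame(
  Subject = c(1L, 2L),
  DataType = factor(c("Training", "Test")),
  Activity = factor(c("WALKING", "SITTING")),
  "tBodyAcc-mean()-X" = c(0.28, 0.27),
  check.names = FALSE
)
check_total_data(toy)
```

After a real run, `check_total_data(totalData)` together with `dim(totalData)` should agree with the 10299 observations and 69 variables reported by str() above.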
The tidy data set produced by step 5 is stored in the summary variable:
> str(summary)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 69 variables:
$ Subject : int 1 2 3 4 4 5 6 7 7 8 ...
$ Activity : Factor w/ 6 levels "LAYING","SITTING",..: 3 3 3 2 3 3 3 2 3 2 ...
$ DataType : chr "Training" "Test" "Training" "Test" ...
$ tBodyAcc-mean()-X : num 0.266 0.273 0.273 0.273 0.275 ...
$ tBodyAcc-mean()-Y : num -0.0183 -0.0191 -0.0179 -0.0196 -0.013 ...
$ tBodyAcc-mean()-Z : num -0.108 -0.116 -0.106 -0.113 -0.105 ...
$ tBodyAcc-std()-X : num -0.546 -0.606 -0.623 -0.282 -0.727 ...
$ tBodyAcc-std()-Y : num -0.368 -0.429 -0.48 -0.176 -0.636 ...
$ tBodyAcc-std()-Z : num -0.503 -0.589 -0.654 -0.549 -0.77 ...
$ tGravityAcc-mean()-X : num 0.745 0.661 0.708 0.764 0.685 ...
$ tGravityAcc-mean()-Y : num -0.0826 -0.1472 -0.0261 0.0443 0.1384 ...
$ tGravityAcc-mean()-Z : num 0.0723 0.1349 0.0481 0.1255 0.1788 ...
$ tGravityAcc-std()-X : num -0.96 -0.963 -0.966 -0.957 -0.965 ...
$ tGravityAcc-std()-Y : num -0.951 -0.96 -0.945 -0.939 -0.942 ...
$ tGravityAcc-std()-Z : num -0.926 -0.945 -0.927 -0.949 -0.938 ...
$ tBodyAccJerk-mean()-X : num 0.0771 0.0785 0.0702 0.0776 0.0794 ...
$ tBodyAccJerk-mean()-Y : num 0.01659 0.00709 0.01447 0.01741 -0.00174 ...
$ tBodyAccJerk-mean()-Z : num -0.009108 0.000756 -0.000527 -0.003608 -0.008798 ...
$ tBodyAccJerk-std()-X : num -0.525 -0.558 -0.635 -0.337 -0.743 ...
$ tBodyAccJerk-std()-Y : num -0.47 -0.492 -0.557 -0.251 -0.71 ...
$ tBodyAccJerk-std()-Z : num -0.717 -0.742 -0.796 -0.754 -0.877 ...
$ tBodyGyro-mean()-X : num -0.0209 -0.0517 -0.0248 -0.0233 -0.0311 ...
$ tBodyGyro-mean()-Y : num -0.0881 -0.0568 -0.0744 -0.084 -0.0767 ...
$ tBodyGyro-mean()-Z : num 0.0863 0.0873 0.0867 0.0869 0.0991 ...
$ tBodyGyro-std()-X : num -0.687 -0.711 -0.699 -0.483 -0.784 ...
$ tBodyGyro-std()-Y : num -0.451 -0.723 -0.763 -0.668 -0.848 ...
$ tBodyGyro-std()-Z : num -0.597 -0.635 -0.709 -0.603 -0.773 ...
$ tBodyGyroJerk-mean()-X : num -0.0971 -0.0876 -0.0992 -0.1162 -0.1047 ...
$ tBodyGyroJerk-mean()-Y : num -0.0417 -0.0434 -0.0402 -0.0359 -0.0416 ...
$ tBodyGyroJerk-mean()-Z : num -0.0471 -0.0558 -0.0521 -0.0493 -0.061 ...
$ tBodyGyroJerk-std()-X : num -0.638 -0.672 -0.689 -0.503 -0.807 ...
$ tBodyGyroJerk-std()-Y : num -0.634 -0.784 -0.843 -0.85 -0.922 ...
$ tBodyGyroJerk-std()-Z : num -0.665 -0.675 -0.743 -0.566 -0.817 ...
$ tBodyAccMag-mean() : num -0.454 -0.535 -0.563 -0.242 -0.683 ...
$ tBodyAccMag-std() : num -0.497 -0.553 -0.591 -0.345 -0.705 ...
$ tGravityAccMag-mean() : num -0.454 -0.535 -0.563 -0.242 -0.683 ...
$ tGravityAccMag-std() : num -0.497 -0.553 -0.591 -0.345 -0.705 ...
$ tBodyAccJerkMag-mean() : num -0.545 -0.588 -0.65 -0.383 -0.76 ...
$ tBodyAccJerkMag-std() : num -0.516 -0.512 -0.608 -0.384 -0.747 ...
$ tBodyGyroMag-mean() : num -0.475 -0.615 -0.643 -0.433 -0.741 ...
$ tBodyGyroMag-std() : num -0.5 -0.681 -0.674 -0.526 -0.775 ...
$ tBodyGyroJerkMag-mean() : num -0.64 -0.747 -0.784 -0.686 -0.869 ...
$ tBodyGyroJerkMag-std() : num -0.652 -0.74 -0.804 -0.733 -0.886 ...
$ fBodyAcc-mean()-X : num -0.532 -0.574 -0.626 -0.335 -0.74 ...
$ fBodyAcc-mean()-Y : num -0.406 -0.433 -0.502 -0.184 -0.655 ...
$ fBodyAcc-mean()-Z : num -0.596 -0.63 -0.7 -0.612 -0.809 ...
$ fBodyAcc-std()-X : num -0.553 -0.62 -0.624 -0.264 -0.724 ...
$ fBodyAcc-std()-Y : num -0.39 -0.465 -0.503 -0.228 -0.651 ...
$ fBodyAcc-std()-Z : num -0.499 -0.601 -0.657 -0.553 -0.768 ...
$ fBodyAccJerk-mean()-X : num -0.547 -0.562 -0.646 -0.373 -0.758 ...
$ fBodyAccJerk-mean()-Y : num -0.507 -0.509 -0.583 -0.285 -0.721 ...
$ fBodyAccJerk-mean()-Z : num -0.695 -0.716 -0.78 -0.721 -0.865 ...
$ fBodyAccJerk-std()-X : num -0.544 -0.595 -0.658 -0.36 -0.752 ...
$ fBodyAccJerk-std()-Y : num -0.466 -0.509 -0.56 -0.266 -0.718 ...
$ fBodyAccJerk-std()-Z : num -0.738 -0.767 -0.811 -0.786 -0.889 ...
$ fBodyGyro-mean()-X : num -0.623 -0.639 -0.642 -0.366 -0.746 ...
$ fBodyGyro-mean()-Y : num -0.505 -0.722 -0.775 -0.733 -0.869 ...
$ fBodyGyro-mean()-Z : num -0.554 -0.602 -0.671 -0.515 -0.755 ...
$ fBodyGyro-std()-X : num -0.708 -0.735 -0.719 -0.522 -0.797 ...
$ fBodyGyro-std()-Y : num -0.43 -0.727 -0.759 -0.638 -0.838 ...
$ fBodyGyro-std()-Z : num -0.65 -0.683 -0.751 -0.676 -0.802 ...
$ fBodyAccMag-mean() : num -0.478 -0.515 -0.579 -0.327 -0.706 ...
$ fBodyAccMag-std() : num -0.59 -0.647 -0.663 -0.461 -0.753 ...
$ fBodyBodyAccJerkMag-mean() : num -0.499 -0.51 -0.605 -0.357 -0.74 ...
$ fBodyBodyAccJerkMag-std() : num -0.542 -0.519 -0.616 -0.426 -0.758 ...
$ fBodyBodyGyroMag-mean() : num -0.535 -0.7 -0.717 -0.577 -0.809 ...
$ fBodyBodyGyroMag-std() : num -0.567 -0.725 -0.704 -0.575 -0.792 ...
$ fBodyBodyGyroJerkMag-mean(): num -0.646 -0.752 -0.81 -0.72 -0.884 ...
$ fBodyBodyGyroJerkMag-std() : num -0.686 -0.744 -0.81 -0.772 -0.898 ...
- attr(*, "vars")=List of 1
..$ : symbol Subject
- attr(*, "drop")= logi TRUE
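To share the result of step 5, the data frame can be written to a text file. The following is a minimal sketch of a write.table()/read.table() round trip; the `tidy_summary.txt` file name and the toy data frame (standing in for `summary`, with made-up values) are assumptions for illustration. Because the column names contain hyphens and parentheses, check.names = FALSE is needed when reading the file back, otherwise R rewrites them into syntactic names.

```r
# Toy stand-in for the step 5 data frame (values are made up).
toy_summary <- data.frame(
  Subject = c(1L, 2L),
  Activity = c("WALKING", "SITTING"),
  "tBodyAcc-mean()-X" = c(0.266, 0.273),
  check.names = FALSE
)

# Hypothetical output file; a temporary directory keeps the example tidy.
out_file <- file.path(tempdir(), "tidy_summary.txt")
write.table(toy_summary, out_file, row.names = FALSE)

# check.names = FALSE preserves names such as "tBodyAcc-mean()-X".
restored <- read.table(out_file, header = TRUE, check.names = FALSE)
print(names(restored))
```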