The getdata_proj from irjerad

---
title: "README"
author: "Jerad Acosta"
date: "October 26, 2014"
output: html_document
---

First we start by loading libraries and the data sets from the working directory
```{r}
#Load Libraries
library(dplyr)
library(reshape2)

testdata <- read.table("./UCI HAR Dataset/test/X_test.txt")
test.index <- read.table("./UCI HAR Dataset/test/y_test.txt")
test.subjects <- read.table("./UCI HAR Dataset/test/subject_test.txt")
# load training data
traindata <- read.table("./UCI HAR Dataset/train/X_train.txt")
train.index <- read.table("./UCI HAR Dataset/train/y_train.txt")
train.subjects <- read.table("./UCI HAR Dataset/train/subject_train.txt")
# load features
features <- read.table("./UCI HAR Dataset/features.txt")

# bind test data with training data
totaldata <- rbind(testdata, traindata)
```

This code finds the appropriate Mean and Standard Deviation measurements
And then subsets the data to contain the observations of interest
Note the reasoning for how the subsetting was done.
For instance FreqMean was not considered since, according to the code book,
it contained the mean of frequencies in recording the data
as opposed to the mean of the actuals observations which we were interested in.
```{r}
# GOAL: find all mean calculations
# only using lower case mean to avoid angle measurements with 'Mean'
means <- grepl("mean", features$V2)
# remove meanfreq since not mean of measurement but of frequency components
# as described in features_info.txt document
freq <- grepl("Freq", features$V2)

# subset features without meanfreq
features <- features[!(freq),]
#subset 46 features with mean
mean.features <- features[means,]

## Goal: find all standard deviation calculations
std <- grepl("std", features$V2)
# subset 33 features calculating standard deviation
std.features <- features[std,]

# combine standard deviation and mean calculations
SnM.features <- rbind(mean.features, std.features)
SnM.features <- SnM.features[complete.cases(SnM.features),]

# subset total data to features of interest
totaldata <- totaldata[SnM.features$V1]
```

Here we are transforming the indexes from the code book into
understandable classifiers for the label variable
```{r}
# Join labels and Subsets
subjects <- rbind(test.subjects, train.subjects)
labels <- rbind(test.index, train.index)
# convert label index to label names
labels <- lapply(labels,function(x) gsub(1,"Walking",x))
labels <- lapply(labels,function(x) gsub(2,"Walking_Upstairs",x))
labels <- lapply(labels,function(x) gsub(3,"Walking_Downstairs",x))
labels <- lapply(labels,function(x) gsub(4,"Sitting",x))
labels <- lapply(labels,function(x) gsub(5,"Standing",x))
labels <- lapply(labels,function(x) gsub(6,"Laying",x))

# add Labels and Subject to total data
totaldata <- cbind(subjects, labels, totaldata)
```

Naming the Variables In accordance with tidy data
```{r}
# name variables
names(totaldata) <- c("Subject", "Label", as.character(SnM.features$V2))
```

This Creates the first tidy data set as well as
new feature and labal text files for the new code book
Finally create the tidy data set with average values
Per Subject Per Activity
```{r}
# Create new data frame with average of each variable for each activity
datamelt <- melt(totaldata, id=c("Subject", "Label"))
tidymean <- dcast(datamelt, Subject+Label~variable,mean)

# create new tidy data
write.table(totaldata, file = "tidyrawdata.txt", row.name=FALSE)
# create new feature text file for code book
write.table(SnM.features, file = "NewFeatures.txt")
write.table(totaldata$Label, file = "NewLabel.txt")
# Create tidy date with the average of each activity
write.table(tidymean, file = "tidyAvgPerSubjectPerAct.txt")
```
* Note this tidy data set contains the average of each variable value
as requested in the assignment.
Because some variables are standard deviations
their average comes out negative.
This does not mean the recorded value was negative
only that the average standard deviation of the recorded value was negative
irjerad / getdata_proj Goto Github PK

getdata_proj's Introduction

getdata_proj's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs