---
title: "README"
author: "Jerad Acosta"
date: "October 26, 2014"
output: html_document
---

First, load the libraries and the data sets from the working directory.
```{r}
# load libraries
library(dplyr)
library(reshape2)

# load test data
testdata <- read.table("./UCI HAR Dataset/test/X_test.txt")
test.index <- read.table("./UCI HAR Dataset/test/y_test.txt")
test.subjects <- read.table("./UCI HAR Dataset/test/subject_test.txt")
# load training data
traindata <- read.table("./UCI HAR Dataset/train/X_train.txt")
train.index <- read.table("./UCI HAR Dataset/train/y_train.txt")
train.subjects <- read.table("./UCI HAR Dataset/train/subject_train.txt")
# load features
features <- read.table("./UCI HAR Dataset/features.txt")

# bind test data with training data
totaldata <- rbind(testdata, traindata)
```

This code finds the mean and standard deviation measurements
and then subsets the data to the features of interest.
Note the reasoning behind the subsetting:
meanFreq was excluded because, according to the code book,
it is a mean of the frequency components of the signal
rather than the mean of the actual observations, which is what we are interested in.
```{r}
# GOAL: find all mean calculations
# match lower-case "mean" only, to avoid the angle measurements containing "Mean"
means <- grepl("mean", features$V2)
# flag meanFreq: not the mean of a measurement but of frequency components,
# as described in the features_info.txt document
freq <- grepl("Freq", features$V2)

# subset the 33 mean features, excluding meanFreq
# (both masks are computed on the full table, so row positions stay aligned)
mean.features <- features[means & !freq,]

## GOAL: find all standard deviation calculations
std <- grepl("std", features$V2)
# subset 33 features calculating standard deviation
std.features <- features[std,]

# combine mean and standard deviation calculations
SnM.features <- rbind(mean.features, std.features)

# subset total data to features of interest
totaldata <- totaldata[SnM.features$V1]
```
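
The selection rule above can be checked on a handful of names. This is a minimal sketch only: `toy.names` is a made-up vector in the style of features.txt, not the real feature list, and `keep` and `selected` are names introduced for the example.

```r
# Hypothetical feature names in the style of features.txt (not the full list)
toy.names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
               "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)",
               "tBodyAcc-max()-X")

# keep lower-case "mean" or "std", but drop anything containing "Freq"
keep <- (grepl("mean", toy.names) | grepl("std", toy.names)) &
  !grepl("Freq", toy.names)

selected <- toy.names[keep]
print(selected)
```

Only the `mean()` and `std()` entries should survive: the angle entry contains "Mean" with a capital M, and the meanFreq entry is dropped by the "Freq" filter.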

Here we transform the activity indexes from the code book into
understandable names for the Label variable.
```{r}
# join subjects and labels from the test and training sets
subjects <- rbind(test.subjects, train.subjects)
labels <- rbind(test.index, train.index)
# convert label index to label names
labels <- lapply(labels,function(x) gsub(1,"Walking",x))
labels <- lapply(labels,function(x) gsub(2,"Walking_Upstairs",x))
labels <- lapply(labels,function(x) gsub(3,"Walking_Downstairs",x))
labels <- lapply(labels,function(x) gsub(4,"Sitting",x))
labels <- lapply(labels,function(x) gsub(5,"Standing",x))
labels <- lapply(labels,function(x) gsub(6,"Laying",x))

# add Labels and Subject to total data
totaldata <- cbind(subjects, labels, totaldata)
```
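
As an alternative to the chain of `gsub()` calls, the same mapping could be done with a single lookup vector. This is a sketch under assumptions, not the method used above: `toy.codes` is made-up input, and `activity.names` and `toy.labels` are names introduced for the example.

```r
# Hypothetical activity codes, as they would appear in y_test.txt / y_train.txt
toy.codes <- c(5, 1, 6, 3, 1)

# activity names in code order (1 to 6), matching the replacements used above
activity.names <- c("Walking", "Walking_Upstairs", "Walking_Downstairs",
                    "Sitting", "Standing", "Laying")

# index the lookup vector by the codes to translate them all at once
toy.labels <- activity.names[toy.codes]
print(toy.labels)
```

Because each code is used as a position in `activity.names`, the order of that vector has to match the numbering in the code book.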

Naming the variables in accordance with tidy data principles.
```{r}
# name variables
names(totaldata) <- c("Subject", "Label", as.character(SnM.features$V2))
```

This creates the first tidy data set, as well as
new feature and label text files for the new code book.
Finally, it creates the tidy data set with average values
per subject per activity.
```{r}
# Create new data frame with average of each variable for each activity
datamelt <- melt(totaldata, id=c("Subject", "Label"))
tidymean <- dcast(datamelt, Subject+Label~variable,mean)

# write the first tidy data set
write.table(totaldata, file = "tidyrawdata.txt", row.names = FALSE)
# create new feature and label text files for the code book
write.table(SnM.features, file = "NewFeatures.txt")
write.table(totaldata$Label, file = "NewLabel.txt")
# create tidy data with the average of each variable per subject per activity
write.table(tidymean, file = "tidyAvgPerSubjectPerAct.txt")
```
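
To read one of the written tables back into R, something like the following should work. It is a minimal sketch: `toy.tidy` is a made-up stand-in for the real table, the file is written to a temporary path, and it assumes the file was written with `row.names = FALSE`.

```r
# Hypothetical stand-in for a tidy table; the real one has one column per selected feature
toy.tidy <- data.frame(Subject = c(1, 2),
                       Label = c("Walking", "Sitting"),
                       "tBodyAcc-mean()-X" = c(0.27, 0.26),
                       check.names = FALSE)

# temporary path, used only for this example
out.file <- tempfile(fileext = ".txt")
write.table(toy.tidy, file = out.file, row.names = FALSE)

# check.names = FALSE keeps names such as "tBodyAcc-mean()-X" unchanged
reread <- read.table(out.file, header = TRUE, check.names = FALSE)
print(reread)
```

Without `check.names = FALSE`, `read.table()` would rewrite the parentheses and hyphens in the column names into dots.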
* Note: this tidy data set contains the average of each variable
as requested in the assignment.
Some averages are negative, including those of the standard deviation variables.
This does not mean that a raw standard deviation was negative:
according to the data set's README, the features are normalized and bounded within [-1, 1],
so the values being averaged can themselves be negative.
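
The per-subject, per-activity averaging can also be sketched in base R with `aggregate()`, swapped in here for `melt()`/`dcast()` so the example needs no extra packages. The data frame `toy` and its values are made up for illustration and only mimic the shape of the real data.

```r
# Hypothetical toy data: 2 subjects, 2 activities, one made-up measurement column
toy <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2),
  Label   = c("Walking", "Walking", "Sitting", "Walking", "Sitting", "Sitting"),
  x       = c(0.2, 0.4, -0.5, 0.1, -0.3, -0.7)
)

# average every measurement column within each Subject/Label pair
toy.mean <- aggregate(. ~ Subject + Label, data = toy, FUN = mean)
toy.mean <- toy.mean[order(toy.mean$Subject, toy.mean$Label), ]
print(toy.mean)
```

Each Subject/Label pair ends up as one row, and an average is negative whenever the values in that group are mostly negative.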
