ubc-mds / corrr


This R package helps users calculate correlation coefficients and the covariance matrix of data that contains missing values.

License: Other

R 100.00%

corrr's Introduction

CorrR

Latest Update Date: 2019 Feb

Overview

This project helps users calculate the standard deviation, correlation coefficients and covariance matrix of data that contains missing values, in both R and Python.

Team

Name | Slack Handle | Github.com | Project branch
KERA YUCEL | @KERA YUCEL | @K3ra-y | Kera's link
GOPALAKRISHNAN ANDIVEL | @Krish | @Gopsathvik | Krish's link
WEISHUN DENG | @Wilson Deng | @xiaoweideng | Wilson's link
Mengda Yu | @Mengda(Albert) Yu | @mru4913 | Albert's link

Installation

CorrR can be installed from an R console:

devtools::install_github("UBC-MDS/CorrR")

Branch Coverage Test

To test branch coverage, we use the covr package. You can install it with install.packages("covr").

Open the CorrR R project (for example, by double-clicking the project file) and run the following in the console.

library(covr)

report()

The results are shown below.

[Figure: covr branch coverage report]

Executing test_that tests in CorrR

To run the test suite, we use the devtools package. You can install it with install.packages("devtools").

Open the CorrR R project and execute the following code.

library(devtools)
load_all()
test()

The results are shown below.

[Figure: test_that results]

Functions

Standard Deviation (std_plus)

Standard deviation measures how far the data points spread from the mean, which gives insight into the variation in the data. This function automatically handles missing values in the input.

std_plus removes the frustration of dealing with missing values from workflows.

Example:

> x <-  c(1,2, NA, 4, NA, 6)
> std_plus(x)
[1] 2.217356

> y <-  c(1,2, Inf, 4, NA, 6)
> std_plus(y)
[1] 2.217356
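
The behavior in the example above can be approximated with a few lines of base R. This is a minimal sketch rather than the package's actual source: the name std_sketch is hypothetical, and the choice to discard Inf together with NA is an assumption inferred from the two example outputs being identical.

```r
# Hypothetical sketch, not the CorrR implementation.
std_sketch <- function(x) {
  x <- as.numeric(x)           # coerce logicals such as TRUE to 1
  x <- x[is.finite(x)]         # drop NA, NaN, Inf and -Inf in one step
  n <- length(x)
  if (n < 2) return(NA_real_)  # a sample standard deviation needs two points
  sqrt(sum((x - mean(x))^2) / (n - 1))
}

std_sketch(c(1, 2, NA, 4, NA, 6))   # 2.217356
std_sketch(c(1, 2, Inf, 4, NA, 6))  # 2.217356
```

Both calls reduce to the sample standard deviation of 1, 2, 4 and 6, which is why they agree.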

Correlation Coefficients (corr_plus)

A correlation coefficient measures the direction and strength of the relationship between two variables. This function automatically handles missing values in the input.

Example:

> x <-  c(1, 2, NA, 4, 5)
> y <-  c(-6, -7, -8, 9, TRUE)
> corr_plus(x, y)
[1] 0.7391091
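
One way to reproduce the example above is to keep only the positions where both vectors hold a usable value and then compute the Pearson coefficient. The sketch below assumes that strategy; corr_sketch is a hypothetical name and not the package's code.

```r
# Hypothetical sketch, not the CorrR implementation.
corr_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  x <- as.numeric(x)
  y <- as.numeric(y)
  keep <- is.finite(x) & is.finite(y)  # positions where both values are usable
  x <- x[keep]
  y <- y[keep]
  dx <- x - mean(x)                    # deviations from the mean
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

corr_sketch(c(1, 2, NA, 4, 5), c(-6, -7, -8, 9, TRUE))  # 0.7391091
```

The third pair is dropped because x is NA there, and TRUE is coerced to 1, so the coefficient is computed from the four pairs (1, -6), (2, -7), (4, 9) and (5, 1).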

Covariance Matrix (cov_mx)

A covariance matrix displays variances and covariances together. The diagonal elements are the variances of the individual variables, and the off-diagonal elements are the covariances between pairs of variables. This function builds on the two functions above.

Example:

> foo_matrix <- matrix(c(1, 2, NA, 4, 5, -6, -7, -8, 9, TRUE), 5)
> cov_mx(foo_matrix)
          [,1]     [,2]
[1,]  3.333333 10.00000
[2,] 10.000000 54.91667
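
The output above is consistent with dropping every row that contains a missing value before computing the matrix. The sketch below assumes that row-wise deletion; cov_sketch is a hypothetical name, and unlike cov_mx it does not call the other two functions.

```r
# Hypothetical sketch, not the CorrR implementation.
cov_sketch <- function(m) {
  m <- as.matrix(m)
  # keep only rows in which every entry is a finite number
  complete <- apply(m, 1, function(row) all(is.finite(row)))
  m <- m[complete, , drop = FALSE]
  centered <- sweep(m, 2, colMeans(m))  # subtract each column's mean
  crossprod(centered) / (nrow(m) - 1)   # t(centered) %*% centered / (n - 1)
}

foo_matrix <- matrix(c(1, 2, NA, 4, 5, -6, -7, -8, 9, TRUE), 5)
cov_sketch(foo_matrix)
```

With the third row removed, the diagonal holds the variances 3.333333 and 54.91667 and the off-diagonal entries are both 10, matching the example.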

How does the CorrR package fit into the R ecosystem?

The following functions already exist in the R ecosystem. However, they do not handle missing values by default, so the CorrR package implements the calculation of standard deviation, correlation coefficients and covariance matrix with missing values handled automatically.

R Standard Deviation: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sd.html

R Correlation Coefficients: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html

R Covariance Matrix: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html
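
For comparison, here is a short sketch of how the base R functions linked above behave on data with missing values. The vectors reuse the values from the earlier examples. Only documented arguments of sd, cor and cov are used.

```r
x <- c(1, 2, NA, 4, 5)
y <- c(-6, -7, -8, 9, 1)

sd(x)                                   # NA: the missing value propagates
sd(x, na.rm = TRUE)                     # drops the NA before computing
cor(x, y)                               # NA
cor(x, y, use = "complete.obs")         # uses only pairs without missing values
cov(cbind(x, y), use = "complete.obs")  # covariance matrix on complete rows

sd(c(1, 2, Inf), na.rm = TRUE)          # na.rm does not remove Inf
```

The base functions need an explicit argument for each call, and na.rm leaves infinite values in place, which is the gap the functions in this package aim to cover.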

Milestone Progress

Milestone | Tasks
Milestone 1 | Proposal
Milestone 2 | Python package (CorrPy) is complete
Milestone 3 | R package (CorrR) is complete

corrr's Issues

Milestone 03 feedback

Hi,

Thanks for your efforts in the milestone 03 submission. I've just reviewed your submission and I have a few comments/suggestions below.

  • Your package installation runs without any problems
  • You have provided test cases so comprehensive that I cannot think of any more
  • You have provided a vignette for the R package
  • You have provided an open source license
  • You have provided good documentation of your function
  • You provided your branch coverage results

Thanks!

Milestone 3 - Documentation in CorrR

  • Documentation
  • R package README (including usage and installation instructions)
    pip install git+PACKAGE_URL.git
    devtools::install_github("PACKAGE_NAME") for R packages
  • Code documentation
  • ensure that the test suite for each function provides 100% branch coverage (if possible). Please include documentation (e.g. as we did on the board in class) that shows you've checked for this.
  • functions
  • tests
  • vignette (for R packages only)

Proposal feedback

Hello,

Thank you for the effort you put into making the proposal. I just have one suggestion/recommendation below:

  • It would be better to specify how the functions will handle missing values (removing rows, substituting by averages,...)
    Thanks!

Milestone 01 feedback

Hello,
Thanks for your effort in the submission. You did a really good job. I have a few comments:

  • You should fix the formatting error in the DESCRIPTION file. When I tried to install the package, it threw this error message:
Malformed Authors@R field:
  <text>:1:92: unexpected ','
1: person("Mengda (Albert)", "Yu", email = "[email protected]", role = c("aut", "cre")),
  • Good job on using Roxygen2 package in generating the function documentation templates.
  • In the corr_plus function, the description says that the purpose of the function is to calculate the "relationship". I believe replacing "relationship" with "correlation" would be better.
  • Throughout the code you always refer to "positive/negative" as "pos/neg", which may be confusing to some people.
  • I couldn't find the License file.
  • You did a great job in the tests.
  • In the test_std_plus, should Inf be treated as NA?
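
The "unexpected ','" error in the first comment is typical of an Authors@R field that lists person() entries separated by commas without wrapping them in c(). The sketch below illustrates that likely cause under stated assumptions: the email is omitted because it is not recoverable here, and the second author's role is an assumption for illustration.

```r
# Helper for this illustration only: TRUE if the text parses as R code.
parse_ok <- function(txt) {
  !inherits(try(parse(text = txt), silent = TRUE), "try-error")
}

# A person() call followed by a stray comma is not valid R.
bad <- 'person("Mengda (Albert)", "Yu", role = c("aut", "cre")),'
parse_ok(bad)    # FALSE

# Wrapping the entries in c() gives a well-formed value.
good <- 'c(person("Mengda (Albert)", "Yu", role = c("aut", "cre")),
           person("Kera", "Yucel", role = "aut"))'
parse_ok(good)   # TRUE

authors <- eval(parse(text = good))
length(authors)  # 2
```

In the DESCRIPTION file, the text held in good is what would follow "Authors@R:".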

Milestone 3- R functions and tests (high priority)

Functions

The following summarizes the expectations for this milestone:

  • 3 R functions (by March 1st.)
  • std_plus (Done)
  • corr_plus (Done)
  • cov_mx

Requirements:

  • Update test cases
  • As you develop your code we also expect you to update your code documentation so that it makes sense with any of the changes you have made.

Revisiting your test cases and writing additional test cases

You will be writing additional test cases for both your Python and R packages to make sure that your functions work as expected.


Licence

  • Adding a licence (Done)

Milestone 3 - Workflow in CorrR

Following GitHub Flow workflow

Starting from this milestone, you must follow the GitHub Flow workflow. Each team member must do at least one review and each member must have some part of their code reviewed by other team members.

  • Kera
  • Krish
  • Wilson
  • Albert

In particular, each team member will

  • create a branch
  • work on the function you are responsible for in this branch
  • add commits
  • open a pull request (you should be at this step by the end of the lab on Feb 12th)
  • wait for the code review and feedback from other team members
  • review code of other team member via their pull request
  • deploy your changes
  • merge if your branch is not causing any problems

Just some notes:

  1. To improve our coding format, it is better to limit lines to 120 characters. For further information, please read https://stackoverflow.com/questions/88942/why-does-pep-8-specify-a-maximum-line-length-of-79-characters
  2. To toggle the warning line in RStudio: https://support.rstudio.com/hc/en-us/community/posts/207625357-Toggle-80-character-warning-line
  3. For atom. https://stackoverflow.com/questions/49616864/limiting-line-length-in-atom
  4. For VS code. https://stackoverflow.com/questions/29968499/vertical-rulers-in-visual-studio-code
