
gsoc21's Introduction

Google Summer of Code 2021

Mayur Shende

Organisation: The R Project for Statistical Computing

Project: Automated (AutoML) tool to clean univariate time series at microscales

Tests

  • Easy: Download the imputeTestbench package and demonstrate it with a naturally occurring time series. Document it with RMarkdown.

  • Medium: Suggest possible updates or a new feature you would like to include in the next version of the imputeTestbench package.

  • Hard: Develop a dummy code of 5 functions and a vignette and pass it with no Error/Warning/Note through https://win-builder.r-project.org/

Easy Test

Install the imputeTestbench package using install.packages("imputeTestbench") and load it.

library(imputeTestbench)
## Warning: package 'imputeTestbench' was built under R version 4.0.4

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(imputeTS)

Load the temperature dataset. It contains about 5 years of hourly temperature measurements from Vancouver, Canada (https://www.kaggle.com/selfishgene/historical-hourly-weather-data?select=wind_speed.csv).

library(readr)
temperature <- read_csv("temperature.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   datetime = col_character(),
##   Vancouver = col_double()
## )
summary(temperature)
##    datetime           Vancouver    
##  Length:45252       Min.   :245.2  
##  Class :character   1st Qu.:279.2  
##  Mode  :character   Median :283.4  
##                     Mean   :283.9  
##                     3rd Qu.:288.6  
##                     Max.   :307.0  
##                     NA's   :794

Use imputeTestbench to compare several imputation methods from imputeTS, with the share of missing values ranging from 10% to 50%:

errs <- impute_errors(
  temperature$Vancouver,
  methods = c("na_interpolation", "na_locf", "na_ma", "na_kalman"),
  missPercentFrom = 10,
  missPercentTo = 50
)
data.frame(errs)
##   Parameter MissingPercent na_interpolation   na_locf     na_ma na_kalman
## 1      rmse             10        0.1994888 0.3318171 0.2387271 0.1972355
## 2      rmse             20        0.2934247 0.5221524 0.3496052 0.2891276
## 3      rmse             30        0.3782250 0.7049150 0.4531255 0.3683607
## 4      rmse             40        0.4821345 0.9134725 0.5751792 0.4610488
## 5      rmse             50        0.6153928 1.1841495 0.7258125 0.5772984
plot_errors(errs, plotType = "line")
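To make the comparison above more concrete, here is a minimal base-R sketch of what a single cell of such a benchmark does: hide some known values, impute them, and score the result. It uses a synthetic series rather than the Vancouver data, and the helper names (`mask_mcar`, `impute_linear`, `rmse_hidden`) are hypothetical. The package's own sampling and scoring details may differ.

```r
# Synthetic stand-in series (an assumption; not the Vancouver data).
set.seed(42)
n <- 500
x <- 283 + 5 * sin(2 * pi * seq_len(n) / 24) + rnorm(n, sd = 0.3)

# Hide a fraction of the interior points completely at random (MCAR).
# The endpoints are kept so that linear interpolation is always defined.
mask_mcar <- function(x, frac) {
  idx <- sample(2:(length(x) - 1), size = floor(frac * length(x)))
  x[idx] <- NA
  x
}

# Linear interpolation with base R's approx(); a simple stand-in for
# an imputation method such as imputeTS::na_interpolation().
impute_linear <- function(x) {
  obs <- which(!is.na(x))
  approx(obs, x[obs], xout = seq_along(x))$y
}

# RMSE computed only over the points that were hidden.
rmse_hidden <- function(truth, masked, imputed) {
  hidden <- is.na(masked)
  sqrt(mean((truth[hidden] - imputed[hidden])^2))
}

# Score the method at three missing-data levels.
fracs <- c(0.1, 0.3, 0.5)
errs_sketch <- sapply(fracs, function(f) {
  masked <- mask_mcar(x, f)
  rmse_hidden(x, masked, impute_linear(masked))
})
data.frame(MissingPercent = fracs * 100, rmse = errs_sketch)
```

Swapping `impute_linear` for another function with the same input and output shape would let the same loop compare several methods.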

Medium Test

imputeTestbench is a great package for comparing various methods of imputation. A few updates I would suggest:

  • Introducing new sampling methods, such as Monte Carlo sampling.

  • Making the package interactive.

  • Developing a Shiny app for the package.
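One possible reading of the Monte Carlo suggestion is to repeat the random removal of values many times and report the spread of the error instead of a single number. The base-R sketch below illustrates that idea on a synthetic series. The function names `one_draw` and `mc_rmse` are hypothetical and not part of imputeTestbench.

```r
# Synthetic stand-in series (an assumption, for illustration only).
set.seed(1)
n <- 300
x <- 283 + 5 * sin(2 * pi * seq_len(n) / 24) + rnorm(n, sd = 0.3)

# One Monte Carlo draw: hide `frac` of the interior points at random,
# fill them by linear interpolation, and return the RMSE on those points.
one_draw <- function(x, frac) {
  idx <- sample(2:(length(x) - 1), size = floor(frac * length(x)))
  masked <- x
  masked[idx] <- NA
  obs <- which(!is.na(masked))
  filled <- approx(obs, masked[obs], xout = seq_along(x))$y
  sqrt(mean((x[idx] - filled[idx])^2))
}

# Repeat the draw many times and summarise the distribution of the error.
mc_rmse <- function(x, frac, n_draws = 200) {
  draws <- replicate(n_draws, one_draw(x, frac))
  c(mean  = mean(draws),
    sd    = sd(draws),
    lower = unname(quantile(draws, 0.025)),
    upper = unname(quantile(draws, 0.975)))
}

mc_rmse(x, frac = 0.3)
```

Reporting an interval alongside the mean shows whether a difference between two imputation methods is larger than the variation caused by where the gaps happen to fall.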

Hard Test

For the final test, I created an R package called Calculator. It contains five functions, add(), substract(), multiply(), divide() and modulo(), with documentation and tests. I ran devtools::check() to look for errors and warnings, then uploaded the source package to https://win-builder.r-project.org/. It passed with no errors, warnings, or notes.

The result from https://win-builder.r-project.org/ is included in the GitHub repository (00check.log).
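The package source is not reproduced here, so the following is only a sketch of how five such functions could look with roxygen2-style documentation. The function names come from the text above. The bodies, the parameter names, and the zero-divisor check in `divide()` are assumptions.

```r
#' Add two numbers
#' @param x,y Numeric vectors.
#' @return The sum of x and y.
#' @export
add <- function(x, y) x + y

#' Subtract one number from another
#' @param x,y Numeric vectors.
#' @return The difference x - y.
#' @export
substract <- function(x, y) x - y

#' Multiply two numbers
#' @param x,y Numeric vectors.
#' @return The product of x and y.
#' @export
multiply <- function(x, y) x * y

#' Divide one number by another
#' @param x,y Numeric vectors; y must not contain zero.
#' @return The quotient x / y.
#' @export
divide <- function(x, y) {
  # Failing early on a zero divisor is an assumption of this sketch.
  if (any(y == 0)) stop("division by zero")
  x / y
}

#' Remainder of a division
#' @param x,y Numeric vectors.
#' @return The remainder of x divided by y.
#' @export
modulo <- function(x, y) x %% y

# Quick checks of the kind a package test file might contain.
stopifnot(add(2, 3) == 5, substract(5, 3) == 2, multiply(4, 3) == 12)
stopifnot(divide(6, 3) == 2, modulo(7, 3) == 1)
```

As an alternative to uploading the source file by hand, devtools also provides check_win_release() and check_win_devel(), which build the package and submit it to win-builder from the R session.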
