fengxu-pku / intro_to_stats_r

This project forked from dejohnam/intro_to_stats_r

A series of tutorials for R (data included). Starts with an introduction to R and works all the way to spatial regressions.

Introduction to (some spatial) statistics in R!

A series of tutorials for R (data included). Starts with an introduction to R and works all the way to spatial regressions. Below is a list of the files included here, the learning goals for each, and how to access the data required to run each lab. Data sources are credited within the code.

What you'll find in the tutorial folder:

Tutorial_1_code.R

This introductory tutorial covers the basics of R and is meant to get novice students acquainted with coding in RStudio. They'll do simple math, assign variables, and play around with different data structures. No data are needed for this lab.
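For a feel of what this first lab involves, here is a minimal sketch in base R. It is not taken from Tutorial_1_code.R; all object names and values are made up for illustration.

```r
# Simple math: R follows the usual order of operations
total <- 3 + 4 * 2            # multiplication happens first, so total is 11

# Assigning variables with the <- operator
temps <- c(12.5, 15.0, 9.8)   # a numeric vector (hypothetical values)
towns <- c("A", "B", "C")     # a character vector

# A few common data structures
weather <- data.frame(town = towns, temp = temps)     # table with named columns
station <- list(name = "example", readings = temps)   # list holding mixed types

# Functions operate on whole vectors at once
mean_temp <- mean(temps)
print(total)
print(weather)
print(mean_temp)
```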

Tutorial_2_code.R

Tutorial 2 uses the Groundhog Day temperatures dataset. The code filters the dataset, creates new variables, and explores central tendency via boxplots, quantiles, etc. NOTE: we used the ggplot package for histograms and this was too difficult for students. I suggest using the base hist() function instead.
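As a rough illustration of the base hist() route suggested above, the sketch below filters a data frame, adds a new variable, and summarizes it. The data are simulated and the column names (year, feb_temp_f) are assumptions, not the real Groundhog Day variables.

```r
# Simulated stand-in for the Groundhog Day data (column names are hypothetical)
set.seed(1)
groundhog <- data.frame(
  year       = 1921:2020,
  feb_temp_f = rnorm(100, mean = 30, sd = 6)
)

# Filter the dataset to recent years
recent <- subset(groundhog, year >= 1971)

# Create a new variable: convert Fahrenheit to Celsius
recent$feb_temp_c <- (recent$feb_temp_f - 32) * 5 / 9

# Base R histogram, which needs no extra packages
h <- hist(recent$feb_temp_c,
          breaks = 10,
          main = "February temperatures (simulated)",
          xlab = "Temperature (C)")

# Boxplot and quartiles for the same variable
boxplot(recent$feb_temp_c, ylab = "Temperature (C)")
q <- quantile(recent$feb_temp_c, probs = c(0.25, 0.5, 0.75))
print(q)
```

hist() returns its bin counts invisibly, so assigning the result (here to h) lets students inspect the numbers behind the plot.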

Tutorial_3_code.R

Tutorial 3 also uses the Groundhog Day temperatures dataset. This time, students will be guided through measuring skewness, z-scores, and working with probabilities. Note that the script shows students how to break down formulas instead of using R functions (e.g., skewness()).
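Breaking a formula down by hand might look like the sketch below. It uses one common moment-based definition of skewness, which may differ from the variant used in the tutorial script, and the data values are invented.

```r
# Made-up sample with a long right tail
x <- c(2, 4, 4, 5, 7, 9, 15)

n     <- length(x)
x_bar <- mean(x)
s     <- sd(x)                     # sample standard deviation (n - 1 denominator)

# z-scores: distance from the mean in standard deviation units
z <- (x - x_bar) / s

# Moment-based skewness: third central moment over the
# second central moment raised to the power 3/2
m2   <- sum((x - x_bar)^2) / n
m3   <- sum((x - x_bar)^3) / n
skew <- m3 / m2^(3 / 2)            # positive here, because of the right tail

# Probability of a value below 10, assuming a normal distribution
# with the sample mean and standard deviation
p_below <- pnorm(10, mean = x_bar, sd = s)

print(z)
print(skew)
print(p_below)
```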

Tutorial_4_code.R

This tutorial guides students through simple hypothesis testing. Students are also provided a CSV with messy variable names to clean. I've included the messy variables in the Groundhog Day file (MustUpdate), as well as the cleaned file that will work in the code (v2). More information on where you can find the data is below.
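The students cleaned the variable names in Excel; as an alternative, the same kind of cleanup can be sketched in R before running a simple test. The messy names, the values, and the clean_names() helper below are all hypothetical.

```r
# Small made-up table with messy column names
messy <- data.frame(
  c(1990, 1991, 1992, 1993, 1994, 1995),
  c(31.2, 28.4, 35.1, 30.0, 33.3, 29.5)
)
names(messy) <- c(" Year ", "Feb Temp (F)")

# Hypothetical helper: lower-case, trim, and replace runs of
# non-alphanumeric characters with a single underscore
clean_names <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)          # drop leading and trailing underscores
}
names(messy) <- clean_names(names(messy))
print(names(messy))

# One-sample t-test: is the mean temperature different from 32 F?
result <- t.test(messy$feb_temp_f, mu = 32)
print(result)
```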

Tutorial_5_code.R

Tutorial 5 uses a fictional dataset about two towns (Metropolis and Townsville). Students are guided through various hypothesis tests, including paired and unpaired t-tests, the Wilcoxon test, and a difference of proportions test.
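The tests named above all have base R functions. The sketch below runs each one on simulated numbers, since the TwoTowns variables are not reproduced here; every name and value is an assumption.

```r
set.seed(42)

# Simulated commute times (minutes) for two independent groups
metropolis <- rnorm(30, mean = 52, sd = 8)
townsville <- rnorm(30, mean = 47, sd = 8)

# Unpaired t-test (Welch's version is the default in t.test)
unpaired <- t.test(metropolis, townsville)

# Paired t-test: the same 20 people measured twice
before <- rnorm(20, mean = 50, sd = 5)
after  <- before + rnorm(20, mean = 1, sd = 2)
paired <- t.test(before, after, paired = TRUE)

# Wilcoxon rank-sum test, a non-parametric alternative to the unpaired t-test
ranksum <- wilcox.test(metropolis, townsville)

# Difference of proportions: 45 of 100 versus 30 of 100 (invented counts)
props <- prop.test(x = c(45, 30), n = c(100, 100))

print(unpaired$p.value)
print(paired$p.value)
print(ranksum$p.value)
print(props$estimate)
```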

Tutorial_6_code.R

This tutorial uses datasets provided by R and reviews hypothesis testing from tutorial 4. ANOVA was NOT included, but I think this would be a good place to introduce it if you plan to cover that concept.
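If you do add ANOVA here, a one-way example on a dataset that ships with R could be sketched as follows. This is not part of Tutorial_6_code.R, and PlantGrowth is simply one convenient built-in dataset.

```r
# PlantGrowth ships with R: plant weights under a control and two treatments
data(PlantGrowth)

# One-way ANOVA: does mean weight differ across the three groups?
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)
print(anova_table)

# Follow-up pairwise comparisons with Tukey's honest significant differences
tukey <- TukeyHSD(fit)
print(tukey)
```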

Tutorial_7_code.R

Tutorial 7 covers correlation and simple linear regression. It uses data generated by UofT PhD candidate Jeff Allen (which I modified to be a smaller dataset). You could easily use a different data source and integrate into the code.
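To show how little the code depends on the data source, here is a minimal sketch of correlation and simple linear regression using the built-in mtcars dataset as a stand-in for the Toronto data.

```r
# mtcars ships with R; wt is car weight and mpg is fuel efficiency
data(mtcars)

# Pearson correlation between the two variables
r <- cor(mtcars$wt, mtcars$mpg)

# Simple linear regression: mpg as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit)

print(r)
print(coef(fit))
print(fit_summary$r.squared)

# Scatterplot with the fitted line
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
abline(fit)
```

With a single predictor, the squared correlation equals the R-squared of the regression, which is a useful check for students.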

Tutorial_8_code.R

Tutorial 8 continues the regression analysis that we started in tutorial 7 and concludes with spatial autocorrelation. In order to map residuals and continue the analysis, you will need the Toronto DA files. These files, along with the original CSV, are included in this repository. Please check the code to ensure the maps are generating properly.
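To make the idea of spatial autocorrelation concrete, the sketch below computes Moran's I by hand on a toy 4 x 4 grid. It is a package-free illustration under invented data; the tutorial script may rely on a spatial package and on the Toronto DA polygons instead. The morans_i() helper is hypothetical.

```r
# Toy 4 x 4 grid of cells standing in for spatial units
coords <- expand.grid(row = 1:4, col = 1:4)

# Rook adjacency: cells are neighbours when they share an edge,
# i.e. their Manhattan distance is exactly 1
d <- as.matrix(dist(coords, method = "manhattan"))
W <- (d == 1) * 1

# Moran's I = (n / sum of weights) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2),
# where z is the variable centred on its mean
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# A smooth gradient: neighbouring cells have similar values
gradient <- coords$row
# A checkerboard: every neighbour has the opposite value
checker <- (coords$row + coords$col) %% 2

i_gradient <- morans_i(gradient, W)   # positive autocorrelation
i_checker  <- morans_i(checker, W)    # perfect negative autocorrelation
print(i_gradient)
print(i_checker)
```

In the lab, the same calculation would be applied to regression residuals rather than to a made-up pattern.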

Tutorial_9_code.R

This tutorial covers spatial regression and model selection. It continues to use the Toronto DA travel data. Please note we had some issues with the mapping code for some students, so I'd be sure to check all the mapping code for this week to ensure it works properly. VERY IMPORTANT: some of these models took several minutes to run. In the code, I've included the error and lag model outputs you can read in as text files in case you don't have time. The code to run the model is also included.
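The pattern of saving slow model output and reading it back as text can be sketched as below. An ordinary lm() fit stands in for the spatial error and lag models, and the file path is a temporary placeholder, not one of the .txt files in the data folder.

```r
# Stand-in model: the real tutorial fits spatial error and lag models
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Capture the printed summary as lines of text and save it
saved <- capture.output(summary(fit))
out_path <- file.path(tempdir(), "model_output_example.txt")  # hypothetical name
writeLines(saved, out_path)

# Later, or on a slower machine: read the stored output instead of refitting
stored <- readLines(out_path)
cat(stored, sep = "\n")

# Model selection sketch: compare a simpler model by AIC (lower is better)
fit_small <- lm(mpg ~ wt, data = mtcars)
aic_table <- AIC(fit_small, fit)
print(aic_table)
```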

What you'll find in the data folder:

GroundhogDay

All data (including dictionaries) for use in tutorials 2, 3, and 4. One version (MustUpdate) includes intentionally messed-up variable names that we had students edit in Excel. v2 is cleaned and will work in the code.

TwoTowns

A fictional dataset I created for use in tutorial 5. Dictionary included.

Toronto_transport

Data collected by the Transportation Tomorrow Survey, which Jeff Allen cleaned and I cut down immensely for use in tutorials. Thanks to Jeff for sharing his data! I also include two shapefiles for mapping and analysis purposes. toronto_da_centroids.zip is a point file while toronto_da_XY.zip is a polygon file. These files are quite large, so I suggest finding another dataset for use in these tutorials (7, 8, 9). For tutorial 9, some models took several minutes to run. For this reason, I've included .txt files of the model outputs to save time in the tutorial. These are included in this folder.
