neonscience / neon-data-skills

Self-paced tutorials that review key data literacy concepts and data analysis skills. Published materials can be found at:

Home Page: https://www.neonscience.org/resources/learning-hub/tutorials

License: GNU Affero General Public License v3.0

Languages: HTML 81.35%, R 1.61%, Jupyter Notebook 16.18%, Python 0.85%, PHP 0.01%

Topics: tutorial, r, neon, data-skills, ecology, python

neon-data-skills's Introduction

Welcome to the NEON Data Skills GitHub Repo!

NEON Data Skills provides tutorials and resources for working with scientific data, including data collected by the National Ecological Observatory Network (NEON). NEON is an ecological observatory that will collect and provide open data for 30 years.

For more information on NEON, visit the website: www.neonscience.org

This repo contains the materials used to build the NEON Data Skills resources that are available for use on the NEON Data Skills section of the NEON website.

Version 2.0: Note that as of 11/20/2020 NEON has updated this repo to version 2.0 with several changes. This transition coincides with the upgrade to the www.neonscience.org website. Most notably, the default branch has been changed from 'master' to 'main'; the 'master' branch is now deprecated, renamed to 'old-master', and will not be maintained. The new 'main' branch has been re-organized so that each tutorial markdown file (*.md) and all of its associated files (*.Rmd/*.ipynb, *.r/*.py, *.html, and code-generated figures) are now contained within a single directory under /tutorials/, rather than distributed across multiple high-level directories (/code/, /graphics/, etc.).

Usage

Contributing

If you would like to make a change to one of the resources, please fork the repo and open a pull request against the main branch with the suggested change. All pull requests are reviewed for scientific accuracy and pedagogy. Individuals who make significant contributions can request to be listed as contributors on individual tutorials.

Please start by copying the template (/tutorials-in-development/0_templates_style_guide/tutorial-template-.) if you are creating a new tutorial.

Questions?

Having a problem getting something to work, or want to know why the repo is set up a certain way? Email us at neondataskills -AT- BattelleEcology.org or file a GitHub issue.

Credits & Acknowledgements

The National Ecological Observatory Network is a project solely funded by the National Science Foundation and managed under cooperative agreement by Battelle. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

License

GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Disclaimer

Information and documents contained within this repository are available as-is. Code and documents, and their use, may not be supported or maintained under any program or service and may not be compatible with data currently available from the NEON Data Portal.

neon-data-skills's People

Contributors: adernbach, benonearth, bhass-neon, bnasr, bridgethass, burnermax37, chrlaney, cklunch, coolinwilliams, ddurden, donal-at-neon, douevencoder, fe-sa, garrettmw, hughcross, jbrown7659, kexu2014, khufkens, kmccahill, lwasser, maxheld83, mayastahl, mmistakes, morganejones, mpmugnani, naupaka, neondataskills, sarapaull, scelmendorf, sokole

neon-data-skills's Issues

A few thoughts on the hyperspectral lesson

Hi @lwasser

I had just a few thoughts on this lesson, take them or leave them:

  • Would it be worth showing another way to set a raster extent (e.g., show setting the extent with and without having to scale the data)?
  • Maybe we could try to bring it all together by saying that you are going to have to do this every time you try to look at a dataset, so you should always start by getting the required info when you load the data (e.g., xMin, yMin, etc.)
  • Kinda along the same lines - you could start the lesson by prefacing that these are the things we are going to need to get from the data in order to project a raster correctly in R: min/max X/Y, extent, etc. I think a bit more focus on the fact that these are needed by R, as it is not inherently a spatial application, might help people initially understand why they need to do all of this stuff.
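To make the bookkeeping concrete: the extent of a north-up raster can be derived from just its origin, pixel size, and array shape. The following Python function is an illustrative sketch only, not code from the tutorial; all names (raster_extent, x_min, etc.) are invented.

```python
def raster_extent(x_min, y_max, pixel_size, n_rows, n_cols):
    """Return (xMin, xMax, yMin, yMax) for a north-up raster.

    x_min, y_max: map coordinates of the upper-left corner
    pixel_size:   ground size of one (square) pixel
    n_rows/cols:  array dimensions
    """
    x_max = x_min + n_cols * pixel_size
    y_min = y_max - n_rows * pixel_size
    return (x_min, x_max, y_min, y_max)

# e.g. a 1000 x 1000 raster of 1 m pixels anchored at (254570, 4112362)
extent = raster_extent(254570.0, 4112362.0, 1.0, 1000, 1000)
```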

Cleanup Activities

  1. Remove / delete the following files

A org/category/coding-and-informatics.md
A org/category/remote-sensing.md
A org/tag/Data-Workshops.md
A org/tag/Workshop-Trials.md
A index copy.md

Remove all files in this dir (put them into the lesson development repo for the time being):

_posts/IGNORE-FOR-NOW-cheatSheets

Comments on newest versions or lessons for ESA

Raster Data in R:

  • How difficult would it be to take the NLCD class graphic at the beginning and add a blowup of one small region that shows the actual grid at finer scale (e.g., exampleGraphic.ppt emailed)?
  • Typo right after UTM graphic: You can also view the rasters min and max values and the range of values containes within the pixels.

Create Canopy Height Model:

"# Create a function that subtracts one raster from another
canopyCalc <- function(x, y) {
    return(x - y)
}

# use the function to create the final CHM, then plot it.
# You could use the overlay function here:
chm <- overlay(dsm, dtm, fun = canopyCalc)
# but you can also perform matrix math to get the same output:
chm <- canopyCalc(dsm, dtm)"

It is actually a bit confusing as it follows the simplest way to do the calculation (dsm-dtm). Why not simplify and delete this?

  • In the intro paragraph to part 2 you have a typo in "To figure tihs out"
  • There needs to be a bit more background on: 1) why you're using both vst and tree height data, and what vst data actually are; 2) why you calculate the max height for each plot for the comparison, as opposed to just getting the height measurement at the pixel corresponding to the CHM measurement and comparing that to the CHM measurement.
  • Your description of calculating data from plots mentions circular plots but the graphic shows square plots (until variation 3 at least, but it might be more intuitive to talk about circular plots until you introduce the code to make square buffers)...
  • In variation 2, why is the code commented out?
  • The commented-out code for calculating max stem height in option 2 does not work, because the object 'insitu_inCentroid$stemheight' is not recognized (it's used throughout the code block).

Intro to HDF5:

  • Maybe a brief intro on what HDF5 is and why we would want to use it (e.g., a regular directory on your hard drive can also store heterogeneous datasets, so what is the impetus for an HDF5 file?)
  • Just a preference, but to me it might make more sense to assign hdf attributes with the name first, then the attribute. Like: h5writeAttribute(did2,name="Location",attr="SJER"). Not a big deal, it just seems a little more intuitive to name the thing first.

check out links on hyperspectral h5 page

HDF5 hyperspectral lesson -- look at the last commit. we may need to adjust some of these links but should wait until we have the new file structure in place!

Join function in pheno-time series tutorial

Hello,

I'm coming up with this warning message when I try to join data frames. I don't know what this means or whether the command was successful.

# Create a new dataframe "phe_ind" with all the data from status and some from ind_lastnoD

phe_ind <- left_join(status_noD, ind_lastnoD)
Joining, by = c("namedLocation", "domainID", "siteID", "plotID", "individualID")
Warning messages:
1: Column namedLocation joining character vector and factor, coercing into character vector
2: Column domainID joining character vector and factor, coercing into character vector
3: Column siteID joining character vector and factor, coercing into character vector
4: Column plotID joining character vector and factor, coercing into character vector
5: Column individualID joining character vector and factor, coercing into character vector
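For what it's worth, the warnings just mean the join succeeded after coercing the factor key columns to character so the types match on both sides. A loose Python/pandas analogue (the frames, columns, and values here are invented for illustration) is aligning key dtypes before merging:

```python
import pandas as pd

# One frame has a plain string key, the other a categorical key
status = pd.DataFrame({"siteID": ["HARV", "SJER"], "flag": [0, 1]})
ind = pd.DataFrame({"siteID": pd.Categorical(["HARV", "SJER"]),
                    "height": [10.2, 7.5]})

# Convert the categorical key to string so both join keys have one type,
# the same fix the dplyr warning is applying automatically
ind["siteID"] = ind["siteID"].astype(str)
merged = status.merge(ind, on="siteID", how="left")
```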

hyperspectral lesson - some math weirdness in the example

Full Width Half Max (FWHM)

The full width half max (FWHM) will also often be reported in a multi or hyperspectral dataset. This value represents the spread of the band around that center point.

The Full Width Half Max (FWHM) of a band relates to the distance in nanometers between the band center and the edge of the band. In this case, the FWHM for Band C is 2.5 nm.
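The weirdness: FWHM is, by definition, the full width of the band response at half its maximum, so the distance from the band center to the half-max edge is FWHM/2, not the FWHM itself (if center-to-edge is 2.5 nm, the FWHM is 5 nm). A small Python sketch with invented numbers, assuming a Gaussian band response:

```python
import numpy as np

# A Gaussian band centered at 500 nm with FWHM = 5 nm
# (so center-to-edge at half max is 2.5 nm)
center_nm = 500.0
fwhm_nm = 5.0
sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std dev

wavelengths = np.linspace(490, 510, 2001)  # 0.01 nm grid
response = np.exp(-0.5 * ((wavelengths - center_nm) / sigma) ** 2)

# Measure the width of the region where the response is >= half its max:
# it spans the FULL width, i.e. 2 x 2.5 nm = 5 nm
above_half = wavelengths[response >= 0.5]
measured_fwhm = above_half[-1] - above_half[0]
```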

Add Actual precip data to RHDF5 lesson

Rather than making up some data - it would be good to import a csv and add it to the H5 file.

Also add a "create HDF5 lesson" part one that is very simple.

  1. import a csv.
  2. create the H5 file
  3. add the csv to the H5 file
  4. add some metadata to the H5 file

Done.
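The four numbered steps above can be sketched with pandas and h5py. This is an illustrative sketch only, not tutorial code; the file names and the precip_mm column are invented:

```python
import pandas as pd
import h5py

# 1. import a csv (a tiny one is fabricated here so the sketch runs end-to-end)
pd.DataFrame({"precip_mm": [0.0, 1.2, 3.4]}).to_csv("precip.csv", index=False)
precip = pd.read_csv("precip.csv")

# 2. create the H5 file and 3. add the csv data to it
with h5py.File("precip.h5", "w") as f:
    dset = f.create_dataset("precip_mm", data=precip["precip_mm"].values)
    # 4. add some metadata, stored as HDF5 attributes on the dataset
    dset.attrs["units"] = "millimeters"
    dset.attrs["source"] = "precip.csv"
```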

Make the looping optional in part 2; it's way too complicated as written.

Adjust category and tag structure

informatics and programming

Remote sensing science

GIS (geographic information systems)

Tags
Lidar
Imaging spectroscopy
Hdf 5
Ecological analysis
R programming
Python

Errors in file tutorials/R/biodiversity/aquatic-macroinvertebrates/02_ecocomDP_workflow_with_NEON_algae/02_ecocomDP_workflow_with_NEON_algae.R

As part of automated testing, the following errors were found (some of these are because of function overrides due to the order in which packages were loaded in my environment, but this will occur for other users as well):

Lines: 45; 48; 51; 54-56; 87

  • return NULL

Lines: 87-90; 92-94; 103

  • return "... $ operator is invalid for atomic vectors".
  • To fix this, remove "[[1]]"

Line 145

  • returns "Error: by must be supplied when x and y have no common variables."
  • This is because the object created on line 141 ('my_data_summed') has a single column and a single value. To fix this, update line 143 to read "dplyr::summarize(value = sum(value, na.rm = FALSE))"

Line 175

  • returns "unable to find an inherited method for function ‘select’ for signature ‘"data.frame"’"
  • Fix by updating line 177 to "dplyr::select(event_id, taxon_id, value)"

Lidar change detection

get in touch with Tristan about Boulder data. It would be cool to put a change detection series together since he's done the work already

curly bracket placement issue for automated script testing

@cklunch, can you move the opening curly bracket for a function onto the same line as the function call, as described below? My automated testing script needs to identify functions and this is very difficult if the curly bracket is not on the same line as the call.

Thank you!

File: ~tutorials/R/Geospatial-skills/primer-raster-r/Creating-Square-Plot-Boundaries-From-Centroids-in-R/Creating-Square-Plot-Boundaries-From-Centroids-in-R.R (and associated rmd, html, etc. files)

Line: 55

OLD code:
polys <- SpatialPolygons(mapply(function(poly, id)
{
xy <- matrix(poly, ncol=2, byrow=TRUE)
Polygons(list(Polygon(xy)), ID=id)
},
split(square, row(square)), ID),
proj4string=CRS(as.character("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")))

NEW code request:
polys <- SpatialPolygons(mapply(function(poly, id){
xy <- matrix(poly, ncol=2, byrow=TRUE)
Polygons(list(Polygon(xy)), ID=id)
},
split(square, row(square)), ID),
proj4string=CRS(as.character("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")))

I am also happy to make the change and submit a PR, I just need to quickly walk through the procedure for creating all the files properly.

Update Tutorials to Match New NEON Data

Many of these tutorials are built on data using older NEON file structures.
Options:

  • update the associated data and the tutorial to match current file structures as they should now be (more) stable.
  • make notes at the top of each tutorial for which structure no longer matches the data on data.neonscience.org
  • do nothing.

Where do people start

Community Feedback:
"My one problem is there's no real guidance on recommended order (if there is one). I'm not really sure where to get started or if there is even any sort of linear path through all of the lessons. It might benefit from a "start here", especially for those that are new to all of the topics."


We need to address the use case of those coming to the site who don't necessarily have a path in mind, but want to start working with data. Perhaps we create a few lesson set "workshop" like pages as a test?

HDF-5 temp data - is the datetime conversion wrong?

Looking at the temp data figure, it seems like the min temp is reached each day around noon?
I think there are timezone issues here.

  1. tz EST is not in fact eastern standard time. See:
    http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
    What we want for a FL site is in fact called 'America/New_York'
  2. Data is stored as UTC (GMT)
  3. Then we need to convert it to local (eastern) time to make the diurnal fluctuations make sense. Suggested code improvement:
#read in as UTC
temp$date <- as.POSIXct(temp$date ,format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
#convert to local time for pretty plotting
attributes(temp$date)$tzone <- "America/New_York" 
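For readers following along in Python, the same parse-as-UTC-then-convert pattern can be sketched with pandas (the timestamps below are invented for illustration):

```python
import pandas as pd

# Parse the timestamps as UTC, then convert to the named local zone so the
# diurnal minima and maxima land at sensible local hours
dates = pd.to_datetime(["2014-06-01 16:00:00", "2014-06-02 04:00:00"])
dates_utc = dates.tz_localize("UTC")
dates_local = dates_utc.tz_convert("America/New_York")
# In June this zone is EDT (UTC-4), so 16:00 UTC becomes 12:00 local
```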

@lwasser

R markdown Chunk names

Hey Leah,
I noticed in Intro-HDF5-R.Rmd that the r chunks don't have descriptive names. These help one navigate the chunks easily in the drop-down menus in RStudio. Did you intentionally omit them due to compatibility issues with the website (wild guess)? If you like, I can add some names to the Rmd file and send a pull request. Just let me know.

Thanks!
Kate

minor fixes needed for ~/R/Raster-Data-In-R/

[ ] - correct San Joachim to San Joaquin
[ ] - example output for 'DEM <- raster("DigitalTerrainModel/SJER2013_DTM.tif")' looks like it is actually for the DTMHill version of the file

Knitr Script: Fix extension in filename quirk

The knitr scripts throughout the repo(s) need the rmd.files <- ... assignment changed (~line 96).
Change to:

rmd.files <- list.files(file.path(gitRepoPath, postsDir), pattern="\\.Rmd$", full.names = TRUE )

It is currently pattern="*.Rmd".

The problem is that any time a file with "Rmd" in the file name (not extension; e.g. 2016-06-10-Rmd03-RMarkdown.md) is in a folder being knit, it will also be re-knit.
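The underlying issue is glob vs. regex: list.files() interprets pattern as a regular expression, not a shell glob, so the unanchored "*.Rmd" habit effectively matches any character followed by "Rmd" anywhere in the name. The same over-matching can be demonstrated with Python's re module (file names invented for illustration):

```python
import re

files = ["intro.Rmd", "2016-06-10-Rmd03-RMarkdown.md", "notes.txt"]

# Unanchored, unescaped: "." matches any character, so any name
# containing "Rmd" matches
loose = [f for f in files if re.search(r".Rmd", f)]
# loose -> ['intro.Rmd', '2016-06-10-Rmd03-RMarkdown.md']

# Escaped dot and end anchor: only true .Rmd extensions match
strict = [f for f in files if re.search(r"\.Rmd$", f)]
# strict -> ['intro.Rmd']
```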

set min max

If a raster doesn't show the min/max values when you type in the object name, you can force it to compute them:

DEM <- setMinMax(DEM)

add this to the raster lessons.

Global Environment Disappeared

I'm not sure I'm leaving this in the correct spot, but my global environment disappeared and the subsequent data sets show as "object no longer exists".

I tried reloading my ".RData" file into my workspace but that hasn't yielded anything. See below:

[Workspace loaded from ~/School/Research/RMNP Phenology Temp/RMNP Gymno Phenology Temp/.RData]

load("~/School/Research/RMNP Phenology Temp/RMNP Gymno Phenology Temp/.RData")

I was in the process of working through the below code from the Pheno-Temp Tutorial and my global environment vanished.

quality flags

finalQf

Error: object 'finalQf' not found

sum(YELL.Pheno$finalQF)
[1] 0
sum(RMNP.Temp1$finalQf)
[1] 0

What about null (NA) data?

# Are there NA's in your data? Count 'em up
> sum(is.na(RMNP.Temp1$tempSignleMean))
[1] 0
> # Are there NA's in your data? Count 'em up
> sum(is.na(YELL.Temp$tempSingleMean))
[1] 0

na.rm=TRUE tells R to ignore NA values when making calculations

# create new dataframe without NAs
YELL.Temp_noNA <- YELL.Temp 
drop_na(tempSingleMean)  # tidyr function
RMNP.Temp_noNA <- RMNP.Temp1
drop_NA(tempSingleMean)

> sum(is.na(YELL.Temp_noNA$tempSingleMean))
[1] 0
> sum(is.na(RMNP.Temp1_noNA$tempSingleMean))
Error: object 'RMNP.Temp1_noNA' not found
> sum(is.na(RMNP.Temp_noNA$tempSingleMean))
[1] 0

downloading files prior to the workshop - NEON-HDF5-HyperspectralImagery-In-R/

Very minor issue, but it's a little weird how some of the files to download are in the pre-workshop materials/prep area, and others are inside the individual half hour lessons. Maybe put all up front or all inside the lessons? I think people could get confused thinking they already downloaded the stuff they need. Or didn't.

No "subset_clean_refl" function in NEON AOP HDF5 Python Functions

Hi @bridgethass,

I have tried a tutorial on NDVI calculation, and apparently it would not run correctly due to the absence of the function named "subset_clean_refl" in the neon_aop_refl_hdf5_functions.py file.

I believe something like this should be added:

def subset_clean_refl(reflArray, reflArray_metadata, clipIndex):
    reflCleaned = reflArray[clipIndex['yMin']:clipIndex['yMax'], clipIndex['xMin']:clipIndex['xMax']].astype(np.float)
    reflCleaned[reflCleaned == int(reflArray_metadata['noDataVal'])] = np.nan
    reflCleaned = reflCleaned / reflArray_metadata['scaleFactor']
    return reflCleaned

Could you please update the above-mentioned .py file with the function definition?

h5 read throwing error

In the HDF5 code, the following error occurs... should follow up with Ted or look into it:

http://stackoverflow.com/questions/23098355/h5read-crashes-with-large-strings

g <- paste(fiu_struct[2,1:2],collapse="/")
h5metadata(f,g,fiu_struct$num_attrs[2])
Warning: h5read for variable length strings not yet implemented. Replacing strings by NA's
Error in H5Aread(a) : length-0 dimension vector is invalid

Build Error - Set Working Directory Page

Hi @mjones01
I'm working on this site and there is a build error in a page that you pushed. I think some of the includes may be incorrect. Please fix, build locally to test that things are working, then submit a PR for me to review.
Specific Error:

The page build failed with the following error:

A file was included in `_posts/R-stats/2016-01-08-Set-Working-Directory-In-R.md` that is a symlink or does not exist in your `_includes` directory. For more information, see https://help.github.com/articles/page-build-failed-file-is-a-symlink.

Build errors mean that new content added to the site will not render.
Don't worry about this this week - focus on the teacher training.
Please complete this task by next Tuesday, 20 Jan or get in touch if you need more time.
Thank you!
Leah
