neonscience / neon-data-skills

Self-paced tutorials that review key data literacy concepts and data analysis skills. Published materials can be found at:

Home Page: https://www.neonscience.org/resources/learning-hub/tutorials

License: GNU Affero General Public License v3.0

Languages: HTML 81.35%, R 1.61%, Jupyter Notebook 16.18%, Python 0.85%, PHP 0.01%

Topics: tutorial, r, neon, data-skills, ecology, python

neon-data-skills's Introduction

Welcome to the NEON Data Skills GitHub Repo!

NEON Data Skills provides tutorials and resources for working with scientific data, including data collected by the National Ecological Observatory Network (NEON). NEON is an ecological observatory that will collect and provide open data for 30 years.

For more information on NEON, visit the website: www.neonscience.org

This repo contains the materials used to build the NEON Data Skills resources that are available for use on the NEON Data Skills section of the NEON website.

Version 2.0: Note that as of 11/20/2020 NEON has updated this repo to version 2.0 with several changes. This transition coincides with the upgrade to the www.neonscience.org website. Most notably, the default branch has been changed from 'master' to 'main'; the 'master' branch is now deprecated, renamed to 'old-master', and will not be maintained. The new 'main' branch has been re-organized so that each tutorial markdown file (*.md) and all of its associated files (*.Rmd/*.ipynb, *.r/*.py, *.html, and code-generated figures) are now contained within a single directory under /tutorials/, rather than distributed across multiple high-level directories (/code/, /graphics/, etc.).

Usage

Contributing

If you would like to make a change to one of the resources, please fork the repo and open a pull request against the main branch with the suggested change. All pull requests are reviewed for scientific accuracy and pedagogy. Individuals who make significant contributions can request to be listed as contributors on individual tutorials.

Please start by copying the template (/tutorials-in-development/0_templates_style_guide/tutorial-template-.) if you are creating a new tutorial.

Questions?

Having a problem getting something to work, or want to know why the repo is set up a certain way? Email us at neondataskills -AT- BattelleEcology.org or file a GitHub issue.

Credits & Acknowledgements

The National Ecological Observatory Network is a project solely funded by the National Science Foundation and managed under cooperative agreement by Battelle. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

License

GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Disclaimer

Information and documents contained within this repository are available as-is. Code and documents, and their use, may not be supported or maintained under any program or service and may not be compatible with data currently available from the NEON Data Portal.

neon-data-skills's People

Contributors: adernbach, benonearth, bhass-neon, bnasr, bridgethass, burnermax37, chrlaney, cklunch, coolinwilliams, ddurden, donal-at-neon, douevencoder, fe-sa, garrettmw, hughcross, jbrown7659, kexu2014, khufkens, kmccahill, lwasser, maxheld83, mayastahl, mmistakes, morganejones, mpmugnani, naupaka, neondataskills, sarapaull, scelmendorf, sokole

neon-data-skills's Issues

A few thoughts on the hyperspectral lesson

Hi @lwasser

I had just a few thoughts on this lesson, take them or leave them:

  • Would it be worth showing another way to set a raster extent (e.g., show setting the extent with and without having to scale the data)?
  • Maybe we could try to bring it all together by saying that you are going to have to do this every time you try to look at a dataset, so you should always start by getting the required info when you load the data (e.g., xMin, yMin, etc.)
  • Kinda along the same lines - you could start the lesson by prefacing that these are the things we are going to need to get from the data in order to project a raster correctly in R: min/max X/Y, extent, etc. I think a bit more focus on the fact that these are needed by R, as it is not inherently a spatial application, might help people initially understand why they need to do all of this stuff.
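To make the bookkeeping concrete: the extent of a north-up raster can be derived from just its origin, pixel size, and array shape. The following Python function is an illustrative sketch only, not code from the tutorial; all names (raster_extent, x_min, etc.) are invented.

```python
def raster_extent(x_min, y_max, pixel_size, n_rows, n_cols):
    """Return (xMin, xMax, yMin, yMax) for a north-up raster.

    x_min, y_max: map coordinates of the upper-left corner
    pixel_size:   ground size of one (square) pixel
    n_rows/cols:  array dimensions
    """
    x_max = x_min + n_cols * pixel_size
    y_min = y_max - n_rows * pixel_size
    return (x_min, x_max, y_min, y_max)

# e.g. a 1000 x 1000 raster of 1 m pixels anchored at (254570, 4112362)
extent = raster_extent(254570.0, 4112362.0, 1.0, 1000, 1000)
```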

Cleanup Activities

  1. Remove / delete the following files

A org/category/coding-and-informatics.md
A org/category/remote-sensing.md
A org/tag/Data-Workshops.md
A org/tag/Workshop-Trials.md
A index copy.md

Remove all files in this dir (put them into the lesson development repo for the time being):

_posts/IGNORE-FOR-NOW-cheatSheets

Comments on newest versions or lessons for ESA

Raster Data in R:

  • How difficult would it be to take the NLCD class graphic at the beginning and add a blowup of one small region that shows the actual grid at finer scale (e.g., exampleGraphic.ppt emailed)?
  • Typo right after UTM graphic: You can also view the rasters min and max values and the range of values containes within the pixels.

Create Canopy Height Model:

"# Create a function that subtracts one raster from another
canopyCalc <- function(x, y) {
    return(x - y)
}

# use the function to create the final CHM, then plot it.
# You could use the overlay function here:
chm <- overlay(dsm, dtm, fun = canopyCalc)
# but you can also perform matrix math to get the same output:
chm <- canopyCalc(dsm, dtm)"

It is actually a bit confusing as it follows the simplest way to do the calculation (dsm-dtm). Why not simplify and delete this?

  • In the intro paragraph to part 2 you have a typo in "To figure tihs out"
  • There needs to be a bit more background on: 1) why you're using both vst and tree height data, and what vst data actually are; 2) why you calculate the max height for each plot for the comparison, as opposed to just getting the height measurement at the pixel corresponding to the CHM measurement and comparing that to the CHM measurement.
  • Your description of calculating data from plots mentions circular plots but the graphic shows square plots (until variation 3 at least, but it might be more intuitive to talk about circular plots until you introduce the code to make square buffers)...
  • In variation 2, why is the code commented out?
  • The commented-out code for calculating max stem height in option 2 does not work, because the object 'insitu_inCentroid$stemheight' is not recognized (it's used throughout the code block).

Intro to HDF5:

  • Maybe a brief intro on what HDF5 is and why we would want to use it (e.g., a regular directory on your hard drive can also store heterogeneous datasets, so what is the impetus for an HDF5 file?)
  • Just a preference, but to me it might make more sense to assign hdf attributes with the name first, then the attribute. Like: h5writeAttribute(did2,name="Location",attr="SJER"). Not a big deal, it just seems a little more intuitive to name the thing first.

check out links on hyperspectral h5 page

HDF5 hyperspectral lesson -- look at the last commit. we may need to adjust some of these links but should wait until we have the new file structure in place!

Join function in pheno-time series tutorial

Hello,

I'm coming up with this warning message when I try to join data frames. I don't know what this means or whether the command was successful.

# Create a new dataframe "phe_ind" with all the data from status and some from ind_lastnoD

phe_ind <- left_join(status_noD, ind_lastnoD)
Joining, by = c("namedLocation", "domainID", "siteID", "plotID", "individualID")
Warning messages:
1: Column namedLocation joining character vector and factor, coercing into character vector
2: Column domainID joining character vector and factor, coercing into character vector
3: Column siteID joining character vector and factor, coercing into character vector
4: Column plotID joining character vector and factor, coercing into character vector
5: Column individualID joining character vector and factor, coercing into character vector
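For what it's worth, the warnings just mean the join succeeded after coercing the factor key columns to character so the types match on both sides. A loose Python/pandas analogue (the frames, columns, and values here are invented for illustration) is aligning key dtypes before merging:

```python
import pandas as pd

# One frame has a plain string key, the other a categorical key
status = pd.DataFrame({"siteID": ["HARV", "SJER"], "flag": [0, 1]})
ind = pd.DataFrame({"siteID": pd.Categorical(["HARV", "SJER"]),
                    "height": [10.2, 7.5]})

# Convert the categorical key to string so both join keys have one type,
# the same fix the dplyr warning is applying automatically
ind["siteID"] = ind["siteID"].astype(str)
merged = status.merge(ind, on="siteID", how="left")
```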

hyperspectral lesson - some math weirdness in the example

Full Width Half Max (FWHM)

The full width half max (FWHM) will also often be reported in a multi or hyperspectral dataset. This value represents the spread of the band around that center point.

The Full Width Half Max (FWHM) of a band relates to the distance in nanometers between the band center and the edge of the band. In this case, the FWHM for Band C is 2.5 nm.
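The weirdness: FWHM is, by definition, the full width of the band response at half its maximum, so the distance from the band center to the half-max edge is FWHM/2, not the FWHM itself (if center-to-edge is 2.5 nm, the FWHM is 5 nm). A small Python sketch with invented numbers, assuming a Gaussian band response:

```python
import numpy as np

# A Gaussian band centered at 500 nm with FWHM = 5 nm
# (so center-to-edge at half max is 2.5 nm)
center_nm = 500.0
fwhm_nm = 5.0
sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std dev

wavelengths = np.linspace(490, 510, 2001)  # 0.01 nm grid
response = np.exp(-0.5 * ((wavelengths - center_nm) / sigma) ** 2)

# Measure the width of the region where the response is >= half its max:
# it spans the FULL width, i.e. 2 x 2.5 nm = 5 nm
above_half = wavelengths[response >= 0.5]
measured_fwhm = above_half[-1] - above_half[0]
```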

Add Actual precip data to RHDF5 lesson

Rather than making up some data - it would be good to import a csv and add it to the H5 file.

Also add a "create HDF5 lesson" part one that is very simple.

  1. import a csv.
  2. create the H5 file
  3. add the csv to the H5 file
  4. add some metadata to the H5 file

Done.
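The four numbered steps above can be sketched with pandas and h5py. This is an illustrative sketch only, not tutorial code; the file names and the precip_mm column are invented:

```python
import pandas as pd
import h5py

# 1. import a csv (a tiny one is fabricated here so the sketch runs end-to-end)
pd.DataFrame({"precip_mm": [0.0, 1.2, 3.4]}).to_csv("precip.csv", index=False)
precip = pd.read_csv("precip.csv")

# 2. create the H5 file and 3. add the csv data to it
with h5py.File("precip.h5", "w") as f:
    dset = f.create_dataset("precip_mm", data=precip["precip_mm"].values)
    # 4. add some metadata, stored as HDF5 attributes on the dataset
    dset.attrs["units"] = "millimeters"
    dset.attrs["source"] = "precip.csv"
```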

Make the looping optional in part 2; it's way too complicated as written.

Adjust category and tag structure

informatics and programming

Remote sensing science

GIS (geographic information systems)

Tags
Lidar
Imaging spectroscopy
Hdf 5
Ecological analysis
R programming
Python

Errors in file tutorials/R/biodiversity/aquatic-macroinvertebrates/02_ecocomDP_workflow_with_NEON_algae/02_ecocomDP_workflow_with_NEON_algae.R

As part of automated testing, the following errors were found (some of these are because of function overrides due to the order in which packages were loaded in my environment, but this will occur for other users as well):

Lines: 45; 48; 51; 54-56; 87

  • return NULL

Lines: 87-90; 92-94; 103

  • return "... $ operator is invalid for atomic vectors".
  • To fix this, remove "[[1]]"

Line 145

  • returns "Error: by must be supplied when x and y have no common variables."
  • This is because the object created on line 141 ('my_data_summed') has a single column and a single value. To fix this, update line 143 to read "dplyr::summarize(value = sum(value, na.rm = FALSE))"

Line 175

  • returns "unable to find an inherited method for function ‘select’ for signature ‘"data.frame"’"
  • Fix by updating line 177 to "dplyr::select(event_id, taxon_id, value)"

Lidar change detection

get in touch with Tristan about Boulder data. It would be cool to put a change detection series together since he's done the work already

curly bracket placement issue for automated script testing

@cklunch, can you move the opening curly bracket for a function onto the same line as the function call, as described below? My automated testing script needs to identify functions and this is very difficult if the curly bracket is not on the same line as the call.

Thank you!

File: ~tutorials/R/Geospatial-skills/primer-raster-r/Creating-Square-Plot-Boundaries-From-Centroids-in-R/Creating-Square-Plot-Boundaries-From-Centroids-in-R.R (and associated rmd, html, etc. files)

Line: 55

OLD code:
polys <- SpatialPolygons(mapply(function(poly, id)
{
xy <- matrix(poly, ncol=2, byrow=TRUE)
Polygons(list(Polygon(xy)), ID=id)
},
split(square, row(square)), ID),
proj4string=CRS(as.character("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")))

NEW code request:
polys <- SpatialPolygons(mapply(function(poly, id){
xy <- matrix(poly, ncol=2, byrow=TRUE)
Polygons(list(Polygon(xy)), ID=id)
},
split(square, row(square)), ID),
proj4string=CRS(as.character("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")))

I am also happy to make the change and submit a PR, I just need to quickly walk through the procedure for creating all the files properly.

Update Tutorials to Match New NEON Data

Many of these tutorials are built on data using older NEON file structures.
Options:

  • update the associated data and the tutorial to match current file structures as they should now be (more) stable.
  • make notes at the top of each tutorial for which structure no longer matches the data on data.neonscience.org
  • do nothing.

Where do people start

Community Feedback:
"My one problem is there's no real guidance on recommended order (if there is one). I'm not really sure where to get started or if there is even any sort of linear path through all of the lessons. It might benefit from a "start here", especially for those that are new to all of the topics."


We need to address the use case of those coming to the site who don't necessarily have a path in mind, but want to start working with data. Perhaps we create a few lesson set "workshop" like pages as a test?

HDF-5 temp data - is the datetime conversion wrong?

Looking at the temp data figure, it seems like the min temp is reached each day around noon?
I think there are timezone issues here.

  1. tz EST is not in fact eastern standard time. See:
    http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
    What we want for a FL site is in fact called 'America/New_York'
  2. Data is stored as UTC (GMT)
  3. Then we need to convert it to local (eastern) time to make the diurnal fluctuations make sense. Suggested code improvement:
#read in as UTC
temp$date <- as.POSIXct(temp$date ,format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
#convert to local time for pretty plotting
attributes(temp$date)$tzone <- "America/New_York" 
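For readers following along in Python, the same parse-as-UTC-then-convert pattern can be sketched with pandas (the timestamps below are invented for illustration):

```python
import pandas as pd

# Parse the timestamps as UTC, then convert to the named local zone so the
# diurnal minima and maxima land at sensible local hours
dates = pd.to_datetime(["2014-06-01 16:00:00", "2014-06-02 04:00:00"])
dates_utc = dates.tz_localize("UTC")
dates_local = dates_utc.tz_convert("America/New_York")
# In June this zone is EDT (UTC-4), so 16:00 UTC becomes 12:00 local
```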

@lwasser

R markdown Chunk names

Hey Leah,
I noticed in Intro-HDF5-R.Rmd that the r chunks don't have descriptive names. These help one navigate the chunks easily in the drop-down menus in RStudio. Did you intentionally omit them due to compatibility issues with the website (wild guess)? If you like, I can add some names to the Rmd file and send a pull request. Just let me know.

Thanks!
Kate

minor fixes needed for ~/R/Raster-Data-In-R/

[ ] - correct San Joachim to San Joaquin
[ ] - example output for 'DEM <- raster("DigitalTerrainModel/SJER2013_DTM.tif")' looks like it is actually for the DTMHill version of the file

Knitr Script: Fix extension in filename quirk

The knitr scripts throughout the repo(s) need the rmd.files <- ... assignment changed (~line 96).
Change to:

rmd.files <- list.files(file.path(gitRepoPath, postsDir), pattern="\\.Rmd$", full.names = TRUE )

It is currently pattern="*.Rmd".

The problem is that any time a file with "Rmd" in the file name (not extension; e.g. 2016-06-10-Rmd03-RMarkdown.md) is in a folder being knit, it will also be re-knit.
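The underlying issue is glob vs. regex: list.files() interprets pattern as a regular expression, not a shell glob, so the unanchored "*.Rmd" habit effectively matches any character followed by "Rmd" anywhere in the name. The same over-matching can be demonstrated with Python's re module (file names invented for illustration):

```python
import re

files = ["intro.Rmd", "2016-06-10-Rmd03-RMarkdown.md", "notes.txt"]

# Unanchored, unescaped: "." matches any character, so any name
# containing "Rmd" matches
loose = [f for f in files if re.search(r".Rmd", f)]
# loose -> ['intro.Rmd', '2016-06-10-Rmd03-RMarkdown.md']

# Escaped dot and end anchor: only true .Rmd extensions match
strict = [f for f in files if re.search(r"\.Rmd$", f)]
# strict -> ['intro.Rmd']
```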

set min max

If a raster doesn't show the min/max values when you type in the object name, you can force it to compute them:

DEM <- setMinMax(DEM)

add this to the raster lessons.

Global Environment Disappeared

I'm not sure I'm leaving this in the correct spot, but my global environment disappeared and the subsequent data sets show as "object no longer exists".

I tried reloading my ".RData" file into my workspace but that hasn't yielded anything. See below:

[Workspace loaded from ~/School/Research/RMNP Phenology Temp/RMNP Gymno Phenology Temp/.RData]

load("~/School/Research/RMNP Phenology Temp/RMNP Gymno Phenology Temp/.RData")

I was in the process of working through the below code from the Pheno-Temp Tutorial and my global environment vanished.

quality flags

finalQf

Error: object 'finalQf' not found

sum(YELL.Pheno$finalQF)
[1] 0
sum(RMNP.Temp1$finalQf)
[1] 0

What about null (NA) data?

# Are there NA's in your data? Count 'em up
> sum(is.na(RMNP.Temp1$tempSignleMean))
[1] 0
> # Are there NA's in your data? Count 'em up
> sum(is.na(YELL.Temp$tempSingleMean))
[1] 0

na.rm=TRUE tells R to ignore NA values when making calculations

# create new dataframe without NAs
YELL.Temp_noNA <- YELL.Temp 
drop_na(tempSingleMean)  # tidyr function
RMNP.Temp_noNA <- RMNP.Temp1
drop_NA(tempSingleMean)

> sum(is.na(YELL.Temp_noNA$tempSingleMean))
[1] 0
> sum(is.na(RMNP.Temp1_noNA$tempSingleMean))
Error: object 'RMNP.Temp1_noNA' not found
> sum(is.na(RMNP.Temp_noNA$tempSingleMean))
[1] 0

downloading files prior to the workshop - NEON-HDF5-HyperspectralImagery-In-R/

Very minor issue, but it's a little weird how some of the files to download are in the pre-workshop materials/prep area, and others are inside the individual half hour lessons. Maybe put all up front or all inside the lessons? I think people could get confused thinking they already downloaded the stuff they need. Or didn't.

No "subset_clean_refl" function in NEON AOP HDF5 Python Functions

Hi @bridgethass,

I have tried a tutorial on NDVI calculation, and apparently it would not run correctly due to the absence of the function named "subset_clean_refl" in the neon_aop_refl_hdf5_functions.py file.

I believe something like this should be added:

def subset_clean_refl(reflArray, reflArray_metadata, clipIndex):
    reflCleaned = reflArray[clipIndex['yMin']:clipIndex['yMax'], clipIndex['xMin']:clipIndex['xMax']].astype(np.float)
    reflCleaned[reflCleaned == int(reflArray_metadata['noDataVal'])] = np.nan
    reflCleaned = reflCleaned / reflArray_metadata['scaleFactor']
    return reflCleaned

Could you please update the above-mentioned .py file with the function definition?

h5 read throwing error

In the HDF5 code, the following error occurs... should follow up with Ted or look into it:

http://stackoverflow.com/questions/23098355/h5read-crashes-with-large-strings

g <- paste(fiu_struct[2,1:2],collapse="/")
h5metadata(f,g,fiu_struct$num_attrs[2])
Warning: h5read for variable length strings not yet implemented. Replacing strings by NA's
Error in H5Aread(a) : length-0 dimension vector is invalid

Build Error - Set Working Directory Page

Hi @mjones01
I'm working on this site and there is a build error in a page that you pushed. I think some of the includes may be incorrect. Please fix, build locally to test that things are working, then submit a PR for me to review.
Specific Error:

The page build failed with the following error:

A file was included in `_posts/R-stats/2016-01-08-Set-Working-Directory-In-R.md` that is a symlink or does not exist in your `_includes` directory. For more information, see https://help.github.com/articles/page-build-failed-file-is-a-symlink.

Build errors mean that new content added to the site will not render.
Don't worry about this this week - focus on the teacher training.
Please complete this task by next Tuesday, 20 Jan or get in touch if you need more time.
Thank you!
Leah
