dlab-berkeley / r-geospatial-fundamentals-legacy
This is the repository for D-Lab's Geospatial Fundamentals in R with sf workshop.
License: Other
Solution to the student exercise is visible in 07_Joins_and_Aggregation at lines 644-663.
Slides 17-18: Location example looks like an all-white ranch/4H program (https://asotincountyfairandrodeo.org/4-h/)
Possibly use this location instead? https://commons.wikimedia.org/wiki/File:Coyote_Point_Trail_at_Whitewater_State_Park,_Minnesota_(44136078811).jpg (photo has a CC 2.0 license)
Slide 34: Given the racialization of what counts as “crime” in the United States (and even more overt racism in police presence), I’d replace crime locations with something else, e.g., motor vehicle accident locations or point environmental pollutant emissions locations from the Toxics Release Inventory (e.g., https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-data-files-calendar-years-1987-2019)
Broken link:
Challenge 2: Read in and check out new data
You have another raster dataset in your ./data directory. The file is called nlcd2011_sf.tif.
This is data from the National Land Cover Database (NLCD).
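Presumably it gets read in along these lines (a sketch; the raster() call and path are my assumption based on the filename above):

```r
library(raster)

# Read the NLCD 2011 land-cover raster for San Francisco.
nlcd <- raster("./data/nlcd2011_sf.tif")
nlcd
```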
Split slides with too much content into two: one with the slide text & code, and one with the code output (repeating the code, but not the text, if it is short). At least these Part 1 slides:
103, 111, 137, 142, 156
165, 170, 173, 175, 177, 186
Feeling the new Geospatial-fundamentals-in-sf so much! One suggestion would be to bundle the .Rmd files and the various data sources into an .Rproj project, similar to what our Advanced Data Wrangling in R curriculum has going on. If we go that way, we could leverage the here() package so that users can access shapefiles and datasets automatically, avoiding directory configuration at the local level; see the sketch below.
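A minimal sketch of how that could look (the data/ layout and the shapefile name are borrowed from elsewhere in these materials; treat the paths as illustrative):

```r
library(here)  # resolves paths relative to the .Rproj root
library(sf)

# Works regardless of which directory the user runs or knits from.
tracts <- st_read(here("data", "sftracts_wpop.shp"))
```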
I think creative commons is what we want:
Creative Commons Attribution-NonCommercial 4.0 International Public License
The Binder link will eventually resolve, but the sf package will not load when called.
This is with a runtime.txt of r-4.0-2020-10-10. This warning message while installing the package could be a clue as to why:

```
Warning message:
R graphics engine version 14 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.
```
I wonder if it's relevant to include a piece about the raster::getData() function when we need to import elevation (ELEV) data for the San Francisco bicycling pain analysis map. getData() is an extremely useful way to import geographical data directly into the R computing environment. The imported data can be a little cryptic, but here is one blog that explains exactly what is being imported with the function.
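Such a piece might look roughly like this (a hedged sketch; the lon/lat for San Francisco and the download path are illustrative assumptions):

```r
library(raster)

# Fetch the ~90 m SRTM elevation tile covering San Francisco.
dem_sf <- getData("SRTM", lon = -122.4, lat = 37.8, path = tempdir())
plot(dem_sf)
```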
The whole matrix below is quite challenging to follow; it's unclear what translation is happening. The notes attempt to help, but I think they could be more explicit.
```r
reclass_vec <- c(0, 20, NA,   # water will be set to NA (i.e. 'left out' of our analysis)
                 20, 21, 1,   # we'll treat developed open space as greenspace,
                              # based on the NLCD description
                 21, 30, 0,   # developed and hardscape will be set to 0s
                 30, 31, NA,
                 31, Inf, 1)  # greenspace will have 1s
```
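One way to make the translation explicit (a sketch; it assumes the nlcd raster from the lesson and reclassify()'s default behavior, where each (from, to, becomes) row replaces values in the interval (from, to]):

```r
library(raster)

# Reshape the flat vector into a three-column (from, to, becomes) matrix.
reclass_mat <- matrix(reclass_vec, ncol = 3, byrow = TRUE)
reclass_mat

# Apply it: cells valued in (0, 20] become NA, in (20, 21] become 1, etc.
greenspace <- reclassify(nlcd, reclass_mat)
```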
Workshop title should be "R-Geospatial-Data:-Parts-1-2"
I was supposed to make an issue regarding the name of the repo and the name listed on the excel sheet.
Name listed: R Geospatial Data: Parts 1-3
Hi,
If one chooses to use the Visual editor, there are some aspects that don't translate well, though most if not all are just visual.
Instructions for inserting a code chunk (Part 1, lines 56-62) appear out of date. Option 1 should be Code > Insert Chunk.
We say "NAD27 is old and inaccurate! Don't use it." and then use a DEM in Section 1 that is in NAD27... and later we transform the other data into NAD27, which seems to go against the statement. It could be helpful to address this, or to use it to show why NAD27 is outdated, or to just project into a different CRS overall (see the one-liner below).
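For instance (a sketch; EPSG 26910 appears later in these materials, and the object name is an assumption):

```r
library(sf)

# Reproject to NAD83 / UTM zone 10N instead of staying in NAD27.
SFtracts_utm <- st_transform(SFtracts, 26910)
```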
The filename in line 84 should be "sftracts_wpop.shp" instead of "sftracts_wpop". Otherwise it will not read in.
I think how projectExtent is used could be explained a bit more explicitly in the following code:

```r
DEM_NAD83 = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts_NAD83)))
```

since there are a few nested functions, and it's not immediately clear what is occurring.
And then when the incompatibility between CRS classes across packages is explained, the use of $proj4string doesn't come across too clearly. A good amount of effort goes into explaining the incompatibility; maybe a bit more could go into explaining how $proj4string introduces compatibility across the packages. I think st_crs(DEM_NAD83) == st_crs(SFtracts_NAD83) gets at this point, but looking at the output of st_crs(SFtracts_NAD83)$proj4string alone may not clearly show why this workaround works. Or, if the main point is more about finding workarounds, then all is well; this is a bit nitpicky.
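A step-by-step unpacking might help (a sketch, assuming the DEM raster and SFtracts_NAD83 sf object from the lesson):

```r
library(sf)
library(raster)

# 1. Extract the target CRS from the sf object as a proj4string,
#    the representation the raster package understands.
target_crs <- st_crs(SFtracts_NAD83)$proj4string

# 2. Build an empty template raster: the extent and resolution the
#    DEM would have in the target CRS (no cell values computed yet).
template <- projectExtent(DEM, target_crs)

# 3. Warp the DEM's cell values onto that template.
DEM_NAD83 <- projectRaster(DEM, template)
```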
There may be an error with the 'meters' column of the bike boulevard data.
```{r}
bart_lines$len_mi <- units::set_units(st_length(bart_lines), mi)
bart_lines$len_km <- units::set_units(st_length(bart_lines), km)
bart_lines$len_m <- units::set_units(st_length(bart_lines), m)
head(bart_lines)
```
When you calculate the lengths by transforming to various CRSs:

```r
bart_lines$len_NAD83 <- units::set_units(st_length(st_transform(bart_lines, 26910)), m)
bart_lines$len_WebMarc <- units::set_units(st_length(st_transform(bart_lines, 3857)), m)
bart_lines$len_WGS84 <- units::set_units(st_length(st_transform(bart_lines, 4326)), m)
```

you see that Web Mercator outputs the closest length to the 'meters' column. The transformation to WGS84 is redundant and was done just to verify how the st_length() function works, so in theory both of those outputs should be the same as the 'meters' column.
Caused some confusion during the workshop.
CRS Transformations
```r
st_crs(DEM_NAD83) == st_crs(SFtracts_NAD83)
```

is true because the CRSs are both NAD83, but the former shows "EPSG",9122 and the latter "EPSG",4269 in its output. So the workaround:

```r
DEM_NAD83 = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts_NAD83)$proj4string))
```

works, but it's a bit confusing, because the alternative way of reprojecting is inputting the EPSG code. So it could be useful to clarify this difference between the CRS and the EPSG code.
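For reference, sf exposes both representations, which might help make the distinction concrete (a sketch, assuming the objects from the lesson):

```r
library(sf)

st_crs(SFtracts_NAD83)$epsg         # the EPSG code, e.g. 4269
st_crs(SFtracts_NAD83)$proj4string  # the proj4 string that raster understands
```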
Set up the docs dir to be the github io directory.
Add

```r
knitr::opts_knit$set(root.dir = normalizePath('../'))
```

to the knitr setup options for the Rmd files for pt2 & pt3 (done for pt1), so that the Rmd files can knit in the docs directory but reference the data files from the root directory in any R chunks.
Note: The images directory was moved under docs because it is only used in the Rmarkdown files and the relative path setting only works for chunks (not markdown).
Is there a possibility to have a datahub version of this workshop? We ran into some issues where datahub could have been useful.
Add a readme file with:
Around line 380, projectRaster() causes a fatal error, but only when run on DataHub; there is no problem locally. There are subtle differences in the proj4string output that precedes the call, for some reason. Not sure how to fix at this time.

```r
st_crs(SFtracts)$proj4string
```

When run locally, the above call returns "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs", but via DataHub it returns "+proj=longlat +datum=NAD83 +no_defs". (Where did the +towgs84=0,0,0,0,0,0,0 go?!) The next call then causes the fatal error on DataHub:

```r
DEM_WGS = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts)$proj4string))
```
In the code chunk
```{r}
summary(DEM)
summary(DEM[,])
freq(DEM)
maxValue(DEM)
minValue(DEM)
res(DEM)
```
I get an NA, although it worked in one of the videos. The documentation on the function says, 'If a Raster* object is created from a file on disk, the min and max values are often not known (depending on the file format).' It's confusing because in the instructor's video (https://berkeley.zoom.us/rec/share/JQ4Xk3TSG5U-8L0pPezDaZoN_dIMFQyS4jJRKUK_JASFXo5G30erHErh8kxcvTjZ.XsO27WKI2ZKFQPEH?startTime=1649797368000, at time point 1:21:33), the max and min values that are output are not the same values we see in the summary output.
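If the NA comes from the min/max not being stored in the file, one possible fix (my assumption, not something from the materials) is to force raster to compute them:

```r
library(raster)

# Scan the cell values and store the actual min/max on the object.
DEM <- setMinMax(DEM)
minValue(DEM)
maxValue(DEM)
```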
I think the Challenge 2: Read in and check out new data section needs some editing. The nlcd@legend slot has no data, even when brought into memory with readAll(), so the predefined legend values are not available, even before transforming and cropping, and a large part of this lesson is lost. The legend data is there somewhere, because the barplot segments into the colors, and if you just click on the tif file (Mac), the preview follows the predefined colors.
We noted that there is a reference to a data folder: 'You have another raster dataset in your ./data directory. The file is called nlcd2011_sf.tif.'
So we're wondering if there was a change that somehow affected the data?
to make it clearly distinguishable from the older materials
Add a docs directory for the RMD and HTML files.
Looking over the two previous iterations and going through this current one, I think the projected duration included at the beginning of each lesson may be an underestimate; the lessons seem to take longer, judging from the last three runs.
I'm wondering if the subheadings could be more informative. Several are named 'Explore the Structures'. Maybe 'Explore the Structures - Memory', 'Explore the Structures - Dropping', etc., would be more informative.
There are broken image links, e.g. in notebook 7, the image in 7.1 Attribute join.