dlab-berkeley / r-geospatial-fundamentals-legacy
This is the repository for D-Lab's Geospatial Fundamentals in R with sf workshop.
License: Other
Solution to the student exercise is visible in 07_Joins_and_Aggregation at lines 644-663.
Slides 17-18: Location example looks like an all-white ranch/4H program (https://asotincountyfairandrodeo.org/4-h/)
Possibly use this location instead? https://commons.wikimedia.org/wiki/File:Coyote_Point_Trail_at_Whitewater_State_Park,_Minnesota_(44136078811).jpg (photo has a CC 2.0 license)
Slide 34: Given the racialization of what counts as “crime” in the United States (and even more overt racism in police presence), I’d replace crime locations with something else, e.g., motor vehicle accident locations or point environmental pollutant emissions locations from the Toxics Release Inventory (e.g., https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-data-files-calendar-years-1987-2019)
Broken link:
Challenge 2: Read in and check out new data
You have another raster dataset in your ./data directory. The file is called nlcd2011_sf.tif.
This is data from the National Land Cover Database (NLCD).
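Presumably it gets read in along these lines (a sketch; the raster() call and path are my assumption based on the filename above):

```r
library(raster)

# Read the NLCD 2011 land-cover raster for San Francisco.
nlcd <- raster("./data/nlcd2011_sf.tif")
nlcd
```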
Split slides with too much content into two: one with the slide text & code, and one with the code output (repeating the code, but not the text, if it is short). At least these Part 1 slides:
103, 111, 137, 142, 156
165, 170, 173, 175, 177, 186
Feeling the new Geospatial-fundamentals-in-sf so much! One suggestion would be to bundle the .Rmd files and the various data sources into an .Rproj project, similar to what our Advanced Data Wrangling in R curriculum has going on. If we go that way, we could leverage the here() package so that users can access shapefiles and datasets automatically, avoiding directory configuration at the local level; see the sketch below.
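A minimal sketch of how that could look (the data/ layout and the shapefile name are borrowed from elsewhere in these materials; treat the paths as illustrative):

```r
library(here)  # resolves paths relative to the .Rproj root
library(sf)

# Works regardless of which directory the user runs or knits from.
tracts <- st_read(here("data", "sftracts_wpop.shp"))
```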
I think creative commons is what we want:
Creative Commons Attribution-NonCommercial 4.0 International Public License
The Binder link will eventually resolve, but the sf package will not load when called.
This is with a runtime.txt of r-4.0-2020-10-10. This warning message while installing the package could be a clue as to why:

```
Warning message:
R graphics engine version 14 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.
```
I wonder if it's relevant to include a piece about the raster::getData() function when we need to import elevation (ELEV) data for the San Francisco bicycling pain analysis map. getData() is an extremely useful way to import geographical data directly into the R computing environment. The imported data can be a little cryptic, but here is one blog that explains exactly what is being imported with the function.
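Such a piece might look roughly like this (a hedged sketch; the lon/lat for San Francisco and the download path are illustrative assumptions):

```r
library(raster)

# Fetch the ~90 m SRTM elevation tile covering San Francisco.
dem_sf <- getData("SRTM", lon = -122.4, lat = 37.8, path = tempdir())
plot(dem_sf)
```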
The whole matrix below is quite challenging to follow; it's unclear what translation is happening. The notes attempt to help, but I think they could be more explicit.
```r
reclass_vec <- c(0, 20, NA,   # water will be set to NA (i.e. 'left out' of our analysis)
                 20, 21, 1,   # we'll treat developed open space as greenspace,
                              # based on the NLCD description
                 21, 30, 0,   # developed and hardscape will be set to 0s
                 30, 31, NA,
                 31, Inf, 1)  # greenspace will have 1s
```
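One way to make the translation explicit (a sketch; it assumes the nlcd raster from the lesson and reclassify()'s default behavior, where each (from, to, becomes) row replaces values in the interval (from, to]):

```r
library(raster)

# Reshape the flat vector into a three-column (from, to, becomes) matrix.
reclass_mat <- matrix(reclass_vec, ncol = 3, byrow = TRUE)
reclass_mat

# Apply it: cells valued in (0, 20] become NA, in (20, 21] become 1, etc.
greenspace <- reclassify(nlcd, reclass_mat)
```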
Workshop title should be "R-Geospatial-Data:-Parts-1-2"
I was supposed to make an issue regarding the name of the repo and the name listed on the excel sheet.
Name listed: R Geospatial Data: Parts 1-3
Hi,
If one chooses to use the Visual editor, there are some aspects that don't translate well, though most if not all are just visual.
Instructions for inserting a code chunk (Part 1, lines 56-62) appear out of date. Option 1 should be Code > Insert Chunk.
We say "NAD27 is old and inaccurate! Don't use it." and then use a DEM in Section 1 that is in NAD27... and later we transform the other data into NAD27, which seems to go against the statement. It could be helpful to address this, or to use it to show why NAD27 is outdated, or to just project into a different CRS overall (see the one-liner below).
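For instance (a sketch; EPSG 26910 appears later in these materials, and the object name is an assumption):

```r
library(sf)

# Reproject to NAD83 / UTM zone 10N instead of staying in NAD27.
SFtracts_utm <- st_transform(SFtracts, 26910)
```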
The filename in line 84 should be "sftracts_wpop.shp" instead of "sftracts_wpop". Otherwise it will not read in.
I think how projectExtent is used could be explained a bit more explicitly in the following code:

```r
DEM_NAD83 = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts_NAD83)))
```

since there are a few nested functions, and it's not immediately clear what is occurring.
And then when the incompatibility between CRS classes across packages is explained, the use of $proj4string doesn't come across too clearly. A good amount of effort goes into explaining the incompatibility; maybe a bit more could go into explaining how $proj4string introduces compatibility across the packages. I think st_crs(DEM_NAD83) == st_crs(SFtracts_NAD83) gets at this point, but looking at the output of st_crs(SFtracts_NAD83)$proj4string alone may not clearly show why this workaround works. Or, if the main point is more about finding workarounds, then all is well; this is a bit nitpicky.
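A step-by-step unpacking might help (a sketch, assuming the DEM raster and SFtracts_NAD83 sf object from the lesson):

```r
library(sf)
library(raster)

# 1. Extract the target CRS from the sf object as a proj4string,
#    the representation the raster package understands.
target_crs <- st_crs(SFtracts_NAD83)$proj4string

# 2. Build an empty template raster: the extent and resolution the
#    DEM would have in the target CRS (no cell values computed yet).
template <- projectExtent(DEM, target_crs)

# 3. Warp the DEM's cell values onto that template.
DEM_NAD83 <- projectRaster(DEM, template)
```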
There may be an error with the 'meters' column of the bike boulevard data.
```{r}
bart_lines$len_mi <- units::set_units(st_length(bart_lines), mi)
bart_lines$len_km <- units::set_units(st_length(bart_lines), km)
bart_lines$len_m <- units::set_units(st_length(bart_lines), m)
head(bart_lines)
```
When you calculate the lengths by transforming to various CRSs:

```r
bart_lines$len_NAD83 <- units::set_units(st_length(st_transform(bart_lines, 26910)), m)
bart_lines$len_WebMarc <- units::set_units(st_length(st_transform(bart_lines, 3857)), m)
bart_lines$len_WGS84 <- units::set_units(st_length(st_transform(bart_lines, 4326)), m)
```

you see that Web Mercator outputs the closest length to the 'meters' column. The transformation to WGS84 is redundant and was done just to verify how the st_length() function works, so in theory both of those outputs should be the same as the 'meters' column.
Caused some confusion during the workshop.
CRS Transformations
```r
st_crs(DEM_NAD83) == st_crs(SFtracts_NAD83)
```

is true because the CRSs are both NAD83, but the former shows "EPSG",9122 and the latter "EPSG",4269 in its output. So the workaround:

```r
DEM_NAD83 = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts_NAD83)$proj4string))
```

works, but it's a bit confusing, because the alternative way of reprojecting is inputting the EPSG code. So it could be useful to clarify this difference between the CRS and the EPSG code.
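For reference, sf exposes both representations, which might help make the distinction concrete (a sketch, assuming the objects from the lesson):

```r
library(sf)

st_crs(SFtracts_NAD83)$epsg         # the EPSG code, e.g. 4269
st_crs(SFtracts_NAD83)$proj4string  # the proj4 string that raster understands
```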
Set up the docs dir to be the github io directory.
Add

```r
knitr::opts_knit$set(root.dir = normalizePath('../'))
```

to the knitr setup options for the Rmd files for pt2 & pt3 (done for pt1), so that the Rmd files can knit in the docs directory but reference the data files from the root directory in any R chunks.
Note: The images directory was moved under docs because it is only used in the Rmarkdown files and the relative path setting only works for chunks (not markdown).
Is there a possibility to have a datahub version of this workshop? We ran into some issues where datahub could have been useful.
Add a readme file with:
Around line 380, projectRaster() causes a fatal error, but only when run on DataHub; there is no problem locally. There are subtle differences in the proj4string output that precedes the call, for some reason. Not sure how to fix at this time.

```r
st_crs(SFtracts)$proj4string
```

When run locally, the above call returns "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs", but via DataHub it returns "+proj=longlat +datum=NAD83 +no_defs". (Where did the +towgs84=0,0,0,0,0,0,0 go?!) The next call then causes the fatal error on DataHub:

```r
DEM_WGS = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts)$proj4string))
```
In the code chunk
```{r}
summary(DEM)
summary(DEM[,])
freq(DEM)
maxValue(DEM)
minValue(DEM)
res(DEM)
```
I get an NA, although it worked in one of the videos. The documentation on the function says, 'If a Raster* object is created from a file on disk, the min and max values are often not known (depending on the file format).' It's confusing because in the instructor's video (https://berkeley.zoom.us/rec/share/JQ4Xk3TSG5U-8L0pPezDaZoN_dIMFQyS4jJRKUK_JASFXo5G30erHErh8kxcvTjZ.XsO27WKI2ZKFQPEH?startTime=1649797368000, at time point 1:21:33), the max and min values that are output are not the same values we see in the summary output.
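If the NA comes from the min/max not being stored in the file, one possible fix (my assumption, not something from the materials) is to force raster to compute them:

```r
library(raster)

# Scan the cell values and store the actual min/max on the object.
DEM <- setMinMax(DEM)
minValue(DEM)
maxValue(DEM)
```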
I think the Challenge 2: Read in and check out new data section needs some editing. The nlcd@legend slot has no data, even when brought into memory with readAll(), so the predefined legend values are not available, even before transforming and cropping, and a large part of this lesson is lost. The legend data is there somewhere, because the barplot segments into the colors, and if you just click on the tif file (Mac), the preview follows the predefined colors.
We noted that there is a reference to a data folder: 'You have another raster dataset in your ./data directory. The file is called nlcd2011_sf.tif.'
So we're wondering if there was a change that somehow affected the data?
to make it clearly distinguishable from the older materials
Add a docs directory for the RMD and HTML files.
Looking over the two previous iterations and going through this current one, I think the projected duration included at the beginning of each lesson may be an underestimate; the lessons seem to take longer, judging from the last three runs.
I'm wondering if the subheadings could be more informative. Several are named 'Explore the Structures'. Maybe 'Explore the Structures - Memory', 'Explore the Structures - Dropping', etc., would be more informative.
There are broken image links, e.g. in notebook 7, the image in 7.1 Attribute join.