
Class project outlining workflow for modeling soil carbon storage as a function of environmental variables and creating "Minimum viable project"


# Modelling Soil Carbon Content

# Description of project
This is the same project that I completed for my MVP1, but it is now executable via a bash submission script on MSI resources. The only changes I made were removing the visualizations that were printed in-line in Jupyter and exporting some map and model objects from the script so that they can be inspected as outputs outside of the Slurm job summary output.

My friend Dustin Michels helped me through some GitHub issues.

# Major code chunks
I tried to follow markdown formatting as best as I could in this script. The major code chunks are: 

## Importing Packages

## Import soils data (NCSS gdb) and subsetting to area of interest (MN)
Originally, I wanted to edit my code for this section and consider the whole US as an area of interest. However, things really broke down when I tried to do this entirely in nano editing, so after messing around for a few hours, I went back to the original. 

## Wrangling NCSS gdb into one dataframe
- this is followed by reading in all of the data layers that I need from the dataframe (NCSS, carbon_extractions, bulk_density) 

Some of the soils don't have bulk density data or soil C data, so we have to throw them out: you can't calculate total soil C without bulk density information. That leaves only 184 samples. This seems a bit low to me; I have seen other publications that had much larger datasets for states (including WI, next door), but I couldn't figure out the cause after looking for ~30 minutes. For proof of concept I'm going to move on with this much smaller dataset.

## Calculating SOC for each pedon 
- define three functions to do so. One function, create_pedons, passes each pedon to another function, new_pedon, and receives the calculated output to concatenate into a new dataframe. new_pedon runs pedon_sum, receives its output, and adds the necessary metadata from the original pedon information to it. pedon_sum is a for loop over all rows in a pedon, adding them together and returning one summed line.
- I am still not calculating SOC in the right way, mathematically. I'm running with an approximation because the math is hard. 
It's fairly standard practice to just consider the top 0.3 m (though there is a lot of SOC below that...). But at first, I'll consider the whole depth profile. In a more robust project I would make sure that soil C is calculated along the same depth and probably just subset to 0.3 m.  

Usually, we would use some pretty complicated math (spline function) to estimate the amount of carbon along the depth profile, smoothly. Unfortunately, a lot of the packages that were built around the spline function for soil C calculations are now not available and I am not very good at math. For proof of concept, I am not going to deal with that. Instead, I will calculate the amount of C in the soil profile just by adding up the amount of carbon in each depth increment as it is defined, without smoothing. 
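The unsmoothed sum described above can be sketched as follows. This is a minimal illustration, not the code from the script: plain dictionaries stand in for dataframe rows, the column names (`top_cm`, `bottom_cm`, `c_pct`, `bulk_density_g_cm3`) and the sample values are hypothetical, and the per-increment formula is the common carbon fraction × bulk density × thickness stock calculation, which ignores coarse fragments and may differ from the approximation the script actually uses. Only the three function names come from the description above.

```python
from collections import defaultdict


def horizon_soc(row):
    """SOC stock of one depth increment in kg C per m^2 (hypothetical columns).

    g/cm^3 * cm gives g of soil per cm^2; multiplying by 10 converts
    g/cm^2 to kg/m^2 (10^4 cm^2 per m^2 divided by 10^3 g per kg).
    """
    thickness_cm = row["bottom_cm"] - row["top_cm"]
    return row["c_pct"] / 100.0 * row["bulk_density_g_cm3"] * thickness_cm * 10.0


def pedon_sum(rows):
    """Loop over all depth increments in one pedon and add them up, no smoothing."""
    total = 0.0
    for row in rows:
        total += horizon_soc(row)
    return total


def new_pedon(pedon_id, rows):
    """Attach metadata from the original pedon rows to the summed value."""
    first = rows[0]
    return {
        "pedon_id": pedon_id,
        "lat": first["lat"],
        "lon": first["lon"],
        "soc_kg_m2": pedon_sum(rows),
    }


def create_pedons(horizons):
    """Group increments by pedon and build one summary record per pedon."""
    grouped = defaultdict(list)
    for row in horizons:
        grouped[row["pedon_id"]].append(row)
    return [new_pedon(pedon_id, rows) for pedon_id, rows in grouped.items()]


# Made-up example data: two pedons, three depth increments in total.
horizons = [
    {"pedon_id": "P1", "lat": 45.0, "lon": -93.0, "top_cm": 0, "bottom_cm": 10,
     "c_pct": 2.0, "bulk_density_g_cm3": 1.2},
    {"pedon_id": "P1", "lat": 45.0, "lon": -93.0, "top_cm": 10, "bottom_cm": 30,
     "c_pct": 1.0, "bulk_density_g_cm3": 1.4},
    {"pedon_id": "P2", "lat": 46.5, "lon": -94.2, "top_cm": 0, "bottom_cm": 20,
     "c_pct": 3.0, "bulk_density_g_cm3": 1.0},
]

pedons = create_pedons(horizons)
for pedon in pedons:
    print(pedon["pedon_id"], round(pedon["soc_kg_m2"], 2))
```

Restricting the calculation to the top 0.3 m would amount to clipping each increment's top and bottom depth at 30 cm before computing its thickness.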

## Map and inspect calculated SOC
- In contrast to my MVP1, where I printed maps in the Jupyter output, here I output a .png graph so that I can look at it. This is something I added and learned how to do. 

## Extract land cover information from HistoricLandcover modeled layers and join to soils data
- 3 functions defined in this section 

## Build Predictive model 
- Build a random forest regressor using sklearn. Train and test it. 
- includes testing and assessment of mean absolute error
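A rough sketch of this step with scikit-learn is shown below. The data are synthetic (random numbers standing in for the land cover predictors and the calculated SOC), and the split fraction, number of trees, and random seeds are assumptions chosen for illustration, not the settings used in the script. The sample count of 184 matches the dataset size mentioned earlier.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 184 samples with 3 made-up predictor columns.
rng = np.random.default_rng(0)
X = rng.random((184, 3))
y = 5.0 + 3.0 * X[:, 0] + rng.normal(0.0, 0.5, size=184)

# Hold out a quarter of the samples for testing (assumed fraction).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the random forest on the training split only.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Assess the model on the held-out samples with mean absolute error.
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"Mean absolute error on the test split: {mae:.2f}")
```

The mean absolute error is in the same units as the target, so for SOC it reads directly as the typical prediction error per sample.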

## Export predictive model 
- this is an additional part that I added to my MVP1. I export the random forest regressor and the dataset so that I don't have to go back and rebuild them if I need to access the values
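One way to do this kind of export is with Python's built-in `pickle` module, sketched below. The README does not say which format the script uses, so treat this as an assumption; the helper names (`export_objects`, `load_object`), the file names, and the stand-in objects are hypothetical. A fitted scikit-learn model can be passed in place of the stand-in dictionary.

```python
import pickle
import tempfile
from pathlib import Path


def export_objects(objects, out_dir):
    """Write each named object to <out_dir>/<name>.pkl and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, obj in objects.items():
        path = out_dir / f"{name}.pkl"
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)
        paths[name] = path
    return paths


def load_object(path):
    """Read one pickled object back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-ins for the fitted regressor and the final dataset (hypothetical values).
stand_in_model = {"n_estimators": 100}
stand_in_data = [{"pedon_id": "P1", "soc_kg_m2": 5.2}]

# A temporary directory keeps the example self-contained; a batch job
# would write to a persistent output directory instead.
with tempfile.TemporaryDirectory() as tmp:
    paths = export_objects(
        {"rf_model": stand_in_model, "soc_dataset": stand_in_data}, tmp
    )
    restored_model = load_object(paths["rf_model"])
    restored_data = load_object(paths["soc_dataset"])

print(restored_model == stand_in_model, restored_data == stand_in_data)
```

Pickle files should only be loaded from trusted sources, and a pickled model is best reloaded with the same library versions that created it.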

# Why this represents a MVP
Originally, I wanted to expand upon the actual code of my MVP1 and move the workflow into a batch submission form for MVP2. At first this seemed entirely possible, as moving the code in initially went well. However, I soon found that it was very difficult for me to expand and improve the workflow in a batch submission form; I am much more attached to Jupyter notebooks than I thought. I initially broke some things (badly) and spent a lot of time trying to get back to working code. From this process, I learned that I will probably always try to edit long workflows on a subset of data in a Jupyter notebook environment, and move to batch submission only when I feel pretty confident that the workflow is working well. One change that I made to the workflow was exporting objects: maps as a png, plus the random forest regression model and the final data. This was the only way to view results outside of the Slurm summary file, and I figured this would be good for sharing the results of the project. I was also able to learn how to push to GitHub from the MSI environment, which was a bit trickier than from my local environment. I estimate that I spent around 10 hours on this project.

 
