GithubHelp home page GithubHelp logo

w207_final_project_bwp's Introduction

W207 Final Project

Forest Cover Type Prediction

Berkeley MIDS Program

Team: Blair Witch Project

Sharon Wu, Kevin Ngo, Vish Pillai, Blair Jones

Date: 11 April 2021

Contents

Folder Description
0_experiments Individual team folders for EDA and model development.
2021_03_09 Initial draft of report.
2021_04_11_Final Final version of report and team notebook.

Final Report Presentation

Final Presentation
Presentation

Introduction

The challenge is to predict seven different forest cover types in four different wilderness areas of the Roosevelt National Forest of Northern Colorado with the best accuracy. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices.

Each 30 x 30 square meter section of land is described by 54 attributes (ex. elevation, area, soil type, distance to water, aspect, etc.).

The 4 wilderness areas:

  • Rawah Wilderness Area
  • Neota Wilderness Area
  • Comanche Peak Wilderness Area
  • Cache la Poudre Wilderness Area

Roosevelt National Forest

The 7 forest cover types:

  1. Spruce/Fir
  2. Lodgepole Pine
  3. Ponderosa Pine
  4. Cottonwood/Willow
  5. Aspen
  6. Douglas-fir
  7. Krummholz

Data

  • Elevation - Elevation in meters
  • Aspect - Aspect in degrees azimuth
  • Slope - Slope in degrees
  • Horizontal_Distance_To_Hydrology - Horz Dist to nearest surface water features
  • Vertical_Distance_To_Hydrology - Vert Dist to nearest surface water features
  • Horizontal_Distance_To_Roadways - Horz Dist to nearest roadway
  • Hillshade_9am (0 to 255 index) - Hillshade index at 9am, summer solstice
  • Hillshade_Noon (0 to 255 index) - Hillshade index at noon, summer solstice
  • Hillshade_3pm (0 to 255 index) - Hillshade index at 3pm, summer solstice
  • Horizontal_Distance_To_Fire_Points - Horz Dist to nearest wildfire ignition points
  • Wilderness_Area (4 binary columns, 0 = absence or 1 = presence) - Wilderness area designation
  • Soil_Type (40 binary columns, 0 = absence or 1 = presence) - Soil Type designation
  • Cover_Type (7 types, integers 1 to 7) - Forest Cover Type designation

Summary

After thorough EDA, we decided to apply several transformations to the data to highlight key differences between the cover types across the explanatory variables.

With the transformed data, we analyzed the characteristics of several different machine learning models for this type of classification problem.

After some iterations to tune the features and hyperparameter, we conducted a final comparison, shown below. The ExtraTrees Classifier achieved the best performance in terms of accuracy and execution time.

We also experimented with a hybrid model, using the Voting Classifier, which takes the output from 2 or more models and determines the best prediction for each sample from across the input models. The Voting Classifier with 8 ExtraTrees models increased the accuracy by 0.2%, but took 9 times longer to train and predict.

The full analysis is described in the final report notebook and presentation.

Model Accuracy Time Comments
KNN 0.811 - k=1
Decision Tree 0.771 - trees=19
Random Forest 0.868 12 sec -
ExtraTrees 0.911 5 sec random_state=1189
ExtraTrees Voting 0.913 45 sec trees=100, random_state=[1189,883,2153,5568,2077,769,2223,2675]
XGBoost 0.865 193 sec -
LightGBM 0.897 72 sec boosting_type='gbdt', trees=200
CatBoost 0.876 96 sec -
Neural Network 0.865 192 sec layers=6

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.