This project uses random forests for binary classification.
Dataset Description - Social Network Ads Dataset:
- Source: Social Network Ads (kaggle.com)
- Description: This dataset contains user age, estimated salary, and whether they made a purchase after viewing social ads. '0' indicates no purchase, '1' indicates a purchase.
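As a rough illustration of how the data described above might be read into features and labels, here is a minimal sketch using only the Python standard library. The function name `load_social_network_ads` is hypothetical, and the column names `Age`, `EstimatedSalary`, and `Purchased` are assumptions about the CSV header that should be checked against the downloaded file.

```python
import csv


def load_social_network_ads(path):
    """Return (X, y): X is a list of [age, salary] rows, y is a list of 0/1 labels."""
    X, y = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed column names; adjust them if the actual header differs.
            X.append([float(row["Age"]), float(row["EstimatedSalary"])])
            y.append(int(row["Purchased"]))
    return X, y
```

Any extra columns in the file are simply ignored by this sketch, since only the three assumed columns are read.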
Additional Tasks for Graduate Students:
- Algorithm Implementation from Scratch: We will implement a simplified version of the random forest algorithm from scratch, documenting key assumptions and comparing it with popular library implementations.
- Performance Comparison: We will compare our implementation of random forests with at least one popular library implementation (e.g., scikit-learn) in terms of runtime, memory usage, and CPU/GPU usage across various dataset sizes and dimensionalities.
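The two tasks above could start from something like the following sketch. It is not the project's actual implementation: all function names are hypothetical, and the simplifications are assumptions made for illustration (0/1 labels, Gini impurity, exhaustive threshold search, a random feature subset of size sqrt(d) per node, bootstrap sampling, and majority voting). The `measure` helper covers only wall-clock time and peak traced memory; CPU/GPU usage would need separate tooling.

```python
import random
import time
import tracemalloc
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X, y, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, n_features, rng):
    """Grow a tree: a leaf is a label, an inner node is (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    # Random feature subset at each node (the "random" in random forest).
    features = rng.sample(range(len(X[0])), n_features)
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_features, rng),
        build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_features, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=10, max_depth=3, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


def measure(fn, *args, **kwargs):
    """Return (result, seconds, peak_bytes) for a single call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    seconds = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak
```

A call such as `forest, seconds, peak = measure(fit_forest, X, y, n_trees=25)` times the from-scratch version. The same helper could wrap the `fit` method of scikit-learn's `RandomForestClassifier`, so that both implementations are measured in the same way across dataset sizes. `tracemalloc` reports only the allocations it can trace, so the memory figures are best treated as a rough guide.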