pokemaster720 / hw2q2

HW2Q2

Libraries Used

collections.Counter
math
calculate_information_gain

Data Description - Classification

Training_data: a list of lists. Each inner list represents one instance (or record) and consists of categorical attributes (Education, Career, Years of Experience).
labels: the class label for each instance in Training_data ("Low" or "High" salary).
features: identifiers for the feature categories in Training_data.
validation_data and validation_labels: lists intended for pruning the decision tree using validation data.

Functions Explained

**calculate_entropy(labels) -> float:**
- **Purpose:** Calculate the entropy of a list of classification labels.
  - Entropy measures the randomness (impurity) of the set, which is the basis for choosing splits in a decision tree.
- **Parameters:** labels
  - A list of classification labels.
- **Process:**
  - Uses Counter to count the occurrences of each class label, converts the counts to probabilities, and computes the entropy with the formula
  - \(\text{Entropy} = -\sum_i p_i \log_2(p_i)\), where \(p_i\) is the proportion of labels in class \(i\).
- **Returns:** The calculated entropy as a float.
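
A minimal sketch of how the entropy calculation described above could look. Only the function name and the use of `Counter` and `math` come from this README; the empty-list behaviour (returning 0.0) is an assumption, not necessarily what the repository does.

```python
import math
from collections import Counter


def calculate_entropy(labels):
    """Return the Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty label list has zero entropy
    counts = Counter(labels)
    # Sum -p * log2(p) over the proportion p of each class.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# An evenly mixed set is maximally uncertain for two classes.
print(calculate_entropy(["High", "Low", "High", "Low"]))  # 1.0
```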
**Conditional Statement involving low_salary_count + high_salary_count:**
- Intended to calculate the total entropy of the salary data ("High" or "Low" salaries) by using the calculate_entropy function.
**divide_entropy(subsets) -> float:**
- **Purpose:** Calculate the weighted average entropy for a given division of the data into subsets.
  - It is useful in decision tree algorithms for measuring the effectiveness of a split.
- **Process:**
  - Computes the entropy of each subset and weights it by the subset's share of the total number of instances.
- **Returns:** The weighted average entropy.
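
The weighted average can be sketched as follows. This assumes each subset is passed as a plain list of labels, which the README does not state, and it repeats a small `calculate_entropy` so that the example runs on its own.

```python
import math
from collections import Counter


def calculate_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def divide_entropy(subsets):
    """Weighted average entropy of a split.

    Assumption: `subsets` is a list of label lists, one per branch.
    """
    total = sum(len(subset) for subset in subsets)
    if total == 0:
        return 0.0
    return sum(len(subset) / total * calculate_entropy(subset)
               for subset in subsets)


# One pure branch and one evenly mixed branch of equal size.
split = [["High", "High"], ["Low", "High"]]
print(divide_entropy(split))  # 0.5

# Information gain is the parent entropy minus the weighted entropy.
parent = ["High", "High", "Low", "High"]
gain = calculate_entropy(parent) - divide_entropy(split)
```

A lower weighted entropy means purer branches, so the split with the largest gain is the most effective one.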
**DecisionTreeNode Class:**
- Each node is either a decision node, which stores a feature_index and a feature_value,
  - or a leaf node, which stores the predicted class label for that branch.
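
One possible shape for such a node, as a sketch only. The `feature_index` and `feature_value` attributes come from the description above; the `label` and `children` attributes and the `is_leaf` helper are hypothetical names added for illustration.

```python
class DecisionTreeNode:
    """A node that is either a decision node or a leaf."""

    def __init__(self, feature_index=None, feature_value=None, label=None):
        self.feature_index = feature_index  # feature tested at a decision node
        self.feature_value = feature_value  # feature value tied to this node
        self.label = label                  # predicted class (leaf nodes only)
        self.children = []                  # assumption: child nodes live here

    def is_leaf(self):
        # Hypothetical helper: a node carrying a label is treated as a leaf.
        return self.label is not None


leaf = DecisionTreeNode(label="High")
decision = DecisionTreeNode(feature_index=0, feature_value="Masters")
decision.children.append(leaf)
```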
**divide_dataset Function:**
- Splits the dataset and its corresponding labels based on a specified feature index and value.
- This function is essential for recursively dividing the dataset as the tree grows, ensuring that each child node receives the correct subset of data.
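
A sketch of the split described above. The parameter order and the return shape (matching rows and their labels) are assumptions, and the sample rows are made up for illustration.

```python
def divide_dataset(data, labels, feature_index, value):
    """Return the rows whose feature equals `value`, with their labels.

    Assumption: rows and labels are returned as two parallel lists.
    """
    subset_data, subset_labels = [], []
    for row, label in zip(data, labels):
        if row[feature_index] == value:
            subset_data.append(row)
            subset_labels.append(label)
    return subset_data, subset_labels


# Hypothetical records: [Education, Career]
data = [["Bachelors", "Engineer"],
        ["Masters", "Engineer"],
        ["Bachelors", "Teacher"]]
labels = ["Low", "High", "Low"]

rows, row_labels = divide_dataset(data, labels, 0, "Bachelors")
print(row_labels)  # ['Low', 'Low']
```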
**build_tree Function:**
- Checks for base cases (the dataset is empty or all instances belong to the same class).
- Finds the best feature to split on by calculating the information gain for every feature.
- Creates a new node based on the best feature and recursively builds a subtree for each subset of the dataset that results from splitting on that feature.
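
The steps above can be sketched end to end as follows. This is an illustrative reconstruction, not the repository's code: the signatures, the `children` list, the majority-vote fallback when no feature helps, the `predict` helper, and the sample data are all assumptions.

```python
import math
from collections import Counter


def calculate_entropy(labels):
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


class DecisionTreeNode:
    def __init__(self, feature_index=None, feature_value=None, label=None):
        self.feature_index = feature_index  # feature this node splits on
        self.feature_value = feature_value  # assumption: value on incoming branch
        self.label = label                  # set only on leaf nodes
        self.children = []


def divide_dataset(data, labels, feature_index, value):
    pairs = [(r, l) for r, l in zip(data, labels) if r[feature_index] == value]
    return [r for r, _ in pairs], [l for _, l in pairs]


def calculate_information_gain(data, labels, feature_index):
    # Parent entropy minus the weighted entropy of the branches.
    weighted = 0.0
    for value in set(row[feature_index] for row in data):
        _, sub_labels = divide_dataset(data, labels, feature_index, value)
        weighted += len(sub_labels) / len(labels) * calculate_entropy(sub_labels)
    return calculate_entropy(labels) - weighted


def build_tree(data, labels, feature_indices, branch_value=None):
    if not data:                      # base case: empty dataset
        return None
    if len(set(labels)) == 1:         # base case: a single class remains
        return DecisionTreeNode(feature_value=branch_value, label=labels[0])
    majority = Counter(labels).most_common(1)[0][0]
    if not feature_indices:           # assumption: fall back to majority vote
        return DecisionTreeNode(feature_value=branch_value, label=majority)
    best = max(feature_indices,
               key=lambda i: calculate_information_gain(data, labels, i))
    if calculate_information_gain(data, labels, best) <= 1e-12:
        return DecisionTreeNode(feature_value=branch_value, label=majority)
    node = DecisionTreeNode(feature_index=best, feature_value=branch_value)
    remaining = [i for i in feature_indices if i != best]
    for value in sorted(set(row[best] for row in data)):
        sub_data, sub_labels = divide_dataset(data, labels, best, value)
        node.children.append(build_tree(sub_data, sub_labels, remaining, value))
    return node


def predict(node, row):
    # Hypothetical helper: follow the branch that matches the row's value.
    while node.label is None:
        value = row[node.feature_index]
        node = next((c for c in node.children if c.feature_value == value), None)
        if node is None:
            return None               # unseen feature value
    return node.label


# Hypothetical records: [Education, Career, Years of Experience]
data = [["Bachelors", "Engineer", "<5"], ["Masters", "Engineer", ">=5"],
        ["Bachelors", "Teacher", "<5"], ["Masters", "Teacher", ">=5"]]
labels = ["Low", "High", "Low", "High"]
tree = build_tree(data, labels, [0, 1, 2])
print(predict(tree, ["Masters", "Teacher", "<5"]))  # High
```

In this toy data Education separates the classes perfectly, so the root splits on feature index 0 and both children are leaves.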

Error Handling and Validation

- Checking that data and labels are lists and that their lengths match.
- Ensuring the feature index is within the bounds of the dataset.
- Handling cases where the dataset is empty or labels/features are missing.
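
These checks could be gathered into a single helper, sketched below. The function name `validate_inputs` and the choice of exception types are hypothetical; the README only lists which conditions are checked.

```python
def validate_inputs(data, labels, feature_index):
    """Raise an exception if the inputs cannot be used to build a tree."""
    if not isinstance(data, list) or not isinstance(labels, list):
        raise TypeError("data and labels must both be lists")
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    if not data:
        raise ValueError("dataset is empty")
    # Assumption: every record has as many features as the first one.
    if not 0 <= feature_index < len(data[0]):
        raise IndexError("feature index is out of bounds")


def raises(exc_type, *args):
    # Small helper for the demonstration: True if the call raises exc_type.
    try:
        validate_inputs(*args)
    except exc_type:
        return True
    return False


print(raises(ValueError, [["Masters", "Engineer"]], [], 0))  # True
```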
