I have implemented a k-nearest neighbor (KNN) algorithm from scratch to classify different glass types. I have also extended the implementation to weighted KNN.
The dataset consists of 214 samples across 6 discrete class types. (Type 4 has no samples in the dataset, so the classes are {"1", "2", "3", "5", "6", "7"}.)
- RI: refractive index
- Na: Sodium
- Mg: Magnesium
- Al: Aluminum
- Si: Silicon
- K: Potassium
- Ca: Calcium
- Ba: Barium
- Fe: Iron
- Type: type of glass (class attribute)
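Given the layout above (an ID column, the 9 features, then the class label), loading the data might look like the following sketch. The file name and delimiter are assumptions about how the dataset is distributed, not the author's exact code.

```python
import numpy as np

def load_glass(path="glass.data"):
    # Assumed layout: ID, RI, Na, Mg, Al, Si, K, Ca, Ba, Fe, Type
    raw = np.loadtxt(path, delimiter=",")
    X = raw[:, 1:-1]            # drop the ID column, keep the 9 features
    y = raw[:, -1].astype(int)  # class labels {1, 2, 3, 5, 6, 7}
    return X, y
```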
- Feature normalization
- Shuffle all data using numpy permutations
- Cross-validation over the dataset
- Implemented a nearest neighbor (KNN) algorithm
- Implemented a weighted nearest neighbor algorithm
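The steps above can be sketched as follows. This is a minimal illustration, assuming min-max normalization, Euclidean distance, and inverse-distance weighting; function names and default parameters are mine, not necessarily the author's.

```python
import numpy as np

def normalize(X):
    # Min-max scale each feature to [0, 1]
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def knn_predict(X_train, y_train, x, k=3, weighted=False):
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances
    idx = np.argsort(d)[:k]                        # k nearest neighbors
    votes = {}
    for i in idx:
        # Plain KNN: each neighbor votes once; weighted KNN: inverse-distance vote
        w = 1.0 / (d[i] + 1e-8) if weighted else 1.0
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

def cross_validate(X, y, k=3, folds=5, weighted=False, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))       # shuffle before splitting into folds
    splits = np.array_split(perm, folds)
    accs = []
    for f in range(folds):
        test_idx = splits[f]
        train_idx = np.concatenate([splits[g] for g in range(folds) if g != f])
        preds = [knn_predict(X[train_idx], y[train_idx], x, k, weighted)
                 for x in X[test_idx]]
        accs.append(np.mean(np.array(preds) == y[test_idx]))
    return float(np.mean(accs))
```

Weighted KNN differs from plain KNN only in the vote accumulation: closer neighbors contribute more, which matters when the k neighbors span several classes.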
- Selecting K as 1 or 3 is the most suitable for this task.
- Weighted normalized KNN gives about 67% accuracy on average.
- The more data we have, the more accuracy improves.
- When we have less data, cross-validation is important.
- Without shuffling the data, accuracy drops; KNN does not generalize well.
- Learning happens when we shuffle the data, so that the folds contain diverse feature values.
- Accuracy may be improved by adding more data.
- Cross-validation increases training time; it is better not to use 5 folds, and 3 may be enough, since the extra folds do not improve overall accuracy that much.
- "Weighted non-normalized KNN"is the right implmentation for glass classification.
- Weighted Normalized KNN is not a good choice for this problem.
- Normalization data does not help much when the distance are too close