rivolli / mfe
Meta-Feature Extractor
License: Other
When the dataset has a column named '0', there is a problem in the landmarking,
specifically in the function replace.nominal.columns.
TODO: attribute_entropy, class_entropy is the normalized version
TODO: irrelevant.attributes
TODO: describe discretization method
If one of the landmarkers fails, an error is raised and the function mfe::mf.landmarking() returns no result.
In my case, I debugged the function call, and 6 different values should be returned. However, it failed on the fourth one (knn) due to some characteristics of the dataset. In theory, the first three algorithms worked, but because of the error, none of the previous results is returned.
I handled that with a tryCatch statement, imputing NAs. However, the whole vector then contains NA values, whereas only the descriptor that failed should be NA:
$landmarking
decision.stumps elite.nearest.neighbor linear.discriminant
NA NA NA
naive.bayes nearest.neighbor worst.node
NA NA NA
Is there a smarter way to handle this on my side? Calling the functions individually?
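One way to keep the results of the landmarkers that did succeed is to wrap each call in its own tryCatch, so that only the failing descriptor becomes NA. The following is a minimal sketch of that pattern; the three functions in `landmarkers` are hypothetical stand-ins written for this example and are not the mfe implementations, and `safe_landmarking` is an illustrative name.

```r
# Hypothetical stand-ins for individual landmarkers (NOT mfe functions).
landmarkers <- list(
  decision.stumps  = function(x, y) 0.61,
  nearest.neighbor = function(x, y) stop("knn failed on this dataset"),
  naive.bayes      = function(x, y) 0.72
)

# Run every landmarker on its own; a failure yields NA for that entry only.
safe_landmarking <- function(x, y, fns) {
  vapply(fns, function(f) {
    tryCatch(f(x, y), error = function(e) NA_real_)
  }, numeric(1))
}

res <- safe_landmarking(iris[, 1:4], iris$Species, landmarkers)
print(res)
```

With this layout the named vector still has one entry per landmarker, and only `nearest.neighbor` is NA.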
The mfe throws an error when the dataset has fewer instances in one class than the number of folds.
I remember that we solved this problem by using min(10, min(table(y))).
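As a small illustration of that fix, the number of folds can be capped by the size of the smallest class. This is only a sketch on a toy target vector; the variable names are chosen for the example.

```r
# Toy target: class "b" has only 4 instances.
y <- factor(c(rep("a", 50), rep("b", 4)))

# Never use more folds than the smallest class has instances.
folds <- min(10, min(table(y)))
print(folds)
```

Here `folds` is 4 instead of 10, so every fold can still contain an instance of class "b".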
If the dataset target attribute has a level with no samples, the measure raises an exception and stops the code unexpectedly.
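A possible workaround on the caller's side, sketched here on a toy factor, is to remove unused levels with base R's droplevels() before extracting the measures. Whether mfe should do this internally is a separate design decision.

```r
# Level "c" is declared but has no samples.
y <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
counts <- table(y)

# droplevels() removes the levels that do not occur in the data.
y_clean <- droplevels(y)
print(levels(y_clean))
```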
Using quantile to summarize data can generate problems when NA is produced in the measures
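The behaviour can be reproduced with base R alone: quantile() stops with an error when its input contains NA unless na.rm = TRUE is set. The sketch below uses a made-up vector of measure values.

```r
# Made-up measure values with one NA.
measures <- c(0.2, 0.5, NA, 0.9)

# Without na.rm = TRUE, quantile() raises an error; capture it as NULL.
q_raw <- tryCatch(quantile(measures), error = function(e) NULL)

# With na.rm = TRUE, the NA is ignored and five quantiles are returned.
q_ok <- quantile(measures, na.rm = TRUE)
print(q_ok)
```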
It would be better if the summarization function range worked like iqr and returned a single value.
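For illustration, base R's range() returns two values (minimum and maximum), while IQR() returns one. A single-valued variant could return the width of the range, as in the sketch below; `range_width` is a hypothetical name used only for this example, not an mfe function.

```r
values <- c(1, 3, NA, 6)

# range() returns c(min, max); IQR() returns a single number.
two_values <- range(values, na.rm = TRUE)
one_value  <- IQR(values, na.rm = TRUE)

# Hypothetical single-valued alternative: max minus min.
range_width <- function(v, na.rm = TRUE) diff(range(v, na.rm = na.rm))
width <- range_width(values)
print(width)
```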
Landmarking breaks on this dataset; after I rename the columns, it works.
colnames(xdata)
[1] "-" "/" "0"
[4] "00" "04" "0;"
[7] "0cm" "1" "1-1/2"
[10] "1-1/2-year" "1-month" "1-year"
[13] "1-year-1-month" "1-year-4-month" "1-year-5-month"
[16] "1/2" "1/2/01" "10"
[19] "10-month" "10-year" "101"
[22] "102" "103" "104"
[25] "105" "11" "11-1/2-year"
[28] "11-month" "11-month-16-day" "11-year"
[31] "11-year-5-month" "12" "12-day"
[34] "12-month" "12-year" "12-year-7-month"
[37] "13" "13-1/2-year" "13-day"
[40] "13-month" "13-year" "134"
[43] "14" "14-day" "14-year"
[46] "15-month" "15-year" "15-year-6-month"
[49] "16" "16-month" "16-year"
[52] "17-day" "17-year" "18"
[55] "18-month" "18-year" "19-month"
[58] "19-year" "2" "2-1/2-year"
[61] "2-3" "2-month" "2-year"
[64] "2-year-3-month" "2-year-4-month" "2-year-9-month"
[67] "20-day" "20-year" "2001"
[70] "21" "21-day" "21-month"
[73] "21-year" "22" "22-day"
[76] "27" "28" "2;"
[79] "2nd" "3" "3-4"
[82] "3-month" "3-year" "3-year-10-month"
[85] "3-year-5-month" "38" "3mm"
[88] "3rd" "4" "4-1/2-year"
[91] "4-month" "4-year" "4-year-11-month"
[94] "4-year-3-month" "4-year-4-month" "4-year-8-month"
[97] "5" "5-1/2-year" "5-6"
[100] "5-month" "5-month-19-day" "5-year"
[103] "591" "5th" "6"
[106] "6-month" "6-month-6-day" "6-year"
[109] "6-year-11-month" "6-year-4-month" "7"
[112] "7-1/2-year" "7-month" "7-year"
[115] "7-year-4-month" "8" "8-month"
[118] "8-month-13-day" "8-year" "8-year-2-month"
[121] "8-year-9-month" "85-month" "9"
[124] "9-day" "9-month" "9-month-24-day"
[127] "9-month-30-day" "9-year" "9-year-1-month"
[130] "90s" "90th" "9;"
[133] "abdomen" "abdominal" "abnormal"
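Until this is fixed in the package, renaming the columns to syntactically valid, unique names is one possible workaround. A minimal sketch using base R's make.names() on a few of the names listed above; the small data frame `xdata` is built only for this example.

```r
# A few of the problematic column names from the listing above.
xdata <- data.frame(1:3, 4:6, 7:9, 10:12, check.names = FALSE)
colnames(xdata) <- c("-", "/", "0", "1-1/2")

# make.names() produces syntactically valid names; unique = TRUE avoids clashes.
clean <- make.names(colnames(xdata), unique = TRUE)
colnames(xdata) <- clean
print(colnames(xdata))
```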
Add drop=FALSE to all modFrame to avoid error with datasets with only one predictive attribute
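The underlying R behaviour is that selecting a single column from a data frame returns a plain vector unless drop = FALSE is given. A short sketch with a one-attribute data frame:

```r
# Dataset with a single predictive attribute.
x <- data.frame(a = c(1, 2, 3))

dropped <- x[, 1]                # collapses to a numeric vector
kept    <- x[, 1, drop = FALSE]  # stays a one-column data frame

print(is.data.frame(dropped))
print(is.data.frame(kept))
```

Code that later calls ncol() or colnames() on the result only works with the second form.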
leavesPerClass - the number of leaves for each class
Imbalance [Bensusan2000]
TODO: alternatives: mean by attribute or class
TODO: index of dispersion
TODO: z-score, normal cumulative distribution, Chi-square test
TODO: degree of fuzziness
Thank you for this awesome library.
Could you please include the computation time per meta-feature? :) It comes in handy. :)
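Until such an option exists, timing can be done from the outside with base R's system.time(). The sketch below wraps two hypothetical meta-feature functions defined only for this example; they are not part of the mfe API.

```r
# Hypothetical meta-feature functions (stand-ins, NOT mfe functions).
meta_features <- list(
  n.instances = function(x, y) nrow(x),
  n.classes   = function(x, y) nlevels(y)
)

# Record the value and the elapsed time (in seconds) for each one.
timed <- lapply(meta_features, function(f) {
  elapsed <- system.time(value <- f(iris[, 1:4], iris$Species))[["elapsed"]]
  list(value = value, seconds = elapsed)
})
str(timed)
```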
Density = N / (F * C)
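The issue does not define the symbols. Assuming, for illustration only, that N is the number of instances, F the number of predictive attributes, and C the number of classes, the formula could be computed as in this sketch on the iris data.

```r
# ASSUMPTION: N = instances, F = predictive attributes, C = classes.
x <- iris[, 1:4]
y <- iris$Species

n_instances <- nrow(x)
n_features  <- ncol(x)
n_classes   <- nlevels(y)

density <- n_instances / (n_features * n_classes)
print(density)
```

Under these assumptions iris gives 150 / (4 * 3) = 12.5.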
From "The minimum ratio of preserving the dataset similarity in resampling: (1 − 1/e)"
It was reported to me that when the dataset has factor attributes with a single level, the code breaks.
Create a test to assess and fix this issue.
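A starting point for such a test could be a small helper that finds the factor columns with fewer than two observed levels. This is a sketch; `single_level_factors` is a hypothetical helper name and the data frame is made up for the example.

```r
# Hypothetical helper: names of factor columns with fewer than two observed levels.
single_level_factors <- function(df) {
  is_const <- vapply(df, function(col) {
    is.factor(col) && nlevels(droplevels(col)) < 2
  }, logical(1))
  names(df)[is_const]
}

# Made-up data: "colour" has a single level, "size" has two.
toy <- data.frame(
  colour = factor(c("red", "red", "red")),
  size   = factor(c("s", "m", "s")),
  weight = c(1.2, 3.4, 5.6)
)
print(single_level_factors(toy))
```

A regression test could then pass `toy` to the extraction functions and check that they either return a result or fail with a clear message.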