The input dataset and the parameters passed to featurewiz are the same for both projects. I stepped through both versions of featurewiz, and their results diverge starting from the computation of the correlation matrix at the beginning of FE_remove_variables_using_SULOV_method. I also noticed that, in the new version, the target label is modified by mlb = My_LabelEncoder() and dataname[each_target] = mlb.fit_transform(dataname[each_target]) before SULOV is called; this did not happen in the previous version.
Could you clarify which main differences were introduced in the new version? As you can imagine, such a large difference between the outputs of the two versions is concerning.
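To narrow this down, here is a minimal sketch of the check I have in mind. It does not call featurewiz: the toy data, the helper names `encode_target` and `max_corr_difference`, and the use of `pandas.factorize` as a stand-in for My_LabelEncoder are all assumptions made for illustration. The idea is that encoding the target cannot change a correlation matrix computed over the predictors alone, but it can change which columns count as numeric, so a string target that was previously left out could enter the matrix once it is encoded.

```python
import numpy as np
import pandas as pd


def encode_target(df, target):
    """Return a copy of df with the target mapped to integer codes.

    Stand-in for the label-encoding step; uses pandas.factorize,
    not featurewiz's My_LabelEncoder.
    """
    out = df.copy()
    codes, _ = pd.factorize(out[target], sort=True)
    out[target] = codes
    return out


def max_corr_difference(df_a, df_b, predictors):
    """Largest absolute difference between the predictor correlation matrices."""
    corr_a = df_a[predictors].corr().abs()
    corr_b = df_b[predictors].corr().abs()
    return float((corr_a - corr_b).abs().max().max())


# Toy data (hypothetical): 38 rows, 5 numeric predictors, a string target.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(38, 5)),
                   columns=[f"x{i}" for i in range(1, 6)])
toy["target"] = rng.choice(["neg", "pos"], size=38)
predictors = [c for c in toy.columns if c != "target"]

encoded = encode_target(toy, "target")

# Predictors only: encoding the target leaves the matrix unchanged.
diff = max_corr_difference(toy, encoded, predictors)

# All numeric columns: the encoded target now enters the matrix.
shape_before = toy.select_dtypes(include="number").corr().shape
shape_after = encoded.select_dtypes(include="number").corr().shape

print(diff, shape_before, shape_after)
```

If the same comparison on my real data shows identical predictor-only matrices in both versions, the divergence would have to come from the target being included in the matrix or from a change in how the correlation itself is computed, rather than from the data.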
Imported featurewiz: advanced feature engg and selection library. Version=0.0.38
output = featurewiz(dataname, target, corr_limit=0.70,
verbose=2, sep=',', header=0, test_data='',
feature_engg='', category_encoders='')
Create new features via 'feature_engg' flag : ['interactions','groupby','target']
Skipping feature engineering since no feature_engg input...
Skipping category encoding since no category encoders specified in input...
Shape of your Data Set loaded: (38, 3385)
Filename is an empty string or file not able to be loaded
############## C L A S S I F Y I N G V A R I A B L E S ####################
Classifying variables in data set...
3384 Predictors classified...
2022-07-26 00:03:54.391147: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
142 variable(s) will be ignored since they are ID or low-information variables
Shape of your Data Set loaded: (38, 3385)
Number of processors on machine = 1
No GPU active on this device
Running XGBoost using CPU parameters
############## C L A S S I F Y I N G V A R I A B L E S ####################
Classifying variables in data set...
3384 Predictors classified...
142 variable(s) will be ignored since they are ID or low-information variables
Removing 142 columns from further processing since ID or low information variables
columns removed: ['x427', 'x433', 'x439', 'x771', 'x777', 'x783', 'x825', 'x831', 'x837', 'x850', 'x856', 'x862', 'x1216', 'x1228', 'x1240', 'x1248', 'x1254', 'x1260', 'x1273', 'x1279', 'x1285', 'x1289', 'x1301', 'x1313', 'x1617', 'x1623', 'x1629', 'x1633', 'x1639', 'x1645', 'x1651', 'x1657', 'x1663', 'x1671', 'x1677', 'x1683', 'x1696', 'x1702', 'x1708', 'x1712', 'x1724', 'x1736', 'x2056', 'x2062', 'x2068', 'x2074', 'x2080', 'x2086', 'x2094', 'x2100', 'x2106', 'x2119', 'x2125', 'x2131', 'x2135', 'x2147', 'x2159', 'x2479', 'x2491', 'x2503', 'x2517', 'x2523', 'x2529', 'x2542', 'x2548', 'x2554', 'x2558', 'x2570', 'x2582', 'x2965', 'x2971', 'x2977', 'x2981', 'x2993', 'x3005', 'x3325', 'x3337', 'x3349', 'x3363', 'x3369', 'x3375', 'x490', 'x495', 'x497', 'x502', 'x503', 'x507', 'x844', 'x913', 'x918', 'x920', 'x924', 'x925', 'x926', 'x936', 'x1267', 'x1268', 'x1330', 'x1332', 'x1334', 'x1336', 'x1341', 'x1343', 'x1347', 'x1348', 'x1349', 'x1351', 'x1360', 'x1361', 'x1753', 'x1755', 'x1757', 'x1759', 'x1764', 'x1766', 'x1770', 'x1771', 'x1772', 'x1778', 'x2176', 'x2178', 'x2180', 'x2182', 'x2187', 'x2189', 'x2194', 'x2195', 'x2207', 'x2599', 'x2601', 'x2603', 'x2605', 'x2610', 'x2616', 'x2617', 'x2959', 'x3022', 'x3024', 'x3026', 'x3028', 'x3039', 'x3046']
After removing redundant variables from further processing, features left = 3242
Single_Label Binary_Classification Feature Selection Started
Searching for highly correlated variables from 3242 variables using SULOV method
SULOV : Searching for Uncorrelated List Of Variables (takes time...)
Removing (3224) highly correlated variables:
Following (18) vars selected: ['x2', 'x10', 'x26', 'x69', 'x87', 'x129', 'x187', 'x417', 'x496', 'x554', 'x608', 'x975', 'x1033', 'x1156', 'x1765', 'x2134', 'x2910', 'x3176']