hengzhe-zhang / evolutionaryforest

An open source python library for automated feature engineering based on Genetic Programming

License: GNU Lesser General Public License v3.0

Makefile 0.19% Python 54.59% Jupyter Notebook 45.22%

genetic-programming automl automated-feature-engineering automated-machine-learning feature-engineering machine-learning data-science

evolutionaryforest's Issues

Multiprocessing

Evolutionary Forest version: 0.1.7
Python version: 3.6.0
Operating System: linux

Description

Hi there. Is parallel processing supported in this package? I noticed there is a parameter "n_process", but when I use a dataset with 50_0000 samples by 100 features, the program gets stuck and I don't know what is going on.

Question about the following errors

When I created the environment and ran the first simple example from Zhihu, I encountered the following errors. I hope the author can reply when he has time, thanks!

SKLearn Version Error

In Google Colab, the package does not work with the current version of scikit-learn (1.4.2).
I get this error: "ImportError: cannot import name '_check_fit_params' from 'sklearn.utils.validation' (/usr/lo"
Changing requirements.txt to scikit_learn==1.2.2 fixes it on Colab, but that pin is a problem when I run locally instead. Could you make it work with the latest version of sklearn?
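
One way a library can tolerate a private helper being renamed between releases is to try several candidate names at import time. The sketch below is a generic helper, not code from evolutionary_forest: `import_first` is a hypothetical name, and the idea that `_check_method_params` is the newer scikit-learn counterpart of `_check_fit_params` is an assumption to verify against the installed version (the two may also differ in signature).

```python
import importlib


def import_first(module_name, candidates):
    """Return the first attribute in `candidates` that exists in the module."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {candidates} found in {module_name}")


# Demonstrated on the standard library so the sketch runs anywhere.
sqrt = import_first("math", ["square_root", "sqrt"])
print(sqrt(9.0))  # 3.0

# Hypothetical usage for the error above (both names are assumptions):
# check_params = import_first(
#     "sklearn.utils.validation",
#     ["_check_fit_params", "_check_method_params"],
# )
```

Pinning scikit-learn, as described above, remains the simpler workaround when the library itself cannot be changed.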

How to add new operators?

Hi, is there a way to add custom operators?

feature_append raises an error

new_train = feature_append(r, x_train, list(code_importance_dict.keys())[:len(code_importance_dict.keys()) // 2], only_new_features=True)

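This line raises an error:
TypeError: () takes 10 positional arguments but 4638 were given

Background: I only want to use the feature engineering part, passing in a df and getting back a new_df. My workflow is written like this: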

r.fit(x, y)
ture_importance_dict = get_feature_importance(r)
code_importance_dict = get_feature_importance(r, simple_version=False)
new_x = feature_append(r, x, list(code_importance_dict.keys())[:len(code_importance_dict.keys())],only_new_features=True)
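
If the intent of the slicing above is to keep only the most important constructed features, the keys can be ranked by importance before slicing. This is a sketch on a plain dictionary. It assumes, without confirmation from the library, that the importance dictionary maps each feature expression to a numeric score. `top_fraction` is a hypothetical helper and the values in `demo` are invented for illustration. It does not address the TypeError itself.

```python
def top_fraction(importance, fraction=0.5):
    """Return the keys with the highest importance, keeping `fraction` of them."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Invented example values, not real output from evolutionary_forest.
demo = {"ARG0*ARG1": 0.40, "ARG2": 0.05, "sin(ARG3)": 0.30, "ARG4+ARG5": 0.25}
selected = top_fraction(demo, 0.5)
print(selected)  # ['ARG0*ARG1', 'sin(ARG3)']
```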

Cannot Pickle the final model

Evolutionary Forest version:
Python version:
Operating System:

Description

I tried to pickle the final model and save it to disk in Colab, but received the following error. I also tried the joblib and dill packages and received the same error.

What I Did

with open('test.pkl', 'wb') as file:  
    pickle.dump(r, file)

Exception:

PicklingError Traceback (most recent call last)
in ()
1 with open(f'{curr}.pkl', 'wb') as file:
----> 2 pickle.dump(r, file)

PicklingError: Can't pickle <function cxOnePoint_multiple_gene at 0x7f7adbcde320>: it's not the same object as evolutionary_forest.multigene_gp.cxOnePoint_multiple_gene
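
This message comes from the way pickle stores functions by reference: a function is saved as a module path plus a name, and pickling fails when that name no longer points to the same function object. Reloading a module after objects were already created (for example with notebook autoreload) is one common way to get there, though whether that is the cause here is an assumption. The sketch below reproduces the message with a hypothetical stand-in module named `demo_ops`; it does not use evolutionary_forest.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module (not evolutionary_forest).
demo_ops = types.ModuleType("demo_ops")
sys.modules["demo_ops"] = demo_ops


def make_crossover():
    """Create a fresh function object that claims to live in demo_ops."""
    def cx_one_point(a, b):
        return b, a
    cx_one_point.__module__ = "demo_ops"
    cx_one_point.__qualname__ = "cx_one_point"
    return cx_one_point


original = make_crossover()
demo_ops.cx_one_point = original
payload = pickle.dumps(original)  # works: the module attribute is this object

# Simulate a reload: the module attribute is rebound to a new function object.
demo_ops.cx_one_point = make_crossover()
try:
    pickle.dumps(original)
    stale_error = None
except pickle.PicklingError as exc:
    stale_error = str(exc)

print(stale_error)
```

If this is what happened, restarting the runtime and fitting the model again before pickling would be the first thing to try.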

ValueError: need at most 63 handles, got a sequence of length 66

Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Users\miniconda3\envs\lib\threading.py", line 973, in _bootstrap_inner
self.run()
File "C:\Users\miniconda3\envs\lib\threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\miniconda3\envs\lib\multiprocessing\pool.py", line 519, in _handle_workers
cls._wait_for_updates(current_sentinels, change_notifier)
File "C:\Users\miniconda3\envs\lib\multiprocessing\pool.py", line 499, in _wait_for_updates
wait(sentinels, timeout=timeout)
File "C:\Users\miniconda3\envs\lib\multiprocessing\connection.py", line 884, in wait
ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
File "C:\Users\miniconda3\envs\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 66
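
The traceback points at a Windows limit on how many handles can be waited on at once, which process pools hit when too many workers are requested. The Python documentation for `ProcessPoolExecutor` states that `max_workers` must be at most 61 on Windows. The sketch below caps a requested worker count accordingly. `safe_worker_count` is a hypothetical helper, and it assumes the error arose from a large `n_process` value such as the 64 used in the tutorial.

```python
import os
import sys

WINDOWS_MAX_WORKERS = 61  # limit documented for ProcessPoolExecutor on Windows


def safe_worker_count(requested, platform=sys.platform, cpus=None):
    """Cap the requested worker count by CPU count and the Windows limit."""
    cpus = cpus or os.cpu_count() or 1
    limit = min(requested, cpus)
    if platform == "win32":
        limit = min(limit, WINDOWS_MAX_WORKERS)
    return max(1, limit)


print(safe_worker_count(64, platform="win32", cpus=128))  # 61
```

The result could then be passed as `n_process` when constructing the regressor.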

AttributeError: 'NoneType' object has no attribute 'split'

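Evolutionary Forest version: latest
Python version: 3.9
Operating System: Windows

Hello,
I ran the example from your tutorial and it no longer works. Is it because one of my dependencies is at a different version from yours?
How can I get it to run? Is it possible without downgrading?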

AttributeError Traceback (most recent call last)
Cell In[2], line 10
7 from sklearn.metrics import r2_score
8 from sklearn.model_selection import train_test_split
---> 10 from evolutionary_forest.forest import EvolutionaryForestRegressor
12 random.seed(0)
13 np.random.seed(0)

File c:\Users\JOJO\anaconda3\lib\site-packages\evolutionary_forest\forest.py:22
20 from sklearn.base import RegressorMixin, BaseEstimator, ClassifierMixin, TransformerMixin
21 from sklearn.cluster import KMeans
---> 22 from sklearn.compose.tests.test_target import DummyTransformer
23 from sklearn.decomposition import PCA
24 from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor, RandomForestClassifier, \
25 RandomForestRegressor

File c:\Users\JOJO\anaconda3\lib\site-packages\sklearn\compose\tests\test_target.py:11
9 from sklearn.pipeline import Pipeline
10 from sklearn.preprocessing import FunctionTransformer, StandardScaler
---> 11 from sklearn.utils._testing import assert_allclose, assert_no_warnings
13 friedman = datasets.make_friedman1(random_state=0)
16 def test_transform_target_regressor_error():

File c:\Users\JOJO\anaconda3\lib\site-packages\sklearn\utils\_testing.py:371
368 skip_if_32bit = pytest.mark.skipif(_IS_32BIT, reason="skipped on 32bit platforms")
369 fails_if_pypy = pytest.mark.xfail(IS_PYPY, reason="not compatible with PyPy")
370 fails_if_unstable_openblas = pytest.mark.xfail(
--> 371 _in_unstable_openblas_configuration(),
372 reason="OpenBLAS is unstable for this configuration",
373 )
374 skip_if_no_parallel = pytest.mark.skipif(
375 not joblib.parallel.mp, reason="joblib is in serial mode"
376 )
377 skip_if_array_api_compat_not_configured = pytest.mark.skipif(
378 not ARRAY_API_COMPAT_FUNCTIONAL,
379 reason="requires array_api_compat installed and a new enough version of NumPy",
380 )

File c:\Users\JOJO\anaconda3\lib\site-packages\sklearn\utils\__init__.py:89, in _in_unstable_openblas_configuration()
86 import numpy # noqa
87 import scipy # noqa
---> 89 modules_info = threadpool_info()
91 open_blas_used = any(info["internal_api"] == "openblas" for info in modules_info)
92 if not open_blas_used:

File c:\Users\JOJO\anaconda3\lib\site-packages\sklearn\utils\fixes.py:85, in threadpool_info()
83 return controller.info()
84 else:
---> 85 return threadpoolctl.threadpool_info()

File c:\Users\JOJO\anaconda3\lib\site-packages\threadpoolctl.py:124, in threadpool_info()
107 @_format_docstring(USER_APIS=list(_ALL_USER_APIS),
108 INTERNAL_APIS=_ALL_INTERNAL_APIS)
109 def threadpool_info():
110 """Return the maximal number of threads for each detected library.
111
112 Return a list with all the supported modules that have been found. Each
(...)
122 In addition, each module may contain internal_api specific entries.
123 """
--> 124 return _ThreadpoolInfo(user_api=_ALL_USER_APIS).todicts()

File c:\Users\JOJO\anaconda3\lib\site-packages\threadpoolctl.py:340, in _ThreadpoolInfo.__init__(self, user_api, prefixes, modules)
337 self.user_api = [] if user_api is None else user_api
339 self.modules = []
--> 340 self._load_modules()
341 self._warn_if_incompatible_openmp()
342 else:

File c:\Users\JOJO\anaconda3\lib\site-packages\threadpoolctl.py:373, in _ThreadpoolInfo._load_modules(self)
371 self._find_modules_with_dyld()
372 elif sys.platform == "win32":
--> 373 self._find_modules_with_enum_process_module_ex()
374 else:
375 self._find_modules_with_dl_iterate_phdr()

File c:\Users\JOJO\anaconda3\lib\site-packages\threadpoolctl.py:485, in _ThreadpoolInfo._find_modules_with_enum_process_module_ex(self)
482 filepath = buf.value
484 # Store the module if it is supported and selected
--> 485 self._make_module_from_path(filepath)
486 finally:
487 kernel_32.CloseHandle(h_process)

File c:\Users\JOJO\anaconda3\lib\site-packages\threadpoolctl.py:515, in _ThreadpoolInfo._make_module_from_path(self, filepath)
513 if prefix in self.prefixes or user_api in self.user_api:
514 module_class = globals()[module_class]
--> 515 module = module_class(filepath, prefix, user_api, internal_api)
516 self.modules.append(module)

File c:\Users\JOJO\anaconda3\lib\site-packages\threadpoolctl.py:606, in _Module.__init__(self, filepath, prefix, user_api, internal_api)
604 self.internal_api = internal_api
605 self._dynlib = ctypes.CDLL(filepath, mode=_RTLD_NOLOAD)
--> 606 self.version = self.get_version()
607 self.num_threads = self.get_num_threads()
608 self._get_extra_info()

File c:\Users\JOJO\anaconda3\lib\site-packages\threadpoolctl.py:646, in _OpenBLASModule.get_version(self)
643 get_config = getattr(self._dynlib, "openblas_get_config",
644 lambda: None)
645 get_config.restype = ctypes.c_char_p
--> 646 config = get_config().split()
647 if config[0] == b"OpenBLAS":
648 return config[1].decode("utf-8")

AttributeError: 'NoneType' object has no attribute 'split'
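
The last frame shows `get_config()` returning None, after which `.split()` fails. The sketch below mirrors the parsing logic visible in lines 643-648 of that frame with a guard for the None case. It is an illustration of the failure mode, not the actual threadpoolctl code or fix; `parse_openblas_version` is a hypothetical name. Upgrading threadpoolctl may be the practical remedy, but that is an assumption.

```python
def parse_openblas_version(config):
    """Return the OpenBLAS version string, or None if it cannot be determined."""
    if config is None:
        # The library did not expose a config string; nothing to parse.
        return None
    parts = config.split()
    if len(parts) >= 2 and parts[0] == b"OpenBLAS":
        return parts[1].decode("utf-8")
    return None


print(parse_openblas_version(b"OpenBLAS 0.3.21 DYNAMIC_ARCH"))  # 0.3.21
print(parse_openblas_version(None))  # None
```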

Little performance improvement when testing on other datasets

Evolutionary Forest version: 0.1.7
Python version: 3.9.7
Operating System: Windows

Description

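We ran the example code on a new dataset, but in our experiments many models showed no improvement after applying Evolutionary Forest. According to the theory behind Evolutionary Forest, there should be an improvement. We hope you can help explain this. Thank you!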

Can the sampling method from smt be replaced?

It is too difficult to install.

dictionary changed size during iteration

Evolutionary Forest version:
Python version:
Operating System:

Description

I tried to run the example provided in the documentation in Google Colab, but I received the following error.

What I Did

import random

import numpy as np
#from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from sklearn.datasets import load_diabetes
from sklearn.ensemble import ExtraTreesRegressor, AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

from evolutionary_forest.forest import EvolutionaryForestRegressor
from evolutionary_forest.utils import get_feature_importance, plot_feature_importance, feature_append

random.seed(0)
np.random.seed(0)

# Load Dataset
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train Evolutionary Forest
r = EvolutionaryForestRegressor(max_height=3, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True, n_process=64)
r.fit(x_train, y_train)
print(r2_score(y_test, r.predict(x_test)))

and received the following traceback:

RuntimeError Traceback (most recent call last)
in ()
3 gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
4 base_learner='Random-DT', verbose=True, n_process=64)
----> 5 r.fit(x_train, y_train)
6 print(r2_score(y_test, r.predict(x_test)))

2 frames
/usr/local/lib/python3.7/dist-packages/evolutionary_forest/forest.py in fit(self, X, y)
585
586 pop, log = self.eaSimple(self.pop, self.toolbox, self.cross_pb, self.mutation_pb, self.n_gen,
--> 587 stats=mstats, halloffame=self.hof, verbose=self.verbose)
588 self.pop = pop
589

/usr/local/lib/python3.7/dist-packages/evolutionary_forest/forest.py in eaSimple(self, population, toolbox, cxpb, mutpb, ngen, stats, halloffame, verbose)
842
843 record = stats.compile(population) if stats else {}
--> 844 logbook.record(gen=0, nevals=len(invalid_ind), **record)
845 if verbose:
846 print(logbook.stream)

/usr/local/lib/python3.7/dist-packages/deap/tools/support.py in record(self, **infos)
336 """
337 apply_to_all = {k: v for k, v in infos.items() if not isinstance(v, dict)}
--> 338 for key, value in infos.items():
339 if isinstance(value, dict):
340 chapter_infos = value.copy()

RuntimeError: dictionary changed size during iteration
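
The error in the last frame is Python's general protection against adding or removing keys while a dictionary is being iterated. The sketch below reproduces it on a small record and shows the usual remedy of iterating over a snapshot. It is a standalone illustration under assumed data, not DEAP's actual code; `split_record` and `split_record_broken` are hypothetical names.

```python
def split_record_broken(infos):
    """Removes nested dicts while iterating over the live view (raises)."""
    for key, value in infos.items():
        if isinstance(value, dict):
            del infos[key]


def split_record(infos):
    """Separates plain values from nested dicts using a snapshot of the items."""
    plain, chapters = {}, {}
    for key, value in list(infos.items()):  # list() copies the items first
        if isinstance(value, dict):
            chapters[key] = infos.pop(key)
        else:
            plain[key] = value
    return plain, chapters


try:
    split_record_broken({"fitness": {"avg": 0.5}, "gen": 0, "nevals": 200})
    message = None
except RuntimeError as exc:
    message = str(exc)

print(message)
print(split_record({"fitness": {"avg": 0.5}, "gen": 0, "nevals": 200}))
```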

Classification

Can your program be adapted to generate features for classification problems (like Titanic for example)?

PicklingError: Can't pickle <class 'deap.gp.rand101'>: it's not found as deap.gp.rand101

Evolutionary Forest version: latest from pip
Python version:3.8
Operating System:ubuntu 18.04

Description

It seems something goes wrong when running your example code:

import random

import numpy as np
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from sklearn.datasets import load_diabetes
from sklearn.ensemble import ExtraTreesRegressor, AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

from evolutionary_forest.forest import EvolutionaryForestRegressor
from evolutionary_forest.utils import get_feature_importance, plot_feature_importance, feature_append

random.seed(0)
np.random.seed(0)

# Load Dataset
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train Random Forest
r = RandomForestRegressor()
r.fit(x_train, y_train)
print('随机森林R2分数', r2_score(y_test, r.predict(x_test)))

# Train Evolutionary Forest
r = EvolutionaryForestRegressor(max_height=3, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True, n_process=64)
r.fit(x_train, y_train)
print('演化森林R2分数', r2_score(y_test, r.predict(x_test)))

Error

PicklingError                             Traceback (most recent call last)
/tmp/ipykernel_24448/3027070399.py in <module>
     29                                 gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
     30                                 base_learner='Random-DT', verbose=True, n_process=64)
---> 31 r.fit(x_train, y_train)
     32 print('演化森林R2分数', r2_score(y_test, r.predict(x_test)))

~/anaconda3/lib/python3.9/site-packages/evolutionary_forest/forest.py in fit(self, X, y, test_X)
   2137         else:
   2138             # Not using gradient boosting mode
-> 2139             pop, log = self.eaSimple(self.pop, self.toolbox, self.cross_pb, self.mutation_pb, self.n_gen,
   2140                                      stats=mstats, halloffame=self.hof, verbose=self.verbose)
   2141             self.pop = pop

~/anaconda3/lib/python3.9/site-packages/evolutionary_forest/forest.py in eaSimple(self, population, toolbox, cxpb, mutpb, ngen, stats, halloffame, verbose)
   2548             invalid_ind = self.multiobjective_evaluation(toolbox, population)
   2549         else:
-> 2550             invalid_ind = self.population_evaluation(toolbox, population)
   2551         if self.environmental_selection == 'NSGA2-Mixup':
   2552             self.mixup_evaluation(self.toolbox, population)

~/anaconda3/lib/python3.9/site-packages/evolutionary_forest/forest.py in population_evaluation(self, toolbox, population)
   4385         # distribute tasks
   4386         if self.n_process > 1:
-> 4387             data = [next(f) for f in fitnesses]
   4388             results = list(self.pool.map(calculate_score, data))
   4389         else:

~/anaconda3/lib/python3.9/site-packages/evolutionary_forest/forest.py in <listcomp>(.0)
   4385         # distribute tasks
   4386         if self.n_process > 1:
-> 4387             data = [next(f) for f in fitnesses]
   4388             results = list(self.pool.map(calculate_score, data))
   4389         else:

~/anaconda3/lib/python3.9/site-packages/evolutionary_forest/forest.py in fitness_evaluation(self, individual)
    714         information: EvaluationResults
    715         if self.n_process > 1:
--> 716             y_pred, estimators, information = yield pipe, dill.dumps(genes, protocol=-1)
    717         else:
    718             y_pred, estimators, information = yield pipe, genes

~/anaconda3/lib/python3.9/site-packages/dill/_dill.py in dumps(obj, protocol, byref, fmode, recurse, **kwds)
    302     """
    303     file = StringIO()
--> 304     dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
    305     return file.getvalue()
    306 

~/anaconda3/lib/python3.9/site-packages/dill/_dill.py in dump(obj, file, protocol, byref, fmode, recurse, **kwds)
    274     _kwds = kwds.copy()
    275     _kwds.update(dict(byref=byref, fmode=fmode, recurse=recurse))
--> 276     Pickler(file, protocol, **_kwds).dump(obj)
    277     return
    278 

~/anaconda3/lib/python3.9/site-packages/dill/_dill.py in dump(self, obj)
    496             raise PicklingError(msg)
    497         else:
--> 498             StockPickler.dump(self, obj)
    499         stack.clear()  # clear record of 'recursion-sensitive' pickled objects
    500         return

~/anaconda3/lib/python3.9/pickle.py in dump(self, obj)
    485         if self.proto >= 4:
    486             self.framer.start_framing()
--> 487         self.save(obj)
    488         self.write(STOP)
    489         self.framer.end_framing()

~/anaconda3/lib/python3.9/pickle.py in save(self, obj, save_persistent_id)
    558             f = self.dispatch.get(t)
    559             if f is not None:
--> 560                 f(self, obj)  # Call unbound method with explicit self
    561                 return
    562 

~/anaconda3/lib/python3.9/pickle.py in save_list(self, obj)
    929 
    930         self.memoize(obj)
--> 931         self._batch_appends(obj)
    932 
    933     dispatch[list] = save_list

~/anaconda3/lib/python3.9/pickle.py in _batch_appends(self, items)
    953                 write(MARK)
    954                 for x in tmp:
--> 955                     save(x)
    956                 write(APPENDS)
    957             elif n:

~/anaconda3/lib/python3.9/pickle.py in save(self, obj, save_persistent_id)
    601 
    602         # Save the reduce() output and finally memoize the object
--> 603         self.save_reduce(obj=obj, *rv)
    604 
    605     def persistent_id(self, obj):

~/anaconda3/lib/python3.9/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, state_setter, obj)
    708 
    709         if listitems is not None:
--> 710             self._batch_appends(listitems)
    711 
    712         if dictitems is not None:

~/anaconda3/lib/python3.9/pickle.py in _batch_appends(self, items)
    953                 write(MARK)
    954                 for x in tmp:
--> 955                     save(x)
    956                 write(APPENDS)
    957             elif n:

~/anaconda3/lib/python3.9/pickle.py in save(self, obj, save_persistent_id)
    601 
    602         # Save the reduce() output and finally memoize the object
--> 603         self.save_reduce(obj=obj, *rv)
    604 
    605     def persistent_id(self, obj):

~/anaconda3/lib/python3.9/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, state_setter, obj)
    685                     "args[0] from __newobj__ args has the wrong class")
    686             args = args[1:]
--> 687             save(cls)
    688             save(args)
    689             write(NEWOBJ)

~/anaconda3/lib/python3.9/pickle.py in save(self, obj, save_persistent_id)
    558             f = self.dispatch.get(t)
    559             if f is not None:
--> 560                 f(self, obj)  # Call unbound method with explicit self
    561                 return
    562 

~/anaconda3/lib/python3.9/site-packages/dill/_dill.py in save_type(pickler, obj)
   1437        #print ("%s\n%s" % (obj.__bases__, obj.__dict__))
   1438         name = getattr(obj, '__qualname__', getattr(obj, '__name__', None))
-> 1439         StockPickler.save_global(pickler, obj, name=name)
   1440         log.info("# T4")
   1441     return

~/anaconda3/lib/python3.9/pickle.py in save_global(self, obj, name)
   1068             obj2, parent = _getattribute(module, name)
   1069         except (ImportError, KeyError, AttributeError):
-> 1070             raise PicklingError(
   1071                 "Can't pickle %r: it's not found as %s.%s" %
   1072                 (obj, module_name, name)) from None

PicklingError: Can't pickle <class 'deap.gp.rand101'>: it's not found as deap.gp.rand101
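
The "it's not found as" form of the error means pickle tried to look the class up by name in the module it claims to belong to and found nothing there. That can happen with classes created at runtime. The sketch below reproduces this with a hypothetical stand-in module named `demo_gp` and shows that binding the class under the expected name makes it picklable. It does not use DEAP, and whether DEAP creates `rand101` this way in the reporter's version is an assumption.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module (not deap.gp).
demo_gp = types.ModuleType("demo_gp")
sys.modules["demo_gp"] = demo_gp

# A class built at runtime that claims to live in demo_gp but is not bound there.
Rand101 = type("rand101", (object,), {"__module__": "demo_gp"})

try:
    pickle.dumps(Rand101)
    lookup_error = None
except pickle.PicklingError as exc:
    lookup_error = str(exc)

# Binding the class under the expected name makes the lookup succeed.
setattr(demo_gp, "rand101", Rand101)
restored = pickle.loads(pickle.dumps(Rand101))

print(lookup_error)
print(restored is Rand101)  # True
```

With multiprocessing, the same binding would also have to exist in each worker process, so setting `n_process=1` is a simple way to check whether the error is tied to parallel evaluation.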

Problems with the version of the dependency package

Evolutionary Forest version:0.1.0
Python version:3.10.13
Operating System:Windows

Description

I'm installing the author's library, but the version requirements for some of its dependencies are not stated, and I'm getting errors when importing the package.

What I Did

At the moment I'm only importing the package, and the import itself raises an error.

Other installation versions
tpot 0.12.2
scikit-learn 1.4.1.post1

from evolutionary_forest.forest import EvolutionaryForestRegressor

ImportError                               Traceback (most recent call last)
Cell In[13], line 1
----> 1 from evolutionary_forest.forest import EvolutionaryForestRegressor

File E:\Users\taibai\anaconda3\envs\pytorch\lib\site-packages\evolutionary_forest\forest.py:42
     39 from sympy import parse_expr
     40 from tpot import TPOTClassifier, TPOTRegressor
---> 42 from evolutionary_forest.component.archive import *
     43 from evolutionary_forest.component.archive import DREPHallOfFame, NoveltyHallOfFame, OOBHallOfFame, BootstrapHallOfFame
     44 from evolutionary_forest.component.configuration import CrossoverMode, ArchiveConfiguration, ImbalancedConfiguration, \
     45     EvaluationConfiguration, check_semantic_based_bc, BloatControlConfiguration, SelectionMode, \
     46     BaseLearnerConfiguration

File E:\Users\taibai\anaconda3\envs\pytorch\lib\site-packages\evolutionary_forest\component\archive.py:17
     14 from sklearn.tree import DecisionTreeRegressor
     16 from evolutionary_forest.component.configuration import ArchiveConfiguration
---> 17 from evolutionary_forest.component.evaluation import quick_result_calculation
     18 from evolutionary_forest.component.primitives import individual_to_tuple
     19 from evolutionary_forest.component.subset_selection import EnsembleSelectionADE

File E:\Users\taibai\anaconda3\envs\pytorch\lib\site-packages\evolutionary_forest\component\evaluation.py:32
     30 from evolutionary_forest.component.configuration import EvaluationConfiguration, ImbalancedConfiguration
     31 from evolutionary_forest.multigene_gp import result_post_process, MultiplePrimitiveSet, quick_fill, GPPipeline
---> 32 from evolutionary_forest.sklearn_utils import cross_val_predict
     33 from evolutionary_forest.utils import reset_random
     35 np.seterr(invalid='ignore')

File E:\Users\taibai\anaconda3\envs\pytorch\lib\site-packages\evolutionary_forest\sklearn_utils.py:10
      8 from sklearn.utils import (indexable)
      9 from sklearn.utils.metaestimators import _safe_split
---> 10 from sklearn.utils.validation import _check_fit_params
     11 from sklearn.utils.validation import _deprecate_positional_args
     12 from sklearn.utils.validation import _num_samples

ImportError: cannot import name '_check_fit_params' from 'sklearn.utils.validation' (E:\Users\taibai\anaconda3\envs\pytorch\lib\site-packages\sklearn\utils\validation.py)
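
When reporting dependency problems like this one, it helps to list the exact installed versions. The sketch below uses the standard-library `importlib.metadata` to collect them. `installed_versions` is a hypothetical helper, and the distribution names passed to it are simply the ones mentioned in this report.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["scikit-learn", "tpot", "evolutionary_forest"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```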

Updating Requirements.txt

Evolutionary Forest version: 1.0
Python version: 3.8
Operating System: WSL

Description

Should requirements.txt be updated?
I did manage to run the example notebook, but had to manually pip install gplearn, scorch, umap-learn, and category_encoders.

**Also, are all the packages required/essential (as in, they aren't optional or relics of previous exploration)?**
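
A quick way to see which of these packages are still missing before running the notebook is to check whether their import names can be found. This is a sketch using the standard library only. `missing_modules` is a hypothetical helper, and note that import names can differ from pip names (umap-learn is imported as `umap`).

```python
from importlib import util


def missing_modules(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if util.find_spec(name) is None]


print(missing_modules(["gplearn", "umap", "category_encoders"]))
```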

Background

I was attempting to set up a blank environment for EvolutionaryForest with mamba but ran into various issues with Python package dependencies.

Environment setup

The final environment.yaml that I used.

name: evolutionforest

channels:
  - conda-forge

dependencies:
  - jupyter-packaging
  - jupyterlab
  - nodejs
  - pytest
  - pytest-check-links
  - python=3.9
  - yarn
  - seaborn
  - numpy
  - pandas
  - scikit-learn
  - scipy
  - pytorch
  - pip
  - pip:
    - gplearn
    - shap
    - pyade
    - scorch
    - umap-learn
    - category_encoders
    - evolutionary_forest

And also, on WSL Ubuntu:

sudo apt-get install gcc
sudo apt-get install g++

FINAL Rules of Evolutionary Forest

Evolutionary Forest version:
Python version:3.8
Operating System:

Description

How can I print the final expression of Evolutionary Forest (if any), as in PSTree or gplearn?

Thanks in advance

get_feature_importance failing when there are more features - likely issue with latex parsing

Description

Once I have more than a certain number of features, the LaTeX parsing typically fails during get_feature_importance (while `get_feature_importance(r, simple_version=True)` still works).

There are several types of error (listing screenshots of what I got).

It seems to be an issue with parsing the lambda operations into math symbols: sometimes it misses a feature name, and sometimes it runs into issues with another lambda description.

Is there a feature naming convention I should follow to avoid these?

Code

To reproduce with example code (the tutorial code modified to use more features):

import random
import string
import pandas as pd
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from evolutionary_forest.forest import EvolutionaryForestRegressor

random.seed(0)
np.random.seed(0)

# Generate dataset
X, y = make_friedman1(n_samples=500, n_features=17, random_state=0)

# Convert numpy arrays to pandas dataframe
X = pd.DataFrame(X, columns=list(string.ascii_uppercase[:X.shape[1]]))
y = pd.DataFrame(y, columns=['Target'])

# Split dataset
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train Evolutionary Forest
r = EvolutionaryForestRegressor(max_height=5, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True, n_process=1)
r.fit(x_train, y_train)

from evolutionary_forest.utils import get_feature_importance, plot_feature_importance

code_importance_dict = get_feature_importance(r)

Saving the model

Evolutionary Forest version: 0.2.3
Python version: 3.11.4
Operating System: Windows

Description

Hi, do you know how I could save a model that was fitted with EvolutionaryForestRegressor?
I don't see a save or dump method, and pickle currently gives me an error:
AttributeError: Can't pickle local object 'EvolutionaryForestRegressor.primitive_initialization..'

Thank you in advance.
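
The "local object" error is how the standard pickle module reports functions defined inside another function or method, such as lambdas, because they cannot be looked up by name at load time. The sketch below reproduces this with a hypothetical `Primitives` class and shows a picklable alternative built from an importable function. It illustrates the mechanism only and is not the library's code.

```python
import operator
import pickle
from functools import partial


class Primitives:
    def build_local(self):
        # A lambda created inside a method is a "local object" to pickle.
        return [lambda x: x + 1]

    def build_picklable(self):
        # Same behaviour, built from a function pickle can find by name.
        return [partial(operator.add, 1)]


try:
    pickle.dumps(Primitives().build_local())
    local_error = None
except (AttributeError, pickle.PicklingError) as exc:
    local_error = exc

restored = pickle.loads(pickle.dumps(Primitives().build_picklable()))
print(local_error)
print(restored[0](2))  # 3
```

Serializers such as dill or cloudpickle can handle many local functions, although another report in this tracker describes dill failing on a different object.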

Catboost Error

FYI, you left CatBoost off of your requirements.txt file, so it then gives an error trying to import it.
