pySRURGS / pySRURGS
Symbolic regression by uniform random global search
License: GNU General Public License v3.0
For the JOSS review (openjournals/joss-reviews#1675), I created a very basic test case to try pySRURGS. This is my data:
x, y
1, 2
2, 3
3, 4
4, 5
5, 6
6, 7
Running
python pySRURGS.py -max_num_fit_params 1 -max_permitted_trees 2000 -funcs_arity_two=add,sub,mul,div -plotting ./csvs/test.csv 100
gives the following output:
Running in multi processor mode
105it [00:14, 7.06it/s]
Making sure we meet the iters value
Normalized Mean Squared Error    R^2       Equation, simplified                                         Parameters
-------------------------------  --------  -----------------------------------------------------------  ------------
0                                1         p0*(p0 - x**2)                                               1
6.68694e-17                      1         -(p0 + x**2*(x - 1))*(p0**2 - x**2)/(p0*x**4)                1.03e-08
0.00811055                       0.999165  (-p0**3*(x + 1) + x*(2*p0 + 1)*(p0**2 - 1))/(x*(p0**2 - 1))  0.57
0.0209102                        0.997802  (p0 + x)*(p0 - x + 1)/(x**2 + (p0 + x)*(2*p0 + x))           1.54
0.042286                         0.996303  p0*(x*(x - 3) + 1)/x                                         -0.381
Obviously, 1 - x**2 (the first equation with p0 = 1) is not a good model for this data, yet it still shows up as the best solution. However, the plot looks fine:
Am I doing something wrong, or is this a bug?
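The mismatch can be checked by hand. The sketch below evaluates the top-ranked equation on the data above with p0 = 1 and computes R^2 using the textbook definition 1 - SS_res/SS_tot. Whether pySRURGS computes its score the same way is an assumption, and the helper name r_squared is illustrative only, not part of pySRURGS.

```python
# Data from the test CSV above: y = x + 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 4, 5, 6, 7]


def r_squared(y_true, y_pred):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


p0 = 1  # parameter value reported for the top-ranked row
reported_best = [p0 * (p0 - x ** 2) for x in xs]  # p0*(p0 - x**2)
true_model = [x + 1 for x in xs]  # the relation the data was built from

r2_reported = r_squared(ys, reported_best)
r2_true = r_squared(ys, true_model)
print(r2_reported, r2_true)
```

Under this definition the reported equation scores far below zero while x + 1 scores exactly 1, so a row showing an error of 0 and an R^2 of 1 for p0*(p0 - x**2) cannot describe this data.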
Replace the code in test.py with code using the unittest module.
test_command_line_code shows that the tests exercise the command-line interface, but it gives no indication of the test results.
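As a rough sketch of what a unittest-based check of a command-line call could look like: the command below is a stand-in that only prints "ok", so the example runs anywhere. A real test would invoke pySRURGS.py with the arguments under test, and the class and method names here are hypothetical.

```python
import subprocess
import sys
import unittest

# Stand-in command so the sketch runs anywhere; a real test would call
# pySRURGS.py with the arguments under test instead.
COMMAND = [sys.executable, "-c", "print('ok')"]


class CommandLineTest(unittest.TestCase):
    def run_command(self):
        # Capture stdout/stderr so assertions can inspect them.
        return subprocess.run(COMMAND, capture_output=True, text=True)

    def test_exit_code_is_zero(self):
        self.assertEqual(self.run_command().returncode, 0)

    def test_output_is_as_expected(self):
        self.assertEqual(self.run_command().stdout.strip(), "ok")
```

Saved as test.py, this can be run with python -m unittest test, which reports every pass and failure explicitly.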
Have you considered uploading pySRURGS to PyPI? That would simplify the installation process (which is admittedly already quite simple). Even if you don't put it on PyPI, if you make pySRURGS a proper package (see for example here for instructions), it can simply be installed directly from GitHub with
pip install git+https://github.com/pySRURGS/pySRURGS.git
I'm trying to define the functions of arity one that are permitted in the symbolic regression run.
So I created an SRconfig object like this:
SR_config = SymbolicRegressionConfig(r'.\csv\mydataReal.csv', None, n_functions=['add', 'sub', 'mul', 'div', 'pow'], f_functions=['sin', 'sinh', 'log', 'cos', 'cosh', 'tan', 'tanh', 'exp', 'pow', 'sqrt'], max_num_fit_params=2, max_permitted_trees=200)
I just wanted to see a randomly generated equation, so I wrote the following lines:
(f, n, m, cum_weights, N, dataset, enumerator, _, _) = setup(SR_config)
eqn_str = random_equation_binary_tree(N, cum_weights, enumerator, SR_config)
simple_eqn = simplify_equation_string(eqn_str, dataset)
print(simple_eqn)
But in my terminal, the following expression was returned: (-p1 + x0 + x1 - x3)(p1 - x0 + x12)(p1**x5 - x1 + x2 + 2x4)
It seems that it doesn't create an expression using the arity-one functions I specified. Maybe I'm doing something wrong here.
Hello,
I used pySRURGS to search for the equation of best fit for my numerical dataset, and everything was going well until, just after "Running in multi processor mode" finished (100%), it raised the following error instead of displaying the equations with their R^2 values:
Traceback (most recent call last):
File "pySRURGS.py", line 2714, in <module>
plot_results(SRconfig)
File "pySRURGS.py", line 2204, in plot_results
best_model = result_list._results[0]
IndexError: list index out of range
How may I fix this problem?
Please find attached all the optional arguments that I considered.
Regards,
Loïc
@anthonyrollett suggested that we permit a column of weights such that less important datapoints are weighted less heavily.
The plan to implement this is to
(1) add an additional CLI argument pointing to a CSV which houses the weights,
(2) load this additional CSV into pySRURGS.Dataset,
(3) when running pySRURGS.eval_equation, multiply the residuals by the values in this CSV.
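Step (3) of the plan can be sketched in isolation as follows. This is a minimal illustration using plain lists. The function names are hypothetical and do not reflect how pySRURGS.eval_equation is actually structured.

```python
def weighted_residuals(y_true, y_pred, weights):
    """Multiply each residual (y_true - y_pred) by its per-point weight."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have equal length")
    return [w * (t - p) for t, p, w in zip(y_true, y_pred, weights)]


def weighted_sum_of_squares(y_true, y_pred, weights):
    """Sum of squared weighted residuals, usable as a fitting objective."""
    return sum(r ** 2 for r in weighted_residuals(y_true, y_pred, weights))


# The second point has weight 0.5, so it contributes less to the objective.
example = weighted_sum_of_squares([2.0, 4.0], [1.0, 1.0], [1.0, 0.5])
print(example)
```

A weight of 1 leaves a datapoint unchanged and a weight of 0 removes it from the fit entirely, so an all-ones weights column reproduces the unweighted behaviour.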
I just tried using pySRURGS for the first time tonight and it's impressively easy to install and use. This is probably a stupid question, but I tried varying max_num_fit_params (between 1 and 3) and it seemed that I got back the same answer with 3 params (p0, p1, p2) regardless of that input. When I increased it to 4, I saw an additional p3 in the equation sets. I apologize if I've misunderstood the method. Thanks, Tony Rollett
PS. I tried editing the pySRURGS.py to have max_num_fit_params=2 but that did not seem to make any difference.
Currently, the code skips the case i == 0, which represents a simple terminal, because there are zero configurations of operators for such a case. Additional logic is needed in the code to handle it.
Hello,
I'm trying to run the example from the README file, but at the end the plotting crashes.
Here is the command I entered (Linux terminal):
~/Desktop/Pysrugs/pySRURGS-master$ python3 pySRURGS.py -max_num_fit_params 3 -max_permitted_trees 1000 -plotting ./csv/x1_squared_minus_five_x3.csv 2000
And here is the error:
Traceback (most recent call last):
File "pySRURGS.py", line 2713, in <module>
plot_results(SRconfig)
File "pySRURGS.py", line 2203, in plot_results
best_model = result_list._results[0]
IndexError: list index out of range
Do you know why this error occurs?
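The traceback shows that the results list was empty when plot_results indexed it. Why no results were stored is not clear from the report. As a minimal sketch of a guard, assuming the results are held in a plain list, the hypothetical helper below (not part of pySRURGS) turns the IndexError into an explicit message:

```python
def best_result_or_none(results):
    """Return the first (best) result, or None when the list is empty."""
    return results[0] if results else None


results = []  # an empty list, as in the failing call above
best_model = best_result_or_none(results)
if best_model is None:
    message = "No results were stored, so there is nothing to plot."
else:
    message = "Best model: {}".format(best_model)
print(message)
```

A check like this does not explain the missing results, but it makes the failure easier to diagnose than a bare IndexError.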
Multiprocessing using a worker does not seem to work.
The JOSS release works fine for multiprocessing, but the current version only works when using the -single flag. Will need to revert.
Hello,
I have just discovered this library and I'm interested in the random_equation_binary_tree method. I would like to understand how it works, but I can't work out the purpose of each parameter or how the nodes are created.
Can someone enlighten me?
I think the documentation of the command line arguments could be improved, specifically:
- [...] -run_ID argument? What would the ID be used for?
- [...] -count)? For example, does this number help me to decide what to use for -max_permitted_trees? Also, isn't it possible to have an infinite number of possible equations?
- [...] -path_to_db is None, but even if it's not set, something is stored in the pySRURGS/db/.
If this ends up making the command line help too long, you could also provide a more detailed description in the README.md (instead of copying the command line help), and use shorter descriptions for the command line.
This issue is part of the JOSS review (openjournals/joss-reviews#1675).
How do you optimize the constants in your code?
See if there are any performance gains to be had by using PyPy instead of CPython.
While I was experimenting with pySRURGS, I ran into the following two problems:
[...] max_num_fit_params from 3 to 2, resulting in the following error:
File "pySRURGS.py", line 1283, in <module>
plot_results(path_to_db, path_to_csv, SRconfig)
File "pySRURGS.py", line 1127, in plot_results
y_calc = eval_equation(params_obj, eval_eqn_string, dataset, mode=data_dict)
File "pySRURGS.py", line 779, in eval_equation
y_value = eval(function_string)
File "<string>", line 1, in <module>
KeyError: 'p2'
Those are just suggestions for improving pySRURGS, I do not consider them issues for the JOSS paper.
The benchmark generation functions all use the setup() function.
These retrieve the number of variables from the toy_csv file.
We need to make the number of variables user-specified via arguments.
Convert multiprocessing to a master-worker system in which only one process ever writes results to disk.
https://stackoverflow.com/questions/40287657/load-pickled-object-in-different-file-attribute-error
The Result gets pickled in a different namespace than the functions which originally called it.
The issue occurs when running pytest --cov=pySRURGS -s ./test.py and not with a simple python test.py call.
Technically, since we permit functions of arity one, ith_full_binary_tree should be renamed to ith_binary_tree, and ith_full_binary_tree2 should be renamed to ith_full_binary_tree.
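The reasoning behind the rename is that a full binary tree has either zero or two children at every node, so any tree containing an arity-one function is not full. The sketch below illustrates the distinction with a nested-tuple representation chosen purely for this example. It is not how pySRURGS stores trees, and is_full_binary_tree is a hypothetical helper.

```python
def is_full_binary_tree(tree):
    """Return True when every node has either zero or two children.

    A leaf is a string; an internal node is a tuple whose first item is
    the function name and whose remaining items are the child subtrees.
    """
    if isinstance(tree, str):
        return True  # a terminal has zero children
    children = tree[1:]
    if len(children) != 2:
        return False  # an arity-one node has a single child
    return all(is_full_binary_tree(child) for child in children)


only_arity_two = ("add", ("mul", "x", "p0"), "p1")
with_arity_one = ("add", ("sin", "x"), "p1")
print(is_full_binary_tree(only_arity_two), is_full_binary_tree(with_arity_one))
```

Only the first tree is full in this sense, which is why the name ith_full_binary_tree fits a function restricted to arity-two nodes.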