Symbolic regression by uniform random global search

License: GNU General Public License v3.0

Python 97.41% TeX 2.17% Shell 0.41%

symbolic-regression random enumeration binary-tree python python3

pysrurgs's Issues

Unexpected results for simple test case (JOSS review)

For the JOSS review (openjournals/joss-reviews#1675), I created a very basic test case to try pySRURGS. This is my data:

x, y
1, 2
2, 3
3, 4
4, 5
5, 6
6, 7

With python pySRURGS.py -max_num_fit_params 1 -max_permitted_trees 2000 -funcs_arity_two=add,sub,mul,div -plotting ./csvs/test.csv 100, I get the following output:

Running in multi processor mode
105it [00:14,  7.06it/s]
Making sure we meet the iters value
  Normalized Mean Squared Error       R^2  Equation, simplified                                           Parameters
-------------------------------  --------  -----------------------------------------------------------  ------------
                    0            1         p0*(p0 - x**2)                                                   1
                    6.68694e-17  1         -(p0 + x**2*(x - 1))*(p0**2 - x**2)/(p0*x**4)                    1.03e-08
                    0.00811055   0.999165  (-p0**3*(x + 1) + x*(2*p0 + 1)*(p0**2 - 1))/(x*(p0**2 - 1))      0.57
                    0.0209102    0.997802  (p0 + x)*(p0 - x + 1)/(x**2 + (p0 + x)*(2*p0 + x))               1.54
                    0.042286     0.996303  p0*(x*(x - 3) + 1)/x                                            -0.381

With p0 = 1, the first row is 1 - x**2, which is obviously not a good model for this data, yet it still shows up as the best solution. However, the plot looks fine:

Am I doing something wrong, or is this a bug?
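
A quick check that does not depend on pySRURGS confirms the mismatch between the printed expression and the data. This is a minimal sketch: the normalization used here (mean squared error divided by the variance of y) is an assumption and may differ from what pySRURGS computes, and the helper name nmse is made up for illustration.

```python
# Data from the issue: y = x + 1 exactly.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 4, 5, 6, 7]


def nmse(model, xs, ys):
    """Mean squared error divided by the variance of y (assumed normalization)."""
    n = len(ys)
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / n
    return mse / var_y


p0 = 1  # parameter value reported in the first table row


def top_ranked(x):
    # Expression printed in the first row; equals 1 - x**2 when p0 = 1.
    return p0 * (p0 - x ** 2)


def expected(x):
    # The relationship the data actually follows.
    return x + 1


nmse_top = nmse(top_ranked, xs, ys)
nmse_expected = nmse(expected, xs, ys)
print(nmse_top, nmse_expected)
```

Under this assumed metric the printed expression has a large error while x + 1 has none, so the reported error of 0 cannot belong to the expression as displayed. Since the plot looks fine, one possibility is that the displayed, simplified string differs from the equation that was actually fitted.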

Unittest

Replace the code in test.py with code using the unittest module.
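
As a starting point, a unittest-based test could assert on what a command prints and not only on its exit code. The sketch below is an assumption-laden illustration: the helper run_cli and the stand-in command are hypothetical, and a real test would invoke pySRURGS.py with its own arguments and compare against known expected output.

```python
import subprocess
import sys
import unittest


def run_cli(args):
    """Run a command and return (return code, captured stdout)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout


class TestCommandLine(unittest.TestCase):
    # Stand-in command that prints a fake result line (hypothetical).
    STAND_IN = [sys.executable, "-c", "print('R^2 1.0')"]

    def test_exit_code_and_output(self):
        code, out = run_cli(self.STAND_IN)
        # A zero exit code alone does not prove the run did anything useful,
        # so the printed output is checked as well.
        self.assertEqual(code, 0)
        self.assertIn("R^2", out)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCommandLine)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```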

Review performed using most-current github (not in releases)

https://github.com/pySRURGS/pySRURGS.git

Installation

Installation instructions work well
- Though it may be worthwhile to publish it as a package installable with pip?

Testing

The function test_command_line_code shows that the command-line interface is tested, but it gives no indication of the results of those tests.
- In other words: a test could run, immediately return 0, and be considered a "success".
Would it be possible to provide a testing table with expected outputs? That way users can verify that their installation runs as expected.

Documentation

The documentation at https://pysrurgs.github.io/pySRURGS/ is not very intuitive to navigate; a documentation generator such as Sphinx might help.
More clarity could be given in the explanation of results (see my confusion below).

Methods

I haven't dug into the methods too heavily, but is it possible to control the stochasticity of the outputs, so that a given input always produces the same output? I noticed this when my results for your readme.md example differed from the ones shown.
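
The usual way to get repeatable results from a random search is to seed the random number generator. Whether pySRURGS exposes a seed option is not stated here, so the following is only a generic sketch with a hypothetical function name, using the standard library's random module.

```python
import random


def random_search(n_draws, seed=None):
    """Draw n_draws integers in [0, 10**6) from a private generator."""
    # A dedicated Random instance keeps the seed from affecting other code.
    rng = random.Random(seed)
    return [rng.randrange(10 ** 6) for _ in range(n_draws)]


run_a = random_search(5, seed=42)
run_b = random_search(5, seed=42)  # same seed: identical sequence
run_c = random_search(5, seed=43)  # different seed: different sequence
```

With multiple worker processes, each worker would need its own deterministic seed (for example, derived from a base seed and the worker index) for the run as a whole to be repeatable.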

Results

How should one interpret the output?
- is (-p0*p1 + p2 + x**2/p0)**(-x*(p1**p1)**x) equivalent to
  - Are the pX the coefficients given as parameters?
If I have understood the above point correctly, then the results are not given in their most simplified form (though I admit this may be a technical misunderstanding of mine).

Other

As changes were made recently, the released version of the software is now out of date (at the time of posting this comment).
Perhaps I've overlooked it, but was this work funded or supported by any research councils that need to be mentioned?
The acknowledgements in the readme.md should be improved:
- You should credit the artist of the icon and state where you took it from, as well as any license that applies to it.
- The algorithm of Tychonievich, 2013, seems to be a novel addition to your software. Did they have enough input in this project to warrant authorship as well?

Have you considered uploading pySRURGS to PyPI? That would simplify the installation process (which is admittedly already quite simple). Even if you don't put it on PyPI, if you make pySRURGS a proper package (see for example here for instructions), it can simply be installed directly from GitHub with
pip install git+https://github.com/pySRURGS/pySRURGS.git

arity_one_function

I'm trying to define the functions of arity one that are permitted in the symbolic regression run.
So I created a SRconfig object like this :
SR_config = SymbolicRegressionConfig(r'.\csv\mydataReal.csv', None, n_functions=['add', 'sub', 'mul', 'div', 'pow'], f_functions=['sin', 'sinh', 'log', 'cos', 'cosh', 'tan', 'tanh', 'exp', 'pow', 'sqrt'], max_num_fit_params=2, max_permitted_trees=200)

I just wanted to see a random equation generated so I wrote the following lines :
(f, n, m, cum_weights, N, dataset, enumerator, _, _) = setup(SR_config)
eqn_str = random_equation_binary_tree(N, cum_weights, enumerator, SR_config)
simple_eqn = simplify_equation_string(eqn_str, dataset)
print(simple_eqn)

But in my terminal, the following expression was returned: (-p1 + x0 + x1 - x3)(p1 - x0 + x12)(p1**x5 - x1 + x2 + 2x4)

It seems that the generated expression does not use any of the arity-one functions I specified. Maybe I'm doing something wrong here.

Error : IndexError: list index out of range

Hello,

I used pySRURGS to search for the equation of best fit for my numerical dataset. Everything was going well until, just after "Running in multi processor mode" finished (100%), it failed to display the equations with R^2 and printed the following traceback instead:

Traceback (most recent call last):
File "pySRURGS.py", line 2714, in <module>
plot_results(SRconfig)
File "pySRURGS.py", line 2204, in plot_results
best_model = result_list._results[0]
IndexError: list index out of range

How may I fix this problem?
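
The traceback shows that result_list._results was empty when plot_results indexed its first element; why no results were stored is not established here. The sketch below only illustrates the failure mode and a defensive guard, using a hypothetical helper name, not the pySRURGS code itself.

```python
def best_result(results):
    """Return the first result, or None when the list is empty."""
    if not results:
        return None
    return results[0]


empty = []

# Indexing an empty list raises the same IndexError seen in the traceback.
try:
    empty[0]
    raised = False
except IndexError:
    raised = True

# The guarded version reports the empty case instead of crashing.
guarded = best_result(empty)
print(raised, guarded)
```

A guard like this would let the caller print a message such as "no results were stored" before attempting to plot.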

Please find attached all the optional arguments that I used.

Regards,

Loïc

weights for dataset

@anthonyrollett suggested that we permit a column of weights such that less important datapoints are weighted less heavily.

The plan to implement this is to
(1) add an additional CLI argument pointing to a CSV which houses the weights,
(2) load this additional CSV into pySRURGS.Dataset
(3) when running pySRURGS.eval_equation, multiply residual by the values in this CSV
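
Step (3) of the plan can be sketched as follows. The function name and the choice of a sum of squared weighted residuals as the error measure are assumptions for illustration; the actual change to pySRURGS.eval_equation may look different.

```python
def weighted_sse(y_obs, y_calc, weights):
    """Sum of squared residuals, each residual multiplied by its weight."""
    if not (len(y_obs) == len(y_calc) == len(weights)):
        raise ValueError("y_obs, y_calc and weights must have equal length")
    return sum(
        (w * (obs - calc)) ** 2
        for obs, calc, w in zip(y_obs, y_calc, weights)
    )


y_obs = [1.0, 2.0]
y_calc = [0.0, 0.0]

# Equal weights reproduce the ordinary sum of squared errors.
unweighted = weighted_sse(y_obs, y_calc, [1.0, 1.0])

# Halving the weight of the second point halves its residual before squaring.
down_weighted = weighted_sse(y_obs, y_calc, [1.0, 0.5])
print(unweighted, down_weighted)
```

Because the weight multiplies the residual before squaring, a weight of 0.5 reduces that point's contribution to the squared error by a factor of four.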

Problem when running test.py

I get 8 errors and 1 failure when running test.py. The error is: FileNotFoundError: [WinError 2] The system cannot find the file specified
and the failure is: AssertionError: False != True
How may I fix this issue?

Max. number of fit params

I just tried using pySRURGS for the first time tonight and it's impressively easy to install and use. This is probably a stupid question, but I tried varying max_num_fit_params (between 1 and 3) and it seemed that I got back the same answer with 3 parameters (p0, p1, p2) regardless of that input. When I increased it to 4, I saw an additional p3 in the equation sets. I apologize if I've misunderstood the method. Thanks, Tony Rollett
PS. I tried editing pySRURGS.py to have max_num_fit_params=2, but that did not seem to make any difference.

add functionality for i==0 in exhaustive_search

Currently, the code skips the case i==0, which represents a simple terminal, because there are zero configurations of operators for such a case. Additional logic is needed in the code to make this case work.

Plotting issue

Hello,
I'm trying to run the example from the README file, but at the end the plotting crashes.
Here is the command I entered (Linux terminal):
~/Desktop/Pysrugs/pySRURGS-master$ python3 pySRURGS.py -max_num_fit_params 3 -max_permitted_trees 1000 -plotting ./csv/x1_squared_minus_five_x3.csv 2000

And here is the error:
Traceback (most recent call last):
File "pySRURGS.py", line 2713, in <module>
plot_results(SRconfig)
File "pySRURGS.py", line 2203, in plot_results
best_model = result_list._results[0]
IndexError: list index out of range

Do you know why this error occurs?

Multiprocessing broken - JOSS release works

So the multiprocessing using a worker does not seem to work.
The JOSS release works fine for multiprocessing but the current version only works when using the -single flag. Will need to revert.

Understanding the random_equation_binary_tree method

Hello,
I have just discovered this library and I'm interested in the random_equation_binary_tree method. I would like to understand how it works, but I can't work out the purpose of each parameter or how the nodes are created.
Can someone enlighten me?

Documentation of command line arguments

I think the documentation of the command line arguments could be improved, specifically:

What is the purpose of the -run_ID argument? What would the ID be used for?
What is the use of counting all possible equations (with -count)? For example, does this number help me to decide what to use for -max_permitted_trees? Also, isn't it possible to have an infinite number of possible equations?
The help says that the default for -path_to_db is None, but even if it's not set, something is stored in the pySRURGS/db/.
In general, what exactly is stored in the database file, and in which format? You could add suggestions for how to read this data.
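
One way to start answering the last question, assuming the database file is a regular SQLite database (another issue on this tracker mentions sqlite3, but the storage format is not confirmed here), is to list the tables it contains with Python's built-in sqlite3 module. The file and table names in this sketch are made up for the demonstration.

```python
import os
import sqlite3
import tempfile


def list_tables(path_to_db):
    """Return the names of all tables in an SQLite database file."""
    conn = sqlite3.connect(path_to_db)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demonstration on a throwaway database with a hypothetical table.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(demo_path)
    conn.execute("CREATE TABLE results (key TEXT PRIMARY KEY, value BLOB)")
    conn.commit()
    conn.close()
    tables = list_tables(demo_path)

print(tables)
```

If the stored values turn out to be serialized Python objects, reading them would additionally require the matching deserialization step.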

If this ends up making the command line help too long, you could also provide a more detailed description in the README.md (instead of copying the command line help), and use shorter descriptions for the command line.

This issue is part of the JOSS review (openjournals/joss-reviews#1675).

optimize the constant

How do you optimize the constants in your code?

Test PyPy

See whether there are any performance gains to be had from using PyPy instead of CPython.

Changing data/parameters between consecutive runs

While I was experimenting with pySRURGS, I ran into the following two problems:

I was changing the values in my input csv file between consecutive runs, and I was surprised that the printed output didn't change at all. I assume that pySRURGS is not supposed to be used that way, so it might be useful to return an error message or a warning in that case.
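
A warning of this kind could be based on a fingerprint of the input file stored alongside the results. The unchanged output suggests that results from the earlier run were reused, though that is not confirmed here. The sketch below uses hashlib from the standard library; the function name and the idea of storing the digest next to the results are assumptions for illustration.

```python
import hashlib


def fingerprint(csv_bytes):
    """Return a SHA-256 hex digest identifying the exact file contents."""
    return hashlib.sha256(csv_bytes).hexdigest()


original = b"x, y\n1, 2\n2, 3\n"
edited = b"x, y\n1, 2\n2, 4\n"  # one value changed between runs

# Digest saved with the results of the first run (hypothetical storage).
stored = fingerprint(original)

# On the next run, a differing digest means the data no longer matches
# the stored results, so a warning or error would be appropriate.
data_changed = fingerprint(edited) != stored
print(data_changed)
```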

Between consecutive runs with the same input file, I reduced max_num_fit_params from 3 to 2, resulting in the following error:

File "pySRURGS.py", line 1283, in <module>
  plot_results(path_to_db, path_to_csv, SRconfig)
File "pySRURGS.py", line 1127, in plot_results
  y_calc = eval_equation(params_obj, eval_eqn_string, dataset, mode=data_dict)
File "pySRURGS.py", line 779, in eval_equation
  y_value = eval(function_string)
File "<string>", line 1, in <module>
KeyError: 'p2'

I'm not sure if it makes sense to do that between consecutive runs. If it does, it might be worth fixing this, and if it doesn't, it would again be useful to print a warning or an error.
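
The KeyError raised inside eval suggests that a stored equation string looks up a parameter by name in a mapping that no longer contains it. The sketch below reproduces that failure mode and shows a check that could produce a clearer message. The form of the equation string and both helper names are assumptions, not the actual pySRURGS internals.

```python
def evaluate(equation, params, x):
    """Evaluate an equation string that indexes into a params mapping."""
    return eval(equation, {"__builtins__": {}}, {"params": params, "x": x})


def missing_params(required, available):
    """Return the parameter names an equation needs but the run lacks."""
    return sorted(set(required) - set(available))


# Hypothetical equation stored by an earlier run with three fit parameters.
stored_equation = "params['p0'] * x + params['p2']"

# The current run was started with only two fit parameters.
current = {"p0": 2.0, "p1": 0.5}

try:
    evaluate(stored_equation, current, 1.0)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

# Checking up front allows a warning instead of a bare KeyError.
missing = missing_params(["p0", "p2"], current)
print(missing_key, missing)
```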

Those are just suggestions for improving pySRURGS, I do not consider them issues for the JOSS paper.

benchmark generation

The benchmark generation functions all use the setup() function

These retrieve the number of variables from the toy_csv file

We need to make the number of variables user specified via arguments.

sqlite3 concurrency potential issues on network file systems

Convert the multiprocessing code to a master-worker system in which only one process ever writes results to disk.
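
The single-writer idea can be sketched as follows. To keep the example short and portable, a thread pool stands in for worker processes and an in-memory SQLite database stands in for the file on disk; both substitutions, along with the function and table names, are assumptions for illustration only.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def evaluate_candidate(i):
    """Worker task: compute a score without touching the database."""
    return i, float(i * i)


def run_search(n_candidates, n_workers=4):
    """Workers compute results; only the calling (master) thread writes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, score REAL)")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Results flow back to the master, which is the only writer.
        for cand_id, score in pool.map(evaluate_candidate, range(n_candidates)):
            conn.execute("INSERT INTO results VALUES (?, ?)", (cand_id, score))
    conn.commit()
    rows = conn.execute("SELECT id, score FROM results ORDER BY id").fetchall()
    conn.close()
    return rows


rows = run_search(5)
print(rows)
```

Because no worker ever opens the database, there is no concurrent write access for a network file system to mishandle.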

Attribute Error - Result

https://stackoverflow.com/questions/40287657/load-pickled-object-in-different-file-attribute-error

The Result gets pickled in a different namespace than the functions which originally called it.

Issue occurs when doing pytest --cov=pySRURGS -s ./test.py
and not a simple python test.py call.

ith_full_binary_tree naming

Technically, since we permit functions of arity one
ith_full_binary_tree should be renamed to ith_binary_tree
and
ith_full_binary_tree2 should be ith_full_binary_tree