MOSES is a machine-learning tool; it is an "evolutionary program
learner". It is capable of learning short programs that capture
patterns in input datasets. These programs can be output in either
the Atomese programming language, or in Python. For a given data input, the programs will
roughly recreate the dataset on which they were trained.
MOSES has been used in several commercial applications, including
the analysis of medical physician and patient clinical data, and
in several different financial systems. It is also used by OpenCog
to learn automated behaviors, movements and actions in response to
perceptual stimulus of artificial-life virtual agents (i.e. pet-dog
game avatars). Future plans include using it to learn behavioral
programs that control real-world robots, via the OpenPsi implementation
of Psi-theory and ROS nodes running on the OpenCog AtomSpace.
The term "evolutionary" means that MOSES uses genetic programming
techniques to "evolve" new programs. Each program can be thought
of as a tree (similar to a "decision tree", but allowing intermediate
nodes to be any programming-language construct). Evolution proceeds
by selecting one exemplar tree from a collection of reasonably fit
individuals, and then making random alterations to the program tree,
in an attempt to find an even fitter (more accurate) program.
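The select-and-mutate loop described above can be sketched in a few lines of Python. This is only a minimal hill-climbing illustration, not the MOSES implementation (which also manages demes, knobs and program reduction). The tuple encoding of trees, the function names and the XOR dataset are all assumptions made for this example.

```python
import random

# A program is a nested tuple: ("and", a, b), ("or", a, b), ("not", a),
# or a variable name such as "x" or "y". This encoding is illustrative only.
VARIABLES = ["x", "y"]

def evaluate(tree, env):
    """Evaluate a boolean program tree against a row of variable values."""
    if isinstance(tree, str):
        return env[tree]
    op, *args = tree
    if op == "not":
        return not evaluate(args[0], env)
    if op == "and":
        return evaluate(args[0], env) and evaluate(args[1], env)
    return evaluate(args[0], env) or evaluate(args[1], env)  # "or"

def random_tree(rng, depth=2):
    """Build a small random program tree."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARIABLES)
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        return (op, random_tree(rng, depth - 1))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def mutate(tree, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(rng)
    op, *args = tree
    i = rng.randrange(len(args))
    args[i] = mutate(args[i], rng)
    return (op, *args)

def fitness(tree, dataset):
    """Number of rows on which the program reproduces the target output."""
    return sum(evaluate(tree, row) == target for row, target in dataset)

def evolve(dataset, steps=200, seed=0):
    """Keep one exemplar; accept a mutation whenever it is at least as fit."""
    rng = random.Random(seed)
    exemplar = random_tree(rng)
    best = fitness(exemplar, dataset)
    for _ in range(steps):
        candidate = mutate(exemplar, rng)
        score = fitness(candidate, dataset)
        if score >= best:
            exemplar, best = candidate, score
    return exemplar, best

# Target: exclusive-or of x and y.
XOR_DATA = [({"x": a, "y": b}, a != b) for a in (False, True) for b in (False, True)]
program, score = evolve(XOR_DATA)
```

Here `score` counts how many of the four XOR rows the evolved `program` reproduces. A real run would select the exemplar from a population rather than keep a single one.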
A short example, from beginning to end, can be found in this Jupyter notebook
(courtesy of Robert Haas, for the Mevis plot package).
License
MOSES is dual-licensed under Apache 2.0 and GNU AGPL 3.
Documentation
Documentation can be found in the /docs directory, which includes a
"QuickStart.pdf" that reviews the algorithms and data structures
used within MOSES. A detailed man-page can be found in
/moses/moses/man/moses.1. There is also a considerable amount of
information in the OpenCog wiki:
http://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search
Prerequisites
To build and run MOSES, the packages listed below are required. With a
few exceptions, most Linux distributions will provide these packages.
cogutil
Common OpenCog C++ utilities
http://github.com/opencog/cogutil
It uses exactly the same build procedure as this package. Be sure
to sudo make install at the end.
atomspace
OpenCog Atomspace graph database
http://github.com/opencog/atomspace
It uses exactly the same build procedure as this package. Be sure
to sudo make install at the end.
ure
OpenCog Unified Rule Engine
http://github.com/opencog/ure
It uses exactly the same build procedure as this package. Be sure
to sudo make install at the end.
Optional Prerequisites
The following packages are optional. If they are not installed, some
optional parts of MOSES will not be built. The CMake command, during
the build, will be more precise as to which parts will not be built.
MPI
Message Passing Interface
Required for compute-cluster version of MOSES
Use either MPICHV2 or OpenMPI
http://www.open-mpi.org/ (package: libopenmpi-dev)
Building MOSES
Perform the following steps at the shell prompt:
cd to project root dir
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
Libraries will be built into subdirectories within build, mirroring the
structure of the source directory root. The flag
-DCMAKE_BUILD_TYPE=Release
results in binaries that are optimized for performance; omitting
this flag will result in faster builds, but slower executables.
Unit tests
To build and run the unit tests, from the ./build directory enter (after
building moses as above):
make test
Installation
Just say sudo make install after finishing the build.
This is a design suggestion for reduct in as-moses. Basically, it says that you don't need to use the moses reduct any more; it works (should work) in the atomspace. So for example:
The point of FloatValue and ValueOfLink is that you can do stuff like this:
which should return a column of 0's and 1's, whenever A>B+C row by row. The PlusLink, etc were designed to replace the moses interpreter. They mostly work. Even reduct works .. so for example
should correctly reduce to exactly just (Variable $x). Many (most??) of the reduct rules are in the atomspace, already. I'll fix whatever bugs you find. I'm not sure I want to volunteer to write new reduct rules, though...
The task here is to implement an Atomese interpreter program to execute programs such as (Plus (Schema "i1") (Schema "i2")), (Times (Schema "i1") (Schema "i2")), etc., and represent the result.
which return the specified schema value for each row.
An attempt has been made by defining a function for the number of feature variables. Neither approach seems to be efficient, though.
Problems with the proposed approach
The program can handle any two schemas but not three or four. For example, (cog-execute! (Plus "Schema-i1" "Schema-i2" "Schema-i3"))
cannot be handled.
Questions
Is there any way to define a function with unknown number of arguments?
Are programs like (cog-execute! (Plus "Schema-i1" (Times "Schema-i2" "Schema-i3"))) expected to work?
I'm in the process of writing a research proposal that would need to have some kind of deme management system, similar to what ASMOSES does. This is all very much up in the air, and totally unclear, but ...
If it ever occurs to you that the deme management part of this system could be split out into its own module, that would be a good thing. This is low priority and not urgent.
However I'm thinking we can probably take a middle ground approach
where the unfolding would be much lighter and wouldn't involve hacking
the interpreter so that Plus, etc would support higher level inputs
(which ultimately is probably fine and desired, but since we are in an
exploratory stage we want to avoid too much potentially unnecessary
and complicated hacking). Also, I suspect that this sort of
lightweight unfolding will be beneficial for subsequent Atomese
program processing, such as finding patterns in a population of
programs and evaluating them on new inputs.
Proposal
So here it goes: for instance, given (Plus (Schema "i1") (Schema "i2")), the first level of unfolding could be (using the unimplemented FunMapLink)
then links it to the inputs i1 and i2 using Put, then
applies it to the domain r1 to r3. The good thing about this
representation is that it allows abstracting away the features (which
can make some patterns easier to reason about), and it also makes it
easier to evaluate on new inputs, because you only need to change
one place, replacing (Domain) by, say, (NewDomain).
After spotting this work in progress: kasimebrahim/atomspace@371efb5 I would like to have a formal design discussed here. The CondLink has been discussed on and off for ten years now, and it always gets rejected because it is always problematic and has issues. I want to review the issues, here, and have this issue as a place to discuss alternative designs.
I'm not sure, but I think the CondLink is being envisioned as a kind of if-then-else link:
so that when the above is executed, the EvaluateableAtom is evaluated first, and if it returns true, then IfSoResultAtom is returned as the execution result, else the ElseResultAtom is returned as the execution result.
The problem that arises is that the ElseResultAtom is not definable, when there are variables. Consider this form:
If the evaluatable clause is true, and we find some $X that matches, then there is no problem. However, if the evaluatable clause is false, then there is no such $X, and so it is impossible to say that X is a cat, because there is no such X. There is no way to get that X. The problem is that if-then-else is a kind of "law of the excluded middle", (see wikipedia) and the law of the excluded middle is well-known to cause these kinds of problems, which is why it is generally rejected in constraint-satisfaction systems, action-planning systems, route-finding systems, and theorem-proving systems. And since the atomspace plus URE is a bit like all of these, combined, we need to reject it as well.
The simplest replacement that I can think of is instead having a pair of BindLinks: instead of writing
The above is what you do when you want IfSoResultAtom and ElseResultAtom to be executable. But if you only want truth values, then it's much simpler: use SequentialAndLink. There are two existing examples for this:
This is the initial plan to get started with the AS-MOSES port. It
only treats the first steps, though it does so in detail.
Initiate the as-moses repo
AS-MOSES represents a rather major departure from the existing MOSES, so
I believe it is best to move it to its own repository. This will
minimize confusion for the user and increase awareness that MOSES
is transitioning to something different.
Here are the steps involved to seed the as-moses repository with the
old MOSES:
Create a doc/as-moses folder under this repo
Move the root README.md under doc/as-moses
Fork the moses repo by rebasing as-moses onto git@github.com:opencog/moses.git and force-push it to git@github.com:opencog/as-moses.git (I'll temporarily disable branch
protection on master after steps 1 and 2 have been carried out).
Replace Combo by Atomese
This is a rather big undertaking and will be done
progressively. We start here with 2 tasks
Port fitness evaluation to Atomese
Port Reduct to Atomese (for now, rely on Combo reduct)
More tasks like storing the population and meta-population in
AtomSpaces will come next.
Port Fitness Evaluation to Atomese
This task can be decomposed into the following subtasks:
Implement Combo to Atomese converter.
Represent problem data in AtomSpace.
Implement a dedicated Atomese interpreter (instead of updating the existing Atomese interpreter to handle problem data).
Add Atomese interpreter in fitness evaluation.
Convert Combo to Atomese
To make Atomese programs as compact as possible we will use higher
level operators, that is, operators working with predicates or concepts as
opposed to individuals. Let me give some examples, assuming a dataset
with 2 input features i1, i2, an output feature o, and 3
observations r1 to r3.
In the boolean domain a possible Combo candidate would be:
(or $i1 $i2)
The corresponding Atomese candidate will look like
(Or (Predicate "i1") (Predicate "i2"))
You may notice that the variables have been replaced by
predicates. That is because Or is not operating on individuals, but
rather on predicates, so here Or actually represents the union of the
satisfying sets of i1 and i2. Doing that allows us to
generate more compact candidates.
Let's assume that the domain was actually real numbers, not Boolean.
Then a possible Combo candidate would be:
(+ $i1 $i2)
Likewise in Atomese this will be translated into
(Plus (Schema "i1") (Schema "i2"))
Notice that i1 and i2 are schemata, not predicates. That is because,
since the domain is Real, not Boolean, they can no longer be
predicates. Likewise Plus can be overloaded for schemata, similarly
to how one can add 2 mathematical functions, where h = f + g means ForAll x, h(x) = f(x) + g(x)
Let us give another example
(+ $i1 $i2 3)
Its Atomese translation could be
(Plus (Schema "i1") (Schema "i2") 3)
but what is 3? In principle 3 should be a constant function that
returns the number 3 for each input. However I suppose it would be
acceptable to overload Plus so that it can mix schemata with
constants and assume constants are in fact constant functions. So that
(Plus (Schema "i1") (Schema "i2") (Number 3))
would be understood as
(Plus (Schema "i1") (Schema "i2") (Lambda X (Number 3)))
We will see how it goes but I think it's doable.
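The overloading described above (Plus acting on schemata pointwise, with constants read as constant functions) can be illustrated outside Atomese. The following Python sketch is only an analogy: `lift`, `plus` and the two feature columns are hypothetical names and data invented for this example, and nothing here is AtomSpace code.

```python
from numbers import Number

def lift(term):
    """Treat a plain number as the constant function that always returns it."""
    if isinstance(term, Number):
        return lambda x: term
    return term

def plus(*terms):
    """Pointwise sum of functions: plus(f, g)(x) == f(x) + g(x)."""
    functions = [lift(t) for t in terms]
    return lambda x: sum(f(x) for f in functions)

# Hypothetical feature columns, viewed as functions from observation to value.
i1 = {"r1": 0.0, "r2": 1.0, "r3": 2.0}.get
i2 = {"r1": 10.0, "r2": 20.0, "r3": 30.0}.get

# Analogue of (Plus (Schema "i1") (Schema "i2") (Number 3)).
h = plus(i1, i2, 3)
```

With these columns, `h("r1")` evaluates to `13.0`, since the constant 3 is lifted to a function that ignores its input.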
Note that there is an existing Combo to Atomese converter. However,
it should not require an atomspace (it should be up to the user to add
it to the atomspace of his/her choice). So rather than using functions
such as AtomSpace::add_link it should use createLink, etc.
Represent Problem Data in AtomSpace
In order to use the Atomese interpreter to measure the fitness of a
program over some data, the data need to be loaded to some
AtomSpace.
Or even more compact considering the Cartesian product over features,
using ListLink, to represent a Cartesian product between functions
over the same domain.
The Atomese interpreter would recognize that Plus is applied to
schemata, gather their associated data, iteratively apply the lower
level operator Plus to the associated numbers, and reconstruct the
result as a mapping from inputs (observations r1 to r3) to outputs
((Number 1), etc).
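As a rough illustration of the interpreter behavior just described, here is a Python sketch that gathers the column attached to each schema, applies the low-level operator row by row, and rebuilds a mapping from observations to outputs. The tuple encoding of programs, the `COLUMNS` data and all function names are assumptions for this sketch; it is not the Atomese interpreter.

```python
import operator
from functools import reduce

# Hypothetical problem data: one column per input feature, one entry per observation.
OBSERVATIONS = ["r1", "r2", "r3"]
COLUMNS = {
    "i1": [0.0, 1.0, 0.0],
    "i2": [1.0, 1.0, 0.0],
}

# Low-level operators that act on individual numbers.
LOW_LEVEL_OPS = {"Plus": operator.add, "Times": operator.mul}

def _column(program):
    """Return the list of output values of a program, one per observation."""
    head, *args = program
    if head == "Schema":
        return COLUMNS[args[0]]
    if head == "Number":
        # A constant is treated as a constant column.
        return [args[0]] * len(OBSERVATIONS)
    op = LOW_LEVEL_OPS[head]
    columns = [_column(arg) for arg in args]
    # Apply the low-level operator row by row across all argument columns.
    return [reduce(op, row) for row in zip(*columns)]

def interpret(program):
    """Evaluate a program and return a mapping from observation to output."""
    return dict(zip(OBSERVATIONS, _column(program)))

result = interpret(("Plus", ("Schema", "i1"), ("Schema", "i2")))
```

Because `_column` folds the operator over however many argument columns it receives, the same code handles two, three or more arguments as well as nested programs.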
Replace Combo by Atomese Interpreter in Fitness Evaluation
Because Atomese reduction is not yet supported, we can only support Atomese
right after reduction. Basically, instances will be turned into reduced
Combo trees, then turned into Atomese programs, then fitness
evaluated. Let's recall how the fitness function is being called: in
the hill-climbing algorithm, the call occurs here
that turns the instance into an Atomese program and evaluates its
fitness. Then upgrade fitness functions to support Atomese programs.
To break it down, the subtasks are
Rename complexity_based_scorer to combo_based_scorer
Implement atomese_based_scorer, similar to combo_based_scorer, that turns the instance into a reduced
combo_tree, converts it to Atomese, then calls the fitness on this
Atomese program.
Overload bscore_base::operator() (with a default dummy
implementation to allow it to be optional for now). Since instance_scorer.h will start growing, it would be best to create
an instance_scorer.cc and move the implementations there.
Start implementing it for various fitness functions. I
recommend starting with logical_bscore, which is probably the
simplest fitness function type.
Test MOSES using atomese_based_scorer instead of combo_based_scorer for the implemented fitness function types.
Will be done in the next iteration.
Port Reduct to Atomese
The approach suggested at the moment is to explicitly represent the
result of a reduction as a relationship between an Atomese program and
its normal form. I think it makes sense to introduce a link type just for
that, called ReductLink. For instance
To achieve that, ReductLink as well as the operators involved in the
Atomese programs must be axiomatized. Then the URE alone should be
able to perform the reduction.
This work is already under way, so far by Yidne.
If its completion takes too long, once we have a Combo to Atomese
converter and vice versa, we could probably just wrap the existing
Combo reduct engine into a URE rule, and use this as a temporary
replacement for Atomese reduction just so that we can keep going with
the remainder of the port.
This is an idea from @robert-haas on a discord channel.
The original moses created combo trees out of the arithmetic ops (plus, minus, times, divide), the boolean ops (and/or/not) and a few others (greater-than, etc.). The allowed ways in which these can be mutated were hard-coded in an ad hoc manner. For example, you can only decorate bools with boolean knobs, and contins with contin knobs, etc.
The current as-moses does the same, except now it's Atomese and not Combo. The knob decoration is still ad hoc and hard-coded. If as-moses doesn't know about some op, it can't deal with it (for example, hyperbolic tangent: you could hack as-moses to add that, but it would be a hack. The next function to come along would be yet another hack.)
The suggestion is to replace the knob-decoration code with a formal specification of a grammar. There would be a grammar that defines exactly what a "valid program tree" is. New program trees can be created and mutated only if they obey the rules of the grammar.
This would allow as-moses to create and mutate trees in any problem domain, and not just arithmetic+bool.
This is interesting, because some of the problem domains are audio and video, and some of the new kinds of ops include low-pass filters, high-pass filters, chirp filters, squelch filters, etc. It's hard/awkward to try to boil these down to just arithmetic+bools.
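To make the grammar idea concrete, here is a small Python sketch in which the set of valid program trees is defined by a table of typed operator signatures, so that supporting a new op such as tanh is one table entry rather than a code change. The table format, the type names and the helper functions are all assumptions made for illustration; this is not a proposal for the actual as-moses data structures.

```python
# Hypothetical grammar: each operator maps to (argument types, result type).
# A "*" suffix on a single argument type means "one or more arguments of that type".
GRAMMAR = {
    "plus": (("contin*",), "contin"),
    "times": (("contin*",), "contin"),
    "and": (("bool*",), "bool"),
    "not": (("bool",), "bool"),
    "greater_than": (("contin", "contin"), "bool"),
    "tanh": (("contin",), "contin"),  # adding a new op is one table entry
}

def type_of(tree, variable_types):
    """Return the type of a valid tree, or raise ValueError if it breaks the grammar."""
    if isinstance(tree, str):
        return variable_types[tree]
    op, *args = tree
    if op not in GRAMMAR:
        raise ValueError(f"unknown operator: {op}")
    arg_types, result_type = GRAMMAR[op]
    actual = [type_of(arg, variable_types) for arg in args]
    if len(arg_types) == 1 and arg_types[0].endswith("*"):
        # Variadic operator: every argument must have the same type.
        expected = [arg_types[0][:-1]] * max(len(actual), 1)
    else:
        expected = list(arg_types)
    if actual != expected:
        raise ValueError(f"{op} expects {expected}, got {actual}")
    return result_type

def is_valid(tree, variable_types):
    """True if the tree obeys the grammar; a mutation would be kept only in that case."""
    try:
        type_of(tree, variable_types)
        return True
    except (ValueError, KeyError):
        return False

TYPES = {"x": "contin", "y": "contin", "b": "bool"}
```

A mutation step could then generate candidate trees freely and discard any for which `is_valid` returns `False`, or better, consult the table to generate only well-typed candidates in the first place.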
This issue is to complement the Section "Represent Problem Data in AtomSpace" of issue #3 as well as much of what has been discussed in issue #12 (from comment #12 (comment) onward). I prefer to create a separate issue for it rather than continuing to grow #12.
Here's yet another way to represent the data (not saying we should implement it, it's just worth considering)
where CurriedFunMapLink is the same thing as in #11 but instead of taking 2 arguments, a function and a list, it takes a single function and turns it into a function from list of inputs to list of outputs.
CurriedFunMapLink could obviously be decomposed into smaller parts, say, using a CurryLink and a FunMapLink. But since we have neither at this point there is no point bothering. And obviously PutLink could probably be used instead of FunMapLink, as @kasimebrahim would probably have pointed out.
In order to avoid re-evaluating programs and sub-programs multiple
times we want to save the results of their evaluations so that if the
interpreter is called again it only needs to recall the results
rather than recalculate them.
For instance if program candidate P is provided to the interpreter
(Plus
(Schema "i1")
(Schema "i2"))
the first time the interpreter will evaluate it, but the second
time, even if P is then only a sub-program of another program P', it may re-use the memorized (cached) values.
then P is a sub-program of P' and shouldn't be re-evaluated.
In order to store the results of these evaluations I suggest to use
the same mechanism for efficiently storing feature values described in
issue #16 . That is upon encountering an atom to evaluate, the first
thing the interpreter would do is to check whether some values are
attached to key
Node "*-AS-MOSES:SchemaValuesKey-*"
if so, then return these values, otherwise proceed with the
calculation.
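The check-the-key-first behavior can be sketched as follows in Python. The `Atom` class here is a stand-in that only mimics per-atom key/value storage; it is not the AtomSpace API, and the counter exists purely to show that the sub-program is not recomputed. Note that sharing relies on P and the copy of P inside P' being the same object, which the AtomSpace guarantees for atoms but which this sketch arranges by hand.

```python
SCHEMA_VALUES_KEY = "*-AS-MOSES:SchemaValuesKey-*"

class Atom:
    """Stand-in for an atom: a type, its children, and a key -> value store."""
    def __init__(self, kind, *outgoing):
        self.kind = kind
        self.outgoing = outgoing
        self.values = {}

evaluation_count = 0  # counts real (non-cached) Plus evaluations, for illustration

def evaluate(atom):
    """Return cached values if present; otherwise compute, cache and return them."""
    global evaluation_count
    cached = atom.values.get(SCHEMA_VALUES_KEY)
    if cached is not None:
        return cached
    if atom.kind != "Plus":
        raise ValueError(f"no values attached to {atom.kind} atom")
    columns = [evaluate(child) for child in atom.outgoing]
    result = [sum(row) for row in zip(*columns)]
    evaluation_count += 1
    atom.values[SCHEMA_VALUES_KEY] = result
    return result

# Feature atoms carry their column values under the same key.
i1, i2, i3 = Atom("Schema"), Atom("Schema"), Atom("Schema")
i1.values[SCHEMA_VALUES_KEY] = [0.0, 1.0, 0.0]
i2.values[SCHEMA_VALUES_KEY] = [1.0, 1.0, 0.0]
i3.values[SCHEMA_VALUES_KEY] = [5.0, 5.0, 5.0]

p = Atom("Plus", i1, i2)
p_prime = Atom("Plus", p, i3)  # P is a sub-program of P'

first = evaluate(p)         # computes and caches P
second = evaluate(p_prime)  # reuses the cached values of P
```

After both calls `evaluation_count` is 2 rather than 3, because evaluating P' found the values of P already attached.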
Note that, in order to work well, the C++ return type of that Atomese
interpreter should be ProtoAtomPtr, not Handle, just like in Instantiator::execute().
It goes without saying that in such a case the Atomese interpreter
will simply return a proto atom representing the list of values, not
the representation described in the Section Update the Atomese Interpreter
to Handle Problem Data of issue #3, but at this point of the development it's fine.
I'm packaging asmoses for Debian. It looks like moses and asmoses can basically co-exist (things are renamed with the prefix "as" in asmoses), but pymoses is still pymoses, not pyasmoses. Since 2 Debian packages can't contain files of the same name, I have to set Conflicts: between them. Is it possible to rename pymoses, or is there any reason we can't?
As suggested in issue #3, in order to proceed with porting other subsystems, we choose to wrap the existing Combo reduct engine.
Given a program P to be reduced, we first convert P to a combo::tree CP, reduce it using reduct to CP_reduced, then convert it back to Atomese as P_reduced.
Finally we need to store it in ReductLink, ReductLink can be
The above video is by Nil Geisweiller, and the asmoses invocation below is extracted verbatim from the above video, almost near the beginning.
I get the following failure with the unhelpful exception message.
mini-me@virtucon ~/h/s/mud-asmoses (master)> just
asmoses -H pre -q 0.1 -W 1 -i data.csv -j4 --output-format scheme -c 100
terminate called after throwing an instance of 'opencog::AssertionException'
what(): Parsing error occurred on line 1 of input file
Exception: Expecting boolean value, got (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:269) (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:1148)
Aborted (core dumped)
error: Recipe `run` failed on line 2 with exit code 134
Any input as to best practices to debug code such as asmoses would be welcome.
I think it's perhaps good to set the project name to AS-MOSES, as well as replacing moses by as-moses in the install paths.
This would allow users to have the old moses and as-moses installed side-by-side. Not that it's a problem now, but it is worth considering (especially for the day as-moses starts breaking backward compatibility with moses).
The various representations suggested in issues #3, #12 and #14 are great for
reasoning but not so great for efficient calculations, thus the
following suggestion: Represent column values (i.e. values associated
to each feature) as a list of values living in the atom feature
itself. For instance assume we have table
The values of feature f1 would be represented as the list [0,1,0]
attached to f1 via the Atom::setValue method. The key could be
Node "*-AS-MOSES:SchemaValuesKey-*"
and the ProtoAtom value could be
FloatValue if f1 is numerical
LinkValue if f1 is Boolean, in such case TrueLink or FalseLink could be used to represent true and false. An
alternative would be to implement BoolValue that holds directly
boolean C++ values which would be more efficient.
That representation could be obtained directly from a Table or from
the various existing representations. Since reasoning isn't needed yet
it could be fine to obtain it directly from the Table.
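Obtaining that representation directly from a table amounts to transposing rows into per-feature columns and storing each column under the key. The Python sketch below mimics this with plain dictionaries; the function names and the f2 column are hypothetical (only the f1 values [0, 1, 0] come from the description above), and no actual Table or Atom::setValue call is involved.

```python
SCHEMA_VALUES_KEY = "*-AS-MOSES:SchemaValuesKey-*"

def columns_from_table(header, rows):
    """Transpose a row-oriented table into one list of values per feature."""
    columns = {name: [] for name in header}
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row length does not match header")
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns

def attach_columns(header, rows):
    """Return {feature: {key: values}}, mimicking per-atom key/value storage."""
    return {
        name: {SCHEMA_VALUES_KEY: values}
        for name, values in columns_from_table(header, rows).items()
    }

# Hypothetical table with features f1 and f2 and three rows.
features = attach_columns(["f1", "f2"], [[0, 1], [1, 1], [0, 0]])
```

Looking up `features["f1"][SCHEMA_VALUES_KEY]` then gives the whole column in one step, which is what makes this layout convenient for fast evaluation.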
Another thing we'll want to support is to represent duplicated rows
in the same manner that CTable does, but that's for another time and
another issue.
In CMakeLists.txt the install path is under include/opencog/asmoses/comboreduct/combo. However, there is no comboreduct directory in the current project structure. Instead it should be changed to combo.
Now, if we fix this install path, we will still run into an error when external code tries to include the combo library. If we include the header #include <opencog/asmoses/combo/combo/combo.h> in external source code, the compiler will complain because in combo.h the include paths don't map to a correct path, as the project files are installed under opencog/asmoses, not opencog. We can fix this in two ways:
Install everything under the opencog directory and not asmoses
Update the project structure and include another directory named asmoses where all the source files reside. Of course, we will also have to update all the header files to reflect the path change
I can send a PR once we agree which option to take.
This interpreter appears to miss the whole idea of atomese: it already has a built-in interpreter, which is called "atomese" -- you just run the code, you don't need an interpreter to run it.
Anyway, this file can be (and should be) removed, because it is impossible for the interpreter to ever know what atom types are actually available in atomese. -- it might have to deal with e.g. DSP types.
Python 2 is no longer updated or supported.
Would running 2to3 be sufficient to upgrade the Python examples to Python 3, or would backward compatibility with Python 2 need to be retained?
In the atomspace ATOMSPACE_LIBRARY depends on execution and query-engine.
And both depend on ATOMSPACE_LIBRARY, but neither of them adds ATOMSPACE_LIBRARY as a dependency, to avoid a cyclic dependency; instead there is this. I don't know how it works, but I am assuming it is equivalent or at least similar to
and according to the comments above it, it is supposed to handle the linking issue, but in our case it does not.
Another thing I should mention concerns the dependencies added to ATOMSPACE_LIBRARY, specifically lambda and execution here: the comment says they are needed for classerver, but the classerver is in atombase. Moreover, when I try removing them everything works and all tests pass. But the comments say removing them results in screwball failures, and I don't think it is just a threat, considering it was written very recently.