opencog / asmoses

MOSES Machine Learning: Meta-Optimizing Semantic Evolutionary Search for the AtomSpace (https://github.com/opencog/atomspace)

Home Page: https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search

License: Other

asmoses's Introduction

MOSES -- Meta-Optimizing Semantic Evolutionary Search

MOSES is a machine-learning tool; it is an "evolutionary program learner". It is capable of learning short programs that capture patterns in input datasets. These programs can be output in either the Atomese programming language, or in Python. For a given data input, the programs will roughly recreate the dataset on which they were trained.

MOSES has been used in several commercial applications, including the analysis of medical physician and patient clinical data, and in several different financial systems. It is also used by OpenCog to learn automated behaviors, movements and actions in response to perceptual stimuli of artificial-life virtual agents (i.e. pet-dog game avatars). Future plans include using it to learn behavioral programs that control real-world robots, via the OpenPsi implementation of Psi-theory and ROS nodes running on the OpenCog AtomSpace.

The term "evolutionary" means that MOSES uses genetic programming techniques to "evolve" new programs. Each program can be thought of as a tree (similar to a "decision tree", but allowing intermediate nodes to be any programming-language construct). Evolution proceeds by selecting one exemplar tree from a collection of reasonably fit individuals, and then making random alterations to the program tree, in an attempt to find an even fitter (more accurate) program.
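The select-mutate-keep-if-fitter loop described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the MOSES algorithm itself (which also involves demes, knobs and reduction); the tuple encoding of trees, the `mutate` and `evolve` helpers and the toy dataset are all assumptions made for this example.

```python
import random

# A program tree is either an input name ("x", "y") or a tuple (op, left, right).
# This encoding is hypothetical and chosen only for the sketch.
OPS = {"and": lambda a, b: a and b, "or": lambda a, b: a or b}
INPUTS = ["x", "y"]

# Toy dataset: the truth table of "x or y".
DATA = [
    {"x": False, "y": False, "out": False},
    {"x": False, "y": True, "out": True},
    {"x": True, "y": False, "out": True},
    {"x": True, "y": True, "out": True},
]

def run(tree, row):
    """Evaluate a program tree on one row (a dict of input values)."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](run(left, row), run(right, row))

def fitness(tree, data):
    """Number of rows whose output the program reproduces."""
    return sum(run(tree, row) == row["out"] for row in data)

def mutate(tree, rng):
    """Randomly alter the tree: change a leaf, grow it, or change a node."""
    if isinstance(tree, str):
        if rng.random() < 0.5:
            return rng.choice(INPUTS)
        return (rng.choice(list(OPS)), tree, rng.choice(INPUTS))
    op, left, right = tree
    choice = rng.randrange(3)
    if choice == 0:
        return (rng.choice(list(OPS)), left, right)
    if choice == 1:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def evolve(data, steps=200, seed=0):
    """Keep one exemplar; replace it only when a mutation is strictly fitter."""
    rng = random.Random(seed)
    exemplar = "x"
    for _ in range(steps):
        candidate = mutate(exemplar, rng)
        if fitness(candidate, data) > fitness(exemplar, data):
            exemplar = candidate
    return exemplar

best = evolve(DATA)
```

Because a candidate replaces the exemplar only when it is strictly fitter, the fitness of `best` can never be lower than that of the starting program `"x"`.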

MOSES is derived from the ideas formulated in Moshe Looks' PhD thesis, "Competent Program Evolution", 2006 (Washington University, Missouri). Moshe is also one of the primary authors of this code.

A short example, from beginning to end, can be found in this Jupyter notebook (courtesy of Robert Haas, for the Mevis plot package).

License

MOSES is dual-licensed under Apache 2.0 and GNU AGPL 3.

Documentation

Documentation can be found in the /docs directory, which includes a "QuickStart.pdf" that reviews the algorithms and data structures used within MOSES. A detailed man-page can be found in /moses/moses/man/moses.1. There is also a considerable amount of information in the OpenCog wiki: http://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search

Prerequisites

To build and run MOSES, the packages listed below are required. With a few exceptions, most Linux distributions will provide these packages.

boost

C++ utilities package http://www.boost.org/ | libboost-dev

cmake

Build management tool; v2.8 or higher recommended. http://www.cmake.org/ | cmake

cxxtest

Unit test framework http://cxxtest.sourceforge.net/ | https://launchpad.net/~opencog-dev/+archive/ppa | cxxtest

guile

Embedded scheme REPL (version 3.0 or newer is required). http://www.gnu.org/software/guile/guile.html

cogutil

Common OpenCog C++ utilities http://github.com/opencog/cogutil It uses exactly the same build procedure as this package. Be sure to sudo make install at the end.

atomspace

OpenCog Atomspace graph database http://github.com/opencog/atomspace It uses exactly the same build procedure as this package. Be sure to sudo make install at the end.

ure

OpenCog Unified Rule Engine http://github.com/opencog/ure It uses exactly the same build procedure as this package. Be sure to sudo make install at the end.

Optional Prerequisites

The following packages are optional. If they are not installed, some optional parts of MOSES will not be built. The CMake command, during the build, will be more precise as to which parts will not be built.

MPI

Message Passing Interface. Required for the compute-cluster version of MOSES. Use either MPICHV2 or OpenMPI | http://www.open-mpi.org/ | libopenmpi-dev

Building MOSES

Perform the following steps at the shell prompt:

    # first cd to the project root directory
    mkdir build
    cd build
    cmake -DCMAKE_BUILD_TYPE=Release ..
    make

Libraries will be built into subdirectories within build, mirroring the structure of the source directory root. The flag -DCMAKE_BUILD_TYPE=Release results in binaries that are optimized for performance; omitting this flag will result in faster builds, but slower executables.

Unit tests

To build and run the unit tests, from the ./build directory enter (after building moses as above):

    make test

Installation

Just say sudo make install after finishing the build.

Examples

Please see the examples directory.

asmoses's Issues

Reduct design suggestions.

Copying discussion from opencog/atomspace#2546 and specifically comment opencog/atomspace#2546 (comment)

This is a design suggestion for reduct in as-moses. Basically, it says that you don't need to use the moses reduct any more: reduction works (or should work) in the atomspace. So for example:

The point of FloatValue and ValueOfLink is that you can do stuff like this:

 (Heaviside (Minus
    (ValueOf (Predicate "column A") (Predicate "moses column key"))
    (Plus
         (ValueOf (Predicate "column B") (Predicate "moses column key"))
         (ValueOf (Predicate "column C") (Predicate "moses column key")))))

which should return a column of 0's and 1's, whenever A>B+C row by row. The PlusLink, etc were designed to replace the moses interpreter. They mostly work. Even reduct works .. so for example

(cog-execute!
    (Plus (Number 1)
        (Times (Variable "$x") (Number 2) (Number 0.5))
        (Number -1)))

should correctly reduce to exactly just (Variable "$x"). Many (most??) of the reduct rules are in the atomspace, already. I'll fix whatever bugs you find. I'm not sure I want to volunteer to write new reduct rules, though...
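To make the expected reduction concrete, here is a minimal plain-Python sketch of the two rules that example relies on: constant folding and dropping identity elements (0 for Plus, 1 for Times). It is not the atomspace implementation; the nested-tuple encoding and the `reduce_expr` name are assumptions made for illustration.

```python
def reduce_expr(expr):
    """Fold constants and drop identity elements in nested Plus/Times tuples."""
    if not isinstance(expr, tuple):
        return expr                      # a number or a variable name
    op, *args = expr
    args = [reduce_expr(a) for a in args]
    identity = 0 if op == "Plus" else 1
    const = identity                     # accumulates all numeric arguments
    rest = []                            # non-numeric arguments, in order
    for a in args:
        if isinstance(a, (int, float)):
            const = const + a if op == "Plus" else const * a
        else:
            rest.append(a)
    if op == "Times" and const == 0:
        return 0                         # anything times zero is zero
    if const != identity:
        rest.insert(0, const)
    if not rest:
        return identity
    if len(rest) == 1:
        return rest[0]
    return (op, *rest)

# Same shape as the Atomese example: 1 + ($x * 2 * 0.5) + (-1)
reduced = reduce_expr(("Plus", 1, ("Times", "$x", 2, 0.5), -1))
```

Here the inner product folds 2 * 0.5 into the multiplicative identity, leaving `"$x"`, and the outer sum folds 1 + (-1) into the additive identity, so `reduced` is just `"$x"`.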

implement an interpreter for atomese programs

The task here is to implement an Atomese interpreter that executes programs such as (Plus (Schema "i1") (Schema "i2")) and (Times (Schema "i1") (Schema "i2")) and presents the result.

given an input-table of

(Similarity (stv 1 1)
  (List (Schema "o") (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (List (Number 1) (Number 0) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 1) (Number 0)))
    (List (Node "r3") (List (Number 0) (Number 0) (Number 0)))))

present the result of

(cog-execute! (Plus "Schema-i1" "Schema-i2"))

in the format

(Set
  (List (Node "r1") (Number 1))
  (List (Node "r2") (Number 1))
  (List (Node "r3") (Number 0)))

Proposed approach

define a plus/times function which takes two arguments (two specified schemas)

(define (plus x y)
  (Put
    (Variable "$R")
    (List
      ; gives the row names in the result set
      (ExecutionOutput
        (GroundedSchemaNode "scm:row")
        (List (Variable "$R")))
      (Plus
        (ExecutionOutputLink
          (GroundedSchemaNode (string-append "scm:" x))
          (List (Variable "$R")))
        (ExecutionOutputLink
          (GroundedSchemaNode (string-append "scm:" y))
          (List (Variable "$R")))))
    input-table))

A program that auto-generates a program for each feature variable.

(define (featureVariables problemData)
    ....)

The above program is expected to generate programs like

(define (schema-1 atom)
	...)

(define (schema-2 atom)
	...)

(define (schema-3 atom)
	...)

which return the specified schema value for each row.
An attempt has also been made by defining a function for the number of feature variables. Neither approach seems to be efficient, though.

Problems with the proposed approach
The program can handle any two schemas but not three or four. For example
(cog-execute! (Plus "Schema-i1" "Schema-i2" "Schema-i3"))
cannot be handled.

Questions
Is there any way to define a function with an unknown number of arguments?
Are programs like (cog-execute! (Plus "Schema-i1" (Times "Schema-i2" "Schema-i3"))) expected to work?

Splitting out the deme management part of the code into its own module?

I'm in the process of writing a research proposal that would need to have some kind of deme management system, similar to what ASMOSES does. This is all very much up in the air, and totally unclear, but ...

If it ever occurs to you that the deme management part of this system could be split out into its own module, that would be a good thing. This is low priority and not urgent.

The mention of ASMOSES for deme management is here: https://github.com/opencog/learn/blob/master/README-Vision.md

Some equivalent Atomese program representations for manipulation and evaluation

Overview

This issue contains some considerations regarding various ways Atomese
programs could be represented, manipulated and evaluated.

Warning: it is not a plan for immediate actions, just some considerations.

Motivation

As suggested by @Bitseat, in order to avoid hacking the Atomese
interpreter too much
https://github.com/opencog/atomspace/blob/master/opencog/atoms/execution/Instantiator.h#L141
an option would be to unfold an Atomese program so that it is readily
interpretable by Instantiator::execute.

For instance given the data set represented as

(Similarity (stv 1 1)
  (List (Schema "o") (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (List (Number 1) (Number 0) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 1) (Number 0)))
    (List (Node "r3") (List (Number 0) (Number 0) (Number 0)))))

and the combo program

(Plus (Schema "i1") (Schema "i2"))

It could be unfolded into

(Set
  (List (Node "r1") (Plus (Number 0) (Number 1)))
  (List (Node "r2") (Plus (Number 1) (Number 0)))
  (List (Node "r3") (Plus (Number 0) (Number 0))))

which passed to the Atomese interpreter would return the desired
result

(Set
  (List (Node "r1") (Number 1))
  (List (Node "r2") (Number 1))
  (List (Node "r3") (Number 0)))
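The full unfolding just described can be mimicked in plain Python to check the expected result. This is a sketch only: the nested tuples stand in for Atomese links, and `unfold` and `execute` are hypothetical helper names, not AtomSpace functions.

```python
# The data set: each row name maps to its (o, i1, i2) values.
FEATURES = ("o", "i1", "i2")
TABLE = {
    "r1": (1, 0, 1),
    "r2": (1, 1, 0),
    "r3": (0, 0, 0),
}

def unfold(program, table):
    """Replace each feature name in the program by its value, row by row."""
    def substitute(node, row):
        if isinstance(node, tuple):
            op, *args = node
            return (op, *(substitute(a, row) for a in args))
        if isinstance(node, str):
            return row[FEATURES.index(node)]
        return node                      # a numeric constant stays as it is
    return {name: substitute(program, row) for name, row in table.items()}

def execute(node):
    """Evaluate an unfolded (feature-free) Plus/Times expression."""
    if not isinstance(node, tuple):
        return node
    op, *args = node
    values = [execute(a) for a in args]
    if op == "Plus":
        return sum(values)
    if op == "Times":
        product = 1
        for v in values:
            product *= v
        return product
    raise ValueError(f"unknown operator: {op}")

unfolded = unfold(("Plus", "i1", "i2"), TABLE)
results = {name: execute(expr) for name, expr in unfolded.items()}
```

`unfolded` holds one feature-free expression per row, such as `("Plus", 0, 1)` for r1, and `results` maps r1, r2 and r3 to 1, 1 and 0.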

However I'm thinking we can probably take a middle ground approach
where the unfolding would be much lighter and wouldn't involve hacking
the interpreter so that Plus, etc would support higher level inputs
(which ultimately is probably fine and desired, but since we are in an
exploratory stage we want to avoid too much potentially unnecessary
and complicated hacking). Also, I suspect that this sort of
lightweight unfolding will be beneficial for subsequent Atomese
program processing, such as finding patterns in a population of
programs and evaluating them on new inputs.

Proposal

So here it goes, for instance given (Plus (Schema "i1") (Schema "i2")), the first level of unfolding could be (using unimplemented FunMapLink)

(FunMap
  (List
    (Variable "$R")
    (Lambda
      (Variable "$R")
      (Plus
        (ExecutionOutput
          (Schema "f1")
          (Variable "$R"))
        (ExecutionOutput
          (Schema "f2")
          (Variable "$R")))))
  (Domain))

where FunMap is to be distinguished from
http://wiki.opencog.org/w/MapLink as it doesn't assume that its first
argument is a pattern but rather a function, and thus has the same
semantics as
https://hackage.haskell.org/package/base-4.11.1.0/docs/Prelude.html#v:map
or in scheme
https://srfi.schemers.org/srfi-1/srfi-1.html#FoldUnfoldMap

And Domain is just something that retrieves the row names, r1 to
r3, and should probably be written

(Domain (List (Schema "f1") (Schema "f2")))

but is just written (Domain) here for simplicity.

So written in a more casual functional program style it would be

(map (lambda (r) (cons r (+ (f1 r) (f2 r)))) (domain))

Alternatively, as suggested by @kasimebrahim, one could use PutLink

(Put
  (Variable "$R")
  (List
    (Variable "$R")
    (Put
      (Lambda
        (Variable "$R")
        (Plus
          (ExecutionOutput
            (Schema "f1")
            (Variable "$R"))
          (ExecutionOutput
            (Schema "f2")
            (Variable "$R"))))
      (Variable "$R")))
  (Domain))

The next unfolding, which is probably the most interesting is

(FunMap
  (List
    (Variable "$R")
    (Put
      (Lambda
        (VariableList
          (Variable "$X")
          (Variable "$Y"))
        (Plus
          (Variable "$X")
          (Variable "$Y")))
      (Lambda
        (Variable "$R")
        (List
          (Schema "f1")
          (Schema "f2")))))
  (Domain))

because it exposes the heart of the program

      (Lambda
        (VariableList
          (Variable "$X")
          (Variable "$Y"))
        (Plus
          (Variable "$X")
          (Variable "$Y")))

then links it to the inputs i1 and i2 using Put, then applies it
to the domain r1 to r3. The good thing about this representation is
that it abstracts away the features (which can make it easier to
reason about some patterns), and it also makes it easier to evaluate
the program on new inputs, because you only need to change one place:
replace (Domain) by, say, (NewDomain).
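A plain-Python analogue may help show why this factoring is convenient. It is only a sketch: `heart`, `fun_map`, the lookup tables and the extra row r4 are hypothetical and stand in for the Lambda, the unimplemented FunMapLink, the feature schemata and a new domain.

```python
# Hypothetical feature functions: each maps a row name to that row's value.
I1 = {"r1": 0, "r2": 1, "r3": 0, "r4": 2}
I2 = {"r1": 1, "r2": 0, "r3": 0, "r4": 3}
f1 = I1.__getitem__
f2 = I2.__getitem__

def heart(x, y):
    """The feature-independent core of the program: (Plus $X $Y)."""
    return x + y

def fun_map(core, features, domain):
    """Apply core to the feature values of every row in the domain."""
    return [(r, core(*(f(r) for f in features))) for r in domain]

domain = ["r1", "r2", "r3"]
new_domain = ["r4"]          # evaluating on new inputs changes only this argument

on_training_rows = fun_map(heart, [f1, f2], domain)
on_new_rows = fun_map(heart, [f1, f2], new_domain)
```

The core `heart` never mentions a feature or a row, so the same function is reused unchanged for both domains.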

CondLink design

After spotting this work in progress: kasimebrahim/atomspace@371efb5 I would like to have a formal design discussed here. The CondLink has been discussed on and off for ten years now, and it always gets rejected because it is always problematic and has issues. I want to review the issues, here, and have this issue as a place to discuss alternative designs.

I'm not sure, but I think the CondLink is being envisioned as a kind of if-then-else link:

IfThenElseLink
      EvaluateableAtom
      IfSoResultAtom
      ElseResultAtom

so that when the above is executed, the EvaluateableAtom is evaluated first, and if it returns true, then IfSoResultAtom is returned as the execution result, else the ElseResultAtom is returned as the execution result.

The problem that arises is that the ElseResultAtom is not definable, when there are variables. Consider this form:

IfThenElseLink
       (PresentLink (Evaluation (Predicate "its a dog") (Variable "$X") (Concept "fido")))
       (Inheritance (Variable "$X") (Concept "dog"))
       (Inheritance (Variable "$X") (Concept "cat"))

If the evaluatable clause is true, and we find some $X that matches, then there is no problem. However, if the evaluatable clause is false, then there is no such $X, and so it is impossible to say that X is a cat, because there is no such X. There is no way to get that X. The problem is that if-then-else is a kind of "law of the excluded middle", (see wikipedia) and the law of the excluded middle is well-known to cause these kinds of problems, which is why it is generally rejected in constraint-satisfaction systems, action-planning systems, route-finding systems, and theorem-proving systems. And since the atomspace plus URE is a bit like all of these, combined, we need to reject it as well.

The simplest replacement that I can think of is instead having a pair of BindLinks: instead of writing

IfThenElseLink
      EvaluateableAtom
      IfSoResultAtom
      ElseResultAtom

write

Bind
     EvaluateableAtom
     IfSoResultAtom

and

Bind
     (Not (EvaluateableAtom))
     ElseResultAtom

The above is what you do when you want to have IfSoResultAtom and ElseResultAtom to be executable. But if you only want truth values, then it's much simpler: use SequentialAndLink. There are two existing examples for this:

examples/pattern-matcher/sequence.scm

and

examples/pattern-matcher/condition.scm

Add oc_to_string to help debugging

oc_to_string is defined in the atomspace and opencog repositories to pretty print OpenCog C++ data structures like Handle in a way that is easy to call within a debugger like gdb. See for instance https://github.com/opencog/atomspace/blob/master/opencog/atoms/base/Handle.h#L278

It would be convenient if oc_to_string was overloaded for MOSES data structures as well.

AtomSpace MOSES Port (part I)

This is the initial plan to get started with the AS-MOSES port. It
only treats the first steps, though does so in detail.

Initiate the as-moses repo

AS-MOSES represents a rather major departure from the existing MOSES,
so I believe it is best to move it to its own repository. This will
minimize confusion for the user and increase the awareness that MOSES
is transitioning to something different.

Here are the steps involved to seed the as-moses repository with the
old MOSES:

1. Create a doc/as-moses folder under this repo
2. Move the root README.md under doc/as-moses
3. Fork the moses repo by rebasing as-moses onto
   git@github.com:opencog/moses.git and force-push it to
   git@github.com:opencog/as-moses.git (I'll temporarily disable branch
   protection on master after steps 1 and 2 have been carried out).

Replace Combo by Atomese

This is a rather big undertaking and will be done
progressively. We start here with 2 tasks

Port fitness evaluation to Atomese
~~Port Reduct to Atomese~~ For now rely on combo reduct

More tasks like storing the population and meta-population in
AtomSpaces will come next.

Port Fitness Evaluation to Atomese

This task can be decomposed into 3 subtasks

Implement Combo to Atomese converter.
Represent problem data in AtomSpace.
~~Update the Atomese interpreter to handle problem data.~~ Implement dedicated atomese interpreter.
Add Atomese interpreter in fitness evaluation.

Convert Combo to Atomese

To make Atomese programs as compact as possible we will use higher
level operators, that is, operators working with predicates or concepts as
opposed to individuals. Let me give some examples.

Let's assume the following data to fit

+--+--+--+--+
|  |i1|i2|o |
+--+--+--+--+
|r1|0 |1 |1 |
+--+--+--+--+
|r2|1 |0 |1 |
+--+--+--+--+
|r3|0 |0 |0 |
+--+--+--+--+

with 2 input features i1, i2, an output feature o, and 3
observations r1 to r3.

In the boolean domain a possible Combo candidate would be:

(or $i1 $i2)

The corresponding Atomese candidate will look like

(Or (Predicate "i1") (Predicate "i2"))

You may notice that the variables have been replaced by
predicates. That is because Or is not operating on individuals, but
rather on predicates, so here Or actually represents the union of the
satisfying sets of i1 and i2. Doing that allows us to
generate more compact candidates.
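Reading Or as a union of satisfying sets can be checked against the example table with a small plain-Python sketch. The set encoding and the `evaluate` helper are assumptions made for illustration; the satisfying sets are read off the table above (i1 is true only on r2, i2 only on r1, and o on both).

```python
ROWS = {"r1", "r2", "r3"}
# Satisfying set of each predicate: the rows on which it is true.
SATISFYING = {
    "i1": {"r2"},
    "i2": {"r1"},
    "o": {"r1", "r2"},
}

def evaluate(program):
    """Return the satisfying set of a program built from Or/And/Not."""
    if isinstance(program, str):
        return SATISFYING[program]
    op, *args = program
    sets = [evaluate(a) for a in args]
    if op == "Or":
        return set().union(*sets)
    if op == "And":
        return ROWS.intersection(*sets)
    if op == "Not":
        return ROWS - sets[0]
    raise ValueError(f"unknown operator: {op}")

candidate = evaluate(("Or", "i1", "i2"))
```

The satisfying set of the candidate equals that of the output feature o, so on this data the candidate fits every row.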

Let's assume that the domain was actually real numbers, not Boolean.
Then a possible Combo candidate would be:

(+ $i1 $i2)

Likewise in Atomese this will be translated into

(Plus (Schema "i1") (Schema "i2"))

Notice that i1 and i2 are schemata, not predicates. That is because
since the domain is Real, not Boolean, they can no longer be
predicates. Likewise Plus can be overloaded for schemata, similarly
to how one can add 2 mathematical functions where h = f + g means
ForAll x, h(x) = f(x) + g(x)

Let us give another example

(+ $i1 $i2 3)

Its Atomese translation could be

(Plus (Schema "i1") (Schema "i2") 3)

but what is 3? In principle 3 should be a constant function that
returns the number 3 for each input. However I suppose it would be
acceptable to overload Plus so that it can mix schemata with
constants and assume constants are in fact constant functions. So that

(Plus (Schema "i1") (Schema "i2") (Number 3))

would be understood as

(Plus (Schema "i1") (Schema "i2") (Lambda X (Number 3)))

We will see how it goes but I think it's doable.
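The overloading idea, where h = f + g means h(x) = f(x) + g(x) and a constant is treated as a constant function, can be sketched in plain Python as follows. The names `lift` and `plus` and the lookup tables are hypothetical; this only illustrates the intended semantics, not how PlusLink is implemented.

```python
def lift(arg):
    """Turn a constant into a constant function; leave functions unchanged."""
    return arg if callable(arg) else (lambda _row: arg)

def plus(*args):
    """Pointwise sum: plus(f, g, 3)(x) == f(x) + g(x) + 3."""
    functions = [lift(a) for a in args]
    return lambda row: sum(f(row) for f in functions)

# Hypothetical schemata i1 and i2, defined by lookup tables over rows r1 to r3.
i1 = {"r1": 0, "r2": 1, "r3": 0}.get
i2 = {"r1": 1, "r2": 0, "r3": 0}.get

h = plus(i1, i2, 3)
values = [h(r) for r in ("r1", "r2", "r3")]
```

With the example table, `values` is `[4, 4, 3]`: the constant 3 is added on every row, exactly as a constant function would be.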

Note that there is an existing Combo to Atomese converter here

https://github.com/opencog/moses/blob/master/moses/comboreduct/main/combo-fmt-converter.cc

but it is extremely limited and outputs strings, while what we want is
something that builds atoms directly.

The code can probably be added in a converter folder of

https://github.com/opencog/moses/tree/master/moses/comboreduct

it should not require an atomspace (it should be up to the user to add
it to the atomspace of his/her choice). So rather than using functions
such as AtomSpace::add_link it should use createLink, etc.

Represent Problem Data in AtomSpace

In order to use the Atomese interpreter to measure the fitness of a
program over some data, the data need to be loaded to some
AtomSpace.

Let us reconsider the example data

+--+--+--+--+
|  |i1|i2|o |
+--+--+--+--+
|r1|0 |1 |1 |
+--+--+--+--+
|r2|1 |0 |1 |
+--+--+--+--+
|r3|0 |0 |0 |
+--+--+--+--+

obtained from the CSV file

i1,i2,o
0,1,1
1,0,1
0,0,0

over the Boolean domain for now.

We need to tell how instances/observations relate to features. Here is
how it could be done. For instance r1 could be represented as

(Evaluation (stv 0 1)
  (Predicate "i1")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "i2")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "o")
  (Node "r1"))

Assuming the domain of data is Real instead of Boolean, then
Execution must be used instead of Evaluation. For instance r1
would be represented as

(Execution
  (Schema "i1")
  (Node "r1")
  (Number 0))
(Execution
  (Schema "i2")
  (Node "r1")
  (Number 1))
(Execution
  (Schema "o")
  (Node "r1")
  (Number 1))

This can be made more compact by considering that a function, say f,
is a set of pairs (x, f(x)). For instance i1 can be described with

(Similarity (stv 1 1)
  (Schema "i1")
  (Set
    (List (Node "r1") (Number 0))
    (List (Node "r2") (Number 1))
    (List (Node "r3") (Number 0))))

Or even more compact considering the Cartesian product over features,
using ListLink, to represent a Cartesian product between functions
over the same domain.

(Similarity (stv 1 1)
  (List (Schema "o") (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (List (Number 1) (Number 0) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 1) (Number 0)))
    (List (Node "r3") (List (Number 0) (Number 0) (Number 0)))))

I recommend using the last representation as it is more compact and I
suspect might actually be easier for the interpreter to process.
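As a rough illustration of how the CSV content maps onto that last representation, the following plain-Python sketch builds the feature list and the row-to-values mapping. The function name `load_table`, the generated row names r1, r2, ... and the choice to list the output feature first are assumptions that mirror the example above; none of this is the AS-MOSES loader.

```python
import csv
import io

CSV_TEXT = "i1,i2,o\n0,1,1\n1,0,1\n0,0,0\n"

def load_table(text, output="o"):
    """Return (features, rows) with the output feature listed first."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Put the output column first, then the inputs in file order.
    order = [header.index(output)]
    order += [i for i, name in enumerate(header) if name != output]
    features = [header[i] for i in order]
    rows = {}
    for n, record in enumerate(reader, start=1):
        # Row names are generated here since the CSV file has none.
        rows[f"r{n}"] = tuple(int(record[i]) for i in order)
    return features, rows

features, rows = load_table(CSV_TEXT)
```

`features` corresponds to the ListLink of schemata and `rows` to the SetLink of (row, values) pairs.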

Update the Atomese Interpreter to Handle Problem Data

Assuming the AtomSpace is loaded with the table above, how to evaluate

(Plus (Schema "i1") (Schema "i2"))

Such schema i1+i2 is expected to be represented as

(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r1")
  (Number 1))
(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r2")
  (Number 1))
(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r3")
  (Number 0))

Or equivalently, seeing a function as a set of input/output pairs

(Similarity (stv 1 1)
  (Plus (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (Number 1))
    (List (Node "r2") (Number 1))
    (List (Node "r3") (Number 0))))

However when invoking the cog-execute! on

(Plus (Schema "i1") (Schema "i2"))

the result could simply be

(Set
  (List (Node "r1") (Number 1))
  (List (Node "r2") (Number 1))
  (List (Node "r3") (Number 0)))

The Atomese interpreter would recognize that Plus is applied to
schemata, gather their associated data, iteratively apply the lower
level operator Plus to the associated numbers, and reconstruct the
result as a mapping from inputs (observations r1 to r3) to outputs
((Number 1), etc).
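The behavior described above (recognize schemata, gather their data, apply the low-level operator row by row, rebuild the mapping) can be sketched in plain Python. This is a hypothetical stand-in, not the Atomese interpreter: programs are nested tuples, columns are Python lists, and `column_of` and `execute` are illustrative names. It also handles nested programs and any number of arguments.

```python
from functools import reduce
import operator

ROW_NAMES = ["r1", "r2", "r3"]
COLUMNS = {"o": [1, 1, 0], "i1": [0, 1, 0], "i2": [1, 0, 0]}
OPERATORS = {"Plus": operator.add, "Times": operator.mul}

def column_of(program):
    """Return the list of values the program takes on each row."""
    if isinstance(program, str):          # a schema name: gather its data
        return COLUMNS[program]
    if not isinstance(program, tuple):    # a constant: same value on every row
        return [program] * len(ROW_NAMES)
    op, *args = program
    arg_columns = [column_of(a) for a in args]
    # Apply the low-level operator to the numbers of each row in turn.
    return [reduce(OPERATORS[op], values) for values in zip(*arg_columns)]

def execute(program):
    """Rebuild the result as a mapping from row name to output value."""
    return dict(zip(ROW_NAMES, column_of(program)))

simple = execute(("Plus", "i1", "i2"))
nested = execute(("Plus", "i1", ("Times", "i2", 3)))
```

`simple` reproduces the expected r1 to 1, r2 to 1, r3 to 0 mapping, and `nested` shows that sub-programs and constants go through the same path.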

Replace Combo by Atomese Interpreter in Fitness Evaluation

Since Atomese reduction is not supported yet, we can only support Atomese
right after reduction. Basically, instances will be turned into reduced
combo trees, then turned into Atomese programs, then fitness
evaluated. Let's recall how the fitness function is being called. In
the hill-climbing algorithm, the call occurs here

https://github.com/opencog/moses/blob/master/moses/moses/optimization/hill-climbing.cc#L234

which is an iscorer (for instance scorer), so we want to implement a
new class inheriting iscorer_base

https://github.com/opencog/moses/blob/master/moses/moses/representation/instance_scorer.h#L35

that turns the instance into an Atomese program and evaluates its
fitness. Then upgrade fitness functions to support Atomese programs.

To break it down, the subtasks are

1. Rename complexity_based_scorer to combo_based_scorer.
2. Implement atomese_based_scorer, similar to
   combo_based_scorer, that turns the instance into a reduced
   combo_tree, converts it to Atomese, then calls the fitness on this
   Atomese program.
3. Overload bscore_base::operator() (with a default dummy
   implementation to allow it to be optional for now). Since
   instance_scorer.h will start growing, it would be best to create
   an instance_scorer.cc and move the implementations there.
4. Start implementing it for various fitness functions. I
   recommend starting with logical_bscore, which is probably the
   simplest fitness function type.
5. Test MOSES using atomese_based_scorer instead of
   combo_based_scorer for the implemented fitness function types.
   Will be done in the next iteration.

Port Reduct to Atomese

The approach suggested at the moment is to explicitly represent the
result of a reduction as a relationship between an Atomese program and
its normal form. I think it makes sense to introduce a link type just for
that, called ReductLink. For instance

(ReductLink 
  (And
    (Predicate "f1")
    (Predicate "f1"))
  (Predicate "f1"))

would represent that f1 and f1 reduces to f1.

To achieve that, ReductLink as well as the operators involved in the
Atomese programs must be axiomatized. Then the URE alone should be
able to perform the reduction.

This work is already under way, so far by Yidne.

If its completion takes too long, once we have a Combo to Atomese
converter and vice versa, we could probably just wrap the existing
Combo reduct engine into a URE rule, and use this as a temporary
replacement for Atomese reduction just so that we can keep going with
the remainder of the port.
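To illustrate what a ReductLink records, here is a plain-Python sketch that pairs a program with its normal form for the idempotence example above. It uses one hand-written rule (dropping repeated arguments of And and Or) rather than URE axioms or the Combo reduct engine; the tuple encoding and the names `reduce_bool` and `reduct_link` are assumptions.

```python
def reduce_bool(program):
    """Drop repeated arguments of And/Or; unwrap single-argument results."""
    if isinstance(program, str):
        return program
    op, *args = program
    args = [reduce_bool(a) for a in args]
    if op not in ("And", "Or"):
        return (op, *args)
    unique = []
    for arg in args:
        if arg not in unique:
            unique.append(arg)
    return unique[0] if len(unique) == 1 else (op, *unique)

def reduct_link(program):
    """Pair a program with its normal form, as a ReductLink would."""
    return ("ReductLink", program, reduce_bool(program))

link = reduct_link(("And", "f1", "f1"))
```

`link` is `("ReductLink", ("And", "f1", "f1"), "f1")`, the tuple counterpart of the ReductLink shown above.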

Loading data to AtomSpace

Question: which approach would be preferred?

load csv directly into atomese
load the data into a table, which can then be converted to atomese.

Grammar-guided program evolution

This is an idea from @robert-haas on a discord channel.

The original moses created combo trees out of the arithmetic ops (plus, minus, times, divide), the boolean ops (and/or/not) and a few others (greater-than, etc...). The allowed ways in which these can be mutated were hard-coded in an ad hoc manner. For example, you can only decorate bools with boolean knobs, and contins with contin knobs, etc.

The current as-moses does the same, except now it's Atomese and not combo. The knob decoration is still ad hoc and hard-coded. If as-moses doesn't know about some op, it can't deal with it (for example, hyperbolic tangent: you could hack as-moses to add that, but it would be a hack. The next function to come along would be yet another hack.)

The suggestion is to replace the knob-decoration code with a formal specification of a grammar. There would be a grammar that defines exactly what a "valid program tree" is. New program trees can be created and mutated only if they obey the rules of the grammar.

This would allow as-moses to create and mutate trees in any problem domain, and not just arithmetic+bool.

This is interesting, because some of the problem domains are audio and video, and some of the new kinds of ops include low-pass filters, high-pass filters, chirp filters, squelch filters, etc. It's hard/awkward to try to boil these down to just arithmetic+bools.
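One simple way to picture such a grammar is a table of typed operator signatures, with a checker that accepts a tree only if every operator receives arguments of the expected types. The sketch below is a hypothetical illustration in plain Python; the operator names, the type names and the helpers `type_of` and `is_valid` are invented for the example and are not part of as-moses.

```python
# Hypothetical grammar: each operator maps to (result type, argument types).
GRAMMAR = {
    "plus": ("real", ("real", "real")),
    "times": ("real", ("real", "real")),
    "greater": ("bool", ("real", "real")),
    "and": ("bool", ("bool", "bool")),
    "not": ("bool", ("bool",)),
    "low_pass": ("signal", ("signal", "real")),
}
# Types of the terminals (input features).
TERMINALS = {"x": "real", "y": "real", "flag": "bool", "audio": "signal"}

def type_of(tree):
    """Return the type of a tree, or raise ValueError if it breaks the grammar."""
    if isinstance(tree, str):
        if tree not in TERMINALS:
            raise ValueError(f"unknown terminal: {tree}")
        return TERMINALS[tree]
    op, *args = tree
    if op not in GRAMMAR:
        raise ValueError(f"unknown operator: {op}")
    result, expected = GRAMMAR[op]
    actual = tuple(type_of(a) for a in args)
    if actual != expected:
        raise ValueError(f"{op} expects {expected}, got {actual}")
    return result

def is_valid(tree):
    """True when the tree obeys the grammar."""
    try:
        type_of(tree)
        return True
    except ValueError:
        return False
```

A mutation operator would then propose a new tree and keep it only when `is_valid` accepts it; supporting a new domain such as audio filters means adding rows to the table rather than writing new knob code.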

Data (like csv file content) representations in Atomese

This issue is to complement the Section "Represent Problem Data in AtomSpace" of issue #3 as well as much of what has been discussed in issue #12 (from comment #12 (comment) onward). I prefer to create a separate issue for it rather than continuing to grow #12.

Here's yet another way to represent the data (not saying we should implement it, it's just worth considering)

(Execution
  (CurriedFunMapLink (Schema "f"))
  (List
    (Node "r1")
    ...
    (Node "rn"))
  (List
    (Number 1)
    ...
    (Number 0)))

where CurriedFunMapLink is the same thing as in #11 but instead of taking 2 arguments, a function and a list, it takes a single function and turns it into a function from list of inputs to list of outputs.

CurriedFunMapLink could obviously be decomposed into smaller parts, say, using a CurryLink and a FunMapLink. But since we have none at this point there is no point bothering. And obviously PutLink could probably be used instead of FunMapLink, as @kasimebrahim would probably have pointed out.
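In ordinary functional terms, the proposed CurriedFunMapLink turns a function on single inputs into a function on lists. A minimal plain-Python sketch, where `curried_fun_map` and the lookup table for f are hypothetical:

```python
def curried_fun_map(f):
    """Turn f: input -> output into a function from list of inputs to list of outputs."""
    return lambda inputs: [f(x) for x in inputs]

# Hypothetical schema f, defined by a lookup table over row names.
f = {"r1": 1, "r2": 0, "r3": 0}.__getitem__

f_on_lists = curried_fun_map(f)
outputs = f_on_lists(["r1", "r2", "r3"])
```

The Execution link above then simply states that `f_on_lists` maps the list of row names to the list of values.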

Subprogram Memoizer

In order to avoid re-evaluating programs and sub-programs multiple
times we want to save the results of their evaluations so that if the
interpreter is called again it only needs to recall the results
rather than recalculate them.

For instance if program candidate P is provided to the interpreter

(Plus
  (Schema "i1")
  (Schema "i2"))

the first time the interpreter will evaluate it, but the second
time, even if that time P is only a sub-program of another program
P', it may re-use the memoized (cached) values.

For instance if program candidate P' is

(Times
  (Plus
    (Schema "i1")
    (Schema "i2"))
  (Schema "i3"))

then P is a sub-program of P' and shouldn't be re-evaluated.

In order to store the results of these evaluations I suggest to use
the same mechanism for efficiently storing feature values described in
issue #16 . That is upon encountering an atom to evaluate, the first
thing the interpreter would do is to check whether some values are
attached to key

Node "*-AS-MOSES:SchemaValuesKey-*"

if so, then return these values, otherwise proceed with the
calculation.

So note that in order to work well the return C++ type of that atomese
interpreter should be ProtoAtomPtr, not Handle, just like in
Instantiator::execute().

It goes without saying that in such a case the Atomese interpreter
will simply return a proto atom representing the list of values, not
the representation described in the Section Update the Atomese Interpreter
to Handle Problem Data of issue #3, but at this point of the development it's fine.
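The memoization scheme can be pictured with a plain-Python sketch in which a dictionary plays the role of the values attached under the key. Everything here is hypothetical (the tuple encoding, the `cache` dictionary, the `evaluations` log and the i3 column); it only shows that P is computed once even when it reappears inside P'.

```python
COLUMNS = {"i1": [0, 1, 0], "i2": [1, 0, 0], "i3": [2, 2, 2]}
cache = {}          # program -> list of values, one per row
evaluations = []    # records which programs were actually computed

def evaluate(program):
    """Evaluate a binary Plus/Times program column-wise, reusing cached results."""
    if program in cache:
        return cache[program]
    evaluations.append(program)
    if isinstance(program, str):
        values = COLUMNS[program]
    else:
        op, left, right = program
        a, b = evaluate(left), evaluate(right)
        if op == "Plus":
            values = [x + y for x, y in zip(a, b)]
        elif op == "Times":
            values = [x * y for x, y in zip(a, b)]
        else:
            raise ValueError(f"unknown operator: {op}")
    cache[program] = values
    return values

p = ("Plus", "i1", "i2")
p_prime = ("Times", p, "i3")
evaluate(p)
result = evaluate(p_prime)
```

After both calls, `evaluations` contains `p` exactly once: the second call found its values in the cache and only computed the outer Times and the i3 column.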

Failing unit tests in Ubuntu 22.04

Some old and rock-solid unit tests are failing in Ubuntu 22.04 for no discernible reason. I don't have time to debug this. See #105 for details.

pymoses.so -> pyasmoses.so?

I'm packaging asmoses for Debian. It looks like moses and asmoses can basically co-exist (things are renamed with the prefix "as" in asmoses), but pymoses is still pymoses, not pyasmoses. Since 2 Debian packages can't have files of the same name, I have to set Conflicts: on them. Is it possible to rename pymoses, or is there any reason we can't?

Port Reduct

As suggested in issue #3, in order to proceed to porting other subsystems we choose to wrap the existing Combo reduct engine.
Given a program P to be reduced, we first convert P to a combo::tree CP, reduce it using reduct to CP_reduced, then convert it back to an Atomese program P_reduced.
Finally we need to store it in a ReductLink. ReductLink can be

 REDUCT_LINK <- ORDERED_LINK "ReductLink"

(ReductLink
    (Schema "Rule")
    (Schema "P")
    (Schema "P_reduced"))

or just

(ReductLink
    (Schema "P")
    (Schema "P_reduced"))

File tokenizer not respecting commas in CSV files

I have been following the only "tutorial" I know of concerning (as)moses:

https://www.youtube.com/watch?v=LAIogkvxyMA

The above video is by Nil Geisweiller, and the asmoses invocation below is taken verbatim from near the beginning of that video.

I get the following failure with an unhelpful exception message.

mini-me@virtucon ~/h/s/mud-asmoses (master)> just
asmoses -H pre -q 0.1 -W 1 -i data.csv -j4 --output-format scheme -c 100
terminate called after throwing an instance of 'opencog::AssertionException'
  what():  Parsing error occurred on line 1 of input file
Exception: Expecting boolean value, got  (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:269) (/home/mini-me/home/cellar/asmoses/opencog/asmoses/data/table/table_io.cc:1148)
Aborted (core dumped)
error: Recipe `run` failed on line 2 with exit code 134

Any input as to best practices to debug code such as asmoses would be welcome.

Replace MOSES by AS-MOSES in CMakeLists.txt

I think it would be good to set the project name to AS-MOSES, and to replace moses with as-moses in the install paths.

This would allow users to have the old moses and as-moses installed side-by-side. It's not a problem now, but it is worth considering (especially for the day as-moses starts breaking backward compatibility with moses).

What do you think?

Efficient Table Representation

The various representations suggested in issues #3, #12 and #14 are great for
reasoning but not so great for efficient calculations, hence the
following suggestion: represent column values (i.e. the values associated
with each feature) as a list of values living in the feature atom
itself. For instance, assume we have the table

+--+--+--+
|o |f1|f2|
+--+--+--+
|1 |0 |1 |
+--+--+--+
|1 |1 |0 |
+--+--+--+
|0 |0 |0 |
+--+--+--+

The values of feature f1 would be represented as the list [0,1,0]
attached to f1 via the Atom::setValue method. The key could be

Node "*-AS-MOSES:SchemaValuesKey-*"

and the ProtoAtom value could be

FloatValue if f1 is numerical
LinkValue if f1 is Boolean, in which case TrueLink or
FalseLink could be used to represent true and false. An
alternative would be to implement a BoolValue that directly holds
C++ boolean values, which would be more efficient.

That representation could be obtained directly from a Table or from
the various existing representations. Since reasoning isn't needed yet,
it would be fine to obtain it directly from the Table.

Another thing we'll want to support is representing duplicated rows
in the same manner that CTable does, but that's for another time and
another issue.
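
As a rough illustration of the column-wise layout proposed above, the sketch below transposes the example table so that each feature carries its own list of values under a single key. It uses plain Python dictionaries in place of atoms and values; columns_from_rows is a hypothetical helper, and only the key name is taken from the proposal.

```python
# Plain-Python model of "one list of values per feature".
# A dict stands in for the feature atom; the list stands in for its value.

SCHEMA_VALUES_KEY = "*-AS-MOSES:SchemaValuesKey-*"

def columns_from_rows(header, rows):
    """Transpose row-major data into {feature: {key: [column values]}}."""
    return {
        name: {SCHEMA_VALUES_KEY: [row[i] for row in rows]}
        for i, name in enumerate(header)
    }

if __name__ == "__main__":
    header = ["o", "f1", "f2"]
    rows = [(1, 0, 1), (1, 1, 0), (0, 0, 0)]
    for name, values in columns_from_rows(header, rows).items():
        print(name, values[SCHEMA_VALUES_KEY])
```

With this layout, evaluating a candidate program over a feature means iterating over one contiguous list instead of walking a graph of per-row links, which is the efficiency argument made above.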

Fix incorrect install paths

In CMakeLists.txt the install path is under include/opencog/asmoses/comboreduct/combo. However, there is no comboreduct directory in the current project structure. Instead it should be changed to combo.

Even if we fix this install path, we will still run into an error when external code tries to include the combo library. If external source code contains #include <opencog/asmoses/combo/combo/combo.h>, the compiler will complain, because the include paths inside combo.h do not resolve: the project files are installed under opencog/asmoses, not opencog. We can fix this in two ways:

Install everything under the opencog directory rather than asmoses.
Update the project structure by adding a directory named asmoses in which all the source files reside. Of course, we will also have to update all the header files to reflect the path change.

I can send a PR once we agree which option to take.

Atomese interpreter not needed!?

During a code review, I discovered this code:

https://github.com/opencog/asmoses/blob/master/opencog/asmoses/atomese/interpreter/Interpreter.cc

This interpreter appears to miss the whole idea of atomese: atomese already has a built-in interpreter, which is called "atomese" -- you just run the code, you don't need an interpreter to run it.

Anyway, this file can be (and should be) removed, because it is impossible for the interpreter to ever know what atom types are actually available in atomese -- it might have to deal with e.g. DSP types.

Python 3 support

Python 2 is no longer updated or supported.
Would running 2to3 be sufficient to upgrade the Python examples to Python 3, or would backward compatibility with Python 2 need to be retained?

libquery-engine.so: undefined reference to opencog::AtomSpace

In the atomspace, ATOMSPACE_LIBRARY depends on execution and query-engine.
Both of them depend on ATOMSPACE_LIBRARY in turn, but neither adds ATOMSPACE_LIBRARY as a dependency, in order to avoid a cyclic dependency; instead there is this. I don't know how it works, but I assume it is equivalent or at least similar to

add_library(_atomspace IMPORTED)
set_property(TARGET _atomspace PROPERTY IMPORTED_LOCATION 'path')
target_link_libraries(execute _atomspace)
target_link_libraries(query-engine _atomspace)

and according to the comments above it, it is supposed to handle the linking issue, but in our case it does not.

Another thing I should mention concerns the dependencies added to ATOMSPACE_LIBRARY, specifically lambda and execution, here: the comment says they are needed for the classerver, but the classerver is in atombase. Moreover, when I try removing them everything works and all tests pass. But the comments say that removing them results in screwball failures, and I don't think that is an empty threat, considering it was written very recently.