sfu-cl-lab / factorbase

The source code repository for the FactorBase system

Home Page: https://sfu-cl-lab.github.io/FactorBase/

Java 99.58% Shell 0.06% Python 0.36%

bayesian-network structure-learning relational-database log-linear-model factor-graphs mysql-database relational-learning big-model markov-logic-network mln

factorbase's People

Contributors

oschulte dachylong neel6384 sfurmb zhensongqian janyqz

factorbase's Issues

add ability to specify subset of variables

perhaps make reduced setup with subset of functors only

Testing Expansions table

Testing Expansions table with course0 entry only

create String : create table `b_star` as Select `course0_counts`.`MULT`  * `student0_counts`.`MULT`  as `MULT` ,`diff(course0)` , `rating(course0)` , `intelligence(student0)` , `ranking(student0)` , course0.course_id from `course0_counts` , `student0_counts`
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'course0.course_id' in 'field list'
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
	at com.mysql.jdbc.Util.getInstance(Util.java:386)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1053)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4096)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4028)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
	at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894)
	at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732)
	at BayesBaseCT_SortMerge.BuildCT_Rnodes_star(BayesBaseCT_SortMerge.java:1130)
	at BayesBaseCT_SortMerge.CTGenerator(BayesBaseCT_SortMerge.java:359)
	at BayesBaseCT_SortMerge.buildCT(BayesBaseCT_SortMerge.java:107)
	at BayesBaseCT_SortMerge.main(BayesBaseCT_SortMerge.java:57)

compute decision trees for Bayesnet CP table

we have code for this from Yuke and maybe Ramakanth somewhere

should we migrate information from Bayes Base website to github?

I think it would be nice to have the main webpages all come from github rather than here.

http://www.cs.sfu.ca/~oschulte/BayesBase/BayesBase.html

For one thing, it's easier to share the editing. We can still use BayesBase to host bigger files, e.g. datasets.

ETL Repository link

The ETL points to a Wikipedia page. Should we add a repository or tutorial link to this?

Issue generating CP tables: Data truncation: Division by 0

During generation of the CP tables, if the ratio of the local mult to the total mult is very small, an error such as "Data truncation: Division by 0" occurs.

Example:

The cause is that CP is computed as local_mult/total_mult. If the result is smaller than the smallest value the column type can represent (the default is float(7,6)), it is stored as a CP of 0. The next step computes the likelihood as LOG(CP) * local_mult, which evaluates log(0) and triggers the "Data truncation: Division by 0" error.

Solution:

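Modify the schema of CP_table so that the CP column keeps more digits after the decimal point. The float(7,10) originally proposed is rejected by MySQL, since float(M,D) requires M >= D; a wider type such as float(20,10) would be one option.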
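The truncation described above can be reproduced outside the database. The following is a minimal Java sketch, assuming a fixed-scale column behaves like rounding to a given number of decimal places; the class and method names are hypothetical and are not part of FactorBase. Note that Java returns -Infinity for log(0), whereas the issue reports that MySQL raises the "Division by 0" error.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class: not part of the FactorBase code base.
class CpTruncationDemo {

    /** Rounds a value to `scale` decimal places, mimicking a fixed-scale column. */
    static double storeWithScale(double value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).doubleValue();
    }

    /** likelihood = LOG(CP) * local_mult, as described in the issue. */
    static double likelihood(double cp, long localMult) {
        return Math.log(cp) * localMult;
    }

    public static void main(String[] args) {
        long localMult = 3;
        long totalMult = 10_000_000;
        double exactCp = (double) localMult / totalMult;

        // With 6 decimal places the small ratio collapses to zero.
        double cpScale6 = storeWithScale(exactCp, 6);
        // With 10 decimal places the ratio survives.
        double cpScale10 = storeWithScale(exactCp, 10);

        System.out.println("scale 6 : cp=" + cpScale6 + " likelihood=" + likelihood(cpScale6, localMult));
        System.out.println("scale 10: cp=" + cpScale10 + " likelihood=" + likelihood(cpScale10, localMult));
    }
}
```

With a scale of 6 the stored CP is zero and the likelihood is no longer finite; with a scale of 10 both values remain usable.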

allow longer table names to avoid renaming rnodes (current original rnid vs. rnid)

this would make the laj code simpler and the output more readable

rewrite export tools for decision trees

MLN Generator. We have code for this from Ramakanth.
Classification Propositionalization
Outlier detection propositionalization

add ability to specify individual grounding

see Sarah's code

clean up BuildCT

Bypass BayesBaseCT_SortMerge.java.

Make stand-alone copy of CTGenerator
In RunBB, call CTGenerator, not BayesBaseCT_SortMerge.java.

Clean up metadata_2 script
- rename as metaqueries
- move metadata creation to front (e.g. RnodesPVars), or separate script
- rename rchain.sql to MoebiusJoin

merge metaqueries with and without ADPVariables
move BN part modelmanager.sql from BuildCT to BBRunner or BBLearner (e.g. PathBayesNets)
create view for final output (maximal Rchain length) #41

Reclaim rnid column (see #43)

replace rnid by short_rnid column in lattice generator

Improve usability of CT generator

make readme for stand-alone use of CT generator without BB learning
drop superfluous tables in CT generator (see #28 )
drop superfluous tables in BB learner (see #28)

Use only one DB for learning, e.g. _BN, put CT tables there?

compute BN metrics

reimplement the following metrics

normalized log-likelihood
number of parameters
maybe AIC or BIC for counts. But we've argued that these need to be normalized.
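As a starting point for reimplementing these metrics, here is a minimal Java sketch. It assumes each CP-table row contributes mult * ln(cp) to the log-likelihood (matching the LOG(CP) * local_mult formula used elsewhere in these issues) and that "normalized" means dividing by the total count, which is only one possible normalization. AIC and BIC use their textbook definitions. The class and method names are hypothetical.

```java
// Hypothetical helper class: not part of the FactorBase code base.
class BnMetrics {

    /** Sum over rows of mult[i] * ln(cp[i]). */
    static double logLikelihood(double[] cp, long[] mult) {
        double ll = 0.0;
        for (int i = 0; i < cp.length; i++) {
            ll += mult[i] * Math.log(cp[i]);
        }
        return ll;
    }

    /** Log-likelihood divided by the total count (an assumed normalization). */
    static double normalizedLogLikelihood(double[] cp, long[] mult) {
        long total = 0;
        for (long m : mult) {
            total += m;
        }
        return logLikelihood(cp, mult) / total;
    }

    /** Textbook AIC: 2k - 2 ln L, with k the number of parameters. */
    static double aic(double logLikelihood, int numParameters) {
        return 2.0 * numParameters - 2.0 * logLikelihood;
    }

    /** Textbook BIC: k ln n - 2 ln L, with n the sample size. */
    static double bic(double logLikelihood, int numParameters, long sampleSize) {
        return numParameters * Math.log(sampleSize) - 2.0 * logLikelihood;
    }

    public static void main(String[] args) {
        double[] cp = {0.5, 0.5};
        long[] mult = {2, 2};
        double ll = logLikelihood(cp, mult);
        System.out.println("log-likelihood            = " + ll);
        System.out.println("normalized log-likelihood = " + normalizedLogLikelihood(cp, mult));
        System.out.println("AIC (k=1)                 = " + aic(ll, 1));
        System.out.println("BIC (k=1, n=4)            = " + bic(ll, 1, 4));
    }
}
```

The unnormalized AIC and BIC are included only for comparison, since the issue notes that count-based scores need to be normalized.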

add ability to specify evidence

i.e. functorid = value

experimenting with different ER diagram and compression ratio?

Fix Functorset transfer

Once we have resolved #34, the Rnode table no longer depends on running lattice generator.

Move Fnodes creation to setup.sql.
Move Fnodes_pvars creation to setup.sql.
Set Foreign key pointers for Functorset.
Set Foreign key pointers for Expansion.
Transfer Fnodes restricted to Functorset as NewFnodes. (Be sure to copy Rnodes with 2nodes as required)
Transfer 1nodes etc. restricted to Fnodes

Evaluate dependency network predictions

reimplement the following metrics

use materialized views to propagate edges

this would make the Bayes net learning code much simpler

Integrating Parameter Servers into Model Manager?

Integrating PS(Parameter Servers) into Model Manager to support distributed model learning?

Duplicated setVarsFromConfig()

setVarsFromConfig() is duplicated, with different interpretations, in the following classes:

MakeSetup.java
BayesBaseCT_SortMerge.java
- setVarsFromConfig()
- setVarsFromConfigForTarget() -> ?? Changes the existing variable databaseName_std to a new meaning
FunctorWrapper.java
and many more ...

[Suggestion]: We need to set these as final variables during a single run.
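One way to follow this suggestion is to read the configuration once into an immutable object and pass that object around, so no class can reinterpret a variable mid-run. The sketch below is a hedged illustration only: the class RunConfig, its field names, and the database name uni_demo are hypothetical. The "dbname" key and the _setup, _BN, and _CT suffixes are taken from other issues on this page, but how FactorBase actually derives them is an assumption here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical class: a single, immutable source of configuration values for one run.
final class RunConfig {
    final String dbName;      // value of the "dbname" key
    final String setupDbName; // assumed naming: dbname + "_setup"
    final String bnDbName;    // assumed naming: dbname + "_BN"
    final String ctDbName;    // assumed naming: dbname + "_CT"

    private RunConfig(String dbName) {
        this.dbName = dbName;
        this.setupDbName = dbName + "_setup";
        this.bnDbName = dbName + "_BN";
        this.ctDbName = dbName + "_CT";
    }

    /** Builds the configuration once; all fields are final afterwards. */
    static RunConfig fromProperties(Properties props) {
        String name = props.getProperty("dbname");
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("config is missing 'dbname'");
        }
        return new RunConfig(name.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // "uni_demo" is a made-up database name used only for illustration.
        props.load(new StringReader("dbname = uni_demo\n"));
        RunConfig config = RunConfig.fromProperties(props);
        System.out.println(config.setupDbName);
    }
}
```

Classes such as MakeSetup or FunctorWrapper could then accept a RunConfig argument instead of each calling their own setVarsFromConfig().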

reclaim rnid from Lattice Generator

LatticeGenerator should have been rewritten so that instead of messing up rnid in Rnodes, it works with a copy shortrnid of rnid. Here's a suggestion.

In LatticeGenerator, start by copying Rnodes to a temporary table called LatticeRnodes. E.g.
- create table LatticeRNodes as select * from RNodes.
Replace all references to Rnodes table by LatticeRnodes. The lattice generator is free to mess up the LatticeRnodes.
Change the column name lattice_membership.name to lattice_membership.Rchain.
Using the lattice-fix script, change the column names lattice_membership.orig_rnid to lattice_membership.member.rnid and lattice_rel.orig_rnid to lattice_rel.rnid_removed.

porting to Spark?

porting to Spark for very large industrial databases

using SparkSQL Operators only, or combine with SQL queries?

fix up metaqueries script

Make logic clearer, use propagation more cleanly. Almost everything is union of clauses from lower in the hierarchy. Make sure we use counts table without rnode so it's easy to connect with start table.

change setup.sql to include foreign key pointers
change setup.sql to generate a pcolumnid (e.g. id(course0)). (may be unnecessary)
change transfer script to include rnodes for each 2node.
make independent table for finding key column of pvariable (rather than use RnodesPvars)

make single table for each metaquery (e.g. pvid/rnid, Clause_Type, Entries). For various reasons, store meta-information:

 + Lattice_Point (e.g. prof0, rchain).
 + clause_type (e.g. where)
 + table_type (e.g. star)
 + entry type (e.g. 1node, aggregate).

Reasons:

    + Support FunctorSets in a different way. 
    + Make flat tables by finding one nodes rather than by making a separate ADT_RNodes_1Nodes table.

make single table for Relationship chain types (Counts, Star, False). propagate from the previous table.
+ 2Nodes do not occur in Star tables
+ Different chain types require different table prefixes and different aggregate selections
add metaquery script for flat tables in Rchain
rewrite CTGenerator to work with ClauseTypes rather than separate tables
add groundings table to setup, add to where clause for pvariables
execute model manager script after ct learning in RunBB
test running metaqueries script for link correlation = 1
rename "metaqueries" to MoebiusJoin

implement exception mining metric

from Sarah's work or just using KLD

fix Contingency Table Generator (CTGenerator)

@vidhiJain The contingency table code is in

https://github.com/sfu-cl-lab/FactorBase/blob/master/src/BayesBaseCT_SortMerge.java. It's called from RunBB.java as follows:

//assumes that dbname is in config file and that dbname_setup exists.

BayesBaseCT_SortMerge.CTGenerator();

So it should not be difficult to just call it by itself. It would be progress if we could do that.

The key procedure is CTGenerator()
This is what builds the CT tables. Unfortunately Zhensong made a version of CTGenerator that is for working with groundings called target. Plus he merged this with the nontarget code. Also he merged it with a copy for the case where we are interested only in a subset of the functor nodes. My suggestion would be this.

Make a new branch.
In the new branch, make a copy of BayesBaseCT_SortMerge.java with all the target and subset stuff removed. I can probably even find an older version without the target stuff. See if we can run it then.
Then we can design a CT generator for groundings and subsets from scratch. I think a key move would be to change CT generator so that it takes as input the setup database rather than treat that as a global variable. Then we can use the CT generator with different (temporary) setup databases.
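The "key move" in the last step, passing the setup database in rather than reading a global, could look roughly like the following Java sketch. Everything here is hypothetical (class name, method names, database names); it only builds SQL strings to show that two generator instances can target different setup databases at the same time, and it does not reproduce the actual CTGenerator logic.

```java
// Hypothetical sketch: a CT generator that receives its databases as constructor arguments.
final class CtGeneratorSketch {
    private final String setupDb;
    private final String outputDb;

    CtGeneratorSketch(String setupDb, String outputDb) {
        this.setupDb = setupDb;
        this.outputDb = outputDb;
    }

    /** Backtick-quotes a MySQL identifier, doubling any embedded backticks. */
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    /** Query that reads the RNodes metadata from this instance's own setup database. */
    String rnodesQuery() {
        return "SELECT * FROM " + quote(setupDb) + "." + quote("RNodes");
    }

    /** Fully qualified name of an output table in this instance's output database. */
    String outputTable(String tableName) {
        return quote(outputDb) + "." + quote(tableName);
    }

    public static void main(String[] args) {
        // Two independent generators with different (temporary) setup databases.
        CtGeneratorSketch full = new CtGeneratorSketch("uni_demo_setup", "uni_demo_CT");
        CtGeneratorSketch subset = new CtGeneratorSketch("uni_demo_subset_setup", "uni_demo_subset_CT");
        System.out.println(full.rnodesQuery());
        System.out.println(subset.rnodesQuery());
        System.out.println(subset.outputTable("course0_counts"));
    }
}
```

With this shape, groundings and functor subsets become a matter of preparing a different setup database and constructing another instance, rather than branching inside the generator.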

drop temporary CT tables

could use a general drop procedure

this would save a lot of space
also make the output easier to read

change , to _ in Rchains. See https://github.com/sfu-cl-lab/FactorBase/blob/master/src/lattice/short_rnid_LatticeGenerator.java
try dropping tables. Should work now.
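A general drop procedure along these lines might be sketched in Java as follows. The assumption is that the caller knows which tables must be kept and that every other table in the list is temporary; how FactorBase actually distinguishes temporary tables is not specified here. The class name, method names, and the table name final_CT are hypothetical. The sketch only produces SQL strings, so it can be checked without a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper: not part of the FactorBase code base.
class TempTableCleanup {

    /** Replaces commas in an Rchain name with underscores. */
    static String sanitizeRchain(String rchain) {
        return rchain.replace(',', '_');
    }

    /** Builds one DROP statement for every table that is not in the keep set. */
    static List<String> dropStatements(List<String> allTables, Set<String> keep) {
        List<String> statements = new ArrayList<>();
        for (String table : allTables) {
            if (!keep.contains(table)) {
                // Double embedded backticks so the identifier stays well-formed.
                statements.add("DROP TABLE IF EXISTS `" + table.replace("`", "``") + "`");
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        System.out.println(sanitizeRchain("`a,b`"));

        // "final_CT" is a made-up name standing in for a table that should survive.
        List<String> tables = Arrays.asList("b_star", "course0_counts", "final_CT");
        Set<String> keep = Collections.singleton("final_CT");
        for (String sql : dropStatements(tables, keep)) {
            System.out.println(sql);
        }
    }
}
```

The generated statements could then be executed through the existing JDBC connection once the final tables have been written.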

use a system that allows direct use of sort-merge join without our own implementation

makes laj code simpler (just call SQL using the right join option, or optimization option) rather than maintaining our own join code

add ability to specify population of interest

supports #23 and #22

again pretty much done

export to i.i.d. table for outlier detection

Outlier detection propositionalization

depends on #21

introduce view for output

let edges in biggest Rchain be stored in BN_Structure view

conflicting edges on UW

Sajjad reports that he gets the same edge both required and forbidden on UW. I thought Zhensong and I fixed that? Did we lose the fix?

convert to other formats

e.g. Problog, PSL, Prism, weighted model counting

fix rnids - follow up

get rid of the annoying fake rnids

Make LatticeMember output orig_rnid as well as rnid.
Replace rnid with orig_rnid in metadata script.
Drop rnid.
Make view Mapping where 1-length Rchains map to their LatticeMembers.

write ssh tunnel to bugaboo

Hi Wolfgang,

thank you for changing the timeout settings, that’s great. @ZhenSong: please try running the job again.

As for the ssh, we usually work on Mac and Windows. I looked at https://en.wikipedia.org/wiki/Plink and they said Plink was like ssh. So I tried replacing Plink by ssh and it does seem to work!! Specifically, I said

ssh -L 127.0.0.1:3306:db3:3306 -v -l functor -N bugaboo.westgrid.ca

Then I was able to connect to my local host 127.0.0.1 using MySQL Workbench and it shows me my db3 files. The only thing I wasn’t able to do was run mysql from the command line. But I suspect that may just be the difficulty of entering the db3 password manually on the command line.

localhost:~ oschulte1$ mysql -h127.0.0.1 -P 3306 -u functor -p
Enter password:
ERROR 1045 (28000): Access denied for user 'functor'@'172.18.1.0' (using password: YES)
localhost:~ oschulte1$

We will try the JDBC connection using the local port forwarding, if that works then we can run our code locally against db3.

@ZhenSong: can you please try if you can first forward the local port to bugaboo, then run BayesBase pointing it to the local port?

Thank you for your patience!
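For the JDBC attempt mentioned above, the code only needs to point at the forwarded local port instead of db3 directly. The following Java sketch builds such a URL; the host, port, and user come from the ssh command above, while the class name and the database name are hypothetical. The connection call is left as a comment because it needs the MySQL driver on the classpath, an open tunnel, and the real password.

```java
// Hypothetical helper: builds a JDBC URL for a MySQL server reached through a local ssh tunnel.
class TunnelJdbcUrl {

    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // 127.0.0.1:3306 is the local end of the tunnel opened with
        //   ssh -L 127.0.0.1:3306:db3:3306 -v -l functor -N bugaboo.westgrid.ca
        // "my_database" is a placeholder for the actual schema name.
        String url = mysqlUrl("127.0.0.1", 3306, "my_database");
        System.out.println(url);

        // With the tunnel open and the driver available, the connection would be opened like this:
        // java.sql.Connection con = java.sql.DriverManager.getConnection(url, "functor", password);
    }
}
```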

can you run factorbase?

checking bayes nets: arrows go from relational attributes to rnodes, not the other way

adding indexes and primary keys

We had some trouble adding indexes and primary keys in setup tables (see comments). According to Zhensong (zqian, Oct 17, 2013), the cause is the max key length limitation: "The maximum column size is 767 bytes". Enable "innodb_large_prefix" to allow index key prefixes longer than 767 bytes (up to 3072 bytes).

backup strategy

archive: make dump, perhaps store on clarinet
infrequent access: store on clarinet mysql (e.g. cross-validation). Or maybe get cs-oschulte03 to use clarinet?
frequent access but not production: move to bugaboo. e.g.

nhl_uai
_bn and _ct databases for _std databases

Other points:

use single production mysql server
always mirror the input data

export to i.i.d. table for classification

Classification Propositionalization

depends on #21

compute AUC
compute CLL
compute number of clauses, length of clauses
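The first two metrics in this list can be sketched in a few lines of Java. This is a hedged illustration with hypothetical names: AUC is computed by comparing every positive/negative pair (ties count as half), and CLL is taken here to be the average log-probability assigned to the true class, which is an assumption about the intended definition.

```java
// Hypothetical helper class: not part of the FactorBase code base.
class ClassificationMetrics {

    /** AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5. */
    static double auc(double[] scores, boolean[] positive) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!positive[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (positive[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    /** Average log-probability of the true class (assumed definition of CLL). */
    static double cll(double[] probPositive, boolean[] positive) {
        double sum = 0.0;
        for (int i = 0; i < probPositive.length; i++) {
            sum += Math.log(positive[i] ? probPositive[i] : 1.0 - probPositive[i]);
        }
        return sum / probPositive.length;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.3, 0.1};
        boolean[] labels = {true, false, true, false};
        System.out.println("AUC = " + auc(scores, labels)); // 3 of 4 pairs ranked correctly: 0.75
        System.out.println("CLL = " + cll(scores, labels));
    }
}
```

The pairwise AUC is quadratic in the number of examples, which is fine for small test sets; a rank-based computation would scale better.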

Link correlation : metadata_2.sql vs metadata_2_nolink.sql

In class BayesBaseCT_SortMerge.java, the LinkCorrelation variable (previously opt2) is redundant. metadata_2.sql is executed regardless of its value!
