sfu-cl-lab / factorbase


The source code repository for the FactorBase system

Home Page: https://sfu-cl-lab.github.io/FactorBase/

Java 99.58% Shell 0.06% Python 0.36%
bayesian-network structure-learning relational-database log-linear-model factor-graphs mysql-database relational-learning big-model markov-logic-network mln

factorbase's People

Contributors

dependabot[bot], greedcat, janyqz, oschulte, parmisnaddaf, rmar3a, vidhijain, woodsouths, zeruniverse, zhensongqian

factorbase's Issues

Testing Expansions table

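Testing the Expansions table with only the course0 entry produces the error below. The generated query selects course0.course_id, but course0 does not appear in the FROM clause (only course0_counts and student0_counts do).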

create String : create table `b_star` as Select `course0_counts`.`MULT`  * `student0_counts`.`MULT`  as `MULT` ,`diff(course0)` , `rating(course0)` , `intelligence(student0)` , `ranking(student0)` , course0.course_id from `course0_counts` , `student0_counts`
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'course0.course_id' in 'field list'
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
	at com.mysql.jdbc.Util.getInstance(Util.java:386)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1053)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4096)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4028)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
	at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894)
	at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732)
	at BayesBaseCT_SortMerge.BuildCT_Rnodes_star(BayesBaseCT_SortMerge.java:1130)
	at BayesBaseCT_SortMerge.CTGenerator(BayesBaseCT_SortMerge.java:359)
	at BayesBaseCT_SortMerge.buildCT(BayesBaseCT_SortMerge.java:107)
	at BayesBaseCT_SortMerge.main(BayesBaseCT_SortMerge.java:57)

ETL Repository link

The ETL link points to a Wikipedia page. Should we add a repository or tutorial link for it?

Issue generating CP tables: Data truncation: Division by 0

During generation of the CP tables, if the ratio of the local mult to the total mult is too small, there will be an error like "Data truncation: Division by 0".


The reason is that CP is calculated as local_mult/total_mult. If the result is smaller than the smallest nonzero value the column type can hold (the default is float(7,6)), CP is stored as 0. The next step calculates likelihood = LOG(CP) * local_mult, which evaluates log(0) and causes the "Data truncation: Division by 0" error.

Solution:

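Modify the schema of CP_table so that column CP keeps more digits after the decimal point than float(7,6) does. The original proposal was float(7,10), but MySQL requires M >= D in float(M,D), so the total digit count has to grow as well, e.g. float(20,10).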
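
To make the failure mode concrete, here is a minimal Java sketch of the rounding described above. It is only an illustration: BigDecimal stands in for a fixed-scale column such as float(7,6), the class and method names are made up for this example, and Java returns -Infinity for log(0) where MySQL reports an error.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative only: mimics storing CP with a fixed number of decimal digits.
final class CpPrecisionSketch {

    // CP = local_mult / total_mult, rounded to `scale` digits after the point.
    static BigDecimal storedCp(long localMult, long totalMult, int scale) {
        return BigDecimal.valueOf(localMult)
                .divide(BigDecimal.valueOf(totalMult), scale, RoundingMode.HALF_UP);
    }

    // likelihood = LOG(CP) * local_mult, as in the issue text.
    static double likelihood(BigDecimal cp, long localMult) {
        return Math.log(cp.doubleValue()) * localMult;
    }

    public static void main(String[] args) {
        long local = 1;
        long total = 10_000_000;

        // With 6 decimal digits the ratio 1e-7 rounds to zero, so LOG(CP) blows up.
        BigDecimal cp6 = storedCp(local, total, 6);
        System.out.println("scale 6:  cp=" + cp6 + " likelihood=" + likelihood(cp6, local));

        // With 10 decimal digits the ratio survives and the likelihood is finite.
        BigDecimal cp10 = storedCp(local, total, 10);
        System.out.println("scale 10: cp=" + cp10 + " likelihood=" + likelihood(cp10, local));
    }
}
```

Widening the column only moves the threshold; a sufficiently small ratio would still round to 0, so a guard against CP = 0 before taking the log may also be worth considering.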

clean up BuildCT

  • Bypass BayesBaseCT_SortMerge.java.
  • Make stand-alone copy of CTGenerator
  • In RunBB, call CTGenerator, not BayesBaseCT_SortMerge.java.
  • Clean up metadata_2 script
    • rename as metaqueries
    • move metadata creation to front (e.g. RnodesPVars), or separate script
    • rename rchain.sql to MoebiusJoin
  • merge metaqueries with and without ADPVariables
  • move BN part modelmanager.sql from BuildCT to BBRunner or BBLearner (e.g. PathBayesNets)
  • create view for final output (maximal Rchain length) #41
  • Reclaim rnid column (see #43)
  • replace rnid by short_rnid column in lattice generator
  • Improve usability of CT generator
  • make readme for stand-alone use of CT generator without BB learning
  • drop superfluous tables in CT generator (see #28 )
  • drop superfluous tables in BB learner (see #28)
  • Use only one DB for learning, e.g. _BN, put CT tables there?

compute BN metrics

reimplement the following metrics

  • normalized log-likelihood
  • number of parameters
  • maybe AIC or BIC for counts. But we've argued that these need to be normalized.
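
As a starting point for discussion, here is a minimal sketch of the standard textbook formulas (AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L). The class name is hypothetical, and the choice of sampleSize as the normalizer is an assumption: what the right n is for relational counts is exactly the open question above.

```java
// Illustrative only: textbook definitions, not the project's agreed metrics.
final class BnMetricsSketch {

    // Log-likelihood divided by a sample size. Which n to use for relational
    // data is an assumption left to the caller.
    static double normalizedLogLikelihood(double logLikelihood, double sampleSize) {
        return logLikelihood / sampleSize;
    }

    // AIC = 2k - 2 ln L
    static double aic(double logLikelihood, int numParameters) {
        return 2.0 * numParameters - 2.0 * logLikelihood;
    }

    // BIC = k ln n - 2 ln L
    static double bic(double logLikelihood, int numParameters, double sampleSize) {
        return numParameters * Math.log(sampleSize) - 2.0 * logLikelihood;
    }

    public static void main(String[] args) {
        double logLikelihood = -100.0; // made-up value for illustration
        int numParameters = 5;         // made-up value for illustration
        double sampleSize = 50.0;      // made-up value for illustration
        System.out.println("normalized LL = " + normalizedLogLikelihood(logLikelihood, sampleSize));
        System.out.println("AIC = " + aic(logLikelihood, numParameters));
        System.out.println("BIC = " + bic(logLikelihood, numParameters, sampleSize));
    }
}
```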

Fix Functorset transfer

Once we have resolved #34, the Rnode table no longer depends on running lattice generator.

  1. Move Fnodes creation to setup.sql.
  2. Move Fnodes_pvars creation to setup.sql.
  3. Set Foreign key pointers for Functorset.
  4. Set Foreign key pointers for Expansion.
  5. Transfer Fnodes restricted to Functorset as NewFnodes. (Be sure to copy Rnodes with 2nodes as required)
  6. Transfer 1nodes etc. restricted to Fnodes

Duplicated setVarsFromConfig()

setVarsFromConfig() is duplicated, with differing interpretations, in the following files:

  • MakeSetup.java
  • BayesBaseCT_SortMerge.java
    • setVarsFromConfig()
    • setVarsFromConfigForTarget() ->?? Changes the existing variable databaseName_std to a new meaning
  • FunctorWrapper.java
  • and many more ...

[Suggestion] : We need to set these as final variables during a single run.
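
One possible shape for this, as a sketch only: read the config once into an immutable object and pass that around, so no later method can give a name like databaseName_std a new meaning. The class name RunConfigSketch and the property key "dbname" are assumptions for illustration; the _std and _setup suffixes follow the naming used elsewhere in these issues.

```java
import java.util.Properties;

// Illustrative only: an immutable holder for the settings of a single run.
final class RunConfigSketch {
    private final String dbName; // final: cannot be reassigned after loading

    private RunConfigSketch(String dbName) {
        this.dbName = dbName;
    }

    // Reads the config exactly once. The key "dbname" is a hypothetical name.
    static RunConfigSketch fromProperties(Properties props) {
        String name = props.getProperty("dbname");
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("missing 'dbname' in config");
        }
        return new RunConfigSketch(name);
    }

    String databaseName() { return dbName; }

    // Derived names are computed, never stored in mutable fields.
    String stdDatabase() { return dbName + "_std"; }

    String setupDatabase() { return dbName + "_setup"; }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("dbname", "example"); // made-up database name
        RunConfigSketch config = fromProperties(props);
        System.out.println(config.stdDatabase() + " / " + config.setupDatabase());
    }
}
```

With something like this, MakeSetup, BayesBaseCT_SortMerge and FunctorWrapper would each receive the same config object instead of each parsing the file their own way.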

reclaim rnid from Lattice Generator

LatticeGenerator should have been rewritten so that instead of messing up rnid in Rnodes, it works with a copy shortrnid of rnid. Here's a suggestion.

  1. In LatticeGenerator, start by copying Rnodes to a temporary table called LatticeRnodes. E.g.
    • create table LatticeRNodes as select * from RNodes
  2. Replace all references to the Rnodes table by LatticeRnodes. The lattice generator is free to mess up LatticeRnodes.
  3. Change the column name lattice_membership.name to lattice_membership.Rchain.
  4. Using the lattice-fix script, change the column names lattice_membership.orig_rnid to lattice_membership.member.rnid and lattice_rel.orig_rnid to lattice_rel.rnid_removed.

porting to Spark?

porting to Spark for very large industrial databases

using SparkSQL Operators only, or combine with SQL queries?

fix up metaqueries script

Make the logic clearer and use propagation more cleanly. Almost everything is a union of clauses from lower in the hierarchy. Make sure we use a counts table without rnode so it's easy to connect with the start table.

  • change setup.sql to include foreign key pointers

  • change setup.sql to generate a pcolumnid (e.g. id(course0)). (may be unnecessary)

  • change transfer script to include rnodes for each 2node.

  • make independent table for finding key column of pvariable (rather than use RnodesPvars)

  • make single table for each metaquery (e.g. pvid/rnid, Clause_Type, Entries). For various reasons, store meta-information:
    + Lattice_Point (e.g. prof0, rchain).
    + clause_type (e.g. where)
    + table_type (e.g. star)
    + entry type (e.g. 1node, aggregate).
    Reasons:
    + Support FunctorSets in a different way.
    + Make flat tables by finding one nodes rather than by making a separate ADT_RNodes_1Nodes table.
  • make single table for Relationship chain types (Counts, Star, False). propagate from the previous table.
    + 2Nodes do not occur in Star tables
    + Different chain types require different table prefixes and different aggregate selections
  • add metaquery script for flat tables in Rchain
  • rewrite CTGenerator to work with ClauseTypes rather than separate tables
  • add groundings table to setup, add to where clause for pvariables
  • execute model manager script after ct learning in RunBB
  • test running metaqueries script for link correlation = 1
  • rename "metaqueries" to MoebiusJoin

fix Contingency Table Generator (CTGenerator)

@vidhiJain The contingency table code is in

https://github.com/sfu-cl-lab/FactorBase/blob/master/src/BayesBaseCT_SortMerge.java. It's called from RunBB.java as follows:

//assumes that dbname is in config file and that dbname_setup exists.

BayesBaseCT_SortMerge.CTGenerator();

So it should not be difficult to just call it by itself. It would be progress if we could do that.

The key procedure is CTGenerator(), which builds the CT tables. Unfortunately, Zhensong made a version of CTGenerator for working with groundings, called target, and merged it with the non-target code. He also merged it with a copy for the case where we are interested only in a subset of the functor nodes. My suggestion would be this.

  1. Make a new branch.

  2. In the new branch, make a copy of BayesBaseCT_SortMerge.java with all the target and subset stuff removed. I can probably even find an older version without the target stuff. See if we can run it then.

  3. Then we can design a CT generator for groundings and subsets from scratch. I think a key move would be to change the CT generator so that it takes the setup database as input rather than treating it as a global variable. Then we can use the CT generator with different (temporary) setup databases.
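
A rough sketch of what step 3 could look like: the setup database name is a constructor argument, so two generators can work against different (temporary) setup databases in the same run. All names here (CtGeneratorSketch, setupTable, the example database names) are hypothetical and not taken from the existing code.

```java
// Illustrative only: a generator that is told which setup database to use.
final class CtGeneratorSketch {
    private final String setupDatabase; // injected, not a global variable

    CtGeneratorSketch(String setupDatabase) {
        if (setupDatabase == null || setupDatabase.isEmpty()) {
            throw new IllegalArgumentException("setup database name is required");
        }
        this.setupDatabase = setupDatabase;
    }

    // Qualifies a table name with this generator's own setup database.
    String setupTable(String table) {
        return "`" + setupDatabase + "`.`" + table + "`";
    }

    public static void main(String[] args) {
        // Two independent generators; the database names are made up.
        CtGeneratorSketch full = new CtGeneratorSketch("example_setup");
        CtGeneratorSketch subset = new CtGeneratorSketch("example_subset_setup");
        System.out.println("SELECT * FROM " + full.setupTable("RNodes"));
        System.out.println("SELECT * FROM " + subset.setupTable("RNodes"));
    }
}
```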

conflicting edges on UW

Sajjad reports that he gets the same edge both required and forbidden on UW. I thought Zhensong and I fixed that? Did we lose the fix?

fix rnids - follow up

get rid of the annoying fake rnids

  1. Make LatticeMember output orig_rnid as well as rnid.
  2. Replace rnid with orig_rnid in metadata script.
  3. Drop rnid.
  4. Make view Mapping where 1-length Rchains map to their LatticeMembers.

write ssh tunnel to bugaboo

Hi Wolfgang,

thank you for changing the timeout settings, that’s great. @ZhenSong: please try running the job again.

As for the ssh, we usually work on Mac and Windows. I looked at https://en.wikipedia.org/wiki/Plink and they said Plink was like ssh. So I tried replacing Plink by ssh and it does seem to work!! Specifically, I said

ssh -L 127.0.0.1:3306:db3:3306 -v -l functor -N bugaboo.westgrid.ca

Then I was able to connect to my local host 127.0.0.1 using MySQL Workbench, and it shows me my db3 files. The only thing I wasn’t able to do was run mysql from the command line, but I suspect that may just be the difficulty of entering the db3 password manually on the command line.

localhost:~ oschulte1$ mysql -h127.0.0.1 -P 3306 -u functor -p
Enter password:
ERROR 1045 (28000): Access denied for user 'functor'@'172.18.1.0' (using password: YES)
localhost:~ oschulte1$

We will try the JDBC connection using the local port forwarding. If that works, we can run our code locally against db3.

@ZhenSong: can you please try if you can first forward the local port to bugaboo, then run BayesBase pointing it to the local port?
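
For the JDBC test, something like the following minimal sketch might do, assuming the ssh tunnel above is already running and a MySQL JDBC driver is on the classpath. The database name example_db and the environment variable DB3_PASSWORD are placeholders, not real settings.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative only: connect to db3 through the locally forwarded port.
final class TunnelConnectionSketch {

    // Builds a MySQL JDBC URL for the local end of the tunnel.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // 127.0.0.1:3306 is the local side of "ssh -L 127.0.0.1:3306:db3:3306".
        String url = jdbcUrl("127.0.0.1", 3306, "example_db"); // placeholder database
        String password = System.getenv("DB3_PASSWORD");       // placeholder variable

        try (Connection conn = DriverManager.getConnection(url, "functor", password)) {
            System.out.println("Connected: " + conn.getMetaData().getDatabaseProductVersion());
        } catch (SQLException e) {
            // Covers a missing driver, a closed tunnel, or rejected credentials.
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

If this also fails with "Access denied", the problem is likely on the account or host-permission side rather than in how the password is typed on the command line.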

Thank you for your patience!

adding indexes and primary keys

We had some trouble adding indexes and primary keys in setup tables (see comments). According to Zhensong (zqian, Oct 17, 2013), the cause is the max key length limitation, "The maximum column size is 767 bytes". Enabling "innodb_large_prefix" allows index key prefixes longer than 767 bytes (up to 3072 bytes).
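
To see how quickly the limit is reached, here is a small sketch of the arithmetic. It assumes the usual MySQL maximums of 3 bytes per character for utf8 and 4 for utf8mb4; which character set the setup tables actually use is not stated here, and the class name is made up.

```java
// Illustrative only: how many characters of a string column fit in an index key.
final class IndexPrefixSketch {

    // Integer division: a partial character cannot be indexed.
    static int maxIndexableChars(int keyLimitBytes, int maxBytesPerChar) {
        return keyLimitBytes / maxBytesPerChar;
    }

    public static void main(String[] args) {
        int defaultLimit = 767; // limit quoted in the error message
        int largeLimit = 3072;  // limit with innodb_large_prefix enabled

        System.out.println("utf8, 767 bytes:     " + maxIndexableChars(defaultLimit, 3) + " chars");
        System.out.println("utf8mb4, 767 bytes:  " + maxIndexableChars(defaultLimit, 4) + " chars");
        System.out.println("utf8mb4, 3072 bytes: " + maxIndexableChars(largeLimit, 4) + " chars");
    }
}
```

So a VARCHAR(255) key column fits under the default limit with utf8 but not with utf8mb4, which is one common way to hit this error.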

backup strategy

  1. archive: make dump, perhaps store on clarinet

  2. infrequent access: store on clarinet mysql (e.g. cross-validation). Or maybe get cs-oschulte03 to use clarinet?

  3. frequent access but not production: move to bugaboo. e.g.

  • nhl_uai
  • _bn and _ct databases for _std databases

Other points:

  • use single production mysql server
  • always mirror the input data

add MLN evaluation tools

We had a number of tools for evaluating a generated MLN. May have to collect these from old Bayesbase and from Yuke's code and documentation.

  • compute AUC
  • compute CLL
  • compute number of clauses, length of clauses
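
Until the old tools are collected, a minimal sketch of the first two metrics might look like this. It is not the old Bayesbase or Yuke's implementation: AUC is computed by brute-force pair counting, and CLL is taken as the average log-probability assigned to each ground atom's true value, which is an assumption about the intended definition.

```java
// Illustrative only: simple reference versions of AUC and CLL.
final class MlnEvalSketch {

    // AUC as the fraction of (positive, negative) pairs where the positive
    // example gets the higher score; ties count as half.
    static double auc(double[] scores, boolean[] labels) {
        double correct = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!labels[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (labels[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) correct += 1.0;
                else if (scores[i] == scores[j]) correct += 0.5;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return correct / pairs;
    }

    // Average log-probability of the true value of each ground atom, where
    // probTrue[i] is the predicted probability that atom i is true.
    static double conditionalLogLikelihood(double[] probTrue, boolean[] labels) {
        double sum = 0.0;
        for (int i = 0; i < probTrue.length; i++) {
            double p = labels[i] ? probTrue[i] : 1.0 - probTrue[i];
            sum += Math.log(p);
        }
        return sum / probTrue.length;
    }

    public static void main(String[] args) {
        double[] probs = {0.9, 0.8, 0.3, 0.1};        // made-up predictions
        boolean[] truth = {true, true, false, false}; // made-up ground truth
        System.out.println("AUC = " + auc(probs, truth));
        System.out.println("CLL = " + conditionalLogLikelihood(probs, truth));
    }
}
```

The pair-counting AUC is quadratic in the number of ground atoms, so a rank-based version would be needed for large test sets.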
