sfu-cl-lab / factorbase
The source code repository for the FactorBase system
Home Page: https://sfu-cl-lab.github.io/FactorBase/
perhaps make reduced setup with subset of functors only
Testing Expansions table with course0 entry only
create String : create table `b_star` as Select `course0_counts`.`MULT` * `student0_counts`.`MULT` as `MULT` ,`diff(course0)` , `rating(course0)` , `intelligence(student0)` , `ranking(student0)` , course0.course_id from `course0_counts` , `student0_counts`
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'course0.course_id' in 'field list'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1053)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4096)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4028)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732)
at BayesBaseCT_SortMerge.BuildCT_Rnodes_star(BayesBaseCT_SortMerge.java:1130)
at BayesBaseCT_SortMerge.CTGenerator(BayesBaseCT_SortMerge.java:359)
at BayesBaseCT_SortMerge.buildCT(BayesBaseCT_SortMerge.java:107)
at BayesBaseCT_SortMerge.main(BayesBaseCT_SortMerge.java:57)
we have code for this from Yuke and maybe Ramakanth somewhere
I think it would be nice to have the main webpages all come from github rather than here.
http://www.cs.sfu.ca/~oschulte/BayesBase/BayesBase.html
For one thing, it's easier to share the editing. We can still use BayesBase to host bigger files, e.g. datasets.
The ETL points to a Wikipedia page. Should we add a repository or tutorial link to this?
During the generation of the CP tables, if the ratio of the local mult to the total mult is very small, there will be an error like "Data truncation: Division by 0".
The reason is that CP is calculated as local_mult/total_mult. If the result is smaller than the smallest value the column type can represent (the default is float(7,6)), the stored CP ends up as 0. The next step calculates likelihood = LOG(CP) * local_mult, which then evaluates log(0) and causes the "Data truncation: Division by 0" issue.
Solution:
Modify the schema of CP_table so that the CP column keeps more digits after the decimal point than float(7,6). The originally proposed float(7,10) is not accepted by MySQL, which requires M >= D in float(M,D), so the total width has to grow as well, e.g. float(20,10).
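The rounding problem can be reproduced outside the database. The following is only a sketch: the class and method names are made up for illustration, `BigDecimal` rounding stands in for the fixed number of decimal places in the CP column, and Java returns -Infinity for log(0) where MySQL reports an error.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class, not part of FactorBase.
public class CpUnderflowDemo {

    // Mimics storing local_mult / total_mult in a column that keeps
    // only `scale` digits after the decimal point.
    static double storeCp(long localMult, long totalMult, int scale) {
        return new BigDecimal(localMult)
                .divide(new BigDecimal(totalMult), scale, RoundingMode.HALF_UP)
                .doubleValue();
    }

    // likelihood = LOG(CP) * local_mult, as described in the issue.
    static double likelihood(double cp, long localMult) {
        return Math.log(cp) * localMult;
    }

    public static void main(String[] args) {
        long localMult = 1;
        long totalMult = 10_000_000L;

        // Six decimal places: 0.0000001 is rounded down to 0.
        double cp6 = storeCp(localMult, totalMult, 6);
        System.out.println(cp6);                         // 0.0
        System.out.println(likelihood(cp6, localMult));  // -Infinity

        // Ten decimal places: the value survives and the log is finite.
        double cp10 = storeCp(localMult, totalMult, 10);
        System.out.println(cp10);
        System.out.println(likelihood(cp10, localMult));
    }
}
```

Widening the column only moves the threshold: a ratio below 10^-10 would still be stored as 0, so a guard against CP = 0 before taking the log may also be worth considering.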
this would make the laj code simpler and the output more readable
MLN Generator We have code for this from Ramakanth
see Sarah's code
reimplement the following metrics
i.e. functorid = value
experimenting with different ER diagram and compression ratio?
Once we have resolved #34, the Rnode table no longer depends on running lattice generator.
reimplement the following metrics
this would make the Bayes net learning code much simpler
Integrating PS (Parameter Servers) into the Model Manager to support distributed model learning?
setVarsFromConfig() has been duplicated and has different interpretation in the following:
databaseName_std
to a new meaning. [Suggestion]: We need to set these as final variables during a single run.
LatticeGenerator should have been rewritten so that, instead of messing up rnid in Rnodes, it works with a copy shortrnid of rnid. Here's a suggestion.
create table LatticeRNodes as select * from RNodes.
Replace the Rnodes table by LatticeRnodes. The lattice generator is free to mess up LatticeRnodes.
Rename lattice_membership.name to lattice_membership.Rchain.
Rename lattice_membership.orig_rnid to lattice_membership.member.rnid and lattice_rel.orig_rnid to lattice_rel.rnid_removed.
porting to Spark for very large industrial databases
using SparkSQL Operators only, or combine with SQL queries?
Make logic clearer, use propagation more cleanly. Almost everything is union of clauses from lower in the hierarchy. Make sure we use counts table without rnode so it's easy to connect with start table.
change setup.sql to include foreign key pointers
change setup.sql to generate a pcolumnid (e.g. id(course0)). (may be unnecessary)
change transfer script to include rnodes for each 2node.
make independent table for finding key column of pvariable (rather than use RnodesPvars)
make single table for each metaquery (e.g. pvid/rnid, Clause_Type, Entries). For various reasons, store meta-information:
+ Lattice_Point (e.g. prof0, rchain).
+ clause_type (e.g. where)
+ table_type (e.g. star)
+ entry type (e.g. 1node, aggregate).
Reasons:
+ Support FunctorSets in a different way.
+ Make flat tables by finding one nodes rather than by making a separate ADT_RNodes_1Nodes table.
from Sarah's work or just using KLD
@vidhiJain The contingency table code is in
https://github.com/sfu-cl-lab/FactorBase/blob/master/src/BayesBaseCT_SortMerge.java. It's called from RunBB.java as follows:
//assumes that dbname is in config file and that dbname_setup exists.
BayesBaseCT_SortMerge.CTGenerator();
The key procedure is CTGenerator()
This is what builds the CT tables. Unfortunately, Zhensong made a version of CTGenerator for working with groundings, called target, and merged it with the nontarget code. He also merged it with a copy for the case where we are interested only in a subset of the functor nodes. My suggestion would be this.
Make a new branch.
In the new branch, make a copy of BayesBaseCT_SortMerge.java with all the target and subset stuff removed. I can probably even find an older version without the target stuff. See if we can run it then.
Then we can design a CT generator for groundings and subsets from scratch. I think a key move would be to change CT generator so that it takes as input the setup database rather than treat that as a global variable. Then we can use the CT generator with different (temporary) setup databases.
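A rough sketch of the last point, passing the setup database in rather than reading it from a global. The class name, method name, and database names below are all hypothetical and are not taken from BayesBaseCT_SortMerge.java; the sketch only shows that two generators can then point at different setup databases in the same run.

```java
import java.util.Objects;

// Hypothetical sketch, not the existing FactorBase class.
public class CtGeneratorSketch {

    // The setup database is per-instance state instead of a global variable.
    private final String setupDatabase;

    public CtGeneratorSketch(String setupDatabase) {
        this.setupDatabase = Objects.requireNonNull(setupDatabase);
    }

    // Qualifies a table name with this instance's setup database,
    // so generated SQL never depends on a shared global name.
    public String setupTable(String table) {
        return "`" + setupDatabase + "`.`" + table + "`";
    }

    public static void main(String[] args) {
        // Assumed database names, for illustration only.
        CtGeneratorSketch full = new CtGeneratorSketch("unielwin_setup");
        CtGeneratorSketch subset = new CtGeneratorSketch("unielwin_subset_setup");

        System.out.println("select * from " + full.setupTable("RNodes"));
        System.out.println("select * from " + subset.setupTable("RNodes"));
    }
}
```

With this shape, a groundings or subset run would create its own temporary setup database and hand its name to the generator, leaving the main setup database untouched.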
could use a general drop procedure
makes laj code simpler (just call SQL using the right join option, or optimization option) rather than maintaining our own join code
depends on #21
let edges in biggest Rchain be stored in BN_Structure view
Sajjad reports that he gets the same edge both required and forbidden on UW. I thought Zhensong and I fixed that? Did we lose the fix?
e.g. Problog, PSL, Prism, weighted model counting
get rid of the annoying fake rnids
Hi Wolfgang,
thank you for changing the timeout settings, that’s great. @ZhenSong: please try running the job again.
As for the ssh, we usually work on Mac and Windows. I looked at https://en.wikipedia.org/wiki/Plink and they said Plink was like ssh. So I tried replacing Plink by ssh and it does seem to work!! Specifically, I said
ssh -L 127.0.0.1:3306:db3:3306 -v -l functor -N bugaboo.westgrid.ca
Then I was able to connect to my local host 127.0.0.1 using MySQL Workbench, and it shows me my db3 files. The only thing I wasn’t able to do was run mysql from the command line. But I suspect that may just be the difficulty of entering the db3 password manually on the command line.
localhost:~ oschulte1$ mysql -h127.0.0.1 -P 3306 -u functor -p
Enter password:
ERROR 1045 (28000): Access denied for user 'functor'@'172.18.1.0' (using password: YES)
localhost:~ oschulte1$
We will try the JDBC connection using the local port forwarding, if that works then we can run our code locally against db3.
@ZhenSong: can you please try forwarding the local port to bugaboo first, then running BayesBase pointed at the local port?
Thank you for your patience!
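For the JDBC test over the forwarded port, something along these lines might serve as a starting point. It is only a sketch: the class name, the database name, and the DB3_PASSWORD environment variable are assumptions, and it needs the MySQL JDBC driver on the classpath and the ssh tunnel from above already running.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of FactorBase.
public class ForwardedDbCheck {

    // Builds a standard MySQL JDBC URL of the form jdbc:mysql://host:port/database.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // 127.0.0.1:3306 is the local end of the ssh tunnel; the database name is assumed.
        String url = jdbcUrl("127.0.0.1", 3306, "unielwin");
        // Assumed environment variable, to avoid typing the password on the command line.
        String password = System.getenv("DB3_PASSWORD");

        try (Connection conn = DriverManager.getConnection(url, "functor", password)) {
            System.out.println("Connected through tunnel: " + !conn.isClosed());
        } catch (SQLException e) {
            // Covers a missing driver, a closed tunnel, or rejected credentials.
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```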
We had some trouble adding indexes and primary keys in setup tables (see comments). According to Zhensong (zqian), this is the max key length limitation: "The maximum column size is 767 bytes".
Enable "innodb_large_prefix" to allow index key prefixes longer than 767 bytes (up to 3072 bytes).
Oct 17, 2013
archive: make dump, perhaps store on clarinet
infrequent access: store on clarinet mysql (e.g. cross-validation). Or maybe get cs-oschulte03 to use clarinet?
frequent access but not production: move to bugaboo. e.g.
Other points:
depends on #21
Utilizing DeepDive/Tuffy for more scalable statistical inference?
Like issue #12
We had a number of tools for evaluating a generated MLN. May have to collect these from old Bayesbase and from Yuke's code and documentation.
In class BayesBaseCT_SortMerge.java, the LinkCorrelation variable (previously opt2) is redundant. The execution of metadata_2.sql happens!