
How to run Simbase? (CLOSED)

guokr avatar guokr commented on June 19, 2024
how to run simbase?

from simbase.

Comments (11)

mountain avatar mountain commented on June 19, 2024

Please check the bin directory in the project directory:

https://github.com/guokr/simbase/tree/master/bin


zloi47 avatar zloi47 commented on June 19, 2024

I ran the java -server -jar command directly in a terminal, without the start script, and then connected via redis-cli. Is there any way to import data from .txt files? Also, if I add more than 5 values to a vector (vadd) and then run "vget", it returns "(error) Unknown server error!". Is there a limit of 5?


mountain avatar mountain commented on June 19, 2024

You should set up the schemas first, just as you would with any RDBMS such as MySQL. Have you followed the steps described in https://github.com/guokr/simbase#a-general-application-case ?

Could you describe your setup scripts and your vget command in detail?

As for importing, we currently do not have such a tool, but it is very easy to write a script.


zloi47 avatar zloi47 commented on June 19, 2024

Is basis a relation and article an attribute? What I want to do is import bags of words (4623 rows of bags of words, each with 10,000 vectors) and then compare them for similarity. I do not see any example scripts in the documentation. Any hints would be good.


mountain avatar mountain commented on June 19, 2024

Conceptually, a basis is something like a tablespace, a container that holds all the data; a vector set is something like a table; and a recommendation is a relationship between two vector sets.

I am not sure about your case, since I have not seen the details.


zloi47 avatar zloi47 commented on June 19, 2024

I create a bag-of-words representation from a vocabulary with Python and export it to a text file. The file looks like this (for the 10 most common words, 1 vector for each most common word):

1 {0, 0, 0, 0, 1, 1, 0, 0, 1, 0}
2 {1, 1, 0, 0, 1, 1, 1, 1, 1, 1}
3 {0, 1, 1, 0, 1, 0, 0, 0, 1, 1}
4 {1, 1, 0, 1, 1, 0, 1, 0, 1, 1}
5 {1, 0, 0, 0, 1, 1, 0, 0, 1, 0}

Then I import the vectors (without the leading 1, 2, 3, 4, 5, which are just sentence IDs) into Postgres. In Postgres I use a query to compare two bag-of-words vectors for similarity:

select a.doc, a.sentenceid, b.doc, b.sentenceid,
       cardinality(array(
           select unnest(array_positions(a.tokenizedsentence, '1'))
           intersect
           select unnest(array_positions(b.tokenizedsentence, '1'))
       ))::float / cardinality(a.tokenizedsentence)::float
from nlpdata a, nlpdata b
where a.sentenceid < b.sentenceid;
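To make the scoring explicit, here is a minimal Python sketch of what the query above appears to compute for one pair of rows: the number of positions where both vectors hold a 1, divided by the length of the first vector. The function names `overlap_score` and `pair_count` are hypothetical, chosen only for illustration; the sample rows are sentences 1 and 5 from the file excerpt.

```python
def overlap_score(a, b):
    """Count positions where both vectors are 1, divided by len(a)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    return shared / len(a)


def pair_count(n):
    """Number of unordered pairs, matching 'a.sentenceid < b.sentenceid'."""
    return n * (n - 1) // 2


# Rows 1 and 5 from the text file shown in the post.
sentence_1 = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
sentence_5 = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]

print(overlap_score(sentence_1, sentence_5))
print(pair_count(4623))
```

The pair count for 4623 sentences is what makes the all-pairs comparison expensive, independent of the vector length.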

There are 4623 sentences, so 4623 rows, and the total number of comparisons is 10,683,753. For the 10 most common words, execution takes about 20 minutes. My goal is to compare bags of words built from the 10,000 most common words, so 10,000 vectors. Judging by how long it takes for 10 vectors, with 10,000 vectors it would take about 24 hours. Since Simbase works on vectors, I thought it could do these calculations faster.

I also have another form of representation, which looks like this (embedding vectors created by GloVe):

1 -0.00205523529412 0.00168023529412 -0.00357188235294 -0.000664294117647
2 -0.00120581818182 -0.00318495454545 -5.30909090909e-05 9.39090909091e-05
3 -0.000793523809524 0.000649047619048 0.000342761904762 -0.00143176190476

For the second case I wanted cosine distance, but Postgres does not have a function for it. Maybe Simbase can do something with it?
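For reference, cosine distance itself is simple to compute outside the database. The following is a minimal pure-Python sketch using the standard definition (one minus the cosine of the angle between two vectors); it is not Simbase's implementation, and the names `cosine_similarity` and `cosine_distance` are hypothetical. The sample data are the three GloVe rows quoted above.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Standard cosine distance: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# The three GloVe rows from the post, keyed by sentence ID.
glove = {
    1: [-0.00205523529412, 0.00168023529412, -0.00357188235294, -0.000664294117647],
    2: [-0.00120581818182, -0.00318495454545, -5.30909090909e-05, 9.39090909091e-05],
    3: [-0.000793523809524, 0.000649047619048, 0.000342761904762, -0.00143176190476],
}

print(cosine_distance(glove[1], glove[2]))
print(cosine_distance(glove[1], glove[3]))
```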


mountain avatar mountain commented on June 19, 2024

For the example with 10 common words, try something like the steps below in redis-cli:

Setup:

bmk commonword w1 w2 w3 w4 w5 w6 w7 w8 w9 w10
vmk commonword sentence 
rmk sentence sentence cosinesq

Fill data:

vadd sentence 1 0 0 0 0 1 1 0 0 1 0
vadd sentence 2 1 1 0 0 1 1 1 1 1 1
vadd sentence 3 0 1 1 0 1 0 0 0 1 1
vadd sentence 4 1 1 0 1 1 0 1 0 1 1
vadd sentence 5 1 0 0 0 1 1 0 0 1 0 

Query:

rrec sentence 1 sentence
rrec sentence 2 sentence


zloi47 avatar zloi47 commented on June 19, 2024

Thanks. But is there any way to import the vectors with a script? Entering them manually is impossible.


mountain avatar mountain commented on June 19, 2024

For the GloVe data, a neat approach is to prepend the command to each row:

1 -0.00205523529412 0.00168023529412 -0.00357188235294 -0.000664294117647 

=>

redis-cli -h 127.0.0.1 -p 7654 vadd sentence 1 -0.00205523529412 0.00168023529412 -0.00357188235294 -0.000664294117647 

Save the result as a shell script, then execute it.

For your text file, there are two ways:

  1. Modify your original Python script to use the Redis Python binding.
  2. Convert your text file into a shell script, as in the GloVe example.
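Both options can be sketched in Python. The sketch below parses either file format from this thread into the arguments of a vadd command; the helper names (`parse_line`, `vadd_args`, `import_file`) and the file path are hypothetical. Sending the commands assumes the third-party redis-py package and that its generic `execute_command` call works against a Simbase server, which is an assumption that has not been tested here.

```python
import re


def parse_line(line):
    """Split '1 {0, 0, 1}' or '1 -0.002 0.001' into (id, [components])."""
    tokens = [t for t in re.split(r"[\s,{}]+", line.strip()) if t]
    return tokens[0], tokens[1:]


def vadd_args(vector_set, line):
    """Build the argument list for one vadd command."""
    vec_id, components = parse_line(line)
    return ["vadd", vector_set, vec_id, *components]


def import_file(path, vector_set="sentence", host="127.0.0.1", port=7654):
    """Send one vadd per non-empty line (assumes redis-py is installed)."""
    import redis  # third-party package, imported lazily

    client = redis.Redis(host=host, port=port)
    with open(path) as handle:
        for line in handle:
            if line.strip():
                client.execute_command(*vadd_args(vector_set, line))


# Option 2: print shell commands instead of sending them.
line = "1 {0, 0, 0, 0, 1, 1, 0, 0, 1, 0}"
print("redis-cli -h 127.0.0.1 -p 7654 " + " ".join(vadd_args("sentence", line)))
```

With a running server and the schema already set up, a call such as `import_file("bag_of_words.txt")` (a hypothetical file name) would load every row; redirecting the printed commands to a file gives the shell-script variant instead.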


zloi47 avatar zloi47 commented on June 19, 2024

Thanks, importing should suit my case. And what about comparing every sentence with every other one? Does

rrec sentence 1 sentence

compare sentence 1 to the other sentences or to itself? Also:

127.0.0.1:7654> rrec sentence 1 sentence
1) "5"
2) "3"
3) "4"

Can you explain what these numbers are and how the comparison is done?


mountain avatar mountain commented on June 19, 2024

> rrec sentence 1 sentence
1) "5"
2) "3"
3) "4"

The result is the list of IDs of the nearest sentences, ordered by distance.

I am not sure whether you have read our documentation; please read it before asking basic questions. Thank you.
