

miku commented on July 30, 2024

The default backend is mem (in-memory), so unless you are on a machine with a terabyte of RAM (just a rough estimate), the process will get killed at some point when it tries to allocate more memory. You could try another backend, e.g. leveldb via -db=leveldb. I just loaded 106M triples (about 1/17th of Freebase) into a leveldb backend in about 13 hours.


SergeC commented on July 30, 2024

Can you please provide more detailed instructions on how to run it? Step-by-step instructions would be great.
Do I need to install leveldb?
How do I load the Freebase dump into leveldb?


miku commented on July 30, 2024

What OS are you targeting?


SergeC commented on July 30, 2024

My dev machine runs OSX 10.9 with 16GB of RAM, but it has very limited disk space. I'd like to use OSX.
My server runs Ubuntu 14.04 64-bit with 2GB of RAM and hosts the uncompressed database.

I can connect to the server via NFS and use its disk space.


miku commented on July 30, 2024

Setting up leveldb was quite straightforward[1]. Replace /tmp/testdb with the path where the DB should be created.

$ cayley init -db="leveldb" -dbpath="/tmp/testdb"
$ ls /tmp/testdb
000001.log  CURRENT  LOCK  LOG  MANIFEST-000000

Then load the triples into the database. Note that this process will take hours or days if you want to import the complete 330G from Freebase: extrapolating from my single data point (106M triples, 13h), the full Freebase import would take over 200 hours[2].

$ cayley load -db="leveldb" -dbpath="/tmp/testdb" -triples="freebase.nt"
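For reference, the "over 200 hours" estimate works out as follows, taking "about 1/17th" literally and assuming the import rate stays constant as the database grows (an assumption, not something measured here):

```latex
17 \times 106\,\text{M} \approx 1.8\,\text{B triples}, \qquad
\frac{106\,\text{M triples}}{13 \times 3600\,\text{s}} \approx 2{,}265\ \text{triples/s}, \qquad
17 \times 13\,\text{h} = 221\,\text{h} \approx 9.2\ \text{days}
```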

[1] You do not need to install leveldb beforehand, since the leveldb library Cayley uses is pure Go, too.

[2] I am looking at ways to speed up the import. I think there are two options: a) make the input smaller, e.g. by applying namespace or vocabulary shortcuts (I wrote a prototype for that here); b) use a distributed backend.
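Idea a) can be sketched as a plain stream edit. This is a hypothetical helper, not the prototype mentioned above. It assumes the dump writes Freebase IRIs with the prefix http://rdf.freebase.com/ns/ and that the loader accepts a short IRI such as <fb:m.0abc> as an opaque identifier (not verified).

```shell
# Hypothetical helper (assumption: IRIs look like <http://rdf.freebase.com/ns/...>).
# Rewrites the 27-character namespace prefix to the 3-character "fb:" so every
# IRI in the stream takes fewer bytes. Reads N-Triples on stdin, writes stdout.
shorten_ns() {
  sed 's|<http://rdf\.freebase\.com/ns/|<fb:|g'
}

# Example usage (file names are placeholders):
#   shorten_ns < freebase.nt > freebase-short.nt
```

Each rewritten IRI saves 24 bytes, so a triple with three Freebase IRIs shrinks by 72 bytes. Queries against the shortened data would then have to use the fb: form, and results would need the reverse mapping before being shown.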


SergeC commented on July 30, 2024

How do I query the database after loading the data ($ cayley load -db="leveldb" -dbpath="/tmp/testdb" -triples="freebase.nt")?

What do you think about using grep for simple queries? It's also possible to use zegrep on the compressed dump file.

Maybe it's possible to cut out only some domains? For example, I need only the /music domain and don't care about the rest of the data. So: cut /music, load, use...

What do you think about asking the developers to add an option to load only certain domains ($ cayley load -db="leveldb" -dbpath="/tmp/testdb" -triples="freebase.nt" -domain="/music")?

Can your prototype export only the /music domain from freebase.nt to a JSON file? Then I could load it into MongoDB, which would be fine for me.


miku commented on July 30, 2024

a) I think the cayley docs are quite nice,
b) grep can be a viable option for ad hoc searches, depending on your use case,
c) I would argue against something like -domain, since files can easily be preprocessed with command line tools,
d) no, there is no domain filter in my prototype; I'd just grep out the /music n-triples and then convert those to JSON.
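The grep-then-convert approach from d) could look roughly like the two helpers below. Both are hypothetical, not part of Cayley or of the prototype. filter_domain assumes the dump spells a domain such as /music as IRIs starting with <http://rdf.freebase.com/ns/music.> (e.g. music.artist.label); nt_to_ldjson assumes one triple per line with whitespace-separated terms and a trailing ".", and emits one JSON object per line with the raw N-Triples terms kept as strings.

```shell
# Hypothetical helper: keep only lines that mention an IRI from one Freebase
# domain, e.g. "music" matches <http://rdf.freebase.com/ns/music.artist.label>.
# Caveat: facts about music entities stored under generic properties
# (such as type.object.name) do not match this pattern and are dropped.
filter_domain() {
  grep -F "<http://rdf.freebase.com/ns/$1."
}

# Hypothetical helper: turn "<s> <p> <o> ." lines into line-delimited JSON
# objects {"s":...,"p":...,"o":...}; each term is copied verbatim as a string.
nt_to_ldjson() {
  awk '
    # Escape backslashes and double quotes so the term is a valid JSON string.
    function esc(str,    out, i, c) {
      out = ""
      for (i = 1; i <= length(str); i++) {
        c = substr(str, i, 1)
        if (c == "\\") out = out "\\\\"
        else if (c == "\"") out = out "\\\""
        else out = out c
      }
      return out
    }
    # Skip blank lines and comment lines.
    /^[ \t]*$/ { next }
    /^[ \t]*#/ { next }
    {
      line = $0
      sub(/[ \t]*\.[ \t]*$/, "", line)      # drop the trailing dot
      if (!match(line, /[ \t]+/)) next      # subject ends at first whitespace
      s = substr(line, 1, RSTART - 1)
      line = substr(line, RSTART + RLENGTH)
      if (!match(line, /[ \t]+/)) next      # predicate ends at next whitespace
      p = substr(line, 1, RSTART - 1)
      o = substr(line, RSTART + RLENGTH)    # object is the rest, spaces included
      printf "{\"s\":\"%s\",\"p\":\"%s\",\"o\":\"%s\"}\n", esc(s), esc(p), esc(o)
    }
  '
}
```

For example, gzip -dc freebase-rdf.gz | filter_domain music | nt_to_ldjson > music.json streams the compressed dump without unpacking it, and mongoimport --db freebase --collection music --file music.json would then load one document per line (file, database and collection names are placeholders).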


SergeC commented on July 30, 2024

If you add to your tool nttoldj the ability to read the compressed FB data dump (described in #57 (comment)) and to process only certain domains, the data load process should become faster. I assume it would also be possible to run multiple instances.
Going further, you could use a modern CPU with an integrated memory controller and 32GB of RAM, copy the Freebase dump (28GB) to a RAM disk, and operations would be even faster.

"since files can be easily preprocessed with command line tools"

Can you provide some example commands? Currently I need to do this: https://www.freebase.com/user/sergec/views/artists_by_record_label?mql= but get the whole list.
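One possible command-line take on an artists-by-record-label list is sketched below. It is a hypothetical helper that assumes the dump records an artist's record label with the property <http://rdf.freebase.com/ns/music.artist.label> and that subject and object are IRIs without spaces. It yields MID pairs only; turning them into names would need a second pass over type.object.name triples.

```shell
# Hypothetical helper: read N-Triples on stdin and print "label<TAB>artist"
# MID pairs. Sorting byte-wise (LC_ALL=C) puts all artists of one label on
# adjacent lines, independent of the current locale.
artists_by_label() {
  grep -F '<http://rdf.freebase.com/ns/music.artist.label>' |
    awk '{ print $3 "\t" $1 }' |
    LC_ALL=C sort
}

# Example usage (file names are placeholders):
#   gzip -dc freebase-rdf.gz | artists_by_label > artists_by_label.tsv
```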


barakmich commented on July 30, 2024

@miku is spot-on. I'll also point out that by tweaking some of the database flags (see the configuration docs) you can load even more triples into leveldb in even less time :)


barakmich commented on July 30, 2024

Closing for lack of scope, and because it has a good answer.
