Comments (10)
The default backend is mem (in-memory), so unless you are on a machine with something like a terabyte of RAM (a rough estimate), the process will be killed at some point when it tries to allocate memory. You could try another backend, e.g. leveldb via -db=leveldb. I just loaded 106M triples (about 1/17th of Freebase) into a leveldb backend in about 13h.
from cayley.
Can you please provide more detailed instructions on how to run it? Step-by-step instructions would be great.
Do I need to install leveldb?
How do I load the Freebase dump into leveldb?
from cayley.
What OS are you targeting?
from cayley.
My dev machine runs OS X 10.9 with 16 GB of RAM, but it has very limited HDD space. I'd like to use OS X.
My server runs 64-bit Ubuntu 14.04 with 2 GB of RAM and hosts the uncompressed database.
I can connect to the server via NFS and use its HDD space.
from cayley.
Setting up leveldb was quite straightforward[1]. Replace /tmp/testdb with the path where the DB should be created.
$ cayley init -db="leveldb" -dbpath="/tmp/testdb"
$ ls /tmp/testdb
000001.log CURRENT LOCK LOG MANIFEST-000000
Then load the triples into the database. Note that this process will take hours or days if you want to import the complete 330G from Freebase (extrapolating from my single data point of 106M triples in 13h, the full Freebase import would take over 200 hours[2]).
$ cayley load -db="leveldb" -dbpath="/tmp/testdb" -triples="freebase.nt"
[1] You do not need to install leveldb beforehand, since the library used is pure Go, too.
[2] I am looking at methods to speed up the import. I think there are two ways: a) make the input smaller, e.g. by applying namespace or vocabulary shortcuts (I wrote a prototype for that); b) use a distributed backend.
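The "over 200 hours" figure is a simple linear extrapolation, and it can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes load time grows linearly with the number of triples, which is probably optimistic for an on-disk store, and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope estimate from a single data point.
# Assumption: load time scales linearly with the number of triples.
SAMPLE_TRIPLES = 106_000_000   # triples loaded in the test run
SAMPLE_HOURS = 13.0            # wall-clock time of the test run
FRACTION_OF_DUMP = 1 / 17      # the sample is about 1/17th of the dump


def estimate_full_load_hours(sample_hours: float, fraction: float) -> float:
    """Scale the measured time up to the whole dump."""
    return sample_hours / fraction


total_triples = SAMPLE_TRIPLES / FRACTION_OF_DUMP
total_hours = estimate_full_load_hours(SAMPLE_HOURS, FRACTION_OF_DUMP)

print(f"estimated triples: {total_triples / 1e9:.1f} billion")
print(f"estimated load time: {total_hours:.0f} h ({total_hours / 24:.1f} days)")
```

With these inputs the estimate is 13 h x 17, i.e. a bit more than nine days of continuous loading.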
from cayley.
How do I query the database after the data load ($ cayley load -db="leveldb" -dbpath="/tmp/testdb" -triples="freebase.nt")?
What do you think about using grep for simple queries? It's also possible to use zegrep on the compressed dump file.
Maybe it's possible to cut out only some domains? For example, I need only the /music domain and don't care about the rest of the data. So cut /music, load, use...
What do you think about asking the developers to add the possibility to load only certain domains ($ cayley load -db="leveldb" -dbpath="/tmp/testdb" -triples="freebase.nt" -domain="/music")?
Can your prototype export only the /music domain from freebase.nt to a JSON file? Then I could load it into mongodb, and that would be fine for me.
from cayley.
a) I think the cayley docs are quite nice,
b) grep can be a viable option for ad hoc searches, depending on your use case,
c) I would argue against something like -domain, since files can easily be preprocessed with command line tools,
d) No, there is no domain filter in my prototype; I'd just grep out the /music n-triples and then convert those to JSON.
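A rough sketch of step d) in Python, in case it helps: keep only the lines that mention the music domain and emit one JSON object per triple. It assumes the dump writes predicates as URIs such as <http://rdf.freebase.com/ns/music.artist.genre>, so the filter string is "/ns/music." rather than "/music". The function names, JSON keys, and sample lines are hypothetical, and the parsing is naive whitespace splitting, not a full N-Triples parser.

```python
import json


def triple_to_record(line):
    """Split one N-Triples line into a dict, or return None if it does not parse."""
    line = line.strip()
    if not line or line.startswith("#") or not line.endswith("."):
        return None
    body = line[:-1].rstrip()      # drop the terminating "."
    parts = body.split(None, 2)    # the object may contain spaces (literals)
    if len(parts) != 3:
        return None
    subject, predicate, obj = parts
    return {"subject": subject, "predicate": predicate, "object": obj}


def domain_triples_as_json(lines, needle="/ns/music."):
    """Yield one JSON string per triple whose line contains `needle`."""
    for line in lines:
        if needle not in line:
            continue
        record = triple_to_record(line)
        if record is not None:
            yield json.dumps(record)


# Made-up sample lines in the assumed dump layout (tab-separated terms).
sample = [
    "<http://rdf.freebase.com/ns/m.01>\t<http://rdf.freebase.com/ns/music.artist.genre>\t<http://rdf.freebase.com/ns/m.02>\t.",
    "<http://rdf.freebase.com/ns/m.01>\t<http://rdf.freebase.com/ns/type.object.name>\t\"Some Artist\"@en\t.",
]

for doc in domain_triples_as_json(sample):
    print(doc)
```

The output is one JSON document per line, which is a shape that tools such as mongoimport can typically read. Note that a plain substring filter also keeps lines where the needle appears in the subject or object, so it is a coarse cut.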
from cayley.
If you add to your tool nttoldj the possibility to read the compressed FB data dump (described in #57 (comment)) and to process only certain domains, the data load should become faster. And I assume it will then be possible to run multiple instances.
What's more, you can use a modern CPU with an onboard memory controller and 32GB of RAM, upload the Freebase dump (28GB) to a RAM disk, and operations will be much faster.
"since files can be easily preprocessed with command line tools"
Can you provide me some examples of commands? Currently I need to do this https://www.freebase.com/user/sergec/views/artists_by_record_label?mql= but get the whole list.
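One way the preprocessing could look, sketched in Python rather than with grep or zegrep: stream the compressed dump line by line, keep only the lines that contain a given domain marker, and write them to a much smaller N-Triples file that can then be loaded as usual. The file names and the "/ns/music." marker are assumptions for illustration (the marker assumes predicates look like <http://rdf.freebase.com/ns/music.artist.genre>), and the helper names are made up.

```python
import gzip


def open_dump(path):
    """Open a plain or gzip-compressed dump as a text stream."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def filter_domain(src_path, dst_path, needle):
    """Copy every line containing `needle` to dst_path; return the number kept."""
    kept = 0
    with open_dump(src_path) as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:           # streams; never holds the whole dump in memory
            if needle in line:
                dst.write(line)
                kept += 1
    return kept


# Hypothetical usage (file names are placeholders):
# filter_domain("freebase-dump.nt.gz", "music.nt", "/ns/music.")
```

The resulting file can then be passed to the same load command as before, e.g. $ cayley load -db="leveldb" -dbpath="/tmp/testdb" -triples="music.nt". Since each run only reads the dump and writes its own output file, several instances with different markers could run side by side.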
from cayley.
@miku is spot-on. I'll also point out that by tweaking some of the database flags (see the configuration docs) you can load even more triples into leveldb in even less time :)
from cayley.
Closing, for lack of scope, and a good answer.
from cayley.