thinkaurelius / faunus Goto Github PK


260.0 54.0 58.0 28.79 MB

Graph Analytics Engine

Home Page: http://faunus.thinkaurelius.com

License: Apache License 2.0

Shell 1.48% Groovy 2.02% Java 96.50%

faunus's Issues

Why is RawComparator not evaluating?

Make the data flow <FaunusVertex,NullWritable>

Remove as many "new" allocations as possible in the Titan adapter.

FaunusVertexRelationLoader loader = new FaunusVertexRelationLoader(IDHandler.getKeyID(row.left.duplicate()));

public FaunusVertexRelationLoader(final long id) {
    Preconditions.checkArgument(id > 0);
    vertex = new FaunusVertex(id);
}

Use GraphSONFactory for JSON Serialization

In the JSONRecordReader/Writer use GraphSONFactory for graph Element serialization/deserialization:

https://github.com/tinkerpop/blueprints/blob/master/blueprints-core/src/main/java/com/tinkerpop/blueprints/util/io/graphson/GraphSONFactory.java#L41

Make graph-example-1 formatted with new graph-of-the-gods JSON.

Get True Vertex Count For Rexster

If the v.estimate is set to -1 then do a g.V.count() to get the vertex count for a proper split.
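
The fallback described above could be sketched roughly as follows. This is only an illustration: the class name, the method name, and the `LongSupplier` standing in for a remote `g.V.count()` call are all hypothetical and are not part of Faunus or Rexster.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: picks the vertex count used to compute input splits.
final class VertexCountSketch {
    // Sentinel meaning "no estimate configured", as described in the issue.
    static final long UNKNOWN = -1L;

    // Returns the configured estimate, or the exact count when the estimate is -1.
    // exactCount stands in for an expensive g.V.count() call and is only
    // invoked when it is actually needed.
    static long resolveVertexCount(long estimate, LongSupplier exactCount) {
        return estimate == UNKNOWN ? exactCount.getAsLong() : estimate;
    }
}
```

Passing the exact count as a supplier keeps the expensive full scan lazy, so jobs that do provide an estimate never pay for it.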

Add general InputReader properties to faunus.properties...

...so that data can be loaded with filters.

Do not assume List<Edge>; use Iterable<Edge> (use Gremlin-Java)

Make Faunus Properties be a Configuration (not java.util.Properties)

Demonstrate default Hadoop properties in Whirr recipe

https://svn.apache.org/repos/asf/whirr/tags/release-0.7.0/services/hadoop/src/main/resources/whirr-hadoop-default.properties

Document Hadoop Configuration Cascade In Wiki

Make respective .bats for .shs...

gremlin.bat is basically identical to the other gremlin.bat files, except that we now have the -i parameter.

Move inputformat config out of FaunusCompiler to static methods in InputFormats.

FaunusGraph -> FaunusPipeline throughout the Wiki docs.

Add hadoop configurations to faunus.properties...

...then submit all those to global JobConf.

Provide support for MapReduceMapSequence

As Map+Reduce?Map* is the general solution for in-memory mapping.

Support Gremlin/Groovy based statistics

g.V.groupCount(Vertex,"{it.age}")

Add Pearson and Spearman correlation for assortative mixing.

http://vangjee.wordpress.com/2012/02/29/computing-pearson-correlation-using-hadoops-mapreduce-mr-paradigm/
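
One MapReduce-friendly way to compute Pearson correlation is to keep six running sums that can be added together across mappers or combiners, and only evaluate the formula at the end. The sketch below assumes that approach; the class and method names are hypothetical, and for assortative mixing each (x, y) pair would presumably be the degrees of the two vertices at the ends of an edge. Spearman correlation is not covered here, since it additionally requires ranking the values.

```java
// Hypothetical accumulator: partial results from different tasks can be merged.
final class PearsonAccumulator {
    private long n;
    private double sumX, sumY, sumXX, sumYY, sumXY;

    // Adds one observation (e.g. the degrees at both ends of an edge).
    void add(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumYY += y * y;
        sumXY += x * y;
    }

    // Folds another partial result into this one (the "reduce" side).
    void merge(PearsonAccumulator other) {
        n += other.n;
        sumX += other.sumX;
        sumY += other.sumY;
        sumXX += other.sumXX;
        sumYY += other.sumYY;
        sumXY += other.sumXY;
    }

    // r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2));
    // returns NaN when either variable has zero variance.
    double correlation() {
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }
}
```

Because the sums are associative, the same class could serve as mapper output, combiner, and reducer state. Note that this textbook formula can lose precision when values are large and variance is small.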

For a vertex, edges are indexed by type (why repeat label in edge)

Make the Writable and in-memory representation more efficient.

Make astyanax connector work well with Titan. (dependency tree issue)

        titanconfig.setProperty("storage.backend","cassandra");   // todo: astyanax

Faunus - Titan:BerkeleyJE Adapter

We might also be able to accommodate that through Rexster.

Get away with only sending one vertex long ID over the wire for every edge.

char direction.

Create a TitanHBaseInputFormat and RecordReader

Should holder be WritableComparable?

Print the faunus.properties when the job is run.

Better error handling around corrupt JSON data...

try/catch logger in GraphSONRecordReader.
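
A minimal sketch of the try/catch-and-log idea, combined with a bad-record tally. It is written against plain Java rather than the real GraphSONRecordReader: the class name, the `Function<String, T>` parser, and the counter field are hypothetical stand-ins. In an actual Hadoop job the tally would more likely be reported through a job counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: parses line-oriented records and skips corrupt ones.
final class TolerantRecordParser<T> {
    private static final Logger LOGGER =
            Logger.getLogger(TolerantRecordParser.class.getName());

    private final Function<String, T> parser; // stands in for JSON -> vertex parsing
    private long badRecords = 0;

    TolerantRecordParser(Function<String, T> parser) {
        this.parser = parser;
    }

    // Parses every line; a line that fails to parse is logged and counted,
    // not allowed to fail the whole task.
    List<T> parseAll(Iterable<String> lines) {
        List<T> parsed = new ArrayList<>();
        for (String line : lines) {
            try {
                parsed.add(parser.apply(line));
            } catch (RuntimeException e) {
                badRecords++;
                LOGGER.log(Level.WARNING, "Skipping corrupt record: " + line, e);
            }
        }
        return parsed;
    }

    long getBadRecords() {
        return badRecords;
    }
}
```

For example, `new TolerantRecordParser<Long>(Long::parseLong)` applied to the lines "1", "x", "3" keeps two values and reports one bad record.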

Make use of variable length longs for vertex ids.

...also, make use of Titan edge compression techniques.
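
To illustrate what variable-length longs buy, here is a sketch of a simple 7-bits-per-byte encoding for non-negative ids. This is not the Faunus or Titan wire format, and the class and method names are hypothetical. Hadoop also ships its own variable-length encoding in `WritableUtils.writeVLong` / `readVLong`, which would likely be the more natural choice inside a Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec: small ids take 1 byte instead of a fixed 8.
final class VarLongSketch {
    // Writes 7 payload bits per byte, low bits first; the high bit of each
    // byte signals that more bytes follow.
    static void writeVarLong(DataOutput out, long value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        while ((value & ~0x7FL) != 0) {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }

    static long readVarLong(DataInput in) throws IOException {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = in.readByte();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift >= 63) { // a non-negative long never needs more than 9 bytes
                throw new IOException("malformed variable-length long");
            }
        }
    }

    // Convenience wrappers for in-memory round trips.
    static byte[] encode(long value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeVarLong(new DataOutputStream(bytes), value);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long decode(byte[] encoded) {
        try {
            return readVarLong(new DataInputStream(new ByteArrayInputStream(encoded)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this scheme ids below 128 occupy one byte and ids below 16384 occupy two, while the largest positive long needs nine, so the saving depends on ids being mostly small.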

Log faunus.properties at [INFO] when a Faunus job loads.

Stream vertex edges off of disk -- (HDFS solution? -- ImmutableBytes solution?)

Solve the supernode problem for a vertex (we can't always assume its edges fit in memory).

Check how slow invoke() is relative to direct method pointer.

Still faster than serializing to disk?

Improve the memory efficiency of GraphSONRecordReader

As it stands, there are three representations of the same information in memory at every record parse:

the String of the record (up to \n)
the JSONObject representation of the String
the FaunusVertex representation of the JSONObject

If we could go from String -> FaunusVertex, that would be good. However, the best would be InputStreamReader -> FaunusVertex.

Look into ArrayWritable.

http://chasebradford.wordpress.com/category/hadoop/

Support out().out().linkOut/In() as 1 map reduce job. Need better state analysis.

Make a global BAD RECORD counter for all RecordReaders.

back() semantics in Pipes and Faunus slightly different...

...I like Faunus semantics as it's a "hop" not a "drop." Think about which model to resolve to.

Provide a more optimized byte[] representation of Faunus elements.

Provide max record size controls in faunus.properties

public void initialize(final InputSplit genericSplit, final TaskAttemptContext context) throws IOException {
    final FileSplit split = (FileSplit) genericSplit;
    final Configuration job = context.getConfiguration();
    //this.maxLineLength = job.getInt("mapred.linerecordreader.maxlength", Integer.MAX_VALUE);

traverse(OUT,"author",IN,"author","coauthor",DROP,"weight")
