GithubHelp home page GithubHelp logo

vmware / differential-datalog Goto Github PK

View Code? Open in Web Editor NEW
1.3K 30.0 117.0 301.87 MB

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.

License: MIT License

Haskell 24.65% Makefile 0.06% Rust 27.00% Shell 1.43% Python 2.53% C 2.88% Java 37.86% Yacc 0.91% HTML 0.15% Dockerfile 0.05% Go 1.42% Nix 0.07% JavaScript 0.01% CSS 0.03% TypeScript 0.88% Vim Script 0.06%
programming-language datalog ddlog rust incremental

differential-datalog's Introduction

License: MIT CI workflow pipeline status rustc Gitter chat

Differential Datalog (DDlog)

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. With DDlog, the programmer does not need to worry about writing incremental algorithms. Instead they specify the desired input-output mapping in a declarative manner, using a dialect of Datalog. The DDlog compiler then synthesizes an efficient incremental implementation. DDlog is based on Frank McSherry's excellent differential dataflow library.

DDlog has the following key properties:

  1. Relational: A DDlog program transforms a set of input relations (or tables) into a set of output relations. It is thus well suited for applications that operate on relational data, ranging from real-time analytics to cloud management systems and static program analysis tools.

  2. Dataflow-oriented: At runtime, a DDlog program accepts a stream of updates to input relations. Each update inserts, deletes, or modifies a subset of input records. DDlog responds to an input update by outputting an update to its output relations.

  3. Incremental: DDlog processes input updates by performing the minimum amount of work necessary to compute changes to output relations. This has significant performance benefits for many queries.

  4. Bottom-up: DDlog starts from a set of input facts and computes all possible derived facts by following user-defined rules, in a bottom-up fashion. In contrast, top-down engines are optimized to answer individual user queries without computing all possible facts ahead of time. For example, given a Datalog program that computes pairs of connected vertices in a graph, a bottom-up engine maintains the set of all such pairs. A top-down engine, on the other hand, is triggered by a user query to determine whether a pair of vertices is connected and handles the query by searching for a derivation chain back to ground facts. The bottom-up approach is preferable in applications where all derived facts must be computed ahead of time and in applications where the cost of initial computation is amortized across a large number of queries.

  5. In-memory: DDlog stores and processes data in memory. In a typical use case, a DDlog program is used in conjunction with a persistent database, with database records being fed to DDlog as ground facts and the derived facts computed by DDlog being written back to the database.

    At the moment, DDlog can only operate on databases that completely fit the memory of a single machine. We are working on a distributed version of DDlog that will be able to partition its state and computation across multiple machines.

  6. Typed: In its classical textbook form Datalog is more of a mathematical formalism than a practical tool for programmers. In particular, pure Datalog does not have concepts like types, arithmetics, strings or functions. To facilitate writing of safe, clear, and concise code, DDlog extends pure Datalog with:

    1. A powerful type system, including Booleans, unlimited precision integers, bitvectors, floating point numbers, strings, tuples, tagged unions, vectors, sets, and maps. All of these types can be stored in DDlog relations and manipulated by DDlog rules. Thus, with DDlog one can perform relational operations, such as joins, directly over structured data, without having to flatten it first (as is often done in SQL databases).

    2. Standard integer, bitvector, and floating point arithmetic.

    3. A simple procedural language that allows expressing many computations natively in DDlog without resorting to external functions.

    4. String operations, including string concatenation and interpolation.

    5. Syntactic sugar for writing imperative-style code using for/let/assignments.

  7. Integrated: while DDlog programs can be run interactively via a command line interface, its primary use case is to integrate with other applications that require deductive database functionality. A DDlog program is compiled into a Rust library that can be linked against a Rust, C/C++, Java, or Go program (bindings for other languages can be easily added). This enables good performance, but somewhat limits the flexibility, as changes to the relational schema or rules require re-compilation.

Documentation

Installation

Installing DDlog from a binary release

To install a precompiled version of DDlog, download the latest binary release, extract it from archive, add ddlog/bin to your $PATH, and set $DDLOG_HOME to point to the ddlog directory. You will also need to install the Rust toolchain (see instructions below).

If you're using OS X, you will need to override the binary's security settings through these instructions. Else, when first running the DDlog compiler (through calling ddlog), you will get the following warning dialog:

"ddlog" cannot be opened because the developer cannot be verified.
macOS cannot verify that this app is free from malware.

You are now ready to start coding in DDlog.

Compiling DDlog from sources

Installing dependencies manually

  • Haskell stack:
    wget -qO- https://get.haskellstack.org/ | sh
    
  • Rust toolchain v1.52.1 or later:
    curl https://sh.rustup.rs -sSf | sh
    . $HOME/.cargo/env
    rustup component add rustfmt
    rustup component add clippy
    
    Note: The rustup script adds path to Rust toolchain binaries (typically, $HOME/.cargo/bin) to ~/.profile, so that it becomes effective at the next login attempt. To configure your current shell run source $HOME/.cargo/env.
  • JDK, e.g.:
    apt install default-jdk
    
  • Google FlatBuffers library. Download and build FlatBuffers release 1.11.0 from github. Make sure that the flatc tool is in your $PATH. Additionally, make sure that FlatBuffers Java classes are in your $CLASSPATH:
    ./tools/install-flatbuf.sh
    cd flatbuffers
    export CLASSPATH=`pwd`"/java":$CLASSPATH
    export PATH=`pwd`:$PATH
    cd ..
    
  • Static versions of the following libraries: libpthread.a, libc.a, libm.a, librt.a, libutil.a, libdl.a, libgmp.a, and libstdc++.a can be installed from distro-specific packages. On Ubuntu:
    apt install libc6-dev libgmp-dev
    
    On Fedora:
    dnf install glibc-static gmp-static libstdc++-static
    

Building

To build the software once you've installed the dependencies using one of the above methods, clone this repository and set $DDLOG_HOME variable to point to the root of the repository. Run

stack build

anywhere inside the repository to build the DDlog compiler. To install DDlog binaries in Haskell stack's default binary directory:

stack install

To install to a different location:

stack install --local-bin-path <custom_path>

To test basic DDlog functionality:

stack test --ta '-p path'

Note: this takes a few minutes

You are now ready to start coding in DDlog.

vim syntax highlighting

The easiest way to enable differential datalog syntax highlighting for .dl files in Vim is by creating a symlink from <ddlog-folder>/tools/vim/syntax/dl.vim into ~/.vim/syntax/.

If you are using a plugin manager you may be able to directly consume the file from the upstream repository as well. In the case of Vundle, for example, configuration could look as follows:

call vundle#begin('~/.config/nvim/bundle')
...
Plugin 'vmware/differential-datalog', {'rtp': 'tools/vim'} <---- relevant line
...
call vundle#end()

Debugging with GHCi

To run the test suite with the GHCi debugger:

stack ghci --ghci-options -isrc --ghci-options -itest differential-datalog:differential-datalog-test

and type do main in the command prompt.

Building with profiling info enabled

stack clean

followed by

stack build --profile

or

stack test --profile

differential-datalog's People

Contributors

amytai avatar antoninbas avatar argc0 avatar arkivm avatar blp avatar booxter avatar c-nixon avatar d-e-s-o avatar d4hines avatar danbst avatar debnil avatar dependabot[bot] avatar desharchana19 avatar falzberger avatar gfour avatar gz avatar justinpettit avatar kixiron avatar krs85 avatar lalithsuresh avatar lykahb avatar mooly-sagiv avatar paralax avatar price1999a avatar remysucre avatar ryzhyk avatar smadaminov avatar srenatus avatar tobindekorne avatar yjiayu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

differential-datalog's Issues

foreign keys for more compact FTL

Ben's version of FTL allows traversing cross-table pointers, e.g.:

let O = lspip.lsp.dhcpv6_options.option_args

To support this without using nested for loops, we'd need to add some notion of foreign keys and possibly other SQL constraints to the language

DDlog lints

  • Detect unused variables in rules and functions
  • Detect unused intermediate relations (i.e., relations that are neither output nor used to compute other relations).
  • Grouping by complex variables or variables wrapped in Ref<> is inefficient. Ideally, one should group by one or several small identifiers.
  • ???

Optimizations

  • string interning
  • use arrangements for relations that are used as the first atom in multiple rules (a special case of common subexpression elimination)
  • use Box or Arc type for Value to save memory
  • avoid redundant cloning in the generated code
  • use per-key distinct when possible
  • extract "by-self" arrangement from distinct
  • 32-bit weights
  • use unreachable() instead of panic!() in Compile.hs (see TODO in Compile.hs)

https://github.com/frankmcsherry/differential-dataflow/issues/113

Test and document parsec's string parser

We rely on parsec's standard parser for strings, which supports unicode and escaping.

TODO:

  • check and document its exact functionality
  • can we handle all OVSDB strings?

Syntax improvements

  • Rename int -> bigint
  • rename ground relation -> input relation
  • add return statement

fix goldenVsFiles

goldenVsFiles behaves funny when more than one files has changed: it first complains about the first file only. Deleting the corresponding .expect file causes it to report that a new golden file has been generated (without an error). Finally, the third run reports a mismatch in the second golden file.

Test calling DDlog from Java

It should be possible to load a compiled DDlog program and execute transactions from Java.

This requires a couple of preparatory steps:

  • DDlog currently generates static libraries. We have to change the crate type to build .so instead.
  • DDlog still does not have a complete C API.

Background:

  • The generated library file is written to the target/release directory, e.g., tests/datalog_tests/path/target/release/libpath.a.
  • Associated header file is tests/datalog_tests/path/ddlog.h

Cannot have antijoins with wildcards

Also, the error message is not very good.

error: file.dl:211:234-211:235: Argument _method must be specified in this context
 
RBasic_MethodLookup(_simplename, _descriptor, _type, _method) :- RDirectSuperclass(_type, _supertype), RBasic_MethodLookup(_simplename, _descriptor, _supertype, _method), not RBasic_MethodImplemented(_simplename, _descriptor, _type, _).

Type inference using unification

Implement a proper type inference algorithm based on unification. The current ad hoc implementation is hacky and fragile. No idea why I did not do the right thing straight away...

TODOs

Basic Datalog (Leonid)

  • Invoke user callbacks on output relations
  • Generate code for ground facts
  • Framework for (incremental) input/output testing of Datalog programs
  • More examples of interesting Datalog programs
    • span computation.
    • souffle examples.
  • Tutorial on writing and testing Datalog programs (so far, wrote a mini-tutorial on adding new tests, but need a more detailed one introducing language basics and workflow)
  • FFI
  • Performance/memory tests
    • Tools for datalog program memory profiling
  • Update main ddlog executable

FTL (Mihai)

  • Generate Datalog schema from JSON
  • Generate input and output adapters
  • Adapters for external C functions invoked from FTL
  • Test workload

Future improvements

  • Debugging facility
  • CLI interface to ddlog
  • RPC interface to ddlog

Incomplete implementation of assignments in Compile.hs

We currently don't have a way to compile arbitrary l-values to Rust, e.g.,

Constructor{var x, var y} = z

is ok, but

Constructor{x, y} = z

where x and y are previously introduced variables is illegal.

Either implement the logic to support this translation (e.g., by assigning to intermediate variables) or modify validation logic to disallow such programs

Automatic string conversion for type variables

At the moment, the compiler refuses to convert type variables to strings.

This can be fixed by:

  1. Infering that a function expects a type variable to be printable
  2. In the generated Rust function, add the Display trait for this function
  3. Whenever the function is called, check that its concrete arguments satisfy the trait
  4. Implement Display trait for types that have a user-defined 2String method

OVN integration TODOs

TODO list for DDlog/OVN integration:

  • generate DDlog schema from OVSDB schema.
    • This can be done mostly automatically from OVSDB schema files; however we will need some way to specify which OVSDB relations are inputs and which our outputs (can they be both?). Should we add those as annotations to the OVSDB schema or just specify them as command line arguments to the schema conversion tool?
  • port rules from northd.c to northd.dl
  • create data sets for standalone testing of northd.dl (i.e., outside of OVN)
  • generate runtime OVSDB input/output adapters for DDlog
  • create OVN infrastructure to test northd.dl against northd.c on a live system
  • integration testing
  • scale testing (and CPU/memory profiling)
  • suppress Rust warnings
  • catch up with latest OVN master
  • documentation
  • fix remaining test failures
  • test C and DDlog at the same time if DDlog is enabled in the build
  • clean up DDlog patches

mpsc interface to ValMap

ValMap is a data structure that stores the content of output tables. It is updated by each change callback invoked by the differential inspect() operator. ValMap is protected by a mutex, potentially introducing contention in workloads where outputs are frequently updated. One possible solution is to maintain ValMap in a separate thread and use mpsc queue to communicate with this thread from workers. We must be careful though to flush the queue before transaction_commit() returns. Another (even better?) option is to keep a ValMap per worker and only merge them when the client requests a copy of an output table.

Implement bit slice logic in Compile.hs

This is a little tricky, as we use different representations depending on the size of a bit vector (BitUint or uXX) depending on its width. Bit slicing may involve conversion between these representations

Get rid of `let` syntax

We use let in FTL, and var in the rest of the language. There is not reason for these two forms of variable declaration.

No symbol table?

Currently one can have a type that depends on an undefined identifier

typedef t = x

differential-dataflow optimizations

  • use distinct_total instead of distinct where possible.
  • don't convert external collections to Variables in the nested scope (variables use a lot of memory)

support for multi-file specs

The most important use case is combining auto-generated schema (e.g., from OVSDB) with hand-written code. We may be able to get away with using CPP or m4, but must make sure that line numbers in error message refer to original input files.

Allow access to enum fields

The Rust backend currently does not allow access to fields of types with multiple constructors (i.e., types compiled to enums), even if all constructors of a type have the same field

add support for constants

At the moment, constants can be simulated by 0-arity functions. The main downside is the need to use parenthesis every time a constant is referenced. Also, function names start with lower-case letters, whereas it is customary to capitalize constant names.

The Rust code generated for split does not compile

The unmerged tutorial has an example which does not seem to generate correct Rust code.
This is the dl code:

extern function split(s: string, sep: string): set<string>
function split_ip_list(x: string): set<string> =
   split(x, " ")

The generated code does not compile:

fn split_ip_list(x: &String) -> set<String> {
    split(x, (&r###" "###.to_string()))
}

since there is no set type in Rust.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.