GithubHelp home page GithubHelp logo

wyfunique / dbsim Goto Github PK

View Code? Open in Web Editor NEW
15.0 2.0 2.0 175 KB

The codebase for DBSim

Python 93.97% HTML 2.14% JavaScript 3.31% CSS 0.59%
data-science database in-database in-database-analytics query-optimizer sql-parser sql-query

dbsim's Introduction

DBSim

DBSim is an extensible database simulator in pure Python for fast prototyping in-database algorithms. It provides a simulated RDBMS environment, related tools and high extensibility & flexibility for data scientists to fast develop, verify and analyze the prototypes of in-database analytic algorithms.

Implementing any new in-database algorithm directly in real database kernels without pre-verification may not be a good idea and may cause waste of time. If you agree, try DBSim! Spending a little time on prototyping your in-db algorithms in DBSim to convince yourself before formally making them into real-world DBMS.

Features

  • Including all the major components of a general RDBMS query engine: SQL parser, relational operators, logical and physical plan generator, query optimizer, etc.
  • High extensibility and flexibility: DBSim provides APIs to extend almost any of the DB components, including but not limited to adding new keywords to query syntax, implementing custom operators, writing custom optimization rules, extending physical plan executors, etc.
  • Debugging and analyzing tools: query plan visualizer, GUI, etc.
  • Low learning cost for users: writing in Python makes it much easier to learn comparing with the real-world DBMS developed in C/C++/Java.

TODO

  • Add more optimization rules that are commonly applied by general RDBMS
  • Implement Volcano-style optimizer (currently DBSim only supports heuristic (rule-based) optimizer)
  • Develop rule definition language to support simpler claims of custom optimization rules
  • Integrate more performance analyzing and visualization tools

Install

DBSim requires Python 3.6+ . To install necessary dependencies, run the following command:

pip install -r requirements.txt

Run the demo

python demo.py

Examples

We use some basic examples to briefly present DBsim here. Please see dbsim/examples/ and dbsim/tests/ for more detailed examples.

  1. Execute a query end-to-end

    (1) without query optimizer

    from dbsim import dataset as ds
    from dbsim.tests.fixtures.demo_adapter import DemoAdapter
    from dbsim.query_parser import parse_statement
    from dbsim.query import Query
    
    dataset = ds.DataSet()
    dataset.add_adapter(DemoAdapter())
    sql = """
        SELECT musical.title, musical.year
        FROM 
          (SELECT * 
            FROM 
              (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) 
            WHERE 
              animation.mid < 3000
          )
        WHERE musical.year > 1960
    """
    for row in Query(dataset, parse_statement(sql)):
      print(row)

    (2) with query optimizer

    from dbsim import dataset as ds
    from dbsim.tests.fixtures.demo_adapter import DemoAdapter
    from dbsim.query_parser import parse_statement
    from dbsim.query import Query
    from dbsim.planners import rules
    from dbsim.planners.heuristic.heuristic_planner import HeuristicPlanner
    
    dataset = ds.DataSet()
    dataset.add_adapter(DemoAdapter())
    planner = HeuristicPlanner(max_limit = float('Inf'))
    planner.addRule(rules.FilterMergeRule())
    planner.addRule(rules.FilterPushDownRule())
    planner.addRule(rules.Selection_SimSelection_Swap_Rule())
    
    sql = """
        SELECT musical.title, musical.year
        FROM 
          (SELECT * 
            FROM 
              (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) 
            WHERE 
              animation.mid < 3000
          )
        WHERE musical.year > 1960
    """
    for row in Query(dataset, parse_statement(sql), optimizer=planner):
      print(row)
  2. Parse a query into abstract syntax tree(AST) and visualize the AST

    from dbsim.query_parser import parse_statement
    from dbsim.utils.visualizer import LogicalPlanViz 
    
    sql = """
        SELECT musical.title, musical.year
        FROM musical
        WHERE musical.year > 1960
    """
    ast = parse_statement(sql)
    LogicalPlanViz.show(ast, view=True)
  3. Manually optimize a logical plan and visualize the resulting plan

    from dbsim import dataset as ds
    from dbsim.tests.fixtures.demo_adapter import DemoAdapter
    from dbsim.query_parser import parse_statement
    from dbsim.query import Query
    from dbsim.planners import rules
    from dbsim.planners.heuristic.heuristic_planner import HeuristicPlanner
    from dbsim.utils.visualizer import LogicalPlanViz 
    
    dataset = ds.DataSet()
    dataset.add_adapter(DemoAdapter())
    
    planner = HeuristicPlanner(max_limit = float('Inf'))
    planner.addRule(rules.FilterMergeRule())
    planner.addRule(rules.FilterPushDownRule())
    planner.addRule(rules.Selection_SimSelection_Swap_Rule())
    
    sql = """
        SELECT musical.title, musical.year
        FROM 
          (SELECT * 
            FROM 
              (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) 
            WHERE 
              animation.mid < 3000
          )
        WHERE musical.year > 1960
    """
    plan = Query(dataset, parse_statement(sql)).getPlan()
    best_plan = planner.findBestPlan(plan)
    LogicalPlanViz.show(best_plan, view=True)
  4. **Note: ** We already extended the standard SQL syntax in DBSim as examples to show its extensibility. Please see the documentation for further tutorials. But at this step you can quickly try such an extended-SQL-syntax query in any of the code snippets above:

    sql = """
        SELECT musical.title, musical.year
        FROM 
          (SELECT * 
            FROM 
              (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) 
            WHERE 
              animation.embedding to [1,2,3,4] < 10
          )
        WHERE musical.year > 1960
    """
  5. More examples:

    To try more examples, run the following command (DO NOT cd to dbsim/ or dbsim/examples, just run the command here, i.e., in the root path of this repo):

    python dbsim/examples/<example_filename>.py

    To run the tests in dbsim/tests/, you can use Pytest or manually call each testing function.

Run the GUI

In the repo root path, run following command to start the GUI:

python dbsim/gui/backend/server.py

Then open your browser and go to the link localhost:8888. Please read our paper Extensible Database Simulator for Fast Prototyping In-Database Algorithms for more details on how to use the GUI. The GUI is only tested on Microsoft Edge, and currently we do not guarantee its compatibility with other browsers.

Note:

(1) To run a query, you need to let the query input box lose focus, like clicking anywhere outside the box.

(2) We provide example datasets and query in the GUI when it is started. You can try with them or use your own datasets and queries.

Documentation

See the Wiki of this repo.

Acknowledgement

DBSim is developed based on Splicer (trivio/splicer: Splicer - adds relation querying (SQL) to any python project (github.com)). We also borrowed some ideas from Apache Calcite.

Citation

If you use this codebase, or otherwise found our work valuable, please cite:

@misc{https://doi.org/10.48550/arxiv.2204.09819,
  doi = {10.48550/ARXIV.2204.09819},
  url = {https://arxiv.org/abs/2204.09819},
  author = {Wang, Yifan and Wang, Daisy Zhe},
  title = {Extensible Database Simulator for Fast Prototyping In-Database Algorithms},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

dbsim's People

Contributors

wyfunique avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.