
This project forked from 4paradigm/openmldb


OpenMLDB is an open-source database particularly designed to efficiently provide consistent data for machine learning driven applications.

License: Apache License 2.0


openmldb's Introduction


1. Introduction

OpenMLDB is an open-source database designed to efficiently provide consistent data for machine learning. A database for machine learning handles two major tasks, feature extraction and feature access, which together provision data for offline training and online inference. Without OpenMLDB, online and offline data provisioning run on two separate systems, and verifying online-offline consistency between them takes significant effort. In contrast, OpenMLDB supports unified SQL programming with a single execution engine for both online and offline data provisioning, so online-offline consistency is inherently guaranteed. The system is also designed and optimized for efficiency. With OpenMLDB, database engineers can write SQL scripts alone to efficiently provide consistent data for machine learning, and a model trained offline can be deployed for online serving immediately at little cost.

[Figure: OpenMLDB workflow]

The figure above illustrates the OpenMLDB workflow. SQL engineers first write SQL scripts for offline feature extraction, which provides data for offline model training. Once the model quality is satisfactory, online feature extraction and access can be enabled immediately for online serving with no additional effort. Because the unified SQL programming and execution engine inherently guarantees online-offline consistency, separate consistency verification is no longer needed. Furthermore, optimization techniques such as data skew optimization for offline feature extraction and in-memory indexing for online feature extraction are adopted to meet the performance requirements of both offline training and online inference. In summary, OpenMLDB makes SQL the only programming interface needed for consistent and efficient data provisioning for both offline model training and online inference serving.

2. Highlight Features

2.1. SQL Programming APIs

We believe SQL is the most suitable programming API for feature engineering because of its elegant design and popularity. OpenMLDB offers SQL as the programming API for both offline and online feature extraction. In addition, we extend standard SQL to make it more powerful for feature extraction.
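To make the idea of SQL-based feature extraction concrete, here is a minimal sketch of a windowed feature written in standard SQL. It runs on Python's built-in `sqlite3` module purely as a stand-in engine (it assumes a bundled SQLite with window function support, version 3.25 or later); it does not use OpenMLDB or its SQL extensions. The `transactions` table, its columns, and the feature name `amount_sum_last3` are hypothetical.

```python
import sqlite3

# Hypothetical transaction rows: (user_id, ts, amount). Illustrative only.
rows = [
    ("u1", 1, 10.0),
    ("u1", 2, 20.0),
    ("u1", 3, 30.0),
    ("u2", 1, 5.0),
    ("u2", 2, 7.0),
]

# Standard SQL window function: for each row, sum the amount over the
# current row and the two preceding rows of the same user, ordered by ts.
FEATURE_SQL = """
SELECT user_id, ts, amount,
       SUM(amount) OVER (
           PARTITION BY user_id
           ORDER BY ts
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS amount_sum_last3
FROM transactions
ORDER BY user_id, ts
"""


def extract_features(data):
    """Load rows into an in-memory SQLite table and run the feature query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions (user_id TEXT, ts INTEGER, amount REAL)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", data)
        return conn.execute(FEATURE_SQL).fetchall()
    finally:
        conn.close()


features = extract_features(rows)
for row in features:
    print(row)
```

Each output row carries the raw columns plus the derived feature, which is the general shape of a feature extraction script expressed entirely in SQL.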

2.2. Online-Offline Consistency

Based on the SQL programming APIs, we design a unified execution engine for both online and offline feature extraction. As a result, online-offline consistency is inherently guaranteed by OpenMLDB at no additional cost.
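The consistency argument can be sketched in a few lines of plain Python. This is a conceptual illustration only, not OpenMLDB's engine or API: the names `window_sum`, `offline_features`, and `OnlineStore` are hypothetical. The point is that when the batch path and the request path share one feature definition, their outputs agree by construction.

```python
from collections import defaultdict


def window_sum(history, window=3):
    """The single feature definition: sum of the last `window` amounts."""
    return sum(history[-window:])


def offline_features(events, window=3):
    """Batch path: replay all historical events in time order."""
    history = defaultdict(list)
    out = []
    for user_id, ts, amount in sorted(events, key=lambda e: (e[1], e[0])):
        history[user_id].append(amount)
        out.append((user_id, ts, window_sum(history[user_id], window)))
    return out


class OnlineStore:
    """Request path: keep per-user amounts in memory and answer per request."""

    def __init__(self, window=3):
        self.window = window
        self.history = defaultdict(list)

    def request(self, user_id, ts, amount):
        self.history[user_id].append(amount)
        return (user_id, ts, window_sum(self.history[user_id], self.window))


# Hypothetical events: (user_id, ts, amount).
events = [
    ("u1", 1, 10.0),
    ("u2", 1, 5.0),
    ("u1", 2, 20.0),
    ("u1", 3, 30.0),
    ("u1", 4, 40.0),
]

batch = offline_features(events)
store = OnlineStore()
online = [store.request(*e) for e in sorted(events, key=lambda e: (e[1], e[0]))]

# Both paths call the same window_sum, so the features match.
print(batch == online)
```

If the two paths had separate implementations of the feature, the equality would have to be verified by testing instead of following from the shared definition.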

2.3. Efficiency

We propose several techniques to improve the performance of both offline and online feature extraction. As a result, our offline feature extraction can be significantly faster than existing open-source big data processing frameworks. Moreover, our online service provides low latency (tens of milliseconds) to meet the performance requirements of online inference.

See Section 7 (Publications & Blogs) below for more technical details.
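The introduction names in-memory indexing as one technique for online feature extraction. The sketch below shows the general idea under stated assumptions: it is a generic per-key, time-sorted index built with Python's `bisect` module, not OpenMLDB's actual data structure. The class `TimeIndexedStore` and its methods are hypothetical.

```python
import bisect
from collections import defaultdict


class TimeIndexedStore:
    """Hypothetical in-memory index: key -> values sorted by timestamp."""

    def __init__(self):
        self._ts = defaultdict(list)      # key -> sorted timestamps
        self._values = defaultdict(list)  # key -> values aligned with _ts

    def put(self, key, ts, value):
        """Insert a value, keeping each key's entries sorted by timestamp."""
        pos = bisect.bisect_right(self._ts[key], ts)
        self._ts[key].insert(pos, ts)
        self._values[key].insert(pos, value)

    def window(self, key, start_ts, end_ts):
        """Return values with start_ts <= ts <= end_ts using binary search."""
        ts_list = self._ts.get(key, [])
        lo = bisect.bisect_left(ts_list, start_ts)
        hi = bisect.bisect_right(ts_list, end_ts)
        return self._values.get(key, [])[lo:hi]


store = TimeIndexedStore()
store.put("u1", 100, 10.0)
store.put("u1", 300, 30.0)
store.put("u1", 200, 20.0)  # out-of-order arrival is placed correctly
store.put("u2", 150, 5.0)

recent = store.window("u1", 150, 300)
print(recent, sum(recent))
```

Locating the window boundaries is logarithmic in the number of entries for a key, which is why a time-ordered in-memory index suits low-latency window aggregation at request time.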

2.4. Integrated CLI

We provide a powerful integrated CLI for SQL programming, job management, online and offline deployment, and database administration. Developers familiar with database CLIs should be comfortable with our tool.

Note that the CLI in the current release, 0.3.0, only partially supports cluster mode. Full support is planned for the next release, 0.4.0.

3. Build & Install

👉 Read more

4. Demo & QuickStart

Since OpenMLDB v0.3.0, we have introduced two operating modes: cluster mode and standalone mode. Cluster mode is suitable for large-scale datasets and real-world applications, and provides scalability and high availability. The lightweight standalone mode, on the other hand, runs on a single node and is ideal for small businesses and demonstrations.

We demonstrate the workflow using the cluster and standalone modes:

5. Roadmap

We list a few highlight features planned for future releases. Please join our community to learn more about our plans and discuss your ideas.

| Version | Est. release date | Highlight features |
|---------|-------------------|--------------------|
| 0.4.0 | End of 2021 | Full support of standalone and cluster modes in the integrated CLI |
| 0.5.0 | 2022 Q1 | Monitoring APIs and tools for online serving; efficient queries over a fairly long period of time by window functions; Kafka/Pulsar connector support for online data source |

6. Community

You may join our community for feedback and discussion:

  • Email: [email protected]

  • Slack Workspace: You can find release notes, user support, development discussion, and more in our various Slack channels.

  • GitHub Issues and Discussions: If you are a serious developer, you are most welcome to join our discussions on GitHub. GitHub Issues are used to report bugs and collect new requirements. GitHub Discussions are mostly used by our project maintainers to publish and comment on RFCs.

  • Blogs (Chinese)

  • WeChat Groups (Chinese)

7. Publications & Blogs
