tensorbase / tensorbase
TensorBase is a new big data warehouse built with modern efforts.
Home Page: https://tensorbase.io/
License: Apache License 2.0
Generally, support for any fixed-length type is easy to add. Just pick one of the currently implemented types as an example :)
Decimal32: Decimal(9,2)
Decimal64: Decimal(18,2)
More are here: https://clickhouse.tech/docs/en/sql-reference/data-types/decimal/
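Here is a minimal sketch of how a Decimal(P, S) column could be mapped onto one of the fixed-width storage types. The precision ranges (P in 1..=9, 10..=18, 19..=38, 39..=76) come from the ClickHouse decimal docs linked above. DecimalStorage and decimal_storage are hypothetical names, not TensorBase code.

```rust
/// Hypothetical: the fixed-width integer type backing a `Decimal(P, S)` column.
#[derive(Debug, PartialEq)]
enum DecimalStorage {
    Decimal32,
    Decimal64,
    Decimal128,
    Decimal256,
}

impl DecimalStorage {
    /// Every value of a given Decimal(P, S) occupies the same number of bytes,
    /// which is what makes decimals a fixed-length type.
    fn size_in_bytes(&self) -> usize {
        match self {
            DecimalStorage::Decimal32 => 4,
            DecimalStorage::Decimal64 => 8,
            DecimalStorage::Decimal128 => 16,
            DecimalStorage::Decimal256 => 32,
        }
    }
}

/// Hypothetical helper: pick the storage width from the precision,
/// following the ClickHouse ranges. Returns None for invalid P or S.
fn decimal_storage(precision: u8, scale: u8) -> Option<DecimalStorage> {
    if scale > precision {
        return None; // S must lie in [0, P]
    }
    match precision {
        1..=9 => Some(DecimalStorage::Decimal32),
        10..=18 => Some(DecimalStorage::Decimal64),
        19..=38 => Some(DecimalStorage::Decimal128),
        39..=76 => Some(DecimalStorage::Decimal256),
        _ => None,
    }
}
```

For example, decimal_storage(18, 2) yields Decimal64 (8 bytes per value), whereas Decimal(9,2) fits in Decimal32.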
...
The easiest way to support builtin functions seems to be by using ScalarUDF or AggregateUDF.
To support this, the following changes would have to be made:
lightjit::builtins
We have to convert the functions into the datafusion::ScalarUDF type.
Also, add a get_udf() function that matches a string to the function.
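The following is a minimal, self-contained sketch of the proposed get_udf() lookup. To keep it runnable without DataFusion, the Builtin struct is a hypothetical stand-in for datafusion::ScalarUDF, which additionally carries a signature and return type. The names abs and sqrt are illustrative entries only, not TensorBase's actual builtin list.

```rust
use std::fmt;

/// Hypothetical stand-in for `datafusion::ScalarUDF`: a name plus a scalar implementation.
#[derive(Clone)]
struct Builtin {
    name: &'static str,
    func: fn(f64) -> f64,
}

/// Error returned when a function name does not match any builtin.
#[derive(Debug, PartialEq)]
struct UnknownBuiltin(String);

impl fmt::Display for UnknownBuiltin {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown builtin function: {}", self.0)
    }
}

impl std::error::Error for UnknownBuiltin {}

/// Hypothetical `get_udf`: match a function name to its implementation.
fn get_udf(name: &str) -> Result<Builtin, UnknownBuiltin> {
    match name {
        "abs" => Ok(Builtin { name: "abs", func: f64::abs }),
        "sqrt" => Ok(Builtin { name: "sqrt", func: f64::sqrt }),
        _ => Err(UnknownBuiltin(name.to_string())),
    }
}
```

Returning a Result lets the caller write get_udf(&name)? and surface an unknown name as a query error instead of panicking.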
Should we still have these located here or maybe move them into a different crate?
lang::parse
Add a new function parse_builtins(p). This would be similar to the current parse_tables(p) function, but it looks for any builtins and returns a HashSet of the ones found.
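Below is a sketch of the traversal that parse_builtins(p) would perform. TensorBase's real parse tree differs. The Expr enum and the BUILTINS list are hypothetical simplifications, used only to show how every known builtin name, including names in nested calls, gets collected into a HashSet.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified expression tree (not TensorBase's parser output).
enum Expr {
    Column(String),
    Literal(f64),
    Call { name: String, args: Vec<Expr> },
}

/// Names the engine knows how to register as UDFs (assumed list).
const BUILTINS: &[&str] = &["abs", "sqrt"];

/// Collect the builtin names used anywhere in the given expressions.
fn parse_builtins(exprs: &[Expr]) -> HashSet<String> {
    let mut found = HashSet::new();
    for e in exprs {
        collect(e, &mut found);
    }
    found
}

fn collect(e: &Expr, found: &mut HashSet<String>) {
    if let Expr::Call { name, args } = e {
        // Only known builtins are recorded; other functions are left to DataFusion.
        if BUILTINS.contains(&name.as_str()) {
            found.insert(name.clone());
        }
        // Visit arguments so nested calls like abs(sqrt(x)) are found too.
        for a in args {
            collect(a, found);
        }
    }
}
```

Using a HashSet means a builtin used several times in one query is registered only once.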
engine::run()
Add let builtins = parse::parse_builtins(p)?; and pass this value to datafusion::run the same way as tabs and cols.
let (tabs, cols) = parse::parse_tables(p)?;
let builtins = parse::parse_builtins(p)?; // <--- New
log::debug!("projections - tabs: {:?}, cols: {:?}", tabs, cols);
datafusion::run(ms, ps, current_db, raw_query, query_id, tabs, cols, builtins, qs) // <- also pass builtins
engine::datafusion::run()
Before running, we can check the SQL for any builtins that are used. For each one that is used, all we need to do is call ctx.register_udf(udf).
Pseudocode:
let mut ctx = ExecutionContext::new();
...
// Register only the builtins that the query actually references.
// (An empty set simply skips the loop.)
for f in builtins.drain() {
    let udf = get_udf(&f)?;
    ctx.register_udf(udf);
}
let df = ctx.sql(raw_query)?;
...
partition pruning based on the condition of partition key
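Here is a minimal sketch of min/max-based pruning. It assumes each partition records the minimum and maximum value of its partition key, and that the WHERE condition on that key has already been reduced to an inclusive range. Partition, KeyRange and prune are hypothetical names.

```rust
/// Hypothetical per-partition metadata: min/max of the partition key.
#[derive(Debug, Clone, PartialEq)]
struct Partition {
    id: u32,
    key_min: i64,
    key_max: i64,
}

/// Condition on the partition key as an inclusive range;
/// e.g. `key >= 12 AND key <= 20` becomes lo = Some(12), hi = Some(20).
struct KeyRange {
    lo: Option<i64>,
    hi: Option<i64>,
}

/// Keep only partitions whose [key_min, key_max] overlaps the range.
/// The others cannot contain matching rows, so they need not be scanned.
fn prune<'a>(parts: &'a [Partition], cond: &KeyRange) -> Vec<&'a Partition> {
    parts
        .iter()
        .filter(|p| cond.lo.map_or(true, |lo| p.key_max >= lo))
        .filter(|p| cond.hi.map_or(true, |hi| p.key_min <= hi))
        .collect()
}
```

With partitions covering keys 0..=9, 10..=19 and 20..=29, the condition 12 <= key <= 20 keeps only the last two.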
Arrow-DataFusion already supports parts of TPC-H, but TensorBase cannot yet store all of the data types involved. Enabling these benchmarks would make TensorBase more feature-mature.
Following Arrow-DataFusion, we would need to support these types: DataType::Float64, DataType::Utf8, and DataType::Date32. However, that approach is neither economical nor performant. As a first step, I suggest enabling Decimal, String, and DateTime.
@jinmingjian, Jin, I attempted to connect to the TB server using the ClickHouse JDBC driver (pulled in by DBeaver) on port 9528, but the connection attempt times out on the read. I tried configuring the connection properties both with no Database/Schema specified and with default. I also tried with the No authentication option set. Below are screen captures illustrating the connection properties and the connection error. Note that I also confirmed that the firewall is turned off.
When you click the Slack Channel link in the README or on the official website, it redirects you to TensorBase's official Slack workspace, https://tensorbase.slack.com/, but without an invitation. So I think newcomers cannot log in.
I can log in to other Slack workspaces like Kubernetes', so I guess the cause is that TensorBase's Slack link is not an invitation link. Compare it with Kubernetes' workspace, which shows a "GET MY INVITE" button for people who are not yet members.
Hello,
I am very interested in your project and am attempting to begin testing it out. However, the documentation for the tools that exist in m0 (baseops, baseshell) does not seem to be accurate. In addition, using the ClickHouse client to create a very simple table with DDL fails. I am not sure what to use for ENGINE. It appears to be required, but using MergeTree fails. I tried both with and without ORDER BY. Any assistance you can provide would be greatly appreciated.
-Chris Whelan
TensorBase :) create table sales (title string) ENGINE = MergeTree ORDER BY title;
CREATE TABLE sales
(
`title` string
)
ENGINE = MergeTree
ORDER BY title
Query id: 22fd667c-851d-4087-9fb7-5a58128003de
0 rows in set. Elapsed: 0.001 sec.
Received exception from server (version 2021.3.0):
Code: 3. DB::Exception: Received from localhost:9528. WrappingLangError(ASTError). Error when AST processing.
If we use DataFusion Ballista, this feature may be easy to achieve. Not sure who wants to try it first :)
part of #18
There is no easy way to try TensorBase right now.
The blog post "Hello, Base" has some docs about benchmarking the nyc_taxi dataset against ClickHouse. However, the dataset is quite large, so it is hard for users to explore TensorBase quickly.
Maybe we can implement some table functions like numbers or numbers_mt in ClickHouse.
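This is a minimal sketch of the data such table functions would produce. In ClickHouse, numbers(N) returns a single UInt64 column named number containing 0 to N-1. numbers_mt(N) returns the same values, generated by multiple threads, so row order is not guaranteed. The functions below only generate the values. Wiring them into TensorBase as table functions (for example as Arrow record batches) is not shown, and the threads parameter is an assumption for illustration.

```rust
use std::thread;

/// Sketch of `numbers(N)`: the values 0..N-1 of a single UInt64 column.
fn numbers(n: u64) -> Vec<u64> {
    (0..n).collect()
}

/// Sketch of `numbers_mt(N)`: the same values, split into contiguous blocks,
/// one generated per thread. Requires Rust 1.63+ for `thread::scope`.
fn numbers_mt(n: u64, threads: u64) -> Vec<Vec<u64>> {
    let threads = threads.max(1);
    let chunk = (n + threads - 1) / threads; // ceiling division
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                let start = (t * chunk).min(n);
                let end = ((t + 1) * chunk).min(n);
                s.spawn(move || (start..end).collect::<Vec<u64>>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

For example, numbers_mt(10, 3) produces the blocks [0..4), [4..8) and [8..10), whose union equals numbers(10).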
Should be partially OK...
The "Getting Started" section should work on Windows 10 WSL2, but I currently have no machine to verify it...
It may be adapted to other existing clients to some extent when necessary:
ClickHouse, RDBC, ODBC, JDBC, MySQL, PostgreSQL
A quick fix is to use a third-party library as the slow path.
part of #18
via data-gen
part of #23
Building TensorBase may spend 5 to 6 minutes on linking.
Building [=======================> ] 332/333: server(bin)
What is the status of this project? I only see the initial m0 commit and hardly any code changes afterwards. Has this project been discontinued?
Hi, the documentation indicates that the String data type is supported, but attempting to insert a string into an existing table fails with NoFixedSizeDataTypeError.
SHOW CREATE TABLE sales
Query id: e96c5bbe-52ad-4fcc-9df0-1afdab76700e
┌─statement─────────────────────────────────────────────────┐
│ create table sales ( Region String ) ENGINE = BaseStorage │
└───────────────────────────────────────────────────────────┘
1 rows in set. Elapsed: 0.000 sec.
TensorBase :) insert into sales (Region) values ('North')
INSERT INTO sales (Region) VALUES
Query id: 98e40494-bbbc-461d-bb7d-7e7798987b4d
1 rows in set. Elapsed: 0.001 sec.
Received exception from server (version 2021.3.0):
Code: 4. DB::Exception: Received from localhost:9528. WrappingMetaError(NoFixedSizeDataTypeError). No fixed size for dynamic sized data type.
The same problem occurs with Decimal(x,y) data types.
Another fixed-length type.
TB already survives application crashes. It would be nice to have a WAL to protect against kernel crashes or sudden machine shutdowns.
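Here is a minimal sketch of the WAL idea. It assumes a simple record format of length, checksum, and payload, which is not TensorBase's design. FNV-1a stands in for the CRC that a production log would normally use. The writer is generic over std::io::Write so it can be exercised on a Vec<u8>. On a real std::fs::File, each append would be followed by File::sync_data so the record reaches disk before the write is acknowledged.

```rust
use std::io::{self, Write};

/// FNV-1a (32-bit), used here as a simple stand-in checksum.
fn checksum(data: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for &b in data {
        h ^= u32::from(b);
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Append one record as [len: u32 LE][checksum: u32 LE][payload].
/// On a real File, call `sync_data()` after this to survive a kernel crash.
fn append_record<W: Write>(log: &mut W, payload: &[u8]) -> io::Result<()> {
    log.write_all(&(payload.len() as u32).to_le_bytes())?;
    log.write_all(&checksum(payload).to_le_bytes())?;
    log.write_all(payload)?;
    log.flush()
}

/// Replay records in order, stopping at the first torn or corrupt record,
/// which is what a write interrupted by a sudden shutdown leaves behind.
fn replay(log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 8 <= log.len() {
        let len = read_u32(&log[pos..]) as usize;
        let sum = read_u32(&log[pos + 4..]);
        let start = pos + 8;
        if start + len > log.len() {
            break; // truncated payload
        }
        let payload = &log[start..start + len];
        if checksum(payload) != sum {
            break; // corrupt record
        }
        records.push(payload.to_vec());
        pos = start + len;
    }
    records
}
```

On restart, the recovered records would be re-applied to storage, and the log could then be truncated at the last valid record.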
As the title says.
error: failed to get `baselog` as a dependency of package `meta v0.1.0 (/Users/kaichen/Documents/projects/tensorbase/crates/meta)`
Caused by:
failed to load source for dependency `baselog`
Caused by:
Unable to update https://github.com/tensorbase/baselog.git
Caused by:
failed to find branch `master`
Caused by:
cannot locate remote-tracking branch 'origin/master'; class=Reference (4); code=NotFound (-3)
Would you consider joining a foundation to ensure open-source continuity? Since this project is licensed under Apache License 2.0, the ASF may be a good fit if you can be accepted. Otherwise, there are other foundations you could approach, such as the Linux Foundation or the Cloud Native Computing Foundation.
error[E0433]: failed to resolve: could not find `addr_of` in `ptr`
--> /Users/kaichen/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.40/src/error.rs:606:14
|
606 | ptr::addr_of!((*unerased.as_ptr())._object) as *mut E,
| ^^^^^^^ could not find `addr_of` in `ptr`
error[E0433]: failed to resolve: could not find `addr_of` in `ptr`
--> /Users/kaichen/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.40/src/error.rs:647:22
|
647 | ptr::addr_of!((*unerased.as_ptr())._object) as *mut E,
| ^^^^^^^ could not find `addr_of` in `ptr`
error: aborting due to 2 previous errors
For more information about this error, try `rustc --explain E0433`.
error: could not compile `anyhow`
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed
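The missing macro is std::ptr::addr_of!, which was stabilized in Rust 1.51. Because anyhow 1.0.40 uses it, this error usually means the active toolchain is older than 1.51, and updating the Rust toolchain should resolve it. The small example below shows what the macro is for: it creates a raw pointer to a place without first creating a reference, which matters for unaligned fields such as those in packed structs. Packed and read_value are illustrative names.

```rust
use std::ptr;

/// A packed struct: `value` may sit at an unaligned address.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

/// Read the possibly-unaligned field without creating a `&u32`
/// (references to packed fields are not allowed).
/// `ptr::addr_of!` requires Rust 1.51 or newer.
fn read_value(p: &Packed) -> u32 {
    let raw: *const u32 = ptr::addr_of!(p.value);
    // SAFETY: `raw` points into a live `Packed`, and `read_unaligned`
    // does not require the pointer to be aligned.
    unsafe { raw.read_unaligned() }
}
```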
Utilizing the latest version of the Clickhouse-Native-JDBC driver available from Maven (2.5.4), I attempted to connect to TB, but an unsupported client version exception is thrown.
@jinmingjian Jin, please let me know if you would also like me to try testing programmatically via Java code directly.
Basically we already have this; we just need some meta work plus some tests :)
The logic in Arrow/DataFusion ignores the timezone, but ClickHouse re-interprets the presentation according to the server's timezone.
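The sketch below illustrates the difference. The stored value is a UTC epoch timestamp, and only its textual presentation changes with the offset. It assumes a fixed UTC offset in seconds. Real timezones with daylight saving time would need a timezone database (for example the chrono-tz crate), which this sketch does not cover. The date conversion is Howard Hinnant's civil_from_days algorithm, and render is an illustrative name.

```rust
/// Days since 1970-01-01 -> (year, month, day) in the proleptic Gregorian calendar
/// (Howard Hinnant's `civil_from_days`).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index starting at March, [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe + era * 400 + (if m <= 2 { 1 } else { 0 });
    (y, m, d)
}

/// Present a stored UTC epoch timestamp as wall-clock text for a fixed UTC offset.
fn render(epoch_secs: i64, offset_secs: i64) -> String {
    let local = epoch_secs + offset_secs;
    let days = local.div_euclid(86_400);
    let secs = local.rem_euclid(86_400);
    let (y, m, d) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
        y, m, d, secs / 3600, secs % 3600 / 60, secs % 60
    )
}
```

The same stored value 1600000000 renders as 2020-09-13 12:26:40 at UTC but as 2020-09-13 20:26:40 at UTC+08:00. That gap is what appears when one side ignores the server's timezone.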
Currently, TensorBase only supports single node mode. A single node may not have enough space for all the data and we need to store them in a distributed manner. By introducing components like Ballista, we can enable TB to support distributed storage and query.
Currently, a ClickHouse compatible SQL query will be parsed and passed to TB/engine, TB/engine will then invoke DataFusion to execute the query. To support distributed storage and query, we can add a distributed engine (e.g., Ballista) between TB/engine and DataFusion.
For example, when TB is configured to use Ballista for distributed storage and query, TB/engine can act as a Ballista client and send ExecuteQuery to the Ballista scheduler. The scheduler will then distribute the work to the executor(s). For more details about Ballista's architecture, please refer to this doc.
In the future, TB may support different distributed engines other than Ballista. We should be able to integrate them in a similar manner.
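One way to keep the door open for engines other than Ballista is to hide the execution backend behind a trait, chosen from configuration. Below is a minimal sketch. QueryBackend, LocalDataFusion, DistributedClient and backend_from_config are hypothetical names, and both execute methods are stubs rather than real DataFusion or Ballista calls.

```rust
/// Hypothetical abstraction over where a query runs.
trait QueryBackend {
    fn name(&self) -> &str;
    /// Execute SQL and return rows (simplified to strings in this sketch).
    fn execute(&self, sql: &str) -> Result<Vec<String>, String>;
}

/// Current single-node path: run in-process via DataFusion (stubbed).
struct LocalDataFusion;

/// Distributed path: act as a client of a scheduler, which fans work out to executors.
struct DistributedClient {
    scheduler_addr: String,
}

impl QueryBackend for LocalDataFusion {
    fn name(&self) -> &str {
        "local-datafusion"
    }
    fn execute(&self, sql: &str) -> Result<Vec<String>, String> {
        // Stub: the real path would hand `sql` to DataFusion.
        Ok(vec![format!("local: {}", sql)])
    }
}

impl QueryBackend for DistributedClient {
    fn name(&self) -> &str {
        "distributed"
    }
    fn execute(&self, sql: &str) -> Result<Vec<String>, String> {
        // Stub: the real path would submit the query (e.g. ExecuteQuery for Ballista)
        // to the scheduler at `scheduler_addr` and gather the executors' results.
        Ok(vec![format!("submitted to {}: {}", self.scheduler_addr, sql)])
    }
}

/// Pick a backend from configuration: a scheduler address means distributed mode.
fn backend_from_config(scheduler_addr: Option<&str>) -> Box<dyn QueryBackend> {
    match scheduler_addr {
        Some(addr) => Box::new(DistributedClient { scheduler_addr: addr.to_string() }),
        None => Box::new(LocalDataFusion),
    }
}
```

With this shape, adding another distributed engine would mean adding one more QueryBackend implementation, and TB/engine itself would stay unchanged.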