GithubHelp home page GithubHelp logo

Comments (7)

pwentrys avatar pwentrys commented on June 22, 2024 1

Just wanted to say I strongly agree with you on your first point, @asuhan. I would argue that even if you come up with a need case, it wouldn't be the best (going by fastest) solution. If you're doing this with that much data, the person with the use-case might need to adjust other parts in their system's flow. I view any select * from foo statement to be poor practice (lack of future-proofing) and "unreasonable" because you just opened the door for many errors that will be done by others working with the system.

from heavydb.

asuhan avatar asuhan commented on June 22, 2024

@neko940709 Thanks for reporting, which query? It usually means failure to optimize away an intermediate projection. We don't want to let it run anyway because it'd take a very long time / use a lot of memory.

The larger point: we're aware that we need to work towards TPC-H compatibility. Over the next few months, we plan to get most / all of the queries working. Several pieces of work related to this goal are already in progress.

from heavydb.

neko940709 avatar neko940709 commented on June 22, 2024

The query is like this:
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where l_orderkey = o_orderkey and c_custkey = o_custkey and c_mktsegment = 'BUILDING' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate;

The query executes on V2.17.2 TPC-H benchmark which I used to generate 10GB dataset. This query is also a provided sample and I simplify it. Actually, I have met this exception several times.
The first time is doing the following query:
select * from newdata;
The NEWDATA table consists of 3 columns, all the columns are INTEGER type. The size of Newdata table is about 2.5GB. If you said it's a failure of the intermediate projection, I DONT understand why this simple query will also throw exception?
This is the log message:
I0524 23:10:21.887410 132126 MapDHandler.cpp:2295] select * from newdata; I0524 23:10:21.888273 132126 Calcite.cpp:196] User mapd catalog mapd sql 'select * from newdata; I0524 23:10:21.897106 132126 Calcite.cpp:221] Time marshalling in JNI 0 (ms), Time in Java Calcite 9 (ms) E0524 23:10:22.045779 132126 MapDHandler.cpp:2331] Exception: Query would require a scan without a limit on table(s): newdata

from heavydb.

neko940709 avatar neko940709 commented on June 22, 2024

More, there is other exception I've got during the TPC-H test.Some of the sql will make the mapd_server crash. Here is an example:
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 30 and p_type like '%SMALL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' order by s_acctbal desc, n_name, s_name, p_partkey;
This query throws exception like this:
Check failed: static_cast<size_t>(curr_idx) < target_count_ (24 vs. 24)

from heavydb.

neko940709 avatar neko940709 commented on June 22, 2024

Apart from the TPC-H data, I ran the queries you give on your official website. The query was executed on the FLIGHT data in 2.6GB size. The query is the following:
select count(*) from flights_2008_7M where origin_name='Lambert-St Louis International' and dest_name = 'Lincoln Municipal;
And the exception is:
The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
How should I modify the sql ?

from heavydb.

asuhan avatar asuhan commented on June 22, 2024

I'll answer the questions one by one.

  1. select * from newdata tries to select everything from a big table. It's a projection without a "reasonable" limit (I think we ask it to be at most 10M). If you want to see a sample of the data, you can specify a limit and it'll work. Allowing this type of query without a limit will need us to implement pagination. I believe we'll end up doing it (for the sake of least surprise principle), but we haven't found an use-case which wouldn't be served by specifying a reasonable limit.
  2. It's not an exception, it's a crash. We'll look into it very soon.
  3. You shouldn't modify the SQL, we'll fix it. Your query misses the end quote for the second literal string, should be'Lincoln Municipal', not 'Lincoln Municipal. We'll check / fix it on our website, thanks for the catch.

from heavydb.

randyzwitch avatar randyzwitch commented on June 22, 2024

Closing this issue for now, but please feel free to start another discussion at https://community.mapd.com/ if this is still a problem.

from heavydb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.