Hash Join exception and multi-table JOIN?? about heavydb HOT 13 CLOSED

heavyai commented on July 20, 2024

Hash Join exception and multi-table JOIN??

Comments (13)

m1mc commented on July 20, 2024

Hi @neko940709, thanks for your interest in MapD. The exception means MapD failed to build a hash table for b.imsi. One possible reason is that b.imsi is not unique in value: we are still working on one-to-many hash joins, and the loop join we do support would be unacceptably slow on such a big inner table. Can you swap the inner and outer tables? We also have some limitations on the types of a.imsi and b.imsi for now, so could you tell me what they are?
We support multi-way joins even if the inner table comes from an inner query. Did you run into any failure?
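To make the failure mode concrete, here is a minimal Python sketch of a one-to-one hash join build. It is purely illustrative and is not MapD's implementation; the function names `build_one_to_one_hash` and `hash_join_count` are hypothetical. It only shows why a repeated key on the inner side makes a strict one-to-one table impossible to build.

```python
def build_one_to_one_hash(inner_keys):
    """Map each inner join key to its row index; fail on a repeated key."""
    table = {}
    for row_idx, key in enumerate(inner_keys):
        if key in table:
            # A second row with the same key breaks the 1-to-1 correspondence.
            raise ValueError(
                f"duplicate join key {key!r} at rows {table[key]} and {row_idx}"
            )
        table[key] = row_idx
    return table


def hash_join_count(outer_keys, inner_keys):
    """Count outer rows that find a match in the one-to-one table."""
    table = build_one_to_one_hash(inner_keys)
    return sum(1 for key in outer_keys if key in table)
```

With unique inner keys, `hash_join_count(["a", "b", "c"], ["b", "c", "d"])` probes the table once per outer row; with an inner list such as `["x", "y", "x"]` the build step raises instead.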

neko940709 commented on July 20, 2024

Thanks for your answer. I swapped the table order in the query, but I get the same exception. The imsi attribute is varchar(32) in both tables. Because the JOIN operation has some restrictions, I haven't run it successfully yet. Can you give me some examples of a multi-way join and a join on an inner query? I found some examples on your official website (the data is about flight info), but there are no JOINs in those queries.

m1mc commented on July 20, 2024

For text, we only support dictionary-encoded text as a join key. Can you reimport the data using this encoding?
You can find some info about join support here:
https://www.mapd.com/docs/latest/mapd-core-guide/dml/#table-expression-and-join-support
and some examples here:
https://www.mapd.com/docs/latest/mapd-core-guide/dml/#explain-calcite

asuhan commented on July 20, 2024

@neko940709 Is imsi unique in either of the two tables? What are the type and the encoding?

Work is under way and I'm confident we'll get these queries running pretty soon. Do you mind telling us the schemas (type and encoding) for all the tables involved, how many rows are in each table and the distinct number of values in the columns involved in the join? This will help us cover all bases. Thanks!

neko940709 commented on July 20, 2024

This is the DISTINCT result of these two tables:

mapdql> select distinct count(test_a.imsi) from test_a;
EXPR$0
19999998
mapdql> select count(test_a.imsi) from test_a;
EXPR$0
19999998
mapdql> select distinct count(test_zc1.imsi) from test_zc1;
EXPR$0
79999998
mapdql> select count(test_zc1.imsi) from test_zc1;
EXPR$0
79999998

So imsi is unique for sure. The CREATE TABLE statements are as follows:
test_a table:
mapdql> create table test_a( id varchar(32), start_time varchar(32), imsi text encoding dict, imei varchar(32), host varchar(64), upload bigint, download bigint, lac varchar(32), ci varchar(32), user_ip varchar(32) );

test_zc1 table:
create table test_zc1( month_number varchar(2), stat_month varchar(8), user_id varchar(32), customer_id varchar(32), brand_id varchar(32), big_brand_p varchar(32), product_class varchar(32), product_id varchar(32), area_id varchar(32) , ic_no varchar(32), age varchar(32), sex varchar(32), user_status varchar(32), innet_date varchar(32), apply_date varchar(32), suspend_date varchar(32), terminate_date varchar(32), is_full varchar(32), firstcall_date varchar(32), firstcall_flag varchar(32), public_flag varchar(32), user_site varchar(32), user_type varchar(32), feeset_type varchar(32), is_group_user varchar(32), is_vip varchar(32), vip_grade varchar(32), vip_type varchar(32), is_special varchar(32), is_camera varchar(32), is_high_camera varchar(32), vip_camera_type varchar(32), fee_consume varchar(32), town_id varchar(32), account_id varchar(32), contect_tel varchar(32), contect_addr varchar(256), contect_email varchar(32), country_type varchar(32), imsi text encoding dict, phone_mgr varchar(32), vip_intype varchar(32), regstatus varchar(32), is_985schoool_user varchar(32), is_211schoool_user varchar(32) );

I followed the suggestion and changed the type of imsi to text encoding dict, but I still get the exception:
mapdql> SELECT count(a.imsi),sum(a.download) from test_a a JOIN test_zc1 b ON a.imsi =b.imsi;
E0702 12:26:36.712345 103746 MapDHandler.cpp:2331] Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
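A note on the uniqueness check above: in standard SQL, `SELECT DISTINCT COUNT(col)` applies DISTINCT to the single row that COUNT returns, so it always matches plain `COUNT(col)` and says nothing about duplicates. Counting distinct values needs `COUNT(DISTINCT col)`. The sketch below shows the difference using Python's built-in SQLite as a stand-in engine; the table `t` and its three sample values are made up for illustration, and the behaviour of mapdql itself is assumed rather than verified here.

```python
import sqlite3

# In-memory stand-in table with one deliberately duplicated key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (imsi TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("460001",), ("460001",), ("460002",)],
)

# Plain row count of non-NULL imsi values.
plain = conn.execute("SELECT COUNT(imsi) FROM t").fetchone()[0]

# DISTINCT applied to the one-row COUNT result: same number as above.
distinct_of_count = conn.execute("SELECT DISTINCT COUNT(imsi) FROM t").fetchone()[0]

# Number of distinct imsi values: this is the real uniqueness check.
count_of_distinct = conn.execute("SELECT COUNT(DISTINCT imsi) FROM t").fetchone()[0]

print(plain, distinct_of_count, count_of_distinct)
```

If `COUNT(DISTINCT imsi)` comes out lower than `COUNT(imsi)` on either table, the column contains duplicates.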

neko940709 commented on July 20, 2024

Also, if I want to see the table structure through mapdql, what command can I use?
And I see that a new version of MapD has been released. How do I upgrade my current version to the newest one?

asuhan commented on July 20, 2024

Thanks! You've said that "The test_a table contains 200 millions of records, and the test_zc1 contains 800 millions", in which case the distinct counts are a lot lower than the total number of rows, which would explain the hash join failure. Can we get the COUNT(*) on the tables as well? 80M distinct entries is not a lot; we should be able to handle that even without the (soon to be available) sharding. I'll generate some data and look into what's going on next week.

Regarding the upgrade, let's ask @andrewseidl and @dwayneberry.

asuhan commented on July 20, 2024

To see a table structure, use \d <table name>.

neko940709 commented on July 20, 2024

Thanks for the info you provided! Here is the count(*) result:

mapdql> select count(*) from test_a;
EXPR$0
19999998
mapdql> select count(*) from test_zc1;
EXPR$0
79999998
mapdql> SELECT count(a.imsi),sum(a.download) from test_a a JOIN test_zc1 b ON a.imsi =b.imsi;
E0702 14:38:47.448840 103746 MapDHandler.cpp:2331] Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin

dwayneberry commented on July 20, 2024

Copied from the community forum for completeness:

I ran the following steps with the latest build of the mapd-core open source code.

Create the data:

awk 'BEGIN { for (i = 1; i <= 80000000; ++i) print i,",",i }' > test_a.csv
awk 'BEGIN { for (i = 1; i <= 20000000; ++i) print i,",",i }' > test_b.csv

I then ran the following queries in mapdql, which seem to duplicate your steps:

mapdql> create table test_a (imsi text, download bigint);
mapdql> create table test_b (b_imsi text, b_download bigint);
mapdql> copy test_a from '/data/test_a.csv' with (header='false');
Result
Loaded: 80000000 recs, Rejected: 0 recs in 111.367000 secs
mapdql> copy test_b from '/data/test_b.csv' with (header='false');
Result
Loaded: 20000000 recs, Rejected: 0 recs in 30.025000 secs
mapdql> select count(*) from test_a;
EXPR$0
80000000
mapdql> select distinct count(imsi) from test_a;
EXPR$0
80000000
mapdql> select count(*) from test_b;
EXPR$0
20000000
mapdql> select distinct count(b_imsi) from test_b;
EXPR$0
20000000
mapdql> SELECT count(a.imsi),sum(a.download) from test_a a JOIN test_b b ON a.imsi =b.b_imsi;
EXPR$0|EXPR$1
20000000|200000010000000
mapdql> \version
MapD Server Version: 3.1.1dev-20170702-2966fac6
mapdql>
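The join result above can be sanity-checked by hand. Both generated files use the row number as key and value, so the keys 1 to 20,000,000 of test_b each match exactly one row of test_a, and sum(a.download) is the sum of the first 20,000,000 integers. A small Python sketch of that arithmetic (the helper name `expected_join_result` is hypothetical):

```python
def expected_join_result(inner_rows):
    """Expected (count, sum) when keys 1..inner_rows each match exactly once."""
    count = inner_rows
    # Closed form for 1 + 2 + ... + n.
    total = inner_rows * (inner_rows + 1) // 2
    return count, total


print(expected_join_result(20_000_000))
```

This gives 20000000 and 200000010000000, the same values as the EXPR$0|EXPR$1 row in the session.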

Could you try these steps and see where your behaviour diverges? Based on the details above, this should roughly simulate your environment.

Please confirm exactly which version of MapD you are running.

Regards

neko940709 commented on July 20, 2024

Hi @dwayneberry, thanks for your answer.
I followed your steps and the final JOIN succeeded. I then rewrote my data generator and tried the JOIN on the new data, and it succeeded again. But when I try the old data, it still throws the exception. I guess there is something wrong with the old data, but I cannot figure out what. Anyway, thanks a lot for your help.
The MapD version I'm running is:

mapdql> \version
MapD Server Version: 3.0.1dev-20170523-5a5dcc2

How do I upgrade to the newest release?
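One way to test the guess that the old data is at fault is to look for repeated join keys in the source file before loading it. The sketch below is a minimal example under assumptions: the file is comma-delimited and the position of the imsi column (`key_column`) is known. The helper name `duplicate_keys` and the sample rows are hypothetical, and an in-memory counter over tens of millions of keys may need several gigabytes of RAM.

```python
import csv
import io
from collections import Counter


def duplicate_keys(lines, key_column, delimiter=","):
    """Return {key: occurrences} for keys that appear more than once."""
    reader = csv.reader(lines, delimiter=delimiter)
    counts = Counter(
        row[key_column].strip() for row in reader if len(row) > key_column
    )
    return {key: n for key, n in counts.items() if n > 1}


# Tiny made-up sample; for a real file pass open(path, newline="") instead.
sample = io.StringIO("1,460001,10\n2,460002,20\n3,460001,30\n")
print(duplicate_keys(sample, key_column=1))  # prints {'460001': 2}
```

An empty result means the key column is unique in that file; any entries returned are the values that would prevent a one-to-one hash table from being built.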

dwayneberry commented on July 20, 2024

@neko940709 Are you running the community edition? Or is this a build you made from the repo?

Basically to upgrade you just run the new binary pointing to your existing data directory.

If you are on the community edition, go back to the email with your download link and download again. That link always fetches the latest version.

If you built it yourself from the open source repo, fetch the latest code and rebuild.

randyzwitch commented on July 20, 2024

Closing this issue, but if this issue still persists with one of our current 4.x releases, please feel free to open a new issue or start a conversation at https://community.mapd.com/
