Comments (13)
Hi @neko940709, thanks for your interest in MapD. The exception means MapD failed to build a hash table for b.imsi. One possible reason is that the values of b.imsi are not unique: we are still working on one-to-many hash joins, and the loop join we do support would be unacceptably slow on such a big inner table. Can you swap the inner and outer tables? We also have some limitations on the types of a.imsi and b.imsi for now, so could you tell me what they are?
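To illustrate why duplicate key values matter here, the following is a minimal Python sketch of a one-to-one hash table build and probe. It is not MapD's implementation; the function names and sample values are made up for illustration. It only shows the general idea that a table with one slot per key cannot be built when the inner column repeats a value.

```python
def build_one_to_one_hash_table(inner_keys):
    """Map each join key to the single inner row index that holds it.

    Raises ValueError when a key repeats, because a one-to-one table
    has room for only one row per key.
    """
    table = {}
    for row_idx, key in enumerate(inner_keys):
        if key in table:
            raise ValueError(
                f"key {key!r} appears in rows {table[key]} and {row_idx}"
            )
        table[key] = row_idx
    return table


def probe(table, outer_keys):
    """Return (outer_row, inner_row) pairs for outer keys found in the table."""
    return [(i, table[k]) for i, k in enumerate(outer_keys) if k in table]


# Hypothetical sample values, not taken from the real tables.
unique_inner = ["460001", "460002", "460003"]
table = build_one_to_one_hash_table(unique_inner)
matches = probe(table, ["460002", "460009", "460001"])
print(matches)  # [(0, 1), (2, 0)]

# A repeated key makes the build step fail before any probing happens.
try:
    build_one_to_one_hash_table(["460001", "460002", "460001"])
except ValueError as err:
    print("build failed:", err)
```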
We support multi-way joins, even when the inner table comes from an inner query. Did you run into any failure?
from heavydb.
Thanks for your answer. I swapped the table order in the query, but I get the same exception. The imsi attribute is varchar(32) in both tables. Since the JOIN operation has some restrictions, I haven't managed to run one successfully yet. Can you give me some examples of a multi-way join and of a join on an inner query? I found some examples on your official website (the data is about flight info), but there's no JOIN in those queries.
For text, we only support dictionary-encoded text as a join key. Can you reimport the data using this encoding?
You can find some info about join support here:
https://www.mapd.com/docs/latest/mapd-core-guide/dml/#table-expression-and-join-support
and some examples here:
https://www.mapd.com/docs/latest/mapd-core-guide/dml/#explain-calcite
@neko940709 Is imsi unique in either of the two tables? What are the type and the encoding?
Work is under way and I'm confident we'll get these queries running pretty soon. Do you mind telling us the schemas (type and encoding) for all the tables involved, how many rows are in each table and the distinct number of values in the columns involved in the join? This will help us cover all bases. Thanks!
This is the DISTINCT result of these two tables:
mapdql> select distinct count(test_a.imsi) from test_a;
EXPR$0
19999998
mapdql> select count(test_a.imsi) from test_a;
EXPR$0
19999998
mapdql> select distinct count(test_zc1.imsi) from test_zc1;
EXPR$0
79999998
mapdql> select count(test_zc1.imsi) from test_zc1;
EXPR$0
79999998
So imsi is unique for sure. The CREATE TABLE statements are as follows:
test_a table:
mapdql> create table test_a( id varchar(32), start_time varchar(32), imsi text encoding dict, imei varchar(32), host varchar(64), upload bigint, download bigint, lac varchar(32), ci varchar(32), user_ip varchar(32) );
test_zc1 table:
create table test_zc1( month_number varchar(2), stat_month varchar(8), user_id varchar(32), customer_id varchar(32), brand_id varchar(32), big_brand_p varchar(32), product_class varchar(32), product_id varchar(32), area_id varchar(32) , ic_no varchar(32), age varchar(32), sex varchar(32), user_status varchar(32), innet_date varchar(32), apply_date varchar(32), suspend_date varchar(32), terminate_date varchar(32), is_full varchar(32), firstcall_date varchar(32), firstcall_flag varchar(32), public_flag varchar(32), user_site varchar(32), user_type varchar(32), feeset_type varchar(32), is_group_user varchar(32), is_vip varchar(32), vip_grade varchar(32), vip_type varchar(32), is_special varchar(32), is_camera varchar(32), is_high_camera varchar(32), vip_camera_type varchar(32), fee_consume varchar(32), town_id varchar(32), account_id varchar(32), contect_tel varchar(32), contect_addr varchar(256), contect_email varchar(32), country_type varchar(32), imsi text encoding dict, phone_mgr varchar(32), vip_intype varchar(32), regstatus varchar(32), is_985schoool_user varchar(32), is_211schoool_user varchar(32) );
I've followed the suggestion and changed the type of imsi to text encoding dict, but I still get the exception.
mapdql> SELECT count(a.imsi),sum(a.download) from test_a a JOIN test_zc1 b ON a.imsi =b.imsi;
E0702 12:26:36.712345 103746 MapDHandler.cpp:2331] Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
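A side note on the uniqueness check in this comment: in standard SQL, SELECT DISTINCT COUNT(x) applies DISTINCT to the single row that COUNT(x) returns, so it always equals the plain count and cannot reveal duplicates. Counting distinct values is written COUNT(DISTINCT x). The sketch below demonstrates the difference using Python's built-in sqlite3 module as a stand-in; the table and values are made up, and the thread does not show the COUNT(DISTINCT ...) form being run in mapdql, so treat its behaviour there as an assumption to verify.

```python
import sqlite3

# In-memory SQLite database used only to show standard SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (imsi TEXT)")
# Four rows, three distinct values: "a" is duplicated.
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("a",), ("b",), ("c",)])

# DISTINCT is applied to the one-row result of COUNT, so duplicates are hidden.
distinct_of_count = conn.execute(
    "SELECT DISTINCT COUNT(imsi) FROM t"
).fetchone()[0]

# DISTINCT inside COUNT removes duplicate values before counting.
count_of_distinct = conn.execute(
    "SELECT COUNT(DISTINCT imsi) FROM t"
).fetchone()[0]

total_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(distinct_of_count, count_of_distinct, total_rows)  # 4 3 4
```

A column is unique only when the COUNT(DISTINCT x) result equals the COUNT(*) result.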
Also, if I want to see the table structure in mapdql, what command can I use?
And I see that a new version of MapD has been released. How do I upgrade my current version to the newest one?
Thanks! You've said that "The test_a table contains 200 millions of records, and the test_zc1 contains 800 millions", in which case the distinct counts are a lot lower than the total number of rows, which would explain the hash join failure. Can we get the COUNT(*) on the tables as well? 80M distinct entries is not a lot; we should be able to do it even without (soon to be available) sharding. I'll generate some data and look into what's going on next week.
Regarding the upgrade, let's ask @andrewseidl and @dwayneberry.
To see a table's structure, use \d <table name>.
Thanks for the info you provided! Here is the count(*) result:
mapdql> select count(*) from test_a;
EXPR$0
19999998
mapdql> select count(*) from test_zc1;
EXPR$0
79999998
mapdql> SELECT count(a.imsi),sum(a.download) from test_a a JOIN test_zc1 b ON a.imsi =b.imsi;
E0702 14:38:47.448840 103746 MapDHandler.cpp:2331] Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
Copied from the community forum for completeness:
Hi,
I ran the following steps with the latest build of the mapd-core open source code.
Create data:
awk 'BEGIN { for (i = 1; i <= 80000000; ++i) print i,",",i }' > test_a.csv
awk 'BEGIN { for (i = 1; i <= 20000000; ++i) print i,",",i }' > test_b.csv
I then ran the following queries in mapdql, which seem to duplicate your steps:
mapdql> create table test_a (imsi text, download bigint);
mapdql> create table test_b (b_imsi text, b_download bigint);
mapdql> copy test_a from '/data/test_a.csv' with (header='false');
Result
Loaded: 80000000 recs, Rejected: 0 recs in 111.367000 secs
mapdql> copy test_b from '/data/test_b.csv' with (header='false');
Result
Loaded: 20000000 recs, Rejected: 0 recs in 30.025000 secs
mapdql> select count(*) from test_a;
EXPR$0
80000000
mapdql> select distinct count(imsi) from test_a;
EXPR$0
80000000
mapdql> select count(*) from test_b;
EXPR$0
20000000
mapdql> select distinct count(b_imsi) from test_b;
EXPR$0
20000000
mapdql> SELECT count(a.imsi),sum(a.download) from test_a a JOIN test_b b ON a.imsi =b.b_imsi;
EXPR$0|EXPR$1
20000000|200000010000000
mapdql> \version
MapD Server Version: 3.1.1dev-20170702-2966fac6
mapdql>
Could you try these steps and see where your behaviour diverges? Based on the details above, this should roughly simulate your environment.
Please confirm exactly which version of MapD you are running.
Regards
Hi @dwayneberry, thanks for your answer.
I followed your steps and the final JOIN succeeded. I then rewrote my data generator and tried the JOIN on the new data, and it succeeded again. But when I try the old data, it still throws the exception. I guess there's something wrong with the old data, but I cannot figure out what. Anyway, thanks a lot for your help.
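One way to narrow down what is wrong with the old data is to look for join-key values that occur more than once in the source file before it is imported. The following is a rough Python sketch under stated assumptions: the function name, the column position, and the sample rows are hypothetical, and it compares values exactly as written in the file, which may differ from how the importer treats them.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(lines, column, delimiter=","):
    """Return {key: occurrences} for values seen more than once in `column`.

    `lines` is any iterable of text lines, such as an open file.
    Rows that are too short to contain `column` are skipped.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) > column:
            counts[row[column]] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample: the key in column 1 repeats in the first and third rows.
sample = io.StringIO("1,460001,10\n2,460002,20\n3,460001,30\n")
duplicates = find_duplicate_keys(sample, column=1)
print(duplicates)  # {'460001': 2}
```

For a real file, pass an open file object (opened with newline="") instead of the StringIO sample. The counter keeps every distinct key in memory, so for tens of millions of rows it needs a machine with enough RAM.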
The MapD version I'm running is:
mapdql> \version
MapD Server Version: 3.0.1dev-20170523-5a5dcc2
How do I upgrade to the newest release?
@neko940709 Are you running the community edition? Or is this a build you made from the repo?
Basically, to upgrade you just run the new binary pointed at your existing data directory.
For the community edition, go back to the email with your download link and download again. That link always fetches the latest version.
If you built it yourself from the open source repo, fetch the latest code and rebuild.
Closing this issue, but if this issue still persists with one of our current 4.x releases, please feel free to open a new issue or start a conversation at https://community.mapd.com/