Comments (7)
A similar thing happens with cross joins: #11927
from polars.
It seems the left join is also wrong:
import polars as pl

df = pl.DataFrame({"id": ["1", "2", "3"]})
df2 = pl.DataFrame({"id": ["4", "5", "6"]})

# The two frames share no ids, so the left join should match nothing.
ctx = pl.SQLContext(df=df, df2=df2)
lp1 = ctx.execute(
    """
    SELECT df.id AS id1, df2.id AS id2
    FROM df
    LEFT JOIN df2
    ON df.id = df2.id
    """,
    eager=False,
)
lp1.collect()
shape: (3, 2)
┌─────┬─────┐
│ id1 ┆ id2 │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 2   │
│ 3   ┆ 3   │
└─────┴─────┘
It should return:
shape: (3, 2)
┌─────┬──────┐
│ id1 ┆ id2  │
│ --- ┆ ---  │
│ str ┆ str  │
╞═════╪══════╡
│ 1   ┆ null │
│ 2   ┆ null │
│ 3   ┆ null │
└─────┴──────┘
from polars.
@universalmind303 can you take a look here? It seems we create wrong plans somehow, as it only happens in SQL.
from polars.
Triage:
- For the first query: we don't actually recognise/support implicit join syntax (at all) yet, so it gets parsed (incorrectly) as a simple filter op where A == B. I'll have to see if we can identify the implicit join syntax and either generate the equivalent inner join or raise an error.
- For the second query: it seems we aren't resolving the post-join column selection properly here. We're doing the join correctly, but then returning only the cols from the left hand table. Will look at this first 🤔
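To illustrate why the implicit form should be equivalent to an inner join, here is a minimal pure-Python sketch of the standard SQL semantics. It does not use polars; the rows in `t` and `t1` are hypothetical stand-ins, and `implicit_join` / `inner_join` are illustrative helpers, not library functions.

```python
# Hypothetical rows standing in for the tables t and t1.
t = [
    {"A": 1, "fruits": "banana"},
    {"A": 2, "fruits": "apple"},
    {"A": 3, "fruits": "pear"},
]
t1 = [
    {"B": 2, "cars": "audi"},
    {"B": 3, "cars": "beetle"},
    {"B": 4, "cars": "fiat"},
]


def implicit_join(left, right, predicate):
    """FROM left, right WHERE predicate: cross product, then filter."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if predicate(lrow, rrow)
    ]


def inner_join(left, right, left_key, right_key):
    """FROM left INNER JOIN right ON left_key = right_key (hash lookup)."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[right_key], []).append(rrow)
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in index.get(lrow[left_key], [])
    ]


implicit = implicit_join(t, t1, lambda lrow, rrow: lrow["A"] == rrow["B"])
explicit = inner_join(t, t1, "A", "B")
print(implicit == explicit)
```

Both paths keep the columns of both tables, which is what the implicit query would need to return once it is either translated to an inner join or rejected with an error.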
from polars.
SELECT t.A, t.fruits, t1.B, t1.cars FROM t, t1 WHERE t.A=t1.B
still returns the wrong result in version 0.20.31
from polars.
Indeed - as mentioned earlier we don't support implicit join syntax at the moment; can you confirm that the other error is fixed and open an issue for implicit join syntax as a new feature request? (Easier for us to track open/closed issues if they are distinct) 😎👍
from polars.
Confirmed: the inner join syntax works now.
from polars.