Comments (5)
I actually ran into this same issue from the other "end" recently when I fixed some bugs with subsets that were not monotonically specified. So this same bug is a problem if either:
- The `cols` to choose aren't monotonic. Or
- The `subset_cols_indices` aren't monotonic.
I fixed the `subset_cols_indices` problem by correcting the function that was creating the `SplitMatrix` and then requiring that input indices be monotonic.
Sorry about not realizing at the time that the `cols` problem would also be an issue!
from tabmat.
@tbenthompson, if you could look into this, I would like to get your opinion on whether we should spend the time fixing the `_split_col_subsets` method or write a new method for the column subsetting.
Yes, you're correct that `_split_col_subsets` currently assumes the input columns are ordered monotonically. One thing we could do is sort the columns first, then call `_split_col_subsets`, and then "unsort" the outputs. That would probably be the easiest thing to do. To minimize the amount of Cython, it might be nice to implement that sort/unsort step in a Python wrapper function rather than directly in the Cython code.
I could help with this tomorrow or soon. Let me know if that would be helpful.
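A minimal sketch of the sort/unsort idea, in plain NumPy. This is not the tabmat code: `take_columns_sorted_only` is a hypothetical stand-in for any routine that, like `_split_col_subsets`, only accepts non-decreasing column indices, and `take_columns_any_order` is an assumed name for the Python wrapper.

```python
import numpy as np


def take_columns_sorted_only(data, cols):
    """Hypothetical stand-in for a routine that requires non-decreasing cols."""
    cols = np.asarray(cols)
    if np.any(np.diff(cols) < 0):
        raise ValueError("cols must be monotonically non-decreasing")
    return data[:, cols]


def take_columns_any_order(data, cols):
    """Wrapper: sort cols, call the sorted-only routine, then undo the sort."""
    cols = np.asarray(cols)
    # Permutation that puts cols in non-decreasing order.
    order = np.argsort(cols, kind="stable")
    sorted_result = take_columns_sorted_only(data, cols[order])
    # Inverse permutation: inverse[i] is where cols[i] ended up after sorting.
    inverse = np.empty_like(order)
    inverse[order] = np.arange(len(order))
    # Reorder the output columns back into the order the caller asked for.
    return sorted_result[:, inverse]


data = np.arange(12).reshape(3, 4)
result = take_columns_any_order(data, [2, 0, 3, 0])
```

The wrapper also handles repeated columns, because the sorted-only routine in this sketch accepts ties. Whether the real Cython routine does is not something this sketch assumes.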
My PR currently contains a fix.
My solution was to add a `column_map` attribute to the `SplitMatrix` object. It stores the mapping from index location to actual location in the matrices.
With that in place, writing a `_split_col_subsets`-like function that supports unordered and duplicate columns was much easier. I also modified the `getcol` method to use this.
We can discuss this in the coming days. The downside is that we need to store an array of size `n_cols x 2`, but I think it's worth it.
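To make the `column_map` idea concrete, here is a rough sketch of one way such a mapping could work. It is an illustration, not the code from the PR: the layout (column 0 holds the sub-matrix number, column 1 the local column inside it) and the helper names `build_column_map` and `split_cols_unordered` are assumptions, and it assumes every global column belongs to exactly one sub-matrix.

```python
import numpy as np


def build_column_map(indices, n_cols):
    """Build an (n_cols, 2) array: row j = (sub-matrix number, local column).

    indices: one integer array per sub-matrix, listing the global column
    positions that sub-matrix holds (hypothetical layout).
    """
    column_map = np.empty((n_cols, 2), dtype=np.int64)
    for mat_idx, idx in enumerate(indices):
        idx = np.asarray(idx)
        column_map[idx, 0] = mat_idx
        column_map[idx, 1] = np.arange(len(idx))
    return column_map


def split_cols_unordered(column_map, cols, n_matrices):
    """For each sub-matrix, return (local columns to take, output positions).

    cols may be in any order and may contain duplicates.
    """
    cols = np.asarray(cols)
    owners = column_map[cols, 0]
    local = column_map[cols, 1]
    out = []
    for mat_idx in range(n_matrices):
        mask = owners == mat_idx
        out.append((local[mask], np.flatnonzero(mask)))
    return out


indices = [np.array([0, 2, 5]), np.array([1, 3, 4])]
column_map = build_column_map(indices, n_cols=6)
parts = split_cols_unordered(column_map, [5, 1, 5, 0], n_matrices=2)
```

Each lookup is a plain array index, so no ordering of `cols` is needed. The cost is the extra `n_cols x 2` array mentioned above.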
@MarcAntoineSchmidtQC did this get fixed or did it get dropped when you closed your PR? Is it worth reviving just this fix?