Comments (5)

tbenthompson commented on May 29, 2024

I actually ran into this same issue from the other "end" recently when I fixed some bugs with subsets that were not monotonically specified. So this same bug is a problem if either:

  1. The cols to choose aren't monotonic. Or
  2. The subset_cols_indices aren't monotonic.

I fixed the subset_cols_indices problem by correcting the function that was creating the SplitMatrix and then requiring that input indices be monotonic.

Sorry about not realizing at the time that the cols problem would also be an issue!

from tabmat.

MarcAntoineSchmidtQC commented on May 29, 2024

@tbenthompson, if you could look into this I would like to get your opinion on whether we should spend the time fixing the _split_col_subsets method or write a new method for the column subsetting.

tbenthompson commented on May 29, 2024

Yes, you're correct that _split_col_subsets currently assumes the input columns are ordered monotonically. One thing we could do is sort the columns first, then call _split_col_subsets, and then "unsort" the outputs. That would probably be the easiest thing to do. To minimize the amount of Cython, it might be nice to implement that sort/unsort step in a Python wrapper function rather than directly in the Cython code.
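A minimal sketch of the sort/unsort wrapper idea, in plain Python. The names `process_sorted` and `process_any_order` are hypothetical: `process_sorted` is only a stand-in for a routine that requires sorted input and is not the actual _split_col_subsets, whose inputs and outputs are more involved.

```python
def process_sorted(cols):
    """Hypothetical stand-in for a routine that only accepts sorted input."""
    if any(a > b for a, b in zip(cols, cols[1:])):
        raise ValueError("cols must be monotonically non-decreasing")
    return [c * 10 for c in cols]  # placeholder per-column result


def process_any_order(cols):
    """Wrapper: sort, call the sorted-only routine, then undo the sort."""
    # Permutation that sorts cols. sorted() is stable, so duplicate
    # columns keep their relative order.
    order = sorted(range(len(cols)), key=lambda i: cols[i])
    sorted_result = process_sorted([cols[i] for i in order])

    # "Unsort": put each result back at the position the caller asked for.
    result = [None] * len(cols)
    for pos, value in zip(order, sorted_result):
        result[pos] = value
    return result


print(process_any_order([3, 0, 2]))
```

The wrapper never changes the sorted-only routine, so the monotonicity requirement can stay in the inner code while callers pass columns in any order.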

I could help with this tomorrow or soon. Let me know if that would be helpful.

MarcAntoineSchmidtQC commented on May 29, 2024

My PR currently contains a fix.

My solution was to add a column_map attribute to the SplitMatrix object. It stores the mapping from each column's index location to its actual location in the matrices.
With that in place, writing a _split_col_subsets-like function that supports unordered and duplicate columns was much easier. I also modified the getcol method to use it.

We can discuss this in the coming days. The downside is that we need to store an array of size n_cols x 2, but I think it's worth it.
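A rough sketch of how such a mapping could work, in plain Python. This is one reading of the description above, not the code from the PR: the function names, the `indices` layout, and the returned structure are all assumptions made for illustration. Each row of the map holds two values (which sub-matrix, and which column within it), which is where the n_cols x 2 size comes from.

```python
def build_column_map(indices):
    """indices[k] lists the global column positions held by sub-matrix k
    (assumed layout). Returns one (matrix id, local column) pair per column."""
    n_cols = sum(len(idx) for idx in indices)
    column_map = [None] * n_cols
    for mat_id, idx in enumerate(indices):
        for local_col, global_col in enumerate(idx):
            column_map[global_col] = (mat_id, local_col)
    return column_map


def split_cols(column_map, cols, n_matrices):
    """For each sub-matrix, collect the local columns to take and the
    positions they occupy in the requested output order."""
    local_cols = [[] for _ in range(n_matrices)]
    out_positions = [[] for _ in range(n_matrices)]
    for out_pos, col in enumerate(cols):
        mat_id, local_col = column_map[col]
        local_cols[mat_id].append(local_col)
        out_positions[mat_id].append(out_pos)
    return local_cols, out_positions


# Two sub-matrices: the first holds global columns 0 and 3,
# the second holds global columns 1, 2 and 4.
column_map = build_column_map([[0, 3], [1, 2, 4]])
# Unordered request with a duplicate column.
print(split_cols(column_map, [4, 0, 4, 3], n_matrices=2))
```

Because each requested column is looked up independently, the request does not need to be sorted and can repeat columns.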

tbenthompson commented on May 29, 2024

@MarcAntoineSchmidtQC did this get fixed or did it get dropped when you closed your PR? Is it worth reviving just this fix?
