Comments (2)

LeonAksman commented on July 18, 2024

Hi Jake,

Thanks a lot for raising this issue. I finally got around to this as I'm planning a fairly large update to the master branch. It won't change the algorithm, but it will make the code installable as a Python package, along with a bunch of changes in simrun.py to better showcase pySuStaIn's features.

I agree that the way the code was written was plain wrong. Looking at your proposed changes, I realized that if the user passes an array of folds it won't work, so maybe we can use this logic instead:

        if select_fold != []:
            Nfolds                          = len(select_fold)
        else:
            select_fold                     = test_idxs
            Nfolds                          = len(test_idxs)

This will set the number of folds to what the user passed in or use all folds if the user didn't pass anything in. It will also make sure that select_fold holds the folds to be run.

Then I added this:

        for fold in range(Nfolds):

            indx_train                      = np.array([x for x in range(self.__sustainData.getNumSamples()) if x not in select_fold[fold]])
            indx_test                       = select_fold[fold]

Where I replaced test_idxs[fold] with select_fold[fold] in both lines.
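
To make the proposal above easier to try in isolation, here is a minimal standalone sketch of the same fold-selection logic. The function name `iter_cv_splits` and its `n_samples` parameter are hypothetical and not part of pySuStaIn; the sketch assumes `select_fold`, when given, is itself a list of test-index lists, as in the snippet above.

```python
import numpy as np

def iter_cv_splits(n_samples, test_idxs, select_fold=None):
    """Yield (train, test) index arrays for each fold to run.

    Hypothetical helper, not pySuStaIn's API: if select_fold is empty,
    every fold in test_idxs is used; otherwise only the folds passed in.
    """
    use_all = select_fold is None or len(select_fold) == 0
    folds = test_idxs if use_all else select_fold
    all_idx = np.arange(n_samples)
    for test in folds:
        test = np.asarray(test)
        # Training indices are all samples that are not in the test fold.
        train = np.setdiff1d(all_idx, test)
        yield train, test

test_idxs = [[0, 1], [2, 3], [4, 5]]
all_splits = list(iter_cv_splits(6, test_idxs))
one_split = list(iter_cv_splits(6, test_idxs, select_fold=[[4, 5]]))
```

With nothing passed for `select_fold`, the loop runs over all three folds; with one fold passed in, it runs once.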

What do you think of this?


illdopejake commented on July 18, 2024

Hi Leon,

Thanks for looking into this. Sorry I didn't respond earlier. I just got around to having another look at this, and I'm still running into a similar issue. I think the source of the issue is ultimately the lack of documentation for this function.

So, as I learned from the tutorial notebook, test_idxs is supposed to be a nested list (i.e. a list of lists). Specifically, it is a list of length n whose elements are lists of m indices, where n is the number of folds and m is the number of individuals in the test set for a given fold. (As an aside, this seems less intuitive to me than just an n x m array.)
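
For illustration, a nested list of this shape could be built as in the sketch below. The helper `make_test_idxs` is hypothetical and is not how the tutorial notebook constructs its folds; it only shows the expected structure. It also shows one case where a nested list is needed: when the sample count does not divide evenly, the folds differ in length and cannot be stored as a rectangular n x m array.

```python
import numpy as np

def make_test_idxs(n_samples, n_folds, seed=0):
    """Split shuffled sample indices into n_folds lists of test indices.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    # array_split allows uneven splits, so folds may differ in length.
    return [fold.tolist() for fold in np.array_split(shuffled, n_folds)]

test_idxs = make_test_idxs(n_samples=10, n_folds=3)
fold_sizes = [len(fold) for fold in test_idxs]  # one fold is larger than the others
```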

Then, the user is prompted to pass the select_fold argument if he/she wishes to run in a parallel context. The default is an empty list, which to me is unintuitive. Why another list? I would have expected this argument to just be an integer, 0 through n-1, where n is the number of folds. I'm not sure what the actual argument is supposed to be. It appears from the default (and your prior comment) that it's supposed to be an array, but if the user already did the work to compile the list of lists in the first place, why pass another array for the select fold? I would find it easier to just pass an integer indicating which fold the user wants to use, and the solution I proposed (janky as it may be) allows that.
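
One way the integer interface described above might look is sketched here. The function `pick_folds` is hypothetical and is neither the locally patched code nor pySuStaIn's API; it assumes `select_fold` holds fold numbers (positions in test_idxs) rather than lists of sample indices.

```python
import numpy as np

def pick_folds(test_idxs, select_fold=None):
    """Return the list of test-index lists to run.

    Hypothetical helper: select_fold may be None or empty (run all folds),
    a single integer fold number, or a sequence of fold numbers.
    """
    if select_fold is None:
        return list(test_idxs)
    if np.isscalar(select_fold):
        # A single fold number, e.g. one job in a parallel run.
        return [test_idxs[int(select_fold)]]
    if len(select_fold) == 0:
        return list(test_idxs)
    return [test_idxs[int(i)] for i in select_fold]

test_idxs = [[0, 1], [2, 3], [4, 5]]
single = pick_folds(test_idxs, 1)
several = pick_folds(test_idxs, [0, 2])
everything = pick_folds(test_idxs, [])
```

In a parallel setting, each worker would pass its own fold number and receive a one-element list, so the downstream loop over folds stays the same.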

I may just be misunderstanding something, and this comment might be obviated by documentation explaining the expected input. But I couldn't find any documentation for this function, nor was there an example in the tutorial notebook.

Anyway, I've changed it locally and everything is fine, so consider this just a suggestion and no worries if you disagree! Just some food for thought.

<3
--Jake
