
Comments (6)

noxtoby commented on August 17, 2024

Cheers @illdopejake. I've come across a similar error. @LeonAksman and I added the hacky fix, I believe.

Yours could be fixed/hacked by adding the infinitesimal amount to p_perm_k_weighted on line 257 and/or line 331 of ZscoreSustain.py.
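
Schematically, the hack looks like this (a toy sketch of the idea only, not a verbatim excerpt of ZscoreSustain.py):

import numpy as np

# Toy stand-in: every entry of p_perm_k_weighted has underflowed to 0,
# so normalising it directly would give 0/0 = NaN.
p_perm_k_weighted = np.zeros((5, 3))
p_perm_k_weighted = p_perm_k_weighted + 1e-250   # the "infinitesimal amount"
p_perm_k_norm = p_perm_k_weighted / np.sum(p_perm_k_weighted)
print(np.isnan(p_perm_k_norm).any())             # False: no NaNs after the hack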

This is somewhat unsatisfying, since the error seems to be deeper than this, given that p_perm_k_weighted is calculated by ZscoreSustain's _calculate_likelihood_stage().

I don't currently have bandwidth to dig into _calculate_likelihood_stage(), but would happily try to help someone if they have a go first.

sea-shunned commented on August 17, 2024

Hacky fix extended to other lines in cec24a2. Tests still pass, just FYI.

I agree the error may be deeper than a numerical stability issue... Potentially a good candidate for something to look at in a pySuStaIn hackathon.

illdopejake commented on August 17, 2024

Dear SuStaIn heroes:

Having weird bugs again, maybe related to the above? This time I am using an old dataset that the original SuStaIn worked fine with, but am now using the newest version of SuStaIn. I had to change several lines to get it to work, and I really had no idea what I was doing, but I wanted to bring it to your attention.

First, I am getting a bunch more divide by zero warnings, specifically in lines 555, 556 and 559 of AbstractSustain.py. In particular, line 556 seemed to be "fatal" during debugging, so I used the previous strategy above, and changed it to:

total_prob_subtype_norm = total_prob_subtype / ((np.tile(np.sum(total_prob_subtype, 1).reshape(len(total_prob_subtype), 1), (1, N_S))) + 1e-250)
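
For context, the divide-by-zero fires whenever an entire row of total_prob_subtype has underflowed to zero, so the row sum in the denominator is zero. A toy reproduction (shapes made up, variable names mirroring the snippet above):

import numpy as np

N_S = 2
total_prob_subtype = np.zeros((3, N_S))   # all-zero rows after underflow
denom = np.tile(np.sum(total_prob_subtype, 1).reshape(len(total_prob_subtype), 1), (1, N_S))
# total_prob_subtype / denom warns "invalid value encountered" and yields NaNs;
# adding the epsilon keeps the denominator nonzero:
total_prob_subtype_norm = total_prob_subtype / (denom + 1e-250)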

Next, I was getting a dimension error around lines 577 and 584. The structure of the code seems to anticipate this:

try:
    ml_subtype[i]           = this_subtype
except:
    ml_subtype[i]           = this_subtype[0][0]

The problem is, the exception was not getting caught. To get around this, I changed the try/except into an if/else statement:

if type(this_subtype) == tuple:
    ml_subtype[i]           = this_subtype[0][0]
else:
    ml_subtype[i]           = this_subtype

I then had to do something kind of similar a few lines below that at line 584:

if type(this_subtype) == tuple:
    prob_ml_subtype[i]  = this_prob_subtype
else:
    prob_ml_subtype[i]  = this_prob_subtype[this_subtype]
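
(As an aside, the same check can be written with isinstance, which is the more idiomatic Python type test. A self-contained sketch of the pattern, with a made-up tuple standing in for whatever this_subtype ends up being:)

import numpy as np

ml_subtype = np.zeros(1)
this_subtype = ((3,),)   # hypothetical tuple-wrapped index, e.g. from np.where
if isinstance(this_subtype, tuple):
    ml_subtype[0] = this_subtype[0][0]
else:
    ml_subtype[0] = this_subtype
print(ml_subtype)        # [3.]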

Altogether, these completely uninformed and pathetic hacks seemed to "solve" the issue -- SuStaIn ran with no issue after that. But they are perhaps suggestive of some other little gotchas lingering in the code, and I have no idea why these issues occurred.

I'm wondering if the reason I keep running into these issues is because I'm using data with really large z-scores. Because PET is rather sensitive, it's not uncommon to get z-scores above 30, for example, when normalizing to controls. Could that perhaps explain what's going on? Do y'all ever run tests/simulations with big z-scores like that?

Thanks friends!

--Jake

noxtoby commented on August 17, 2024

Hey @illdopejake — thanks as ever for your detailed feedback.

I reckon you've hit the nail on the head with the high z-scores, as I've had similar weirdness myself. Usually I find that setting the Z_max parameter of ZscoreSustain fixes it: it must be larger than the largest z-scores in the actual data.
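
In other words, something along these lines when you build the model (a sketch only, with toy data; Z_vals/Z_max shapes as in the pySuStaIn examples):

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(5, 10, size=(100, 4))                    # toy subjects x biomarkers z-scores
Z_vals = np.tile(np.array([1, 2, 3]), (data.shape[1], 1))  # per-biomarker z-score events
Z_max = np.ceil(data.max(axis=0)) + 1                      # strictly above the largest observed z-score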

I believe that there aren't any tests/simulations on this.

Being more of a MixtureSuStaIn guy myself, I'd hope that one of the Zscore gurus here could have a look (@ayoung11 obviously at the top of that list).

ayoung11 commented on August 17, 2024

I'm on the case! My guess would be that it's a precision problem that causes the likelihood to go to zero for z-scores that are very far from the z-scores that have been included in z_vals and z_max, but will let you know when I get to the bottom of it.

katrinaCode commented on August 17, 2024

Hi all,

I've found the source and fix for this error -- numpy underflows in float64 and returns 0 if the argument of np.exp() is less than ~-700. This means that p_perm_k = np.exp(p_perm_k) in line 239 in _calculate_likelihood_stage will be all 0s if p_perm_k is all < -700, and this then propagates into NaNs starting with the f_opt calculation in line 262 in _optimise_parameters.

The fix is to ensure p_perm_k is a longdouble in _calculate_likelihood_stage. This extends the range so that np.exp() can handle arguments down to ~-11300.
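
You can check the thresholds directly (the exact cutoffs are platform-dependent, and note that on Windows np.longdouble is often just float64 in disguise, so the fix may not help there):

import numpy as np

print(np.exp(np.float64(-745.0)))       # ~5e-324, the smallest float64 subnormal
print(np.exp(np.float64(-746.0)))       # 0.0: float64 has underflowed
print(np.exp(np.longdouble(-746.0)))    # still nonzero with 80-bit longdouble
print(np.exp(np.longdouble(-11400.0)))  # 0.0: even longdouble underflows eventually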

I can't comment on why p_perm_k is becoming so negative and what that indicates; maybe the devs will have some idea? I personally ran into this error in later stages of the solve (e.g. on cluster 4 of 6, rarely on the first one or two splits).

Below is my entire working _calculate_likelihood_stage function, starting at line 209 (everything above line 212 is unchanged). It contains extra checks/failsafes/print statements and some debugging notes, and could be simplified.

The fix is in ensuring p_perm_k, factor, coeff, and x are all initialized/defined as longdoubles, and in double-checking that p_perm_k is a longdouble before calling np.exp(p_perm_k). There are then some checks for p_perm_k being all zeros after np.exp(p_perm_k) (which it shouldn't be once the fix is in, unless it has underflowed even in longdouble), since that is what causes the NaNs later on and, ultimately, the IndexError.

def _calculate_likelihood_stage(self, sustainData, S):
        # ... lines 164 to 209 unchanged ...

        stage_value                         = 0.5 * point_value[:, :point_value.shape[1] - 1] + 0.5 * point_value[:, 1:]

        M                                   = sustainData.getNumSamples()   #data_local.shape[0]
       
        # fix starts here!
        # p_perm_k is initialized as all 0s.
        p_perm_k                            = np.zeros((M, N + 1), dtype="longdouble")
        if p_perm_k.dtype != "longdouble":
            print("p_perm_k dtype after being initialized: ", p_perm_k.dtype)


        # optimised likelihood calc - take log and only call np.exp once after loop
        sigmat = np.array(self.std_biomarker_zscore)

        factor                              = np.log(1. / np.sqrt(np.pi * 2.0) * sigmat).astype("longdouble")
        coeff                               = np.log(1. / float(N + 1)).astype("longdouble")

        # original
        """
        for j in range(N+1):
            x                   = (data-np.tile(stage_value[:,j],(M,1)))/sigmat
            p_perm_k[:,j]       = coeff+np.sum(factor-.5*x*x,1)
        """
        # faster - do the tiling once
        # stage_value_tiled                   = np.tile(stage_value, (M, 1))
        # N_biomarkers                        = stage_value.shape[0]
        # for j in range(N + 1):
        #     stage_value_tiled_j             = stage_value_tiled[:, j].reshape(M, N_biomarkers)
        #     x                               = (sustainData.data - stage_value_tiled_j) / sigmat  #(data_local - stage_value_tiled_j) / sigmat
        #     p_perm_k[:, j]                  = coeff + np.sum(factor - .5 * np.square(x), 1)
        # p_perm_k                            = np.exp(p_perm_k)

        # even faster - do in one go
        x = ((sustainData.data[:, :, None] - stage_value) / sigmat[None, :, None]).astype("longdouble")

        # debug notes:
        # here we are adding a constant (coeff), so p_perm_k should not be all 0s.
        # since log(1/(N+1)) < 0, coeff is negative, and I have not yet found any
        # instances of np.sum(factor[None, :, None] - 0.5 * np.square(x), 1) being positive.
        # it then calls into question how p_perm_k could be all 0 if its maximum is
        # less than 0: its maximum should be coeff, i.e. log(1/7) if N_S_max = 6.

        # turns out, it is all 0 due to float64 underflow in np.exp.
        p_perm_k = coeff + np.sum(factor[None, :, None] - 0.5 * np.square(x), 1)
        
        # double checking the dtype, because I'm paranoid
        if p_perm_k.dtype != "longdouble":
            print("p_perm_k dtype after calc: ", p_perm_k.dtype)

        
        if (p_perm_k < -700).all():
            print("p_perm_k very large negative number, so exping it will be 0 with float64 dtype.", p_perm_k.dtype, np.mean(np.exp(p_perm_k)))
            if p_perm_k.dtype != "longdouble":
                p_perm_k = p_perm_k.astype("longdouble")
                print("changed p_perm_k's dtype to longdouble in if.", p_perm_k.dtype)
                
           
            p_perm_k = np.exp(p_perm_k)
            if np.mean(abs(p_perm_k)) == 0:
                print("p_perm_k all 0 in if statement")
       
        else:
            p_perm_k = np.exp(p_perm_k)
            if p_perm_k.dtype != "longdouble":
                p_perm_k = p_perm_k.astype("longdouble")
                print("changed p_perm_k's dtype to longdouble in else.", p_perm_k.dtype)

                
            if np.mean(abs(p_perm_k)) == 0:
                print("p_perm_k all 0 in else statement")

        # this should not be triggered if the above fixes work.
        if np.mean(abs(p_perm_k)) == 0:
            print(sum(sum(p_perm_k)), np.shape(p_perm_k))
            print("\np_perm_k is all 0")
        
        return p_perm_k



Let me know if this fixes things on your end, as it resolved the issue for me.

Thanks!
