
Comments (6)

noxtoby commented on August 17, 2024

Cheers @illdopejake. I've come across a similar error. @LeonAksman and I added the hacky fix, I believe.

Yours could be fixed/hacked by adding the infinitesimal amount to p_perm_k_weighted on line 257 and/or line 331 of ZscoreSustain.py.
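
Schematically, the hack looks like this (a toy sketch of the idea only, not a verbatim excerpt of ZscoreSustain.py):

import numpy as np

# Toy stand-in: every entry of p_perm_k_weighted has underflowed to 0,
# so normalising it directly would give 0/0 = NaN.
p_perm_k_weighted = np.zeros((5, 3))
p_perm_k_weighted = p_perm_k_weighted + 1e-250   # the "infinitesimal amount"
p_perm_k_norm = p_perm_k_weighted / np.sum(p_perm_k_weighted)
print(np.isnan(p_perm_k_norm).any())             # False: no NaNs after the hack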

This is somewhat unsatisfying, since the error seems to be deeper than this, given that p_perm_k_weighted is calculated by ZscoreSustain's _calculate_likelihood_stage().

I don't currently have bandwidth to dig into _calculate_likelihood_stage(), but would happily try to help someone if they have a go first.

sea-shunned commented on August 17, 2024

Hacky fix extended to other lines in cec24a2. Tests still pass, just FYI.

I agree the error may be deeper than a numerical stability issue... Potentially a good candidate for something to look at in a pySuStaIn hackathon.

illdopejake commented on August 17, 2024

Dear SuStaIn heroes:

Having weird bugs again, maybe related to the above? This time I am using an old dataset that the original SuStaIn worked fine with, but am now using the newest version of SuStaIn. I had to change several lines to get it to work, and I really had no idea what I was doing, but I wanted to bring it to your attention.

First, I am getting a bunch more divide by zero warnings, specifically in lines 555, 556 and 559 of AbstractSustain.py. In particular, line 556 seemed to be "fatal" during debugging, so I used the previous strategy above, and changed it to:

total_prob_subtype_norm = total_prob_subtype / ((np.tile(np.sum(total_prob_subtype, 1).reshape(len(total_prob_subtype), 1), (1, N_S))) + 1e-250)
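
For context, the divide-by-zero fires whenever an entire row of total_prob_subtype has underflowed to zero, so the row sum in the denominator is zero. A toy reproduction (shapes made up, variable names mirroring the snippet above):

import numpy as np

N_S = 2
total_prob_subtype = np.zeros((3, N_S))   # all-zero rows after underflow
denom = np.tile(np.sum(total_prob_subtype, 1).reshape(len(total_prob_subtype), 1), (1, N_S))
# total_prob_subtype / denom warns "invalid value encountered" and yields NaNs;
# adding the epsilon keeps the denominator nonzero:
total_prob_subtype_norm = total_prob_subtype / (denom + 1e-250)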

Next, I was getting a dimension error around lines 577 and 584. The structure of the code seems to anticipate this:

try:
    ml_subtype[i]           = this_subtype
except:
    ml_subtype[i]           = this_subtype[0][0]

The problem is, the exception was not getting caught. To get around this, I changed the try/except into an if/else statement:

if type(this_subtype) == tuple:
    ml_subtype[i]           = this_subtype[0][0]
else:
    ml_subtype[i]           = this_subtype

I then had to do something kind of similar a few lines below that at line 584:

if type(this_subtype) == tuple:
    prob_ml_subtype[i]  = this_prob_subtype
else:
    prob_ml_subtype[i]  = this_prob_subtype[this_subtype]
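
(As an aside, the same check can be written with isinstance, which is the more idiomatic Python type test. A self-contained sketch of the pattern, with a made-up tuple standing in for whatever this_subtype ends up being:)

import numpy as np

ml_subtype = np.zeros(1)
this_subtype = ((3,),)   # hypothetical tuple-wrapped index, e.g. from np.where
if isinstance(this_subtype, tuple):
    ml_subtype[0] = this_subtype[0][0]
else:
    ml_subtype[0] = this_subtype
print(ml_subtype)        # [3.]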

Altogether, these completely uninformed and pathetic hacks seemed to "solve" the issue -- SuStaIn ran with no issue after that. But they are perhaps suggestive of some other little gotchas lingering in the code, and I have no idea why these issues occurred.

I'm wondering if the reason I keep running into these issues is because I'm using data with really large z-scores. Because PET is rather sensitive, it's not uncommon to get z-scores above 30, for example, when normalizing to controls. Could that perhaps explain what's going on? Do y'all ever run tests/simulations with big z-scores like that?

Thanks friends!

--Jake

noxtoby commented on August 17, 2024

Hey @illdopejake — thanks as ever for your detailed feedback.

I reckon you've hit the nail on the head with the high z-scores, as I've had similar weirdness myself. Usually I find that setting the Z_max parameter of ZscoreSustain fixes it: it must be larger than the largest z-scores in the actual data.
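
In other words, something along these lines when you build the model (a sketch only, with toy data; Z_vals/Z_max shapes as in the pySuStaIn examples):

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(5, 10, size=(100, 4))                    # toy subjects x biomarkers z-scores
Z_vals = np.tile(np.array([1, 2, 3]), (data.shape[1], 1))  # per-biomarker z-score events
Z_max = np.ceil(data.max(axis=0)) + 1                      # strictly above the largest observed z-score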

I believe that there aren't any tests/simulations on this.

Being more of a MixtureSuStaIn guy myself, I'd hope that one of the Zscore gurus here could have a look (@ayoung11 obviously at the top of that list).

ayoung11 commented on August 17, 2024

I'm on the case! My guess would be that it's a precision problem that causes the likelihood to go to zero for z-scores that are very far from the z-scores that have been included in z_vals and z_max, but will let you know when I get to the bottom of it.

katrinaCode commented on August 17, 2024

Hi all,

I've found the source and fix for this error -- numpy underflows in float64 and returns 0 if the argument of np.exp() is less than ~-700. This means that p_perm_k = np.exp(p_perm_k) in line 239 in _calculate_likelihood_stage will be all 0s if p_perm_k is all < -700, and this then propagates into NaNs starting with the f_opt calculation in line 262 in _optimise_parameters.

The fix is to ensure p_perm_k is a longdouble in _calculate_likelihood_stage. This extends the range so that np.exp() can handle arguments down to ~-11300.
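
You can check the thresholds directly (the exact cutoffs are platform-dependent, and note that on Windows np.longdouble is often just float64 in disguise, so the fix may not help there):

import numpy as np

print(np.exp(np.float64(-745.0)))       # ~5e-324, the smallest float64 subnormal
print(np.exp(np.float64(-746.0)))       # 0.0: float64 has underflowed
print(np.exp(np.longdouble(-746.0)))    # still nonzero with 80-bit longdouble
print(np.exp(np.longdouble(-11400.0)))  # 0.0: even longdouble underflows eventually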

I can't comment on why p_perm_k is becoming so negative and what that indicates; maybe the devs will have some idea? I personally ran into this error in later stages of the solve (e.g. on cluster 4 of 6, rarely on the first one or two splits).

Below is my entire working _calculate_likelihood_stage function, starting at line 209 (everything above line 212 is unchanged). It contains extra checks/failsafes/print statements and some debugging notes, and could be simplified.

The fix is in ensuring p_perm_k, factor, coeff, and x are all initialized/defined as longdoubles, and in double-checking that p_perm_k is a longdouble before calling np.exp(p_perm_k). There are then some checks for p_perm_k being all zeros after np.exp(p_perm_k) (which it shouldn't be once the fix is in, unless it has underflowed even in longdouble), since that is what causes the NaNs later on and, ultimately, the IndexError.

def _calculate_likelihood_stage(self, sustainData, S):
        # ... lines 164 to 209 unchanged ...

        stage_value                         = 0.5 * point_value[:, :point_value.shape[1] - 1] + 0.5 * point_value[:, 1:]

        M                                   = sustainData.getNumSamples()   #data_local.shape[0]
       
        # fix starts here!
        # p_perm_k is initialized as all 0s.
        p_perm_k                            = np.zeros((M, N + 1), dtype="longdouble")
        if p_perm_k.dtype != "longdouble":
            print("p_perm_k dtype after being initialized: ", p_perm_k.dtype)


        # optimised likelihood calc - take log and only call np.exp once after loop
        sigmat = np.array(self.std_biomarker_zscore)

        factor                              = np.log(1. / np.sqrt(np.pi * 2.0) * sigmat).astype("longdouble")
        coeff                               = np.log(1. / float(N + 1)).astype("longdouble")

        # original
        """
        for j in range(N+1):
            x                   = (data-np.tile(stage_value[:,j],(M,1)))/sigmat
            p_perm_k[:,j]       = coeff+np.sum(factor-.5*x*x,1)
        """
        # faster - do the tiling once
        # stage_value_tiled                   = np.tile(stage_value, (M, 1))
        # N_biomarkers                        = stage_value.shape[0]
        # for j in range(N + 1):
        #     stage_value_tiled_j             = stage_value_tiled[:, j].reshape(M, N_biomarkers)
        #     x                               = (sustainData.data - stage_value_tiled_j) / sigmat  #(data_local - stage_value_tiled_j) / sigmat
        #     p_perm_k[:, j]                  = coeff + np.sum(factor - .5 * np.square(x), 1)
        # p_perm_k                            = np.exp(p_perm_k)

        # even faster - do in one go
        x = ((sustainData.data[:, :, None] - stage_value) / sigmat[None, :, None]).astype("longdouble")

        # debug notes:
        # here we are adding a constant (coeff), so p_perm_k should not be all 0s.
        # since log(1/(N+1)) < 0, coeff is negative, and I have not yet found any
        # instances of np.sum(factor[None, :, None] - 0.5 * np.square(x), 1) being positive.
        # it then calls into question how p_perm_k could be all 0 if its maximum is
        # less than 0: its maximum should be coeff, i.e. log(1/7) if N_S_max = 6.

        # turns out, it is all 0 due to float64 underflow in np.exp.
        p_perm_k = coeff + np.sum(factor[None, :, None] - 0.5 * np.square(x), 1)
        
        # double checking the dtype, because I'm paranoid
        if p_perm_k.dtype != "longdouble":
            print("p_perm_k dtype after calc: ", p_perm_k.dtype)

        
        if (p_perm_k < -700).all():
            print("p_perm_k very large negative number, so exping it will be 0 with float64 dtype.", p_perm_k.dtype, np.mean(np.exp(p_perm_k)))
            if p_perm_k.dtype != "longdouble":
                p_perm_k = p_perm_k.astype("longdouble")
                print("changed p_perm_k's dtype to longdouble in if.", p_perm_k.dtype)
                
           
            p_perm_k = np.exp(p_perm_k)
            if np.mean(abs(p_perm_k)) == 0:
                print("p_perm_k all 0 in if statement")
       
        else:
            p_perm_k = np.exp(p_perm_k)
            if p_perm_k.dtype != "longdouble":
                p_perm_k = p_perm_k.astype("longdouble")
                print("changed p_perm_k's dtype to longdouble in else.", p_perm_k.dtype)

                
            if np.mean(abs(p_perm_k)) == 0:
                print("p_perm_k all 0 in else statement")

        # this should not be triggered if the above fixes work.
        if np.mean(abs(p_perm_k)) == 0:
            print(sum(sum(p_perm_k)), np.shape(p_perm_k))
            print("\np_perm_k is all 0")
        
        return p_perm_k



Let me know if this fixes things on your end, as it resolved the issue for me.

Thanks!
