How to use loss from HSCRF? (hscrf-pytorch, closed)

zhixiuye commented on May 27, 2024

How to use loss from HSCRF?

from hscrf-pytorch.

Comments (9)

ZhixiuYe commented on May 27, 2024

When I print loss after line 175, I get below information

Variable containing:
 92.9371
[torch.cuda.FloatTensor of size 1 (GPU 0)]

I guess you are using torch 0.4.0, but this code was written for 0.2.0. You can install the matching PyTorch version and try again.

The to_scalar function is only used to record the current loss so that it can be printed at https://github.com/ZhixiuYe/HSCRF-pytorch/blob/master/train.py#L199. It is fine to delete line 177.
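For readers on PyTorch 0.4.0 or later, where a loss is a plain tensor rather than a Variable, here is a minimal sketch of reading it as a Python float. The helper name to_scalar is borrowed from the comment above, but the body is an assumption for illustration, not the repository's implementation.

```python
import torch


def to_scalar(loss):
    """Return a one-element tensor as a Python float (hypothetical helper)."""
    # Tensor.item() works for any tensor holding exactly one element,
    # whether its shape is () or (1,).
    return loss.item()


loss = torch.tensor([92.9371])  # shape (1,), like the loss printed above
print(to_scalar(loss))
```

The value is only used for logging, so converting it does not affect the tensor that backward() is called on.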


sebastianGehrmann commented on May 27, 2024

Unfortunately, the rest of the code I am using is pytorch 0.4.0, so I can't mix and match and need to port the SCRF code. For some guidance, could you print some more sizes so I can see what I need to fix?

My tag size is 4 since I am not doing NER, and am trying to only get binary labels, so my tags are (no tag, tag, start, end). Using test data with batch size 10, I am getting the following sizes:

    def forward(self, feats, mask_word, tags, mask_tag):
        self.batch_size = feats.size(0)
        self.sent_len = feats.size(1)
        # feats: (10, 48, 256)
        # mask_word: (10)
        # tags: (10, 40, 4)
        # mask_tag: (10, 40)
        feats = self.dense(feats)
        self.SCRF_scores = self.HSCRF_scores(feats)
        # self.SCRF_scores: (10, 48, 48, 4, 4)
        forward_score = self.get_logloss_denominator(self.SCRF_scores, mask_word)
        # forward_score: (1)
        numerator = self.get_logloss_numerator(tags, self.SCRF_scores, mask_tag)
        # numerator: (209)
        loss =  (forward_score - numerator.sum()) / self.batch_size
        # loss: (209)
        return loss

Here are the two functions annotated:

    def get_logloss_numerator(self, goldfactors, scores, mask):
        # mask: (10, 40)
        batch_size = scores.size(0) # 10
        sent_len = scores.size(1) # 48
        tagset_size = scores.size(3) # 4
        goldfactors = goldfactors[:, :, 0]*sent_len*tagset_size*tagset_size + goldfactors[:,:,1]*tagset_size*tagset_size+goldfactors[:,:,2]*tagset_size+goldfactors[:,:,3]
        # goldfactors: (10, 40)
        factorexprs = scores.view(batch_size, -1)
        # factorexprs: (10, 36864)
        val = torch.gather(factorexprs, 1, goldfactors)
        # val: (10, 40)
        numerator = val.masked_select(mask)
        # numerator: (209)
        return numerator
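The index arithmetic in goldfactors is easier to check in isolation. The sketch below assumes the four columns of tags are (start, end, previous tag, tag), matching the last four axes of the score tensor; that reading is inferred from the shapes annotated above, and flat_gold_index is a hypothetical name.

```python
def flat_gold_index(start, end, prev_tag, tag, sent_len, tagset_size):
    """Row-major offset of (start, end, prev_tag, tag) inside a
    (sent_len, sent_len, tagset_size, tagset_size) block."""
    return ((start * sent_len + end) * tagset_size + prev_tag) * tagset_size + tag


sent_len, tagset_size = 48, 4
block_size = sent_len * sent_len * tagset_size * tagset_size
print(block_size)                                              # 36864
print(flat_gold_index(47, 47, 3, 3, sent_len, tagset_size))    # 36863
```

The block size agrees with the second dimension of factorexprs, so every gold index produced this way falls inside the range that torch.gather accepts.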

    def get_logloss_denominator(self, scores, mask):
        logalpha = Variable(torch.FloatTensor(self.batch_size, self.sent_len+1, self.tagset_size).fill_(-10000.)).cuda()
        # logalpha: (10, 49, 4)
        logalpha[:, 0, self.start_id] = 0.
        istarts = [0] * self.ALLOWED_SPANLEN + range(self.sent_len - self.ALLOWED_SPANLEN+1)
        # len(istarts): 49
        for i in range(1, self.sent_len+1):
            tmp = scores[:, istarts[i]:i, i-1] + \
                    logalpha[:, istarts[i]:i].unsqueeze(3).expand(self.batch_size, i - istarts[i], self.tagset_size, self.tagset_size)
            tmp = tmp.transpose(1, 3).contiguous().view(self.batch_size, self.tagset_size, (i-istarts[i])*self.tagset_size)
            max_tmp, _ = torch.max(tmp, dim=2)
            tmp = tmp - max_tmp.view(self.batch_size, self.tagset_size, 1)
            logalpha[:, i] = max_tmp + torch.log(torch.sum(torch.exp(tmp), dim=2))

        mask = mask.unsqueeze(1).unsqueeze(1).expand(self.batch_size, 1, self.tagset_size)
        # mask: (10,1,4)
        alpha = torch.gather(logalpha, 1, mask).squeeze(1)
        # alpha: (10,4)
        return alpha[:,self.stop_id].sum() # return: (1)
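As a reference while porting, the recursion in get_logloss_denominator can be written without tensors for a single unpadded sentence. This is a sketch under assumptions: the function names are hypothetical, the axis meaning of scores is inferred from the annotated shapes, and float("-inf") replaces the -10000 stand-in used above.

```python
import math
import random

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def semi_markov_log_partition(scores, sent_len, tagset_size,
                              max_span_len, start_id, stop_id):
    """scores[j][e][p][t]: score of a span over tokens j..e (inclusive)
    tagged t, whose preceding span is tagged p (assumed axis meaning)."""
    logalpha = [[NEG_INF] * tagset_size for _ in range(sent_len + 1)]
    logalpha[0][start_id] = 0.0
    # Python 3 needs list(range(...)); adding a list to a range raises TypeError.
    istarts = [0] * max_span_len + list(range(sent_len - max_span_len + 1))
    for i in range(1, sent_len + 1):
        for t in range(tagset_size):
            # Sum over every allowed span start j and every previous tag p.
            terms = [scores[j][i - 1][p][t] + logalpha[j][p]
                     for j in range(istarts[i], i)
                     for p in range(tagset_size)]
            logalpha[i][t] = logsumexp(terms)
    return logalpha[sent_len][stop_id]


random.seed(0)
n, T, K = 5, 3, 2
toy_scores = [[[[random.uniform(-1, 1) for _ in range(T)] for _ in range(T)]
               for _ in range(n)] for _ in range(n)]
print(semi_markov_log_partition(toy_scores, n, T, K, start_id=0, stop_id=2))
```

Because it is small enough to compare against brute-force enumeration of all segmentations, a version like this can serve as a correctness check for a batched port.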

=======================><=========================
Edit: As it turns out, I summed the wrong tensor; the sizes are all correct. I am now getting a lot of "leaf variable has been moved into the graph interior" errors, caused by the indexing and in-place overwriting of values in these functions. Did you encounter these errors when you built the model? How did you address them?


ZhixiuYe commented on May 27, 2024

In the following code, loss should obviously have size 1.

        forward_score = self.get_logloss_denominator(self.SCRF_scores, mask_word)
        # forward_score: (1)
        numerator = self.get_logloss_numerator(tags, self.SCRF_scores, mask_tag)
        # numerator: (209)
        loss =  (forward_score - numerator.sum()) / self.batch_size
        # loss: (209)

Regarding "leaf variable has been moved into the graph interior": I guess it is because in PyTorch 0.4.0 the Variable class was merged into Tensor. I'm not very familiar with PyTorch 0.4.0, though, so I don't know the details.
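One common way to sidestep in-place writes into a preallocated buffer is to collect the per-position results in a Python list and combine them with torch.stack. The sketch below applies that idea to the forward recursion. It is an illustration under assumptions, not the repository's code: the function name and the lengths argument are hypothetical, torch.logsumexp requires a PyTorch release newer than 0.4.0, and whether this removes the error in a particular port is not guaranteed.

```python
import torch


def log_partition_no_inplace(scores, lengths, max_span_len, start_id, stop_id):
    """scores: (batch, sent_len, sent_len, tags, tags); lengths: (batch,) long."""
    batch, sent_len, _, tags, _ = scores.shape
    # Build the initial row as a constant instead of writing into a tensor.
    init_row = [-10000.0] * tags
    init_row[start_id] = 0.0
    init = torch.tensor(init_row, dtype=scores.dtype, device=scores.device)
    alphas = [init.unsqueeze(0).expand(batch, tags)]       # position 0
    for i in range(1, sent_len + 1):
        lo = max(0, i - max_span_len)
        prev = torch.stack(alphas[lo:i], dim=1)            # (batch, n, tags_prev)
        tmp = scores[:, lo:i, i - 1] + prev.unsqueeze(3)   # (batch, n, tags_prev, tags_cur)
        # Reduce over span starts and previous tags in one log-sum-exp.
        alphas.append(torch.logsumexp(tmp.reshape(batch, -1, tags), dim=1))
    logalpha = torch.stack(alphas, dim=1)                  # (batch, sent_len + 1, tags)
    rows = torch.arange(batch, device=scores.device)
    return logalpha[rows, lengths, stop_id].sum()


scores = torch.randn(2, 3, 3, 4, 4, requires_grad=True)
lengths = torch.tensor([3, 2])
loss = log_partition_no_inplace(scores, lengths, max_span_len=2, start_id=2, stop_id=1)
loss.backward()
print(loss.dim(), scores.grad.shape)
```

Every tensor that depends on scores is created by an out-of-place operation here, so autograd never sees a tensor modified after it was recorded in the graph.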


sebastianGehrmann commented on May 27, 2024

I managed to refactor this into torch.cat operations, so the error is resolved. I now run into a problem that I can't quite understand from your code: your HSCRF_scores function only computes the likelihoods for positive labels, but keeps O/start/end at -1e5 (by setting them to values in the m30000 tensor). Where in your SCRF code do you actually compute the probability that a tag is O?


ZhixiuYe commented on May 27, 2024

First of all, you can refer to the paper Semi-Markov Conditional Random Fields for Information Extraction for some details about semi-Markov CRFs.
HSCRF_scores computes scores, not likelihoods. The shape of the scores is (self.batch_size, self.sent_len, self.sent_len, self.tagset_size, self.tagset_size), which corresponds to g^k(j, x, s) in that paper.
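To make the difference between scores and likelihoods concrete, the relation can be summarized as follows. This is a simplified restatement of the semi-CRF formulation, with the weight vector folded into g and each segment j written as (t_j, u_j, y_j) for start, end, and label; the notation is an assumption chosen for this thread, not copied from the code.

```latex
\begin{aligned}
\mathrm{score}(x, s) &= \sum_{j=1}^{|s|} g(y_{j-1}, y_j, x, t_j, u_j), \\
P(s \mid x) &= \frac{\exp\big(\mathrm{score}(x, s)\big)}{Z(x)}, \qquad
Z(x) = \sum_{s'} \exp\big(\mathrm{score}(x, s')\big), \\
-\log P(s \mid x) &= \log Z(x) - \mathrm{score}(x, s).
\end{aligned}
```

Under this reading, forward_score plays the role of log Z(x) summed over the batch, and numerator.sum() is the summed score of the gold segmentations, which matches loss = (forward_score - numerator.sum()) / self.batch_size.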


sebastianGehrmann commented on May 27, 2024

Thanks for the link to the paper. It might be helpful to annotate your code with the corresponding equations to help code understanding. I still don't get why O is never scored. Eq(2) in your linked paper defines g^k in terms of y_j and y_{j-1}, but the code is only scoring the different tags.


ZhixiuYe commented on May 27, 2024

I get it!
In the line if span == 0:, I calculate the score of O. I assume that O is scored only when its span has length one; for spans longer than one, the score of O is not calculated.
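The rule described here can be stated as a small predicate. This is a sketch of the rule as explained in this comment, with hypothetical names; it reads span == 0 as denoting a single-token span, which is how the comment describes it.

```python
def is_scored(span_len, tag, outside_tag="O"):
    """Hypothetical rule: O is scored only as a single-token span,
    while every other tag may cover a span of any positive length."""
    if tag == outside_tag:
        return span_len == 1
    return span_len >= 1


print([(n, tag) for n in (1, 2, 3) for tag in ("PER", "O") if is_scored(n, tag)])
# [(1, 'PER'), (1, 'O'), (2, 'PER'), (3, 'PER')]
```

A run of k consecutive non-entity tokens is then represented as k separate length-one O spans rather than one long O span.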


sebastianGehrmann commented on May 27, 2024

I see - but even when I only print the result of the code for span length 0,

    tmp = torch.cat((self.transition[:, :validtag_size].unsqueeze(0).unsqueeze(0) + emb_x[:, 0, :, :validtag_size].unsqueeze(2),
                     m10000,
                     self.transition[:, -2:].unsqueeze(0).unsqueeze(0) + emb_x[:, 0, :, -2:].unsqueeze(2)), 3)
    scores[:, diag0, diag0] = tmp

every entry looks like this:

 [[ 6.1834e-01, -1.0000e+04, -4.2706e-01,  2.8736e-01],
  [-4.2289e-01, -1.0000e+04, -4.3145e-02, -1.0890e+00],
  [-5.2040e-01, -1.0000e+04, -3.2427e-01, -1.1558e+00],
  [-6.0971e-01, -1.0000e+04,  2.9183e-01,  5.1828e-01]],

I only have one tag, so the first entry is for that tag, the one for O is not calculated at all, and the last two are START and STOP.


ZhixiuYe commented on May 27, 2024

I added some annotations: c7142f2#diff-e90865298a808f704cff7317a658876e
These four entries are a tag (like PER), STOP, START, and O, respectively.
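Applying that column order to the matrix printed earlier in the thread shows which entry is masked. The sketch assumes each row holds the four entries in the stated order; the label list is written out by hand from the comment above, with "TAG" standing in for the single entity tag.

```python
# Column order as stated above: entity tag, STOP, START, O.
COLUMN_NAMES = ["TAG", "STOP", "START", "O"]

rows = [
    [6.1834e-01, -1.0000e+04, -4.2706e-01, 2.8736e-01],
    [-4.2289e-01, -1.0000e+04, -4.3145e-02, -1.0890e+00],
    [-5.2040e-01, -1.0000e+04, -3.2427e-01, -1.1558e+00],
    [-6.0971e-01, -1.0000e+04, 2.9183e-01, 5.1828e-01],
]

# A column counts as masked when every row holds the large negative constant.
masked = [name for k, name in enumerate(COLUMN_NAMES)
          if all(row[k] <= -9999.0 for row in rows)]
print(masked)  # ['STOP']
```

With this ordering the masked column is STOP, and the last column, which holds ordinary values, is the score of O.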

