The mx-lsoftmax from luoyetx

Why subtract w[yi] in `x_grad[i] += alpha * o_grad[i, yi] * (df_dx - w[yi])`?

Do you mind explaining this part? I don't understand why w[yi] needs to be subtracted here.
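One way to see where the `- w[yi]` comes from, as a sketch under stated assumptions: suppose the backward pass first computes the plain fully connected gradient (`x_grad = o_grad.dot(w)`), which already contains `o_grad[i, yi] * w[yi]` for the target class, and suppose the target logit is `(psi + beta * f) / (1 + beta)` with `alpha = 1 / (1 + beta)`. The correction then has to remove an `alpha` share of the plain `w[yi]` term and add the same share of `df_dx`, which is exactly `alpha * o_grad[i, yi] * (df_dx - w[yi])`, because g·w + α·g·(ψ' − w) = g·(β·w + ψ')/(1 + β). The NumPy check below uses a hypothetical `psi_logit` (the m = 2, k = 0 branch) and finite differences; it is not the repository's code.

```python
import numpy as np

def psi_logit(x, w):
    # Hypothetical margin logit |w||x|cos(2*theta): the m = 2, k = 0 branch of psi
    nx, nw = np.linalg.norm(x), np.linalg.norm(w)
    cos_t = (w @ x) / (nx * nw)
    return nx * nw * (2 * cos_t**2 - 1)

def target_logit(x, w, beta):
    # Assumed modified target logit: (psi + beta * f) / (1 + beta), with f = w . x
    return (psi_logit(x, w) + beta * (w @ x)) / (1 + beta)

def num_grad(fn, v, eps=1e-6):
    # Central finite differences of a scalar function fn at v
    g = np.zeros_like(v)
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = eps
        g[j] = (fn(v + e) - fn(v - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x, w = rng.normal(size=4), rng.normal(size=4)   # one sample x_i and the target row w[yi]
beta, o_grad = 10.0, 0.7                         # o_grad stands in for o_grad[i, yi]
alpha = 1.0 / (1 + beta)

x_grad = o_grad * w                              # what the plain FC backward already added
df_dx = num_grad(lambda v: psi_logit(v, w), x)
x_grad += alpha * o_grad * (df_dx - w)           # the line from the question

expected = o_grad * num_grad(lambda v: target_logit(v, w, beta), x)
print(np.allclose(x_grad, expected, atol=1e-6))  # True
```

In other words, `- w[yi]` does not come from the margin function itself; it undoes part of the ordinary linear gradient that was already accumulated before the correction.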

Can this loss be used for multi-label classification?

NaN values

I reimplemented it in TensorFlow, but found it hard to train.
It very easily produces NaN values.
How can this be avoided?
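Two common sources of NaN in an L-Softmax port are worth ruling out first. This is a NumPy sketch of where the problem might be, not a diagnosis: a zero-norm x or w makes cos θ = f / (|w||x|) divide by zero, and rounding can push cos θ slightly outside [−1, 1], where arccos returns NaN. `safe_cos_theta` and `find_k` are hypothetical helper names; the same two guards (an epsilon in the denominator, a clip) carry over to a TensorFlow reimplementation.

```python
import numpy as np

def safe_cos_theta(x, w, eps=1e-12):
    """cos(theta) between x and w, guarded against the two usual NaN sources."""
    # eps keeps the denominator non-zero when x or w is an all-zero vector
    denom = np.linalg.norm(x) * np.linalg.norm(w) + eps
    # Rounding can push the ratio slightly outside [-1, 1]; arccos would then return NaN
    return float(np.clip(np.dot(x, w) / denom, -1.0, 1.0))

def find_k(cos_t, margin):
    """Segment index k with theta in [k*pi/m, (k+1)*pi/m], clamped to [0, m-1]."""
    theta = np.arccos(cos_t)
    return min(int(np.floor(theta * margin / np.pi)), margin - 1)

x = np.array([1.0, 2.0, 3.0])
w = 2.0 * x                                   # parallel vectors: cos(theta) should be 1
print(find_k(safe_cos_theta(x, w), margin=4))  # 0
print(safe_cos_theta(np.zeros(3), w))          # 0.0, not NaN
with np.errstate(invalid="ignore"):
    print(np.arccos(1.0 + 1e-12))              # nan: the case the clip prevents
```

The epsilon slightly biases cos θ toward 0 for vectors with tiny norms; that is usually an acceptable trade for never dividing by zero.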

The loss suddenly becomes NaN.

I set the parameters as follows:
beta=1000
margin=4
scale=0.9997
beta_min=5

After some iterations, the cross-entropy loss suddenly becomes NaN. I am using the C++ layer compiled into MXNet. Does anyone know what causes this and how to solve it?
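To see how far beta has decayed by the time the NaN appears, it may help to tabulate the schedule. The sketch assumes beta is multiplied by `scale` once per training iteration and floored at `beta_min`; that rule and the helper names are assumptions to check against the operator source. With beta=1000, scale=0.9997 and beta_min=5, beta reaches the floor after about 17,660 iterations. If the target logit is weighted as `(psi + beta * f) / (1 + beta)`, the share given to the margin term grows from about 0.001 to 1/6 over that period, so it may be worth comparing that point with the iteration at which the loss diverges.

```python
import math

def beta_schedule(beta, scale, beta_min, num_iters):
    """Assumed decay rule: multiply beta by `scale` each iteration, floor at beta_min."""
    betas = []
    for _ in range(num_iters):
        betas.append(beta)
        beta = max(beta * scale, beta_min)
    return betas

def iters_to_floor(beta, scale, beta_min):
    # Smallest n with beta * scale**n <= beta_min (closed form of the rule above)
    return math.ceil(math.log(beta_min / beta) / math.log(scale))

n = iters_to_floor(1000.0, 0.9997, 5.0)
betas = beta_schedule(1000.0, 0.9997, 5.0, n + 2)
print(n, betas[n // 2], betas[-1])
```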

Is it possible to add a "grad_scale" argument to the lsoftmax operator?

I think grad_scale can be important for any loss function, so it would be good if you could add this argument to your operator.
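One observation that may make a separate argument unnecessary, sketched under assumptions: a backward pass is linear in the incoming gradient, so if LSoftmax feeds a `SoftmaxOutput` whose `grad_scale` is set, the gradients LSoftmax computes for x and w are already scaled by the same factor. The NumPy sketch uses a plain FC backward as a stand-in for LSoftmax; `softmax_ce_backward` and `fc_backward` are hypothetical helpers.

```python
import numpy as np

def softmax_ce_backward(logits, labels, grad_scale=1.0):
    """Gradient of softmax cross-entropy w.r.t. logits, multiplied by grad_scale."""
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0          # softmax(z) - onehot(y)
    return grad_scale * p

def fc_backward(o_grad, x, w):
    # Vector-Jacobian product of out = x @ w.T (stand-in for the LSoftmax backward)
    return o_grad @ w, o_grad.T @ x

rng = np.random.default_rng(0)
x, w = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
labels = np.array([0, 1, 2, 3])
logits = x @ w.T
gx1, gw1 = fc_backward(softmax_ce_backward(logits, labels), x, w)
gx2, gw2 = fc_backward(softmax_ce_backward(logits, labels, grad_scale=0.25), x, w)
print(np.allclose(gx2, 0.25 * gx1), np.allclose(gw2, 0.25 * gw1))  # True True
```

The same linearity holds for the margin correction terms, since each of them is also multiplied by `o_grad[i, yi]`.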

Hi! I have a problem when running mnist.py with parameters gpu=0 and op-impl='py':
`Namespace(batch_size=128, beta=100.0, beta_min=0, gpu=0, lr=0.01, margin=1, model_prefix='model/mnist', no_lsoftmax=False, num_epoch=20, op_impl='py', profile=False, scale=0.99, test=True, train=True)
[18:38:20] src/io/iter_mnist.cc:94: MNISTIter: load 60000 images, shuffle=1, shape=(128,1,28,28)
[18:38:21] src/io/iter_mnist.cc:94: MNISTIter: load 10000 images, shuffle=1, shape=(128,1,28,28)
Error in LSoftmax.infer_type: Traceback (most recent call last):
File "/home/sasha/mxnet/python/mxnet/operator.py", line 650, in infer_type_entry
types = [_DTYPE_MX_TO_NP[tensor_types[i]] for i in range(n_in)]
KeyError: -1

[18:38:21] /home/sasha/mxnet/dmlc-core/include/dmlc/./logging.h:304: [18:38:21] src/operator/custom/./custom-inl.h:231: Check failed: reinterpret_cast(info_->callbacks[kCustomOpPropInferType])( types.size(), types.data(), info_->contexts[kCustomOpPropInferType])

Stack trace returned 10 entries:
[bt] (0) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fce50b52bdc]
[bt] (1) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(ZNK5mxnet2op12CustomOpProp9InferTypeEPSt6vectorIiSaIiEES5_S5+0xf17) [0x7fce516edc37]
[bt] (2) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x12d4fa3) [0x7fce51787fa3]
[bt] (3) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x266ef7a) [0x7fce52b21f7a]
[bt] (4) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2670282) [0x7fce52b23282]
[bt] (5) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x26710ee) [0x7fce52b240ee]
[bt] (6) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x32c) [0x7fce52b4d75c]
[bt] (7) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm9ApplyPassENS_5GraphERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x3c9) [0x7fce51b5f9e9]
[bt] (8) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm4pass9InferTypeENS_5GraphESt6vectorIiSaIiEENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1d8) [0x7fce51bb74c8]
[bt] (9) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor4InitEN4nnvm6SymbolERKNS_7ContextERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4lessISD_ESaISt4pairIKSD_S4_EEERKSt6vectorIS4_SaIS4_EESR_SR_RKSt13unordered_mapISD_NS2_6TShapeESt4hashISD_ESt8equal_toISD_ESaISG_ISH_ST_EEERKSS_ISD_iSV_SX_SaISG_ISH_iEEERKSN_INS_9OpReqTypeESaIS18_EERKSt13unordered_setISD_SV_SX_SaISD_EEPSN_INS_7NDArrayESaIS1I_EES1L_S1L_PSS_ISD_S1I_SV_SX_SaISG_ISH_S1I_EEEPNS_8ExecutorERKSS_INS2_9NodeEntryES1I_NS2_13NodeEntryHashENS2_14NodeEntryEqualESaISG_IKS1S_S1I_EEE+0x750) [0x7fce51baff10]

Traceback (most recent call last):
File "mnist.py", line 201, in
train()
File "mnist.py", line 89, in train
epoch_end_callback=mx.callback.do_checkpoint(args.model_prefix))
File "/home/sasha/mxnet/python/mxnet/module/base_module.py", line 459, in fit
for_training=True, force_rebind=force_rebind)
File "/home/sasha/mxnet/python/mxnet/module/module.py", line 399, in bind
state_names=self.state_names)
File "/home/sasha/mxnet/python/mxnet/module/executor_group.py", line 214, in init
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/home/sasha/mxnet/python/mxnet/module/executor_group.py", line 310, in bind_exec
shared_group))
File "/home/sasha/mxnet/python/mxnet/module/executor_group.py", line 586, in bind_ith_exec
shared_buffer=shared_data_arrays, **input_shapes)
File "/home/sasha/mxnet/python/mxnet/symbol.py", line 1433, in simple_bind
raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (128, 1L, 28L, 28L)
softmax_label: (128,)
Error in operator custom0: [18:38:21] src/operator/custom/./custom-inl.h:231: Check failed: reinterpret_cast(info->callbacks[kCustomOpPropInferType])( types.size(), types.data(), info->contexts[kCustomOpPropInferType])

Stack trace returned 10 entries:
[bt] (0) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fce50b52bdc]
[bt] (1) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(ZNK5mxnet2op12CustomOpProp9InferTypeEPSt6vectorIiSaIiEES5_S5+0xf17) [0x7fce516edc37]
[bt] (2) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x12d4fa3) [0x7fce51787fa3]
[bt] (3) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x266ef7a) [0x7fce52b21f7a]
[bt] (4) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2670282) [0x7fce52b23282]
[bt] (5) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(+0x26710ee) [0x7fce52b240ee]
[bt] (6) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x32c) [0x7fce52b4d75c]
[bt] (7) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm9ApplyPassENS_5GraphERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x3c9) [0x7fce51b5f9e9]
[bt] (8) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm4pass9InferTypeENS_5GraphESt6vectorIiSaIiEENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1d8) [0x7fce51bb74c8]
[bt] (9) /home/sasha/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor4InitEN4nnvm6SymbolERKNS_7ContextERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4lessISD_ESaISt4pairIKSD_S4_EEERKSt6vectorIS4_SaIS4_EESR_SR_RKSt13unordered_mapISD_NS2_6TShapeESt4hashISD_ESt8equal_toISD_ESaISG_ISH_ST_EEERKSS_ISD_iSV_SX_SaISG_ISH_iEEERKSN_INS_9OpReqTypeESaIS18_EERKSt13unordered_setISD_SV_SX_SaISD_EEPSN_INS_7NDArrayESaIS1I_EES1L_S1L_PSS_ISD_S1I_SV_SX_SaISG_ISH_S1I_EEEPNS_8ExecutorERKSS_INS2_9NodeEntryES1I_NS2_13NodeEntryHashENS2_14NodeEntryEqualESaISG_IKS1S_S1I_EEE+0x750) [0x7fce51baff10]
`

Why is the case p=0 singled out when you calculate the derivative of `cos(m theta)`?
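A plausible reason, sketched under assumptions: if cos(mθ) is expanded as Σ_p (−1)^p C(m, 2p) cos^(m−2p)θ (1 − cos²θ)^p, the derivative of the p-th term with respect to cos θ contains a factor (1 − cos²θ)^(p−1). For p = 0 that is a negative power, which divides by zero when cos θ = ±1 (an exception in plain Python; inf and then 0 · inf = NaN in NumPy), even though its coefficient 2p is 0. Handling p = 0 separately as m·cos^(m−1)θ avoids this. `cos_mt` and `dcos_mt_dcos_t` are illustrative names, not necessarily those in the repository.

```python
import math

def cos_mt(cos_t, m):
    # cos(m*theta) = sum_p (-1)^p C(m, 2p) cos^(m-2p) * sin^(2p), with sin^2 = 1 - cos^2
    sin2 = 1.0 - cos_t**2
    return sum((-1) ** p * math.comb(m, 2 * p) * cos_t ** (m - 2 * p) * sin2**p
               for p in range(m // 2 + 1))

def dcos_mt_dcos_t(cos_t, m):
    sin2 = 1.0 - cos_t**2
    # p = 0: only cos^m depends on cos_t; the general formula would need sin2 ** -1
    total = m * cos_t ** (m - 1)
    for p in range(1, m // 2 + 1):
        coef = (-1) ** p * math.comb(m, 2 * p)
        term = 0.0
        if m - 2 * p > 0:  # skip the zero-coefficient term that would need cos_t ** -1
            term += (m - 2 * p) * cos_t ** (m - 2 * p - 1) * sin2**p
        term -= 2 * p * cos_t ** (m - 2 * p + 1) * sin2 ** (p - 1)  # p >= 1: safe power
        total += coef * term
    return total

print(cos_mt(0.3, 4), math.cos(4 * math.acos(0.3)))  # both 8c^4 - 8c^2 + 1
print(dcos_mt_dcos_t(1.0, 3))                        # 9.0, i.e. m^2 at cos(theta) = 1
```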

follow wy1iu/LargeMargin_Softmax_Loss

The author has open-sourced the code at https://github.com/wy1iu/LargeMargin_Softmax_Loss. We should follow the lambda-decay strategy he uses; some modifications may also be applicable.
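For comparison with a per-iteration multiplicative decay, here is a sketch of an inverse-decay schedule of the form λ = max(base · (1 + γ·iter)^(−power), λ_min). I believe the official Caffe layer uses a schedule of roughly this shape, but that is an assumption here and should be checked against the repository; `inv_decay` and all parameter values are hypothetical.

```python
def inv_decay(iteration, base, gamma, power, lambda_min):
    """Hypothetical inverse-decay schedule, floored at lambda_min."""
    return max(base * (1.0 + gamma * iteration) ** (-power), lambda_min)

# Illustrative values only; not taken from either repository
schedule = [inv_decay(t, base=1000.0, gamma=0.12, power=1.0, lambda_min=5.0)
            for t in range(0, 20001, 2000)]
print([round(v, 2) for v in schedule])
```

This schedule falls quickly at first and then flattens, whereas `beta *= scale` shrinks by the same fraction every iteration.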

Why is `le = lambda x, y: x < y or abs(x-y) < eps` needed?

If the function is smooth, shouldn't it make no difference whether k or k+1 is chosen when theta lies exactly on the boundary between segments k and k+1?
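A quick check supports the premise of the question. With ψ(θ) = (−1)^k cos(mθ) − 2k on [kπ/m, (k+1)π/m], both neighbouring segments give the value 1 − 2k at the boundary θ = kπ/m, and the derivative −(−1)^k m sin(mθ) is 0 there from either side. This suggests the `eps` tolerance is a floating-point safeguard (so that a cos θ computed a hair off the boundary still selects a valid k) rather than a mathematical necessity. `psi` is an illustrative helper that takes k explicitly.

```python
import math

def psi(theta, m, k):
    """Piecewise margin function evaluated with an explicitly chosen segment index k."""
    return (-1) ** k * math.cos(m * theta) - 2 * k

m = 4
for k in range(1, m):
    b = k * math.pi / m                          # boundary between segments k-1 and k
    left, right = psi(b, m, k - 1), psi(b, m, k)
    print(k, round(left, 12), round(right, 12))  # both equal 1 - 2k
```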

How to run c++ code?

Hi, I am new to MXNet and would like to know how to run your code (the C++ version). Could you help me, please?

L-Softmax not building

I tried using the LSoftmax files but got the following errors:

perator/lsoftmax.o
src/operator/lsoftmax.cc:30:3: error: stray '\302' in program
   <title>insightface/lsoftmax.cc at master · deepinsight/insightface · GitHub</title>
   ^
src/operator/lsoftmax.cc:30:3: error: stray '\267' in program
src/operator/lsoftmax.cc:30:3: error: stray '\302' in program
src/operator/lsoftmax.cc:30:3: error: stray '\267' in program
src/operator/lsoftmax.cc:159:10: warning: missing terminating ' character
     <!-- '"` --><!-- </textarea></xmp> --></option></form><form class="js-site-search-form" data-scope-type="Repository" data-scope-id="102057483" data-scoped-search-url="/deepinsight/insightface/search" data-unscoped-search-url="/search" action="/deepinsight/insightface/search" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="&#x2713;" />
          ^
src/operator/lsoftmax.cc:159:5: error: missing terminating ' character
     <!-- '"` --><!-- </textarea></xmp> --></option></form><form class="js-site-search-form" data-scope-type="Repository" data-scope-id="102057483" data-scoped-search-url="/deepinsight/insightface/search" data-unscoped-search-url="/search" action="/deepinsight/insightface/search" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="&#x2713;" />
     ^
src/operator/lsoftmax.cc:506:69: error: stray '#' in program
         <td id="LC7" class="blob-code blob-code-inner js-file-line">#<span class="pl-k">include</span> <span class="pl-s"><span class="pl-pds">&quot;</span>./lsoftmax-inl.h<span class="pl-pds">&quot;</span></span></td>
                                                                     ^
src/operator/lsoftmax.cc:812:10: warning: missing terminating ' character
     <!-- '"` --><!-- </textarea></xmp> --></option></form><form class="js-jump-to-line-form" action="" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="&#x2713;" />
          ^
src/operator/lsoftmax.cc:812:5: error: missing terminating ' character
     <!-- '"` --><!-- </textarea></xmp> --></option></form><form class="js-jump-to-line-form" action="" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="&#x2713;" />
     ^
src/operator/lsoftmax.cc:864:5: error: stray '\342' in program
     You can’t perform that action at this time.
     ^
src/operator/lsoftmax.cc:864:5: error: stray '\200' in program
src/operator/lsoftmax.cc:864:5: error: stray '\231' in program
src/operator/lsoftmax.cc:7:1: error: expected unqualified-id before '<' token
 <!DOCTYPE html>
 ^
src/operator/lsoftmax.cc:506:150: error: expected unqualified-id before '<' token
         <td id="LC7" class="blob-code blob-code-inner js-file-line">#<span class="pl-k">include</span> <span class="pl-s"><span class="pl-pds">&quot;</span>./lsoftmax-inl.h<span class="pl-pds">&quot;</span></span></td>
                                                                                                                                                      ^
src/operator/lsoftmax.cc:506:200: error: expected unqualified-id before '<' token
         <td id="LC7" class="blob-code blob-code-inner js-file-line">#<span class="pl-k">include</span> <span class="pl-s"><span class="pl-pds">&quot;</span>./lsoftmax-inl.h<span class="pl-pds">&quot;</span></span></td>
                                                                                                                                                                                                        ^
Makefile:431: recipe for target 'build/src/operator/lsoftmax.o' failed
make: *** [build/src/operator/lsoftmax.o] Error 1

@luoyetx It would be helpful if anyone could explain this error.

Why do we need to subtract w_y_i at the end?

I don't understand why we need to subtract w_y_i (or x_i) at the end of the formula. Sorry if this is a silly question.
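The same bookkeeping explains the weight side, where x_i plays the role that w_yi plays for the input gradient. Assuming the backward pass starts from the plain FC gradient `w_grad = o_grad.T.dot(x)` (which already adds o_grad[i, yi] · x_i to row yi) and the target logit is (ψ + β·f)/(1 + β), the required correction is α · o_grad[i, yi] · (df/dw − x_i) with α = 1/(1 + β), because g·x_i + α·g·(ψ' − x_i) = g·(β·x_i + ψ')/(1 + β). A NumPy check with a hypothetical `psi_logit`; it is a sketch, not the repository's code:

```python
import numpy as np

def psi_logit(x, w):
    # Hypothetical margin logit |w||x|cos(2*theta): the m = 2, k = 0 branch of psi
    nx, nw = np.linalg.norm(x), np.linalg.norm(w)
    cos_t = (w @ x) / (nx * nw)
    return nx * nw * (2 * cos_t**2 - 1)

def num_grad(fn, v, eps=1e-6):
    # Central finite differences of a scalar function fn at v
    g = np.zeros_like(v)
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = eps
        g[j] = (fn(v + e) - fn(v - e)) / (2 * eps)
    return g

def target_logit_w(w, x, beta):
    # Assumed modified target logit as a function of w: (psi + beta * f) / (1 + beta)
    return (psi_logit(x, w) + beta * (w @ x)) / (1 + beta)

rng = np.random.default_rng(1)
x, w = rng.normal(size=4), rng.normal(size=4)
beta, o_grad = 10.0, 0.7
alpha = 1.0 / (1 + beta)

w_grad = o_grad * x                               # plain FC backward: d(w . x)/dw = x
df_dw = num_grad(lambda v: psi_logit(x, v), w)
w_grad += alpha * o_grad * (df_dw - x)            # analogue of the x-side correction

expected = o_grad * num_grad(lambda v: target_logit_w(v, x, beta), w)
print(np.allclose(w_grad, expected, atol=1e-6))   # True
```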

Question about the C++ API

In the following code, where are the two parameters (data, label) defined in lsoftmax-inl.h, and how does Python know about them?
`fc4 = mx.sym.LSoftmax(data=embedding, label=label, num_hidden=10, beta=args.beta, margin=args.margin, scale=args.scale, beta_min=args.beta_min, verbose=True)`

Make beta play exactly the same role as lambda

The current implementation uses beta to weight f_i_yi = |w_yi||x_i|cos(mt) instead of using lambda to weight f_i_yi = |w_yi||x_i|cos(t), because lambda is a keyword in Python and cannot be used as a variable name. As described in the Large-Margin Softmax paper, we may want to gradually reduce lambda to 0, which means gradually increasing beta to a number as large as 1000. That is rather odd. I am considering using beta exactly as lambda, to weight the original f_i_yi = |w_yi||x_i|cos(t), and adding a scale parameter to the op to gradually reduce it to 0 during training.
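Under the two weightings as this issue describes them, the parameters are simply reciprocal: (β·f_m + f)/(1 + β) equals (λ·f + f_m)/(1 + λ) when λ = 1/β, which is why driving λ toward 0 corresponds to pushing β toward large values such as 1000. A small check, where `f` stands for |w_yi||x_i|cos(t), `f_m` for the margin logit, and both function names are hypothetical:

```python
def target_logit_beta(f, f_m, beta):
    # Weighting as described for the current implementation: beta weights the margin logit f_m
    return (beta * f_m + f) / (1 + beta)

def target_logit_lambda(f, f_m, lam):
    # Paper-style weighting: lambda weights the original logit f
    return (lam * f + f_m) / (1 + lam)

f, f_m = 2.5, -1.25
for beta in (1.0, 10.0, 1000.0):
    print(beta, target_logit_beta(f, f_m, beta), target_logit_lambda(f, f_m, 1 / beta))
```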

How to deal with the inference like the usual fc way?

In this code, the last fc layer (rather than the softmax layer) is replaced with the L-Softmax layer during training. How should inference be done? Using the L-Softmax layer may be too costly compared with an fc layer...
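A sketch of why plain fc logits are a reasonable inference path, under the assumption that the margin is applied only to the labeled class during training: no label is available at test time, and the margin only ever lowers the target logit (ψ(θ) ≤ cos θ), so it makes training harder without changing the decision rule argmax_j w_j · x. `psi` and `predict` are illustrative helpers.

```python
import math
import numpy as np

def psi(theta, m):
    # Piecewise margin function from the paper: (-1)^k cos(m*theta) - 2k
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

# The margin can only lower the target logit on [0, pi]
thetas = np.linspace(0.0, math.pi, 1001)
assert all(psi(t, 4) <= math.cos(t) + 1e-12 for t in thetas)

def predict(x, w):
    # Assumed inference path: no label, so use plain FC logits x . w^T
    return (x @ w.T).argmax(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2))    # 8 hypothetical 2-D embeddings
w = rng.normal(size=(10, 2))   # weight rows for 10 classes
print(predict(x, w))
```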

Visualization tools

What tool do you use to visualize the results in the README? Matplotlib looks ugly...

Derivative of f with respect to x

Hello,
Could you help explain why we need to calculate the derivative of f with respect to x (df / dx)?

I am confused by x. My understanding is that we only need to calculate df / dw.

Thanks
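In a deeper network x is not a fixed input: it is the embedding produced by earlier layers (for example, the `embedding` symbol passed as `data`), so df/dx is what the backward pass hands to those layers; without it, only W would learn. A toy NumPy sketch, assuming a hypothetical tanh layer with weights `V` that produces x:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=3)          # input to an earlier layer
V = rng.normal(size=(4, 3))     # earlier layer's weights (hypothetical)
w = rng.normal(size=4)          # classifier weights for one class

def f_of_V(V):
    x = np.tanh(V @ h)          # x is produced by the earlier layer
    return w @ x                # f = w . x, the logit

# Chain rule: df/dV = (df/dx * tanh'(V h)) outer h, and df/dx = w for this logit
x_pre = V @ h
df_dx = w
dV = np.outer(df_dx * (1 - np.tanh(x_pre) ** 2), h)

# Finite-difference check of df/dV
num = np.zeros_like(V)
eps = 1e-6
for i in range(V.shape[0]):
    for j in range(V.shape[1]):
        E = np.zeros_like(V)
        E[i, j] = eps
        num[i, j] = (f_of_V(V + E) - f_of_V(V - E)) / (2 * eps)
print(np.allclose(dV, num, atol=1e-6))  # True
```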

Why this particular construction?

Hi --

I was wondering where you got the idea for the specific construction of the L-softmax. It seems like maybe you could achieve a similar goal by enforcing a margin like

norm(W) * norm(x) * (m * cos(theta) - m + 1)

instead of

norm(W) * norm(x) * cos(m * theta)

as you do in the paper.

The former seems simpler because you don't have to worry about constructing a psi function that behaves well for all values of theta, m doesn't have to be integer valued, etc. Also, in the paper, the gradient of psi is 0 at pi/2, which AFAICT is an undesirable side effect of the choice of psi. Is that right, or is there some reason that grad psi(pi/2) should be 0?

The proposed alternative above would have the same shape as cos in [0, pi] but with a range of [1 - 2m, 1], which seems maybe more natural.

Thoughts? Am I missing something? Did you try this and it stunk in practice?

Thanks
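A quick numerical comparison of the two candidate functions, as a sketch: `psi` follows the paper's piecewise definition ψ(θ) = (−1)^k cos(mθ) − 2k, and `phi` is the alternative m·cos θ − m + 1 from this issue. It suggests that the zero gradient of ψ at π/2 happens only for even m (π/2 is then a segment boundary, where the derivative −(−1)^k m sin(mθ) vanishes), and that both functions span [1 − 2m, 1] on [0, π].

```python
import math

def psi(theta, m):
    # Paper's piecewise psi: (-1)^k cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def phi(theta, m):
    # Alternative from this issue: m*cos(theta) - m + 1
    return m * math.cos(theta) - m + 1

def deriv(fn, theta, m, h=1e-6):
    # Central finite difference in theta
    return (fn(theta + h, m) - fn(theta - h, m)) / (2 * h)

half_pi = math.pi / 2
print(deriv(psi, half_pi, 2), deriv(phi, half_pi, 2))  # about 0 vs about -2
print(deriv(psi, half_pi, 3))                          # about -3: nonzero for odd m
print(psi(math.pi, 2), phi(math.pi, 2))                # both -3 = 1 - 2m
```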

How should scale be set? Should it be 1/num_gpu?

Why are the directions of the digits in the visualization exactly the same?

This happens not only in your README but also in my own training results: the direction of each digit in the visualization is exactly the same. I think the directions should depend to some extent on the random initialization. Is there something in your code that causes this?

Thx~
