Comments (10)
Not sure. I want to say the dataset was around 40GB, and the machine had maybe 128GB of RAM. It was a long time ago though, so take that with a grain of salt.
from mitie.
@davisking, on a 4GB dataset it looks to me like OpenBLAS is running out of memory. I was monitoring the RAM: swap was full, but memory usage was only about 40%, which is strange :/
I will try to increase the dataset incrementally and see when it fails. I have no idea.
lldb -- ./wordrep -e <path>
(lldb) target create "./wordrep"
Current executable set to '<path>' (arm64).
(lldb) settings set -- target.run-args "-e" "<path>"
(lldb) run
Process 46942 launched: '<path>' (arm64)
number of raw ASCII files found: 76
num words: 200000
saving word counts to top_word_counts.dat
number of raw ASCII files found: 76
Sample 50000000 random context vectors
Now do CCA (left size: 50000000, right size: 50000000).
Process 46942 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x6e01ccc1fc)
frame #0: 0x0000000192905900 libLAPACK.dylib`SLARFT + 400
libLAPACK.dylib`SLARFT:
-> 0x192905900 <+400>: ldr s0, [x27, x23, lsl #2]
0x192905904 <+404>: fcmp s0, #0.0
0x192905908 <+408>: b.ne 0x192905974 ; <+516>
0x19290590c <+412>: sub x23, x23, #0x1
Target 0: (wordrep) stopped.
from mitie.
Hard to say. That shouldn't happen though. Try running the program in gdb and getting a stack trace to see what's going on.
from mitie.
@davisking I will do that (I will post results in approx. 2 hours). Do you roughly remember the size of the dataset that the English or Spanish model was trained on, as well as the RAM of the machine?
from mitie.
Okay, thank you. Is there any formula or approximation I can use to calculate the needed amount of RAM? (I want to train a model for Polish on a 50-80GB dataset.) Do you know how big total_word_feature_extractor.dat would be?
from mitie.
I don't recall. It should be linear though. Try some sizes and see what happens. But I guess you are saying you think you are just running out of RAM? I would normally expect a bus error to be something else, but I'm not sure if OS X just reports the error differently.
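As a rough back-of-the-envelope check (my own sketch, not MITIE's actual allocation scheme): the log above shows wordrep sampling 50,000,000 context vectors per side of the CCA. Assuming single-precision (4-byte) floats and treating the context-vector dimensionality as an unknown parameter, the two input matrices alone need roughly `2 * 50e6 * dim * 4` bytes, before counting any LAPACK workspace, which is consistent with the "linear in dataset size" intuition up to the 50M sample cap.

```python
# Rough estimate of the memory the two CCA input matrices need.
# ASSUMPTIONS (not taken from MITIE's source): single-precision (4-byte)
# floats, and a hypothetical context-vector dimensionality `dim`.

def cca_input_bytes(num_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    """Bytes for the left and right context-vector matrices combined."""
    return 2 * num_vectors * dim * bytes_per_float

# The log above shows 50,000,000 vectors per side on the full dataset.
for dim in (100, 200, 500):  # hypothetical dimensionalities
    gb = cca_input_bytes(50_000_000, dim) / 1024**3
    print(f"dim={dim}: ~{gb:.0f} GB just for the input matrices")
```

Even at a modest dim=100 that is tens of gigabytes, so running out of physical RAM plus swap on this workload is plausible.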
from mitie.
@davisking I knew from some of the issues that MITIE requires a lot of RAM, but that's quite surprising :/
dataset -> total_word_feature_extractor.dat
1. 50MB -> 335MB
number of raw ASCII files found: 1
num words: 200000
saving word counts to top_word_counts.dat
number of raw ASCII files found: 1
Sample 50000000 random context vectors
Now do CCA (left size: 8582326, right size: 8582326).
correlations: 0.783697 0.495714 0.428661 0.417896 0.399711 0.308799 0.257686 0.241372 0.214332 0.206914 0.180628 0.151268 0.143147 0.135543 0.129035 0.11404 0.104493 0.094976 0.0889702 0.081165 0.0765073 0.0743562 0.0730994 0.0653017 0.0622959 0.0602029 0.0504991 0.0483258 0.0475654 0.0458159 0.0405218 0.0396676 0.0377837 0.035018 0.0321907 0.0302941 0.0294151 0.0272736 0.0250081 0.023913 0.0227566 0.0216142 0.0207634 0.0199015 0.0191722 0.0172601 0.0167749 0.0165069 0.0156701 0.0152914 0.0150729 0.0147807 0.0135261 0.013273 0.0125623 0.0120217 0.0117249 0.0115241 0.010776 0.0105152 0.0102252 0.00982769 0.00967478 0.00906732 0.00888962 0.00882777 0.00853846 0.00810019 0.00803891 0.00766455 0.0073715 0.00711813 0.00686214 0.0067737 0.00648305 0.00637957 0.00621849 0.00609243 0.00578847 0.00560462 0.00551808 0.00540755 0.00527975 0.0051427 0.00495427 0.0048308 0.00470006 0.00460154 0.00457549 0.00444141
CCA done, now build up average word vectors
num words: 200000
num word vectors loaded: 200000
got word vectors, now learn how they correlate with morphological features.
building morphological vectors
L.size(): 200000
R.size(): 200000
Now running CCA on word <-> morphology...
correlations: 0.972561 0.671965 0.612678 0.579442 0.505745 0.410469 0.370399 0.320987 0.303507 0.295264 0.284905 0.272496 0.260294 0.252493 0.247422 0.243564 0.224549 0.215319 0.211788 0.202069 0.198979 0.193592 0.187116 0.179735 0.177608 0.173987 0.167869 0.165495 0.159846 0.157329 0.152932 0.148687 0.146318 0.144891 0.1425 0.140035 0.138354 0.137298 0.13551 0.133037 0.131869 0.129603 0.129255 0.126594 0.125905 0.123318 0.119747 0.116908 0.116507 0.115395 0.114808 0.111849 0.111262 0.108799 0.108121 0.106949 0.105413 0.103576 0.103322 0.102464 0.101616 0.100574 0.100245 0.0993405 0.0986815 0.0981801 0.0973369 0.0969129 0.0962343 0.0956761 0.0950916 0.0949497 0.0937699 0.0930668 0.092798 0.0924761 0.0915229 0.0906338 0.0902752 0.0897365 0.0893135 0.0891064 0.0884818 0.0879273 0.0873452 0.0871056 0.0864901 0.0862346 0.0858451 0.0855675
morphological feature dimensionality: 90
total word feature dimensionality: 271
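If (an assumption on my part, not stated in the logs) the total feature dimensionality is simply the learned word-vector dimensions concatenated with the morphological ones, the numbers above imply 271 - 90 = 181 word-vector dimensions:

```python
# ASSUMPTION: total = word-vector dims + morphological dims (concatenation).
morph_dim = 90    # "morphological feature dimensionality" from the log
total_dim = 271   # "total word feature dimensionality" from the log
word_dim = total_dim - morph_dim
print(word_dim)  # 181, under this concatenation assumption
```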
2. 100MB -> 337MB
number of raw ASCII files found: 2
num words: 200000
saving word counts to top_word_counts.dat
number of raw ASCII files found: 2
Sample 50000000 random context vectors
Now do CCA (left size: 17204027, right size: 17204027).
correlations: 0.779923 0.512237 0.436919 0.423704 0.412156 0.32337 0.260955 0.245038 0.222629 0.220487 0.186529 0.155599 0.145268 0.140225 0.1293 0.11533 0.10416 0.0969533 0.0920494 0.0812887 0.0788473 0.0760371 0.0711883 0.0664675 0.0616015 0.0582108 0.0499312 0.0481921 0.0471149 0.0457483 0.0405427 0.0391695 0.0366974 0.0343238 0.0317766 0.0301706 0.0284668 0.02708 0.024634 0.0230824 0.0221957 0.0211691 0.0201103 0.0192433 0.0177503 0.0170046 0.0165348 0.0160024 0.0154991 0.014842 0.0144806 0.0140849 0.0132977 0.0126606 0.012073 0.0118559 0.0112087 0.0108456 0.0105672 0.0102032 0.0100555 0.0096982 0.00935652 0.00908955 0.00844935 0.00818099 0.00796244 0.00762995 0.00753236 0.0072838 0.00706413 0.00690194 0.00681546 0.00654413 0.00626689 0.00608943 0.00596117 0.00557491 0.00536563 0.0051692 0.00503521 0.00487138 0.00479157 0.00464975 0.0042314 0.00396397 0.00389778 0.00373412 0.0036037 0.00347846
CCA done, now build up average word vectors
num words: 200000
num word vectors loaded: 200000
got word vectors, now learn how they correlate with morphological features.
building morphological vectors
L.size(): 200000
R.size(): 200000
Now running CCA on word <-> morphology...
correlations: 0.978974 0.713797 0.657182 0.632944 0.558261 0.47269 0.425248 0.372064 0.351736 0.339963 0.320005 0.311463 0.30237 0.294466 0.289986 0.280466 0.265051 0.25838 0.247886 0.235087 0.232196 0.225109 0.224001 0.212538 0.204304 0.202062 0.191768 0.184285 0.182187 0.181151 0.174696 0.169313 0.167562 0.16549 0.160309 0.157552 0.154192 0.152316 0.150774 0.146907 0.144155 0.142709 0.141969 0.138627 0.137163 0.134016 0.132602 0.129624 0.12818 0.126129 0.125144 0.12363 0.122081 0.120231 0.118327 0.116443 0.115346 0.114512 0.113029 0.112386 0.111879 0.111077 0.110131 0.108435 0.107854 0.107057 0.106172 0.105093 0.104389 0.103112 0.102088 0.101712 0.100469 0.0994505 0.0989811 0.0984102 0.0982125 0.0972503 0.096731 0.0956087 0.0950048 0.094337 0.0939208 0.0925872 0.0919503 0.0914311 0.0909645 0.0905782 0.0905216 0.0894758
morphological feature dimensionality: 90
total word feature dimensionality: 271
3. 150MB -> failed
number of raw ASCII files found: 3
num words: 200000
saving word counts to top_word_counts.dat
number of raw ASCII files found: 3
Sample 50000000 random context vectors
Now do CCA (left size: 25828623, right size: 25828623).
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x6f1a629238)
frame #0: 0x0000000192905900 libLAPACK.dylib`SLARFT + 400
libLAPACK.dylib`SLARFT:
-> 0x192905900 <+400>: ldr s0, [x27, x23, lsl #2]
0x192905904 <+404>: fcmp s0, #0.0
0x192905908 <+408>: b.ne 0x192905974 ; <+516>
0x19290590c <+412>: sub x23, x23, #0x1
4. 200MB -> failed
number of raw ASCII files found: 4
num words: 200000
saving word counts to top_word_counts.dat
number of raw ASCII files found: 4
Sample 50000000 random context vectors
Now do CCA (left size: 34482189, right size: 34482189).
Process 49420 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x6f1d74b3b0)
frame #0: 0x0000000192905900 libLAPACK.dylib`SLARFT + 400
libLAPACK.dylib`SLARFT:
-> 0x192905900 <+400>: ldr s0, [x27, x23, lsl #2]
0x192905904 <+404>: fcmp s0, #0.0
0x192905908 <+408>: b.ne 0x192905974 ; <+516>
0x19290590c <+412>: sub x23, x23, #0x1
Target 0: (wordrep) stopped.
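The logged left/right CCA sizes grow almost exactly linearly with dataset size (about 172k vectors per MB), and the crash first appears between the 100MB run (17.2M vectors, OK) and the 150MB run (25.8M vectors, crash). A small sketch using only the numbers from the logs above:

```python
# Left/right CCA sizes from the wordrep logs above (dataset MB -> vectors),
# and whether that run survived the CCA step.
runs = {
    50: (8_582_326, True),
    100: (17_204_027, True),
    150: (25_828_623, False),
    200: (34_482_189, False),
}

# Vectors sampled per MB of input, per run -- nearly constant (~172k/MB),
# i.e. the workload grows linearly with dataset size until the 50M cap.
rates = [vectors / mb for mb, (vectors, _ok) in runs.items()]
print(f"vectors per MB: {min(rates):,.0f} .. {max(rates):,.0f}")

# The failure threshold on this machine therefore sits somewhere between
# the largest successful and the smallest failing run.
ok_max = max(v for v, ok in runs.values() if ok)
fail_min = min(v for v, ok in runs.values() if not ok)
print(f"crash threshold between {ok_max:,} and {fail_min:,} vectors")
```

Bisecting with dataset sizes between 100MB and 150MB would narrow the threshold further, which would help distinguish a hard memory limit from some other size-dependent bug.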
I'm not sure what to do next, do you have any tips?
from mitie.
> But I guess you are saying you think you are just running out of RAM? I would normally expect a bus error to be something else, but I'm not sure if OS X just reports the error differently.
RAM was my initial guess (because I see that memory usage is capped at about 40% while swap is drained), but I could be mistaken :c
from mitie.
@davisking do you have any ideas?
from mitie.
No idea. You will have to debug into it and see what the deal is.
from mitie.