Hi. I pushed gtsam into Debian, and I see (among other things) that it doesn't bui


gtsam doesn't build on i386 or armhf

dkogan commented on August 19, 2024

gtsam doesn't build on i386 or armhf

from gtsam.

Comments (29)

dkogan commented on August 19, 2024

Hi. Thanks for replying.

> I need to spin up an armhf box first, thanks for reporting!

I see the same behavior on i686 and armhf, and since you probably have an amd64 box, you can more easily reproduce the i686 failure. If you have an amd64 install of Debian, install the i686 cross-compiler:

sudo apt install g++-i686-linux-gnu

Then you can compile the offending file as noted in the bug report, but tell it to use the i686 compiler. I see this:

***@***.***:~/debianstuff/gtsam$ i686-linux-gnu-g++ \
  -DBOOST_ALL_NO_LIB \
  -DBOOST_ATOMIC_DYN_LINK \
  -DBOOST_CHRONO_DYN_LINK \
  -DBOOST_DATE_TIME_DYN_LINK \
  -DBOOST_FILESYSTEM_DYN_LINK \
  -DBOOST_PROGRAM_OPTIONS_DYN_LINK \
  -DBOOST_REGEX_DYN_LINK \
  -DBOOST_SERIALIZATION_DYN_LINK \
  -DBOOST_SYSTEM_DYN_LINK \
  -DBOOST_THREAD_DYN_LINK \
  -DBOOST_TIMER_DYN_LINK \
  -DNDEBUG \
  -I"." \
  -I"CppUnitLite" \
  -isystem /usr/include/eigen3 \
  -fstack-protector-strong \
  -Wformat \
  -Werror=format-security \
  -Wno-deprecated-declarations \
  -Wdate-time \
  -D_FORTIFY_SOURCE=2 \
  -DNDEBUG \
  -Wall \
  -fPIC \
  -Wreturn-local-addr \
  -Werror=return-local-addr \
  -Wreturn-type \
  -Werror=return-type \
  -Wformat \
  -Werror=format-security \
  -Wsuggest-override \
  -Wno-unused-local-typedefs \
  -o /tmp/tst.o \
  -c "gtsam/linear/tests/testSparseEigen.cpp"
In file included from ./gtsam/base/FastSet.h:28,
                 from ./gtsam/inference/Key.h:22,
                 from ./gtsam/inference/Ordering.h:23,
                 from ./gtsam/inference/EliminateableFactorGraph.h:26,
                 from ./gtsam/linear/GaussianFactorGraph.h:24,
                 from gtsam/linear/tests/testSparseEigen.cpp:21:
./gtsam/base/Testable.h: In instantiation of 'bool gtsam::assert_equal(const V&, const V&, double) [with V = int]':
gtsam/linear/tests/testSparseEigen.cpp:44:3:   required from here
./gtsam/base/Testable.h:99:26: error: incomplete type 'gtsam::traits<int>' used in nested name specifier
   99 |     if (traits<V>::Equals(actual,expected, tol))
      |         ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
./gtsam/base/Testable.h:102:21: error: incomplete type 'gtsam::traits<int>' used in nested name specifier
  102 |     traits<V>::Print(expected,"expected:\n");
      |     ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
./gtsam/base/Testable.h:103:21: error: incomplete type 'gtsam::traits<int>' used in nested name specifier
  103 |     traits<V>::Print(actual,"actual:\n");
      |     ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~

> Also, do you also build the Python wrapper? Currently I think they need to be built together.

I do, yes. This creates other problems, but I have workarounds. Once all the things build on all the arches, I'll submit more bug reports to hopefully ingest the fixes into your tree.


dkogan commented on August 19, 2024

OK, this issue appears to be fixed, but now the tests fail on armhf: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=armhf&ver=4.2%7E9%2Bdfsg-5&stamp=1692186176&raw=0

The other arches still have problems too: https://buildd.debian.org/status/package.php?p=gtsam

I'm going to have to come back to this later. If you want to work on some of these issues in the meantime, that would be great. The package tree is here: https://buildd.debian.org/status/package.php?p=gtsam


dkogan commented on August 19, 2024

Hi. It looks like the failure mode on i686 and armhf is the same one: the build completes, but the tests fail: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=i386&ver=4.2%7E9%2Bdfsg-5&stamp=1692266835&raw=0

Since you don't need any new hardware on i686, can I get you to look into that? You'll be far more effective at debugging this than I. Thanks.


ProfFan commented on August 19, 2024

I need to spin up an armhf box first, thanks for reporting! Also, do you also build the Python wrapper? Currently I think they need to be built together.


ProfFan commented on August 19, 2024

@dkogan I think you only need to change the EXPECT(assert_equal()) to EXPECT_LONGS_EQUAL()
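The compiler errors quoted in this thread arise when `assert_equal` is instantiated with `V = int`, a type for which no `traits` specialization is visible. The block below is a minimal, self-contained sketch of that mechanism. Everything in the `sketch` namespace (`Point`, `longs_equal`, and the simplified `traits` and `assert_equal`) is a hypothetical stand-in written for illustration, not GTSAM or CppUnitLite code. It suggests why a plain integer comparison avoids the traits lookup entirely.

```cpp
#include <cmath>

namespace sketch {

// The primary template is only declared, so traits<T> is an incomplete type
// unless a specialization exists for T.
template <typename T>
struct traits;

// Hypothetical user type that opts in by specializing traits.
struct Point {
  double x;
};

template <>
struct traits<Point> {
  static bool Equals(const Point& a, const Point& b, double tol) {
    return std::fabs(a.x - b.x) <= tol;
  }
};

// Generic comparison: compiles only for types with a traits specialization.
// Instantiating it with V = int would fail with an "incomplete type" error,
// because traits<int> was never defined.
template <typename V>
bool assert_equal(const V& expected, const V& actual, double tol = 1e-9) {
  return traits<V>::Equals(expected, actual, tol);
}

// Integer comparison that never touches traits, in the spirit of an
// EXPECT_LONGS_EQUAL-style check (hypothetical helper).
inline bool longs_equal(long expected, long actual) {
  return expected == actual;
}

}  // namespace sketch
```

With these definitions, `sketch::assert_equal(sketch::Point{1.0}, sketch::Point{1.0})` compiles and returns true, while integer values would go through `sketch::longs_equal` instead.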


dkogan commented on August 19, 2024

Thanks much. Appears to work. Doing another upload; let's see how it does...


ProfFan commented on August 19, 2024

@dkogan I think we currently only support 64-bit platforms, is this supported on Debian?


dkogan commented on August 19, 2024

Fan Jiang writes:

> @dkogan I think we currently only support 64-bit platforms, is this supported on Debian?

What is "support"? There's no reason gtsam should be arch-specific: it's a tool to solve a math problem. The 10 arches at the top of the buildd page are the ones that must be functional for this to be included in Debian: https://buildd.debian.org/status/package.php?p=gtsam

The build issues are varied.

armel wants to be linked with -latomic. I asked for that in the build, but apparently only some parts of the build respected that setting. This is almost certainly a bug in the gtsam build system: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=armel&ver=4.2%7E9%2Bdfsg-5&stamp=1692172353&raw=0

armhf has failing tests: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=armhf&ver=4.2%7E9%2Bdfsg-5&stamp=1692186176&raw=0

mips64el has some linker problem I haven't debugged yet: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=mips64el&ver=4.2%7E9%2Bdfsg-5&stamp=1692184823&raw=0

mipsel also wants -latomic, and apparently the debug symbols are too large to fit into its ELF segments: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=mipsel&ver=4.2%7E9%2Bdfsg-5&stamp=1692192262&raw=0

And so on. None of these are inherent limitations in gtsam. It's all fixable, but will take some time. I'll get back to it eventually, but if you want to work on it in the meantime, you can.
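As background for the -latomic remarks: on some 32-bit targets, 64-bit atomic operations may not map to native instructions, and GCC then emits calls to helper functions that live in libatomic. The sketch below is an illustration under that assumption, with hypothetical function names. It queries whether 64-bit atomics are lock-free on the current target and exercises one.

```cpp
#include <atomic>
#include <cstdint>

// Reports whether a 64-bit atomic is lock-free on this target. When it is
// not, GCC typically routes the operations through __atomic_* helpers in
// libatomic, so every binary that uses them needs -latomic at link time.
inline bool atomic64_is_lock_free() {
  std::atomic<std::uint64_t> probe{0};
  return probe.is_lock_free();
}

// Increments a 64-bit atomic counter n times and returns the final value.
inline std::uint64_t count_to(std::uint64_t n) {
  std::atomic<std::uint64_t> counter{0};
  for (std::uint64_t i = 0; i < n; ++i) {
    counter.fetch_add(1, std::memory_order_relaxed);
  }
  return counter.load();
}
```

If only part of a build passes the linker flag, targets that use such atomics without it would fail to link, which is consistent with the symptom described above.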


ProfFan commented on August 19, 2024

@dkogan I can look at it this week


ProfFan commented on August 19, 2024

Tagging @jlblancoc, since this is a Debian-related issue.


jlblancoc commented on August 19, 2024

First, thanks a lot @dkogan for finally being brave enough to move this forward! :-)

All these errors come from assumptions about the sizes of "int", "long", and "size_t" that had never been exercised before and that do not hold on those "uncommon" archs. They are not directly related to Debian packaging or flags per se...
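To make the size differences concrete: on ILP32 targets such as i386 and armhf, `long` and `size_t` are 4 bytes, while on LP64 targets such as amd64 they are 8 bytes. The following is a small sketch, with hypothetical names, that reports the native widths and contrasts them with fixed-width types, whose size does not change across targets.

```cpp
#include <cstddef>
#include <cstdint>

// Widths in bytes of the built-in types on the current target.
struct Widths {
  std::size_t int_bytes;
  std::size_t long_bytes;
  std::size_t size_t_bytes;
};

inline Widths native_widths() {
  return {sizeof(int), sizeof(long), sizeof(std::size_t)};
}

// True on LP64 targets (e.g. amd64), false on ILP32 targets (e.g. i386).
constexpr bool size_t_is_64_bit = sizeof(std::size_t) == sizeof(std::uint64_t);

// Fixed-width types keep the same size on every target, so code that needs
// exactly 64 or 32 bits can state that directly.
static_assert(sizeof(std::uint64_t) == 8, "uint64_t is always 8 bytes");
static_assert(sizeof(std::int32_t) == 4, "int32_t is always 4 bytes");
```

Code that silently assumes `size_t_is_64_bit` is true would behave differently on the 32-bit arches listed on the buildd page.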


jlblancoc commented on August 19, 2024

PS: The comment above was for build errors.

For failing tests, I cannot see a clear reason from the buildd logs. In the past, reasons for tests to fail in other projects on "uncommon" archs have been:

- Undefined / uninitialized memory on some archs. I recommend running the failing tests under valgrind. I added targets named make testFoo.valgrind to help debug that. Also, temporarily, @dkogan, it would be great to add valgrind to Build-Depends and run make check_valgrind instead of make check to see all the valgrind reports from the build farm. Or, if you have Debian Developer access to the build boxes (I don't), try that locally without involving a complete DD upload...
- Thresholds in numerical algorithms: many numerical methods that check residuals, traces, etc. (e.g. rank checking of a matrix) produce significantly different numbers on i386 or other archs than on amd64 / arm64, so thresholds may need to be updated or adapted, either for all cases or with a conditional #if for those archs.
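The threshold point can be sketched as follows. This is an illustrative comparison helper, not code from gtsam: `approx_equal`, `kSketchTol`, and the tolerance values are all hypothetical. It combines an absolute and a relative tolerance, and shows how a per-arch constant could be selected with a conditional `#if`.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical per-arch tolerance: a looser value on i386, where results can
// differ slightly from amd64. The numbers are placeholders, not tuned values.
#if defined(__i386__)
constexpr double kSketchTol = 1e-8;
#else
constexpr double kSketchTol = 1e-9;
#endif

// Compares two doubles using the larger of an absolute tolerance and a
// relative tolerance scaled by the magnitude of the inputs.
inline bool approx_equal(double a, double b,
                         double rel_tol = kSketchTol,
                         double abs_tol = 1e-12) {
  const double diff = std::fabs(a - b);
  const double scale = std::max(std::fabs(a), std::fabs(b));
  return diff <= std::max(abs_tol, rel_tol * scale);
}
```

A relative tolerance tends to be more robust than a fixed absolute one when the compared quantities vary widely in magnitude, which is one way to avoid per-arch special cases altogether.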

It's still a long track ahead, but it's worth it! 👍


dkogan commented on August 19, 2024

Hi. Yes. You could run valgrind or use the Debian build infrastructure and so on. But let's start with i686. This doesn't require anything that we all don't have already, and should be very simple to reproduce. I don't have the cycles currently. Can one of you please look at it? Thanks.


ProfFan commented on August 19, 2024

I think I know what the issue is: we expect Key to be the same as Eigen::Index, which is 32-bit on some archs. I would say we now have a failure case for #1522.
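A minimal sketch of the hazard described here, assuming a 64-bit key type and a pointer-sized index type: converting a key with high bits set into a 32-bit index silently changes its value. The aliases `Key` and `Index` below are stand-ins defined locally for illustration, and `fits` and `checked_index` are hypothetical helpers, not part of gtsam or Eigen.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

namespace sketch {

using Key = std::uint64_t;     // stand-in for a 64-bit key type
using Index = std::ptrdiff_t;  // stand-in for a pointer-sized index type

// True when the key can be stored in IndexT without changing its value.
template <typename IndexT>
bool fits(Key key) {
  return key <= static_cast<Key>(std::numeric_limits<IndexT>::max());
}

// Converts a key to an index, throwing instead of truncating silently.
template <typename IndexT = Index>
IndexT checked_index(Key key) {
  if (!fits<IndexT>(key)) {
    throw std::overflow_error("key does not fit in the index type");
  }
  return static_cast<IndexT>(key);
}

}  // namespace sketch
```

On a target where the index type is 32 bits wide, `sketch::checked_index` would reject any key above 2^31 - 1 rather than wrap it, which makes this class of bug visible at the point of conversion.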


dkogan commented on August 19, 2024

Fan Jiang writes:

> I think I know what is the issue, it's because we expect Key to be the same as Eigen::Index which is 32 bit in some archs. I would say now we have a failure case for #1522

That looks like a good candidate. Are the patches in that PR more or less done? Should I try applying them to the Debian builds?


ProfFan commented on August 19, 2024

> Fan Jiang writes:
> > I think I know what is the issue, it's because we expect Key to be the same as Eigen::Index which is 32 bit in some archs. I would say now we have a failure case for #1522
> That looks like a good candidate. Are the patches in that PR more or less done? Should I try applying them to the Debian builds?

@dkogan Yes that would be a good test, let's apply and see what happens


dkogan commented on August 19, 2024

The patches in #1522 fixed most of the test failures but not all. I just tried on armhf, and there's one failure still:

 69/221 Test  #69: testSymbolicFactorGraph ............***Failed    0.03 sec
Not equal:
expected:
: cliques: 4, variables: 6
expected:
- P( e0 l0 b0)
expected:
| - P( s0 | b0 l0)
expected:
| - P( t0 | e0 l0)
expected:
| - P( x0 | e0)
actual:
: cliques: 4, variables: 6
actual:
- P( e0 l0 b0)
actual:
| - P( t0 | e0 l0)
actual:
| - P( x0 | e0)
actual:
| - P( s0 | b0 l0)
./gtsam/symbolic/tests/testSymbolicFactorGraph.cpp:93: Failure: "assert_equal(asiaBayesTree, actual2)" 
There were 1 failures

Y'all should test all of this yourselves, I think. I would start with i686. I suggest using a Debian install on your native arch (presumably amd64) and cross-building to i686. This is trivial to set up, and works without emulation. Let me know if you need help.


dellaert commented on August 19, 2024

If you can root-cause it and suggest a fix/PR that I think would be the most helpful. Might be some cross-platform non-determinism that we have not yet encountered.


dkogan commented on August 19, 2024

This has a lot of gtsam-specific complexity, and you can debug and fix this much faster than I can. Building on i686 is trivial for you to do (as described above), and you should start there. Let me know if you need help setting this up.


ProfFan commented on August 19, 2024

This seems to be just an ordering issue in the comparison of the Bayes trees: the two trees are equivalent. @dkogan I think you can mask this test on i386 first, and we can work on a fix to the comparison.
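One way such a comparison could be made insensitive to sibling order is to reduce each tree to a canonical form before comparing. The sketch below assumes a hypothetical `Node` type holding a label and child nodes; it is not the gtsam Bayes tree API, and the eventual fix may look quite different.

```cpp
#include <algorithm>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical clique node: a label plus child cliques in arbitrary order.
struct Node {
  std::string label;
  std::vector<Node> children;
};

// Canonical text form: each child is serialized recursively, the child
// strings are sorted, and the result is concatenated. Two trees that differ
// only in sibling order therefore produce the same string. This assumes
// labels do not themselves contain parentheses.
inline std::string canonical(const Node& n) {
  std::vector<std::string> parts;
  parts.reserve(n.children.size());
  for (const Node& child : n.children) {
    parts.push_back(canonical(child));
  }
  std::sort(parts.begin(), parts.end());
  std::string out = "(" + n.label;
  for (const std::string& part : parts) {
    out += part;
  }
  out += ")";
  return out;
}

// Order-insensitive structural equality.
inline bool equal_unordered(const Node& a, const Node& b) {
  return canonical(a) == canonical(b);
}

}  // namespace sketch
```

Applied to the output quoted earlier, a root `e0 l0 b0` with children listed as `s0, t0, x0` and the same root with children listed as `t0, x0, s0` would compare equal under `sketch::equal_unordered`.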


dkogan commented on August 19, 2024

Fan Jiang writes:

> This seems to be just an order issue in the comparison of Bayes tree, these two trees are equivalent. @dkogan I think you can mask this test in i386 first and we can work on a fix to the comparison

OK. So to be clear, I should:

- Apply the 3 patches in #1522
- Disable this one test that failed with armhf

Yes? I'm certain there will be other issues, but this should get us closer.


ProfFan commented on August 19, 2024

Yes :)


dkogan commented on August 19, 2024

Alright, I tried it. Build logs are here: https://buildd.debian.org/status/package.php?p=gtsam

Click on the red "Build-Attempted" to get the log for each particular arch. Notes:

- It looks like there are two separate sets of tests. If the first set doesn't all pass, it doesn't even try to run the second set. This is a bug: it should always run all the tests so that we can get a clear idea of what's failing.
- armhf and armel pass all of the first set of tests (all 221 of them), but fail some of the second set. I don't know enough to comment on what the failures mean.
- i386 fails one of the first set of tests: it looks like it just barely misses some thresholds, so it's probably not a real failure. But since it failed something in the first set, we don't know whether any of the second set of tests would have failed.

The other arches either succeeded, or haven't been built yet.

Can somebody please take a look? Thanks
