Config: https://gist.github.com/bluecmd/82298eabea8a7a4bf7d447e263f125b5
from fboss.
My working thesis is that https://github.com/facebook/fboss/blob/master/fboss/agent/hw/bcm/oss/BcmControlPlane.cpp is missing an initialization of queueManager_.
EDIT:
We've tried initializing queueManager_ like this: dhtech@837bf7f, without success. The current thesis is that controlPlane_ in BcmSwitch might not be set correctly. initTables, for example, seems to set it, but I cannot find any call site for that function.
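A minimal sketch of the second thesis, using hypothetical, heavily simplified stand-ins (Switch, ControlPlane, QueueManager, initTables, getColdBootMulticastQueueCount) rather than the real FBOSS classes: if the only code that populates controlPlane_ is never called, the first access through it dereferences a null pointer, and an explicit check turns that segfault into a descriptive exception. One possible reading of the SIGSEGV at address 0x10 inside std::unique_ptr<>::get() later in this thread fits this thesis: reading the unique_ptr member itself at a small offset from a null object pointer, rather than a null queueManager_ (which get() would return without faulting).

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins, NOT the real FBOSS BcmSwitch / BcmControlPlane.
struct QueueManager {
  int multicastQueueCount() const { return 4; }  // illustrative value only
};

class ControlPlane {
 public:
  // Creating the queue manager in the constructor means it is never null
  // once a ControlPlane exists (the first thesis).
  ControlPlane() : queueManager_(std::make_unique<QueueManager>()) {}

  int getMulticastQueueCount() const {
    return queueManager_->multicastQueueCount();
  }

 private:
  std::unique_ptr<QueueManager> queueManager_;
};

class Switch {
 public:
  // The only place controlPlane_ is populated; if nothing calls it,
  // controlPlane_ stays null (the second thesis).
  void initTables() { controlPlane_ = std::make_unique<ControlPlane>(); }

  int getColdBootMulticastQueueCount() const {
    // Fail loudly instead of dereferencing a null pointer.
    if (!controlPlane_) {
      throw std::logic_error("controlPlane_ used before initialization");
    }
    return controlPlane_->getMulticastQueueCount();
  }

 private:
  std::unique_ptr<ControlPlane> controlPlane_;
};
```

Without the check, calling getColdBootMulticastQueueCount() before initTables() is undefined behavior, which on Linux typically shows up as a segfault like the one reported here.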
Thanks for the report - can you confirm that everything was working correctly before the update? (Sorry if it's a dumb question - I'm trying not to sound too surprised :-)
Also, unfortunately it's not trivial for me to test the OSS build internally, so if you could compile with debugging options (similar to what's described here: https://stackoverflow.com/questions/7724569/debug-vs-release-in-cmake), we could get a stack trace with line numbers and make a better guess as to what broke. Thanks for the interest!
We were running the stated commit (or at least one close to it) in June, and it certainly worked. The same switch was unboxed from storage, and the only things we did were:
- Modify the config to set port speed
- Realize we have to patch fboss to support the transceiver we want to use
- Compile a new fboss from head
- Install libsodium23 from Debian stretch backports as that seems to be a new dependency
- Start fboss - boom
We can certainly compile a version at the previously stated commit and run with it to verify the bisection window, and we will change the config back to the one we ran successfully in June.
We will also get back to you with the results of a cmake -DCMAKE_BUILD_TYPE=Debug run.
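A minimal sketch, assuming CMake's default GCC/Clang flags, for sanity-checking which build type a translation unit actually got: by default Release, RelWithDebInfo and MinSizeRel add -DNDEBUG, while Debug (and an empty CMAKE_BUILD_TYPE) do not. This is only a heuristic: it says nothing about whether -g was passed, and a project can override these flags.

```cpp
#include <cassert>
#include <string>

// True when assert() is active in this translation unit.
constexpr bool kAssertionsEnabled =
#ifdef NDEBUG
    false;
#else
    true;
#endif

// Human-readable guess at the CMake build type, based only on NDEBUG.
std::string describeBuild() {
  return kAssertionsEnabled
             ? "NDEBUG not defined (CMake Debug or no build type)"
             : "NDEBUG defined (a release-style CMake build type)";
}
```

Logging describeBuild() once at startup would show whether the installed binary really came from the Debug configuration.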
This is the startup log for a build of the following tree:
dhtech@r1a0:~/fboss$ git status
HEAD detached at e440fcd
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: fboss/agent/ApplyThriftConfig.cpp
modified: getdeps.sh
no changes added to commit (use "git add" and/or "git commit -a")
dhtech@r1a0:~/fboss$ git diff
diff --git a/fboss/agent/ApplyThriftConfig.cpp b/fboss/agent/ApplyThriftConfig.cpp
index cc087e7..79f2bd8 100644
--- a/fboss/agent/ApplyThriftConfig.cpp
+++ b/fboss/agent/ApplyThriftConfig.cpp
@@ -328,7 +328,7 @@ shared_ptr<SwitchState> ThriftConfigApplier::run() {
// Make sure there is a one-to-one map between vlan and interface
// Remove this sanity check if multiple interfaces are allowed per vlans
auto& entry = vlanInterfaces_[vlanInfo.first];
- if (entry.interfaces.size() != 1) {
+ if (entry.interfaces.size() > 1) {
auto cpu_vlan = newState->getDefaultVlan();
if (vlanInfo.first != cpu_vlan) {
throw FbossError("Vlan ", vlanInfo.first, " refers to ",
diff --git a/getdeps.sh b/getdeps.sh
index ec60709..19f8fc2 100755
--- a/getdeps.sh
+++ b/getdeps.sh
@@ -117,9 +117,9 @@ NPROC=$(grep -c processor /proc/cpuinfo)
fi
# iproute2 v4.4.0
update https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git 7ca63aef7d1b0c808da0040c6b366ef7a61f38c1
- update https://github.com/facebook/folly.git
- update https://github.com/facebook/wangle.git
- update https://github.com/facebook/fbthrift.git
+ update https://github.com/facebook/folly.git v2018.06.04.00
+ update https://github.com/facebook/wangle.git v2018.06.04.00
+ update https://github.com/facebook/fbthrift.git v2018.06.04.00
update https://github.com/no1msd/mstch.git
update https://github.com/facebook/zstd.git
update https://github.com/google/googletest.git release-1.8.0
That should be the state where we ran it in June.
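A minimal sketch of what the local ApplyThriftConfig.cpp change does, using a hypothetical simplified model (a map from VLAN ID to interface IDs, and a function name that is not part of FBOSS): the original check (size() != 1) rejected any non-default VLAN with zero or several interfaces, while the patched check (size() > 1) only rejects several, so VLANs without an interface are accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: VLAN ID -> IDs of the interfaces that refer to it.
using VlanInterfaces = std::map<int, std::vector<int>>;

// Mirrors the patched rule: a non-default VLAN may have zero or one
// interface, but not more. The default (CPU) VLAN is exempt.
void checkVlanInterfaces(const VlanInterfaces& vlans, int defaultVlan) {
  for (const auto& entry : vlans) {
    const int vlan = entry.first;
    const std::size_t count = entry.second.size();
    if (count > 1 && vlan != defaultVlan) {
      throw std::invalid_argument("Vlan " + std::to_string(vlan) +
                                  " refers to " + std::to_string(count) +
                                  " interfaces");
    }
  }
}
```

Under the unpatched rule, a VLAN with an empty interface list (for example one used only for switching) would have been rejected.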
This is the startup log:
dhtech@wedge1:~$ sudo /usr/local/bin/wedge_agent -mode=wedge -mgmt_if=ma1 -config=/etc/wedge.json
E0928 17:48:54.393641 23401 WedgeProductInfo.cpp:136] json parse error on line 0: expected json value
E0928 17:48:54.393896 23401 WedgeProductInfo.cpp:67] json parse error on line 0: expected json value
DMA pool size: 16777216
PCI unit 0: Dev 0xb850, Rev 0x03, Chip BCM56850_A2, Driver BCM56850_A0
Initializing platform
Device Configuration - SUCCESS!
SOC unit 0 attached to PCI device BCM56850_A2
Boot flags: Cold boot
rc: unit 0 device BCM56850_A2
open /dev/linux-bcm-knet: : No such file or directory
rc: MMU initialized
rc: L2 Table shadowing enabled
rc: Port modes initialized
Common SDK init completed
E0928 17:49:08.767557 23439 QsfpCache.cpp:166] Exception talking to qsfp_service: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0928 17:49:08.795991 23439 WedgePort.cpp:85] Error retrieving info for transceiver 0 Exception: St13runtime_error: Transceiver 0 not in cache
E0928 17:49:08.901234 23439 WedgePort.cpp:85] Error retrieving info for transceiver 0 Exception: St13runtime_error: Transceiver 0 not in cache
The transceiver errors are because I'm not running qsfp_service: I currently have an odd QSFP plugged in that makes qsfp_service crash. I don't believe it's relevant here (and I'm working remotely, so I cannot unplug it).
Building from master:
# docker run --name fboss_build_phase2 -v `pwd`:/tmp/code fboss/tmp /bin/sh -c "cd /tmp/code && mkdir -p build && cd build && cmake -DCMAKE_BUILD_TYPE=Debug .. && make -j`grep -c processor /proc/cpuinfo`"
dhtech@wedge1:~$ sudo /usr/local/bin/wedge_agent -mode=wedge -mgmt_if=ma1 -config=/etc/wedge.json
E0928 18:20:48.462908 23573 WedgeProductInfo.cpp:140] json parse error on line 0: expected json value
E0928 18:20:48.463168 23573 WedgeProductInfo.cpp:67] json parse error on line 0: expected json value
I0928 18:20:48.507074 23573 Main.cpp:338] serving on localhost on port 5909
DMA pool size: 16777216
PCI unit 0: Dev 0xb850, Rev 0x03, Chip BCM56850_A2, Driver BCM56850_A0
Initializing platform
Device Configuration - SUCCESS!
SOC unit 0 attached to PCI device BCM56850_A2
Boot flags: Cold boot
rc: unit 0 device BCM56850_A2
open /dev/linux-bcm-knet: : No such file or directory
rc: MMU initialized
rc: L2 Table shadowing enabled
rc: Port modes initialized
Common SDK init completed
I0928 18:21:02.347016 23593 BcmSwitch.cpp:559] Initializing BcmSwitch for unit 0
*** Aborted at 1538158862 (unix time) try "date -d @1538158862" if you are using GNU date ***
PC: @ 0x564a3693b2d4 std::unique_ptr<>::get()
*** SIGSEGV (@0x10) received by PID 23573 (TID 0x7f70048f3700) from PID 16; stack trace: ***
@ 0x7f701be5e0c0 (unknown)
@ 0x564a3693b2d4 std::unique_ptr<>::get()
@ 0x564a3693b2f2 std::unique_ptr<>::operator->()
@ 0x564a369366a8 facebook::fboss::BcmControlPlane::getMulticastQueueSettings()
@ 0x564a3692470a facebook::fboss::BcmSwitch::getColdBootSwitchState()
@ 0x564a36926ccc facebook::fboss::BcmSwitch::init()
@ 0x564a36aae464 facebook::fboss::SwSwitch::init()
@ 0x564a369e2a03 facebook::fboss::Initializer::initImpl()
@ 0x564a369e2322 facebook::fboss::Initializer::initThread()
@ 0x564a369eab32 _ZSt13__invoke_implIvRKMN8facebook5fboss11InitializerEFvvEPS2_JEET_St21__invoke_memfun_derefOT0_OT1_DpOT2_
@ 0x564a369eaabf _ZSt8__invokeIRKMN8facebook5fboss11InitializerEFvvEJPS2_EENSt9result_ofIFOT_DpOT0_EE4typeESA_SD_
@ 0x564a369eaa70 _ZNKSt12_Mem_fn_baseIMN8facebook5fboss11InitializerEFvvELb1EEclIJPS2_EEEDTcl8__invokedtdefpT6_M_pmfspcl7forwardIT_Efp_EEEDpOS8_
@ 0x564a369eaa17 _ZNSt12_Bind_simpleIFSt7_Mem_fnIMN8facebook5fboss11InitializerEFvvEEPS3_EE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
@ 0x564a369ea853 std::_Bind_simple<>::operator()()
@ 0x564a369ea61a std::thread::_State_impl<>::_M_run()
@ 0x7f7014f45e6f (unknown)
@ 0x7f701be54494 start_thread
@ 0x7f70146baacf clone
@ 0x0 (unknown)
Segmentation fault
I'm not sure why the debug build didn't take effect. Do I need to do anything more than pass that flag to cmake?
Sorry for the slow reply to this. I'm also surprised that the debug symbols didn't show up - let me figure that out in parallel. Looking at the code and the error, I think the patch below may help. I'm trying to test it locally, but our ability to emulate a non-Facebook setup inside Facebook is limited :-(
If you get a chance to test this before I do, please let me know if it fixes the problem.
diff --git a/fbcode/fboss/agent/hw/bcm/oss/BcmControlPlaneQueueManager.cpp b/fbcode/fboss/agent/hw/bcm/oss/BcmControlPlaneQueueManager.cpp
--- a/fbcode/fboss/agent/hw/bcm/oss/BcmControlPlaneQueueManager.cpp
+++ b/fbcode/fboss/agent/hw/bcm/oss/BcmControlPlaneQueueManager.cpp
@@ -20,8 +20,9 @@
std::shared_ptr<PortQueue> BcmControlPlaneQueueManager::getCurrentQueueSettings(
cfg::StreamType /*streamType*/,
- opennsl_cos_queue_t /*cosQ*/) const {
- return std::shared_ptr<PortQueue>{};
+ opennsl_cos_queue_t cosQ) const {
+ // stub implementation - depends on newer OpenNSL
+ return std::make_shared<PortQueue>(cosQ);
}
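A minimal sketch of the difference the patch makes, using a hypothetical PortQueue stand-in that only models the queue ID (not the real FBOSS type): a value-initialized std::shared_ptr owns nothing and compares equal to nullptr, so any caller that does result->... on the old stub's return value dereferences a null pointer, whereas std::make_shared always returns a live object. A later comment in this thread reports that the patch did not fix the crash, which would be consistent with the fault occurring before this return value is ever used.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for FBOSS's PortQueue; only the queue ID is modelled.
struct PortQueue {
  explicit PortQueue(int queueId) : id(queueId) {}
  int id;
};

// Old stub: returns an empty (null) shared_ptr.
std::shared_ptr<PortQueue> currentQueueSettingsBefore(int /*cosQ*/) {
  return std::shared_ptr<PortQueue>{};
}

// Patched stub: returns a default-configured queue for the requested ID.
std::shared_ptr<PortQueue> currentQueueSettingsAfter(int cosQ) {
  return std::make_shared<PortQueue>(cosQ);
}
```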
Also, could you please create a new issue for the qsfp_service segfault? Even if it is caused by a funky optic, it still shouldn't segfault. I won't promise I can fix it promptly, but it's still good to track.
Thank you again!
Done: #77. I'm compiling with your patch right now. I had to pull in my OpenNSL 3.5.0.1 changes as well, but I expect they are orthogonal to this issue.
@capveg Sadly, the patch does not seem to work: https://gist.github.com/bluecmd/bd16185170dff642de197e34349aa14c
I wish the stack trace were more useful. Granted, I did not build that particular binary with the cmake debug settings; I can redo that and spend some time trying to get line numbers if you think that is the next logical step.
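Some frames in the trace above are still raw mangled names (the _ZSt... lines). A minimal sketch, assuming GCC or Clang on Linux (Itanium C++ ABI), that demangles them with abi::__cxa_demangle from <cxxabi.h>, the same job c++filt does on the command line; it is not portable to MSVC.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Demangles an Itanium-ABI symbol; returns the input unchanged on failure.
std::string demangle(const std::string& mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc, so free() releases it.
  std::unique_ptr<char, void (*)(void*)> result(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
      std::free);
  return (status == 0 && result) ? std::string(result.get()) : mangled;
}
```

For example, demangle("_ZN8facebook5fboss11Initializer10initThreadEv") yields "facebook::fboss::Initializer::initThread()".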
Quick note to say that I see the same behavior (and the same trace) on a Wedge 100, using master as of about a month ago. The stack trace is similarly unhelpful. I tried applying the patch above, with no effect.
In our case, we also don't have a functional FBOSS version to revert to, because the infrastructure for FBOSS on a Wedge 100 running Open Network Linux seems to be fairly broken at the moment (for quite a few other reasons). I will try a build of e440fcd, though, and see if that gets us going.
Still seeing a crash on startup in EventBase::bumpHandlingTime in 2021 (facebook/flipper#2577).