Comments (21)

fenugrec commented on August 16, 2024

Hm, interesting edge case. I'm not well-versed in USB reset / suspend behaviour, but I think the firmware currently does nothing to detect and react to either USB Suspend or reset. In the case of a host reboot I don't think suspend is applicable...

Some of the relevant code is at https://github.com/candle-usb/candleLight_fw/blob/master/src/usbd_conf.c#L91/

After rebooting, is the device still accessible? That is, does it show up in lsusb and respond to e.g. "ip link set can0 down"?

from candlelight_fw.

xdaco commented on August 16, 2024

Hi @fenugrec, thanks for replying. The device is still available after the reboot, but in the down state.
It is visible with $ ifconfig -a and comes back up when we run $ ip link set can0 up.
However, we cannot communicate on the bus: candump does not show any packets, even though an active slave on the bus is sending packets continuously.

I will also have a look at the code you pointed out. If we can detect the USB reset / suspend, then we can do something to solve this problem.

fenugrec commented on August 16, 2024

That sounds a bit like some issues I had when the device was alone on the CAN bus (hence no ACK, and the peripheral gets stuck retransmitting forever). I assume it doesn't help if you bring the interface down and then up again (while specifying the bitrate, just in case)?

xdaco commented on August 16, 2024

Bringing the interface down and up again did not help.

xdaco commented on August 16, 2024

#46 seems to be solved with commit d13b6db

fenugrec commented on August 16, 2024

I'd really like to reproduce the issue here. I have two identical candlelight devices on 2 separate machines; I run "ip link set can0 up...." on both, then do a quick "canfdtest" run to make sure they talk, then

  • stop canfdtest
  • reboot machine B
  • run canfdtest again, which works without problems

I also tried shutting down B, in which case the device was turned completely off. I did need to run "ip link set up. .." again, but it worked.

Can you explain again how you're getting the problem?

xdaco commented on August 16, 2024

Hi @fenugrec
My use case was as follows.
"A CANopen CiA-402 slave was always powered on and on the bus, with pre-configured TPDOs. (This means that whenever the slave device boots up, it starts sending these predefined PDOs without waiting for a master.) The host computer was connected to the bus using the candlelight adapter. If the host computer reboots without the slave being rebooted, the adapter goes into a stale state where it neither receives nor sends any CAN packets. From the kernel side, however, the adapter still shows up as a healthy network interface."

fenugrec commented on August 16, 2024

Hmm. While your candlelight device was off, there was no other device to generate the ACK for the frames sent by your CANopen slave, and it worked OK with that? Without ACK, it should normally switch to a "bus off" state and stop sending frames, then probably go back to pre-operational state?

That's one thing I wasn't able to reproduce yet: currently I have 2 candlelights, and if one of them is off, the other one cannot continue sending frames since it switches to "bus off" due to no ACK.

xdaco commented on August 16, 2024

@fenugrec
"Without ACK, it should normally switch to a "bus off" state and stop sending frames, then probably go back to pre-operational state?"

This is not true when the PDOs are mapped as periodic, which was the case for me. The slave does not care whether any other device or master is present on the bus. It will keep sending the frames.

fenugrec commented on August 16, 2024

I know what you mean, but I was thinking of the slave's low-level CAN implementation, which should revert to the "bus off" state as defined in CAN / ISO 11898. But I guess once the other device powers up, the slave eventually returns to the active state and continues sending frames?
I think I have a device here that is more... 'perseverant' than candlelight and continues to send CAN traffic even when alone on the bus.
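
For reference, the fault-confinement states mentioned above depend on the node's transmit and receive error counters. The following is only a minimal sketch of the standard thresholds (error passive above 127, bus off above 255 on the transmit counter); the type and function names are made up for illustration and are not taken from the candleLight firmware.

```c
#include <stdint.h>

/* Fault-confinement states of a CAN node. */
typedef enum {
    CAN_ERROR_ACTIVE,
    CAN_ERROR_PASSIVE,
    CAN_BUS_OFF
} can_err_state_t;

/* Hypothetical helper: classify a node from its error counters.
 * tec = transmit error counter, rec = receive error counter.
 * 16-bit types are used so that "TEC > 255" can be represented. */
can_err_state_t can_err_state(uint16_t tec, uint16_t rec)
{
    if (tec > 255) {
        return CAN_BUS_OFF;        /* node stops taking part in bus traffic */
    }
    if (tec > 127 || rec > 127) {
        return CAN_ERROR_PASSIVE;  /* node may still transmit */
    }
    return CAN_ERROR_ACTIVE;
}
```

Note that an error-passive node is still allowed to transmit; only bus off silences it.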

fenugrec commented on August 16, 2024

@brian-brt have you ever reproduced this issue? I tried again here with three devices on the same bus:

  • machine 1, cangen can0
  • machine 1, candump can1
  • machine 2, candump can2 (it's actually "can0" on that machine; renamed here to keep things unambiguous)

then I suspend machine 2, resume, bring can2 up again, and everything is back to normal.

I want to duplicate this issue, especially if it works "sometimes" as seems to be the case; there may be something else going on that needs to be looked at.

KeithBoden commented on August 16, 2024

I am running into this issue (the same one, I think), even on d13b6d. I can reproduce it by:

  1. Automotive device powered on, begins sending on the can bus
  2. Power on PC with canable device, can0 up
  3. Verify that the rx packets/bytes are increasing steadily
  4. reboot -now
  5. can0 up
  6. rx packets/bytes stay at 0
  7. Power off PC and power it back on
  8. can0 up
  9. Same as 3, rx packets/bytes are increasing steadily

Willing to test, let me know how I can help!

fenugrec commented on August 16, 2024

@KeithBoden thanks for the reminder, I was forgetting to try a reboot. Suspend + resume was working fine. So I've finally managed to reproduce this... After a reboot, I broke in with the debugger, by chance in queue_pop_front:

(gdb) i s
#0  0x08001214 in disable_irq () at /home/q/d/can/candleLight_fw/src/util.c:31
#1  0x08001154 in queue_pop_front (q=0x20000a10) at /home/q/d/can/candleLight_fw/src/queue.c:92
#2  0x0800142c in main () at /home/q/d/can/candleLight_fw/src/main.c:109

OK, single-stepped out of queue_pop_front: it returned 0? Weird... I looked around a bit at the hCAN flags:

(gdb) p *hCAN.instance 
$16 = {MCR = 0x44, MSR = 0xc08, TSR = 0x1c000000, RF0R = 0x1b, RF1R = 0x0, IER = 0x0, ESR = 0x0, 
  BTR = 0x1c0005, RESERVED0 = {0x40d61921 <repeats 88 times>}, sTxMailBox = {{TIR = 
.... trimmed boring part

So FIFO0 is full and has overflowed according to RF0R, but is otherwise probably behaving fine. (FIFO1 is not used because no hardware filtering is configured, which is the only way to direct frames to FIFO1.)
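
To show how RF0R = 0x1b leads to that reading, here is a small decoding sketch. It assumes the bxCAN CAN_RF0R layout from the STM32 reference manual (FMP0 in bits 1:0, FULL0 in bit 3, FOVR0 in bit 4); the macro, struct and function names are invented for this example and do not come from the firmware or the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bxCAN CAN_RF0R bit layout (names are local to this sketch). */
#define RF0R_FMP0_MASK 0x3u        /* bits 1:0, number of pending messages */
#define RF0R_FULL0     (1u << 3)   /* FIFO0 holds three messages */
#define RF0R_FOVR0     (1u << 4)   /* a message arrived while FIFO0 was full */

typedef struct {
    unsigned pending;
    bool full;
    bool overrun;
} rf0r_status_t;

/* Hypothetical helper: split a raw RF0R value into its status fields. */
rf0r_status_t decode_rf0r(uint32_t rf0r)
{
    rf0r_status_t s;
    s.pending = rf0r & RF0R_FMP0_MASK;
    s.full    = (rf0r & RF0R_FULL0) != 0;
    s.overrun = (rf0r & RF0R_FOVR0) != 0;
    return s;
}
```

Passing the dumped value 0x1b gives three pending messages with both the full and overrun flags set, which is consistent with nobody draining FIFO0.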

Then, a quick look at the queues:

(gdb) p *q_frame_pool 
$22 = {max_elements = 0x40, first = 0x12, size = 0x0, buf = 0x20000908}
(gdb) p *q_from_host
$23 = {max_elements = 0x40, first = 0x0, size = 0x0, buf = 0x20000a28}
(gdb) p *q_to_host
$24 = {max_elements = 0x40, first = 0x0, size = 0x3e, buf = 0x20000b48}
(gdb) 

[EDIT - I originally misunderstood part of the queue mechanism, and changed the following comments]
As I understand it, q_frame_pool is "empty" because all the frame buffers are in q_to_host. Not sure why size = 0x3E (62), so some frames seem to have been lost somewhere; I'd expect to see 0x40 (64), i.e. CAN_QUEUE_SIZE.
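
The bookkeeping behind that expectation can be sketched as follows. This is a hypothetical re-implementation of a ring queue with the same fields as in the gdb dump (max_elements, first, size, buf), not the firmware's actual queue.c; frames_unaccounted() is a made-up helper, and it assumes every frame buffer sits in exactly one of the three queues when nothing is in flight.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAN_QUEUE_SIZE 64u   /* 0x40, as seen in max_elements */

/* Field names follow the gdb dump; the implementation is a sketch. */
typedef struct {
    unsigned max_elements;
    unsigned first;   /* index of the oldest element */
    unsigned size;    /* number of elements currently stored */
    void **buf;
} queue_t;

bool queue_push_back(queue_t *q, void *el)
{
    if (q->size >= q->max_elements) {
        return false;                          /* queue is full */
    }
    q->buf[(q->first + q->size) % q->max_elements] = el;
    q->size++;
    return true;
}

void *queue_pop_front(queue_t *q)
{
    if (q->size == 0) {
        return NULL;                           /* empty queue returns 0 */
    }
    void *el = q->buf[q->first];
    q->first = (q->first + 1) % q->max_elements;
    q->size--;
    return el;
}

/* Frame buffers that are in none of the three queues. */
unsigned frames_unaccounted(const queue_t *pool, const queue_t *from_host,
                            const queue_t *to_host)
{
    return CAN_QUEUE_SIZE - (pool->size + from_host->size + to_host->size);
}
```

With the sizes from the dump (0, 0 and 0x3e), the helper reports two buffers that are in none of the queues. An empty pool also explains why queue_pop_front returned 0 in the debugger.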

xdaco commented on August 16, 2024

We are again affected by this issue, although for our use case it is happening much less often.

fenugrec commented on August 16, 2024

I'm digging into this some more. It appears that after a reboot, USBD_GS_CAN_DataIn() is no longer called, and that is the only place where TxState is cleared. With TxState stuck at 1, no packets can make it to the host, of course.

Not sure why USBD_GS_CAN_DataIn() stops being called, though, even though the device re-enumerated properly (my breakpoint was not to blame; it triggered fine before the reboot).
I thought maybe the EP got stuck in a stalled state, but I wasn't able to verify this: a conditional breakpoint on USBD_LL_StallEP causes enumeration to fail (probably too much delay).
To be continued...
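
The TxState behaviour described in this comment can be modelled with a tiny sketch. Everything here is a simplified stand-in: gs_can_sketch_t and the sketch_* functions are hypothetical and only mimic the flag handling, they are not the firmware's USBD_GS_CAN code. The reset hook at the end is just one conceivable recovery path, assumed for illustration and not confirmed anywhere in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the device-side transmit state. */
typedef struct {
    uint8_t TxState;   /* 1 while an IN transfer to the host is pending */
    unsigned sent;     /* number of frames handed to the USB stack */
} gs_can_sketch_t;

/* Try to hand one frame to the host; refuses while a transfer is pending. */
bool sketch_send_to_host(gs_can_sketch_t *dev)
{
    if (dev->TxState != 0) {
        return false;      /* previous transfer never completed */
    }
    dev->TxState = 1;
    dev->sent++;
    return true;
}

/* Stand-in for the DataIn callback: the only place the flag is cleared. */
void sketch_data_in(gs_can_sketch_t *dev)
{
    dev->TxState = 0;
}

/* Hypothetical recovery: also clear the flag when the bus is reset. */
void sketch_usb_reset(gs_can_sketch_t *dev)
{
    dev->TxState = 0;
}
```

If the completion callback is lost across a host reboot, every later sketch_send_to_host() call fails until something else clears the flag, which matches the "rx packets stay at 0" symptom reported above.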

GaryWSmith commented on August 16, 2024

Using the firmware here:
https://canable.io/builds/candlelight-firmware/gsusb_canable_68df7d5.bin

The issue reported above is present.

A "sudo reboot" of the host device as opposed to a hardware power cycle results in a "hung" canable.

The only remedy appears to be a physical detachment and reattachment of the canable followed by the ip link set up command.

smalik007 commented on August 16, 2024

I am also facing this issue. The fix here seems to have reduced the frequency of the issue but not completely eliminated it. Has anyone found a solution yet?
In my application, the host computer has to be restarted (sudo reboot) after a remote update, but my canable device hangs after the reboot because of this :(

fenugrec commented on August 16, 2024

@GaryWSmith, @smalik007, I feel your pain, but you need to help me here.

@GaryWSmith: that 68df7d5 build is old (anything before March 2021 has the old USB stack). We don't have a functional CI setup to provide builds here, so you need to compile it yourself.

@smalik007 thanks for testing PR #51. But did you try it as-is, or did you apply it to current master? The PR also predates the USB update; I have rebased it on a temporary branch on my fork: https://github.com/fenugrec/candleLight_fw/tree/rebootfix

KeithBoden commented on August 16, 2024

@fenugrec I just compiled/flashed/tested 08ab6d2 on the rebootfix branch, but am seeing the same results:

  1. Automotive device powered on, begins sending on the can bus
  2. Power on PC with canable device, can0 up
  3. Verify that the rx packets/bytes are increasing steadily
  4. reboot -now
  5. can0 up
  6. rx packets/bytes stay at 0
  7. Disconnect canable from USB, plug it back in
  8. can0 up
  9. Same as 3, rx packets/bytes are increasing steadily

Happy to test any time and many times!

smalik007 commented on August 16, 2024

@fenugrec, yes, I tested PR #51 on top of the latest master branch and compiled the binaries locally. Since the issue comes up randomly when rebooting the host system, I have written a Python script and added it as a cron job that runs on reboot. The script keeps rebooting the host computer as long as CAN messages are being received, and as soon as the CAN device hangs it sends me an email. This way I can easily reproduce the issue in an hour or so.

fenugrec commented on August 16, 2024

Closed by PR #94!
