mkeeter commented on July 17, 2024

Testing PHY reinitialization with a long wait time (10s) still leaves the system stuck. This leads me to suspect the VSC7448, since it's not being power cycled, but getting ground-truth readings is going to be essential for debugging.

from hubris.

refugeesus commented on July 17, 2024

:( OK. Someone will need to probe the board in the office ASAP.

refugeesus commented on July 17, 2024

OK, so sad news: our osc3 sometimes outputs the incorrect frequency. See this erratum from Microchip:
https://ww1.microchip.com/downloads/en/DeviceDoc/DSC11xx-Family-Silicon-Errata-DS80000982A.pdf

Unfortunately, the parts that tri-state are unobtainable or have very long lead times.

refugeesus commented on July 17, 2024

This is a happy clock:
[image]

This is a sad clock:
[image]

Unfortunately, sometimes we see a sad clock...

refugeesus commented on July 17, 2024

The off-frequency signal, which is a symptom of the problem above:
[image]

This is the VSC7448 side of our link.

nathanaelhuffman commented on July 17, 2024

Unfortunately, I don't think it's wise to plan any more rework on this pass of Sidecar. As discussed at today's hardware tactical, given the ~2% boot failure rate, we're proposing that Sidecar power-cycle the QSFP board (a software workaround) whenever this issue is detected. @arjenroodselaar is signed up to scope out that work.

This isn't awesome and we should consider using a different part in the future.

refugeesus commented on July 17, 2024

First, we are going to try to power-cycle the Front IO board from Sidecar.

Alternatively, per our huddle, I plan to sever the FPGA's connection to the enable of our current osc (in a repairable way), and we will try to bring up the VSC while violating its sequencing requirements (it typically wants power before refclk).

arjenroodselaar commented on July 17, 2024

An update on this issue: https://github.com/oxidecomputer/hubris/tree/front_io_bad_osc contains changes across the sequencer task, the monorail task, and the controller bitstreams to work around this issue. This is currently running in a loop where the system is power cycled and the links are checked afterwards. So far, the monorail task has detected two instances where the QSGMII link did not come up, and the front IO board had to be power cycled and the PHY reinitialized to work around the problem. Afterwards, the QSGMII link and technician ports worked as intended.

This will take a few days to get through review, but so far a software workaround seems adequate.
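The detect-and-recover flow described above (check the QSGMII link after boot; if it is down, have the sequencer power-cycle the front IO board, reinitialize the PHY, and check again) can be sketched as a bounded retry loop. This is a minimal illustration under stated assumptions, not the code on the `front_io_bad_osc` branch: the `FrontIoLink` trait, its method names, `SimulatedFrontIo`, and the retry cap are all hypothetical, and in the real system the work is split across the sequencer task, the monorail task, and the FPGA bitstreams.

```rust
/// Hypothetical interface to the front IO hardware. In Hubris this would be
/// spread across several tasks; these names are illustrative only.
pub trait FrontIoLink {
    /// Returns true if the QSGMII link to the front IO board is up.
    fn qsgmii_link_up(&mut self) -> bool;
    /// Asks the sequencer to power-cycle the front IO board.
    fn power_cycle_front_io(&mut self);
    /// Reinitializes the PHY on the front IO board after a power cycle.
    fn reinit_phy(&mut self);
}

#[derive(Debug, PartialEq, Eq)]
pub enum BringUpError {
    /// The link never came up within the allowed number of power cycles.
    GaveUp { power_cycles: u32 },
}

/// Checks the link and, while it is down, power-cycles the front IO board
/// and reinitializes the PHY, up to `max_power_cycles` times.
/// Returns how many power cycles were needed (0 if the link was already up).
pub fn bring_up_front_io<L: FrontIoLink>(
    link: &mut L,
    max_power_cycles: u32,
) -> Result<u32, BringUpError> {
    let mut power_cycles = 0;
    while !link.qsgmii_link_up() {
        if power_cycles == max_power_cycles {
            return Err(BringUpError::GaveUp { power_cycles });
        }
        link.power_cycle_front_io();
        link.reinit_phy();
        power_cycles += 1;
    }
    Ok(power_cycles)
}

/// Hypothetical test double: the oscillator comes up "sad" on the first
/// `sad_boots` power-ups, and each power cycle consumes one of them.
pub struct SimulatedFrontIo {
    pub sad_boots: u32,
    pub phy_reinits: u32,
}

impl FrontIoLink for SimulatedFrontIo {
    fn qsgmii_link_up(&mut self) -> bool {
        self.sad_boots == 0
    }
    fn power_cycle_front_io(&mut self) {
        self.sad_boots = self.sad_boots.saturating_sub(1);
    }
    fn reinit_phy(&mut self) {
        self.phy_reinits += 1;
    }
}
```

Capping the number of power cycles keeps a board with a genuinely dead link from looping forever; this thread does not say whether the actual implementation uses a cap or what value it would be.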

arjenroodselaar commented on July 17, 2024

This ran overnight, and Sidecar was power cycled 1464 times. During 57 of those cycles, monorail-server determined that the QSGMII link was not functional and requested one or more power cycles of the front IO board from the sequencer. Once the QSGMII link came up, ping tests using both technician ports succeeded in all 1464 cycles.
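For reference, the fraction of boots in this overnight run that needed the front IO power-cycle workaround works out as follows:

```latex
\frac{57}{1464} \approx 0.039 \approx 3.9\%
```

That is roughly twice the ~2% boot failure rate cited earlier in the thread, though the two figures may not have been measured under comparable conditions.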

mkeeter commented on July 17, 2024

Done in #1449
