
Comments (13)

mplzik commented on July 22, 2024

@vonericsen this looks really good; the power state seems to change now:

# ./openSeaChest_PowerControl --checkPowerMode -d /dev/sg2
==========================================================================================
 openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_PowerControl Version: 3.4.0-6_1_1 X86_64
 Build Date: Nov 14 2023
 Today: Wed Nov 15 07:43:10 2023	User: root
==========================================================================================

/dev/sg2 - 004-2M2101 -  - 0125 - ATA
Device is in the PM1: Idle state and the device is in the Idle_c power condition

# find /mnt/disk >/dev/null
^C
# ./openSeaChest_PowerControl --checkPowerMode -d /dev/sg2
==========================================================================================
 openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_PowerControl Version: 3.4.0-6_1_1 X86_64
 Build Date: Nov 14 2023
 Today: Wed Nov 15 07:44:10 2023	User: root
==========================================================================================

/dev/sg2 - 004-2M2101 -  - 0125 - ATA
Device is in the PM1: Idle state and the device is in the Idle_a power condition

Huge thanks for fixing that! While looking for other things that might be off, I noticed that openSeaChest_PowerControl --SATInfo is not reporting the model number, even though smartctl -a provides one; let me know if it makes sense to fix this and I'll be happy to do any kind of testing.

As for the power management issues, I don't see any kernel messages indicating a USB device reset, and since the problem also came up when the drive was not mounted and LVM was not set up, I suspect the enclosure is polling the drive in a way that prevents it from spinning down. It's a multi-drive enclosure that supports some level of RAID, so I'd guess it might be doing some magic in JBOD mode as well.

Also, hats off for doing all this debugging using a test suite. Hardware is nasty stuff when it comes to quirks, and a suite able to catch the majority of odd behavior is definitely an invaluable tool.
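
For reference, the Idle_c and Idle_a conditions in the two outputs above come from the count value that the ATA CHECK POWER MODE command (E5h) returns. The sketch below maps a subset of those values to names as I understand them from ACS-3; the table and the `describe_power_mode` helper are illustrative and not taken from openSeaChest, so check the values against the specification before relying on them.

```python
# Subset of the count-field values returned by ATA CHECK POWER MODE (E5h).
# Assumption: values as listed in ACS-3; this is not openSeaChest's own table.
POWER_MODES = {
    0x00: "PM2: Standby (Standby_z)",
    0x01: "PM2: Standby (Standby_y)",
    0x80: "PM1: Idle",
    0x81: "PM1: Idle (Idle_a)",
    0x82: "PM1: Idle (Idle_b)",
    0x83: "PM1: Idle (Idle_c)",
    0xFF: "PM0: Active or PM1: Idle",
}


def describe_power_mode(count: int) -> str:
    """Return a readable name for the low byte of the returned count field."""
    value = count & 0xFF
    return POWER_MODES.get(value, f"unknown (0x{value:02X})")


if __name__ == "__main__":
    for raw in (0x83, 0x81, 0x00):
        print(f"0x{raw:02X}: {describe_power_mode(raw)}")
```

If an adapter returns zeroes for the command results, a decoder like this would see 00h regardless of the real state, so the reported power condition would not track what the drive is actually doing.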

from openseachest.

mplzik commented on July 22, 2024

Thanks a lot; with the change, I was able to read the model number correctly in the ATA section of --SATInfo output; attaching the output.

# ./openSeaChest_PowerControl -d /dev/sg2 --SATInfo >SATInfo.txt

SATInfo.txt


mplzik commented on July 22, 2024

Will definitely report if something odd shows up; huge thanks for the help. :) Also, as for not getting responses from the disk: it did not escalate to actually losing the connection to the disk itself (regular operations still worked), at least not in any way I would notice.


vonericsen commented on July 22, 2024

@mplzik,
Thanks for sharing that output!
It looks like there should be a way to get the command completion based on what I see in the results.

I'll work on adding this into our code and push it as soon as I can.
Once I get it in and the CI builds it, I will share a build so you can test it and let me know if it seems to be working properly.


mplzik commented on July 22, 2024

Huge thanks. Also, not being familiar with this problem domain: is this something that should be implemented as some kind of quirk in the Linux kernel (should a kernel bug be filed as well)?

Also, just out of curiosity, since I noticed this issue while trying to power the disk down when not in use: is there a chance that the power management parameters are not being configured properly, preventing the drive from staying in its deepest sleep state for longer, or is it more likely that the enclosure itself emits commands that wake the drive from its deepest sleep levels?


vonericsen commented on July 22, 2024

> is this something that should be implemented as some kind of quirk in the Linux kernel (should a kernel bug be filed as well)?

Not necessarily. This software (like hdparm and smartmontools) attempts to talk through the adapter to the drive itself, rather than letting the adapter talk to the drive, by using a special command called SAT ATA passthrough.
While part of the Linux kernel does some ATA passthrough, for the most part it just uses standard SCSI commands to read, write, flush, and identify the drive. ATA passthrough is better thought of as an optional feature than a requirement, though it is super helpful for diagnostics and data collection, among other things.
In an ideal world the adapter would be capable of all the translations defined in the SAT (SCSI to ATA Translation) specification, which would include the ability to configure power mode timers (at least from SAT-3 onward; I can't remember if earlier versions supported this).

There may be part of the Linux kernel that issues SAT ATA passthrough commands, but you would need to provide them more detail about how to work with this adapter, which they may or may not do. I doubt the kernel would be using ATA passthrough to spin a drive down...it's more likely that the SCSI Start-stop-unit command is being used to enter standby or letting the drive's timer expire.
The changes I have made were to adjust how the command results are returned to the software.
This adapter accepts a bit in the command called "check condition", which is meant to return the ATA drive's command results, but the adapter returns zeroes instead.
So the workaround in this case is to use SAT ATA passthrough with the protocol field set to "Return response information". A secondary workaround is also needed to say "ignore the value of the extend bit" in that response, because the adapter does not set it correctly.
With both of these changes in place, it should correct the problem with openSeaChest.
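
To make the two workarounds more concrete, here is a minimal Python sketch of the pieces involved, based on my reading of the SAT specification rather than on openSeaChest's source. It builds an ATA PASS-THROUGH (16) CDB with the check condition bit set, builds the follow-up "Return response information" request, and parses an ATA Status Return descriptor. All names (`build_cdb`, `parse_ata_status_descriptor`, `extend_override`) are hypothetical, and treating "ignore the extend bit" as "trust what the caller sent instead of the descriptor" is my assumption about what the workaround means.

```python
from dataclasses import dataclass
from typing import Optional

ATA_PASS_THROUGH_16 = 0x85        # SCSI operation code
PROTO_NON_DATA = 3                # SAT protocol field values
PROTO_RETURN_RESPONSE_INFO = 15
ATA_CHECK_POWER_MODE = 0xE5       # ATA command opcode


def build_cdb(protocol: int, command: int = 0,
              ck_cond: bool = False, extend: bool = False) -> bytes:
    """Build a 16-byte ATA PASS-THROUGH CDB for a command with no data transfer."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = ((protocol & 0x0F) << 1) | int(extend)  # protocol in bits 4..1, extend in bit 0
    cdb[2] = 0x20 if ck_cond else 0x00               # CK_COND is bit 5
    cdb[14] = command & 0xFF
    return bytes(cdb)


@dataclass
class AtaResult:
    error: int
    count: int
    lba: int
    device: int
    status: int


def parse_ata_status_descriptor(desc: bytes,
                                extend_override: Optional[bool] = None) -> AtaResult:
    """Decode an ATA Status Return descriptor (type 09h, additional length 0Ch).

    extend_override is a hypothetical knob: when set, the caller's knowledge of
    whether a 48-bit command was sent replaces the descriptor's extend bit.
    """
    if len(desc) < 14 or desc[0] != 0x09 or desc[1] != 0x0C:
        raise ValueError("not an ATA Status Return descriptor")
    extend = bool(desc[2] & 0x01) if extend_override is None else extend_override
    count = desc[5]
    lba = desc[7] | (desc[9] << 8) | (desc[11] << 16)
    if extend:
        # The upper register bytes are only meaningful for 48-bit commands.
        count |= desc[4] << 8
        lba |= (desc[6] << 24) | (desc[8] << 32) | (desc[10] << 40)
    return AtaResult(error=desc[3], count=count, lba=lba,
                     device=desc[12], status=desc[13])


if __name__ == "__main__":
    print(build_cdb(PROTO_NON_DATA, ATA_CHECK_POWER_MODE, ck_cond=True).hex())
    print(build_cdb(PROTO_RETURN_RESPONSE_INFO).hex())
```

The sketch only constructs and decodes bytes; actually sending the CDB would go through the SG_IO ioctl on the /dev/sg handle, which is left out here.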

The part of that log I asked for that shows all of this is at the end:

TURF:11
SCSI Hacks: RW6, RW10, RW16, NLP, NMSP, SUPSOP, REPALLOP, MXFER:1048576
ATA Hacks: SAT, A1, RS, RSTD, RSIE, TPSIU, CHKE, MPTXFER:130560

These are all short-hand ways to describe the workarounds necessary to get the maximum capability from the adapter.
I wrote this test after years of trial and error and manually debugging these issues.
It finds most issues and eliminates most manual debugging at this point, but every now and then I do come across a new adapter that this automated test just will not work correctly with.
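
As a small illustration of how that summary could be consumed by a script, the sketch below splits the three lines into flag names and numeric settings. It deliberately does not interpret what each shorthand means; the `parse_hack_report` function is hypothetical and not part of openSeaChest.

```python
def parse_hack_report(text: str) -> dict:
    """Split 'Name: A, B, KEY:123' lines into flags and numeric values.

    Lines whose name does not end in 'Hacks' (such as 'TURF:11') are kept
    as a raw string, since their meaning is not described here.
    """
    report = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'name:' prefix
        name, rest = name.strip(), rest.strip()
        if not name.endswith("Hacks"):
            report[name] = rest
            continue
        flags, values = [], {}
        for token in (t.strip() for t in rest.split(",")):
            if not token:
                continue
            key, has_value, value = token.partition(":")
            if has_value:
                values[key] = int(value)
            else:
                flags.append(key)
        report[name] = {"flags": flags, "values": values}
    return report


if __name__ == "__main__":
    sample = (
        "TURF:11\n"
        "SCSI Hacks: RW6, RW10, RW16, NLP, NMSP, SUPSOP, REPALLOP, MXFER:1048576\n"
        "ATA Hacks: SAT, A1, RS, RSTD, RSIE, TPSIU, CHKE, MPTXFER:130560\n"
    )
    for name, value in parse_hack_report(sample).items():
        print(name, value)
```

Keeping flags and numeric limits separate makes it easy to compare the results of two runs against the same adapter, for example before and after a firmware update.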

> Is there a chance that the power management parameters are not being configured properly, preventing the drive from staying in its deepest sleep state for longer, or is it more likely that the enclosure itself emits commands that wake the drive from its deepest sleep levels?

There are likely multiple factors that could be coming into play here.
If the ATA passthrough commands are going through, then the changes to the timers should be taking effect.
If these are EPC timers (I think so, from what you had in the other issue), then the drive should be able to save them and keep following them. The pre-EPC changes to standby timers and APM are volatile, so the device does not keep them across a power cycle.
I have seen and tested some adapters that query the drive from time to time, which causes it to spin back up in some cases.
Some parts of the OS kernel may also ping the drive from time to time. It can be for a variety of reasons, but a common one is to check SMART status. I don't think SMART status is the cause here, though, since most OSes generally stick to the SCSI command set and I didn't see a complete SMART translation in the log you shared. So the Linux kernel itself may try a passthrough request, but even doing this to check SMART is unlikely to spin up the drive in most cases.
The last thing that can happen is that the USB adapter, or another one on the same USB controller, causes some kind of trouble and the OS resets it. This reset gets passed to the adapter, and most likely once it is complete the OS starts device discovery again. This always seems to include reading the MBR/GPT partition table to see if the device has anything to mount. This read will definitely cause a drive to spin up.
In Linux, if there was a reset then it would be logged in dmesg.
There could be other causes, like daemons or other software checking the drive to see if it has a filesystem to write to, that cause the drive to spin up by trying to read it.
There is not enough information here to really know for sure what is causing it to spin back up right now.
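
One way to rule the reset scenario in or out is to scan the kernel log for USB reset messages. The sketch below assumes the usual Linux wording ("usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd"); the exact text can differ between kernel versions, and the sample lines in the code are illustrative, not taken from this thread.

```python
import re

# Assumed message format; adjust the pattern if your kernel words it differently.
RESET_RE = re.compile(
    r"usb (?P<port>[\d.\-]+): reset (?P<speed>.+?) USB device "
    r"number (?P<number>\d+) using (?P<driver>\S+)"
)


def find_usb_resets(log_text: str) -> list:
    """Return one dict per kernel log line that reports a USB device reset."""
    matches = (RESET_RE.search(line) for line in log_text.splitlines())
    return [m.groupdict() for m in matches if m]


if __name__ == "__main__":
    # Illustrative log lines only. For a live check, feed in the output of:
    #   subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    sample = (
        "[ 100.000000] sd 2:0:0:0: [sdb] Attached SCSI disk\n"
        "[ 200.000000] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd\n"
    )
    for reset in find_usb_resets(sample):
        print(reset)
```

An empty result only shows that no reset was logged in the captured window; it does not rule out the other causes listed above.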


vonericsen commented on July 22, 2024

@mplzik,

Can you test this attached build and let me know if it corrects the issue you were seeing with your adapter?
openSeaChest-release-Release-23.09-linux-x86_64-portable.tar.xz.zip

I had to zip it to attach it 🙄 , so after unzipping it will be a tar.xz file to decompress, but this should be a portable build.
Test the --checkPowerMode option and feel free to do other additional testing for other things like SMART if you would like and see if it seems to be reporting as you would expect.
If something seems off, can you attach a verbose dump from the tool?
Example:

openSeaChest_Basics -d <handle> -i -v 4 > verboseIdent.txt

As long as the -v 4 is on the command line it will dump a verbose output of the raw commands and results so I can see if anything else looks off.


vonericsen commented on July 22, 2024

@mplzik,
Glad to hear that helped!

> Huge thanks for fixing that! While looking for other things that might be off, I noticed that openSeaChest_PowerControl --SATInfo is not reporting the model number, even though smartctl -a provides one; let me know if it makes sense to fix this and I'll be happy to do any kind of testing.

Yeah, this should be fixed too. I might need to tweak a setting or two in what I did, due to a possible false positive in the passthrough test. It could be that the TPSIU workaround is not correct. That is the most likely cause, but I think the verbose output could help confirm it.
Can you run this and post the output here?

openSeaChest_PowerControl -d <handle> -i --SATInfo -v 4 > verboseInfo.txt


mplzik commented on July 22, 2024

Sure. I'm using the binaries you sent me:

# ./openSeaChest_PowerControl -d /dev/sg2 -i --SATInfo -v 4 > verboseInfo.txt

verboseInfo.txt

nit: w.r.t. my power management issue, looks like I had a service running that would periodically run smartctl -a on the drive, which made it spin up again. After disabling the service, it looks like the drive stays in standby_z until something accesses it.
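
If the periodic SMART check is still wanted, one option that might avoid the wake-ups is smartctl's `-n standby` (`--nocheck=standby`) option, which asks smartctl to skip the check when the drive reports a low-power state. The sketch below only assembles such a command line; the `smartctl_poll_argv` and `poll` helpers are hypothetical, and whether the power mode check itself works through this particular USB adapter is an assumption that would need testing.

```python
import subprocess
from typing import List, Optional


def smartctl_poll_argv(device: str, device_type: Optional[str] = None) -> List[str]:
    """Build a smartctl command that skips the check if the drive is in standby."""
    argv = ["smartctl", "-n", "standby", "-a"]
    if device_type:
        # Assumption: some USB bridges need an explicit type such as "sat".
        argv += ["-d", device_type]
    argv.append(device)
    return argv


def poll(device: str, device_type: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the command; a non-zero return code may just mean the drive was asleep."""
    return subprocess.run(smartctl_poll_argv(device, device_type),
                          capture_output=True, text=True)


if __name__ == "__main__":
    print(" ".join(smartctl_poll_argv("/dev/sg2", "sat")))
```

Checking the drive's state with --checkPowerMode before and after a poll would show whether the drive stays in standby_z with this option in place.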


vonericsen commented on July 22, 2024

Thanks for the log.
Some command is causing the adapter to stop responding.
The problem starts when it attempts to read the extended self-test results log (7h), for some odd reason.
It completed reading the previous log without issues.

Here are two more things to try with that build I shared earlier to help troubleshoot before I make more changes. Unplug the adapter, then plug it back in between each of these runs since the end of the log is showing an error like it completely stopped responding.

openSeaChest_PowerControl -d <handle> -i --SATInfo -v 4 --forceATADMA > verboseInfoFDMA.txt

openSeaChest_PowerControl -d <handle> -i --SATInfo -v 4 --forceATAPIO > verboseInfoFPIO.txt

These force openSeaChest to use a different protocol in the command which may work around this or rule out some weird behavior I've seen on other adapters in the past.


vonericsen commented on July 22, 2024

I've made one more tweak that may or may not help.
If you can give this a try and report the verboseInfo from the --SATInfo again, that would help me to mark this as done/resolved.
I don't like seeing the adapter stop responding like in your last log and want to get the tweaks just right to keep it functioning properly.

openSeaChest-release-Release-23.09-linux-x86_64-portable.tar.xz.zip


vonericsen commented on July 22, 2024

Awesome! I'm glad that worked!
This output looks a lot better and more like what I would expect, and the change most likely resolved the adapter crash from the previous build.

I'll pull this into the release I'm working on currently.
Feel free to use that last build I shared with you for now since it includes all the fixes and features I've pulled in so far.
I'm hoping to get it wrapped up in a couple of weeks.
I'm not aware of any other major issues in it at this time.

If you run into any other issues, please report them and we'll do our best to resolve them!


vonericsen commented on July 22, 2024

Marking this as closed since this code is in the latest release (v23.12) and it has been merged to both master and develop branches.

If you have any other trouble, please reopen this issue or create a new one and we will do our best to resolve it!

