vonericsen commented on July 22, 2024

The background activity cannot be stopped.
As far as I know IDD will not affect the running background tasks (other than maybe pausing them while it runs).

Background activity is vendor-specific, but the drive runs it to ensure reliability and data integrity.
You can "force" this work into the foreground by writing to the drive. Each time you write data, the firmware performs the reliability and data-integrity tasks that would otherwise run in the background after a fast format.
A fast format simply skips the full-drive overwrite so you can start using the disk and writing your own data right away. Waiting for these background tasks to complete is not needed and not recommended. Start writing data, and the drive will take care of everything it needs to protect that data as it is written.
If you were to write the whole drive with zeroes (or any other pattern or data) after a fast format, there would be none of this background activity... but then you would have to wait for the entire drive to be written with zeroes before putting your own data on it.
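For a sense of scale on the zero-fill alternative: the identify output for the ST16000NM000J-2TW103 in this thread reports MaxLBA 31251759103 with a 512-byte logical sector size. The minimal Python sketch below turns that into a byte count and an overwrite time. The 250 MB/s sustained write rate is a hypothetical figure chosen only for illustration, not a measured or published value for this drive.

```python
# Back-of-the-envelope estimate of a full-drive zero fill.
# Assumption: the 250 MB/s sustained write rate is hypothetical and only
# illustrative; real throughput varies across the drive's LBA range.


def capacity_bytes(max_lba: int, logical_sector_size: int) -> int:
    """User capacity in bytes: LBAs run from 0 to MaxLBA inclusive."""
    return (max_lba + 1) * logical_sector_size


def fill_time_hours(total_bytes: int, write_rate_bytes_per_s: float) -> float:
    """Hours needed to write every byte once at a constant rate."""
    return total_bytes / write_rate_bytes_per_s / 3600.0


if __name__ == "__main__":
    # MaxLBA and logical sector size from the `-i` output in this thread.
    total = capacity_bytes(31251759103, 512)
    assumed_rate = 250e6  # hypothetical: 250 MB/s average write rate
    hours = fill_time_hours(total, assumed_rate)
    print(f"{total} bytes, about {hours:.1f} h to overwrite at the assumed rate")
```

With these inputs it prints 16000900661248 bytes and about 17.8 h. The exact figure depends on the real write rate; the point is only that a full overwrite costs many hours up front, which is the trade-off described above.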

Manually changing the power mode works because background tasks are paused whenever the host is telling the drive what to do. Reading or writing the drive likewise pauses these activities until the drive gets idle time to continue (if it still has more to do). EPC timers are suspended until critical background tasks like these are completed, but you can always force the drive into a standby, idle, or sleep state if needed.

In the latest ATA and SCSI standards there is a new feature called Advanced Background Operation which could be used to provide some more control over background tasks. This is a brief summary from the ACS-6 draft I have:

The Advanced Background Operation feature set allows the host to indicate when advanced background operations may be performed while limiting impact to other host initiated activities.
Advanced background operations include both host-initiated and device-initiated advanced background operations.

EXAMPLE - Advanced background operations may include NAND block erase operations, media read operations, and media write operations (e.g., garbage collection), that may impact response time for normal read requests or write requests from the host.

It defines limits on when the device may start background tasks on its own, and it allows the host to start these tasks manually as needed.
I don't think this feature is currently supported by Seagate firmware, but it may be in the future.

from openseachest.

DebabrataSTX commented on July 22, 2024

Hi, thanks for sharing these logs. They help me share this with a larger audience. I will update you once I get more info on it.

Kjubyte commented on July 22, 2024

Hi, I have news about this issue.

The EPC timers started working on the first of my two test drives. I put it into my storage server with my LSI HBA, and after around 260 power-on hours the timers started working again. (I changed the sector size after I received the drive and didn't check the timers every hour, so it could be closer to 200 h. The drive was always idle except for a long SMART test.)
The second drive works again too: after one more week in my test server, its timers started working. That should also be around 200 h of idle time after changing the sector size.

So this issue is resolved. There is no issue with openSeaChest, but maybe you could add a note somewhere about EPC timers not working after changing the sector size? I guess other drives have similar "issues", and maybe the time until the timers work again scales with the capacity of the drive? Maybe you could get verification from one of your HDD engineers?

Anyway, thank you for your help. I really appreciate it.

In #111 we had a similar issue reported.

I saw this issue before I posted a new one. All my HBAs are running the most recent firmware. I also tested this both with an HBA and directly connected to a SATA controller (in this case the AMD Ryzen Zen 3 built-in controller and an older Supermicro board). Maybe the fix in the referenced issue wasn't the controller firmware upgrade itself but also idle time?

Feel free to close this issue now, if you think a note isn't necessary. But such a note somewhere would have saved me a lot of time. :)

vonericsen commented on July 22, 2024

So this issue is resolved. There is no issue with openSeaChest, but maybe you could add a note somewhere about EPC timers not working after changing the sector size? I guess other drives have similar "issues", and maybe the time until the timers work again scales with the capacity of the drive? Maybe you could get verification from one of your HDD engineers?

I think this is a good reason to keep this open until we learn a bit more from the firmware team about what could have caused this. If there is a background task after a fast format, that would explain why it suddenly started working once the task finally finished, but I think our team needs to know the details of what that task does in order to write an appropriate message for the user.

I saw this issue before I posted a new one. All my HBAs are running the most recent firmware. I also tested this both with an HBA and directly connected to a SATA controller (in this case the AMD Ryzen Zen 3 built-in controller and an older Supermicro board). Maybe the fix in the referenced issue wasn't the controller firmware upgrade itself but also idle time?

I guess this is possible. There was not enough information in that issue to know how much time elapsed, so I can't tell whether it relates to the observations in this issue... The HBA firmware did fix some PHY issues, so that could very well have had an impact too. It's hard to say without more information right now.

I'm going to keep this open for now until we can figure out where a message about this should be displayed, and what it should say, since this may impact other features as well.

lbogdan commented on July 22, 2024

TL;DR: If your drive doesn't respect EPC timers and doesn't go into idle / standby, try running an IDD short test (openSeaChest_SMART -d $dev --idd short) on it.

Hello, I'm here to add another data point, and maybe hint at a solution.

To give a bit of context, I recently purchased 3 x Seagate IronWolf 8TB, model number ST8000VN004-3CP101. Initially, I plugged them into the motherboard SATA controller, and EPC was working fine on all drives; I configured timers for idle_c and standby_z, so after the longer of the two timer periods of inactivity they all entered the Standby_z power condition.

But now I wanted to install Proxmox on bare metal and pass the devices through (at the controller level) to a VM running a NAS distribution. Although the motherboard has pretty good IOMMU isolation, the SATA controllers were in the same IOMMU group as other chipset devices, so I couldn't pass them through.

So I got an LSI SAS9212-4i(4e), and after moving the drives to it, only one of the three was still respecting the EPC timers; the other two remained in the dreaded "PM0: Active state or PM1: Idle State". That's when I found out that the controller was in IR ("RAID") mode, and although it was passing the drives through just fine, I thought that maybe this mode interfered with EPC somehow (see this issue for a similar story). Unfortunately, after an entire adventure of flashing the latest IT ("Initiator Target") firmware, the two drives still refused to respect the EPC timers.

As an aside, while trying to figure out the issue, I went headfirst into a red herring - ext4 lazy initialization, which somehow I had no idea about. As I already had data on the filesystem and couldn't easily recreate it, I worked around it and went back to troubleshooting.

As one does, I tried with different power and SATA cables, ports, random commands, both in openSeaChest_* and lsiutil, enabling and disabling settings and reconfiguring things, to no avail - EPC was working fine with the integrated SATA controller, but not for the two drives with the LSI one. I got quite frustrated and switched to other tasks, and after a day or so, noticed that one of the two drives suddenly started working (I think after a disable - enable EPC cycle, but I'm not entirely sure).

So now there was a single drive refusing to go idle. Even more frustrating, now it didn't work even after moving it back to the integrated controller! Looking for a way to "reset" it, I thought about the SMART self-tests, and while reading openSeaChest_SMART's help, I stumbled upon --idd - "Start an In Drive Diagnostic (IDD) test on a Seagate drive.".

I ran it, not expecting much:

openSeaChest_SMART -d $dev --idd short
# [...]
# The In Drive Diagnostics (IDD) test will take approximately  2 minutes
# *there's an extra space there, btw                         ^^*
# *approx. 2 minutes later*
# IDD - short - completed without error!

And, lo and behold, immediately after this, it entered the Standby_a mode! 🎉 Maybe it was a one-off, or maybe whatever the IDD test did fixed whatever was keeping the drive from going into idle / standby.
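For anyone wanting to repeat this, here is a minimal Python sketch that runs the same `--idd short` command and then checks the power mode. It assumes the openSeaChest binaries are in a hypothetical `TOOL_DIR` (the current directory by default) and that `--idd short` returns only once the test has finished, as the transcript above suggests; the success string is copied from that transcript and may differ between versions. Keep in mind vonericsen's comment at the top of this thread that IDD is not expected to affect the running background tasks (beyond possibly pausing them), so a power-state change afterwards may be coincidental.

```python
# Sketch: run a Seagate IDD short test, then report the drive's power mode.
# Assumptions: openSeaChest binaries live in TOOL_DIR (hypothetical default
# "."), and `--idd short` blocks until the test completes.
import subprocess
import sys
from typing import Tuple

TOOL_DIR = "."  # hypothetical location of the openSeaChest binaries


def idd_passed(output: str) -> bool:
    """True if the output contains the success text seen in this thread."""
    return "completed without error" in output


def run_idd_then_check(device: str) -> Tuple[bool, str]:
    """Run `--idd short`, then `--checkPowerMode`; return (passed, mode text)."""
    idd = subprocess.run(
        [f"{TOOL_DIR}/openSeaChest_SMART", "-d", device, "--idd", "short"],
        capture_output=True, text=True, check=False,
    )
    mode = subprocess.run(
        [f"{TOOL_DIR}/openSeaChest_PowerControl", "-d", device, "--checkPowerMode"],
        capture_output=True, text=True, check=False,
    )
    return idd_passed(idd.stdout), mode.stdout.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    passed, power_mode = run_idd_then_check(sys.argv[1])  # e.g. /dev/sg0
    print("IDD passed" if passed else "IDD failed or output not recognized")
    print(power_mode)
```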

Sorry for the long comment, I just felt like sharing my wild adventures in storage land.

DebabrataSTX commented on July 22, 2024

@Kjubyte, Thanks for sharing this observation with us.
Can you run an identify command (seachest -d /dev/sg0) on the drive with the issue and confirm whether it has power management enabled? The output should look something like this:

    Features Supported:
            SATA NCQ
            SATA Software Settings Preservation [Enabled]
            HPA
            Power Management
            Security
            SMART [Enabled]
            DCO
            48bit Address

I need a bit more help to identify whether the issue is related to the tool or the drive FW.

  1. Take a good drive with no issue. Make sure the sector size is 512b.
  2. Run identify on that drive and check that the drive supports power management.
  3. Do the EPC time experiment to show the transition from idle_a to idle_c.
  4. Set the sector size to 4k.
  5. Run identify on that drive and check that the drive supports power management.
  6. Do the EPC time experiment to show the transition from idle_a to idle_c.
  7. Set the sector size to 512b.
  8. Run identify on that drive and check that the drive supports power management.
  9. Do the EPC time experiment to show the transition from idle_a to idle_c.
  10. Change the drive FW.
  11. Run identify on that drive and check that the drive supports power management.
  12. Do the EPC time experiment to show the transition from idle_a to idle_c.
  13. Power cycle the drive.
  14. Run identify on that drive and check that the drive supports power management.
  15. Do the EPC time experiment to show the transition from idle_a to idle_c.

Steps 1-6 will help us identify the main issue. We need this info to push it in the right direction.
Steps 7-9 will confirm whether the issue is reversible by changing the sector size. This is an optional step.
Steps 10-12 will confirm whether changing to a different FW version fixes the issue. This is also an optional step.
Steps 13-15 will confirm whether a power cycle fixes the issue. Again, it is an optional step.
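The "EPC time experiment" in the steps above amounts to checking the power condition repeatedly while the drive sits idle. Below is a minimal Python sketch of that loop, assuming `openSeaChest_PowerControl` is in the working directory and that its `--checkPowerMode` output uses the wording seen in this thread (version 3.3.1; other versions may differ). The 15-minute interval and sample count are arbitrary choices; keep the interval longer than the timer under test. It is also an assumption that the check itself does not disturb the EPC timers, although the roughly 15-minute checks posted later in this thread still observed Idle_a and then Idle_c.

```python
# Poll `openSeaChest_PowerControl --checkPowerMode` while a drive idles.
# Assumptions: tool path, interval, and the exact output wording (copied
# from openSeaChest_PowerControl 3.3.1 output in this thread).
import re
import subprocess
import sys
import time
from datetime import datetime
from typing import List, Tuple

# e.g. "... and the device is in the Idle_c power condition"
_CONDITION_RE = re.compile(r"device is in the (\w+) power condition", re.IGNORECASE)
_GENERIC = "PM0: Active state or PM1: Idle State"


def parse_power_mode(output: str) -> str:
    """Map --checkPowerMode output to a short label.

    Returns the named EPC condition (e.g. "Idle_c") when one is reported,
    "active_or_idle" for the generic PM0/PM1 line, otherwise "unknown".
    """
    match = _CONDITION_RE.search(output)
    if match:
        return match.group(1)
    if _GENERIC in output:
        return "active_or_idle"
    return "unknown"


def poll(device: str, interval_s: int = 900, samples: int = 8,
         tool: str = "./openSeaChest_PowerControl") -> List[Tuple[str, str]]:
    """Check the power mode `samples` times, `interval_s` seconds apart."""
    history = []
    for i in range(samples):
        result = subprocess.run([tool, "-d", device, "--checkPowerMode"],
                                capture_output=True, text=True, check=False)
        stamp = datetime.now().isoformat(timespec="seconds")
        label = parse_power_mode(result.stdout)
        history.append((stamp, label))
        print(f"{stamp}  {device}  {label}", flush=True)
        if i < samples - 1:
            time.sleep(interval_s)
    return history


if __name__ == "__main__" and len(sys.argv) > 1:
    poll(sys.argv[1])  # e.g. /dev/sg0
```

A drive whose timers work should move from "active_or_idle" to "Idle_a" and later "Idle_c"; a drive stuck like the ones in this thread keeps reporting "active_or_idle".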

Kjubyte commented on July 22, 2024

Hi, thank you for helping me break down this issue.

Of course I can do that. By "identify command" you mean "seachest -d /dev/sg0 -i", right? I don't want to make any mistakes, as I currently have only one drive with working EPC. If you want to add more tests, please let me know. I'll do the tests this evening.

A few questions I can already answer - I already did a lot of testing, but I'll do the tests again later:

Can you run an identify command (seachest -d /dev/sg0) on the drive with issue and confirm if it has power management enabled.

Yes, power management is enabled. In the past I also tried to disable power management and enable it again with --EPCfeature disable/enable, with no effect at all. A manual power transition with --transitionPower idle_c always works.
Full output:
root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 -i
==========================================================================================
 openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_PowerControl Version: 3.3.1-4_1_1 X86_64
 Build Date: Mar 27 2023
 Today: Wed Jul  5 12:15:30 2023        User: root
==========================================================================================

/dev/sg0 - ST16000NM000J-2TW103 - ... - SN02 - ATA
        Model Number: ST16000NM000J-2TW103
        Serial Number: ...
        Firmware Revision: SN02
        World Wide Name: 5000C500E53D60E1
        Date Of Manufacture: Week 26, 2022
        Drive Capacity (TB/TiB): 16.00/14.55
        Native Drive Capacity (TB/TiB): 16.00/14.55
        Temperature Data:
                Current Temperature (C): 29
                Highest Temperature (C): 46
                Lowest Temperature (C): 25
        Power On Time:  5 days 21 hours 
        Power On Hours: 141.00
        MaxLBA: 31251759103
        Native MaxLBA: 31251759103
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Sector Alignment: 0
        Rotation Rate (RPM): 7200
        Form Factor: 3.5"
        Last DST information:
                DST has never been run
        Long Drive Self Test Time:  23 hours 39 minutes 
        Interface speed:
                Max Speed (Gb/s): 6.0
                Negotiated Speed (Gb/s): 6.0
        Annualized Workload Rate (TB/yr): 0.01
        Total Bytes Read (MB): 103.04
        Total Bytes Written (KB): 12.29
        Encryption Support: Not Supported
        Cache Size (MiB): 256.00
        Read Look-Ahead: Enabled
        Write Cache: Enabled
        Low Current Spinup: Disabled
        SMART Status: Good
        ATA Security Information: Supported, Frozen
        Firmware Download Support: Full, Segmented, Deferred
        Specifications Supported:
                ACS-4
                ACS-3
                ACS-2
                ATA8-ACS
                ATA/ATAPI-7
                ATA/ATAPI-6
                ATA/ATAPI-5
                SATA 3.3
                SATA 3.2
                SATA 3.1
                SATA 3.0
                SATA 2.6
                SATA 2.5
                SATA II: Extensions
                SATA 1.0a
                ATA8-AST
        Features Supported:
                Sanitize
                SATA NCQ
                SATA Software Settings Preservation [Enabled]
                SATA Device Initiated Power Management
                Power Management
                Security
                SMART [Enabled]
                48bit Address
                PUIS
                GPL
                Streaming
                SMART Self-Test
                SMART Error Logging
                Write-Read-Verify
                DSN
                AMAC
                EPC [Enabled]
                Sense Data Reporting [Enabled]
                SCT Write Same
                SCT Error Recovery Control
                SCT Feature Control
                SCT Data Tables
                Host Logging
                Set Sector Configuration
                Storage Element Depopulation + Restore
                Field Accessible Reliability Metrics (FARM)
                Seagate In Drive Diagnostics (IDD)
        Adapter Information:
                Adapter Type: PCI
                Vendor ID: 1022h
                Product ID: 43C8h
                Revision: 0001h

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg1 -i
==========================================================================================
 openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_PowerControl Version: 3.3.1-4_1_1 X86_64
 Build Date: Mar 27 2023
 Today: Wed Jul  5 12:16:08 2023        User: root
==========================================================================================

/dev/sg1 - ST16000NM000J-2TW103 - ... - SN04 - ATA
        Model Number: ST16000NM000J-2TW103
        Serial Number: ...
        Firmware Revision: SN04
        World Wide Name: 5000C500E47616E2
        Date Of Manufacture: Week 11, 2023
        Drive Capacity (TB/TiB): 16.00/14.55
        Native Drive Capacity (TB/TiB): 16.00/14.55
        Temperature Data:
                Current Temperature (C): 27
                Highest Temperature (C): 34
                Lowest Temperature (C): 25
        Power On Time:  1 hour 
        Power On Hours: 1.00
        MaxLBA: 31251759103
        Native MaxLBA: 31251759103
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Sector Alignment: 0
        Rotation Rate (RPM): 7200
        Form Factor: 3.5"
        Last DST information:
                DST has never been run
        Long Drive Self Test Time:  23 hours 13 minutes 
        Interface speed:
                Max Speed (Gb/s): 6.0
                Negotiated Speed (Gb/s): 6.0
        Annualized Workload Rate (TB/yr): 0.08
        Total Bytes Read (MB): 8.59
        Total Bytes Written (B): Not Reported
        Encryption Support: Not Supported
        Cache Size (MiB): 256.00
        Read Look-Ahead: Enabled
        Write Cache: Enabled
        Low Current Spinup: Disabled
        SMART Status: Good
        ATA Security Information: Supported, Frozen
        Firmware Download Support: Full, Segmented, Deferred
        Specifications Supported:
                ACS-4
                ACS-3
                ACS-2
                ATA8-ACS
                ATA/ATAPI-7
                ATA/ATAPI-6
                ATA/ATAPI-5
                SATA 3.3
                SATA 3.2
                SATA 3.1
                SATA 3.0
                SATA 2.6
                SATA 2.5
                SATA II: Extensions
                SATA 1.0a
                ATA8-AST
        Features Supported:
                Sanitize
                SATA NCQ
                SATA Software Settings Preservation [Enabled]
                SATA Device Initiated Power Management
                Power Management
                Security
                SMART [Enabled]
                48bit Address
                PUIS
                GPL
                Streaming
                SMART Self-Test
                SMART Error Logging
                Write-Read-Verify
                DSN
                AMAC
                EPC [Enabled]
                Sense Data Reporting [Enabled]
                SCT Write Same
                SCT Error Recovery Control
                SCT Feature Control
                SCT Data Tables
                Host Logging
                Set Sector Configuration
                Storage Element Depopulation + Restore
                Field Accessible Reliability Metrics (FARM)
                Seagate In Drive Diagnostics (IDD)
        Adapter Information:
                Adapter Type: PCI
                Vendor ID: 1022h
                Product ID: 43C8h
                Revision: 0001h

Steps 7-9 will confirm whether the issue is reversible by changing the sector size. This is an optional step.
Steps 10-12 will confirm whether changing to a different FW version fixes the issue. This is also an optional step.
Steps 13-15 will confirm whether a power cycle fixes the issue. Again, it is an optional step.

It's not reversible, at least I couldn't find a way. I tried changing the sector size back to 512b, I tried different firmware versions (I updated the test drive sg0 to SN04 and back to SN02), and I also did multiple power cycles. But I'll do the tests again so we have proof.

Kjubyte commented on July 22, 2024

I have one question about step 10:

Change Drive FW

My only drive with working power management already came with the most recent firmware version SN04. Should I install SN04 again or downgrade to SN03 or SN02?

DebabrataSTX commented on July 22, 2024

Yes, by "identify command" I meant "seachest -d /dev/sg0 -i". I asked for it just to make sure the FW is not disabling power management after the sector size is set to 4k. But if you can confirm that power management is always enabled, we can skip that step.
With this data it will be easier to bring in the FW team.
For the FW change, just downgrade to SN03. That should do.

Kjubyte commented on July 22, 2024

I just finished all the tests. I did all of them with the new, working drive I used in my first post. As you'll see, right after setting the sector size to 4096b, power management doesn't work anymore. No idle_a, no idle_b, and of course no idle_c; it's just always active. That's exactly what I experienced with multiple other 16TB drives (IronWolf Pro, Exos X16 and X18).

I trimmed the following code blocks so this post isn't too long. For the complete output with all executed commands, see this archive: tests.tar.gz

  1. Take a good drive with no issue. Make sure the sector size is 512b.
  2. Run identify on that drive and check that the drive supports power management.
  3. Do the EPC time experiment to show the transition from idle_a to idle_c.
  4. Set the sector size to 4k.
  5. Run identify on that drive and check that the drive supports power management.
  6. Do the EPC time experiment to show the transition from idle_a to idle_c.

Before setting the sector size to 4096b the drive goes into Idle_c. After setting the sector size to 4096b, EPC is still enabled but power management isn't working anymore.

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 -i
...
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Features Supported:
                EPC [Enabled]

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 12:02:13 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
Device is in the PM1: Idle state and the device is in the Idle_a power condition

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 12:16:19 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
Device is in the PM1: Idle state and the device is in the Idle_c power condition

root@Nebula:/opt/openSeaChest# ./openSeaChest_Format -d /dev/sg0 --setSectorSize 4096 --confirm this-will-erase-data-and-may-render-the-drive-inoperable
Successfully set sector size to 4096

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 -i
        Logical Sector Size (B): 4096
        Physical Sector Size (B): 4096
        Features Supported:
                EPC [Enabled]

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 12:26:05 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
Device is in the PM0: Active state or PM1: Idle State

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 12:41:01 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
Device is in the PM0: Active state or PM1: Idle State

  7. Set the sector size to 512b.
  8. Run identify on that drive and check that the drive supports power management.
  9. Do the EPC time experiment to show the transition from idle_a to idle_c.

Setting sector size to 512b makes no difference. EPC is still enabled but not working.

root@Nebula:/opt/openSeaChest# ./openSeaChest_Format -d /dev/sg0 --setSectorSize 512 --confirm this-will-erase-data-and-may-render-the-drive-inoperable
Successfully set sector size to 512

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 -i
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Features Supported:
                EPC [Enabled]

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 12:47:45 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
Device is in the PM0: Active state or PM1: Idle State

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 13:04:55 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
Device is in the PM0: Active state or PM1: Idle State

  10. Change the drive FW.
  11. Run identify on that drive and check that the drive supports power management.
  12. Do the EPC time experiment to show the transition from idle_a to idle_c.

After changing the drive FW, I need to enable the idle_c timer again (as expected), but power management still doesn't work.

root@Nebula:/opt/openSeaChest# ./openSeaChest_Firmware -d /dev/sg0 --downloadFW /root/EvansBPExosX18SATA-STD-512E-SN03.LOD 
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN04 - ATA
......
Firmware Download successful
Firmware Download time (s): 5.17
Average time/segment  (ms): 42.96
Activate Time          (s): 3.52
New firmware version is SN03

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 -i
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN03 - ATA
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Features Supported:
                EPC [Enabled]

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --idle_c enable
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN03 - ATA
Successfully configured the requested EPC settings.

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 13:30:24 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN03 - ATA
Device is in the PM0: Active state or PM1: Idle State

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 14:07:21 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN03 - ATA
Device is in the PM0: Active state or PM1: Idle State

  13. Power cycle the drive.
  14. Run identify on that drive and check that the drive supports power management.
  15. Do the EPC time experiment to show the transition from idle_a to idle_c.

I power cycled the entire PC. I even removed the power cord for a few seconds. No difference.

root@Nebula:/opt/openSeaChest# uptime
 14:11:26 up 0 min,  2 users,  load average: 1,09, 0,41, 0,15

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 -i
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Features Supported:
                EPC [Enabled]

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 14:12:55 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN03 - ATA
Device is in the PM0: Active state or PM1: Idle State

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sg0 --checkPowerMode
 Today: Fri Jul  7 14:40:37 2023        User: root
/dev/sg0 - ST16000NM000J-2TW103 - ... - SN03 - ATA
Device is in the PM0: Active state or PM1: Idle State

DebabrataSTX commented on July 22, 2024

This is what I received from another team. They suspect that background operations are keeping the drive busy. I am trying to get more information on the estimated time for this background task. I would recommend checking the power mode of these drives after a few hours (or a day). Let me know if you see a change in power mode after a longer wait.


It's most likely that the drive is operating as expected.

The openSeaChest_PowerControl -i command simply requests device information regarding the state of the Extended Power Conditions (EPC) feature set. This command does not tell the drive to transition into a power condition.

Issuing the --checkPowerMode command just tells you the current power condition of the drive, be it Active, Idle_A, Idle_B, Idle_C, or Standby.

If the drive is performing any background maintenance tasks, an EPC timer-based transition (for example, to Idle_A or Idle_C) will be delayed to ensure device health and data integrity first.
I would suggest first dumping the current EPC power condition timer configuration to ensure the timers are still enabled and set. If they are, then the drive is most likely not moving to Idle_A or Idle_C because it is handling higher-priority background tasks to ensure the health of the storage device. Use the openSeaChest_PowerControl --showEPCSettings command to view the current power conditions along with their associated timer transition values and the enabled/disabled state of each timer.
To force a transition into a power condition outside of timer-based control, the openSeaChest_PowerControl --transitionPower [active | idle | idleUnload | standby | idle_a | idle_b | idle_c | standby_y | standby_z | sleep] command can be used.
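The advice above boils down to two commands: dump the timers with --showEPCSettings and, if needed, force a state with --transitionPower. Here is a minimal Python sketch that builds those command lines and rejects state names not in the list quoted above. The helper names are hypothetical, and it assumes `openSeaChest_PowerControl` is on the PATH; it only prints the commands, so pass the lists to `subprocess.run` to actually execute them.

```python
# Build openSeaChest_PowerControl command lines for the EPC checks above.
# The state names are copied from the --transitionPower usage quoted in
# this comment; helper names are hypothetical.
import shlex
from typing import List

TRANSITION_STATES = frozenset({
    "active", "idle", "idleUnload", "standby", "idle_a", "idle_b",
    "idle_c", "standby_y", "standby_z", "sleep",
})


def show_epc_settings_cmd(device: str) -> List[str]:
    """Dump the EPC timers and their enabled/disabled state."""
    return ["openSeaChest_PowerControl", "-d", device, "--showEPCSettings"]


def transition_power_cmd(device: str, state: str) -> List[str]:
    """Force an immediate transition, bypassing the EPC timers."""
    if state not in TRANSITION_STATES:
        raise ValueError(f"unsupported power state: {state!r}")
    return ["openSeaChest_PowerControl", "-d", device, "--transitionPower", state]


if __name__ == "__main__":
    # Print (do not run) the commands for a hypothetical /dev/sg0.
    print(shlex.join(show_epc_settings_cmd("/dev/sg0")))
    print(shlex.join(transition_power_cmd("/dev/sg0", "idle_c")))
```

Validating the state name up front keeps a typo from being passed through to the tool.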


Kjubyte commented on July 22, 2024

Hi,
they might be right - sort of. I just discovered the drives in my storage server going into idle_a again. And also idle_c works. I changed the sector size months ago. But I'm sure it must have taken weeks or even months until EPC timers worked again. I recently did change the HBA, because I thought maybe the HBA is the issue, but it didn't change anything at this time. Now magically it works again. So I'm really not sure if it's just "time".

Power management on the two test drives used in this issue still doesn't work. The system has now been running for 50 hours, as you can see in the output below. Also, one drive (sg0 from my first post) already had more than 100 Power_On_Hours after the sector size change when I opened this issue. It's still the same PC with a fresh Debian 12 install and no load. I would really like to know how long those background tasks last, if there are any.

If anything changes, I'll post an update.

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sda --checkPowerMode
==========================================================================================
 openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_PowerControl Version: 3.3.1-4_1_1 X86_64
 Build Date: Mar 27 2023
 Today: Fri Jul 14 14:47:44 2023        User: root
==========================================================================================

/dev/sg1 - ST16000NM000J-2TW103 - ... - SN03 - ATA
Device is in the PM0: Active state or PM1: Idle State

root@Nebula:/opt/openSeaChest# ./openSeaChest_PowerControl -d /dev/sdb --checkPowerMode
==========================================================================================
 openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_PowerControl Version: 3.3.1-4_1_1 X86_64
 Build Date: Mar 27 2023
 Today: Fri Jul 14 14:47:47 2023        User: root
==========================================================================================

/dev/sg0 - ST16000NM000J-2TW103 - ... - SN02 - ATA
Device is in the PM0: Active state or PM1: Idle State

root@Nebula:/opt/openSeaChest# uptime 
 14:47:53 up 2 days,  2:13,  2 users,  load average: 0,00, 0,00, 0,00
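To measure how long the drives stay busy, one option is to poll --checkPowerMode and record only the moments when the reported state changes. This is a sketch under stated assumptions: the helper names and tool path are hypothetical, the state line is assumed to start with "Device is in the" as above, and the check, clock, and sleep functions are injectable so the logic can be exercised without a drive.

```python
import subprocess
import time
from typing import Callable, List, Optional, Tuple

PREFIX = "Device is in the "


def check_power_mode(device: str,
                     tool: str = "./openSeaChest_PowerControl") -> Optional[str]:
    """Run --checkPowerMode once; return the reported state text, or None."""
    result = subprocess.run([tool, "-d", device, "--checkPowerMode"],
                            capture_output=True, text=True, check=False)
    for line in result.stdout.splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return None


def log_transitions(check: Callable[[], Optional[str]], polls: int,
                    interval_s: float,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep,
                    ) -> List[Tuple[float, Optional[str]]]:
    """Poll `check` and record (seconds since start, state) on every change."""
    start = clock()
    changes: List[Tuple[float, Optional[str]]] = []
    last: object = object()  # sentinel: the first reading is always recorded
    for i in range(polls):
        state = check()
        if state != last:
            changes.append((clock() - start, state))
            last = state
        if i + 1 < polls:  # no pointless sleep after the final poll
            sleep(interval_s)
    return changes


# Example (needs root and the tool): poll every 5 minutes for 24 hours.
# for elapsed, state in log_transitions(lambda: check_power_mode("/dev/sda"),
#                                       polls=288, interval_s=300):
#     print(f"{elapsed / 3600:6.2f} h  {state}")
```

Logging only changes keeps a multi-day run short: a drive stuck in background work produces a single line until it finally drops into a lower-power state.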

vonericsen commented on July 22, 2024

In #111 we had a similar issue reported.
The solution in that case was updating the HBA firmware for the LSI controller they were using. Something in that firmware update corrected a controller behavior that was keeping the drive awake. This was not listed in the release notes for the update, but the fix for a different, reported PHY issue happened to resolve this issue for that user too.
If you are still using an LSI/Avago/Broadcom HBA, have you tried updating its firmware yet?

knirski commented on July 22, 2024

I am in a similar situation with four Seagate ST16000NM000J-2TW103 drives. Three days after using the fast format feature to change the sector size to 4096, EPC still doesn't seem to be working, and probably as a result the disks are noisy even when they're idle.

When I invoke

./openSeaChest_PowerControl -d /dev/sg0 --EPCfeature enable

the feature shows as enabled in disk info listing

./openSeaChest_Basics -d /dev/sg0 -i | grep EPC
                EPC [Enabled]

but it doesn't seem to work: --checkPowerMode always returns "Device is in the PM0: Active state or PM1: Idle State".

When I power cycle the drive, EPC seems to be disabled again:

./openSeaChest_Basics -d /dev/sg0 -i | grep EPC
                EPC

Should I wait patiently until the disks finish their background processes (and what are they)? Can I use them normally before that? Thanks in advance.
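The two listings above differ only in the bracketed flag, so a script can tell them apart after each power cycle. A minimal Python sketch, assuming the feature line looks exactly like those listings ("EPC [Enabled]" versus a bare "EPC"); the function name is hypothetical, and other openSeaChest versions may format this line differently.

```python
import re
from typing import Optional

# One feature per line, optionally followed by a bracketed flag such as
# "[Enabled]". Assumption: this mirrors the -i listings quoted above.
_EPC_LINE = re.compile(r"\s*EPC(\s+\[(?P<flag>[^\]]+)\])?\s*")


def epc_enabled(info_output: str) -> Optional[bool]:
    """True/False for the EPC line in `-i` output, or None if there is no EPC line."""
    for line in info_output.splitlines():
        match = _EPC_LINE.fullmatch(line)
        if match:
            return match.group("flag") == "Enabled"
    return None


print(epc_enabled("                EPC [Enabled]\n"))  # True
print(epc_enabled("                EPC\n"))            # False
```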

knirski commented on July 22, 2024

I have an update: about 5 days after changing the sector size, the disks started transitioning to the idle_a and idle_b modes. This seems consistent with @Kjubyte's story, and it does look like some kind of long-running background process was triggered, preventing the EPC timers from working correctly. The audible chirping stopped as well, so it seems to be directly connected to the change.

I ran some tests before this fixed itself, including --idd short, and nothing seemed to stop this mysterious background process, so in this respect it contradicts @lbogdan's story. EDIT: or maybe IDD did help, only with a substantial delay (several hours)? I really can't tell.

lbogdan commented on July 22, 2024

Indeed, it looks like two different issues. In your case, I guess the background process triggered by changing the sector size has to run to completion and can't be stopped. While it's running (presumably continuously writing metadata to the disk), it makes sense that it keeps the drive from going into idle / standby.

lbogdan commented on July 22, 2024

Unfortunately it looks like my storage saga is not over yet - after managing to get EPC working on all 3 drives, I copied a few TB of data over from a decommissioned array, restarted the machine, and... now all 3 drives won't go into idle / standby.

They just sit in "Device is in the PM0: Active state or PM1: Idle State" indefinitely, even though EPC is enabled and the timers for all idle states are set; I haven't changed any settings since EPC was working fine. I also tried moving them to the motherboard's integrated SATA controller, with the same behavior.

I can manually force them into a sleep state with --transitionPower, and they remain there, except for idle_a, which after a few seconds goes back into "Device is in the PM0: Active state or PM1: Idle State" (which seems weird?).

For now I worked around it by writing a tiny script that reads device stats from /sys/block/sd*/stat and manually puts the drives into idle when there's no activity, but this is still quite frustrating, as they're supposed to do that automatically.
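A workaround like this might look roughly as follows. This is a hypothetical Python sketch, not lbogdan's actual script: it samples the completed-read, completed-write, and in-flight counters from /sys/block/<dev>/stat (field positions per the Linux kernel's block-layer statistics documentation) and forces a transition with --transitionPower after a run of quiet samples. The interval, sample count, tool path, and default idle_b target are assumptions; idle_b is used because idle_a was observed above to bounce back to active within seconds.

```python
import subprocess
import time
from pathlib import Path
from typing import Tuple

# Positions in /sys/block/<dev>/stat (Linux block-layer statistics):
# 0 = reads completed, 4 = writes completed, 8 = I/Os currently in flight.
READS, WRITES, IN_FLIGHT = 0, 4, 8

Sample = Tuple[int, int, int]


def parse_stat(text: str) -> Sample:
    """Extract (reads completed, writes completed, in flight) from a stat line."""
    fields = [int(x) for x in text.split()]
    return fields[READS], fields[WRITES], fields[IN_FLIGHT]


def is_quiet(prev: Sample, cur: Sample) -> bool:
    """True if no read/write completed since `prev` and nothing is in flight."""
    return prev[:2] == cur[:2] and cur[2] == 0


def watch(dev: str, interval_s: float = 60.0, quiet_samples: int = 10,
          target: str = "idle_b",
          tool: str = "./openSeaChest_PowerControl") -> None:
    """Hypothetical watcher: force `target` once the disk stays quiet long enough."""
    stat_path = Path("/sys/block") / dev / "stat"
    prev = parse_stat(stat_path.read_text())
    quiet, forced = 0, False
    while True:
        time.sleep(interval_s)
        cur = parse_stat(stat_path.read_text())
        if is_quiet(prev, cur):
            quiet += 1
        else:
            quiet, forced = 0, False  # new I/O seen: start counting again
        if quiet >= quiet_samples and not forced:
            subprocess.run([tool, "-d", f"/dev/{dev}", "--transitionPower", target],
                           check=False)
            forced = True  # don't re-issue the command on every quiet sample
        prev = cur
```

Run it as root, e.g. watch("sda"); openSeaChest accepts /dev/sda directly, as the sessions in this thread show.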

@vonericsen Thanks for the detailed comment, any suggestion of how I can troubleshoot this further?

vonericsen commented on July 22, 2024

@lbogdan,

I will have to ask around to see if I can get more information from a firmware engineer about what is going on based on your description, since I'm not sure what would be happening.

I can manually force them into a sleep state with --transitionPower, and they remain there, except for idle_a, which after a few seconds goes back into "Device is in the PM0: Active state or PM1: Idle State" (which seems weird?).

idle_a is a state where the heads are still loaded above the media, and it is backwards compatible with the idle state of drives before EPC, so it's possible that the firmware handles this state a little differently, treating it as a time to start some background activity since the heads are already loaded above the media... This is just a guess. I'll see if I can get a better answer.

vonericsen commented on July 22, 2024

@lbogdan,

I've asked a few people for ideas about what may be causing the background activity.
There is not really a good way to know the exact cause, but there are some different factors that may be part of this.

From your description, the most likely cause is that writing a bunch of files may have left them somewhat fragmented, so the drive is doing some background updates after copying these files over. This can vary a lot depending on the filesystem and how it chooses to write files.
For example, if you copied a lot of small files, they may be spaced out slightly, leaving some "unchanged" portions of the disk in between. To update all of its background reliability/data integrity areas, the drive then has to go back and read both the written data and the unwritten areas. This also varies with the sizes of the files. Copying a really large file (say a 2-hour-long 4K video) is a very large sequential write and may not need any background activity after it is written, but copying lots of small documents may require more work, since they may be somewhat fragmented when written. Some filesystems also track files/file changes in metadata, which may be at a different location on the medium, and writing that small update triggers some background activity in that area of the disk.
This is just one potential case, so I'm not sure how likely it is. But if it is the case, the background activity will only apply to where the recently written files are located on the medium, and it should not take days to complete, unlike after a fast format, where the drive has to update all of its available space.

I did ask about idle_a and my guess above was correct. Since the heads are still above the medium and it's not actively receiving other requests, the firmware will start performing some background activity in this state if it has anything it needs to do. Any lower power states that unload the heads will not allow background activity to run from what I was told.

There are other background activities that the drive will do, some are timer based and will start when in a power state that it's allowed to start this activity (active or idle_a), and others get scheduled as the drive is read or written on an as-needed basis depending on what the firmware is seeing for performance, reliability, and data integrity factors.
The firmware can also reschedule/delay certain background activities if the drive is forced into a lower power mode, or receives certain commands that are processing with a higher priority (reads, writes, even SMART self-tests can delay background activity).

There is not really a good way to know which of these background activities happened, as there are too many variables.

lbogdan commented on July 22, 2024

@vonericsen Thanks a lot (again!) for the detailed explanation.

So the working hypothesis is that the copying of those few TB of data triggered some (internal) background operation(s) that keep the drives from going into idle / standby? The weird thing about that is that putting my ear near the drives, it doesn't sound like there's any physical read/write activity.

From what you're saying, I also understand there's currently no way to introspect these background operations, right?

For now I'll just leave the drives in the active state and follow up in a few days.

UPDATE: So I put the drives in active before starting to write this, and while I was writing 2 (out of 3) of them started going into idle again! I'll follow up with the status of the third drive, hopefully it will follow suit soon.

lbogdan commented on July 22, 2024

As (hopefully) expected, the third drive started going into idle about one hour after I wrote the previous comment. So all three drives are now working as expected with regard to EPC.

So, for others reading this: if your EPC timers don't seem to work (especially after a write-intensive operation, like changing the sector size or writing a lot of data), just leave the drives in the active state for some time, and they will hopefully start going into idle again, eventually.

vonericsen commented on July 22, 2024

@lbogdan,

Thanks for providing your updates!

So the working hypothesis is that the copying of those few TB of data triggered some (internal) background operation(s) that keep the drives from going into idle / standby? The weird thing about that is that putting my ear near the drives, it doesn't sound like there's any physical read/write activity.

Yeah, and it depends very much, case by case, on how much data was written, where it went on the medium, and which file system is on the drive.
If the sector size has not been changed or the drive has already finished the post sector size change background work, then it should not take as long for the background activity to complete since it will only be running the background activity around the recently written data.

From what you're saying, I also understand there's currently no way to introspect these background operations, right?

This is correct, there is not currently a way to modify them or get more information about them.

So, for others reading this: if your EPC timers don't seem to work (especially after a write-intensive operation, like changing the sector size or writing a lot of data), just leave the drives in the active state for some time, and they will hopefully start going into idle again, eventually.

This matches the advice I got from the people I was asking internally. Just give it some time to complete; it will complete. It's the initial background activity after a format that takes the longest, but after that it should be quicker.

vonericsen commented on July 22, 2024

Marking this as closed, since I added a note/warning that is shown after the set sector size and fast-format options complete, informing the user that EPC timers may be ignored while other background activity is running. This message is in the latest release (v23.12) and has been merged to both the master and develop branches.

If you have any other trouble, please reopen this issue or create a new one and we will do our best to resolve it!
