Comments (15)
They're still not going to sleep on their own, but since I posted they have spent 12h a day (overnight) in standby_z, because I transition them manually. For the first 3 days I left them in active mode hoping they would finish whatever they were busy with, but since then I have set up a script that puts them in standby_z for the night. (And they do stay in that mode, so the host is definitely not issuing any stray commands.)
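A nightly script like the one described could look roughly like the sketch below. It assumes openSeaChest_PowerControl is on the PATH and that the build in use accepts `-d <handle> --transitionPower <state>` (worth confirming against `--help`). The function names and the example handles are hypothetical and not taken from the poster's script.

```python
import subprocess

# Assumption: the tool is on PATH and accepts "-d <handle> --transitionPower <state>".
TOOL = "openSeaChest_PowerControl"


def standby_commands(handles, state="standby_z"):
    """Build one command line per drive handle without running anything."""
    return [[TOOL, "-d", handle, "--transitionPower", state] for handle in handles]


def force_standby(handles, state="standby_z"):
    """Run the transition for every handle and return the handles that failed."""
    failed = []
    for handle, cmd in zip(handles, standby_commands(handles, state)):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(handle)
    return failed
```

Calling `force_standby(["/dev/sg0", "/dev/sg1"])` from a cron job or systemd timer in the evening would approximate the manual overnight transition; the handles here are placeholders.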
Model Number: ST6000VN001-2BB186
Firmware Revision: SC60
I did reboot for sure. I think I did a full power off but I'm not 100% sure anymore. I will retry that when I get the chance.
(At the moment the machine they are in is being used, so it will probably have to wait until next weekend.)
from openseachest.
It's a small NAS exposing volumes through NFS and Samba.
- All drives are connected directly to the SATA ports of the motherboard. It's an "MSI MPG B560I GAMING EDGE WIFI" running BIOS 7D19v18.
- OS is Debian Bookworm stable
- Drives are split: some parts run RAID0, some RAID1 and some RAID5. They all host encrypted btrfs volumes. There is a write-through bcache layer over them, backed by an SSD, to speed up access and to avoid waking the drives when reading small, frequently accessed files.
- No out of band management whatsoever.
- There is a smartd running. I increased its poll period to make sure the drive can at least go into idle_c between two polls (and once smartd sees the drive is in idle_c, it stops polling). Just to make sure it wasn't the problem, I also tried setting short timeouts for idle_b / idle_c and stopping smartd, but that didn't help.
- I also put the drives in standby_z and they remain there (unless I go and access the exposed share from a client, obviously).
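The poll-period reasoning in the smartd item can be written as a small check. This is a sketch under the assumption that each smartd poll restarts the drive's idle timers, so the poll interval has to be longer than the idle_c timer plus some slack. The function name, the margin and the example numbers are illustrative, not values from this system.

```python
def poll_leaves_room_for_idle_c(poll_interval_s, idle_c_timer_s, margin_s=60):
    """True if the drive can reach idle_c between two consecutive polls.

    Assumption: every poll restarts the idle timers, so the gap between
    polls must cover the idle_c timer plus a safety margin.
    """
    return poll_interval_s >= idle_c_timer_s + margin_s


# Hypothetical numbers: a 30-minute idle_c timer with two candidate poll periods.
for poll in (1800, 3600):
    print(poll, poll_leaves_room_for_idle_c(poll, idle_c_timer_s=1800))
```

With these made-up numbers the 1800 s poll period would keep resetting the timer just before idle_c is reached, while 3600 s leaves room for the transition.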
Some other info :
- On one of the drives I ran an "internal short self test", as someone in the other issue said it might have helped. I didn't notice any change from that.
- On another one I ran a short self test to see if it would make it snap out of it ... no change either.
B560 is an Intel chipset (here running an i3-10105). Yeah, B550 is AMD and B560 is Intel; confusing, I know ...
==========================================================================================
openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
openSeaChest_PowerControl Version: 3.4.0-6_2_0 X86_64
Build Date: Dec 1 2023
Today: Mon Jan 29 18:00:14 2024 User: root
==========================================================================================
/dev/sg0 - ST6000VN001-2BB186 - ZR11NESD - SC60 - ATA
---Low Level tDevice information---
---Drive Info---
media type: HDD
drive type: ATA
interface type: IDE/ATA
zoned type: not zoned
---adapter info---
PCI/PCIe:
VendorID: 8086h
ProductID: 43D2h
Revision: 0011h
---driver info---
driver name: ahci
driver version string: 3.0
major ver: 3
minor ver: 0
---ata flags---
SCSI Version: 7
---Passthrough Hacks---
Passthrough type: SAT/system/none
---SCSI Hacks---
---NVMe Hacks---
---ATA Hacks---
---OS Info---
handle name: /dev/sg0
friendly name: sg0
minimum memory alignment: 8
---Linux Unique info---
FD is valid
Second Handle name: /dev/sda
Second Handle friendly name: sda
SG Driver Version:
Major: 3
Minor: 5
Revision: 36
OS read-write recommended: false
last recorded error: 22
File system Info:
No active file systems detected
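When comparing dumps like the one above across several drives or machines, the "key: value" lines can be collected into a dictionary. The sketch below is a hypothetical helper and not part of openSeaChest. It groups keys under the most recent `---section---` header and skips lines without a colon. Nested blocks such as "SG Driver Version:" are flattened, because the indentation is not available in the pasted output.

```python
import re

SECTION = re.compile(r"^---(.+)---$")


def parse_ll_info(text):
    """Group "key: value" lines under the most recent ---section--- header."""
    sections = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, {})
            continue
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (banners, "FD is valid") are skipped
            sections[current][key.strip()] = value.strip()
    return sections
```

For the dump above, `parse_ll_info(output)["driver info"]["driver name"]` would give the detected driver, which makes it easy to confirm that every drive is going through the same AHCI path.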
We have an existing thread on a similar issue. Please have a look into that, it might help.
#117
Thanks for your response.
I've already looked at that thread and was thinking it could maybe be background activity related to the SMART test somehow, but there is no activity from the host so I don't know if there is something still running on the drives? Any way to check?
It seems odd that it magically starts working again after a certain amount of hours. However I'm not really able to let them run full tilt all the time until it fixes itself due to temperature and power constraints.
Hi @luukrijnbende,
I have been asking around about this to try and get an idea what is happening.
There is not enough information to know for sure, but the best guess is that the SMART self-test (long DST) paused some background activity that normally runs based on timing, so once the drive finished the SMART self-test the drive has been trying to catch up and finish that background work when it can.
There are lots of different kinds of background tasks in the drive, some are run periodically in allowed power states (active, idle_a) and others get scheduled as needed when the drive is used (reads, writes, etc). Background activity can be scheduled if it is for health monitoring, performance, reliability, or data-integrity reasons, so it is also entirely possible that something else triggered it that may not even be related to the SMART self-test.
Background activity can be paused for reads, writes, and even a long self-test since the background activity is considered lower-priority than servicing these host requested commands/operations.
Once the drive has enough idle time between these requests, it will attempt to do these background tasks.
Lower power modes that unload the heads (idle_b, idle_c, standby_y, standby_z) are not allowed to start background activity, and entering them also pauses anything the drive has scheduled to run in the background.
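The rule described in this comment can be restated as a tiny model. It only encodes what the comment says (active and idle_a may run background work; idle_b, idle_c, standby_y and standby_z may not) and is not taken from a drive specification. The helper names are made up for illustration.

```python
# Power states as named in the comment; the split simply restates that comment.
MAY_START_BACKGROUND = {"active", "idle_a"}
HEADS_UNLOADED = {"idle_b", "idle_c", "standby_y", "standby_z"}


def background_allowed(state):
    """Return True if the drive may start background work in this state."""
    if state in MAY_START_BACKGROUND:
        return True
    if state in HEADS_UNLOADED:
        return False
    raise ValueError(f"unknown power state: {state}")


def background_hours(hours_per_state):
    """Sum the hours per day spent in states where background work may run."""
    return sum(h for s, h in hours_per_state.items() if background_allowed(s))


# A drive forced into standby_z for 12h every night, active the rest of the day.
print(background_hours({"active": 12, "standby_z": 12}))
```

If the description holds, a drive that is forced into standby_z for half of each day has only the other half available for catching up, which may stretch the time before it returns to its normal EPC behaviour.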
@luukrijnbende I also hit this, after copying a few TB of data to some new Seagate IronWolf drives.
Did the issue eventually fix itself for you? (see my comment here)
I have a similar situation ... I ran an extended self test that took 12h to complete and the disks are no longer going to idle.
They've been in active mode for > 72h now and there is no sign of them resuming normal EPC operation ...
Sorry I did not see your comment sooner.
Is this still an issue for you after a few more days? This seems like a lot of additional time to get back to "normal" again.
Have you tried power cycling the drive (like shut down, then power back up after 30 seconds or so)?
Can you also share your MN and FW revision?
And if anyone else has seen this issue, has it occurred after a short DST? Or only after a long DST?
I shut down the machine. I even unplugged it from the wall and let it sit for a bit.
Then I left the drives alone for 24h and they still don't go to sleep according to their EPC timers.
@smunaut,
Thank you for this update.
We have been trying to repeat it internally but have not been able to so far.
Would you mind telling me more about the system hardware? Anything you can share would be helpful for us to try and figure out what is causing this.
Some things that are helpful are:
- Which controller/HBA is being used
- If that HBA has a firmware, what firmware version (if you can find that).
- operating system and operating system version
- If you installed a driver for the HBA, which version was installed
- Which filesystem the drive is formatted with
- Any management software or out of band controllers in the system, if any
- Any background services/cronjobs that might be running (We know smartd can sometimes prevent going into low power modes if it is pinging the drive too frequently, but maybe there is another one out there affecting this too)
I know that not all of these things can be shared, however any additional system information that you can share could be helpful to see if it is something we can try repeating the issue on.
Thank you for that info!
The original issue also listed AMD hardware. I will see if I can repeat this on any AMD hardware...maybe it's related, maybe it's not.
Would you mind sharing the output of openSeaChest_Basics -d <handle> --llInfo?
This outputs some information like what was detected as the driver, the version of the driver, and if our code detected any filesystem that is mounted on this device, and some other info on how commands get routed through our utility.
B560 is an Intel Chipset (here running an i3-10105). Yeah B550 is AMD, B560 is Intel confusing I know ..
🤦 ... I have an MSI motherboard at home with an almost identical name: MSI MPG B550 Gaming Edge Wifi.
Naming these things so similarly makes it difficult to keep track of them.
Anyway, thanks for the additional info. So it does not seem to be a hardware-specific issue, and the standard AHCI driver is in use, so I have not heard of anything driver-specific that would explain it.
We will keep trying to figure out how we can repeat this issue and see if we can figure out the root cause of the issue.
I have noticed that the EPC timers have started working again. When this first started happening I wrote a little script to monitor ZFS activity and put the drives into idle states if there is none. After a reboot that script didn't come up again, but EPC did work.
I have no idea what the trigger was, maybe they finished their background tasks? Though I did notice today that they were active for a few hours without activity and just now they returned to idle_c and are happy there.
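The poster's monitoring script is not shown. As a rough stand-in, the sketch below detects host activity from two snapshots of `/proc/diskstats` instead of ZFS statistics, which is a swapped-in technique and not what the poster used. It assumes the usual Linux layout in which the third column is the device name, the fourth is reads completed and the eighth is writes completed. The function names are hypothetical.

```python
def io_counters(diskstats_text, device):
    """Return (reads_completed, writes_completed) for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]), int(fields[7])
    raise KeyError(device)


def was_idle(before, after):
    """True if no read or write completed between the two snapshots."""
    return before == after


def read_diskstats(path="/proc/diskstats"):
    """Read the current counters; only meaningful on Linux."""
    with open(path) as handle:
        return handle.read()
```

A loop could take a snapshot with `read_diskstats()`, sleep for the chosen idle period, take a second one, and only then request a lower power state for devices where `was_idle` is true.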