
Comments (15)

smunaut avatar smunaut commented on July 22, 2024 1

They're still not going to sleep, but since I posted they have spent 12h a day (overnight) in standby_z, because I transition them manually. For the first 3 days I left them in active mode, hoping they would finish whatever they were busy with, but since then I have set up a script that puts them into standby_z for the night. (They do stay in that mode, so the host is definitely not issuing any stray commands.)

	Model Number: ST6000VN001-2BB186
	Firmware Revision: SC60

I did reboot for sure. I think I did a full power off, but I'm not 100% sure anymore. I will retry that when I get the chance.
(ATM the machine they are in is being used, so it'll probably have to wait until next weekend.)

from openseachest.

smunaut avatar smunaut commented on July 22, 2024 1

It's a small NAS exposing volumes through NFS and Samba.

  • All drives are connected directly to SATA ports on the motherboard. It's an "MSI MPG B560I GAMING EDGE WIFI" running BIOS 7D19v18.
  • OS is Debian Bookworm stable.
  • Drives are split: some parts run RAID0, some RAID1, and some RAID5. They all host encrypted btrfs volumes. There is a write-through bcache layer over them on an SSD to speed up access and to avoid waking the drives when reading small, frequently accessed files.
  • No out-of-band management whatsoever.
  • smartd is running. I increased its poll period so the drives can at least reach idle_c between two polls (and once smartd sees a drive is in idle_c, it stops polling). Just to make sure that wasn't the problem, I also tried setting short timeouts for idle_b / idle_c and stopping smartd, but that didn't help.
  • I also put the drives in standby_z and they remain there (unless I access the exposed share from a client, obviously).
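
The relationship between the smartd poll period and the EPC timers described in the list above can be sketched roughly as follows. This is only an illustration, not how smartd or the drive firmware actually work: it assumes the worst case that every poll restarts all EPC timers, and the function name and timer values are hypothetical.

```python
def deepest_state_reached(poll_interval_s, timers_s):
    """Return the deepest power state a drive could reach between two polls.

    timers_s maps a state name to the seconds of inactivity needed to enter it.
    Assumption (not from the thread): every poll restarts all timers.
    """
    reached = "active"
    # Walk the states from the shortest timer to the longest one.
    for state, timeout in sorted(timers_s.items(), key=lambda kv: kv[1]):
        if timeout < poll_interval_s:
            reached = state
    return reached


# Hypothetical timer values, in seconds, chosen only for illustration.
EXAMPLE_TIMERS = {"idle_a": 2, "idle_b": 120, "idle_c": 600}
```

With these example values, a 30-minute poll period leaves room to reach idle_c, while a 5-minute period would stop at idle_b.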


smunaut avatar smunaut commented on July 22, 2024 1

Some other info:

  • On one of the drives I ran an "internal short self test", since someone in the other issue said it might have helped. I didn't notice any change from that.
  • On another one I ran a short self test to see if it would make it snap out of it ... no change either.


smunaut avatar smunaut commented on July 22, 2024 1

B560 is an Intel chipset (here running an i3-10105). Yeah, B550 is AMD and B560 is Intel; confusing, I know ...

==========================================================================================
 openSeaChest_PowerControl - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_PowerControl Version: 3.4.0-6_2_0 X86_64
 Build Date: Dec  1 2023
 Today: Mon Jan 29 18:00:14 2024	User: root
==========================================================================================

/dev/sg0 - ST6000VN001-2BB186 - ZR11NESD - SC60 - ATA

---Low Level tDevice information---
	---Drive Info---
		media type: HDD
		drive type: ATA
		interface type: IDE/ATA
		zoned type: not zoned
		---adapter info---
			PCI/PCIe:
			VendorID: 8086h
			ProductID: 43D2h
			Revision: 0011h
		---driver info---
			driver name: ahci
			driver version string: 3.0

				major ver: 3
				minor ver: 0
		---ata flags---
		SCSI Version: 7
		---Passthrough Hacks---
			Passthrough type: SAT/system/none
				---SCSI Hacks---
				---NVMe Hacks---
				---ATA Hacks---
	---OS Info---
		handle name: /dev/sg0
		friendly name: sg0
		minimum memory alignment: 8
		---Linux Unique info---
			FD is valid
			Second Handle name: /dev/sda
			Second Handle friendly name: sda
			SG Driver Version:
				Major: 3
				Minor: 5
				Revision: 36
		OS read-write recommended: false
		last recorded error: 22
		File system Info:
			No active file systems detected
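
For comparing reports like the one above across several drives or machines, a minimal sketch that pulls the `key: value` lines into a dictionary could look like this. The function name is hypothetical and the parsing is deliberately naive (it splits on the first colon and keeps the first occurrence of each key); it is not part of openSeaChest.

```python
def parse_fields(report):
    """Collect 'key: value' lines from a text report into a dict."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip section headers and lines with nothing after the colon.
        if sep and value.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields
```

On the report above this would yield, for example, `driver name` mapped to `ahci` and `VendorID` mapped to `8086h` (8086h is Intel's PCI vendor ID, which matches the B560 chipset).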


DebabrataSTX avatar DebabrataSTX commented on July 22, 2024

We have an existing thread on a similar issue. Please have a look at it; it might help.
#117


luukrijnbende avatar luukrijnbende commented on July 22, 2024

Thanks for your response.
I've already looked at that thread and was thinking it could be background activity related to the SMART test somehow, but there is no activity from the host, so I don't know whether something is still running on the drives. Is there any way to check?

It seems odd that it magically starts working again after a certain number of hours. However, I'm not really able to let them run full tilt all the time until it fixes itself, due to temperature and power constraints.


vonericsen avatar vonericsen commented on July 22, 2024

Hi @luukrijnbende,

I have been asking around about this to try and get an idea what is happening.

There is not enough information to know for sure, but the best guess is that the SMART self-test (long DST) paused some background activity that normally runs based on timing, so once the drive finished the SMART self-test the drive has been trying to catch up and finish that background work when it can.

There are lots of different kinds of background tasks in the drive, some are run periodically in allowed power states (active, idle_a) and others get scheduled as needed when the drive is used (reads, writes, etc). Background activity can be scheduled if it is for health monitoring, performance, reliability, or data-integrity reasons, so it is also entirely possible that something else triggered it that may not even be related to the SMART self-test.
Background activity can be paused for reads, writes, and even a long self-test since the background activity is considered lower-priority than servicing these host requested commands/operations.
Once the drive has enough idle time between these requests, it will attempt to do these background tasks.
Lower power modes that unload the heads (idle_b, idle_c, standby_y, standby_z) are not allowed to start background activity, and entering one of them also pauses anything the drive has scheduled to run in the background.
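
The rules described above can be summed up in a small toy model. It is only a sketch of the description in this comment, not of actual drive firmware: the names, the one-unit-per-time-slice accounting, and the idea of a fixed backlog are all assumptions made for illustration.

```python
from enum import Enum


class PowerState(Enum):
    ACTIVE = "active"
    IDLE_A = "idle_a"
    IDLE_B = "idle_b"
    IDLE_C = "idle_c"
    STANDBY_Y = "standby_y"
    STANDBY_Z = "standby_z"


# States described above as unloading the heads; no background work starts here.
HEADS_UNLOADED = {
    PowerState.IDLE_B,
    PowerState.IDLE_C,
    PowerState.STANDBY_Y,
    PowerState.STANDBY_Z,
}


def background_may_run(state, host_busy):
    """Host commands have priority, and heads-unloaded states pause background work."""
    return not host_busy and state not in HEADS_UNLOADED


def remaining_backlog(timeline, work_units):
    """timeline is a list of (PowerState, host_busy) time slices.

    Each slice in which background work may run clears one unit of backlog.
    """
    remaining = work_units
    for state, host_busy in timeline:
        if remaining > 0 and background_may_run(state, host_busy):
            remaining -= 1
    return remaining
```

In this model, a drive that is forced into standby_z every night only makes progress on its backlog during quiet active or idle_a periods, which would be one possible reason for catch-up taking many days.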


lbogdan avatar lbogdan commented on July 22, 2024

@luukrijnbende I also hit this, after copying a few TB of data to some new Seagate IronWolf drives.

Did the issue eventually fix itself for you? (see my comment here)


smunaut avatar smunaut commented on July 22, 2024

I have a similar situation ... I ran an extended self test that took 12h to complete, and the disks are no longer going to idle.
They've been in active mode for > 72h now, with no sign of them resuming normal EPC operation ...


vonericsen avatar vonericsen commented on July 22, 2024

@smunaut,

Sorry I did not see your comment sooner.
Is this still an issue for you after a few more days? This seems like a lot of additional time to get back to "normal" again.
Have you tried power cycling the drive (like shut down, then power back up after 30 seconds or so)?
Can you also share your MN and FW revision?

And if anyone else has seen this issue, has it occurred after a short DST? Or only after a long DST?


smunaut avatar smunaut commented on July 22, 2024

I shut down the machine. I even unplugged it from the wall and let it sit for a bit.
Then I left the drives alone for 24h and they still don't go to sleep according to their EPC timers.


vonericsen avatar vonericsen commented on July 22, 2024

@smunaut,
Thank you for this update.
We have been trying to repeat it internally but have not been able to so far.

Would you mind telling me more about the system hardware? Anything you can share would be helpful for us to try and figure out what is causing this.
Some things that are helpful are:

  • Which controller/HBA is being used.
  • If that HBA has a firmware, what firmware version (if you can find that).
  • operating system and operating system version
  • If you installed a driver for the HBA, which version was installed
  • Which filesystem the drive is formatted with
  • Any management software or out of band controllers in the system, if any
  • Any background services/cronjobs that might be running (We know smartd can sometimes prevent going into low power modes if it is pinging the drive too frequently, but maybe there is another one out there affecting this too)

I know that not all of these things can be shared, however any additional system information that you can share could be helpful to see if it is something we can try repeating the issue on.


vonericsen avatar vonericsen commented on July 22, 2024

Thank you for that info!

The original issue also listed AMD hardware. I will see if I can repeat this on any AMD hardware...maybe it's related, maybe it's not.

Would you mind sharing the output of openSeaChest_Basics -d <handle> --llInfo?
This outputs some information like what was detected as the driver, the version of the driver, and if our code detected any filesystem that is mounted on this device, and some other info on how commands get routed through our utility.


vonericsen avatar vonericsen commented on July 22, 2024

B560 is an Intel Chipset (here running an i3-10105). Yeah B550 is AMD, B560 is Intel confusing I know ..

🤦 ... I have an MSI motherboard at home with an almost identical name: MSI MPG B550 Gaming Edge Wifi.
Naming these things so similarly makes it difficult to keep track of.

Anyways, thanks for the additional info. So it does not seem to be a hardware-specific issue, and it looks like the standard AHCI driver is in use, so it's nothing driver-specific that I have heard of before.
We will keep trying to repeat this issue so that we can find the root cause.


luukrijnbende avatar luukrijnbende commented on July 22, 2024

I have noticed that the EPC timers have started working again. When this first started happening, I wrote a little script to monitor ZFS activity and put the drives into idle states when there is none. After a reboot that script didn't come up again, but EPC did work.

I have no idea what the trigger was, maybe they finished their background tasks? Though I did notice today that they were active for a few hours without activity and just now they returned to idle_c and are happy there.
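
For anyone wanting a similar workaround, here is a minimal sketch of the "detect no activity" half of such a script. It is not the script mentioned above: instead of ZFS statistics it compares two snapshots of Linux `/proc/diskstats` (an assumption swapped in for illustration), the function names are hypothetical, and actually transitioning the drives is left out.

```python
def io_counters(diskstats_text):
    """Map device name -> (reads completed, writes completed).

    Expects text in /proc/diskstats format: major, minor, name,
    reads completed, reads merged, sectors read, ms reading, writes completed, ...
    """
    counters = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete stats line
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def idle_devices(before, after, watched):
    """Return watched devices whose counters did not change between snapshots."""
    return [
        dev for dev in watched
        if dev in before and dev in after and before[dev] == after[dev]
    ]
```

A caller would read `/proc/diskstats` twice, some minutes apart, pass both results through `io_counters`, and then act only on the devices returned by `idle_devices`.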

