then I am mistaken and what can be done is to issue a reboot using CTRL-ALT-DEL when on the terminal screen.
One of my DSS200s is constantly verifying the RAID State
-
"Shutdown" is not there, but "reboot" is. Hence issuing that command and then pulling the plug during the POST sequence, before any HDD access has happened, is likely the safest way to shut a DSS server down. This can be done either by ctrl+alt+F1 followed by ctrl+alt+delete, or ctrl+alt+F1, logging in to the administrator account, and entering "reboot."
Originally posted by Caleb Williams
From a software/OS perspective, the DSS line of servers are designed using an overlay file system with read-only components. The base system and the cinema software reside on two separate partitions that are combined in an overlay during the boot process, then a third and fourth partition hold the database and the dcp assets, and are read+write. The RAID 5 protects the entire system (aside from the bootloader, which is a non-volatile IDE memory component) and the entire configuration is extremely resilient against sudden power loss.
-
Originally posted by Leo Enticknap
The problem comes if a read or write operation is in progress, and the head stack of a HDD is actually in motion at the moment the plug is pulled. If that happens, the risk of bad sectors is significant, and a full scale head crash cannot be ruled out. Even if you're lucky and neither of these takes place, then as the original poster has seen, there will be a full RAID verification/check after the next startup, which will slow down ingestion quite a bit and for several hours.
The issue is if data is "not at rest". While the file system structure of the DSS200 is pretty resilient, what happens when I'm in an ingest operation and I just switch the power off? Are all those transactions completely ACID compliant? Does the software recover from any given failure state? I doubt you can guarantee that for any system built on top of a rather complex Linux base-system.
Like Leo and I mentioned, in this case, it triggers an otherwise unnecessary RAID verification, which will cause quite some extra strain on your disks.
-
Originally posted by Marcel Birgelen
It's not really a mechanical problem anymore. That problem used to be a real issue in real old hard drives (think IBM XT-era). I remember the "PARK" command in DOS, that would issue a park-command to the hard drive. That problem has been solved ever since the earliest IDE disks came around. They all park their heads safely away once the power is turned off. The heads are spring-loaded and the disks still provide a sufficient cushion effect with the remaining rotational energy.
The issue is if data is "not at rest". While the file system structure of the DSS200 is pretty resilient, what happens when I'm in an ingest operation and I just switch the power off? Are all those transactions completely ACID compliant? Does the software recover from any given failure state? I doubt you can guarantee that for any system built on top of a rather complex Linux base-system.
Like Leo and I mentioned, in this case, it triggers an otherwise unnecessary RAID verification, which will cause quite some extra strain on your disks.
I should've clarified a little more. The base system is resilient because of its redundancy and because it uses overlayfs. That means that the system will almost always boot with no problem. That protection does not directly extend to the data partitions. I think that was the designer's intent. Even though you might end up with some corrupted content, the server will always boot and you can recover or re-ingest. Obviously no system is perfect, but the system design of the DSS is superior to what we have now because of its reliability.
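To make the overlay idea a bit more concrete, here is a toy model in Python. It is not how the DSS actually mounts anything: the file paths are made up, and it assumes the writable top layer is thrown away at every boot, which nobody in this thread has confirmed. It only shows why read-only lower layers tend to survive a bad write.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical contents of the two read-only partitions (names invented for illustration).
BASE = {"/etc/os-release": "base system", "/bin/sh": "shell"}
CINEMA = {"/opt/player": "cinema software", "/etc/os-release": "cinema override"}


def boot(base, cinema):
    """Stack the read-only layers under a fresh, empty writable layer.

    Lookups search the layers top to bottom; writes only ever land in the
    first (writable) mapping, so the lower layers cannot be modified.
    """
    return ChainMap({}, MappingProxyType(cinema), MappingProxyType(base))


root = boot(BASE, CINEMA)
root["/opt/player"] = "half-written garbage"  # a write interrupted by power loss

# Assumed behaviour: the next boot starts again from an empty writable layer.
root = boot(BASE, CINEMA)
print(root["/opt/player"])
```

In this model the read-only layers come back untouched after the "reboot", while anything that lived only in the writable layer is gone. The read+write data partitions are outside this picture entirely, which is the part that can still end up corrupted.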
On the point of verification, I think there is definitely a problem with the server that is the topic of this post, they should not be constantly verifying the array on every bootup, but I don't think that adds as much strain as you think. For a regular 10-plex, when we had DSS200 servers running we pushed terabytes of data through those servers in any given week, both ingesting and playback. A DSS200 hdd will see far more activity in its lifespan than a lot of other systems, and that's what those drives were designed for. After we've converted DSS200 servers over to content management systems of our own, I'd like to think they're semi-retired and "living the quiet life" compared to what they were put through before.
-
We just retired our oldest DSS200 (2010 vintage, although I think the sticker on it actually said 2009) which was still working perfectly but sadly isn't compatible with the new laser projector so we've kept it as a spare for a slightly newer DSS200.
It ran constantly during that time, only being restarted on the rare occasions it had an issue or appeared to be running slow (front end only, we literally never had a problem during playback), or if there was a power outage. When restarting, it was always by pulling the plugs, as there is no simple (any?) way to shut them down. It never gave us an issue but would always verify the RAID when it powered back on. I'm 99% sure that is why they appear to be constantly verifying for the opening poster. Just leave them on like they're designed for.
-
Originally posted by Ryan Gallagher
shutdown may not exist, but do any of these other flavors work on DSS200 (with sudo of course)...
halt
halt -p
poweroff
systemctl poweroff
-
Originally posted by Philip Jones
Just leave them on like they're designed for.
The age of the Cat862 rechargeable battery might be a stronger argument for leaving a DSS200 on for the remainder of its service life than power saving is for shutting it down, especially given that (a) its remaining service life is very unlikely to be more than a year or two, and (b) it is no longer depreciating in capital value, so you're likely saving more in depreciation on a new server than you're spending in extra electricity.
-
Originally posted by Caleb Williams
Looking at logs after running these commands doesn't seem to indicate that they gracefully power off the Cat862. The only way to deal with the media block would be to actually unplug the server. That's probably why Dolby's recommendation to just unplug is what it is.
-
Originally posted by Caleb Williams
On the point of verification, I think there is definitely a problem with the server that is the topic of this post, they should not be constantly verifying the array on every bootup, but I don't think that adds as much strain as you think. For a regular 10-plex, when we had DSS200 servers running we pushed terabytes of data through those servers in any given week, both ingesting and playback. A DSS200 hdd will see far more activity in its lifespan than a lot of other systems, and that's what those drives were designed for. After we've converted DSS200 servers over to content management systems of our own, I'd like to think they're semi-retired and "living the quiet life" compared to what they were put through before.
A RAID verification job means that the entire RAID is being "surface checked", checksummed and verified. If correctable errors are found, they will be repaired. While the disks themselves are constantly spinning, the little actuator(s) moving the heads are not, and they will wear out due to increased activity. Over the years I've clearly witnessed that the read/write load you put onto disks matters for their longevity, even if they're constantly spinning. This is not like SSDs, where you only "pay" for the writes, but get the reads "essentially for free". On rotating rust, you also "pay" for the reads.
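For anyone wondering what "checksummed and verified" boils down to on a RAID 5, here is a rough Python sketch of the textbook XOR parity check. This is the generic RAID 5 idea, not the 3Ware firmware's actual routine, and the function names are my own. The point is that every block of every stripe has to be read from every disk to do it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verify_stripe(data_blocks, parity):
    """A stripe is consistent when the XOR of its data blocks equals the parity block."""
    return xor_blocks(data_blocks) == parity


def rebuild_block(surviving_blocks, parity):
    """Recover one missing data block from the surviving blocks plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread over three data disks plus one parity disk.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)

print(verify_stripe(stripe, parity))
print(rebuild_block([stripe[0], stripe[2]], parity) == stripe[1])
```

A full verification repeats that check for every stripe on the array, so it is a read pass over the entire surface of all member disks.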
-
Originally posted by Marcel Birgelen
I've dealt with lots of those 3Ware cards in all kinds of servers, also hundreds of other servers than the DSS200. They were very popular "entry level" RAID cards for a while, known to be low-cost, performant and pretty darn reliable. That was before LSI and Broadcom came (I think I even forgot another party in between there) and screwed it up. It's normal behavior for them to do a RAID array verification after every unclean shutdown. In order to prevent that, you can install a battery on the 3Ware card, which will cleanly shut down your RAID and flush all memory to the disks. Without this battery, the RAID is in tainted mode every time the operating system doesn't explicitly flush all the disk queues.
A RAID verification job means that the entire RAID is being "surface checked", checksummed and verified. If correctable errors are found, they will be repaired. While the disks themselves are constantly spinning, the little actuator(s) moving the heads are not, and they will wear out due to increased activity. Over the years I've clearly witnessed that the read/write load you put onto disks matters for their longevity, even if they're constantly spinning. This is not like SSDs, where you only "pay" for the writes, but get the reads "essentially for free". On rotating rust, you also "pay" for the reads.
It makes sense that the RAID would verify on every boot, since the DSS line wasn't equipped with a battery. The DSL servers that I decommissioned did have a battery so it makes sense I didn't see them verify very often.