  • DSS100 replacement drive recommendations

    With the drives in my DSS100 coming up to 5 years old, I am considering replacing them all, one at a time, and then using the old drives in my spare DSS100 which I was able to repair with a replacement motherboard.

    They are WD1005 1TB drives which, of course, were folded into the HA10 series as advised by Steve last year. The drives are still working perfectly but I guess it is better to change them now than wait for something to go wrong.

    Before I go out and purchase four new drives, I wanted to check what is currently considered to be the best-value and most reliable 1TB drive to use as a replacement. WD may not be the best choice, so I am open to suggestions. As it has been over a year since I was advised on replacement drives, I imagine current recommendations may differ from the information provided at that time.

    (I know that Carsten would tell me just to dump the DSS100 but I am not in a position to upgrade my server.)

    I assume that it is still just a matter of taking out drive 1 at the end of the night and putting in the new drive, then repeating the process with drive 2 the next night and so on until they have all been replaced.

    In respect of the second DSS100, do the old drives from the first unit need to be formatted before going through the same process or will the server recognise the different drive and format it accordingly? Actually, it won’t be in the same order as there is already a bad drive in the unit, so I will have to start with that one or lose the data.

  • #2
    My understanding is that the server storage is configured as a RAID device. RAID (redundant array of independent disks) is a way of storing the same data across multiple HDDs (hard disk drives) so that the data survives a single drive failure. Whenever you replace one HDD, you will need to rebuild the entire RAID, which is very time consuming. My understanding of best practice is to replace all of the HDDs at the same time, so that you only need to rebuild the RAID once. I would also consider upgrading from 1TB HDDs to 2TB HDDs, for minimal additional cost but double the storage capacity. All of the HDDs need to be the same size, otherwise the server will treat every HDD as having the capacity of the smallest one.
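
    A minimal sketch of that smallest-drive rule (plain Python, nothing DSS-specific; it assumes a single-parity, RAID 5 style layout, and the DSS100's real layout reserves additional space on top — Peter reports around 2 TB usable from four 1 TB drives — so treat these as upper bounds):

    Code:
    # Back-of-the-envelope only: single-parity array, sizes in TB.
    # Every member is truncated to the smallest drive, and one drive's
    # worth of space goes to parity.
    def raid5_usable_tb(drive_sizes_tb):
        smallest = min(drive_sizes_tb)
        return smallest * (len(drive_sizes_tb) - 1)

    print(raid5_usable_tb([1, 1, 1, 1]))  # four 1TB drives    -> 3.0
    print(raid5_usable_tb([2, 1, 1, 1]))  # one 2TB swapped in -> still 3.0
    print(raid5_usable_tb([2, 2, 2, 2]))  # all four replaced  -> 6.0

    The middle case is why a single 2TB drive dropped into a 1TB set buys you nothing extra, a point that comes up again further down the thread.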

    If all the drives are working perfectly and you don't need extra storage capacity, I'm not sure I'd bother with replacing any of the HDDs. Also, whenever you rebuild the RAID from scratch, all of the HDDs are erased in the process.

    • #3
      Thanks Rick,

      The drives were installed in 2018, which makes them nearly 5 years old. I believe it is recommended that drives be replaced every 5 years. This also enables me to repopulate the RAID in the backup unit, which is living on borrowed time with its mix of old drives, one of which has failed. And, of course, the drives that I will be using are still working perfectly.

      All of the original drives were 500 GB (and were failing at an alarming rate); by upgrading to 1 TB drives I ended up with a total capacity of around 2 TB. I am quite happy to stay with 1TB drives. It was a nerve-wracking time, running shows with only 3 working drives in the array. My plan is to reduce the risk of drives failing due to age. The server runs 24 hours a day, even when the cinema is closed for the winter.

      Contrary to what you say, by adding one drive at a time rather than replacing all at once, there is no loss of data and the new drive is integrated quite quickly, usually in about half an hour tops. It might take several days to replace all 4 drives. The last time I did this, the RAID rebuilt itself after all the drives were replaced, which did take several hours but was done outside normal operating days.

      I have never lost data as a result of rebuilding the array.

      • #4
        You can use the current Dolby drive recommendations as your guide...even on the 1TB or 2TB drives. The HA210 remains a valid drive choice for the DSS100. Dolby is listing some Toshiba drives now too. I've tried them and they seem to be speedier.

        Me, personally, I do not change drives out just because they've hit 5 years. I look at the reallocated sectors and check the RAID logs to see if there are any ATA errors. If all is clean and clear...I "let it ride."
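
        For anyone who wants to script that kind of check, here is a rough sketch in Python around smartctl. The /dev/twe0 device and the 3ware port numbers are assumptions about how the RAID controller presents the drives; directly attached drives would just be /dev/sda, /dev/sdb and so on, and the attribute names are the standard SMART ones, not anything Dolby-specific.

        Code:
        #!/usr/bin/env python3
        # Print the SMART counters worth watching before deciding whether
        # a drive stays in service. Adjust the device/ports for your box.
        import subprocess

        WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")

        def check_port(port):
            out = subprocess.run(
                ["smartctl", "-A", "-d", f"3ware,{port}", "/dev/twe0"],
                capture_output=True, text=True).stdout
            for line in out.splitlines():
                fields = line.split()
                if len(fields) > 1 and fields[1] in WATCH:
                    # the last column of the attribute table is the raw count
                    print(f"port {port}: {fields[1]} = {fields[-1]}")

        for port in range(4):
            check_port(port)

        Anything other than zeroes creeping up in those raw counts is the point at which "let it ride" stops being attractive.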

        • #5
          5 years is not a huge amount of time for WD drives - my WD Red (NAS) drives come with a 5-year warranty from WD. I am pleased with them, BTW.

          However, if you decided to go ahead, I would not replace them one by one but all in one go. Yes, you'd need to reinstall, but with proper backups and a little time you'd be fine. The reason is that each "rebuild" puts a massive amount of stress on those old hard drives. One of the reasons RAID5 is not so popular is that it often fails when you rebuild it. What happens is that your drives get old - like in your case - one drive fails and you replace it. Suddenly the remaining drives face a 10-hour rebuild process where a single data error can and will fail the whole RAID, as there is no parity anymore. I think someone did some statistics and it turned out that a failed rebuild is a pretty likely occurrence, just looking at the datasheets.

          So more often than not, RAIDs fail when rebuilding.
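
          The datasheet argument can be put into rough numbers. A quick sketch (the 10^-14 unrecoverable-read-error rate is a typical consumer-drive spec-sheet figure assumed here, not a measurement of these particular drives):

          Code:
          # Chance of hitting at least one unrecoverable read error (URE)
          # while rebuilding a degraded 4 x 1TB single-parity array, where
          # the three surviving drives must be read end to end.
          URE_PER_BIT = 1e-14   # assumed spec: 1 error per 10^14 bits read
          SURVIVORS = 3
          DRIVE_TB = 1

          bits_to_read = SURVIVORS * DRIVE_TB * 1e12 * 8
          p_fail = 1 - (1 - URE_PER_BIT) ** bits_to_read
          print(f"{p_fail:.0%} chance of at least one URE during the rebuild")

          With these assumptions it comes out at roughly one in five rebuilds hitting an error; a drive rated at 10^-15 brings that down to a couple of percent.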

          I mildly disagree with Steve here (gulp!) and I'm not against preventative replacement of the drives - see above: you let them die, and when that happens the whole RAID fails during the rebuild! At that point you lose shows and/or you have to call engineers to fix it. Four drives are not a massive investment and I feel it could be a good idea - particularly because you're re-using the older drives.

          • #6
            Originally posted by Rick Cohen
            If all the drives are working perfectly, and you don't need extra storage capacity, I'm not sure I'd bother with replacing any of the HDDs
            I would, because if two of them fail during a show, that's the end of the show; and because if one of them fails and then another does before you can replace it and rebuild (or, as Marco notes, a second one fails during the rebuild), you've lost content. If you have a set in the server that is from around the same batch, manufacture time, and installation time, the odds are high that multiple drives will fail at around the same time.

            The one disadvantage of replacing all of them in one fell swoop is that you'll need to reinstall the software and reingest the content (meaning that you'll have to outgest it onto external media or via FTP onto a network storage device first if you don't have the original source drives anymore) afterwards, but personally, I think this is worth doing.

            IMHO, the Dolby approved drive list is only of major significance if the server is under warranty and you need to avoid violating its conditions. One aspect of the DSS100 and 200 that has been noted in the past is that these models are remarkably tolerant of whatever you throw at them: the 3Ware RAID cards in them adapt to different drive characteristics very well. I have never experienced any issues with WD Reds in these servers, and that's what I'd be inclined to go with.

            • #7
              There is a lot of advice regarding hard drives and how to deal with their lifecycle out there. I regularly look at the data Backblaze releases, as they have a few hundred thousand rotating disks "under management" and are pretty open about their statistics.

              In general, failure rates seem to be mostly correlated with make and model, although there is a clear long-term trend with age:

              [Attached chart: drive-failure-by-quarter.png]

              Backblaze also runs their drives in a demanding environment, with a lot of constant writing going on, so that's probably a lot more stressful than your average cinema server; nevertheless, in the absence of better data, those statistics are relevant to our use cases too.

              There is a clear valley in the failure rate from 2 years up to about 4 years, but you can clearly see a dramatic increase in failure rates after the 5-year mark. Based on this data alone, you could maybe conclude that it's not a bad decision to replace all your drives with new ones.

              But... On the other hand, how big is the chance of two or more drives failing in one show, causing you to miss that particular show? Also, replacing all drives at once may expose you to the risk of having bought into a series that prematurely fails and brings down your RAID that way.
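
              That "two drives in one show" risk can at least be bounded with some napkin math. A sketch assuming independent failures and an annualized failure rate in the ballpark Backblaze reports for older drives (both the 5% AFR and the time windows are assumptions, not figures for these specific drives):

              Code:
              # Odds of two or more of the four drives failing within the
              # same window, treating failures as independent.
              from math import comb

              AFR = 0.05      # assumed annualized failure rate per drive
              DRIVES = 4

              def p_at_least_two(window_hours):
                  p = AFR * window_hours / (24 * 365)   # per-drive probability in the window
                  return sum(comb(DRIVES, k) * p**k * (1 - p)**(DRIVES - k)
                             for k in range(2, DRIVES + 1))

              print(f"3-hour show      : {p_at_least_two(3):.1e}")
              print(f"one week degraded: {p_at_least_two(24 * 7):.1e}")

              For a single show the number is vanishingly small; it only starts to matter if a failed drive is left in place for a long time, or if the independence assumption breaks down because all the drives are from the same batch, which is the point raised in the next reply.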

              One thing is clear though: if your drives hit the 5-year mark, you should be prepared to replace them. But whether it's better to keep spares on the shelf and replace them one by one, or to replace them all in one big shot? To be honest, I think we're missing some data to make the ultimate informed judgement here. :P

              • #8
                Points all taken. Building on them, we're looking at a different scenario if we have four drives in a server that are all of the same model, batch and manufacture date and were installed at the same time (e.g. the factory original ones), compared to having mix 'n match drives in a RAID set. If they're all from the same batch, they will all hit the different risk points on that graph at about the same time; and if you're unlucky enough to buy drives from a bad batch, the fault will affect all four, as Marcel notes. These could be arguments in favor of going the mix 'n match route (i.e. replacing drives individually as they fail); against that are the risk of failure during a rebuild, and the knowledge that if you replace the lot in one fell swoop, you don't have to worry about drive maintenance for the next five years.
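
                The batch effect is easy to see in a toy simulation. Everything below is made up purely for illustration (the lifetimes, the spreads and the 30-day window are arbitrary); the only point is the direction of the effect, i.e. that shared batch quality pulls failures closer together in time:

                Code:
                # Toy Monte Carlo: probability that the second drive failure
                # lands within N days of the first, same batch vs mixed batches.
                import random

                def second_failure_within(days, same_batch, trials=100_000):
                    hits = 0
                    for _ in range(trials):
                        if same_batch:
                            batch_life = random.gauss(5.0, 1.0)                 # shared batch quality, years
                            lives = [random.gauss(batch_life, 0.3) for _ in range(4)]
                        else:
                            lives = [random.gauss(5.0, 1.0) for _ in range(4)]  # independent drives
                        lives.sort()
                        hits += ((lives[1] - lives[0]) * 365 <= days)
                    return hits / trials

                print("mixed batches:", second_failure_within(30, same_batch=False))
                print("same batch   :", second_failure_within(30, same_batch=True))

                The same-batch run comes out noticeably higher, which is just the statistical version of the argument above.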

                • #9
                  Well that certainly gave me something to think about.

                  With the outdoor screening season ending in a couple of months, I thought I would wait until then to do anything with the server. In the meantime I shall purchase a couple of 2TB drives and replace at least the bad drive in the backup server. That gives me a spare in the event that a drive does go bad in the main server.

                  Once the season is over I shall copy over all the content I need to retain to the spare server and then buy a couple more drives and replace all the drives in the main server. If that is successful I should gain a couple of TB in the rebuild process.

                  Once that is done I shall replace all the drives in the second server with the 5 year old ones.

                  I really wanted to replace the drives in the backup server, so putting new drives in the main server and using its drives for the backup seemed logical (at the time!).

                  • #10
                    Keep in mind if you put 2TB drives into the machine one at a time (letting it rebuild between each new drive swap), you will not miraculously gain storage space. The server will see those drives as 1TB drives, so your storage will not change.

                    The only way to increase storage is to swap them all out at once and run a clean "install" disc. This isn't terribly difficult to do, and whenever we have a drive fail we actually PREFER to go in, wipe the RAID BIOS and do a clean install over a RAID rebuild. Many will disagree and say that is overkill, but it sure does get rid of any and all issues in one bang, instead of possibly rebuilding bits of old corrupt data from drive to drive.

                    The problem with doing a clean install though that nobody has mentioned is not only do you need to reload all of your content, but you need to know how to re-configure the server as well. If you don't know how to do this or aren't comfortable with it, then a one-at-a-time drive swap with rebuild is your way to go.

                    As far as the 5-year mark, we ALWAYS change the drives at that point. Anything over 5 years is living on borrowed time, and that's always when things start to get super quirky in a RAID, regardless of the server.

                    • #11
                      The problem with doing a clean install though that nobody has mentioned
                      Hey, I did! And I also agree with the full reinstall, which takes 30 minutes and brings you to a known-good state.

                      If a whole new set of drives really is not an option, at least one should get ready for a failure: install disk, backups, and a basic set of the CPLs needed for the server. That way, if the server really decides to bite the dust, you can restore very quickly (warning: it might not be super user-friendly; I never tested the latest backup and restore features and used to do it manually, editing XML files!). It was one of the advantages of the DSS series: a full reinstall was a USB stick (or CD) away.
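
                      For what it's worth, that "get ready for a failure" kit lends itself to a small script kept alongside the install media. Everything in the sketch below is hypothetical - the paths are placeholders, not real DSS100 locations, and the USB mount point will differ on your system - it just illustrates the idea of gathering the recovery pieces into one dated folder:

                      Code:
                      # Hypothetical recovery-kit helper: copy config notes, exported
                      # automation settings and a CPL re-ingest list onto a USB stick.
                      # All paths below are placeholders; substitute your own.
                      import datetime
                      import pathlib
                      import shutil

                      KIT_ITEMS = [
                          "/home/backup/config-wizard-settings.txt",     # placeholder
                          "/home/backup/serial-automation-export.xml",   # placeholder
                          "/home/backup/cpls-to-reingest.txt",           # placeholder
                      ]
                      DEST = pathlib.Path("/media/usbstick") / datetime.date.today().isoformat()

                      DEST.mkdir(parents=True, exist_ok=True)
                      for item in KIT_ITEMS:
                          src = pathlib.Path(item)
                          if src.exists():
                              shutil.copy2(src, DEST / src.name)
                              print("copied :", src.name)
                          else:
                              print("missing:", src.name)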

                      • #12
                        The latest backup and restore features would probably not work with the DSS100 software version.
                        Yet, reinstalling the system is the cleanest way to go, if one has the opportunity.
                        As far as I remember, the way to back up and restore GPIO was explained in the manual (DSS100 or 200 wouldn't make a difference), and a photo of the serial commands and settings would be enough to set them up again.
                        Whenever I came upon an unknown DSS, I would run the configuration wizard and take note of all the settings, pressing Escape at the final stage (no save).

                        If the loss of DCPs were my (only) concern, I would go with replacing the drives one by one, then use the old ones to export/back up all the material (since the drives are not failing yet), rebuild the RAID array, reinstall, and ingest everything back. All material safe, higher capacity, newly installed software and database.

                        • #13
                          Agreed with all that. One of the few major flaws of the DSS server line is the absence of any ability to download all the configuration settings into a single file and re-upload that file into another unit, or into one that has been nuked and reinstalled. The way I handle this is to write down, in a text file, all the settings in the config wizard that are not left on factory defaults, and to take screenshots from the VNC browser of all the settings that are configured in Show Manager (most importantly, obviously, the serial automation settings). There is also the ./exportAutomationConfiguration.sh method of getting the serial automation settings out into a file, but unless you have a ridiculously large number of cues, using it is no quicker than entering them manually from a screenshot; and even that doesn't capture the RS232 configuration (e.g. 8N1).

                          IP automation isn't relevant here, as, IIRC, that function was only introduced in 4.8 or 4.9, and the DSS100 won't run later than 4.7.

                          • #14
                            I think Peter runs an outdoor screen, and I would assume he does not use much of the automation capability, so I guess his backup/restore needs are limited.
