User:Mjb/Make Windows report disk failures


After setting up mirrored drives (software RAID) in Disk Manager, I wondered what would happen if a mirrored drive failed. This is what I found out:

When a drive hosting a mirrored volume fails, all of that drive's volumes, including the mirrored ones, go "offline". They just disappear, as if the drive were disconnected. This is why it is risky to mirror your boot drive's volumes; if one fails, you won't be able to boot!* True RAID 1 would keep the volume online, i.e. run just on the good drive.

With Windows-based mirroring, you can run on just the one drive, but first you have to go back into Disk Manager and bring the volume back online manually. The idea is that you will resolve the problem with the failed drive before bringing the volume back online. But some problems are temporary and don't require replacement of the drive; you can just bring the volume back online, and the failed drive will begin resyncing. Soon you'll have the mirror fully functional with no further intervention.

Anyway, this may seem strange, but apparently in Windows, drives are expected to come online and go offline all the time, and not every user needs to be notified of such changes. If a volume is available, it's available. If not, it's not, and you find out when you need it. Consequently, you won't get any indication that a drive has failed and its mirrored volumes have been taken offline.

So I used the Task Scheduler to make a task that periodically runs this command (as an argument to cmd.exe /C):

echo list volume | diskpart | findstr /i "failed risk" && eventcreate /l "System" /so "Disk Check Script" /id 1 /t WARNING /d "A drive has failed. Check Disk Manager."

I figure once a day is good enough; what are the odds that both drives would fail on the same day? Don't answer that.
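
To spell out what the one-liner does: findstr treats the two space-separated words as alternatives, so any line of diskpart output containing "failed" or "risk" (in any letter case, because of /i) counts as a match, and && runs eventcreate only when findstr found at least one match. Here is a minimal Python sketch of the same check, in case you would rather schedule a script than a cmd.exe one-liner. The function names are my own invention, and it assumes Python is installed and that the task runs with enough privilege to start diskpart.

```python
import re
import subprocess

# The same two search strings the findstr command uses; either one is a hit.
PATTERN = re.compile(r"failed|risk", re.IGNORECASE)


def suspicious_lines(diskpart_output):
    """Return the lines that findstr /i "failed risk" would have printed."""
    return [line for line in diskpart_output.splitlines() if PATTERN.search(line)]


def list_volumes():
    """Send 'list volume' to diskpart on stdin and return its text output."""
    result = subprocess.run(
        ["diskpart"], input="list volume\n", capture_output=True, text=True
    )
    return result.stdout
```

Calling suspicious_lines(list_volumes()) gives an empty list when nothing matched; a scheduled script could then run eventcreate only when the list is non-empty.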

All that does, though, is make an event in the System log. Another task needs to be triggered by that event in order to send a notification to the desktop or email you. (Alternatively, you can have it be triggered by the many possible events in a drive failure, but that can get complicated.)

I set up a desktop notification in Task Scheduler, but it didn't work very well; apparently the window appears underneath all the other windows and doesn't stick around for long. I saw a StackExchange tip that a better option is to run a PowerShell script to create the desktop alert.

I think you can also do it by installing the XML code (below) with the following command (replacing xmlfile.xml with the actual file name, of course):

  • schtasks /create /xml "xmlfile.xml"

You should replace XXX and YYY in these examples with a valid machine name and user name, although I'm not sure if anything in the RegistrationInfo section really matters. (If it doesn't work, try just removing the whole RegistrationInfo element and its contents, or replacing it with an empty element, like this: <RegistrationInfo/>.)
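
If the import is rejected, one thing worth ruling out is that the file is no longer well-formed XML after being copied from a web page or edited by hand (for example, a literal & or < in a command line must be written as &amp;amp; or &amp;lt; inside the file). The following is a rough Python sketch of such a check, not part of my original setup; the function name is made up, and it only tests well-formedness and the root element, not the full Task Scheduler schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <Task> root element in Task Scheduler XML files.
TASK_NS = "{http://schemas.microsoft.com/windows/2004/02/mit/task}"


def task_xml_problem(data):
    """Return None if data parses as a Task document, else a short description."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    if root.tag != TASK_NS + "Task":
        return f"unexpected root element: {root.tag}"
    return None
```

To check a file, read it in binary mode so the parser can work out the encoding itself, e.g. task_xml_problem(open("xmlfile.xml", "rb").read()).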

Report disk failures in System log.xml

<?xml version="1.0" encoding="UTF-16"?>
<Task version="1.3" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <RegistrationInfo>
    <Date>2015-01-21T19:36:03.4744179</Date>
    <Author>XXX\YYY</Author>
    <Description>If any drives are "at risk" or "failed", add an event to the System log</Description>
  </RegistrationInfo>
  <Triggers>
    <CalendarTrigger>
      <StartBoundary>2015-01-21T19:45:00</StartBoundary>
      <Enabled>true</Enabled>
      <ScheduleByDay>
        <DaysInterval>1</DaysInterval>
      </ScheduleByDay>
    </CalendarTrigger>
  </Triggers>
  <Principals>
    <Principal id="Author">
      <UserId>S-1-5-18</UserId>
      <RunLevel>LeastPrivilege</RunLevel>
    </Principal>
  </Principals>
  <Settings>
    <MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>
    <DisallowStartIfOnBatteries>false</DisallowStartIfOnBatteries>
    <StopIfGoingOnBatteries>true</StopIfGoingOnBatteries>
    <AllowHardTerminate>true</AllowHardTerminate>
    <StartWhenAvailable>true</StartWhenAvailable>
    <RunOnlyIfNetworkAvailable>false</RunOnlyIfNetworkAvailable>
    <IdleSettings>
      <StopOnIdleEnd>true</StopOnIdleEnd>
      <RestartOnIdle>false</RestartOnIdle>
    </IdleSettings>
    <AllowStartOnDemand>true</AllowStartOnDemand>
    <Enabled>true</Enabled>
    <Hidden>false</Hidden>
    <RunOnlyIfIdle>false</RunOnlyIfIdle>
    <DisallowStartOnRemoteAppSession>false</DisallowStartOnRemoteAppSession>
    <UseUnifiedSchedulingEngine>false</UseUnifiedSchedulingEngine>
    <WakeToRun>false</WakeToRun>
    <ExecutionTimeLimit>P3D</ExecutionTimeLimit>
    <Priority>7</Priority>
  </Settings>
  <Actions Context="Author">
    <Exec>
      <Command>cmd.exe</Command>
      <Arguments>/C "echo list volume | diskpart | findstr /i "failed risk" &amp;&amp; eventcreate /l "System" /so "Disk Check Script" /id 1 /t WARNING /d "A drive has failed. Check Disk Manager.""</Arguments>
    </Exec>
  </Actions>
</Task>

Alert the user in the event of a disk problem.xml

<?xml version="1.0" encoding="UTF-16"?>
<Task version="1.3" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <RegistrationInfo>
    <Date>2015-01-21T19:51:16.8376592</Date>
    <Author>XXX\YYY</Author>
    <Description>Display a message on the Desktop to alert the user when the System log contains a disk problem as reported by the "Report disk failures in System log" task</Description>
  </RegistrationInfo>
  <Triggers>
    <EventTrigger>
      <Enabled>true</Enabled>
      <Subscription>&lt;QueryList&gt;&lt;Query Id="0" Path="System"&gt;&lt;Select Path="System"&gt;*[System[Provider[@Name='Disk Check Script'] and EventID=1]]&lt;/Select&gt;&lt;/Query&gt;&lt;/QueryList&gt;</Subscription>
    </EventTrigger>
  </Triggers>
  <Principals>
    <Principal id="Author">
      <UserId>S-1-5-18</UserId>
      <RunLevel>LeastPrivilege</RunLevel>
    </Principal>
  </Principals>
  <Settings>
    <MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>
    <DisallowStartIfOnBatteries>false</DisallowStartIfOnBatteries>
    <StopIfGoingOnBatteries>true</StopIfGoingOnBatteries>
    <AllowHardTerminate>true</AllowHardTerminate>
    <StartWhenAvailable>true</StartWhenAvailable>
    <RunOnlyIfNetworkAvailable>false</RunOnlyIfNetworkAvailable>
    <IdleSettings>
      <StopOnIdleEnd>true</StopOnIdleEnd>
      <RestartOnIdle>false</RestartOnIdle>
    </IdleSettings>
    <AllowStartOnDemand>true</AllowStartOnDemand>
    <Enabled>true</Enabled>
    <Hidden>false</Hidden>
    <RunOnlyIfIdle>false</RunOnlyIfIdle>
    <DisallowStartOnRemoteAppSession>false</DisallowStartOnRemoteAppSession>
    <UseUnifiedSchedulingEngine>false</UseUnifiedSchedulingEngine>
    <WakeToRun>false</WakeToRun>
    <ExecutionTimeLimit>PT1H</ExecutionTimeLimit>
    <Priority>7</Priority>
    <RestartOnFailure>
      <Interval>PT5M</Interval>
      <Count>3</Count>
    </RestartOnFailure>
  </Settings>
  <Actions Context="Author">
    <Exec>
      <Command>powershell</Command>
      <Arguments>-WindowStyle hidden -Command "&amp; {[System.Reflection.Assembly]::LoadWithPartialName('System.Windows.Forms'); [System.Windows.Forms.MessageBox]::Show('It looks like a disk drive was reported "at risk" or "failed". Please check the Disk Manager ASAP!','Disk Check Script (scheduled task)')}"</Arguments>
    </Exec>
  </Actions>
</Task>

Footnote

  • To successfully mirror a boot drive, you need to mirror both the C: volume and the system reserved partition. There's an extensive list of steps to follow at https://support.microsoft.com/en-us/help/814070/how-to-establish-and-boot-to-gpt-mirrors-on-64-bit-windows ... If the primary drive fails, you must disconnect it; at boot time you will then be given the option of booting from the "secondary plex" drive. You can then go into Disk Manager and break or remove the mirror to get rid of the dependency on the failed drive. If the secondary drive fails, you will be told at boot that you are "missing hardware", and your only option (I think) is to replace the drive before you can boot again, or maybe to attach the primary drive (not booting from it) to a different Windows machine in order to run Disk Manager and see whether it's possible to break or remove the mirror. I have not tried any of this; I'm relying on answers.microsoft.com posts.