CHAPTER 7

                           MANAGING FAULT TOLERANCE

 

 

Fault tolerance is the ability of a computer or operating system to respond to a catastrophic event,

such as a power outage or a hardware failure, so that no data is lost and work in progress is not corrupted.

 

Microsoft Windows NT Server provides fault tolerance through a system called Redundant Array

of Inexpensive Disks (RAID).  Data protected by fault tolerance can be recovered and restored. 

 

 

 

Lesson 1:  Fault Tolerance

 

Fault tolerance is the ability of a system to continue functioning when part of the system fails.  Fault

tolerance is designed to combat problems with disk failures, power outages, or corrupted operating

systems, which can include boot files, the operating system itself, or system files.  Fully fault-tolerant

systems include redundant disk controllers, power supplies, and uninterruptible power supplies

(UPSs) which safeguard against local power failure.

 

 

RAID Systems

 

Windows NT Server provides a software implementation of a fault tolerance technology known as RAID. 

RAID technology is standardized and categorized in levels.  Each level offers various mixes of

performance, reliability, and cost.  Windows NT Server supports RAID levels 1 and 5 to provide fault

tolerance.  RAID can also combine older, small disks (10 MB, for example) into one set, putting them to

use instead of throwing them out.

 

How does RAID operate to protect your system?  RAID provides fault tolerance by implementing data

redundancy.   With data redundancy, data is written to more than one disk in a manner that allows

recovery of the data in the event of a single hard disk failure.

 

RAID 0 = Disk Striping

RAID 1 = Disk Mirroring and Duplexing

RAID 2 = Disk Striping with ECC

RAID 3 = ECC Stored as Parity

RAID 4 = Disk striping with Large Blocks

RAID 5 = Striping with Parity

RAID 10 = Mirrored Drive Arrays

 

 

 

 

===================================================================

 

wntsup7.html                                               PAGE 2                                                2001/11/09

 

 

 

NOTE:  For more information on RAID levels, refer to the Networking Essentials book, pages 434 to 437.

 

 

Hardware and Software Implementations of RAID

 

RAID can be implemented as either a hardware or software fault tolerance system:

 

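Windows NT Server provides software fault tolerance; Windows NT Workstation does not provide fault tolerance.

Some hardware implementations allow the replacement of a failed drive without shutting down the system.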

 

NOTE:  Whether you choose hardware or software fault tolerance, the need for regular

backups does not diminish.

 

 

 

Hardware Implementations of RAID

 

In a hardware solution, the disk controller interface handles the creation and regeneration of redundant

information.  Some hardware vendors implement RAID data protection directly in their hardware,

as with disk array controller cards.  Because these methods are vendor-specific and bypass the fault

tolerance software drivers of the operating system, they usually offer performance improvements over

software implementations of RAID.  Hardware implementations cover RAID levels 0 through 5 (for this

course, do not worry about levels 2 through 4).  RAID 10 is also available, typically on fiber-optic attached arrays.

 

 

Software Implementation of RAID

 

Windows NT Server supports two software implementations of RAID:  RAID 1 (mirror sets) and

RAID 5 (stripe sets with parity).  Software RAID is available only on Windows NT Server, not on Windows NT Workstation.

 

 

RAID 1:  Mirror Sets

 

Mirror sets (RAID 1) use the Windows NT fault tolerance driver (Ftdisk.sys) to simultaneously write

the same data to two physical drives.  Through duplication, or mirroring, RAID 1 helps to ensure the

survival of data in case of failure.  RAID 1 is the only software RAID level that works on the system and

boot partitions; the other levels of RAID will not work on boot or system partitions.

 

 

 


 

 

 

 

In terms of cost per megabyte, mirror sets are more expensive than other forms of fault tolerance because

disk space utilization is only 50%.

 

Mirror sets can provide enhanced read performance because the fault tolerance driver can read from

both members of the mirror set at the same time.  There can be a slight decrease in write performance

when writing to a mirror set because the fault tolerance driver must write to both members simultaneously.

 

 

Disk Duplexing:

 

A member of a mirror set is one of the physical disk partitions that make up the set.  If both physical disks

that comprise a mirror set are controlled by the same disk controller, and the disk controller fails, both

members of the mirror set are inaccessible.  However, a second controller can be installed in the

computer so that each disk in the mirror set has its own controller.  This arrangement is called disk

duplexing.  Duplexing also reduces bus traffic and potentially improves read performance.

 

NOTE:  Disk duplexing is a hardware enhancement to a Windows NT Server mirror set.  No

additional software configuration is necessary.

 

 

 

RAID 5:  Stripe Sets with Parity

 

Data written in stripe sets can be protected by the fault tolerance mechanism, stripe sets with parity

(RAID 5),  which is supported by Windows NT Server.

 

Parity is a mathematical method of verifying data integrity.  Fault tolerance is achieved by adding a

parity-information stripe to each disk partition in the volume.  In a stripe set with parity, 3 to 32 disks

are supported.  The parity stripe block is used to reconstruct data for a failed physical disk.  If a

single disk fails, data is not lost because the Windows NT Server fault tolerance driver has spread

the information across the remaining disks.  The data can be completely reconstructed.
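
The reconstruction described above can be illustrated with a small sketch. It assumes XOR parity, which is the usual way parity of this kind is calculated; the notes do not spell out the exact calculation that Ftdisk.sys performs, and the block contents and the three-disk layout are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a hypothetical three-disk set: two data blocks plus parity.
data_a = b"FAULT TO"
data_b = b"LERANCE!"
parity = xor_blocks([data_a, data_b])

# Simulate losing the disk that held data_b, then rebuild it from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)
```

The same call rebuilds whichever single block is missing, because XOR-ing all the surviving blocks of a stripe yields the lost one. Losing two blocks of the same stripe cannot be recovered this way.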

 

All normal write operations on a stripe set with parity are substantially slower than writing to stripe

sets without parity due to the parity calculation.

Stripe sets with parity can offer better read performance than mirror sets, especially with multiple

controllers, because data is distributed among multiple drives.  Stripe sets with parity can offer a

cost advantage over mirror sets because disk utilization is optimized.  For example, if there are four

disks in a stripe set with parity, the disk space overhead is 25%, compared to 50% disk space

overhead with mirror sets.
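
The overhead figures in that example follow from one disk's worth of space in the set being used for parity. A minimal sketch of the arithmetic (the function name is only illustrative):

```python
def parity_overhead(disk_count: int) -> float:
    """Fraction of a stripe set with parity that is used for parity information."""
    if not 3 <= disk_count <= 32:
        raise ValueError("a stripe set with parity uses 3 to 32 disks")
    return 1 / disk_count


MIRROR_OVERHEAD = 0.5  # a mirror set always keeps two copies of the data

for disks in (3, 4, 8, 32):
    print(f"{disks} disks: {parity_overhead(disks):.1%} parity overhead "
          f"(mirror set: {MIRROR_OVERHEAD:.1%})")
```

The more disks in the set, the smaller the share of space given up to parity, which is why the cost advantage over mirroring grows with the size of the set.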

 

 


 

 

 

NOTE:  Neither the boot partition nor the system partition can be part of the Windows NT

implementation of a stripe set with parity.

 

We will not be tested on levels 2, 3, and 4 in class, but review them for the MCSE exam.

 

 

RAID 1 vs. RAID 5

 

The main differences are hardware requirements, performance and cost. See the following chart to

compare RAID 1 and RAID 5:

 

=====================================================================
RAID 1 (Mirror Sets)                        RAID 5 (Stripe Sets with Parity)
=====================================================================
Supports FAT and NTFS                       Supports FAT and NTFS

Can mirror the system or boot partition     Cannot stripe the system or boot
(mirroring or duplexing the system          partition
amounts to the same thing)

Requires 2 hard disks                       Requires a minimum of 3 hard disks

Has higher cost per megabyte (50%           Has lower cost per megabyte
utilization; the space is split
between the two disks)

Has good read and write performance,        Has excellent read performance and
but write performance can be a little       moderate write performance
slower

Uses less system memory                     Requires more system memory

Advantage:  if the system goes down,        Supports 3 to 32 hard disks
you have a duplicate

If you set up a mirror, you need a fault tolerance boot disk with these files:

Ntldr
Ntdetect.com
Boot.ini
Bootsect.dos

 

 

 


 

 

 

Implementing RAID 1 and RAID 5

 

Mirror sets and stripe sets with parity can coexist on the same computer.  Because a stripe set

with parity cannot include the system or boot partition, consider mirroring the system and boot

partitions, and protecting the remaining data in stripe sets with parity.  In the example on page 246, the

system and boot partitions are located on Drive C, which is part of a mirror set, and the remaining data

on Drive D is part of a stripe set with parity.

 

 

Considerations when Creating and Deleting a Stripe Set with Parity

 

The free spaces that are combined to create a stripe set with parity must be the same size.  If they

are not, Disk Administrator makes each partition of the set approximately the same size and leaves

the unused portions as unusable free space.

 

NOTE:  It is necessary to shut down and then restart the computer after a stripe set with parity is created.

 

CAUTION:  Deleting a mirror set or stripe set with parity deletes all the information stored in that volume.

 

*****   Do the exercises on Page 247 *****

 

 

RAID 0

 

The following requirements are for RAID 0:

[Figure:  RAID 0 stripe set, with data written across the member disks in 64K blocks]

 

 

 

 

 

 

 

 

 


 

 

 

Lesson 2:  Recovering from Hard Disk Failure

 

Fault tolerance duplicates system and user data in case of disk failure.

 

When a member of a mirror set or a stripe set with parity fails (as may occur in a power loss or

hardware failure), the fault tolerance driver directs all I/O to the remaining members of the fault-tolerant

volume.  This ensures continuous service.  If you have used Server Manager to configure the computer

to send administrative alerts, and the Alerter service is running, then an alert message is sent to notify

the specified accounts that this failure has occurred.

 

If the failed disk is part of a mirror set that contains the boot partition, and if the failed disk is the

primary physical drive, then a fault tolerance boot disk will be required to restart the system.

 

Regenerating a Stripe Set with Parity

 

If a member of a stripe set with parity fails, the computer continues to operate and to gain access

to all data.  However, as data is requested, the Windows NT Server fault tolerance driver uses

the parity bits to regenerate the missing data in RAM.  When this happens, system performance

slows down.

 

Use Disk Administrator to select an area of free space to replace the failed member, and then

click the Regenerate command on the Fault Tolerance menu.  If you do not have sufficient free

space, replace the failed drive and then regenerate your data.

 

The fault tolerance driver reads the parity information from the stripes on the other member disks. 

It then recreates the data of the missing member and writes the data to the new member.

 

Recovering from Mirror Set Failure

 

Because of the data duplication involved in mirror sets, the system continues to function when a

member of the mirror set fails.  In order to replace the failed member, an administrator must first

“break” the mirror set.

 

Break the mirror set relationship to isolate the remaining working partition as a separate

volume.  When the Mirror set is broken, Disk Administrator assigns the next available

drive letter to the mirrored volume.

 

Assign the drive letter that was previously assigned to the complete mirror set to the working

member of the mirror set.  For example, if the disk has any shared resources, or if a

shortcut points to a location on a particular drive letter, you would need to reassign the

drive letter to maintain your computer’s functionality.

 

 


 

 

 

to determine which partition failed.

 

 

NOTE:  Replacing a failed member is not the only reason that you would

break a mirror set.  You would also break the mirror set if you wanted to reclaim the disk

space for other purposes.

 

 

Creating a Fault Tolerance Boot Disk

 

When making a mirror set for the boot partition or system partition of a computer running

Windows NT Server, it is important to create a fault tolerance boot disk for use in case of

physical disk failure.

 

Remember that in the software implementation of RAID, the system and boot partitions cannot be

members of a stripe set with parity; only mirror sets can provide fault tolerance for the system

and boot partitions.

 

The following steps are involved in creating a fault tolerance boot disk:

 

1.    Format a disk using Windows NT Server.

2.    Copy the necessary files to the disk.

3.    Modify Boot.ini.

4.    Test the boot disk.

 

 

What Happens in DOS when your file name is too Long:  (page 32 Lab book)

 

If you have a file name that is too long, DOS creates an alias and shortens it automatically for you.

It also does not recognize spaces in names and removes them from the alias.  The following file name can

be entered, but the system will shorten it for you automatically:

 

This is the letter I wrote to Aunt Martha.txt

 

 

 


 

 

Each 13 characters of a long file name take up one additional directory entry.  Do not

store files with long names in the root directory, which holds a limited number of entries; make a directory for them.

 

DOS will also shorten a name to an alias such as LongFI~1.exe; see the Lab book, page 39.
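
As a rough illustration of both points, the sketch below counts the directory entries a long file name consumes (one entry for the short alias plus one for each 13 characters of the long name) and builds a simplified alias. The alias function is an approximation for study purposes only: it ignores name collisions and illegal characters, and both function names are made up for this example.

```python
import math


def lfn_directory_entries(name: str) -> int:
    """One entry for the 8.3 alias plus one per 13 characters of the long name."""
    return 1 + math.ceil(len(name) / 13)


def simple_short_alias(name: str, index: int = 1) -> str:
    """Simplified 8.3 alias: drop spaces, keep 6 characters, append ~index."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension in the name
        stem, ext = name, ""
    stem = stem.replace(" ", "").upper()[:6]
    ext = ext.replace(" ", "").upper()[:3]
    alias = f"{stem}~{index}"
    return f"{alias}.{ext}" if ext else alias


long_name = "This is the letter I wrote to Aunt Martha.txt"
print(lfn_directory_entries(long_name))  # 45 characters -> 4 extra entries + 1 alias
print(simple_short_alias(long_name))
```

A handful of long names can therefore use up many entries, which is the reason for keeping such files out of the root directory.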

 

 

MCSE EXAM Sample Questions:

 

If you have six disks with the following amounts of free space (in MB), what is the largest set of each type that you can create?

 

                        400             500               600            700

                        700             700

 

Largest Volume Set:

 

The answer is 3,600 MB.  With volume sets you simply add all the areas up; there is no

parity, and you are not restricted by the smallest disk.

 

Largest Stripe Set with Parity:

 

There are two ways to get the same answer:

 

400 x 6 = 2,400 MB - 400 (lose one disk's worth for parity) = 2,000 MB

500 x 5 = 2,500 MB - 500 (eliminate the 400, which is the smallest, and go to the next size up,

which is 500; then lose one disk's worth for parity) = 2,000 MB

 

Largest Stripe Set without Parity:

 

Do not forget that every member is limited to the size of the smallest one, unless you leave that disk out.

You cannot just jump up to 700 MB and ignore the smaller disks; that gives a smaller set (700 x 3 = 2,100 MB).

The largest stripe set without parity leaves out the 400 MB disk:  500 x 5 = 2,500 MB.
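
The arithmetic for this kind of exam question can be checked with a short sketch. It tries every choice of "use the N largest disks" and keeps the best result; the function names are illustrative, and the sizes are the six from the question, taken to be in MB.

```python
def largest_volume_set(sizes):
    """A volume set simply adds up every area of free space."""
    return sum(sizes)


def largest_stripe_set(sizes, parity=False):
    """Best capacity when every member is cut down to the smallest disk used."""
    min_disks = 3 if parity else 2
    ordered = sorted(sizes, reverse=True)
    best = 0
    for count in range(min_disks, len(ordered) + 1):
        smallest = ordered[count - 1]            # smallest of the 'count' largest disks
        data_disks = count - 1 if parity else count  # parity costs one disk's worth
        best = max(best, smallest * data_disks)
    return best


free_space_mb = [400, 500, 600, 700, 700, 700]
print(largest_volume_set(free_space_mb))
print(largest_stripe_set(free_space_mb, parity=True))
print(largest_stripe_set(free_space_mb, parity=False))
```

Dropping a small disk only pays off when the extra space gained on each remaining member outweighs the member that was left out.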

 

(try Testmaster//instructor 9)

 

 

 

How to Create a Fault Tolerance Boot Disk in Detail:

 

1.         Format a floppy disk on a computer running Windows NT Server.  This writes

information to the boot track of the disk so that it looks for the appropriate loader file

when the system is started.

 

NOTE:  A fault tolerance boot disk must be formatted on a computer running Windows NT Server.

 

 

 

 


 

 

 

2.         Copy the following files from the primary partition of the computer running Windows NT Server

to the boot disk.  Several of the files are ordinarily hidden in the root folder.  Use Windows NT

Explorer to display the hidden files.

 

======================================================================
x86-based computers                          RISC-based computers
======================================================================
Ntldr                                        Osloader.exe

Ntdetect.com                                 Hal.dll

Ntbootdd.sys (for SCSI disks not             *.pal (contains PAL code, software
using the SCSI BIOS)*                        subroutines that provide an operating
                                             system with direct control of the
                                             processor)

Boot.ini

*The Ntbootdd.sys file appears only on SCSI systems in which the SCSI BIOS is not used.

 

3.         On Intel x86-based computers, edit Boot.ini to change your operating system entry point to the

mirrored copy of the boot partition.

 

On RISC-based computers, modify the firmware variables shown in the following table:

 

======================================================================
Variable                    Value
======================================================================
OSLOADER                    multi(0)disk(0)fdisk(0)Osloader.exe

SYSTEMPARTITION             multi(0)disk(0)fdisk(0)

OSLOADPARTITION             path to the secondary mirrored partition

OSLOADFILENAME              path to the Windows NT Server root directory

 

 

4.         Test the boot disk to ensure that it works and boots using data from the mirrored copy of the

boot partition.  Do this by shutting down Windows NT Server, inserting the fault tolerance

boot disk, and then restarting the computer.

 

Whenever partition path information has been changed, it is important to update the Boot.ini

file on the fault tolerance boot disk.
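
As a rough sketch of what the edited file looks like, the function below produces the text of a Boot.ini whose default entry points at the mirrored copy. The section names are the standard Boot.ini ones, but the ARC path, the WINNT folder name, the timeout, and the menu description are assumed values for illustration; use the ones that match your own system.

```python
def fault_tolerance_boot_ini(mirror_arc_path: str, system_root: str = "WINNT") -> str:
    """Build Boot.ini text whose only entry points at the mirrored partition."""
    entry = f"{mirror_arc_path}\\{system_root}"
    lines = [
        "[boot loader]",
        "timeout=30",                      # assumed value
        f"default={entry}",
        "[operating systems]",
        f'{entry}="Windows NT Server (mirrored copy)"',  # assumed description
    ]
    return "\n".join(lines) + "\n"


# Hypothetical mirror on the second disk of the first controller, first partition.
print(fault_tolerance_boot_ini("multi(0)disk(0)rdisk(1)partition(1)"))
```

If the partition layout changes later, regenerate or re-edit the file with the new path so that the boot disk still points at the mirrored copy.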

 

 

 

 

 


 

 

 

EXERCISE:  Create a Windows NT boot disk:   (page 253)

 

1.         Format a floppy disk using Windows NT Explorer, or go to a command

            prompt and use the format.exe command.

2.         Copy the following files from the root of C: to the root of drive A:

 

 

3.         Do not remove the diskette from Drive A:

 

4.         Shut down and then restart your computer using the Windows NT boot disk you just created.

 

Did Windows NT start successfully?

 

            Yes, it works, but you also need Bootsect.dos if you are booting to Windows 98.

 

5.         Remove the floppy from the drive.

 

 

 

Understanding ARC Paths

 

Creating a fault tolerance boot disk for recovery of a mirrored boot or system partition requires

editing the Advanced RISC Computing (ARC) names in the Boot.ini file.  You need to change the

ARC path so that it points to the secondary (mirrored) partition rather than the primary boot

or system partition.  The ARC paths in the Boot.ini file use a different naming convention than

the C:\, D:\, or E:\ drive letters.  As a memory aid:  disk() goes with scsi, and rdisk() goes with multi.

 

 

When you install Windows NT, a Boot.ini file is generated.  The Boot.ini file contains the ARC

paths used to point to locations of the operating system files.  The following represents an ARC

path in the boot.ini file:

 

multi(0)disk(0)rdisk(1)partition(2)

 

 

 


 

 

 

====================================================================

Convention                             Description

====================================================================

 

multi/scsi                  Identifies the hardware adapter/disk controller as either multi or
                            scsi.

                            scsi indicates a SCSI controller on which the SCSI BIOS is not
                            enabled.

                            All other adapters or disk controllers are represented by multi.
                            These include SCSI disk controllers with the BIOS enabled,
                            so that the SCSI disk is accessed by the SCSI BIOS.  For
                            Windows NT Server, this could be a disk supported by the
                            Atdisk, Abiosdsk, or Cpqarray drivers.

                            Remember, use multi in all cases except when Ntbootdd.sys
                            is on your system.  The Ntbootdd.sys file appears only on SCSI
                            systems in which the SCSI BIOS is not used.

(x)                         Ordinal number of the hardware adapter.  For example, if there
                            are two SCSI adapters in the system, the first to load and
                            initialize is assigned ordinal number 0, and the next adapter
                            is assigned ordinal number 1.

disk(y)                     SCSI bus number.  For settings of multi, this value is always 0.

rdisk(z)                    Ordinal number of the disk (ignored for SCSI controllers).

partition(a)                Ordinal number of the partition.

 

 

 

=====================================================================

 

RISC-based computers use the SCSI naming convention.  In both multi and SCSI conventions,

multi/scsi, disk, and rdisk numbers are assigned starting with (0), while partition numbers are

assigned starting with (1).  All non-extended partitions are assigned numbers first, followed by

all logical drives in extended partitions.

 

 

NOTE:  SCSI and multi ARC naming conventions are similar, except that the SCSI

notation varies the disk() parameter for successive disks on one controller, while the multi

format varies the rdisk() parameter.
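
The difference between the two conventions can be made concrete with a small sketch that takes an ARC path apart and points it at another disk on the same controller. The class and function names are made up for this example, and the parser only handles the plain multi/scsi form shown in these notes (no folder name after the partition).

```python
import re
from typing import NamedTuple

ARC_PATTERN = re.compile(
    r"^(multi|scsi)\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)$",
    re.IGNORECASE,
)


class ArcPath(NamedTuple):
    kind: str        # "multi" or "scsi"
    adapter: int     # ordinal number of the adapter, starting at 0
    disk: int
    rdisk: int
    partition: int   # partition numbers start at 1

    def __str__(self) -> str:
        return (f"{self.kind}({self.adapter})disk({self.disk})"
                f"rdisk({self.rdisk})partition({self.partition})")


def parse_arc_path(text: str) -> ArcPath:
    """Split an ARC path into its numbered parts."""
    match = ARC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized ARC path: {text!r}")
    kind, *numbers = match.groups()
    return ArcPath(kind.lower(), *(int(n) for n in numbers))


def on_disk(path: ArcPath, disk_number: int) -> ArcPath:
    """Point the path at another disk on the same controller."""
    if path.kind == "scsi":
        return path._replace(disk=disk_number)   # scsi varies disk()
    return path._replace(rdisk=disk_number)      # multi varies rdisk()


primary = parse_arc_path("multi(0)disk(0)rdisk(0)partition(1)")
print(on_disk(primary, 1))
```

For a mirror whose second member is the next disk on the same controller, the resulting path is the kind of value that goes into the Boot.ini file on the fault tolerance boot disk.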

 

***** See page 255 for examples of the ARC.*********

 

 

 


 

 

 

Summary:

 

1)         Fault tolerance is the ability of a system to protect against loss of data when part of the system fails.  

2)         Both hardware and software fault tolerance solutions are available.  While hardware solutions can

provide enhanced performance over software solutions, they can also be more costly.

3)         Windows NT Server supports two software fault tolerance methods, mirror sets and stripe sets

with parity. 

4)         The factors that influence whether to use either one or both methods include costs, performance

and reliability.

5)         When you implement fault tolerance, keep the following considerations in mind:

            a)  Fault tolerance is not a substitute for regular backups.

            b)  Following a failure, fault tolerance has not been achieved until the fault is repaired and the

                data is restored.

            c)  Create a fault tolerance boot disk to restore the system in the case of a physical disk failure

                of a mirror set containing the boot or system partitions.