CHAPTER 7
MANAGING FAULT TOLERANCE
Fault tolerance is the ability of a computer or operating system to respond to a catastrophic event,
such as a power outage or a hardware failure, so that no data is lost and work in progress is not corrupted.
Microsoft Windows NT Server provides fault tolerance through a system called Redundant Array
of Inexpensive Disks (RAID). Data protected by fault tolerance can be recovered and restored.
Lesson 1:
Fault Tolerance
Fault tolerance is the ability of a system to continue functioning when part of the system fails. Fault
tolerance is designed to combat problems with disk failures, power outages, or corrupted operating
systems, which can include boot files, the operating system itself, or system files. Fully fault-tolerant
systems include redundant disk controllers, power supplies, and uninterruptible power supplies
(UPSs) which safeguard against local power failure.
RAID Systems
Windows NT Server provides a software implementation of a fault tolerance technology known as RAID.
RAID technology is standardized and categorized in levels. Each level offers various mixes of
performance, reliability, and cost. Windows NT Server supports RAID levels 1 and 5 to provide fault
tolerance. RAID can also combine old, small disks (for example, 10 MB disks) into a single array, which
puts them to use instead of throwing them out.
How does RAID operate to protect your system? RAID provides fault tolerance by implementing data
redundancy. With data redundancy, data is written to more than one disk in a manner that allows
recovery of the data in the event of a single hard disk failure.
RAID 0 = Disk Striping
RAID 1 = Disk Mirroring and Duplexing
RAID 2 = Disk Striping with ECC
RAID 3 = ECC Stored as Parity
RAID 4 = Disk Striping with Large Blocks
RAID 5 = Striping with Parity
RAID 10 = Mirrored Drive Arrays
===================================================================
wntsup7.html PAGE
2
2001/11/09
NOTE: For more information on RAID levels, refer to the Networking Essentials book, pages 434 to 437.
Hardware and Software Implementations of RAID
RAID can be implemented as either a hardware or software fault tolerance system:
Windows NT Workstation does not provide fault tolerance.
of a failed drive without shutting down the system.
NOTE: Whether you choose hardware or software fault tolerance, the need for regular
backups does not diminish.
Hardware Implementations of RAID
In a hardware solution, the disk controller interface handles the creation and regeneration of redundant
information. Some hardware vendors implement RAID data protection directly into their hardware,
as with disk array controller cards. Because these methods are vendor-specific and bypass the fault
tolerance software drivers of the operating system, they usually offer performance improvements over
software implementations of RAID. Hardware implementations may offer RAID levels 0 through 5, but
do not worry about levels 2 through 4. RAID 10 is also available; in class it was described as a
fiber-optic solution.
Software Implementation of RAID
Windows NT Server supports two software implementations of RAID: RAID 1 (mirror sets) and
RAID 5 (stripe sets with parity). Software RAID is available only on Windows NT Server, not on
Windows NT Workstation.
RAID 1: Mirror Sets
Mirror sets (RAID 1) use the Windows NT fault tolerance driver (Ftdisk.sys) to simultaneously write
the same data to two physical drives. Through duplication, or mirroring, RAID 1 helps to ensure the
survival of data in case of failure. RAID 1 is the only software RAID level that works on the system
and boot partitions; the other levels of RAID will not work on boot or system partitions.
In terms of cost per megabyte, mirror sets are more expensive than other forms of fault tolerance because
disk space utilization is only 50%.
Mirror sets can provide enhanced read performance because the fault tolerance driver can read from
both members of the mirror set at the same time. There can be a slight decrease in write performance
when writing to a mirror set because the fault tolerance driver must write to both members simultaneously.
Disk Duplexing:
A member of a mirror set is one of the physical disk partitions that make up the set. If both physical disks
that comprise a mirror set are controlled by the same disk controller, and the disk controller fails, both
members of the mirror set are inaccessible. However, a second controller can be installed in the
computer so that each disk in the mirror set has its own controller. This arrangement is called disk
duplexing. Duplexing also reduces bus traffic and potentially improves read performance.
NOTE: Disk duplexing is a hardware enhancement to a Windows NT Server mirror set. No
additional software configuration is necessary.
RAID 5: Stripe Sets with Parity
Data written in stripe sets can be protected by the fault tolerance mechanism, stripe sets with parity
(RAID 5), which is supported by Windows NT Server.
Parity is a mathematical method of verifying data integrity. Fault tolerance is achieved by adding a
parity-information stripe to each disk partition in the volume. In a stripe set with parity, 3 to 32 disks
are supported. The parity stripe block is used to reconstruct data for a failed physical disk. If a
single disk fails, data is not lost because the Windows NT Server fault tolerance driver has spread
the information across the remaining disks. The data can be completely reconstructed.
All normal write operations on a stripe set with parity are substantially slower than writing to stripe
sets without parity due to the parity calculation.
Stripe sets with parity can offer better read performance than mirror sets, especially with multiple
controllers, because data is distributed among multiple drives. Stripe sets with parity can offer a
cost advantage over mirror sets because disk utilization is optimized. For example, if there are four
disks in a stripe set with parity, the disk space overhead is 25%, compared to 50% disk space
overhead with mirror sets.
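The overhead figures above can be checked with a few lines of Python. This is only a sketch: the function name parity_overhead and the constant MIRROR_OVERHEAD are made up for illustration, and the calculation assumes every member of the set is the same size, so that exactly one member's worth of space goes to parity.

```python
def parity_overhead(disk_count: int) -> float:
    """Fraction of total space given up to parity in a stripe set with parity."""
    # The notes state that a stripe set with parity supports 3 to 32 disks.
    if not 3 <= disk_count <= 32:
        raise ValueError("a stripe set with parity needs 3 to 32 disks")
    # One member's worth of space out of disk_count holds parity.
    return 1 / disk_count


# A mirror set stores every byte twice, so half of the space is overhead.
MIRROR_OVERHEAD = 0.5

for disks in (3, 4, 8, 32):
    print(f"{disks:2d} disks: {parity_overhead(disks):.1%} parity overhead "
          f"(mirror set: {MIRROR_OVERHEAD:.0%})")
```

The more disks in the stripe set, the smaller the share of space that is spent on parity, which is where the cost advantage over mirror sets comes from.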
NOTE: Neither the boot partition nor the system partition can be part of the Windows NT
implementation of a stripe set with parity.
RAID levels 2, 3, and 4 will not be tested in class, but review them for the MCSE exam.
RAID 1 vs. RAID 5
The main differences are hardware requirements, performance and cost. See the following chart to
compare RAID 1 and RAID 5:
=====================================================================
RAID 1 (Mirror Sets)                RAID 5 (Stripe Sets with Parity)
=====================================================================
Supports FAT and NTFS               Supports FAT and NTFS

Can mirror system or boot           Cannot stripe system or boot
partition (mirroring or duplexing   partition
the system is the same thing)

Requires 2 hard disks               Requires a minimum of 3 hard disks;
                                    supports 3 to 32 hard disks

Advantage: if the system goes
down, you have duplication

If you set up a mirror, you need
a fault tolerance boot disk with
these files:
  Ntldr
  Ntdetect.com
  Boot.ini
  Bootsect.dos

Has higher cost per megabyte        Has lower cost per megabyte
(50% utilization; the space is
split between the two disks)

Has good read and write             Has moderate write performance
performance, but the write          Has excellent read performance
performance can be a little
slower

Uses less system memory             Requires more system memory
=====================================================================
Implementing RAID 1 and RAID 5
Mirror sets and stripe sets with parity can coexist on the same computer. Because a stripe set
with parity cannot include the system or boot partition, consider mirroring the system and boot
partitions, and protecting the remaining data in stripe sets with parity. In the example on page 246,
the system and boot partitions are located on drive C, which is part of a mirror set. The remaining
data, on drive D, is part of a stripe set with parity.
Considerations when Creating and Deleting a Stripe Set with Parity
The free spaces that are combined to create a stripe set with parity must be the same size. If they
are not, Disk Administrator makes each partition of the set approximately the same size and leaves
the unused portions of the partition as unusable free space.
NOTE: It is necessary to shut down and then restart the computer after a stripe set with parity is created.
CAUTION: Deleting a mirror set or stripe set with parity deletes all the information stored in that volume.
***** Do the exercises on Page 247 *****
RAID 0
The following requirements are for RAID 0:
[Figure: RAID 0 disk striping, with data divided into 64K blocks across the disks]
Lesson 2:
Recovering from Hard Disk Failure
Fault tolerance duplicates system and user data in case of disk failure.
When a member of a mirror set or a stripe set with parity fails (as may occur in a power loss or
hardware failure), the fault tolerance driver directs all I/O to the remaining members of the fault-tolerant
volume. This ensures continuous service. If you have used Server Manager to configure the computer
to send administrative alerts, and the Alerter service is running, then an alert message is sent to notify
the specified accounts that this failure has occurred.
If the failed disk is part of a mirror set that contains the boot partition, and if the failed disk is the
primary physical drive, then a fault tolerance boot disk will be required to restart the system.
Regenerating a Stripe Set with Parity
If a member of a stripe set with parity fails, the computer continues to operate and to gain access
to all data. However, as data is requested, the Windows NT Server fault tolerance driver uses
the parity bits to regenerate the missing data in RAM. When this happens, system performance
slows down.
Use Disk Administrator to select an area of free space to replace the failed member and then
click the Regenerate command on the Fault Tolerance menu. If you do not have sufficient free
space, replace the failed drive and then regenerate your data.
The fault tolerance driver reads the data and parity information from the stripes on the other member disks.
It then recreates the data of the missing member and writes the data to the new member.
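The regeneration step can be illustrated with a small Python sketch. The notes do not spell out the parity arithmetic; RAID 5 parity is conventionally a bytewise XOR of the data blocks in a stripe, and the sketch assumes that. The helper xor_blocks and the sample blocks are made up for illustration and are not part of Windows NT.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a hypothetical three-disk set: two data blocks plus parity.
data_a = b"FAULT"
data_b = b"TOLER"
parity = xor_blocks([data_a, data_b])

# Simulate losing the disk that held data_b, then rebuild it from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)
```

Because XOR is its own inverse, combining the surviving blocks of a stripe always yields the missing block, whichever single member has failed. This is also why only one failed disk can be tolerated.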
Recovering from Mirror Set Failure
Because of the data duplication involved in mirror sets, the system continues to function when a
member of the mirror set fails. In order to replace the failed member, an administrator must first
“break” the mirror set.
Break the mirror set relationship to isolate the remaining working partition as a separate
volume. When the mirror set is broken, Disk Administrator assigns the next available
drive letter to the mirrored volume.
If necessary, reassign the drive letter that was previously assigned to the complete mirror set to the working
member of the mirror set. For example, if the disk has any shared resources, or if a
shortcut points to a location on a particular drive letter, you would need to reassign the
drive letter to maintain your computer’s functionality.
to determine which partition failed.
NOTE: Replacing a failed member is not the only reason that you would
break a mirror set. You would also break the mirror set if you wanted to reclaim the disk
space for other purposes.
Creating a Fault Tolerance Boot Disk
When making a mirror set for the boot partition or system partition of a computer running
Windows NT Server, it is important to create a fault tolerance boot disk for use in case of
physical disk failure.
Remember that in the software implementation of RAID, the system and boot partitions cannot be
members of a stripe set with parity; only mirror sets can provide fault tolerance for the system
and boot partitions.
The following steps are involved in creating a fault tolerance boot disk:
1. Format a disk using Windows NT Server.
2. Copy the necessary files to the boot disk.
3. Modify Boot.ini.
4. Test the boot disk.
What Happens in DOS When Your File Name Is Too Long (Lab book, page 32)
If a file name is too long, DOS automatically creates a shortened alias for it. DOS also does not
recognize spaces in names and removes them from the alias. For example, the following file name
can be entered, but the system shortens it automatically:
This is the letter I wrote to Aunt Martha.txt
Every 13 characters of a long file name take up one additional directory entry. Because the root
directory has a limited number of entries, do not store files with long names in the root; make a
directory for them instead.
DOS will also shorten a name to a form such as LongFI~1.exe; see the Lab book, page 39.
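The following Python sketch works through the example file name. It is a simplification with assumptions: directory_entries counts one entry per 13 characters plus one more for the short alias itself, and simple_alias only handles the basic case (it ignores illegal characters and name collisions, which would produce ~2, ~3, and so on). Both function names are invented for this illustration.

```python
import math


def directory_entries(long_name: str) -> int:
    """Entries used by a long name: one per 13 characters, plus the 8.3 alias."""
    return math.ceil(len(long_name) / 13) + 1


def simple_alias(long_name: str) -> str:
    """Simplified 8.3 alias; ignores name collisions and illegal characters."""
    base, _, ext = long_name.rpartition(".")
    if not base:  # no extension present
        base, ext = long_name, ""
    # Spaces and extra periods are dropped, and the alias is uppercase.
    base = base.replace(" ", "").replace(".", "").upper()
    ext = ext.replace(" ", "").upper()[:3]
    alias = base[:6] + "~1"
    return f"{alias}.{ext}" if ext else alias


name = "This is the letter I wrote to Aunt Martha.txt"
print(directory_entries(name), simple_alias(name))
```

A single long name can use several directory entries, which is why a root directory full of long file names runs out of entries quickly.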
MCSE EXAM Sample Questions:
If you have the following areas of free space (in MB), what is the largest set of each type that you can create?
400 500 600 700 700 700
Largest Volume Set:
The answer is 3,600 MB. With volume sets you simply add them all up, and don’t forget there is no
parity and you are not restricted by the smallest disk.
Largest Stripe Set with Parity:
There are two ways to get the same answer:
400 x 6 = 2,400 MB - 400 MB (lose one disk for parity) = 2,000 MB
500 x 5 = 2,500 MB - 500 MB (eliminate the 400, which is the smallest, and go to the next size up,
which is 500; then lose one disk for parity) = 2,000 MB
Largest Stripe Set without Parity:
(Don’t forget you are always limited by the smallest size, unless you want to eliminate it.)
You cannot just jump up to the 700 MB disks and ignore the lower ones; 700 x 3 = 2,100 MB is smaller.
The largest without parity comes from eliminating the 400 MB disk: 500 x 5 = 2,500 MB.
(try Testmaster//instructor 9)
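The sample question can be worked through in Python as a check. This is a sketch under stated assumptions: each member of a stripe set contributes only as much space as the smallest member, a stripe set with parity gives up one member's worth of space and needs at least 3 members (from the notes), and a stripe set without parity is assumed to need at least 2 members. The function names are invented for illustration.

```python
def largest_volume_set(sizes):
    """A volume set simply adds up all of the free areas."""
    return sum(sizes)


def largest_stripe_set(sizes, parity=False):
    """Try each size as the smallest member and keep the best total."""
    min_members = 3 if parity else 2
    best = 0
    for floor in set(sizes):
        # Only areas at least as large as the chosen floor can take part.
        members = sum(1 for s in sizes if s >= floor)
        if members < min_members:
            continue
        # With parity, one member's worth of space is not usable for data.
        usable_members = members - 1 if parity else members
        best = max(best, floor * usable_members)
    return best


disks_mb = [400, 500, 600, 700, 700, 700]
print(largest_volume_set(disks_mb))
print(largest_stripe_set(disks_mb, parity=True))
print(largest_stripe_set(disks_mb, parity=False))
```

Trying every possible floor size is what the manual method does when it eliminates the 400 and moves up to the 500.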
How to Create a Fault Tolerance Boot Disk in Detail:
1. Format a floppy disk on a computer running Windows NT Server. This writes
information to the boot track of the disk so that it looks for the appropriate loader file
when the system is started.
NOTE: A fault tolerance boot disk must be formatted on a computer running Windows NT Server.
2. Copy the following files from the primary partition of the computer running Windows NT Server
to the boot disk. Several of the files are ordinarily hidden in the root folder. Use Windows NT
Explorer to display the hidden files.
======================================================================
x86-based computers                 RISC-based computers
======================================================================
Ntldr                               Osloader.exe
Ntdetect.com                        Hal.dll
Ntbootdd.sys (for SCSI disks        *.pal (contains PAL code, software
not using SCSI BIOS)*               subroutines that provide an
                                    operating system with direct
                                    control of the processor)
Boot.ini
======================================================================
*The Ntbootdd.sys file appears only on SCSI systems in which the SCSI
BIOS is not used.
3. On Intel x86-based computers, edit Boot.ini to change your operating system entry point to the
mirrored copy of the boot partition.
On RISC-based computers, modify the firmware variables shown in the
following table:
======================================================================
Variable            Value
======================================================================
OSLOADER            multi(0)disk(0)fdisk(0)Osloader.exe
SYSTEMPARTITION     multi(0)disk(0)fdisk(0)
OSLOADPARTITION     path to the secondary mirrored partition
OSLOADFILENAME      path to the Windows NT Server root directory
======================================================================
4. Test the boot disk to ensure that it works and boots using data from the mirrored copy of the
boot partition. Do this by shutting down Windows NT Server, inserting the fault tolerance
boot disk, and then restarting the computer.
Whenever partition path information has been changed, it is important to update the Boot.ini
file on the fault tolerance boot disk.
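As an illustration of the Boot.ini change in step 3, the following Python sketch rewrites the paths in a copy of Boot.ini so that they point at a different disk. It rests on assumptions that will not hold on every system: the paths use the multi convention, and the mirrored copy sits on the second disk of the same controller, so only the rdisk() value changes. The function name point_to_mirror and the sample file contents are made up for this example.

```python
import re


def point_to_mirror(boot_ini_text: str, mirror_rdisk: int) -> str:
    """Return Boot.ini text with every rdisk(n) replaced by rdisk(mirror_rdisk)."""
    return re.sub(r"rdisk\(\d+\)", f"rdisk({mirror_rdisk})", boot_ini_text)


# Hypothetical Boot.ini contents, not taken from the course material.
sample = """[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\\WINNT="Windows NT Server Version 4.00"
"""

print(point_to_mirror(sample, 1))
```

On a real system, check the values against the actual disk layout before copying the edited file to the boot disk, and then test the disk as described in step 4.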
EXERCISE: Create a Windows NT boot disk (page 253)
1. Format a floppy disk using Windows NT Explorer, or go to a command
prompt and use the format.exe command.
2. Copy the boot files (listed in step 2 of the detailed procedure above) from the root of drive C
to the root of drive A.
3. Do not remove the diskette from drive A.
4. Shut down and then restart your computer using the Windows NT boot disk you just created.
Did Windows NT start successfully?
Yes, it works, but you also need Bootsect.dos if you boot to Windows 98.
5. Remove the floppy disk from the drive.
Understanding ARC Paths
Creating a fault tolerance boot disk for recovery of a mirrored boot or system partition requires
editing the Advanced RISC Computing (ARC) names in the Boot.ini file. You need to change the
ARC path so that it points to the secondary (mirrored) partition rather than the primary boot
or system partition. The ARC paths in the Boot.ini file use a different naming convention than
the C:\, D:\, or E:\ drive letters. In short, disk() is the value that varies with scsi, and rdisk() is
the value that varies with multi.
When you install Windows NT, a Boot.ini file is generated. The Boot.ini file contains the ARC
paths used to point to locations of the operating system files. The following represents an ARC
path in the boot.ini file:
multi(0)disk(0)rdisk(1)partition(2)
=====================================================================
Convention     Description
=====================================================================
multi/scsi     Identifies the hardware adapter or disk controller as
               either multi or scsi.
               scsi indicates a SCSI controller on which the SCSI BIOS
               is not enabled.
               All other adapters or disk controllers are represented
               by multi. These include SCSI disk controllers with the
               BIOS enabled, so that the SCSI disk is accessed by the
               SCSI BIOS. For Windows NT Server, this could be a disk
               supported by the Atdisk, Abiosdsk, or Cpqarray drivers.
               Remember, use multi in all cases except when
               Ntbootdd.sys is on your system. The Ntbootdd.sys file
               appears only on SCSI systems in which the SCSI BIOS is
               not used.
(x)            Ordinal number of the hardware adapter. For example, if
               there are two SCSI adapters in the system, the first to
               load and initialize is assigned ordinal number 0 and
               the next adapter is assigned ordinal number 1.
disk(y)        SCSI bus number. For settings of multi, this value is
               always 0.
rdisk(z)       Ordinal number of the disk (ignored for SCSI
               controllers).
partition(a)   Ordinal number of the partition.
=====================================================================
RISC-based computers use the SCSI naming convention. In both the multi and SCSI conventions,
multi/scsi, disk, and rdisk numbers are assigned starting with (0), while partition numbers are
assigned starting with (1). All non-extended partitions are assigned numbers first, followed by
all logical drives in extended partitions.
NOTE: The SCSI and multi ARC naming conventions are similar, except that the SCSI
notation varies the disk() parameter for successive disks on one controller, while the multi
format varies the rdisk() parameter.
***** See page 255 for examples of the ARC.*********
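To practice reading ARC paths, the following Python sketch splits a path into its parts and checks the numbering rules from the table above. The names ArcPath, ARC_PATTERN, and parse_arc_path are made up for this illustration, and the pattern covers only the plain form shown in these notes, without a trailing directory such as \WINNT.

```python
import re
from typing import NamedTuple


class ArcPath(NamedTuple):
    adapter_type: str  # "multi" or "scsi"
    adapter: int       # ordinal number of the hardware adapter, from 0
    disk: int          # SCSI bus number; always 0 for multi
    rdisk: int         # ordinal number of the disk; ignored for scsi
    partition: int     # ordinal number of the partition, from 1


ARC_PATTERN = re.compile(
    r"^(multi|scsi)\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)$",
    re.IGNORECASE,
)


def parse_arc_path(text: str) -> ArcPath:
    """Split an ARC path into its fields and check the numbering rules."""
    match = ARC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an ARC path: {text!r}")
    kind, adapter, disk, rdisk, partition = match.groups()
    path = ArcPath(kind.lower(), int(adapter), int(disk), int(rdisk), int(partition))
    if path.partition < 1:
        raise ValueError("partition numbers start at 1")
    if path.adapter_type == "multi" and path.disk != 0:
        raise ValueError("disk() is always 0 for multi")
    return path


print(parse_arc_path("multi(0)disk(0)rdisk(1)partition(2)"))
```

For the sample path, the result reads as: first multi adapter, second disk on that adapter, second partition on that disk.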
Summary:
1) Fault tolerance is the ability of a system to protect against loss of data when part of the system fails.
2) Both hardware and software fault tolerance solutions are available. While hardware solutions can
provide enhanced performance over software solutions, they can also be more costly.
3) Windows NT Server supports two software fault tolerance methods, mirror sets and stripe sets
with parity.
4) The factors that influence whether to use either one or both methods include costs, performance
and reliability.
5) When you implement fault tolerance, keep the following considerations in mind:
   a) Fault tolerance is not a substitute for regular backups.
   b) Following a failure, fault tolerance has not been achieved until the fault is repaired and the data is
      restored.
   c) Create a fault tolerance boot disk to restore the system in the case of a physical disk failure of a mirror
      set containing the boot or system partitions.