Operating System

 

 

Active Directory Disaster Recovery

White Paper

Abstract

The Active Directory™ service and the systems required for its successful operation are the core of the Windows® 2000 Server operating system. System administrators must understand how to keep these crucial systems functional and what to do in the event of a failure.

Domain controllers can assume numerous roles within an Active Directory infrastructure: global catalogs (GCs), operations masters (OMs), and simple domain controllers. This white paper describes the steps to recover the Active Directory database after a failure, together with the particular requirements for restoring a server that holds a special role.

The steps outlined in this document have been verified through recovery operations staged in the Compaq QTEST Windows 2000 organization. QTEST is a worldwide deployment of Windows 2000-based servers used by Compaq consultants to verify and test different deployment scenarios.



The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

 

© 2000 Microsoft Corporation. All rights reserved. Microsoft, BackOffice, Windows, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Other product and company names mentioned herein may be the trademarks of their respective owners.

Microsoft Corporation • One Microsoft Way • Redmond, WA 98052-6399 • USA

7/2000


Contents


Acknowledgements
Introduction
Overview of Active Directory
    Active Directory Database
    Active Directory Servers and Roles
    Global Catalogs
    Operations Master Servers
    Active Directory Replication
Active Directory Backup
    System State
    Types of Backup
    What is a Good Backup?
    Contents
    Age
    Rights Required
    Backup Performance
Active Directory Disaster Recovery Flowcharts
    Types of Disaster
Recovering Active Directory
    Restore Through Re-Installation
    Considerations for Restoring a DC Through Re-Installation
    Steps Required to Restore a DC Through Re-Installation
    Restore From Backup
    Non-Authoritative Restore
    Authoritative Restore
    Required Rights for Active Directory Restore
    Considerations for Restoring a DC From Backup
    Steps Required for a Non-Authoritative Restore
    Steps Required for an Authoritative Restore
    Verification of a Successful Restore
Recovery of a Global Catalog Server
    Steps Required for the Restore of a Global Catalog Server
    Steps Required to Assign a New GC
Recovery of an Operations Master
    Seizing an Operations Master Role
    Steps Required to Seize an Operations Master
    Recovery of the Schema Master
    Impact on Environment
    Considerations for Performing a Seizure on a Schema Master
    Recovery of the Domain Naming Master
    Impact on Environment
    Considerations for Performing a Seizure on a Domain Naming Master
    Recovery of the RID Master
    Impact on Environment
    Considerations for Performing a Seizure on a RID Master
    Recovery of the PDC Emulator
    Impact on Environment
    Considerations for Performing a Seizure on a PDC Emulator Master
    Recovery of the Infrastructure Master
    Impact on Environment
    Considerations for Performing a Seizure on the Infrastructure Master
Summary
    For More Information
Appendix I
    Database Integrity Testing
    Performing a Soft Recovery of the Log Files
    Ensuring File Integrity
    Ensuring Database Integrity
    Performing Semantic Database Analysis
Appendix II
    Database Repair
Appendix III
    Database: Location, Move, and Offline Defragmentation
    Determining the Location of Database Files and Log Files
    Moving the Database
    Offline Defragmentation
Appendix IV
    Useful tools for Active Directory Disaster Recovery
    Windows 2000 Domain Manager (NetDom.exe)
    Replication Diagnostics Tool (Repadmin.exe)
    Active Directory Diagnostics Tool (Ntdsutil.exe)
    Windows 2000 Backup Utility (Ntbackup.exe)
    ADSI Edit

 



Acknowledgements

Stephen Craike and Karl Robinson from Compaq helped develop this white paper. Most of the tests for this paper were performed in the Compaq QTEST labs.

 


Introduction

This paper discusses the steps for recovering a domain controller from a disaster, such as a database malfunction caused by hardware or software failure. Such a disaster generally renders the domain controller unusable and prevents the machine from booting normally. Disasters can also be caused by human error, in which erroneous data, such as an accidental deletion, is replicated to other domain controllers in the enterprise.

This paper provides information about recovering a domain controller running Active Directory and no other services. If other services are installed on the machine, such as Domain Name System (DNS) or Internet Information Services (IIS), additional steps may be required; those steps are not covered in this paper.

Most of the examples in this paper are based on the Windows 2000 Backup utility (Ntbackup.exe), the default backup application that ships with Windows 2000. More information on this tool can be found in Appendix IV. The concepts in this paper still apply if you use a different backup application.

This paper does not discuss troubleshooting problems with Active Directory. Instead, it deals with cases in which all troubleshooting has failed and Active Directory is unable to function, which may prevent the domain controller from booting into normal mode.

This paper assumes some prior knowledge of Active Directory and the components surrounding it. For information about Active Directory, see the Distributed Systems Guide in the Windows 2000 Server Resource Kit.

A system administrator can use the information in this paper to make disaster recovery plans, but this paper must be augmented with specific information from the organization’s internal environment and existing disaster recovery policies.

 


Overview of Active Directory

The Active Directory service is the directory service for Windows 2000. It is a core component of the operating system and provides essential data both to the enterprise and to other components of the operating system.

Active Directory provides a central service that administrators use to organize network resources and to manage users, computers, and applications.

Many different objects can be stored in the Active Directory, including:

  • Users.
  • Groups.
  • Security credentials such as certificates.
  • System resources such as computers (or servers) and printers.
  • Replication components and settings, which are themselves objects in the Active Directory.
  • COM component configuration, which was stored in the registry in Windows NT and is now stored in the class store in the Active Directory.
  • Rules and policies to control the working environment.

The illustration in Figure 1 below depicts many different objects stored centrally in the Active Directory.

 


Figure 1. Many different objects can be stored in the Active Directory.

 

Active Directory Database

Active Directory is a transacted database system that uses log files to support rollback semantics to ensure that transactions are committed to the database. The files associated with Active Directory are:

  • Ntds.dit – the database.
  • Edbxxxxx.log – transaction logs.
  • Edb.chk – checkpoint file.
  • Res1.log & Res2.log – reserved log files.

Ntds.dit grows as the database fills up, whereas each log file has a fixed size of 10 MB. Every change made to the database is also appended to the current log file, whose disk image is always kept up to date.

Edb.log is the current log file. When a change is made to the database, it is written to Edb.log. When Edb.log is full of transactions, it is renamed to Edbxxxxx.log, where xxxxx starts at 00001 and increments in hexadecimal. Because Active Directory uses circular logging, old log files are deleted once their transactions have been written to the database. At any point in time, you will find Edb.log and possibly one or more Edbxxxxx.log files.
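To make the rollover concrete, the following Python sketch models the behavior described above: Edb.log is renamed to the next Edbxxxxx.log name when it fills, and archived logs are discarded once their transactions are in the database. It is a simplified, hypothetical model rather than the database engine's implementation; the class and method names are invented for illustration, and the five-digit, upper-case hexadecimal name format is an assumption based on the Edbxxxxx.log pattern.

```python
from collections import deque


def log_name(generation: int) -> str:
    """Archived log file name for a generation number.

    Assumption: five upper-case hexadecimal digits, following the
    Edbxxxxx.log pattern (generation 1 -> Edb00001.log).
    """
    return f"Edb{generation:05X}.log"


class CircularLog:
    """Hypothetical toy model of circular logging, not the real engine."""

    def __init__(self) -> None:
        self.next_generation = 1
        # Archived logs whose transactions are not yet in Ntds.dit.
        self.pending: deque = deque()

    def roll_over(self) -> str:
        """Edb.log is full: rename it to the next Edbxxxxx.log name."""
        name = log_name(self.next_generation)
        self.next_generation += 1
        self.pending.append(name)
        return name

    def commit(self) -> list:
        """Pending transactions written to Ntds.dit: delete the old logs."""
        deleted = list(self.pending)
        self.pending.clear()
        return deleted

    def files_on_disk(self) -> list:
        # Edb.log always exists; archived logs exist only until deleted.
        return ["Edb.log", *self.pending]


log = CircularLog()
for _ in range(11):
    log.roll_over()
print(log.files_on_disk()[-1])  # Edb0000B.log (generation 11 is 0xB)
print(log.commit()[:2])         # ['Edb00001.log', 'Edb00002.log']
print(log.files_on_disk())      # ['Edb.log']
```

The model deletes every archived log at once when a commit happens; the real engine tracks this through the checkpoint file, which the sketch does not attempt to reproduce.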

Res1.log and Res2.log are placeholders that reserve 20 MB of disk space (two 10 MB files) on the drive that holds the log files. This gives the log files enough room for a graceful shutdown if all other disk space is consumed.

The Edb.chk file stores the database checkpoint, which identifies the point from which the database engine must replay the logs, generally at recovery or initialization.

For performance reasons, the log files should be located on a different disk than the database to reduce disk contention.

When a backup is taken, a new log file may be created. Like any other old log file, it is later deleted as a result of circular logging.

Active Directory Servers and Roles

A domain controller (DC) is a server that hosts a domain database and performs authentication services. In Windows 2000 Server, the domain database is a part of the Active Directory database. In Windows 2000, object changes can be made on any DC within the environment instead of just a primary domain controller (PDC), as in Windows NT Server 4.0.

DCs must initiate and perform replication operations to ensure that all DCs in the environment host a current and accurate version of the directory. In addition, all of the domain controllers in a particular forest host a copy of the forest configuration and schema containers.

Domain controllers can also be global catalogs or hold special roles as described below. In case of failure, it is important to know if the particular DC was a GC or operations master role holder so that appropriate action can be taken.

Global Catalogs

 

The global catalog’s primary function is to provide fast and efficient searches that extend across the entire Active Directory forest. A GC holds a full read/write replica of all objects in the domain of which it is a member and a partial read-only replica (all objects, but only a partial attribute set) of every other domain in the forest. The global catalog therefore makes directory structures within a forest transparent to end users, providing a search mechanism that makes finding objects in the directory simple and efficient.

The global catalog is also required for enumerating universal group memberships and resolving user principal names (UPNs) in a native Windows 2000 domain. As a result, if a DC cannot contact a GC when a client logs on, the client can log on only with cached local credentials, and access to remote resources is denied.

Note: To determine whether a DC is a global catalog server, look at the properties of its ntdsDSA object in the Sites and Services snap-in. (Right-click the NTDS Settings object of the domain controller and select Properties.) If the Global Catalog check box is selected, the DC is a global catalog server. You can open the snap-in on any live DC to check whether the failed DC was a GC.

Operations Master Servers

Active Directory supports multi-master updates. Each DC hosts a read/write version of its directory partition. Therefore, the Active Directory must allow for the possibility of conflicting changes, such as changes made simultaneously to the same object within the directory but on different DCs. The Active Directory uses a well-defined conflict resolution method and eventually all DCs converge to the same value.

Even with this well-defined method, it is sometimes better to prevent conflicts than to resolve them after the event. Operations masters in Active Directory prevent conflicting updates in cases where conflict resolution is unsuitable.

 

Active Directory defines five operations master roles:

  • schema master
  • domain naming master
  • relative identifier (RID) master
  • primary domain controller emulator (PDCE)
  • infrastructure master

The schema master and domain naming master are per-forest roles, meaning that there is only one schema master and one domain naming master in the entire forest. The other operations master roles are per-domain roles, meaning that each domain in a forest has its own RID master, PDCE, and infrastructure master.
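Because of these scopes, the number of role holders to track in a recovery plan grows with the number of domains: two forest-wide roles plus three roles per domain. The short Python sketch below enumerates them; the function name and the domain names in the example are hypothetical.

```python
FOREST_ROLES = ("schema master", "domain naming master")
DOMAIN_ROLES = ("RID master", "PDC emulator", "infrastructure master")


def operations_master_roles(domains: list) -> list:
    """Return (scope, role) pairs; each pair has exactly one holder."""
    roles = [("forest", role) for role in FOREST_ROLES]
    for domain in domains:
        roles.extend((domain, role) for role in DOMAIN_ROLES)
    return roles


# Hypothetical forest with a root domain and two child domains.
forest = ["example.com", "emea.example.com", "us.example.com"]
roles = operations_master_roles(forest)
print(len(roles))  # 11 = 2 + 3 * 3
```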

To check which DC holds the domain naming master role, open the Active Directory Domains and Trusts snap-in. To check the schema master, open the Active Directory Schema snap-in. For any of the per-domain roles, use the Active Directory Users and Computers snap-in. In each snap-in, right-click the top container in the left pane and select Operations Master.

The Schema snap-in is not one of the default MMC snap-ins provided with Windows 2000 Server. To make it appear in the list of available snap-ins, you must install the admin tools package (Adminpak.msi) from the Windows 2000 Server CD.

To register the Schema snap-in, type Regsvr32 schmmgmt.dll at the command prompt or at the Run command on the Start menu.

Schema Master

The DC that holds the schema master role is the only DC that can update the directory schema. Those schema updates are replicated from the schema master to all other domain controllers in the forest.

Domain Naming Master

The DC that hosts the domain naming master role is the only DC that can do the following:

  • Add new domains to the forest.
  • Remove existing domains from the forest.
  • Add or remove cross-reference objects describing external directories.

Relative Identifier (RID) Master

This operations master manages the allocation of RID pools to other DCs; only one server in each domain performs this task. When a security principal such as a user, group, or computer is created, it requires a RID, which is combined with a domain-wide identifier to create a unique security identifier (SID).

Every Windows 2000 DC receives a pool of RIDs (512 by default) that it can use to create objects. The RID master keeps these IDs unique by assigning a different pool to each DC. All moves of objects between domains of the same forest also go through the RID master.
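The following Python sketch illustrates the two ideas in this subsection: the RID master hands each DC a different, non-overlapping pool, and each new security principal's SID is the domain-wide identifier followed by a RID from the creating DC's pool. It is a hypothetical model only; the class names, the starting RID of 1000, and the domain SID value are assumptions for illustration, and it does not model how a real DC obtains a replacement pool.

```python
from dataclasses import dataclass

DEFAULT_POOL_SIZE = 512  # default RID pool size stated in this paper


@dataclass
class RidMaster:
    """Hypothetical model of the RID master handing out disjoint pools."""
    next_rid: int = 1000  # assumption: first RID handed out

    def allocate_pool(self, size: int = DEFAULT_POOL_SIZE) -> range:
        pool = range(self.next_rid, self.next_rid + size)
        self.next_rid += size  # the next DC gets the following block
        return pool


@dataclass
class DomainController:
    """Hypothetical DC that builds SIDs from its own RID pool."""
    domain_sid: str
    pool: range
    used: int = 0

    def create_principal(self) -> str:
        if self.used >= len(self.pool):
            raise RuntimeError("RID pool exhausted")
        rid = self.pool[self.used]
        self.used += 1
        # SID = domain-wide identifier + "-" + relative identifier
        return f"{self.domain_sid}-{rid}"


DOMAIN_SID = "S-1-5-21-1111111111-2222222222-3333333333"  # hypothetical
master = RidMaster()
dc1 = DomainController(DOMAIN_SID, master.allocate_pool())
dc2 = DomainController(DOMAIN_SID, master.allocate_pool())
print(dc1.create_principal())  # S-1-5-21-1111111111-2222222222-3333333333-1000
print(dc2.create_principal())  # S-1-5-21-1111111111-2222222222-3333333333-1512
```

Because the two pools never overlap, both DCs can create principals at the same time without contacting each other and still produce unique SIDs.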

PDCE

The primary domain controller emulator provides the following major functions:

  • Backward compatibility with down-level clients and servers, allowing Windows NT 4.0 backup domain controllers (BDCs) to participate in the new Windows 2000 environment.
  • Password change forwarding: in native Windows 2000 environments, password changes are forwarded to the PDCE. Each time a DC fails to authenticate a password, it contacts the PDCE to see whether the password can be authenticated there, perhaps because of a change that has not yet replicated to the authenticating DC.
  • Time synchronization: the PDCEs of the domains within the forest synchronize with the PDCE in the root domain of the forest.

Infrastructure Master

The infrastructure master ensures consistency of objects for all inter-domain operations. When an object from another domain is referenced, this reference contains the globally unique identifier (GUID), the security identifier (SID) and the distinguished name (DN) of that object. If the referenced object moves, the DC holding the infrastructure master role in a domain is responsible for updating the SIDs and DNs in cross-domain object references in that domain.

Active Directory Replication

Because Windows 2000 DCs hold a replica of all objects belonging to their domain and have full read/write access to these objects, the domain can be administered from any DC in that domain. These operations affect the state of an object and must therefore be replicated to the other DCs.

Replication is the process of propagating object updates among DCs.

Changed objects are not replicated immediately. Replication is triggered after a period of time, gathering all changes and passing them to other DCs in batches. As a result, in normal operation the Active Directory on any DC can be regarded as loosely consistent: at any given moment, the information on different DCs may differ because changes from other DCs are in transit or waiting for replication to be triggered. Eventually the changes arrive and the DCs synchronize with each other.

Replication and loose consistency are important concepts when considering the recovery techniques of Active Directory.


Active Directory Backup

An important component of an Active Directory disaster recovery plan is an understanding of the implications and considerations surrounding the backup of Active Directory.

System State

Active Directory is backed up as part of System State, a collection of system components that depend on each other. These components must be backed up (and restored) together.

Components that make up the System State on a domain controller include:

System Start-up Files (boot files). These are the files required for Windows 2000 to boot. They are automatically backed up as part of the System State.

System registry. The contents of the registry are automatically backed up when you back up System State data. In addition, a copy of your registry files is saved in the folder %SystemRoot%\Repair\Regback, allowing you to restore the registry without doing a complete restore of the System State.

Class registration database of COM+. The Component Object Model (COM) is a binary standard for writing component software in a distributed systems environment. The Component Services Class Registration Database is backed up and restored with the System State data.

SYSVOL. The system volume provides a default Active Directory location for files that must be shared for common access throughout a domain. The SYSVOL folder on a domain controller contains the following:

  • Net Logon shares. (These usually host logon scripts and policy objects for non-Windows 2000–based network clients.)
  • File system junctions.
  • User logon scripts for Windows 2000 Professional–based clients and clients that are running Windows 95, Windows 98, or Windows NT 4.0.
  • Windows 2000 Group Policy.
  • File replication service (FRS) staging directories and files that are required to be available and synchronized between domain controllers.

Active Directory. This includes:

  • Ntds.dit. The Active Directory database.
  • Edb.chk. The checkpoint file.
  • Edb*.log. The transaction logs, each 10 MB in size.
  • Res1.log and Res2.log. Reserved transaction logs.

Note: If you have Active Directory-integrated DNS, the zone data is backed up as part of the Active Directory database. If you do not, the zone files must be backed up explicitly. However, if you back up the system disk along with the System State, the zone files are backed up as part of the system disk.

If you have Cluster Service or Certificate Services installed on your domain controller, they are backed up as part of System State. Details of these components are not discussed in this paper.

Types of Backup

The backup tool in Windows 2000 supports multiple types of backup including:

  • Normal
  • Copy
  • Incremental
  • Differential
  • Daily

However, because Active Directory is backed up as part of System State, the only type of backup available for Active Directory is normal. A normal backup backs up the entire System State while the domain controller is online. It also marks each file as having been backed up by clearing the file's archive attribute.

What is a Good Backup?

To ensure a successful restore from backup, it is important to know what defines a “good backup.” For Active Directory, two things must be considered:

  • Contents
  • Age

Contents

The first important aspect of a backup is its contents. A good backup will include at least the System State, the contents of the system disk, and the SYSVOL folder (if not located on the system disk). As described above, the System State includes many key files and settings to restore a domain controller. Backing up the system disk and SYSVOL folder structure will ensure that all the required system files and folders are in place to initiate a successful restoration.

Note: For best performance, the Active Directory log and database files should be on separate spindles (disks). If you have configured your DCs this way, Active Directory components are spread across multiple drives, for example D:\Winnt\NTDS for the logs and E:\Winnt\NTDS for the database.

Because the Active Directory log files and database are backed up as part of System State, you still need to back up only the system disk and System State to ensure a good backup, even with this distributed installation.

Age

If the backup is older than the tombstone lifetime set in Active Directory, it is not considered a good backup.

When an object is deleted in Windows 2000, the DC from which the object was deleted informs the other DCs in the environment about the deletion by replicating what is known as a tombstone.

A tombstone is a representation of an object that has been deleted but not fully removed from the directory. The tombstone will eventually be removed based on the tombstone lifetime setting, which by default is set to 60 days. If a DC is restored to a state prior to the deletion of an object, and the tombstone for that object is not replicated to the restored DC before the tombstone expires, the object remains present only on the restored DC, resulting in an inconsistency.  Thus it is important that the DC be restored prior to expiration of the tombstone, and that inbound replication from a DC containing the tombstone to the restored DC is completed prior to expiration of the tombstone.

Active Directory protects itself from restoring data older than the tombstone lifetime by disallowing the restore. As a result, the useful life of a backup is equivalent to the "tombstone lifetime" setting for the enterprise.

Therefore, backups must be taken at least once within every tombstone lifetime. However, Microsoft strongly recommends that administrators back up the System State and system disk more often, so that a backup holding a recent version of the data is available at any given time.
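The age rule can be expressed as a simple date check. The Python sketch below assumes the default 60-day tombstone lifetime mentioned above; the function names are hypothetical. It treats a backup exactly as old as the lifetime as unusable, a conservative choice because inbound replication to the restored DC must also finish before the tombstones expire.

```python
from datetime import date, timedelta

DEFAULT_TOMBSTONE_LIFETIME = timedelta(days=60)  # default stated in this paper


def backup_is_usable(backup_date: date, restore_date: date,
                     lifetime: timedelta = DEFAULT_TOMBSTONE_LIFETIME) -> bool:
    """True if the backup is still younger than the tombstone lifetime."""
    return restore_date - backup_date < lifetime


def next_backup_deadline(last_backup: date,
                         lifetime: timedelta = DEFAULT_TOMBSTONE_LIFETIME) -> date:
    """Last day a new backup can be taken so a usable one always exists."""
    return last_backup + lifetime - timedelta(days=1)


print(backup_is_usable(date(2000, 7, 1), date(2000, 8, 29)))  # True (59 days)
print(backup_is_usable(date(2000, 7, 1), date(2000, 8, 30)))  # False (60 days)
print(next_backup_deadline(date(2000, 7, 1)))                 # 2000-08-29
```

If your enterprise uses a different tombstone lifetime, pass it as the lifetime argument rather than relying on the default.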

Important: Backup data from a DC can only be used to restore that DC. You cannot use a backup of one DC to restore another. To back up your environment completely, you need a backup of every domain controller; keep this in mind while developing your backup strategy. At a minimum, back up all the OM role holders and GCs. In addition, always back up the first domain controller in the forest root domain.
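Because each backup can restore only the DC it was taken from, the backup set has to be chosen DC by DC. The Python sketch below applies the minimum requirement stated above to a DC inventory; the record type, its field names, and the DC names are hypothetical, and a complete backup of the environment would still cover every DC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainControllerInfo:
    """Hypothetical inventory record for one DC."""
    name: str
    is_global_catalog: bool = False
    om_roles: tuple = ()  # operations master roles held
    first_dc_in_root_domain: bool = False


def minimum_backup_set(dcs: list) -> list:
    """Names of DCs covered by the minimum backup requirement."""
    return sorted(
        dc.name
        for dc in dcs
        if dc.is_global_catalog or dc.om_roles or dc.first_dc_in_root_domain
    )


inventory = [
    DomainControllerInfo("DC1", is_global_catalog=True,
                         om_roles=("schema master",),
                         first_dc_in_root_domain=True),
    DomainControllerInfo("DC2", om_roles=("RID master", "PDC emulator")),
    DomainControllerInfo("DC3", is_global_catalog=True),
    DomainControllerInfo("DC4"),
]
print(minimum_backup_set(inventory))  # ['DC1', 'DC2', 'DC3']
```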

Rights Required

In Windows 2000, backup and restore rights are independent of each other. To back up Active Directory, you must be a member of either the Backup Operators Group or the Administrators Group.

Backup Performance

Understanding the time required to back up an active DC is an important part of determining the best backup strategy for your business. To assist with this, the graph in Figure 2 below shows representative times required to back up Active Directory databases of various sizes.

Because the domain controller is backed up while it is online, backup time should not be a major concern. However, it is advisable not to schedule backups during peak hours, because doing so could affect the domain controller's performance for other activities.

The data backed up in the tests represented below is for the System State only. As the definition of a good backup also includes the backup of the system disk and SYSVOL, a good backup will take slightly longer depending on the size of the additional files. Further, your results may vary depending on the speed of your tape drive and system configuration.


Figure 2: Graph indicating backup times for different size Active Directory databases.

 

 


Active Directory Disaster Recovery Flowcharts

The remainder of this white paper will guide you through the steps and considerations required to implement the concepts involved with Active Directory Disaster Recovery planning. The flowcharts in Figures 3, 4, 5 and 6 help illustrate the steps. The steps are described in more detail in the pages that follow.

Note: The flowcharts below do not depict every disaster situation. Rather, they help illustrate the options available and their appropriate use.

Types of Disaster

When faced with a disaster, you must first determine its type. This paper focuses on recovering from two types of disaster:

  • Database corruption: a situation in which one of the following occurs:
    • Disks become corrupted, for example when a write-back cache is not saved because of a power failure combined with failed cache batteries.
    • The domain controller suffers a severe hardware failure and must be replaced.
    • A software failure prevents the machine from booting in normal mode.
  • Data corruption: a situation in which an administrator or someone else with the appropriate permissions accidentally deletes an object and the deletion replicates to other DCs within the environment.

Figure 3. Disaster recovery options.

Figure 4. Disk corruption disaster recovery steps.

Figure 5. Domain controller hardware failure disaster recovery steps.

Figure 6. Data corruption disaster recovery steps.


Recovering Active Directory

There are two primary methods for restoring a Windows 2000 DC:

  • Restore through re-installation.
  • Restore from backup.

This section will detail the steps to perform these operations and the considerations associated with them.

Restore Through Re-Installation

This method relies on Active Directory replication to restore a DC to a working state and is valid only if another healthy DC exists in the same domain. After Windows 2000 is reinstalled, the machine is promoted again to a DC in the domain it belonged to before the failure. The replication that occurs during the normal promotion process (DCPROMO) ensures that the DC receives an accurate and up-to-date copy of the Active Directory database. This method is recommended only when a good backup of the domain controller is not available.
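The two conditions in this paragraph can be read as a simple decision rule, sketched below in Python. The function name and the string labels are hypothetical, and the rule is deliberately limited to this paragraph; the flowcharts in Figures 3 through 6 may bring in further considerations, such as the type of disaster.

```python
from typing import Optional


def choose_restore_method(has_good_backup: bool,
                          healthy_dc_in_same_domain: bool) -> Optional[str]:
    """Pick a restore method using only the rules in this paragraph."""
    if has_good_backup:
        # Re-installation is recommended only when no good backup exists.
        return "restore from backup"
    if healthy_dc_in_same_domain:
        # Re-installation needs another healthy DC to replicate from.
        return "restore through re-installation"
    return None  # neither rule here applies; consult the flowcharts


print(choose_restore_method(False, True))  # restore through re-installation
```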

Considerations for Restoring a DC Through Re-Installation

Bandwidth Considerations

The primary consideration when recovering a DC via replication is bandwidth. The bandwidth required to restore a DC via replication is directly proportional to the size of the Active Directory database and inversely proportional to the time within which the DC must be functioning again.

The chart in Figure 7 below represents the time needed to replicate a new DC into an existing domain over various network speeds. The Active Directory database ntds.dit used in the testing was 2 GB in size.

Note: The systems used to gather the data were Compaq ProLiant 1600 servers with dual Pentium II 266 MHz processors, 256 MB of RAM, and a single hard drive. Different systems may produce different results, but the overall trend should remain the same.

 


Figure 7: Graph showing the time required to replicate a 2 GB database over varying network bandwidth.
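To put the size-and-bandwidth relationship into rough numbers, the following Python sketch computes an idealized lower bound on transfer time from database size and link speed. It assumes the whole link is available to replication, with no protocol overhead, competing traffic, or compression, so real replication times such as those measured for Figure 7 will be longer. The function name and the example link speeds are illustrative only.

```python
# Idealized lower bound on the time needed to copy an Active Directory
# database across a network link. Assumptions: the whole link is available,
# there is no protocol overhead, and no compression is applied.

def transfer_hours(db_size_gb: float, link_kbps: float) -> float:
    """Return the idealized transfer time in hours."""
    db_bits = db_size_gb * 1024 ** 3 * 8      # binary gigabytes -> bits
    seconds = db_bits / (link_kbps * 1000)     # link speed in kilobits/second
    return seconds / 3600


if __name__ == "__main__":
    # Example link speeds, chosen only for illustration.
    for kbps in (64, 128, 1544, 10_000, 100_000):
        print(f"{kbps:>7} kbps: {transfer_hours(2, kbps):8.2f} hours")
```

Under these assumptions a 2 GB database needs about 74.6 hours at 64 kbps and under half an hour at 10,000 kbps. Doubling the database doubles the time, and doubling the link speed halves it; treat the figures as a floor, not a prediction.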

 

Steps Required to Restore a DC Through Re-Installation

Recovering through re-installation is the same process as creating a new DC. At least one functional DC must exist in the target domain to restore through re-installation. Ideally, this DC should be located in the same Active Directory site as the new (replicating) DC, to reduce the network impact and restore times associated with this method. For a more detailed look at the effect of bandwidth on this form of restoration, see Figure 7 above.

The steps involved in this process are:

  1. Performing cleanup operations, such as removing the failed DC object from Active Directory.
  2. Installing a fresh copy of Windows 2000 Server.
  3. Running DCpromo.exe (AD installation tool) to promote this machine to the domain controller role.

The cleanup operation is documented below; Steps 2 and 3 are assumed knowledge for this paper. More information on the promotion process is available in the Windows 2000 Server Resource Kit’s Distributed Systems Guide. The cleanup operation depends on whether the new DC is given the same name as the failed machine.

 

If the new DC receives the same name as the failed DC, you must remove the ntdsDSA object of the failed DC:

 

  1. At the command line, type ntdsutil.
  2. At the prompt ntdsutil:, type metadata cleanup and press Enter.
  3. Next, connect to an existing domain controller from which the ntdsDSA object of the failed DC will be removed.
  4. At the metadata cleanup prompt, type connections and press Enter.
  5. Type connect to server <servername> and press Enter, where <servername> is the DC from which the metadata will be cleaned (any functional DC in the same domain).
  6. Type quit and press Enter. This returns you to the metadata cleanup menu.
  7. Type select operation target and press Enter.
  8. Type list domains and press Enter. This lists all domains in the forest, with a number associated with each.
  9. Type select domain <number> and press Enter, where <number> is the number corresponding to the domain in which the failed server was located.
  10. Type list sites and press Enter.
  11. Type select site <number> and press Enter, where <number> is the number of the site of which the DC was a member.
  12. Type list servers in site and press Enter. This lists all servers in that site, each with a corresponding number.
  13. Type select server <number> and press Enter, where <number> is the number of the DC to be removed.
  14. Type quit and press Enter. The metadata cleanup menu is displayed.
  15. Type remove selected server and press Enter.

At this point you should receive confirmation that the DC was removed successfully. If you receive an error that the object could not be found, it might already have been removed from Active Directory.

  16. Type quit and press Enter, repeating until you return to the command prompt.

 

Note: Because this procedure requires modifying the configuration naming context, it requires Enterprise Administrator permissions.
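Because the metadata cleanup procedure above is a fixed sequence of ntdsutil commands, it can be written down as data, for example to keep a record of exactly what was run. The Python helper below is hypothetical (it is not part of any tool) and only lists the commands in order. The numbers must first be read from an interactive run of the list commands, and passing the commands to ntdsutil as command-line arguments, as in the commented usage, is an assumption to verify on your build; removing the server may also display a confirmation dialog, so the sequence is not fully unattended.

```python
from typing import List


def metadata_cleanup_commands(server: str, domain_no: int,
                              site_no: int, server_no: int) -> List[str]:
    """Return the ntdsutil commands from the procedure above, in order.

    server    -- functional DC in the same domain to connect to
    domain_no -- number shown by 'list domains' for the failed DC's domain
    site_no   -- number shown by 'list sites' for the failed DC's site
    server_no -- number shown by 'list servers in site' for the failed DC
    """
    return [
        "metadata cleanup",
        "connections",
        f"connect to server {server}",
        "quit",                      # back to the metadata cleanup menu
        "select operation target",
        "list domains",
        f"select domain {domain_no}",
        "list sites",
        f"select site {site_no}",
        "list servers in site",
        f"select server {server_no}",
        "quit",                      # back to the metadata cleanup menu
        "remove selected server",    # may ask for confirmation
        "quit",                      # leave metadata cleanup
        "quit",                      # leave ntdsutil
    ]

# Hypothetical usage (DC2 and the numbers are placeholders):
# import subprocess
# subprocess.run(["ntdsutil", *metadata_cleanup_commands("DC2", 0, 1, 2)])
```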

If the new DC receives a different name than the failed DC, you should perform the following additional steps:

  • Removal of the failed server object from the Sites & Services snap-in:

  1. Open the Sites & Services snap-in.
  2. Select the appropriate site.
  3. Delete the server object associated with the failed DC.

  • Removal of the failed computer account from the Users & Computers snap-in:

  1. Open the Users & Computers snap-in.
  2. Select the Domain Controllers container.
  3. Delete the computer object associated with the failed DC.

 

WARNING:

Do not perform the additional steps above if the new machine will have the same name as the failed machine. Also make sure that faulty hardware was not the cause of the problem: if the faulty hardware is not replaced, restoring through re-installation may not help.

 

Restore From Backup

This method relies primarily on the last good backup taken of the DC before the failure. You can start the restore with either the Windows 2000 Backup utility or a supported third-party utility selected by your organization. The restore returns the DC to its state at the time of the backup; the DC then queries its replication partner(s) for any updates made since that time. Any changes are replicated in, ensuring that the DC has an accurate and up-to-date copy of the Active Directory database.

When you restore Active Directory from backup, two further options are available:

  • Non-Authoritative Restore

  • Authoritative Restore

These two methods control how two important components of the System State, Active Directory and SYSVOL, are treated during the restore process. Although these components are restored together, they are discussed separately here.

Non-Authoritative Restore

Active Directory

Non-authoritative restore is the default method for the restoration of Active Directory, and it will be used for the majority of restore operations. Using this method, settings and entries that existed in the domain, schema, configuration, and optionally the global catalog naming contexts maintain the version number they had at the time of backup.

After a non-authoritative restore, the DC is updated using normal replication. That is, if the version number of an attribute on the restored DC is lower than the version number of the same attribute in a replication partner’s database (indicating that the object has changed since the backup), the object on the restored server is updated with the changes made to it since the last backup. This ensures an up-to-date version of the database.
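The version-number rule just described can be modeled in a few lines of Python. This is a simplified model, not the replication engine: each attribute carries a value, a version number, and an originating timestamp used only to break version ties (Active Directory applies a further tie-breaker that is omitted here). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrValue:
    value: str
    version: int       # per-attribute version number
    timestamp: float   # originating time, used here only to break version ties


def newer(current: AttrValue, incoming: AttrValue) -> AttrValue:
    """Higher version wins; on equal versions the later timestamp wins."""
    if (incoming.version, incoming.timestamp) > (current.version, current.timestamp):
        return incoming
    return current


def replicate_in(restored: dict, partner: dict) -> dict:
    """Return the restored DC's attributes after applying a partner's copy."""
    result = dict(restored)
    for name, incoming in partner.items():
        current = result.get(name)
        result[name] = incoming if current is None else newer(current, incoming)
    return result


restored = {"title": AttrValue("Analyst", 2, 100.0)}   # value from the backup
partner = {"title": AttrValue("Manager", 3, 200.0)}    # changed after the backup
print(replicate_in(restored, partner)["title"].value)  # Manager
```

Because the restored value keeps the version number it had at backup time, any later change held by a partner carries a higher version and wins, which is what brings the restored DC up to date.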

SYSVOL

When SYSVOL is restored non-authoritatively, the local copy held on the restored DC is compared with that of its replication partners (using MD5 checksums). Once the DC reboots, it contacts its replication partner(s), compares SYSVOL information, and replicates the necessary changes, bringing it up to date with the other DCs in the domain.

This method should be used when there is at least one other functioning DC in the domain. This is the default SYSVOL restoration method and will occur automatically if a non-authoritative restore of the Active Directory is carried out.

If there is no other functioning DC in the domain, perform a PRIMARY restore of SYSVOL. A primary restore builds a new ntfrs (Windows NT File Replication Service) database by loading the data present under SYSVOL on the local DC. The procedure is the same as for a non-authoritative restore, except that the restored SYSVOL is marked as PRIMARY.
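The MD5 comparison used in a non-authoritative SYSVOL restore can be illustrated with Python's hashlib: hash every file in two SYSVOL snapshots and list the files that are missing or different locally. This only models the comparison idea described above; FRS keeps its own database and replication logic, and the function names and folder layout here are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 digest of one file, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its MD5 digest."""
    return {p.relative_to(root).as_posix(): md5_of(p)
            for p in root.rglob("*") if p.is_file()}


def files_to_fetch(local: Path, partner: Path) -> list:
    """Files that are missing locally or whose content differs from the partner's."""
    mine, theirs = tree_digests(local), tree_digests(partner)
    return sorted(name for name, digest in theirs.items()
                  if mine.get(name) != digest)
```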

 

Authoritative Restore

Active Directory

An authoritative restore is in essence an extension of the non-authoritative restore process. It requires all the steps of a non-authoritative restore before it can be initiated. The primary difference between the two is that an authoritative restore has the ability to increment the version number of the attributes of all objects in an entire directory, all objects in a subtree, or an individual object (provided that it is a leaf object) to make it authoritative in the directory.

As with a non-authoritative restore, once a DC is back online it will contact its replication partners to see what has changed since the time of the last backup. However, because the version number of the object attributes you wish to be authoritative will be higher than the existing instances of the attribute held on replication partners, the objects on the restored DC will appear to be more recent and therefore be replicated out to the rest of the DCs within the environment. 

Unlike a non-authoritative restore, an authoritative restore requires the use of a separate tool (ntdsutil.exe) to make it work. No backup utilities—including the native Windows 2000 utility—can perform an authoritative restore.

An authoritative restore should be used when human error is involved: for example, when an administrator has accidentally deleted a number of objects, the deletion has replicated to all DCs so that the objects no longer exist in the domain, and the administrator cannot easily re-create them.

An authoritative restore does not overwrite new objects created after the backup was taken. It can be carried out only on objects in the configuration and domain naming contexts; authoritative restore of the schema naming context is not supported.
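Conceptually, authoritatively restoring a subtree raises the version numbers of every object at or below a given distinguished name, so that those values win the comparison with the copies held on replication partners. The Python sketch below models this with a hypothetical `increment` parameter standing in for the amount ntdsutil adds. It uses plain, case-insensitive suffix matching on DNs (ignoring DN escaping and spacing rules) and rejects the schema naming context, which cannot be restored authoritatively.

```python
def in_subtree(dn: str, root: str) -> bool:
    """True if dn is root itself or lies beneath it (case-insensitive)."""
    dn, root = dn.lower(), root.lower()
    return dn == root or dn.endswith("," + root)


def mark_authoritative(objects: dict, root: str, increment: int) -> dict:
    """Return a copy of objects with attribute versions raised under root.

    objects maps DN -> {attribute name: version number}.
    increment is a hypothetical stand-in for the amount ntdsutil adds.
    """
    if root.lower().startswith("cn=schema,"):
        raise ValueError("authoritative restore of the schema NC is not supported")
    return {dn: ({a: v + increment for a, v in attrs.items()}
                 if in_subtree(dn, root) else dict(attrs))
            for dn, attrs in objects.items()}
```

Objects outside the subtree keep their versions, so normal replication still updates them from partners, as in a non-authoritative restore.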

SYSVOL

By restoring the SYSVOL authoritatively, you are specifying that the copy of SYSVOL that was restored from backup is authoritative for the domain. Once the necessary configurations have been made, the local SYSVOL will be marked as authoritative and be replicated out to the other DCs within the domain.

As with an Active Directory authoritative restore, this method is typically used when human error is involved and the error has propagated to other domain controllers; for example, when an administrator has accidentally deleted an object that resides in SYSVOL, such as a Group Policy object.

An authoritative restore of SYSVOL does not occur automatically after an authoritative restore of Active Directory; additional steps are required.

The exact process for all these methods of restoration will be discussed later in this white paper.

Required Rights for Active Directory Restore

To restore the System State data, the person performing the procedure must be a Local Administrator.

Considerations for Restoring a DC From Backup

An obvious advantage of restoring a DC from backup instead of through replication is the faster restore time. To illustrate this, the graph in Figure 8 below shows the time taken to restore a DC from backup with databases ranging from 500 MB to 2 GB in size. The machine used in the test was a Compaq ProLiant 800 with a 400 MHz processor, 256 MB RAM, and a 4/8 GB DAT drive.


Figure 8: Graph representing times to restore varying sized DIT files from backup.

 

Note: This graph represents the time taken to restore System State only; restoring a complete good backup, as defined earlier, takes longer, depending on the size of your system disk.

Useful Life of Active Directory Backup

Ensure that the backup you are restoring was taken within the tombstone lifetime, which is 60 days by default. For more information on the useful life of an Active Directory backup, see the What is a Good Backup? section of this paper.
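The backup-age check can be expressed directly. The following Python sketch assumes the 60-day default tombstone lifetime given above; if your forest uses a different value, pass it explicitly. The function and constant names are illustrative.

```python
from datetime import datetime, timedelta

DEFAULT_TOMBSTONE_LIFETIME_DAYS = 60  # Windows 2000 default, per this paper


def backup_is_usable(backup_taken: datetime, now: datetime,
                     tombstone_days: int = DEFAULT_TOMBSTONE_LIFETIME_DAYS) -> bool:
    """True if the backup is younger than the tombstone lifetime."""
    return now - backup_taken < timedelta(days=tombstone_days)


# A 30-day-old backup is within the default lifetime; a 92-day-old one is not.
print(backup_is_usable(datetime(2000, 9, 1), datetime(2000, 10, 1)))  # True
print(backup_is_usable(datetime(2000, 7, 1), datetime(2000, 10, 1)))  # False
```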

Restore Backup onto Different Hardware

It is possible to restore a DC onto different hardware. However, consider the following issues before doing so.

  • Different HALs: By default, Hal.dll is not backed up as part of System State, but Kernel32.dll is. Therefore, if you restore a backup onto a machine that requires a different HAL (to support a multiprocessor environment, for example), you will run into compatibility issues between the new HAL and the original Kernel32.dll. The only workaround is to explicitly copy Hal.dll from the original machine and install it on the new machine. The limitation is that the new machine will then be able to use only a single processor.
  • Incompatible Boot.ini file: If you back up and restore the Boot.ini file, it may be incompatible with your new hardware configuration, resulting in a failure to boot. Before restoring, ensure that the Boot.ini file is correct for your new hardware environment.
  • Different network or video cards: If your new hardware has a different video adapter or multiple network adapters, uninstall them before you restore data. When you restart the computer, the normal Plug and Play functionality will make the necessary changes.

 

Disk Space and Partition Configuration

In addition to the issues with restoring a DC onto different hardware, it is also important that the partitions on the new machine match those on the original machine. Specifically, all drive letter mappings must be the same, and each partition must be at least as large as the corresponding partition on the original machine.
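The partition requirement can be checked mechanically before a restore. The Python sketch below compares two hypothetical layouts, each given as a mapping from drive letter to size in MB; how you collect those sizes from each machine is outside its scope.

```python
def partition_problems(original: dict, target: dict) -> list:
    """List drives that are missing or too small on the target machine.

    Both arguments map a drive letter (e.g. "C") to a partition size in MB.
    """
    problems = []
    for drive, size in sorted(original.items()):
        if drive not in target:
            problems.append(f"{drive}: missing on target")
        elif target[drive] < size:
            problems.append(f"{drive}: {target[drive]} MB < {size} MB")
    return problems


print(partition_problems({"C": 4000, "D": 8000}, {"C": 4000, "E": 9000}))
# ['D: missing on target']
```

An empty list means every original drive letter exists on the new machine with at least the original size.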

Additional Considerations for an Authoritative Restore

In addition to the above considerations for restoring a DC from backup, the following points should be considered specifically when performing an authoritative restore.

Impact on Group Membership

The impact on group memberships is the most significant issue stemming from the authoritative restore method of disaster recovery. You risk possible loss of group membership information.

Because group membership is a multi-valued attribute, and because of the way links, back links and deletions are dealt with in Active Directory, the results of an authoritative restore on the representation of group membership may vary. These variations are based on which objects replicate first after an authoritative restore: the User object or the Group object.

If the un-deletion of the user replicates first, then the group membership information of both the group (the members it contains) and the user (the groups he/she belongs to) will be represented correctly.

If the un-deletion of the group replicates first, the replication partners will drop the user, who is still deleted on those partners, from the group membership. The only exception is the user’s primary group, which is always represented correctly from both the user and the group side.
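A deliberately simplified Python model can make this ordering effect concrete. It tracks a single hypothetical group on one replication partner, where the hypothetical user Bob is still deleted, and applies the two replicated updates (Bob's un-deletion and the restored group) in either order. It ignores link metadata, back links, and the primary-group exception, so it illustrates the behavior described above rather than the actual Active Directory algorithm.

```python
def members_after(order: list) -> set:
    """Membership of one group on a partner DC after applying, in the given
    order, the updates 'user' (Bob's un-deletion) and 'group' (the restored
    group, which lists Alice and Bob as members)."""
    live_objects = {"Alice"}              # Bob is still deleted on this partner
    members = {"Alice"}                   # membership before the updates arrive
    restored_members = {"Alice", "Bob"}   # membership on the restored DC
    for update in order:
        if update == "user":
            live_objects.add("Bob")       # Bob's un-deletion replicates in
        elif update == "group":
            # Links to objects that are still deleted locally are dropped.
            members = {m for m in restored_members if m in live_objects}
    return members


print(sorted(members_after(["user", "group"])))   # ['Alice', 'Bob']
print(sorted(members_after(["group", "user"])))   # ['Alice']
```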

Unfortunately, there is no way to control which objects replicate first after an authoritative restore. If your environment is affected by this situation, the only option is to modify the group membership attribute of the affected groups on the DC where the authoritative restore was carried out.

This issue stems not from the integrity of the restored data, but from the way in which the data is replicated. By looking at this DC, administrators can view the way the directory should look and take steps to replicate the accurate directory information out to the other DCs within the domain.

The best way to do this is to add a dummy user to each group that was involved in the authoritative restore and then remove that same dummy user from each group.

The definition of “involved” in this context means any group that was either authoritatively restored itself or which had members restored who did not have that group defined as their “Primary Group.”

By doing this, you force the correct group membership information to be replicated out from the source DC (the DC on which the original authoritative restore was carried out) and update the group membership information on its replication partners. These updated objects will reflect the correct memberships and will also correct the information shown on the Member Of tab of the restored user objects.

You MUST make sure that no additions are made to group membership (for the affected groups and users) on any of the other DCs in the environment.

If you do not adhere to this, you risk the accurate version of the directory (held on the DC where the restore took place) being corrupted by the incorrect membership information. Once this occurs, you must either update group membership manually or perform another authoritative restore of the objects using the verinc option, and perform the process defined above again.

 

Impact on Trusts and Computer Accounts

In Windows 2000, trust relationship passwords and computer account passwords are changed automatically at a specified interval (by default, every 30 days for both).

When using the authoritative restore method, previously used passwords for the objects in the Active Directory that maintain trust relationships and computer accounts may be restored.

In the case of trust relationships, this may impact communication with other domain controllers from other domains, manifesting in permissions errors when trying to access resources across domain boundaries. To rectify this, NTLM trust relationships to Windows 2000 or down level domains must be removed and recreated.

In the case of a computer account password, this could affect communication between the member workstation or server and a DC of its domain. This usually manifests itself as authentication problems for users on a Windows NT or Windows 2000 machine, caused by an invalid machine account.

To help with both the recreation of trusts and the resetting of computer account passwords, use the NETDOM utility included in the support tools on the Windows 2000 CD.

Note: A Windows NT 4.0 machine will change its password every seven days.
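A rough way to judge whether a restored password is likely to be stale is to count how many scheduled password changes must have happened since the backup. The Python sketch below assumes the machine was online and changed its password exactly on schedule, using the intervals given above; a result of one or more suggests that the restored password may be out of date. The function and constant names are illustrative.

```python
from datetime import date

TRUST_AND_COMPUTER_INTERVAL_DAYS = 30   # Windows 2000 default, per this paper
NT4_COMPUTER_INTERVAL_DAYS = 7          # Windows NT 4.0, per this paper


def min_password_changes(backup: date, restore: date, interval_days: int) -> int:
    """Lower bound on automatic password changes between backup and restore,
    assuming the machine was online and changed its password on schedule."""
    return max(0, (restore - backup).days // interval_days)


# 44 days between backup and restore:
print(min_password_changes(date(2000, 9, 1), date(2000, 10, 15), 30))  # 1
print(min_password_changes(date(2000, 9, 1), date(2000, 10, 15), 7))   # 6
```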

 

Bandwidth Considerations of SYSVOL Replication

When performing an authoritative restore of the Active Directory, you should also perform an authoritative restore of the SYSVOL. By doing this, you are telling the other DCs in the domain that the SYSVOL information on the restored DC is authoritative. As a result the entire SYSVOL contained on the restored DC will be replicated out to all other DCs in the domain (via replication partners).

The bandwidth associated with such replication will only be a consideration in a domain where there is an extensive use of large Group Policies and scripts. In addition, unlike Active Directory replication, FRS replication is not compressed between sites.

Steps Required for a Non-Authoritative Restore

To perform a non-authoritative restore using the Windows 2000 native backup utility, follow the steps in this section.

 

WARNING: If you reinstall the operating system, you may or may not join the machine to the domain, and you can give the machine any name during setup. Do not promote the machine to a domain controller. After reinstalling the operating system, go directly to Step 4 below.

 

  1. Reboot the target system into Directory Services Restore Mode by pressing the F8 key during system startup.
  2. Select Directory Services Restore Mode (Windows 2000 domain controllers only).
  3. Select the operating system that you wish to start in restore mode.
  4. Log on as Administrator (local account; no domain selection is available).
  5. Run the Windows 2000 Backup utility and click the Restore Wizard button.
  6. Select the appropriate backup location and ensure that at least the System disk and System State containers are checked.
  7. Click the Advanced button and make sure you are restoring junction points (see step 9). If you do not go through the Advanced menu, the restore process will not succeed.
  8. Select Original Location in the “Restore Files to” drop-down box.
  9. In the Advanced Restore Options window, check the boxes for Restore security; Restore junction points, and restore file and folder data under junction points to the original location; and Preserve existing volume mount points. See Figure 9 below for an illustration.


Figure 9. Advanced Restore Options window.

  10. Click Finish.

  11. When the restore is complete, click Yes to restart the computer.

 

The system now reboots and receives from its replication partners any changes made since the last backup.

By executing a non-authoritative restore on Active Directory, you automatically execute a non-authoritative restore of SYSVOL—no additional steps are required. For a primary restore of SYSVOL, make sure to check a fourth box in the Advanced Restore Options dialog next to:

When restoring replicated data sets, mark the restored data as the primary data for all replicas.

A primary restore is only required if the DC you are restoring is the only DC in the domain.

Steps Required for an Authoritative Restore

You can authoritatively restore the entire directory, a subtree, or an individual object (provided it is a leaf object) in the directory. The examples outlined below will detail how to restore both the entire directory and a subtree of a directory.

 

Authoritative Restore – Entire Directory

Authoritative restore of the entire directory is a major operation and should be carried out only after consultation with a Microsoft Support professional. It should not be performed if the DC is the only DC in the domain.

Follow the first 10 steps for a non-authoritative restore, but when asked to restart the computer, decline, because the System State must first be restored again to an alternate location. To continue, follow the steps below:

  1. Click the Restore tab.
  2. Ensure that Alternate Location is selected in the “Restore Files to” drop-down list.

Selecting Alternate Location restores the System State to an alternate location. You need not restore the system disk to an alternate location, so check the box only for System State. Restoring System State to an alternate location restores only SYSVOL, the boot files, and the registry to that location (not Active Directory). This is done to enable an authoritative restore of SYSVOL. Once the entire directory is restored, you can delete the files in the alternate location.

 

  3. When the restore process has finished, close the backup application.
  4. Open a command prompt, type ntdsutil, and press Enter.
  5. At the next prompt, type authoritative restore and press Enter.
  6. At the next prompt, type restore database and press Enter.
  7. In the “Authoritative Restore Confirmation Dialog” box, click OK.
  8. Type quit and press Enter, repeating until you exit the application.
  9. Restart the server.

The server is now the authoritative DC for the domain. Changes will be replicated to the other DCs within the environment.

  10. After the system has restarted and the SYSVOL share has been published (it may take a few minutes for the SYSVOL share and its subfolders to appear on the domain controller), copy the required files and folders from the SYSVOL directory in the alternate location to the original location. The copied files are then replicated out to the other domain controllers, so that SYSVOL matches its state at the time of the backup.

Below is an example of copying the SYSVOL from the alternate location to the original location. Depending on your system, your drive and folder information may vary.

Copy the contents of the scripts directory from:

c:\<Alternate Sysvol Location>\sysvol\c_\winnt\Sysvol\Domain\scripts\

And add it to:

c:\Winnt\SYSVOL\Sysvol\domain\scripts\

Copy the contents of the policies directory from:

c:\<Alternate Sysvol Location>\sysvol\c_\winnt\Sysvol\Domain\policies\

And add it to:

c:\Winnt\SYSVOL\Sysvol\domain\policies\
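The copy described above can be scripted. The Python sketch below copies the scripts and policies folders from the alternate restore location into the live SYSVOL, overwriting files with the same name and leaving other files in place, which matches the "add it to" wording. It copies whole folders rather than a hand-picked set of files, requires Python 3.8 or later for `dirs_exist_ok`, and the paths in the usage comment follow the example above; `<Alternate Sysvol Location>` remains a placeholder, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_sysvol_folders(alternate_root: Path, sysvol_domain: Path,
                        folders=("scripts", "policies")) -> None:
    """Copy each folder's contents from the alternate restore location into
    the live SYSVOL, overwriting files that have the same name."""
    for name in folders:
        src = alternate_root / name
        dst = sysvol_domain / name
        shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+

# Example (paths follow the example above; adjust for your system):
# copy_sysvol_folders(
#     Path(r"c:\<Alternate Sysvol Location>\sysvol\c_\winnt\Sysvol\Domain"),
#     Path(r"c:\Winnt\SYSVOL\Sysvol\domain"))
```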

By restoring the SYSVOL authoritatively, the files on the restored DC will be authoritative for the domain and will replicate to other DCs. Changes made to any policy after the backup will be lost.

For example, a Group Policy by the name of Finance Policy existed at the time of the last backup, and was referenced by a folder in the SYSVOL directory as:

C:\WINNT\SYSVOL\Sysvol\Domain.com\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}

However, shortly after the last backup, an administrator edited the Finance Policy, and although the properties of the policy changed, the GUID of the GPO remained the same. As a result, the policy was still referenced by the same directory name {31B2F340-016D-11D2-945F-00C04FB984F9}.

When it came time to authoritatively restore the directory, the folder {31B2F340-016D-11D2-945F-00C04FB984F9} from the alternate SYSVOL location was copied to the original SYSVOL location. This replaced the old folder and thus the changes the administrator had made after the backup were lost. This step is necessary, however, to maintain the synchronization between Active Directory and SYSVOL.

Authoritative Restore–Subtree

This method of authoritative restore will restore specific components of Active Directory and mark them as authoritative for the directory. It is expected that this will be the most common form of authoritative restore because there are few occasions when the entire directory needs to be restored.

To perform the authoritative restore of a subtree, follow the steps in the Authoritative Restore of the Entire Directory section of this paper, but replace Step 6 with the step below:

  6. Type restore subtree <path> and press Enter, for example: restore subtree OU=Sales,OU=Sydney,DC=Whitepaper,DC=com

This server is now the authoritative Active Directory domain controller for the path specified. Changes will be replicated out to the other DCs within the domain.
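The ntdsutil portion of the authoritative restore procedure, for either the whole database or a subtree, reduces to a short command sequence. The hypothetical Python helper below builds that sequence. Supplying the commands as ntdsutil command-line arguments (commented usage) is an assumption to check on your system, the confirmation dialog still requires interaction, and distinguished names containing spaces may need extra quoting that this sketch does not add.

```python
from typing import List, Optional


def authoritative_restore_commands(subtree_dn: Optional[str] = None) -> List[str]:
    """ntdsutil commands for an authoritative restore of the whole
    database, or of the subtree rooted at subtree_dn if one is given."""
    target = f"restore subtree {subtree_dn}" if subtree_dn else "restore database"
    return [
        "authoritative restore",
        target,          # a confirmation dialog follows
        "quit",          # leave the authoritative restore menu
        "quit",          # leave ntdsutil
    ]

# Hypothetical usage:
# import subprocess
# subprocess.run(["ntdsutil", *authoritative_restore_commands(
#     "OU=Sales,OU=Sydney,DC=Whitepaper,DC=com")])
```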

Because you are restoring only a portion of the Active Directory, you need not perform an authoritative restore of the SYSVOL. However, if the subtree or object that was authoritatively restored contained elements from the SYSVOL, such as a Group Policy, you should also restore that portion of the SYSVOL authoritatively.

If this is required, you must also complete Step 10 in the Authoritative Restore of the Entire Directory section of this paper.

 

Note: You can authoritatively restore an individual object in the directory only if it is a leaf object. When you authoritatively restore a subtree, the object at the root of the subtree and all of its children are restored together.

Even if you only need to restore a subtree of the directory authoritatively, you must still first perform a complete non-authoritative restore of the whole directory before you can use the ntdsutil tool to mark that subtree as authoritative.

Verification of a Successful Restore

Although many tests can show whether a domain controller is functioning correctly after a restore, you should perform at least the following basic tests to verify the success of a restore.

Reboot in Normal Mode

If the domain controller boots successfully into normal mode, the directory was able to initialize. This is a particularly meaningful indicator if the DC could not boot into normal mode before it was restored.

Check the Directory Service Event Log for Any Error Messages

You should check the Directory and System event logs for any error messages pertaining to the restore process.

Check if Domain Controller is Able to Authenticate With Its Neighbors

Use the repadmin tool to check that the restored domain controller can authenticate with another domain controller and replicate in changes to update its copy of the directory. This ability is critical to its task as a domain controller.

 

The first check obtains the inbound replication partners of the restored domain controller:

 

D:\>repadmin /showreps

testlab\test-machine3
DSA Options : (none)
objectGuid  : a07b44e6-76ba-4f03-80c9-5a4a256347bb
invocationID: 6037d0c3-2194-4f27-95ed-578b38861414

==== INBOUND NEIGHBORS ======================================

CN=Schema,CN=Configuration,DC=testdom,DC=nttest,DC=microsoft,DC=com
    testlab\test-machine1 via RPC
        objectGuid: 465848f9-5446-4176-a504-59629c7a8fd8
        Last attempt @ 2000-09-08 14:10.16 was successful.

CN=Configuration,DC=testdom,DC=nttest,DC=microsoft,DC=com
    testlab\test-machine1 via RPC
        objectGuid: 465848f9-5446-4176-a504-59629c7a8fd8
        Last attempt @ 2000-09-08 14:10.16 was successful.

DC=testdom,DC=nttest,DC=microsoft,DC=com
    testlab\test-machine1 via RPC
        objectGuid: 465848f9-5446-4176-a504-59629c7a8fd8
        Last attempt @ 2000-09-08 14:10.16 was successful.

 

In the above case, the restored DC is test-machine3 and its inbound partner is test-machine1. The test was run on the restored machine, but the tool can be used from any domain controller.
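When several naming contexts or partners are involved, the inbound-neighbor check can be summarized programmatically. The Python sketch below parses text laid out like the sample output above; other repadmin versions or options may format the output differently, and the function name and result fields are hypothetical.

```python
import re


def parse_inbound(showreps: str) -> list:
    """Return one dict per inbound neighbor found in the INBOUND NEIGHBORS
    section of 'repadmin /showreps' output (format as in the sample above)."""
    results, nc, entry, in_inbound = [], None, None, False
    for raw in showreps.splitlines():
        line = raw.strip()
        if "OUTBOUND" in line:              # stop if an outbound section follows
            break
        if "INBOUND NEIGHBORS" in line:
            in_inbound = True
            continue
        if not in_inbound or not line:
            continue
        if re.match(r"(CN|DC)=", line):     # naming context heading
            nc = line
        elif " via " in line:               # e.g. testlab\test-machine1 via RPC
            entry = {"nc": nc, "partner": line.split(" via ")[0],
                     "guid": None, "ok": None}
            results.append(entry)
        elif entry is not None and line.startswith("objectGuid:"):
            entry["guid"] = line.split(":", 1)[1].strip()
        elif entry is not None and line.startswith("Last attempt"):
            entry["ok"] = line.endswith("was successful.")
    return results
```

Any entry whose `ok` value is not True points to a naming context and partner worth investigating before moving on to the sync checks.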

 

The next command attempts to sync the schema naming context on the restored domain controller with changes from its inbound neighbor:

 

D:\>repadmin /sync <naming context> <destination DSA> <source DSA GUID>

 

D:\>repadmin /sync "CN=Schema,CN=Configuration,DC=testdom,DC=nttest,DC=microsoft,DC=com" test-machine3 465848f9-5446-4176-a504-59629c7a8fd8

Sync from 465848f9-5446-4176-a504-59629c7a8fd8 to test-machine3 completed successfully.

 

You can then run the same command in the opposite direction, syncing the inbound neighbor with changes from the restored domain controller, to check that outbound replication is working:

 

D:\>repadmin /sync "CN=Schema,CN=Configuration,DC=testdom,DC=nttest,DC=microsoft,DC=com" test-machine1 a07b44e6-76ba-4f03-80c9-5a4a256347bb

Sync from a07b44e6-76ba-4f03-80c9-5a4a256347bb to test-machine1 completed successfully.
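The two /sync commands differ only in which DC is the destination and whose GUID is given as the source. The hypothetical Python helpers below build both argument lists from the names and GUIDs reported by repadmin /showreps, following the syntax shown above, and treat the phrase "completed successfully" in the output as success, as in the sample output.

```python
def sync_commands(nc: str, restored_dc: str, restored_guid: str,
                  partner_dc: str, partner_guid: str) -> list:
    """Argument lists for 'repadmin /sync <NC> <destination DSA> <source GUID>':
    first inbound (partner -> restored), then outbound (restored -> partner)."""
    return [
        ["repadmin", "/sync", nc, restored_dc, partner_guid],   # inbound check
        ["repadmin", "/sync", nc, partner_dc, restored_guid],   # outbound check
    ]


def sync_succeeded(output: str) -> bool:
    """True if repadmin reported a successful sync."""
    return "completed successfully" in output

# Hypothetical usage:
# import subprocess
# for cmd in sync_commands(nc, "test-machine3", restored_guid,
#                          "test-machine1", partner_guid):
#     out = subprocess.run(cmd, capture_output=True, text=True).stdout
#     print(cmd[3], "OK" if sync_succeeded(out) else "FAILED")
```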

 

If the sync is successful, the domain controller is able to authenticate with a neighboring domain controller and receive changes from it. If not, the machine account password on the restored domain controller may be out of date, so that the inbound partner cannot authenticate the restored domain controller. In that case, use the netdom tool to set the password on the inbound replica domain controller and the restored domain controller to be the same. For more information on the netdom tool, see the Windows 2000 Support Tools on the Windows 2000 CD.

Additional Verification for an Authoritative Restore

In addition to the above steps, you should verify that the objects you authoritatively restored appear in the directory.

You can also use the repadmin command-line tool to verify that the authoritative restore was successful by checking the version number increase on the directory or subtree. Do this by running repadmin /showmeta followed by the exact distinguished name of the directory or subtree that you authoritatively restored.


To recover the global catalog server you can either restore it from backup, or assign a new GC to compensate for the loss of the original.

Steps Required for the Restore of a Global Catalog Server

Refer to the section Restore From Backup in this paper. Restoring from backup is the only way that a DC that was functioning as a GC at the time of backup can automatically be restored to the GC role. Restoring a DC by the Restore Through Re-Installation method does not automatically reinstate the GC role. In a multi-domain environment, keep in mind that restoring a GC from backup takes longer than restoring an ordinary domain controller.

Steps Required to Assign a New GC

Because configuring multiple GCs has few detrimental effects, you may wish to create a new GC in your environment if you anticipate extended downtime for the failed GC. This is particularly relevant if the user community served by the original GC no longer has access to a GC, or if demand for the GC service is significant in your environment, for example if you are running Exchange 2000.

To enable a new GC, follow the steps outlined in the To enable or disable a global catalog section in the Windows 2000 Server Help.

Note: Having multiple GCs in a forest increases the availability of the system at the expense of increased replication traffic and database size. If you bring the failed DC back and it resumes its role as a GC, you may wish to remove any additional GCs you configured during its absence.

 


Once an operations master (OM) becomes unavailable, the only way the role can be reinstated is by carrying out one of the following procedures:

  • Restore the failed operations master from backup.

  • Seize the role on another DC within the environment. Do this only when the original role holder will not be restored from backup.

Note: Restoring an OM server via re-installation will not restore its original role status. After re-installation however, the role could be gracefully transferred back from another DC holding the role.

OMs are also referred to as FSMO (Flexible Single Master Operation) role holders.

Seizing an Operations Master Role

Seizing, or force transfer as it is sometimes referred to, is a process that is carried out without the cooperation of the original role holder. In other words, when the original role holder has suffered a disaster, you can seize the role, forcing it to be moved to another DC within the domain/forest.

Although the process for seizing an OM role is similar for all five roles, the considerations around seizing each role differ. These considerations are discussed later in this paper.

Note: Graceful transfer of an OM role is not discussed in this paper. If a graceful transfer is possible, the original role holder is online and is not involved in a disaster recovery situation.

Steps Required to Seize an Operations Master

The DC that will seize the role must be fully synchronized with the updates performed on the previous role holder. For this reason, it is strongly recommended that the standby role holders in your environment be in the same site as, and direct replication partners of, the current role holders.

To ensure that the standby OM that you have selected for your environment is the most appropriate, use the Repadmin tool (included as part of the support tools on the Windows 2000 CD) to check its status.

To demonstrate this, suppose that server SYD01 is the operations master of domain Whitepaper.com.au, SYD02 is the specified standby OM, and MEL01 is the only other DC of domain Whitepaper.com.au.

Type the following two commands:         

C:\>repadmin /showvector dc=whitepaper,dc=com,dc=au SYD02.whitepaper.com.au
Sydney\SYD01                @ USN 4023
Melbourne\MEL01             @ USN 4087

C:\>repadmin /showvector dc=whitepaper,dc=com,dc=au MEL01.whitepaper.com.au
Sydney\SYD01                @ USN 4018
Sydney\SYD02                @ USN 5017

Because SYD01 was the original operations master, only the Update Sequence Number (USN) entries for SYD01 are of concern. SYD02 has received SYD01's updates up to USN 4023, whereas MEL01 has received them only up to USN 4018. SYD02 is therefore more up to date than MEL01 and is the better candidate to assume the role.
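Where there are several candidate standby DCs, the comparison above can be scripted. The following is a minimal Python sketch, assuming the output of each repadmin /showvector command has already been captured as text in the Site\Server @ USN n format shown above; the function names are hypothetical, and the sketch does not run repadmin itself. For each candidate it looks up the highest USN that candidate has received from the failed role holder and picks the largest.

```python
import re

# Matches lines such as "Sydney\SYD01   @ USN 4023" (tolerates a stray space after the backslash).
VECTOR_LINE = re.compile(r"^\s*(?P<site>[^\\]+)\\\s*(?P<server>\S+)\s+@\s+USN\s+(?P<usn>\d+)")


def parse_showvector(output: str) -> dict[str, int]:
    """Return {server name: highest USN received from that server} for one captured run."""
    vector = {}
    for line in output.splitlines():
        match = VECTOR_LINE.match(line)
        if match:
            vector[match.group("server").upper()] = int(match.group("usn"))
    return vector


def best_standby(failed_master: str, outputs: dict[str, str]) -> str:
    """Pick the candidate whose vector shows the most updates received from the failed master."""
    usn_seen = {}
    for candidate, output in outputs.items():
        # A candidate with no entry for the failed master is treated as having received nothing.
        usn_seen[candidate] = parse_showvector(output).get(failed_master.upper(), 0)
    return max(usn_seen, key=usn_seen.get)


# Output captured from the two repadmin commands in the example above.
outputs = {
    "SYD02": "Sydney\\SYD01               @ USN 4023\nMelbourne\\MEL01           @ USN 4087\n",
    "MEL01": "Sydney\\ SYD01               @ USN 4018\nSydney\\SYD02                @ USN 5017\n",
}
print(best_standby("SYD01", outputs))  # SYD02
```

Ties go to whichever candidate appears first, so in practice you would still confirm that the chosen DC is in the same site as the failed role holder.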

Now that you have determined the best candidate to take on the OM role, follow the steps below to seize the OM role:

  1. Open a command prompt.
  2. Type NTDSUTIL.
  3. At the ntdsutil prompt, type: roles
  4. At the FSMO maintenance prompt, type: connections
  5. At the server connections prompt, type: connect to server <FQDN of Server>.  For example: connect to server syd02.whitepaper.com.au
  6. At the server connections prompt, type: quit
  7. At the FSMO maintenance prompt, type: seize <operations master>. For example: seize schema master
  8. In the confirmation dialog box, click Yes to confirm the seizure.
  9. At the FSMO maintenance prompt, type: quit
  10. At the ntdsutil prompt, type: quit
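For a runbook, the command sequence above can be prepared in advance. The minimal Python sketch below only assembles the commands and prints a single command line; it assumes your build of ntdsutil accepts its menu commands as quoted command-line arguments (later versions document this; verify it on Windows 2000 before relying on it). The helper names and role keys are hypothetical, while the seize targets are the names typed at the FSMO maintenance prompt. The confirmation dialog box from step 8 still appears when the command runs.

```python
# Names typed after "seize" at the FSMO maintenance prompt; the dictionary keys are hypothetical.
SEIZE_TARGETS = {
    "schema": "schema master",
    "domain naming": "domain naming master",
    "rid": "RID master",
    "pdc": "PDC",
    "infrastructure": "infrastructure master",
}


def seize_commands(server_fqdn: str, role: str) -> list[str]:
    """Return the ntdsutil commands from the steps above, in the order they are typed."""
    if role not in SEIZE_TARGETS:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(SEIZE_TARGETS)}")
    return [
        "roles",                              # step 3: enter FSMO maintenance
        "connections",                        # step 4: enter server connections
        f"connect to server {server_fqdn}",   # step 5: bind to the DC that will take the role
        "quit",                               # step 6: back to FSMO maintenance
        f"seize {SEIZE_TARGETS[role]}",       # step 7: seize the role
        "quit",                               # step 9: back to the ntdsutil prompt
        "quit",                               # step 10: exit ntdsutil
    ]


def as_command_line(commands: list[str]) -> str:
    """Quote each command containing a space so it is passed to ntdsutil as one argument."""
    return "ntdsutil " + " ".join(f'"{c}"' if " " in c else c for c in commands)


print(as_command_line(seize_commands("syd02.whitepaper.com.au", "schema")))
# ntdsutil roles connections "connect to server syd02.whitepaper.com.au" quit "seize schema master" quit quit
```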

Recovery of the Schema Master

Before deciding whether to seize the schema master role, consider the following:

Impact on Environment

The first thing you must understand is the impact that a failed schema master will have on your environment. The main issues you will see are:

Unable to Make Changes to the Schema

When the schema master is unavailable, changes to the schema are not possible. If a schema change is attempted, a message such as the one shown in Figure 10 below is displayed.


Figure 10.

Note: The exact message displayed will depend on the method you are using to make the change.  The message above was experienced when attempting to install Exchange 2000 while the schema master was unavailable.

In most production environments, changes to the schema should be infrequent and planned well in advance, so a schema master outage should not pose any immediate problems.

Considerations for Performing a Seizure on a Schema Master

The primary consideration for deciding to seize the schema master role is the longevity or permanence of the outage. Because of the chance of duplicate schema alterations being propagated throughout the environment, a seizure of the schema master role should only be carried out if the failed role holder will never come back online.

In most environments, due to both the infrequent requirement for the schema master role and the implications of a seizure, it is likely that you will live with the outage for the period of time it takes to restore the DC holding the role. However, if for some reason you require the immediate use of the schema master role or the original role holder will never be brought back into the Windows 2000 environment, a seizure can be carried out.

Recovery of the Domain Naming Master

Before deciding whether to seize the domain naming master role, consider:

Impact on Environment

The first thing you must understand is the impact that a failed domain naming master will have on your environment. The main issues you will see are:

Domains Cannot Be Added to the Forest

When this master is unavailable, domains cannot be added to the forest. Figure 11 below shows the error message received when running DCPROMO to add a domain while the domain naming master (Syd01.Whitepaper.com) is unavailable:


Figure 11.

In a stable production environment, this should not be a significant issue. But in a development, testing or growing production environment, the failure of this master can halt growth until it is restored or seized.

Domains Cannot be Removed From the Forest

A similar message is received when running DCPROMO to remove a domain while the domain naming master (Syd01.Whitepaper.com) is unavailable. In that case, the message reads: Binding to server syd01.Whitepaper.com using the supplied credentials failed. The RPC server is unavailable. Again, this should be of limited concern in most environments.

Considerations for Performing a Seizure on a Domain Naming Master

The primary consideration for deciding to seize the domain naming master role is the longevity or permanence of the outage. Because of the chance of duplicate domain alterations being propagated throughout the environment, a seizure of the domain naming master role should only be carried out if the failed role holder will never come back online.

In most environments, due to both the infrequent requirement for the domain naming master role and the implications of a seizure, it is likely that you will deal with the outage for the period of time it takes to restore the DC holding the role. However, if for some reason you require the immediate use of the domain naming master role or the original role holder will never be brought back into the Windows 2000 environment, a seizure can be carried out.

Recovery of the RID Master

Before deciding whether to seize the RID master role, consider:

Impact on Environment

The first thing you must understand is the impact that a failed RID master will have on your environment. The main issues you will see are:

Inability to Create Security Objects

The primary issue that you will face in this situation is the inability to add any new security objects, such as users, groups and computers to the domain, resulting in the error message: Windows cannot create the object because: The directory service has exhausted the pool of relative identifiers.

In addition, an error with event ID 16645 is logged in the event log of the DC on which you attempted to create the object. The error explains that the maximum number of account identifiers allocated to this domain controller has been assigned.

This issue surfaces only once the RID pools (512 RIDs per pool) on all of the domain controllers in the domain are depleted, because objects can be created on any DC in the domain.

Therefore, if a domain has five remaining DCs, you could theoretically still have 2560 (5 x 512) RIDs available after the RID master fails. In a typical environment, this provides ample RIDs for creating security principals until the RID master is repaired or restored using the methods discussed earlier. If, however, you are in the middle of a mass creation of security objects, or of inter-domain object moves that require more RIDs than remain in your existing pools, a role seizure can be performed.
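The RID arithmetic above can be expressed as a small calculation. This Python sketch is hypothetical: it takes the number of RIDs already consumed from each surviving DC's pool (which you would have to obtain separately) and computes the theoretical upper bound on RIDs left while the RID master is down, plus a rough estimate of how long they would last at a given creation rate. It assumes new security principals can be created on whichever DC still has RIDs left.

```python
RIDS_PER_POOL = 512  # RIDs in each DC's pool, as stated above


def remaining_rids(rids_used_per_dc: list[int]) -> int:
    """Upper bound on RIDs still available in the domain while the RID master is down.

    Each entry is the number of RIDs already consumed from one surviving DC's pool.
    No pool can be refilled until the RID master returns, so the bound is the sum
    of what is left in each pool.
    """
    return sum(max(RIDS_PER_POOL - used, 0) for used in rids_used_per_dc)


def days_until_exhausted(rids_used_per_dc: list[int], new_principals_per_day: int) -> float:
    """Rough runway, assuming creations can move to whichever DC still has RIDs left."""
    if new_principals_per_day <= 0:
        raise ValueError("new_principals_per_day must be positive")
    return remaining_rids(rids_used_per_dc) / new_principals_per_day


# The theoretical case from the text: five surviving DCs with untouched pools.
print(remaining_rids([0, 0, 0, 0, 0]))  # 2560
print(days_until_exhausted([0, 0, 0, 0, 0], 100))  # 25.6
```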

Failure to Move Security Principals Between Domains

You cannot move security principals to another domain if the RID master in the target domain is not operational. Unlike the issue above, all cross-domain moves fail immediately because the RID master is unavailable.

Considerations for Performing a Seizure on a RID Master

Performing a seizure on a RID master is not something you want to do without due consideration. Because of the risk of duplicate RIDs on the network, the server that originally hosted the RID master role should never be brought back online if the seizure is performed. Instead, the original role holder should be completely rebuilt before being introduced back into the production Windows 2000 environment.

If, after understanding all the implications of a RID master seizure, your situation still requires an active RID master immediately, follow the steps in Steps Required to Seize an Operations Master.

Recovery of the PDC Emulator

Before deciding whether to seize the PDC emulator role, consider:

Impact on Environment

The first thing you must understand is the impact that a failed PDC emulator (PDCE) will have on your environment. The main issues you will face are:

Mixed Mode Environment

If the PDCE suffers a disaster in a mixed mode environment where Windows NT Server 4.0 backup domain controllers (BDCs) are active, you will see the same issues that occurred in Windows NT Server 4.0 when the PDC was unavailable. For example, if you try to administer the domain using the Windows NT 4.0 User Manager for Domains tool, you receive the error message: Could not find domain controller for this domain.

If you try to administer the Windows NT Server 4.0 domain using the native Server Manager tool, you will receive the error message: Cannot find the Primary DC for MYDOMAIN. You may administer this domain, but certain domain-wide operations will be disabled.

Native Mode Environment

The issues that occur in a native mode environment will also be present (on the Windows 2000 side) in a mixed mode environment. The primary issues you will face when the PDCE suffers a failure in a Windows 2000 environment are:

Possible increase in logon failures. If a user's password is reset on a DC other than the user's authenticating DC (for example, when the user forgets the password and an administrator resets it), the user must wait until the new password replicates to the authenticating DC before he or she can log on.

Although the user's authenticating DC tries to contact the PDCE to check whether the password has changed since the last replication, the attempt fails because the PDCE is offline. The authenticating DC therefore has no choice but to rely on its local copy of Active Directory, which still holds the original, forgotten password.

In this situation you will see no obvious errors, either from the client side or the authenticating DC.

Although this can be an issue, it is easily overcome by making the password change on the user's authenticating DC.
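The logon behavior described above can be modeled as a simple decision. The following Python sketch is a deliberate simplification, not the actual Kerberos or NTLM logic: the DomainController class and authenticate function are hypothetical, and each DC's copy of Active Directory is reduced to a dictionary of user names and passwords.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    name: str
    passwords: dict[str, str] = field(default_factory=dict)  # this DC's copy: user -> password
    online: bool = True


def authenticate(user: str, password: str, auth_dc: DomainController, pdce: DomainController) -> bool:
    """Simplified logon check performed by the user's authenticating DC."""
    if auth_dc.passwords.get(user) == password:
        return True  # the local copy already holds the current password
    if pdce.online:
        # Password changes are forwarded preferentially to the PDCE, so ask it before failing.
        return pdce.passwords.get(user) == password
    # PDCE offline: only the local, possibly stale copy is available, so the logon fails.
    return False


pdce = DomainController("SYD01", online=False)
auth_dc = DomainController("MEL01", {"alice": "old"})  # "alice" is a hypothetical user

# The password was reset to "new" on some other DC and has not yet replicated to MEL01.
print(authenticate("alice", "new", auth_dc, pdce))  # False

# Workaround from the text: make the change on the user's authenticating DC.
auth_dc.passwords["alice"] = "new"
print(authenticate("alice", "new", auth_dc, pdce))  # True
```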

Error when attempting to edit a Group Policy Object (GPO). To help ensure that no data is lost while a GPO is being edited, the default DC for GPO changes is the one holding the PDCE role. Therefore, if the PDCE is unavailable, you receive the message shown in Figure 12 below when you try to edit a GPO in the domain.


Figure 12.

 

This is a minor issue, and can be quickly rectified by selecting one of the options provided. A brief description of these options is listed below.

  • The one with the operations master token for the PDC emulator.

This option is obviously not possible when the role is unavailable.

  • The one used by Active Directory Snap-ins.

This uses the domain controller that Active Directory management snap-ins are using. It is the recommended option when the PDCE is unavailable.

  • Use any available domain controller.

The third and least desirable option allows the Group Policy snap-in to choose any available domain controller. When this option is selected, a domain controller in the local site is likely to be chosen.

Note: The choice you make here applies only to the current edit. You will be asked this question every time you edit a GPO while the PDCE is unavailable.

Considerations for Performing a Seizure on a PDC Emulator Master

The PDCE role is not quite as critical as the roles previously discussed, so seizing it does not have the same ramifications. If you seize the PDCE role, the original role holder does not need to be completely rebuilt before it can participate in the Windows 2000 environment again.

As a result, the decision to seize the PDCE role will have fewer implications to your environment and would generally be considered as standard practice in the event of a PDCE failure, particularly in a mixed mode environment.

The only real issue to consider when seizing the PDCE role is whether you are running a mixed mode environment with Windows NT Server 4.0 BDCs. For the BDCs to be aware of the change, a full synchronization of the Built-in groups with the new PDCE is required.

Note: Because seizing the PDCE role has less impact on the environment, Microsoft allows you to seize this role through the Active Directory Users and Computers snap-in. Follow the steps you would normally use for a role transfer, as described in the Windows 2000 Resource Kit Distributed Systems Guide. When the original PDCE is unavailable, the dialog box shown in Figure 13 below is displayed.


Figure 13.

Click OK and the role will be seized.

Note: A forced transfer is equivalent to a seizure.

In addition, you can seize the role using ntdsutil. To do this, follow the steps outlined in the Steps Required to Seize an Operations Master section of this white paper. Replace Step 7 with:

7. At the FSMO maintenance prompt, type: seize pdc

Recovery of the Infrastructure Master

Before deciding whether to seize the infrastructure master role, consider:

Impact on Environment

The first thing you must understand is the impact that a failed infrastructure master will have on your environment.

The effect of an infrastructure master failure is limited. It is not visible to end users and affects administrators only if there has been considerable group manipulation, typically in the form of user additions or user renames. The only effect of the infrastructure master being down in this situation is a delay before these changes are reflected in the Active Directory management snap-ins.

There are very few situations in which your environment could not cope without an infrastructure master for the time it takes to repair or recover the original. However, if you foresee a very long outage, a seizure of the role is recommended.

Considerations for Performing a Seizure on the Infrastructure Master

The primary consideration around the seizure of the infrastructure master role is to ensure that the new DC is not a GC server, but that it has a good connection to a GC, ideally within the same site.
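The selection rule above can be applied to an inventory of DCs. This Python sketch assumes you already have a list of surviving DCs with their site and GC status (the DCInfo record and the SYD03 server in the example are hypothetical). It excludes GCs and ranks DCs that share a site with a GC first; it cannot judge connection quality, so treat the result as a shortlist rather than a decision.

```python
from dataclasses import dataclass


@dataclass
class DCInfo:
    name: str
    site: str
    is_gc: bool


def infrastructure_candidates(dcs: list[DCInfo]) -> list[str]:
    """Return non-GC DCs, listing those that share a site with a GC first."""
    gc_sites = {dc.site for dc in dcs if dc.is_gc}
    non_gc = [dc for dc in dcs if not dc.is_gc]
    # False sorts before True, and sorted() is stable, so DCs keep their order within each group.
    ranked = sorted(non_gc, key=lambda dc: dc.site not in gc_sites)
    return [dc.name for dc in ranked]


# Surviving DCs after the failure (SYD03 is a hypothetical server).
dcs = [
    DCInfo("SYD02", "Sydney", is_gc=True),
    DCInfo("MEL01", "Melbourne", is_gc=False),
    DCInfo("SYD03", "Sydney", is_gc=False),
]
print(infrastructure_candidates(dcs))  # ['SYD03', 'MEL01']
```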

 


This paper brings together information about backing up and restoring Active Directory to help administrators better understand the issues involved. Using this paper, an administrator can develop a disaster recovery plan for a Windows 2000 environment.

For More Information

For the latest information on Windows 2000 Server, check out our Web site at  http://www.microsoft.com/windows2000 and the Windows 2000/NT Forum at http://computingcentral.msn.com/topics/windowsnt.

 


Database Integrity Testing

 

Only a limited amount of integrity testing can be performed against the database, at both the Extensible Storage Engine (ESE) layer and the database (DB) layer. The layers are illustrated in Figure 14 below.

Figure 14. Active Directory functional layers

 

These tests can be time-consuming. Depending on the urgency of the situation, you can either perform them or go straight to a restore from backup. Consider the following before performing these tests:

  1. If you cannot start the DC in Directory Services Restore mode, you cannot perform these tests, and you can skip this section.
  2. The error message in the event log, or the corresponding Knowledge Base (KB) article, should tell you more about the nature of the problem and whether an integrity check would resolve it.
  3. The process can be time-consuming. Its duration depends on the size of your database, and it can run into hours for large databases (1 GB and above).

 

The advantage of these tests is that performing them does not cause any data loss (unlike a repair, which is described in Appendix II). Perform the tests in the following order:

 

  1. Soft recovery of the log files.
  2. File integrity.
  3. Semantic analysis.

 

You must be in Directory Services Restore mode to perform these tests.

Performing a Soft Recovery of the Log Files

If the power source fails unexpectedly, you can perform a "soft" recovery of the log files. Because transaction data is written to the log files before it is written to the data file, you can replay the log files to reproduce the effects the transactions would have had on the data file. The Recover command in the Ntdsutil command-line tool performs this "soft" recovery: all of the log files are scanned to ensure that all committed transactions are applied to the database file. Soft recovery is also performed automatically when the domain controller starts if the previous shutdown was not clean.

 

Following is sample output of running the Recover command:

 
C:\>ntdsutil
ntdsutil: files
file maintenance: Recover
Executing Command: C:\WINNT\System32\esentutl.exe /r /8 /o /l"C:\WINNT\NTDS" /s"C:\WINNT\NTDS" /!10240

Initiating RECOVERY mode...
       Log files: C:\WINNT\NTDS
    System files: C:\WINNT\NTDS

Performing soft recovery...

operation completed successfully in 4.717 seconds.

Spawned Process Exit code 0x0(0)

If recovery was successful, it is recommended
 you run semantic database analysis to insure
 semantic database consistency as well.
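The principle behind soft recovery (write-ahead logging followed by replay of committed transactions) can be illustrated with a toy model. The Python sketch below is not how ESE actually stores or replays its logs; the record layout is hypothetical and chosen only to show why committed changes survive an unexpected power loss while incomplete ones do not.

```python
def soft_recover(data_file: dict[str, str], log: list[tuple]) -> dict[str, str]:
    """Replay committed transactions from a write-ahead log onto a copy of the data file.

    Hypothetical record layout: ("set", txn_id, key, value) or ("commit", txn_id).
    Changes from transactions without a commit record are discarded.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    recovered = dict(data_file)  # leave the input untouched
    for rec in log:
        if rec[0] == "set" and rec[1] in committed:
            _, _, key, value = rec
            recovered[key] = value  # re-applying a write that already reached the file is harmless
    return recovered


# Transaction 1 committed before the power failure; transaction 2 never committed.
data_file = {"cn=alice": "old"}
log = [("set", 1, "cn=alice", "new"), ("commit", 1), ("set", 2, "cn=bob", "created")]
print(soft_recover(data_file, log))  # {'cn=alice': 'new'}
```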

 

Ensuring File Integrity

By using the integrity command, you can detect low-level (binary-level) database corruption. The integrity command reads every byte of the data file; therefore, depending on the size of your database, the process might take a considerable amount of time.

The integrity command also makes sure that the correct headers exist in the database itself and that all of the tables are functioning and consistent. The command is run in Directory Services Restore mode, and any errors encountered are recorded in the log files.

The time the integrity command takes depends on your hardware and on the size of your directory database; in testing environments, a rate of about 2 gigabytes (GB) per hour was considered normal. While the command runs, an on-screen graph shows the percentage completed.
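The 2 GB per hour figure above gives a quick way to estimate how long an integrity check might take. A minimal Python sketch, assuming that rate (your hardware may be faster or slower) and a binary gigabyte of 1024^3 bytes; the function names are hypothetical.

```python
import os

GB = 1024 ** 3  # assumes a binary gigabyte


def integrity_check_hours(database_gb: float, gb_per_hour: float = 2.0) -> float:
    """Rough duration of the integrity command at the quoted rate of about 2 GB per hour."""
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    return database_gb / gb_per_hour


def estimate_for_file(dit_path: str, gb_per_hour: float = 2.0) -> float:
    """Same estimate, reading the size of the database file directly."""
    return integrity_check_hours(os.path.getsize(dit_path) / GB, gb_per_hour)


print(integrity_check_hours(5))  # 2.5 (hours for a 5 GB database)
# On the DC itself: estimate_for_file(r"C:\WINNT\NTDS\ntds.dit")
```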


Following is a sample run of an integrity check by using the ntdsutil tool:

 

C:\>ntdsutil
ntdsutil: files
file maintenance: Integrity
Opening database .
Executing Command: C:\WINNT\System32\esentutl.exe /g "C:\WINNT\NTDS\ntds.dit" /!10240 /8 /v /x /o
Initiating INTEGRITY mode...
        Database: C:\WINNT\NTDS\ntds.dit
  Temp. Database: INTEG.EDB
failed to get 515126 buffers
checking database header
checking database integrity
Scanning Status  ( % complete )
          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
                checking SystemRoot
                SystemRoot (OE)
                SystemRoot (AE)
        checking system table
                MSysObjectsShadow
                MSysObjects
.               Name
                RootObjects
                rebuilding and comparing indexes
        checking table "datatable" (6)
                checking data
.......................         checking long value tree (24)
...             checking index "PhantomIndex" (125)
.               checking index "INDEX_000901FD" (122)
                checking index "INDEX_000900DE" (121)
                checking index "INDEX_00090089" (120)
                checking index "INDEX_00090573" (119)
                checking index "INDEX_00090073" (118)
                checking index "INDEX_00090571" (117)
                checking index "INDEX_0009056C" (116)
                checking index "INDEX_00090553" (115)
                checking index "INDEX_0009013A" (114)
                checking index "INDEX_00090138" (113)
                checking index "INDEX_00090330" (112)
                checking index "INDEX_00090030" (111)
                checking index "INDEX_00090013" (110)
                checking index "INDEX_00000013" (109)
                checking index "INDEX_0000000B" (108)
                checking index "INDEX_00000007" (107)
                checking index "INDEX_00000003" (106)
.               checking index "INDEX_00150003" (105)
                checking index "LCL_ABVIEW_index00000409" (104)
                checking index "INDEX_00090363" (103)
                checking index "INDEX_00090303" (102)
                checking index "INDEX_00090290" (101)
                checking index "INDEX_000901FF" (100)
                checking index "INDEX_000900DD" (99)
                checking index "INDEX_00090085" (98)
                checking index "INDEX_00090057" (97)
                checking index "INDEX_0009001C" (96)
                checking index "INDEX_000201CC" (95)
.               checking index "INDEX_000200D2" (94)
                checking index "INDEX_0002000D" (93)
                checking index "INDEX_0000002A" (92)
                checking index "INDEX_00000004" (91)
                checking index "NC_Acc_Type_Name" (90)
                checking index "PDNT_index" (89)
..              checking index "INDEX_00090001" (88)
.               checking index "INDEX_000901F6" (85)
                checking index "INDEX_000902EE" (84)
                checking index "INDEX_000904E1" (83)
                checking index "INDEX_000201D5" (80)
                checking index "INDEX_000902BB" (77)
                checking index "INDEX_000903B4" (76)
                checking index "INDEX_000200A9" (75)
                checking index "INDEX_0009039D" (74)
                checking index "INDEX_0009039A" (73)
                checking index "INDEX_00090098" (72)
                checking index "INDEX_00090395" (71)
                checking index "INDEX_0009028F" (69)
                checking index "INDEX_00090582" (66)
                checking index "INDEX_00020078" (65)
.               checking index "INDEX_00020073" (62)
                checking index "INDEX_00090171" (60)
                checking index "INDEX_00090167" (58)
                checking index "INDEX_00090062" (56)
                checking index "INDEX_00090261" (55)
                checking index "INDEX_0009014E" (52)
                checking index "INDEX_0009014D" (51)
                checking index "INDEX_0009014C" (50)
                checking index "INDEX_00090147" (49)
                checking index "INDEX_00090141" (48)
                checking index "INDEX_00090140" (47)
                checking index "INDEX_0009012E" (42)
                checking index "INDEX_00020013" (39)
.               checking index "INDEX_0009030E" (36)
                checking index "INDEX_00090008" (32)
                checking index "INDEX_00090202" (25)
                checking index "Ancestors_index" (13)
.               checking index "DRA_USN_CREATED_index" (12)
                checking index "DRA_USN_index" (11)
.               checking index "del_index" (10)
                checking index "INDEX_00090002" (9)
..              checking index "NC_Acc_Type_Sid" (8)
                checking index "INDEX_00090092" (7)
                rebuilding and comparing indexes
        checking table "hiddentable" (16)
                checking data
                rebuilding and comparing indexes
        checking table "link_table" (14)
                checking data
                checking index "backlink_index" (15)
                rebuilding and comparing indexes
        checking table "MSysDefrag1" (123)
                checking data
                checking index "TablesToDefrag" (124)
                rebuilding and comparing indexes
        checking table "sdproptable" (17)
                checking data
                checking index "clientid_index" (19)
                checking index "trim_index" (18)
                rebuilding and comparing indexes

 

integrity check completed.
operation completed successfully in 13.640 seconds.
Spawned Process Exit code 0x0(0)

 

If integrity was successful, it is recommended
 you run semantic database analysis to insure
 semantic database consistency as well.

 

Ensuring Database Integrity

The Ntdsutil tool includes a semantic checker, which is invoked by selecting the semantic database analysis option. The semantic checker verifies the integrity of the contents of the Active Directory database.

The tool is run in Directory Services Restore mode. Errors are written to dsdit.dmp.xx log files, and a progress indicator shows the status of the check.

The following are examples of the functions that can be performed:

  • Reference count check. Counts all of the references from the data table and the link table to ensure they match the listed counts for the record. (For more information about data and link tables, see the section on Active Directory Data Storage in the Distributed Systems Guide of the Windows 2000 Resource Kit.) This also ensures that each object has a GUID, a distinguished name, and a nonzero reference count. For a deleted object, the check ensures that the object has a deleted time and date, but does not have a GUID or a distinguished name.
  • Deleted object check. Ensures the object has a deleted time and date, and a special relative distinguished name.
  • Ancestor check. Checks that the ancestor list of the current distinguished name tag (DNT) equals the ancestor list of its parent plus the current DNT.
  • Security descriptor check. Checks for a valid descriptor, ensuring that it has a control field and that the discretionary access control list is not empty. If there are deleted objects without a discretionary access control list, a warning is printed.
  • Replication check. Checks the up-to-dateness vector in the directory partition head to ensure that the correct number of cursors exists. It also checks that every object has a property metadata vector. Based on the instance type of the object, it checks the metadata, the up-to-dateness vectors, the subordinate references, and the partial attribute set.
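Two of the checks above, the reference count check and the ancestor check, can be illustrated on a toy table. The real checker works on the ESE data and link tables; the record layout below (a dict keyed by DNT with hypothetical parent, refcount and ancestors fields) and the rule that a reference is either a child row or one end of a link-table row are simplifying assumptions for this Python sketch.

```python
from collections import Counter


def check_records(data_table: dict[int, dict], link_table: list[tuple[int, int]]) -> list[str]:
    """Toy reference count and ancestor checks; returns a description of each problem found."""
    problems = []

    # Reference count check: recount references from both tables.
    refs = Counter()
    for row in data_table.values():
        if row["parent"] is not None:
            refs[row["parent"]] += 1
    for link_dnt, backlink_dnt in link_table:
        refs[link_dnt] += 1
        refs[backlink_dnt] += 1

    for dnt, row in sorted(data_table.items()):
        if row["refcount"] != refs[dnt]:
            problems.append(f"DNT {dnt}: stored refcount {row['refcount']}, counted {refs[dnt]}")

        # Ancestor check: ancestors(dnt) must equal ancestors(parent) + [dnt].
        parent = row["parent"]
        if parent is not None and parent not in data_table:
            problems.append(f"DNT {dnt}: parent {parent} is missing")
            continue
        expected = [dnt] if parent is None else data_table[parent]["ancestors"] + [dnt]
        if row["ancestors"] != expected:
            problems.append(f"DNT {dnt}: ancestor list {row['ancestors']} != expected {expected}")

    return problems


# DNT 2 is a partition head, 5 a group, and 7 a user linked into the group.
data_table = {
    2: {"parent": None, "refcount": 2, "ancestors": [2]},
    5: {"parent": 2, "refcount": 1, "ancestors": [2, 5]},
    7: {"parent": 2, "refcount": 0, "ancestors": [2, 5, 7]},  # deliberately damaged row
}
link_table = [(5, 7)]
for problem in check_records(data_table, link_table):
    print(problem)
# DNT 7: stored refcount 0, counted 1
# DNT 7: ancestor list [2, 5, 7] != expected [2, 7]
```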

 

Performing Semantic Database Analysis

  1. Back up Active Directory. Windows 2000 Backup natively supports backing up Active Directory while online. This occurs automatically when you select the option to back up everything on the computer in the Backup Wizard, or independently when you choose to back up the System State in the wizard.
  2. Restart the domain controller, select the appropriate installation from the startup menu, and press F8 to display the Windows 2000 Advanced Options Menu.
  3. Select Directory Services Restore Mode, and then press Enter. To start the boot process again, press Enter.
  4. Log on by using the Administrator account with the password defined for the local Administrator account in the offline SAM.
  5. Click Start, point to Programs and then to Accessories, and then click Command Prompt.
  6. At the command prompt, type ntdsutil, and then press Enter.
  7. Type Semantic database analysis, and then press Enter.
  8. Type Verbose on, and then press Enter. This enables verbose mode for the semantic checker.
  9. Type go, and then press Enter. The semantic checker runs without repairing any errors it encounters. To repair the errors encountered, type Go Fixup instead.
  10. Type quit, and then press Enter. To return to the command prompt, type quit again.

Following is a sample of running the semantic database analysis option with verbose mode turned on:

 

C:\>ntdsutil
ntdsutil: Semantic database analysis
semantic checker: Verbose on
Verbose mode enabled.
semantic checker:  Go
Opening database .
....Done.

 

Getting record count...2371 records
Writing summary into log file dsdit.dmp.0
Records scanned:       2300
Processing records..Done.

 


Database Repair

Repairing Active Directory should be a last resort. If a valid backup is available, always restore that backup instead. There is no guarantee that repairing Active Directory will work; in fact, the process risks further data loss. In addition, it can be very time-consuming.

 

After doing a repair, a semantic integrity check of the database must be done as outlined in Appendix I.

 

To do a repair of the Active Directory, use the repair option in the ntdsutil tool:

 

C:\>ntdsutil
ntdsutil: files
file maintenance: repair

Database: Location, Move, and Offline Defragmentation

Determining the Location of Database Files and Log Files

To find out the location of the data files, log files, and working directory, you can use the info command, which is part of the ntdsutil command-line tool. This command does the following:

  • Analyzes and reports the free space for all disks installed on the computer.
  • Reads the registry keys that contain the locations of the Active Directory files and reports their values.
  • Reports the sizes of the data file, working directory, and log file.

Following is sample output from running the info command:

C:\>ntdsutil
ntdsutil: files
file maintenance: Info

 

Drive Information:

 

        C:\ NTFS (Fixed Drive  ) free(2.9 Gb) total(3.9 Gb)

 

DS Path Information:

 

        Database   : C:\WINNT\NTDS\ntds.dit - 12.1 Mb
        Backup dir : C:\WINNT\NTDS\dsadata.bak
        Working dir: C:\WINNT\NTDS
        Log dir    : C:\WINNT\NTDS - 40.0 Mb total
                        res2.log - 10.0 Mb
                        res1.log - 10.0 Mb
                        REPAIR.TXT - 0.0 Kb
                        edb00001.log - 10.0 Mb
                        edb.log - 10.0 Mb

 

Moving the Database

To move the database from one location to another, use the Ntdsutil command-line tool in Directory Services Restore mode. For example, you might need to move a log file or the Ntds.dit file to another drive if corruption occurs on the previously assigned drive or directory. Specifically, the move db to %s command moves the Ntds.dit data file to the new directory specified by %s and updates the registry keys so that the directory service restarts using the new location. It is highly recommended that you make a backup both before and after the move; otherwise, a restore operation does not retain the new file location.

You can also move the log files. The move logs to %s command moves the directory service log files to the new directory specified by %s and updates the registry keys so that the directory service restarts using the new location.
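The quoting rule that the Offline Defragmentation procedure in this appendix gives for compact to also matters when a target directory contains spaces. The Python sketch here assumes the same rule applies to move db to and move logs to (the paper states it only for compact to). The helper name and the example paths are hypothetical; the function only builds the text to type at the file maintenance prompt.

```python
PATH_COMMANDS = ("move db to", "move logs to", "compact to")


def ntdsutil_path_command(command: str, directory: str) -> str:
    """Build the text to type at the file maintenance prompt, quoting paths that contain spaces."""
    if command not in PATH_COMMANDS:
        raise ValueError(f"expected one of {PATH_COMMANDS}, got {command!r}")
    if " " in directory and not directory.startswith('"'):
        directory = f'"{directory}"'
    return f"{command} {directory}"


print(ntdsutil_path_command("move db to", r"D:\NTDS Data"))  # move db to "D:\NTDS Data"
print(ntdsutil_path_command("move logs to", r"E:\NTDSLogs"))  # move logs to E:\NTDSLogs
```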

Offline Defragmentation

Active Directory automatically performs online defragmentation of the database at regular intervals (by default, every 12 hours) as part of the garbage collection process. Online defragmentation does not reduce the size of the database file (Ntds.dit); instead, it optimizes data storage in the database and reclaims space in the directory for new objects, which helps prevent data storage problems. Offline defragmentation, by contrast, creates a new, compacted version of the database file. Depending on how fragmented the original database file was, the new file might be considerably smaller. To perform offline defragmentation:

  1. Back up Active Directory.
  2. Restart the domain controller, select the appropriate installation from the startup menu, and press F8 to display the Windows 2000 Advanced Options Menu.
  3. Select Directory Services Restore Mode, and then press Enter. To start the boot process again, press Enter.
  4. Log on by using the local Administrator account.
  5. Click Start, point to Programs and then to Accessories, and then click Command Prompt.
  6. At the command prompt, type ntdsutil, and then press Enter.
  7. Type files, and then press Enter.
  8. Type info, and then press Enter. This displays current information about the path and size of the Active Directory database and its log files. Note the path.
  9. Identify a location that has enough free drive space to store the compacted database.
  10. Type the following command, where <drive>:\<directory> is the path to the location that you identified in the previous step, and then press Enter:

                compact to <drive>:\<directory>


Note: You must specify a directory path. If the path contains any spaces, the entire path must be surrounded by quotation marks (for example, compact to "c:\new folder").

A new database named Ntds.dit is created in the path that you specified.

  11. Type quit, and then press Enter. To return to the command prompt, type quit again.

  12. Copy the new Ntds.dit file over the old Ntds.dit file in the current Active Directory database path that you noted in step 8.

  13. Restart the computer normally.
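Steps 9 and 10 can be sketched as a Python pre-check. The sketch confirms that the target directory exists and has at least as much free space as the current Ntds.dit file, and then builds the compact to line using the quoting rule from the Note. It assumes the compacted file is never larger than the original, so it uses the current file size as the space requirement. The function names are hypothetical, and the code does not run Ntdsutil itself.

```python
import os
import shutil


def quote_path(path: str) -> str:
    """Surround a path with quotation marks if it contains spaces,
    as the Note after step 10 requires."""
    return f'"{path}"' if " " in path else path


def compact_command(target_dir: str, database_bytes: int) -> str:
    """Check free space at target_dir (step 9) and return the step 10 line.

    Assumes the compacted Ntds.dit is never larger than the original,
    so the current file size is used as the space requirement.
    """
    if not os.path.isdir(target_dir):
        # compact to needs an existing directory path, not a file name
        raise NotADirectoryError(f"{target_dir} is not an existing directory")
    free = shutil.disk_usage(target_dir).free
    if free < database_bytes:
        raise RuntimeError(
            f"only {free} bytes free at {target_dir}; need {database_bytes}"
        )
    return f"compact to {quote_path(target_dir)}"


# Usage on the domain controller, with the database path noted in step 8
# (D:\compact is a hypothetical target directory):
# size = os.path.getsize(r"C:\WINNT\NTDS\ntds.dit")
# print(compact_command(r"D:\compact", size))
```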


Useful Tools for Active Directory Disaster Recovery


The following tools are recommended for use in a disaster recovery situation.

Note: To install the support tools, do one of the following:

  • On the Windows 2000 Server CD, open the \support\tools folder, and then run the installer program from there.

Or

    1. Click Start, point to Programs, point to Administrative Tools, and then click Configure Your Server.
    2. In the Configure Your Server Wizard, click Advanced, and then click Support Tools. Follow the instructions on the screen.

Windows 2000 Domain Manager (NetDom.exe)

This tool enables administrators to manage Windows 2000 domains and trust relationships from the command line. This tool is included in the support tools on the Windows 2000 CD.

Replication Diagnostics Tool (Repadmin.exe)

This tool allows the administrator to view the replication topology as seen from the perspective of each domain controller. Repadmin can also be used to force replication events between domain controllers and to view both the replication metadata and the up-to-datedness vectors. This tool is included in the support tools on the Windows 2000 CD.

Active Directory Diagnostics Tool (Ntdsutil.exe)

This tool is included with Windows 2000 Server. To start it, type ntdsutil at the command prompt. Its use is described throughout this white paper.

Windows 2000 Backup Utility (Ntbackup.exe)

This is the default backup application included with Windows 2000. Use it to perform periodic backups of the System State and the system drive. Its use is described in detail in this white paper.

  • In Windows 2000, Backup is located in System Tools on the Accessories menu. To open a system tools item, click Start, point to Programs, point to Accessories, point to System Tools, and then click the appropriate icon.


ADSI Edit

ADSI Edit is a Microsoft Management Console (MMC) snap-in that acts as a low-level editor for Active Directory. It uses Active Directory Service Interfaces (ADSI) to add, delete, and move objects within the directory service. You can view, change, and delete the attributes of each object. This tool is included in the support tools on the Windows 2000 CD.