     The LogicForce Letter


Total IT Redundancy
By Geoffrey Swihart,
Senior Technical Consultant


Today, more and more firms run mission-critical applications and store vital client data on servers without paying attention to the “health” of that equipment. The failure of any application or piece of hardware can be disastrous to a business, especially if the outage continues for an extended period of time.

The goal of total IT redundancy is to remove every single point of failure from a firm’s data infrastructure. In other words, backup components that perform duplicate functions ensure that the loss of a single component in an organization’s technology infrastructure does not interrupt the system’s normal functioning. The contingencies that can threaten an organization’s infrastructure include storage media problems, network failure, workstation malfunction and total site disruption.

For the systems affected by these failures to be completely redundant and protected, one first needs to identify the possible single points of failure. Today’s servers usually sport redundant power supplies, RAID disk drive arrays, dual network cards and the ability to swap out hardware without bringing the entire system down. These features, although they contribute immensely to uptime and reliability, do not make a system truly redundant. Single points of failure remain: the server itself can be damaged or become inaccessible, and the site housing the server can become unavailable.
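As a rough illustration of that first step, the short Python sketch below counts how many components fill each role and flags any role served by only one. The inventory, its role names and its component names are hypothetical, chosen only to mirror the examples in the paragraph above; a real audit would draw on the firm's own asset list.

```python
from collections import Counter


def single_points_of_failure(inventory):
    """Return the roles that are served by exactly one component."""
    counts = Counter(role for role, _name in inventory)
    return sorted(role for role, n in counts.items() if n == 1)


# Hypothetical inventory: (role, component name) pairs.
inventory = [
    ("power supply", "psu-a"),
    ("power supply", "psu-b"),
    ("network card", "nic-1"),
    ("network card", "nic-2"),
    ("server", "file-server-1"),
    ("site", "main-office"),
]

# Roles with no duplicate component are the single points of failure.
spofs = single_points_of_failure(inventory)
print(spofs)
```

In this made-up inventory the power supplies and network cards are duplicated, so only the server and the site are reported.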

For complete redundancy in today’s IT world, an organization needs a server cluster. Clusters are sets of computers that cooperate to provide a highly available and highly scalable platform for applications and services. Here, this definition is narrowed to mean the protection of applications and hardware by the use of clustered servers. A server cluster works by having one machine function as the primary server while the other computers in the cluster act as backups. The primary server performs all work until it fails, at which point a backup takes over. Any data written to the primary server is also written to the backup servers, preventing data loss in the event of failure.
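The primary-and-backup arrangement described above can be sketched as a toy model in Python. This is only an in-memory illustration under simplifying assumptions: the `Node` and `Cluster` classes are hypothetical, writes are copied to every running server at once, and the recovery of a failed server is not modeled. Real clustering products handle this with far more care.

```python
class Node:
    """One server in the cluster, holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class Cluster:
    """Toy primary/backup cluster: the first running node is the primary."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    @property
    def primary(self):
        # The first node still running acts as primary; the rest are backups.
        for node in self.nodes:
            if node.alive:
                return node
        raise RuntimeError("no servers available")

    def write(self, key, value):
        self.primary  # raises RuntimeError if every server is down
        # Every write to the primary is also written to the running backups.
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def read(self, key):
        return self.primary.data[key]


cluster = Cluster([Node("server-1"), Node("server-2")])
cluster.write("client-file", "contract draft")

# Simulate the loss of the primary; the backup takes over with the same data.
cluster.nodes[0].alive = False
print(cluster.primary.name, cluster.read("client-file"))
```

Because the backup received the write before the primary failed, the read after failover still returns the stored value.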

Server clusters can be implemented at a single site (Figure 1) or in multiple locations (Figure 2). A single-site server cluster protects the network against the failure of one server, but provides no protection if a disaster strikes that office. A geographically dispersed cluster also protects a network against the loss of an entire site: if one site goes down, all applications and services transfer to the mirrored site.
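The difference between the two layouts can be checked with a small Python sketch. The server and site names are hypothetical, and the function simply asks which sites, if lost, would leave the cluster with no server at all.

```python
def fatal_sites(server_sites):
    """Return the sites whose loss would leave the cluster with no server.

    server_sites maps a server name to the site that houses it.
    """
    sites = set(server_sites.values())
    return sorted(
        site for site in sites
        if all(s == site for s in server_sites.values())
    )


# Hypothetical layouts: both clusters have two servers.
single_site = {"server-1": "office-a", "server-2": "office-a"}
dispersed = {"server-1": "office-a", "server-2": "office-b"}

print(fatal_sites(single_site))  # the one office is still a single point of failure
print(fatal_sites(dispersed))    # no single site takes the whole cluster down
```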

For a geographically dispersed cluster to work seamlessly, more duplicated equipment and services may be required at the various sites. Depending on the amount of data an organization has, a dedicated internet connection from one site to the next might be in order. Each site will also need a second internet connection for ordinary tasks such as browsing the web. At the secondary site, a terminal server might be needed for access to the mirrored data and services in the event of disaster at the primary location. The secondary site might also need to host desktop computers if it will be used as a meeting space in the event of disaster.

Many scenarios could cause an organization’s IT infrastructure to fail. Planning for the worst is the only way to be completely prepared. Remember, however, that planning is only a partial solution: for any disaster recovery plan to be successful, it needs to be routinely checked and rehearsed.

___________________
Source: The LogicForce Letter, Summer 2006
