Total IT Redundancy
By Geoffrey Swihart,
Senior Technical Consultant
Today more and more firms run mission-critical applications
and store vital client data on servers without attending to the
“health” of that equipment. The failure of any application or
piece of hardware can be disastrous to a business, especially if
the outage continues for an extended period of time.
The goal of total IT redundancy is to
remove every single point of failure from a firm’s data infrastructure.
In other words, backup components that perform duplicate functions
ensure that the loss of a single component in an organization’s
technology infrastructure does not interrupt the system’s normal
functioning. The contingencies that can threaten an organization’s
infrastructure include storage media problems, network failure,
workstation malfunction and total site disruption.
To make the systems affected by these failures completely
redundant and protected, one first needs
to identify the possible single points of failure. Today’s servers
usually have redundant power supplies, RAID disk drive arrays,
dual network cards and the ability to swap out hardware without
bringing the entire system down. These features, although they
contribute immensely to uptime and reliability, do not make a system
truly redundant. Single points of failure still exist: the server
itself can be physically damaged or become inaccessible, and the
site housing the server can become unavailable.
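As a rough illustration of the identification step described above, the following Python sketch lists every role in an inventory that only one component can fill. The role names, component names and the function itself are hypothetical and chosen only for this example; a real audit would cover far more than a flat list.

```python
from collections import defaultdict


def single_points_of_failure(inventory):
    """Return the roles that only one component can fulfil.

    inventory is a list of (role, component_name) pairs. Both the roles
    and the names are hypothetical labels made up for this illustration.
    """
    by_role = defaultdict(set)
    for role, name in inventory:
        by_role[role].add(name)
    # A role backed by fewer than two components has no backup.
    return sorted(role for role, names in by_role.items() if len(names) < 2)


# One server with duplicated internal parts, housed at one site.
inventory = [
    ("power supply", "psu-a"),
    ("power supply", "psu-b"),
    ("network card", "nic-a"),
    ("network card", "nic-b"),
    ("server", "server-1"),
    ("site", "main office"),
]

print(single_points_of_failure(inventory))  # ['server', 'site']
```

The duplicated power supplies and network cards drop out of the result, while the server and the site remain, which matches the two exposures named above.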
For complete redundancy in today’s
IT world, an organization needs a server cluster. Clusters
are sets of computers that cooperate to provide a highly available
and highly scalable platform for applications and services. Here,
this definition is narrowed to mean the protection of applications
and hardware by the use of clustered servers. A server cluster works
by having one machine function as the primary server while the other
computers in the cluster act as backups. The primary server performs
all work until it fails, at which point a backup takes over. Any data
written to the primary server is also written to the backup servers,
preventing data loss in the event of failure.
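The primary and backup arrangement can be sketched in a few lines of Python. This is a toy model written under simplifying assumptions, not how any particular clustering product works: the Node and Cluster classes, the server names and the rule that the first surviving node acts as primary are all invented for the illustration.

```python
class Node:
    """One server in the cluster (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class Cluster:
    """Toy primary/backup cluster: the first surviving node is the primary."""

    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def primary(self):
        for node in self.nodes:
            if node.alive:
                return node
        raise RuntimeError("no surviving node in the cluster")

    def write(self, key, value):
        self.primary()  # raises if every node has failed
        # Every write is copied to all surviving nodes, so the backups
        # always hold the same data as the primary.
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def read(self, key):
        return self.primary().data[key]

    def fail(self, name):
        for node in self.nodes:
            if node.name == name:
                node.alive = False


cluster = Cluster(["server-a", "server-b"])
cluster.write("client-file", "version 1")
cluster.fail("server-a")                # the primary goes down
print(cluster.primary().name)           # server-b
print(cluster.read("client-file"))      # version 1
```

Because the write reached both nodes before the failure, the backup can serve the data immediately. The sketch leaves out what a real cluster must also handle, such as detecting the failure and resynchronizing a repaired server.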
Server clusters can be implemented at
a single site (Figure 1) or in multiple locations
(Figure 2). A single-site server cluster protects
the network against the failure of one server, but provides no
protection if a disaster strikes that office. A geographically
dispersed cluster also protects the network against the loss of an
entire site. If one site goes down, all applications and services
transfer to the mirrored site.
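The difference between the two layouts can be checked with a short Python sketch. It assumes, purely for illustration, that service continues as long as at least one server is still running at a site that is still available; the function, the server names and the site names are hypothetical.

```python
def service_survives(placement, failed_servers=(), failed_sites=()):
    """Return True if at least one server is still usable.

    placement maps a server name to the site that houses it. All names
    are hypothetical and exist only for this illustration.
    """
    return any(
        server not in failed_servers and site not in failed_sites
        for server, site in placement.items()
    )


single_site = {"server-a": "main office", "server-b": "main office"}
dispersed = {"server-a": "main office", "server-b": "second office"}

# Both layouts tolerate the loss of one server.
print(service_survives(single_site, failed_servers=["server-a"]))  # True
print(service_survives(dispersed, failed_servers=["server-a"]))    # True

# Only the dispersed layout tolerates the loss of the main office.
print(service_survives(single_site, failed_sites=["main office"]))  # False
print(service_survives(dispersed, failed_sites=["main office"]))    # True
```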
For a geographically dispersed cluster to work seamlessly, more duplicated
equipment and services may be required at the various sites. A dedicated
internet connection from one site to the next might be in order,
depending on the amount of data an organization has. Each site will
also need a second internet service connection for ordinary
tasks such as browsing the web. At the secondary
site, a terminal server might be needed for access to the mirrored
data and services in the event of disaster at the primary location.
The secondary site might also need to host desktop computers if
it will double as a meeting space in the event of disaster.
Many scenarios could cause an organization’s IT infrastructure to
fail. Planning for the worst is the only way to be completely prepared.
Remember, however, that planning is only a partial solution: for
any disaster recovery plan to be successful, it needs to be routinely
checked and rehearsed.
___________________
Source: The LogicForce Letter, Summer 2006