Using DRBD in conjunction with the Linux-HA cluster manager ("Heartbeat") is arguably DRBD's most frequently found use case. Heartbeat is also one of the applications that make DRBD extremely powerful in a wide variety of usage scenarios. Hence, this is one of the more detailed chapters in this guide.
This chapter describes using DRBD as replicated storage for Linux-HA High Availability clusters. It covers both traditionally configured, Heartbeat release 1-compatible clusters and the more advanced CRM-enabled Heartbeat 2 clusters.
Heartbeat's purpose as a cluster manager is to ensure that the cluster maintains its services to the clients, even if single machines of the cluster fail. Applications that may be managed by Heartbeat as cluster services include, for example,
a web server such as Apache,
a database server such as MySQL, Oracle, or PostgreSQL,
a file server such as NFS or Samba, and many others.
In essence, any server application may be managed by Heartbeat as a cluster service.
Services managed by Heartbeat are typically removed from the system startup configuration; rather than being started at boot time, the cluster manager starts and stops them as required by the cluster configuration and status. If a machine (a physical cluster node) fails while running a particular set of services, Heartbeat will start the failed services on another machine in the cluster. These operations performed by Heartbeat are commonly referred to as (automatic) fail-over.
A migration of cluster services from one cluster node to another, by manual intervention, is commonly termed "manual fail-over". This being a slightly self-contradictory term, we use the alternative term switch-over for the purposes of this guide.
Heartbeat is also capable of automatically migrating resources back to a previously failed node, as soon as the latter recovers. This process is called fail-back.
Usually, certain requirements must be met before Heartbeat can start a cluster service on a node. Consider the example of a typical database-driven web application:
Both the web server and the database server assume that their designated IP addresses are available (i.e. configured) on the node.
The database will require a file system to retrieve data files from.
That file system will require its underlying block device to read from and write to (this is where DRBD comes in, as we will see later).
The web server will also depend on the database being started, assuming it cannot serve dynamic content without an available database.
The services Heartbeat controls, and any additional requirements those services depend on, are referred to as resources in Heartbeat terminology. Where resources form a co-dependent collection, that collection is called a resource group.
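The dependency chain in the example above implies a start order, and the reverse order when stopping. The following is a minimal shell sketch of that idea only: the resource names are hypothetical labels for the example, and echo stands in for invoking a real resource agent. It does not show how Heartbeat itself is configured.

```shell
#!/bin/sh
# Sketch only: the resource names below are hypothetical labels for the
# database-driven web application example, and echo stands in for
# invoking a real resource agent.

# Start order: each resource appears after everything it depends on.
GROUP="ip_address block_device file_system database web_server"

start_group() {
    for res in $GROUP; do
        echo "start $res"
    done
}

stop_group() {
    # Stop in reverse order so that no resource loses a dependency
    # while it is still running.
    reversed=""
    for res in $GROUP; do
        reversed="$res $reversed"
    done
    for res in $reversed; do
        echo "stop $res"
    done
}
```

Calling start_group lists the IP address first and the web server last; stop_group lists them in the opposite order.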
Heartbeat manages resources by invoking standardized shell scripts known as resource agents (RAs). In Heartbeat clusters, the following resource agent types are available:
Heartbeat resource agents. These agents are found in the /etc/ha.d/resource.d directory. They may take zero or more positional, unnamed parameters, and one operation argument (start, stop, or status). Heartbeat translates resource parameters it finds for a matching resource in /etc/ha.d/haresources into positional parameters for the RA, which then uses these to configure the resource.
LSB resource agents. These are conventional, Linux Standard Base-compliant init scripts found in /etc/init.d, which Heartbeat simply invokes with the start, stop, or status argument. They take no positional parameters. Thus, the corresponding resources' configuration cannot be managed by Heartbeat; these services are expected to be configured by conventional configuration files.
OCF resource agents. These are resource agents that conform to the guidelines of the Open Cluster Framework, and they only work with clusters in CRM mode. They are usually found in either /usr/lib/ocf/resource.d/heartbeat or /usr/lib64/ocf/resource.d/heartbeat, depending on system architecture and distribution. They take no positional parameters, but may be extensively configured via environment variables that the cluster management process derives from the cluster configuration, and passes in to the resource agent upon invocation.
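The difference between the Heartbeat and OCF calling conventions is easiest to see side by side. The following is a minimal sketch, not a real agent: the function names, the device and mountpoint parameters, and the echoed messages are made up for illustration. The OCF_RESKEY_ prefix for parameter variables and return code 3 for an unimplemented operation follow OCF conventions.

```shell
#!/bin/sh
# Sketch only: function names, parameters, and messages are hypothetical.

# Heartbeat-style agent: positional parameters first, operation last,
# e.g.  hb_style_agent /dev/drbd0 /mnt/data start
hb_style_agent() {
    device="$1"
    mountpoint="$2"
    op="$3"
    case "$op" in
        start|stop|status)
            echo "$op: device=$device mountpoint=$mountpoint"
            ;;
        *)
            echo "usage: agent DEVICE MOUNTPOINT {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# OCF-style agent: the operation is the only argument; parameters arrive
# as environment variables named OCF_RESKEY_<parameter>.
ocf_style_agent() {
    op="$1"
    device="${OCF_RESKEY_device:-unset}"
    mountpoint="${OCF_RESKEY_mountpoint:-unset}"
    case "$op" in
        start|stop|monitor)
            echo "$op: device=$device mountpoint=$mountpoint"
            ;;
        *)
            return 3   # OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}
```

An LSB agent corresponds to the degenerate case of either form: it receives only the operation argument and reads its settings from its own configuration files.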
Heartbeat periodically checks for node availability (the "heartbeat" proper) by exchanging messages between cluster nodes, using a UDP-based protocol on IP networks. It supports several communication methods, including:
IP multicast,
IP broadcast,
IP unicast,
serial line.
Of these, IP multicast and IP broadcast are the most relevant in practice. The absolute minimum requirement for stable cluster operation is two independent communication channels.