DRBD Users Guide 8.0-8.3

Chapter 9. Integrating DRBD with Heartbeat clusters


This chapter covers DRBD in combination with the legacy Linux-HA cluster manager found in Heartbeat 2.0 and 2.1. That cluster manager has been superseded by Pacemaker, which should be used whenever possible; see Chapter 8, Integrating DRBD with Pacemaker clusters, for more information. This chapter outlines legacy Heartbeat configurations and is intended for users who must maintain existing legacy Heartbeat systems for policy reasons.

The Heartbeat cluster messaging layer, a distinct part of the Linux-HA project that continues to be supported as of Heartbeat version 3, remains suitable for use in conjunction with the Pacemaker cluster manager. More information about configuring Heartbeat is available as part of the Linux-HA User's Guide at http://www.linux-ha.org/doc/.

Heartbeat primer

The Heartbeat cluster manager

Heartbeat's purpose as a cluster manager is to ensure that the cluster maintains its services to clients even if individual machines in the cluster fail. Applications that may be managed by Heartbeat as cluster services include, for example,

  • a web server such as Apache,

  • a database server such as MySQL, Oracle, or PostgreSQL,

  • a file server such as NFS or Samba, and many others.

In essence, any server application may be managed by Heartbeat as a cluster service.

Services managed by Heartbeat are typically removed from the system startup configuration; rather than being started at boot time, the cluster manager starts and stops them as required by the cluster configuration and status. If a machine (a physical cluster node) fails while running a particular set of services, Heartbeat will start the failed services on another machine in the cluster. These operations performed by Heartbeat are commonly referred to as (automatic) fail-over.
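
On most distributions, this means disabling the init script's boot-time activation while leaving the script itself in place, so that the cluster manager can still invoke it. A minimal sketch, assuming a hypothetical MySQL service and distribution-typical tools:

  # Red Hat or SUSE style: disable boot-time activation
  chkconfig mysql off
  # Debian style: remove the boot-time rc symlinks
  update-rc.d -f mysql remove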

A migration of cluster services from one cluster node to another, by manual intervention, is commonly termed "manual fail-over". This being a slightly self-contradictory term, we use the alternative term switch-over for the purposes of this guide.

Heartbeat is also capable of automatically migrating resources back to a previously failed node, as soon as the latter recovers. This process is called fail-back.

Heartbeat resources

Certain requirements usually have to be met before a cluster service managed by Heartbeat can be started on a node. Consider the example of a typical database-driven web application:

  • Both the web server and the database server assume that their designated IP addresses are available (i.e. configured) on the node.

  • The database will require a file system to retrieve data files from.

  • That file system will require its underlying block device to read from and write to (this is where DRBD comes in, as we will see later).

  • The web server will also depend on the database being started, assuming it cannot serve dynamic content without an available database.

The services Heartbeat controls, and any additional requirements those services depend on, are referred to as resources in Heartbeat terminology. Where resources form a co-dependent collection, that collection is called a resource group.
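
In a Heartbeat cluster running without a CRM, such a resource group is expressed as a single line in the /etc/ha.d/haresources file (the resource agent types and configuration files involved are explained in the next section). The following is a sketch only; the node name, DRBD resource name, device, mount point, and configuration path are hypothetical:

  bob 10.9.42.1 drbddisk::web Filesystem::/dev/drbd0::/var/lib/mysql::ext3 mysql apache::/etc/apache2/apache.conf

Heartbeat starts the members of such a group from left to right and stops them in the reverse order, which encodes exactly the dependency chain outlined above.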

Heartbeat resource agents

Heartbeat manages resources by invoking standardized shell scripts known as resource agents (RAs). In Heartbeat clusters, the following resource agent types are available:

  • Heartbeat resource agents. These agents are found in the /etc/ha.d/resource.d directory. They may take zero or more positional, unnamed parameters, and one operation argument (start, stop, or status). Heartbeat translates resource parameters it finds for a matching resource in /etc/ha.d/haresources into positional parameters for the RA, which then uses these to configure the resource (see the invocation sketch following this list).

  • LSB resource agents. These are conventional, Linux Standard Base-compliant init scripts found in /etc/init.d, which Heartbeat simply invokes with the start, stop, or status argument. They take no positional parameters. Thus, the corresponding resources' configuration cannot be managed by Heartbeat; these services are expected to be configured by conventional configuration files.

  • OCF resource agents. These are resource agents that conform to the guidelines of the Open Cluster Framework, and they only work with clusters in CRM mode. They are usually found in either /usr/lib/ocf/resource.d/heartbeat or /usr/lib64/ocf/resource.d/heartbeat, depending on system architecture and distribution. They take no positional parameters, but may be extensively configured via environment variables that the cluster management process derives from the cluster configuration, and passes in to the resource agent upon invocation.
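
The following sketch illustrates how the first and third agent classes receive their configuration, using manual invocations of the kind the cluster manager would otherwise construct itself. The device, mount point, IP address, and paths are hypothetical and may vary by distribution:

  # Heartbeat RA: configuration arrives as positional parameters
  /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start

  # OCF RA: configuration arrives as OCF_RESKEY_* environment variables
  export OCF_ROOT=/usr/lib/ocf
  export OCF_RESKEY_ip=10.9.42.1
  export OCF_RESKEY_cidr_netmask=24
  /usr/lib/ocf/resource.d/heartbeat/IPaddr2 start

Invoking an agent by hand in this fashion is chiefly useful for testing; in normal operation, the cluster manager performs these invocations itself.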

Heartbeat communication channels

Heartbeat uses a UDP-based communication protocol to periodically check for node availability (the "heartbeat" proper). For this purpose, Heartbeat can use several communication methods, including:

  • IP multicast,

  • IP broadcast,

  • IP unicast,

  • serial line.

Of these, IP multicast and IP broadcast are the most relevant in practice. The absolute minimum requirement for stable cluster operation is two independent communication channels.
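
Expressed in Heartbeat's /etc/ha.d/ha.cf configuration file, two independent channels might look like the following sketch; the interface name, multicast group, and serial device are hypothetical:

  # /etc/ha.d/ha.cf (excerpt)
  mcast eth0 239.0.0.43 694 1 0   # channel 1: IP multicast (device, group, port, ttl, loop)
  serial /dev/ttyS0               # channel 2: direct null-modem serial link
  baud 19200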


A bonded network interface (a virtual aggregation of physical interfaces using the bonding driver) constitutes only one Heartbeat communication channel.

Bonded links do not protect against bugs, known or as yet unknown, in the bonding driver itself. Moreover, a bond is typically formed from identical network interface models, so it is equally vulnerable to a single bug in the NIC driver. Any such issue could lead to a cluster partition if no independent second Heartbeat communication channel were available.

It is thus not acceptable to omit a second Heartbeat link from the cluster configuration merely because the first one uses a bonded interface.
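
For example, a configuration that pairs a bonded interface with an independent second channel might include ha.cf directives like these (interface names are again hypothetical):

  mcast bond0 239.0.0.43 694 1 0  # channel 1: multicast over the bond
  bcast eth2                      # channel 2: broadcast on a separate physical NIC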