Using DRBD in conjunction with the OpenAIS/Pacemaker cluster stack is arguably DRBD's most frequently found use case. Pacemaker is also one of the applications that make DRBD extremely powerful in a wide variety of usage scenarios. Hence, this is one of the more detailed chapters in this guide.
This chapter describes using DRBD as replicated storage for Pacemaker High Availability clusters.
Important: This chapter is relevant for Pacemaker versions 1.0.3 and above, and DRBD version 8.3.2 and above. It does not touch upon DRBD configuration in Pacemaker clusters of earlier versions.
Note: OpenAIS/Pacemaker is the direct, logical successor to the Heartbeat 2 cluster stack, and as far as the cluster resource manager infrastructure is concerned, a direct continuation of the Heartbeat 2 codebase. Since the initial stable release of OpenAIS/Pacemaker, Heartbeat 2 can be considered obsolete and Pacemaker should be used instead. For legacy configurations where Heartbeat must still be used, see Chapter 9, Integrating DRBD with Heartbeat clusters.
Pacemaker's purpose as a cluster manager is to ensure that the cluster keeps its services available to clients, even if individual machines in the cluster fail. Applications that may be managed by Pacemaker as cluster services include, for example,
a web server such as Apache,
a database server such as MySQL, Oracle, or PostgreSQL,
a file server such as NFS or Samba, and many others.
In essence, any server application may be managed by Pacemaker as a cluster service.
Services managed by Pacemaker are typically removed from the system startup configuration; rather than being started at boot time, the cluster manager starts and stops them as required by the cluster configuration and status. If a machine (a physical cluster node) fails while running a particular set of services, Pacemaker will start the failed services on another machine in the cluster. These operations performed by Pacemaker are commonly referred to as (automatic) fail-over.
Moving cluster services from one cluster node to another by manual intervention is commonly termed "manual fail-over". Since this is a slightly self-contradictory term, Pacemaker terminology refers to such an action as a resource migration, or simply migration for short. We use that same terminology throughout this chapter.
Pacemaker is also capable of automatically migrating resources back to a previously failed node, as soon as the latter recovers (if this is desired). This process is called fail-back.
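The behavior described above can be illustrated with a toy placement model. This is only a sketch, not Pacemaker's actual placement algorithm; the function `place_services`, the node names `alice` and `bob`, and the `failback` flag are hypothetical names chosen for illustration.

```python
def place_services(preferred, online, current=None, failback=True):
    """Return a service -> node mapping for the nodes that are online.

    preferred maps each service to its candidate nodes, best first.
    current is the previous placement (hypothetical bookkeeping).
    With failback=False a service stays where it is while that node is up.
    """
    current = current or {}
    placement = {}
    for service, nodes in preferred.items():
        here = current.get(service)
        if not failback and here in online:
            placement[service] = here  # stay put, no fail-back
            continue
        # First preferred node that is online, or None if none is left.
        placement[service] = next((n for n in nodes if n in online), None)
    return placement


preferred = {"apache": ["alice", "bob"], "mysql": ["alice", "bob"]}

normal = place_services(preferred, online={"alice", "bob"})
# alice fails: both services are started on bob (fail-over).
failed = place_services(preferred, online={"bob"}, current=normal)
# alice recovers: services move back only if fail-back is desired.
back = place_services(preferred, {"alice", "bob"}, failed, failback=True)
stay = place_services(preferred, {"alice", "bob"}, failed, failback=False)
```

With `failback=True` the services return to `alice` once it is online again; with `failback=False` they remain on `bob`, which mirrors the "if this is desired" qualification in the text.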
Usually, there will be certain requirements in order to be able to start a cluster service managed by Pacemaker on a node. Consider the example of a typical database-driven web application:
Both the web server and the database server assume that their designated IP addresses are available (i.e. configured) on the node.
The database will require a file system to retrieve data files from.
That file system will require its underlying block device to read from and write to (this is where DRBD comes in, as we will see later).
The web server will also depend on the database being started, assuming it cannot serve dynamic content without an available database.
The services Pacemaker controls, and any additional requirements those services depend on, are referred to as resources in Pacemaker terminology. Where resources form a co-dependent collection, that collection is called a resource group.
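The requirements in the web application example can be viewed as a dependency graph, from which a valid start order follows. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) purely as an illustration; the resource names are hypothetical, and this is not how Pacemaker itself is configured.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources that must be started first.
# The names are illustrative labels for the example in the text.
requires = {
    "ip_address": set(),
    "drbd_device": set(),
    "filesystem": {"drbd_device"},            # file system needs its block device
    "database": {"filesystem", "ip_address"},  # database needs data files and its IP
    "web_server": {"database", "ip_address"},  # web server needs the database
}

# static_order() yields every resource after all of its prerequisites.
start_order = list(TopologicalSorter(requires).static_order())

# Stopping in the opposite order never removes something still in use.
stop_order = list(reversed(start_order))

print("start:", start_order)
print("stop: ", stop_order)
```

Any order produced this way starts the block device before the file system, the file system before the database, and the database before the web server.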
Pacemaker manages resources by invoking standardized shell scripts known as resource agents (RAs). In Pacemaker clusters, the following resource agent types are available:
Heartbeat 1 compatible resource agents. These agents are commonly found in the /etc/ha.d/resource.d directory. They are supported in Pacemaker for compatibility reasons only, and should generally not be used in production clusters.
LSB resource agents. These are conventional, Linux Standard Base-compliant init scripts found in /etc/init.d, which Pacemaker simply invokes with the start, stop, or status argument. They take no positional parameters. Thus, the corresponding resources' configuration cannot be managed by Pacemaker; these services are expected to be configured by conventional configuration files.
OCF resource agents. These are resource agents that conform to the guidelines of the Open Cluster Framework. They are usually found in either /usr/lib/ocf/resource.d or /usr/lib64/ocf/resource.d, depending on system architecture and distribution. They are grouped by providers, where each provider corresponds to one subdirectory in the aforementioned directory. They take no positional parameters, but may be extensively configured via environment variables that the cluster management process derives from the cluster configuration, and passes in to the resource agent upon invocation.
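To make the environment-variable mechanism of OCF resource agents more concrete, here is a minimal sketch of how an agent might pick up its configuration. Real agents are usually shell scripts; this Python model is an illustration only. It assumes the OCF convention that resource parameters arrive as variables prefixed with `OCF_RESKEY_` and that the requested action is the only command-line argument. The functions `resource_params` and `handle`, the parameter `name`, and the in-memory `state` set are hypothetical.

```python
# Exit codes as defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

PREFIX = "OCF_RESKEY_"


def resource_params(environ):
    """Extract resource parameters from OCF_RESKEY_* environment variables."""
    return {k[len(PREFIX):]: v for k, v in environ.items() if k.startswith(PREFIX)}


def handle(action, environ, state):
    """Dispatch one agent action; state is a set standing in for a real service."""
    params = resource_params(environ)
    name = params.get("name", "example")  # hypothetical parameter
    if action == "start":
        state.add(name)
        return OCF_SUCCESS
    if action == "stop":
        state.discard(name)
        return OCF_SUCCESS
    if action == "monitor":
        return OCF_SUCCESS if name in state else OCF_NOT_RUNNING
    return OCF_ERR_UNIMPLEMENTED


# The cluster manager would export these variables before invoking the agent.
env = {"OCF_RESKEY_name": "demo", "PATH": "/usr/bin"}
running = set()
codes = [handle(a, env, running) for a in ("monitor", "start", "monitor", "stop")]
```

An LSB init script, by contrast, would receive only the action argument and would have to read its settings from its own configuration files.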
OpenAIS uses a UDP multicast-based communication protocol to periodically check for node availability. This communication protocol may be configured to use multiple network paths using the Redundant Ring Protocol (RRP). The absolute minimum requirement for stable cluster operation is two independent communication channels in a redundant ring.