When DRBD detects that its peer node is down (whether through genuine hardware failure or manual intervention), it changes its connection state from Connected to WFConnection and waits for the peer node to reappear. The DRBD resource is then said to operate in disconnected mode. In disconnected mode, the resource and its associated block device are fully usable, and may be promoted and demoted as necessary, but no block modifications are replicated to the peer node. Instead, DRBD records internally which blocks are modified while disconnected.
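
As an illustration of how a monitoring script might recognize disconnected mode, the following minimal sketch parses one per-device status line in the style of a DRBD 8.x /proc/drbd. The line layout (the `cs:`, `ro:` or `st:`, and `ds:` fields) is an assumption that varies between DRBD versions, and the helper names `parse_status_line` and `is_disconnected` are hypothetical.

```python
import re

# Pattern for a per-device status line in the style of DRBD 8.x /proc/drbd, e.g.
# " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r---".
# Assumption: the exact layout differs between DRBD versions.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"(?:\s+(?:ro|st):(?P<local_role>[^/\s]+)/(?P<peer_role>\S+))?"
    r"(?:\s+ds:(?P<local_disk>[^/\s]+)/(?P<peer_disk>\S+))?"
)


def parse_status_line(line):
    """Return a dict of the fields in one device status line, or None."""
    match = STATUS_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["minor"] = int(fields["minor"])
    return fields


def is_disconnected(fields):
    """True when the resource is waiting for its peer (disconnected mode)."""
    return fields["cs"] == "WFConnection"


# Sample line written by hand for this sketch, not captured from a real system.
sample = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r---"
status = parse_status_line(sample)
print(status["cs"], is_disconnected(status))
```

On a live system, `drbdadm cstate` followed by the resource name reports the connection state directly, which avoids parsing altogether.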
If a node that currently has a resource in the secondary role fails temporarily (for example, because of a memory problem that is subsequently fixed by replacing RAM), no intervention is necessary beyond repairing the failed node and bringing it back online. When that happens, the two nodes simply re-establish connectivity upon system start-up. DRBD then replicates to the secondary node all modifications made on the primary node in the meantime.
| Important |
|---|
| At this point, due to the nature of DRBD's re-synchronization algorithm, the resource is briefly inconsistent on the secondary node. During that short time window, the secondary node cannot switch to the Primary role if the peer is unavailable. Thus, the period in which your cluster is not redundant consists of the actual secondary node downtime, plus the subsequent re-synchronization. |
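
The length of that non-redundant period can be estimated with simple arithmetic. The sketch below assumes that re-synchronization time is roughly the amount of data changed while disconnected divided by the sustained synchronization throughput; this is a simplification, and the function name and all figures are hypothetical.

```python
def non_redundant_seconds(downtime_s, changed_bytes, sync_rate_bytes_per_s):
    """Estimate how long the cluster runs without redundancy.

    Adds the secondary node's downtime to a rough re-synchronization time,
    modelled as (data changed while disconnected) / (sync throughput).
    """
    if sync_rate_bytes_per_s <= 0:
        raise ValueError("sync rate must be positive")
    resync_s = changed_bytes / sync_rate_bytes_per_s
    return downtime_s + resync_s


MiB = 1024 ** 2
GiB = 1024 ** 3

# Hypothetical figures: 2 hours of downtime, 6 GiB changed, 30 MiB/s sync rate.
total = non_redundant_seconds(2 * 3600, 6 * GiB, 30 * MiB)
print(f"{total:.1f} s without redundancy")
```

With these figures the re-synchronization adds a little over three minutes to the two hours of downtime, so the downtime itself dominates unless a large share of the device changed while the peer was away.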
From DRBD's standpoint, failure of the primary node is almost identical to a failure of the secondary node. The surviving node detects the peer node's failure, and switches to disconnected mode. DRBD does not promote the surviving node to the primary role; it is the cluster management application's responsibility to do so.
When the failed node is repaired and returns to the cluster, it does so in the secondary role; thus, as outlined in the previous section, no further manual intervention is necessary. Again, DRBD does not change the resource role back; it is up to the cluster manager to do so (if so configured).
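
Because promotion is left to the cluster manager, the manager needs some rule for deciding whether a node may take over. The following is a deliberately simplified sketch of such a rule, not DRBD's or any cluster manager's actual logic. It assumes the disk state names UpToDate and Inconsistent, and the function `may_promote` is hypothetical.

```python
def may_promote(local_disk_state, peer_reachable):
    """Simplified check of whether a node may be promoted to primary.

    UpToDate: the local data is current, so promotion does not need the peer.
    Inconsistent: a re-synchronization is incomplete, so promotion is only
    possible while the peer is reachable.
    Any other state: refuse, to stay conservative in this sketch.
    """
    if local_disk_state == "UpToDate":
        return True
    if local_disk_state == "Inconsistent":
        return peer_reachable
    return False


# A surviving node with current data can take over after its peer fails.
print(may_promote("UpToDate", peer_reachable=False))
# A node that is still re-synchronizing cannot, if its peer disappears.
print(may_promote("Inconsistent", peer_reachable=False))
```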
DRBD ensures block device consistency after a primary node failure by means of its activity log. For a detailed discussion, refer to the section called “The Activity Log”.
If a node suffers an unrecoverable problem or permanent destruction, take the following steps:
1. Replace the failed hardware with hardware of similar performance and disk capacity.

| Note |
|---|
| Replacing a failed node with one with worse performance characteristics is possible, but not recommended. Replacing a failed node with one with less disk capacity is not supported, and will cause DRBD to refuse to connect to the replacement node. |

2. Install the base system and applications.
3. Install DRBD and copy /etc/drbd.conf from the surviving node.
4. Follow the steps outlined in Chapter 5, Configuring DRBD, but stop short of the section called “The initial device synchronization”.
Manually starting a full device synchronization is not necessary at this point; it commences automatically upon connection to the surviving primary node.
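
The DRBD-specific part of this procedure can be sketched as a short dry-run script. It assumes that the configuration steps referred to above come down to `drbdadm create-md` followed by `drbdadm up`, which may not match every DRBD version, and it uses `r0` as a placeholder resource name. The helper functions are hypothetical, and nothing is executed unless `dry_run` is set to `False`.

```python
import subprocess


def replacement_node_commands(resource):
    """Commands to bring up DRBD on a replacement node.

    Deliberately omits any step that forces an initial synchronization.
    """
    return [
        ["drbdadm", "create-md", resource],  # initialize DRBD metadata
        ["drbdadm", "up", resource],         # attach the disk, connect to the peer
    ]


def bring_up(resource, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in replacement_node_commands(resource):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


bring_up("r0")  # "r0" is a placeholder resource name
```

Run these only after /etc/drbd.conf has been copied from the surviving node, since `drbdadm` reads the resource definition from that file.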