Configuring split brain behavior

Split brain notification

DRBD invokes the split-brain handler, if configured, whenever split brain is detected. To configure this handler, add the following item to your resource configuration:

resource resource {
  handlers {
    split-brain handler;
    ...
  }
  ...
}

handler may be any executable present on the system.
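
As a rough illustration of a custom handler, the following Python sketch records the event in syslog. It is not part of the DRBD distribution. It assumes that DRBD exports the name of the affected resource to the handler in the DRBD_RESOURCE environment variable; verify this against your DRBD version before relying on it.

```python
#!/usr/bin/env python3
"""Hypothetical custom split-brain handler that logs the event to syslog."""
import os
import syslog


def build_message(environ):
    # Assumption: DRBD sets DRBD_RESOURCE in the handler's environment.
    # Fall back to "unknown" so the handler still logs something useful.
    resource = environ.get("DRBD_RESOURCE", "unknown")
    return "DRBD split brain detected on resource %s" % resource


def main():
    # Tag the entries so they are easy to find in the system log.
    syslog.openlog("drbd-split-brain")
    syslog.syslog(syslog.LOG_ERR, build_message(os.environ))


if __name__ == "__main__":
    main()
```

Saved as an executable file at a location of your choosing, such a script would be named in place of handler in the handlers section shown above.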

Since DRBD version 8.2.6, the DRBD distribution contains a split brain handler script that installs as /usr/lib/drbd/notify-split-brain.sh. It simply sends a notification e-mail message to a specified address. To configure the handler to send a message to root@localhost (which is expected to be an email address that forwards the notification to a real system administrator), configure the split-brain handler as follows:

resource resource {
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    ...
  }
  ...
}

After you have made this modification on a running resource (and synchronized the configuration file between nodes), no additional intervention is needed to enable the handler. DRBD will simply invoke the newly configured handler on the next occurrence of split brain.

Automatic split brain recovery policies

DRBD offers several configuration options for enabling and configuring its automatic split brain recovery policies. DRBD applies its split brain recovery procedures based on the number of nodes in the Primary role at the time the split brain is detected. To that end, DRBD examines the following keywords, all found in the resource's net configuration section:

  • after-sb-0pri: Split brain has just been detected, but at this time the resource is not in the Primary role on any host. For this option, DRBD understands the following keywords:

    • disconnect: Do not recover automatically, simply invoke the split-brain handler script (if configured), drop the connection and continue in disconnected mode.

    • discard-younger-primary: Discard and roll back the modifications made on the host which assumed the Primary role last.

    • discard-least-changes: Discard and roll back the modifications on the host where fewer changes occurred.

    • discard-zero-changes: If there is any host on which no changes occurred at all, simply apply all modifications made on the other and continue.

  • after-sb-1pri: Split brain has just been detected, and at this time the resource is in the Primary role on one host. For this option, DRBD understands the following keywords:

    • disconnect: As with after-sb-0pri, simply invoke the split-brain handler script (if configured), drop the connection and continue in disconnected mode.

    • consensus: Apply the same recovery policies as specified in after-sb-0pri. If a split brain victim can be selected after applying these policies, automatically resolve. Otherwise, behave exactly as if disconnect were specified.

    • call-pri-lost-after-sb: Apply the recovery policies as specified in after-sb-0pri. If a split brain victim can be selected after applying these policies, invoke the pri-lost-after-sb handler on the victim node. This handler must be configured in the handlers section and is expected to forcibly remove the node from the cluster.

    • discard-secondary: Whichever host is currently in the Secondary role, make that host the split brain victim.

  • after-sb-2pri: Split brain has just been detected, and at this time the resource is in the Primary role on both hosts. This option accepts the same keywords as after-sb-1pri except, of course, discard-secondary.

Note

DRBD understands additional keywords for these three options, which have been omitted here because they are very rarely used. Refer to drbd.conf(5) for details on split brain recovery keywords not discussed here.
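
The after-sb-0pri keywords described above can be read as rules for picking the split brain victim, that is, the host whose modifications are given up. The Python sketch below models those rules for two hosts. It is an illustrative model only, not DRBD code: the node names, the changes and primary_since fields, and the tie-breaking behavior are all assumptions made for the example, as is the choice to attempt no automatic recovery when discard-zero-changes finds changes on both hosts.

```python
def choose_victim(keyword, nodes):
    """Return the name of the host whose changes are given up, or None.

    nodes maps a host name to a dict with two hypothetical fields:
      changes       - number of modifications made during the split
      primary_since - timestamp at which the host last became Primary
    None means no victim could be selected automatically.
    """
    if keyword == "disconnect":
        # No automatic recovery is attempted.
        return None
    if keyword == "discard-zero-changes":
        # A host without any changes simply receives the other host's data.
        idle = [name for name in nodes if nodes[name]["changes"] == 0]
        # Assumption: if both hosts changed data, nothing is resolved.
        return idle[0] if idle else None
    if keyword == "discard-least-changes":
        # Assumption: on a tie, the first host listed is picked.
        return min(nodes, key=lambda name: nodes[name]["changes"])
    if keyword == "discard-younger-primary":
        # The host that assumed the Primary role last loses its changes.
        return max(nodes, key=lambda name: nodes[name]["primary_since"])
    raise ValueError("keyword not covered by this sketch: %s" % keyword)


# Hypothetical state of two hosts at the time split brain is detected.
example_nodes = {
    "alpha": {"changes": 0, "primary_since": 100.0},
    "bravo": {"changes": 12, "primary_since": 200.0},
}
```

With this example state, discard-zero-changes and discard-least-changes both select alpha, while discard-younger-primary selects bravo.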

For example, a resource which serves as the block device for a GFS or OCFS2 file system in dual-Primary mode may have its recovery policy defined as follows:

resource resource {
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    ...
  }
  net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    ...
  }
  ...
}
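
To make the role of the three options concrete, the following Python sketch models which option is consulted for a given number of Primary nodes, using the values from the example above. It is an illustrative model, not DRBD code; the function and variable names are hypothetical, and the fallback to disconnect for an unset option is an assumption that should be checked against drbd.conf(5).

```python
# The net-section settings from the dual-Primary example above.
NET_CONFIG = {
    "after-sb-0pri": "discard-zero-changes",
    "after-sb-1pri": "discard-secondary",
    "after-sb-2pri": "disconnect",
}


def policy_for(primary_count, net_config):
    """Return the (option, keyword) pair consulted for primary_count Primaries."""
    if primary_count not in (0, 1, 2):
        raise ValueError("a two-node resource has 0, 1 or 2 Primary nodes")
    option = "after-sb-%dpri" % primary_count
    # Assumption: an option that is not set behaves like "disconnect".
    return option, net_config.get(option, "disconnect")


# Show the policy that applies in each of the three situations.
for count in (0, 1, 2):
    print(count, *policy_for(count, NET_CONFIG))
```

In this configuration, only a split brain with both nodes in the Primary role is left entirely to manual recovery.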