Managing OpenStack services on HA Controller Nodes
OpenStack services on HA controller nodes are managed by Pacemaker. If you need to manage the services, you can place an HA controller node in standby mode to move its services to another node. You can also place an entire set of HA controllers into standby mode, for example, when you are preparing for a site power outage. If necessary, you can instruct Pacemaker to stop managing a specific resource.
About this task
Under normal conditions, all services on an HA controller are stopped as a set by telling Pacemaker to put the HA controller node into standby mode. Services are started as a set by telling Pacemaker to take the HA controller out of standby mode. While it runs, Pacemaker monitors individual services, restarts them after failures, and moves them between nodes as required. When Pacemaker reports a failure, it attempts to recover from the failure and clear it. Recovery can take anywhere from a few seconds to an hour, depending on the failure. If the problem continues, the failure might require further investigation and manual intervention to resolve.
For HA controller nodes with internal DB2® installed, one HA controller node runs the virtual IP service, HAProxy service, and the master role for the IBM DB2 high availability disaster recovery (HADR) service. This HA controller node is the HADR primary, and is also referred to in this article as the primary HA controller. The node that is running these services changes in response to failures, or management actions like updating a node and shutting down nodes. This node has the most current database contents and also has the most current RabbitMQ message queue contents if it is the last node that is shut down.
$ knife os manage services standby --node HA_NODE_FQDN
Placing an HA controller node in standby mode, by using the standby action of the IBM Cloud Manager with OpenStack services command, instructs the HA components on that node to stop any running OpenStack services. Any OpenStack services that are found active are stopped and, if necessary, moved to another HA controller node. Using the standby action is the preferred way to move the DB2 HADR primary, the virtual IP, and any other Active/Passive resources off a specific node. Active/Passive means that a resource runs on only one node at a time. After the resource or resources that you want to move are off the node, use the knife os manage services unstandby command to allow Pacemaker to restart services on this node and resume managing them. For more information about the unstandby action, see the details later in this document.
$ pcs cluster standby HA_NODE_FQDN
$ knife os manage services unstandby --node HA_NODE_FQDN
Bringing an HA controller node out of standby mode, by using the unstandby action of the IBM Cloud Manager with OpenStack services command, allows the node to run OpenStack services again if they are stopped. However, the node does not automatically take over as the active node for Active/Passive OpenStack services.
The entire set of HA controllers can be put into standby mode. This method might be used, for example, when you are preparing for a site power outage. Putting an entire set of HA controllers in standby mode is different from putting nodes into standby one by one: Pacemaker puts the entire cluster (all HA controllers) into standby, so services do not need to be moved from node to node.
$ knife os manage services standby --topology-file your_topology_file
The standby --topology-file command returns before all the nodes are put in standby. Before you take other actions, use the knife os manage services status --topology-file your_topology_file command that is described later to monitor the services and wait until all services on the controller nodes are stopped.
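The wait described above could be scripted roughly as follows. This is a sketch, not part of the product: the function names count_active and wait_for_standby are made up for illustration, and it assumes that the --topology-file status output uses the same per-resource line format as the sample --node output in this document, where a running resource shows Started or Master after its agent name.

```shell
# Count resource lines that still report a running state.
# Assumption: running resources look like "... (ocf::agent): Started node"
# or "... (ocf::agent): Master node", as in the sample status output.
count_active() {
    grep -cE '\): +(Started|Master)( |$)' || true
}

# Poll the status command until no resource is reported as running.
# Usage: wait_for_standby TOPOLOGY_FILE [INTERVAL_SECONDS]
wait_for_standby() {
    topology_file=$1
    interval=${2:-30}
    while :; do
        # Stop if the status command itself fails, so that empty output
        # is not mistaken for "everything is stopped".
        output=$(knife os manage services status --topology-file "$topology_file") || return 1
        active=$(printf '%s\n' "$output" | count_active)
        [ "$active" -eq 0 ] && break
        echo "$active resources still running; checking again in ${interval}s"
        sleep "$interval"
    done
}
```

For example, wait_for_standby your_topology_file 60 would check once a minute. Review the real status output once by hand before relying on the pattern match.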
$ knife os manage services unstandby --topology-file your_topology_file
$ knife os manage services status --node HA_NODE_FQDN
controller1.example.com Full list of resources:
controller1.example.com
controller1.example.com ibm-os-virtualip (ocf::heartbeat:IPaddr2): Started controller2.example.com
controller1.example.com ibm-os-haproxy (ocf::ibm-openstack:haproxy_agent): Started controller2.example.com
controller1.example.com Master/Slave Set: ibm-os-db2hadr-master [ibm-os-db2hadr]
controller1.example.com ibm-os-db2hadr (ocf::ibm-openstack:db2hadr): Master controller2.example.com
controller1.example.com ibm-os-db2hadr (ocf::ibm-openstack:db2hadr): Started controller1.example.com
controller1.example.com ibm-os-db2hadr (ocf::ibm-openstack:db2hadr): Started controller3.example.com
controller1.example.com Masters: [ controller2.example.com ]
controller1.example.com Slaves: [ controller1.example.com ]
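To find which node currently holds a given role, such as the DB2 HADR master or the virtual IP, the status output can be filtered. The following is a minimal sketch that assumes the line format of the sample output above; node_for is a hypothetical helper name, not part of the product.

```shell
# Print the node on which RESOURCE is reported in STATE.
# Reads status output on stdin.
# Usage: node_for RESOURCE STATE
# Assumption: resource lines end with "RESOURCE (ocf::agent): STATE NODE",
# so the fields are counted from the end of the line.
node_for() {
    awk -v res="$1" -v state="$2" '
        NF >= 4 && $(NF-3) == res && $(NF-1) == state { print $NF; exit }
    '
}
```

For example, piping knife os manage services status --node HA_NODE_FQDN into node_for ibm-os-db2hadr Master would print the node that is the HADR primary, and node_for ibm-os-virtualip Started would print the node that runs the virtual IP.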
$ pcs resource unmanage resource-name
Run this command from only one of the controller nodes; Pacemaker then stops managing the resource on all nodes. After Pacemaker stops managing the resource, you can start and stop the service manually without Pacemaker interfering.
$ pcs resource manage resource-name
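The unmanage and manage commands can be paired so that a resource is always returned to Pacemaker after manual work. The following sketch wraps an arbitrary maintenance command between the two; with_unmanaged and the PCS override variable are hypothetical names introduced here, and the maintenance command itself is whatever you would otherwise run by hand.

```shell
# Run a manual command while Pacemaker is not managing RESOURCE,
# then hand the resource back to Pacemaker.
# Usage: with_unmanaged RESOURCE COMMAND [ARGS...]
# PCS can be set to another command (for example, echo) for a dry run.
with_unmanaged() {
    resource=$1
    shift
    ${PCS:-pcs} resource unmanage "$resource" || return 1
    "$@"                  # the manual maintenance step
    status=$?
    # Return the resource to Pacemaker even if the manual step failed.
    ${PCS:-pcs} resource manage "$resource" || return 1
    return "$status"
}
```

Running it with PCS=echo first prints the pcs subcommands that would be issued without changing the cluster.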