Forum Discussion

merrittr
Contributor II
22 days ago

Putting a 2-node SL1 HA database environment in maintenance

I have two databases in an HA setup.

In coro_config I see

 Enable Cluster Maintenance

as one of the choices.

Can I select this on both databases so that they are in a "non-writable" state, and then take VMware snapshots before doing upgrades?

8 Replies

  • Hello Rob,

    There are two different points being brought up here that I will address individually. When running the command coro_config on a CDB that is on Oracle Linux 8, the option Enable Cluster Maintenance places the entire cluster in maintenance mode, meaning you only have to run this on one of the nodes and it will place the entire cluster into maintenance. If the CDBs are still on Oracle Linux 7, then you would have to run this on each node you want placed in maintenance. 
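    A small sketch of that rule, in case it helps to script the check. It reads VERSION_ID from the standard /etc/os-release file; the function name maintenance_scope and its output strings are made up for illustration, and it only encodes the Oracle Linux 7 vs. 8 behavior described above.

```shell
#!/bin/sh
# Hypothetical helper: maps an OS version string (e.g. "8.9") to how
# Enable Cluster Maintenance needs to be applied, per the reply above.
maintenance_scope() {
  major="${1%%.*}"   # keep only the major version, "8.9" -> "8"
  case "$major" in
    8) echo "run coro_config on one node (cluster-wide)" ;;
    7) echo "run coro_config on each node" ;;
    *) echo "unknown version: check with support" ;;
  esac
}

# Read the local version in a subshell so os-release variables do not leak.
if [ -r /etc/os-release ]; then
  ( . /etc/os-release; maintenance_scope "${VERSION_ID:-}" )
fi
```

    Run it on each CDB; if the nodes report different major versions, treat the result as a prompt to check with support rather than an answer.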

    As for doing this before running a VMware snapshot before an upgrade, we do not support running VMware snapshots on any SL1 appliance. The only supported backup process is the Config and Full Backups built into SL1. It is also unnecessary to place the cluster into maintenance before an upgrade.

    Antonio Andres

    Principal Technical Support Engineer | ScienceLogic

    Savage
    Contributor III

    Because of the way InnoDB works, this may leave things in a weird state. Putting the cluster in maintenance mode does not stop services; it only stops the cluster's control of the services. That lets you bring services down if you need to, or prevent a failover while you perform certain actions at the OS layer.

    I believe snapshots are not supported, but if you really need one, say before you do something major, I would suggest the following as an extra precaution to help ensure you get a clean snapshot of MariaDB with the InnoDB engine.

    1. Put the cluster in maintenance mode
    2. Stop Pacemaker on the secondary node
    3. Shut down the primary node
    4. Take the snapshot of the primary node while it is powered off
    5. Power the primary node back on
    6. Set DRBD back to primary on the primary node
    7. Disable maintenance mode
    8. Let services come back up on the old primary node
    9. Start Pacemaker again on the secondary node

    This should preserve the state of the cluster from before you made the snapshot. If you really need to revert, you know that MariaDB was 100% in a non-writable state, since the VM was off. That way you don't have to deal with redo logs and recovery mode, which can take forever depending on the size of the database. At that point, though, it may be easier to take a temporary clone of the VM instead if you are looking for a last fallback option in case everything goes wrong.
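    As a rough sketch, the nine steps can be written out as a printed checklist with the commands that are known. This only prints text; it does not touch the cluster. The systemctl and shutdown commands are standard, and drbdadm primary is the usual DRBD command for promoting a resource, but the resource name r0 is an assumption (check your own DRBD configuration), and the function name is made up for illustration. The coro_config steps are left as menu actions because it is an interactive tool.

```shell
#!/bin/sh
# Prints a checklist only; nothing here is executed against the cluster.
# DRBD_RESOURCE is a placeholder assumption; confirm your resource name.
DRBD_RESOURCE="${DRBD_RESOURCE:-r0}"

print_snapshot_runbook() {
  cat <<EOF
1. one node:  coro_config, then Enable Cluster Maintenance (each node on Oracle Linux 7)
2. secondary: sudo systemctl stop pacemaker
3. primary:   sudo shutdown -h now
4. VMware:    take the snapshot while the primary VM is powered off
5. VMware:    power the primary VM back on
6. primary:   sudo drbdadm primary ${DRBD_RESOURCE}
7. one node:  coro_config, then disable cluster maintenance
8. primary:   wait for services to come back up
9. secondary: sudo systemctl start pacemaker
EOF
}

print_snapshot_runbook
```

    Keeping it as a printed checklist rather than an automated script is deliberate: steps 3 to 5 happen outside the guest OS, so each step needs a human to confirm it before moving on.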

    You will also take an I/O hit on your drive when you go to remove the snapshot if the VM is online. If you are already I/O bound when fully running, you may get errors and potentially slower-than-normal write performance.

    I would still suggest a Full Backup as well if you can afford the wait time, since a restore is always an option if it needs to be done.

    Disclaimer: You should always test your restore plan in a controlled lab environment first if you can.

    Savage 

    merrittr
    Contributor II

    That is good info, thanks!

    So step 1, "Put Cluster in maintenance mode", is done with

    coro_config > Enable Cluster Maintenance

    Right?

    What is the DRBD process to set this node to primary again?

     

      TonyAndres
      Moderator

      Hello Rob,

      Yes, selecting option 1 after running coro_config will put the entire cluster in maintenance.

      To fail over, go to the current primary node and run sudo systemctl stop pacemaker and then sudo systemctl start pacemaker. Keep in mind that a failover will not occur while the cluster is in maintenance, so cluster maintenance has to be disabled during the failover.
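      For reference, here is a minimal sketch of those two commands wrapped in a dry-run guard, so the sequence can be reviewed before it is run for real. The DRY_RUN variable and the run and failover_from_primary names are made up for illustration; only the two systemctl commands come from the reply above. It is meant to be run on the current primary, with cluster maintenance disabled.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of executing it.
# Set DRY_RUN=0 to actually run the commands (hypothetical switch).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Run on the CURRENT primary node; stopping Pacemaker there triggers the
# failover, and starting it again lets the node rejoin the cluster.
failover_from_primary() {
  run sudo systemctl stop pacemaker
  run sudo systemctl start pacemaker
}

failover_from_primary
```

      With the default settings this only prints the two commands; nothing is stopped or started until DRY_RUN=0 is set.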

      Antonio Andres

      Principal Technical Support Engineer | ScienceLogic

        merrittr
        Contributor II

        OK, so basically stop and start Pacemaker on the server I want as primary.