SYNOPSIS

hroller {backend options...} [algorithm options...] [reporting options...]

hroller --version

Backend options:

{ -m cluster | -L[ path ] | -t data-file | -I path }

[ --force ]

Algorithm options:

[ -G name ] [ -O name... ] [ --node-tags tag,.. ] [ --skip-non-redundant ]

[ --offline-maintenance ] [ --ignore-non-redundant ]

Reporting options:

[ -v... | -q ] [ -S file ] [ --one-step-only ] [ --print-moves ]

DESCRIPTION

hroller is a cluster maintenance reboot scheduler. It can calculate which set of nodes can be rebooted at the same time while avoiding having both primary and secondary nodes being rebooted at the same time.

For backends that support identifying the master node (currently RAPI and LUXI), the master node is scheduled as the last node in the last reboot group. Apart from this restriction, larger reboot groups are put first.

ALGORITHM FOR CALCULATING OFFLINE REBOOT GROUPS

hroller will view the nodes as vertices of an undirected graph, with two kind of edges. Firstly, there are edges from the primary to the secondary node of every instance. Secondly, two nodes are connected by an edge if they are the primary nodes of two instances that have the same secondary node. It will then color the graph using a few different heuristics, and return the minimum-size color set found. Node with the same color can then simultaneously migrate all instance off to their respective secondary nodes, and it is safe to reboot them simultaneously.

OPTIONS

For a description of the standard options check htools(1) and hbal(1).

--force

Do not fail, even if the master node cannot be determined.

--node-tags tag,...

Restrict to nodes having at least one of the given tags.

--full-evacuation

Also plan moving secondaries out of the nodes to be rebooted. For each instance the move is at most a migrate (if it was primary on that node) followed by a replace secondary.

--skip-non-redundant

Restrict to nodes not hosting any non-redundant instance.

--offline-maintenance

Pretend that all instances are shutdown before the reboots are carried out. I.e., only edges from the primary to the secondary node of an instance are considered.

--ignore-non-redundnant

Pretend that the non-redundant instances do not exist, and only take instances with primary and secondary node into account.

--one-step-only

Restrict to the first reboot group. Output the group one node per line.

--print-moves

After each group list for each affected instance a node where it can be evacuated to. The moves are computed under the assumption that after each reboot group, all instances are moved back to their initial position.

BUGS

If instances are online the tool should refuse to do offline rolling maintenances, unless explicitly requested.

End-to-end shelltests should be provided.

EXAMPLES

Online Rolling reboots, using tags

Selecting by tags and getting output for one step only can be used for planing the next maintenance step.

  • $ hroller --node-tags needsreboot --one-step-only -L
    \[aq]First Reboot Group\[aq]
     node1.example.com
     node3.example.com
    
    

Typically these nodes would be drained and migrated.

  • $ GROUP=`hroller --node-tags needsreboot --one-step-only --no-headers -L`
    $ for node in $GROUP; do gnt-node modify -D yes $node; done
    $ for node in $GROUP; do gnt-node migrate -f --submit $node; done
    
    

After maintenance, the tags would be removed and the nodes undrained.

Offline Rolling node reboot output

If all instances are shut down, usually larger node groups can be found.

  • $ hroller --offline-maintainance -L
    \[aq]Node Reboot Groups\[aq]
    node1.example.com,node3.example.com,node5.example.com
    node8.example.com,node6.example.com,node2.example.com
    node7.example.com,node4.example.com
    
    

Rolling reboots with non-redundant instances

By default, hroller plans capacity to move the non-redundant instances out of the nodes to be rebooted. If requested, apropriate locations for the non-redundant instances can be shown. The assumption is that instances are moved back to their original node after each reboot; these back moves are not part of the output.

  • $ hroller --print-moves -L
    \[aq]Node Reboot Groups\[aq]
    node-01-002,node-01-003
      inst-20 node-01-001
      inst-21 node-01-000
      inst-30 node-01-005
      inst-31 node-01-004
    node-01-004,node-01-005
      inst-40 node-01-001
      inst-41 node-01-000
      inst-50 node-01-003
      inst-51 node-01-002
    node-01-001,node-01-000
      inst-00 node-01-002
      inst-01 node-01-003
      inst-10 node-01-005
      inst-11 node-01-004
    
    

REPORTING BUGS

Report bugs to project website (http://code.google.com/p/ganeti/) or contact the developers using the Ganeti mailing list ([email protected]).

RELATED TO hroller…

Ganeti overview and specifications: ganeti(7) (general overview), ganeti-os-interface(7) (guest OS definitions), ganeti-extstorage-interface(7) (external storage providers).

Ganeti commands: gnt-cluster(8) (cluster-wide commands), gnt-job(8) (job-related commands), gnt-node(8) (node-related commands), gnt-instance(8) (instance commands), gnt-os(8) (guest OS commands), gnt-storage(8) (storage commands), gnt-group(8) (node group commands), gnt-backup(8) (instance import/export commands), gnt-debug(8) (debug commands).

Ganeti daemons: ganeti-watcher(8) (automatic instance restarter), ganeti-cleaner(8) (job queue cleaner), ganeti-noded(8) (node daemon), ganeti-masterd(8) (master daemon), ganeti-rapi(8) (remote API daemon).

Ganeti htools: htools(1) (generic binary), hbal(1) (cluster balancer), hspace(1) (capacity calculation), hail(1) (IAllocator plugin), hscan(1) (data gatherer from remote clusters), hinfo(1) (cluster information printer), mon-collector(7) (data collectors interface).

COPYRIGHT

Copyright (C) 2006-2014 Google Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.