Upgrading OpenStack

This section describes how to upgrade from the 2023.1 OpenStack release series to 2024.1. It is based on the upstream Kayobe documentation with additional considerations for using StackHPC Kayobe Configuration.

Overview

A StackHPC OpenStack upgrade is broken down into several phases.

  • Prerequisites

  • Preparation

  • Upgrading the Seed Hypervisor

  • Upgrading the Seed

  • Upgrading Wazuh Manager

  • Upgrading Wazuh Agents

  • Upgrading the Overcloud

  • Cleaning up

After preparation is complete, the remaining phases may be completed in any order. However, the order specified above allows as much as possible to be completed before the user-facing overcloud upgrade. It is not recommended to keep different parts of the system on different releases for extended periods, because this requires maintaining and using separate local Kayobe environments.

Notable changes in the 2024.1 Release

There are many changes in the OpenStack 2024.1 release described in the release notes for each project. Here are some notable ones.

Heat disabled by default

The Heat OpenStack service is no longer enabled by default.

This behavior can be overridden manually:

kolla.yml
kolla_enable_heat: true

Wherever possible, Magnum deployments should be migrated to the CAPI Helm driver. Instructions for enabling the driver can be found here. Enable the driver, recreate any clusters that currently use Heat, and then disable the service.

After the upgrade (so that alerts don’t fire) you can remove Heat with the following:

kayobe overcloud host command run --command "rm /etc/kolla/haproxy/services.d/heat-api.cfg" -l network -b
kayobe overcloud host command run --command "rm /etc/kolla/haproxy/services.d/heat-api-cfn.cfg" -l network -b

kayobe overcloud host command run --command "systemctl restart kolla-haproxy-container.service" -l network[0] -b
kayobe overcloud host command run --command "systemctl restart kolla-haproxy-container.service" -l network[1] -b
kayobe overcloud host command run --command "systemctl restart kolla-haproxy-container.service" -l network[2] -b

kayobe overcloud host command run --command "systemctl stop kolla-heat_api-container.service kolla-heat_api_cfn-container.service kolla-heat_engine-container.service" -l controllers -b
kayobe overcloud host command run --command "systemctl disable kolla-heat_api-container.service kolla-heat_api_cfn-container.service kolla-heat_engine-container.service" -l controllers -b
kayobe overcloud host command run --command "rm /etc/systemd/system/kolla-heat_api-container.service" -l controllers -b
kayobe overcloud host command run --command "rm /etc/systemd/system/kolla-heat_api_cfn-container.service" -l controllers -b
kayobe overcloud host command run --command "rm /etc/systemd/system/kolla-heat_engine-container.service" -l controllers -b

kayobe overcloud host command run --command "docker rm heat_api heat_api_cfn heat_engine" -l controllers

kayobe overcloud host command run --command "rm -rf /etc/kolla/heat-api /etc/kolla/heat-api-cfn /etc/kolla/heat-engine" --limit controllers -b

Then from the OpenStack CLI:

openstack service delete heat
openstack service delete heat-cfn
openstack user delete heat
openstack domain set --disable heat_user_domain
openstack domain delete heat_user_domain

You can drop the heat database too, unless you want to keep historical content.

docker exec -it mariadb mysql -u root -p
Enter the database password when prompted.
drop database heat;
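
The HAProxy restarts in the Heat removal steps above are issued one network host at a time. The same pattern can be sketched as a loop. This is a minimal sketch, assuming a POSIX shell; the function name restart_haproxy_rolling and its host-count argument are illustrative and not part of Kayobe.

```shell
# Restart the Kolla HAProxy container on network hosts one at a time.
# $1 is the number of hosts in the "network" group (hypothetical argument).
restart_haproxy_rolling() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Same command as in the steps above, limited to a single host.
        kayobe overcloud host command run \
            --command "systemctl restart kolla-haproxy-container.service" \
            -l "network[$i]" -b || return 1
        i=$((i + 1))
    done
}
```

For three network hosts this would be called as restart_haproxy_rolling 3. The loop stops at the first failed restart so that the remaining hosts keep serving traffic.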

Designate sink disabled by default

Designate sink is an optional Designate service which listens for event notifications, primarily from Nova and Neutron. In Caracal it is disabled by default, even when Designate is enabled. It is not required for Designate to function.

If you still wish to use it, you should set the flag manually:

kolla/globals.yml
designate_enable_notifications_sink: true

If you are using Designate and do not make this change, the Antelope designate-sink container will remain on the controllers after the upgrade. It must be removed manually.

Grafana Volume

The Grafana container volume is no longer used. If you wish to automatically remove the old volume, set grafana_remove_old_volume to true in kolla/globals.yml. Note that doing this will remove any plugins that were installed directly via the CLI rather than through Kolla. If you have previously installed Grafana plugins via the Grafana UI or CLI, you must change to installing them at image build time. The Grafana volume, which contains existing custom plugins, will be automatically removed in the next release.

Prometheus HAProxy Exporter

HAProxy now has built-in Prometheus support, which replaces the prometheus-haproxy-exporter. As a result, metric names may have been replaced or removed, and in some cases the names remain the same but the labels have changed. This is not a backwards-compatible change: review any configuration that references the old metrics, and update alerts and dashboards to use the new metrics where necessary.

Horizon configuration

The Horizon role has been reworked to the preferred local_settings.d configuration model. Files local_settings and custom_local_settings have been renamed to _9998-kolla-settings.py and _9999-custom-settings.py respectively. Users who use Horizon’s custom configuration must change the names of those files in etc/kolla/config/horizon as well.
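
The rename described above can be sketched as a small helper. This is a sketch under assumptions: the function name is illustrative, and the default directory assumes custom Horizon configuration lives under $KAYOBE_CONFIG_PATH/kolla/config/horizon, which may differ in your layout.

```shell
# Rename Horizon custom config files to the local_settings.d names.
# $1 is the Horizon config directory; the default is an assumption about
# a standard kayobe-config layout.
rename_horizon_settings() {
    dir="${1:-$KAYOBE_CONFIG_PATH/kolla/config/horizon}"
    if [ -f "$dir/local_settings" ]; then
        mv "$dir/local_settings" "$dir/_9998-kolla-settings.py"
    fi
    if [ -f "$dir/custom_local_settings" ]; then
        mv "$dir/custom_local_settings" "$dir/_9999-custom-settings.py"
    fi
}
```

Files that do not exist are skipped, so the helper can be run safely on deployments that only override one of the two files.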

Neutron DNS Domain

When Designate is enabled and the default Neutron DNS integration has not been disabled, neutron_dns_domain must be configured manually in kolla/globals.yml.

The value of neutron_dns_domain must end with a period, e.g. "example.com.". The domain should be one that is not used anywhere else, such as internal.compute.example.com.

The Neutron DNS integration can be disabled by setting neutron_dns_integration: false in kolla/globals.yml.
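
The trailing-period requirement is easy to miss, so it can be worth checking the chosen value before deploying. A minimal sketch, assuming a POSIX shell; the function name is illustrative:

```shell
# Succeed only if the given domain is non-empty and ends with a period.
check_neutron_dns_domain() {
    case "$1" in
        ?*.) return 0 ;;
        *)
            echo "neutron_dns_domain must end with a period: '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, check_neutron_dns_domain "internal.compute.example.com." succeeds, while the same value without the final period fails with a message on standard error.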

Redis Default User

The username used in redis_connection_string has changed from admin to default. Whilst this does not have any negative impact on services that utilise Redis, it will feature prominently in any preview of the overcloud configuration.

AvailabilityZoneFilter removal

Support for the AvailabilityZoneFilter filter has been dropped in Nova. Remove it from any Nova config files before upgrading. It will cause errors in Caracal and halt the Nova scheduler.
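
One way to find leftover references is a recursive search of the local configuration. This is a sketch only: the function name is illustrative, and searching $KAYOBE_CONFIG_PATH by default is an assumption about where local Nova overrides are kept.

```shell
# List config lines that still mention AvailabilityZoneFilter.
# $1 is the directory to search (default: $KAYOBE_CONFIG_PATH, an assumption).
find_az_filter() {
    dir="${1:-$KAYOBE_CONFIG_PATH}"
    # -r: recurse, -n: show line numbers. grep exits 1 when nothing matches.
    grep -rn "AvailabilityZoneFilter" "$dir"
}
```

An exit status of 1 with no output means no references were found in the searched directory.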

Keystone LDAP TLS configuration

Either [ldap] tls_cacertfile or [ldap] tls_cacertdir must be configured if [ldap] use_tls is true or the LDAP URL uses the ldaps:// scheme. LDAP authentication will fail if this configuration is absent. See the upstream Keystone change for more details.

OS Capacity exporter and dashboard enabled by default

The OS Capacity exporter will automatically be deployed after the upgrade. During the upgrade, HAProxy config, Prometheus config and Grafana dashboards will also be updated to use the exporter. If you want to disable this, change the following in kayobe-config/etc/kayobe/stackhpc-monitoring.yml:

# Whether the OpenStack Capacity exporter is enabled.
# Enabling this flag will result in HAProxy configuration and Prometheus scrape
# targets being templated during deployment.
stackhpc_enable_os_capacity: false

Known issues

  • Due to an incorrect default value, NGS will attempt to use v3alpha for the API path when communicating with etcd3. This is not possible because, in Caracal, etcd runs a newer version that has dropped support for v3alpha. You can work around this in custom config; see the SMS PR for an example: https://github.com/stackhpc/smslab-kayobe-config/pull/354

  • Due to a security-related change in the GRUB package on Rocky Linux 9, the operating system can become unbootable (boot will stop at a grub> prompt). Remove the --root-dev-only option from /boot/efi/EFI/rocky/grub.cfg after applying package updates. This will happen automatically as a post hook when running the kayobe overcloud host package update command.

  • After upgrading OpenSearch to the latest 2023.1 container image, cluster routing allocation has been found disabled on some systems. See the bug for details: https://bugs.launchpad.net/kolla-ansible/+bug/2085943. This will cause the “Perform a flush” handler to fail during the 2024.1 OpenSearch upgrade. To work around this, you can run the following PUT request to enable allocation again:

    curl -X PUT "https://<kolla-vip>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } } '
    
  • Cinder database migrations fail during the upgrade process when the use_quota column is set to NULL, which can be the case on deleted volumes and snapshots if OpenStack has been in operation for several releases. See Launchpad bug 2070475 for details. Until the database migrations are fixed, the data can be fixed with the following MySQL queries:

    UPDATE volumes SET use_quota = 1 WHERE use_quota IS NULL AND deleted_at IS NOT NULL;
    UPDATE snapshots SET use_quota = 1 WHERE use_quota IS NULL AND deleted_at IS NOT NULL;
    

Security baseline

As part of the Caracal release we are looking to improve the security baseline of StackHPC OpenStack deployments. If any of the following have not been done, they should be completed before the upgrade begins.

Prerequisites

Before starting the upgrade, ensure any appropriate prerequisites are satisfied. These will be specific to each deployment, but here are some suggestions:

  • If hypervisors will be rebooted, e.g. to pick up a new kernel, or reprovisioned, ensure that there is sufficient hypervisor capacity to drain at least one node.

  • If using Ironic for bare metal compute, ensure that at least one node is available for testing provisioning.

  • Ensure that expected test suites are passing, e.g. Tempest.

  • Resolve any Prometheus alerts.

  • Check for unexpected ERROR or CRITICAL messages in OpenSearch Dashboard.

  • Check Grafana dashboards.

  • Update the deployment to use the latest 2023.1 images and configuration.

  • If your customer has overridden any policies, check whether they need updating to align with the new defaults. Policy overrides are written to files named kolla/config/<service>/policy.yaml. Policy reference documentation can generally be found in the documentation of each project. For example, Nova policy: https://docs.openstack.org/nova/latest/configuration/policy.html

RabbitMQ SLURP upgrade

Note

The upgrade relies on recent changes. Make sure you have updated to the latest version of Kolla Ansible and deployed the latest Kolla containers before proceeding.

Because this is a SLURP upgrade, RabbitMQ must be upgraded manually from 3.11 to 3.12, and then to 3.13, on Antelope before the Caracal upgrade. This upgrade should not cause an API outage (though it should still be considered “at risk”).

Some errors have been observed in testing when the upgrades are performed back-to-back. A 200s delay eliminates this issue. On particularly large or slow deployments, consider increasing this timeout.

Additionally, errors have been observed at sites with OVS networking where tenant networking is broken after the upgrade and requires a reset of RabbitMQ. This can be done by running the rabbitmq-reset playbook.

kayobe overcloud service configuration generate --node-config-dir /tmp/ignore -kt none
kayobe kolla ansible run "rabbitmq-upgrade 3.12"
sleep 200
kayobe kolla ansible run "rabbitmq-upgrade 3.13"
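
On large or slow deployments the pause between the two upgrade steps may need to be longer, as noted above. The commands can be wrapped so that the delay is a parameter. This is a minimal sketch, assuming a POSIX shell; the function name and its argument are illustrative and not part of Kayobe.

```shell
# Run the two-step RabbitMQ upgrade with a configurable pause between steps.
# $1 is the delay in seconds; the default of 200 matches the value above.
rabbitmq_slurp_upgrade() {
    delay="${1:-200}"
    kayobe overcloud service configuration generate \
        --node-config-dir /tmp/ignore -kt none || return 1
    kayobe kolla ansible run "rabbitmq-upgrade 3.12" || return 1
    sleep "$delay"
    kayobe kolla ansible run "rabbitmq-upgrade 3.13"
}
```

For example, rabbitmq_slurp_upgrade 300 would wait five minutes between the 3.12 and 3.13 steps. The function stops if the first upgrade step fails, so the second step is never attempted on a broken cluster.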

RabbitMQ quorum queues

In Caracal, quorum queues are enabled by default for RabbitMQ. This is different to Antelope which used HA queues. Before upgrading to Caracal, it is strongly recommended that you migrate from HA to quorum queues. The migration is automated using a script.

Warning

This migration will stop all services using RabbitMQ and cause an extended API outage while queues are migrated. It should only be performed in a pre-agreed maintenance window.

Set the following variables in your kolla globals file (i.e. $KAYOBE_CONFIG_PATH/kolla/globals.yml or $KAYOBE_CONFIG_PATH/environments/$KAYOBE_ENVIRONMENT/kolla/globals.yml):

om_enable_rabbitmq_high_availability: false
om_enable_rabbitmq_quorum_queues: true

Then execute the migration script:

$KAYOBE_CONFIG_PATH/../../tools/rabbitmq-quorum-migration.sh
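
After the migration it can be useful to see how many queues of each type exist. The sketch below is not part of the migration script. It assumes the tab-separated output of rabbitmqctl list_queues name type is supplied on standard input, and the function name is illustrative.

```shell
# Count queues by type from "rabbitmqctl list_queues name type" output,
# read on stdin as tab-separated "name<TAB>type" lines.
tally_queue_types() {
    awk -F '\t' 'NF == 2 && $2 != "type" { n[$2]++ }
                 END { for (t in n) print t, n[t] }' | sort
}
```

On a controller, and assuming the RabbitMQ container is named rabbitmq, the input could come from something like sudo docker exec rabbitmq rabbitmqctl list_queues --quiet name type. Depending on the messaging settings in use, some transient queues may still be reported as classic, so a non-zero classic count is not necessarily a failure.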

Preparation

Preparation is crucial for a successful upgrade. It allows for a minimal maintenance/change window and ensures we are ready if unexpected issues arise.

Upgrade plan

The less you need to think on upgrade day, the better. Save your brain for solving any issues that arise. Write an upgrade plan detailing:

  • the predicted schedule

  • a checklist of prerequisites

  • a set of smoke tests to perform after significant changes

  • a list of steps to perform during the preparation phase

  • a list of steps to perform during the upgrade maintenance/change window phase

  • a list of steps to perform during the follow up phase

  • a set of full system tests to perform after the upgrade is complete

  • space to make notes of progress and any issues/solutions/workarounds that arise

Ideally all steps will include the exact commands to execute that can be copy/pasted, or links to appropriate CI/CD workflows to run.

Backing up

Before you start, be sure to back up any local changes, configuration, and data.

See the Kayobe documentation for information on backing up the overcloud MariaDB database. It may be prudent to take backups at various stages of the upgrade since the database state will change over time.

Updating code forks

If the deployment uses any source code forks (other than the StackHPC ones), update them to use the 2024.1 release.

Migrating Kayobe Configuration

Kayobe configuration options may be changed between releases of Kayobe. Ensure that all site local configuration is migrated to the target version format. See the StackHPC Kayobe Configuration release notes, Kayobe release notes and Kolla Ansible release notes. In particular, the Upgrade Notes and Deprecation Notes sections provide information that might affect the configuration migration.

In the following example we assume a branch naming scheme of example/<release>.

Create a branch for the new release:

git fetch origin
git checkout example/2023.1
git checkout -b example/2024.1
git push origin example/2024.1

Merge in the new branch of StackHPC Kayobe Configuration:

git remote add stackhpc https://github.com/stackhpc/stackhpc-kayobe-config
git fetch stackhpc
git fetch origin
git checkout -b example/2024.1-sync origin/example/2024.1
git merge stackhpc/stackhpc/2024.1

There may be conflicts to resolve. The configuration should be manually inspected after the merge to ensure that it is correct. Once complete, push the branch and create a pull request with the changes:

git push origin example/2024.1-sync

Once approved and merged, update the configuration to adapt to the new release. This may involve, for example, adding, removing or renaming variables to allow for upstream changes. Note that configuration in the base environment (etc/kayobe/) will be merged with upstream changes, but anything in a deployment-specific environment directory (etc/kayobe/environments/) may require manual inspection.

If using the kayobe-env environment file in kayobe-config, this should also be inspected for changes and modified to suit the local Ansible control host environment if necessary. When ready, source the environment file:

source kayobe-env

Create one or more pull requests with these changes.

Once the configuration has been migrated, it is possible to view the global variables for all hosts:

kayobe configuration dump

The output of this command is a JSON object mapping hosts to their configuration. The output of the command may be restricted using the --host, --hosts, --var-name and --dump-facts options.
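
For example, the --host and --var-name options can be combined to check a single migrated variable on a single host. A minimal sketch; the function name is illustrative, and the host and variable names passed to it are placeholders chosen by the operator.

```shell
# Print a single Kayobe variable for a single host.
# $1 is the host name and $2 the variable name (both placeholders).
dump_kayobe_var() {
    host="$1"
    var="$2"
    kayobe configuration dump --host "$host" --var-name "$var"
}
```

For a hypothetical host named controller0, dump_kayobe_var controller0 kolla_enable_heat would show whether Heat is still enabled after the configuration migration.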

Upgrading local Kayobe environment

The local Kayobe environment should be either recreated or upgraded to use the new release. It may be beneficial to keep a Kayobe environment for the old release in case it is needed before the upgrade begins.

In general it is safer to rebuild an environment than upgrade, but for completeness the following shows how to upgrade an existing local Kayobe environment.

Change to the Kayobe configuration directory:

cd /path/to/src/kayobe-config

Check the status:

git status

Pull down the new branch:

git checkout example/2024.1
git pull origin example/2024.1

Activate the Kayobe virtual environment:

source /path/to/venvs/kayobe/bin/activate

Reinstall Kayobe and other dependencies:

pip install --force-reinstall -r requirements.txt

Source the kayobe-env script:

source kayobe-env [--environment <env>]

Export the Ansible Vault password:

export KAYOBE_VAULT_PASSWORD=$(cat /path/to/vault/password/file)

Next we must upgrade the Ansible control host. Tasks performed here include:

  • Install updated Ansible collection and role dependencies from Ansible Galaxy.

  • Generate an SSH key if necessary and add it to the current user’s authorised keys.

  • Upgrade Kolla Ansible locally to the configured version.

To upgrade the Ansible control host:

kayobe control host upgrade

Syncing Release Train artifacts

New StackHPC Release Train content should be synced to the local Pulp server. This includes host packages (Deb/RPM) and container images.

To sync host packages:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-sync.yml
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-publish.yml

Once the host package content has been tested in a test/staging environment, it may be promoted to production:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-promote-production.yml

To sync container images:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-sync.yml
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-publish.yml

Build locally customised container images

Note

The container images provided by StackHPC Release Train are suitable for most deployments. In this case, this step can be skipped.

In some cases it is necessary to build some or all images locally to apply customisations. In order to do this it is necessary to set stackhpc_pulp_sync_for_local_container_build to true before syncing container images.

To build the overcloud images locally and push them to the local Pulp server:

kayobe overcloud container image build --push

It is possible to build a specific set of images by supplying one or more image name regular expressions:

kayobe overcloud container image build --push ironic- nova-api

Pull container images to hosts

Pulling container images from the local Pulp server to the control plane hosts can take a considerable time, because images are only synced from Ark to the local Pulp on demand, and there is potentially a large fan-out. Pulling images in advance of the upgrade moves this step out of the maintenance/change window. Consider checking available disk space before pulling:

kayobe overcloud host command run --command "df -h" --show-output --limit controllers[0],compute[0],storage[0]
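
The df output can be long on hosts with many mounts. A small filter can highlight only the filesystems that are nearly full. This is a sketch under assumptions: it reads POSIX-format df -P output on standard input, the function name is illustrative, and the default threshold of 80 percent is an arbitrary choice.

```shell
# Read "df -P" output on stdin and print mount points whose usage is at or
# above the threshold percentage given as $1 (default 80, an arbitrary choice).
filesystems_over() {
    threshold="${1:-80}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

On a host, df -P | filesystems_over 80 prints each mount point at or above 80 percent usage together with its usage figure. Mount points containing spaces are not handled by this sketch.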

Then pull the images:

kayobe overcloud container image pull

Preview overcloud service configuration changes

Kayobe allows us to generate overcloud service configuration in advance, and compare it with the running configuration. This allows us to check for any unexpected changes.

This can take a significant time, and it may be advisable to limit these commands to one of each type of host (controller, compute, storage, etc.). The following commands use a limit including the first host in each of these groups.

Save the old configuration locally.

kayobe overcloud service configuration save --node-config-dir /etc/kolla --output-dir ~/kolla-diff/old --limit controllers[0],compute[0],storage[0] --exclude ironic-agent.initramfs,ironic-agent.kernel

Generate the new configuration to a tmpdir.

kayobe overcloud service configuration generate --node-config-dir /tmp/kolla --kolla-limit controllers[0],compute[0],storage[0]

Save the new configuration locally.

kayobe overcloud service configuration save --node-config-dir /tmp/kolla --output-dir ~/kolla-diff/new --limit controllers[0],compute[0],storage[0] --exclude ironic-agent.initramfs,ironic-agent.kernel

The old and new configuration will be saved to ~/kolla-diff/old and ~/kolla-diff/new respectively on the Ansible control host.

Fix up the paths:

cd ~/kolla-diff/new
for i in *; do mv "$i/tmp" "$i/etc"; done
cd -

Compare the old & new configuration:

diff -ru ~/kolla-diff/{old,new} > ~/kolla-diff.diff
less ~/kolla-diff.diff
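
The full diff can be very large. Before reading it, a brief listing of which files differ can help to decide where to look first. A minimal sketch; the function name is illustrative, and the default paths are the directories used above.

```shell
# List which files differ between two saved configuration trees, without
# printing the full diff. Arguments default to the directories used above.
summarise_config_diff() {
    old="${1:-$HOME/kolla-diff/old}"
    new="${2:-$HOME/kolla-diff/new}"
    # -r: recurse, -q: report only whether files differ.
    # diff exits 1 when differences exist, which is expected here.
    diff -rq "$old" "$new" || [ "$?" -eq 1 ]
}
```

The function returns success whether or not differences are found, and fails only if diff itself reports an error such as a missing directory.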

Upgrading the Seed Hypervisor

Currently, upgrading the seed hypervisor services is not supported. It may however be necessary to upgrade host packages and some host services.

Consider whether the seed hypervisor needs to be upgraded within or outside of a maintenance/change window.

Upgrading Host Packages

Note

In case of issues booting up, consider alternative access methods if the hypervisor is also used as the Ansible control host (or runs it in a VM).

Prior to upgrading the seed hypervisor, it may be desirable to upgrade system packages on the seed hypervisor host.

To update all eligible packages, use *, escaping if necessary:

kayobe seed hypervisor host package update --packages "*"

If the kernel has been upgraded, reboot the seed hypervisor to pick up the change:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l seed-hypervisor

Upgrading Host Services

It may be necessary to upgrade some host services:

kayobe seed hypervisor host upgrade

Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.

Configuring hosts

Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.

Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:

kayobe seed hypervisor host configure --check --diff

When ready to apply the changes:

kayobe seed hypervisor host configure

Upgrading the Seed

Consider whether the seed needs to be upgraded within or outside of a maintenance/change window.

Upgrading Host Packages

Note

In case of issues booting up, consider alternative access methods if the seed is also used as the Ansible control host.

Prior to upgrading the seed, it may be desirable to upgrade system packages on the seed host.

Note that these commands do not affect packages installed in containers, only those installed on the host.

To update all eligible packages, use *, escaping if necessary:

kayobe seed host package update --packages "*"

If the kernel has been upgraded, reboot the seed to pick up the change:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l seed

Verify that Bifrost, Ironic and Inspector are running as expected:

ssh stack@<seed>
sudo docker exec -it bifrost_deploy bash
systemctl
export OS_CLOUD=bifrost
baremetal node list
baremetal introspection list
exit
exit

Building Ironic Deployment Images

Note

It is possible to use prebuilt deployment images. In this case, this step can be skipped.

It is possible to use prebuilt deployment images from the OpenStack hosted tarballs or another source. In some cases it may be necessary to build images locally either to apply local image customisation or to use a downstream version of Ironic Python Agent (IPA). In order to build IPA images, the ipa_build_images variable should be set to True. To build images locally:

kayobe seed deployment image build

To overwrite existing images, add the --force-rebuild argument.

Upgrading Host Services

It may be necessary to upgrade some host services:

kayobe seed host upgrade

Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.

Configuring hosts

Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.

Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:

kayobe seed host configure --check --diff

When ready to apply the changes:

kayobe seed host configure

Building Container Images

Note

The container images provided by StackHPC Release Train are suitable for most deployments. In this case, this step can be skipped.

In some cases it is necessary to build some or all images locally to apply customisations. In order to do this it is necessary to set stackhpc_pulp_sync_for_local_container_build to true before syncing container images.

To build the seed images locally and push them to the local Pulp server:

kayobe seed container image build --push

Upgrading Containerised Services

Containerised seed services may be upgraded by replacing existing containers with new containers using updated images which have been pulled from the local Pulp registry.

To upgrade the containerised seed services:

kayobe seed service upgrade

Verify that Bifrost, Ironic and Inspector are running as expected:

ssh stack@<seed>
sudo docker exec -it bifrost_deploy bash
systemctl
export OS_CLOUD=bifrost
baremetal node list
baremetal introspection list
exit
exit

Upgrading Wazuh Manager

Consider whether Wazuh Manager needs to be upgraded within or outside of a maintenance/change window.

Upgrading Host Packages

Prior to upgrading the Wazuh manager services, it may be desirable to upgrade system packages on the Wazuh manager host.

To update all eligible packages, use *, escaping if necessary:

kayobe infra vm host package update --packages "*" -l wazuh-manager

If the kernel has been upgraded, reboot the Wazuh Manager to pick up the change:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l wazuh-manager

Verify that Wazuh Manager is functioning correctly by logging into the Wazuh UI.

Configuring hosts

Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.

Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:

kayobe infra vm host configure --check --diff -l wazuh-manager

When ready to apply the changes:

kayobe infra vm host configure -l wazuh-manager

Upgrade Wazuh Manager services

Run the following playbook to update Wazuh Manager services and configuration:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-manager.yml

Verify that Wazuh Manager is functioning correctly by logging into the Wazuh UI.

Upgrading Wazuh Agents

Consider whether Wazuh Agents need to be upgraded within or outside of a maintenance/change window.

Upgrade Wazuh Agent services

Run the following playbook to update Wazuh Agent services and configuration:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-agent.yml

Verify that the agents have connected to Wazuh Manager correctly by logging into the Wazuh UI.

Upgrading the Overcloud

Consider which of the overcloud upgrade steps need to be performed within or outside of a maintenance/change window.

Upgrading Host Packages

Prior to upgrading the OpenStack control plane, it may be desirable to upgrade system packages on the overcloud hosts.

Note that these commands do not affect packages installed in containers, only those installed on the host.

In order to avoid downtime, it is important to control how package updates are rolled out. In general, controllers and network hosts should be updated one by one, ideally updating the host with the Virtual IP (VIP) last. For hypervisors it may be possible to update packages in batches of hosts, provided there is sufficient capacity to migrate VMs to other hypervisors.

For each host or batch of hosts, perform the following steps.

If the host is a hypervisor, disable the Nova compute service and drain it of VMs using live migration. If any VMs fail to migrate, they may be cold migrated or powered off:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/nova-compute-{disable,drain}.yml --limit <host>

To update all eligible packages, use *, escaping if necessary:

kayobe overcloud host package update --packages "*" --limit <host>

Note

Due to a security-related change in the GRUB package on Rocky Linux 9, the operating system can become unbootable (boot will stop at a grub> prompt). Remove the --root-dev-only option from /boot/efi/EFI/rocky/grub.cfg after applying package updates. This will happen automatically as a post hook when running the kayobe overcloud host package update command.

If the kernel has been upgraded, reboot the host or batch of hosts to pick up the change:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l <host>

Warning

Take extra care when updating packages on Ceph hosts. Docker live-restore does not work until the Squid version of Ceph, so a reload of docker will restart all Ceph containers. Set the hosts to maintenance mode before updating packages, and unset when done:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-enter-maintenance.yml --limit <host>
kayobe overcloud host package update --packages "*" --limit <host>
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l <host>
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-exit-maintenance.yml --limit <host>

Always reconfigure hosts in small batches or one-by-one. Check the Ceph state after each host configuration. Ensure all warnings and errors are resolved before moving on.

If the host is a hypervisor, enable the Nova compute service.

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/nova-compute-enable.yml --limit <host>

If any VMs were powered off, they may now be powered back on.

Wait for Prometheus alerts and errors in OpenSearch Dashboard to resolve, or address them.

After updating controllers or network hosts, run any appropriate smoke tests.

Once happy that the system has been restored to full health, move on to the next host or batch of hosts.
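
The per-hypervisor steps above can be collected into one helper so that they always run in the same order. This is a sketch only, assuming a POSIX shell with KAYOBE_CONFIG_PATH set; the function name and the optional "reboot" argument are illustrative and not part of Kayobe.

```shell
# Update host packages on one hypervisor, following the steps above.
# $1 is the host limit; $2 is "reboot" to reboot after the update
# (both are hypothetical arguments for this sketch).
update_hypervisor() {
    host="$1"
    playbooks="$KAYOBE_CONFIG_PATH/ansible"
    kayobe playbook run "$playbooks/nova-compute-disable.yml" \
        "$playbooks/nova-compute-drain.yml" --limit "$host" || return 1
    kayobe overcloud host package update --packages "*" --limit "$host" || return 1
    if [ "${2:-}" = "reboot" ]; then
        kayobe playbook run "$playbooks/reboot.yml" -l "$host" || return 1
    fi
    kayobe playbook run "$playbooks/nova-compute-enable.yml" --limit "$host"
}
```

The helper stops at the first failing step, leaving the compute service disabled so that the host can be investigated. It does not replace the health checks described above, which should still be done before moving on to the next host.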

Upgrading Host Services

Prior to upgrading the OpenStack control plane, the overcloud host services should be upgraded:

kayobe overcloud host upgrade

Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.

Configuring hosts

Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.

Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:

kayobe overcloud host configure --check --diff

When ready to apply the changes, it may be advisable to do so in batches, or at least start with a small number of hosts:

kayobe overcloud host configure --limit <host>

Warning

Take extra care when configuring Ceph hosts. Set the hosts to maintenance mode before reconfiguring them, and unset when done:

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-enter-maintenance.yml --limit <host>
kayobe overcloud host configure --limit <host>
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-exit-maintenance.yml --limit <host>

Always reconfigure hosts in small batches or one-by-one. Check the Ceph state after each host configuration. Ensure all warnings and errors are resolved before moving on.

Building Ironic Deployment Images

Note

It is possible to use prebuilt deployment images. In this case, this step can be skipped.

It is possible to use prebuilt deployment images from the OpenStack hosted tarballs or another source. In some cases it may be necessary to build images locally either to apply local image customisation or to use a downstream version of Ironic Python Agent (IPA). In order to build IPA images, the ipa_build_images variable should be set to True. To build images locally:

kayobe overcloud deployment image build

To overwrite existing images, add the --force-rebuild argument.

Upgrading Ironic Deployment Images

Prior to upgrading the OpenStack control plane you should upgrade the deployment images. If you are using prebuilt images, update the following variables in etc/kayobe/ipa.yml accordingly:

  • ipa_kernel_upstream_url

  • ipa_kernel_checksum_url

  • ipa_kernel_checksum_algorithm

  • ipa_ramdisk_upstream_url

  • ipa_ramdisk_checksum_url

  • ipa_ramdisk_checksum_algorithm

Alternatively, you can update the files that the URLs point to. If building the images locally, follow the process outlined in Building Ironic Deployment Images.

To get Ironic to use an updated set of overcloud deployment images, you can run:

kayobe baremetal compute update deployment image

This will register the images in Glance and update the deploy_ramdisk and deploy_kernel properties of the Ironic nodes.

Before rolling out the update to all nodes, it can be useful to test the image on a limited subset. To do this, you can use the --baremetal-compute-limit option. The argument should take the form of an Ansible host pattern, which is matched against the Ironic node name.
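
A limited rollout might look like the following sketch. The function name is illustrative, and the pattern passed to it is whatever Ansible host pattern matches the chosen test nodes.

```shell
# Update deployment images on a subset of Ironic nodes only.
# $1 is an Ansible host pattern matched against Ironic node names.
update_deploy_image_subset() {
    kayobe baremetal compute update deployment image \
        --baremetal-compute-limit "$1"
}
```

For a hypothetical node named bm-node-01, the call would be update_deploy_image_subset bm-node-01. Once provisioning has been tested on that node, the command can be run again without the limit.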

Upgrading Containerised Services

Containerised control plane services may be upgraded by replacing existing containers with new containers using updated images which have been pulled from a registry or built locally.

If using overcloud Ironic, check whether any Ironic nodes are in a wait state:

baremetal node list | grep wait

This will block the upgrade, but may be overridden by setting ironic_upgrade_skip_wait_check to true in etc/kayobe/kolla/globals.yml or etc/kayobe/environments/<env>/kolla/globals.yml.
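
A plain grep for "wait" also matches node names that contain the word. A slightly stricter check can be sketched by selecting only the name and provisioning state columns. The function name is illustrative, and the sketch assumes the baremetal client is available with credentials loaded.

```shell
# Print Ironic nodes whose provisioning state ends in "wait"
# (for example "clean wait" or "deploy wait").
ironic_nodes_waiting() {
    baremetal node list -f value -c Name -c "Provisioning State" |
        awk '/ wait$/'
}
```

No output means that no nodes are currently in a wait state.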

To upgrade the containerised control plane services:

kayobe overcloud service upgrade

It is possible to specify tags for Kayobe and/or kolla-ansible to restrict the scope of the upgrade:

kayobe overcloud service upgrade --tags config --kolla-tags keystone

Updating the Octavia Amphora Image

If using Octavia with the Amphora driver, you should update the amphora image.

Testing

At this point it is recommended to perform a thorough test of the system to catch any unexpected issues. This may include:

  • Check Prometheus, OpenSearch Dashboards and Grafana

  • Smoke tests

  • All applicable Tempest tests

  • Horizon UI inspection

Cleaning up

Prune unused container images:

kayobe overcloud host command run -b --command "docker image prune -a -f"