Upgrading OpenStack¶
This section describes how to upgrade from the 2023.1 OpenStack release series to 2024.1. It is based on the upstream Kayobe documentation with additional considerations for using StackHPC Kayobe Configuration.
Overview¶
A StackHPC OpenStack upgrade is broken down into several phases.
Prerequisites
Preparation
Upgrading the Seed Hypervisor
Upgrading the Seed
Upgrading Wazuh Manager
Upgrading Wazuh Agents
Upgrading the Overcloud
Cleaning up
After preparation is complete, the remaining phases may be completed in any order. However, the order specified above allows for completing as much as possible before the user-facing overcloud upgrade. It is not recommended to keep different parts of the system on different releases for extended periods due to the need to maintain and use separate local Kayobe environments.
Notable changes in the 2024.1 Release¶
There are many changes in the OpenStack 2024.1 release described in the release notes for each project. Here are some notable ones.
Heat disabled by default¶
The Heat OpenStack service is no longer enabled by default.
This behavior can be overridden manually:
kolla.yml:
kolla_enable_heat: true
Wherever possible, Magnum deployments should be migrated to the CAPI Helm driver. Instructions for enabling the driver can be found here. Enable the driver, recreate any clusters using Heat, and disable the service.
After the upgrade (so that alerts don’t fire) you can remove Heat with the following:
kayobe overcloud host command run --command "rm /etc/kolla/haproxy/services.d/heat-api.cfg" -l network -b
kayobe overcloud host command run --command "rm /etc/kolla/haproxy/services.d/heat-api-cfn.cfg" -l network -b
kayobe overcloud host command run --command "systemctl restart kolla-haproxy-container.service" -l network[0] -b
kayobe overcloud host command run --command "systemctl restart kolla-haproxy-container.service" -l network[1] -b
kayobe overcloud host command run --command "systemctl restart kolla-haproxy-container.service" -l network[2] -b
kayobe overcloud host command run --command "systemctl stop kolla-heat_api-container.service kolla-heat_api_cfn-container.service kolla-heat_engine-container.service" -l controllers -b
kayobe overcloud host command run --command "systemctl disable kolla-heat_api-container.service kolla-heat_api_cfn-container.service kolla-heat_engine-container.service" -l controllers -b
kayobe overcloud host command run --command "rm /etc/systemd/system/kolla-heat_api-container.service" -l controllers -b
kayobe overcloud host command run --command "rm /etc/systemd/system/kolla-heat_api_cfn-container.service" -l controllers -b
kayobe overcloud host command run --command "rm /etc/systemd/system/kolla-heat_engine-container.service" -l controllers -b
kayobe overcloud host command run --command "docker rm heat_api heat_api_cfn heat_engine" -l controllers
kayobe overcloud host command run --command "rm -rf /etc/kolla/heat-api /etc/kolla/heat-api-cfn /etc/kolla/heat-engine" --limit controllers -b
Then from the OpenStack CLI:
openstack service delete heat
openstack service delete heat-cfn
openstack user delete heat
openstack domain set --disable heat_user_domain
openstack domain delete heat_user_domain
You can drop the heat database too, unless you want to keep historical content.
docker exec -it mariadb mysql -u root -p
Enter the database password when prompted.
drop database heat;
Designate sink disabled by default¶
Designate sink is an optional Designate service which listens for event notifications, primarily from Nova and Neutron. It is disabled by default (when designate is enabled) in Caracal. It is not required for Designate to function.
If you still wish to use it, you should set the flag manually:
kolla/globals.yml:
designate_enable_notifications_sink: true
If you are using Designate and do not make this change, the Antelope
designate-sink container will remain on the controllers after the upgrade.
It must be removed manually.
Grafana Volume¶
The Grafana container volume is no longer used. If you wish to automatically
remove the old volume, set grafana_remove_old_volume to true in
kolla/globals.yml. Note that doing this will lose any plugins installed via
the CLI directly and not through Kolla. If you have previously installed
Grafana plugins via the Grafana UI or CLI, you must change to installing them
at image build time. The Grafana volume, which contains existing custom
plugins, will be automatically removed in the next release.
Prometheus HAproxy Exporter¶
Due to the change from using the prometheus-haproxy-exporter to using the
native support for Prometheus which is now built into HAProxy, metric names may
have been replaced and/or removed, and in some cases the metric names may have
remained the same but the labels may have changed. Alerts and dashboards may
also need to be updated to use the new metrics. Please review any configuration
that references the old metrics as this is not a backwards compatible change.
Horizon configuration¶
The Horizon role has been reworked to the preferred local_settings.d
configuration model. Files local_settings and custom_local_settings
have been renamed to _9998-kolla-settings.py and
_9999-custom-settings.py respectively. Users who use Horizon’s custom
configuration must change the names of those files in
etc/kolla/config/horizon as well.
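The rename described above could be scripted along the following lines. This is a minimal sketch, not part of Kayobe or Kolla Ansible: the function name rename_horizon_settings is hypothetical, the directory holding the custom Horizon files is passed as an argument because its location depends on the deployment, and plain mv is assumed to be acceptable (inside a Git checkout, git mv may be preferable).

```shell
# Hypothetical helper: rename Horizon custom configuration files to the
# local_settings.d names listed in the release note above.
# $1 is the directory that holds the custom Horizon configuration files.
rename_horizon_settings() {
    # Only rename files that actually exist, so the function is safe to re-run.
    if [ -f "$1/local_settings" ]; then
        mv "$1/local_settings" "$1/_9998-kolla-settings.py"
    fi
    if [ -f "$1/custom_local_settings" ]; then
        mv "$1/custom_local_settings" "$1/_9999-custom-settings.py"
    fi
}
```

For example, rename_horizon_settings /path/to/config/horizon, followed by a review of the result with git status before committing.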
Neutron DNS Domain¶
When Designate is enabled and the default Neutron DNS integration has not been
disabled, neutron_dns_domain must be configured manually in
kolla/globals.yml.
The neutron_dns_domain value must end with a period (.), e.g. "example.com.".
The domain should be something that is not used anywhere else, such as
"internal.compute.example.com.".
The Neutron DNS integration can be disabled by setting
neutron_dns_integration: false in kolla/globals.yml.
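The trailing period is easy to forget, so it may be worth checking the chosen value before committing it. The following is a small sketch of such a check; the function name check_dns_domain is hypothetical and the check only looks at the final character, it does not validate the domain in any other way.

```shell
# Hypothetical helper: succeed only if the given domain ends with a period.
check_dns_domain() {
    case "$1" in
        *.) return 0 ;;
        *)  echo "neutron_dns_domain must end with a period: $1" >&2
            return 1 ;;
    esac
}
```

For example, check_dns_domain internal.compute.example.com. succeeds, while the same value without the trailing period prints an error and returns a non-zero status.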
Redis Default User¶
The username used in redis_connection_string has changed from admin
to default. Whilst this does not have any negative impact on services
that utilise Redis, it will feature prominently in any preview of the overcloud
configuration.
AvailabilityZoneFilter removal¶
Support for the AvailabilityZoneFilter filter has been dropped in Nova.
Remove it from any Nova config files before upgrading. It will cause errors in
Caracal and halt the Nova scheduler.
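One way to find remaining references is a recursive search of the configuration tree. The sketch below assumes that any Nova overrides live somewhere under a directory you pass in (for example the Kayobe configuration checkout); the function name find_az_filter is hypothetical.

```shell
# Hypothetical helper: print the names of files under directory $1 that
# still mention AvailabilityZoneFilter. Prints nothing if there are none.
find_az_filter() {
    # grep exits 1 when nothing matches; treat that as success and only
    # propagate real errors (exit status 2, e.g. a missing directory).
    grep -rl "AvailabilityZoneFilter" "$1" || [ $? -eq 1 ]
}
```

For example, find_az_filter "$KAYOBE_CONFIG_PATH" lists every file that needs editing before the upgrade.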
Keystone LDAP TLS configuration¶
Either [ldap] tls_cacertfile or [ldap] tls_cacertdir must be configured
if [ldap] use_tls is true or LDAP URL uses the ldaps:// scheme. LDAP
authentication will fail if this configuration is absent. See upstream
Keystone change
for more details.
OS Capacity exporter and dashboard enabled by default¶
The OS Capacity exporter will automatically be deployed after the upgrade.
During the upgrade, HAProxy config, Prometheus config and Grafana dashboards
will also be updated to use the exporter. If you want to disable this, change
the following in kayobe-config/etc/kayobe/stackhpc-monitoring.yml:
# Whether the OpenStack Capacity exporter is enabled.
# Enabling this flag will result in HAProxy configuration and Prometheus scrape
# targets being templated during deployment.
stackhpc_enable_os_capacity: false
Known issues¶
Due to an incorrect default value, NGS will attempt to use v3alpha for the API path when communicating with etcd3. This isn’t possible as in Caracal etcd is running a newer version that has dropped support for v3alpha. You can work around this in custom config, see the SMS PR for an example: https://github.com/stackhpc/smslab-kayobe-config/pull/354
Due to a security-related change in the GRUB package on Rocky Linux 9, the operating system can become unbootable (boot will stop at a grub> prompt). Remove the --root-dev-only option from /boot/efi/EFI/rocky/grub.cfg after applying package updates. This will happen automatically as a post hook when running the kayobe overcloud host package update command.
After upgrading OpenSearch to the latest 2023.1 container image, we have seen cluster routing allocation be disabled on some systems. See bug for details: https://bugs.launchpad.net/kolla-ansible/+bug/2085943. This will cause the “Perform a flush” handler to fail during the 2024.1 OpenSearch upgrade. To work around this, you can run the following PUT request to enable allocation again:
curl -X PUT "https://<kolla-vip>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
Cinder database migrations fail during the upgrade process when the use_quota column is set to NULL, which can be the case on deleted volumes and snapshots if OpenStack has been in operation for several releases. See Launchpad bug 2070475 for details. Until the database migrations are fixed, the data can be fixed with the following MySQL queries:
UPDATE volumes SET use_quota = 1 WHERE use_quota IS NULL AND deleted_at IS NOT NULL;
UPDATE snapshots SET use_quota = 1 WHERE use_quota IS NULL AND deleted_at IS NOT NULL;
Security baseline¶
As part of the Caracal release we are looking to improve the security baseline of StackHPC OpenStack deployments. If any of the following have not been done, they should be completed before the upgrade begins.
Enable TLS on the public API network
Enable TLS on the internal API network
Configure walled garden networking
Deploy Wazuh
Prerequisites¶
Before starting the upgrade, ensure any appropriate prerequisites are satisfied. These will be specific to each deployment, but here are some suggestions:
If hypervisors will be rebooted, e.g. to pick up a new kernel, or reprovisioned, ensure that there is sufficient hypervisor capacity to drain at least one node.
If using Ironic for bare metal compute, ensure that at least one node is available for testing provisioning.
Ensure that expected test suites are passing, e.g. Tempest.
Resolve any Prometheus alerts.
Check for unexpected ERROR or CRITICAL messages in OpenSearch Dashboard.
Check Grafana dashboards.
Update the deployment to use the latest 2023.1 images and configuration.
If your customer has overridden any policies, check whether they need updating to align with new defaults. Overrides are written to files named
kolla/config/<service>/policy.yaml. Policy reference documentation can generally be found in the documentation of each project. For example, Nova policy: https://docs.openstack.org/nova/latest/configuration/policy.html
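To see which services have policy overrides, the configuration tree can be searched for policy.yaml files. This is a sketch only: the function name list_policy_overrides is hypothetical, and it assumes the overrides sit under the directory passed in. Deployments using Kayobe environments may need to search more than one directory.

```shell
# Hypothetical helper: list every policy.yaml file under directory $1,
# sorted so that the output is stable between runs.
list_policy_overrides() {
    find "$1" -type f -name policy.yaml | sort
}
```

For example, list_policy_overrides "$KAYOBE_CONFIG_PATH/kolla/config" prints one path per overridden service.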
RabbitMQ SLURP upgrade¶
Note
The upgrade is reliant on recent changes. Make sure you have updated to the latest version of Kolla Ansible and deployed the latest Kolla containers before proceeding.
Because this is a SLURP upgrade, RabbitMQ must be upgraded manually from 3.11, to 3.12, then to 3.13 on Antelope before the Caracal upgrade. This upgrade should not cause an API outage (though it should still be considered “at risk”).
Some errors have been observed in testing when the upgrades are performed back-to-back. A 200s delay eliminates this issue. On particularly large or slow deployments, consider increasing this timeout.
Additionally, errors have been observed at sites with OVS networking where, after the upgrade, tenant networking is broken and requires a reset of RabbitMQ. This can be done by running the rabbitmq-reset playbook.
kayobe overcloud service configuration generate --node-config-dir /tmp/ignore -kt none
kayobe kolla ansible run "rabbitmq-upgrade 3.12"
sleep 200
kayobe kolla ansible run "rabbitmq-upgrade 3.13"
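On larger or slower deployments where a longer pause is wanted, the same commands could be wrapped so that the delay is a parameter. The wrapper below is a sketch: the function name rabbitmq_slurp_upgrade is hypothetical, the kayobe commands are the ones shown above, and each step only runs if the previous one succeeded.

```shell
# Hypothetical wrapper around the RabbitMQ upgrade commands.
# $1 is the pause in seconds between the two upgrades (default: 200).
rabbitmq_slurp_upgrade() {
    kayobe overcloud service configuration generate --node-config-dir /tmp/ignore -kt none &&
    kayobe kolla ansible run "rabbitmq-upgrade 3.12" &&
    sleep "${1:-200}" &&
    kayobe kolla ansible run "rabbitmq-upgrade 3.13"
}
```

For example, rabbitmq_slurp_upgrade 600 waits ten minutes between the 3.12 and 3.13 upgrades.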
RabbitMQ quorum queues¶
In Caracal, quorum queues are enabled by default for RabbitMQ. This is different to Antelope which used HA queues. Before upgrading to Caracal, it is strongly recommended that you migrate from HA to quorum queues. The migration is automated using a script.
Warning
This migration will stop all services using RabbitMQ and cause an extended API outage while queues are migrated. It should only be performed in a pre-agreed maintenance window.
Set the following variables in your kolla globals file (i.e.
$KAYOBE_CONFIG_PATH/kolla/globals.yml or
$KAYOBE_CONFIG_PATH/environments/$KAYOBE_ENVIRONMENT/kolla/globals.yml):
om_enable_rabbitmq_high_availability: false
om_enable_rabbitmq_quorum_queues: true
Then execute the migration script:
$KAYOBE_CONFIG_PATH/../../tools/rabbitmq-quorum-migration.sh
Preparation¶
Preparation is crucial for a successful upgrade. It allows for a minimal maintenance/change window and ensures we are ready if unexpected issues arise.
Upgrade plan¶
The less you need to think on upgrade day, the better. Save your brain for solving any issues that arise. Write an upgrade plan detailing:
the predicted schedule
a checklist of prerequisites
a set of smoke tests to perform after significant changes
a list of steps to perform during the preparation phase
a list of steps to perform during the upgrade maintenance/change window phase
a list of steps to perform during the follow up phase
a set of full system tests to perform after the upgrade is complete
space to make notes of progress and any issues/solutions/workarounds that arise
Ideally all steps will include the exact commands to execute that can be copy/pasted, or links to appropriate CI/CD workflows to run.
Backing up¶
Before you start, be sure to back up any local changes, configuration, and data.
See the Kayobe documentation for information on backing up the overcloud MariaDB database. It may be prudent to take backups at various stages of the upgrade since the database state will change over time.
Updating code forks¶
If the deployment uses any source code forks (other than the StackHPC ones), update them to use the 2024.1 release.
Migrating Kayobe Configuration¶
Kayobe configuration options may be changed between releases of Kayobe. Ensure that all site local configuration is migrated to the target version format. See the StackHPC Kayobe Configuration release notes, Kayobe release notes and Kolla Ansible release notes. In particular, the Upgrade Notes and Deprecation Notes sections provide information that might affect the configuration migration.
In the following example we assume a branch naming scheme of
example/<release>.
Create a branch for the new release:
git fetch origin
git checkout example/2023.1
git checkout -b example/2024.1
git push origin example/2024.1
Merge in the new branch of StackHPC Kayobe Configuration:
git remote add stackhpc https://github.com/stackhpc/stackhpc-kayobe-config
git fetch stackhpc
git fetch origin
git checkout -b example/2024.1-sync origin/example/2024.1
git merge stackhpc/stackhpc/2024.1
There may be conflicts to resolve. The configuration should be manually inspected after the merge to ensure that it is correct. Once complete, push the branch and create a pull request with the changes:
git push origin example/2024.1-sync
Once approved and merged, update the configuration to adapt to the new release.
This may involve e.g. adding, removing or renaming variables to allow for
upstream changes. Note that configuration in the base environment
(etc/kayobe/) will be merged with upstream changes, but anything in a
deployment-specific environment directory (etc/kayobe/environments/) may
require manual inspection.
If using the kayobe-env environment file in kayobe-config, this should
also be inspected for changes and modified to suit the local Ansible control
host environment if necessary. When ready, source the environment file:
source kayobe-env
Create one or more pull requests with these changes.
Once the configuration has been migrated, it is possible to view the global variables for all hosts:
kayobe configuration dump
The output of this command is a JSON object mapping hosts to their
configuration. The output of the command may be restricted using the
--host, --hosts, --var-name and --dump-facts options.
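When checking a single setting after migration, the dump can be narrowed to one host and one variable. The sketch below combines the --host and --var-name options mentioned above; the function name dump_host_var is hypothetical, and so are the host and variable names used in the example.

```shell
# Hypothetical helper: dump a single variable for a single host.
# $1 is the host name, $2 is the variable name.
dump_host_var() {
    kayobe configuration dump --host "$1" --var-name "$2"
}
```

For example, dump_host_var controller0 kolla_enable_heat shows whether Heat is still enabled for a host named controller0.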
Upgrading local Kayobe environment¶
The local Kayobe environment should be either recreated or upgraded to use the new release. It may be beneficial to keep a Kayobe environment for the old release in case it is necessary before the upgrade begins.
In general it is safer to rebuild an environment than upgrade, but for completeness the following shows how to upgrade an existing local Kayobe environment.
Change to the Kayobe configuration directory:
cd /path/to/src/kayobe-config
Check the status:
git status
Pull down the new branch:
git checkout example/2024.1
git pull origin example/2024.1
Activate the Kayobe virtual environment:
source /path/to/venvs/kayobe/bin/activate
Reinstall Kayobe and other dependencies:
pip install --force-reinstall -r requirements.txt
Source the kayobe-env script:
source kayobe-env [--environment <env>]
Export the Ansible Vault password:
export KAYOBE_VAULT_PASSWORD=$(cat /path/to/vault/password/file)
Next we must upgrade the Ansible control host. Tasks performed here include:
Install updated Ansible collection and role dependencies from Ansible Galaxy.
Generate an SSH key if necessary and add it to the current user’s authorised keys.
Upgrade Kolla Ansible locally to the configured version.
To upgrade the Ansible control host:
kayobe control host upgrade
Syncing Release Train artifacts¶
New StackHPC Release Train content should be synced to the local Pulp server. This includes host packages (Deb/RPM) and container images.
To sync host packages:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-sync.yml
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-publish.yml
Once the host package content has been tested in a test/staging environment, it may be promoted to production:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-promote-production.yml
To sync container images:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-sync.yml
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-publish.yml
Build locally customised container images¶
Note
The container images provided by StackHPC Release Train are suitable for most deployments. In this case, this step can be skipped.
In some cases it is necessary to build some or all images locally to apply
customisations. In order to do this it is necessary to set
stackhpc_pulp_sync_for_local_container_build to true before
syncing container images.
To build the overcloud images locally and push them to the local Pulp server:
kayobe overcloud container image build --push
It is possible to build a specific set of images by supplying one or more image name regular expressions:
kayobe overcloud container image build --push ironic- nova-api
Pull container images to hosts¶
Pulling container images from the local Pulp server to the control plane hosts can take a considerable time, because images are only synced from Ark to the local Pulp on demand, and there is potentially a large fan-out. Pulling images in advance of the upgrade moves this step out of the maintenance/change window. Consider checking available disk space before pulling:
kayobe overcloud host command run --command "df -h" --show-output --limit controllers[0],compute[0],storage[0]
Then pull the images:
kayobe overcloud container image pull
Preview overcloud service configuration changes¶
Kayobe allows us to generate overcloud service configuration in advance, and compare it with the running configuration. This allows us to check for any unexpected changes.
This can take a significant time, and it may be advisable to limit these commands to one of each type of host (controller, compute, storage, etc.). The following commands use a limit including the first host in each of these groups.
Save the old configuration locally.
kayobe overcloud service configuration save --node-config-dir /etc/kolla --output-dir ~/kolla-diff/old --limit controllers[0],compute[0],storage[0] --exclude ironic-agent.initramfs,ironic-agent.kernel
Generate the new configuration to a tmpdir.
kayobe overcloud service configuration generate --node-config-dir /tmp/kolla --kolla-limit controllers[0],compute[0],storage[0]
Save the new configuration locally.
kayobe overcloud service configuration save --node-config-dir /tmp/kolla --output-dir ~/kolla-diff/new --limit controllers[0],compute[0],storage[0] --exclude ironic-agent.initramfs,ironic-agent.kernel
The old and new configuration will be saved to ~/kolla-diff/old and
~/kolla-diff/new respectively on the Ansible control host.
Fix up the paths:
cd ~/kolla-diff/new
for i in *; do mv "$i/tmp" "$i/etc"; done
cd -
Compare the old & new configuration:
diff -ru ~/kolla-diff/{old,new} > ~/kolla-diff.diff
less ~/kolla-diff.diff
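The full diff can be long, so it may help to start with a per-file summary of what changed. The sketch below uses diff -rq for this; the function name summarise_config_changes is hypothetical, and the two arguments are the old and new directories saved above.

```shell
# Hypothetical helper: print one line per changed, added or removed file
# between two saved configuration trees ($1 = old, $2 = new).
summarise_config_changes() {
    # diff exits 1 when differences exist; treat that as success and only
    # propagate real errors (exit status 2).
    diff -rq "$1" "$2" || [ $? -eq 1 ]
}
```

For example, summarise_config_changes ~/kolla-diff/old ~/kolla-diff/new gives a quick list of the files worth inspecting in the full diff.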
Upgrading the Seed Hypervisor¶
Currently, upgrading the seed hypervisor services is not supported. It may however be necessary to upgrade host packages and some host services.
Consider whether the seed hypervisor needs to be upgraded within or outside of a maintenance/change window.
Upgrading Host Packages¶
Note
In case of issues booting up, consider alternative access methods if the hypervisor is also used as the Ansible control host (or runs it in a VM).
Prior to upgrading the seed hypervisor, it may be desirable to upgrade system packages on the seed hypervisor host.
To update all eligible packages, use *, escaping if necessary:
kayobe seed hypervisor host package update --packages "*"
If the kernel has been upgraded, reboot the seed hypervisor to pick up the change:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l seed-hypervisor
Upgrading Host Services¶
It may be necessary to upgrade some host services:
kayobe seed hypervisor host upgrade
Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.
Configuring hosts¶
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:
kayobe seed hypervisor host configure --check --diff
When ready to apply the changes:
kayobe seed hypervisor host configure
Upgrading the Seed¶
Consider whether the seed needs to be upgraded within or outside of a maintenance/change window.
Upgrading Host Packages¶
Note
In case of issues booting up, consider alternative access methods if the seed is also used as the Ansible control host.
Prior to upgrading the seed, it may be desirable to upgrade system packages on the seed host.
Note that these commands do not affect packages installed in containers, only those installed on the host.
To update all eligible packages, use *, escaping if necessary:
kayobe seed host package update --packages "*"
If the kernel has been upgraded, reboot the seed to pick up the change:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l seed
Verify that Bifrost, Ironic and Inspector are running as expected:
ssh stack@<seed>
sudo docker exec -it bifrost_deploy bash
systemctl
export OS_CLOUD=bifrost
baremetal node list
baremetal introspection list
exit
exit
Building Ironic Deployment Images¶
Note
It is possible to use prebuilt deployment images. In this case, this step can be skipped.
It is possible to use prebuilt deployment images from the OpenStack hosted
tarballs or another
source. In some cases it may be necessary to build images locally either to
apply local image customisation or to use a downstream version of Ironic Python
Agent (IPA). In order to build IPA images, the ipa_build_images variable
should be set to True. To build images locally:
kayobe seed deployment image build
To overwrite existing images, add the --force-rebuild argument.
Upgrading Host Services¶
It may be necessary to upgrade some host services:
kayobe seed host upgrade
Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.
Configuring hosts¶
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:
kayobe seed host configure --check --diff
When ready to apply the changes:
kayobe seed host configure
Building Container Images¶
Note
The container images provided by StackHPC Release Train are suitable for most deployments. In this case, this step can be skipped.
In some cases it is necessary to build some or all images locally to apply
customisations. In order to do this it is necessary to set
stackhpc_pulp_sync_for_local_container_build to true before
syncing container images.
To build the seed images locally and push them to the local Pulp server:
kayobe seed container image build --push
Upgrading Containerised Services¶
Containerised seed services may be upgraded by replacing existing containers with new containers using updated images which have been pulled from the local Pulp registry.
To upgrade the containerised seed services:
kayobe seed service upgrade
Verify that Bifrost, Ironic and Inspector are running as expected:
ssh stack@<seed>
sudo docker exec -it bifrost_deploy bash
systemctl
export OS_CLOUD=bifrost
baremetal node list
baremetal introspection list
exit
exit
Upgrading Wazuh Manager¶
Consider whether Wazuh Manager needs to be upgraded within or outside of a maintenance/change window.
Upgrading Host Packages¶
Prior to upgrading the Wazuh manager services, it may be desirable to upgrade system packages on the Wazuh manager host.
To update all eligible packages, use *, escaping if necessary:
kayobe infra vm host package update --packages "*" -l wazuh-manager
If the kernel has been upgraded, reboot the Wazuh Manager to pick up the change:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l wazuh-manager
Verify that Wazuh Manager is functioning correctly by logging into the Wazuh UI.
Configuring hosts¶
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:
kayobe infra vm host configure --check --diff -l wazuh-manager
When ready to apply the changes:
kayobe infra vm host configure -l wazuh-manager
Upgrade Wazuh Manager services¶
Run the following playbook to update Wazuh Manager services and configuration:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-manager.yml
Verify that Wazuh Manager is functioning correctly by logging into the Wazuh UI.
Upgrading Wazuh Agents¶
Consider whether Wazuh Agents need to be upgraded within or outside of a maintenance/change window.
Upgrade Wazuh Agent services¶
Run the following playbook to update Wazuh Agent services and configuration:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-agent.yml
Verify that the agents have connected to Wazuh Manager correctly by logging into the Wazuh UI.
Upgrading the Overcloud¶
Consider which of the overcloud upgrade steps need to be performed within or outside of a maintenance/change window.
Upgrading Host Packages¶
Prior to upgrading the OpenStack control plane, it may be desirable to upgrade system packages on the overcloud hosts.
Note that these commands do not affect packages installed in containers, only those installed on the host.
In order to avoid downtime, it is important to control how package updates are rolled out. In general, controllers and network hosts should be updated one by one, ideally updating the host with the Virtual IP (VIP) last. For hypervisors it may be possible to update packages in batches of hosts, provided there is sufficient capacity to migrate VMs to other hypervisors.
For each host or batch of hosts, perform the following steps.
If the host is a hypervisor, disable the Nova compute service and drain it of VMs using live migration. If any VMs fail to migrate, they may be cold migrated or powered off:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/nova-compute-{disable,drain}.yml --limit <host>
To update all eligible packages, use *, escaping if necessary:
kayobe overcloud host package update --packages "*" --limit <host>
Note
Due to a security-related change in the GRUB package on Rocky Linux 9, the operating
system can become unbootable (boot will stop at a grub> prompt). Remove
the --root-dev-only option from /boot/efi/EFI/rocky/grub.cfg after
applying package updates. This will happen automatically as a post hook when
running the kayobe overcloud host package update command.
If the kernel has been upgraded, reboot the host or batch of hosts to pick up the change:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l <host>
Warning
Take extra care when updating packages on Ceph hosts. Docker live-restore does not work until the Squid version of Ceph, so a reload of Docker will restart all Ceph containers. Set the hosts to maintenance mode before updating packages, and unset when done:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-enter-maintenance.yml --limit <host>
kayobe overcloud host package update --packages "*" --limit <host>
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l <host>
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-exit-maintenance.yml --limit <host>
Always reconfigure hosts in small batches or one-by-one. Check the Ceph state after each host configuration. Ensure all warnings and errors are resolved before moving on.
If the host is a hypervisor, enable the Nova compute service.
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/nova-compute-enable.yml --limit <host>
If any VMs were powered off, they may now be powered back on.
Wait for Prometheus alerts and errors in OpenSearch Dashboard to resolve, or address them.
After updating controllers or network hosts, run any appropriate smoke tests.
Once happy that the system has been restored to full health, move on to the next host or batch of hosts.
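For controllers and network hosts, the one-by-one rollout could be driven by a small loop that stops for confirmation after each host. This is a sketch under stated assumptions: the function name update_packages_one_by_one is hypothetical, it covers only the package update command shown above, and draining, rebooting and health checks are left to the operator during the pause. It is not suitable for Ceph hosts, which need the maintenance mode steps.

```shell
# Hypothetical helper: update host packages one host at a time.
# Arguments are host names. After each host the operator must type "yes"
# to continue; any other answer, or a failed update, stops the loop.
update_packages_one_by_one() {
    for host in "$@"; do
        kayobe overcloud host package update --packages "*" --limit "$host" || return 1
        # Prompt on stderr so that stdout only carries command output.
        printf 'Host %s updated. Reboot if needed, check health, then type "yes" to continue: ' "$host" >&2
        read -r answer || return 1
        [ "$answer" = "yes" ] || return 1
    done
}
```

For example, update_packages_one_by_one controller0 controller1 controller2, listing the host that holds the VIP last. The host names here are placeholders.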
Upgrading Host Services¶
Prior to upgrading the OpenStack control plane, the overcloud host services should be upgraded:
kayobe overcloud host upgrade
Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.
Configuring hosts¶
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in “check mode” to see what changes would be applied:
kayobe overcloud host configure --check --diff
When ready to apply the changes, it may be advisable to do so in batches, or at least start with a small number of hosts:
kayobe overcloud host configure --limit <host>
Warning
Take extra care when configuring Ceph hosts. Set the hosts to maintenance mode before reconfiguring them, and unset when done:
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-enter-maintenance.yml --limit <host>
kayobe overcloud host configure --limit <host>
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-exit-maintenance.yml --limit <host>
Always reconfigure hosts in small batches or one-by-one. Check the Ceph state after each host configuration. Ensure all warnings and errors are resolved before moving on.
Building Ironic Deployment Images¶
Note
It is possible to use prebuilt deployment images. In this case, this step can be skipped.
It is possible to use prebuilt deployment images from the OpenStack hosted
tarballs or another
source. In some cases it may be necessary to build images locally either to
apply local image customisation or to use a downstream version of Ironic Python
Agent (IPA). In order to build IPA images, the ipa_build_images variable
should be set to True. To build images locally:
kayobe overcloud deployment image build
To overwrite existing images, add the --force-rebuild argument.
Upgrading Ironic Deployment Images¶
Prior to upgrading the OpenStack control plane you should upgrade
the deployment images. If you are using prebuilt images, update
the following variables in etc/kayobe/ipa.yml accordingly:
ipa_kernel_upstream_url
ipa_kernel_checksum_url
ipa_kernel_checksum_algorithm
ipa_ramdisk_upstream_url
ipa_ramdisk_checksum_url
ipa_ramdisk_checksum_algorithm
Alternatively, you can update the files that the URLs point to. If building the images locally, follow the process outlined in Building Ironic Deployment Images.
To get Ironic to use an updated set of overcloud deployment images, you can run:
kayobe baremetal compute update deployment image
This will register the images in Glance and update the deploy_ramdisk
and deploy_kernel properties of the Ironic nodes.
Before rolling out the update to all nodes, it can be useful to test the image
on a limited subset. To do this, you can use the --baremetal-compute-limit
option. The argument should take the form of an ansible host pattern
which is matched against the Ironic node name.
Upgrading Containerised Services¶
Containerised control plane services may be upgraded by replacing existing containers with new containers using updated images which have been pulled from a registry or built locally.
If using overcloud Ironic, check whether any ironic nodes are in a wait state:
baremetal node list | grep wait
This will block the upgrade, but may be overridden by setting
ironic_upgrade_skip_wait_check to true in
etc/kayobe/kolla/globals.yml or
etc/kayobe/environments/<env>/kolla/globals.yml.
To upgrade the containerised control plane services:
kayobe overcloud service upgrade
It is possible to specify tags for Kayobe and/or kolla-ansible to restrict the scope of the upgrade:
kayobe overcloud service upgrade --tags config --kolla-tags keystone
Updating the Octavia Amphora Image¶
If using Octavia with the Amphora driver, you should update the amphora image.
Testing¶
At this point it is recommended to perform a thorough test of the system to catch any unexpected issues. This may include:
Check Prometheus, OpenSearch Dashboards and Grafana
Smoke tests
All applicable tempest tests
Horizon UI inspection
Cleaning up¶
Prune unused container images:
kayobe overcloud host command run -b --command "docker image prune -a -f"