2023.1 Antelope Series Release Notes

stackhpc/14.0.0.274-16

Security Issues

  • Fixes CVE-2026-42998, CVE-2026-42999, CVE-2026-43000, CVE-2026-43001 and CVE-2026-44394 with updated Keystone images.

stackhpc/14.0.0.274

Security Issues

stackhpc/14.0.0.259

Security Issues

  • Security fixes for bug 2119646: Unauthenticated access to EC2/S3 token endpoints can grant Keystone authorization.

stackhpc/14.0.0.258

Bug Fixes

stackhpc/14.0.0.247

Bug Fixes

  • Updated Neutron container image tags to fix CVE-2024-53916. See #2037002 for more details.

stackhpc/14.0.0.246

Bug Fixes

  • Reverts “Track all interfaces in Keepalived” so only HA interfaces are tracked. This prevents L3 HA router flapping when detaching floating IP addresses, because non-HA router interfaces did not include “no_track”. Closes-Bug: #2097770

stackhpc/14.0.0.245

Bug Fixes

  • Fix some broken links in the docs.

stackhpc/14.0.0.244

New Features

  • Adds a workflow to update the Kolla dependencies (Kayobe, Kolla and Kolla-Ansible) to the latest tag available in the StackHPC branch via CI.

stackhpc/14.0.0.243

Bug Fixes

  • Changed the Prometheus job name of the OS Capacity exporter to os_capacity, which is the name Azimuth expects for its cloud metrics dashboard.

stackhpc/14.0.0.239

Bug Fixes

  • Updates Neutron container images to apply a keepalived PID clean-up fix. With this change, Neutron always deletes a stale PID file if one exists.

stackhpc/14.0.0.238

New Features

  • Added a new playbook pulp_sync_publish_promote that can be used to sync, publish and promote all repositories in a single step, as well as sync and publish container repos. If you do not want to promote repos then run with -e repo_promote_production=false.

stackhpc/14.0.0.237

Bug Fixes

  • Updates the nova-compute container image to fix bug 2091033. This bug would cause nova-compute to freeze, which would result in frequent monitoring alerts.

stackhpc/14.0.0.236

Bug Fixes

  • Fixed an issue with the prometheus.yml template which would break when deploying alertmanager.

stackhpc/14.0.0.234

New Features

  • Updates the StackHPC Cephadm Ansible collection from 1.18.0 to 1.19.1.

stackhpc/14.0.0.229

Bug Fixes

  • Fixes an issue where the Squid proxy could be unable to reach external servers because it preferred IPv6 connectivity by default.

stackhpc/14.0.0.227

Bug Fixes

  • Updates Cinder container images to fix bug 1823445 (cinder.exception.MetadataCopyFailure).

stackhpc/14.0.0.223

Bug Fixes

  • OVN packages in Rocky Linux 9 container images have been updated to the latest minor release in the 24.03 series: ovn24.03-24.03.2-34. Neutron container images for Rocky Linux 9 have also been rebuilt.

stackhpc/14.0.0.218

New Features

  • Configures the Ironic Python Agent with useful settings for inspection, such as the extra-hardware and mellanox elements.

stackhpc/14.0.0.217

New Features

  • Use the StackHPC fork for building Blazar images with customizations to support flavor-based reservation.

stackhpc/14.0.0.216

New Features

  • The OpenStack Dashboard in Grafana now includes logs from OpenStack services.

stackhpc/14.0.0.215

Bug Fixes

  • The CIS hardening scripts no longer change permissions of log files by default. It is preferred to configure these permissions at source, i.e. on whatever is creating the files. The previous behaviour also suffered from a time-of-check to time-of-use race condition. If you want the old behaviour, you can change rhel9cis_rule_4_2_3 and/or ubtu22cis_rule_4_2_3 to true.

stackhpc/14.0.0.212

Bug Fixes

  • Changes the duration for which redfish exporter must continually fail scrapes before triggering an alert to 15 minutes. This should hopefully reduce some alert spam.
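
The "must continually fail for 15 minutes" behaviour described above can be illustrated with a small Python sketch. This is only a simplified model of a duration-gated alert, similar in spirit to the for clause of a Prometheus alerting rule; the function name and the sample format are hypothetical and are not taken from the actual alert definition.

```python
from typing import List, Tuple


def should_alert(samples: List[Tuple[int, bool]], for_seconds: int = 15 * 60) -> bool:
    """Return True if scrapes have failed continuously for at least for_seconds.

    samples is a time-ordered list of (timestamp_seconds, scrape_succeeded).
    Any successful scrape resets the failure window.
    """
    failing_since = None
    for timestamp, succeeded in samples:
        if succeeded:
            failing_since = None  # a good scrape clears the pending alert
        elif failing_since is None:
            failing_since = timestamp  # first failure of a new window
    if failing_since is None:
        return False
    return samples[-1][0] - failing_since >= for_seconds


# Hypothetical data: one scrape per minute, all failing for 15 minutes.
always_failing = [(t, False) for t in range(0, 901, 60)]
# Same data, but a single successful scrape at the 10 minute mark.
one_success = [(t, t == 600) for t in range(0, 901, 60)]
```

With these sample inputs the first series triggers the alert, while the second does not because the successful scrape restarts the 15 minute window.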

stackhpc/14.0.0.211

Bug Fixes

  • Fixes an issue where setting redfish_exporter_scrape_group to a value other than overcloud would exclude those nodes from the redfish exporter scrapes.

stackhpc/14.0.0.207

Security Issues

stackhpc/14.0.0.205

New Features

  • Upgrades kayobe-automation submodule to 7676aa8.

    Upgrades kayobe-workflows collection to v1.1.0.

    Kayobe-automation config-diff now runs in parallel and generates both the old and new configuration at the same time. This should improve config-diff wait times.

    Add support for the pulp-sync-content run book.

Deprecation Notes

  • Kayobe-automation now automatically detects vaulted files for the purpose of config-diff. Therefore, KAYOBE_CONFIG_SECRET_PATHS_EXTRA and KAYOBE_CONFIG_VAULTED_FILES_PATHS_EXTRA are no longer used.

Security Issues

  • The upgraded kayobe-workflows collection increases the version of various Actions and containers used within GitHub based workflows, including increasing Docker in Docker to version 27.3.1, thus removing the vulnerabilities present in 24.0-git.

stackhpc/14.0.0.204

Bug Fixes

  • Fixes creation and failover of Octavia TLS-terminated load balancers when storing the certificate and key as a PKCS12 bundle in Barbican.

stackhpc/14.0.0.196

Bug Fixes

  • Fixes a file descriptor leak in networking-mlnx which prevented VMs using Infiniband virtual functions from provisioning after a period of time.

  • Fixes KeyError: ip_version in networking-mlnx when used in conjunction with the OVN mechanism driver.

stackhpc/14.0.0.195

New Features

  • A default firewall configuration is now included on an opt-in basis. The rules are defined under etc/kayobe/inventory/group_vars/all/firewall. More information can be found here

stackhpc/14.0.0.194

New Features

  • The default Tempest concurrency has been increased from 2 to 16. This is often easily achievable in production systems.

stackhpc/14.0.0.193

Upgrade Notes

stackhpc/14.0.0.192

Bug Fixes

  • Bumps Neutron container image tags to fix bug 2068644 which could prevent associating floating IPs with OVN-based load balancers.

stackhpc/14.0.0.190

New Features

  • Adds the networking-mlnx mechanism driver to the Neutron Server container and ebrctl utility to the Nova Compute container. This allows you to use the kolla_enable_neutron_mlnx feature flag.

stackhpc/14.0.0.185

Bug Fixes

  • Fixes a regression when using growroot.yml and software raid where the playbook would fail to identify the correct disk.

stackhpc/14.0.0.178

Security Issues

  • Fixes CVE-2024-44082 with updated container images for Ironic services. Note that Ironic Python Agent images also need to be updated to fully fix this vulnerability. If this is not possible, a new configuration option [conductor]conductor_always_validates_images is available. See the OSSA-2024-003 description for more details.

stackhpc/14.0.0.174

New Features

  • Adds a playbook to install pre-commit hooks and register them with git. The hooks currently configured check YAML syntax, fix the newline at end of file and remove excess whitespace. This is currently opt-in: run the install-pre-commit-hooks playbook to enable it.

stackhpc/14.0.0.172

New Features

stackhpc/14.0.0.166

New Features

stackhpc/14.0.0.162

New Features

  • Added a script to automate RabbitMQ quorum queue migrations.

stackhpc/14.0.0.161

New Features

  • Adds two new custom playbooks for placing Ceph hosts into and removing them from maintenance:

    • ceph-enter-maintenance.yml

    • ceph-exit-maintenance.yml

Upgrade Notes

  • Updates the stackhpc.cephadm collection to version 1.18.0.

Bug Fixes

  • Fixes an issue with idempotency in the stackhpc.ceph.cephadm_keys plugin.

stackhpc/14.0.0.156

New Features

  • OVN version in Rocky Linux 9 container images has been updated to 24.03 (latest LTS).

stackhpc/14.0.0.155

Bug Fixes

  • Fixes the issue with interface names containing dashes in Hashicorp collection.

stackhpc/14.0.0.147

Bug Fixes

  • Updates Octavia container images to fix a maintenance task that was breaking OVN IPv4 load balancers with health monitors. LP#2072754.

stackhpc/14.0.0.146

New Features

  • Added a new group variable - stackhpc_repos_enabled - for unified control over usage of StackHPC Release Train package repositories. This makes it easier to set which hosts do or do not pull packages from release train.

stackhpc/14.0.0.143

Critical Issues

  • Fixes CVE-2024-40767 with updated container images for Nova services.

stackhpc/14.0.0.142

New Features

  • Added support for Rocky Linux 9.4 repositories and Kolla containers. Made 9.4 the default version for Rocky Linux.

  • Updated Rocky Linux 9.3 pulp repo versions. Added Rocky Linux 9.4 pulp repo versions. Rebuilt Kolla containers with Rocky Linux 9.4.

stackhpc/14.0.0.139

New Features

  • The Docker CE package for Ubuntu has been bumped from 5:24.0.6-1 to 5:25.0.0-1. This is a side effect of separating out the repos for Docker CE for Ubuntu Jammy/Focal.

Critical Issues

  • Disables password expiration and inactivity policies. This caused the kayobe and kolla service accounts to be locked out of the system. You should re-apply the CIS benchmark hardening playbook as soon as possible to avoid being locked out of your system.

Bug Fixes

  • Separated out repos for Docker CE for Ubuntu Jammy/Focal. This fixes a Pulp sync issue where two “identical” repository versions existed with different checksums.

stackhpc/14.0.0.126

New Features

  • Adds support for deploying a Prometheus Redfish exporter container on the seed. This can be used to query the overcloud BMCs via their redfish interfaces to produce various metrics relating to the hardware, and system health.

stackhpc/14.0.0.125

New Features

  • Adds a hook to automatically run the CIS benchmark hardening playbooks as part of host configure. This is guarded by the stackhpc_enable_cis_benchmark_hardening_hook configuration option and is disabled by default.

stackhpc/14.0.0.124

Security Issues

  • Adds a custom Apt repository to address CVE-2024-6387 in OpenSSH.

stackhpc/14.0.0.122

Upgrade Notes

  • To match the new CIS benchmark defaults on Ubuntu, you should remove the ipv6.disable=1 kernel command line option. If you wish to carry on with the current settings, change ubtu22cis_ipv6_required to false.

Bug Fixes

  • IPv6 is no longer disabled by default in the Ubuntu CIS hardening. If you keep the old behaviour, you may hit bug 2071443.

stackhpc/14.0.0.118

Security Issues

  • Updates the Rocky Linux 9 SIG Security Common repository to address CVE-2024-6409 in OpenSSH.

stackhpc/14.0.0.117

Bug Fixes

  • Fixed incorrect Opensearch Dashboards Prometheus Blackbox Exporter configuration.

stackhpc/14.0.0.114

New Features

  • Adds a new Prometheus alert FluentdBufferTooLarge which is raised when the total size of queue buffers grows above 128 MiB.

stackhpc/14.0.0.113

Security Issues

  • Enables the Rocky Linux 9 SIG Security Common repository, which provides updated OpenSSH packages addressing CVE-2024-6387 (regreSSHion). Other packages available in this repository are currently ignored.

stackhpc/14.0.0.110

Security Issues

  • Addresses critical vulnerability CVE-2024-36039 by bumping the PyMySQL library to 1.1.1 in all affected Kolla images. This vulnerability allows SQL injection through untrusted JSON objects.

stackhpc/14.0.0.107

Critical Issues

  • Fixes CVE-2024-32498 with updated container images for Cinder, Glance and Nova services.

stackhpc/14.0.0.106

New Features

  • Added a templated set of default Prometheus Blackbox exporter endpoints.

stackhpc/14.0.0.102

New Features

  • Per-OSD usage metrics are now available in the OSDs dashboard. The dashboard now includes a new section that displays a histogram of the utilization of each OSD in the cluster. This can be useful for identifying OSDs that are outliers in terms of utilization, and may need to be rebalanced. Additionally, there is a histogram displaying the usage of the bluestoreDB for each OSD.

stackhpc/14.0.0.100

New Features

  • Adds a new diagnostics.yml playbook that collects diagnostic information from hosts. The diagnostics are aggregated to a directory ($PWD/diagnostics/ by default) on localhost. The diagnostics include:

    • Docker container logs

    • Kolla configuration files

    • Log files

    The collected diagnostic information contains sensitive information such as passwords in configuration files.

stackhpc/14.0.0.98

New Features

  • Adds a new stackhpc-openstack-tests.yml playbook that executes tests in the StackHPC OpenStack Tests repository. Both the playbook and tests are currently experimental, and are currently targeting only an all-in-one CI use case.

stackhpc/14.0.0.96

Bug Fixes

  • Fixes an issue where HashiCorp Vault standby nodes would trigger a Prometheus alert. To apply this fix to an existing system, the HAProxy configuration for Vault (kolla/config/haproxy/services.d/vault.cfg) must be manually updated following the Vault documentation.

  • Updates the stackhpc.hashicorp Ansible collection to 2.5.0. This brings in an idempotency fix for generating certificates.

  • The overcloud HashiCorp Vault playbooks have been modified to use the local Vault service rather than via HAProxy. This makes it possible to deploy and use Vault without HAProxy. This eliminates the previous bootstrapping issue where HAProxy needed to be deployed without TLS enabled while generating initial certificates.

stackhpc/14.0.0.94

New Features

  • Added two alerts (warning and critical) that are triggered when the ratio of (free_swap_space / total_swap_space) is below thresholds. Each threshold can be modified by altering the value of alertmanager_node_free_swap_warning_threshold_ratio and alertmanager_node_free_swap_critical_threshold_ratio.

    Currently this solution is limited to a one-size-fits-all policy. This can cause unwanted alerts for hosts which utilise swap heavily. It is therefore recommended to tune the thresholds or apply silence rules as needed.

  • Bumped Horizon kolla image. Bumped Grafana from 10.1.5-1 to 10.4.2-1 (CentOS & Rocky Linux). Bumped Grafana from 10.4.1 to 10.4.2 (Ubuntu). Bumped Prometheus-msteams from 1.5.0 to 1.5.2.

  • Adds support for providing a CA certificate for OpenStack Capacity exporter.

  • Allows synchronising a custom list of containers to Pulp using the stackhpc_pulp_repository_container_repos_extra and stackhpc_pulp_distribution_container_extra variables.

  • Bumped Horizon kolla image. Bumped Grafana from 10.1.5-1 to 10.4.2-1 (Rocky Linux). Bumped Grafana from 10.4.1 to 10.4.2 (Ubuntu). Bumped Prometheus-msteams from 1.5.1 to 1.5.2.
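
The free-swap alerts in this release compare the ratio free_swap_space / total_swap_space against a warning and a critical threshold. A minimal Python sketch of that comparison follows. The function name is hypothetical, the threshold values used in the example are made up rather than the shipped defaults, and treating a host without swap as "no alert" is an assumption of this sketch.

```python
from typing import Optional


def free_swap_severity(
    free_swap_bytes: int,
    total_swap_bytes: int,
    warning_threshold_ratio: float,
    critical_threshold_ratio: float,
) -> Optional[str]:
    """Return "critical", "warning" or None for a host's swap usage."""
    if total_swap_bytes <= 0:
        # Assumption: hosts with no swap configured never alert.
        return None
    ratio = free_swap_bytes / total_swap_bytes
    if ratio < critical_threshold_ratio:
        return "critical"
    if ratio < warning_threshold_ratio:
        return "warning"
    return None


# Illustrative thresholds only; the real values come from
# alertmanager_node_free_swap_warning_threshold_ratio and
# alertmanager_node_free_swap_critical_threshold_ratio.
WARNING, CRITICAL = 0.25, 0.10
```

A host that uses swap heavily by design would sit permanently below the warning ratio in this model, which is why the note recommends tuning the thresholds or silencing the alert for such hosts.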

Security Issues

  • Fixed CVE-2023-31047 for Horizon. Fixed CVE-2023-49569 for Grafana. Fixed CVE-2022-40083 and CVE-2021-4238 for Prometheus-msteams.

stackhpc/14.0.0.92

New Features

  • Updates Magnum CAPI Helm driver version to OpenDev v1.0.0

stackhpc/14.0.0.87

Known Issues

  • Generate backend TLS files for network hosts. This fixes backend TLS configuration for deployments where some API services are running on network hosts.

Bug Fixes

  • Prevents raising a Ceph PgsUnclean alert because of backfilling which can frequently happen because of normal rebalancing activities, such as use of the Ceph balancer or OSD addition.

stackhpc/14.0.0.83

New Features

  • Add optional support for relabelling network devices in Prometheus. Use network names as defined in kayobe, instead of network device names. Reuse of device names within an environment is not supported.

stackhpc/14.0.0.81

New Features

  • Bumped pulp repo versions for Q2 2024.

    Bumped Kolla image tags for Q2 2024.

    Bumped the following component versions:

    • prometheus server from 2.38.0 to 2.51.1

    • prometheus alertmanager from 0.24.0 to 0.26.0

    • prometheus blackbox exporter from 0.23.0 to 0.25.0

    • prometheus cadvisor exporter from 0.48.0 to 0.49.1

    • prometheus haproxy exporter from 0.13.0 to 0.15.0

    • prometheus memcached exporter from 0.10.0 to 0.14.3

    • prometheus msteams from 1.5.1 to 1.5.2

    • prometheus mtail from 3.0.0-rc50 to 3.0.0-rc53

    • prometheus mysqld exporter from 0.15.0 to 0.15.1

    • prometheus node exporter from 1.4.0 to 1.7.0

    • prometheus openstack exporter from 1.6.0 to 1.7.0

    • prometheus ovn exporter from 1.0.6 to 1.0.7

    • opensearch from 2.11.1-1 to 2.13.0-1 (Rocky Linux 9)

    • opensearch from 2.12.0 to 2.13.0 (Ubuntu Jammy)

    • grafana from 10.1.5-1 to 10.4.2-1 (Rocky Linux 9)

    • grafana from 10.4.0 to 10.4.2 (Ubuntu Jammy)

Security Issues

  • Fixed CVE-2023-31047, CVE-2023-23969, CVE-2023-24580, CVE-2023-36053, CVE-2023-46695, CVE-2023-30861, CVE-2022-4899, CVE-2024-1135, GHSA-2m57-hf25-phgg, CVE-2023-0286, CVE-2023-50782, CVE-2024-26130 for OpenStack services.

    Fixed CVE-2022-41723, CVE-2023-39325 (except for prometheus-alertmanager, prometheus-msteams-exporter, prometheus-haproxy-exporter and prometheus-openstack-exporter, for which no patch is available), CVE-2021-43565, CVE-2022-27191, CVE-2022-27664, CVE-2021-38561, CVE-2022-21698, CVE-2021-4238, CVE-2022-40083, CVE-2022-41721, CVE-2021-33194, CVE-2023-2253, CVE-2023-27561, CVE-2023-28840, CVE-2024-21626, CVE-2022-32149, CVE-2023-45142, GHSA-m425-mq94-257g for prometheus server and exporters, except prometheus-libvirt-exporter and prometheus-haproxy-exporter (the source repositories of both are archived and no longer maintained).

    Fixed CVE-2023-39325, CVE-2023-45142, CVE-2023-47108, CVE-2023-49568, CVE-2023-49569, GHSA-9763-4f94-gfch, GHSA-m425-mq94-257g for grafana.

    It is advised to redeploy these services with the current version of images from StackHPC Release Train.

stackhpc/14.0.0.80

New Features

  • Supports adding CA certificates to the Tempest container trust store.

stackhpc/14.0.0.79

Bug Fixes

  • The OpenSearch backend for CloudKitty has been fixed, so the Horizon Rating panels work again.

stackhpc/14.0.0.78

New Features

  • Adds a new Prometheus alert HostNetworkBondDegraded which will be raised when at least one bond member is down.

  • Adds a new Prometheus alert HostNetworkBondSingleLink which will be raised when a bond is configured with only one member. This can happen when NetworkManager detects that a bond member is down at boot time. This alert can be disabled by setting alertmanager_warn_network_bond_single_link to false.
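
The two bond alerts above cover different conditions: a member that is down, and a bond that was configured with only one member in the first place. The following Python sketch shows the distinction. It is an illustration only; the real alerts are Prometheus rules, and the function and parameter names here are hypothetical apart from the alert names and the alertmanager_warn_network_bond_single_link setting mentioned in the notes.

```python
from typing import List


def bond_alerts(
    configured_members: int,
    members_up: int,
    warn_single_link: bool = True,
) -> List[str]:
    """Return the names of the bond alerts that would be raised."""
    alerts = []
    if members_up < configured_members:
        # At least one bond member is down.
        alerts.append("HostNetworkBondDegraded")
    if warn_single_link and configured_members == 1:
        # The bond only has one member, e.g. because the other member
        # was down when NetworkManager set up the bond at boot time.
        alerts.append("HostNetworkBondSingleLink")
    return alerts
```

Passing warn_single_link=False mirrors setting alertmanager_warn_network_bond_single_link to false, which suppresses only the single-link alert.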

stackhpc/14.0.0.77

Bug Fixes

  • Adds a custom fix-houston.yml playbook to address dmesg errors, specifically: “tc mirred to Houston: device bond0-ovs is down”. This error typically appears when OVS HW offloading is enabled, often in conjunction with VF-LAG and ASAP^2. Detailed usage instructions are provided within the playbook’s comments. Additional context is available at the following links: LP#1899364 Kernel Patch

stackhpc/14.0.0.76

Bug Fixes

  • Fixes appending to ca.crt in make-cert-client.sh, which caused multiple identical CA certificates to be added to /etc/kubernetes/certs/ca.crt.

stackhpc/14.0.0.73

Security Issues

  • Update Horizon on Ubuntu to include apache2 package 2.4.52-1ubuntu4.8 which fixes CVE-2023-31122.

stackhpc/14.0.0.72

New Features

  • Updates Magnum CAPI Helm driver version to v0.13.0

stackhpc/14.0.0.71

New Features

  • Updates Magnum CAPI Helm driver version to v0.11.0

  • Adds automatic deployment of OpenStack Capacity via a Kayobe service deploy hook using kolla admin credentials.

Upgrade Notes

  • Updates the Ansible configuration to fail on any unparsed inventory source. If you are using a separate Ansible configuration for Kolla Ansible, you may wish to add this setting in etc/kayobe/kolla/ansible.cfg.

  • OpenStack Capacity no longer uses application credentials. Please delete any previously generated application credentials.

stackhpc/14.0.0.69

Upgrade Notes

  • Ensure that your deployment has only one nova-compute-ironic service running per conductor group. See the operations / nova-compute-ironic doc for further details.

Bug Fixes

  • Adds basic support and a document explaining how to migrate to a single nova-compute-ironic instance, and how to re-deploy the instance to another machine in the event of failure. See the operations / nova-compute-ironic doc for further details.

stackhpc/14.0.0.68

Security Issues

  • The Heat container images are rebuilt with yaql 3.0.0 to include patch for vulnerability OSSN/OSSN-0093. It is recommended that you redeploy Heat services in your system with the current version of Heat images from StackHPC Release Train.

stackhpc/14.0.0.67

New Features

  • Updates Magnum CAPI Helm driver version to v0.12.0

stackhpc/14.0.0.66

Security Issues

  • Kolla container images created using the stackhpc-container-image-build.yml workflow are now automatically scanned for vulnerabilities.

stackhpc/14.0.0.64

Bug Fixes

  • The grafana image now includes the gnocchixyz-gnocchi-datasource and the grafana-opensearch-datasource plugins, which are the default upstream plugins.

stackhpc/14.0.0.62

Upgrade Notes

  • Updates Magnum CAPI Helm driver version to v0.11.0

stackhpc/14.0.0.59

Bug Fixes

  • Fix an issue with the OSD summary pie chart not showing any data.

stackhpc/14.0.0.55

Bug Fixes

  • Updates Magnum CAPI Helm driver version to v0.10.0

  • Fixes Grafana panel of top Ceph pools by capacity used. This panel was only showing the most used pool instead of as many pools as configured with the $topk variable.

stackhpc/14.0.0.54

New Features

  • The smartmon-tools playbook now ensures that the cron service is running as in some cases it may not be running by default.

Upgrade Notes

  • Update Ubuntu Jammy Zed Kolla container tags.

stackhpc/14.0.0.51

New Features

  • Adds alerts for software raid failures.

stackhpc/14.0.0.49

New Features

  • Adds a custom playbook (pulp-auth-proxy.yml) for deploying an authenticating proxy for Pulp. This can be used when building container images to avoid leaking credentials for package repositories into the built images or their metadata.

stackhpc/14.0.0.42

New Features

  • Rocky images have been rebuilt and are now based on Rocky 9.3.

stackhpc/14.0.0.40

New Features

  • Adds NVMe and S.M.A.R.T utilities to the overcloud host image built by DIB.

stackhpc/14.0.0.39

Bug Fixes

  • Removes bogus ContainerVolumeUsage alert. This rule wasn’t correctly measuring container volume IO and could cause spurious alerts.

  • Add a new reset-bls-entries.yml custom playbook which will rename existing Boot Loader Specification (BLS) entries using the current machine ID for each host. This should fix an issue with Grub not selecting the most recent kernel during boot.

stackhpc/14.0.0.36

New Features

  • Added support for Rocky Linux 9.3 repositories and Kolla containers. Made 9.3 the default version for Rocky Linux.

  • Updated Rocky Linux 9.2 pulp repo versions. Added Rocky Linux 9.3 pulp repo versions. Rebuilt Kolla containers with Rocky Linux 9.3.

Upgrade Notes

  • Bifrost Ironic debug logging is now disabled by default. Change ironic_debug to true to revert.

  • Updates Consul to 1.16.4 and Vault to 1.14.8.

Bug Fixes

  • Bumps OpenSearch heap size to 8 GB, to be identical to Elasticsearch.

stackhpc/14.0.0.31

New Features

  • StackHPC Kayobe Configuration container images for CI/CD with Kayobe Automation are now published to GitHub Container Registry (GHCR) at ghcr.io/stackhpc/stackhpc-kayobe-config. The image is tagged with the name of the release branch, e.g. stackhpc/yoga.

stackhpc/14.0.0.30

Bug Fixes

  • Previously, switchdev capabilities had to be configured manually by a user with admin privileges using the port’s binding profile. This blocked regular users from managing ports with Open vSwitch hardware offloading, because giving non-admin users write access to a port’s binding profile introduces security risks. For example, a binding profile may contain a pci_slot definition, which denotes the host PCI address of the device attached to the VM. A malicious user could use this parameter to pass through any host device to a guest, so in many scenarios it is impossible to give regular users write access to a binding profile.

    This patch fixes this situation by translating VF capabilities reported by Libvirt to Neutron port binding profiles. Other VF capabilities are translated as well for possible future use. LP#2008238. LP#2020813.

  • The Neutron OVN DB sync operation no longer removes OVN metadata ports in networks with Octavia OVN load balancer health monitors. A maintenance task has been added to update the existing OVN LB HM ports to the new behaviour. Specifically, the “device_owner” field will be updated from network:distributed to ovn-lb-hm:distributed. Additionally, the “device_id” will be populated during the update action. LP#2038091.

stackhpc/14.0.0.26

Bug Fixes

stackhpc/14.0.0.25

New Features

  • Adds support for deploying GitHub runners and creating GitHub workflows for use within Kayobe Automation. Two playbooks and their requirements have been added to ansible/ in addition to the relevant groups defined with some useful default variables where appropriate. Finally, documentation has been added to cover how to deploy these runners and workflows.

  • Added the stop-openstack-services.yml playbook, which can be used to stop OpenStack services across the overcloud.

Bug Fixes

  • Pin the OCI image tag used for the Ubuntu Focal base-image of Kolla image builds. This prevents packages in the image with the latest tag getting in front of StackHPC release-train package repositories. The Ubuntu tag should be bumped when new packages are available in StackHPC release-train.

stackhpc/14.0.0.22

New Features

  • Updates OpenSearch to 2.11.1.

stackhpc/14.0.0.21

Bug Fixes

  • Pin the OCI image tag used for the base-image of Rocky 9 Kolla image builds. This prevents packages in the image with the latest tag getting in front of StackHPC release-train package repositories.

stackhpc/14.0.0.20

New Features

  • Added the rekey-hosts.yml playbook to automatically rotate the SSH keys on all hosts.

  • Adds support for Ubuntu Jammy and Rocky 9 to the CIS benchmark hardening playbook: cis.yml. This playbook will need to be manually applied.

  • Adds a panel in the Hardware Overview dashboard to show DWPD (Drive writes per day) for NVMEs. This is calculated by dividing the total bytes written in the past 24 hours by the drive capacity. This is currently only supported on NVMEs.

  • Adds alerts that will fire after 1 DWPD is sustained for 7 days, and a critical alert if 1 DWPD is sustained for 30 days.
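
The DWPD figure used by the dashboard panel and alerts above is the total bytes written in the past 24 hours divided by the drive capacity. A rough Python sketch of that calculation and of the 7 day and 30 day windows is given below. The function names are hypothetical, the severity label "warning" for the 7 day alert is an assumption (the notes only name the 30 day alert as critical), and whether the real rules use "at least" or "more than" 1 DWPD is not stated.

```python
from typing import List, Optional


def dwpd(bytes_written_24h: int, capacity_bytes: int) -> float:
    """Drive writes per day: bytes written in the past 24 hours / capacity."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return bytes_written_24h / capacity_bytes


def dwpd_severity(daily_dwpd: List[float], threshold: float = 1.0) -> Optional[str]:
    """Classify a history of daily DWPD values, ordered oldest first."""

    def sustained(days: int) -> bool:
        # True only if we have a full window and every day met the threshold.
        window = daily_dwpd[-days:]
        return len(window) == days and all(value >= threshold for value in window)

    if sustained(30):
        return "critical"
    if sustained(7):
        return "warning"  # assumed label for the non-critical alert
    return None
```

For example, a 1 TB drive that absorbed 2 TB of writes in the last day has a DWPD of 2.0; only if that rate persists every day for a week would the first alert fire in this model.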

Bug Fixes

  • Fixes display of the OpenSearch cluster health in Grafana when in yellow state.

  • Fix Grafana HAProxy dashboard when non-default Prometheus instance labels are used.

stackhpc/14.0.0.17

New Features

  • Neutron containers are now built from our StackHPC fork.

Upgrade Notes

  • Updates default Ceph images to v17.2.7 for Quincy.

  • Updates Consul to 1.16.3 and Vault to 1.14.6.

Bug Fixes

  • Fixes the bulk API of CloudKitty so that it now supports the migration from Elasticsearch to OpenSearch.

  • Fixes an issue with the growroot playbook where disks such as ‘sdp’ would become ‘sd’ due to the removal of the trailing ‘p’ when dealing with nvme devices.

  • Fixes Neutron so that load balancer FIPs are not broken on Neutron restart. See Neutron bug report.

  • Fixes an issue where Netmiko devices were sending no commands to the switch, because plug_bond_to_network is overridden in networking_generic_switch/devices/netmiko_devices/__init__.py and PLUG_BOND_TO_NETWORK is set to None. See NGS bug report.

  • Restores valid value for the flavor_id label on openstack_nova_server_status Prometheus metrics.
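
The growroot fix listed above concerns deriving a disk name from a partition name: NVMe partitions use a "p" separator (nvme0n1p1), so the separator has to be removed, but a disk such as sdp must keep its final letter. The Python sketch below illustrates the pitfall and one way to avoid it. It is not the playbook's actual implementation, which is written in Ansible; both function names are hypothetical.

```python
import re


def naive_parent_disk(partition: str) -> str:
    """Buggy approach: drop the partition number, then any trailing 'p'."""
    return partition.rstrip("0123456789").rstrip("p")


def parent_disk(partition: str) -> str:
    """Only treat 'p' as a separator when the disk name ends in a digit."""
    match = re.fullmatch(r"(.*\d)p\d+", partition)
    if match:
        # e.g. nvme0n1p2 -> nvme0n1
        return match.group(1)
    # e.g. sdp1 -> sdp, sda3 -> sda
    return re.sub(r"\d+$", "", partition)
```

Here naive_parent_disk("sdp1") yields "sd", reproducing the kind of failure described in the note, while parent_disk("sdp1") yields "sdp" and still maps "nvme0n1p2" to "nvme0n1".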

stackhpc/14.0.0.16

New Features

  • Adds kolla config merging options to the Kolla custom config generation section of etc/kayobe/kolla.yml.

Upgrade Notes

  • Kolla config merging is enabled by default in the Antelope release of Kayobe. This was quite an extensive change and whilst backwards compatibility was one of the goals, there may be some situations where refactoring of your Kolla config will be necessary. Extra care should be taken if you are using the multiple environments feature. It is recommended that you carefully check the diff in the resultant Kolla configuration by following these steps to check for missing config or duplicated config options. The kolla_openstack_custom_config_environment_merging_enabled option can be set to False to revert to the old behaviour.

stackhpc/14.0.0.15

New Features

  • The Cephadm pre and post commands now support default commands with the variables cephadm_commands_pre_default and cephadm_commands_post_default. As such, any extra commands should be added to the variables cephadm_commands_pre_extra and cephadm_commands_post_extra.

  • Rocky Linux 9 image has been rebuilt with missing base packages (e.g. microcode_ctl) by installing ‘Minimal Install’ DNF group. Also cloud-init from CentOS 9 Stream has been installed with NetworkManager support.

Bug Fixes

  • Fixes an issue when live migrating instances to hosts with cgroups v2 enabled (Ubuntu Jammy and Rocky 9). See Nova bug report.

  • Fixes a race condition when launching multiple Ironic instances in parallel (as is commonly triggered when using Terraform/OpenTofu). See Nova bug report.

  • When using custom SCA policies for Wazuh, the agents are now correctly configured to allow commands to be executed from the manager.

  • Fixes an issue with Ansible Pulp modules depending on the pulp_glue Python library since the pulp.squeezer 0.0.14 release.

  • Fixes an issue with Kolla container image builds for Ubuntu where the release train package repositories could be behind the container image, leading to image build failures.

stackhpc/14.0.0.14

Bug Fixes

  • Rebuilds and bumps the Bifrost container for Xena to include a fix for the error “Error while running update_to_latest_versions: ‘BIOSSetting’ object has no attribute” raised during Ironic database migrations on upgrade.

  • Disabled custom APT configuration for non-overcloud hosts (Ubuntu Only). This resolves the issue of the seed hypervisor attempting to pull packages from the repository on the seed before it has been deployed.

stackhpc/14.0.0.12

New Features

  • This patch adds OpenStack Capacity metrics and exporters to StackHPC Kayobe Config. This includes a deployment playbook, Prometheus scrape jobs and HAProxy configurations to support this change.

  • Adds ethtool and pciutils to the overcloud host disk image.

  • Raises an alert when the count of RabbitMQ ready messages increases above a threshold.

  • Adapt threshold of RabbitMQ connection alert based on the size of the deployment to avoid spurious alerts.

  • Wazuh can now be deployed with additional custom SCA policies. Just add the policy file(s) to the directory {{ kayobe_env_config_path }}/wazuh/custom_sca_policies.

Upgrade Notes

  • Rebuilt all kolla and package repo tags to bring in kernel fixes and apply CentOS image build customisations that were previously being ignored.

  • To deploy the OpenStack Capacity Grafana dashboard, you must define OpenStack application credential variables: secrets_os_capacity_credential_id and secrets_os_capacity_credential_secret as laid out in the ‘Monitoring’ documentation.

    You must also enable the stackhpc_enable_os_capacity flag for OpenStack Capacity HAProxy and Prometheus configuration to be templated.

    You may also change the default authentication URL from the kolla_internal_fqdn and change the default OpenStack region from RegionOne with the variables: stackhpc_os_capacity_auth_url and stackhpc_os_capacity_openstack_region_name.

    To disable certificate verification for the OpenStack Capacity exporter, you can set stackhpc_os_capacity_openstack_verify to false.

stackhpc/14.0.0.9

Upgrade Notes

  • Enabled ML2/OVN by default. Checks preventing accidental migration from ML2/OVS were added in Kolla Ansible. If you are using a Neutron plugin other than ML2/OVN, set kolla_enable_ovn to false.

    OVN distributed FIP is disabled. To enable it, set neutron_ovn_distributed_fip to true in etc/kayobe/kolla/globals.yml.

  • The reboot.yml custom Ansible playbook now defaults to rebooting only one host at a time. The previous behaviour can be retained by setting ANSIBLE_SERIAL=0.
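
The OVN settings from the first note in this list could look like the following sketch. The variable names and the globals.yml path come from the note; placing kolla_enable_ovn in etc/kayobe/kolla.yml is an assumption. The two fragments are independent alternatives, shown as separate YAML documents.

```yaml
# etc/kayobe/kolla.yml (assumed location)
# Only needed when using a Neutron plugin other than ML2/OVN.
kolla_enable_ovn: false
---
# etc/kayobe/kolla/globals.yml
# Only relevant when ML2/OVN is in use: opt in to distributed floating IPs.
neutron_ovn_distributed_fip: true
```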

Security Issues

  • The Rocky 8 minor version has been bumped to 8.8 and new snapshots have been created to include fixes for Zenbleed (CVE-2023-20593) and Downfall (CVE-2022-40982). It is recommended that you update your OS packages and reboot into the new kernel as soon as possible.

  • The snapshots for Rocky 9.2 have been refreshed to include fixes for Zenbleed (CVE-2023-20593) and Downfall (CVE-2022-40982). It is recommended that you update your OS packages and reboot into the new kernel as soon as possible.

stackhpc/14.0.0.8

Upgrade Notes

  • The path used to store Wazuh certificates has changed. local_certs_path is now set to the environment directory, e.g. $KAYOBE_CONFIG_PATH/environments/<environment>/wazuh, or $KAYOBE_CONFIG_PATH/wazuh/ if not using environments. The contents of $KAYOBE_CONFIG_PATH/ansible/wazuh/certificates should be moved to the new location and the empty directory should be removed.

  • The local_custom_certs_path variable has been removed. Custom Wazuh certificates should be moved to $KAYOBE_CONFIG_PATH/environments/<environment>/wazuh/wazuh-certificates/ if using environments, or $KAYOBE_CONFIG_PATH/wazuh/wazuh-certificates if not.

stackhpc/14.0.0.6

New Features

  • Provide ELRepo 9, which in turn provides packages to support be2net and mpt3sas hardware. Configuration of ELRepo 9 is disabled by default and may be enabled by setting dnf_install_elrepo_9: true.

  • Nvmemon now reports physical size of the disk.

Upgrade Notes

  • CentOS Stream 8 snapshots have been bumped and new container images are available. Make sure to sync these into your local pulp. The yum repositories must be reconfigured to exclude a buggy version of iptables. To do this use: kayobe overcloud service reconfigure -kt none -t dnf.

  • CentOS Extras has been replaced with CentOS Extras Common. You may need to use the --allowerasing option with DNF if you have packages installed from the old repo. This is only needed once; the option can be dropped on subsequent package updates.

  • Configure Nova to use the more modern ‘q35’ libvirt machine type rather than ‘pc’, which is considered legacy.

  • Instance labels in Prometheus now use inventory hostnames rather than IPs.

Security Issues

  • Bumps CentOS Stream 8 snapshots to include fixes for Zenbleed (CVE-2023-20593) and Downfall (CVE-2022-40982). It is recommended that you update your OS packages and reboot into the new kernel as soon as possible.

  • Bumps Ubuntu repository snapshots and container images to bring in latest security patches. This includes the microcode to patch Downfall (CVE-2022-40982). Zenbleed (CVE-2023-20593) was patched in the previous snapshot bump. To apply the microcode updates, it is recommended to reboot each host after upgrading all of the packages.

Bug Fixes

  • Fixes an issue with local image builds where kolla_tag had not been set. The error had the signature:

  • Upstream package repository mirrors are now restored in Kolla container images. This makes it possible to install or update packages for debugging purposes.

stackhpc/14.0.0.1

New Features

  • Add blazar project Kolla container images. Blazar is a resource reservation service for OpenStack. Blazar enables users to reserve a specific type/amount of resources for a specific time period and it leases these resources to users based on their reservations.

  • Adds caso container images. cASO is an accounting reporter that supports Cloud Accounting Usage Records. For more information, see the upstream docs. Note that this container does not exist in upstream Kolla and is maintained downstream by StackHPC.

  • Adds code to the globals.yml file to add endpoints for the ceph_mgr_exporter. If ceph is configured correctly, managers will be under the mgrs inventory group. If this group is empty, then the variable will just be empty (the KA default). This also requires setting kolla_enable_prometheus_ceph_mgr_exporter to true.

  • The playbook hotfix-containers.yml has been added. This allows arbitrary files to be copied into, and/or arbitrary commands to be executed within, overcloud containers.

  • Support for Ubuntu 22.04 Jammy Jellyfish repositories has been added to the Yoga Release.

  • Set monitoring services to be enabled by default in the ci-multinode environment.

  • OpenSearch container images have been added.

  • Add the package repository configuration required for Rocky Linux 9 support.

    Add CI for Rocky 9 hosts.

  • Added support for Rocky Linux 9.2 repositories and made 9.2 the default version.

  • Adds support for using VMs as compute and controller nodes in the ci-multinode environment by dynamically setting the MTU of the networks in networks.yml and removing the static definition of the network interfaces for the compute and controller groups.

  • Add Wazuh deployment playbook.

  • Adds utility playbooks to build and rotate amphora images. For more details check out the Octavia section of the Operator Guide included in the documentation.

  • Brings in new neutron container images to add batching support to Networking Generic Switch. This is opt in via the ngs_batch_requests configuration option and only affects Ironic deployments that use Networking Generic Switch. See the following PR for more details.

  • Updates neutron containers to contain a version of networking-generic-switch with support for trunk ports when using DellOS 10 or Cisco switches. See this PR for more details.

  • Updates neutron containers to contain a version of networking-generic-switch with support for DellOS 10. See this PR for more details.

  • Improvements to the ci-aio automated deployment script to allow the script to successfully run on LVM-based images.

  • Added a script to the AIO environment that can be used to quickly deploy an AIO for testing.

  • Adds time information to tasks using the ansible.posix.profile_tasks callback.

  • Adds some basic tuning of Ansible, including use of 20 forks, enabling SSH pipelining, YAML-formatted output, and disabling fact variable injection.

  • The Magnum container now includes the CAPI driver.

  • Adds support for using Ceph HAProxy and Keepalived images stored in Pulp. This is enabled automatically if stackhpc_sync_ceph_images is set to true.

  • Mariabackup is now enabled by default.

  • The flag om_enable_rabbitmq_high_availability is now set to true. Adds tags for new RabbitMQ containers to update to RabbitMQ version 3.9.22.

  • Adds an etcd Kolla container image. This can be used for OpenStack service coordination as a tooz backend, or for batched processing of switch configuration in Networking Generic Switch (this requires a downstream NGS patch).

  • Adds drive temperatures to the table on the hardware overview dashboard and a timeseries to show the temperature over time.

  • Adds picker to hardware overview dashboard to select a specific host to show drive information for.

  • Adds support for synchronising HashiCorp Consul and Vault images to a local Pulp registry.

  • Adds a new variable stackhpc_pulp_sync_for_local_container_build which, when set to true, configures the local Pulp server to sync all package repositories required for building kolla containers on a local kolla build host.

  • Enable TLS for the Seed Pulp service. Set pulp_enable_tls: true and provide paths to a TLS certificate and key using pulp_cert_path and pulp_key_path respectively.

  • Adds a standard LVM configuration that is compatible with the new overcloud host image.

  • Adds the Helm client to the Magnum container.

  • Adds support for Manila in the ci-multinode environment using the CephFS native backend. This is disabled by default, but can be enabled by setting the following variables in the Kayobe configuration:

      kolla_enable_manila: true
      kolla_enable_manila_backend_cephfs_native: true

  • Updated the documentation for the ci-multinode to include instructions on how to set up and test Magnum.

  • Added support for Wazuh in the ci-multinode environment.

  • Updates Prometheus Node exporter to version 1.5.0.

  • Adds NTP alerts to Prometheus Alertmanager.

  • Adds alerts for Octavia load balancers and amphorae. Alerts are triggered when load balancers enter the ERROR or DEGRADED states, or when amphorae enter the ERROR state.

  • Adds a new Grafana dashboard for Octavia. This dashboard is used to monitor the load balancers as well as the amphorae.

  • Adds a standard overcloud Diskimage Builder (DIB) host image configuration.

  • Prebuilt overcloud host images can now be pulled from Ark using the stackhpc_download_overcloud_host_images variable. The image is selected based on os_distribution and os_release.

  • Re-enable Pulp Ubuntu repositories.

  • Package repositories and container images for CentOS Stream based deployments have been updated. Key packages to note are:

    • Kernel

      • version: 4.18.0

      • release: 448.el8

    • Libvirt

      • version: 8.0.0

      • release: 6.module_el8.7.0+1140+ff0772f9

    • OVS

      • version: 2.17.0

      • release: 71.el8s

    • OVN

      • version: 22.09.0

      • release: 11.el8s

  • Container images for Ubuntu based deployments have been updated. Key packages to note are:

    • Libvirt

      • version: 8.0.0

      • release: 1ubuntu7.4~cloud0

    • OVS

      • version: 2.17.3

      • release: 0ubuntu0.22.04.1~cloud0

    • OVN (unchanged since last container build)

      • version: 22.03.0

      • release: 0ubuntu1~cloud0

  • Sync Rocky Linux 8.7 RPM repositories to local Pulp servers.

  • Enables SMART monitoring. Manual action is required; please see the monitoring documentation for the procedure.

  • Split cephadm_commands into cephadm_commands_pre and cephadm_commands_post commands. This allows the user to run commands that must be run before the rest of the post-deployment configuration, as well as commands that rely on resources created by the post-deployment config.

  • Updates Grafana to version 9.4.7.

  • Upgrades Pulp from 3.21 to 3.22.

  • Disables Pulp analytics.

  • Sets the number of Pulp workers based on available CPU cores. This may improve performance when pulling container images to many hosts simultaneously.

  • Upgrades Pulp from 3.22 to 3.23.

  • Upgrades Pulp from 3.23 to 3.24.

  • Adds support for package repository snapshots via Pulp. A local Pulp server is deployed on the seed, which syncs package repositories and container images from the StackHPC Ark Pulp server. Control plane servers pull packages and container images from the local Pulp server.

  • The EPEL package repository is disabled by default. It may be enabled by setting dnf_enable_epel to true.

  • Uses StackHPC source code repositories for kolla, kolla-ansible, and bifrost.

  • Supports Kolla CentOS Stream 8 source container images.

  • Adds custom playbooks for compute host maintenance:

    • nova-compute-drain.yml

    • nova-compute-disable.yml

    • nova-compute-enable.yml

    • reboot.yml

  • Adds a custom playbook to reset the RabbitMQ cluster and restart OpenStack services that use it, rabbitmq-reset.yml.

  • Adds a custom playbook to configure swap, swap.yml.

  • Adds the Kayobe Automation Git repository as a submodule, and provides some basic configuration for it in an .automation.conf directory.

  • Adds support for deploying a Squid caching proxy as a custom container on the seed.

  • Enables Elasticsearch, Grafana, Kibana, Prometheus by default. Provides standard dashboards for Grafana and alerting rules for Prometheus.
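
Several of the features in the list above are opt-in through Kayobe configuration variables. The sketch below collects some of the variables named in these notes; the certificate paths are hypothetical examples, and which configuration file each variable belongs in is an assumption left to your deployment.

```yaml
# Serve the seed Pulp service over TLS (paths are hypothetical examples).
pulp_enable_tls: true
pulp_cert_path: "/path/to/pulp.crt"
pulp_key_path: "/path/to/pulp.key"

# Re-enable the EPEL package repository, which is disabled by default.
dnf_enable_epel: true

# Sync all package repositories needed to build Kolla containers on a local build host.
stackhpc_pulp_sync_for_local_container_build: true

# Required for the ceph_mgr_exporter endpoints added in globals.yml.
kolla_enable_prometheus_ceph_mgr_exporter: true
```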

Upgrade Notes

  • Bumped focal package versions due to unmet dependencies.

  • Bumps octavia container versions.

  • Bumped Rocky 9 package versions due to missing snapshot.

  • Updated container tags for Magnum CAPI changes.

  • Updates Ceph Pacific container image to v16.2.11.

  • Automatically install Quincy if the node is running Ubuntu 22.04, else install Pacific.

  • Increase stackhpc.cephadm collection to version 1.12.2.

  • Enables Docker live restore by default. This may be disabled by setting docker_daemon_live_restore to false in docker.yml.

  • The flag om_enable_rabbitmq_high_availability is now set to true. As this enables durable queues, RabbitMQ will need to be reset, and the services which use it restarted. Tags are added to update the RabbitMQ containers to version 3.9.22.

  • The overcloud host image build workflow now uploads the built image to SMS as well as ARK, allowing it to be tested both manually and through AIO CI jobs.

  • Updated OVN package version from 22.06 to 22.09.

  • openvswitch version has been updated to ~2.17.5 on all distributions (CentOS/Rocky9 are two patches ahead of 2.17.5). Images include fixes for CVE-2023-1668.

    Ubuntu repository versions for focal and ubuntu cloud archive have been updated to 20230515.

  • Kolla tag overrides have been refactored to allow kolla-ansible to resolve them individually by host. This means that mixed clouds can be deployed which allows for migration between distributions.

  • Don't pull apt packages from Pulp for Ubuntu Jammy until Jammy packages are published.

  • Don't pull Ceph packages from the official Ceph repos for Ubuntu Jammy until Jammy packages are published.

  • Updates the smartmon-tools.yml playbook to ensure that cron is installed before attempting to configure crontab.

  • Updated Ubuntu package repository versions.

Bug Fixes

  • Added the NetworkManager-config-server package to the Rocky Linux 9 deployment image. This prevents NetworkManager from automatically running DHCP on unconfigured ethernet devices and allows connections with static IP addresses to be brought up even on ethernet devices with no carrier.

  • Fixed a syntax error in Prometheus SMART monitoring rules.

  • Caps the number of Pulp API and content workers to 32 each to avoid errors on hosts with many CPUs.

  • Fixes the hardware overview dashboard to use the correct metric for displaying drive temps. Now uses an or to display whichever metric is compatible with the drives in the system. The two metrics are temperature_case_raw_value and temperature_celsius_raw_value.

  • Fixes the issue with using SAML2 federation in Keystone against NetIQ IdP.

  • Fixes internet connectivity for VMs deployed in the ci-multinode environment.

  • Fixes creation of over 1TB memory VMs on AMD with IOMMU enabled on Rocky Linux 9.

  • Fixes the smartmon script to be case insensitive when checking for the initial SMART info. This ensures that the script works correctly on systems where the output of smartctl -i is not capitalised as previously expected by the script. Previously, this led to badly formatted .prom files, which caused node_exporter to fail to scrape the file.

  • Fixes the InstanceDown alerting rule wait time to be consistent with the alert message. The alert message says “for 5 minutes” but the rule was set to wait for 1 minute.

  • Updates nova image to bring in a fix for parsing mdev uuids when using libvirt>=7.7. See bug for more details.

  • Add unit to LowMemory alert description.

  • Fixes Octavia health monitors not being created on cluster spawn.

  • Fixes CoreDNS for Magnum clusters crashing on startup.

  • Allows cinder-csi nodeplugin to start on the same Magnum cluster host as cinder-csi controllerplugin.

  • Corrects ClusterRole rules for Magnum cluster-autoscaler, and sets cluster-autoscaler pods to use hostNetwork.

  • Disables metadata proxy over IPv6 inside Neutron DHCP agent to work around bug 1953165.

  • Fix for nova resize API not parsing the new flavor on resize - bug 1805969.

  • Fix creation of VM instances with UEFI enabled and Secure Boot disabled.

  • Fixes documentation builds on Read the Docs.

  • Fixes synchronisation and DNF configuration of the Rocky Linux 9 CRB repository.

  • HAProxy alerting rules have been updated to use the server name that is down, rather than the name of the instance that reported the down server.

Other Notes

  • Deployment guide docs added for the new CAPI driver.

  • Reduced verbosity in etc/kayobe/pulp.yml

  • Changes the Grafana OpenStack dashboard to show HTTP status 300 as green instead of orange.

  • Adds a ci-aio environment for CI testing.

  • Adds a ci-builder environment for building Kolla container images in CI.