2023.1 Antelope Series Release Notes¶
stackhpc/14.0.0.274-16¶
Security Issues¶
Fixes CVE-2026-42998, CVE-2026-42999, CVE-2026-43000, CVE-2026-43001 and CVE-2026-44394 with updated Keystone images.
stackhpc/14.0.0.274¶
Security Issues¶
Fixes CVE-2026-33551 with updated Keystone images.
stackhpc/14.0.0.259¶
Security Issues¶
Security fixes for bug 2119646: Unauthenticated access to EC2/S3 token endpoints can grant Keystone authorization.
stackhpc/14.0.0.258¶
Bug Fixes¶
Bumps Nova images to fix Launchpad bug 2098892.
stackhpc/14.0.0.247¶
Bug Fixes¶
Updated Neutron container image tags to fix CVE-2024-53916. See #2037002 for more details.
stackhpc/14.0.0.246¶
Bug Fixes¶
Reverts “Track all interfaces in Keepalived” so only HA interfaces are tracked. This prevents L3 HA router flapping when detaching floating IP addresses, because non-HA router interfaces did not include “no_track”. Closes-Bug: #2097770
stackhpc/14.0.0.245¶
Bug Fixes¶
Fix some broken links in the docs.
stackhpc/14.0.0.244¶
New Features¶
Adds a workflow to update Kolla dependencies (Kayobe, Kolla and Kolla-Ansible) to the latest tag available in the StackHPC branch via CI.
stackhpc/14.0.0.243¶
Bug Fixes¶
Changed the Prometheus job name of the OS Capacity exporter to os_capacity, which is what Azimuth expects for its cloud metrics dashboard.
stackhpc/14.0.0.239¶
Bug Fixes¶
Updates Neutron container images to apply the keepalived PID cleanup fix. With this change, Neutron always deletes a stale PID file if one exists.
stackhpc/14.0.0.238¶
New Features¶
Added a new playbook pulp_sync_publish_promote that can be used to sync, publish and promote all repositories in a single step, as well as sync and publish container repos. If you do not want to promote repos then run with
-e repo_promote_production=false.
stackhpc/14.0.0.237¶
Bug Fixes¶
Updates the nova-compute container image to fix bug 2091033. This bug would cause nova-compute to freeze, which would result in frequent monitoring alerts.
stackhpc/14.0.0.236¶
Bug Fixes¶
Fixed an issue with the prometheus.yml template which would break when deploying alertmanager.
stackhpc/14.0.0.234¶
New Features¶
Updates the StackHPC Cephadm Ansible collection from 1.18.0 to 1.19.1.
stackhpc/14.0.0.229¶
Bug Fixes¶
Fixes an issue where Squid proxy could be unable to reach external servers due to a preference of choosing IPv6 connectivity by default.
stackhpc/14.0.0.227¶
Bug Fixes¶
Updates Cinder container images to fix bug 1823445 (cinder.exception.MetadataCopyFailure).
stackhpc/14.0.0.223¶
Bug Fixes¶
OVN packages in Rocky Linux 9 container images have been updated to the latest minor release in the 24.03 series: ovn24.03-24.03.2-34. Neutron container images for Rocky Linux 9 have also been rebuilt.
stackhpc/14.0.0.218¶
New Features¶
Configures the Ironic Python Agent with useful settings for inspection, such as the extra-hardware and mellanox elements.
stackhpc/14.0.0.217¶
New Features¶
Use the StackHPC fork for building Blazar images with customizations to support flavor-based reservation.
stackhpc/14.0.0.216¶
New Features¶
The OpenStack Dashboard in Grafana now includes logs from OpenStack services.
stackhpc/14.0.0.215¶
Bug Fixes¶
The CIS hardening scripts no longer change permissions of log files by default. It is preferred to configure these permissions at source, i.e. in whatever is creating the files. The previous behaviour also suffered from a time-of-check to time-of-use race condition. If you want the old behaviour, you can change rhel9cis_rule_4_2_3 and/or ubtu22cis_rule_4_2_3 to true.
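As a rough illustration, restoring the previous log-permission behaviour might look like the sketch below. The variable names come from the note above; the file they are placed in is an assumption and depends on how your configuration is laid out.

```yaml
# Hypothetical placement: a Kayobe variables file that is read by the
# CIS hardening playbook. Set only the variable matching your OS family.
rhel9cis_rule_4_2_3: true    # Rocky Linux 9 hosts
ubtu22cis_rule_4_2_3: true   # Ubuntu 22.04 hosts
```

Re-enabling the rule also brings back the race condition mentioned in the note, so fixing permissions at the source of the log files remains the preferred route.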
stackhpc/14.0.0.212¶
Bug Fixes¶
Changes the duration for which redfish exporter must continually fail scrapes before triggering an alert to 15 minutes. This should hopefully reduce some alert spam.
stackhpc/14.0.0.211¶
Bug Fixes¶
Fixes an issue where setting redfish_exporter_scrape_group to a value other than overcloud would exclude those nodes from the redfish exporter scrapes.
stackhpc/14.0.0.207¶
Security Issues¶
Fixes OSSA-2024-004 with updated container images for Ironic.
stackhpc/14.0.0.205¶
New Features¶
Upgrades kayobe-automation submodule to 7676aa8.
Upgrades kayobe-workflows collection to v1.1.0.
Kayobe-automation config-diff now runs in parallel and generates both the old and new configuration at the same time. This should improve config-diff wait times.
Add support for the pulp-sync-content run book.
Deprecation Notes¶
Kayobe-automation will now automatically detect vaulted files for the purpose of config-diff. Therefore, KAYOBE_CONFIG_SECRET_PATHS_EXTRA and KAYOBE_CONFIG_VAULTED_FILES_PATHS_EXTRA are no longer used.
Security Issues¶
The upgraded kayobe-workflows collection increases the version of various Actions and containers used within GitHub-based workflows, including increasing Docker in Docker to version 27.3.1, thus removing the vulnerabilities present in 24.0-git.
stackhpc/14.0.0.204¶
Bug Fixes¶
Fixes creation and failover of Octavia TLS-terminated load balancers when storing the certificate and key as a PKCS12 bundle in Barbican.
stackhpc/14.0.0.196¶
Bug Fixes¶
Fixes a file descriptor leak in networking-mlnx which prevented VMs using Infiniband virtual functions from provisioning after a period of time.
Fixes KeyError: ip_version in networking-mlnx when used in conjunction with the OVN mechanism driver.
stackhpc/14.0.0.195¶
New Features¶
A default firewall configuration is now included on an opt-in basis. The rules are defined under
etc/kayobe/inventory/group_vars/all/firewall. More information can be found here
stackhpc/14.0.0.194¶
New Features¶
The default Tempest concurrency has been increased from 2 to 16. This is often easily achievable in production systems.
stackhpc/14.0.0.193¶
Upgrade Notes¶
RabbitMQ and Erlang packages are now all installed from the Cloudsmith rabbitmq.com mirrors, since the RabbitMQ packagecloud.io repository is being shut down on August 18th, 2024: https://www.rabbitmq.com/blog/2024/08/11/package-repository-updates#packagecloud-will-be-discontinued-on-aug-18th-2024
stackhpc/14.0.0.192¶
Bug Fixes¶
Bumps Neutron container image tags to fix bug 2068644 which could prevent associating floating IPs with OVN-based load balancers.
stackhpc/14.0.0.190¶
New Features¶
Adds the networking-mlnx mechanism driver to the Neutron Server container and the ebrctl utility to the Nova Compute container. This allows you to use the kolla_enable_neutron_mlnx feature flag.
stackhpc/14.0.0.185¶
Bug Fixes¶
Fixes a regression when using growroot.yml and software RAID, where the playbook would fail to identify the correct disk.
stackhpc/14.0.0.178¶
Security Issues¶
Fixes CVE-2024-44082 with updated container images for Ironic services. Note that Ironic Python Agent images also need to be updated to fully fix this vulnerability. If this is not possible, a new configuration option [conductor]conductor_always_validates_images is available. See the OSSA-2024-003 description for more details.
stackhpc/14.0.0.174¶
New Features¶
Adds a playbook to install pre-commit hooks and register them with git. The hooks currently configured to be installed will check YAML syntax, fix the new line at end of file and remove excess whitespace. This is currently opt-in, and can be enabled by running the install-pre-commit-hooks playbook.
stackhpc/14.0.0.172¶
New Features¶
Adds alternative RabbitMQ container images for versions 3.11, 3.12 and 3.13. This allows us to perform intermediary RabbitMQ upgrades prior to a SLURP upgrade to Caracal. See the Kolla docs for more details: https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#slurp
stackhpc/14.0.0.166¶
New Features¶
The magnum-capi-helm driver has been updated to 1.1.0. Please see the magnum-capi-helm release notes for changes.
stackhpc/14.0.0.162¶
New Features¶
Added a script to automate RabbitMQ quorum queue migrations.
stackhpc/14.0.0.161¶
New Features¶
Adds two new custom playbooks for placing Ceph hosts into and removing them from maintenance:
ceph-enter-maintenance.yml
ceph-exit-maintenance.yml
Upgrade Notes¶
Updates the stackhpc.cephadm collection to version 1.18.0.
Bug Fixes¶
Fixes an issue with idempotency in the stackhpc.ceph.cephadm_keys plugin.
stackhpc/14.0.0.156¶
New Features¶
OVN version in Rocky Linux 9 container images has been updated to 24.03 (latest LTS).
stackhpc/14.0.0.155¶
Bug Fixes¶
Fixes an issue with interface names containing dashes in the HashiCorp collection.
stackhpc/14.0.0.147¶
Bug Fixes¶
Updates Octavia container images to fix a maintenance task that was breaking OVN IPv4 load balancers with health monitors. LP#2072754.
stackhpc/14.0.0.146¶
New Features¶
Added a new group variable, stackhpc_repos_enabled, for unified control over usage of StackHPC Release Train package repositories. This makes it easier to set which hosts do or do not pull packages from the Release Train.
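A minimal sketch of opting one group of hosts out of the Release Train repositories. Only the variable name is taken from the note; the file path, the group placeholder, and the assumption that the variable takes a boolean are illustrative.

```yaml
# Hypothetical file: etc/kayobe/inventory/group_vars/<group>/stackhpc-repos
# Assumes stackhpc_repos_enabled accepts a boolean value.
stackhpc_repos_enabled: false
```

Because it is a group variable, setting it per inventory group is the natural way to decide which hosts pull packages from the Release Train.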
stackhpc/14.0.0.143¶
Critical Issues¶
Fixes CVE-2024-40767 with updated container images for Nova services.
stackhpc/14.0.0.142¶
New Features¶
Added support for Rocky Linux 9.4 repositories and Kolla containers. Made 9.4 the default version for Rocky Linux.
Updated Rocky Linux 9.3 pulp repo versions. Added Rocky Linux pulp repo versions. Rebuilt Kolla containers with Rocky 9.4.
stackhpc/14.0.0.139¶
New Features¶
The Docker CE package for Ubuntu has been bumped from 5:24.0.6-1 to 5:25.0.0-1. This is a side effect of separating out the repos for Docker CE for Ubuntu Jammy/Focal.
Critical Issues¶
Disables password expiration and inactivity policies. This caused the kayobe and kolla service accounts to be locked out of the system. You should re-apply the CIS benchmark hardening playbook as soon as possible to avoid being locked out of your system.
Bug Fixes¶
Separated out repos for Docker CE for Ubuntu Jammy/Focal. This fixes a Pulp sync issue where two “identical” repository versions existed with different checksums.
stackhpc/14.0.0.126¶
New Features¶
Adds support for deploying a Prometheus Redfish exporter container on the seed. This can be used to query the overcloud BMCs via their redfish interfaces to produce various metrics relating to the hardware, and system health.
stackhpc/14.0.0.125¶
New Features¶
Adds a hook to automatically run the CIS benchmark hardening playbooks as part of host configure. This is guarded by the stackhpc_enable_cis_benchmark_hardening_hook configuration option and is disabled by default.
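Enabling the hook could look like the following sketch. The option name is taken from the note; where it is set within the Kayobe configuration is an assumption.

```yaml
# Hypothetical placement: any Kayobe variables file that applies to the
# hosts being configured. The option is disabled by default.
stackhpc_enable_cis_benchmark_hardening_hook: true
```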
stackhpc/14.0.0.124¶
Security Issues¶
Adds a custom Apt repository to address CVE-2024-6387 in OpenSSH.
stackhpc/14.0.0.122¶
Upgrade Notes¶
To match the new CIS benchmark defaults on Ubuntu, you should remove the ipv6.disable=1 kernel command line option. If you wish to carry on with the current settings, change ubtu22cis_ipv6_required to false.
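For deployments that keep ipv6.disable=1 on the kernel command line, the override described above might be expressed as follows. The file placement is an assumption.

```yaml
# Hypothetical placement: a Kayobe variables file read by the CIS hardening
# playbook. Keeps the pre-upgrade behaviour on Ubuntu hosts.
ubtu22cis_ipv6_required: false
```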
Bug Fixes¶
IPv6 is no longer disabled by default in the Ubuntu CIS hardening. If using the old behaviour, you may hit bug 2071443.
stackhpc/14.0.0.118¶
Security Issues¶
Updates the Rocky Linux 9 SIG Security Common repository to address CVE-2024-6409 in OpenSSH.
stackhpc/14.0.0.117¶
Bug Fixes¶
Fixed incorrect OpenSearch Dashboards Prometheus Blackbox Exporter configuration.
stackhpc/14.0.0.114¶
New Features¶
Adds a new Prometheus alert FluentdBufferTooLarge which is raised when the total size of queue buffers grows above 128 MiB.
stackhpc/14.0.0.113¶
Security Issues¶
Enables the Rocky Linux 9 SIG Security Common repository, which provides updated OpenSSH packages addressing CVE-2024-6387 (regreSSHion). Other packages available in this repository are currently ignored.
stackhpc/14.0.0.110¶
Security Issues¶
Addresses critical vulnerability CVE-2024-36039 by bumping the PyMySQL library to 1.1.1 in all affected Kolla images. This vulnerability allows SQL injection through untrusted JSON objects.
stackhpc/14.0.0.107¶
Critical Issues¶
Fixes CVE-2024-32498 with updated container images for Cinder, Glance and Nova services.
stackhpc/14.0.0.106¶
New Features¶
Added a templated set of default Prometheus Blackbox exporter endpoints.
stackhpc/14.0.0.102¶
New Features¶
Per OSD usage metrics are now available in the OSDs dashboard. The dashboard now includes a new section that displays a histogram of the utilization of each OSD in the cluster. This can be useful for identifying OSDs that are outliers in terms of utilization, and may need to be rebalanced. Additionally, there is a histogram displaying the usage of the bluestoreDB for each OSD.
stackhpc/14.0.0.100¶
New Features¶
Adds a new diagnostics.yml playbook that collects diagnostic information from hosts. The diagnostics are aggregated to a directory ($PWD/diagnostics/ by default) on localhost. The diagnostics include:
Docker container logs
Kolla configuration files
Log files
The collected diagnostic information contains sensitive information such as passwords in configuration files.
stackhpc/14.0.0.98¶
New Features¶
Adds a new stackhpc-openstack-tests.yml playbook that executes tests in the StackHPC OpenStack Tests repository. Both the playbook and tests are currently experimental, and are currently targeting only an all-in-one CI use case.
stackhpc/14.0.0.96¶
Bug Fixes¶
Fixes an issue where HashiCorp Vault standby nodes would trigger a Prometheus alert. To apply this fix to an existing system, the HAProxy configuration for Vault (
kolla/config/haproxy/services.d/vault.cfg) must be manually updated following the Vault documentation.
Updates the stackhpc.hashicorp Ansible collection to 2.5.0. This brings in an idempotency fix for generating certificates.
The overcloud HashiCorp Vault playbooks have been modified to use the local Vault service rather than via HAProxy. This makes it possible to deploy and use Vault without HAProxy. This eliminates the previous bootstrapping issue where HAProxy needed to be deployed without TLS enabled while generating initial certificates.
stackhpc/14.0.0.94¶
New Features¶
Added two alerts (warning and critical) that are triggered when the ratio of (free_swap_space / total_swap_space) is below thresholds. Each threshold can be modified by altering the value of alertmanager_node_free_swap_warning_threshold_ratio and alertmanager_node_free_swap_critical_threshold_ratio.
Currently this solution is limited to a one-size-fits-all policy, which can cause unwanted alerts for hosts that use swap heavily. It is therefore recommended to tune the thresholds or apply silence rules as needed.
Bumped Horizon kolla image. Bumped Grafana from 10.1.5-1 to 10.4.2-1 (CentOS & Rocky Linux). Bumped Grafana from 10.4.1 to 10.4.2 (Ubuntu). Bumped Prometheus-msteams from 1.5.0 to 1.5.2.
Adds support for providing a CA certificate for OpenStack Capacity exporter.
Allows synchronising a custom list of containers to Pulp using the stackhpc_pulp_repository_container_repos_extra and stackhpc_pulp_distribution_container_extra variables.
Bumped Horizon kolla image. Bumped Grafana from 10.1.5-1 to 10.4.2-1 (Rocky Linux). Bumped Grafana from 10.4.1 to 10.4.2 (Ubuntu). Bumped Prometheus-msteams from 1.5.1 to 1.5.2.
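The swap alert thresholds introduced in this release can be tuned along the following lines. The variable names come from the note; the values are purely illustrative, as the shipped defaults are not stated here, and the file placement is an assumption.

```yaml
# Hypothetical placement: a Kayobe variables file used when templating
# Prometheus alert rules. Each ratio is free_swap_space / total_swap_space,
# and an alert fires when the measured ratio drops below the threshold.
alertmanager_node_free_swap_warning_threshold_ratio: 0.25   # illustrative value
alertmanager_node_free_swap_critical_threshold_ratio: 0.10  # illustrative value
```

Since alerts fire when free swap falls below the ratio, the critical threshold would normally be the lower of the two.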
Security Issues¶
Fixed CVE-2023-31047 for Horizon. Fixed CVE-2023-49569 for Grafana. Fixed CVE-2022-40083 and CVE-2021-4238 for Prometheus-msteams.
stackhpc/14.0.0.92¶
New Features¶
Updates Magnum CAPI Helm driver version to OpenDev v1.0.0
stackhpc/14.0.0.87¶
Known Issues¶
Generate backend TLS files for network hosts. This fixes backend TLS configuration for deployments where some API services are running on network hosts.
Bug Fixes¶
Prevents raising a Ceph PgsUnclean alert because of backfilling, which can frequently happen because of normal rebalancing activities, such as use of the Ceph balancer or OSD addition.
stackhpc/14.0.0.83¶
New Features¶
Add optional support for relabelling network devices in Prometheus. Use network names as defined in kayobe, instead of network device names. Reuse of device names within an environment is not supported.
stackhpc/14.0.0.81¶
New Features¶
Bumped pulp repo versions for Q2 2024
Bumped Kolla image tags for Q2 2024
Bumped prometheus server from 2.38.0 to 2.51.1
Bumped prometheus alertmanager from 0.24.0 to 0.26.0
Bumped prometheus blackbox exporter from 0.23.0 to 0.25.0
Bumped prometheus cadvisor exporter from 0.48.0 to 0.49.1
Bumped prometheus haproxy exporter from 0.13.0 to 0.15.0
Bumped prometheus memcached exporter from 0.10.0 to 0.14.3
Bumped prometheus msteams from 1.5.1 to 1.5.2
Bumped prometheus mtail from 3.0.0-rc50 to 3.0.0-rc53
Bumped prometheus mysqld exporter from 0.15.0 to 0.15.1
Bumped prometheus node exporter from 1.4.0 to 1.7.0
Bumped prometheus openstack exporter from 1.6.0 to 1.7.0
Bumped prometheus ovn exporter from 1.0.6 to 1.0.7
Bumped opensearch from 2.11.1-1 to 2.13.0-1 (Rocky Linux 9)
Bumped opensearch from 2.12.0 to 2.13.0 (Ubuntu Jammy)
Bumped grafana from 10.1.5-1 to 10.4.2-1 (Rocky Linux 9)
Bumped grafana from 10.4.0 to 10.4.2 (Ubuntu Jammy)
Security Issues¶
Fixed CVE-2023-31047, CVE-2023-23969, CVE-2023-24580, CVE-2023-36053, CVE-2023-46695, CVE-2023-30861, CVE-2022-4899, CVE-2024-1135, GHSA-2m57-hf25-phgg, CVE-2023-0286, CVE-2023-50782, CVE-2024-26130 for OpenStack services.
Fixed CVE-2022-41723, CVE-2023-39325 (except prometheus-alertmanager, prometheus-msteams-exporter, prometheus-haproxy-exporter, prometheus-openstack-exporter. No patch available.), CVE-2021-43565, CVE-2022-27191, CVE-2022-27664, CVE-2021-38561, CVE-2022-21698, CVE-2021-4238, CVE-2022-40083, CVE-2022-41721, CVE-2021-33194, CVE-2023-2253, CVE-2023-27561, CVE-2023-28840, CVE-2024-21626, CVE-2022-32149, CVE-2023-45142, GHSA-m425-mq94-257g for prometheus server and exporters except prometheus-libvirt-exporter and prometheus-haproxy-exporter. (Source repository of each are archived and no longer maintained)
Fixed CVE-2023-39325, CVE-2023-45142, CVE-2023-47108, CVE-2023-49568, CVE-2023-49569, GHSA-9763-4f94-gfch, GHSA-m425-mq94-257g for grafana.
It is advised to redeploy services with the current version of images from the StackHPC Release Train.
stackhpc/14.0.0.80¶
New Features¶
Supports adding CA certificates to the Tempest container trust store.
stackhpc/14.0.0.79¶
Bug Fixes¶
The OpenSearch backend for CloudKitty has been fixed, so the Horizon Rating panels work again.
stackhpc/14.0.0.78¶
New Features¶
Adds a new Prometheus alert HostNetworkBondDegraded which will be raised when at least one bond member is down.
Adds a new Prometheus alert HostNetworkBondSingleLink which will be raised when a bond is configured with only one member. This can happen when NetworkManager detects that a bond member is down at boot time. This alert can be disabled by setting alertmanager_warn_network_bond_single_link to false.
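Disabling the single-link alert could be sketched as below. The variable name is from the note; its placement in the Kayobe configuration is an assumption.

```yaml
# Hypothetical placement: a Kayobe variables file used when templating
# Prometheus alert rules. HostNetworkBondDegraded is unaffected by this.
alertmanager_warn_network_bond_single_link: false
```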
stackhpc/14.0.0.77¶
Bug Fixes¶
Adds a custom fix-houston.yml playbook to address dmesg errors, specifically: “tc mirred to Houston: device bond0-ovs is down”. This error typically appears when OVS HW offloading is enabled, often in conjunction with VF-LAG and ASAP^2. Detailed usage instructions are provided within the playbook’s comments. Additional context is available at the following links: LP#1899364 Kernel Patch
stackhpc/14.0.0.76¶
Bug Fixes¶
Fixes appending to ca.crt in make-cert-client.sh causing multiple identical ca certs being added into /etc/kubernetes/certs/ca.crt.
stackhpc/14.0.0.73¶
Security Issues¶
Update Horizon on Ubuntu to include apache2 package 2.4.52-1ubuntu4.8 which fixes CVE-2023-31122.
stackhpc/14.0.0.72¶
New Features¶
Updates Magnum CAPI Helm driver version to v0.13.0
stackhpc/14.0.0.71¶
New Features¶
Updates Magnum CAPI Helm driver version to v0.11.0
Automatic deployment for OpenStack Capacity via a Kayobe service deploy hook using kolla admin credentials.
Upgrade Notes¶
Updates the Ansible configuration to fail on any unparsed inventory source. If you are using a separate Ansible configuration for Kolla Ansible, you may wish to add this setting in
etc/kayobe/kolla/ansible.cfg.
OpenStack Capacity no longer uses application credentials. Please delete any previously generated application credentials.
stackhpc/14.0.0.69¶
Upgrade Notes¶
Ensure that your deployment has only one nova-compute-ironic service running per conductor group. See the operations / nova-compute-ironic doc for further details.
Bug Fixes¶
Adds basic support and a document explaining how to migrate to a single nova-compute-ironic instance, and how to re-deploy the instance to another machine in the event of failure. See the operations / nova-compute-ironic doc for further details.
stackhpc/14.0.0.68¶
Security Issues¶
The Heat container images are rebuilt with yaql 3.0.0 to include patch for vulnerability OSSN/OSSN-0093. It is recommended that you redeploy Heat services in your system with the current version of Heat images from StackHPC Release Train.
stackhpc/14.0.0.67¶
New Features¶
Updates Magnum CAPI Helm driver version to v0.12.0
stackhpc/14.0.0.66¶
Security Issues¶
Kolla container images created using the stackhpc-container-image-build.yml workflow are now automatically scanned for vulnerabilities.
stackhpc/14.0.0.64¶
Bug Fixes¶
The grafana image now includes the gnocchixyz-gnocchi-datasource and the grafana-opensearch-datasource plugins, which are the default upstream plugins.
stackhpc/14.0.0.62¶
Upgrade Notes¶
Updates Magnum CAPI Helm driver version to v0.11.0
stackhpc/14.0.0.59¶
Bug Fixes¶
Fix an issue with the OSD summary pie chart not showing any data.
stackhpc/14.0.0.55¶
Bug Fixes¶
Updates Magnum CAPI Helm driver version to v0.10.0
Fixes Grafana panel of top Ceph pools by capacity used. This panel was only showing the most used pool instead of as many pools as configured with the $topk variable.
stackhpc/14.0.0.54¶
New Features¶
The smartmon-tools playbook now ensures that the cron service is running as in some cases it may not be running by default.
Upgrade Notes¶
Update Ubuntu Jammy Zed Kolla container tags.
stackhpc/14.0.0.51¶
New Features¶
Adds alerts for software raid failures.
stackhpc/14.0.0.49¶
New Features¶
Adds a custom playbook (
pulp-auth-proxy.yml) for deploying an authenticating proxy for Pulp. This can be used when building container images to avoid leaking credentials for package repositories into the built images or their metadata.
stackhpc/14.0.0.42¶
New Features¶
Rocky images have been rebuilt and are now based on Rocky 9.3.
stackhpc/14.0.0.40¶
New Features¶
Adds NVMe and S.M.A.R.T utilities to the overcloud host image built by DIB.
stackhpc/14.0.0.39¶
Bug Fixes¶
Removes bogus ContainerVolumeUsage alert. This rule wasn’t correctly measuring container volume IO and could cause spurious alerts.
Add a new reset-bls-entries.yml custom playbook which will rename existing Boot Loader Specification (BLS) entries using the current machine ID for each host. This should fix an issue with Grub not selecting the most recent kernel during boot.
stackhpc/14.0.0.36¶
New Features¶
Added support for Rocky Linux 9.3 repositories and Kolla containers. Made 9.3 the default version for Rocky Linux.
Updated Rocky Linux 9.2 pulp repo versions. Added Rocky Linux 9.3 pulp repo versions. Rebuilt Kolla containers with Rocky Linux 9.3.
Upgrade Notes¶
Bifrost Ironic debug logging is now disabled by default. Change ironic_debug to true to revert.
Updates Consul to 1.16.4 and Vault to 1.14.8.
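For the Bifrost debug logging change in these upgrade notes, reverting to the earlier behaviour might look like this sketch. Only the variable name is given by the note; the file it belongs in is an assumption.

```yaml
# Hypothetical placement: the configuration file that carries Bifrost
# variables in your Kayobe configuration.
ironic_debug: true
```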
Bug Fixes¶
Bumps OpenSearch heap size to 8 GB, to be identical to Elasticsearch.
stackhpc/14.0.0.31¶
New Features¶
StackHPC Kayobe Configuration container images for CI/CD with Kayobe Automation are now published to GitHub Container Registry (GHCR) at ghcr.io/stackhpc/stackhpc-kayobe-config. The image is tagged with the name of the release branch, e.g.
stackhpc/yoga.
stackhpc/14.0.0.30¶
Bug Fixes¶
Previously, switchdev capabilities had to be configured manually by a user with admin privileges using the port’s binding profile. This blocked regular users from managing ports with Open vSwitch hardware offloading, as providing write access to a port’s binding profile to non-admin users introduces security risks. For example, a binding profile may contain a pci_slot definition, which denotes the host PCI address of the device attached to the VM. A malicious user can use this parameter to pass through any host device to a guest, so it is impossible to provide write access to a binding profile to regular users in many scenarios.
This patch fixes this situation by translating VF capabilities reported by Libvirt to Neutron port binding profiles. Other VF capabilities are translated as well for possible future use. LP#2008238. LP#2020813.
The Neutron OVN DB sync operation no longer removes OVN metadata ports in networks with Octavia OVN load balancer health monitors. A maintenance task has been added to update the existing OVN LB HM ports to the new behaviour. Specifically, the “device_owner” field will be updated from network:distributed to ovn-lb-hm:distributed. Additionally, the “device_id” will be populated during the update action. LP#2038091.
stackhpc/14.0.0.26¶
Bug Fixes¶
Update Bifrost container images to include a fix for DHCP-based hardware discovery from https://review.opendev.org/c/openstack/bifrost/+/902233.
stackhpc/14.0.0.25¶
New Features¶
Adds support for deploying GitHub runners and creating GitHub workflows for use within Kayobe Automation. Two playbooks and their requirements have been added to ansible/ in addition to the relevant groups defined with some useful default variables where appropriate. Finally, documentation has been added to cover how to deploy these runners and workflows.
Added the stop-openstack-services.yml playbook, which can be used to stop OpenStack services across the overcloud.
Bug Fixes¶
Pin the OCI image tag used for the Ubuntu Focal base-image of Kolla image builds. This prevents packages in the image with the latest tag getting in front of StackHPC release-train package repositories. Ubuntu tag should be bumped when new packages are available in StackHPC release-train.
stackhpc/14.0.0.22¶
New Features¶
Updates OpenSearch to 2.11.1.
stackhpc/14.0.0.21¶
Bug Fixes¶
Pin the OCI image tag used for the base-image of Rocky 9 Kolla image builds. This prevents packages in the image with the latest tag getting in front of StackHPC release-train package repositories.
stackhpc/14.0.0.20¶
New Features¶
Added the rekey-hosts.yml playbook to automatically rotate the SSH keys on all hosts.
Adds support for Ubuntu Jammy and Rocky 9 to the CIS benchmark hardening playbook: cis.yml. This playbook will need to be manually applied.
Adds a panel in the Hardware Overview dashboard to show DWPD (Drive writes per day) for NVMEs. This is calculated by dividing the total bytes written in the past 24 hours by the drive capacity. This is currently only supported on NVMEs.
Adds alerts that will fire after 1 DWPD is sustained for 7 days, and a critical alert if 1 DWPD is sustained for 30 days.
Bug Fixes¶
Fixes display of the OpenSearch cluster health in Grafana when in yellow state.
Fix Grafana HAProxy dashboard when non-default Prometheus instance labels are used.
stackhpc/14.0.0.17¶
New Features¶
Neutron containers are now built from our StackHPC fork.
Upgrade Notes¶
Updates default Ceph images to v17.2.7 for Quincy.
Updates Consul to 1.16.3 and Vault to 1.14.6.
Bug Fixes¶
Fixes the bulk API of CloudKitty so that it now supports the migration from Elasticsearch to OpenSearch.
Fixes an issue with the growroot playbook where disks such as ‘sdp’ would become ‘sd’ due to the removal of the trailing ‘p’ when dealing with nvme devices.
Fixes Neutron so that load balancer FIPs are not broken on Neutron restart. See Neutron bug report.
Fixes an issue where Netmiko devices were sending no commands to the switch because plug_bond_to_network is overridden in networking_generic_switch/devices/netmiko_devices/__init__.py and PLUG_BOND_TO_NETWORK is set to None. See NGS bug report.
Restores a valid value for the flavor_id label on openstack_nova_server_status Prometheus metrics.
stackhpc/14.0.0.16¶
New Features¶
Adds kolla config merging options to the Kolla custom config generation section of etc/kayobe/kolla.yml.
Upgrade Notes¶
Kolla config merging is enabled by default in the Antelope release of Kayobe. This was quite an extensive change and whilst backwards compatibility was one of the goals, there may be some situations where refactoring of your Kolla config will be necessary. Extra care should be taken if you are using the multiple environments feature. It is recommended that you carefully check the diff in the resultant Kolla configuration by following these steps to check for missing config or duplicated config options. The kolla_openstack_custom_config_environment_merging_enabled option can be set to False to revert back to the old behaviour.
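Reverting to the pre-Antelope behaviour might be sketched as follows. The option name and the etc/kayobe/kolla.yml location are taken from the notes in this release; treat the snippet as an illustration and check the diff of the generated Kolla configuration either way.

```yaml
# etc/kayobe/kolla.yml, in the Kolla custom config generation section.
# Disables merging of custom Kolla config across environments.
kolla_openstack_custom_config_environment_merging_enabled: false
```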
stackhpc/14.0.0.15¶
New Features¶
The Cephadm pre and post commands now support default commands with the variables cephadm_commands_pre_default and cephadm_commands_post_default. As such, any extra commands should be added to the variables cephadm_commands_pre_extra and cephadm_commands_post_extra.
Rocky Linux 9 image has been rebuilt with missing base packages (e.g. microcode_ctl) by installing ‘Minimal Install’ DNF group. Also cloud-init from CentOS 9 Stream has been installed with NetworkManager support.
Bug Fixes¶
Fixes an issue when live migrating instances to hosts with cgroups v2 enabled (Ubuntu Jammy and Rocky 9). See Nova bug report.
Fixes a race condition when launching multiple Ironic instances in parallel (as is commonly triggered when using Terraform/OpenTofu). See Nova bug report.
When using custom SCA policies for Wazuh, the agents are now correctly configured to allow commands to be executed from the manager.
Fixes an issue with Ansible Pulp modules depending on the pulp_glue Python library since the pulp.squeezer 0.0.14 release.
Fixes an issue with Kolla container image builds for Ubuntu where the release train package repositories could be behind the container image, leading to image build failures.
stackhpc/14.0.0.14¶
Bug Fixes¶
Rebuild and bump the Bifrost container for Xena to include fix for Error while running update_to_latest_versions: ‘’BIOSSetting’’ object has no attribute during Ironic database migrations on upgrade
Disabled custom APT configuration for non-overcloud hosts (Ubuntu Only). This resolves the issue of the seed hypervisor attempting to pull packages from the repository on the seed before it has been deployed.
stackhpc/14.0.0.12¶
New Features¶
Adds the new magnum-capi-helm out of tree driver (see here https://github.com/stackhpc/magnum-capi-helm) into release train magnum containers
This patch adds OpenStack Capacity metrics and exporters to StackHPC Kayobe Config. This includes a deployment playbook, Prometheus scrape jobs and HAProxy configurations to support this change.
Adds ethtool and pciutils to the overcloud host disk image.
Raises an alert when the count of RabbitMQ ready messages increases above a threshold.
Adapt threshold of RabbitMQ connection alert based on the size of the deployment to avoid spurious alerts.
Wazuh can now be deployed with additional custom SCA policies. Just add the policy file(s) to the directory {{ kayobe_env_config_path }}/wazuh/custom_sca_policies.
Upgrade Notes¶
Rebuilt all kolla and package repo tags to bring in kernel fixes and apply CentOS image build customisations that were previously being ignored.
To deploy the OpenStack Capacity Grafana dashboard, you must define OpenStack application credential variables: secrets_os_capacity_credential_id and secrets_os_capacity_credential_secret as laid out in the ‘Monitoring’ documentation.
You must also enable the stackhpc_enable_os_capacity flag for OpenStack Capacity HAProxy and Prometheus configuration to be templated.
You may also change the default authentication URL from the kolla_internal_fqdn and change the default OpenStack region from RegionOne with the variables: stackhpc_os_capacity_auth_url and stackhpc_os_capacity_openstack_region_name.
To disable certificate verification for the OpenStack Capacity exporter, you can set stackhpc_os_capacity_openstack_verify to false.
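Taken together, the OpenStack Capacity settings for this release could be sketched as below. The variable names come from the upgrade notes above; the URL and region values are placeholders, and the file placement is an assumption. The two secrets_os_capacity_credential_* variables are left out because they belong in an encrypted secrets file.

```yaml
# Hypothetical placement: a Kayobe variables file in your configuration.

# Required for the HAProxy and Prometheus configuration to be templated.
stackhpc_enable_os_capacity: true

# Optional overrides. Both values below are placeholders.
stackhpc_os_capacity_auth_url: "https://keystone.example.com:5000"
stackhpc_os_capacity_openstack_region_name: "RegionTwo"

# Optional: disables certificate verification for the exporter.
stackhpc_os_capacity_openstack_verify: false
```

Later releases in this series state that OpenStack Capacity no longer uses application credentials, so this sketch applies to this release only.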
stackhpc/14.0.0.9¶
Upgrade Notes¶
Enabled ML2/OVN by default. Checks preventing accidental migration from ML2/OVS were added in Kolla Ansible. If you are using a Neutron plugin other than ML2/OVN, set kolla_enable_ovn to false.
OVN distributed FIP is disabled; to enable it, set neutron_ovn_distributed_fip to true in etc/kayobe/kolla/globals.yml.
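The two settings in this note apply to different situations, so the sketch below shows them as two separate YAML documents. The file name for kolla_enable_ovn is an assumption; only the location of neutron_ovn_distributed_fip is stated in the note.

```yaml
# Case 1: a Neutron plugin other than ML2/OVN is in use.
# Assumed location: the Kayobe Kolla configuration (for example
# etc/kayobe/kolla.yml); check your own configuration layout.
kolla_enable_ovn: false
---
# Case 2: ML2/OVN is in use and distributed floating IPs are wanted.
# Location given in the note: etc/kayobe/kolla/globals.yml
neutron_ovn_distributed_fip: true
```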
The reboot.yml custom Ansible playbook now defaults to rebooting only one host at a time. The previous behaviour can be retained by setting ANSIBLE_SERIAL=0.
Security Issues¶
The Rocky 8 minor version has been bumped to 8.8 and new snapshots have been created to include fixes for Zenbleed (CVE-2023-20593) and Downfall (CVE-2022-40982). It is recommended that you update your OS packages and reboot into the new kernel as soon as possible.
The snapshots for Rocky 9.2 have been refreshed to include fixes for Zenbleed (CVE-2023-20593) and Downfall (CVE-2022-40982). It is recommended that you update your OS packages and reboot into the new kernel as soon as possible.
stackhpc/14.0.0.8¶
Upgrade Notes¶
The path used to store Wazuh certificates has changed. local_certs_path is now set to the environment directory, e.g. $KAYOBE_CONFIG_PATH/environments/<environment>/wazuh, or $KAYOBE_CONFIG_PATH/wazuh/ if not using environments. The contents of $KAYOBE_CONFIG_PATH/ansible/wazuh/certificates should be moved to the new location and the empty directory should be removed.
The local_custom_certs_path variable has been removed. Custom Wazuh certificates should be moved to $KAYOBE_CONFIG_PATH/environments/<environment>/wazuh/wazuh-certificates/ if using environments, or $KAYOBE_CONFIG_PATH/wazuh/wazuh-certificates if not.
stackhpc/14.0.0.6¶
New Features¶
Provide ELRepo 9, which in turn provides packages to support be2net and mpt3sas hardware. Configuration of ELRepo 9 is disabled by default and may be enabled by setting dnf_install_elrepo_9: true.
Nvmemon now reports physical size of the disk.
Upgrade Notes¶
CentOS Stream 8 snapshots have been bumped and new container images are available. Make sure to sync these into your local pulp. The yum repositories must be reconfigured to exclude a buggy version of iptables. To do this use:
kayobe overcloud service reconfigure -kt none -t dnf.
CentOS Extras has been replaced with CentOS Extras Common. You may need to use the --allowerasing option with DNF if you have packages installed from the old repo. This is only needed once; the argument can be dropped on the next package update.
Configures Nova to use the more modern ‘q35’ libvirt machine type rather than ‘pc’, which is considered legacy.
Instance labels in Prometheus now use inventory hostnames rather than IPs.
Security Issues¶
Bumps CentOS Stream 8 snapshots to include fixes for Zenbleed (CVE-2023-20593) and Downfall (CVE-2022-40982). It is recommended that you update your OS packages and reboot into the new kernel as soon as possible.
Bumps Ubuntu repository snapshots and container images to bring in latest security patches. This includes the microcode to patch Downfall (CVE-2022-40982). Zenbleed (CVE-2023-20593) was patched in the previous snapshot bump. To apply the microcode updates, it is recommended to reboot each host after upgrading all of the packages.
Bug Fixes¶
Fixes an issue with local image builds where kolla_tag had not been set. The error had the signature:
Upstream package repository mirrors are now restored in Kolla container images. This makes it possible to install or update packages for debugging purposes.
stackhpc/14.0.0.1¶
New Features¶
Adds blazar project Kolla container images. Blazar is a resource reservation service for OpenStack. Blazar enables users to reserve a specific type/amount of resources for a specific time period and it leases these resources to users based on their reservations.
Adds caso container images. cASO is an accounting reporter that supports Cloud Accounting Usage Records. For more information, see the upstream docs. Note that this container does not exist in upstream Kolla and is maintained downstream by StackHPC.
Adds code to the globals.yml file to add endpoints for the ceph_mgr_exporter. If Ceph is configured correctly, managers will be under the mgrs inventory group. If this group is empty, then the variable will just be empty (the KA default). This also requires setting kolla_enable_prometheus_ceph_mgr_exporter to true.
The playbook hotfix-containers.yml has been added. This allows arbitrary files to be copied into, and/or arbitrary commands to be executed within, overcloud containers.
Support for Ubuntu 22.04 Jammy Jellyfish repositories has been added to the Yoga Release.
Monitoring services are now enabled by default in the ci-multinode environment.
OpenSearch container images have been added.
Add the package repository configuration required for Rocky Linux 9 support.
Add CI for Rocky 9 hosts.
Added support for Rocky Linux 9.2 repositories and made 9.2 the default version.
Adds support for using VMs as compute and controller nodes in the ci-multinode environment by dynamically setting the MTU of the networks in networks.yml and removing the static definition of the network interfaces for the compute and controller groups.
Add Wazuh deployment playbook.
Adds utility playbooks to build and rotate amphora images. For more details check out the Octavia section of the Operator Guide included in the documentation.
Brings in new neutron container images to add batching support to Networking Generic Switch. This is opt-in via the ngs_batch_requests configuration option and only affects Ironic deployments that use Networking Generic Switch. See the following PR for more details.
Updates neutron containers to contain a version of networking-generic-switch with support for trunk ports when using DellOS 10 or Cisco switches. See this PR for more details.
Updates neutron containers to contain a version of networking-generic-switch with support for DellOS 10. See this PR for more details.
Improvements to the ci-aio automated deployment script to allow the script to successfully run on LVM-based images.
Added a script to the AIO environment that can be used to quickly deploy an AIO for testing.
Adds time information to tasks using the ansible.posix.profile_tasks callback.
Adds some basic tuning of Ansible, including use of 20 forks, enabling SSH pipelining, YAML-formatted output, and disabling fact variable injection.
The Magnum container now includes the CAPI driver.
Adds support for using Ceph HAProxy and Keepalived images stored in Pulp. This is enabled automatically if stackhpc_sync_ceph_images is set to true.
Mariabackup is now enabled by default.
The flag om_enable_rabbitmq_high_availability is now set to true. Adds tags for new RabbitMQ containers to update to RabbitMQ version 3.9.22.
Adds an etcd Kolla container image. This can be used for OpenStack service coordination as a tooz backend, or for batched processing of switch configuration in Networking Generic Switch (this requires a downstream NGS patch).
Adds drive temperatures to the table on the hardware overview dashboard and a timeseries to show the temperature over time.
Adds picker to hardware overview dashboard to select a specific host to show drive information for.
Adds support for synchronising HashiCorp Consul and Vault images to a local Pulp registry.
Adds a new variable stackhpc_pulp_sync_for_local_container_build which, when set to true, configures the local Pulp server to sync all package repositories required for building kolla containers on a local kolla build host.
Enable TLS for the Seed Pulp service. Set pulp_enable_tls: true and provide paths to a TLS certificate and key using pulp_cert_path and pulp_key_path respectively.
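A minimal sketch of the TLS settings described in this note. The certificate and key paths are hypothetical placeholders, and placing the variables in etc/kayobe/pulp.yml is an assumption rather than something the note states.

```yaml
# Assumed location: etc/kayobe/pulp.yml
pulp_enable_tls: true

# Hypothetical paths; point these at your own certificate and key.
pulp_cert_path: "/path/to/pulp.crt"
pulp_key_path: "/path/to/pulp.key"
```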
Adds a standard LVM configuration that is compatible with the new overcloud host image.
Adds the Helm client to the Magnum container.
Adds support for Manila in the ci-multinode environment using the CephFS native backend. This is disabled by default, but can be enabled by setting the following variables in the kayobe configuration:
kolla_enable_manila: true
kolla_enable_manila_backend_cephfs_native: true
Updated the documentation for the ci-multinode to include instructions on how to set up and test Magnum.
Added support for Wazuh in the ci-multinode environment.
Updates Prometheus Node exporter to version 1.5.0.
Adds NTP alerts to prometheus alertmanager.
Adds alerts for Octavia load balancers and amphorae. Alerts are triggered when load balancers enter the ERROR or DEGRADED states, or when amphorae enter the ERROR state.
Adds a new Grafana dashboard for Octavia. This dashboard is used to monitor the load balancers as well as the amphorae.
Adds a standard overcloud Diskimage Builder (DIB) host image configuration.
Prebuilt overcloud host images can now be pulled from Ark using the stackhpc_download_overcloud_host_images variable. The image is selected based on os_distribution and os_release.
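For illustration, enabling the prebuilt image download might look like the fragment below. The distribution and release values are examples only; the release notes do not list which combinations have prebuilt images available in Ark.

```yaml
# Pull a prebuilt overcloud host image from Ark instead of building one.
stackhpc_download_overcloud_host_images: true

# The image is selected from these two Kayobe variables.
# Example values only; use the distribution and release you deploy.
os_distribution: "rocky"
os_release: "9"
```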
Re-enable Pulp Ubuntu repositories.
Package repositories and container images for CentOS Stream based deployments have been updated. Key packages to note are:
Kernel: version 4.18.0, release 448.el8
Libvirt: version 8.0.0, release 6.module_el8.7.0+1140+ff0772f9
OVS: version 2.17.0, release 71.el8s
OVN: version 22.09.0, release 11.el8s
Container images for Ubuntu based deployments have been updated. Key packages to note are:
Libvirt: version 8.0.0, release 1ubuntu7.4~cloud0
OVS: version 2.17.3, release 0ubuntu0.22.04.1~cloud0
OVN (unchanged since last container build): version 22.03.0, release 0ubuntu1~cloud0
Sync Rocky Linux 8.7 RPM repositories to local Pulp servers.
Enables SMART monitoring. Manual action is required, please see the monitoring documentation for the procedure.
Split cephadm_commands into cephadm_commands_pre and cephadm_commands_post commands. This allows the user to run commands that must be run before the rest of the post-deployment configuration, as well as commands that rely on resources created by the post-deployment config.
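The split might be used roughly as sketched below. This assumes each entry is a string of arguments passed to the ceph command line tool, which is how cephadm_commands is understood to work in the stackhpc.cephadm collection; verify the format against the collection version you use. The specific commands and pool names are hypothetical.

```yaml
# Runs before the rest of the post-deployment configuration.
# Hypothetical example: set a global default before any pools are created.
cephadm_commands_pre:
  - "config set global osd_pool_default_size 3"

# Runs after the post-deployment configuration, so it can rely on
# resources created there. Hypothetical example: create a filesystem
# on pools that the post-deployment configuration defined.
cephadm_commands_post:
  - "fs new cephfs cephfs_metadata cephfs_data"
```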
Updates Grafana to version 9.4.7.
Upgrades Pulp from 3.21 to 3.22.
Disables Pulp analytics.
Sets the number of Pulp workers based on available CPU cores. This may improve performance when pulling container images to many hosts simultaneously.
Upgrades Pulp from 3.22 to 3.23.
Upgrades Pulp from 3.23 to 3.24.
Adds support for package repository snapshots via Pulp. A local Pulp server is deployed on the seed, which syncs package repositories and container images from the StackHPC Ark Pulp server. Control plane servers pull packages and container images from the local Pulp server.
The EPEL package repository is disabled by default. It may be enabled by setting dnf_enable_epel to true.
Uses StackHPC source code repositories for kolla, kolla-ansible, and bifrost.
Supports Kolla CentOS Stream 8 source container images.
Adds custom playbooks for compute host maintenance:
nova-compute-drain.yml
nova-compute-disable.yml
nova-compute-enable.yml
reboot.yml
Adds a custom playbook to run the Anomaly Detection Visualiser (ADVise), advise-run.yml.
Adds a custom playbook to reset the RabbitMQ cluster and restart OpenStack services that use it, rabbitmq-reset.yml.
Adds a custom playbook to configure swap, swap.yml.
Adds the Kayobe Automation Git repository as a submodule, and provides some basic configuration for it in a .automation.conf directory.
Adds support for deploying a Squid caching proxy as a custom container on the seed.
Enables Elasticsearch, Grafana, Kibana, Prometheus by default. Provides standard dashboards for Grafana and alerting rules for Prometheus.
Upgrade Notes¶
Bumped Focal package versions due to unmet dependencies.
Bumps Octavia container versions.
Bumped Rocky 9 package versions due to a missing snapshot.
Updates container tags for the Magnum CAPI changes.
Updates Ceph Pacific container image to v16.2.11.
Automatically install Quincy if the node is running Ubuntu 22.04, else install Pacific.
Increase stackhpc.cephadm collection to version 1.12.2.
Enables Docker live restore by default. This may be disabled by setting docker_daemon_live_restore to false in docker.yml.
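For reference, opting out of the new default could look like the fragment below. The full path shown in the comment is an assumption; the note only names docker.yml.

```yaml
# docker.yml (assumed to be etc/kayobe/docker.yml)
# Restore the previous behaviour by turning Docker live restore off.
docker_daemon_live_restore: false
```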
The flag om_enable_rabbitmq_high_availability is now set to true. As this enables durable queues, RabbitMQ will need to be reset, and the services which use it restarted. Tags are added to update the RabbitMQ containers to version 3.9.22.
The overcloud host image build workflow now uploads the built image to SMS as well as ARK, allowing it to be tested both manually and through AIO CI jobs.
Updated OVN package version from 22.06 to 22.09.
openvswitch version has been updated to ~2.17.5 on all distributions (CentOS/Rocky9 are two patches ahead of 2.17.5). Images include fixes for CVE-2023-1668.
Ubuntu repository versions for focal and ubuntu cloud archive have been updated to 20230515.
Kolla tag overrides have been refactored to allow kolla-ansible to resolve them individually by host. This means that mixed clouds can be deployed which allows for migration between distributions.
Don't pull apt packages from Pulp for Ubuntu Jammy until Jammy packages are published.
Don't pull Ceph packages from the official Ceph repos for Ubuntu Jammy until Jammy packages are published.
Updates the smartmon-tools.yml playbook to ensure that cron is installed before attempting to configure crontab.
Updated Ubuntu package repository versions.
Bug Fixes¶
Added the NetworkManager-config-server package to the Rocky Linux 9 deployment image. This prevents NetworkManager from automatically running DHCP on unconfigured ethernet devices and allows connections with static IP addresses to be brought up even on ethernet devices with no carrier.
Fixed a syntax error in Prometheus SMART monitoring rules.
Caps the number of Pulp API and content workers to 32 each to avoid errors on hosts with many CPUs.
Fixes the hardware overview dashboard to use the correct metric for displaying drive temps. Now uses an or to display whichever metric is compatible with the drives in the system. The two metrics are temperature_case_raw_value and temperature_celsius_raw_value.
Fixes the issue with using SAML2 federation in Keystone against NetIQ IdP.
Fixes internet connectivity for VMs deployed in the ci-multinode environment.
Fixes creation of over 1TB memory VMs on AMD with IOMMU enabled on Rocky Linux 9.
Fixes the smartmon script to be case insensitive when checking for the initial SMART info. This ensures that the script works correctly on systems where the output of smartctl -i is not capitalised as the script previously expected. Previously, this mismatch produced badly formatted .prom files, which caused node_exporter to fail to scrape the file.
Fixes the InstanceDown alerting rule wait time to be consistent with the alert message. The alert message says “for 5 minutes” but the rule was set to wait for 1 minute.
Updates nova image to bring in a fix for parsing mdev uuids when using libvirt>=7.7. See bug for more details.
Adds Ironic image tags to fix a bug with online upgrades for BIOS/Traits. See patch: https://review.opendev.org/c/openstack/ironic/+/877409
Add unit to LowMemory alert description.
Fixes Octavia health monitors not being created on cluster spawn.
Fixes CoreDNS for Magnum clusters crashing on startup.
Allows cinder-csi nodeplugin to start on the same Magnum cluster host as cinder-csi controllerplugin.
Corrects ClusterRole rules for Magnum cluster-autoscaler, and sets cluster-autoscaler pods to use hostNetwork.
Disables metadata proxy over IPv6 inside Neutron DHCP agent to work around bug 1953165.
Fixes a Prometheus Node exporter crash which may affect nodes with AMD processors (first seen on HPE DL385).
Fix for nova resize API not parsing the new flavor on resize - bug 1805969.
Fix creation of VM instances with UEFI enabled and Secure Boot disabled.
Fixes documentation builds on Read the Docs.
Fixes synchronisation and DNF configuration of the Rocky Linux 9 CRB repository.
HAProxy alerting rules have been updated to use the server name that is down, rather than the name of the instance that reported the down server.
Other Notes¶
Adds deployment guide docs for the new CAPI driver.
Reduced verbosity in etc/kayobe/pulp.yml
Changes the Grafana OpenStack dashboard to show HTTP status 300 as green instead of orange.
Adds a ci-aio environment for CI testing.
Adds a ci-builder environment for building Kolla container images in CI.