Add cloud-hypervisor v51.1 with backwards-compatible version flag (CVE-2026-27211)#200

Open
ulziibay-kernel wants to merge 6 commits into main from hypeship/upgrade-ch-v50.1

Contributor

@ulziibay-kernel ulziibay-kernel commented Apr 24, 2026

Summary

Adds Cloud Hypervisor v51.1 alongside existing v49.0 with a config flag to control which version new instances use. This enables a safe, no-downtime upgrade path for CVE-2026-27211.

What changed

  • Both v49.0 and v51.1 binaries are embedded -- no need to drain standby instances before deploy
  • New config flag: hypervisor.cloud_hypervisor_version (default: v49.0)
    • Env override: HYPERVISOR__CLOUD_HYPERVISOR_VERSION=v51.1
  • Existing standby instances restore using their stored version (already the case)
  • New instances use whichever version the flag is set to
  • VMM client regenerated from v51.1 OpenAPI spec with ImageType: Raw on all disks (CVE fix)
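
A minimal sketch of how the flag and its environment override could resolve. It assumes a plain environment lookup rather than the project's real koanf-based config loading; `resolveCHVersion`, `fallbackVersion`, and `envKey` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// fallbackVersion mirrors the documented default of the flag (v49.0).
const fallbackVersion = "v49.0"

// envKey is the environment override named in the PR description.
const envKey = "HYPERVISOR__CLOUD_HYPERVISOR_VERSION"

// resolveCHVersion returns the override when it is set and non-empty,
// otherwise the documented default. Hypothetical helper: the real code
// reads this value through its config package.
func resolveCHVersion(lookup func(string) (string, bool)) string {
	if v, ok := lookup(envKey); ok && v != "" {
		return v
	}
	return fallbackVersion
}

func main() {
	fmt.Println(resolveCHVersion(os.LookupEnv))
}
```

Passing the lookup function in keeps the helper testable without mutating the process environment.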

Upgrade path

  1. Deploy this PR -- all instances continue on v49.0, no disruption
  2. When ready, set HYPERVISOR__CLOUD_HYPERVISOR_VERSION=v51.1
  3. New instances get v51.1; existing standbys still restore on v49.0
  4. Once all v49.0 standbys have expired, v49.0 can be removed in a follow-up

Security fix (CVE-2026-27211)

Fixes GHSA-jmr4-g2hv-mjj6: arbitrary host file exfiltration via crafted QCOW2 disk headers. Affects CH versions 34.0 through 50.0. Fixed in 50.1+.

v51.1 additionally includes:

  • QCOW2 v3 improvements (live resize, variable refcount, dirty bit)
  • DISCARD/WRITE_ZEROES support for virtio-blk
  • THP for anonymous shared memory (performance)
  • ACPI NUMA affinity for VFIO-PCI (GPU passthrough)

Files changed

  • Makefile -- download both v49.0 and v51.1 binaries
  • cmd/api/config/config.go -- add cloud_hypervisor_version field (default v49.0)
  • lib/vmm/binaries_linux.go -- embed both v49.0 and v51.1
  • lib/vmm/binaries_darwin.go -- version constants for compile compat
  • lib/vmm/version.go -- ParseVersion matches both versions
  • lib/vmm/vmm.go -- regenerated from v51.1 OpenAPI spec
  • lib/vmm/client_test.go -- tests for both versions
  • lib/hypervisor/cloudhypervisor/process.go -- SetDefaultVersion/GetDefaultVersion, configurable GetVersion
  • lib/hypervisor/cloudhypervisor/config.go -- ImageType: Raw on all disks (CVE fix, safe for v49.0)
  • lib/providers/providers.go -- wire config flag at startup
  • lib/instances/version_upgrade_test.go -- new E2E test: create v49.0 -> standby -> switch to v51.1 -> restore -> verify version preserved -> create new instance on v51.1

Test plan

  • go build ./lib/vmm/... ./lib/hypervisor/cloudhypervisor/...
  • go vet clean
  • go test ./lib/vmm/... -- both versions extract and start
  • go test ./lib/hypervisor/cloudhypervisor/... passes
  • TestCloudHypervisorVersionUpgradeRestore -- standby/restore across version change
  • Full go test ./lib/instances/... (requires KVM, CI)

Note

Medium Risk
Updates embedded Cloud Hypervisor binaries/client API and introduces a runtime-selectable default version; mistakes could break VM boot/restore behavior or produce version mismatches for snapshots despite added coverage.

Overview
Adds dual embedded Cloud Hypervisor support by bundling v51.1 alongside v49.0, updating build tooling to download/ensure both versions and switching the downloaded OpenAPI spec to v51.1.

Introduces a new config flag hypervisor.cloud_hypervisor_version (default v49.0) that sets the default CH version for new instances via cloudhypervisor.SetDefaultVersion, while keeping existing instances tied to their stored HypervisorVersion during restore/fork/snapshot flows.

Regenerates the lib/vmm OpenAPI client from the newer spec (new fields + vm.resize-disk endpoint) and hard-sets CH disk ImageType to Raw when building VM configs. Adds/updates tests, including a new Linux E2E test validating standby/restore across a default-version flip.

Reviewed by Cursor Bugbot for commit 5966b75. Bugbot is set up for automated code reviews on this repo.

@ulziibay-kernel ulziibay-kernel marked this pull request as ready for review April 27, 2026 19:02
@firetiger-agent

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR updates Cloud Hypervisor binaries and related build tooling, but does not modify API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) that the filter targets.

To monitor this PR anyway, reply with @firetiger monitor this.

@ulziibay-kernel
Contributor Author

ulziibay-kernel commented Apr 27, 2026

Changes between v49.0 and v51.1 -- reviewer notes

Update (May 12): This PR is now backwards-compatible. Both v49.0 and v51.1 are embedded. A config flag (hypervisor.cloud_hypervisor_version, default v49.0) controls which version new instances use. Existing standby instances restore using their stored version. No drain strategy needed.

Breaking changes addressed

| Change | Version | Impact on us | Action taken |
| --- | --- | --- | --- |
| Sector-zero write prevention on autodetected raw images | v50.1+ | BREAKING -- overlay disk (vdb) fails with I/O errors | Fixed: set ImageType: Raw explicitly on all disk configs |
| backing_files defaults to off | v50.1+ | Low risk -- we use raw images, not QCOW2 | No action needed |
| Snapshot restore requires exact CH version match | v50.0+ | HIGH -- standbys on v49.0 can't restore with v51.1 binary | Fixed: both binaries embedded, restore uses stored version |

Upgrade path

  1. Deploy this PR -- all instances continue on v49.0, no disruption
  2. Set HYPERVISOR__CLOUD_HYPERVISOR_VERSION=v51.1 when ready
  3. New instances get v51.1; existing standbys still restore on v49.0
  4. Once all v49.0 standbys expire, v49.0 can be removed in a follow-up

Medium Impact

| Change | Version | Impact on us | Action needed |
| --- | --- | --- | --- |
| Byte-range advisory locks on block devices (was whole-file locks) | v50.0 | Low risk -- better compat with network storage | Monitor for lock-related errors |
| Seccomp filter fixes | v50.0 | Positive -- fixes seccomp violations in vsock thread | Monitor for seccomp kills in CH vmm.log |
| CPUID fixes in guest | v50.0 | Positive -- unlikely to break anything | None expected |
| Nested virtualization now configurable (`nested=on\|off, default on`) | v50.0 | No change in behavior | |

Low Impact / Positive (v50.0-v51.1)

  • Live migration performance improvement
  • Live disk resizing API (/vm.resize-disk)
  • QCOW2 v3 improvements (live resize, variable refcount, dirty bit)
  • DISCARD/WRITE_ZEROES support for virtio-blk
  • THP for anonymous shared memory (performance)
  • ACPI NUMA affinity for VFIO-PCI (GPU passthrough)

Comment thread lib/vmm/version.go Outdated
@ulziibay-kernel ulziibay-kernel changed the title from "Upgrade cloud-hypervisor to v50.1 (CVE-2026-27211)" to "Upgrade cloud-hypervisor to v51.1 (CVE-2026-27211)" Apr 28, 2026
ulziibay-kernel and others added 5 commits May 12, 2026 11:12
Fixes GHSA-jmr4-g2hv-mjj6 (CVE-2026-27211): VMM host file
exfiltration via malicious QCOW2 headers. Affects versions 34.0
through 50.0; fixed in 50.1.

- Drop embedded v48.0 and v49.0 binaries; embed v50.1 only
- Update Makefile downloads, spec source, and ensure-ch-binaries
  check to v50.1
- Update SupportedVersions, ParseVersion, and the default
  GetVersion() returned by the cloud-hypervisor Starter
- Update tests and docs to reference v50.1

Cloud Hypervisor API remains at v0.3.0 (new /vm.resize-disk
endpoint and optional `nested` field are additive, no regen
needed unless the new surface is used).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CH v50.1 prevents sector-zero writes on autodetected raw images as
part of the CVE-2026-27211 fix. Without explicit image_type, the
overlay disk (vdb) fails with I/O errors because CH treats it as a
potential QCOW2 spoof:

  I/O error, dev vdb, sector 0 op 0x1:(WRITE)
  EXT4-fs (vdb): mount failed
  FATAL: dropping to shell for debugging

Fix:
1. Regenerate lib/vmm/vmm.go from the v50.1 OpenAPI spec to pick up
   the new image_type and backing_files fields in DiskConfig
2. Fix malformed enum in the upstream spec (type: enum [...] -> type:
   string with enum list) matching cloud-hypervisor PR #7734
3. Set ImageType: Raw on all disk configs in ToVMConfig so CH skips
   format autodetection and allows sector-zero writes on raw images

Made-with: Cursor

v51.1 is the latest release and includes:
- Same CVE-2026-27211 fix as v50.1
- Fixed image_type enum in OpenAPI spec (PR #7734) -- we no longer
  need our manual spec patch
- QCOW2 v3 improvements (live resize, variable refcount, dirty bit)
- DISCARD/WRITE_ZEROES support for virtio-blk
- THP for anonymous shared memory (performance)
- ACPI NUMA affinity for VFIO-PCI (GPU passthrough)

Regenerated lib/vmm/vmm.go from v51.1's native spec (includes
Unknown as an additional ImageType variant).

Made-with: Cursor

Instead of replacing v49.0 with v51.1, embed both versions and add a
config flag (hypervisor.cloud_hypervisor_version) to control which
version new instances use. Default remains v49.0 for safe rollout.

- Re-add v49.0 binaries alongside v51.1 in Makefile and embed directives
- Add CloudHypervisorVersion field to HypervisorConfig (default "v49.0")
- Add SetDefaultVersion/GetDefaultVersion to CH starter, wired from
  providers.go at startup
- Existing standby instances restore using their stored version
  (already the case via stored.HypervisorVersion in restore.go)
- New E2E test (TestCloudHypervisorVersionUpgradeRestore) verifies:
  create on v49.0 -> standby -> switch default to v51.1 -> restore
  still uses v49.0 -> new instance uses v51.1
- ImageType: Raw (v51.1 CVE fix) is safe for v49.0 (serde ignores
  unknown fields)

Operators can flip to v51.1 when ready:
  HYPERVISOR__CLOUD_HYPERVISOR_VERSION=v51.1

No drain strategy needed -- snapshot restore works across both versions.

Made-with: Cursor
@ulziibay-kernel ulziibay-kernel force-pushed the hypeship/upgrade-ch-v50.1 branch from bb7f53e to ef4be42 May 12, 2026 17:19
@ulziibay-kernel ulziibay-kernel changed the title from "Upgrade cloud-hypervisor to v51.1 (CVE-2026-27211)" to "Add cloud-hypervisor v51.1 with backwards-compatible version flag (CVE-2026-27211)" May 12, 2026
Comment thread lib/hypervisor/cloudhypervisor/process.go
Cursor Bugbot correctly identified that RestoreSnapshot and
ForkFromSnapshot in snapshot.go call starter.GetVersion() and
overwrite HypervisorVersion. With the configurable default, flipping
to v51.1 would break standby snapshot restores of v49.0 instances.

Fix: only overwrite HypervisorVersion when the target hypervisor TYPE
differs from the source (stopped snapshot cross-hypervisor restore).
When the type is the same, preserve the source snapshot's version so
CH gets the exact binary match it requires.
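
The rule in this fix can be sketched roughly as follows. The `snapshotMeta` struct, `restoreVersion`, and the hypervisor type strings are hypothetical stand-ins, not the actual code in snapshot.go.

```go
package main

import "fmt"

// snapshotMeta is a hypothetical stand-in for the stored snapshot metadata.
type snapshotMeta struct {
	HypervisorType    string
	HypervisorVersion string
}

// restoreVersion picks the version to record on the restored instance:
// the same hypervisor type keeps the snapshot's version, while a different
// type takes the target starter's current version.
func restoreVersion(src snapshotMeta, targetType, targetVersion string) string {
	if src.HypervisorType == targetType {
		return src.HypervisorVersion
	}
	return targetVersion
}

func main() {
	snap := snapshotMeta{HypervisorType: "cloud-hypervisor", HypervisorVersion: "v49.0"}
	// The default has been flipped to v51.1, but the snapshot stays on v49.0.
	fmt.Println(restoreVersion(snap, "cloud-hypervisor", "v51.1"))
}
```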

Also fix ParseVersion to use regex extraction + SupportedVersions
lookup instead of substring Contains, addressing the Bugbot note about
"v50.1" matching "v50.10" (now moot but the pattern was still fragile).
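
A minimal sketch of the regex-plus-lookup approach, assuming the version appears as a `vMAJOR.MINOR` token somewhere in the binary's version output. The lowercase `parseVersion`, `supportedVersions`, and `versionRe` names are illustrative and not the actual lib/vmm code.

```go
package main

import (
	"fmt"
	"regexp"
)

type CHVersion string

const (
	V49_0 CHVersion = "v49.0"
	V51_1 CHVersion = "v51.1"
)

var supportedVersions = []CHVersion{V49_0, V51_1}

// versionRe extracts the first "vMAJOR.MINOR" token. Because \d+ is greedy,
// "v50.10" is captured whole instead of matching as "v50.1".
var versionRe = regexp.MustCompile(`v\d+\.\d+`)

// parseVersion pulls the version token out of free-form output and accepts
// it only if it exactly equals a supported version.
func parseVersion(output string) (CHVersion, error) {
	token := versionRe.FindString(output)
	if token == "" {
		return "", fmt.Errorf("no version found in %q", output)
	}
	for _, v := range supportedVersions {
		if CHVersion(token) == v {
			return v, nil
		}
	}
	return "", fmt.Errorf("unsupported version %q", token)
}

func main() {
	v, err := parseVersion("cloud-hypervisor v51.1")
	fmt.Println(v, err)
}
```

Exact comparison against the supported list is what removes the substring ambiguity; the regex only isolates the candidate token.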

Made-with: Cursor
Collaborator

@sjmiller609 sjmiller609 left a comment

Excellent, just a few small pieces of feedback.

Comment thread cmd/api/config/config.go
Default: "cloud-hypervisor",
FirecrackerBinaryPath: "",
Default: "cloud-hypervisor",
CloudHypervisorVersion: "v49.0",
Collaborator

Should this be the new version? EDIT: no, it makes more sense for this to be empty or none, since this is an optional configuration. We then treat no config as meaning "use the latest version". That way the latest version is specified in only one place (not in this file, somewhere else).


var (
defaultVersionMu sync.RWMutex
defaultVersion = vmm.V49_0
Collaborator

I don't think we need this block; the default version should live somewhere else.

// use the version stored in their metadata.
func (s *Starter) GetVersion(p *paths.Paths) (string, error) {
- return string(vmm.V49_0), nil
+ return string(GetDefaultVersion()), nil
Collaborator

The default version should be only set in one place, and I think it should be in binaries_linux, e.g.:

type CHVersion string

const (
	V48_0 CHVersion = "v48.0"
	V49_0 CHVersion = "v49.0"
)

const DefaultVersion = V49_0

var SupportedVersions = []CHVersion{V48_0, V49_0}

(using the actual correct versions being set in this PR)

This function should then return the default for new instances, and otherwise return the version stored in the instance metadata.
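
One way to read this suggestion, as a minimal sketch: a single `DefaultVersion` constant next to the version list, plus a lookup that prefers the version stored on the instance. The `versionFor` helper and its `storedVersion` parameter are hypothetical, and the constants use the two versions from this PR.

```go
package main

import "fmt"

type CHVersion string

const (
	V49_0 CHVersion = "v49.0"
	V51_1 CHVersion = "v51.1"
)

// DefaultVersion is the single place the default for new instances is set.
const DefaultVersion = V49_0

// versionFor returns the version recorded in instance metadata when present
// and falls back to DefaultVersion for new instances (empty storedVersion).
func versionFor(storedVersion string) CHVersion {
	if storedVersion != "" {
		return CHVersion(storedVersion)
	}
	return DefaultVersion
}

func main() {
	fmt.Println(versionFor(""))      // new instance
	fmt.Println(versionFor("v51.1")) // existing instance keeps its version
}
```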

OverlaySize: 10 * 1024 * 1024 * 1024,
Vcpus: 1,
NetworkEnabled: false,
Hypervisor: hypervisor.TypeCloudHypervisor,
Collaborator

I am wondering if this line can specify the version

Comment thread lib/vmm/README.md
Comment on lines -78 to -81
│ ├── v48.0/
│ │ ├── x86_64/cloud-hypervisor (4.5MB)
│ │ └── aarch64/cloud-hypervisor (3.3MB)
│ └── v49.0/
Collaborator

these lines should remain since we are showing how we can support multiple versions

Comment on lines +166 to +181
/vm.resize-disk:
put:
summary: Resize a disk
requestBody:
description: Resizes a disk attached to the VM
content:
application/json:
schema:
$ref: "#/components/schemas/VmResizeDisk"
required: true
responses:
204:
description: The disk was successfully resized.
500:
description: The disk could not be resized.

Collaborator

How should we deal with different APIs for each CH version? Here are the options I considered:

  1. Multiple API specs, with a mapping from each CH version to a spec version
  2. Pin the API version per CH version
  3. Use the same API version for every CH version (which is what this PR does)

Breaking changes: This API spec bump does not introduce breaking changes. I think we can stick with option 3, then revisit option 2 or 1 if an API change does break something. In other words, we ignore breaking changes until one actually happens (YAGNI), and only then worry about supporting different Cloud Hypervisor API versions simultaneously.

New features only supported in some versions: There are new features not supported by the old version (disk resize).

We don't have any API in hypeman that uses this feature, so we don't need to worry about it right now. In the future, we should handle this using Capabilities, as we already do for each VMM. If we did introduce disk resize as a feature, Capabilities would also be wired per version.

type Capabilities struct {
    Snapshot       bool
    MemoryHotplug  bool
    DiskResize     bool   // ← new, true only for v51.1+
    DiskRateLimit  bool
    // ...
}

func capabilities(v vmm.CHVersion) Capabilities {
    caps := baseCH()
    switch v {
    case vmm.V49_0:
        // (default — disk resize stays false)
    case vmm.V51_1:
        caps.DiskResize = true
    }
    return caps
}

Action requested: if it's not complicated and not a lot of LOC, it would be nice to add that capability now, just to establish the pattern of per-version capabilities. Since we are introducing the disk resize API, we want AI to realize it's not supported on v49.0.

Actual changes in how we call the previous version: We are sending one new field to the old version, the image type. With ImageType: Raw, I think the field is either ignored or actually works, and either way should be OK. That is the thing to confirm: as long as we can still launch the VM with that field set, we are good.

Comment thread Makefile
@mkdir -p specs/cloud-hypervisor/api-v0.3.0
@curl -L -o specs/cloud-hypervisor/api-v0.3.0/cloud-hypervisor.yaml \
- https://raw.githubusercontent.com/cloud-hypervisor/cloud-hypervisor/refs/tags/v48.0/vmm/src/api/openapi/cloud-hypervisor.yaml
+ https://raw.githubusercontent.com/cloud-hypervisor/cloud-hypervisor/refs/tags/v51.1/vmm/src/api/openapi/cloud-hypervisor.yaml
Collaborator

Add a comment above this saying that multiple API versions will only be needed once breaking changes are introduced, and that we should use version-specific Capabilities when new features are introduced that aren't supported on previous versions.

Comment thread cmd/api/config/config.go
FirecrackerBinaryPath string `koanf:"firecracker_binary_path"`
Memory HypervisorMemoryConfig `koanf:"memory"`
Default string `koanf:"default"`
CloudHypervisorVersion string `koanf:"cloud_hypervisor_version"`
Collaborator

I think this config should be named "cloud_hypervisor_default_version"

disk := vmm.DiskConfig{
Path: ptr(d.Path),
Path: ptr(d.Path),
ImageType: ptr(vmm.Raw),
Collaborator

This is the one actual change in how we touch v49. I'm assuming it's ignored, but maybe it's actually used. The thing to check is that v49 still launches a VM; if it does, we are good.

Comment on lines +28 to +44
// SetDefaultVersion sets the default Cloud Hypervisor version for new instances.
// Only updates if the provided version is in SupportedVersions.
func SetDefaultVersion(v string) {
defaultVersionMu.Lock()
defer defaultVersionMu.Unlock()
chv := vmm.CHVersion(v)
if vmm.IsVersionSupported(chv) {
defaultVersion = chv
}
}

// GetDefaultVersion returns the current default Cloud Hypervisor version.
func GetDefaultVersion() vmm.CHVersion {
defaultVersionMu.RLock()
defer defaultVersionMu.RUnlock()
return defaultVersion
}
Collaborator

I think we should have the version specified when we create the instance, not here

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


if vmm.IsVersionSupported(chv) {
defaultVersion = chv
}
}

Silent misconfiguration of security-critical version flag

Medium Severity

SetDefaultVersion silently ignores unsupported version strings with no error return, no log warning, and no feedback to the caller. In ProvideInstanceManager, the result is unchecked. A typo like "51.1" instead of "v51.1" in HYPERVISOR__CLOUD_HYPERVISOR_VERSION would silently keep the vulnerable v49.0 default, and the operator would have no way to know the CVE-2026-27211 upgrade didn't take effect.
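
One possible shape for a fix, sketched under the assumption that startup should fail loudly on an unsupported value. The lowercase names are illustrative stand-ins for `SetDefaultVersion`, `GetDefaultVersion`, and `vmm.IsVersionSupported`; this is not the change actually made in the PR.

```go
package main

import (
	"fmt"
	"sync"
)

type CHVersion string

const (
	V49_0 CHVersion = "v49.0"
	V51_1 CHVersion = "v51.1"
)

var supportedVersions = []CHVersion{V49_0, V51_1}

var (
	defaultVersionMu sync.RWMutex
	defaultVersion   = V49_0
)

// isSupported reports whether v is one of the embedded versions.
func isSupported(v CHVersion) bool {
	for _, s := range supportedVersions {
		if s == v {
			return true
		}
	}
	return false
}

// setDefaultVersion updates the default and reports unsupported input
// instead of silently keeping the old value.
func setDefaultVersion(v string) error {
	chv := CHVersion(v)
	if !isSupported(chv) {
		return fmt.Errorf("unsupported cloud-hypervisor version %q (supported: %v)", v, supportedVersions)
	}
	defaultVersionMu.Lock()
	defer defaultVersionMu.Unlock()
	defaultVersion = chv
	return nil
}

// getDefaultVersion returns the current default under a read lock.
func getDefaultVersion() CHVersion {
	defaultVersionMu.RLock()
	defer defaultVersionMu.RUnlock()
	return defaultVersion
}

func main() {
	// A typo such as "51.1" now surfaces as an error the caller can act on.
	if err := setDefaultVersion("51.1"); err != nil {
		fmt.Println("startup would fail:", err)
	}
	fmt.Println("default is still", getDefaultVersion())
}
```

Returning the error lets the provider wiring decide whether to abort startup or log and continue, rather than hiding the decision inside the setter.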


@ulziibay-kernel ulziibay-kernel removed the request for review from rgarcia May 12, 2026 20:03