feat: Add networkPerformanceOptions support to EC2NodeClass#9089

Open
knkarthik01 wants to merge 9 commits into aws:main from
knkarthik01:feat/network-performance-options
Conversation

@knkarthik01

What

Adds spec.networkPerformanceOptions.bandwidthWeighting to EC2NodeClass, enabling bandwidth weighting configuration (vpc-1, ebs-1) for 8th-gen EC2 instance types.

Why

8th-gen instances (M8, C8, R8, X8 families) support configurable bandwidth weighting that shifts baseline bandwidth between networking and EBS. For workloads using local NVMe (no EBS), vpc-1 provides ~25% more network bandwidth at no cost.

Currently there is no way to configure this in Karpenter:

  • EC2NodeClass does not expose networkPerformanceOptions
  • Custom launch templates were removed in v0.33+
  • modify-instance-network-performance-options API requires stopped instances, so userData won't work

Resolves #9088

How

Wires NetworkPerformanceOptions from EC2NodeClass spec through to the generated launch template. Same pattern as MetadataOptions and DetailedMonitoring.
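
A minimal sketch of that wiring, using local stand-in types instead of the real Karpenter and AWS SDK types so that it compiles on its own. The type and function names here (`LaunchTemplateData`, `networkPerformanceOptions`, `buildLaunchTemplateData`) are illustrative assumptions, not the identifiers used in the PR. The only point is that an unset field leaves the launch template request untouched.

```go
package main

// NetworkPerformanceOptions is a stand-in for the EC2NodeClass spec type
// (hypothetical shape; the real type lives in pkg/apis/v1).
type NetworkPerformanceOptions struct {
	BandwidthWeighting *string
}

// LaunchTemplateNetworkPerformanceOptions is a stand-in for the SDK request type.
type LaunchTemplateNetworkPerformanceOptions struct {
	BandwidthWeighting string
}

// LaunchTemplateData is a stand-in for the launch template request data.
type LaunchTemplateData struct {
	NetworkPerformanceOptions *LaunchTemplateNetworkPerformanceOptions
}

// networkPerformanceOptions converts the spec field into request data.
// It returns nil when nothing is configured, so the field is omitted
// from the request and EC2 applies its own default.
func networkPerformanceOptions(spec *NetworkPerformanceOptions) *LaunchTemplateNetworkPerformanceOptions {
	if spec == nil || spec.BandwidthWeighting == nil {
		return nil
	}
	return &LaunchTemplateNetworkPerformanceOptions{BandwidthWeighting: *spec.BandwidthWeighting}
}

// buildLaunchTemplateData shows where the helper would be called while
// assembling the rest of the launch template request.
func buildLaunchTemplateData(spec *NetworkPerformanceOptions) LaunchTemplateData {
	return LaunchTemplateData{NetworkPerformanceOptions: networkPerformanceOptions(spec)}
}
```

Calling `buildLaunchTemplateData(nil)` yields request data with no network performance options, which is what keeps existing EC2NodeClasses unaffected.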

Changes

  • pkg/apis/v1/ec2nodeclass.go — field + type definition
  • pkg/providers/amifamily/resolver.go — struct field + wiring
  • pkg/providers/launchtemplate/types.go — LT data + helper function
  • pkg/providers/launchtemplate/suite_test.go — 3 tests + hash uniqueness

Testing

  • Validated vpc-1 works at EC2 launch time on R8g instances ✓
  • AWS SDK v1.215.0+ has LaunchTemplateNetworkPerformanceOptionsRequest type ✓
  • Built and tested patched controller binary on v1.5.2 ✓

Usage

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: high-network
spec:
  networkPerformanceOptions:
    bandwidthWeighting: vpc-1

Note

CRD regeneration (make generate) may be needed — happy to add that as a follow-up commit once CI confirms.

Add spec.networkPerformanceOptions.bandwidthWeighting to EC2NodeClass,
enabling bandwidth weighting configuration (vpc-1, ebs-1) for 8th-gen
EC2 instance types (M8, C8, R8, X8 families). When set, Karpenter
includes NetworkPerformanceOptions in the generated launch template.

This allows workloads using local NVMe (no EBS) to shift baseline
bandwidth from EBS to networking at no additional cost. Currently
there is no way to configure this in Karpenter since custom launch
templates were removed in v0.33+ and the modify API requires stopped
instances.

Resolves aws#9088
@github-actions
Contributor

github-actions Bot commented Apr 18, 2026

Preview deployment ready!

Preview URL: https://pr-9089.d18coufmbnnaag.amplifyapp.com

Built from commit 13f1ea7e03f56f13d81c29b892840dc3255ce58b

Copilot AI left a comment

Pull request overview

Adds support for configuring EC2 8th-gen instance bandwidth weighting by exposing spec.networkPerformanceOptions.bandwidthWeighting on EC2NodeClass and wiring it through to the generated EC2 Launch Template request.

Changes:

  • Add NetworkPerformanceOptions to EC2NodeClassSpec (with enum validation for bandwidthWeighting).
  • Thread NetworkPerformanceOptions through AMI family resolver into launch template generation.
  • Add launch template provider tests to validate default behavior and explicit vpc-1 / ebs-1 settings, plus LT name hash uniqueness.
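
The hash uniqueness check matters because launch templates are named from a hash of their options: if the new field were left out of the hashed input, a vpc-1 NodeClass and an ebs-1 NodeClass could resolve to the same template. The sketch below illustrates the idea with SHA-256 over a JSON encoding. This is an assumption made for illustration; Karpenter's actual hashing scheme and option struct are different, and `launchOptions` and `launchTemplateName` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// launchOptions is a hypothetical, trimmed-down set of hashed options.
type launchOptions struct {
	ClusterName        string `json:"clusterName"`
	BandwidthWeighting string `json:"bandwidthWeighting,omitempty"`
}

// launchTemplateName derives a deterministic name from the options.
// Any field that changes the launch template must be part of the input.
func launchTemplateName(o launchOptions) string {
	b, err := json.Marshal(o)
	if err != nil {
		// Not expected for a struct containing only strings.
		panic(err)
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:8])
}
```

With the field included, the unset, vpc-1, and ebs-1 cases each produce a distinct name, while repeated calls with the same options return the same name.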

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/apis/v1/ec2nodeclass.go | Adds API field + type for networkPerformanceOptions with validation markers. |
| pkg/providers/amifamily/resolver.go | Includes NetworkPerformanceOptions in resolved launch template options. |
| pkg/providers/launchtemplate/types.go | Sets NetworkPerformanceOptions on RequestLaunchTemplateData when configured. |
| pkg/providers/launchtemplate/suite_test.go | Adds coverage for default + configured bandwidth weighting and LT name uniqueness. |

MetadataOptions *MetadataOptions `json:"metadataOptions,omitempty"`
// NetworkPerformanceOptions configure the network performance options for instances
// launched with this EC2NodeClass. Allows configuring bandwidth weighting between
// networking and EBS for supported 8th-gen instance types (M8, C8, R8, X8 families).
Author

Resolved — CRDs regenerated and deepcopy methods added in subsequent commits.

Comment thread pkg/apis/v1/ec2nodeclass.go Outdated
knkarthik01 and others added 3 commits April 17, 2026 17:41
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Run controller-gen to update EC2NodeClass CRD with the new
networkPerformanceOptions field and bandwidthWeighting enum.
Add DeepCopyInto/DeepCopy for NetworkPerformanceOptions type.
Regenerate EC2NodeClass CRDs with controller-gen.
@knkarthik01
Author

CI is failing on make verify because the generated CRD and deepcopy files don't match exactly — I generated them with controller-gen locally but the CI toolchain produces slightly different output.

The 2 remaining diffs are:

  • pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml (CRD schema)
  • pkg/apis/v1/zz_generated.deepcopy.go (deepcopy methods)

The actual code changes (4 files) are correct and the tests pass when the CRD is properly registered. Could a maintainer run make verify locally and commit the generated files, or point me to the exact toolchain versions used in CI so I can match the output?

Alternatively, I can revert my generated files and just include the code changes — the CI DryRunGenPR workflow seems to handle generation automatically.

@knkarthik01
Author

CI update: make verify now passes ✅ — regenerated CRDs and deepcopy with CI toolchain.

7/8 ci-test jobs pass. The single failure (1.30.x) is an unrelated flaky test:

[FAIL] CloudProvider MinValues [It] CreateFleet input should respect minValues for In operator requirement from NodePool

This is in pkg/cloudprovider, not in our changes. Our Network Performance Options tests all pass across all K8s versions.

Ready for review @DerekFrank.

Contributor

@DerekFrank DerekFrank left a comment

Left a comment on the parent issue; this warrants a small design prior to PR review:

#9088 (comment)

You can submit the design alongside the PR if you'd like, or stage them. Up to you.

Addresses design questions from review:
- Mixed fleet support (no instance type filtering)
- Conditional launch template generation per instance type
- Instance type discovery via DescribeInstanceTypes
- NodeClass-level configuration (not pod-level)
- Node labeling for observability and optional pod affinity
- Drift detection
@knkarthik01
Author

@DerekFrank Added a design doc: designs/bandwidth-weighting-support.md

Key design decisions:

  • No instance type filtering — mixed fleets (e.g., R8gd + R6gd) stay in the same NodePool. Karpenter conditionally includes NetworkPerformanceOptions in the launch template only for instance types that support it.
  • Discovery — prefer DescribeInstanceTypes (NetworkInfo) to dynamically detect support; static family list as fallback.
  • NodeClass-level, not pod-level — bandwidth weighting is an infrastructure decision. Pods express bandwidth needs via existing instance-network-bandwidth labels.
  • Label karpenter.k8s.aws/instance-bandwidth-weighting: vpc-1 for observability and optional pod affinity.
  • Drift — detected when node label doesn't match EC2NodeClass spec.
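
The conditional inclusion and discovery decisions above could look roughly like the following sketch. It assumes a `discovered` map populated from a DescribeInstanceTypes call elsewhere, and falls back to a static prefix list built from the families named in this PR (M8, C8, R8, X8). The function names and the prefix-matching rule are hypothetical, not taken from the design doc.

```go
package main

import "strings"

// fallbackFamilies is a static list used only when discovery has no
// answer for an instance type (assumed from the families named in the PR).
var fallbackFamilies = []string{"m8", "c8", "r8", "x8"}

// supportsBandwidthWeighting prefers discovered data and falls back to
// matching the instance family prefix, e.g. "r8gd.4xlarge" -> "r8gd".
func supportsBandwidthWeighting(instanceType string, discovered map[string]bool) bool {
	if supported, ok := discovered[instanceType]; ok {
		return supported
	}
	family, _, _ := strings.Cut(strings.ToLower(instanceType), ".")
	for _, prefix := range fallbackFamilies {
		if strings.HasPrefix(family, prefix) {
			return true
		}
	}
	return false
}

// weightingFor returns the value to place in the launch template for one
// instance type, or "" when the option should be left out entirely.
func weightingFor(instanceType, configured string, discovered map[string]bool) string {
	if configured == "" || !supportsBandwidthWeighting(instanceType, discovered) {
		return ""
	}
	return configured
}
```

For a mixed fleet, `weightingFor("r8gd.4xlarge", "vpc-1", nil)` keeps the setting while `weightingFor("r6gd.4xlarge", "vpc-1", nil)` drops it, so both instance types can stay in the same NodePool.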

Happy to discuss at the next Working Group meeting.

@AndrewCharlesHay
Contributor

Feature itself is tightly scoped — enum: [default, vpc-1, ebs-1] in the CRD is the right shape, and the design doc is thorough. A few review notes with a security/operational lens:

  1. Drift detection on an "at launch only" field. Per the AWS docs, NetworkPerformanceOptions.BandwidthWeighting can only be set at launch or while the instance is Stopped — not modifiable on a running instance. If an operator changes bandwidthWeighting from default to vpc-1 on an existing EC2NodeClass, the design doc says drift triggers replacement. Worth confirming the NodeClass controller handles this as "replacement on drift" rather than attempting an in-place modify that will fail with InvalidState — and that the drift signal is separate from other drift sources so operators can see why their nodes were replaced.

  2. Silent-ignore on unsupported instance types. The design relies on EC2 silently ignoring NetworkPerformanceOptions for R6gd etc. This is AWS API behavior, but if EC2 ever changes this to a hard error (e.g. via a new validator), a mixed fleet NodePool would start failing launches for legitimate R6gd capacity. Consider either (a) a requirement filter so bandwidthWeighting != default requires 8th-gen instance families, or (b) an explicit release note calling out the reliance on silent-ignore.

  3. CRD-level validation only. enum: [default, vpc-1, ebs-1] is enforced by the apiserver — great — but is there any runtime guard if the EC2 API adds a new value (e.g. ebs-2) and an operator tries to use it via a CRD with an updated schema? The NodeClass reconciler should gracefully refuse unknown values rather than passing them through to RunInstances.

  4. From a tenancy angle: bandwidthWeighting: vpc-1 shifts bandwidth allocation at the EC2 instance level — no cross-tenant impact inside a cluster, and no additional IAM permissions required. Clean surface.

Nothing blocking — mostly want to make sure drift behavior is spelled out in the design doc before merge.
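
As one possible shape for the runtime guard raised in point 3, the sketch below rejects any value outside the set named in this thread before it could reach an EC2 call. The function name `validateBandwidthWeighting` and the error wording are assumptions for illustration; the real check would live wherever the NodeClass is validated.

```go
package main

import "fmt"

// knownWeightings mirrors the CRD enum discussed in this thread.
var knownWeightings = map[string]struct{}{
	"default": {},
	"vpc-1":   {},
	"ebs-1":   {},
}

// validateBandwidthWeighting accepts an unset value or a known enum value
// and returns an error for anything else, even if a newer CRD schema let
// the value through the apiserver.
func validateBandwidthWeighting(v string) error {
	if v == "" {
		return nil
	}
	if _, ok := knownWeightings[v]; !ok {
		return fmt.Errorf("unsupported bandwidthWeighting %q", v)
	}
	return nil
}
```

A value such as `ebs-2` would then surface as a validation error on the NodeClass rather than as a failed launch.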

@knkarthik01
Author

knkarthik01 commented Apr 26, 2026

Pushed an update addressing the review points. The main changes are explicit replacement-only drift with a distinct BandwidthWeightingDrift reason, a new Validation section covering the runtime guard for unknown enum values, a Discovery Misclassifications table plus a release note for the silent-ignore reliance, and clarified default-vs-unset semantics so existing NodeClasses don't churn on upgrade. Full doc: bandwidth-weighting-support.md.

Heads up: I'm traveling next week, so the earliest WG I can join is the week of 5/4. Happy to keep this moving offline in the thread if that works for you all; otherwise no rush, we can address this during the first week of May.
@AndrewCharlesHay @DerekFrank PTAL when you have a chance.
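
A rough sketch of the drift comparison described in the update above, under two assumptions that are not confirmed by the design doc: that an unset value and `default` are treated as equivalent, and that drift is decided by comparing the spec value with the node label. It deliberately ignores instance types that do not support bandwidth weighting, which a real implementation would need to exclude. `normalizeWeighting` and `bandwidthWeightingDrift` are hypothetical names.

```go
package main

// BandwidthWeightingDrift is the distinct drift reason named in the update.
const BandwidthWeightingDrift = "BandwidthWeightingDrift"

// normalizeWeighting maps an unset value to "default" so that existing
// NodeClasses without the field are not reported as drifted on upgrade.
func normalizeWeighting(v string) string {
	if v == "" {
		return "default"
	}
	return v
}

// bandwidthWeightingDrift returns the drift reason when the node label and
// the EC2NodeClass spec disagree, or "" when they match. The caller is
// expected to replace the node; no in-place modification is attempted.
func bandwidthWeightingDrift(specValue, nodeLabel string) string {
	if normalizeWeighting(specValue) == normalizeWeighting(nodeLabel) {
		return ""
	}
	return BandwidthWeightingDrift
}
```

Under these assumptions, changing the spec from unset to `default` is not drift, while changing it to `vpc-1` on a node labeled `default` returns the `BandwidthWeightingDrift` reason.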

Development

Successfully merging this pull request may close these issues.

Add support for EC2 bandwidth weighting configuration (NetworkPerformanceOptions.BandwidthWeighting) in the EC2NodeClass spec

4 participants