feat(scheduling): migrate CR and failover calls to options-based pipeline selection#950
feat(scheduling): migrate CR and failover calls to options-based pipeline selection#950mblos wants to merge 12 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a ChangesPlacement Context Skip, Capacity Options Merge, and HANA Pipeline Inference
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
as of now, tests are redundant. no strong opinion on either joint testing across filters like done in |
|
changed semantic of failover calls.. now selecting gp/hana pipelines. Please check if we want to do it like this or always go with gp pipeline |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/scheduling/reservations/failover/integration_test.go (2)
1371-1399:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSame duplicate pipeline issue as in
newIntegrationTestEnv.Lines 1371-1383 and 1384-1399 both define pipelines named
PipelineNewFailoverReservation. Remove the first duplicate definition here as well.🔧 Remove the duplicate pipeline definition
pipelines := []v1alpha1.Pipeline{ { ObjectMeta: metav1.ObjectMeta{ Name: "test-filter-pipeline", }, Spec: v1alpha1.PipelineSpec{ Type: v1alpha1.PipelineTypeFilterWeigher, Filters: []v1alpha1.FilterSpec{ {Name: "filter_has_enough_capacity"}, {Name: "filter_has_requested_traits"}, {Name: "filter_correct_az"}, }, Weighers: []v1alpha1.WeigherSpec{ {Name: "kvm_failover_evacuation"}, }, }, }, - { - ObjectMeta: metav1.ObjectMeta{ - Name: PipelineNewFailoverReservation, - }, - Spec: v1alpha1.PipelineSpec{ - Type: v1alpha1.PipelineTypeFilterWeigher, - Filters: []v1alpha1.FilterSpec{ - {Name: "filter_has_requested_traits"}, - {Name: "filter_correct_az"}, - }, - Weighers: []v1alpha1.WeigherSpec{}, - }, - }, { ObjectMeta: metav1.ObjectMeta{ Name: PipelineNewFailoverReservation, },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/scheduling/reservations/failover/integration_test.go` around lines 1371 - 1399, There are two pipeline definitions with the same name PipelineNewFailoverReservation in the test data. Remove the first pipeline definition (the one with only filter_has_requested_traits and filter_correct_az filters and no weighers) and keep the second one which contains the complete pipeline specification with filter_has_enough_capacity, filter_has_requested_traits, filter_correct_az, and the kvm_failover_evacuation weigher.
1170-1198:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDuplicate pipeline definitions — first one is dead code.
After the rename at line 1172, both pipelines at lines 1170-1182 and 1183-1198 use the same name
PipelineNewFailoverReservation. The loop at lines 1213-1225 will overwrite the first definition with the second, making the first pipeline block unreachable dead code. The two have different filter/weigher configurations.🔧 Remove the duplicate pipeline definition
pipelines := []v1alpha1.Pipeline{ { ObjectMeta: metav1.ObjectMeta{ Name: "test-filter-pipeline", }, Spec: v1alpha1.PipelineSpec{ Type: v1alpha1.PipelineTypeFilterWeigher, Filters: []v1alpha1.FilterSpec{ {Name: "filter_has_enough_capacity"}, {Name: "filter_correct_az"}, }, Weighers: []v1alpha1.WeigherSpec{ {Name: "kvm_failover_evacuation"}, }, }, }, - { - ObjectMeta: metav1.ObjectMeta{ - Name: PipelineNewFailoverReservation, - }, - Spec: v1alpha1.PipelineSpec{ - Type: v1alpha1.PipelineTypeFilterWeigher, - Filters: []v1alpha1.FilterSpec{ - {Name: "filter_has_requested_traits"}, - {Name: "filter_correct_az"}, - }, - Weighers: []v1alpha1.WeigherSpec{}, - }, - }, { ObjectMeta: metav1.ObjectMeta{ Name: PipelineNewFailoverReservation, },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/scheduling/reservations/failover/integration_test.go` around lines 1170 - 1198, There are two pipeline definitions with the identical name PipelineNewFailoverReservation in the test fixtures, causing the first definition to be dead code that gets overwritten by the second when processed. Remove the first pipeline definition block that contains only the filters filter_has_requested_traits and filter_correct_az with an empty weighers list, keeping the second pipeline definition that includes the additional filter_has_enough_capacity filter and the kvm_failover_evacuation weigher, as it is the intended configuration that will actually be used.
🧹 Nitpick comments (1)
internal/scheduling/reservations/capacity/config.go (1)
17-19: ⚡ Quick winUpdate stale pipeline example in
TotalPipelinecomment.The example still mentions
kvm-report-capacity, but defaults now usekvm-general-purpose-load-balancing. Updating this avoids operator confusion.Suggested patch
- // TotalPipeline is the scheduler pipeline used for the empty-state probe. - // This pipeline should ignore current VM allocations (e.g. kvm-report-capacity). + // TotalPipeline is the scheduler pipeline used for the empty-state probe. + // This pipeline should ignore current VM allocations (e.g. kvm-general-purpose-load-balancing with AssumeEmptyHosts). TotalPipeline string `json:"capacityTotalPipeline"`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/scheduling/reservations/capacity/config.go` around lines 17 - 19, The comment for the TotalPipeline field references an outdated pipeline example `kvm-report-capacity` which no longer reflects the current defaults. Update the comment text to replace the outdated example with the current default pipeline name `kvm-general-purpose-load-balancing` to keep the documentation accurate and avoid operator confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/scheduling/reservations/failover/reservation_scheduling.go`:
- Around line 25-32: The inferFailoverPipeline function performs a
case-sensitive string comparison on the extra specs value, but the upstream
GetFlavorType function uses strings.ToLower for case-insensitive comparison,
creating an inconsistency. Modify the comparison in inferFailoverPipeline to
convert the extraSpecs value to lowercase using strings.ToLower before comparing
it to "required", ensuring that values like "Required" or "REQUIRED" are handled
correctly and consistently with the upstream implementation.
---
Outside diff comments:
In `@internal/scheduling/reservations/failover/integration_test.go`:
- Around line 1371-1399: There are two pipeline definitions with the same name
PipelineNewFailoverReservation in the test data. Remove the first pipeline
definition (the one with only filter_has_requested_traits and filter_correct_az
filters and no weighers) and keep the second one which contains the complete
pipeline specification with filter_has_enough_capacity,
filter_has_requested_traits, filter_correct_az, and the kvm_failover_evacuation
weigher.
- Around line 1170-1198: There are two pipeline definitions with the identical
name PipelineNewFailoverReservation in the test fixtures, causing the first
definition to be dead code that gets overwritten by the second when processed.
Remove the first pipeline definition block that contains only the filters
filter_has_requested_traits and filter_correct_az with an empty weighers list,
keeping the second pipeline definition that includes the additional
filter_has_enough_capacity filter and the kvm_failover_evacuation weigher, as it
is the intended configuration that will actually be used.
---
Nitpick comments:
In `@internal/scheduling/reservations/capacity/config.go`:
- Around line 17-19: The comment for the TotalPipeline field references an
outdated pipeline example `kvm-report-capacity` which no longer reflects the
current defaults. Update the comment text to replace the outdated example with
the current default pipeline name `kvm-general-purpose-load-balancing` to keep
the documentation accurate and avoid operator confusion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a58fd721-2f3d-43a1-b4d4-38c0ed4cee3b
📒 Files selected for processing (25)
api/scheduling/options.gohelm/bundles/cortex-nova/values.yamlinternal/scheduling/nova/plugins/filters/filter_aggregate_metadata.gointernal/scheduling/nova/plugins/filters/filter_aggregate_metadata_test.gointernal/scheduling/nova/plugins/filters/filter_allowed_projects.gointernal/scheduling/nova/plugins/filters/filter_allowed_projects_test.gointernal/scheduling/nova/plugins/filters/filter_external_customer.gointernal/scheduling/nova/plugins/filters/filter_external_customer_test.gointernal/scheduling/nova/plugins/filters/filter_has_enough_capacity.gointernal/scheduling/nova/plugins/filters/filter_has_enough_capacity_test.gointernal/scheduling/nova/plugins/filters/filter_instance_group_affinity.gointernal/scheduling/nova/plugins/filters/filter_instance_group_affinity_test.gointernal/scheduling/nova/plugins/filters/filter_instance_group_anti_affinity.gointernal/scheduling/nova/plugins/filters/filter_instance_group_anti_affinity_test.gointernal/scheduling/nova/plugins/filters/filter_live_migratable.gointernal/scheduling/nova/plugins/filters/filter_live_migratable_test.gointernal/scheduling/nova/plugins/filters/filter_quota_enforcement.gointernal/scheduling/nova/plugins/filters/filter_quota_enforcement_test.gointernal/scheduling/nova/plugins/filters/filter_requested_destination.gointernal/scheduling/nova/plugins/filters/filter_requested_destination_test.gointernal/scheduling/nova/plugins/filters/filter_skip_placement_context_test.gointernal/scheduling/reservations/capacity/config.gointernal/scheduling/reservations/capacity/controller.gointernal/scheduling/reservations/failover/integration_test.gointernal/scheduling/reservations/failover/reservation_scheduling.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/scheduling/reservations/commitments/integration_test.go`:
- Around line 1109-1128: The test in the captureScheduler handler ignores decode
errors on line 1111 and has no assertion that verifies the scheduler was
actually called. This allows the test to pass with a zero-value capturedReq even
if the scheduler was never invoked. Add a boolean flag (such as schedulerCalled)
that gets set to true when the captureScheduler handler executes, then assert
this flag is true before checking
capturedReq.Options.SkipPlacementContextFilters. This ensures the test can only
pass if the scheduler was actually called and the request was properly captured.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c023c681-2b76-4618-bcff-96c4b36da0b5
📒 Files selected for processing (5)
api/scheduling/options.gointernal/scheduling/reservations/capacity/controller_test.gointernal/scheduling/reservations/commitments/integration_test.gointernal/scheduling/reservations/commitments/reservation_controller.gointernal/scheduling/reservations/failover/reservation_scheduling.go
🚧 Files skipped from review as they are similar to previous changes (2)
- api/scheduling/options.go
- internal/scheduling/reservations/failover/reservation_scheduling.go
| }, scheduling.Options{SkipHistory: true, SkipInflight: true, SkipCommittedResourceTracking: true}) | ||
| }, scheduling.Options{ | ||
| ReadOnly: true, | ||
| AssumeEmptyHosts: ignoreAllocations, |
There was a problem hiding this comment.
If AssumeEmptyHosts and ignoreAllocations are synonymous, we should probably stick with one and get rid of the other
There was a problem hiding this comment.
Yes, they're synonymous — IgnoreAllocations in YAML and AssumeEmptyHosts in Options both do the same thing, merged with OR in the filter. IgnoreAllocations can be removed once kvm-report-capacity is deleted
We keep all the dedicated pipelines for now for migration purposes..
| logger := LoggerFromContext(ctx) | ||
|
|
||
| validHypervisors, err := c.queryHypervisorsFromScheduler(ctx, vm, allHypervisors, PipelineReuseFailoverReservation, resSpec, scheduling.Options{ReadOnly: true, SkipHistory: true, SkipInflight: true, SkipCommittedResourceTracking: true}) | ||
| validHypervisors, err := c.queryHypervisorsFromScheduler(ctx, vm, allHypervisors, inferFailoverPipeline(vm.FlavorExtraSpecs), resSpec, scheduling.Options{ReadOnly: true, SkipPlacementContextFilters: true, SkipHistory: true, SkipInflight: true, SkipCommittedResourceTracking: true}) |
There was a problem hiding this comment.
Not sure if this is correct.
PipelineReuseFailoverReservation skips capacity related checks like: has_enough_capacity or quota but still needs to do e.g., anti affinity (not sure if affinity is reasonable during failover). Looking at the yaml the anti-affinity is missing currently. I think that was not yet active and is a good point why we do not want duplicated pipelines.
| "pipeline", scheduleReq.Pipeline) | ||
|
|
||
| resp, err := c.SchedulerClient.ScheduleReservation(ctx, scheduleReq, scheduling.Options{ReadOnly: true, LockReservations: true, SkipHistory: true, SkipInflight: true, SkipCommittedResourceTracking: true}) | ||
| resp, err := c.SchedulerClient.ScheduleReservation(ctx, scheduleReq, scheduling.Options{ReadOnly: true, LockReservations: true, SkipPlacementContextFilters: true, SkipHistory: true, SkipInflight: true, SkipCommittedResourceTracking: true}) |
There was a problem hiding this comment.
Again not sure about SkipPlacementContextFilters: true.
Test Coverage ReportTest Coverage 📊: 69.9% |
Eliminating dedicated CR and failover pipelines by encoding their behavioral differences as call-time scheduling options, so the standard GP and HANA pipelines are sufficient for all internal scheduling calls.