feat(serve): add SageMaker GenAI inference benchmarking and recommendation #5874

Open: ZealSV wants to merge 1 commit into aws:master from ZealSV:feature/lumen-ai-inference-recommender

ZealSV (Contributor) commented May 19, 2026

Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer over the auto-generated AIBenchmarkJob, AIRecommendationJob, and AIWorkloadConfig resources in sagemaker-core.

ModelBuilder gains two methods:

job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
job = mb.start_inference_recommendation(workload, throughput,
                                        instance_types=["ml.g6.12xlarge"])
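
As a rough illustration of what a typed factory such as `Workload.synthetic(...)` might do, the sketch below builds an inline JSON envelope and passes extra keyword arguments through unchecked. The class name `SketchWorkload`, the argument names `input_tokens_mean` and `output_tokens_mean`, and the `{"type": ..., "parameters": ...}` layout are all assumptions made for illustration; the actual WorkloadSpec schema is not shown in this PR description.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class SketchWorkload:
    """Hypothetical stand-in for Workload; not the class added by this PR."""

    kind: str
    params: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def synthetic(
        cls, input_tokens_mean: int, output_tokens_mean: int, **params: Any
    ) -> "SketchWorkload":
        # The two typed arguments are checked locally. Anything in **params is
        # passed through untouched, on the assumption that the service validates it.
        if input_tokens_mean <= 0 or output_tokens_mean <= 0:
            raise ValueError("token means must be positive")
        merged = {
            "input_tokens_mean": input_tokens_mean,
            "output_tokens_mean": output_tokens_mean,
            **params,
        }
        return cls(kind="synthetic", params=merged)

    def to_inline_json(self) -> str:
        # Assumed envelope shape: {"type": ..., "parameters": {...}}.
        return json.dumps({"type": self.kind, "parameters": self.params}, sort_keys=True)
```

The design point this tries to capture is the split between a small set of locally typed arguments and an open pass-through dictionary, so that new benchmark parameters do not require an SDK release.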

After the job reaches a terminal state, customers retrieve results via constructors that wrap the auto-generated job resource:

result = BenchmarkResult.from_job(job)
rec = Recommendation.from_job(job)
endpoint = rec.deploy(role=...)
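
The description does not say how callers wait for a terminal state. One generic option is a polling loop like the sketch below. The `wait_for_terminal` helper, the `refresh` callable, and the status strings are hypothetical; the auto-generated job resources may already ship their own wait helper, in which case that should be preferred.

```python
import time
from typing import Callable, Iterable

# Assumed terminal status names; the real service values may differ.
TERMINAL_STATES = frozenset({"Completed", "Failed", "Stopped"})


def wait_for_terminal(
    refresh: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    terminal_states: Iterable[str] = TERMINAL_STATES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call refresh() until it returns a terminal status or the timeout expires."""
    terminal = frozenset(terminal_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = refresh()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With a real job object, `refresh` could be a small function that re-fetches the job and returns its status field. Which attribute holds the status is not stated in this PR, so that wiring is left to the caller.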

Public surface added under sagemaker.serve:

  • Workload: typed factory (synthetic) that builds the WorkloadSpec inline JSON envelope. Extra AIPerf parameters flow through **params unchecked and are validated server-side.
  • BenchmarkResult / BenchmarkMetrics / BenchmarkMetric: parse the AIPerf profile_export_aiperf.json out of the output.tar.gz artifact.
  • Recommendation: wrapper around one row of an AIRecommendationJob's recommendations list. .deploy() prefers the ModelPackage path and falls back to a container definition built from a raw image_uri plus S3 channels.
  • Secret: helper around AWS Secrets Manager for hf_token round-trip.
  • BenchmarkJob, RecommendationJob: re-exports of the auto-generated classes without the AI prefix.
  • FeatureGatedError, WorkloadValidationError: typed exceptions.
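
The BenchmarkResult item describes pulling profile_export_aiperf.json out of output.tar.gz. A minimal standard-library sketch of that extraction step follows. It assumes the archive has already been downloaded into memory and that the file may sit under any directory inside the archive. The function name `load_aiperf_profile` is illustrative rather than the PR's implementation, and nothing is assumed about the JSON schema.

```python
import io
import json
import posixpath
import tarfile
from typing import Any, Dict

PROFILE_NAME = "profile_export_aiperf.json"  # file name taken from the PR description


def load_aiperf_profile(archive_bytes: bytes) -> Dict[str, Any]:
    """Return the parsed profile JSON from a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Match on the base name so nested paths are accepted (an assumption).
            if member.isfile() and posixpath.basename(member.name) == PROFILE_NAME:
                extracted = tar.extractfile(member)
                if extracted is None:
                    continue
                with extracted:
                    return json.load(extracted)
    raise FileNotFoundError(f"{PROFILE_NAME} not found in archive")
```

Reading the member directly with `extractfile` avoids writing archive contents to disk, which sidesteps path-traversal concerns that come with extracting untrusted tarballs.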


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

ZealSV changed the title from "feat(serve): add SageMaker GenAI inference benchmarking and recommend…" to "feat(serve): add SageMaker GenAI inference benchmarking and recommendation" on May 19, 2026
ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from c0cfc77 to 747baeb on May 20, 2026 18:58
ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from 747baeb to bb8c26a on May 20, 2026 20:34
ZealSV requested a deployment to manual-approval with GitHub Actions on May 20, 2026 20:35 (waiting)