feat(serve): add SageMaker GenAI inference benchmarking and recommendation by ZealSV · Pull Request #5874 · aws/sagemaker-python-sdk

ZealSV · 2026-05-19T17:38:39Z

Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer over the auto-generated AIBenchmarkJob, AIRecommendationJob, and AIWorkloadConfig resources in sagemaker-core.

ModelBuilder gains two methods:

job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
job = mb.start_inference_recommendation(workload, throughput,
    instance_types=["ml.g6.12xlarge"])

After the job reaches a terminal state, customers retrieve results via constructors that wrap the auto-gen job resource:

result = BenchmarkResult.from_job(job)
rec = Recommendation.from_job(job)
endpoint = rec.deploy(role=...)
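The description says results are read only after the job reaches a terminal state, but it does not show how callers wait for that. A minimal polling sketch follows. The helper name `wait_for_terminal`, the `get_status` callable, and the status strings are all hypothetical and are not part of this PR or of sagemaker-core.

```python
import time
from typing import Callable

# Assumed terminal status names; the real job resources may report different values.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_terminal(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state, then return that state.

    get_status is a hypothetical callable supplied by the caller, for example
    one that refreshes the job resource and returns its current status string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. If the job resource already provides its own wait helper, that would likely be preferable to a hand-rolled loop.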

Public surface added under sagemaker.serve:

* Workload: typed factory (synthetic) that builds the WorkloadSpec inline JSON envelope. Extra AIPerf parameters flow through **params unchecked and are validated server-side.
* BenchmarkResult / BenchmarkMetrics / BenchmarkMetric: parse the AIPerf profile_export_aiperf.json out of the output.tar.gz artifact.
* Recommendation: wrapper around one row of an AIRecommendationJob's recommendations list. .deploy() prefers the ModelPackage path and falls back to a raw image_uri + S3 channels container definition.
* Secret: helper around AWS Secrets Manager for hf_token round-trip.
* BenchmarkJob, RecommendationJob: re-exports of the auto-gen classes without the AI prefix.
* FeatureGatedError, WorkloadValidationError: typed exceptions.
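The BenchmarkResult entry above says metrics are parsed from profile_export_aiperf.json inside the output.tar.gz artifact. The sketch below shows one way to do that extraction with the standard library alone. It is not the PR's implementation: the function name `load_aiperf_profile` is hypothetical, the archive is assumed to be already downloaded into memory, and matching on the file's basename is an assumption because the path inside the archive is not stated here.

```python
import io
import json
import tarfile
from typing import Any, Dict

PROFILE_NAME = "profile_export_aiperf.json"


def load_aiperf_profile(archive_bytes: bytes) -> Dict[str, Any]:
    """Return the parsed AIPerf profile JSON found in a gzip-compressed tar payload."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar:
            # Match on the basename only: the directory layout inside the
            # archive is an unknown here, so any nesting level is accepted.
            if member.isfile() and member.name.rsplit("/", 1)[-1] == PROFILE_NAME:
                extracted = tar.extractfile(member)
                if extracted is None:
                    continue
                with extracted:
                    return json.load(extracted)
    raise FileNotFoundError(f"{PROFILE_NAME} not found in archive")
```

Reading the member through `extractfile` avoids writing archive contents to disk, which also sidesteps path-traversal concerns with untrusted member names. The structure of the returned dictionary depends on AIPerf's export format and is not assumed here.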

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

ZealSV had a problem deploying to manual-approval May 19, 2026 17:38 — with GitHub Actions Error

ZealSV changed the title ~~feat(serve): add SageMaker GenAI inference benchmarking and recommend…~~ feat(serve): add SageMaker GenAI inference benchmarking and recommendation May 19, 2026

ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from c0cfc77 to 747baeb Compare May 20, 2026 18:58

ZealSV had a problem deploying to manual-approval May 20, 2026 18:58 — with GitHub Actions Error

ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from 747baeb to bb8c26a Compare May 20, 2026 20:34

ZealSV requested a deployment to manual-approval May 20, 2026 20:35 — with GitHub Actions Waiting

ZealSV wants to merge 1 commit into aws:master from ZealSV:feature/lumen-ai-inference-recommender
