This documentation is a draft for private preview for regions in the AWS European Sovereign Cloud. Documentation content will continue to evolve. Published: December 31, 2025.
Real-time inference
Real-time inference is ideal for workloads with interactive, low-latency requirements. You can deploy your model to SageMaker AI hosting services
and get an endpoint that can be used for inference. These endpoints are fully managed and
support autoscaling (see Automatic scaling of Amazon SageMaker AI models).
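Once a model is deployed, a hosted endpoint is invoked through the SageMaker runtime API. The sketch below shows one way to call such an endpoint with boto3 (the AWS SDK for Python); the endpoint name, region, and JSON payload shape are assumptions for illustration — substitute the values from your own deployment and the content type your model container expects.

```python
import json


def build_payload(features):
    """Serialize one feature vector into a JSON request body.

    The {"instances": [...]} shape is an assumed convention; use
    whatever format your model container was built to parse.
    """
    return json.dumps({"instances": [features]})


def predict(endpoint_name, features, region="eu-central-1"):
    """Invoke a real-time SageMaker AI endpoint and return the decoded result."""
    # boto3 is imported here so build_payload stays usable without
    # AWS credentials or the SDK installed.
    import boto3

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,          # name chosen at deployment time
        ContentType="application/json",      # must match the container's input format
        Body=build_payload(features),
    )
    # The response Body is a streaming object; read and decode it.
    return json.loads(response["Body"].read())
```

A call such as `predict("my-realtime-endpoint", [1.0, 2.0])` sends a synchronous request and blocks until the model responds, which is the interactive, low-latency pattern this hosting option is designed for.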