Deploy a custom model for on-demand inference
After you create a custom model with a model customization job or import a SageMaker AI-trained custom Amazon Nova model, you can
set up on-demand inference for the model. With on-demand inference, you only pay for what you use and you don't need to set up
provisioned compute resources.
To set up on-demand inference for a custom model, you deploy it with a custom model deployment. After
you deploy your custom model, you use the deployment's Amazon Resource Name (ARN) as the modelId
parameter when you submit prompts and generate responses with model inference.
For information about on-demand inference pricing, see Amazon Bedrock pricing.
You can deploy a custom model for on-demand inference in the following Regions (for more information about Regions supported in Amazon Bedrock, see Amazon Bedrock endpoints and quotas):
- US East (N. Virginia)
- US West (Oregon)
Prerequisites for deploying a custom model for on-demand inference
Before you can deploy a custom model for on-demand inference, make sure you meet the following requirements:
- You must use the US East (N. Virginia) or US West (Oregon) Region.
- You must have customized the model on or after July 16, 2025. For supported models, see Supported base models.
- Your account must have permission to access the model that you are deploying. For more information about model customization access and security, see Model customization access and security.
- If the model is encrypted with an AWS KMS key, you must have permission to use that key. For more information, see Encryption of custom models.
Supported base models
You can set up on-demand inference for the following base models:
Deploy a custom model
You can deploy a custom model with the Amazon Bedrock console, AWS Command Line Interface, or AWS SDKs. For information about using the deployment for inference, see Use a deployment for on-demand inference.
- Console
-
You deploy a custom model from the Custom models page as follows. You can also deploy a model from the Custom model on-demand page, which uses the same fields. To find that page, in the navigation pane under Infer, choose Custom model on-demand.
To deploy a custom model
- Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.
- From the left navigation pane, choose Custom models under Tune.
- On the Models tab, choose the radio button for the model that you want to deploy.
- Choose Set up inference, and then choose Deploy for on-demand.
- In Deployment details, provide the following information:
  - Deployment name (required) – Enter a unique name for your deployment.
  - Description (optional) – Enter a description for your deployment.
  - Tags (optional) – Add tags for cost allocation and resource management.
- Choose Create.
When the deployment's status is Active, your custom model is ready for on-demand inference. For
more information about using the custom model, see Use a deployment for on-demand inference.
- CLI
-
To deploy a custom model for on-demand inference using the AWS Command Line Interface, use the
create-custom-model-deployment command with your custom model's Amazon Resource Name (ARN).
This command uses the CreateCustomModelDeployment API operation. The response includes the deployment's ARN. When the deployment is active, you use this ARN as the modelId when
making inference requests. For information about using the deployment for inference, see Use a deployment for on-demand inference.
aws bedrock create-custom-model-deployment \
    --model-deployment-name "Unique name" \
    --model-arn "Custom Model ARN" \
    --description "Deployment description" \
    --tags '[
        {"key": "Environment", "value": "Production"},
        {"key": "Team", "value": "ML-Engineering"},
        {"key": "Project", "value": "CustomerSupport"}
    ]' \
    --client-request-token "unique-deployment-token" \
    --region region
- API
-
To deploy a custom model for on-demand inference, use the
CreateCustomModelDeployment API operation with your custom model's Amazon Resource Name (ARN).
The response includes the deployment's ARN. When the deployment is active, you use this ARN as the modelId when
making inference requests. For information about using the deployment for inference, see Use a deployment for on-demand inference.
The following code shows how to use the SDK for Python (Boto3) to deploy a custom model.
import uuid

def create_custom_model_deployment(bedrock_client):
    """Create a custom model deployment

    Args:
        bedrock_client: A boto3 Amazon Bedrock client for making API calls

    Returns:
        str: The ARN of the new custom model deployment

    Raises:
        Exception: If there is an error creating the deployment
    """
    try:
        response = bedrock_client.create_custom_model_deployment(
            modelDeploymentName="Unique deployment name",
            modelArn="Custom Model ARN",
            description="Deployment description",
            tags=[
                {'key': 'Environment', 'value': 'Production'},
                {'key': 'Team', 'value': 'ML-Engineering'},
                {'key': 'Project', 'value': 'CustomerSupport'}
            ],
            clientRequestToken=f"deployment-{uuid.uuid4()}"
        )
        deployment_arn = response['customModelDeploymentArn']
        print(f"Deployment created: {deployment_arn}")
        return deployment_arn
    except Exception as e:
        print(f"Error creating deployment: {str(e)}")
        raise
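Deployment creation is asynchronous, so you typically wait for the deployment to become active before sending inference requests. The following is a minimal polling sketch; it assumes that the get_custom_model_deployment Boto3 operation (the GetCustomModelDeployment API) accepts the same customModelDeploymentIdentifier parameter as the delete operation and returns a status field whose values match the statuses shown in the console, such as Creating and Active.
import time

def wait_for_deployment_active(bedrock_client, deployment_arn, delay_seconds=30):
    """Poll a custom model deployment until it leaves the Creating state.

    Assumption: get_custom_model_deployment returns a 'status' field whose
    values include 'Creating' and 'Active', matching the status shown in
    the console.
    """
    while True:
        response = bedrock_client.get_custom_model_deployment(
            customModelDeploymentIdentifier=deployment_arn
        )
        status = response['status']
        if status != 'Creating':
            return status  # 'Active' on success
        time.sleep(delay_seconds)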
Use a deployment for on-demand inference
After
you deploy your custom model, you use the deployment's Amazon Resource Name (ARN) as the modelId
parameter when you submit prompts and generate responses with model inference.
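For example, the following sketch sends a message to the deployed model with the Converse API, using the deployment ARN as the modelId. The ARN shown is a placeholder; substitute the ARN that CreateCustomModelDeployment returned.
import boto3

# Inference requests go through the bedrock-runtime client, not the
# bedrock control-plane client used to create the deployment.
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# Placeholder; use the ARN returned when you created the deployment.
deployment_arn = 'Custom Model Deployment ARN'

response = bedrock_runtime.converse(
    modelId=deployment_arn,  # the deployment ARN acts as the model ID
    messages=[
        {'role': 'user', 'content': [{'text': 'Hello, model!'}]}
    ]
)
print(response['output']['message']['content'][0]['text'])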
For information about making inference requests, see the following topics:
Delete a custom model deployment
After you are finished using your model for on-demand inference, you can delete the deployment.
After you delete a deployment, you can no longer use it for on-demand inference. Deleting the deployment doesn't delete the underlying custom model.
You can delete a custom model deployment with the Amazon Bedrock console, AWS Command Line Interface, or AWS SDKs.
Deleting a custom model deployment is irreversible. Make sure you no longer need the deployment before
proceeding with the deletion. If you need to use the custom model for on-demand inference again, you
must create a new deployment.
- Console
-
To delete a custom model deployment
- In the navigation pane, under Infer, choose Custom model on-demand.
- Choose the custom model deployment that you want to delete.
- Choose Delete.
- In the confirmation dialog, enter the deployment name to confirm the deletion.
- Choose Delete to confirm deletion.
- CLI
-
To delete a custom model deployment using the AWS Command Line Interface, use the
delete-custom-model-deployment command with your deployment identifier.
This command uses the
DeleteCustomModelDeployment API operation.
aws bedrock delete-custom-model-deployment \
--custom-model-deployment-identifier "deployment-arn-or-name" \
--region region
- API
-
To delete a custom model deployment programmatically, use the DeleteCustomModelDeployment
API operation with the deployment's Amazon Resource Name (ARN) or name. The following code shows how to use the SDK for Python (Boto3) to delete a custom model deployment.
def delete_custom_model_deployment(bedrock_client):
    """Delete a custom model deployment

    Args:
        bedrock_client: A boto3 Amazon Bedrock client for making API calls

    Returns:
        dict: The response from the delete operation

    Raises:
        Exception: If there is an error deleting the deployment
    """
    try:
        response = bedrock_client.delete_custom_model_deployment(
            customModelDeploymentIdentifier="Deployment identifier"
        )
        print("Deleting deployment...")
        return response
    except Exception as e:
        print(f"Error deleting deployment: {str(e)}")
        raise
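Deletion is asynchronous as well, so you might verify that the deployment is gone before cleaning up related resources. A minimal sketch, assuming that get_custom_model_deployment raises ResourceNotFoundException for a deleted deployment, as Amazon Bedrock operations generally do for missing resources:
def deployment_is_deleted(bedrock_client, deployment_arn):
    """Return True when the deployment can no longer be retrieved.

    Assumption: GetCustomModelDeployment raises ResourceNotFoundException
    once the deployment has been fully deleted.
    """
    try:
        bedrock_client.get_custom_model_deployment(
            customModelDeploymentIdentifier=deployment_arn
        )
        return False  # still present; deletion may be in progress
    except bedrock_client.exceptions.ResourceNotFoundException:
        return True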