Endpoints supported by Amazon Bedrock - Amazon Bedrock
Services or capabilities described in AWS documentation might vary by Region. To see the differences applicable to the AWS European Sovereign Cloud Region, see the AWS European Sovereign Cloud User Guide.

Endpoints supported by Amazon Bedrock

Amazon Bedrock supports various endpoints for performing inference operations.

Inference operations

Amazon Bedrock supports the following primary two end points for performing inference programmatically:

Endpoint Supported APIs Description
bedrock-mantle.{region}.api.aws Responses API / Chat Completions API / Messages API Region-specific endpoints for making inference requests for models hosted in Amazon Bedrock using the OpenAI-compatible endpoints and the Anthropic Messages API.
bedrock-runtime.{region}.amazonaws.com InvokeModel / Converse / Chat Completions / Messages API Region-specific endpoints for making inference requests for models hosted in Amazon Bedrock using the InvokeModel/Converse/Chat Completions/Messages APIs. Read more on Amazon Bedrock Runtime APIs here.

For new applications, we recommend the bedrock-mantle endpoint. It supports OpenAI-compatible APIs (Responses and Chat Completions) and the Anthropic Messages API, includes built-in tool use and stateful conversation management, and lets you bring an existing OpenAI SDK codebase to Amazon Bedrock by changing only the base URL and API key. The bedrock-runtime endpoint remains fully supported and is the right choice when you're using the Bedrock-native InvokeModel or Converse APIs, or when the model you want isn't yet available on bedrock-mantle. To see which endpoint each model supports, see Endpoint availability.

The following tables compare what's available on each endpoint.

Throughput and quota approach

Each endpoint uses a different approach to managing throughput.

  • bedrock-runtime – In many traditional multi-tenant services, the architecture is designed around per-account quotas to manage fair-share access to shared resources. This is the approach used with bedrock-runtime. Each model has fixed throughput quotas (RPM and TPM) that you can request increases for. For details, see Quotas for the bedrock-runtime endpoint.

  • bedrock-mantle – This endpoint is architected with advanced scheduling and work-queueing mechanisms that deliver fair-share distribution while supporting higher initial throughput limits. This design also allows bedrock-mantle to host a broad set of models and deliver the full breadth of capabilities available across the model catalog. In most cases, requests are served immediately. In some cases, a request may be briefly queued while in-flight workloads complete and throughput becomes available. For details, see Quotas for the bedrock-mantle endpoint and Scaling and throughput best practices.

Pricing

Per-token pricing for the same model is identical on bedrock-runtime and bedrock-mantle. Choose an endpoint based on the APIs and capabilities you need, not cost. For current pricing, see Amazon Bedrock pricing.

When to choose each endpoint

Start with bedrock-mantle when you want to:

  • Use the Responses API, Chat Completions API, or Messages API with stateful, multi-turn conversations.

  • Bring existing OpenAI SDK code to Amazon Bedrock by changing only the base URL and API key.

  • Run asynchronous or long-running inference workloads.

  • Build agentic workflows with server-side tool use or pre-configured tools.

  • Use Projects (OpenAI-compatible) or Workspaces (Anthropic-compatible) to isolate workloads and track cost and usage at the application level.

Use bedrock-runtime when you want to:

Both endpoints can be used together from the same application — choose per use case.

Reduce data egress costs with VPC interface endpoints

If you are calling Amazon Bedrock from within a VPC, consider using VPC interface endpoints (AWS PrivateLink) to keep traffic within the AWS network and avoid data egress charges associated with NAT gateways or internet gateways.