Endpoints supported by Amazon Bedrock
Amazon Bedrock supports various endpoints for performing inference operations.
Inference operations
Amazon Bedrock supports the following two primary endpoints for performing inference programmatically:
| Endpoint | Supported APIs | Description |
|---|---|---|
| bedrock-mantle.{region}.api.aws | Responses API / Chat Completions API / Messages API | Region-specific endpoints for making inference requests for models hosted in Amazon Bedrock using the OpenAI-compatible endpoints and the Anthropic Messages API. |
| bedrock-runtime.{region}.amazonaws.com | InvokeModel / Converse / Chat Completions / Messages API | Region-specific endpoints for making inference requests for models hosted in Amazon Bedrock using the InvokeModel, Converse, Chat Completions, and Messages APIs. For more information, see the Amazon Bedrock Runtime APIs. |
For new applications, we recommend the bedrock-mantle endpoint. It supports OpenAI-compatible APIs (Responses and Chat Completions) and the Anthropic Messages API, includes built-in tool use and stateful conversation management, and lets you bring an existing OpenAI SDK codebase to Amazon Bedrock by changing only the base URL and API key. The bedrock-runtime endpoint remains fully supported and is the right choice when you're using the Bedrock-native InvokeModel or Converse APIs, or when the model you want isn't yet available on bedrock-mantle. To see which endpoint each model supports, see Endpoint availability.
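As a minimal sketch of the "change only the base URL and API key" path, the following builds a bedrock-mantle base URL from the host pattern in the endpoint table and passes it to the OpenAI Python SDK. The `/v1` path suffix and the helper names `mantle_base_url` and `make_client` are assumptions for illustration, not documented values; confirm the exact base URL in the endpoint documentation before relying on it.

```python
def mantle_base_url(region: str, path: str = "/v1") -> str:
    """Return a base URL for the bedrock-mantle endpoint in `region`.

    The host pattern comes from the endpoint table. The "/v1" suffix is an
    assumption borrowed from OpenAI SDK conventions.
    """
    if not region:
        raise ValueError("region must be a non-empty string, e.g. 'us-east-1'")
    return f"https://bedrock-mantle.{region}.api.aws{path}"


def make_client(region: str, api_key: str):
    """Create an OpenAI SDK client that points at bedrock-mantle."""
    # Imported lazily so the URL helper works without the openai package.
    from openai import OpenAI

    # base_url and api_key are the only two settings that differ from a
    # default OpenAI client.
    return OpenAI(base_url=mantle_base_url(region), api_key=api_key)
```

Existing code that already constructs an `OpenAI` client would, under these assumptions, only need its `base_url` and `api_key` arguments changed.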
The following tables compare what's available on each endpoint.
| API | bedrock-runtime | bedrock-mantle |
|---|---|---|
| InvokeModel | Yes | No |
| Converse / ConverseStream | Yes | No |
| Chat Completions (OpenAI-compatible) | Yes | Yes |
| Responses API (OpenAI-compatible) | No | Yes |
| Messages API (Anthropic-native) | Yes | Yes |
| Capability | bedrock-runtime | bedrock-mantle |
|---|---|---|
| Cross-region inference (geographic and global profiles) | | |
| Stateful conversation management | | |
| Asynchronous (long-running) inference | | |
| Client-side tool use | | |
| Server-side tool use | | |
| Pre-configured ready-to-use tools | | |
| Projects | | |
| Workspaces | | |
| Item | bedrock-runtime | bedrock-mantle |
|---|---|---|
| AWS SigV4 | | |
| Bedrock API key (also works with OpenAI SDK) | | |
| Usage attribution | IAM, per-request metadata tagging | Projects, Workspaces |
Throughput and quota approach
Each endpoint uses a different approach to managing throughput.
- bedrock-runtime – In many traditional multi-tenant services, the architecture is designed around per-account quotas to manage fair-share access to shared resources. This is the approach used with bedrock-runtime. Each model has fixed throughput quotas, expressed in requests per minute (RPM) and tokens per minute (TPM), that you can request increases for. For details, see Quotas for the bedrock-runtime endpoint.
- bedrock-mantle – This endpoint is architected with advanced scheduling and work-queueing mechanisms that deliver fair-share distribution while supporting higher initial throughput limits. This design also allows bedrock-mantle to host a broad set of models and deliver the full breadth of capabilities available across the model catalog. In most cases, requests are served immediately. In some cases, a request may be briefly queued while in-flight workloads complete and throughput becomes available. For details, see Quotas for the bedrock-mantle endpoint and Scaling and throughput best practices.
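Because bedrock-runtime quotas are stated as both RPM and TPM, whichever limit is reached first bounds the sustained request rate. The sketch below works through that arithmetic. The function name and the quota values are hypothetical, and it assumes every request consumes roughly the same number of tokens; how tokens are actually counted against a quota is not covered here.

```python
def sustainable_requests_per_minute(
    rpm_quota: int, tpm_quota: int, avg_tokens_per_request: int
) -> float:
    """Return the request rate allowed by the tighter of two quotas.

    rpm_quota: requests-per-minute quota for the model (hypothetical value).
    tpm_quota: tokens-per-minute quota for the model (hypothetical value).
    avg_tokens_per_request: assumed average tokens consumed per request.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # Requests per minute that fit inside the token quota.
    token_bound = tpm_quota / avg_tokens_per_request
    # The lower of the two bounds is the one that applies.
    return min(float(rpm_quota), token_bound)
```

For example, with a hypothetical 100 RPM and 200,000 TPM, requests averaging 4,000 tokens are bounded by the token quota at 50 requests per minute, while requests averaging 500 tokens are bounded by the request quota at 100 per minute.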
Pricing
Per-token pricing for the same model is identical on bedrock-runtime and bedrock-mantle. Choose an endpoint based on the APIs and capabilities you need, not cost. For current pricing, see Amazon Bedrock pricing.
When to choose each endpoint
Start with bedrock-mantle when you want to:
- Use the Responses API, Chat Completions API, or Messages API with stateful, multi-turn conversations.
- Bring existing OpenAI SDK code to Amazon Bedrock by changing only the base URL and API key.
- Run asynchronous or long-running inference workloads.
- Build agentic workflows with server-side tool use or pre-configured tools.
- Use Projects (OpenAI-compatible) or Workspaces (Anthropic-compatible) to isolate workloads and track cost and usage at the application level.
Use bedrock-runtime when you want to:
- Continue using the Bedrock-native InvokeModel or Converse APIs.
- Use a model that isn't yet available on bedrock-mantle. See Endpoint availability.
You can use both endpoints from the same application and choose between them for each use case.
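The guidance in this section can be sketched as a small routing helper. This is an illustration only: the API sets mirror the "Supported APIs" column of the endpoint table, the `model_on_mantle` flag stands in for a lookup against Endpoint availability, and the names `choose_endpoint`, `MANTLE_APIS`, and `RUNTIME_APIS` are hypothetical.

```python
MANTLE = "bedrock-mantle"
RUNTIME = "bedrock-runtime"

# APIs per endpoint, taken from the endpoint table in this document.
MANTLE_APIS = {"Responses", "Chat Completions", "Messages"}
RUNTIME_APIS = {
    "InvokeModel", "Converse", "ConverseStream", "Chat Completions", "Messages",
}


def choose_endpoint(api: str, model_on_mantle: bool = True) -> str:
    """Pick an endpoint for one use case.

    Prefers bedrock-mantle, as recommended for new applications, and falls
    back to bedrock-runtime for Bedrock-native APIs or for models that are
    not yet available on bedrock-mantle.
    """
    if api in MANTLE_APIS and model_on_mantle:
        return MANTLE
    if api in RUNTIME_APIS:
        return RUNTIME
    raise ValueError(f"no endpoint serves {api!r} for this model")
```

A single application could call this once per workload, for example sending Responses API traffic to bedrock-mantle while an existing Converse integration stays on bedrock-runtime.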
Reduce data egress costs with VPC interface endpoints
If you are calling Amazon Bedrock from within a VPC, consider using VPC interface endpoints (AWS PrivateLink) to keep traffic within the AWS network and avoid data egress charges associated with NAT gateways or internet gateways.
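As a rough sketch of what creating such an interface endpoint involves, the helper below assembles the arguments for the EC2 `create_vpc_endpoint` call in boto3. The service name pattern `com.amazonaws.{region}.bedrock-runtime` is an assumption to verify with `describe_vpc_endpoint_services` in your Region, the helper name is hypothetical, and the resource IDs in the usage comment are placeholders.

```python
def interface_endpoint_params(
    region: str,
    vpc_id: str,
    subnet_ids: list,
    security_group_ids: list,
    service: str = "bedrock-runtime",
) -> dict:
    """Build keyword arguments for ec2.create_vpc_endpoint.

    The service name pattern below is an assumption; confirm the exact
    name for your Region before creating the endpoint.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS lets the regular endpoint hostname resolve to the
        # interface endpoint from inside the VPC.
        "PrivateDnsEnabled": True,
    }


# Usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.create_vpc_endpoint(**interface_endpoint_params(
#       "us-east-1", "vpc-EXAMPLE", ["subnet-EXAMPLE"], ["sg-EXAMPLE"]))
```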