Endpoints supported by Amazon Bedrock

Amazon Bedrock supports various endpoints for performing inference operations.

Inference operations

Amazon Bedrock supports the following primary two end points for performing inference programmatically:

Endpoint	Supported APIs	Description
`bedrock-mantle.{region}.api.aws`	Responses API / Chat Completions API / Messages API	Region-specific endpoints for making inference requests for models hosted in Amazon Bedrock using the OpenAI-compatible endpoints and the Anthropic Messages API.
`bedrock-runtime.{region}.amazonaws.com`	InvokeModel / Converse / Chat Completions / Messages API	Region-specific endpoints for making inference requests for models hosted in Amazon Bedrock using the InvokeModel/Converse/Chat Completions/Messages APIs. Read more on Amazon Bedrock Runtime APIs here.

For new applications, we recommend the bedrock-mantle endpoint. It supports OpenAI-compatible APIs (Responses and Chat Completions) and the Anthropic Messages API, includes built-in tool use and stateful conversation management, and lets you bring an existing OpenAI SDK codebase to Amazon Bedrock by changing only the base URL and API key. The bedrock-runtime endpoint remains fully supported and is the right choice when you're using the Bedrock-native InvokeModel or Converse APIs, or when the model you want isn't yet available on bedrock-mantle. To see which endpoint each model supports, see Endpoint availability.

The following tables compare what's available on each endpoint.

API support
API	`bedrock-runtime`	`bedrock-mantle`
InvokeModel
Converse / ConverseStream
Chat Completions (OpenAI-compatible)
Responses API (OpenAI-compatible)
Messages API (Anthropic-native)

Inference capabilities
Capability	`bedrock-runtime`	`bedrock-mantle`
Cross-region inference (geographic and global profiles)
Stateful conversation management
Asynchronous (long-running) inference
Client-side tool use
Server-side tool use
Pre-configured ready-to-use tools
Projects
Workspaces

Operational
Item	`bedrock-runtime`	`bedrock-mantle`
AWS SigV4 authentication
Bedrock API key (also works with OpenAI SDK)
Usage attribution	IAM, per-request metadata tagging	Projects, Workspaces

Throughput and quota approach

Each endpoint uses a different approach to managing throughput.

bedrock-runtime – In many traditional multi-tenant services, the architecture is designed around per-account quotas to manage fair-share access to shared resources. This is the approach used with bedrock-runtime. Each model has fixed throughput quotas (RPM and TPM) that you can request increases for. For details, see Quotas for the bedrock-runtime endpoint.
bedrock-mantle – This endpoint is architected with advanced scheduling and work-queueing mechanisms that deliver fair-share distribution while supporting higher initial throughput limits. This design also allows bedrock-mantle to host a broad set of models and deliver the full breadth of capabilities available across the model catalog. In most cases, requests are served immediately. In some cases, a request may be briefly queued while in-flight workloads complete and throughput becomes available. For details, see Quotas for the bedrock-mantle endpoint and Scaling and throughput best practices.

Pricing

Per-token pricing for the same model is identical on bedrock-runtime and bedrock-mantle. Choose an endpoint based on the APIs and capabilities you need, not cost. For current pricing, see Amazon Bedrock pricing.

When to choose each endpoint

Start with bedrock-mantle when you want to:

Use the Responses API, Chat Completions API, or Messages API with stateful, multi-turn conversations.
Bring existing OpenAI SDK code to Amazon Bedrock by changing only the base URL and API key.
Run asynchronous or long-running inference workloads.
Build agentic workflows with server-side tool use or pre-configured tools.
Use Projects (OpenAI-compatible) or Workspaces (Anthropic-compatible) to isolate workloads and track cost and usage at the application level.

Use bedrock-runtime when you want to:

Continue using the Bedrock-native InvokeModel or Converse APIs.
Use a model that isn't yet available on bedrock-mantle. See Endpoint availability.

Both endpoints can be used together from the same application — choose per use case.

Reduce data egress costs with VPC interface endpoints

If you are calling Amazon Bedrock from within a VPC, consider using VPC interface endpoints (AWS PrivateLink) to keep traffic within the AWS network and avoid data egress charges associated with NAT gateways or internet gateways.

Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Build

APIs