Embed v4 - Amazon Bedrock
Cohere — Embed v4

Model Details

Embed v4 is Cohere's unified multimodal embedding model that processes text, images, and mixed content in a single model for search and RAG. For more information about model development and performance, see the model/service card.

  • Model launch date: Apr 15, 2025

  • Model EOL date: N/A

  • End User License Agreements and Terms of Use: View

  • Model lifecycle: Active

  • Context window: 128K tokens

Input Modalities | Output Modalities | APIs supported | Endpoints supported
Audio: No | Embedding: Yes | Responses: No | bedrock-runtime: Yes
Image: Yes | Image: No | Chat Completions: No | bedrock-mantle: No
Speech: No | Speech: No | Invoke: Yes |
Text: Yes | Text: No | Converse: No |
Video: No | Video: No | |

Capabilities and Features

Bedrock Features

Features supported using bedrock-runtime endpoint

Pricing

For pricing, please refer to the Amazon Bedrock Pricing page.

Programmatic Access

Use the following model IDs and endpoint URLs to access this model programmatically. For more information about the available APIs and endpoints, see APIs supported and Endpoints supported.

Endpoint | Model ID | In-Region endpoint URL | Geo inference ID | Global inference ID
bedrock-runtime | cohere.embed-v4:0 | https://bedrock-runtime.{region}.amazonaws.com | us.cohere.embed-v4:0, eu.cohere.embed-v4:0 | global.cohere.embed-v4:0

For example, if the Region is us-east-1 (N. Virginia), the bedrock-runtime endpoint URL is "https://bedrock-runtime.us-east-1.amazonaws.com". The corresponding bedrock-mantle URL would be "https://bedrock-mantle.us-east-1.api.aws/v1", but this model is not available on the bedrock-mantle endpoint.
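The URL pattern above can be expressed as a small helper. This is a minimal sketch: the function name `bedrock_runtime_endpoint` is hypothetical, and in practice boto3 usually resolves the endpoint for you from `region_name`, so a helper like this is mainly useful for logging or for non-SDK HTTP clients.

```python
def bedrock_runtime_endpoint(region: str) -> str:
    """Fill the {region} placeholder of the In-Region endpoint URL pattern.

    Hypothetical helper for illustration; it does not check that the
    Region actually offers the model.
    """
    if not region:
        raise ValueError("region must be a non-empty string such as 'us-east-1'")
    return "https://bedrock-runtime.{region}.amazonaws.com".format(region=region)


print(bedrock_runtime_endpoint("us-east-1"))
```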

Service Tiers

Amazon Bedrock offers multiple service tiers to match your workload requirements. Standard provides pay-per-token access with no commitment. Priority offers higher throughput with a time-based commitment. Flex provides lower-cost access for flexible, non-time-sensitive workloads. Reserved provides dedicated throughput with a term commitment for predictable workloads. For more information, see service tiers.

Standard Priority Flex Reserved
Yes No No No

Regional Availability

Regional availability at a glance

Bedrock offers three inference options: In-Region keeps requests within a single Region for strict compliance, Geo Cross-Region routes across Regions within a geography (US, EU, etc.) for higher throughput while respecting data residency, and Global Cross-Region routes anywhere worldwide for maximum throughput when there are no residency constraints. Refer to the Regional availability page for more details.

Region | In-Region | Geo | Global
us-east-1 (N. Virginia) | Yes | Yes | Yes
us-east-2 (Ohio) | No | Yes | Yes
us-west-1 (N. California) | No | Yes | Yes
us-west-2 (Oregon) | No | Yes | Yes
ca-central-1 (Canada) | No | No | Yes
eu-central-1 (Frankfurt) | No | Yes | Yes
eu-central-2 (Zurich) | No | Yes | Yes
eu-north-1 (Stockholm) | No | Yes | Yes
eu-south-1 (Milan) | No | Yes | Yes
eu-south-2 (Spain) | No | Yes | Yes
eu-west-1 (Ireland) | Yes | Yes | Yes
eu-west-2 (London) | No | Yes | Yes
eu-west-3 (Paris) | No | Yes | Yes
ap-northeast-1 (Tokyo) | Yes | No | Yes
ap-northeast-2 (Seoul) | No | No | Yes
ap-northeast-3 (Osaka) | No | No | Yes
ap-south-1 (Mumbai) | No | No | Yes
ap-south-2 (Hyderabad) | No | No | Yes
ap-southeast-1 (Singapore) | No | No | Yes
ap-southeast-2 (Sydney) | No | No | Yes
ap-southeast-3 (Jakarta) | No | No | Yes
ap-southeast-4 (Melbourne) | No | No | Yes
sa-east-1 (São Paulo) | No | No | Yes
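The availability table can be turned into a lookup that picks which model or inference ID to pass for a given source Region. The sketch below is illustrative only: the function name `choose_model_id` and the preference order (In-Region first, then Geo, then Global, that is, the narrowest routing scope first) are assumptions, not an AWS recommendation. The Region sets are copied from the table above.

```python
BASE_MODEL_ID = "cohere.embed-v4:0"

# Regions marked "Yes" in the In-Region column.
IN_REGION = {"us-east-1", "eu-west-1", "ap-northeast-1"}

# Regions marked "Yes" in the Geo column, grouped by geography prefix.
GEO_REGIONS = {
    "us": {"us-east-1", "us-east-2", "us-west-1", "us-west-2"},
    "eu": {
        "eu-central-1", "eu-central-2", "eu-north-1", "eu-south-1",
        "eu-south-2", "eu-west-1", "eu-west-2", "eu-west-3",
    },
}

# Regions marked "Yes" in the Global column: every Region in the table.
GLOBAL_REGIONS = GEO_REGIONS["us"] | GEO_REGIONS["eu"] | {
    "ca-central-1", "sa-east-1",
    "ap-northeast-1", "ap-northeast-2", "ap-northeast-3",
    "ap-south-1", "ap-south-2",
    "ap-southeast-1", "ap-southeast-2", "ap-southeast-3", "ap-southeast-4",
}


def choose_model_id(region: str, allow_geo: bool = True, allow_global: bool = True) -> str:
    """Return the narrowest-scope ID available from `region` (assumed policy)."""
    if region in IN_REGION:
        return BASE_MODEL_ID
    if allow_geo:
        for prefix, regions in GEO_REGIONS.items():
            if region in regions:
                return f"{prefix}.{BASE_MODEL_ID}"
    if allow_global and region in GLOBAL_REGIONS:
        return f"global.{BASE_MODEL_ID}"
    raise ValueError(f"No permitted inference option for Region {region!r}")


print(choose_model_id("eu-west-2"))
```

Setting `allow_global=False` models a workload with residency constraints: a Region such as ca-central-1, which only has the Global option, then raises an error instead of silently routing worldwide.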

Geo inference details

Geo: US

Geo Inference ID: us.cohere.embed-v4:0

Source Region | Destination Regions
us-east-1 (N. Virginia) | us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-2 (Oregon)
us-east-2 (Ohio) | us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-2 (Oregon)
us-west-1 (N. California) | us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-1 (N. California), us-west-2 (Oregon)
us-west-2 (Oregon) | us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-2 (Oregon)

Geo: EU

Geo Inference ID: eu.cohere.embed-v4:0

Source Region | Destination Regions
eu-central-1 (Frankfurt) | eu-central-1 (Frankfurt), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-3 (Paris)
eu-central-2 (Zurich) | eu-central-1 (Frankfurt), eu-central-2 (Zurich), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-3 (Paris)
eu-north-1 (Stockholm) | eu-central-1 (Frankfurt), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-3 (Paris)
eu-south-1 (Milan) | eu-central-1 (Frankfurt), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-3 (Paris)
eu-south-2 (Spain) | eu-central-1 (Frankfurt), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-3 (Paris)
eu-west-1 (Ireland) | eu-central-1 (Frankfurt), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-3 (Paris)
eu-west-2 (London) | eu-central-1 (Frankfurt), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-2 (London), eu-west-3 (Paris)
eu-west-3 (Paris) | eu-central-1 (Frankfurt), eu-north-1 (Stockholm), eu-south-1 (Milan), eu-south-2 (Spain), eu-west-1 (Ireland), eu-west-3 (Paris)
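The source-to-destination tables above can be treated as lookup data when checking a Geo inference ID against an internal residency policy. The following sketch uses only the US rows; the names `US_GEO_DESTINATIONS` and `disallowed_destinations` are hypothetical, and the allow-list is an example policy rather than anything defined by AWS.

```python
# Destination Regions per source Region for us.cohere.embed-v4:0,
# copied from the "Geo: US" table.
US_GEO_DESTINATIONS = {
    "us-east-1": {"us-east-1", "us-east-2", "us-west-2"},
    "us-east-2": {"us-east-1", "us-east-2", "us-west-2"},
    "us-west-1": {"us-east-1", "us-east-2", "us-west-1", "us-west-2"},
    "us-west-2": {"us-east-1", "us-east-2", "us-west-2"},
}


def disallowed_destinations(source_region, allowed_regions):
    """List destination Regions for `source_region` that fall outside the allow-list."""
    destinations = US_GEO_DESTINATIONS[source_region]
    return sorted(destinations - set(allowed_regions))


# Example policy (an assumption): only West Coast Regions may process data.
print(disallowed_destinations("us-west-1", {"us-west-1", "us-west-2"}))
```

An empty result means every possible destination for that source Region is covered by the allow-list. A non-empty result suggests that Geo routing from that Region would not satisfy the example policy.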

Global inference details

Global Inference ID: global.cohere.embed-v4:0

Americas

  • us-east-1 (N. Virginia)
  • us-east-2 (Ohio)
  • us-west-1 (N. California)
  • us-west-2 (Oregon)
  • ca-central-1 (Canada)
  • sa-east-1 (São Paulo)

EMEA

  • eu-central-1 (Frankfurt)
  • eu-central-2 (Zurich)
  • eu-north-1 (Stockholm)
  • eu-south-1 (Milan)
  • eu-south-2 (Spain)
  • eu-west-1 (Ireland)
  • eu-west-2 (London)
  • eu-west-3 (Paris)

Asia Pacific

  • ap-northeast-1 (Tokyo)
  • ap-northeast-2 (Seoul)
  • ap-northeast-3 (Osaka)
  • ap-south-1 (Mumbai)
  • ap-south-2 (Hyderabad)
  • ap-southeast-1 (Singapore)
  • ap-southeast-2 (Sydney)
  • ap-southeast-3 (Jakarta)
  • ap-southeast-4 (Melbourne)

Quotas and Limits

Your AWS account has default quotas to maintain the performance of the service and to ensure appropriate usage of Amazon Bedrock. The default quotas assigned to an account might be updated depending on regional factors, payment history, fraudulent usage, and/or approval of a quota increase request. For more details, please refer to Quotas documentation.

Quota | Default value
On-demand requests per minute | 1,000
On-demand tokens per minute | 150,000
Cross-region requests per minute | 2,000
Cross-region tokens per minute | 300,000
Max tokens per day | 216,000,000

These are default quotas shown for us-east-1. To see quotas and limits for your account, please log in to your AWS Console.
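The default quotas lend themselves to a quick capacity estimate. The sketch below is a rough planning aid, not a statement of how Bedrock enforces limits: it assumes a steady request rate exactly at the quota, with no throttling, retries, or bursts, and the function name `minutes_to_embed` is hypothetical. It also shows that the daily token figure in the table agrees with the on-demand per-minute figure multiplied by the 1,440 minutes in a day, although the source does not say the daily quota is derived that way.

```python
import math

ON_DEMAND_RPM = 1_000      # On-demand requests per minute (default, us-east-1)
ON_DEMAND_TPM = 150_000    # On-demand tokens per minute (default, us-east-1)
MAX_TOKENS_PER_DAY = 216_000_000

# 150,000 tokens/minute * 60 minutes * 24 hours matches the daily figure.
assert ON_DEMAND_TPM * 60 * 24 == MAX_TOKENS_PER_DAY


def minutes_to_embed(num_requests, tokens_per_request,
                     rpm=ON_DEMAND_RPM, tpm=ON_DEMAND_TPM):
    """Lower-bound wall-clock minutes, limited by whichever quota binds first."""
    total_tokens = num_requests * tokens_per_request
    by_requests = math.ceil(num_requests / rpm)
    by_tokens = math.ceil(total_tokens / tpm)
    return max(by_requests, by_tokens)


# 10,000 requests of 500 tokens each: the token quota is the bottleneck.
print(minutes_to_embed(10_000, 500))  # prints 34
```

In this example the request quota alone would allow the job to finish in 10 minutes, but 5,000,000 tokens at 150,000 tokens per minute needs 34 minutes, so the token quota determines the estimate.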

Sample Code

Step 1 - AWS Account: If you have an AWS account already, skip this step. If you are new to AWS, sign up for an AWS account.

Step 2 - API key: Go to the Amazon Bedrock console and generate a long-term API key.

Step 3 - Get the SDK: To use this getting started guide, you must have Python already installed. Then install the relevant software depending on the APIs you are using.

pip install boto3

Step 4 - Set environment variables: Configure your environment to use the API key for authentication.

export AWS_BEARER_TOKEN_BEDROCK="<provide your Bedrock API key>"
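Before running the request in the next step, it can help to confirm that the variable is actually visible to Python, since a value set in one shell session is not inherited by others. This is a minimal preflight sketch; the helper name `has_bedrock_api_key` is hypothetical and it only checks that the variable is non-empty, not that the key is valid.

```python
import os


def has_bedrock_api_key(env=None):
    """Return True if AWS_BEARER_TOKEN_BEDROCK is set to a non-blank value.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    return bool(env.get("AWS_BEARER_TOKEN_BEDROCK", "").strip())


print("API key configured:", has_bedrock_api_key())
```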

Step 5 - Run your first inference request: Save the file as bedrock-first-request.py

Invoke API
import json
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Embed v4 is an embedding model, so the body carries texts to embed
# rather than chat messages.
response = client.invoke_model(
    modelId='cohere.embed-v4:0',
    body=json.dumps({
        'texts': ['Can you explain the features of Amazon Bedrock?'],
        'input_type': 'search_document'
    })
)

# Print the parsed JSON response, which contains the embeddings.
print(json.loads(response['body'].read()))
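Because the model is intended for search and RAG, a typical next step is to compare embedding vectors. The sketch below ranks documents against a query by cosine similarity, which is a common choice but not one the source prescribes. The three-dimensional vectors are toy values made up for illustration, not real model output, and the names `cosine_similarity`, `query`, and `docs` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings returned by the model.
query = [0.1, 0.3, 0.5]
docs = {
    "doc-a": [0.1, 0.3, 0.5],
    "doc-b": [0.5, 0.1, -0.2],
}

# Rank document names from most to least similar to the query.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]),
                reverse=True)
print(ranked)
```

With real embeddings, the query and the documents would each be embedded by a separate request, and the vectors read from the response body would take the place of the toy values.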