Create and manage fine-tuning jobs for Amazon Nova models
You can create a reinforcement fine-tuning (RFT) job using the Amazon Bedrock console or API. An RFT job can take a few hours to complete, depending on the size of your training data, the number of epochs, and the complexity of your reward functions.
Prerequisites
- Create an IAM service role with the required permissions. For comprehensive security and permissions information, including RFT-specific permissions, see Access and security for Amazon Nova models. A minimal trust policy sketch follows this list.
- (Optional) Encrypt input and output data, your RFT job, or inference requests made to custom models. For more information, see Encryption of custom models.
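For reference, the service role's trust policy must allow Amazon Bedrock to assume the role. The following is a minimal sketch of such a trust policy; the role also needs permissions policies (for example, access to your Amazon S3 training and output locations and, for RFT, permission to invoke your reward Lambda function), as described in Access and security for Amazon Nova models.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}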
Create your RFT job
Choose the tab for your preferred method, and then follow the steps:
- Console
To submit an RFT job in the console, carry out the following steps:
- Open the Amazon Bedrock console and navigate to Custom models under Tune.
- Choose Create, then Create reinforcement fine-tuning job.
- In the Model details section, choose Amazon Nova 2 Lite as your base model.
- In the Customization details section, enter the customization name.
- In the Training data section, choose your data source: select from your available invocation logs stored in Amazon S3, select the Amazon S3 location of your training dataset file, or upload a file directly from your device. Your training dataset should be in the OpenAI Chat Completions data format (see the example record after these steps). If you provide invocation logs in the Amazon Bedrock invoke or converse format, Amazon Bedrock automatically converts them to the Chat Completions format.
- In the Reward function section, set up your reward mechanism:
  - Model as judge (RLAIF) - Select an Amazon Bedrock hosted base model as the judge and configure the instructions for evaluation. Use this for subjective tasks like content moderation. The console's Model as judge option automatically converts your configuration into a Lambda function during training.
  - Custom code (RLVR) - Create custom reward functions using Python code executed through Lambda functions. Use this for objective tasks like code generation. For more information, see Setting up reward functions for Amazon Nova models.
- (Optional) In the Hyperparameters section, adjust training parameters or use the default values.
- In the Output data section, enter the Amazon S3 location where Amazon Bedrock should save job outputs.
- In the Role configuration section, either choose an existing role from the dropdown list or enter a name for a new service role to create.
- (Optional) In the Additional configuration section, configure validation data (by pointing to an Amazon S3 location), KMS encryption settings, and job and model tags.
- Choose Create reinforcement fine-tuning job to begin the job.
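For reference, each record in a Chat Completions-format training file is a single JSON object on its own line of the JSONL file. The record below is an illustrative sketch only: the system and user content are placeholders, and depending on your reward setup your records may carry additional fields (for example, reference information used by your grader).
{"messages": [{"role": "system", "content": "You are a helpful coding assistant."}, {"role": "user", "content": "Write a Python function that reverses a string."}]}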
- API
- Send a CreateModelCustomizationJob request with customizationType set to REINFORCEMENT_FINE_TUNING. Required fields: roleArn, baseModelIdentifier, customModelName, jobName, trainingDataConfig, outputDataConfig, and rftConfig (nested in customizationConfig).
Example request:
{
  "roleArn": "arn:aws:iam::123456789012:role/BedrockRFTRole",
  "baseModelIdentifier": "amazon.nova-2.0",
  "customModelName": "my-rft-model",
  "jobName": "my-rft-job",
  "customizationType": "REINFORCEMENT_FINE_TUNING",
  "trainingDataConfig": {
    "s3Uri": "s3://my-bucket/training-data.jsonl"
  },
  "customizationConfig": {
    "rftConfig": {
      "graderConfig": {
        "lambdaGrader": {
          "lambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:function-name"
        }
      },
      "hyperParameters": {
        "batchSize": 64,
        "epochCount": 2,
        "evalInterval": 10,
        "inferenceMaxTokens": 8192,
        "learningRate": 0.00001,
        "maxPromptLength": 4096,
        "reasoningEffort": "high",
        "trainingSamplePerPrompt": 4
      }
    }
  },
  "outputDataConfig": {
    "s3Uri": "s3://my-bucket/rft-output/"
  }
}
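If you prefer the AWS CLI, you can save a request body like the one above to a file and pass it with the --cli-input-json parameter; a minimal sketch, assuming the create-model-customization-job CLI command that maps to this API operation:
aws bedrock create-model-customization-job --cli-input-json file://rft-job.json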
Python API sample request:
import boto3

bedrock = boto3.client(service_name='bedrock')

# Set parameters
customizationType = "REINFORCEMENT_FINE_TUNING"
baseModelIdentifier = "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-lite-v1:0:256k"
roleArn = "${your-customization-role-arn}"
jobName = "MyFineTuningJob"
customModelName = "MyCustomModel"

customizationConfig = {
    'rftConfig': {
        'graderConfig': {
            'lambdaGrader': {
                'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:function-name'
            }
        },
        'hyperParameters': {
            'batchSize': 64,
            'epochCount': 2,
            'evalInterval': 10,
            'inferenceMaxTokens': 8192,
            'learningRate': 0.00001,
            'maxPromptLength': 4096,
            'reasoningEffort': 'high',
            'trainingSamplePerPrompt': 4
        }
    }
}

trainingDataConfig = {"s3Uri": "s3://${training-bucket}/myInputData/train.jsonl"}
outputDataConfig = {"s3Uri": "s3://${output-bucket}/myOutputData"}

# Create job
response_ft = bedrock.create_model_customization_job(
    jobName=jobName,
    customModelName=customModelName,
    roleArn=roleArn,
    baseModelIdentifier=baseModelIdentifier,
    customizationConfig=customizationConfig,
    trainingDataConfig=trainingDataConfig,
    outputDataConfig=outputDataConfig,
    customizationType=customizationType
)

jobArn = response_ft['jobArn']
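Both examples point graderConfig.lambdaGrader.lambdaArn at the Lambda function that scores model responses during training. The handler below is only a shape sketch: the actual event and response contract is defined in Setting up reward functions for Amazon Nova models, and the field names used here (response, reward) are illustrative assumptions.
# Hypothetical reward Lambda sketch. The real event/response schema is covered in
# "Setting up reward functions for Amazon Nova models"; the field names below are
# illustrative assumptions, not the documented contract.
def lambda_handler(event, context):
    response_text = event.get("response", "")  # assumed field name
    # Example of a verifiable (RLVR-style) check: score 1.0 if the candidate
    # response parses as valid Python, otherwise 0.0.
    try:
        compile(response_text, "<candidate>", "exec")
        reward = 1.0
    except SyntaxError:
        reward = 0.0
    return {"reward": reward}  # assumed field name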
Monitor your RFT training job
Amazon Bedrock provides real-time monitoring with visual graphs and metrics during RFT training. These metrics help you understand whether the model converges properly and if the reward function effectively guides the learning process.
Job status tracking
You can monitor your RFT job status through the validation and training phases in the Amazon Bedrock console.
Completion indicators:
- Job status changes to Completed when training completes successfully.
- The custom model ARN becomes available for deployment.
- Training metrics reach convergence thresholds.
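You can also check job status programmatically by polling the GetModelCustomizationJob operation with the job ARN returned at creation time. A minimal sketch using boto3 (the status field reports values such as InProgress, Completed, Failed, or Stopped):
import time
import boto3

bedrock = boto3.client(service_name='bedrock')
job_arn = "${your-job-arn}"  # jobArn returned by CreateModelCustomizationJob

# Poll until the job reaches a terminal state.
while True:
    job = bedrock.get_model_customization_job(jobIdentifier=job_arn)
    status = job['status']
    print(f"Job status: {status}")
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(300)  # RFT jobs can run for hours; poll every 5 minutes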
Real-time training metrics
Amazon Bedrock provides real-time monitoring during RFT training with visual graphs displaying training and validation metrics.
Core training metrics
- Training loss - Measures how well the model is learning from the training data
- Training reward statistics - Shows the reward scores assigned by your reward functions
- Reward margin - Measures the difference between the rewards for good and bad responses
- Accuracy on training and validation sets - Shows model performance on both the training data and held-out data
Detailed metric categories
Reward metrics – critic/rewards/mean, critic/rewards/max, critic/rewards/min (reward distribution), and val-score/rewards/mean@1 (validation rewards)
Model behavior – actor/entropy (policy variation; higher equals more exploratory)
Training health – actor/pg_loss (policy gradient loss), actor/pg_clipfrac (frequency of clipped updates), and actor/grad_norm (gradient magnitude)
Response characteristics – prompt_length/mean, prompt_length/max, prompt_length/min (input token statistics), response_length/mean, response_length/max, response_length/min (output token statistics), and response/aborted_ratio (incomplete generation rate; 0 equals all completed)
Performance – perf/throughput (training throughput), perf/time_per_step (time per training step), and timing_per_token_ms/* (per-token processing times)
Resource usage – perf/max_memory_allocated_gb, perf/max_memory_reserved_gb (GPU memory), and perf/cpu_memory_used_gb (CPU memory)
Training progress visualization
The console displays interactive graphs that update in real time as your RFT job progresses. These visualizations can help you:
- Track convergence toward optimal performance
- Identify potential training issues early
- Determine optimal stopping points
- Compare performance across different epochs
Set up inference
After job completion, deploy the RFT model for on-demand inference or use Provisioned Throughput for consistent performance.
For setting up inference, see Set up inference for a custom model.
Use Test in Playground to evaluate your custom model and compare its responses with those of the base model.
For evaluating your completed RFT model, see Evaluate your RFT model.