Interface ReinforcementTuningHyperParameters

Hyperparameters for Reinforcement Tuning.

interface ReinforcementTuningHyperParameters {
    adapterSize?: AdapterSize;
    batchSize?: number;
    checkpointInterval?: number;
    epochCount?: string;
    evaluateInterval?: number;
    learningRateMultiplier?: number;
    maxOutputTokens?: number;
    samplesPerPrompt?: number;
    thinkingLevel?: ReinforcementTuningThinkingLevel;
}
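A minimal sketch of filling in these hyperparameters as a plain object literal. To keep it self-contained, `AdapterSize` and `ReinforcementTuningThinkingLevel` are replaced by hypothetical `string` stand-ins rather than the SDK's own exported types. The example shows that every field is optional and that `epochCount` is declared as a `string`.

```typescript
// Hypothetical stand-ins so the sketch compiles on its own; the real SDK
// exports its own AdapterSize and ReinforcementTuningThinkingLevel types.
type AdapterSize = string;
type ReinforcementTuningThinkingLevel = string;

interface ReinforcementTuningHyperParameters {
  adapterSize?: AdapterSize;
  batchSize?: number;
  checkpointInterval?: number;
  epochCount?: string;
  evaluateInterval?: number;
  learningRateMultiplier?: number;
  maxOutputTokens?: number;
  samplesPerPrompt?: number;
  thinkingLevel?: ReinforcementTuningThinkingLevel;
}

// Set only the fields you want to override; omitted fields fall back to
// service-side behavior (e.g. batchSize is chosen automatically).
const hyperParams: ReinforcementTuningHyperParameters = {
  epochCount: "3", // typed as a string, not a number
  learningRateMultiplier: 1.0,
  samplesPerPrompt: 4,
  maxOutputTokens: 8192,
};

// Converting epochCount back to a number needs an explicit parse.
const epochs: number | undefined =
  hyperParams.epochCount !== undefined ? Number(hyperParams.epochCount) : undefined;
```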

Properties

adapterSize?: AdapterSize

Adapter size for Reinforcement Tuning.

batchSize?: number

Batch size for the tuning job: the number of prompts processed in each training step. If not set, the batch size is determined automatically.
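Because each training step processes `batchSize` prompts, the number of steps in one epoch can be estimated from the dataset size. This is a sketch only: `stepsPerEpoch` is a hypothetical helper, and it assumes that a final partial batch still counts as a step, which this page does not specify.

```typescript
// Hypothetical helper: estimated training steps per epoch, assuming each step
// consumes `batchSize` prompts and a trailing partial batch counts as one step.
function stepsPerEpoch(numPrompts: number, batchSize: number): number {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return Math.ceil(numPrompts / batchSize);
}

// 1000 prompts with a batch size of 16: 62 full batches plus one partial batch.
const steps = stepsPerEpoch(1000, 16); // 63
```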

checkpointInterval?: number

How often (in steps) to save checkpoints during training. If not set, one checkpoint per epoch will be saved.

epochCount?: string

Number of training epochs for the tuning job.

evaluateInterval?: number

How often (in steps) to evaluate the tuning job during training. If not set, evaluation runs once per epoch.
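`checkpointInterval` and `evaluateInterval` follow the same pattern: a fixed step interval when set, and once per epoch when omitted. The sketch below models that shared rule with a hypothetical `scheduledSteps` helper. It assumes that epoch boundaries fall on multiples of the steps-per-epoch count and does not model whether the service adds an extra checkpoint or evaluation at the very end of training.

```typescript
// Hypothetical helper: 1-based training steps at which a periodic action
// (checkpointing or evaluation) fires. An explicit interval means "every
// `interval` steps"; an undefined interval falls back to once per epoch.
function scheduledSteps(
  totalSteps: number,
  stepsPerEpoch: number,
  interval?: number,
): number[] {
  const every = interval ?? stepsPerEpoch;
  if (!Number.isInteger(every) || every <= 0) {
    throw new RangeError("interval must be a positive integer");
  }
  const steps: number[] = [];
  for (let step = every; step <= totalSteps; step += every) {
    steps.push(step);
  }
  return steps;
}

// 3 epochs of 50 steps each = 150 steps in total.
const checkpoints = scheduledSteps(150, 50);     // interval unset: [50, 100, 150]
const evaluations = scheduledSteps(150, 50, 40); // evaluateInterval = 40: [40, 80, 120]
```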

learningRateMultiplier?: number

Learning rate multiplier for Reinforcement Learning.

maxOutputTokens?: number

The maximum number of tokens to generate per prompt. If not set, defaults to 32768.
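A minimal sketch of how the documented default resolves when `maxOutputTokens` is omitted. `effectiveMaxOutputTokens` is a hypothetical helper for illustration; only the default value of 32768 comes from this page.

```typescript
// Documented default used when maxOutputTokens is not set.
const DEFAULT_MAX_OUTPUT_TOKENS = 32768;

// Hypothetical helper: the effective per-prompt generation limit.
// `??` only falls back on undefined/null, so an explicit value is always kept.
function effectiveMaxOutputTokens(params: { maxOutputTokens?: number }): number {
  return params.maxOutputTokens ?? DEFAULT_MAX_OUTPUT_TOKENS;
}

const unsetLimit = effectiveMaxOutputTokens({});                        // 32768
const customLimit = effectiveMaxOutputTokens({ maxOutputTokens: 4096 }); // 4096
```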

samplesPerPrompt?: number

Number of different responses to generate per prompt during tuning.
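`samplesPerPrompt` multiplies the generation work in each step. If every prompt in a batch is sampled `samplesPerPrompt` times, one step produces `batchSize × samplesPerPrompt` responses, and each response can use up to `maxOutputTokens` tokens. The hypothetical helper below computes that upper bound. It is back-of-the-envelope arithmetic under those assumptions, not a statement of how the service schedules generation.

```typescript
// Hypothetical helper: upper bound on sampled responses and generated tokens
// in one training step, assuming each of the `batchSize` prompts is sampled
// `samplesPerPrompt` times and each response may use up to `maxOutputTokens`.
function generationBudgetPerStep(
  batchSize: number,
  samplesPerPrompt: number,
  maxOutputTokens = 32768, // documented default for maxOutputTokens
): { responses: number; maxTokens: number } {
  const responses = batchSize * samplesPerPrompt;
  return { responses, maxTokens: responses * maxOutputTokens };
}

const budget = generationBudgetPerStep(16, 4, 8192);
// budget.responses === 64, budget.maxTokens === 524288
```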

thinkingLevel?: ReinforcementTuningThinkingLevel

Indicates the maximum thinking depth. Using this setting with earlier models results in an error.