Optional adapter: Adapter size for Reinforcement Tuning.
Optional batch: Batch size for the tuning job, that is, the number of prompts processed per training step. If not set, the batch size is determined automatically.
Optional checkpoint: How often (in steps) to save checkpoints during training. If not set, one checkpoint is saved per epoch.
Optional epoch: Number of training epochs for the tuning job.
Optional evaluate: How often (in steps) to evaluate the tuning job during training. If not set, evaluation runs once per epoch.
Optional learning: Learning rate multiplier for Reinforcement Learning.
Optional max: The maximum number of tokens to generate per prompt. If not set, defaults to 32768.
Optional samples: Number of different responses to generate per prompt during tuning.
Optional thinking: Indicates the maximum thinking depth. Using this setting with earlier models results in an error.
Hyperparameters for Reinforcement Tuning.
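The fallback behavior described for the unset fields can be sketched as follows. This is a minimal illustration, not the SDK's own types: the class `ReinforcementHyperparameters`, its field names, and both helper functions are hypothetical, since the property names in this listing are truncated. Only the documented defaults (32768 tokens, per-epoch checkpointing and evaluation, automatic batch size) come from the descriptions above.

```python
from dataclasses import dataclass
from typing import Optional

# Documented default for the maximum tokens generated per prompt.
DEFAULT_MAX_TOKENS = 32768


@dataclass
class ReinforcementHyperparameters:
    """Hypothetical container; field names are illustrative, not SDK identifiers."""
    epoch_count: Optional[int] = None
    batch_size: Optional[int] = None           # None: determined automatically
    checkpoint_interval: Optional[int] = None  # None: one checkpoint per epoch
    evaluate_interval: Optional[int] = None    # None: evaluation once per epoch
    max_output_tokens: Optional[int] = None    # None: falls back to 32768
    samples_per_prompt: Optional[int] = None


def effective_max_tokens(hp: ReinforcementHyperparameters) -> int:
    """Return the token limit that applies, using the documented default."""
    if hp.max_output_tokens is None:
        return DEFAULT_MAX_TOKENS
    return hp.max_output_tokens


def describe_schedule(hp: ReinforcementHyperparameters) -> dict:
    """Summarize how unset fields are interpreted, per the descriptions."""
    def interval(steps: Optional[int]) -> str:
        return "once per epoch" if steps is None else f"every {steps} steps"

    return {
        "batch": "automatic" if hp.batch_size is None else str(hp.batch_size),
        "checkpoint": interval(hp.checkpoint_interval),
        "evaluate": interval(hp.evaluate_interval),
        "max_tokens": effective_max_tokens(hp),
    }
```

For example, `describe_schedule(ReinforcementHyperparameters(checkpoint_interval=50))` reports checkpoints every 50 steps while every other entry falls back to its documented default.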