Configuration for a single reinforcement tuning reward.
autorater (optional): Scores parsed responses for autorater use cases, using a model to compute the reward.
cloud (optional): Scores parsed responses by calling a Cloud Run service.
code (optional): Scores parsed responses for code execution use cases.
parse (optional): Defines how to parse the sample response.
reward (optional): A unique reward name that identifies each individual reinforcement tuning reward.
string (optional): Scores parsed responses for simple string-matching use cases against a reference answer, without writing Python code.
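The reward field is described as a unique name, so when several reward configs are combined, a client-side check for repeated names can catch mistakes early. The sketch below is illustrative only: `RewardConfigSketch` and `findDuplicateRewardNames` are hypothetical names, the scorer and parse fields are typed as `unknown` because their shapes are not described here, and the sample names and empty scorer objects are placeholders rather than real SDK values.

```typescript
// Hypothetical local type mirroring the fields listed above. It is NOT the
// SDK's own type; scorer and parse shapes are unknown, so they use `unknown`.
interface RewardConfigSketch {
  autorater?: unknown;
  cloud?: unknown;
  code?: unknown;
  parse?: unknown;
  reward?: string; // unique reward name
  string?: unknown;
}

// Returns every reward name that appears more than once across the configs.
// Configs that do not set a reward name are skipped.
function findDuplicateRewardNames(configs: RewardConfigSketch[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const config of configs) {
    const name = config.reward;
    if (name === undefined) continue;
    if (seen.has(name)) {
      duplicates.add(name);
    } else {
      seen.add(name);
    }
  }
  return Array.from(duplicates);
}

// Placeholder configs: the empty objects stand in for real scorer settings.
const configs: RewardConfigSketch[] = [
  { reward: "exact_match", string: {} },
  { reward: "unit_tests", code: {} },
  { reward: "exact_match", autorater: {} },
];

console.log(findDuplicateRewardNames(configs)); // returns ["exact_match"]
```

Whether the service itself rejects duplicate names, and how, is not stated here; the check only mirrors the uniqueness requirement in the field description.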