Returns the operations Resource.
activate(name, body=None, x__xgafv=None)
Activates an OnlineEvaluator.
Close httplib2 connections.
create(parent, body=None, x__xgafv=None)
Creates an OnlineEvaluator in the given project and location.
Deletes an OnlineEvaluator.
Gets details of an OnlineEvaluator.
list(parent, filter=None, orderBy=None, pageSize=None, pageToken=None, x__xgafv=None)
Lists the OnlineEvaluators for the given project and location.
Retrieves the next page of results.
patch(name, body=None, updateMask=None, x__xgafv=None)
Updates the fields of an OnlineEvaluator.
suspend(name, body=None, x__xgafv=None)
Suspends an OnlineEvaluator. When an OnlineEvaluator is suspended, it won't run any evaluations until it is activated again.
activate(name, body=None, x__xgafv=None)
Activates an OnlineEvaluator.
Args:
name: string, Required. The name of the OnlineEvaluator to activate. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}. (required)
body: object, The request body.
The object takes the form of:
{ # Request message for ActivateOnlineEvaluator.
}
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # This resource represents a long-running operation that is the result of a network API call.
"done": True or False, # If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
"error": { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors). # The error result of the operation in case of failure or cancellation.
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
],
"message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
},
"metadata": { # Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"name": "A String", # The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
"response": { # The normal, successful response of the operation. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name. For example, if the original method name is `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
}
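Since `activate` returns a long-running Operation of the shape above, callers typically poll until `done` is true and then branch on `error` versus `response`. A minimal sketch of that interpretation (the sample field values are hypothetical):

```python
def operation_result(op: dict):
    """Interpret an Operation dict: None while running, the response
    payload on success, an exception on failure."""
    if not op.get("done"):
        return None  # still in progress; re-fetch the operation and retry
    if "error" in op:
        err = op["error"]
        raise RuntimeError(
            f"operation {op.get('name')} failed "
            f"(code {err.get('code')}): {err.get('message')}"
        )
    return op.get("response", {})

# Hypothetical finished operation.
op = {
    "name": "projects/p/locations/l/operations/123",
    "done": True,
    "response": {"name": "projects/p/locations/l/onlineEvaluators/ev1"},
}
print(operation_result(op)["name"])
```

The same helper applies to the Operation returned by the other mutating methods on this resource (`create`, `suspend`, and so on).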
close()
Close httplib2 connections.
create(parent, body=None, x__xgafv=None)
Creates an OnlineEvaluator in the given project and location.
Args:
parent: string, Required. The parent resource where the OnlineEvaluator will be created. Format: projects/{project}/locations/{location}. (required)
body: object, The request body.
The object takes the form of:
{ # An OnlineEvaluator contains the configuration for an Online Evaluation.
"agentResource": "A String", # Required. Immutable. The name of the agent that the OnlineEvaluator evaluates periodically. This value is used to filter the traces with a matching cloud.resource_id and link the evaluation results with relevant dashboards/UIs. This field is immutable. Once set, it cannot be changed.
"cloudObservability": { # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging). # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging).
"logView": "A String", # Optional. Optional log view that will be used to query logs. If empty, the `_Default` view will be used.
"openTelemetry": { # Configuration for data source following OpenTelemetry. # Data source follows OpenTelemetry convention.
"semconvVersion": "A String", # Required. Defines which version OTel Semantic Convention the data follows. Can be "1.39.0" or newer.
},
"traceScope": { # If chosen, the online evaluator will evaluate single traces matching specified `filter`. # Scope online evaluation to single traces.
"filter": [ # Optional. A list of predicates to filter traces. Multiple predicates are combined using AND. The maximum number of predicates is 10.
{ # Defines a single filter predicate.
"duration": { # Defines a predicate for filtering based on a numeric value. # Filter on the duration of a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
"totalTokenUsage": { # Defines a predicate for filtering based on a numeric value. # Filter on the total token usage within a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
},
],
},
"traceView": "A String", # Optional. Optional trace view that will be used to query traces. If empty, the `_Default` view will be used.
},
"config": { # Configuration for sampling behavior of the OnlineEvaluator. The OnlineEvaluator runs at a fixed interval of 10 minutes. # Required. Configuration for the OnlineEvaluator.
"maxEvaluatedSamplesPerRun": "A String", # Optional. The maximum number of evaluations to perform per run. If set to 0, the number is unbounded.
"randomSampling": { # Configuration for random sampling. # Random sampling method.
"percentage": 42, # Required. The percentage of traces to sample for evaluation. Must be an integer between `1` and `100`.
},
},
"createTime": "A String", # Output only. Timestamp when the OnlineEvaluator was created.
"displayName": "A String", # Optional. Human-readable name for the `OnlineEvaluator`. The name doesn't have to be unique. The name can consist of any UTF-8 characters. The maximum length is `63` characters. If the display name exceeds max characters, an `INVALID_ARGUMENT` error is returned.
"metricSources": [ # Required. A list of metric sources to be used for evaluating samples. At least one MetricSource must be provided. Right now, only predefined metrics and registered metrics are supported. Every registered metric must have `display_name` (or `title`) and `score_range` defined. Otherwise, the evaluations will fail. The maximum number of `metric_sources` is 25.
{ # The metric source used for evaluation.
"metric": { # The metric used for running evaluations. # Inline metric config.
"aggregationMetrics": [ # Optional. The aggregation metrics to use.
"A String",
],
"bleuSpec": { # Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to reference - returns a score ranging between 0 to 1. # Spec for bleu metric.
"useEffectiveOrder": True or False, # Optional. Whether to use_effective_order to compute bleu score.
},
"computationBasedMetricSpec": { # Specification for a computation based metric. # Spec for a computation based metric.
"parameters": { # Optional. A map of parameters for the metric, e.g. {"rouge_type": "rougeL"}.
"a_key": "", # Properties of the object.
},
"type": "A String", # Required. The type of the computation based metric.
},
"customCodeExecutionSpec": { # Specificies a metric that is populated by evaluating user-defined Python code. # Spec for Custom Code Execution metric.
"evaluationFunction": "A String", # Required. Python function. Expected user to define the following function, e.g.: def evaluate(instance: dict[str, Any]) -> float: Please include this function signature in the code snippet. Instance is the evaluation instance, any fields populated in the instance are available to the function as instance[field_name]. Example: Example input: ``` instance= EvaluationInstance( response=EvaluationInstance.InstanceData(text="The answer is 4."), reference=EvaluationInstance.InstanceData(text="4") ) ``` Example converted input: ``` { 'response': {'text': 'The answer is 4.'}, 'reference': {'text': '4'} } ``` Example python function: ``` def evaluate(instance: dict[str, Any]) -> float: if instance'response' == instance'reference': return 1.0 return 0.0 ``` CustomCodeExecutionSpec is also supported in Batch Evaluation (EvalDataset RPC) and Tuning Evaluation. Each line in the input jsonl file will be converted to dict[str, Any] and passed to the evaluation function.
},
"exactMatchSpec": { # Spec for exact match metric - returns 1 if prediction and reference exactly matches, otherwise 0. # Spec for exact match metric.
},
"llmBasedMetricSpec": { # Specification for an LLM based metric. # Spec for an LLM based metric.
"additionalConfig": { # Optional. Optional additional configuration for the metric.
"a_key": "", # Properties of the object.
},
"judgeAutoraterConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Optional. Optional configuration for the judge LLM (Autorater).
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The number of thoughts tokens that the model should generate.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next word from the 40 most likely words.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model considers tokens until the cumulative probability of the tokens to select from reaches 0.9. It's recommended to adjust either temperature or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
"metricPromptTemplate": "A String", # Required. Template for the prompt sent to the judge model.
"predefinedRubricGenerationSpec": { # The spec for a pre-defined metric. # Dynamically generate rubrics using a predefined spec.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"resultParserConfig": { # Config for parsing LLM responses. It can be used to parse the LLM response to be evaluated, or the LLM response from LLM-based metrics/Autoraters. # Optional. The parser config for the metric result.
"customCodeParserConfig": { # Configuration for parsing the LLM response using custom code. # Optional. Use custom code to parse the LLM response.
"parsingFunction": "A String", # Required. Python function for parsing results. The function should be defined within this string. The function takes a list of strings (LLM responses) and should return either a list of dictionaries (for rubrics) or a single dictionary (for a metric result). Example function signature: def parse(responses: list[str]) -> list[dict[str, Any]] | dict[str, Any]: When parsing rubrics, return a list of dictionaries, where each dictionary represents a Rubric. Example for rubrics: [ { "content": {"property": {"description": "The response is factual."}}, "type": "FACTUALITY", "importance": "HIGH" }, { "content": {"property": {"description": "The response is fluent."}}, "type": "FLUENCY", "importance": "MEDIUM" } ] When parsing critique results, return a dictionary representing a MetricResult. Example for a metric result: { "score": 0.8, "explanation": "The model followed most instructions.", "rubric_verdicts": [...] } ... code for result extraction and aggregation
},
},
"rubricGenerationSpec": { # Specification for how rubrics should be generated. # Dynamically generate rubrics using this specification.
"modelConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The number of thoughts tokens that the model should generate.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next word from the 40 most likely words.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model considers tokens until the cumulative probability of the tokens to select from reaches 0.9. It's recommended to adjust either temperature or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
"promptTemplate": "A String", # Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
"rubricContentType": "A String", # The type of rubric content to be generated.
"rubricTypeOntology": [ # Optional. An optional, pre-defined list of allowed types for generated rubrics. If this field is provided, it implies `include_rubric_type` should be true, and the generated rubric types should be chosen from this ontology.
"A String",
],
},
"rubricGroupKey": "A String", # Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubric_groups map of EvaluationInstance.
"systemInstruction": "A String", # Optional. System instructions for the judge model.
},
"metadata": { # Metadata about the metric, used for visualization and organization. # Optional. Metadata about the metric, used for visualization and organization.
"otherMetadata": { # Optional. Flexible metadata for user-defined attributes.
"a_key": "", # Properties of the object.
},
"scoreRange": { # The range of possible scores for this metric, used for plotting. # Optional. The range of possible scores for this metric, used for plotting.
"description": "A String", # Optional. The description of the score explaining the directionality etc.
"max": 3.14, # Required. The maximum value of the score range (inclusive).
"min": 3.14, # Required. The minimum value of the score range (inclusive).
"step": 3.14, # Optional. The distance between discrete steps in the range. If unset, the range is assumed to be continuous.
},
"title": "A String", # Optional. The user-friendly name for the metric. If not set for a registered metric, it will default to the metric's display name.
},
"pairwiseMetricSpec": { # Spec for pairwise metric. # Spec for pairwise metric.
"baselineResponseFieldName": "A String", # Optional. The field name of the baseline response.
"candidateResponseFieldName": "A String", # Optional. The field name of the candidate response.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the `pairwise_choice` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pairwise metric.
"systemInstruction": "A String", # Optional. System instructions for pairwise metric.
},
"pointwiseMetricSpec": { # Spec for pointwise metric. # Spec for pointwise metric.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either: - The raw output string. - A parsed output based on a user-defined schema. If a custom format is chosen, the `score` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pointwise metric.
"systemInstruction": "A String", # Optional. System instructions for pointwise metric.
},
"predefinedMetricSpec": { # The spec for a pre-defined metric. # The spec for a pre-defined metric.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"rougeSpec": { # Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1. # Spec for rouge metric.
"rougeType": "A String", # Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
"splitSummaries": True or False, # Optional. Whether to split summaries while using rougeLsum.
"useStemmer": True or False, # Optional. Whether to use stemmer to compute rouge score.
},
},
"metricResourceName": "A String", # Resource name for registered metric.
},
],
"name": "A String", # Identifier. The resource name of the OnlineEvaluator. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}.
"state": "A String", # Output only. The state of the OnlineEvaluator.
"stateDetails": [ # Output only. Contains additional information about the state of the OnlineEvaluator. This is used to provide more details in the event of a failure.
{ # Contains additional information about the state of the OnlineEvaluator.
"message": "A String", # Output only. Human-readable message describing the state of the OnlineEvaluator.
},
],
"updateTime": "A String", # Output only. Timestamp when the OnlineEvaluator was last updated.
}
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # This resource represents a long-running operation that is the result of a network API call.
"done": True or False, # If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
"error": { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors). # The error result of the operation in case of failure or cancellation.
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
],
"message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
},
"metadata": { # Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"name": "A String", # The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
"response": { # The normal, successful response of the operation. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name. For example, if the original method name is `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
}
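The request body above is large, but only a handful of fields are required. Below is a minimal sketch of assembling one in Python; the project, location, agent path, and metric name are placeholder assumptions, and the commented-out submission call mirrors the google-api-python-client discovery pattern rather than a verified endpoint path.

```python
# Sketch: assemble a minimal OnlineEvaluator body for a create/patch call.
# All resource names below are placeholders, not real resources.

def make_online_evaluator_body(agent_resource: str) -> dict:
    """Build a minimal OnlineEvaluator request body."""
    return {
        "displayName": "demo-evaluator",            # <= 63 UTF-8 characters
        "agentResource": agent_resource,            # immutable once set
        "config": {
            "maxEvaluatedSamplesPerRun": "100",     # int64 fields are JSON strings
            "randomSampling": {"percentage": 10},   # integer in [1, 100]
        },
        "metricSources": [                          # at least one is required
            {"metric": {"predefinedMetricSpec": {
                "metricSpecName": "text_quality_v1"}}},
        ],
    }

body = make_online_evaluator_body(
    "projects/my-project/locations/us-central1/agents/my-agent")

# Hypothetical submission via the discovery client (not executed here):
# from googleapiclient.discovery import build
# service = build("aiplatform", "v1beta1")
# op = service.projects().locations().onlineEvaluators().create(
#     parent="projects/my-project/locations/us-central1", body=body).execute()
```

Note that int64-typed fields such as `maxEvaluatedSamplesPerRun` are carried as JSON strings, while `percentage` is a plain integer.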
delete(name, x__xgafv=None)
Deletes an OnlineEvaluator.
Args:
name: string, Required. The name of the OnlineEvaluator to delete. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}. (required)
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # This resource represents a long-running operation that is the result of a network API call.
"done": True or False, # If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
"error": { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors). # The error result of the operation in case of failure or cancellation.
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
],
"message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
},
"metadata": { # Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"name": "A String", # The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
"response": { # The normal, successful response of the operation. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name. For example, if the original method name is `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
}
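Each mutating method here returns a long-running Operation that must be polled until `done` is true, at which point either `error` or `response` is populated. A small helper for that loop might look like the following sketch; `get_op` stands in for whatever operations.get call your client exposes (the path in the docstring is an assumption, adjust it to your client).

```python
import time

def wait_for_operation(get_op, name: str, poll_seconds: float = 0.0,
                       max_polls: int = 100) -> dict:
    """Poll a long-running Operation until done; raise on error.

    `get_op` is any callable returning the current Operation resource
    as a dict, e.g. a hypothetical
    lambda n: service.projects().locations().operations().get(name=n).execute()
    """
    for _ in range(max_polls):
        op = get_op(name)
        if op.get("done"):
            if "error" in op:
                # google.rpc.Status: code, message, details
                raise RuntimeError(
                    f"Operation failed: {op['error'].get('message')}")
            return op.get("response", {})
        time.sleep(poll_seconds)
    raise TimeoutError(f"Operation {name} did not finish")
```

For a `delete`, the successful `response` is `google.protobuf.Empty`, so an empty dict is the expected result.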
get(name, x__xgafv=None)
Gets details of an OnlineEvaluator.
Args:
name: string, Required. The name of the OnlineEvaluator to retrieve. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}. (required)
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # An OnlineEvaluator contains the configuration for an Online Evaluation.
"agentResource": "A String", # Required. Immutable. The name of the agent that the OnlineEvaluator evaluates periodically. This value is used to filter the traces with a matching cloud.resource_id and link the evaluation results with relevant dashboards/UIs. This field is immutable. Once set, it cannot be changed.
"cloudObservability": { # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging). # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging).
"logView": "A String", # Optional. Optional log view that will be used to query logs. If empty, the `_Default` view will be used.
"openTelemetry": { # Configuration for data source following OpenTelemetry. # Data source follows OpenTelemetry convention.
"semconvVersion": "A String", # Required. Defines which version OTel Semantic Convention the data follows. Can be "1.39.0" or newer.
},
"traceScope": { # If chosen, the online evaluator will evaluate single traces matching specified `filter`. # Scope online evaluation to single traces.
"filter": [ # Optional. A list of predicates to filter traces. Multiple predicates are combined using AND. The maximum number of predicates is 10.
{ # Defines a single filter predicate.
"duration": { # Defines a predicate for filtering based on a numeric value. # Filter on the duration of a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
"totalTokenUsage": { # Defines a predicate for filtering based on a numeric value. # Filter on the total token usage within a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
},
],
},
"traceView": "A String", # Optional. Optional trace view that will be used to query traces. If empty, the `_Default` view will be used.
},
"config": { # Configuration for sampling behavior of the OnlineEvaluator. The OnlineEvaluator runs at a fixed interval of 10 minutes. # Required. Configuration for the OnlineEvaluator.
"maxEvaluatedSamplesPerRun": "A String", # Optional. The maximum number of evaluations to perform per run. If set to 0, the number is unbounded.
"randomSampling": { # Configuration for random sampling. # Random sampling method.
"percentage": 42, # Required. The percentage of traces to sample for evaluation. Must be an integer between `1` and `100`.
},
},
"createTime": "A String", # Output only. Timestamp when the OnlineEvaluator was created.
"displayName": "A String", # Optional. Human-readable name for the `OnlineEvaluator`. The name doesn't have to be unique. The name can consist of any UTF-8 characters. The maximum length is `63` characters. If the display name exceeds max characters, an `INVALID_ARGUMENT` error is returned.
"metricSources": [ # Required. A list of metric sources to be used for evaluating samples. At least one MetricSource must be provided. Right now, only predefined metrics and registered metrics are supported. Every registered metric must have `display_name` (or `title`) and `score_range` defined. Otherwise, the evaluations will fail. The maximum number of `metric_sources` is 25.
{ # The metric source used for evaluation.
"metric": { # The metric used for running evaluations. # Inline metric config.
"aggregationMetrics": [ # Optional. The aggregation metrics to use.
"A String",
],
"bleuSpec": { # Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to reference - returns a score ranging between 0 to 1. # Spec for bleu metric.
"useEffectiveOrder": True or False, # Optional. Whether to use_effective_order to compute bleu score.
},
"computationBasedMetricSpec": { # Specification for a computation based metric. # Spec for a computation based metric.
"parameters": { # Optional. A map of parameters for the metric, e.g. {"rouge_type": "rougeL"}.
"a_key": "", # Properties of the object.
},
"type": "A String", # Required. The type of the computation based metric.
},
"customCodeExecutionSpec": { # Specificies a metric that is populated by evaluating user-defined Python code. # Spec for Custom Code Execution metric.
"evaluationFunction": "A String", # Required. Python function. Expected user to define the following function, e.g.: def evaluate(instance: dict[str, Any]) -> float: Please include this function signature in the code snippet. Instance is the evaluation instance, any fields populated in the instance are available to the function as instance[field_name]. Example: Example input: ``` instance= EvaluationInstance( response=EvaluationInstance.InstanceData(text="The answer is 4."), reference=EvaluationInstance.InstanceData(text="4") ) ``` Example converted input: ``` { 'response': {'text': 'The answer is 4.'}, 'reference': {'text': '4'} } ``` Example python function: ``` def evaluate(instance: dict[str, Any]) -> float: if instance'response' == instance'reference': return 1.0 return 0.0 ``` CustomCodeExecutionSpec is also supported in Batch Evaluation (EvalDataset RPC) and Tuning Evaluation. Each line in the input jsonl file will be converted to dict[str, Any] and passed to the evaluation function.
},
"exactMatchSpec": { # Spec for exact match metric - returns 1 if prediction and reference exactly matches, otherwise 0. # Spec for exact match metric.
},
"llmBasedMetricSpec": { # Specification for an LLM based metric. # Spec for an LLM based metric.
"additionalConfig": { # Optional. Optional additional configuration for the metric.
"a_key": "", # Properties of the object.
},
"judgeAutoraterConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Optional. Optional configuration for the judge LLM (Autorater).
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
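Tying together the `responseSchema` fields listed above, here is a sketch of a schema constraining the judge model to a small JSON object. The property names (`score`, `verdict`) are illustrative assumptions, not part of the API.

```python
# A responseSchema value built from the fields documented above.
# Per the responseSchema docs, responseMimeType must be set to
# "application/json" whenever a schema is provided.
response_schema = {
    "type": "OBJECT",
    "description": "A single evaluation verdict.",
    "properties": {
        "score": {"type": "NUMBER", "minimum": 0.0, "maximum": 1.0},
        "verdict": {"type": "STRING", "format": "enum", "enum": ["PASS", "FAIL"]},
    },
    "required": ["score", "verdict"],
    # Non-standard OpenAPI field; controls property order in the output.
    "propertyOrdering": ["score", "verdict"],
}

generation_config = {
    "responseMimeType": "application/json",
    "responseSchema": response_schema,
    "temperature": 0.0,
}
```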
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The number of thoughts tokens that the model should generate.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next word from the 40 most likely words.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model considers tokens until the cumulative probability of the tokens to select from reaches 0.9. It's recommended to adjust either temperature or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
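Assembling the fields above, a minimal `judgeAutoraterConfig` might look like the sketch below. The project, location, and model id are illustrative assumptions; only the structure and the documented defaults come from the field descriptions.

```python
# A judgeAutoraterConfig dictionary matching the fields documented above.
judge_autorater_config = {
    # Publisher model format per the autoraterModel docs:
    # projects/{project}/locations/{location}/publishers/*/models/*
    "autoraterModel": (
        "projects/my-project/locations/us-central1/"
        "publishers/google/models/gemini-2.0-flash"
    ),
    # Deterministic judging: fixed seed and zero temperature.
    "generationConfig": {"temperature": 0.0, "seed": 42},
    # Default per the samplingCount docs; valid range is 1-32.
    "samplingCount": 4,
}
```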
"metricPromptTemplate": "A String", # Required. Template for the prompt sent to the judge model.
"predefinedRubricGenerationSpec": { # The spec for a pre-defined metric. # Dynamically generate rubrics using a predefined spec.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"resultParserConfig": { # Config for parsing LLM responses. It can be used to parse the LLM response to be evaluated, or the LLM response from LLM-based metrics/Autoraters. # Optional. The parser config for the metric result.
"customCodeParserConfig": { # Configuration for parsing the LLM response using custom code. # Optional. Use custom code to parse the LLM response.
"parsingFunction": "A String", # Required. Python function for parsing results. The function should be defined within this string. The function takes a list of strings (LLM responses) and should return either a list of dictionaries (for rubrics) or a single dictionary (for a metric result). Example function signature: def parse(responses: list[str]) -> list[dict[str, Any]] | dict[str, Any]: When parsing rubrics, return a list of dictionaries, where each dictionary represents a Rubric. Example for rubrics: [ { "content": {"property": {"description": "The response is factual."}}, "type": "FACTUALITY", "importance": "HIGH" }, { "content": {"property": {"description": "The response is fluent."}}, "type": "FLUENCY", "importance": "MEDIUM" } ] When parsing critique results, return a dictionary representing a MetricResult. Example for a metric result: { "score": 0.8, "explanation": "The model followed most instructions.", "rubric_verdicts": [...] } ... code for result extraction and aggregation
},
},
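A sketch of a `parsingFunction` matching the signature documented above, here returning a single metric-result dictionary. The `Score: <value>` response format is an assumption for illustration; real judge output depends on your `metricPromptTemplate`.

```python
import re
from typing import Any


def parse(responses: list[str]) -> dict[str, Any]:
    """Parses judge responses like 'Score: 0.8' and aggregates them
    into a single MetricResult-shaped dictionary."""
    scores = []
    for response in responses:
        # Assumed judge output format: the score follows a "Score:" marker.
        match = re.search(r"Score:\s*([0-9]*\.?[0-9]+)", response)
        if match:
            scores.append(float(match.group(1)))
    if not scores:
        return {"score": 0.0, "explanation": "No score found in responses."}
    # Aggregate multiple sampled judgments by averaging.
    return {
        "score": sum(scores) / len(scores),
        "explanation": f"Averaged over {len(scores)} sampled judgments.",
    }
```

When parsing rubrics instead of a critique result, the function would return a list of rubric dictionaries, as shown in the `parsingFunction` field description.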
"rubricGenerationSpec": { # Specification for how rubrics should be generated. # Dynamically generate rubrics using this specification.
"modelConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The number of thoughts tokens that the model should generate.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next word from the 40 most likely words.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model considers tokens until the cumulative probability of the tokens to select from reaches 0.9. It's recommended to adjust either temperature or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
"promptTemplate": "A String", # Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
"rubricContentType": "A String", # The type of rubric content to be generated.
"rubricTypeOntology": [ # Optional. An optional, pre-defined list of allowed types for generated rubrics. If this field is provided, it implies `include_rubric_type` should be true, and the generated rubric types should be chosen from this ontology.
"A String",
],
},
"rubricGroupKey": "A String", # Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubric_groups map of EvaluationInstance.
"systemInstruction": "A String", # Optional. System instructions for the judge model.
},
"metadata": { # Metadata about the metric, used for visualization and organization. # Optional. Metadata about the metric, used for visualization and organization.
"otherMetadata": { # Optional. Flexible metadata for user-defined attributes.
"a_key": "", # Properties of the object.
},
"scoreRange": { # The range of possible scores for this metric, used for plotting. # Optional. The range of possible scores for this metric, used for plotting.
"description": "A String", # Optional. The description of the score explaining the directionality etc.
"max": 3.14, # Required. The maximum value of the score range (inclusive).
"min": 3.14, # Required. The minimum value of the score range (inclusive).
"step": 3.14, # Optional. The distance between discrete steps in the range. If unset, the range is assumed to be continuous.
},
"title": "A String", # Optional. The user-friendly name for the metric. If not set for a registered metric, it will default to the metric's display name.
},
"pairwiseMetricSpec": { # Spec for pairwise metric. # Spec for pairwise metric.
"baselineResponseFieldName": "A String", # Optional. The field name of the baseline response.
"candidateResponseFieldName": "A String", # Optional. The field name of the candidate response.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the `pairwise_choice` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pairwise metric.
"systemInstruction": "A String", # Optional. System instructions for pairwise metric.
},
"pointwiseMetricSpec": { # Spec for pointwise metric. # Spec for pointwise metric.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either: - The raw output string. - A parsed output based on a user-defined schema. If a custom format is chosen, the `score` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pointwise metric.
"systemInstruction": "A String", # Optional. System instructions for pointwise metric.
},
"predefinedMetricSpec": { # The spec for a pre-defined metric. # The spec for a pre-defined metric.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"rougeSpec": { # Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1. # Spec for rouge metric.
"rougeType": "A String", # Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
"splitSummaries": True or False, # Optional. Whether to split summaries while using rougeLsum.
"useStemmer": True or False, # Optional. Whether to use stemmer to compute rouge score.
},
},
"metricResourceName": "A String", # Resource name for registered metric.
},
],
"name": "A String", # Identifier. The resource name of the OnlineEvaluator. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}.
"state": "A String", # Output only. The state of the OnlineEvaluator.
"stateDetails": [ # Output only. Contains additional information about the state of the OnlineEvaluator. This is used to provide more details in the event of a failure.
{ # Contains additional information about the state of the OnlineEvaluator.
"message": "A String", # Output only. Human-readable message describing the state of the OnlineEvaluator.
},
],
"updateTime": "A String", # Output only. Timestamp when the OnlineEvaluator was last updated.
}
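The `list` method documented below returns results in pages linked by `nextPageToken`; absence of the token marks the last page. A minimal, hedged sketch of that pagination contract follows, with a stub standing in for the real `onlineEvaluators().list(...).execute()` call (the client-library method names are assumptions here, not part of this sketch):

```python
# Stub pages mimicking the ListOnlineEvaluators nextPageToken semantics.
PAGES = {
    None: {"onlineEvaluators": [{"name": "ev-1"}, {"name": "ev-2"}],
           "nextPageToken": "tok-2"},
    "tok-2": {"onlineEvaluators": [{"name": "ev-3"}]},  # no nextPageToken: last page
}

def fake_list(parent, pageToken=None, pageSize=50):
    """Stand-in for onlineEvaluators().list(...).execute()."""
    return PAGES[pageToken]

def list_all_evaluators(parent):
    """Follow nextPageToken until it is absent, collecting every evaluator."""
    evaluators, token = [], None
    while True:
        resp = fake_list(parent, pageToken=token)
        evaluators.extend(resp.get("onlineEvaluators", []))
        token = resp.get("nextPageToken")
        if not token:  # absence of the field indicates no subsequent pages
            return evaluators

names = [e["name"] for e in list_all_evaluators("projects/p/locations/us-central1")]
```

With the real client, the equivalent loop is usually expressed via the generated `list_next(previous_request, previous_response)` helper rather than threading `pageToken` by hand.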
list(parent, filter=None, orderBy=None, pageSize=None, pageToken=None, x__xgafv=None)
Lists the OnlineEvaluators for the given project and location.
Args:
parent: string, Required. The parent resource of the OnlineEvaluators to list. Format: projects/{project}/locations/{location}. (required)
filter: string, Optional. Standard list filter. Supported fields: * `create_time` * `update_time` * `agent_resource` Example: `create_time>"2026-01-01T00:00:00-04:00"` (where the timestamp is in RFC 3339 format). Based on aip.dev/160.
orderBy: string, Optional. A comma-separated list of fields to order by. The default sorting order is ascending. Use "desc" after a field name for descending. Supported fields: * `create_time` * `update_time` Example: `create_time desc`. Based on aip.dev/132.
pageSize: integer, Optional. The maximum number of OnlineEvaluators to return. The service may return fewer than this value. If unspecified, at most 50 OnlineEvaluators will be returned. The maximum value is 100; values above 100 will be coerced to 100. Based on aip.dev/158.
pageToken: string, Optional. A token identifying a page of results the server should return. Based on aip.dev/158.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # Response message for ListOnlineEvaluators.
"nextPageToken": "A String", # A token to retrieve the next page. Absence of this field indicates there are no subsequent pages.
"onlineEvaluators": [ # A list of OnlineEvaluators matching the request.
{ # An OnlineEvaluator contains the configuration for an Online Evaluation.
"agentResource": "A String", # Required. Immutable. The name of the agent that the OnlineEvaluator evaluates periodically. This value is used to filter the traces with a matching cloud.resource_id and link the evaluation results with relevant dashboards/UIs. This field is immutable. Once set, it cannot be changed.
"cloudObservability": { # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging). # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging).
"logView": "A String", # Optional. Optional log view that will be used to query logs. If empty, the `_Default` view will be used.
"openTelemetry": { # Configuration for data source following OpenTelemetry. # Data source follows OpenTelemetry convention.
"semconvVersion": "A String", # Required. Defines which version OTel Semantic Convention the data follows. Can be "1.39.0" or newer.
},
"traceScope": { # If chosen, the online evaluator will evaluate single traces matching specified `filter`. # Scope online evaluation to single traces.
"filter": [ # Optional. A list of predicates to filter traces. Multiple predicates are combined using AND. The maximum number of predicates is 10.
{ # Defines a single filter predicate.
"duration": { # Defines a predicate for filtering based on a numeric value. # Filter on the duration of a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
"totalTokenUsage": { # Defines a predicate for filtering based on a numeric value. # Filter on the total token usage within a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
},
],
},
"traceView": "A String", # Optional. Optional trace view that will be used to query traces. If empty, the `_Default` view will be used.
},
"config": { # Configuration for sampling behavior of the OnlineEvaluator. The OnlineEvaluator runs at a fixed interval of 10 minutes. # Required. Configuration for the OnlineEvaluator.
"maxEvaluatedSamplesPerRun": "A String", # Optional. The maximum number of evaluations to perform per run. If set to 0, the number is unbounded.
"randomSampling": { # Configuration for random sampling. # Random sampling method.
"percentage": 42, # Required. The percentage of traces to sample for evaluation. Must be an integer between `1` and `100`.
},
},
"createTime": "A String", # Output only. Timestamp when the OnlineEvaluator was created.
"displayName": "A String", # Optional. Human-readable name for the `OnlineEvaluator`. The name doesn't have to be unique. The name can consist of any UTF-8 characters. The maximum length is `63` characters. If the display name exceeds max characters, an `INVALID_ARGUMENT` error is returned.
"metricSources": [ # Required. A list of metric sources to be used for evaluating samples. At least one MetricSource must be provided. Right now, only predefined metrics and registered metrics are supported. Every registered metric must have `display_name` (or `title`) and `score_range` defined. Otherwise, the evaluations will fail. The maximum number of `metric_sources` is 25.
{ # The metric source used for evaluation.
"metric": { # The metric used for running evaluations. # Inline metric config.
"aggregationMetrics": [ # Optional. The aggregation metrics to use.
"A String",
],
"bleuSpec": { # Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to reference - returns a score ranging between 0 to 1. # Spec for bleu metric.
"useEffectiveOrder": True or False, # Optional. Whether to use_effective_order to compute bleu score.
},
"computationBasedMetricSpec": { # Specification for a computation based metric. # Spec for a computation based metric.
"parameters": { # Optional. A map of parameters for the metric, e.g. {"rouge_type": "rougeL"}.
"a_key": "", # Properties of the object.
},
"type": "A String", # Required. The type of the computation based metric.
},
"customCodeExecutionSpec": { # Specificies a metric that is populated by evaluating user-defined Python code. # Spec for Custom Code Execution metric.
"evaluationFunction": "A String", # Required. Python function. Expected user to define the following function, e.g.: def evaluate(instance: dict[str, Any]) -> float: Please include this function signature in the code snippet. Instance is the evaluation instance, any fields populated in the instance are available to the function as instance[field_name]. Example: Example input: ``` instance= EvaluationInstance( response=EvaluationInstance.InstanceData(text="The answer is 4."), reference=EvaluationInstance.InstanceData(text="4") ) ``` Example converted input: ``` { 'response': {'text': 'The answer is 4.'}, 'reference': {'text': '4'} } ``` Example python function: ``` def evaluate(instance: dict[str, Any]) -> float: if instance'response' == instance'reference': return 1.0 return 0.0 ``` CustomCodeExecutionSpec is also supported in Batch Evaluation (EvalDataset RPC) and Tuning Evaluation. Each line in the input jsonl file will be converted to dict[str, Any] and passed to the evaluation function.
},
"exactMatchSpec": { # Spec for exact match metric - returns 1 if prediction and reference exactly matches, otherwise 0. # Spec for exact match metric.
},
"llmBasedMetricSpec": { # Specification for an LLM based metric. # Spec for an LLM based metric.
"additionalConfig": { # Optional. Optional additional configuration for the metric.
"a_key": "", # Properties of the object.
},
"judgeAutoraterConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Optional. Optional configuration for the judge LLM (Autorater).
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The number of thoughts tokens that the model should generate.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next word from the 40 most likely words.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model considers tokens until the cumulative probability of the tokens to select from reaches 0.9. It's recommended to adjust either temperature or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
"metricPromptTemplate": "A String", # Required. Template for the prompt sent to the judge model.
"predefinedRubricGenerationSpec": { # The spec for a pre-defined metric. # Dynamically generate rubrics using a predefined spec.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"resultParserConfig": { # Config for parsing LLM responses. It can be used to parse the LLM response to be evaluated, or the LLM response from LLM-based metrics/Autoraters. # Optional. The parser config for the metric result.
"customCodeParserConfig": { # Configuration for parsing the LLM response using custom code. # Optional. Use custom code to parse the LLM response.
"parsingFunction": "A String", # Required. Python function for parsing results. The function should be defined within this string. The function takes a list of strings (LLM responses) and should return either a list of dictionaries (for rubrics) or a single dictionary (for a metric result). Example function signature: def parse(responses: list[str]) -> list[dict[str, Any]] | dict[str, Any]: When parsing rubrics, return a list of dictionaries, where each dictionary represents a Rubric. Example for rubrics: [ { "content": {"property": {"description": "The response is factual."}}, "type": "FACTUALITY", "importance": "HIGH" }, { "content": {"property": {"description": "The response is fluent."}}, "type": "FLUENCY", "importance": "MEDIUM" } ] When parsing critique results, return a dictionary representing a MetricResult. Example for a metric result: { "score": 0.8, "explanation": "The model followed most instructions.", "rubric_verdicts": [...] } ... code for result extraction and aggregation
},
},
"rubricGenerationSpec": { # Specification for how rubrics should be generated. # Dynamically generate rubrics using this specification.
"modelConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The number of thoughts tokens that the model should generate.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next word from the 40 most likely words.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model considers tokens until the cumulative probability of the tokens to select from reaches 0.9. It's recommended to adjust either temperature or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
"promptTemplate": "A String", # Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
"rubricContentType": "A String", # The type of rubric content to be generated.
"rubricTypeOntology": [ # Optional. An optional, pre-defined list of allowed types for generated rubrics. If this field is provided, it implies `include_rubric_type` should be true, and the generated rubric types should be chosen from this ontology.
"A String",
],
},
"rubricGroupKey": "A String", # Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubric_groups map of EvaluationInstance.
"systemInstruction": "A String", # Optional. System instructions for the judge model.
},
"metadata": { # Metadata about the metric, used for visualization and organization. # Optional. Metadata about the metric, used for visualization and organization.
"otherMetadata": { # Optional. Flexible metadata for user-defined attributes.
"a_key": "", # Properties of the object.
},
"scoreRange": { # The range of possible scores for this metric, used for plotting. # Optional. The range of possible scores for this metric, used for plotting.
"description": "A String", # Optional. The description of the score explaining the directionality etc.
"max": 3.14, # Required. The maximum value of the score range (inclusive).
"min": 3.14, # Required. The minimum value of the score range (inclusive).
"step": 3.14, # Optional. The distance between discrete steps in the range. If unset, the range is assumed to be continuous.
},
"title": "A String", # Optional. The user-friendly name for the metric. If not set for a registered metric, it will default to the metric's display name.
},
"pairwiseMetricSpec": { # Spec for pairwise metric. # Spec for pairwise metric.
"baselineResponseFieldName": "A String", # Optional. The field name of the baseline response.
"candidateResponseFieldName": "A String", # Optional. The field name of the candidate response.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the `pairwise_choice` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pairwise metric.
"systemInstruction": "A String", # Optional. System instructions for pairwise metric.
},
"pointwiseMetricSpec": { # Spec for pointwise metric. # Spec for pointwise metric.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either: - The raw output string. - A parsed output based on a user-defined schema. If a custom format is chosen, the `score` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pointwise metric.
"systemInstruction": "A String", # Optional. System instructions for pointwise metric.
},
"predefinedMetricSpec": { # The spec for a pre-defined metric. # The spec for a pre-defined metric.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"rougeSpec": { # Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1. # Spec for rouge metric.
"rougeType": "A String", # Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
"splitSummaries": True or False, # Optional. Whether to split summaries while using rougeLsum.
"useStemmer": True or False, # Optional. Whether to use stemmer to compute rouge score.
},
},
"metricResourceName": "A String", # Resource name for registered metric.
},
],
"name": "A String", # Identifier. The resource name of the OnlineEvaluator. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}.
"state": "A String", # Output only. The state of the OnlineEvaluator.
"stateDetails": [ # Output only. Contains additional information about the state of the OnlineEvaluator. This is used to provide more details in the event of a failure.
{ # Contains additional information about the state of the OnlineEvaluator.
"message": "A String", # Output only. Human-readable message describing the state of the OnlineEvaluator.
},
],
"updateTime": "A String", # Output only. Timestamp when the OnlineEvaluator was last updated.
},
],
}
list_next()
Retrieves the next page of results.
Args:
previous_request: The request for the previous page. (required)
previous_response: The response from the request for the previous page. (required)
Returns:
A request object that you can call 'execute()' on to request the next
page. Returns None if there are no more items in the collection.
patch(name, body=None, updateMask=None, x__xgafv=None)
Updates the fields of an OnlineEvaluator.
Args:
name: string, Identifier. The resource name of the OnlineEvaluator. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}. (required)
body: object, The request body.
The object takes the form of:
{ # An OnlineEvaluator contains the configuration for an Online Evaluation.
"agentResource": "A String", # Required. Immutable. The name of the agent that the OnlineEvaluator evaluates periodically. This value is used to filter the traces with a matching cloud.resource_id and link the evaluation results with relevant dashboards/UIs. This field is immutable. Once set, it cannot be changed.
"cloudObservability": { # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging). # Data source for the OnlineEvaluator, based on GCP Observability stack (Cloud Trace & Cloud Logging).
"logView": "A String", # Optional. Optional log view that will be used to query logs. If empty, the `_Default` view will be used.
"openTelemetry": { # Configuration for data source following OpenTelemetry. # Data source follows OpenTelemetry convention.
"semconvVersion": "A String", # Required. Defines which version OTel Semantic Convention the data follows. Can be "1.39.0" or newer.
},
"traceScope": { # If chosen, the online evaluator will evaluate single traces matching specified `filter`. # Scope online evaluation to single traces.
"filter": [ # Optional. A list of predicates to filter traces. Multiple predicates are combined using AND. The maximum number of predicates is 10.
{ # Defines a single filter predicate.
"duration": { # Defines a predicate for filtering based on a numeric value. # Filter on the duration of a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
"totalTokenUsage": { # Defines a predicate for filtering based on a numeric value. # Filter on the total token usage within a trace.
"comparisonOperator": "A String", # Required. The comparison operator to apply.
"value": 3.14, # Required. The value to compare against.
},
},
],
},
"traceView": "A String", # Optional. Optional trace view that will be used to query traces. If empty, the `_Default` view will be used.
},
"config": { # Configuration for sampling behavior of the OnlineEvaluator. The OnlineEvaluator runs at a fixed interval of 10 minutes. # Required. Configuration for the OnlineEvaluator.
"maxEvaluatedSamplesPerRun": "A String", # Optional. The maximum number of evaluations to perform per run. If set to 0, the number is unbounded.
"randomSampling": { # Configuration for random sampling. # Random sampling method.
"percentage": 42, # Required. The percentage of traces to sample for evaluation. Must be an integer between `1` and `100`.
},
},
"createTime": "A String", # Output only. Timestamp when the OnlineEvaluator was created.
"displayName": "A String", # Optional. Human-readable name for the `OnlineEvaluator`. The name doesn't have to be unique. The name can consist of any UTF-8 characters. The maximum length is `63` characters. If the display name exceeds max characters, an `INVALID_ARGUMENT` error is returned.
"metricSources": [ # Required. A list of metric sources to be used for evaluating samples. At least one MetricSource must be provided. Right now, only predefined metrics and registered metrics are supported. Every registered metric must have `display_name` (or `title`) and `score_range` defined. Otherwise, the evaluations will fail. The maximum number of `metric_sources` is 25.
{ # The metric source used for evaluation.
"metric": { # The metric used for running evaluations. # Inline metric config.
"aggregationMetrics": [ # Optional. The aggregation metrics to use.
"A String",
],
"bleuSpec": { # Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to reference - returns a score ranging between 0 to 1. # Spec for bleu metric.
"useEffectiveOrder": True or False, # Optional. Whether to use_effective_order to compute bleu score.
},
"computationBasedMetricSpec": { # Specification for a computation based metric. # Spec for a computation based metric.
"parameters": { # Optional. A map of parameters for the metric, e.g. {"rouge_type": "rougeL"}.
"a_key": "", # Properties of the object.
},
"type": "A String", # Required. The type of the computation based metric.
},
"customCodeExecutionSpec": { # Specificies a metric that is populated by evaluating user-defined Python code. # Spec for Custom Code Execution metric.
"evaluationFunction": "A String", # Required. Python function. Expected user to define the following function, e.g.: def evaluate(instance: dict[str, Any]) -> float: Please include this function signature in the code snippet. Instance is the evaluation instance, any fields populated in the instance are available to the function as instance[field_name]. Example: Example input: ``` instance= EvaluationInstance( response=EvaluationInstance.InstanceData(text="The answer is 4."), reference=EvaluationInstance.InstanceData(text="4") ) ``` Example converted input: ``` { 'response': {'text': 'The answer is 4.'}, 'reference': {'text': '4'} } ``` Example python function: ``` def evaluate(instance: dict[str, Any]) -> float: if instance'response' == instance'reference': return 1.0 return 0.0 ``` CustomCodeExecutionSpec is also supported in Batch Evaluation (EvalDataset RPC) and Tuning Evaluation. Each line in the input jsonl file will be converted to dict[str, Any] and passed to the evaluation function.
},
"exactMatchSpec": { # Spec for exact match metric - returns 1 if prediction and reference exactly matches, otherwise 0. # Spec for exact match metric.
},
"llmBasedMetricSpec": { # Specification for an LLM based metric. # Spec for an LLM based metric.
"additionalConfig": { # Optional. Optional additional configuration for the metric.
"a_key": "", # Properties of the object.
},
"judgeAutoraterConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Optional. Optional configuration for the judge LLM (Autorater).
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The mimetype of the voice sample. The only currently supported value is `audio/wav`. This represents 16-bit signed little-endian wav data, with a 24kHz sampling rate. `mime_type` will default to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The number of thoughts tokens that the model should generate.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next word from the 40 most likely words.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model considers tokens until the cumulative probability of the tokens to select from reaches 0.9. It's recommended to adjust either temperature or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
"metricPromptTemplate": "A String", # Required. Template for the prompt sent to the judge model.
"predefinedRubricGenerationSpec": { # The spec for a pre-defined metric. # Dynamically generate rubrics using a predefined spec.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"resultParserConfig": { # Config for parsing LLM responses. It can be used to parse the LLM response to be evaluated, or the LLM response from LLM-based metrics/Autoraters. # Optional. The parser config for the metric result.
"customCodeParserConfig": { # Configuration for parsing the LLM response using custom code. # Optional. Use custom code to parse the LLM response.
"parsingFunction": "A String", # Required. Python function for parsing results. The function should be defined within this string. The function takes a list of strings (LLM responses) and should return either a list of dictionaries (for rubrics) or a single dictionary (for a metric result). Example function signature: def parse(responses: list[str]) -> list[dict[str, Any]] | dict[str, Any]: When parsing rubrics, return a list of dictionaries, where each dictionary represents a Rubric. Example for rubrics: [ { "content": {"property": {"description": "The response is factual."}}, "type": "FACTUALITY", "importance": "HIGH" }, { "content": {"property": {"description": "The response is fluent."}}, "type": "FLUENCY", "importance": "MEDIUM" } ] When parsing critique results, return a dictionary representing a MetricResult. Example for a metric result: { "score": 0.8, "explanation": "The model followed most instructions.", "rubric_verdicts": [...] } ... code for result extraction and aggregation
},
},
"rubricGenerationSpec": { # Specification for how rubrics should be generated. # Dynamically generate rubrics using this specification.
"modelConfig": { # The configs for autorater. This is applicable to both EvaluateInstances and EvaluateDataset. # Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
"autoraterModel": "A String", # Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: `projects/{project}/locations/{location}/publishers/*/models/*` Tuned model endpoint format: `projects/{project}/locations/{location}/endpoints/{endpoint}`
"flipEnabled": True or False, # Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias.
"generationConfig": { # Configuration for content generation. This message contains all the parameters that control how the model generates content. It allows you to influence the randomness, length, and structure of the output. # Optional. Configuration options for model generation and outputs.
"audioTimestamp": True or False, # Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response.
"candidateCount": 42, # Optional. The number of candidate responses to generate. A higher `candidate_count` can provide more options to choose from, but it also consumes more resources. This can be useful for generating a variety of responses and selecting the best one.
"enableAffectiveDialog": True or False, # Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response.
"frequencyPenalty": 3.14, # Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values can range from [-2.0, 2.0].
"imageConfig": { # Configuration for image generation. This message allows you to control various aspects of image generation, such as the output format, aspect ratio, and whether the model can generate images of people. # Optional. Config for image generation features.
"aspectRatio": "A String", # Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1" "2:3", "3:2" "3:4", "4:3" "4:5", "5:4" "9:16", "16:9" "21:9"
"imageOutputOptions": { # The image output format for generated images. # Optional. The image output format for generated images.
"compressionQuality": 42, # Optional. The compression quality of the output image.
"mimeType": "A String", # Optional. The image format that the output should be saved as.
},
"imageSize": "A String", # Optional. Specifies the size of generated images. Supported values are `1K`, `2K`, `4K`. If not specified, the model will use default value `1K`.
"personGeneration": "A String", # Optional. Controls whether the model can generate people.
"prominentPeople": "A String", # Optional. Controls whether prominent people (celebrities) generation is allowed. If used with personGeneration, personGeneration enum would take precedence. For instance, if ALLOW_NONE is set, all person generation would be blocked. If this field is unspecified, the default behavior is to allow prominent people.
},
"logprobs": 42, # Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response.
"maxOutputTokens": 42, # Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses.
"mediaResolution": "A String", # Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model.
"modelConfig": { # Config for model selection. # Optional. Config for model selection.
"featureSelectionPreference": "A String", # Required. Feature selection preference.
},
"presencePenalty": 3.14, # Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values can range from [-2.0, 2.0].
"responseJsonSchema": "", # Optional. When this field is set, response_schema must be omitted and response_mime_type must be set to `application/json`.
"responseLogprobs": True or False, # Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging.
"responseMimeType": "A String", # Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
"responseModalities": [ # Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to `[TEXT, IMAGE]`, the response will include both text and an image.
"A String",
],
"responseSchema": { # Defines the schema of input and output data. This is a subset of the [OpenAPI 3.0 Schema Object](https://spec.openapis.org/oas/v3.0.3#schema-object). # Optional. Lets you to specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the [OpenAPI 3.0 schema object](https://spec.openapis.org/oas/v3.0.3#schema) object. When this field is set, you must also set the `response_mime_type` to `application/json`.
"additionalProperties": "", # Optional. If `type` is `OBJECT`, specifies how to handle properties not defined in `properties`. If it is a boolean `false`, no additional properties are allowed. If it is a schema, additional properties are allowed if they conform to the schema.
"anyOf": [ # Optional. The instance must be valid against any (one or more) of the subschemas listed in `any_of`.
# Object with schema name: GoogleCloudAiplatformV1beta1Schema
],
"default": "", # Optional. Default value to use if the field is not specified.
"defs": { # Optional. `defs` provides a map of schema definitions that can be reused by `ref` elsewhere in the schema. Only allowed at root level of the schema.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"description": "A String", # Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt.
"enum": [ # Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set `format` to `enum` and provide the list of possible values in `enum`. For example: 1. To define directions: `{type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH", "WEST"]}` 2. To define apartment numbers: `{type:INTEGER, format:enum, enum:["101", "201", "301"]}`
"A String",
],
"example": "", # Optional. Example of an instance of this schema.
"format": "A String", # Optional. The format of the data. For `NUMBER` type, format can be `float` or `double`. For `INTEGER` type, format can be `int32` or `int64`. For `STRING` type, format can be `email`, `byte`, `date`, `date-time`, `password`, and other formats to further refine the data type.
"items": # Object with schema name: GoogleCloudAiplatformV1beta1Schema # Optional. If type is `ARRAY`, `items` specifies the schema of elements in the array.
"maxItems": "A String", # Optional. If type is `ARRAY`, `max_items` specifies the maximum number of items in an array.
"maxLength": "A String", # Optional. If type is `STRING`, `max_length` specifies the maximum length of the string.
"maxProperties": "A String", # Optional. If type is `OBJECT`, `max_properties` specifies the maximum number of properties that can be provided.
"maximum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `maximum` specifies the maximum allowed value.
"minItems": "A String", # Optional. If type is `ARRAY`, `min_items` specifies the minimum number of items in an array.
"minLength": "A String", # Optional. If type is `STRING`, `min_length` specifies the minimum length of the string.
"minProperties": "A String", # Optional. If type is `OBJECT`, `min_properties` specifies the minimum number of properties that can be provided.
"minimum": 3.14, # Optional. If type is `INTEGER` or `NUMBER`, `minimum` specifies the minimum allowed value.
"nullable": True or False, # Optional. Indicates if the value of this field can be null.
"pattern": "A String", # Optional. If type is `STRING`, `pattern` specifies a regular expression that the string must match.
"properties": { # Optional. If type is `OBJECT`, `properties` is a map of property names to schema definitions for each property of the object.
"a_key": # Object with schema name: GoogleCloudAiplatformV1beta1Schema
},
"propertyOrdering": [ # Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties.
"A String",
],
"ref": "A String", # Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in `defs`. For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring
"required": [ # Optional. If type is `OBJECT`, `required` lists the names of properties that must be present.
"A String",
],
"title": "A String", # Optional. Title for the schema.
"type": "A String", # Optional. Data type of the schema field.
},
"routingConfig": { # The configuration for routing the request to a specific model. This can be used to control which model is used for the generation, either automatically or by specifying a model name. # Optional. Routing configuration.
"autoMode": { # The configuration for automated routing. When automated routing is specified, the routing will be determined by the pretrained routing model and customer provided model routing preference. # In this mode, the model is selected automatically based on the content of the request.
"modelRoutingPreference": "A String", # The model routing preference.
},
"manualMode": { # The configuration for manual routing. When manual routing is specified, the model will be selected based on the model name provided. # In this mode, the model is specified manually.
"modelName": "A String", # The name of the model to use. Only public LLM models are accepted.
},
},
"seed": 42, # Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will produce the same response every time. However, it's not a guaranteed absolute deterministic behavior. This is different from parameters like `temperature`, which control the *level* of randomness. `seed` ensures that the "random" choices the model makes are the same on every run, making it essential for testing and ensuring reproducible results.
"speechConfig": { # Configuration for speech generation. # Optional. The speech generation config.
"languageCode": "A String", # Optional. The language code (ISO 639-1) for the speech synthesis.
"multiSpeakerVoiceConfig": { # Configuration for a multi-speaker text-to-speech request. # The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with `voice_config`.
"speakerVoiceConfigs": [ # Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided.
{ # Configuration for a single speaker in a multi-speaker setup.
"speaker": "A String", # Required. The name of the speaker. This should be the same as the speaker name used in the prompt.
"voiceConfig": { # Configuration for a voice. # Required. The configuration for the voice of this speaker.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The MIME type of the voice sample. The only currently supported value is `audio/wav`, representing 16-bit signed little-endian WAV data with a 24kHz sampling rate. `mime_type` defaults to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
],
},
"voiceConfig": { # Configuration for a voice. # The configuration for the voice to use.
"prebuiltVoiceConfig": { # Configuration for a prebuilt voice. # The configuration for a prebuilt voice.
"voiceName": "A String", # The name of the prebuilt voice to use.
},
"replicatedVoiceConfig": { # The configuration for the replicated voice to use. # Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample.
"mimeType": "A String", # Optional. The MIME type of the voice sample. The only currently supported value is `audio/wav`, representing 16-bit signed little-endian WAV data with a 24kHz sampling rate. `mime_type` defaults to `audio/wav` if not set.
"voiceSampleAudio": "A String", # Optional. The sample of the custom voice.
},
},
},
"stopSequences": [ # Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker.
"A String",
],
"temperature": 3.14, # Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0].
"thinkingConfig": { # Configuration for the model's thinking features. "Thinking" is a process where the model breaks down a complex task into smaller, manageable steps. This allows the model to reason about the task, plan its approach, and execute the plan to generate a high-quality response. # Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking.
"includeThoughts": True or False, # Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available.
"thinkingBudget": 42, # Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency.
"thinkingLevel": "A String", # Optional. The level of thinking the model should use, which influences how many thought tokens it generates.
},
"topK": 3.14, # Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a `top_k` of 40 means the model will choose the next token from the 40 most likely tokens.
"topP": 3.14, # Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least `top_p`. This helps generate more diverse and less repetitive responses. For example, a `top_p` of 0.9 means the model samples from the smallest set of tokens whose cumulative probability reaches 0.9. It's recommended to adjust either `temperature` or `top_p`, but not both.
},
"samplingCount": 42, # Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32.
},
"promptTemplate": "A String", # Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
"rubricContentType": "A String", # The type of rubric content to be generated.
"rubricTypeOntology": [ # Optional. An optional, pre-defined list of allowed types for generated rubrics. If this field is provided, it implies `include_rubric_type` should be true, and the generated rubric types should be chosen from this ontology.
"A String",
],
},
"rubricGroupKey": "A String", # Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubric_groups map of EvaluationInstance.
"systemInstruction": "A String", # Optional. System instructions for the judge model.
},
"metadata": { # Metadata about the metric, used for visualization and organization. # Optional. Metadata about the metric, used for visualization and organization.
"otherMetadata": { # Optional. Flexible metadata for user-defined attributes.
"a_key": "", # Properties of the object.
},
"scoreRange": { # The range of possible scores for this metric, used for plotting. # Optional. The range of possible scores for this metric, used for plotting.
"description": "A String", # Optional. The description of the score explaining the directionality etc.
"max": 3.14, # Required. The maximum value of the score range (inclusive).
"min": 3.14, # Required. The minimum value of the score range (inclusive).
"step": 3.14, # Optional. The distance between discrete steps in the range. If unset, the range is assumed to be continuous.
},
"title": "A String", # Optional. The user-friendly name for the metric. If not set for a registered metric, it will default to the metric's display name.
},
"pairwiseMetricSpec": { # Spec for pairwise metric. # Spec for pairwise metric.
"baselineResponseFieldName": "A String", # Optional. The field name of the baseline response.
"candidateResponseFieldName": "A String", # Optional. The field name of the candidate response.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the `pairwise_choice` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pairwise metric.
"systemInstruction": "A String", # Optional. System instructions for pairwise metric.
},
"pointwiseMetricSpec": { # Spec for pointwise metric. # Spec for pointwise metric.
"customOutputFormatConfig": { # Spec for custom output format configuration. # Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either: - The raw output string. - A parsed output based on a user-defined schema. If a custom format is chosen, the `score` and `explanation` fields in the corresponding metric result will be empty.
"returnRawOutput": True or False, # Optional. Whether to return raw output.
},
"metricPromptTemplate": "A String", # Required. Metric prompt template for pointwise metric.
"systemInstruction": "A String", # Optional. System instructions for pointwise metric.
},
"predefinedMetricSpec": { # The spec for a pre-defined metric. # The spec for a pre-defined metric.
"metricSpecName": "A String", # Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
"metricSpecParameters": { # Optional. The parameters needed to run the pre-defined metric.
"a_key": "", # Properties of the object.
},
},
"rougeSpec": { # Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1. # Spec for rouge metric.
"rougeType": "A String", # Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
"splitSummaries": True or False, # Optional. Whether to split summaries while using rougeLsum.
"useStemmer": True or False, # Optional. Whether to use stemmer to compute rouge score.
},
},
"metricResourceName": "A String", # Resource name for registered metric.
},
],
"name": "A String", # Identifier. The resource name of the OnlineEvaluator. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}.
"state": "A String", # Output only. The state of the OnlineEvaluator.
"stateDetails": [ # Output only. Contains additional information about the state of the OnlineEvaluator. This is used to provide more details in the event of a failure.
{ # Contains additional information about the state of the OnlineEvaluator.
"message": "A String", # Output only. Human-readable message describing the state of the OnlineEvaluator.
},
],
"updateTime": "A String", # Output only. Timestamp when the OnlineEvaluator was last updated.
}
updateMask: string, Optional. Field mask is used to control which fields get updated. If the mask is not present, all fields will be updated.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # This resource represents a long-running operation that is the result of a network API call.
"done": True or False, # If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
"error": { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors). # The error result of the operation in case of failure or cancellation.
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
],
"message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
},
"metadata": { # Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"name": "A String", # The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
"response": { # The normal, successful response of the operation. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name. For example, if the original method name is `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
}
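Since `patch` (like `activate` and `suspend`) returns a long-running Operation, the caller must inspect the `done`, `error`, and `response` fields to learn the outcome. The sketch below is an assumption of this guide, not part of the client library: the commented-out client setup (service name, version, and the `metrics` update mask) is hypothetical and should be adapted to your build, while the `operation_result` helper is pure Python operating on Operation dicts shaped like the Returns object above.

```python
# Hypothetical client setup (adapt the API name/version to your environment):
# from googleapiclient import discovery
# service = discovery.build("aiplatform", "v1beta1")
# op = service.projects().locations().onlineEvaluators().patch(
#     name="projects/my-proj/locations/us-central1/onlineEvaluators/my-eval",
#     body={"metrics": [...]},   # fields to update (hypothetical example)
#     updateMask="metrics",      # only the masked fields are changed
# ).execute()

def operation_result(op: dict):
    """Classify a long-running Operation dict.

    Returns ("pending", None) while the operation is still running,
    ("error", status_dict) on failure, or ("done", response_dict) on success.
    """
    if not op.get("done"):
        return ("pending", None)
    if "error" in op:
        return ("error", op["error"])
    return ("done", op.get("response"))

# Operation dicts shaped like the Returns object documented above:
pending = {"name": "operations/op-123", "done": False}
failed = {"name": "operations/op-123", "done": True,
          "error": {"code": 3, "message": "invalid updateMask"}}
print(operation_result(pending))  # ('pending', None)
print(operation_result(failed))
```

Note that per the Operation contract, exactly one of `error` or `response` is populated once `done` is `true`, so checking `error` first is sufficient.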
suspend(name, body=None, x__xgafv=None)
Suspends an OnlineEvaluator. When an OnlineEvaluator is suspended, it won't run any evaluations until it is activated again.
Args:
name: string, Required. The name of the OnlineEvaluator to suspend. Format: projects/{project}/locations/{location}/onlineEvaluators/{id}. (required)
body: object, The request body.
The object takes the form of:
{ # Request message for SuspendOnlineEvaluator.
}
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
Returns:
An object of the form:
{ # This resource represents a long-running operation that is the result of a network API call.
"done": True or False, # If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
"error": { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors). # The error result of the operation in case of failure or cancellation.
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
],
"message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
},
"metadata": { # Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"name": "A String", # The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
"response": { # The normal, successful response of the operation. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name. For example, if the original method name is `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
}
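Both `suspend` and `activate` take a `name` argument in the documented format `projects/{project}/locations/{location}/onlineEvaluators/{id}`. A small client-side check (this helper is an assumption of this guide, not part of the client library) lets malformed names fail fast before a request is made:

```python
import re

# Matches the documented resource-name format:
# projects/{project}/locations/{location}/onlineEvaluators/{id}
_NAME_RE = re.compile(r"projects/[^/]+/locations/[^/]+/onlineEvaluators/[^/]+")

def is_valid_evaluator_name(name: str) -> bool:
    """Return True if name has the documented OnlineEvaluator format."""
    return bool(_NAME_RE.fullmatch(name))

print(is_valid_evaluator_name(
    "projects/my-proj/locations/us-central1/onlineEvaluators/eval-1"))  # True
print(is_valid_evaluator_name("onlineEvaluators/eval-1"))  # False
```

A failed check saves a round trip; the server would otherwise reject the call with an `INVALID_ARGUMENT`-style error in the Operation's `error` field or in the RPC response.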