Generates recommended data quality rules based on the results of a data profiling scan. Use the recommendations to build rules for a data quality scan.

Method Details

close()

Close httplib2 connections.

generateDataQualityRules(name, body=None, x__xgafv=None)

Generates recommended data quality rules based on the results of a data profiling scan. Use the recommendations to build rules for a data quality scan.

Args:
name: string, Required. The name must be one of the following: (1) the name of a data scan with at least one successful, completed data profiling job, or (2) the name of a successful, completed data profiling job (a data scan job where the job type is data profiling). (required)
body: object, The request body.
The object takes the form of:

{ # Request details for generating data quality rule recommendations.
}

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # Response details for data quality rule recommendations.
"rule": [ # The data quality rules that Dataplex generates based on the results of a data profiling scan.
{ # A rule captures data quality intent about a data source.
"column": "A String", # Optional. The unnested column which this rule is evaluated against.
"description": "A String", # Optional. Description of the rule. The maximum length is 1,024 characters.
"dimension": "A String", # Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"
"ignoreNull": True or False, # Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing.This field is only valid for the following type of rules: RangeExpectation RegexExpectation SetExpectation UniquenessExpectation
"name": "A String", # Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.
"nonNullExpectation": { # Evaluates whether each column value is null. # Row-level rule which evaluates whether each column value is null.
},
"rangeExpectation": { # Evaluates whether each column value lies between a specified range. # Row-level rule which evaluates whether each column value lies between a specified range.
"maxValue": "A String", # Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"strictMaxEnabled": True or False, # Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"regexExpectation": { # Evaluates whether each column value matches a specified regex. # Row-level rule which evaluates whether each column value matches a specified regex.
"regex": "A String", # Optional. A regular expression the column value is expected to match.
},
"rowConditionExpectation": { # Evaluates whether each row passes the specified condition.The SQL expression needs to use BigQuery standard SQL syntax and should produce a boolean value per row as the result.Example: col1 >= 0 AND col2 < 10 # Row-level rule which evaluates whether each row in a table passes the specified condition.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"setExpectation": { # Evaluates whether each column value is contained by a specified set. # Row-level rule which evaluates whether each column value is contained by a specified set.
"values": [ # Optional. Expected values for the column value.
"A String",
],
},
"sqlAssertion": { # A SQL statement that is evaluated to return rows that match an invalid state. If any rows are are returned, this rule fails.The SQL statement must use BigQuery standard SQL syntax, and must not contain any semicolons.You can use the data reference parameter ${data()} to reference the source table with all of its precondition filters applied. Examples of precondition filters include row filters, incremental data filters, and sampling. For more information, see Data reference parameter (https://cloud.google.com/dataplex/docs/auto-data-quality-overview#data-reference-parameter).Example: SELECT * FROM ${data()} WHERE price < 0 # Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
"sqlStatement": "A String", # Optional. The SQL statement.
},
"statisticRangeExpectation": { # Evaluates whether the column aggregate statistic lies between a specified range. # Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.
"maxValue": "A String", # Optional. The maximum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"statistic": "A String", # Optional. The aggregate metric to evaluate.
"strictMaxEnabled": True or False, # Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"suspended": True or False, # Optional. Whether the Rule is active or suspended. Default is false.
"tableConditionExpectation": { # Evaluates whether the provided expression is true.The SQL expression needs to use BigQuery standard SQL syntax and should produce a scalar boolean result.Example: MIN(col1) >= 0 # Aggregate rule which evaluates whether the provided expression is true for a table.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"threshold": 3.14, # Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of 0.0, 1.0.0 indicates default value (i.e. 1.0).This field is only valid for row-level type rules.
"uniquenessExpectation": { # Evaluates whether the column has duplicates. # Row-level rule which evaluates whether each column value is unique.
},
},
],
}
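The response shape above can be post-processed with plain dictionary code. The following is a minimal sketch, not part of the client library: the helper names `effective_threshold` and `group_rules_by_dimension` and the sample response are hypothetical, and only the field names ("rule", "dimension", "threshold", and so on) come from the response documented above. It assumes the response has already been obtained, for example by calling `execute()` on the request object returned by `generateDataQualityRules`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def effective_threshold(rule: Dict[str, Any]) -> float:
    """Return the pass ratio a row-level rule must reach.

    Per the "threshold" field description, 0 (or an absent field)
    stands for the default value of 1.0.
    """
    threshold = rule.get("threshold", 0)
    return 1.0 if threshold == 0 else float(threshold)


def group_rules_by_dimension(
    response: Dict[str, Any],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group the recommended rules by their required "dimension" field."""
    grouped = defaultdict(list)
    for rule in response.get("rule", []):
        grouped[rule["dimension"]].append(rule)
    return dict(grouped)


# Hypothetical response, hand-written for illustration only.
sample_response = {
    "rule": [
        {
            "column": "price",
            "dimension": "VALIDITY",
            "rangeExpectation": {"minValue": "0"},
            "threshold": 0.95,
        },
        {"column": "id", "dimension": "UNIQUENESS", "uniquenessExpectation": {}},
        {"column": "id", "dimension": "COMPLETENESS", "nonNullExpectation": {}},
    ]
}

grouped = group_rules_by_dimension(sample_response)
for dimension, rules in sorted(grouped.items()):
    for rule in rules:
        print(dimension, rule["column"], effective_threshold(rule))
```

Grouping by dimension mirrors how results are later aggregated at the dimension level, so it can be a convenient way to review recommendations before copying them into the rules list of a data quality scan.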

get(name, view=None, x__xgafv=None)

Gets a DataScanJob resource.

Args:
name: string, Required. The resource name of the DataScanJob: projects/{project}/locations/{location_id}/dataScans/{data_scan_id}/jobs/{data_scan_job_id} where project refers to a project_id or project_number and location_id refers to a GCP region. (required)
view: string, Optional. Select the DataScanJob view to return. Defaults to BASIC.
Allowed values
DATA_SCAN_JOB_VIEW_UNSPECIFIED - The API will default to the BASIC view.
BASIC - Basic view that does not include spec and result.
FULL - Include everything.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
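
The arguments above can be assembled and checked before a request is made. This is a rough sketch under stated assumptions: the helper names `data_scan_job_name` and `build_get_kwargs` and the example identifiers are hypothetical, while the resource name pattern and the allowed view values are taken from the argument descriptions above. The resulting dictionary is meant to be passed as keyword arguments to `get`.

```python
from typing import Dict, Optional

# Allowed values for the "view" argument, as listed in the Args section.
ALLOWED_VIEWS = ("DATA_SCAN_JOB_VIEW_UNSPECIFIED", "BASIC", "FULL")


def data_scan_job_name(
    project: str, location_id: str, data_scan_id: str, data_scan_job_id: str
) -> str:
    """Build the DataScanJob resource name in the documented format."""
    return (
        f"projects/{project}/locations/{location_id}"
        f"/dataScans/{data_scan_id}/jobs/{data_scan_job_id}"
    )


def build_get_kwargs(name: str, view: Optional[str] = None) -> Dict[str, str]:
    """Return keyword arguments for get(), rejecting unknown view values."""
    if view is not None and view not in ALLOWED_VIEWS:
        raise ValueError(f"view must be one of {ALLOWED_VIEWS}, got {view!r}")
    kwargs = {"name": name}
    if view is not None:
        # Omitting view lets the API fall back to the BASIC view.
        kwargs["view"] = view
    return kwargs


# Hypothetical identifiers, for illustration only.
name = data_scan_job_name("my-project", "us-central1", "my-scan", "my-job")
kwargs = build_get_kwargs(name, view="FULL")
print(kwargs)
```

Requesting the FULL view is what makes the spec and result fields appear in the returned object, since the BASIC view does not include them.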

Returns:
An object of the form:

{ # A DataScanJob represents an instance of DataScan execution.
"createTime": "A String", # Output only. The time when the DataScanJob was created.
"dataDiscoveryResult": { # The output of a data discovery scan. # Output only. The result of a data discovery scan.
"bigqueryPublishing": { # Describes BigQuery publishing configurations. # Output only. Configuration for metadata publishing.
"dataset": "A String", # Output only. The BigQuery dataset to publish to. It takes the form projects/{project_id}/datasets/{dataset_id}. If not set, the service creates a default publishing dataset.
},
},
"dataDiscoverySpec": { # Spec for a data discovery scan. # Output only. Settings for a data discovery scan.
"bigqueryPublishingConfig": { # Describes BigQuery publishing configurations. # Optional. Configuration for metadata publishing.
"connection": "A String", # Optional. The BigQuery connection used to create BigLake tables. Must be in the form projects/{project_id}/locations/{location_id}/connections/{connection_id}
"tableType": "A String", # Optional. Determines whether to publish discovered tables as BigLake external tables or non-BigLake external tables.
},
"storageConfig": { # Configurations related to Cloud Storage as the data source. # Cloud Storage related configurations.
"csvOptions": { # Describes CSV and similar semi-structured data formats. # Optional. Configuration for CSV data.
"delimiter": "A String", # Optional. The delimiter that is used to separate values. The default is , (comma).
"encoding": "A String", # Optional. The character encoding of the data. The default is UTF-8.
"headerRows": 42, # Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.
"quote": "A String", # Optional. The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).
"typeInferenceDisabled": True or False, # Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.
},
"excludePatterns": [ # Optional. Defines the data to exclude during discovery. Provide a list of patterns that identify the data to exclude. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
"A String",
],
"includePatterns": [ # Optional. Defines the data to include during discovery when only a subset of the data should be considered. Provide a list of patterns that identify the data to include. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
"A String",
],
"jsonOptions": { # Describes JSON data format. # Optional. Configuration for JSON data.
"encoding": "A String", # Optional. The character encoding of the data. The default is UTF-8.
"typeInferenceDisabled": True or False, # Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (strings, number, or boolean).
},
},
},
"dataProfileResult": { # DataProfileResult defines the output of DataProfileScan. Each field of the table will have field type specific profile result. # Output only. The result of a data profile scan.
"postScanActionsResult": { # The result of post scan actions of DataProfileScan job. # Output only. The result of post scan actions.
"bigqueryExportResult": { # The result of BigQuery export post scan action. # Output only. The result of BigQuery export post scan action.
"message": "A String", # Output only. Additional information about the BigQuery exporting.
"state": "A String", # Output only. Execution state for the BigQuery exporting.
},
},
"profile": { # Contains name, type, mode and field type specific profile information. # The profile information per field.
"fields": [ # List of fields with structural and profile information for each field.
{ # A field within a table.
"mode": "A String", # The mode of the field. Possible values include: REQUIRED, if it is a required field. NULLABLE, if it is an optional field. REPEATED, if it is a repeated field.
"name": "A String", # The name of the field.
"profile": { # The profile information for each field type. # Profile information for the corresponding field.
"distinctRatio": 3.14, # Ratio of rows with distinct values against total scanned rows. Not available for complex non-groupable field type, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATABLE mode.
"doubleProfile": { # The profile information for a double type field. # Double type field information.
"average": 3.14, # Average of non-null values in the scanned data. NaN, if the field has a NaN.
"max": 3.14, # Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
"min": 3.14, # Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
"quartiles": [ # A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. Three main quartiles used are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set. So, 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles is provided as an ordered list of quartile values for the scanned data, occurring in order Q1, median, Q3.
3.14,
],
"standardDeviation": 3.14, # Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
},
"integerProfile": { # The profile information for an integer type field. # Integer type field information.
"average": 3.14, # Average of non-null values in the scanned data. NaN, if the field has a NaN.
"max": "A String", # Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
"min": "A String", # Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
"quartiles": [ # A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. Three main quartiles used are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set. So, 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles is provided as an ordered list of approximate quartile values for the scanned data, occurring in order Q1, median, Q3.
"A String",
],
"standardDeviation": 3.14, # Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
},
"nullRatio": 3.14, # Ratio of rows with null value against total scanned rows.
"stringProfile": { # The profile information for a string type field. # String type field information.
"averageLength": 3.14, # Average length of non-null values in the scanned data.
"maxLength": "A String", # Maximum length of non-null values in the scanned data.
"minLength": "A String", # Minimum length of non-null values in the scanned data.
},
"topNValues": [ # The list of top N non-null values, frequency and ratio with which they occur in the scanned data. N is 10 or equal to the number of distinct values in the field, whichever is smaller. Not available for complex non-groupable field type, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATABLE mode.
{ # Top N non-null values in the scanned data.
"count": "A String", # Count of the corresponding value in the scanned data.
"ratio": 3.14, # Ratio of the corresponding value in the field against the total number of rows in the scanned data.
"value": "A String", # String value of a top N non-null value.
},
],
},
"type": "A String", # The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema (https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#tablefieldschema). For a Dataplex Entity, it is the Entity Schema (https://cloud.google.com/dataplex/docs/reference/rpc/google.cloud.dataplex.v1#type_3).
},
],
},
"rowCount": "A String", # The count of rows scanned.
"scannedData": { # The data scanned during processing (e.g. in incremental DataScan) # The data scanned for this result.
"incrementalField": { # A data range denoted by a pair of start/end values of a field. # The range denoted by values of an incremental field
"end": "A String", # Value that marks the end of the range.
"field": "A String", # The field that contains values which monotonically increases over time (e.g. a timestamp column).
"start": "A String", # Value that marks the start of the range.
},
},
},
"dataProfileSpec": { # DataProfileScan related setting. # Output only. Settings for a data profile scan.
"excludeFields": { # The specification for fields to include or exclude in data profile scan. # Optional. The fields to exclude from data profile.If specified, the fields will be excluded from data profile, regardless of include_fields value.
"fieldNames": [ # Optional. Expected input is a list of fully qualified names of fields as in the schema.Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
"A String",
],
},
"includeFields": { # The specification for fields to include or exclude in data profile scan. # Optional. The fields to include in data profile.If not specified, all fields at the time of profile scan job execution are included, except for ones listed in exclude_fields.
"fieldNames": [ # Optional. Expected input is a list of fully qualified names of fields as in the schema.Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
"A String",
],
},
"postScanActions": { # The configuration of post scan actions of DataProfileScan job. # Optional. Actions to take upon job completion..
"bigqueryExport": { # The configuration of BigQuery export post scan action. # Optional. If set, results will be exported to the provided BigQuery table.
"resultsTable": "A String", # Optional. The BigQuery table to export DataProfileScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
},
},
"rowFilter": "A String", # Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10
"samplingPercent": 3.14, # Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, 0 or 100.
},
"dataQualityResult": { # The output of a DataQualityScan. # Output only. The result of a data quality scan.
"columns": [ # Output only. A list of results at the column level.A column will have a corresponding DataQualityColumnResult if and only if there is at least one rule with the 'column' field set to it.
{ # DataQualityColumnResult provides a more detailed, per-column view of the results.
"column": "A String", # Output only. The column specified in the DataQualityRule.
"score": 3.14, # Output only. The column-level data quality score for this data scan job if and only if the 'column' field is set.The score ranges between between 0, 100 (up to two decimal points).
},
],
"dimensions": [ # A list of results at the dimension level.A dimension will have a corresponding DataQualityDimensionResult if and only if there is at least one rule with the 'dimension' field set to it.
{ # DataQualityDimensionResult provides a more detailed, per-dimension view of the results.
"dimension": { # A dimension captures data quality intent about a defined subset of the rules specified. # Output only. The dimension config specified in the DataQualitySpec, as is.
"name": "A String", # The dimension name a rule belongs to. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"
},
"passed": True or False, # Whether the dimension passed or failed.
"score": 3.14, # Output only. The dimension-level data quality score for this data scan job if and only if the 'dimension' field is set.The score ranges between 0, 100 (up to two decimal points).
},
],
"passed": True or False, # Overall data quality result -- true if all rules passed.
"postScanActionsResult": { # The result of post scan actions of DataQualityScan job. # Output only. The result of post scan actions.
"bigqueryExportResult": { # The result of BigQuery export post scan action. # Output only. The result of BigQuery export post scan action.
"message": "A String", # Output only. Additional information about the BigQuery exporting.
"state": "A String", # Output only. Execution state for the BigQuery exporting.
},
},
"rowCount": "A String", # The count of rows processed.
"rules": [ # A list of all the rules in a job, and their results.
{ # DataQualityRuleResult provides a more detailed, per-rule view of the results.
"assertionRowCount": "A String", # Output only. The number of rows returned by the SQL statement in a SQL assertion rule.This field is only valid for SQL assertion rules.
"evaluatedCount": "A String", # The number of rows a rule was evaluated against.This field is only valid for row-level type rules.Evaluated count can be configured to either include all rows (default) - with null rows automatically failing rule evaluation, or exclude null rows from the evaluated_count, by setting ignore_nulls = true.
"failingRowsQuery": "A String", # The query to find rows that did not pass this rule.This field is only valid for row-level type rules.
"nullCount": "A String", # The number of rows with null values in the specified column.
"passRatio": 3.14, # The ratio of passed_count / evaluated_count.This field is only valid for row-level type rules.
"passed": True or False, # Whether the rule passed or failed.
"passedCount": "A String", # The number of rows which passed a rule evaluation.This field is only valid for row-level type rules.
"rule": { # A rule captures data quality intent about a data source. # The rule specified in the DataQualitySpec, as is.
"column": "A String", # Optional. The unnested column which this rule is evaluated against.
"description": "A String", # Optional. Description of the rule. The maximum length is 1,024 characters.
"dimension": "A String", # Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"
"ignoreNull": True or False, # Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing.This field is only valid for the following type of rules: RangeExpectation RegexExpectation SetExpectation UniquenessExpectation
"name": "A String", # Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.
"nonNullExpectation": { # Evaluates whether each column value is null. # Row-level rule which evaluates whether each column value is null.
},
"rangeExpectation": { # Evaluates whether each column value lies between a specified range. # Row-level rule which evaluates whether each column value lies between a specified range.
"maxValue": "A String", # Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"strictMaxEnabled": True or False, # Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"regexExpectation": { # Evaluates whether each column value matches a specified regex. # Row-level rule which evaluates whether each column value matches a specified regex.
"regex": "A String", # Optional. A regular expression the column value is expected to match.
},
"rowConditionExpectation": { # Evaluates whether each row passes the specified condition.The SQL expression needs to use BigQuery standard SQL syntax and should produce a boolean value per row as the result.Example: col1 >= 0 AND col2 < 10 # Row-level rule which evaluates whether each row in a table passes the specified condition.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"setExpectation": { # Evaluates whether each column value is contained by a specified set. # Row-level rule which evaluates whether each column value is contained by a specified set.
"values": [ # Optional. Expected values for the column value.
"A String",
],
},
"sqlAssertion": { # A SQL statement that is evaluated to return rows that match an invalid state. If any rows are are returned, this rule fails.The SQL statement must use BigQuery standard SQL syntax, and must not contain any semicolons.You can use the data reference parameter ${data()} to reference the source table with all of its precondition filters applied. Examples of precondition filters include row filters, incremental data filters, and sampling. For more information, see Data reference parameter (https://cloud.google.com/dataplex/docs/auto-data-quality-overview#data-reference-parameter).Example: SELECT * FROM ${data()} WHERE price < 0 # Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
"sqlStatement": "A String", # Optional. The SQL statement.
},
"statisticRangeExpectation": { # Evaluates whether the column aggregate statistic lies between a specified range. # Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.
"maxValue": "A String", # Optional. The maximum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"statistic": "A String", # Optional. The aggregate metric to evaluate.
"strictMaxEnabled": True or False, # Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"suspended": True or False, # Optional. Whether the Rule is active or suspended. Default is false.
"tableConditionExpectation": { # Evaluates whether the provided expression is true.The SQL expression needs to use BigQuery standard SQL syntax and should produce a scalar boolean result.Example: MIN(col1) >= 0 # Aggregate rule which evaluates whether the provided expression is true for a table.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"threshold": 3.14, # Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of 0.0, 1.0.0 indicates default value (i.e. 1.0).This field is only valid for row-level type rules.
"uniquenessExpectation": { # Evaluates whether the column has duplicates. # Row-level rule which evaluates whether each column value is unique.
},
},
},
],
"scannedData": { # The data scanned during processing (e.g. in incremental DataScan) # The data scanned for this result.
"incrementalField": { # A data range denoted by a pair of start/end values of a field. # The range denoted by values of an incremental field
"end": "A String", # Value that marks the end of the range.
"field": "A String", # The field that contains values which monotonically increases over time (e.g. a timestamp column).
"start": "A String", # Value that marks the start of the range.
},
},
"score": 3.14, # Output only. The overall data quality score.The score ranges between 0, 100 (up to two decimal points).
},
"dataQualitySpec": { # DataQualityScan related setting. # Output only. Settings for a data quality scan.
"postScanActions": { # The configuration of post scan actions of DataQualityScan. # Optional. Actions to take upon job completion.
"bigqueryExport": { # The configuration of BigQuery export post scan action. # Optional. If set, results will be exported to the provided BigQuery table.
"resultsTable": "A String", # Optional. The BigQuery table to export DataQualityScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
},
"notificationReport": { # The configuration of notification report post scan action. # Optional. If set, results will be sent to the provided notification receipts upon triggers.
"jobEndTrigger": { # This trigger is triggered whenever a scan job run ends, regardless of the result. # Optional. If set, report will be sent when a scan job ends.
},
"jobFailureTrigger": { # This trigger is triggered when the scan job itself fails, regardless of the result. # Optional. If set, report will be sent when a scan job fails.
},
"recipients": { # The individuals or groups who are designated to receive notifications upon triggers. # Required. The recipients who will receive the notification report.
"emails": [ # Optional. The email recipients who will receive the DataQualityScan results report.
"A String",
],
},
"scoreThresholdTrigger": { # This trigger is triggered when the DQ score in the job result is less than a specified input score. # Optional. If set, report will be sent when score threshold is met.
"scoreThreshold": 3.14, # Optional. The score range is in 0,100.
},
},
},
"rowFilter": "A String", # Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10
"rules": [ # Required. The list of rules to evaluate against a data source. At least one rule is required.
{ # A rule captures data quality intent about a data source.
"column": "A String", # Optional. The unnested column which this rule is evaluated against.
"description": "A String", # Optional. Description of the rule. The maximum length is 1,024 characters.
"dimension": "A String", # Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"
"ignoreNull": True or False, # Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing.This field is only valid for the following type of rules: RangeExpectation RegexExpectation SetExpectation UniquenessExpectation
"name": "A String", # Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.
"nonNullExpectation": { # Evaluates whether each column value is null. # Row-level rule which evaluates whether each column value is null.
},
"rangeExpectation": { # Evaluates whether each column value lies between a specified range. # Row-level rule which evaluates whether each column value lies between a specified range.
"maxValue": "A String", # Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"strictMaxEnabled": True or False, # Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"regexExpectation": { # Evaluates whether each column value matches a specified regex. # Row-level rule which evaluates whether each column value matches a specified regex.
"regex": "A String", # Optional. A regular expression the column value is expected to match.
},
"rowConditionExpectation": { # Evaluates whether each row passes the specified condition.The SQL expression needs to use BigQuery standard SQL syntax and should produce a boolean value per row as the result.Example: col1 >= 0 AND col2 < 10 # Row-level rule which evaluates whether each row in a table passes the specified condition.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"setExpectation": { # Evaluates whether each column value is contained by a specified set. # Row-level rule which evaluates whether each column value is contained by a specified set.
"values": [ # Optional. Expected values for the column value.
"A String",
],
},
"sqlAssertion": { # A SQL statement that is evaluated to return rows that match an invalid state. If any rows are are returned, this rule fails.The SQL statement must use BigQuery standard SQL syntax, and must not contain any semicolons.You can use the data reference parameter ${data()} to reference the source table with all of its precondition filters applied. Examples of precondition filters include row filters, incremental data filters, and sampling. For more information, see Data reference parameter (https://cloud.google.com/dataplex/docs/auto-data-quality-overview#data-reference-parameter).Example: SELECT * FROM ${data()} WHERE price < 0 # Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
"sqlStatement": "A String", # Optional. The SQL statement.
},
"statisticRangeExpectation": { # Evaluates whether the column aggregate statistic lies between a specified range. # Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.
"maxValue": "A String", # Optional. The maximum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"statistic": "A String", # Optional. The aggregate metric to evaluate.
"strictMaxEnabled": True or False, # Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"suspended": True or False, # Optional. Whether the Rule is active or suspended. Default is false.
"tableConditionExpectation": { # Evaluates whether the provided expression is true.The SQL expression needs to use BigQuery standard SQL syntax and should produce a scalar boolean result.Example: MIN(col1) >= 0 # Aggregate rule which evaluates whether the provided expression is true for a table.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"threshold": 3.14, # Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of 0.0, 1.0.0 indicates default value (i.e. 1.0).This field is only valid for row-level type rules.
"uniquenessExpectation": { # Evaluates whether the column has duplicates. # Row-level rule which evaluates whether each column value is unique.
},
},
],
"samplingPercent": 3.14, # Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, 0 or 100.
},
"endTime": "A String", # Output only. The time when the DataScanJob ended.
"message": "A String", # Output only. Additional information about the current state.
"name": "A String", # Output only. Identifier. The relative resource name of the DataScanJob, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}/jobs/{job_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.
"startTime": "A String", # Output only. The time when the DataScanJob was started.
"state": "A String", # Output only. Execution state for the DataScanJob.
"type": "A String", # Output only. The type of the parent DataScan.
"uid": "A String", # Output only. System generated globally unique ID for the DataScanJob.
}
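
The constraints documented for rule objects above (name format, supported dimensions, the min/max requirement, and the threshold range) can be checked on the client before a rule is submitted. The following is a minimal sketch of such a check, assuming rules are plain Python dicts shaped like the schema above. The names validate_rule and SUPPORTED_DIMENSIONS are hypothetical helpers, not part of the client library, and the service remains the authority on what it accepts.

```python
import re

# Dimension names as listed in the "dimension" field description.
SUPPORTED_DIMENSIONS = {
    "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY",
    "UNIQUENESS", "FRESHNESS", "VOLUME",
}

# Letters, digits, or hyphens; starts with a letter; ends with a letter or digit.
_NAME_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def validate_rule(rule):
    """Return a list of problems found in a rule dict (empty if none).

    Hypothetical helper: it only mirrors the documented constraints and
    does not replace server-side validation.
    """
    problems = []
    if rule.get("dimension") not in SUPPORTED_DIMENSIONS:
        problems.append("dimension is required and must be a supported value")
    name = rule.get("name")
    if name is not None and (len(name) > 63 or not _NAME_RE.fullmatch(name)):
        problems.append("name violates the documented format")
    if len(rule.get("description", "")) > 1024:
        problems.append("description exceeds 1,024 characters")
    for key in ("rangeExpectation", "statisticRangeExpectation"):
        spec = rule.get(key)
        if spec is not None and "minValue" not in spec and "maxValue" not in spec:
            problems.append(key + " needs at least one of minValue and maxValue")
    # A threshold of 0 is documented as meaning the default (1.0).
    threshold = rule.get("threshold", 0.0)
    if not 0.0 <= threshold <= 1.0:
        problems.append("threshold must be within [0.0, 1.0]")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a rule at once.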

list(parent, filter=None, pageSize=None, pageToken=None, x__xgafv=None)

Lists DataScanJobs under the given DataScan.

Args:
parent: string, Required. The resource name of the parent environment: projects/{project}/locations/{location_id}/dataScans/{data_scan_id} where project refers to a project_id or project_number and location_id refers to a GCP region. (required)
filter: string, Optional. An expression for filtering the results of the ListDataScanJobs request. If unspecified, all datascan jobs will be returned. Multiple filters can be applied (with AND, OR logical operators). Filters are case-sensitive. Allowed fields are: start_time, end_time. start_time and end_time expect RFC-3339 formatted strings (e.g. 2018-10-08T18:30:00-07:00). For instance, 'start_time > 2018-10-08T00:00:00.123456789Z AND end_time < 2018-10-09T00:00:00.123456789Z' limits results to DataScanJobs between specified start and end times.
pageSize: integer, Optional. Maximum number of DataScanJobs to return. The service may return fewer than this value. If unspecified, at most 10 DataScanJobs will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.
pageToken: string, Optional. Page token received from a previous ListDataScanJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataScanJobs must match the call that provided the page token.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # List DataScanJobs response.
"dataScanJobs": [ # DataScanJobs (BASIC view only) under a given dataScan.
{ # A DataScanJob represents an instance of DataScan execution.
"createTime": "A String", # Output only. The time when the DataScanJob was created.
"dataDiscoveryResult": { # The output of a data discovery scan. # Output only. The result of a data discovery scan.
"bigqueryPublishing": { # Describes BigQuery publishing configurations. # Output only. Configuration for metadata publishing.
"dataset": "A String", # Output only. The BigQuery dataset to publish to. It takes the form projects/{project_id}/datasets/{dataset_id}. If not set, the service creates a default publishing dataset.
},
},
"dataDiscoverySpec": { # Spec for a data discovery scan. # Output only. Settings for a data discovery scan.
"bigqueryPublishingConfig": { # Describes BigQuery publishing configurations. # Optional. Configuration for metadata publishing.
"connection": "A String", # Optional. The BigQuery connection used to create BigLake tables. Must be in the form projects/{project_id}/locations/{location_id}/connections/{connection_id}
"tableType": "A String", # Optional. Determines whether to publish discovered tables as BigLake external tables or non-BigLake external tables.
},
"storageConfig": { # Configurations related to Cloud Storage as the data source. # Cloud Storage related configurations.
"csvOptions": { # Describes CSV and similar semi-structured data formats. # Optional. Configuration for CSV data.
"delimiter": "A String", # Optional. The delimiter that is used to separate values. The default is , (comma).
"encoding": "A String", # Optional. The character encoding of the data. The default is UTF-8.
"headerRows": 42, # Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.
"quote": "A String", # Optional. The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).
"typeInferenceDisabled": True or False, # Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.
},
"excludePatterns": [ # Optional. Defines the data to exclude during discovery. Provide a list of patterns that identify the data to exclude. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
"A String",
],
"includePatterns": [ # Optional. Defines the data to include during discovery when only a subset of the data should be considered. Provide a list of patterns that identify the data to include. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
"A String",
],
"jsonOptions": { # Describes JSON data format. # Optional. Configuration for JSON data.
"encoding": "A String", # Optional. The character encoding of the data. The default is UTF-8.
"typeInferenceDisabled": True or False, # Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (strings, number, or boolean).
},
},
},
"dataProfileResult": { # DataProfileResult defines the output of DataProfileScan. Each field of the table will have field type specific profile result. # Output only. The result of a data profile scan.
"postScanActionsResult": { # The result of post scan actions of DataProfileScan job. # Output only. The result of post scan actions.
"bigqueryExportResult": { # The result of BigQuery export post scan action. # Output only. The result of BigQuery export post scan action.
"message": "A String", # Output only. Additional information about the BigQuery exporting.
"state": "A String", # Output only. Execution state for the BigQuery exporting.
},
},
"profile": { # Contains name, type, mode and field type specific profile information. # The profile information per field.
"fields": [ # List of fields with structural and profile information for each field.
{ # A field within a table.
"mode": "A String", # The mode of the field. Possible values include: REQUIRED, if it is a required field. NULLABLE, if it is an optional field. REPEATED, if it is a repeated field.
"name": "A String", # The name of the field.
"profile": { # The profile information for each field type. # Profile information for the corresponding field.
"distinctRatio": 3.14, # Ratio of rows with distinct values against total scanned rows. Not available for complex non-groupable field type, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATABLE mode.
"doubleProfile": { # The profile information for a double type field. # Double type field information.
"average": 3.14, # Average of non-null values in the scanned data. NaN, if the field has a NaN.
"max": 3.14, # Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
"min": 3.14, # Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
"quartiles": [ # A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. Three main quartiles used are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set. So, 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles is provided as an ordered list of quartile values for the scanned data, occurring in order Q1, median, Q3.
3.14,
],
"standardDeviation": 3.14, # Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
},
"integerProfile": { # The profile information for an integer type field. # Integer type field information.
"average": 3.14, # Average of non-null values in the scanned data. NaN, if the field has a NaN.
"max": "A String", # Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
"min": "A String", # Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
"quartiles": [ # A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. Three main quartiles used are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set. So, 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles is provided as an ordered list of approximate quartile values for the scanned data, occurring in order Q1, median, Q3.
"A String",
],
"standardDeviation": 3.14, # Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
},
"nullRatio": 3.14, # Ratio of rows with null value against total scanned rows.
"stringProfile": { # The profile information for a string type field. # String type field information.
"averageLength": 3.14, # Average length of non-null values in the scanned data.
"maxLength": "A String", # Maximum length of non-null values in the scanned data.
"minLength": "A String", # Minimum length of non-null values in the scanned data.
},
"topNValues": [ # The list of top N non-null values, frequency and ratio with which they occur in the scanned data. N is 10 or equal to the number of distinct values in the field, whichever is smaller. Not available for complex non-groupable field type, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATABLE mode.
{ # Top N non-null values in the scanned data.
"count": "A String", # Count of the corresponding value in the scanned data.
"ratio": 3.14, # Ratio of the corresponding value in the field against the total number of rows in the scanned data.
"value": "A String", # String value of a top N non-null value.
},
],
},
"type": "A String", # The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema (https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#tablefieldschema). For a Dataplex Entity, it is the Entity Schema (https://cloud.google.com/dataplex/docs/reference/rpc/google.cloud.dataplex.v1#type_3).
},
],
},
"rowCount": "A String", # The count of rows scanned.
"scannedData": { # The data scanned during processing (e.g. in incremental DataScan) # The data scanned for this result.
"incrementalField": { # A data range denoted by a pair of start/end values of a field. # The range denoted by values of an incremental field
"end": "A String", # Value that marks the end of the range.
"field": "A String", # The field that contains values which monotonically increases over time (e.g. a timestamp column).
"start": "A String", # Value that marks the start of the range.
},
},
},
"dataProfileSpec": { # DataProfileScan related setting. # Output only. Settings for a data profile scan.
"excludeFields": { # The specification for fields to include or exclude in data profile scan. # Optional. The fields to exclude from data profile.If specified, the fields will be excluded from data profile, regardless of include_fields value.
"fieldNames": [ # Optional. Expected input is a list of fully qualified names of fields as in the schema.Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
"A String",
],
},
"includeFields": { # The specification for fields to include or exclude in data profile scan. # Optional. The fields to include in data profile.If not specified, all fields at the time of profile scan job execution are included, except for ones listed in exclude_fields.
"fieldNames": [ # Optional. Expected input is a list of fully qualified names of fields as in the schema.Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
"A String",
],
},
"postScanActions": { # The configuration of post scan actions of DataProfileScan job. # Optional. Actions to take upon job completion..
"bigqueryExport": { # The configuration of BigQuery export post scan action. # Optional. If set, results will be exported to the provided BigQuery table.
"resultsTable": "A String", # Optional. The BigQuery table to export DataProfileScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
},
},
"rowFilter": "A String", # Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10
"samplingPercent": 3.14, # Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, 0 or 100.
},
"dataQualityResult": { # The output of a DataQualityScan. # Output only. The result of a data quality scan.
"columns": [ # Output only. A list of results at the column level.A column will have a corresponding DataQualityColumnResult if and only if there is at least one rule with the 'column' field set to it.
{ # DataQualityColumnResult provides a more detailed, per-column view of the results.
"column": "A String", # Output only. The column specified in the DataQualityRule.
"score": 3.14, # Output only. The column-level data quality score for this data scan job if and only if the 'column' field is set.The score ranges between between 0, 100 (up to two decimal points).
},
],
"dimensions": [ # A list of results at the dimension level.A dimension will have a corresponding DataQualityDimensionResult if and only if there is at least one rule with the 'dimension' field set to it.
{ # DataQualityDimensionResult provides a more detailed, per-dimension view of the results.
"dimension": { # A dimension captures data quality intent about a defined subset of the rules specified. # Output only. The dimension config specified in the DataQualitySpec, as is.
"name": "A String", # The dimension name a rule belongs to. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"
},
"passed": True or False, # Whether the dimension passed or failed.
"score": 3.14, # Output only. The dimension-level data quality score for this data scan job if and only if the 'dimension' field is set.The score ranges between 0, 100 (up to two decimal points).
},
],
"passed": True or False, # Overall data quality result -- true if all rules passed.
"postScanActionsResult": { # The result of post scan actions of DataQualityScan job. # Output only. The result of post scan actions.
"bigqueryExportResult": { # The result of BigQuery export post scan action. # Output only. The result of BigQuery export post scan action.
"message": "A String", # Output only. Additional information about the BigQuery exporting.
"state": "A String", # Output only. Execution state for the BigQuery exporting.
},
},
"rowCount": "A String", # The count of rows processed.
"rules": [ # A list of all the rules in a job, and their results.
{ # DataQualityRuleResult provides a more detailed, per-rule view of the results.
"assertionRowCount": "A String", # Output only. The number of rows returned by the SQL statement in a SQL assertion rule.This field is only valid for SQL assertion rules.
"evaluatedCount": "A String", # The number of rows a rule was evaluated against.This field is only valid for row-level type rules.Evaluated count can be configured to either include all rows (default) - with null rows automatically failing rule evaluation, or exclude null rows from the evaluated_count, by setting ignore_nulls = true.
"failingRowsQuery": "A String", # The query to find rows that did not pass this rule.This field is only valid for row-level type rules.
"nullCount": "A String", # The number of rows with null values in the specified column.
"passRatio": 3.14, # The ratio of passed_count / evaluated_count.This field is only valid for row-level type rules.
"passed": True or False, # Whether the rule passed or failed.
"passedCount": "A String", # The number of rows which passed a rule evaluation.This field is only valid for row-level type rules.
"rule": { # A rule captures data quality intent about a data source. # The rule specified in the DataQualitySpec, as is.
"column": "A String", # Optional. The unnested column which this rule is evaluated against.
"description": "A String", # Optional. Description of the rule. The maximum length is 1,024 characters.
"dimension": "A String", # Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"
"ignoreNull": True or False, # Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing.This field is only valid for the following type of rules: RangeExpectation RegexExpectation SetExpectation UniquenessExpectation
"name": "A String", # Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.
"nonNullExpectation": { # Evaluates whether each column value is null. # Row-level rule which evaluates whether each column value is null.
},
"rangeExpectation": { # Evaluates whether each column value lies between a specified range. # Row-level rule which evaluates whether each column value lies between a specified range.
"maxValue": "A String", # Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"strictMaxEnabled": True or False, # Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"regexExpectation": { # Evaluates whether each column value matches a specified regex. # Row-level rule which evaluates whether each column value matches a specified regex.
"regex": "A String", # Optional. A regular expression the column value is expected to match.
},
"rowConditionExpectation": { # Evaluates whether each row passes the specified condition.The SQL expression needs to use BigQuery standard SQL syntax and should produce a boolean value per row as the result.Example: col1 >= 0 AND col2 < 10 # Row-level rule which evaluates whether each row in a table passes the specified condition.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"setExpectation": { # Evaluates whether each column value is contained by a specified set. # Row-level rule which evaluates whether each column value is contained by a specified set.
"values": [ # Optional. Expected values for the column value.
"A String",
],
},
"sqlAssertion": { # A SQL statement that is evaluated to return rows that match an invalid state. If any rows are are returned, this rule fails.The SQL statement must use BigQuery standard SQL syntax, and must not contain any semicolons.You can use the data reference parameter ${data()} to reference the source table with all of its precondition filters applied. Examples of precondition filters include row filters, incremental data filters, and sampling. For more information, see Data reference parameter (https://cloud.google.com/dataplex/docs/auto-data-quality-overview#data-reference-parameter).Example: SELECT * FROM ${data()} WHERE price < 0 # Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
"sqlStatement": "A String", # Optional. The SQL statement.
},
"statisticRangeExpectation": { # Evaluates whether the column aggregate statistic lies between a specified range. # Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.
"maxValue": "A String", # Optional. The maximum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"statistic": "A String", # Optional. The aggregate metric to evaluate.
"strictMaxEnabled": True or False, # Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"suspended": True or False, # Optional. Whether the Rule is active or suspended. Default is false.
"tableConditionExpectation": { # Evaluates whether the provided expression is true.The SQL expression needs to use BigQuery standard SQL syntax and should produce a scalar boolean result.Example: MIN(col1) >= 0 # Aggregate rule which evaluates whether the provided expression is true for a table.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"threshold": 3.14, # Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of 0.0, 1.0.0 indicates default value (i.e. 1.0).This field is only valid for row-level type rules.
"uniquenessExpectation": { # Evaluates whether the column has duplicates. # Row-level rule which evaluates whether each column value is unique.
},
},
},
],
"scannedData": { # The data scanned during processing (e.g. in incremental DataScan) # The data scanned for this result.
"incrementalField": { # A data range denoted by a pair of start/end values of a field. # The range denoted by values of an incremental field
"end": "A String", # Value that marks the end of the range.
"field": "A String", # The field that contains values which monotonically increases over time (e.g. a timestamp column).
"start": "A String", # Value that marks the start of the range.
},
},
"score": 3.14, # Output only. The overall data quality score.The score ranges between 0, 100 (up to two decimal points).
},
"dataQualitySpec": { # DataQualityScan related setting. # Output only. Settings for a data quality scan.
"postScanActions": { # The configuration of post scan actions of DataQualityScan. # Optional. Actions to take upon job completion.
"bigqueryExport": { # The configuration of BigQuery export post scan action. # Optional. If set, results will be exported to the provided BigQuery table.
"resultsTable": "A String", # Optional. The BigQuery table to export DataQualityScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
},
"notificationReport": { # The configuration of notification report post scan action. # Optional. If set, results will be sent to the provided notification receipts upon triggers.
"jobEndTrigger": { # This trigger is triggered whenever a scan job run ends, regardless of the result. # Optional. If set, report will be sent when a scan job ends.
},
"jobFailureTrigger": { # This trigger is triggered when the scan job itself fails, regardless of the result. # Optional. If set, report will be sent when a scan job fails.
},
"recipients": { # The individuals or groups who are designated to receive notifications upon triggers. # Required. The recipients who will receive the notification report.
"emails": [ # Optional. The email recipients who will receive the DataQualityScan results report.
"A String",
],
},
"scoreThresholdTrigger": { # This trigger is triggered when the DQ score in the job result is less than a specified input score. # Optional. If set, report will be sent when score threshold is met.
"scoreThreshold": 3.14, # Optional. The score range is in 0,100.
},
},
},
"rowFilter": "A String", # Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10
"rules": [ # Required. The list of rules to evaluate against a data source. At least one rule is required.
{ # A rule captures data quality intent about a data source.
"column": "A String", # Optional. The unnested column which this rule is evaluated against.
"description": "A String", # Optional. Description of the rule. The maximum length is 1,024 characters.
"dimension": "A String", # Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"
"ignoreNull": True or False, # Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing.This field is only valid for the following type of rules: RangeExpectation RegexExpectation SetExpectation UniquenessExpectation
"name": "A String", # Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.
"nonNullExpectation": { # Evaluates whether each column value is null. # Row-level rule which evaluates whether each column value is null.
},
"rangeExpectation": { # Evaluates whether each column value lies between a specified range. # Row-level rule which evaluates whether each column value lies between a specified range.
"maxValue": "A String", # Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
"strictMaxEnabled": True or False, # Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"regexExpectation": { # Evaluates whether each column value matches a specified regex. # Row-level rule which evaluates whether each column value matches a specified regex.
"regex": "A String", # Optional. A regular expression the column value is expected to match.
},
"rowConditionExpectation": { # Evaluates whether each row passes the specified condition.The SQL expression needs to use BigQuery standard SQL syntax and should produce a boolean value per row as the result.Example: col1 >= 0 AND col2 < 10 # Row-level rule which evaluates whether each row in a table passes the specified condition.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"setExpectation": { # Evaluates whether each column value is contained by a specified set. # Row-level rule which evaluates whether each column value is contained by a specified set.
"values": [ # Optional. Expected values for the column value.
"A String",
],
},
"sqlAssertion": { # A SQL statement that is evaluated to return rows that match an invalid state. If any rows are are returned, this rule fails.The SQL statement must use BigQuery standard SQL syntax, and must not contain any semicolons.You can use the data reference parameter ${data()} to reference the source table with all of its precondition filters applied. Examples of precondition filters include row filters, incremental data filters, and sampling. For more information, see Data reference parameter (https://cloud.google.com/dataplex/docs/auto-data-quality-overview#data-reference-parameter).Example: SELECT * FROM ${data()} WHERE price < 0 # Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
"sqlStatement": "A String", # Optional. The SQL statement.
},
"statisticRangeExpectation": { # Evaluates whether the column aggregate statistic lies between a specified range. # Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.
"maxValue": "A String", # Optional. The maximum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"minValue": "A String", # Optional. The minimum column statistic value allowed for a row to pass this validation.At least one of min_value and max_value need to be provided.
"statistic": "A String", # Optional. The aggregate metric to evaluate.
"strictMaxEnabled": True or False, # Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed.Only relevant if a max_value has been defined. Default = false.
"strictMinEnabled": True or False, # Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed.Only relevant if a min_value has been defined. Default = false.
},
"suspended": True or False, # Optional. Whether the Rule is active or suspended. Default is false.
"tableConditionExpectation": { # Evaluates whether the provided expression is true.The SQL expression needs to use BigQuery standard SQL syntax and should produce a scalar boolean result.Example: MIN(col1) >= 0 # Aggregate rule which evaluates whether the provided expression is true for a table.
"sqlExpression": "A String", # Optional. The SQL expression.
},
"threshold": 3.14, # Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of 0.0, 1.0.0 indicates default value (i.e. 1.0).This field is only valid for row-level type rules.
"uniquenessExpectation": { # Evaluates whether the column has duplicates. # Row-level rule which evaluates whether each column value is unique.
},
},
],
"samplingPercent": 3.14, # Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, 0 or 100.
},
"endTime": "A String", # Output only. The time when the DataScanJob ended.
"message": "A String", # Output only. Additional information about the current state.
"name": "A String", # Output only. Identifier. The relative resource name of the DataScanJob, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}/jobs/{job_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.
"startTime": "A String", # Output only. The time when the DataScanJob was started.
"state": "A String", # Output only. Execution state for the DataScanJob.
"type": "A String", # Output only. The type of the parent DataScan.
"uid": "A String", # Output only. System generated globally unique ID for the DataScanJob.
},
],
"nextPageToken": "A String", # Token to retrieve the next page of results, or empty if there are no more results in the list.
}
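
The nextPageToken field above implies a paging loop: request a page, consume its items, and repeat with the returned token until the token is empty. The following is a minimal sketch of that loop, not the client library's own helper. The names iter_items, fetch_page, and items_key are hypothetical, and the key that holds the list of jobs is passed in as a parameter because it is an assumption about the response shape.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical type alias: a callable that takes a page token (or None for
# the first page) and returns one decoded response dict.
FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def iter_items(fetch_page: FetchPage, items_key: str) -> Iterator[Dict[str, Any]]:
    """Yield every item across all pages of a list-style response.

    items_key is the name of the list field in the response. It is supplied
    by the caller because this sketch does not assume a particular field name.
    """
    token: Optional[str] = None
    while True:
        response = fetch_page(token)
        # A page may legitimately omit the list field when it has no items.
        yield from response.get(items_key, [])
        token = response.get("nextPageToken")
        # An absent or empty token means there are no more results.
        if not token:
            break
```

With the Python client, fetch_page would typically wrap a call to the list method with its pageToken argument followed by execute(); the loop itself does not depend on how the page is fetched, so it can be exercised with a stub.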

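The rule fields documented in the response above (rangeExpectation, ignoreNull, and threshold) interact in a way that is easier to see in code. The following is a local approximation of the documented semantics, useful for reasoning about a rule before running a scan. It is not Dataplex's implementation: the function names are hypothetical, minValue and maxValue are assumed to be numeric strings, and the handling of an empty input is an assumption.

```python
from typing import Any, Dict, Iterable, Optional


def row_passes_range(value: Optional[float], rule: Dict[str, Any]) -> bool:
    """Check one column value against a rangeExpectation-style rule dict."""
    if value is None:
        # Documented behavior: null rows fail unless ignoreNull is true,
        # in which case they are trivially considered passing.
        return bool(rule.get("ignoreNull", False))
    rng = rule["rangeExpectation"]
    if "minValue" not in rng and "maxValue" not in rng:
        raise ValueError("at least one of minValue and maxValue is required")
    if "minValue" in rng:
        lo = float(rng["minValue"])  # assumption: numeric string
        # strictMinEnabled switches the comparison from >= to >.
        if value < lo or (rng.get("strictMinEnabled", False) and value == lo):
            return False
    if "maxValue" in rng:
        hi = float(rng["maxValue"])  # assumption: numeric string
        # strictMaxEnabled switches the comparison from <= to <.
        if value > hi or (rng.get("strictMaxEnabled", False) and value == hi):
            return False
    return True


def rule_passes(values: Iterable[Optional[float]], rule: Dict[str, Any]) -> bool:
    """Apply the row-level threshold: passing_rows / total_rows >= threshold."""
    rows = list(values)
    if not rows:
        return True  # assumption: an empty input is treated as passing
    passing = sum(row_passes_range(v, rule) for v in rows)
    # Documented behavior: a threshold of 0 means the default value, 1.0.
    threshold = rule.get("threshold", 0.0) or 1.0
    return passing / len(rows) >= threshold
```

For example, with minValue "0", maxValue "100", strictMaxEnabled true, and ignoreNull true, the values 0, 50, and None pass while 100 fails, so the rule passes at a threshold of 0.75 but fails at the default threshold.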
Cloud Dataplex API . projects . locations . dataScans . jobs

Instance Methods

Method Details