Bulk import of multiple Documents. Request processing may be synchronous. Non-existing items are created. Note: It is possible for a subset of the Documents to be successfully updated.

Permanently deletes all selected Documents in a branch. This process is asynchronous. Depending on the number of Documents to be deleted, this operation can take hours to complete. Before the delete operation completes, some Documents might still be returned by DocumentService.GetDocument or DocumentService.ListDocuments. To get a list of the Documents to be deleted, set PurgeDocumentsRequest.force to false.

Method Details

close()

Close httplib2 connections.

create(parent, body=None, documentId=None, x__xgafv=None)

Creates a Document.

Args:
  parent: string, Required. The parent resource name, such as `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`. (required)
  body: object, The request body.
    The object takes the form of:

{ # Document captures all raw metadata information of items to be recommended or searched.
  "aclInfo": { # ACL Information of the Document. # Access control information for the document.
    "readers": [ # Readers of the document.
      { # AclRestriction to model complex inheritance restrictions. Example: Modeling a "Both Permit" inheritance, where to access a child document, user needs to have access to parent document. Document Hierarchy - Space_S --> Page_P. Readers: Space_S: group_1, user_1 Page_P: group_2, group_3, user_2 Space_S ACL Restriction - { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ] } ] } } Page_P ACL Restriction. { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_2" }, { "group_id": "group_3" }, { "user_id": "user_2" } ], }, { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ], } ] } }
        "idpWide": True or False, # All users within the Identity Provider.
        "principals": [ # List of principals.
          { # Principal identifier of a user or a group.
            "externalEntityId": "A String", # For 3P application identities which are not present in the customer identity provider.
            "groupId": "A String", # Group identifier. For Google Workspace user account, group_id should be the google workspace group email. For non-google identity provider user account, group_id is the mapped group identifier configured during the workforcepool config.
            "userId": "A String", # User identifier. For Google Workspace user account, user_id should be the google workspace user email. For non-google identity provider user account, user_id is the mapped user identifier configured during the workforcepool config.
          },
        ],
      },
    ],
  },
  "content": { # Unstructured data linked to this document. # The unstructured data linked to this document. Content can only be set and must be set if this document is under a `CONTENT_REQUIRED` data store.
    "mimeType": "A String", # The MIME type of the content. Supported types: * `application/pdf` (PDF, only native PDFs are supported for now) * `text/html` (HTML) * `text/plain` (TXT) * `application/xml` or `text/xml` (XML) * `application/json` (JSON) * `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX) * `application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX) * `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX) * `application/vnd.ms-excel.sheet.macroenabled.12` (XLSM) The following types are supported only if layout parser is enabled in the data store: * `image/bmp` (BMP) * `image/gif` (GIF) * `image/jpeg` (JPEG) * `image/png` (PNG) * `image/tiff` (TIFF) See https://www.iana.org/assignments/media-types/media-types.xhtml.
    "rawBytes": "A String", # The content represented as a stream of bytes. The maximum length is 1,000,000 bytes (1 MB / ~0.95 MiB). Note: As with all `bytes` fields, this field is represented as pure binary in Protocol Buffers and base64-encoded string in JSON. For example, `abc123!?$*&()'-=@~` should be represented as `YWJjMTIzIT8kKiYoKSctPUB+` in JSON. See https://developers.google.com/protocol-buffers/docs/proto3#json.
    "uri": "A String", # The URI of the content. Only Cloud Storage URIs (e.g. `gs://bucket-name/path/to/file`) are supported. The maximum file size is 2.5 MB for text-based formats, 200 MB for other formats.
  },
  "derivedStructData": { # Output only. This field is OUTPUT_ONLY. It contains derived data that are not in the original input document.
    "a_key": "", # Properties of the object.
  },
  "id": "A String", # Immutable. The identifier of the document. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters.
  "indexStatus": { # Index status of the document. # Output only. The index status of the document. * If document is indexed successfully, the index_time field is populated. * Otherwise, if document is not indexed due to errors, the error_samples field is populated. * Otherwise, if document's index is in progress, the pending_message field is populated.
    "errorSamples": [ # A sample of errors encountered while indexing the document. If this field is populated, the document is not indexed due to errors.
      { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors).
        "code": 42, # The status code, which should be an enum value of google.rpc.Code.
        "details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
          {
            "a_key": "", # Properties of the object. Contains field @type with type URL.
          },
        ],
        "message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
      },
    ],
    "indexTime": "A String", # The time when the document was indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours.
    "pendingMessage": "A String", # Immutable. The message indicates the document index is in progress. If this field is populated, the document index is pending.
  },
  "indexTime": "A String", # Output only. The time when the document was last indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours. If this field is not populated, it means the document has never been indexed.
  "jsonData": "A String", # The JSON string representation of the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
  "name": "A String", # Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters.
  "parentDocumentId": "A String", # The identifier of the parent document. Currently supports at most two level document hierarchy. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 63 characters.
  "schemaId": "A String", # The identifier of the schema located in the same data store.
  "structData": { # The structured JSON data for the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
    "a_key": "", # Properties of the object.
  },
}

  documentId: string, Required. The ID to use for the Document, which becomes the final component of the Document.name. If the caller does not have permission to create the Document, regardless of whether or not it exists, a `PERMISSION_DENIED` error is returned. This field must be unique among all Documents with the same parent. Otherwise, an `ALREADY_EXISTS` error is returned. This field must conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters. Otherwise, an `INVALID_ARGUMENT` error is returned.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Document captures all raw metadata information of items to be recommended or searched.
  "aclInfo": { # ACL Information of the Document. # Access control information for the document.
    "readers": [ # Readers of the document.
      { # AclRestriction to model complex inheritance restrictions. Example: Modeling a "Both Permit" inheritance, where to access a child document, user needs to have access to parent document. Document Hierarchy - Space_S --> Page_P. Readers: Space_S: group_1, user_1 Page_P: group_2, group_3, user_2 Space_S ACL Restriction - { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ] } ] } } Page_P ACL Restriction. { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_2" }, { "group_id": "group_3" }, { "user_id": "user_2" } ], }, { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ], } ] } }
        "idpWide": True or False, # All users within the Identity Provider.
        "principals": [ # List of principals.
          { # Principal identifier of a user or a group.
            "externalEntityId": "A String", # For 3P application identities which are not present in the customer identity provider.
            "groupId": "A String", # Group identifier. For Google Workspace user account, group_id should be the google workspace group email. For non-google identity provider user account, group_id is the mapped group identifier configured during the workforcepool config.
            "userId": "A String", # User identifier. For Google Workspace user account, user_id should be the google workspace user email. For non-google identity provider user account, user_id is the mapped user identifier configured during the workforcepool config.
          },
        ],
      },
    ],
  },
  "content": { # Unstructured data linked to this document. # The unstructured data linked to this document. Content can only be set and must be set if this document is under a `CONTENT_REQUIRED` data store.
    "mimeType": "A String", # The MIME type of the content. Supported types: * `application/pdf` (PDF, only native PDFs are supported for now) * `text/html` (HTML) * `text/plain` (TXT) * `application/xml` or `text/xml` (XML) * `application/json` (JSON) * `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX) * `application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX) * `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX) * `application/vnd.ms-excel.sheet.macroenabled.12` (XLSM) The following types are supported only if layout parser is enabled in the data store: * `image/bmp` (BMP) * `image/gif` (GIF) * `image/jpeg` (JPEG) * `image/png` (PNG) * `image/tiff` (TIFF) See https://www.iana.org/assignments/media-types/media-types.xhtml.
    "rawBytes": "A String", # The content represented as a stream of bytes. The maximum length is 1,000,000 bytes (1 MB / ~0.95 MiB). Note: As with all `bytes` fields, this field is represented as pure binary in Protocol Buffers and base64-encoded string in JSON. For example, `abc123!?$*&()'-=@~` should be represented as `YWJjMTIzIT8kKiYoKSctPUB+` in JSON. See https://developers.google.com/protocol-buffers/docs/proto3#json.
    "uri": "A String", # The URI of the content. Only Cloud Storage URIs (e.g. `gs://bucket-name/path/to/file`) are supported. The maximum file size is 2.5 MB for text-based formats, 200 MB for other formats.
  },
  "derivedStructData": { # Output only. This field is OUTPUT_ONLY. It contains derived data that are not in the original input document.
    "a_key": "", # Properties of the object.
  },
  "id": "A String", # Immutable. The identifier of the document. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters.
  "indexStatus": { # Index status of the document. # Output only. The index status of the document. * If document is indexed successfully, the index_time field is populated. * Otherwise, if document is not indexed due to errors, the error_samples field is populated. * Otherwise, if document's index is in progress, the pending_message field is populated.
    "errorSamples": [ # A sample of errors encountered while indexing the document. If this field is populated, the document is not indexed due to errors.
      { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors).
        "code": 42, # The status code, which should be an enum value of google.rpc.Code.
        "details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
          {
            "a_key": "", # Properties of the object. Contains field @type with type URL.
          },
        ],
        "message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
      },
    ],
    "indexTime": "A String", # The time when the document was indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours.
    "pendingMessage": "A String", # Immutable. The message indicates the document index is in progress. If this field is populated, the document index is pending.
  },
  "indexTime": "A String", # Output only. The time when the document was last indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours. If this field is not populated, it means the document has never been indexed.
  "jsonData": "A String", # The JSON string representation of the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
  "name": "A String", # Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters.
  "parentDocumentId": "A String", # The identifier of the parent document. Currently supports at most two level document hierarchy. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 63 characters.
  "schemaId": "A String", # The identifier of the schema located in the same data store.
  "structData": { # The structured JSON data for the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
    "a_key": "", # Properties of the object.
  },
}

delete(name, x__xgafv=None)

Deletes a Document.

Args:
  name: string, Required. Full resource name of Document, such as `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document}`. If the caller does not have permission to delete the Document, regardless of whether or not it exists, a `PERMISSION_DENIED` error is returned. If the Document to delete does not exist, a `NOT_FOUND` error is returned. (required)
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # A generic empty message that you can re-use to avoid defining duplicated empty messages in your APIs. A typical example is to use it as the request or the response type of an API method. For instance: service Foo { rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty); }
}

get(name, x__xgafv=None)

Gets a Document.

Args:
  name: string, Required. Full resource name of Document, such as `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document}`. If the caller does not have permission to access the Document, regardless of whether or not it exists, a `PERMISSION_DENIED` error is returned. If the requested Document does not exist, a `NOT_FOUND` error is returned. (required)
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Document captures all raw metadata information of items to be recommended or searched.
  "aclInfo": { # ACL Information of the Document. # Access control information for the document.
    "readers": [ # Readers of the document.
      { # AclRestriction to model complex inheritance restrictions. Example: Modeling a "Both Permit" inheritance, where to access a child document, user needs to have access to parent document. Document Hierarchy - Space_S --> Page_P. Readers: Space_S: group_1, user_1 Page_P: group_2, group_3, user_2 Space_S ACL Restriction - { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ] } ] } } Page_P ACL Restriction. { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_2" }, { "group_id": "group_3" }, { "user_id": "user_2" } ], }, { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ], } ] } }
        "idpWide": True or False, # All users within the Identity Provider.
        "principals": [ # List of principals.
          { # Principal identifier of a user or a group.
            "externalEntityId": "A String", # For 3P application identities which are not present in the customer identity provider.
            "groupId": "A String", # Group identifier. For Google Workspace user account, group_id should be the google workspace group email. For non-google identity provider user account, group_id is the mapped group identifier configured during the workforcepool config.
            "userId": "A String", # User identifier. For Google Workspace user account, user_id should be the google workspace user email. For non-google identity provider user account, user_id is the mapped user identifier configured during the workforcepool config.
          },
        ],
      },
    ],
  },
  "content": { # Unstructured data linked to this document. # The unstructured data linked to this document. Content can only be set and must be set if this document is under a `CONTENT_REQUIRED` data store.
    "mimeType": "A String", # The MIME type of the content. Supported types: * `application/pdf` (PDF, only native PDFs are supported for now) * `text/html` (HTML) * `text/plain` (TXT) * `application/xml` or `text/xml` (XML) * `application/json` (JSON) * `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX) * `application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX) * `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX) * `application/vnd.ms-excel.sheet.macroenabled.12` (XLSM) The following types are supported only if layout parser is enabled in the data store: * `image/bmp` (BMP) * `image/gif` (GIF) * `image/jpeg` (JPEG) * `image/png` (PNG) * `image/tiff` (TIFF) See https://www.iana.org/assignments/media-types/media-types.xhtml.
    "rawBytes": "A String", # The content represented as a stream of bytes. The maximum length is 1,000,000 bytes (1 MB / ~0.95 MiB). Note: As with all `bytes` fields, this field is represented as pure binary in Protocol Buffers and base64-encoded string in JSON. For example, `abc123!?$*&()'-=@~` should be represented as `YWJjMTIzIT8kKiYoKSctPUB+` in JSON. See https://developers.google.com/protocol-buffers/docs/proto3#json.
    "uri": "A String", # The URI of the content. Only Cloud Storage URIs (e.g. `gs://bucket-name/path/to/file`) are supported. The maximum file size is 2.5 MB for text-based formats, 200 MB for other formats.
  },
  "derivedStructData": { # Output only. This field is OUTPUT_ONLY. It contains derived data that are not in the original input document.
    "a_key": "", # Properties of the object.
  },
  "id": "A String", # Immutable. The identifier of the document. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters.
  "indexStatus": { # Index status of the document. # Output only. The index status of the document. * If document is indexed successfully, the index_time field is populated. * Otherwise, if document is not indexed due to errors, the error_samples field is populated. * Otherwise, if document's index is in progress, the pending_message field is populated.
    "errorSamples": [ # A sample of errors encountered while indexing the document. If this field is populated, the document is not indexed due to errors.
      { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors).
        "code": 42, # The status code, which should be an enum value of google.rpc.Code.
        "details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
          {
            "a_key": "", # Properties of the object. Contains field @type with type URL.
          },
        ],
        "message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
      },
    ],
    "indexTime": "A String", # The time when the document was indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours.
    "pendingMessage": "A String", # Immutable. The message indicates the document index is in progress. If this field is populated, the document index is pending.
  },
  "indexTime": "A String", # Output only. The time when the document was last indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours. If this field is not populated, it means the document has never been indexed.
  "jsonData": "A String", # The JSON string representation of the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
  "name": "A String", # Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters.
  "parentDocumentId": "A String", # The identifier of the parent document. Currently supports at most two level document hierarchy. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 63 characters.
  "schemaId": "A String", # The identifier of the schema located in the same data store.
  "structData": { # The structured JSON data for the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
    "a_key": "", # Properties of the object.
  },
}

getProcessedDocument(name, imageId=None, processedDocumentFormat=None, processedDocumentType=None, x__xgafv=None)

Gets the parsed layout information for a Document.

Args:
  name: string, Required. Full resource name of Document, such as `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document}`. If the caller does not have permission to access the Document, regardless of whether or not it exists, a `PERMISSION_DENIED` error is returned. If the requested Document does not exist, a `NOT_FOUND` error is returned. (required)
  imageId: string, Optional. Specifies config for IMAGE_BYTES.
  processedDocumentFormat: string, What format output should be. If unspecified, defaults to JSON.
    Allowed values
      PROCESSED_DOCUMENT_FORMAT_UNSPECIFIED - Default value.
      JSON - Output format is a JSON string representation of processed document.
  processedDocumentType: string, Required. What type of processing to return.
    Allowed values
      PROCESSED_DOCUMENT_TYPE_UNSPECIFIED - Default value.
      PARSED_DOCUMENT - Available for all data store parsing configs.
      CHUNKED_DOCUMENT - Only available if ChunkingConfig is enabled on the data store.
      IMAGE_CONVERTED_DOCUMENT - Returns the converted Image bytes (as JPEG or PNG) if available.
      IMAGE_BYTES - Return image bytes in base64 encoded format if image_id of a document is provided, only supported for enabling shareholder-structure in layout parsing config for now.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Document captures all raw metadata information of items to be recommended or searched.
  "document": "A String", # Required. Full resource name of the referenced document, in the format `projects/*/locations/*/collections/*/dataStores/*/branches/*/documents/*`.
  "jsonData": "A String", # The JSON string representation of the processed document.
}

import_(parent, body=None, x__xgafv=None)

Bulk import of multiple Documents. Request processing may be synchronous. Non-existing items are created. Note: It is possible for a subset of the Documents to be successfully updated.

Args:
parent: string, Required. The parent branch resource name, such as `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`. Requires create/update permission. (required)
body: object, The request body.
The object takes the form of:

{ # Request message for Import methods.
"alloyDbSource": { # AlloyDB source import data from. # AlloyDB input source.
"clusterId": "A String", # Required. The AlloyDB cluster to copy the data from with a length limit of 256 characters.
"databaseId": "A String", # Required. The AlloyDB database to copy the data from with a length limit of 256 characters.
"gcsStagingDir": "A String", # Intermediate Cloud Storage directory used for the import with a length limit of 2,000 characters. Can be specified if one wants to have the AlloyDB export to a specific Cloud Storage directory. Ensure that the AlloyDB service account has the necessary Cloud Storage Admin permissions to access the specified Cloud Storage directory.
"locationId": "A String", # Required. The AlloyDB location to copy the data from with a length limit of 256 characters.
"projectId": "A String", # The project ID that contains the AlloyDB source. Has a length limit of 128 characters. If not specified, inherits the project ID from the parent request.
"tableId": "A String", # Required. The AlloyDB table to copy the data from with a length limit of 256 characters.
},
"autoGenerateIds": True or False, # Whether to automatically generate IDs for the documents if absent. If set to `true`, Document.ids are automatically generated based on the hash of the payload, where IDs may not be consistent during multiple imports. In which case ReconciliationMode.FULL is highly recommended to avoid duplicate contents. If unset or set to `false`, Document.ids have to be specified using id_field, otherwise, documents without IDs fail to be imported. Supported data sources: * GcsSource. GcsSource.data_schema must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. * BigQuerySource. BigQuerySource.data_schema must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. * SpannerSource. * CloudSqlSource. * FirestoreSource. * BigtableSource.
"bigquerySource": { # BigQuery source import data from. # BigQuery input source.
"dataSchema": "A String", # The schema to use when parsing the data from the source. Supported values for user event imports: * `user_event` (default): One UserEvent per row. Supported values for document imports: * `document` (default): One Document format per row. Each document must have a valid Document.id and one of Document.json_data or Document.struct_data. * `custom`: One custom data per row in arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical.
"datasetId": "A String", # Required. The BigQuery data set to copy the data from with a length limit of 1,024 characters.
"gcsStagingDir": "A String", # Intermediate Cloud Storage directory used for the import with a length limit of 2,000 characters. Can be specified if one wants to have the BigQuery export to a specific Cloud Storage directory.
"partitionDate": { # Represents a whole or partial calendar date, such as a birthday. The time of day and time zone are either specified elsewhere or are insignificant. The date is relative to the Gregorian Calendar. This can represent one of the following: * A full date, with non-zero year, month, and day values. * A month and day, with a zero year (for example, an anniversary). * A year on its own, with a zero month and a zero day. * A year and month, with a zero day (for example, a credit card expiration date). Related types: * google.type.TimeOfDay * google.type.DateTime * google.protobuf.Timestamp # BigQuery time partitioned table's _PARTITIONDATE in YYYY-MM-DD format.
"day": 42, # Day of a month. Must be from 1 to 31 and valid for the year and month, or 0 to specify a year by itself or a year and month where the day isn't significant.
"month": 42, # Month of a year. Must be from 1 to 12, or 0 to specify a year without a month and day.
"year": 42, # Year of the date. Must be from 1 to 9999, or 0 to specify a date without a year.
},
"projectId": "A String", # The project ID or the project number that contains the BigQuery source. Has a length limit of 128 characters. If not specified, inherits the project ID from the parent request.
"tableId": "A String", # Required. The BigQuery table to copy the data from with a length limit of 1,024 characters.
},
"bigtableSource": { # The Cloud Bigtable source for importing data. # Cloud Bigtable input source.
"bigtableOptions": { # The Bigtable Options object that contains information to support the import. # Required. Bigtable options that contains information needed when parsing data into typed structures. For example, column type annotations.
"families": { # The mapping from family names to an object that contains column families level information for the given column family. If a family is not present in this map it will be ignored.
"a_key": { # The column family of the Bigtable.
"columns": [ # The list of objects that contains column level information for each column. If a column is not present in this list it will be ignored.
{ # The column of the Bigtable.
"encoding": "A String", # The encoding mode of the values when the type is not `STRING`. Acceptable encoding values are: * `TEXT`: indicates values are alphanumeric text strings. * `BINARY`: indicates values are encoded using `HBase Bytes.toBytes` family of functions. This can be overridden for a specific column by listing that column in `columns` and specifying an encoding for it.
"fieldName": "A String", # The field name to use for this column in the document. The name has to match the pattern `a-zA-Z0-9*`. If not set, it is parsed from the qualifier bytes with best effort. However, due to different naming patterns, field name collisions could happen, where parsing behavior is undefined.
"qualifier": "A String", # Required. Qualifier of the column. If it cannot be decoded with utf-8, use a base-64 encoded string instead.
"type": "A String", # The type of values in this column family. The values are expected to be encoded using `HBase Bytes.toBytes` function when the encoding value is set to `BINARY`.
},
],
"encoding": "A String", # The encoding mode of the values when the type is not STRING. Acceptable encoding values are: * `TEXT`: indicates values are alphanumeric text strings. * `BINARY`: indicates values are encoded using `HBase Bytes.toBytes` family of functions. This can be overridden for a specific column by listing that column in `columns` and specifying an encoding for it.
"fieldName": "A String", # The field name to use for this column family in the document. The name has to match the pattern `a-zA-Z0-9*`. If not set, it is parsed from the family name with best effort. However, due to different naming patterns, field name collisions could happen, where parsing behavior is undefined.
"type": "A String", # The type of values in this column family. The values are expected to be encoded using `HBase Bytes.toBytes` function when the encoding value is set to `BINARY`.
},
},
"keyFieldName": "A String", # The field name used for saving row key value in the document. The name has to match the pattern `a-zA-Z0-9*`.
},
"instanceId": "A String", # Required. The instance ID of the Cloud Bigtable that needs to be imported.
"projectId": "A String", # The project ID that contains the Bigtable source. Has a length limit of 128 characters. If not specified, inherits the project ID from the parent request.
"tableId": "A String", # Required. The table ID of the Cloud Bigtable that needs to be imported.
},
"cloudSqlSource": { # Cloud SQL source import data from. # Cloud SQL input source.
"databaseId": "A String", # Required. The Cloud SQL database to copy the data from with a length limit of 256 characters.
"gcsStagingDir": "A String", # Intermediate Cloud Storage directory used for the import with a length limit of 2,000 characters. Can be specified if one wants to have the Cloud SQL export to a specific Cloud Storage directory. Ensure that the Cloud SQL service account has the necessary Cloud Storage Admin permissions to access the specified Cloud Storage directory.
"instanceId": "A String", # Required. The Cloud SQL instance to copy the data from with a length limit of 256 characters.
"offload": True or False, # Option for serverless export. Enabling this option will incur additional cost. More info can be found [here](https://cloud.google.com/sql/pricing#serverless).
"projectId": "A String", # The project ID that contains the Cloud SQL source. Has a length limit of 128 characters. If not specified, inherits the project ID from the parent request.
"tableId": "A String", # Required. The Cloud SQL table to copy the data from with a length limit of 256 characters.
},
"errorConfig": { # Configuration of destination for Import related errors. # The desired location of errors incurred during the Import.
"gcsPrefix": "A String", # Cloud Storage prefix for import errors. This must be an empty, existing Cloud Storage directory. Import errors are written to sharded files in this directory, one per line, as a JSON-encoded `google.rpc.Status` message.
},
"fhirStoreSource": { # Cloud FhirStore source import data from. # FhirStore input source.
"fhirStore": "A String", # Required. The full resource name of the FHIR store to import data from, in the format of `projects/{project}/locations/{location}/datasets/{dataset}/fhirStores/{fhir_store}`.
"gcsStagingDir": "A String", # Intermediate Cloud Storage directory used for the import with a length limit of 2,000 characters. Can be specified if one wants to have the FhirStore export to a specific Cloud Storage directory.
"resourceTypes": [ # The FHIR resource types to import. The resource types should be a subset of all [supported FHIR resource types](https://cloud.google.com/generative-ai-app-builder/docs/fhir-schema-reference#resource-level-specification). Default to all supported FHIR resource types if empty.
"A String",
],
"updateFromLatestPredefinedSchema": True or False, # Optional. Whether to update the DataStore schema to the latest predefined schema. If true, the DataStore schema will be updated to include any FHIR fields or resource types that have been added since the last import and corresponding FHIR resources will be imported from the FHIR store. Note this field cannot be used in conjunction with `resource_types`. It should be used after initial import.
},
"firestoreSource": { # Firestore source import data from. # Firestore input source.
"collectionId": "A String", # Required. The Firestore collection (or entity) to copy the data from with a length limit of 1,500 characters.
"databaseId": "A String", # Required. The Firestore database to copy the data from with a length limit of 256 characters.
"gcsStagingDir": "A String", # Intermediate Cloud Storage directory used for the import with a length limit of 2,000 characters. Can be specified if one wants to have the Firestore export to a specific Cloud Storage directory. Ensure that the Firestore service account has the necessary Cloud Storage Admin permissions to access the specified Cloud Storage directory.
"projectId": "A String", # The project ID that the Cloud SQL source is in with a length limit of 128 characters. If not specified, inherits the project ID from the parent request.
},
"forceRefreshContent": True or False, # Optional. Whether to force refresh the unstructured content of the documents. If set to `true`, the content part of the documents will be refreshed regardless of the update status of the referencing content.
"gcsSource": { # Cloud Storage location for input content. # Cloud Storage location for the input content.
"dataSchema": "A String", # The schema to use when parsing the data from the source. Supported values for document imports: * `document` (default): One JSON Document per line. Each document must have a valid Document.id. * `content`: Unstructured data (e.g. PDF, HTML). Each file matched by `input_uris` becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string. * `custom`: One custom data JSON per row in arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical. * `csv`: A CSV file with header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical. Supported values for user event imports: * `user_event` (default): One JSON UserEvent per line.
"inputUris": [ # Required. Cloud Storage URIs to input files. Each URI can be up to 2000 characters long. URIs can match the full object path (for example, `gs://bucket/directory/object.json`) or a pattern matching one or more files, such as `gs://bucket/directory/*.json`. A request can contain at most 100 files (or 100,000 files if `data_schema` is `content`). Each file can be up to 2 GB (or 100 MB if `data_schema` is `content`).
"A String",
],
},
"idField": "A String", # The field indicates the ID field or column to be used as unique IDs of the documents. For GcsSource it is the key of the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`. For others, it may be the column name of the table where the unique ids are stored. The values of the JSON field or the table column are used as the Document.ids. The JSON field or the table column must be of string type, and the values must be set as valid strings conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) with 1-63 characters. Otherwise, documents without valid IDs fail to be imported. Only set this field when auto_generate_ids is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown. If it is unset, a default value `_id` is used when importing from the allowed data sources. Supported data sources: * GcsSource. GcsSource.data_schema must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. * BigQuerySource. BigQuerySource.data_schema must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. * SpannerSource. * CloudSqlSource. * BigtableSource.
"inlineSource": { # The inline source for the input config for ImportDocuments method. # The Inline source for the input content for documents.
"documents": [ # Required. A list of documents to update/create. Each document must have a valid Document.id. Recommended max of 100 items.
{ # Document captures all raw metadata information of items to be recommended or searched.
"aclInfo": { # ACL Information of the Document. # Access control information for the document.
"readers": [ # Readers of the document.
{ # AclRestriction to model complex inheritance restrictions. Example: Modeling a "Both Permit" inheritance, where to access a child document, user needs to have access to parent document. Document Hierarchy - Space_S --> Page_P. Readers: Space_S: group_1, user_1 Page_P: group_2, group_3, user_2 Space_S ACL Restriction - { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ] } ] } } Page_P ACL Restriction. { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_2" }, { "group_id": "group_3" }, { "user_id": "user_2" } ], }, { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ], } ] } }
"idpWide": True or False, # All users within the Identity Provider.
"principals": [ # List of principals.
{ # Principal identifier of a user or a group.
"externalEntityId": "A String", # For 3P application identities which are not present in the customer identity provider.
"groupId": "A String", # Group identifier. For Google Workspace user account, group_id should be the google workspace group email. For non-google identity provider user account, group_id is the mapped group identifier configured during the workforcepool config.
"userId": "A String", # User identifier. For Google Workspace user account, user_id should be the google workspace user email. For non-google identity provider user account, user_id is the mapped user identifier configured during the workforcepool config.
},
],
},
],
},
"content": { # Unstructured data linked to this document. # The unstructured data linked to this document. Content can only be set and must be set if this document is under a `CONTENT_REQUIRED` data store.
"mimeType": "A String", # The MIME type of the content. Supported types: * `application/pdf` (PDF, only native PDFs are supported for now) * `text/html` (HTML) * `text/plain` (TXT) * `application/xml` or `text/xml` (XML) * `application/json` (JSON) * `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX) * `application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX) * `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX) * `application/vnd.ms-excel.sheet.macroenabled.12` (XLSM) The following types are supported only if layout parser is enabled in the data store: * `image/bmp` (BMP) * `image/gif` (GIF) * `image/jpeg` (JPEG) * `image/png` (PNG) * `image/tiff` (TIFF) See https://www.iana.org/assignments/media-types/media-types.xhtml.
"rawBytes": "A String", # The content represented as a stream of bytes. The maximum length is 1,000,000 bytes (1 MB / ~0.95 MiB). Note: As with all `bytes` fields, this field is represented as pure binary in Protocol Buffers and base64-encoded string in JSON. For example, `abc123!?$*&()'-=@~` should be represented as `YWJjMTIzIT8kKiYoKSctPUB+` in JSON. See https://developers.google.com/protocol-buffers/docs/proto3#json.
"uri": "A String", # The URI of the content. Only Cloud Storage URIs (e.g. `gs://bucket-name/path/to/file`) are supported. The maximum file size is 2.5 MB for text-based formats, 200 MB for other formats.
},
"derivedStructData": { # Output only. This field is OUTPUT_ONLY. It contains derived data that are not in the original input document.
"a_key": "", # Properties of the object.
},
"id": "A String", # Immutable. The identifier of the document. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters.
"indexStatus": { # Index status of the document. # Output only. The index status of the document. * If document is indexed successfully, the index_time field is populated. * Otherwise, if document is not indexed due to errors, the error_samples field is populated. * Otherwise, if document's index is in progress, the pending_message field is populated.
"errorSamples": [ # A sample of errors encountered while indexing the document. If this field is populated, the document is not indexed due to errors.
{ # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors).
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
],
"message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
},
],
"indexTime": "A String", # The time when the document was indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours.
"pendingMessage": "A String", # Immutable. The message indicates the document index is in progress. If this field is populated, the document index is pending.
},
"indexTime": "A String", # Output only. The time when the document was last indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours. If this field is not populated, it means the document has never been indexed.
"jsonData": "A String", # The JSON string representation of the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
"name": "A String", # Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters.
"parentDocumentId": "A String", # The identifier of the parent document. Currently supports at most two level document hierarchy. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 63 characters.
"schemaId": "A String", # The identifier of the schema located in the same data store.
"structData": { # The structured JSON data for the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
"a_key": "", # Properties of the object.
},
},
],
},
"reconciliationMode": "A String", # The mode of reconciliation between existing documents and the documents to be imported. Defaults to ReconciliationMode.INCREMENTAL.
"spannerSource": { # The Spanner source for importing data # Spanner input source.
"databaseId": "A String", # Required. The database ID of the source Spanner table.
"enableDataBoost": True or False, # Whether to apply data boost on Spanner export. Enabling this option will incur additional cost. More info can be found [here](https://cloud.google.com/spanner/docs/databoost/databoost-overview#billing_and_quotas).
"instanceId": "A String", # Required. The instance ID of the source Spanner table.
"projectId": "A String", # The project ID that contains the Spanner source. Has a length limit of 128 characters. If not specified, inherits the project ID from the parent request.
"tableId": "A String", # Required. The table name of the Spanner database that needs to be imported.
},
"updateMask": "A String", # Indicates which fields in the provided imported documents to update. If not set, the default is to update all fields.
}

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # This resource represents a long-running operation that is the result of a network API call.
"done": True or False, # If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
"error": { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors). # The error result of the operation in case of failure or cancellation.
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
],
"message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
},
"metadata": { # Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"name": "A String", # The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
"response": { # The normal, successful response of the operation. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name. For example, if the original method name is `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
}

list(parent, pageSize=None, pageToken=None, x__xgafv=None)

Gets a list of Documents.

Args:
  parent: string, Required. The parent branch resource name, such as `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`. Use `default_branch` as the branch ID, to list documents under the default branch. If the caller does not have permission to list Documents under this branch, regardless of whether or not this branch exists, a `PERMISSION_DENIED` error is returned. (required)
  pageSize: integer, Maximum number of Documents to return. If unspecified, defaults to 100. The maximum allowed value is 1000. Values above 1000 are set to 1000. If this field is negative, an `INVALID_ARGUMENT` error is returned.
  pageToken: string, A page token ListDocumentsResponse.next_page_token, received from a previous DocumentService.ListDocuments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to DocumentService.ListDocuments must match the call that provided the page token. Otherwise, an `INVALID_ARGUMENT` error is returned.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Response message for DocumentService.ListDocuments method.
  "documents": [ # The Documents.
    { # Document captures all raw metadata information of items to be recommended or searched.
      "aclInfo": { # ACL Information of the Document. # Access control information for the document.
        "readers": [ # Readers of the document.
          { # AclRestriction to model complex inheritance restrictions. Example: Modeling a "Both Permit" inheritance, where to access a child document, user needs to have access to parent document. Document Hierarchy - Space_S --> Page_P. Readers: Space_S: group_1, user_1 Page_P: group_2, group_3, user_2 Space_S ACL Restriction - { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ] } ] } } Page_P ACL Restriction. { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_2" }, { "group_id": "group_3" }, { "user_id": "user_2" } ], }, { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ], } ] } }
            "idpWide": True or False, # All users within the Identity Provider.
            "principals": [ # List of principals.
              { # Principal identifier of a user or a group.
                "externalEntityId": "A String", # For 3P application identities which are not present in the customer identity provider.
                "groupId": "A String", # Group identifier. For Google Workspace user account, group_id should be the google workspace group email. For non-google identity provider user account, group_id is the mapped group identifier configured during the workforcepool config.
                "userId": "A String", # User identifier. For Google Workspace user account, user_id should be the google workspace user email. For non-google identity provider user account, user_id is the mapped user identifier configured during the workforcepool config.
              },
            ],
          },
        ],
      },
      "content": { # Unstructured data linked to this document. # The unstructured data linked to this document. Content can only be set and must be set if this document is under a `CONTENT_REQUIRED` data store.
        "mimeType": "A String", # The MIME type of the content. Supported types: * `application/pdf` (PDF, only native PDFs are supported for now) * `text/html` (HTML) * `text/plain` (TXT) * `application/xml` or `text/xml` (XML) * `application/json` (JSON) * `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX) * `application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX) * `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX) * `application/vnd.ms-excel.sheet.macroenabled.12` (XLSM) The following types are supported only if layout parser is enabled in the data store: * `image/bmp` (BMP) * `image/gif` (GIF) * `image/jpeg` (JPEG) * `image/png` (PNG) * `image/tiff` (TIFF) See https://www.iana.org/assignments/media-types/media-types.xhtml.
        "rawBytes": "A String", # The content represented as a stream of bytes. The maximum length is 1,000,000 bytes (1 MB / ~0.95 MiB). Note: As with all `bytes` fields, this field is represented as pure binary in Protocol Buffers and base64-encoded string in JSON. For example, `abc123!?$*&()'-=@~` should be represented as `YWJjMTIzIT8kKiYoKSctPUB+` in JSON. See https://developers.google.com/protocol-buffers/docs/proto3#json.
        "uri": "A String", # The URI of the content. Only Cloud Storage URIs (e.g. `gs://bucket-name/path/to/file`) are supported. The maximum file size is 2.5 MB for text-based formats, 200 MB for other formats.
      },
      "derivedStructData": { # Output only. This field is OUTPUT_ONLY. It contains derived data that are not in the original input document.
        "a_key": "", # Properties of the object.
      },
      "id": "A String", # Immutable. The identifier of the document. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters.
      "indexStatus": { # Index status of the document. # Output only. The index status of the document. * If document is indexed successfully, the index_time field is populated. * Otherwise, if document is not indexed due to errors, the error_samples field is populated. * Otherwise, if document's index is in progress, the pending_message field is populated.
        "errorSamples": [ # A sample of errors encountered while indexing the document. If this field is populated, the document is not indexed due to errors.
          { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors).
            "code": 42, # The status code, which should be an enum value of google.rpc.Code.
            "details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
              {
                "a_key": "", # Properties of the object. Contains field @type with type URL.
              },
            ],
            "message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
          },
        ],
        "indexTime": "A String", # The time when the document was indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours.
        "pendingMessage": "A String", # Immutable. The message indicates the document index is in progress. If this field is populated, the document index is pending.
      },
      "indexTime": "A String", # Output only. The time when the document was last indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours. If this field is not populated, it means the document has never been indexed.
      "jsonData": "A String", # The JSON string representation of the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
      "name": "A String", # Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters.
      "parentDocumentId": "A String", # The identifier of the parent document. Currently supports at most two level document hierarchy. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 63 characters.
      "schemaId": "A String", # The identifier of the schema located in the same data store.
      "structData": { # The structured JSON data for the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
        "a_key": "", # Properties of the object.
      },
    },
  ],
  "nextPageToken": "A String", # A token that can be sent as ListDocumentsRequest.page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.
}

list_next()

Retrieves the next page of results.

        Args:
          previous_request: The request for the previous page. (required)
          previous_response: The response from the request for the previous page. (required)

        Returns:
          A request object that you can call 'execute()' on to request the next
          page. Returns None if there are no more items in the collection.

patch(name, allowMissing=None, body=None, updateMask=None, x__xgafv=None)

Updates a Document.

Args:
  name: string, Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters. (required)
  body: object, The request body.
    The object takes the form of:

{ # Document captures all raw metadata information of items to be recommended or searched.
  "aclInfo": { # ACL Information of the Document. # Access control information for the document.
    "readers": [ # Readers of the document.
      { # AclRestriction to model complex inheritance restrictions. Example: Modeling a "Both Permit" inheritance, where to access a child document, user needs to have access to parent document. Document Hierarchy - Space_S --> Page_P. Readers: Space_S: group_1, user_1 Page_P: group_2, group_3, user_2 Space_S ACL Restriction - { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ] } ] } } Page_P ACL Restriction. { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_2" }, { "group_id": "group_3" }, { "user_id": "user_2" } ], }, { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ], } ] } }
        "idpWide": True or False, # All users within the Identity Provider.
        "principals": [ # List of principals.
          { # Principal identifier of a user or a group.
            "externalEntityId": "A String", # For 3P application identities which are not present in the customer identity provider.
            "groupId": "A String", # Group identifier. For Google Workspace user account, group_id should be the google workspace group email. For non-google identity provider user account, group_id is the mapped group identifier configured during the workforcepool config.
            "userId": "A String", # User identifier. For Google Workspace user account, user_id should be the google workspace user email. For non-google identity provider user account, user_id is the mapped user identifier configured during the workforcepool config.
          },
        ],
      },
    ],
  },
  "content": { # Unstructured data linked to this document. # The unstructured data linked to this document. Content can only be set and must be set if this document is under a `CONTENT_REQUIRED` data store.
    "mimeType": "A String", # The MIME type of the content. Supported types: * `application/pdf` (PDF, only native PDFs are supported for now) * `text/html` (HTML) * `text/plain` (TXT) * `application/xml` or `text/xml` (XML) * `application/json` (JSON) * `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX) * `application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX) * `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX) * `application/vnd.ms-excel.sheet.macroenabled.12` (XLSM) The following types are supported only if layout parser is enabled in the data store: * `image/bmp` (BMP) * `image/gif` (GIF) * `image/jpeg` (JPEG) * `image/png` (PNG) * `image/tiff` (TIFF) See https://www.iana.org/assignments/media-types/media-types.xhtml.
    "rawBytes": "A String", # The content represented as a stream of bytes. The maximum length is 1,000,000 bytes (1 MB / ~0.95 MiB). Note: As with all `bytes` fields, this field is represented as pure binary in Protocol Buffers and base64-encoded string in JSON. For example, `abc123!?$*&()'-=@~` should be represented as `YWJjMTIzIT8kKiYoKSctPUB+` in JSON. See https://developers.google.com/protocol-buffers/docs/proto3#json.
    "uri": "A String", # The URI of the content. Only Cloud Storage URIs (e.g. `gs://bucket-name/path/to/file`) are supported. The maximum file size is 2.5 MB for text-based formats, 200 MB for other formats.
  },
  "derivedStructData": { # Output only. This field is OUTPUT_ONLY. It contains derived data that are not in the original input document.
    "a_key": "", # Properties of the object.
  },
  "id": "A String", # Immutable. The identifier of the document. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters.
  "indexStatus": { # Index status of the document. # Output only. The index status of the document. * If document is indexed successfully, the index_time field is populated. * Otherwise, if document is not indexed due to errors, the error_samples field is populated. * Otherwise, if document's index is in progress, the pending_message field is populated.
    "errorSamples": [ # A sample of errors encountered while indexing the document. If this field is populated, the document is not indexed due to errors.
      { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors).
        "code": 42, # The status code, which should be an enum value of google.rpc.Code.
        "details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
          {
            "a_key": "", # Properties of the object. Contains field @type with type URL.
          },
        ],
        "message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
      },
    ],
    "indexTime": "A String", # The time when the document was indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours.
    "pendingMessage": "A String", # Immutable. The message indicates the document index is in progress. If this field is populated, the document index is pending.
  },
  "indexTime": "A String", # Output only. The time when the document was last indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours. If this field is not populated, it means the document has never been indexed.
  "jsonData": "A String", # The JSON string representation of the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
  "name": "A String", # Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters.
  "parentDocumentId": "A String", # The identifier of the parent document. Currently supports at most two level document hierarchy. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 63 characters.
  "schemaId": "A String", # The identifier of the schema located in the same data store.
  "structData": { # The structured JSON data for the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
    "a_key": "", # Properties of the object.
  },
}

  allowMissing: boolean, If set to `true` and the Document is not found, a new Document is be created.
  updateMask: string, Indicates which fields in the provided imported 'document' to update. If not set, by default updates all fields.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Document captures all raw metadata information of items to be recommended or searched.
  "aclInfo": { # ACL Information of the Document. # Access control information for the document.
    "readers": [ # Readers of the document.
      { # AclRestriction to model complex inheritance restrictions. Example: Modeling a "Both Permit" inheritance, where to access a child document, user needs to have access to parent document. Document Hierarchy - Space_S --> Page_P. Readers: Space_S: group_1, user_1 Page_P: group_2, group_3, user_2 Space_S ACL Restriction - { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ] } ] } } Page_P ACL Restriction. { "acl_info": { "readers": [ { "principals": [ { "group_id": "group_2" }, { "group_id": "group_3" }, { "user_id": "user_2" } ], }, { "principals": [ { "group_id": "group_1" }, { "user_id": "user_1" } ], } ] } }
        "idpWide": True or False, # All users within the Identity Provider.
        "principals": [ # List of principals.
          { # Principal identifier of a user or a group.
            "externalEntityId": "A String", # For 3P application identities which are not present in the customer identity provider.
            "groupId": "A String", # Group identifier. For Google Workspace user account, group_id should be the google workspace group email. For non-google identity provider user account, group_id is the mapped group identifier configured during the workforcepool config.
            "userId": "A String", # User identifier. For Google Workspace user account, user_id should be the google workspace user email. For non-google identity provider user account, user_id is the mapped user identifier configured during the workforcepool config.
          },
        ],
      },
    ],
  },
  "content": { # Unstructured data linked to this document. # The unstructured data linked to this document. Content can only be set and must be set if this document is under a `CONTENT_REQUIRED` data store.
    "mimeType": "A String", # The MIME type of the content. Supported types: * `application/pdf` (PDF, only native PDFs are supported for now) * `text/html` (HTML) * `text/plain` (TXT) * `application/xml` or `text/xml` (XML) * `application/json` (JSON) * `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (DOCX) * `application/vnd.openxmlformats-officedocument.presentationml.presentation` (PPTX) * `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (XLSX) * `application/vnd.ms-excel.sheet.macroenabled.12` (XLSM) The following types are supported only if layout parser is enabled in the data store: * `image/bmp` (BMP) * `image/gif` (GIF) * `image/jpeg` (JPEG) * `image/png` (PNG) * `image/tiff` (TIFF) See https://www.iana.org/assignments/media-types/media-types.xhtml.
    "rawBytes": "A String", # The content represented as a stream of bytes. The maximum length is 1,000,000 bytes (1 MB / ~0.95 MiB). Note: As with all `bytes` fields, this field is represented as pure binary in Protocol Buffers and base64-encoded string in JSON. For example, `abc123!?$*&()'-=@~` should be represented as `YWJjMTIzIT8kKiYoKSctPUB+` in JSON. See https://developers.google.com/protocol-buffers/docs/proto3#json.
    "uri": "A String", # The URI of the content. Only Cloud Storage URIs (e.g. `gs://bucket-name/path/to/file`) are supported. The maximum file size is 2.5 MB for text-based formats, 200 MB for other formats.
  },
  "derivedStructData": { # Output only. This field is OUTPUT_ONLY. It contains derived data that are not in the original input document.
    "a_key": "", # Properties of the object.
  },
  "id": "A String", # Immutable. The identifier of the document. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 128 characters.
  "indexStatus": { # Index status of the document. # Output only. The index status of the document. * If document is indexed successfully, the index_time field is populated. * Otherwise, if document is not indexed due to errors, the error_samples field is populated. * Otherwise, if document's index is in progress, the pending_message field is populated.
    "errorSamples": [ # A sample of errors encountered while indexing the document. If this field is populated, the document is not indexed due to errors.
      { # The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by [gRPC](https://github.com/grpc). Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the [API Design Guide](https://cloud.google.com/apis/design/errors).
        "code": 42, # The status code, which should be an enum value of google.rpc.Code.
        "details": [ # A list of messages that carry the error details. There is a common set of message types for APIs to use.
          {
            "a_key": "", # Properties of the object. Contains field @type with type URL.
          },
        ],
        "message": "A String", # A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
      },
    ],
    "indexTime": "A String", # The time when the document was indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours.
    "pendingMessage": "A String", # Immutable. The message indicates the document index is in progress. If this field is populated, the document index is pending.
  },
  "indexTime": "A String", # Output only. The time when the document was last indexed. If this field is populated, it means the document has been indexed. While documents typically become searchable within seconds of indexing, it can sometimes take up to a few hours. If this field is not populated, it means the document has never been indexed.
  "jsonData": "A String", # The JSON string representation of the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
  "name": "A String", # Immutable. The full resource name of the document. Format: `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}/documents/{document_id}`. This field must be a UTF-8 encoded string with a length limit of 1024 characters.
  "parentDocumentId": "A String", # The identifier of the parent document. Currently supports at most two level document hierarchy. Id should conform to [RFC-1034](https://tools.ietf.org/html/rfc1034) standard with a length limit of 63 characters.
  "schemaId": "A String", # The identifier of the schema located in the same data store.
  "structData": { # The structured JSON data for the document. It should conform to the registered Schema or an `INVALID_ARGUMENT` error is thrown.
    "a_key": "", # Properties of the object.
  },
}

purge(parent, body=None, x__xgafv=None)

Args:
parent: string, Required. The parent resource name, such as `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`. (required)
body: object, The request body.
The object takes the form of:

{ # Request message for DocumentService.PurgeDocuments method.
"errorConfig": { # Configuration of destination for Purge related errors. # The desired location of errors incurred during the purge.
"gcsPrefix": "A String", # Cloud Storage prefix for purge errors. This must be an empty, existing Cloud Storage directory. Purge errors are written to sharded files in this directory, one per line, as a JSON-encoded `google.rpc.Status` message.
},
"filter": "A String", # Required. Filter matching documents to purge. Only currently supported value is `*` (all items).
"force": True or False, # Actually performs the purge. If `force` is set to false, return the expected purge count without deleting any documents.
"gcsSource": { # Cloud Storage location for input content. # Cloud Storage location for the input content. Supported `data_schema`: * `document_id`: One valid Document.id per line.
"dataSchema": "A String", # The schema to use when parsing the data from the source. Supported values for document imports: * `document` (default): One JSON Document per line. Each document must have a valid Document.id. * `content`: Unstructured data (e.g. PDF, HTML). Each file matched by `input_uris` becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string. * `custom`: One custom data JSON per row in arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical. * `csv`: A CSV file with header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical. Supported values for user event imports: * `user_event` (default): One JSON UserEvent per line.
"inputUris": [ # Required. Cloud Storage URIs to input files. Each URI can be up to 2000 characters long. URIs can match the full object path (for example, `gs://bucket/directory/object.json`) or a pattern matching one or more files, such as `gs://bucket/directory/*.json`. A request can contain at most 100 files (or 100,000 files if `data_schema` is `content`). Each file can be up to 2 GB (or 100 MB if `data_schema` is `content`).
"A String",
],
},
"inlineSource": { # The inline source for the input config for DocumentService.PurgeDocuments method. # Inline source for the input content for purge.
"documents": [ # Required. A list of full resource name of documents to purge. In the format `projects/*/locations/*/collections/*/dataStores/*/branches/*/documents/*`. Recommended max of 100 items.
"A String",
],
},
}

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

Discovery Engine API . projects . locations . collections . dataStores . branches . documents

Instance Methods

Method Details