Client for Google Cloud Dataproc API#

class google.cloud.dataproc_v1.ClusterControllerClient(transport=None, channel=None, credentials=None, client_config=None, client_info=None, client_options=None)[source]#

The ClusterControllerService provides methods to manage clusters of Compute Engine instances.

Constructor.

Parameters
  • (Union[ClusterControllerGrpcTransport, (transport) – Callable[[~.Credentials, type], ~.ClusterControllerGrpcTransport]): A transport instance, responsible for actually making the API calls. The default transport uses the gRPC protocol. This argument may also be a callable which returns a transport instance. Callables will be sent the credentials as the first argument and the default transport class as the second argument.

  • channel (grpc.Channel) – DEPRECATED. A Channel instance through which to make calls. This argument is mutually exclusive with credentials; providing both will raise an exception.

  • credentials (google.auth.credentials.Credentials) – The authorization credentials to attach to requests. These credentials identify this application to the service. If none are specified, the client will attempt to ascertain the credentials from the environment. This argument is mutually exclusive with providing a transport instance to transport; doing so will raise an exception.

  • client_config (dict) – DEPRECATED. A dictionary of call options for each method. If not specified, the default configuration is used.

  • client_info (google.api_core.gapic_v1.client_info.ClientInfo) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own client library.

  • client_options (Union[dict, google.api_core.client_options.ClientOptions]) – Client options used to set user options on the client. API Endpoint should be set through client_options.

create_cluster(project_id, region, cluster, request_id=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Creates a cluster in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.ClusterControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `cluster`:
>>> cluster = {}
>>>
>>> response = client.create_cluster(project_id, region, cluster)
>>>
>>> def callback(operation_future):
...     # Handle result.
...     result = operation_future.result()
>>>
>>> response.add_done_callback(callback)
>>>
>>> # Handle metadata.
>>> metadata = response.metadata()
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the cluster belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • cluster (Union[dict, Cluster]) –

    Required. The cluster to create.

    If a dict is provided, it must be of the same form as the protobuf message Cluster

  • request_id (str) –

    Optional. A unique id used to identify the request. If the server receives two CreateClusterRequest requests with the same id, then the second request will be ignored and the first google.longrunning.Operation created and stored in the backend is returned.

    It is recommended to always set this value to a UUID.

    The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A _OperationFuture instance.

Raises
delete_cluster(project_id, region, cluster_name, cluster_uuid=None, request_id=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Deletes a cluster in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.ClusterControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `cluster_name`:
>>> cluster_name = ''
>>>
>>> response = client.delete_cluster(project_id, region, cluster_name)
>>>
>>> def callback(operation_future):
...     # Handle result.
...     result = operation_future.result()
>>>
>>> response.add_done_callback(callback)
>>>
>>> # Handle metadata.
>>> metadata = response.metadata()
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the cluster belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • cluster_name (str) – Required. The cluster name.

  • cluster_uuid (str) – Optional. Specifying the cluster_uuid means the RPC should fail (with error NOT_FOUND) if cluster with specified UUID does not exist.

  • request_id (str) –

    Optional. A unique id used to identify the request. If the server receives two DeleteClusterRequest requests with the same id, then the second request will be ignored and the first google.longrunning.Operation created and stored in the backend is returned.

    It is recommended to always set this value to a UUID.

    The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A _OperationFuture instance.

Raises
diagnose_cluster(project_id, region, cluster_name, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Gets cluster diagnostic information. After the operation completes, the Operation.response field contains DiagnoseClusterOutputLocation.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.ClusterControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `cluster_name`:
>>> cluster_name = ''
>>>
>>> response = client.diagnose_cluster(project_id, region, cluster_name)
>>>
>>> def callback(operation_future):
...     # Handle result.
...     result = operation_future.result()
>>>
>>> response.add_done_callback(callback)
>>>
>>> # Handle metadata.
>>> metadata = response.metadata()
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the cluster belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • cluster_name (str) – Required. The cluster name.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A _OperationFuture instance.

Raises
enums = <module 'google.cloud.dataproc_v1.gapic.enums' from '/var/code/gcp/dataproc/google/cloud/dataproc_v1/gapic/enums.py'>#
classmethod from_service_account_file(filename, *args, **kwargs)[source]#

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

ClusterControllerClient

classmethod from_service_account_json(filename, *args, **kwargs)#

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

ClusterControllerClient

get_cluster(project_id, region, cluster_name, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Gets the resource representation for a cluster in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.ClusterControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `cluster_name`:
>>> cluster_name = ''
>>>
>>> response = client.get_cluster(project_id, region, cluster_name)
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the cluster belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • cluster_name (str) – Required. The cluster name.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A Cluster instance.

Raises
list_clusters(project_id, region, filter_=None, page_size=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Lists all regions/{region}/clusters in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.ClusterControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # Iterate over all results
>>> for element in client.list_clusters(project_id, region):
...     # process element
...     pass
>>>
>>>
>>> # Alternatively:
>>>
>>> # Iterate over results one page at a time
>>> for page in client.list_clusters(project_id, region).pages:
...     for element in page:
...         # process element
...         pass
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the cluster belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • filter_ (str) –

    Optional. A filter constraining the clusters to list. Filters are case-sensitive and have the following syntax:

    field = value [AND [field = value]] …

    where field is one of status.state, clusterName, or labels.[KEY], and [KEY] is a label key. value can be * to match all values. status.state can be one of the following: ACTIVE, INACTIVE, CREATING, RUNNING, ERROR, DELETING, or UPDATING. ACTIVE contains the CREATING, UPDATING, and RUNNING states. INACTIVE contains the DELETING and ERROR states. clusterName is the name of the cluster provided at creation time. Only the logical AND operator is supported; space-separated items are treated as having an implicit AND operator.

    Example filter:

    status.state = ACTIVE AND clusterName = mycluster AND labels.env = staging AND labels.starred = *

  • page_size (int) – The maximum number of resources contained in the underlying API response. If page streaming is performed per- resource, this parameter does not affect the return value. If page streaming is performed per-page, this determines the maximum number of resources in a page.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A PageIterator instance. An iterable of Cluster instances. You can also iterate over the pages of the response using its pages property.

Raises
update_cluster(project_id, region, cluster_name, cluster, update_mask, graceful_decommission_timeout=None, request_id=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Updates a cluster in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.ClusterControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `cluster_name`:
>>> cluster_name = ''
>>>
>>> # TODO: Initialize `cluster`:
>>> cluster = {}
>>>
>>> # TODO: Initialize `update_mask`:
>>> update_mask = {}
>>>
>>> response = client.update_cluster(project_id, region, cluster_name, cluster, update_mask)
>>>
>>> def callback(operation_future):
...     # Handle result.
...     result = operation_future.result()
>>>
>>> response.add_done_callback(callback)
>>>
>>> # Handle metadata.
>>> metadata = response.metadata()
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project the cluster belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • cluster_name (str) – Required. The cluster name.

  • cluster (Union[dict, Cluster]) –

    Required. The changes to the cluster.

    If a dict is provided, it must be of the same form as the protobuf message Cluster

  • update_mask (Union[dict, FieldMask]) –

    Required. Specifies the path, relative to Cluster, of the field to update. For example, to change the number of workers in a cluster to 5, the update_mask parameter would be specified as config.worker_config.num_instances, and the PATCH request body would specify the new value, as follows:

    {
      "config":{
        "workerConfig":{
          "numInstances":"5"
        }
      }
    }
    

    Similarly, to change the number of preemptible workers in a cluster to 5, the update_mask parameter would be config.secondary_worker_config.num_instances, and the PATCH request body would be set as follows:

    {
      "config":{
        "secondaryWorkerConfig":{
          "numInstances":"5"
        }
      }
    }
    

    Note: Currently, only the following fields can be updated:

    Mask Purpose
    labels Update labels
    config.worker_config.num_instances Resize primary worker group
    config.secondary_worker_config.num_instances Resize secondary worker group

    If a dict is provided, it must be of the same form as the protobuf message FieldMask

  • graceful_decommission_timeout (Union[dict, Duration]) –

    Optional. Timeout for graceful YARN decomissioning. Graceful decommissioning allows removing nodes from the cluster without interrupting jobs in progress. Timeout specifies how long to wait for jobs in progress to finish before forcefully removing nodes (and potentially interrupting jobs). Default timeout is 0 (for forceful decommission), and the maximum allowed timeout is 1 day.

    Only supported on Dataproc image versions 1.2 and higher.

    If a dict is provided, it must be of the same form as the protobuf message Duration

  • request_id (str) –

    Optional. A unique id used to identify the request. If the server receives two UpdateClusterRequest requests with the same id, then the second request will be ignored and the first google.longrunning.Operation created and stored in the backend is returned.

    It is recommended to always set this value to a UUID.

    The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A _OperationFuture instance.

Raises
class google.cloud.dataproc_v1.JobControllerClient(transport=None, channel=None, credentials=None, client_config=None, client_info=None, client_options=None)[source]#

The JobController provides methods to manage jobs.

Constructor.

Parameters
  • (Union[JobControllerGrpcTransport, (transport) – Callable[[~.Credentials, type], ~.JobControllerGrpcTransport]): A transport instance, responsible for actually making the API calls. The default transport uses the gRPC protocol. This argument may also be a callable which returns a transport instance. Callables will be sent the credentials as the first argument and the default transport class as the second argument.

  • channel (grpc.Channel) – DEPRECATED. A Channel instance through which to make calls. This argument is mutually exclusive with credentials; providing both will raise an exception.

  • credentials (google.auth.credentials.Credentials) – The authorization credentials to attach to requests. These credentials identify this application to the service. If none are specified, the client will attempt to ascertain the credentials from the environment. This argument is mutually exclusive with providing a transport instance to transport; doing so will raise an exception.

  • client_config (dict) – DEPRECATED. A dictionary of call options for each method. If not specified, the default configuration is used.

  • client_info (google.api_core.gapic_v1.client_info.ClientInfo) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own client library.

  • client_options (Union[dict, google.api_core.client_options.ClientOptions]) – Client options used to set user options on the client. API Endpoint should be set through client_options.

cancel_job(project_id, region, job_id, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Starts a job cancellation request. To access the job resource after cancellation, call regions/{region}/jobs.list or regions/{region}/jobs.get.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.JobControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `job_id`:
>>> job_id = ''
>>>
>>> response = client.cancel_job(project_id, region, job_id)
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the job belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • job_id (str) – Required. The job ID.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A Job instance.

Raises
delete_job(project_id, region, job_id, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Deletes the job from the project. If the job is active, the delete fails, and the response returns FAILED_PRECONDITION.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.JobControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `job_id`:
>>> job_id = ''
>>>
>>> client.delete_job(project_id, region, job_id)
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the job belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • job_id (str) – Required. The job ID.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Raises
enums = <module 'google.cloud.dataproc_v1.gapic.enums' from '/var/code/gcp/dataproc/google/cloud/dataproc_v1/gapic/enums.py'>#
classmethod from_service_account_file(filename, *args, **kwargs)[source]#

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

JobControllerClient

classmethod from_service_account_json(filename, *args, **kwargs)#

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

JobControllerClient

get_job(project_id, region, job_id, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Gets the resource representation for a job in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.JobControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `job_id`:
>>> job_id = ''
>>>
>>> response = client.get_job(project_id, region, job_id)
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the job belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • job_id (str) – Required. The job ID.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A Job instance.

Raises
list_jobs(project_id, region, page_size=None, cluster_name=None, job_state_matcher=None, filter_=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Lists regions/{region}/jobs in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.JobControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # Iterate over all results
>>> for element in client.list_jobs(project_id, region):
...     # process element
...     pass
>>>
>>>
>>> # Alternatively:
>>>
>>> # Iterate over results one page at a time
>>> for page in client.list_jobs(project_id, region).pages:
...     for element in page:
...         # process element
...         pass
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the job belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • page_size (int) – The maximum number of resources contained in the underlying API response. If page streaming is performed per- resource, this parameter does not affect the return value. If page streaming is performed per-page, this determines the maximum number of resources in a page.

  • cluster_name (str) – Optional. If set, the returned jobs list includes only jobs that were submitted to the named cluster.

  • job_state_matcher (JobStateMatcher) –

    Optional. Specifies enumerated categories of jobs to list. (default = match ALL jobs).

    If filter is provided, jobStateMatcher will be ignored.

  • filter_ (str) –

    Optional. A filter constraining the jobs to list. Filters are case-sensitive and have the following syntax:

    [field = value] AND [field [= value]] …

    where field is status.state or labels.[KEY], and [KEY] is a label key. value can be * to match all values. status.state can be either ACTIVE or NON_ACTIVE. Only the logical AND operator is supported; space-separated items are treated as having an implicit AND operator.

    Example filter:

    status.state = ACTIVE AND labels.env = staging AND labels.starred = *

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A PageIterator instance. An iterable of Job instances. You can also iterate over the pages of the response using its pages property.

Raises
submit_job(project_id, region, job, request_id=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Submits a job to a cluster.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.JobControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `job`:
>>> job = {}
>>>
>>> response = client.submit_job(project_id, region, job)
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the job belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • job (Union[dict, Job]) –

    Required. The job resource.

    If a dict is provided, it must be of the same form as the protobuf message Job

  • request_id (str) –

    Optional. A unique id used to identify the request. If the server receives two SubmitJobRequest requests with the same id, then the second request will be ignored and the first Job created and stored in the backend is returned.

    It is recommended to always set this value to a UUID.

    The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A Job instance.

Raises
update_job(project_id, region, job_id, job, update_mask, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Updates a job in a project.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.JobControllerClient()
>>>
>>> # TODO: Initialize `project_id`:
>>> project_id = ''
>>>
>>> # TODO: Initialize `region`:
>>> region = ''
>>>
>>> # TODO: Initialize `job_id`:
>>> job_id = ''
>>>
>>> # TODO: Initialize `job`:
>>> job = {}
>>>
>>> # TODO: Initialize `update_mask`:
>>> update_mask = {}
>>>
>>> response = client.update_job(project_id, region, job_id, job, update_mask)
Parameters
  • project_id (str) – Required. The ID of the Google Cloud Platform project that the job belongs to.

  • region (str) – Required. The Cloud Dataproc region in which to handle the request.

  • job_id (str) – Required. The job ID.

  • job (Union[dict, Job]) –

    Required. The changes to the job.

    If a dict is provided, it must be of the same form as the protobuf message Job

  • update_mask (Union[dict, FieldMask]) –

    Required. Specifies the path, relative to Job, of the field to update. For example, to update the labels of a Job the update_mask parameter would be specified as labels, and the PATCH request body would specify the new value. Note: Currently, labels is the only field that can be updated.

    If a dict is provided, it must be of the same form as the protobuf message FieldMask

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A Job instance.

Raises
class google.cloud.dataproc_v1.WorkflowTemplateServiceClient(transport=None, channel=None, credentials=None, client_config=None, client_info=None, client_options=None)[source]#

The API interface for managing Workflow Templates in the Cloud Dataproc API.

Constructor.

Parameters
  • (Union[WorkflowTemplateServiceGrpcTransport, (transport) – Callable[[~.Credentials, type], ~.WorkflowTemplateServiceGrpcTransport]): A transport instance, responsible for actually making the API calls. The default transport uses the gRPC protocol. This argument may also be a callable which returns a transport instance. Callables will be sent the credentials as the first argument and the default transport class as the second argument.

  • channel (grpc.Channel) – DEPRECATED. A Channel instance through which to make calls. This argument is mutually exclusive with credentials; providing both will raise an exception.

  • credentials (google.auth.credentials.Credentials) – The authorization credentials to attach to requests. These credentials identify this application to the service. If none are specified, the client will attempt to ascertain the credentials from the environment. This argument is mutually exclusive with providing a transport instance to transport; doing so will raise an exception.

  • client_config (dict) – DEPRECATED. A dictionary of call options for each method. If not specified, the default configuration is used.

  • client_info (google.api_core.gapic_v1.client_info.ClientInfo) – The client info used to send a user-agent string along with API requests. If None, then default info will be used. Generally, you only need to set this if you’re developing your own client library.

  • client_options (Union[dict, google.api_core.client_options.ClientOptions]) – Client options used to set user options on the client. API Endpoint should be set through client_options.

create_workflow_template(parent, template, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Creates new workflow template.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.WorkflowTemplateServiceClient()
>>>
>>> parent = client.region_path('[PROJECT]', '[REGION]')
>>>
>>> # TODO: Initialize `template`:
>>> template = {}
>>>
>>> response = client.create_workflow_template(parent, template)
Parameters
  • parent (str) – Required. The “resource name” of the region, as described in https://cloud.google.com/apis/design/resource_names of the form projects/{project_id}/regions/{region}

  • template (Union[dict, WorkflowTemplate]) –

    Required. The Dataproc workflow template to create.

    If a dict is provided, it must be of the same form as the protobuf message WorkflowTemplate

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A WorkflowTemplate instance.

Raises
delete_workflow_template(name, version=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Deletes a workflow template. It does not cancel in-progress workflows.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.WorkflowTemplateServiceClient()
>>>
>>> name = client.workflow_template_path('[PROJECT]', '[REGION]', '[WORKFLOW_TEMPLATE]')
>>>
>>> client.delete_workflow_template(name)
Parameters
  • name (str) – Required. The “resource name” of the workflow template, as described in https://cloud.google.com/apis/design/resource_names of the form projects/{project_id}/regions/{region}/workflowTemplates/{template_id}

  • version (int) – Optional. The version of workflow template to delete. If specified, will only delete the template if the current server version matches specified version.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Raises
enums = <module 'google.cloud.dataproc_v1.gapic.enums' from '/var/code/gcp/dataproc/google/cloud/dataproc_v1/gapic/enums.py'>#
classmethod from_service_account_file(filename, *args, **kwargs)[source]#

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

WorkflowTemplateServiceClient

classmethod from_service_account_json(filename, *args, **kwargs)#

Creates an instance of this client using the provided credentials file.

Parameters
  • filename (str) – The path to the service account private key json file.

  • args – Additional arguments to pass to the constructor.

  • kwargs – Additional arguments to pass to the constructor.

Returns

The constructed client.

Return type

WorkflowTemplateServiceClient

get_workflow_template(name, version=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Retrieves the latest workflow template.

Can retrieve previously instantiated template by specifying optional version parameter.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.WorkflowTemplateServiceClient()
>>>
>>> name = client.workflow_template_path('[PROJECT]', '[REGION]', '[WORKFLOW_TEMPLATE]')
>>>
>>> response = client.get_workflow_template(name)
Parameters
  • name (str) – Required. The “resource name” of the workflow template, as described in https://cloud.google.com/apis/design/resource_names of the form projects/{project_id}/regions/{region}/workflowTemplates/{template_id}

  • version (int) –

    Optional. The version of workflow template to retrieve. Only previously instatiated versions can be retrieved.

    If unspecified, retrieves the current version.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A WorkflowTemplate instance.

Raises
instantiate_inline_workflow_template(parent, template, request_id=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Instantiates a template and begins execution.

This method is equivalent to executing the sequence CreateWorkflowTemplate, InstantiateWorkflowTemplate, DeleteWorkflowTemplate.

The returned Operation can be used to track execution of workflow by polling operations.get. The Operation will complete when entire workflow is finished.

The running workflow can be aborted via operations.cancel. This will cause any inflight jobs to be cancelled and workflow-owned clusters to be deleted.

The Operation.metadata will be WorkflowMetadata.

On successful completion, Operation.response will be Empty.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.WorkflowTemplateServiceClient()
>>>
>>> parent = client.region_path('[PROJECT]', '[REGION]')
>>>
>>> # TODO: Initialize `template`:
>>> template = {}
>>>
>>> response = client.instantiate_inline_workflow_template(parent, template)
>>>
>>> def callback(operation_future):
...     # Handle result.
...     result = operation_future.result()
>>>
>>> response.add_done_callback(callback)
>>>
>>> # Handle metadata.
>>> metadata = response.metadata()
Parameters
  • parent (str) – Required. The “resource name” of the workflow template region, as described in https://cloud.google.com/apis/design/resource_names of the form projects/{project_id}/regions/{region}

  • template (Union[dict, WorkflowTemplate]) –

    Required. The workflow template to instantiate.

    If a dict is provided, it must be of the same form as the protobuf message WorkflowTemplate

  • request_id (str) –

    Optional. A tag that prevents multiple concurrent workflow instances with the same tag from running. This mitigates risk of concurrent instances started due to retries.

    It is recommended to always set this value to a UUID.

    The tag must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A _OperationFuture instance.

Raises
instantiate_workflow_template(name, version=None, request_id=None, parameters=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Instantiates a template and begins execution.

The returned Operation can be used to track execution of workflow by polling operations.get. The Operation will complete when entire workflow is finished.

The running workflow can be aborted via operations.cancel. This will cause any inflight jobs to be cancelled and workflow-owned clusters to be deleted.

The Operation.metadata will be WorkflowMetadata.

On successful completion, Operation.response will be Empty.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.WorkflowTemplateServiceClient()
>>>
>>> name = client.workflow_template_path('[PROJECT]', '[REGION]', '[WORKFLOW_TEMPLATE]')
>>>
>>> response = client.instantiate_workflow_template(name)
>>>
>>> def callback(operation_future):
...     # Handle result.
...     result = operation_future.result()
>>>
>>> response.add_done_callback(callback)
>>>
>>> # Handle metadata.
>>> metadata = response.metadata()
Parameters
  • name (str) – Required. The “resource name” of the workflow template, as described in https://cloud.google.com/apis/design/resource_names of the form projects/{project_id}/regions/{region}/workflowTemplates/{template_id}

  • version (int) –

    Optional. The version of workflow template to instantiate. If specified, the workflow will be instantiated only if the current version of the workflow template has the supplied version.

    This option cannot be used to instantiate a previous version of workflow template.

  • request_id (str) –

    Optional. A tag that prevents multiple concurrent workflow instances with the same tag from running. This mitigates risk of concurrent instances started due to retries.

    It is recommended to always set this value to a UUID.

    The tag must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.

  • parameters (dict[str -> str]) – Optional. Map from parameter names to values that should be used for those parameters. Values may not exceed 100 characters.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A _OperationFuture instance.

Raises
list_workflow_templates(parent, page_size=None, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Lists workflows that match the specified filter in the request.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.WorkflowTemplateServiceClient()
>>>
>>> parent = client.region_path('[PROJECT]', '[REGION]')
>>>
>>> # Iterate over all results
>>> for element in client.list_workflow_templates(parent):
...     # process element
...     pass
>>>
>>>
>>> # Alternatively:
>>>
>>> # Iterate over results one page at a time
>>> for page in client.list_workflow_templates(parent).pages:
...     for element in page:
...         # process element
...         pass
Parameters
  • parent (str) – Required. The “resource name” of the region, as described in https://cloud.google.com/apis/design/resource_names of the form projects/{project_id}/regions/{region}

  • page_size (int) – The maximum number of resources contained in the underlying API response. If page streaming is performed per- resource, this parameter does not affect the return value. If page streaming is performed per-page, this determines the maximum number of resources in a page.

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A PageIterator instance. An iterable of WorkflowTemplate instances. You can also iterate over the pages of the response using its pages property.

Raises
classmethod region_path(project, region)[source]#

Return a fully-qualified region string.

update_workflow_template(template, retry=<object object>, timeout=<object object>, metadata=None)[source]#

Updates (replaces) workflow template. The updated template must contain version that matches the current server version.

Example

>>> from google.cloud import dataproc_v1
>>>
>>> client = dataproc_v1.WorkflowTemplateServiceClient()
>>>
>>> # TODO: Initialize `template`:
>>> template = {}
>>>
>>> response = client.update_workflow_template(template)
Parameters
  • template (Union[dict, WorkflowTemplate]) –

    Required. The updated workflow template.

    The template.version field must match the current version.

    If a dict is provided, it must be of the same form as the protobuf message WorkflowTemplate

  • retry (Optional[google.api_core.retry.Retry]) – A retry object used to retry requests. If None is specified, requests will be retried using a default configuration.

  • timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.

Returns

A WorkflowTemplate instance.

Raises
classmethod workflow_template_path(project, region, workflow_template)[source]#

Return a fully-qualified workflow_template string.