google.resumable_media.requests

requests utilities for Google Media Downloads and Resumable Uploads.

This sub-package assumes callers will use the requests library as transport and google-auth for sending authenticated HTTP traffic with requests.

Authorized Transport

To use google-auth and requests to create an authorized transport that has read-only access to Google Cloud Storage (GCS):

>>> import google.auth
>>> import google.auth.transport.requests as tr_requests
>>>
>>> ro_scope = u'https://www.googleapis.com/auth/devstorage.read_only'
>>> credentials, _ = google.auth.default(scopes=(ro_scope,))
>>> transport = tr_requests.AuthorizedSession(credentials)
>>> transport
<google.auth.transport.requests.AuthorizedSession object at 0x...>

Simple Downloads

To download an object from Google Cloud Storage, construct the media URL for the GCS object and download it with an authorized transport that has access to the resource:

>>> from google.resumable_media.requests import Download
>>>
>>> url_template = (
...     u'https://www.googleapis.com/download/storage/v1/b/'
...     u'{bucket}/o/{blob_name}?alt=media')
>>> media_url = url_template.format(
...     bucket=bucket, blob_name=blob_name)
>>>
>>> download = Download(media_url)
>>> response = download.consume(transport)
>>> download.finished
True
>>> response
<Response [200]>
>>> response.headers[u'Content-Length']
'1364156'
>>> len(response.content)
1364156

To download only a portion of the bytes in the object, specify start and end byte positions (both optional):

>>> download = Download(media_url, start=4096, end=8191)
>>> response = download.consume(transport)
>>> download.finished
True
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'4096'
>>> response.headers[u'Content-Range']
'bytes 4096-8191/1364156'
>>> len(response.content)
4096
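
Note that the start and end positions are inclusive, so the length of the downloaded range is end - start + 1 (here, 8191 - 4096 + 1 = 4096 bytes), which matches the Content-Length above.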

Chunked Downloads

For very large objects or objects of unknown size, it may make more sense to download the object in chunks rather than all at once. This can be done to avoid dropped connections with a poor internet connection or can allow multiple chunks to be downloaded in parallel to speed up the total download.
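
For instance, ranged Download requests (as shown above) can be combined with a thread pool to fetch pieces of an object concurrently. The following is an illustrative sketch, not part of this package; it assumes the total size is already known and that sharing one AuthorizedSession across threads is acceptable for the workload:

>>> import concurrent.futures
>>>
>>> total_bytes = 1364156  # E.g. from the Content-Length of a prior response.
>>> part_size = 512 * 1024  # 512KB per ranged request.
>>> ranges = [
...     (start, min(start + part_size - 1, total_bytes - 1))
...     for start in range(0, total_bytes, part_size)
... ]
>>> def fetch(byte_range):
...     # Each task issues one ranged download and returns its bytes.
...     start, end = byte_range
...     download = Download(media_url, start=start, end=end)
...     return download.consume(transport).content
...
>>> with concurrent.futures.ThreadPoolExecutor() as executor:
...     parts = list(executor.map(fetch, ranges))
...
>>> len(b''.join(parts)) == total_bytes
True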

A ChunkedDownload uses the same media URL and authorized transport as a basic Download, but also requires a chunk size and a writable byte stream. The chunk size determines how much of the resource to consume with each request, and the stream allows the resource to be written out (e.g. to disk) without having to fit in memory all at once.

>>> import io
>>> from google.resumable_media.requests import ChunkedDownload
>>>
>>> chunk_size = 50 * 1024 * 1024  # 50MB
>>> stream = io.BytesIO()
>>> download = ChunkedDownload(
...     media_url, chunk_size, stream)
>>> # Check the state of the download before starting.
>>> download.bytes_downloaded
0
>>> download.total_bytes is None
True
>>> response = download.consume_next_chunk(transport)
>>> # Check the state of the download after consuming one chunk.
>>> download.finished
False
>>> download.bytes_downloaded  # chunk_size
52428800
>>> download.total_bytes  # 1GB
1073741824
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'52428800'
>>> response.headers[u'Content-Range']
'bytes 0-52428799/1073741824'
>>> len(response.content) == chunk_size
True
>>> stream.seek(0)
0
>>> stream.read(29)
b'The beginning of the chunk...'

The download will change its finished status to True once the final chunk is consumed. In some cases, the final chunk may not be the same size as the other chunks:

>>> # The state of the download in progress.
>>> download.finished
False
>>> download.bytes_downloaded  # 20 chunks at 50MB
1048576000
>>> download.total_bytes  # 1GB
1073741824
>>> response = download.consume_next_chunk(transport)
>>> # The state of the download after consuming the final chunk.
>>> download.finished
True
>>> download.bytes_downloaded == download.total_bytes
True
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'25165824'
>>> response.headers[u'Content-Range']
'bytes 1048576000-1073741823/1073741824'
>>> len(response.content) < download.chunk_size
True

A ChunkedDownload can also take optional start and end byte positions.
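
For instance (an illustrative sketch; the byte positions here are arbitrary):

>>> stream = io.BytesIO()
>>> download = ChunkedDownload(
...     media_url, chunk_size, stream, start=4096, end=8191)
>>> download.bytes_downloaded
0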

Simple Uploads

Among the three supported upload classes, the simplest is SimpleUpload. A simple upload should be used when the resource being uploaded is small and when there is no metadata (other than the name) associated with the resource.

>>> from google.resumable_media.requests import SimpleUpload
>>>
>>> url_template = (
...     u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
...     u'uploadType=media&'
...     u'name={blob_name}')
>>> upload_url = url_template.format(
...     bucket=bucket, blob_name=blob_name)
>>>
>>> upload = SimpleUpload(upload_url)
>>> data = b'Some not too large content.'
>>> content_type = u'text/plain'
>>> response = upload.transmit(transport, data, content_type)
>>> upload.finished
True
>>> response
<Response [200]>
>>> json_response = response.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
>>> json_response[u'contentType'] == content_type
True
>>> json_response[u'md5Hash']
'M0XLEsX9/sMdiI+4pB4CAQ=='
>>> int(json_response[u'size']) == len(data)
True

In the rare case that an upload fails, an InvalidResponse will be raised:

>>> from google import resumable_media
>>>
>>> upload = SimpleUpload(upload_url)
>>> error = None
>>> try:
...     upload.transmit(transport, data, content_type)
... except resumable_media.InvalidResponse as caught_exc:
...     error = caught_exc
...
>>> error
InvalidResponse('Request failed with status code', 503,
                'Expected one of', <HTTPStatus.OK: 200>)
>>> error.response
<Response [503]>
>>>
>>> upload.finished
True

Even in the case of failure, we see that the upload is finished, i.e. it cannot be re-used.
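
Since a failed upload cannot be re-used, a retry requires constructing a fresh upload object. A minimal sketch:

>>> upload = SimpleUpload(upload_url)  # A new instance for the retry.
>>> response = upload.transmit(transport, data, content_type)
>>> upload.finished
True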

Multipart Uploads

The MultipartUpload can be used to accomplish essentially the same task as a simple upload. However, a multipart upload allows some metadata about the resource to be sent along as well. (This is the “multi”: we send a first part with the metadata and a second part with the actual bytes in the resource.)

Usage is similar to the simple upload, but transmit() accepts an extra required argument: metadata.

>>> from google.resumable_media.requests import MultipartUpload
>>>
>>> url_template = (
...     u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
...     u'uploadType=multipart')
>>> upload_url = url_template.format(bucket=bucket)
>>>
>>> upload = MultipartUpload(upload_url)
>>> metadata = {
...     u'name': blob_name,
...     u'metadata': {
...         u'color': u'grurple',
...     },
... }
>>> response = upload.transmit(transport, data, metadata, content_type)
>>> upload.finished
True
>>> response
<Response [200]>
>>> json_response = response.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
>>> json_response[u'metadata'] == metadata[u'metadata']
True

As with the simple upload, an InvalidResponse is raised in the case of failure, enclosing the response that caused it, and the upload object cannot be re-used after a failure.

Resumable Uploads

A ResumableUpload deviates from the other two upload classes: it transmits a resource over the course of multiple requests. This is intended to be used in cases where:

  • the size of the resource is not known (i.e. it is generated on the fly)
  • requests must be short-lived
  • the client has request size limitations
  • the resource is too large to fit into memory

In general, a resource should be sent in a single request to avoid latency and reduce QPS. See GCS best practices for more things to consider when using a resumable upload.

After creating a ResumableUpload instance, a resumable upload session must be initiated to let the server know that a series of chunked upload requests will be coming and to obtain an upload_id for the session. In contrast to the other two upload classes, initiate() takes a byte stream as input rather than raw bytes as data. This can be a file object, a BytesIO object or any other stream implementing the same interface.

>>> from google.resumable_media.requests import ResumableUpload
>>>
>>> url_template = (
...     u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
...     u'uploadType=resumable')
>>> upload_url = url_template.format(bucket=bucket)
>>>
>>> chunk_size = 1024 * 1024  # 1MB
>>> upload = ResumableUpload(upload_url, chunk_size)
>>> stream = io.BytesIO(data)
>>> # The upload doesn't know how "big" it is until seeing a stream.
>>> upload.total_bytes is None
True
>>> metadata = {u'name': blob_name}
>>> response = upload.initiate(transport, stream, metadata, content_type)
>>> response
<Response [200]>
>>> upload.resumable_url == response.headers[u'Location']
True
>>> upload.total_bytes == len(data)
True
>>> upload_id = response.headers[u'X-GUploader-UploadID']
>>> upload_id
'ABCdef189XY_super_serious'
>>> upload.resumable_url == upload_url + u'&upload_id=' + upload_id
True

Once a ResumableUpload has been initiated, the resource is transmitted in chunks until completion:

>>> response0 = upload.transmit_next_chunk(transport)
>>> response0
<Response [308]>
>>> upload.finished
False
>>> upload.bytes_uploaded == upload.chunk_size
True
>>>
>>> response1 = upload.transmit_next_chunk(transport)
>>> response1
<Response [308]>
>>> upload.finished
False
>>> upload.bytes_uploaded == 2 * upload.chunk_size
True
>>>
>>> response2 = upload.transmit_next_chunk(transport)
>>> response2
<Response [200]>
>>> upload.finished
True
>>> upload.bytes_uploaded == upload.total_bytes
True
>>> json_response = response2.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
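
In practice the number of chunks is usually not known ahead of time, so the transmission above is typically driven by a loop rather than a fixed number of transmit_next_chunk() calls. A minimal sketch, assuming a freshly initiated upload:

>>> while not upload.finished:
...     response = upload.transmit_next_chunk(transport)
...
>>> response
<Response [200]>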