gigl.common.utils.gcs

Attributes

Classes

GcsUtils

Utility class for interacting with Google Cloud Storage (GCS).

Module Contents

class gigl.common.utils.gcs.GcsUtils(project=None)

Utility class for interacting with Google Cloud Storage (GCS).

Initialize the GcsUtils instance.

Parameters:

project (Optional[str]) – The GCP project ID. Defaults to None.

add_bucket_lifecycle_rule_with_prefix(gcs_path, days_to_expire, should_delete_irrelevant_lifecycle_rules=False)
Parameters:
Return type:

None

close_upload_delete_and_push_to_gcs(local_file_handle, gcs_file_path)
Parameters:
Return type:

None

copy_gcs_path(src_gcs_path, dst_gcs_path)
Parameters:

count_blobs_in_gcs_path(gcs_path, suffix=None)
Parameters:
Return type:

int

delete_files(gcs_files)
Parameters:

gcs_files (Iterable[Union[gigl.common.GcsUri, google.cloud.storage.Blob]])

Return type:

None

delete_files_in_bucket_dir(gcs_path)
Parameters:

gcs_path (gigl.common.GcsUri)

Return type:

None

delete_gcs_file_if_exist(gcs_path)
Parameters:

gcs_path (gigl.common.GcsUri)

Return type:

None

does_gcs_file_exist(gcs_path)
Parameters:

gcs_path (gigl.common.GcsUri)

Return type:

bool

download_file_from_gcs(gcs_path, dest_file_path)
Parameters:
Return type:

None

download_file_from_gcs_to_temp_file(gcs_path)
Parameters:

gcs_path (gigl.common.GcsUri)

Return type:

tempfile._TemporaryFileWrapper

download_files_from_gcs_paths_to_local_dir(gcs_paths, local_path_dir)
Parameters:
Return type:

None

download_files_from_gcs_paths_to_local_paths(file_map)

Downloads files from GCS paths to local paths.

Parameters:

file_map (Dict[gigl.common.GcsUri, gigl.common.LocalUri]) – Mapping of GCS path to local path.

static get_bucket_and_blob_path_from_gcs_path(gcs_path)
Parameters:

gcs_path (gigl.common.GcsUri)

Return type:

Tuple[str, str]
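
The kind of split this return type suggests can be sketched in plain Python. This is a minimal illustration, not the gigl implementation: `split_gcs_path` is a hypothetical helper, it takes a plain string rather than a `GcsUri`, and the error handling is an assumption.

```python
from typing import Tuple


def split_gcs_path(gcs_path: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/blob' into ('bucket', 'some/blob').

    Hypothetical helper for illustration only; not part of gigl.
    """
    scheme = "gs://"
    if not gcs_path.startswith(scheme):
        raise ValueError(f"Not a GCS URI: {gcs_path!r}")
    # Everything up to the first '/' after the scheme is the bucket name;
    # the remainder (possibly empty) is the blob path inside that bucket.
    bucket, _, blob_path = gcs_path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in {gcs_path!r}")
    return bucket, blob_path


bucket, blob_path = split_gcs_path("gs://bucket-name/dir/file1.txt")
print(bucket, blob_path)
```

A URI that names only a bucket yields an empty blob path in this sketch; the real method may treat that case differently.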

list_uris_with_gcs_path_pattern(gcs_path, suffix=None, pattern=None)

List GCS URIs with a given suffix or pattern.

Example: given the following objects:

  • gs://bucket-name/dir/file1.txt

  • gs://bucket-name/dir/foo.txt

  • gs://bucket-name/dir/file.json

list_uris_with_gcs_path_pattern(gcs_path=gs://bucket-name/dir, suffix=".txt") -> [gs://bucket-name/dir/file1.txt, gs://bucket-name/dir/foo.txt]

list_uris_with_gcs_path_pattern(gcs_path=gs://bucket-name/dir, pattern="file.*") -> [gs://bucket-name/dir/file1.txt, gs://bucket-name/dir/file.json]

Parameters:
  • gcs_path (GcsUri) – The GCS path to list URIs from.

  • suffix (Optional[str]) – The suffix to filter URIs by. If None (the default), then no filtering on suffix will be done.

  • pattern (Optional[str]) – The regex to filter URIs by. If None (the default), then no filtering on the pattern will be done.

Returns:

A list of GCS URIs that match the given suffix or pattern.

Return type:

List[GcsUri]
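
The filtering semantics described above can be reproduced locally on a list of URI strings. The sketch below rests on assumptions that this page does not confirm: `filter_uris` is a hypothetical helper, the regex is matched from the start of the final path component, and when both filters are given a URI must pass both.

```python
import re
from typing import List, Optional


def filter_uris(
    uris: List[str],
    suffix: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[str]:
    """Hypothetical stand-in that filters URI strings by suffix and/or regex."""
    compiled = re.compile(pattern) if pattern is not None else None
    matches = []
    for uri in uris:
        # Assumption: the regex applies to the object's base name only.
        name = uri.rsplit("/", 1)[-1]
        if suffix is not None and not uri.endswith(suffix):
            continue
        if compiled is not None and compiled.match(name) is None:
            continue
        matches.append(uri)
    return matches


uris = [
    "gs://bucket-name/dir/file1.txt",
    "gs://bucket-name/dir/foo.txt",
    "gs://bucket-name/dir/file.json",
]
txt_uris = filter_uris(uris, suffix=".txt")
file_uris = filter_uris(uris, pattern="file.*")
print(txt_uris)
print(file_uris)
```

With the three example objects, the two calls select the same URIs as the documented example.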

read_from_gcs(gcs_path)
Parameters:

gcs_path (gigl.common.GcsUri)

Return type:

str

upload_files_to_gcs(local_file_path_to_gcs_path_map, parallel=True)

Upload files from local paths to their corresponding GCS paths.

Parameters:
  • local_file_path_to_gcs_path_map (Dict[LocalUri, GcsUri]) – A dictionary mapping local file paths to GCS paths.

  • parallel (bool) – Flag indicating whether to upload files in parallel. Defaults to True.

Return type:

None
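
What a `parallel` flag over a path mapping typically means can be sketched with the standard library. This is not the gigl implementation: `upload_all` and `fake_upload` are hypothetical, plain strings stand in for `LocalUri` and `GcsUri`, and the use of a thread pool is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def upload_all(
    path_map: Dict[str, str],
    upload_one: Callable[[str, str], None],
    parallel: bool = True,
) -> None:
    """Call upload_one(local_path, gcs_path) for every entry in path_map."""
    if not parallel:
        for local_path, gcs_path in path_map.items():
            upload_one(local_path, gcs_path)
        return
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(upload_one, local_path, gcs_path)
            for local_path, gcs_path in path_map.items()
        ]
        for future in futures:
            # result() re-raises any exception raised inside the worker thread.
            future.result()


# Hypothetical uploader that only records what it was asked to upload.
uploaded: Dict[str, str] = {}


def fake_upload(local_path: str, gcs_path: str) -> None:
    uploaded[gcs_path] = local_path


upload_all(
    {
        "/tmp/a.txt": "gs://bucket-name/dir/a.txt",
        "/tmp/b.txt": "gs://bucket-name/dir/b.txt",
    },
    fake_upload,
    parallel=True,
)
print(sorted(uploaded))
```

Collecting each future's result surfaces a failed upload as an exception in the caller instead of letting it pass silently.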

upload_from_filelike(gcs_path, filelike, content_type='application/octet-stream')

Uploads a file-like object to GCS.

A "filelike" object is one that satisfies the typing.IO interface, i.e., it provides read(), write(), and related methods. The prototypical example is the object returned by open(), but we also use io.BytesIO as an in-memory buffer, which likewise satisfies the typing.IO interface.

Parameters:
  • gcs_path (GcsUri) – The GCS path to upload the file to.

  • filelike (IO[AnyStr]) – The file-like object to upload.

  • content_type (str) – The content type of the file. Defaults to “application/octet-stream”.

Return type:

None
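
An in-memory buffer used as a file-like object behaves as sketched below. `read_for_upload` is a hypothetical stand-in for any consumer that reads from the object's current position. Whether `upload_from_filelike` rewinds the object itself is not stated here, so rewinding before the call is shown only as a general precaution.

```python
import io
from typing import IO


def read_for_upload(filelike: IO[bytes]) -> bytes:
    """Hypothetical stand-in for an uploader: reads from the current position."""
    return filelike.read()


buffer = io.BytesIO()
buffer.write(b"hello, gcs")

# After writing, the position sits at the end of the buffer,
# so a consumer that simply calls read() sees no data.
unrewound = read_for_upload(buffer)

# Rewinding to the start makes the full content readable again.
buffer.seek(0)
payload = read_for_upload(buffer)

print(len(unrewound), len(payload))
```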

upload_from_string(gcs_path, content)
Parameters:
Return type:

None

gigl.common.utils.gcs.UPLOAD_RETRY_DEADLINE_S = 7200
gigl.common.utils.gcs.logger