gigl.common.utils.vertex_ai_context

Utility functions to be used by machines running on Vertex AI.

Attributes

logger

Functions

connect_worker_pool()

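Used to connect the worker pool. This function should be called by all workers to get the leader worker's internal IP address and to ensure that they can all communicate with the leader worker.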

get_host_name()

Get the current machine's hostname.

get_leader_hostname()

Hostname of the machine that will host the process with rank 0. It is used to synchronize the workers.

get_leader_port()

A free port on the machine that will host the process with rank 0.

get_rank()

Rank of the current VAI process, which tells it whether it is the master or a worker.

get_vertex_ai_job_id()

Get the Vertex AI job ID.

get_world_size()

The total number of processes that VAI creates. Note that VAI only creates one process per machine.

is_currently_running_in_vertex_ai_job()

Check if the code is running in a Vertex AI job.

Module Contents

gigl.common.utils.vertex_ai_context.connect_worker_pool()

Used to connect the worker pool. This function should be called by all workers to get the leader worker’s internal IP address and to ensure that the workers can all communicate with the leader worker.

Return type:

gigl.distributed.DistributedContext

gigl.common.utils.vertex_ai_context.get_host_name()

Get the current machine's hostname. Throws if not on Vertex AI.

Return type:

str

gigl.common.utils.vertex_ai_context.get_leader_hostname()

Hostname of the machine that will host the process with rank 0. It is used to synchronize the workers.

VAI does not automatically set this for single-replica jobs, hence the default value of “localhost”. Throws if not on Vertex AI.

Return type:

str

gigl.common.utils.vertex_ai_context.get_leader_port()

A free port on the machine that will host the process with rank 0.

VAI does not automatically set this for single-replica jobs, hence the default value of 29500. This is a PyTorch convention (see pytorch/pytorch). Throws if not on Vertex AI.

Return type:

int
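The leader hostname and port are typically combined into a single rendezvous address that every worker connects to. The following is a minimal sketch of that idea, not this module's implementation: the environment variable names `MASTER_ADDR` and `MASTER_PORT` and the helper `leader_endpoint` are assumptions for illustration, and only the defaults ("localhost" and 29500) come from the descriptions above. The `tcp://host:port` form is the one accepted by the `init_method` argument of `torch.distributed.init_process_group`.

```python
import os
from typing import Mapping

# Assumed variable names (the usual PyTorch ones); this page does not confirm
# which variables the module actually reads.
LEADER_HOST_VAR = "MASTER_ADDR"
LEADER_PORT_VAR = "MASTER_PORT"


def leader_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: build a tcp://host:port rendezvous address.

    Falls back to the documented defaults when the variables are unset,
    as happens for single-replica jobs.
    """
    host = env.get(LEADER_HOST_VAR, "localhost")
    port = int(env.get(LEADER_PORT_VAR, "29500"))
    return f"tcp://{host}:{port}"


# Single-replica job: nothing is set, so the defaults apply.
print(leader_endpoint({}))
# Multi-replica job: the values point at the rank 0 machine.
print(leader_endpoint({"MASTER_ADDR": "10.0.0.2", "MASTER_PORT": "12345"}))
```

Passing the environment as a mapping keeps the helper easy to exercise outside of a Vertex AI job.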

gigl.common.utils.vertex_ai_context.get_rank()

Rank of the current VAI process, which tells it whether it is the master or a worker. Note that VAI only creates one process per machine; it is the user's responsibility to create multiple processes per machine. This function therefore returns a single integer, the rank of the main process that VAI creates.

VAI does not automatically set this for single-replica jobs, hence the default value of 0. Throws if not on Vertex AI.

Return type:

int
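Because VAI creates only one process per machine, a job that spawns several processes per machine has to derive a global rank for each of them itself. The sketch below shows one common convention (machine rank times processes per machine, plus the local index). The helper names `global_rank` and `total_processes` are hypothetical and are not part of this module.

```python
def global_rank(machine_rank: int, local_rank: int, procs_per_machine: int) -> int:
    """Hypothetical helper: map (machine rank, local index) to a global rank.

    machine_rank is the per-machine rank (the kind of value get_rank() returns),
    local_rank is the index of the process on that machine.
    """
    if not 0 <= local_rank < procs_per_machine:
        raise ValueError("local_rank must be in [0, procs_per_machine)")
    return machine_rank * procs_per_machine + local_rank


def total_processes(num_machines: int, procs_per_machine: int) -> int:
    """Hypothetical helper: overall process count when each machine spawns the same number."""
    return num_machines * procs_per_machine


# Two machines with four processes each: the third process (local index 2)
# on the second machine (machine rank 1) gets global rank 6 out of 8.
print(global_rank(machine_rank=1, local_rank=2, procs_per_machine=4))
print(total_processes(num_machines=2, procs_per_machine=4))
```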

gigl.common.utils.vertex_ai_context.get_vertex_ai_job_id()

Get the Vertex AI job ID. Throws if not on Vertex AI.

Return type:

str

gigl.common.utils.vertex_ai_context.get_world_size()

The total number of processes that VAI creates. Note that VAI only creates one process per machine. It is the user’s responsibility to create multiple processes per machine.

VAI does not automatically set this for single-replica jobs, hence the default value of 1. Throws if not on Vertex AI.

Return type:

int

gigl.common.utils.vertex_ai_context.is_currently_running_in_vertex_ai_job()

Check if the code is running in a Vertex AI job.

Returns:

True if running in a Vertex AI job, False otherwise.

Return type:

bool
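Since the other getters in this module throw when called outside Vertex AI, a check like this one is useful as a guard before calling them. The following is a hedged sketch of that pattern with stand-in functions. The variable name `CLOUD_ML_JOB_ID`, the `RuntimeError` exception type, and the helpers `running_in_job`, `job_id`, and `describe_run` are all assumptions for illustration; this page does not state how the module detects a job or which exception it raises.

```python
import os
from typing import Mapping

# Assumed marker variable; not confirmed by this page.
JOB_ID_VAR = "CLOUD_ML_JOB_ID"


def running_in_job(env: Mapping[str, str]) -> bool:
    """Stand-in for is_currently_running_in_vertex_ai_job()."""
    return JOB_ID_VAR in env


def job_id(env: Mapping[str, str]) -> str:
    """Stand-in for get_vertex_ai_job_id(); raises outside a job."""
    if JOB_ID_VAR not in env:
        raise RuntimeError("Not running in a Vertex AI job")
    return env[JOB_ID_VAR]


def describe_run(env: Mapping[str, str] = os.environ) -> str:
    """Check first, then call the getter that would otherwise raise."""
    if not running_in_job(env):
        return "local run"
    return f"Vertex AI job {job_id(env)}"


print(describe_run({}))
print(describe_run({"CLOUD_ML_JOB_ID": "123"}))
```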

gigl.common.utils.vertex_ai_context.logger