gigl.common.utils.vertex_ai_context#
Utility functions to be used by machines running on Vertex AI.
Attributes#
Functions#
connect_worker_pool() | Used to connect the worker pool.
get_host_name() | Get the current machine’s hostname.
get_leader_hostname() | Hostname of the machine that will host the process with rank 0.
get_leader_port() | A free port on the machine that will host the process with rank 0.
get_rank() | Rank of the current VAI process, which tells the process whether it is the master or a worker.
get_vertex_ai_job_id() | Get the Vertex AI job ID.
get_world_size() | The total number of processes that VAI creates. Note that VAI only creates one process per machine.
| Check if the code is running in a Vertex AI job.
Module Contents#
- gigl.common.utils.vertex_ai_context.connect_worker_pool()[source]#
Used to connect the worker pool. This function should be called by all workers to get the leader worker’s internal IP address and to ensure that the workers can all communicate with the leader worker.
- Return type:
- gigl.common.utils.vertex_ai_context.get_host_name()[source]#
Get the current machine’s hostname. Throws if not on Vertex AI.
- Return type:
str
- gigl.common.utils.vertex_ai_context.get_leader_hostname()[source]#
Hostname of the machine that will host the process with rank 0. It is used to synchronize the workers.
VAI does not automatically set this for single-replica jobs, hence the default value of “localhost”. Throws if not on Vertex AI.
- Return type:
str
- gigl.common.utils.vertex_ai_context.get_leader_port()[source]#
A free port on the machine that will host the process with rank 0.
VAI does not automatically set this for single-replica jobs, hence the default value of 29500, which is a PyTorch convention (see pytorch/pytorch). Throws if not on Vertex AI.
- Return type:
int
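The fallback behavior described for the leader hostname and port can be illustrated with a small sketch. It assumes the values arrive through the `MASTER_ADDR` and `MASTER_PORT` environment variables that PyTorch's `env://` initialization reads. This page does not say where the module actually looks them up, and `leader_address` is a hypothetical helper, not part of gigl.

```python
import os
from typing import Mapping, Tuple


def leader_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Hypothetical helper: resolve (hostname, port) with single-replica defaults."""
    # Assumption: the variables follow PyTorch's MASTER_ADDR / MASTER_PORT naming.
    host = env.get("MASTER_ADDR", "localhost")
    port = int(env.get("MASTER_PORT", "29500"))
    return host, port


# Single-replica case: nothing is set, so the documented defaults apply.
print(leader_address({}))
# Multi-replica case: values provided by the environment take precedence.
print(leader_address({"MASTER_ADDR": "10.0.0.2", "MASTER_PORT": "12345"}))
```

Passing the environment as a mapping keeps the lookup easy to exercise outside a Vertex AI job.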
- gigl.common.utils.vertex_ai_context.get_rank()[source]#
Rank of the current VAI process, which tells the process whether it is the master or a worker. Note that VAI only creates one process per machine; it is the user’s responsibility to create multiple processes per machine. This function therefore returns a single integer, for the main process that VAI creates.
VAI does not automatically set this for single-replica jobs, hence the default value of 0. Throws if not on Vertex AI.
- Return type:
int
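Because VAI creates only one process per machine, a job that spawns several processes per machine has to derive its own per-process ranks. The sketch below shows one common layout, in which each machine owns a contiguous block of global ranks. The helper names and the layout are assumptions for illustration, not part of gigl. In a real job, `machine_rank` would come from `get_rank()` and `num_machines` from `get_world_size()`.

```python
def global_rank(machine_rank: int, procs_per_machine: int, local_rank: int) -> int:
    """Hypothetical helper: map (machine, local process) to a global rank."""
    if not 0 <= local_rank < procs_per_machine:
        raise ValueError(f"local_rank {local_rank} is outside [0, {procs_per_machine})")
    # Machine 0 owns ranks [0, procs_per_machine), machine 1 the next block, and so on.
    return machine_rank * procs_per_machine + local_rank


def global_world_size(num_machines: int, procs_per_machine: int) -> int:
    """Hypothetical helper: total process count across all machines."""
    return num_machines * procs_per_machine


# Two machines with four processes each: the third process on the second machine.
print(global_rank(machine_rank=1, procs_per_machine=4, local_rank=2))  # 6
print(global_world_size(num_machines=2, procs_per_machine=4))  # 8
```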
- gigl.common.utils.vertex_ai_context.get_vertex_ai_job_id()[source]#
Get the Vertex AI job ID. Throws if not on Vertex AI.
- Return type:
str
- gigl.common.utils.vertex_ai_context.get_world_size()[source]#
The total number of processes that VAI creates. Note that VAI only creates one process per machine. It is the user’s responsibility to create multiple processes per machine.
VAI does not automatically set this for single-replica jobs, hence the default value of 1. Throws if not on Vertex AI.
- Return type:
int
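The four values above (leader hostname, leader port, rank, world size) are what a TCP rendezvous needs. The following is a minimal sketch of how they might be combined into keyword arguments for `torch.distributed.init_process_group`. `build_init_kwargs` is a hypothetical helper, and the choice of the `gloo` backend in the comment is an assumption. PyTorch is not imported, so the sketch runs without it.

```python
from typing import Any, Dict


def build_init_kwargs(host: str, port: int, rank: int, world_size: int) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for a TCP rendezvous."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    return {
        "init_method": f"tcp://{host}:{port}",
        "rank": rank,
        "world_size": world_size,
    }


# On Vertex AI the arguments would come from this module, e.g.:
#   from gigl.common.utils.vertex_ai_context import (
#       get_leader_hostname, get_leader_port, get_rank, get_world_size)
#   kwargs = build_init_kwargs(
#       get_leader_hostname(), get_leader_port(), get_rank(), get_world_size())
#   torch.distributed.init_process_group(backend="gloo", **kwargs)
# Here the documented single-replica defaults are used so the sketch runs anywhere.
kwargs = build_init_kwargs("localhost", 29500, 0, 1)
print(kwargs["init_method"])
```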