gigl.distributed.utils.networking

Attributes

logger

Functions

get_free_port()

Get a free port number.

get_free_ports(num_ports)

Get a list of free port numbers.

get_free_ports_from_master_node([num_ports, ...])

Get free ports from the master node that can be used for communication between workers.

get_internal_ip_from_all_ranks()

Get the internal IP addresses of all ranks in a distributed setup.

get_internal_ip_from_master_node([_global_rank_override])

Get the internal IP address of the master node in a distributed setup.

Module Contents

gigl.distributed.utils.networking.get_free_port()

Get a free port number. Note: If you call get_free_port multiple times, it can return the same port number if the port is still free. If you need multiple free ports before you initialize or use any of them, use get_free_ports instead.

Returns:

A free port number on the current machine.

Return type:

int
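The note above is consistent with a common technique: bind a socket to port 0 so the operating system picks an unused port, read the port back, and close the socket before returning. The sketch below assumes that technique; it is not necessarily gigl's implementation, and find_free_port is a hypothetical name. Because nothing holds the port once the call returns, a second call may return the same number.

```python
import socket


def find_free_port() -> int:
    """Hypothetical helper: ask the OS for an unused TCP port.

    Binding to port 0 makes the OS assign a free ephemeral port. The socket is
    closed when the `with` block exits, so the port is NOT reserved; a later
    call (or another process) may receive the same number.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0: let the OS choose
        return sock.getsockname()[1]  # (host, port) -> port


if __name__ == "__main__":
    print(find_free_port())
```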

gigl.distributed.utils.networking.get_free_ports(num_ports)

Get a list of free port numbers. Note: If you call get_free_ports multiple times, different calls can return the same port number if the port is still free.

Returns:

A list of free port numbers on the current machine.

Return type:

List[int]

Parameters:

num_ports (int): Number of free ports to find.
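Returning several distinct ports from one call plausibly requires keeping each probe socket bound until every port has been chosen, so the OS cannot hand out the same port twice within that call. The sketch below assumes that approach; find_free_ports is a hypothetical helper, not the gigl implementation.

```python
import socket
from contextlib import ExitStack
from typing import List


def find_free_ports(num_ports: int) -> List[int]:
    """Hypothetical helper: return `num_ports` distinct free TCP ports.

    All sockets stay bound until the ExitStack closes them, so ports are
    distinct within one call. After the sockets close, the ports are free
    again and a later call may return some of the same numbers.
    """
    if num_ports < 0:
        raise ValueError("num_ports must be non-negative")
    with ExitStack() as stack:
        ports: List[int] = []
        for _ in range(num_ports):
            sock = stack.enter_context(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            sock.bind(("", 0))  # OS picks a port not currently bound
            ports.append(sock.getsockname()[1])
        return ports
```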

gigl.distributed.utils.networking.get_free_ports_from_master_node(num_ports=1, _global_rank_override=None)

Get free ports from the master node that can be used for communication between workers.

Returns:

A list of free port numbers on the master node.

Return type:

List[int]

Parameters:

num_ports (int): Number of free ports to find. Defaults to 1.

_global_rank_override (Optional[int]): Override for the global rank; useful for testing or when the global rank is not reliably available.
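A plausible pattern for this function, assumed here rather than taken from gigl, is that only the master probes for ports and then shares the result with every rank, since a port is only known to be free on the machine that checked it. In the sketch, free_ports_from_master and the injected broadcast_from_rank0 callable are hypothetical; global_rank plays the role that _global_rank_override can override in the documented function. Injecting the broadcast lets the logic run without a process group.

```python
import socket
from contextlib import ExitStack
from typing import Callable, List, Optional

# Hypothetical "broadcast from rank 0" collective: rank 0 passes its value,
# other ranks pass None, and every rank gets rank 0's value back. In a real
# job something like torch.distributed.broadcast_object_list could fill this
# role; it is injected here so the sketch runs without a process group.
BroadcastFromRank0 = Callable[[Optional[List[int]]], List[int]]


def _probe_free_ports(num_ports: int) -> List[int]:
    # Keep every socket bound until all ports are chosen so they are distinct.
    with ExitStack() as stack:
        ports: List[int] = []
        for _ in range(num_ports):
            sock = stack.enter_context(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            sock.bind(("", 0))
            ports.append(sock.getsockname()[1])
        return ports


def free_ports_from_master(
    global_rank: int, num_ports: int, broadcast_from_rank0: BroadcastFromRank0
) -> List[int]:
    """Hypothetical sketch: only the master (global rank 0) probes its own
    ports; every rank then receives that same list via the broadcast."""
    ports = _probe_free_ports(num_ports) if global_rank == 0 else None
    return broadcast_from_rank0(ports)
```

For a single-process check, a stub that stores rank 0's list and returns it to later callers is enough to exercise the control flow.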

gigl.distributed.utils.networking.get_internal_ip_from_all_ranks()

Get the internal IP addresses of all ranks in a distributed setup. Internal IPs are usually not reachable from the public internet, so the machines must be on the same network or VPN for the ranks to communicate with each other at these addresses. This is useful for setting up RPC communication between ranks when the default torch.distributed env:// setup is not enough, or for running validation checks, such as determining the local world size of a specific node.

Returns:

A list of internal IP addresses of all ranks.

Return type:

List[str]
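As an illustration of the validation use case, the sketch below derives per-node rank counts from a list of internal IPs, assuming element i is the IP of global rank i. Both helpers are hypothetical, and the IP values in the usage lines are made up rather than real output.

```python
from collections import Counter
from typing import Dict, List


def local_world_sizes(rank_ips: List[str]) -> Dict[str, int]:
    """Hypothetical helper: map each node's internal IP to its number of ranks.

    Assumes rank_ips[i] is the internal IP of global rank i, the shape of list
    get_internal_ip_from_all_ranks() is documented to return.
    """
    return dict(Counter(rank_ips))


def check_uniform_local_world_size(rank_ips: List[str]) -> int:
    """Hypothetical validation: require every node to host the same number of ranks."""
    sizes = local_world_sizes(rank_ips)
    distinct = set(sizes.values())
    if len(distinct) != 1:
        raise ValueError(f"uneven ranks per node: {sizes}")
    return distinct.pop()


# Illustrative input for a 2-node, 4-rank job.
ips = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"]
print(local_world_sizes(ips))  # {'10.0.0.1': 2, '10.0.0.2': 2}
print(check_uniform_local_world_size(ips))  # 2
```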

gigl.distributed.utils.networking.get_internal_ip_from_master_node(_global_rank_override=None)

Get the internal IP address of the master node in a distributed setup. This is useful for setting up RPC communication between workers when the default torch.distributed env:// setup is not enough, for example when using gigl.distributed.dataset_factory.

Returns:

The internal IP address of the master node.

Return type:

str

Parameters:

_global_rank_override (Optional[int])
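The master's internal IP is typically paired with a port that is free on the master, for example one from get_free_ports_from_master_node, to build an explicit rendezvous address. The sketch formats a tcp://host:port string, the form torch.distributed accepts as an init_method; tcp_init_method is a hypothetical helper, and the addresses in the usage lines are illustrative.

```python
import ipaddress


def tcp_init_method(master_ip: str, port: int) -> str:
    """Hypothetical helper: build a tcp:// init URL from the master's IP and a port.

    IPv6 literals are wrapped in brackets, as URL syntax (RFC 3986) requires.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    addr = ipaddress.ip_address(master_ip)  # raises ValueError on malformed input
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"tcp://{host}:{port}"


print(tcp_init_method("10.0.0.5", 29500))  # tcp://10.0.0.5:29500
print(tcp_init_method("::1", 29500))  # tcp://[::1]:29500
```

The resulting string could then be passed as init_method to torch.distributed.init_process_group together with explicit rank and world_size values.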

gigl.distributed.utils.networking.logger