gigl.distributed.graph_store.remote_dist_dataset#

Attributes#

logger

Classes#

RemoteDistDataset

Represents a dataset that is stored on a different storage cluster.

Module Contents#

class gigl.distributed.graph_store.remote_dist_dataset.RemoteDistDataset(cluster_info, local_rank)[source]#

Represents a dataset that is stored on a different storage cluster. Must be used in the GiGL graph-store distributed setup.

This class must be used on the compute (client) side of the graph-store distributed setup.

Parameters:
  • cluster_info (GraphStoreInfo) – The cluster information.

  • local_rank (int) – The local rank of the process on the compute node.
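
A minimal usage sketch for the compute (client) side. It assumes cluster_info (a GraphStoreInfo) has already been provided by whatever launcher configured the graph-store deployment, and that LOCAL_RANK is set torchrun-style; the describe_remote_dataset helper is illustrative only and calls only methods documented on this page.

import os

from gigl.distributed.graph_store.remote_dist_dataset import RemoteDistDataset
from gigl.env.distributed import GraphStoreInfo


def describe_remote_dataset(cluster_info: GraphStoreInfo) -> None:
    # Connect to the storage cluster from one compute process and print basic
    # metadata about the registered dataset. cluster_info is assumed to come
    # from the environment-specific launcher.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    dataset = RemoteDistDataset(cluster_info=cluster_info, local_rank=local_rank)

    print("edge direction:", dataset.get_edge_dir())
    print("node feature info:", dataset.get_node_feature_info())
    print("edge feature info:", dataset.get_edge_feature_info())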

get_edge_dir()[source]#

Get the edge direction from the registered dataset.

Returns:

The edge direction.

Return type:

Union[str, Literal['in', 'out']]

get_edge_feature_info()[source]#

Get edge feature information from the registered dataset.

Returns:

Edge feature information, which can be:

  • A single FeatureInfo object for homogeneous graphs

  • A dict mapping EdgeType to FeatureInfo for heterogeneous graphs

  • None if no edge features are available
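
Because the return value can take any of these three shapes, callers typically branch on it explicitly. A minimal sketch of that dispatch (the same pattern applies to get_node_feature_info below), assuming dataset is an already-constructed RemoteDistDataset; summarize_edge_features is an illustrative helper, not part of the API.

def summarize_edge_features(dataset) -> None:
    # Handle all three documented return shapes of get_edge_feature_info().
    info = dataset.get_edge_feature_info()
    if info is None:
        print("dataset has no edge features")
    elif isinstance(info, dict):
        # Heterogeneous graph: one FeatureInfo per EdgeType.
        for edge_type, feature_info in info.items():
            print(f"edge type {edge_type}: {feature_info}")
    else:
        # Homogeneous graph: a single FeatureInfo for all edges.
        print(f"edge features: {info}")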

get_free_ports_on_storage_cluster(num_ports)[source]#

Get free ports from the storage master node.

This must be called with a torch.distributed process group initialized across the entire training cluster.

All compute ranks will receive the same free ports.

Parameters:

num_ports (int) – Number of free ports to get.

Return type:

list[int]
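
A sketch of the calling convention, assuming the launcher has already set the standard torch.distributed environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) for the whole training cluster and that dataset is a RemoteDistDataset; reserve_storage_ports is an illustrative helper, not part of the API.

import torch.distributed as dist


def reserve_storage_ports(dataset, num_ports: int = 2) -> list[int]:
    # Fetch num_ports free ports from the storage master node. Requires a
    # process group spanning the entire training cluster; every compute rank
    # receives the same list of ports.
    if not dist.is_initialized():
        # env:// reads MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE.
        dist.init_process_group(backend="gloo", init_method="env://")
    return dataset.get_free_ports_on_storage_cluster(num_ports=num_ports)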

get_node_feature_info()[source]#

Get node feature information from the registered dataset.

Returns:

Node feature information, which can be:

  • A single FeatureInfo object for homogeneous graphs

  • A dict mapping NodeType to FeatureInfo for heterogeneous graphs

  • None if no node features are available

get_node_ids(node_type=None)[source]#

Fetches node ids from the storage nodes for the current compute node (machine).

The returned list contains the node ids for the current compute node, indexed by storage rank.

For example, suppose there are two storage ranks, two compute ranks, and 16 total nodes. In this scenario, the node ids are sharded as follows:

Storage rank 0: [0, 1, 2, 3, 4, 5, 6, 7]
Storage rank 1: [8, 9, 10, 11, 12, 13, 14, 15]

NOTE: The GLT sampling engine expects all processes on a given compute machine to have the same sampling input (node ids). As such, the input tensors will be duplicated across all processes on a given compute machine. TODO(kmonte): Come up with a solution to avoid this duplication.

Then, for compute rank 0 (node 0, process 0), the returned list will be:

[
    [0, 1, 2, 3],   # From storage rank 0
    [8, 9, 10, 11]  # From storage rank 1
]

Parameters:
  • node_type (Optional[NodeType]) – The type of nodes to get. Must be provided for heterogeneous datasets.

Returns:

A list of node IDs for the given node type.

Return type:

list[torch.Tensor]
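
Since the returned list is indexed by storage rank, a common follow-up is to concatenate the shards into a single seed tensor for this compute node. A minimal sketch, assuming a homogeneous dataset (pass node_type for heterogeneous ones); local_seed_nodes is an illustrative helper, not part of the API.

import torch


def local_seed_nodes(dataset, node_type=None) -> torch.Tensor:
    # Concatenate the per-storage-rank shards into one tensor of seed node ids.
    # Every process on this compute machine receives the same shards, so the
    # result is identical across local ranks (see the duplication note above).
    shards = dataset.get_node_ids(node_type=node_type)
    return torch.cat(shards, dim=0)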

property cluster_info: gigl.env.distributed.GraphStoreInfo[source]#
Return type:

gigl.env.distributed.GraphStoreInfo

gigl.distributed.graph_store.remote_dist_dataset.logger[source]#