gigl.distributed.graph_store.remote_dataset#

Utils for operating on a dataset remotely.

These are intended to be used in the context of a server-client architecture, and with graphlearn_torch.distributed.request_server.

register_dataset must be called once per process on the server.

The client can then do something like:

>>> edge_feature_info = graphlearn_torch.distributed.request_server(
...     server_rank,
...     gigl.distributed.graph_store.remote_dataset.get_edge_feature_info,
... )

NOTE: Ideally these would be exposed via DistServer [1] so we could call them directly. TODO(kmonte): If we ever fork GLT, we should look into expanding DistServer instead.

[1]: alibaba/graphlearn-for-pytorch

Attributes#

logger

Functions#

get_edge_feature_info()

Get edge feature information from the registered dataset.

get_node_feature_info()

Get node feature information from the registered dataset.

get_node_ids_for_rank(rank, world_size[, node_type])

Get the node IDs assigned to a specific rank in distributed processing.

register_dataset(dataset)

Register a dataset for remote access.

Module Contents#

gigl.distributed.graph_store.remote_dataset.get_edge_feature_info()[source]#

Get edge feature information from the registered dataset.

Returns:

Edge feature information, which can be:

  • A single FeatureInfo object for homogeneous graphs

  • A dict mapping EdgeType to FeatureInfo for heterogeneous graphs

  • None if no edge features are available

Raises:

ValueError – If no dataset has been registered.
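Because the return shape depends on whether the graph is homogeneous or heterogeneous, callers usually branch on the result. A minimal sketch of that handling (the FeatureInfo stand-in below is a namedtuple assumed for illustration; the real class lives in gigl and may differ):

```python
from collections import namedtuple

# Stand-in for gigl's FeatureInfo, for illustration only.
FeatureInfo = namedtuple("FeatureInfo", ["dim", "dtype"])

def normalize_feature_info(info, default_key="default"):
    """Collapse the three possible return shapes into one dict.

    `info` may be None (no features), a single FeatureInfo (homogeneous
    graph), or a mapping of type -> FeatureInfo (heterogeneous graph).
    """
    if info is None:
        return {}
    if isinstance(info, dict):
        return dict(info)
    return {default_key: info}

print(normalize_feature_info(None))  # {}
print(normalize_feature_info(FeatureInfo(16, "float32")))
```

This lets downstream code iterate over one dict regardless of graph kind, rather than re-checking the union type at every call site.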

gigl.distributed.graph_store.remote_dataset.get_node_feature_info()[source]#

Get node feature information from the registered dataset.

Returns:

Node feature information, which can be:

  • A single FeatureInfo object for homogeneous graphs

  • A dict mapping NodeType to FeatureInfo for heterogeneous graphs

  • None if no node features are available

Raises:

ValueError – If no dataset has been registered.

gigl.distributed.graph_store.remote_dataset.get_node_ids_for_rank(rank, world_size, node_type=DEFAULT_HOMOGENEOUS_NODE_TYPE)[source]#

Get the node IDs assigned to a specific rank in distributed processing.

Shards the node IDs across processes based on the rank and world size.

Parameters:
  • rank (int) – The rank of the process requesting node IDs.

  • world_size (int) – The total number of processes in the distributed setup.

  • node_type (Optional[gigl.src.common.types.graph_data.NodeType]) – The type of nodes to retrieve. Defaults to the default homogeneous node type.

Returns:

A tensor containing the node IDs assigned to the specified rank.

Raises:

ValueError – If no dataset has been registered or if node_ids format is invalid.

Return type:

torch.Tensor
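The exact partitioning used by get_node_ids_for_rank is an implementation detail; a contiguous split with the remainder spread over the lowest ranks is one plausible scheme, sketched here with plain Python ranges (an assumption, not necessarily what the function does):

```python
def shard_contiguous(num_nodes: int, rank: int, world_size: int) -> range:
    """Assign rank `rank` a contiguous block of node IDs in [0, num_nodes).

    Each rank gets num_nodes // world_size IDs; the first
    num_nodes % world_size ranks each get one extra.
    """
    base, extra = divmod(num_nodes, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)

# Every node ID is assigned to exactly one rank:
assert sorted(i for r in range(4) for i in shard_contiguous(10, r, 4)) == list(range(10))
```

Whatever the concrete scheme, the invariant to rely on is that the shards across all ranks partition the node ID set with no gaps and no overlap.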

gigl.distributed.graph_store.remote_dataset.register_dataset(dataset)[source]#

Register a dataset for remote access.

This function must be called once per process in the server before any remote dataset operations can be performed.

Parameters:

dataset (gigl.distributed.dist_dataset.DistDataset) – The distributed dataset to register.

Raises:

ValueError – If a dataset has already been registered.

Return type:

None

gigl.distributed.graph_store.remote_dataset.logger[source]#