gigl.distributed.graph_store.remote_dataset
Utils for operating on a dataset remotely.
These utilities are intended for a server-client architecture and are invoked through graphlearn_torch.distributed.request_server.
register_dataset must be called once per process on the server.
The client can then do something like:
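>>> import graphlearn_torch.distributed
>>> import gigl.distributed.graph_store.remote_dataset
>>> # server_rank is the rank of the server process to query.
>>> edge_feature_info = graphlearn_torch.distributed.request_server(
...     server_rank,
...     gigl.distributed.graph_store.remote_dataset.get_edge_feature_info,
... )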
NOTE: Ideally these would be exposed via DistServer [1] so we could call them directly. TODO(kmonte): If we ever fork GLT, we should look into expanding DistServer instead.
[1]: alibaba/graphlearn-for-pytorch
Attributes
Functions
get_edge_feature_info() | Get edge feature information from the registered dataset. |
get_node_feature_info() | Get node feature information from the registered dataset. |
get_node_ids_for_rank(rank, world_size[, node_type]) | Get the node IDs assigned to a specific rank in distributed processing. |
register_dataset(dataset) | Register a dataset for remote access. |
Module Contents
- gigl.distributed.graph_store.remote_dataset.get_edge_feature_info()
Get edge feature information from the registered dataset.
- Returns:
Edge feature information, which can be one of:
A single FeatureInfo object for homogeneous graphs
A dict mapping EdgeType to FeatureInfo for heterogeneous graphs
None if no edge features are available
- Raises:
ValueError – If no dataset has been registered.
- gigl.distributed.graph_store.remote_dataset.get_node_feature_info()
Get node feature information from the registered dataset.
- Returns:
Node feature information, which can be one of:
A single FeatureInfo object for homogeneous graphs
A dict mapping NodeType to FeatureInfo for heterogeneous graphs
None if no node features are available
- Raises:
ValueError – If no dataset has been registered.
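Because get_edge_feature_info and get_node_feature_info can each return one of three shapes (a single object, a dict, or None), client code often wants to collapse them into one form. The following is a minimal sketch of such a helper, not part of this module. FeatureInfoStub, DEFAULT_TYPE, and normalize_feature_info are hypothetical names introduced for illustration, and the stub's fields are assumptions rather than the real FeatureInfo definition.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class FeatureInfoStub:
    """Hypothetical stand-in for FeatureInfo; the real fields may differ."""

    dim: int
    dtype: str


# Hypothetical key under which a homogeneous result is filed.
DEFAULT_TYPE = "default_type"


def normalize_feature_info(
    info: Union[FeatureInfoStub, Dict[str, FeatureInfoStub], None],
) -> Dict[str, FeatureInfoStub]:
    """Return a dict keyed by type for any of the three documented shapes."""
    if info is None:
        # No features are available.
        return {}
    if isinstance(info, dict):
        # Heterogeneous graph: already keyed by type, so copy it.
        return dict(info)
    # Homogeneous graph: wrap the single object under a default key.
    return {DEFAULT_TYPE: info}
```

With this shape, downstream code can iterate over the result without branching on whether the graph is homogeneous or heterogeneous.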
- gigl.distributed.graph_store.remote_dataset.get_node_ids_for_rank(rank, world_size, node_type=DEFAULT_HOMOGENEOUS_NODE_TYPE)
Get the node IDs assigned to a specific rank in distributed processing.
Shards the node IDs across processes based on the rank and world size.
- Parameters:
rank (int) – The rank of the process requesting node IDs.
world_size (int) – The total number of processes in the distributed setup.
node_type (Optional[gigl.src.common.types.graph_data.NodeType]) – The type of nodes to retrieve. Defaults to the default homogeneous node type.
- Returns:
A tensor containing the node IDs assigned to the specified rank.
- Raises:
ValueError – If no dataset has been registered or if node_ids format is invalid.
- Return type:
torch.Tensor
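To make "shards the node IDs across processes based on the rank and world size" concrete, here is a minimal sketch of one common scheme: contiguous, near-equal slices, where the first few ranks absorb any remainder. The exact scheme used by get_node_ids_for_rank is not stated here, so treat this as an assumption. shard_for_rank is a hypothetical helper, and it uses plain Python lists instead of torch.Tensor to stay dependency-free.

```python
from typing import List


def shard_for_rank(node_ids: List[int], rank: int, world_size: int) -> List[int]:
    """Return the contiguous slice of node_ids owned by `rank` (illustrative only)."""
    if world_size <= 0:
        raise ValueError(f"world_size must be positive, got {world_size}")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    # Every rank gets `base` IDs; the first `extra` ranks get one more.
    base, extra = divmod(len(node_ids), world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return node_ids[start:end]
```

Under this scheme every node ID is assigned to exactly one rank, and shard sizes differ by at most one.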
- gigl.distributed.graph_store.remote_dataset.register_dataset(dataset)
Register a dataset for remote access.
This function must be called once per process in the server before any remote dataset operations can be performed.
- Parameters:
dataset (gigl.distributed.dist_dataset.DistDataset) – The distributed dataset to register.
- Raises:
ValueError – If a dataset has already been registered.
- Return type:
None
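The register-once contract described above (a ValueError on a second registration, and a ValueError from the getters when nothing is registered) can be sketched with a process-level holder. This is an illustrative pattern only, not the module's actual implementation. register_dataset_sketch, get_registered_dataset_sketch, and the module-level variable are hypothetical names.

```python
from typing import Any

# Sentinel marking "nothing registered yet" so any object can be registered.
_UNSET = object()
_registered_dataset: Any = _UNSET


def register_dataset_sketch(dataset: Any) -> None:
    """Store the dataset for this process; a second call is rejected."""
    global _registered_dataset
    if _registered_dataset is not _UNSET:
        raise ValueError("A dataset has already been registered.")
    _registered_dataset = dataset


def get_registered_dataset_sketch() -> Any:
    """Return the registered dataset, or raise if none was registered."""
    if _registered_dataset is _UNSET:
        raise ValueError("No dataset has been registered.")
    return _registered_dataset
```

Because the holder is module-level state, it is scoped to a single process, which is consistent with the requirement that registration happen once per server process.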