gigl.distributed.dist_link_prediction_dataset

Attributes

logger

Classes

DistLinkPredictionDataset

This class inherits from GraphLearn-for-PyTorch's DistDataset class. It overrides __init__ to support positive and negative edges and labels.

Module Contents

class gigl.distributed.dist_link_prediction_dataset.DistLinkPredictionDataset(rank, world_size, edge_dir, graph_partition=None, node_feature_partition=None, edge_feature_partition=None, node_partition_book=None, edge_partition_book=None, positive_edge_label=None, negative_edge_label=None, node_ids=None, num_train=None, num_val=None, num_test=None)

Bases: graphlearn_torch.distributed.dist_dataset.DistDataset

This class inherits from GraphLearn-for-PyTorch's DistDataset class. We override __init__ to support positive and negative edges and labels, and override share_ipc so that these new fields are serialized correctly. We additionally introduce a build function for storing the partitioned graph data inside of this class. We assume that data in this class lives only in CPU RAM; data in GPU memory is not supported, which simplifies the logic and tooling required compared to the base DistDataset class.

Initializes the fields of the DistLinkPredictionDataset class. This function is called upon each serialization of the DistLinkPredictionDataset instance.

rank (int) – Rank of the current process
world_size (int) – World size of the current process
edge_dir (Literal["in", "out"]) – Edge direction of the provided graph

The arguments below are only expected to be provided when re-serializing an instance of the DistLinkPredictionDataset class after build() has been called:

graph_partition (Optional[Union[Graph, Dict[EdgeType, Graph]]]) – Partitioned graph data
node_feature_partition (Optional[Union[Feature, Dict[NodeType, Feature]]]) – Partitioned node feature data
edge_feature_partition (Optional[Union[Feature, Dict[EdgeType, Feature]]]) – Partitioned edge feature data
node_partition_book (Optional[Union[PartitionBook, Dict[NodeType, PartitionBook]]]) – Node partition book
edge_partition_book (Optional[Union[PartitionBook, Dict[EdgeType, PartitionBook]]]) – Edge partition book
positive_edge_label (Optional[Union[torch.Tensor, Dict[EdgeType, torch.Tensor]]]) – Positive edge label tensor
negative_edge_label (Optional[Union[torch.Tensor, Dict[EdgeType, torch.Tensor]]]) – Negative edge label tensor
node_ids (Optional[Union[torch.Tensor, Dict[NodeType, torch.Tensor]]]) – Node IDs on the current machine
num_train (Optional[Union[int, Dict[NodeType, int]]]) – Number of training nodes on the current machine. Will be a dict if heterogeneous.
num_val (Optional[Union[int, Dict[NodeType, int]]]) – Number of validation nodes on the current machine. Will be a dict if heterogeneous.
num_test (Optional[Union[int, Dict[NodeType, int]]]) – Number of test nodes on the current machine. Will be a dict if heterogeneous.

Parameters:

rank (int)
world_size (int)
edge_dir (Literal['in', 'out'])
graph_partition (Optional[Union[graphlearn_torch.data.Graph, Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.data.Graph]]])
node_feature_partition (Optional[Union[graphlearn_torch.data.Feature, Dict[gigl.src.common.types.graph_data.NodeType, graphlearn_torch.data.Feature]]])
edge_feature_partition (Optional[Union[graphlearn_torch.data.Feature, Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.data.Feature]]])
node_partition_book (Optional[Union[graphlearn_torch.partition.PartitionBook, Dict[gigl.src.common.types.graph_data.NodeType, graphlearn_torch.partition.PartitionBook]]])
edge_partition_book (Optional[Union[graphlearn_torch.partition.PartitionBook, Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.partition.PartitionBook]]])
positive_edge_label (Optional[Union[torch.Tensor, Dict[gigl.src.common.types.graph_data.EdgeType, torch.Tensor]]])
negative_edge_label (Optional[Union[torch.Tensor, Dict[gigl.src.common.types.graph_data.EdgeType, torch.Tensor]]])
node_ids (Optional[Union[torch.Tensor, Dict[gigl.src.common.types.graph_data.NodeType, torch.Tensor]]])
num_train (Optional[Union[int, Dict[gigl.src.common.types.graph_data.NodeType, int]]])
num_val (Optional[Union[int, Dict[gigl.src.common.types.graph_data.NodeType, int]]])
num_test (Optional[Union[int, Dict[gigl.src.common.types.graph_data.NodeType, int]]])

build(partition_output, splitter=None)

Given partitioned graph information, this method stores the corresponding tensors inside of the class for subsequent live subgraph sampling using a GraphLearn-for-PyTorch NeighborLoader.

Note that this method will clear the following fields from the provided partition_output:

partitioned_edge_index
partitioned_node_features
partitioned_edge_features

We do this to decrease the peak memory usage during the build process by removing these intermediate assets.
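The practical consequence for callers is that partition_output should not be reused after build() returns. A minimal sketch of this hand-off pattern follows. ToyPartitionOutput and ToyDataset are hypothetical stand-ins that use plain Python lists instead of tensors, and the sketch assumes that "clear" means the fields are set to None; it is not the GiGL implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPartitionOutput:
    """Hypothetical stand-in for PartitionOutput; only the three cleared fields are modeled."""

    partitioned_edge_index: Optional[list] = None
    partitioned_node_features: Optional[list] = None
    partitioned_edge_features: Optional[list] = None


class ToyDataset:
    """Hypothetical stand-in for the dataset class."""

    def __init__(self) -> None:
        self.graph: Optional[list] = None
        self.node_features: Optional[list] = None
        self.edge_features: Optional[list] = None

    def build(self, partition_output: ToyPartitionOutput) -> None:
        # Take ownership of each field, then drop the reference held by the
        # intermediate object before moving on to the next field.
        self.graph = partition_output.partitioned_edge_index
        partition_output.partitioned_edge_index = None

        self.node_features = partition_output.partitioned_node_features
        partition_output.partitioned_node_features = None

        self.edge_features = partition_output.partitioned_edge_features
        partition_output.partitioned_edge_features = None


output = ToyPartitionOutput(
    partitioned_edge_index=[[0, 1], [1, 2]],
    partitioned_node_features=[[0.1], [0.2], [0.3]],
    partitioned_edge_features=[[1.0], [2.0]],
)
dataset = ToyDataset()
dataset.build(output)
```

After the call, the three fields on output are None while dataset holds the data, so any code that still needs the raw partitioned tensors has to read them before calling build().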

Parameters:

partition_output (PartitionOutput) – Partitioned Graph to be stored in the DistLinkPredictionDataset class
splitter (Optional[NodeAnchorLinkSplitter]) –
A function that takes in an edge index and returns:
- a tuple of train, val, and test node ids, if homogeneous
- a dict[NodeType, tuple[train, val, test]] of node ids, if heterogeneous
Optional, since not all datasets need to be split, e.g. when running inference.

Return type:

None
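To make the two return shapes of a splitter concrete, here is a toy sketch. It assumes the homogeneous case yields a single (train, val, test) tuple and the heterogeneous case yields a dict keyed by node type. The function names, the 60/20/20 ratio, the choice to split on destination nodes, and the use of plain lists and strings in place of torch.Tensor, NodeType and EdgeType are all illustrative assumptions, not the behavior of NodeAnchorLinkSplitter.

```python
from typing import Dict, List, Tuple

# (train, val, test) node ids; plain lists stand in for torch.Tensor here.
Split = Tuple[List[int], List[int], List[int]]
# Hypothetical stand-ins for GiGL's NodeType and EdgeType.
ToyNodeType = str
ToyEdgeType = Tuple[str, str, str]  # (source node type, relation, destination node type)
EdgeIndex = List[List[int]]  # [source ids, destination ids]


def split_ids(node_ids: List[int]) -> Split:
    """Deterministic 60/20/20 split of the unique ids (arbitrary ratio for the demo)."""
    ids = sorted(set(node_ids))
    n_train = len(ids) * 6 // 10
    n_val = len(ids) * 2 // 10
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def toy_homogeneous_splitter(edge_index: EdgeIndex) -> Split:
    # Assumption: split on destination nodes; a real splitter may choose differently.
    return split_ids(edge_index[1])


def toy_heterogeneous_splitter(
    edge_indices: Dict[ToyEdgeType, EdgeIndex]
) -> Dict[ToyNodeType, Split]:
    # Collect destination ids per destination node type, then split each type on its own.
    collected: Dict[ToyNodeType, List[int]] = {}
    for (_, _, dst_type), edge_index in edge_indices.items():
        collected.setdefault(dst_type, []).extend(edge_index[1])
    return {node_type: split_ids(ids) for node_type, ids in collected.items()}


homogeneous_split = toy_homogeneous_splitter([list(range(10)), list(range(10))])
heterogeneous_split = toy_heterogeneous_splitter(
    {("user", "to", "item"): [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]}
)
```

Code that consumes the result can branch on isinstance(result, dict) to tell the heterogeneous case from the homogeneous one.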

abstract load(*args, **kwargs)

Load a certain dataset partition from partitioned files and create in-memory objects (Graph, Feature or torch.Tensor).

Parameters:

root_dir (str) – The directory path to load the graph and feature partition data.
partition_idx (int) – Partition idx to load.
graph_mode (str) – Mode for creating graphlearn_torch’s Graph, including CPU, ZERO_COPY or CUDA. (default: ZERO_COPY)
input_layout (str) – layout of the input graph, including CSR, CSC or COO. (default: COO)
feature_with_gpu (bool) – A Boolean value indicating whether the created Feature objects of node/edge features use UnifiedTensor. If True, Feature consists of UnifiedTensor; otherwise Feature is a PyTorch CPU Tensor, and device_group_list and device will be invalid. (default: True)
graph_caching (bool) – A Boolean value indicating whether to load the full graph topology instead of the partitioned one.
device_group_list (List[DeviceGroup], optional) – A list of device groups used for feature lookups, the GPU part of feature data will be replicated on each device group in this list during the initialization. GPUs with peer-to-peer access to each other should be set in the same device group properly. (default: None)
whole_node_label_file (str) – The path to the whole node labels which are not partitioned. (default: None)
device – The target cuda device rank used for graph operations when graph mode is not “CPU” and feature lookups when the GPU part is not None. (default: None)

share_ipc()

Serializes the member variables of the DistLinkPredictionDataset class.

Returns:

int: Rank on current machine
int: World size across all machines
Literal["in", "out"]: Graph Edge Direction
Optional[Union[Graph, Dict[EdgeType, Graph]]]: Partitioned Graph Data
Optional[Union[Feature, Dict[NodeType, Feature]]]: Partitioned Node Feature Data
Optional[Union[Feature, Dict[EdgeType, Feature]]]: Partitioned Edge Feature Data
Optional[Union[torch.Tensor, Dict[NodeType, torch.Tensor]]]: Node Partition Book Tensor
Optional[Union[torch.Tensor, Dict[EdgeType, torch.Tensor]]]: Edge Partition Book Tensor
Optional[Union[torch.Tensor, Dict[EdgeType, torch.Tensor]]]: Positive Edge Label Tensor
Optional[Union[torch.Tensor, Dict[EdgeType, torch.Tensor]]]: Negative Edge Label Tensor
Optional[Union[int, Dict[NodeType, int]]]: Number of training nodes on the current machine. Will be a dict if heterogeneous.
Optional[Union[int, Dict[NodeType, int]]]: Number of validation nodes on the current machine. Will be a dict if heterogeneous.
Optional[Union[int, Dict[NodeType, int]]]: Number of test nodes on the current machine. Will be a dict if heterogeneous.
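The relationship between share_ipc() and __init__ can be illustrated with a small round trip. The sketch below assumes that the serialized values are unpacked positionally into the constructor when the instance is rebuilt, which is why the optional constructor arguments exist. ToyIpcDataset and rebuild_toy_dataset are hypothetical names, only four fields are modeled, and no real inter-process transfer takes place.

```python
from typing import List, Optional, Tuple

ToyEdges = List[Tuple[int, int]]


class ToyIpcDataset:
    """Hypothetical stand-in; only four of the real fields are modeled."""

    def __init__(
        self,
        rank: int,
        world_size: int,
        edge_dir: str,
        graph_partition: Optional[ToyEdges] = None,
    ) -> None:
        self.rank = rank
        self.world_size = world_size
        self.edge_dir = edge_dir
        # Stays None until build() is called or a serialized handle supplies it.
        self.graph_partition = graph_partition

    def build(self, edges: ToyEdges) -> None:
        self.graph_partition = list(edges)

    def share_ipc(self) -> tuple:
        # The order mirrors the positional order of __init__ in this sketch.
        return (self.rank, self.world_size, self.edge_dir, self.graph_partition)


def rebuild_toy_dataset(ipc_handle: tuple) -> ToyIpcDataset:
    # Rebuilding re-runs __init__, this time with the optional argument filled in.
    return ToyIpcDataset(*ipc_handle)


original = ToyIpcDataset(rank=0, world_size=2, edge_dir="out")
original.build([(0, 1), (1, 2)])
clone = rebuild_toy_dataset(original.share_ipc())
```

In a design like this, any field added to the class has to be added to both the constructor and the returned tuple, in the same position, or the rebuilt instance will silently lose it.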

property edge_dir: Literal['in', 'out']

Return type: Literal['in', 'out']

property edge_features: graphlearn_torch.data.Feature | Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.data.Feature] | None

During serialization, the initialized Feature type does not immediately contain the feature and id2index tensors. These fields are initially set to None, and are only populated when we retrieve the size, retrieve the shape, or index into one of these tensors. This can also be done manually with the feature.lazy_init_with_ipc_handle() function.

Return type: Optional[Union[graphlearn_torch.data.Feature, Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.data.Feature]]]
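The lazy population described for Feature objects follows a common pattern, sketched below with a toy class. LazyToyFeature is a hypothetical stand-in that uses lists and dicts instead of tensors; its method is named after feature.lazy_init_with_ipc_handle() only to show where the manual trigger fits, and the handle layout is an assumption made for the demo.

```python
from typing import Dict, List, Optional, Tuple

ToyHandle = Tuple[List[List[float]], Dict[int, int]]  # (feature rows, node id -> row index)


class LazyToyFeature:
    """Hypothetical stand-in that defers loading until the data is first needed."""

    def __init__(self, ipc_handle: ToyHandle) -> None:
        self._ipc_handle = ipc_handle
        # Both fields start out empty, as described for a freshly deserialized Feature.
        self.feature_tensor: Optional[List[List[float]]] = None
        self.id2index: Optional[Dict[int, int]] = None

    def lazy_init_with_ipc_handle(self) -> None:
        # Idempotent: the handle is materialized only on the first call.
        if self.feature_tensor is not None:
            return
        rows, id2index = self._ipc_handle
        self.feature_tensor = [list(row) for row in rows]
        self.id2index = dict(id2index)

    @property
    def shape(self) -> Tuple[int, int]:
        self.lazy_init_with_ipc_handle()
        rows = self.feature_tensor
        return (len(rows), len(rows[0]) if rows else 0)

    def __getitem__(self, node_id: int) -> List[float]:
        self.lazy_init_with_ipc_handle()
        return self.feature_tensor[self.id2index[node_id]]


feature = LazyToyFeature(([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], {10: 0, 20: 1}))
empty_before_access = feature.feature_tensor is None
shape_after_access = feature.shape
row_for_node_20 = feature[20]
```

Code that inspects the underlying fields directly, rather than through size, shape, or indexing, would see None unless it triggers the initialization first.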

property edge_pb: graphlearn_torch.partition.PartitionBook | Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.partition.PartitionBook] | None

Return type: Optional[Union[graphlearn_torch.partition.PartitionBook, Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.partition.PartitionBook]]]

property graph: graphlearn_torch.data.Graph | Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.data.Graph] | None

Return type: Optional[Union[graphlearn_torch.data.Graph, Dict[gigl.src.common.types.graph_data.EdgeType, graphlearn_torch.data.Graph]]]

property negative_edge_label: torch.Tensor | Dict[gigl.src.common.types.graph_data.EdgeType, torch.Tensor] | None

Return type: Optional[Union[torch.Tensor, Dict[gigl.src.common.types.graph_data.EdgeType, torch.Tensor]]]

property node_features: graphlearn_torch.data.Feature | Dict[gigl.src.common.types.graph_data.NodeType, graphlearn_torch.data.Feature] | None

During serialization, the initialized Feature type does not immediately contain the feature and id2index tensors. These fields are initially set to None, and are only populated when we retrieve the size, retrieve the shape, or index into one of these tensors. This can also be done manually with the feature.lazy_init_with_ipc_handle() function.

Return type: Optional[Union[graphlearn_torch.data.Feature, Dict[gigl.src.common.types.graph_data.NodeType, graphlearn_torch.data.Feature]]]

property node_ids: torch.Tensor | Dict[gigl.src.common.types.graph_data.NodeType, torch.Tensor] | None

Return type: Optional[Union[torch.Tensor, Dict[gigl.src.common.types.graph_data.NodeType, torch.Tensor]]]

property node_pb: graphlearn_torch.partition.PartitionBook | Dict[gigl.src.common.types.graph_data.NodeType, graphlearn_torch.partition.PartitionBook] | None

Return type: Optional[Union[graphlearn_torch.partition.PartitionBook, Dict[gigl.src.common.types.graph_data.NodeType, graphlearn_torch.partition.PartitionBook]]]

property num_partitions: int

Return type: int

property partition_idx: int

Return type: int

property positive_edge_label: torch.Tensor | Dict[gigl.src.common.types.graph_data.EdgeType, torch.Tensor] | None

Return type: Optional[Union[torch.Tensor, Dict[gigl.src.common.types.graph_data.EdgeType, torch.Tensor]]]

property test_node_ids: torch.Tensor | collections.abc.Mapping[gigl.src.common.types.graph_data.NodeType, torch.Tensor] | None

Return type: Optional[Union[torch.Tensor, collections.abc.Mapping[gigl.src.common.types.graph_data.NodeType, torch.Tensor]]]

property train_node_ids: torch.Tensor | collections.abc.Mapping[gigl.src.common.types.graph_data.NodeType, torch.Tensor] | None

Return type: Optional[Union[torch.Tensor, collections.abc.Mapping[gigl.src.common.types.graph_data.NodeType, torch.Tensor]]]

property val_node_ids: torch.Tensor | collections.abc.Mapping[gigl.src.common.types.graph_data.NodeType, torch.Tensor] | None

Return type: Optional[Union[torch.Tensor, collections.abc.Mapping[gigl.src.common.types.graph_data.NodeType, torch.Tensor]]]

gigl.distributed.dist_link_prediction_dataset.logger