gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch
Classes
EdgeBatch | A class for representing a batch of edges in a heterogeneous graph. |
Functions
build_tensors_from_edge_dicts | Converts a list of HeterogeneousGraphEdgeDict into tensors. |
collate_edge_batch_from_heterogeneous_graph_edge_dict | This is a collate function for the EdgeBatch. |
relationwise_batch_random_negative_sampling | Performs random negative sampling for each edge type. |
Module Contents
- class gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.EdgeBatch
Bases:
gigl.experimental.knowledge_graph_embedding.common.torchrec.batch.DataclassBatch
A class for representing a batch of edges in a heterogeneous graph. This can be derived from input edge tensors, and contains logic to build a torchrec KeyedJaggedTensor (used for sharded embedding lookups) and other metadata tensors which are required to train KGE models.
- static build_data_loader(dataset, sampling_config, dataloader_config, graph_metadata, condensed_node_type_to_vocab_size_map, pin_memory, should_loop=True)
- Parameters:
dataset (torch.utils.data.IterableDataset)
sampling_config (gigl.experimental.knowledge_graph_embedding.lib.config.sampling.SamplingConfig)
dataloader_config (gigl.experimental.knowledge_graph_embedding.lib.config.dataloader.DataloaderConfig)
graph_metadata (gigl.src.common.types.pb_wrappers.graph_metadata.GraphMetadataPbWrapper)
condensed_node_type_to_vocab_size_map (dict[gigl.src.common.types.graph_data.CondensedNodeType, int])
pin_memory (bool)
should_loop (bool)
- static from_edge_tensors(edges, condensed_edge_types, edge_labels, condensed_node_type_to_node_type_map, condensed_edge_type_to_condensed_node_type_map)
Creates an EdgeBatch from edge tensors. The resulting EdgeBatch has length 2 * len(edges), because each edge in the batch contributes a src-dst pair of entries.
- Parameters:
edges (torch.Tensor) – A tensor of edges.
condensed_edge_types (torch.Tensor) – A tensor of condensed edge types.
edge_labels (torch.Tensor) – A tensor of edge labels.
condensed_node_type_to_node_type_map (dict[CondensedNodeType, NodeType]) – A mapping from condensed node types to node types.
condensed_edge_type_to_condensed_node_type_map (dict[CondensedEdgeType, tuple[CondensedNodeType, CondensedNodeType]]) – A mapping from condensed edge types to condensed node types.
- Return type:
EdgeBatch
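The "2 * len(edges)" layout described for from_edge_tensors can be pictured with a minimal sketch. This is not the library's implementation: the helper name `flatten_edges` is hypothetical, plain Python lists stand in for torch tensors, and the interleaved src-then-dst ordering is an assumption made only for illustration.

```python
# Hypothetical illustration only: plain lists stand in for torch tensors,
# and the interleaved (src, dst) ordering is an assumption.
def flatten_edges(edges, condensed_edge_types, edge_type_to_node_types):
    """Return one (node_id, condensed_node_type) entry per edge endpoint."""
    entries = []
    for (src, dst), edge_type in zip(edges, condensed_edge_types):
        src_type, dst_type = edge_type_to_node_types[edge_type]
        entries.append((src, src_type))  # source endpoint of this edge
        entries.append((dst, dst_type))  # destination endpoint of this edge
    return entries


edges = [(0, 5), (3, 1)]
condensed_edge_types = [0, 1]
# condensed edge type -> (source node type, destination node type)
edge_type_to_node_types = {0: (0, 1), 1: (1, 0)}

entries = flatten_edges(edges, condensed_edge_types, edge_type_to_node_types)
```

Each edge yields two entries, so `len(entries)` is twice the number of edges. Tagging every endpoint with its node type is what would allow a per-node-type embedding lookup to be routed correctly.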
- to_edge_tensors(condensed_edge_type_to_condensed_node_type_map)
Reconstructs the edge tensors from the EdgeBatch. This is used for debugging and sanity checking the EdgeBatch.
- Parameters:
condensed_edge_type_to_condensed_node_type_map (dict[gigl.src.common.types.graph_data.CondensedEdgeType, tuple[gigl.src.common.types.graph_data.CondensedNodeType, gigl.src.common.types.graph_data.CondensedNodeType]])
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
- gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.build_tensors_from_edge_dicts(inputs)
Converts a list of HeterogeneousGraphEdgeDict into tensors.
- Parameters:
inputs (list[HeterogeneousGraphEdgeDict]) – A list of edge dictionaries.
- Returns:
- A tuple containing:
edges (torch.Tensor): A tensor of shape [num_edges, 2] containing the source and destination node IDs.
condensed_edge_types (torch.Tensor): A tensor of shape [num_edges] containing the condensed edge types.
labels (torch.Tensor): A tensor of shape [num_edges] containing labels (all set to 1).
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
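As a rough picture of the three outputs described above, the following sketch builds them from a list of dictionaries. The dictionary keys (`src`, `dst`, `condensed_edge_type`) and the helper name `edge_dicts_to_columns` are hypothetical, since the fields of HeterogeneousGraphEdgeDict are not listed here, and nested lists stand in for torch tensors.

```python
# Hypothetical illustration only: the dict keys below are assumed, and
# plain lists stand in for torch tensors.
def edge_dicts_to_columns(inputs):
    """Split edge dicts into edges, condensed edge types, and labels."""
    # Shape [num_edges, 2]: one (source, destination) row per edge.
    edges = [[d["src"], d["dst"]] for d in inputs]
    # Shape [num_edges]: the condensed edge type of each edge.
    condensed_edge_types = [d["condensed_edge_type"] for d in inputs]
    # Shape [num_edges]: every input edge is a positive, so all labels are 1.
    labels = [1] * len(inputs)
    return edges, condensed_edge_types, labels


inputs = [
    {"src": 0, "dst": 5, "condensed_edge_type": 0},
    {"src": 3, "dst": 1, "condensed_edge_type": 1},
]
edges, condensed_edge_types, labels = edge_dicts_to_columns(inputs)
```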
- gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.collate_edge_batch_from_heterogeneous_graph_edge_dict(inputs, condensed_edge_type_to_condensed_node_type_map, condensed_node_type_to_vocab_size_map, condensed_node_type_to_node_type_map, num_random_negatives_per_edge=0)
This is a collate function for the EdgeBatch. It takes a list of heterogeneous graph edge dictionaries (read from upstream dataset), converts them to tensors for “positive” edges, samples “negative” edges if applicable, and constructs an EdgeBatch (containing a TorchRec KeyedJaggedTensor and metadata).
- Parameters:
inputs (list[HeterogeneousGraphEdgeDict]) – The input data.
condensed_edge_type_to_condensed_node_type_map (dict[CondensedEdgeType, tuple[CondensedNodeType, CondensedNodeType]]) – A mapping from condensed edge types to condensed node types.
condensed_node_type_to_vocab_size_map (dict[CondensedNodeType, int]) – A mapping from condensed node types to vocab sizes.
condensed_node_type_to_node_type_map (dict[CondensedNodeType, NodeType]) – A mapping from condensed node types to node types.
num_random_negatives_per_edge (int) – The number of random negative samples to generate for each positive edge. Defaults to 0.
- Returns:
The collated EdgeBatch.
- Return type:
EdgeBatch
- gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.relationwise_batch_random_negative_sampling(condensed_edge_type_to_condensed_node_type_map, condensed_node_type_to_vocab_size_map, num_negatives_per_condensed_edge_type=1)
Performs random negative sampling for each edge type.
This function generates num_negatives_per_condensed_edge_type negative edges for each edge type. The src and dst of each negative edge are selected at random from the vocabularies of the corresponding node types, as defined by the edge type and the provided type-to-vocabulary maps.
These can be consumed in model training as negative samples which are shared across edges.
- Parameters:
condensed_edge_type_to_condensed_node_type_map (dict[int, tuple[int, int]]) – A mapping from each edge type to a tuple of (source_node_type, destination_node_type). R denotes the number of edge types in this mapping.
condensed_node_type_to_vocab_size_map (dict[int, int]) – A mapping from each node type to the size of its vocabulary.
num_negatives_per_condensed_edge_type (int) – The number of negative edges to sample per edge type, denoted K.
- Returns:
- A tuple containing:
negative_edges (Tensor): A tensor of shape [R * K] containing negative edges.
negative_edge_types (Tensor): A tensor of shape [R * K] containing the edge type for each negative edge.
negative_labels (Tensor): A tensor of zeros with shape [R * K], suitable for use in contrastive or classification losses.
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
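The sampling scheme described above can be sketched in a few lines. This is a minimal illustration and not the library's code: the helper name `sample_relationwise_negatives` and its `seed` argument are hypothetical, the standard-library `random` module stands in for torch sampling, and negative edges are represented as a list of (src, dst) pairs.

```python
import random


# Hypothetical illustration only: stdlib random and plain lists stand in
# for torch sampling and tensors; the `seed` argument is an assumption.
def sample_relationwise_negatives(
    edge_type_to_node_types,
    node_type_to_vocab_size,
    num_negatives_per_edge_type=1,
    seed=None,
):
    """Sample K random (src, dst) pairs for each of the R edge types."""
    rng = random.Random(seed)
    negative_edges, negative_edge_types, negative_labels = [], [], []
    for edge_type, (src_type, dst_type) in edge_type_to_node_types.items():
        for _ in range(num_negatives_per_edge_type):
            # Draw each endpoint uniformly from its node type's vocabulary.
            src = rng.randrange(node_type_to_vocab_size[src_type])
            dst = rng.randrange(node_type_to_vocab_size[dst_type])
            negative_edges.append((src, dst))
            negative_edge_types.append(edge_type)
            negative_labels.append(0)  # negatives are labeled 0
    return negative_edges, negative_edge_types, negative_labels


edge_type_to_node_types = {0: (0, 1), 1: (1, 1)}  # R = 2 edge types
node_type_to_vocab_size = {0: 10, 1: 4}
neg_edges, neg_types, neg_labels = sample_relationwise_negatives(
    edge_type_to_node_types,
    node_type_to_vocab_size,
    num_negatives_per_edge_type=3,  # K = 3
    seed=0,
)
```

With R = 2 and K = 3 the sketch produces six negative edges. Because the negatives depend only on the edge types and vocabulary sizes, not on any particular positive edge, one sampled set can be shared across all edges of the same type in a batch.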