gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch

Classes

EdgeBatch

A class for representing a batch of edges in a heterogeneous graph.

Functions

`build_tensors_from_edge_dicts`(inputs)	Converts a list of HeterogeneousGraphEdgeDict into tensors.
`collate_edge_batch_from_heterogeneous_graph_edge_dict`(...)	This is a collate function for the EdgeBatch.
`relationwise_batch_random_negative_sampling`(...[, ...])	Performs random negative sampling for each edge type.

Module Contents

class gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.EdgeBatch

Bases: gigl.experimental.knowledge_graph_embedding.common.torchrec.batch.DataclassBatch

A class for representing a batch of edges in a heterogeneous graph. It can be derived from input edge tensors and contains logic to build a torchrec KeyedJaggedTensor (used for sharded embedding lookups), along with other metadata tensors required to train knowledge graph embedding (KGE) models.
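To make the KeyedJaggedTensor idea concrete, here is a minimal pure-Python sketch of the jagged layout. How GiGL actually keys its KeyedJaggedTensor is not stated on this page; this sketch assumes one key per node type and treats each src or dst position as one batch slot. The helper `build_kjt_layout` is hypothetical, and plain lists stand in for tensors.

```python
from typing import List, Tuple


def build_kjt_layout(
    node_ids: List[int],
    node_types: List[str],
    keys: List[str],
) -> Tuple[List[str], List[int], List[int]]:
    """Hypothetical helper: lay out node IDs as (keys, values, lengths), key-major.

    Each position i is one slot (the src or dst of some edge). For every key,
    `lengths` gets one entry per slot: 1 if that slot's node type equals the
    key, else 0. `values` lists the node IDs grouped key by key.
    """
    values: List[int] = []
    lengths: List[int] = []
    for key in keys:
        for node_id, node_type in zip(node_ids, node_types):
            if node_type == key:
                values.append(node_id)
                lengths.append(1)
            else:
                lengths.append(0)
    return keys, values, lengths


# Two edges (user 10 -> item 3, user 11 -> item 4), flattened as src, dst, src, dst.
keys, values, lengths = build_kjt_layout(
    node_ids=[10, 3, 11, 4],
    node_types=["user", "item", "user", "item"],
    keys=["user", "item"],
)
print(values)   # [10, 11, 3, 4]
print(lengths)  # [1, 0, 1, 0, 0, 1, 0, 1]
```

The three lists correspond to the `keys`, `values`, and `lengths` arguments of `torchrec.KeyedJaggedTensor`; grouping IDs by key is what lets each node type's embedding table be looked up (and sharded) independently.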

static build_data_loader(dataset, sampling_config, dataloader_config, graph_metadata, condensed_node_type_to_vocab_size_map, pin_memory, should_loop=True)

Parameters:

dataset (torch.utils.data.IterableDataset)
sampling_config (gigl.experimental.knowledge_graph_embedding.lib.config.sampling.SamplingConfig)
dataloader_config (gigl.experimental.knowledge_graph_embedding.lib.config.dataloader.DataloaderConfig)
graph_metadata (gigl.src.common.types.pb_wrappers.graph_metadata.GraphMetadataPbWrapper)
condensed_node_type_to_vocab_size_map (dict[gigl.src.common.types.graph_data.CondensedNodeType, int])
pin_memory (bool)
should_loop (bool)

static from_edge_tensors(edges, condensed_edge_types, edge_labels, condensed_node_type_to_node_type_map, condensed_edge_type_to_condensed_node_type_map)

Creates an EdgeBatch from edge tensors. The resulting EdgeBatch has length 2 * len(edges), since a src-dst pair (two entries) is created for each edge in the batch.

Parameters:

edges (torch.Tensor) – A tensor of shape [num_edges, 2] containing the source and destination node IDs.
condensed_edge_types (torch.Tensor) – A tensor of shape [num_edges] containing the condensed edge type of each edge.
edge_labels (torch.Tensor) – A tensor of shape [num_edges] containing the label of each edge.
condensed_node_type_to_node_type_map (dict[CondensedNodeType, NodeType]) – A mapping from condensed node types to node types.
condensed_edge_type_to_condensed_node_type_map (dict[CondensedEdgeType, tuple[CondensedNodeType, CondensedNodeType]]) – A mapping from each condensed edge type to its (source, destination) pair of condensed node types.

Return type:

EdgeBatch
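The 2 * len(edges) length follows from flattening every edge into two slots. The sketch below is a simplified, hypothetical illustration (`flatten_edges` is not part of GiGL, plain lists stand in for tensors, and the src-then-dst interleaving order is an assumption): it shows how the condensed edge type determines the node type of each endpoint.

```python
from typing import Dict, List, Tuple


def flatten_edges(
    edges: List[Tuple[int, int]],
    condensed_edge_types: List[int],
    condensed_edge_type_to_condensed_node_type_map: Dict[int, Tuple[int, int]],
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: flatten N edges into 2N (node_id, node_type) slots."""
    node_ids: List[int] = []
    node_types: List[int] = []
    for (src, dst), cet in zip(edges, condensed_edge_types):
        # The edge type fixes the condensed node type of each endpoint.
        src_type, dst_type = condensed_edge_type_to_condensed_node_type_map[cet]
        node_ids.extend([src, dst])
        node_types.extend([src_type, dst_type])
    return node_ids, node_types


node_ids, node_types = flatten_edges(
    edges=[(10, 3), (11, 4), (7, 8)],
    condensed_edge_types=[0, 0, 1],
    condensed_edge_type_to_condensed_node_type_map={0: (0, 1), 1: (1, 1)},
)
print(node_ids)    # [10, 3, 11, 4, 7, 8]
print(node_types)  # [0, 1, 0, 1, 1, 1]
```

Three edges yield six slots; the per-slot node types are what a KeyedJaggedTensor would use to route each ID to the right embedding table.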

to_edge_tensors(condensed_edge_type_to_condensed_node_type_map)

Reconstructs the edge tensors from the EdgeBatch. This is used for debugging and sanity checking the EdgeBatch.

Parameters:

condensed_edge_type_to_condensed_node_type_map (dict[gigl.src.common.types.graph_data.CondensedEdgeType, tuple[gigl.src.common.types.graph_data.CondensedNodeType, gigl.src.common.types.graph_data.CondensedNodeType]])

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
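As a sanity check, reconstruction amounts to reversing the src-dst interleaving. This is a minimal sketch under the assumption that the flattened slots are ordered [src0, dst0, src1, dst1, ...]; `unflatten_pairs` is a hypothetical name, and it only recovers the (src, dst) pairs, not the edge types or labels.

```python
from typing import List, Tuple


def unflatten_pairs(node_ids: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: regroup [src0, dst0, src1, dst1, ...] into edges."""
    if len(node_ids) % 2 != 0:
        raise ValueError("expected an even number of entries (one src and one dst per edge)")
    it = iter(node_ids)
    # zip(it, it) pulls two consecutive items per step: (src_i, dst_i).
    return list(zip(it, it))


print(unflatten_pairs([10, 3, 11, 4]))  # [(10, 3), (11, 4)]
```

A round trip (flatten, then unflatten) returning the original edges is a cheap invariant to assert while debugging a batch.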

condensed_edge_types: torch.Tensor

labels: torch.Tensor

src_dst_pairs: torchrec.KeyedJaggedTensor

gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.build_tensors_from_edge_dicts(inputs)

Converts a list of HeterogeneousGraphEdgeDict into tensors.

Parameters:

inputs (list[HeterogeneousGraphEdgeDict]) – A list of edge dictionaries.

Returns:

A tuple containing:

edges (torch.Tensor): A tensor of shape [num_edges, 2] containing the source and destination node IDs.
condensed_edge_types (torch.Tensor): A tensor of shape [num_edges] containing the condensed edge types.
labels (torch.Tensor): A tensor of shape [num_edges] containing labels (all set to 1).

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
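The conversion can be pictured with plain lists. This sketch assumes hypothetical dictionary keys `"src"`, `"dst"`, and `"condensed_edge_type"`; the real HeterogeneousGraphEdgeDict fields are not documented here and may differ. The helper name is also hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for HeterogeneousGraphEdgeDict; real key names may differ.
EdgeDict = Dict[str, int]


def build_lists_from_edge_dicts(
    inputs: List[EdgeDict],
) -> Tuple[List[List[int]], List[int], List[int]]:
    # edges: [num_edges, 2] rows of (src, dst) node IDs.
    edges = [[d["src"], d["dst"]] for d in inputs]
    # condensed_edge_types: [num_edges], one edge type per row.
    condensed_edge_types = [d["condensed_edge_type"] for d in inputs]
    # labels: [num_edges], all 1 because every input edge is an observed (positive) edge.
    labels = [1] * len(inputs)
    return edges, condensed_edge_types, labels


edges, cets, labels = build_lists_from_edge_dicts([
    {"src": 10, "dst": 3, "condensed_edge_type": 0},
    {"src": 11, "dst": 4, "condensed_edge_type": 1},
])
print(edges)   # [[10, 3], [11, 4]]
print(cets)    # [0, 1]
print(labels)  # [1, 1]
```

With PyTorch installed, `torch.tensor(edges)` would give the [num_edges, 2] tensor described above.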

gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.collate_edge_batch_from_heterogeneous_graph_edge_dict(inputs, condensed_edge_type_to_condensed_node_type_map, condensed_node_type_to_vocab_size_map, condensed_node_type_to_node_type_map, num_random_negatives_per_edge=0)

This is a collate function for the EdgeBatch. It takes a list of heterogeneous graph edge dictionaries (read from the upstream dataset), converts them to tensors of “positive” edges, samples random “negative” edges if num_random_negatives_per_edge > 0, and constructs an EdgeBatch (containing a TorchRec KeyedJaggedTensor and metadata).

Parameters:

inputs (list[HeterogeneousGraphEdgeDict]) – The input data.
condensed_edge_type_to_condensed_node_type_map (dict[CondensedEdgeType, tuple[CondensedNodeType, CondensedNodeType]]) – A mapping from condensed edge types to condensed node types.
condensed_node_type_to_vocab_size_map (dict[CondensedNodeType, int]) – A mapping from condensed node types to vocab sizes.
condensed_node_type_to_node_type_map (dict[CondensedNodeType, NodeType]) – A mapping from condensed node types to node types.
num_random_negatives_per_edge (int) – The number of random negative samples to generate for each positive edge. Defaults to 0 (no negatives).

Returns:

The collated EdgeBatch.

Return type:

EdgeBatch
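The positive/negative assembly step can be sketched as follows. This is a hypothetical simplification, not GiGL's implementation: it starts from already-converted edges rather than edge dictionaries, assumes uniform sampling of both endpoints from the vocabularies of the edge type's node types, and returns plain lists instead of an EdgeBatch.

```python
import random
from typing import Dict, List, Tuple


def collate_with_random_negatives(
    edges: List[Tuple[int, int]],
    condensed_edge_types: List[int],
    condensed_edge_type_to_condensed_node_type_map: Dict[int, Tuple[int, int]],
    condensed_node_type_to_vocab_size_map: Dict[int, int],
    num_random_negatives_per_edge: int = 0,
    seed: int = 0,
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    rng = random.Random(seed)
    # Positives come first, each labeled 1.
    out_edges = list(edges)
    out_types = list(condensed_edge_types)
    out_labels = [1] * len(edges)
    for cet in condensed_edge_types:
        src_type, dst_type = condensed_edge_type_to_condensed_node_type_map[cet]
        for _ in range(num_random_negatives_per_edge):
            # Draw both endpoints uniformly from the vocabularies of the edge type's node types.
            src = rng.randrange(condensed_node_type_to_vocab_size_map[src_type])
            dst = rng.randrange(condensed_node_type_to_vocab_size_map[dst_type])
            out_edges.append((src, dst))
            out_types.append(cet)  # negatives keep the positive edge's type
            out_labels.append(0)
    return out_edges, out_types, out_labels


e, t, y = collate_with_random_negatives(
    edges=[(10, 3), (11, 4)],
    condensed_edge_types=[0, 0],
    condensed_edge_type_to_condensed_node_type_map={0: (0, 1)},
    condensed_node_type_to_vocab_size_map={0: 100, 1: 50},
    num_random_negatives_per_edge=3,
)
print(len(e), y.count(1), y.count(0))  # 8 2 6
```

Random negatives like these can occasionally coincide with a real edge; this sketch does not filter such collisions.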

gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.relationwise_batch_random_negative_sampling(condensed_edge_type_to_condensed_node_type_map, condensed_node_type_to_vocab_size_map, num_negatives_per_condensed_edge_type=1)

Performs random negative sampling for each edge type.

This function generates num_negatives_per_condensed_edge_type negative edges for each condensed edge type, with the src and dst node IDs selected at random from the vocabularies of the source and destination node types, as defined by the edge type and the provided node-type-to-vocabulary-size map.

These can be consumed in model training as negative samples which are shared across edges.

Parameters:

condensed_edge_type_to_condensed_node_type_map (dict[int, tuple[int, int]]) – A mapping from each edge type to a tuple of (source_node_type, destination_node_type). R denotes the number of edge types in this mapping.
condensed_node_type_to_vocab_size_map (dict[int, int]) – A mapping from each node type to the size of its vocabulary.
num_negatives_per_condensed_edge_type (int) – The number of negative edges to sample per edge type, denoted K. Defaults to 1.

Returns:

A tuple containing:

negative_edges (torch.Tensor): A tensor of shape [R * K] containing negative edges.
negative_edge_types (torch.Tensor): A tensor of shape [R * K] containing the edge type of each negative edge.
negative_labels (torch.Tensor): A tensor of zeros with shape [R * K], suitable for use in contrastive or classification losses.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
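The relation-wise sampling described above can be sketched in pure Python. This is a hypothetical illustration, not the library code: `relationwise_random_negatives` is an invented name, uniform sampling is an assumption (the description only says "at random"), lists stand in for tensors, and each negative edge is represented as a (src, dst) tuple.

```python
import random
from typing import Dict, List, Tuple


def relationwise_random_negatives(
    condensed_edge_type_to_condensed_node_type_map: Dict[int, Tuple[int, int]],
    condensed_node_type_to_vocab_size_map: Dict[int, int],
    num_negatives_per_condensed_edge_type: int = 1,
    seed: int = 0,
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    rng = random.Random(seed)
    negative_edges: List[Tuple[int, int]] = []
    negative_edge_types: List[int] = []
    # Iterate edge types in sorted order so the output layout is deterministic.
    for cet, (src_type, dst_type) in sorted(condensed_edge_type_to_condensed_node_type_map.items()):
        for _ in range(num_negatives_per_condensed_edge_type):
            # Sample each endpoint from the vocabulary of its own node type.
            src = rng.randrange(condensed_node_type_to_vocab_size_map[src_type])
            dst = rng.randrange(condensed_node_type_to_vocab_size_map[dst_type])
            negative_edges.append((src, dst))
            negative_edge_types.append(cet)
    negative_labels = [0] * len(negative_edges)
    return negative_edges, negative_edge_types, negative_labels


negative_edges, negative_edge_types, negative_labels = relationwise_random_negatives(
    condensed_edge_type_to_condensed_node_type_map={0: (0, 1), 1: (1, 1)},
    condensed_node_type_to_vocab_size_map={0: 100, 1: 50},
    num_negatives_per_condensed_edge_type=3,
)
print(len(negative_edges))  # 6 (R * K = 2 * 3)
print(negative_edge_types)  # [0, 0, 0, 1, 1, 1]
print(negative_labels)      # [0, 0, 0, 0, 0, 0]
```

Because the K negatives are drawn once per edge type rather than once per positive edge, every positive edge of a given type can be scored against the same K negatives, which keeps the sampling cost independent of batch size.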