gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch#

Classes#

EdgeBatch

A class for representing a batch of edges in a heterogeneous graph.

Functions#

build_tensors_from_edge_dicts(inputs)

Converts a list of HeterogeneousGraphEdgeDict into tensors.

collate_edge_batch_from_heterogeneous_graph_edge_dict(...)

This is a collate function for the EdgeBatch.

relationwise_batch_random_negative_sampling(...[, ...])

Performs random negative sampling for each edge type.

Module Contents#

class gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.EdgeBatch[source]#

Bases: gigl.experimental.knowledge_graph_embedding.common.torchrec.batch.DataclassBatch

A class for representing a batch of edges in a heterogeneous graph. This can be derived from input edge tensors, and contains logic to build a torchrec KeyedJaggedTensor (used for sharded embedding lookups) and other metadata tensors which are required to train KGE models.

static build_data_loader(dataset, sampling_config, dataloader_config, graph_metadata, condensed_node_type_to_vocab_size_map, pin_memory, should_loop=True)[source]#

static from_edge_tensors(edges, condensed_edge_types, edge_labels, condensed_node_type_to_node_type_map, condensed_edge_type_to_condensed_node_type_map)[source]#

Creates an EdgeBatch from edge tensors. The resulting EdgeBatch has length 2 * len(edges), because each edge contributes a src-dst pair.

Parameters:
  • edges (torch.Tensor) – A tensor of edges.

  • condensed_edge_types (torch.Tensor) – A tensor of condensed edge types.

  • edge_labels (torch.Tensor) – A tensor of edge labels.

  • condensed_node_type_to_node_type_map (dict[CondensedNodeType, NodeType]) – A mapping from condensed node types to node types.

  • condensed_edge_type_to_condensed_node_type_map (dict[CondensedEdgeType, tuple[CondensedNodeType, CondensedNodeType]]) – A mapping from condensed edge types to condensed node types.

Return type:

EdgeBatch
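The doubling described for from_edge_tensors can be pictured with a small dependency-free sketch. It uses plain Python lists in place of torch.Tensor, and the helper name flatten_edges_to_src_dst_entries is hypothetical: it is not part of GiGL and does not build a KeyedJaggedTensor. It only illustrates why a batch of N edges yields 2N node entries, each tagged with the node type implied by its edge type.

```python
def flatten_edges_to_src_dst_entries(edges, condensed_edge_types, edge_type_to_node_types):
    """Return one (condensed_node_type, node_id) entry per edge endpoint.

    Hypothetical helper for illustration only; the real EdgeBatch stores
    this information in a torchrec KeyedJaggedTensor.
    """
    entries = []
    for (src, dst), edge_type in zip(edges, condensed_edge_types):
        # Look up which node types the edge type connects.
        src_type, dst_type = edge_type_to_node_types[edge_type]
        entries.append((src_type, src))  # src endpoint
        entries.append((dst_type, dst))  # dst endpoint
    return entries


# Three edges over two edge types (all values are made up).
edges = [(0, 5), (3, 1), (2, 2)]
condensed_edge_types = [0, 1, 0]
edge_type_to_node_types = {0: (0, 1), 1: (1, 0)}

entries = flatten_edges_to_src_dst_entries(
    edges, condensed_edge_types, edge_type_to_node_types
)
print(len(entries))  # 6, i.e. 2 * len(edges)
```

Keeping the node type next to each ID matters because, in a heterogeneous graph, the same integer ID can refer to different nodes depending on the node type.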

to_edge_tensors(condensed_edge_type_to_condensed_node_type_map)[source]#

Reconstructs the edge tensors from the EdgeBatch. This is used for debugging and sanity checking the EdgeBatch.

Parameters:

condensed_edge_type_to_condensed_node_type_map (dict[gigl.src.common.types.graph_data.CondensedEdgeType, tuple[gigl.src.common.types.graph_data.CondensedNodeType, gigl.src.common.types.graph_data.CondensedNodeType]])

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]

condensed_edge_types: torch.Tensor[source]#

labels: torch.Tensor[source]#

src_dst_pairs: torchrec.KeyedJaggedTensor[source]#

gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.build_tensors_from_edge_dicts(inputs)[source]#

Converts a list of HeterogeneousGraphEdgeDict into tensors.

Parameters:

inputs (list[HeterogeneousGraphEdgeDict]) – A list of edge dictionaries.

Returns:

A tuple containing:
  • edges (torch.Tensor): A tensor of shape [num_edges, 2] containing the source and destination node IDs.

  • condensed_edge_types (torch.Tensor): A tensor of shape [num_edges] containing the condensed edge types.

  • labels (torch.Tensor): A tensor of shape [num_edges] containing labels (all set to 1).

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
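As a rough illustration of the conversion described above, the sketch below turns a list of edge dictionaries into the three parallel outputs. It is an assumption-laden stand-in: the dictionary keys ("src", "dst", "condensed_edge_type") and the function name build_lists_from_edge_dicts are hypothetical, since the fields of HeterogeneousGraphEdgeDict are not listed here, and plain lists are used where the real function returns torch.Tensor objects.

```python
def build_lists_from_edge_dicts(inputs):
    """Convert edge dicts into (edges, condensed_edge_types, labels).

    Hypothetical sketch: key names are assumed, and lists stand in
    for the tensors returned by the documented function.
    """
    # One [src, dst] row per edge, mirroring the [num_edges, 2] shape.
    edges = [[d["src"], d["dst"]] for d in inputs]
    # One edge type per edge, mirroring the [num_edges] shape.
    condensed_edge_types = [d["condensed_edge_type"] for d in inputs]
    # Every input edge is a positive example, so all labels are 1.
    labels = [1] * len(inputs)
    return edges, condensed_edge_types, labels


sample_inputs = [
    {"src": 10, "dst": 42, "condensed_edge_type": 0},
    {"src": 7, "dst": 3, "condensed_edge_type": 1},
]
pos_edges, pos_types, pos_labels = build_lists_from_edge_dicts(sample_inputs)
```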

gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.collate_edge_batch_from_heterogeneous_graph_edge_dict(inputs, condensed_edge_type_to_condensed_node_type_map, condensed_node_type_to_vocab_size_map, condensed_node_type_to_node_type_map, num_random_negatives_per_edge=0)[source]#

This is a collate function for the EdgeBatch. It takes a list of heterogeneous graph edge dictionaries (read from the upstream dataset), converts them to tensors for “positive” edges, samples “negative” edges if num_random_negatives_per_edge is greater than 0, and constructs an EdgeBatch (containing a TorchRec KeyedJaggedTensor and metadata).

Parameters:
  • inputs (list[HeterogeneousGraphEdgeDict]) – The input data.

  • condensed_edge_type_to_condensed_node_type_map (dict[CondensedEdgeType, tuple[CondensedNodeType, CondensedNodeType]]) – A mapping from condensed edge types to condensed node types.

  • condensed_node_type_to_vocab_size_map (dict[CondensedNodeType, int]) – A mapping from condensed node types to vocab sizes.

  • condensed_node_type_to_node_type_map (dict[CondensedNodeType, NodeType]) – A mapping from condensed node types to node types.

  • num_random_negatives_per_edge (int) – The number of negative samples to generate for each positive edge. Defaults to 0.

Returns:

The collated EdgeBatch.

Return type:

EdgeBatch
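Collate functions are typically invoked with a single argument, the list of samples in a batch, so the extra mapping arguments have to be bound ahead of time. The sketch below shows one way to do that with functools.partial. The function collate_stub is a hypothetical stand-in that only mirrors the documented parameter names and returns a summary dict rather than an EdgeBatch; whether build_data_loader binds the arguments this way is an assumption, not something stated in this reference.

```python
from functools import partial


def collate_stub(
    inputs,
    condensed_edge_type_to_condensed_node_type_map,
    condensed_node_type_to_vocab_size_map,
    condensed_node_type_to_node_type_map,
    num_random_negatives_per_edge=0,
):
    """Hypothetical stand-in with the documented parameter names.

    Returns a small summary dict instead of an EdgeBatch.
    """
    return {
        "num_positive_edges": len(inputs),
        "num_random_negatives_per_edge": num_random_negatives_per_edge,
    }


# Bind the graph metadata once; the result takes only the batch list.
collate_fn = partial(
    collate_stub,
    condensed_edge_type_to_condensed_node_type_map={0: (0, 1)},
    condensed_node_type_to_vocab_size_map={0: 100, 1: 50},
    condensed_node_type_to_node_type_map={0: "user", 1: "item"},
    num_random_negatives_per_edge=4,
)

# A data loader would call the bound function with one list per batch.
summary = collate_fn([{"placeholder": 1}, {"placeholder": 2}])
```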

gigl.experimental.knowledge_graph_embedding.lib.data.edge_batch.relationwise_batch_random_negative_sampling(condensed_edge_type_to_condensed_node_type_map, condensed_node_type_to_vocab_size_map, num_negatives_per_condensed_edge_type=1)[source]#

Performs random negative sampling for each edge type.

This function generates num_negatives_per_condensed_edge_type negative edges for each edge type, with src and dst selected at random from the vocabulary of the corresponding node types, as defined by the edge type and the provided type-to-vocabulary maps.

These can be consumed in model training as negative samples which are shared across edges.

Parameters:
  • condensed_edge_type_to_condensed_node_type_map (dict[int, tuple[int, int]]) – A mapping from each edge type to a tuple of (source_node_type, destination_node_type). R denotes the number of edge types in this mapping.

  • condensed_node_type_to_vocab_size_map (dict[int, int]) – A mapping from each node type to the size of its vocabulary.

  • num_negatives_per_condensed_edge_type (int) – The number of negative edges to sample per edge type, denoted K.

Returns:

A tuple containing:
  • negative_edges (Tensor): A tensor of shape [R * K] containing negative edges.

  • negative_edge_types (Tensor): A tensor of shape [R * K] containing the edge type for each negative edge.

  • negative_labels (Tensor): A tensor of zeros with shape [R * K], suitable for use in contrastive or classification losses.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
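The sampling scheme described above can be sketched without any tensor library. The version below is a minimal illustration under stated assumptions: the function name sample_relationwise_negatives and the rng parameter are hypothetical, Python's random module stands in for torch sampling, and each negative edge is stored as a (src, dst) tuple. It draws K edges for each of the R edge types, choosing src and dst IDs from the vocabulary ranges of the node types that the edge type connects.

```python
import random


def sample_relationwise_negatives(
    edge_type_to_node_types,
    node_type_to_vocab_size,
    num_negatives_per_edge_type=1,
    rng=None,
):
    """Sample K random (src, dst) pairs for each of the R edge types.

    Hypothetical pure-Python sketch; the documented function returns
    torch tensors instead of lists.
    """
    rng = rng or random.Random()
    negative_edges = []
    negative_edge_types = []
    for edge_type, (src_type, dst_type) in edge_type_to_node_types.items():
        for _ in range(num_negatives_per_edge_type):
            # IDs are drawn from [0, vocab_size) of the matching node type.
            src = rng.randrange(node_type_to_vocab_size[src_type])
            dst = rng.randrange(node_type_to_vocab_size[dst_type])
            negative_edges.append((src, dst))
            negative_edge_types.append(edge_type)
    # Negatives are labeled 0, in contrast to the 1 used for positives.
    negative_labels = [0] * len(negative_edges)
    return negative_edges, negative_edge_types, negative_labels


edge_type_map = {0: (0, 1), 1: (1, 0)}  # R = 2 edge types
vocab_sizes = {0: 100, 1: 50}
neg_edges, neg_types, neg_labels = sample_relationwise_negatives(
    edge_type_map, vocab_sizes, num_negatives_per_edge_type=3,
    rng=random.Random(0),
)
```

Because the negatives are drawn per edge type rather than per positive edge, the same small pool can be shared by every positive edge of that type in the batch.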