gigl.common.data.load_torch_tensors

Attributes

logger

Classes

SerializedGraphMetadata

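Stores serialized TFRecord information for every entity (node, edge, positive_label, negative_label). For a homogeneous graph, each field is a single SerializedTFRecordInfo; for a heterogeneous graph, each field is a dictionary mapping each node or edge type to its SerializedTFRecordInfo.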

Functions

load_torch_tensors_from_tf_record(...[, rank, ...])

Loads all torch tensors described by a SerializedGraphMetadata object, for every entity (node, edge, positive_label, negative_label) and every node and edge type.

Module Contents

class gigl.common.data.load_torch_tensors.SerializedGraphMetadata

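Stores serialized TFRecord information for every entity (node, edge, positive_label, negative_label). For a homogeneous graph, each field is a single SerializedTFRecordInfo; for a heterogeneous graph, each field is a dictionary mapping each node or edge type to its SerializedTFRecordInfo.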

edge_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.EdgeType, gigl.common.data.dataloaders.SerializedTFRecordInfo]

negative_label_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.EdgeType, gigl.common.data.dataloaders.SerializedTFRecordInfo] | None = None

node_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.NodeType, gigl.common.data.dataloaders.SerializedTFRecordInfo]

positive_label_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.EdgeType, gigl.common.data.dataloaders.SerializedTFRecordInfo] | None = None
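Because each field is either a single info object (homogeneous) or a dictionary keyed by type (heterogeneous), and the label fields may be None, code that consumes this metadata often normalizes each field into a uniform mapping first. The sketch below illustrates that idea only. FakeTFRecordInfo, its uri_prefix field, as_typed_mapping, and the plain-string type keys are hypothetical stand-ins, not GiGL's actual SerializedTFRecordInfo, NodeType, or EdgeType.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


# Hypothetical stand-in for SerializedTFRecordInfo; the real class has its own fields.
@dataclass(frozen=True)
class FakeTFRecordInfo:
    uri_prefix: str  # hypothetical field, for illustration only


# Mirrors the field shape above: a single info, a {type: info} dict, or None for labels.
EntityInfo = Optional[Union[FakeTFRecordInfo, Dict[str, FakeTFRecordInfo]]]


def as_typed_mapping(info: EntityInfo, default_type: str) -> Dict[str, FakeTFRecordInfo]:
    """Normalize one entity field into a {type: info} mapping.

    - None (e.g. an absent positive/negative label field) becomes an empty mapping.
    - A heterogeneous field is already a dict keyed by type, so it is copied.
    - A homogeneous field is a single info, so it is wrapped under default_type.
    """
    if info is None:
        return {}
    if isinstance(info, dict):
        return dict(info)
    return {default_type: info}


# Homogeneous node field versus heterogeneous edge field (hypothetical values).
homogeneous_nodes = FakeTFRecordInfo("example/nodes/")
heterogeneous_edges = {
    "user-buys-item": FakeTFRecordInfo("example/edges/buys/"),
    "user-views-item": FakeTFRecordInfo("example/edges/views/"),
}
print(as_typed_mapping(homogeneous_nodes, "default_node"))
print(sorted(as_typed_mapping(heterogeneous_edges, "default_edge")))
print(as_typed_mapping(None, "default_edge"))
```

Normalizing up front lets downstream loops iterate over types the same way regardless of whether the graph is homogeneous or heterogeneous.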

gigl.common.data.load_torch_tensors.load_torch_tensors_from_tf_record(tf_record_dataloader, serialized_graph_metadata, should_load_tensors_in_parallel, rank=0, node_tf_dataset_options=TFDatasetOptions(), edge_tf_dataset_options=TFDatasetOptions())

Loads all torch tensors described by a SerializedGraphMetadata object, for every entity (node, edge, positive_label, negative_label) and every node and edge type.

Loading the entity types in parallel processes slows each individual process, but may still yield a net speedup across all entity types. Because of this tradeoff between parallel and sequential tensor loading, loading is not parallelized across node and edge types. The should_load_tensors_in_parallel flag lets callers choose a loading strategy based on the input data.
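The choice controlled by should_load_tensors_in_parallel can be sketched as dispatching one loader per entity type, either concurrently or one after another. This is a minimal illustration under stated assumptions, not GiGL's implementation: load_all_entities and the load_entity callable are hypothetical, and a ThreadPoolExecutor stands in for the separate processes the text describes. For CPU-bound Python work, a ProcessPoolExecutor (with a picklable, module-level loader) would be needed to actually overlap the loads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# The four entity kinds named in the docstring.
ENTITY_TYPES = ("node", "edge", "positive_label", "negative_label")


def load_all_entities(
    load_entity: Callable[[str], object],
    should_load_tensors_in_parallel: bool,
) -> Dict[str, object]:
    """Load every entity kind either concurrently or sequentially.

    load_entity is a hypothetical per-entity loader. Within a single call it
    would handle all node or edge types one after another, matching the note
    that loading is not parallelized across node and edge types.
    """
    if should_load_tensors_in_parallel:
        # One worker per entity kind: each load may run slower because of
        # contention, but total wall-clock time can drop as the loads overlap.
        # Threads are a stand-in here for separate processes.
        with ThreadPoolExecutor(max_workers=len(ENTITY_TYPES)) as pool:
            futures = {name: pool.submit(load_entity, name) for name in ENTITY_TYPES}
            return {name: future.result() for name, future in futures.items()}
    # Sequential path: one entity kind at a time, with no contention between loads.
    return {name: load_entity(name) for name in ENTITY_TYPES}


def fake_load(entity: str) -> str:
    # Hypothetical loader used only to exercise the sketch.
    return f"tensors for {entity}"


print(load_all_entities(fake_load, should_load_tensors_in_parallel=True))
print(load_all_entities(fake_load, should_load_tensors_in_parallel=False))
```

Both paths return the same mapping; the flag changes only how the work is scheduled, which is why it is exposed as a tuning knob rather than a behavioral switch.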

Parameters:

tf_record_dataloader (TFRecordDataLoader) – TFRecordDataLoader used to load tensors from serialized TFRecords.
serialized_graph_metadata (SerializedGraphMetadata) – Serialized graph metadata containing the information needed to load TFRecords across node and edge types.
should_load_tensors_in_parallel (bool) – Whether tensors should be loaded from the serialized information in parallel or in sequence across the [node, edge, positive_label, negative_label] entity types.
rank (int) – Rank on current machine
node_tf_dataset_options (TFDatasetOptions) – The options to use for nodes when building the dataset.
edge_tf_dataset_options (TFDatasetOptions) – The options to use for edges when building the dataset.

Returns:

Unpartitioned Graph Tensors

Return type:

loaded_graph_tensors (LoadedGraphTensors)

gigl.common.data.load_torch_tensors.logger