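gigl.common.data.load_torch_tensors
Attributes
Classes
SerializedGraphMetadata | Stores serialized TFRecord information for all entities. For a homogeneous graph, each field is a single SerializedTFRecordInfo. Otherwise, each field is a dictionary mapping the node or edge type to its SerializedTFRecordInfo.
Functions
load_torch_tensors_from_tf_record(tf_record_dataloader, serialized_graph_metadata, should_load_tensors_in_parallel, rank=0, node_tf_dataset_options=TFDatasetOptions(), edge_tf_dataset_options=TFDatasetOptions()) | Loads all torch tensors from a SerializedGraphMetadata object for all entity [node, edge, positive_label, negative_label] and edge / node types.
Module Contents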
- class gigl.common.data.load_torch_tensors.SerializedGraphMetadata
Stores serialized TFRecord information for all entities. For a homogeneous graph, each field is a single SerializedTFRecordInfo. Otherwise, each field is a dictionary mapping the node or edge type to its SerializedTFRecordInfo.
- edge_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.EdgeType, gigl.common.data.dataloaders.SerializedTFRecordInfo]
- negative_label_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.EdgeType, gigl.common.data.dataloaders.SerializedTFRecordInfo | None] | None = None
- node_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.NodeType, gigl.common.data.dataloaders.SerializedTFRecordInfo]
- positive_label_entity_info: gigl.common.data.dataloaders.SerializedTFRecordInfo | Dict[gigl.src.common.types.graph_data.EdgeType, gigl.common.data.dataloaders.SerializedTFRecordInfo | None] | None = None
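The difference between the homogeneous and heterogeneous shapes of these fields can be illustrated with a minimal sketch. It does not import gigl: TFRecordInfoStub, its uri_prefix field, GraphMetadataSketch, and is_heterogeneous are hypothetical stand-ins, and plain strings stand in for NodeType and EdgeType keys.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class TFRecordInfoStub:
    """Hypothetical stand-in for SerializedTFRecordInfo."""

    uri_prefix: str  # hypothetical field, for illustration only


# Either one info object (homogeneous) or a mapping keyed by type (heterogeneous).
InfoOrMapping = Union[TFRecordInfoStub, Dict[str, TFRecordInfoStub]]
LabelInfoOrMapping = Optional[
    Union[TFRecordInfoStub, Dict[str, Optional[TFRecordInfoStub]]]
]


@dataclass(frozen=True)
class GraphMetadataSketch:
    """Mirrors the four documented fields; labels are optional and default to None."""

    node_entity_info: InfoOrMapping
    edge_entity_info: InfoOrMapping
    positive_label_entity_info: LabelInfoOrMapping = None
    negative_label_entity_info: LabelInfoOrMapping = None


def is_heterogeneous(metadata: GraphMetadataSketch) -> bool:
    # Assumption: a dictionary-valued node field signals a heterogeneous graph.
    return isinstance(metadata.node_entity_info, dict)


homogeneous = GraphMetadataSketch(
    node_entity_info=TFRecordInfoStub("nodes/"),
    edge_entity_info=TFRecordInfoStub("edges/"),
)

heterogeneous = GraphMetadataSketch(
    node_entity_info={
        "user": TFRecordInfoStub("nodes/user/"),
        "item": TFRecordInfoStub("nodes/item/"),
    },
    edge_entity_info={"user-buys-item": TFRecordInfoStub("edges/buys/")},
    # A per-edge-type label entry may itself be None.
    positive_label_entity_info={"user-buys-item": None},
)
```

With the real class, the dictionary keys would be NodeType and EdgeType instances rather than strings.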
- gigl.common.data.load_torch_tensors.load_torch_tensors_from_tf_record(tf_record_dataloader, serialized_graph_metadata, should_load_tensors_in_parallel, rank=0, node_tf_dataset_options=TFDatasetOptions(), edge_tf_dataset_options=TFDatasetOptions())
Loads all torch tensors from a SerializedGraphMetadata object for all entity [node, edge, positive_label, negative_label] and edge / node types.
Loading the entity types in parallel slows each individual loading process, but may still yield a net speedup across all entity types. Because of this tradeoff between parallel and sequential tensor loading, the function does not parallelize across node and edge types. The should_load_tensors_in_parallel flag lets callers choose a loading strategy suited to their input data.
- Parameters:
tf_record_dataloader (TFRecordDataLoader) – TFRecordDataLoader used for loading tensors from serialized TFRecords
serialized_graph_metadata (SerializedGraphMetadata) – Serialized graph metadata containing the information needed to load TFRecords across node and edge types
should_load_tensors_in_parallel (bool) – Whether tensors should be loaded from serialized information in parallel or in sequence across the [node, edge, positive_label, negative_label] entity types.
rank (int) – Rank on the current machine
node_tf_dataset_options (TFDatasetOptions) – The options to use for nodes when building the dataset.
edge_tf_dataset_options (TFDatasetOptions) – The options to use for edges when building the dataset.
- Returns:
The unpartitioned graph tensors loaded for all entities
- Return type:
LoadedGraphTensors
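The tradeoff behind should_load_tensors_in_parallel can be sketched with the standard library alone. This is not the library's implementation: load_entities, the fake loaders, and the use of a thread pool are assumptions chosen only to show one loader per entity type running either in sequence or concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def load_entities(
    loaders: Dict[str, Callable[[], List[int]]],
    in_parallel: bool,
) -> Dict[str, List[int]]:
    """Run one loader per entity type, sequentially or concurrently."""
    if not in_parallel:
        # Sequential: each loader gets the machine to itself, one after another.
        return {name: load() for name, load in loaders.items()}

    # Parallel: loaders compete for resources, so each one may run slower,
    # but the total wall-clock time can still be lower.
    with ThreadPoolExecutor(max_workers=max(1, len(loaders))) as pool:
        futures = {name: pool.submit(load) for name, load in loaders.items()}
        return {name: future.result() for name, future in futures.items()}


# Fake loaders standing in for reading node, edge, and label TFRecords.
fake_loaders: Dict[str, Callable[[], List[int]]] = {
    "node": lambda: [0, 1, 2],
    "edge": lambda: [10, 11],
    "positive_label": lambda: [20],
    "negative_label": lambda: [],
}

sequential_result = load_entities(fake_loaders, in_parallel=False)
parallel_result = load_entities(fake_loaders, in_parallel=True)
```

Both modes return the same tensors; only the scheduling differs, which is why the choice can be left to a boolean flag and tuned per dataset.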