gigl.experimental.knowledge_graph_embedding.lib.config

Submodules

Attributes

Classes

EnumeratedGraphData

Configuration for enumerated graph data after preprocessing.

EvaluationPhaseConfig

Configuration for evaluation phases (validation/testing) during knowledge graph embedding training.

GraphConfig

Main graph configuration containing metadata and data references.

GraphMetadataPbWrapper

HeterogeneousGraphSparseEmbeddingConfig

Main configuration class for heterogeneous graph sparse embedding training.

Logger

GiGL's custom logger class used for local and cloud logging (VertexAI, Dataflow, etc.)

ModelConfig

Configuration for knowledge graph embedding model architecture.

RawGraphData

Configuration for raw graph data references from BigQuery sources.

RunConfig

Configuration for runtime execution environment settings.

TrainConfig

Main training configuration that orchestrates all training-related settings.

Package Contents

class gigl.experimental.knowledge_graph_embedding.lib.config.EnumeratedGraphData

Configuration for enumerated graph data after preprocessing.

Enumerated graph data refers to preprocessed node and edge data in which node and edge identifiers have been mapped to integer IDs, so they can be used directly as row indices for embedding-table lookups during model training.

node_data

List of metadata for enumerated node types, containing mapping information from raw node IDs to integer IDs.

Type:

List[EnumeratorNodeTypeMetadata]

edge_data

List of metadata for enumerated edge types, containing mapping information from raw node ID-based edges to corresponding integer ID-based edges.

Type:

List[EnumeratorEdgeTypeMetadata]
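The enumeration idea described above can be illustrated with a small in-memory sketch. It assumes each node type gets its own contiguous ID space and that edges are remapped through those per-type tables; GiGL's real enumerator works over BigQuery tables and records the result in EnumeratorNodeTypeMetadata / EnumeratorEdgeTypeMetadata, so the helper names below (enumerate_node_ids, enumerate_edges) are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def enumerate_node_ids(raw_ids: List[Hashable]) -> Dict[Hashable, int]:
    """Hypothetical helper: give each distinct raw ID a contiguous integer ID (first-seen order)."""
    mapping: Dict[Hashable, int] = {}
    for raw_id in raw_ids:
        if raw_id not in mapping:
            mapping[raw_id] = len(mapping)
    return mapping


def enumerate_edges(
    edges: List[Tuple[Hashable, Hashable]],
    src_map: Dict[Hashable, int],
    dst_map: Dict[Hashable, int],
) -> List[Tuple[int, int]]:
    """Hypothetical helper: rewrite raw-ID edges as integer-ID edges using per-type mappings."""
    return [(src_map[src], dst_map[dst]) for src, dst in edges]


# One ID space per node type (assumption), so "user" 0 and "item" 0 are different rows
# in different embedding tables.
users = enumerate_node_ids(["alice", "bob", "alice", "carol"])
items = enumerate_node_ids(["sku_9", "sku_3"])
edges = enumerate_edges([("bob", "sku_3"), ("carol", "sku_9")], users, items)
```

Here users maps alice, bob, carol to 0, 1, 2 and the two edges become (1, 1) and (2, 0), which can index embedding tables directly.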

generate_hydra_config_yaml()

Generate a Hydra-compatible YAML configuration string for enumerated graph data.

Converts the enumerated node and edge data into a YAML format that can be used by Hydra for configuration management. Dynamically inserts ‘_target_’ fields based on object types, handling dataclasses and namedtuples.

Returns:

A YAML-formatted string containing the Hydra configuration for enumerated graph data, with the proper ‘_target_’ fields for instantiation.

Return type:

str
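The "dynamically inserts ‘_target_’ fields" behavior can be sketched as a recursive walk that tags each dataclass or namedtuple with its fully qualified class path, which is the key Hydra's instantiate() uses to locate the class. This is a minimal stdlib-only sketch, not GiGL's implementation; the final YAML step (for example OmegaConf.to_yaml) is omitted, and TableRef / NodeMeta are hypothetical stand-ins for the enumerator metadata types.

```python
import dataclasses
from typing import Any, NamedTuple


def to_hydra_dict(obj: Any) -> Any:
    """Recursively convert dataclasses and namedtuples into dicts tagged with `_target_`."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        fields = {f.name: to_hydra_dict(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
    elif isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # namedtuples are tuples that expose a `_fields` attribute
        fields = {name: to_hydra_dict(getattr(obj, name)) for name in obj._fields}
    elif isinstance(obj, list):
        return [to_hydra_dict(item) for item in obj]
    else:
        return obj  # plain scalars (str, int, ...) pass through unchanged
    cls = type(obj)
    # Hydra imports and calls whatever class the `_target_` path names.
    return {"_target_": f"{cls.__module__}.{cls.__qualname__}", **fields}


# Hypothetical stand-ins for the enumerator metadata types.
class TableRef(NamedTuple):
    project: str
    table: str


@dataclasses.dataclass
class NodeMeta:
    node_type: str
    enumerated_table: TableRef


node_cfg = to_hydra_dict([NodeMeta("user", TableRef("my-project", "user_enumerated"))])
```

Handling namedtuples separately matters because dataclasses.is_dataclass() returns False for them, yet they still need a ‘_target_’ to round-trip through Hydra.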

to_dictconfig()

Convert enumerated graph data to an OmegaConf DictConfig object.

Creates a Hydra-compatible configuration object from the enumerated node and edge data. This is useful for programmatic configuration management without writing to files. Dynamically inserts ‘_target_’ fields based on object types.

Returns:

An OmegaConf DictConfig object containing the enumerated graph data configuration, with the proper ‘_target_’ fields for Hydra instantiation.

Return type:

DictConfig

edge_data: List[gigl.src.data_preprocessor.lib.enumerate.utils.EnumeratorEdgeTypeMetadata]
node_data: List[gigl.src.data_preprocessor.lib.enumerate.utils.EnumeratorNodeTypeMetadata]
class gigl.experimental.knowledge_graph_embedding.lib.config.EvaluationPhaseConfig

Configuration for evaluation phases (validation/testing) during knowledge graph embedding training.

Controls how model performance is measured during training (validation phase) and after training completion (testing phase). Uses ranking-based metrics to assess link prediction quality.

dataloader

Configuration for data loading during evaluation (workers, memory pinning). Defaults to DataloaderConfig() with standard settings.

Type:

DataloaderConfig

step_frequency

How often to run evaluation during training (every N steps). If None, evaluation runs only at the end of training. Defaults to None.

Type:

Optional[int]

num_batches

Maximum number of batches to evaluate. Useful for faster evaluation on large datasets by sampling a subset. If None, evaluates all data. Defaults to None.

Type:

Optional[int]
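The step_frequency and num_batches semantics described above can be read as two small rules. This sketch assumes 1-indexed training steps and assumes that a final evaluation also runs when step_frequency is set; both helpers (should_evaluate, limit_batches) are hypothetical and not part of GiGL.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def should_evaluate(step: int, step_frequency: Optional[int], is_last_step: bool) -> bool:
    """Evaluate every `step_frequency` steps; with None, only at the end of training."""
    if is_last_step:
        return True
    return step_frequency is not None and step > 0 and step % step_frequency == 0


def limit_batches(batches: Iterable[T], num_batches: Optional[int]) -> Iterator[T]:
    """Yield at most `num_batches` batches; None means evaluate every batch."""
    return iter(batches) if num_batches is None else islice(batches, num_batches)
```

For a 10-step run, step_frequency=4 evaluates at steps 4, 8 and 10, while step_frequency=None evaluates only at step 10.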

hit_rates_at_k

List of k values for computing Hit@k (Hits at k) metrics. Hit@k is the fraction of evaluation examples whose correct answer appears among the top k ranked predictions. Defaults to [1, 10, 100], which are common choices.

Type:

List[int]
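To make the Hit@k definition concrete, here is a minimal sketch that ranks each positive edge against its sampled negatives. It assumes a hypothetical score layout in which column 0 of each row holds the positive edge's score and the remaining columns hold negatives; ties are broken in the positive's favor (strict comparison), which is one common convention and may not match GiGL's.

```python
from typing import Dict, List, Sequence


def hit_rates_at_k(scores: Sequence[Sequence[float]], ks: List[int]) -> Dict[int, float]:
    """Hit@k for rows laid out as [positive_score, negative_score_1, ...] (assumed layout)."""
    # Rank of the positive = 1 + number of negatives that score strictly higher.
    ranks = [1 + sum(neg > row[0] for neg in row[1:]) for row in scores]
    # Fraction of rows whose positive lands within the top k.
    return {k: sum(rank <= k for rank in ranks) / len(ranks) for k in ks}


scores = [
    [0.9, 0.1, 0.5],  # positive ranked 1st
    [0.2, 0.8, 0.3],  # positive ranked 3rd
    [0.4, 0.6, 0.7],  # positive ranked 3rd
]
hits = hit_rates_at_k(scores, [1, 10])
```

With ranks [1, 3, 3], Hit@1 is 1/3 and Hit@10 is 1.0. With only two negatives per row, any k of 3 or more is trivially 1.0, which is why large k values only make sense with many negatives.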

sampling

Negative sampling configuration for evaluation. Should match or be compatible with training sampling to ensure fair comparison. Defaults to SamplingConfig() with standard settings.

Type:

SamplingConfig

dataloader: gigl.experimental.knowledge_graph_embedding.lib.config.dataloader.DataloaderConfig
hit_rates_at_k: List[int] = [1, 10, 100]
num_batches: int | None = None
sampling: gigl.experimental.knowledge_graph_embedding.lib.config.sampling.SamplingConfig
step_frequency: int | None = None
class gigl.experimental.knowledge_graph_embedding.lib.config.GraphConfig

Main graph configuration containing metadata and data references.

This configuration encapsulates all information about the knowledge graph structure, including schema metadata and references to both raw and processed data sources.

metadata

Wrapper around the graph metadata protocol buffer, containing schema information (node types, edge types, feature schemas).

Type:

GraphMetadataPbWrapper

raw_graph_data

Optional reference to raw BigQuery data sources. Used during initial data ingestion and preprocessing. None if not applicable.

Type:

Optional[RawGraphData]

enumerated_graph_data

Optional reference to preprocessed enumerated data. Used during model training when data has been preprocessed into integer IDs. None if not applicable.

Type:

Optional[EnumeratedGraphData]

enumerated_graph_data: EnumeratedGraphData | None = None
metadata: gigl.src.common.types.pb_wrappers.graph_metadata.GraphMetadataPbWrapper
raw_graph_data: RawGraphData | None = None
class gigl.experimental.knowledge_graph_embedding.lib.config.GraphMetadataPbWrapper
property condensed_edge_type_to_condensed_node_types: dict[gigl.src.common.types.graph_data.CondensedEdgeType, Tuple[gigl.src.common.types.graph_data.CondensedNodeType, gigl.src.common.types.graph_data.CondensedNodeType]]

Mapping from each CondensedEdgeType to its (source, destination) pair of CondensedNodeTypes, which simplifies looking up the endpoint node types of an edge type.

Return type:

dict[gigl.src.common.types.graph_data.CondensedEdgeType, Tuple[gigl.src.common.types.graph_data.CondensedNodeType, gigl.src.common.types.graph_data.CondensedNodeType]]
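A mapping like this can be derived from two of the other maps on this class: condensed edge type to edge type, and node type to condensed node type. The sketch below assumes an edge type is a (source node type, relation, destination node type) triple and models condensed types as plain ints; the EdgeType namedtuple here is a hypothetical stand-in, not gigl.src.common.types.graph_data.EdgeType.

```python
from typing import Dict, NamedTuple, Tuple


class EdgeType(NamedTuple):  # hypothetical stand-in for GiGL's EdgeType
    src_node_type: str
    relation: str
    dst_node_type: str


def build_condensed_edge_to_node_types(
    condensed_edge_type_to_edge_type: Dict[int, EdgeType],
    node_type_to_condensed_node_type: Dict[str, int],
) -> Dict[int, Tuple[int, int]]:
    """Map each condensed edge type to its (src, dst) condensed node types."""
    return {
        condensed_edge: (
            node_type_to_condensed_node_type[edge.src_node_type],
            node_type_to_condensed_node_type[edge.dst_node_type],
        )
        for condensed_edge, edge in condensed_edge_type_to_edge_type.items()
    }


node_map = {"user": 0, "item": 1}
edge_map = {
    0: EdgeType("user", "clicks", "item"),
    1: EdgeType("user", "follows", "user"),
}
lookup = build_condensed_edge_to_node_types(edge_map, node_map)
```

Precomputing the pair once avoids two chained dictionary lookups per edge type at training time; here lookup is {0: (0, 1), 1: (0, 0)}.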

property condensed_edge_type_to_edge_type_map: dict[gigl.src.common.types.graph_data.CondensedEdgeType, gigl.src.common.types.graph_data.EdgeType]
Return type:

dict[gigl.src.common.types.graph_data.CondensedEdgeType, gigl.src.common.types.graph_data.EdgeType]

property condensed_edge_types: list[gigl.src.common.types.graph_data.CondensedEdgeType]
Return type:

list[gigl.src.common.types.graph_data.CondensedEdgeType]

property condensed_node_type_to_node_type_map: dict[gigl.src.common.types.graph_data.CondensedNodeType, gigl.src.common.types.graph_data.NodeType]
Return type:

dict[gigl.src.common.types.graph_data.CondensedNodeType, gigl.src.common.types.graph_data.NodeType]

property condensed_node_types: list[gigl.src.common.types.graph_data.CondensedNodeType]
Return type:

list[gigl.src.common.types.graph_data.CondensedNodeType]

property edge_type_to_condensed_edge_type_map: dict[gigl.src.common.types.graph_data.EdgeType, gigl.src.common.types.graph_data.CondensedEdgeType]
Return type:

dict[gigl.src.common.types.graph_data.EdgeType, gigl.src.common.types.graph_data.CondensedEdgeType]

property edge_types: list[gigl.src.common.types.graph_data.EdgeType]
Return type:

list[gigl.src.common.types.graph_data.EdgeType]

graph_metadata_pb: snapchat.research.gbml.graph_schema_pb2.GraphMetadata
property homogeneous_condensed_edge_type: gigl.src.common.types.graph_data.CondensedEdgeType

Returns the singular condensed edge type for a homogeneous graph. This property should only be accessed if the graph is known to be homogeneous.

Return type:

gigl.src.common.types.graph_data.CondensedEdgeType

property homogeneous_condensed_node_type: gigl.src.common.types.graph_data.CondensedNodeType

Returns the singular condensed node type for a homogeneous graph. This property should only be accessed if the graph is known to be homogeneous.

Return type:

gigl.src.common.types.graph_data.CondensedNodeType

property homogeneous_edge_type: gigl.src.common.types.graph_data.EdgeType

Returns the singular edge type for a homogeneous graph. This property should only be accessed if the graph is known to be homogeneous.

Return type:

gigl.src.common.types.graph_data.EdgeType

property homogeneous_node_type: gigl.src.common.types.graph_data.NodeType

Returns the singular node type for a homogeneous graph. This property should only be accessed if the graph is known to be homogeneous.

Return type:

gigl.src.common.types.graph_data.NodeType

property is_heterogeneous: bool
Return type:

bool
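The homogeneous_* properties above are only meaningful when is_heterogeneous is False. A minimal sketch of that contract follows. It assumes a graph counts as heterogeneous when it has more than one node type or more than one edge type, and it raises on misuse; SchemaView is hypothetical, and GiGL's GraphMetadataPbWrapper may define heterogeneity or handle misuse differently.

```python
from typing import List


class SchemaView:  # hypothetical stand-in, not GiGL's GraphMetadataPbWrapper
    def __init__(self, node_types: List[str], edge_types: List[str]) -> None:
        self.node_types = node_types
        self.edge_types = edge_types

    @property
    def is_heterogeneous(self) -> bool:
        # Assumption: more than one node type or edge type makes a graph heterogeneous.
        return len(self.node_types) > 1 or len(self.edge_types) > 1

    @property
    def homogeneous_node_type(self) -> str:
        # Guard the "only access on homogeneous graphs" contract explicitly.
        if self.is_heterogeneous:
            raise ValueError("graph is heterogeneous; there is no single node type")
        return self.node_types[0]


homo = SchemaView(["user"], ["user-follows-user"])
hetero = SchemaView(["user", "item"], ["user-clicks-item"])
```

Failing loudly, instead of silently returning the first type, is the safer design choice when a caller mistakes a heterogeneous graph for a homogeneous one.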

property node_type_to_condensed_node_type_map: dict[gigl.src.common.types.graph_data.NodeType, gigl.src.common.types.graph_data.CondensedNodeType]
Return type:

dict[gigl.src.common.types.graph_data.NodeType, gigl.src.common.types.graph_data.CondensedNodeType]

property node_types: list[gigl.src.common.types.graph_data.NodeType]
Return type:

list[gigl.src.common.types.graph_data.NodeType]

class gigl.experimental.knowledge_graph_embedding.lib.config.HeterogeneousGraphSparseEmbeddingConfig

Main configuration class for heterogeneous graph sparse embedding training.

This configuration orchestrates all aspects of knowledge graph embedding model training, including the graph data structure, model architecture, training parameters, and evaluation settings.

run

Runtime configuration specifying execution environment (GPU/CPU usage).

Type:

RunConfig

graph

Graph configuration containing metadata and data references for nodes and edges.

Type:

GraphConfig

model

Model architecture configuration including embedding dimensions and operators. Defaults to ModelConfig() with standard settings.

Type:

ModelConfig

training

Training configuration with optimization, sampling, and distributed settings. Defaults to TrainConfig() with standard settings.

Type:

TrainConfig

validation

Evaluation configuration for validation phase during training. Defaults to EvaluationPhaseConfig() with standard settings.

Type:

EvaluationPhaseConfig

testing

Evaluation configuration for final model testing phase. Defaults to EvaluationPhaseConfig() with standard settings.

Type:

EvaluationPhaseConfig

static from_omegaconf(config)

Create a HeterogeneousGraphSparseEmbeddingConfig object from an OmegaConf DictConfig.

Returns:

A HeterogeneousGraphSparseEmbeddingConfig object.

Parameters:

config (omegaconf.DictConfig) – The OmegaConf DictConfig object containing the configuration.

Return type:

HeterogeneousGraphSparseEmbeddingConfig
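Turning a nested config mapping into nested config dataclasses is the core of what a from_omegaconf-style factory has to do. OmegaConf itself supports this through structured configs (OmegaConf.structured plus OmegaConf.to_object); the stdlib-only sketch below shows the underlying idea without claiming it is how from_omegaconf is implemented. RunSection, ModelSection, TopLevelSection and from_nested_dict are hypothetical, heavily trimmed stand-ins.

```python
import dataclasses
from typing import Any, Dict, Type, TypeVar, get_type_hints

T = TypeVar("T")


# Hypothetical, heavily trimmed stand-ins for RunConfig / ModelConfig.
@dataclasses.dataclass
class RunSection:
    should_use_cuda: bool = True


@dataclasses.dataclass
class ModelSection:
    node_embedding_dim: int = 128


@dataclasses.dataclass
class TopLevelSection:
    run: RunSection = dataclasses.field(default_factory=RunSection)
    model: ModelSection = dataclasses.field(default_factory=ModelSection)


def from_nested_dict(cls: Type[T], data: Dict[str, Any]) -> T:
    """Build `cls` from a nested dict, recursing into dataclass-typed fields."""
    hints = get_type_hints(cls)  # resolves annotations to real types
    kwargs: Dict[str, Any] = {}
    for f in dataclasses.fields(cls):
        if f.name not in data:
            continue  # missing key: keep the dataclass default
        value = data[f.name]
        field_type = hints[f.name]
        if dataclasses.is_dataclass(field_type) and isinstance(value, dict):
            value = from_nested_dict(field_type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


cfg = from_nested_dict(TopLevelSection, {"model": {"node_embedding_dim": 64}})
```

Keys that are absent fall back to the defaults, which mirrors the "Defaults to ModelConfig() with standard settings" behavior documented for the attributes above.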

graph: graph.GraphConfig
model: model.ModelConfig
run: run.RunConfig
testing: evaluation.EvaluationPhaseConfig
training: training.TrainConfig
validation: evaluation.EvaluationPhaseConfig
class gigl.experimental.knowledge_graph_embedding.lib.config.Logger(logger=None, name=None, log_to_file=False, extra=None)

Bases: logging.LoggerAdapter

GiGL’s custom logger class used for local and cloud logging (VertexAI, Dataflow, etc.).

Initialize the adapter with a logger and a dict-like object which provides contextual information. This constructor signature allows easy stacking of LoggerAdapters, if so desired.

You can effectively pass keyword arguments as shown in the following example:

adapter = LoggerAdapter(someLogger, dict(p1=v1, p2="v2"))

Parameters:
  • logger (Optional[logging.Logger]) – A custom logger to use. If not provided, the default logger is created.

  • name (Optional[str]) – The name to use for the logger. Defaults to “root”.

  • log_to_file (bool) – If True, logs are written to a file. If False, logs are written to the console.

  • extra (Optional[dict[str, Any]]) – Extra information to add to each log message.

process(msg, kwargs)

Process the logging message and keyword arguments passed in to a logging call to insert contextual information. You can either manipulate the message itself, the keyword args or both. Return the message and kwargs modified (or not) to suit your needs.

Normally, you’ll only need to override this one method in a LoggerAdapter subclass for your specific needs.

Parameters:
  • msg (str)

  • kwargs (MutableMapping[str, Any])

Return type:

Any
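As the process() description says, overriding this one method is usually all a LoggerAdapter subclass needs. The sketch below is a hypothetical adapter, not GiGL's Logger: it folds the adapter's extra context into the message text, whereas the standard-library base implementation instead attaches it to the record via kwargs["extra"]. An in-memory handler captures the output so the effect is easy to inspect.

```python
import logging
from typing import Any, List, MutableMapping, Tuple


class ContextPrefixAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that prefixes every message with its `extra` context."""

    def process(
        self, msg: Any, kwargs: MutableMapping[str, Any]
    ) -> Tuple[Any, MutableMapping[str, Any]]:
        context = " ".join(f"{key}={value}" for key, value in sorted((self.extra or {}).items()))
        return f"[{context}] {msg}", kwargs


records: List[str] = []


class ListHandler(logging.Handler):
    """Collect formatted messages in memory instead of writing them anywhere."""

    def emit(self, record: logging.LogRecord) -> None:
        records.append(record.getMessage())


base_logger = logging.getLogger("kge_process_example")
base_logger.setLevel(logging.INFO)
base_logger.addHandler(ListHandler())
base_logger.propagate = False  # keep the example's output out of the root logger

log = ContextPrefixAdapter(base_logger, {"job": "train", "rank": 0})
log.info("step %d done", 5)  # %-style args are still applied after process()
```

After the call, records[-1] is "[job=train rank=0] step 5 done".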

class gigl.experimental.knowledge_graph_embedding.lib.config.ModelConfig

Configuration for knowledge graph embedding model architecture.

Defines the structure and behavior of the embedding model used for link prediction in heterogeneous knowledge graphs.

node_embedding_dim

Dimensionality of node embeddings. Higher dimensions can capture more complex relationships but require more memory and computation. Defaults to 128.

Type:

int

embedding_similarity_type

Type of similarity function used to compute scores between node embeddings. Options include cosine similarity, dot product, etc. Defaults to SimilarityType.COSINE.

Type:

SimilarityType
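The practical difference between the two similarity options mentioned above is whether embedding magnitude affects the score. This pure-Python sketch shows the standard definitions; it does not use GiGL's SimilarityType enum or its scoring code.

```python
import math
from typing import Sequence


def dot_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of L2 norms, so the result lies in [-1, 1].
    return dot_similarity(a, b) / (math.sqrt(dot_similarity(a, a)) * math.sqrt(dot_similarity(b, b)))


src = [1.0, 2.0]
dst = [2.0, 4.0]
scaled_dst = [20.0, 40.0]  # same direction as dst, 10x the norm
```

Scaling dst by 10 multiplies the dot-product score by 10 (10.0 becomes 100.0) but leaves the cosine score at 1.0. Cosine therefore scores direction only, while a dot product also rewards large-norm embeddings.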

src_operator

Transformation operator applied to source node embeddings before computing edge scores. Can be the identity (no transformation) or a learned operator. Defaults to OperatorType.IDENTITY.

Type:

OperatorType

dst_operator

Transformation operator applied to destination node embeddings before computing edge scores. Can be the identity (no transformation) or a learned operator. Defaults to OperatorType.IDENTITY.

Type:

OperatorType
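The src_operator / dst_operator pipeline can be pictured as "transform each endpoint, then compare". In this sketch the identity corresponds to OperatorType.IDENTITY. The diagonal operator is a hypothetical example of a learned operator that scales each embedding dimension by a per-relation weight. Combined with a dot product, it yields DistMult-style scoring. GiGL's actual OperatorType members are not listed here, so make_diagonal_operator should not be read as one of them.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Operator = Callable[[Sequence[float]], Vector]


def identity(v: Sequence[float]) -> Vector:
    return list(v)  # no transformation


def make_diagonal_operator(weights: Sequence[float]) -> Operator:
    """Hypothetical learned operator: elementwise scaling by a per-relation weight vector."""

    def apply(v: Sequence[float]) -> Vector:
        return [w * x for w, x in zip(weights, v)]

    return apply


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def edge_score(src: Sequence[float], dst: Sequence[float], src_op: Operator, dst_op: Operator) -> float:
    # Transform each endpoint first, then compare with the similarity (dot product here).
    return dot(src_op(src), dst_op(dst))


src = [1.0, 2.0]
dst = [3.0, 1.0]
plain = edge_score(src, dst, identity, identity)
relation_aware = edge_score(src, dst, make_diagonal_operator([2.0, 0.5]), identity)
```

With identity on both sides the score is just the similarity (5.0). The diagonal operator reweights dimensions per relation, so the same node pair scores 7.0 under this relation. This is how one pair of node embeddings can serve several edge types.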

training_sampling

Sampling configuration used during training phase. Populated at runtime from training config. Defaults to None.

Type:

Optional[SamplingConfig]

validation_sampling

Sampling configuration used during validation phase. Populated at runtime from validation config. Defaults to None.

Type:

Optional[SamplingConfig]

testing_sampling

Sampling configuration used during testing phase. Populated at runtime from testing config. Defaults to None.

Type:

Optional[SamplingConfig]

num_edge_types

Number of distinct edge types in the knowledge graph. Populated at runtime from graph metadata. Defaults to None.

Type:

Optional[int]

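embeddings_config

TorchRec embedding configuration for the sparse embeddings: a list of torchrec.EmbeddingBagConfig entries, each describing one embedding table (for example, its name, number of rows, embedding dimension, and the features that look it up). Populated at runtime. Defaults to None.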

Type:

Optional[List[torchrec.EmbeddingBagConfig]]

dst_operator: gigl.experimental.knowledge_graph_embedding.lib.model.types.OperatorType
embedding_similarity_type: gigl.experimental.knowledge_graph_embedding.lib.model.types.SimilarityType
embeddings_config: List[torchrec.EmbeddingBagConfig] | None = None
node_embedding_dim: int = 128
num_edge_types: int | None = None
src_operator: gigl.experimental.knowledge_graph_embedding.lib.model.types.OperatorType
testing_sampling: gigl.experimental.knowledge_graph_embedding.lib.config.sampling.SamplingConfig | None = None
training_sampling: gigl.experimental.knowledge_graph_embedding.lib.config.sampling.SamplingConfig | None = None
validation_sampling: gigl.experimental.knowledge_graph_embedding.lib.config.sampling.SamplingConfig | None = None
class gigl.experimental.knowledge_graph_embedding.lib.config.RawGraphData

Configuration for raw graph data references from BigQuery sources.

Raw graph data refers to the original, unprocessed node and edge data stored in BigQuery tables before enumeration and preprocessing for model training.

node_data

List of BigQuery data references for node data tables. Each reference specifies the location and schema of node information.

Type:

List[BigqueryNodeDataReference]

edge_data

List of BigQuery data references for edge data tables. Each reference specifies the location and schema of relationship information.

Type:

List[BigqueryEdgeDataReference]

edge_data: List[gigl.src.data_preprocessor.lib.ingest.bigquery.BigqueryEdgeDataReference]
node_data: List[gigl.src.data_preprocessor.lib.ingest.bigquery.BigqueryNodeDataReference]
class gigl.experimental.knowledge_graph_embedding.lib.config.RunConfig

Configuration for runtime execution environment settings.

Controls the basic execution environment for knowledge graph embedding training, particularly hardware acceleration preferences.

should_use_cuda

Whether to use CUDA (GPU) acceleration for training. If True, training will use available GPUs for faster computation. If False, training will run on CPU only. Automatically adjusted based on GPU availability during initialization. Defaults to True.

Type:

bool
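The "automatically adjusted based on GPU availability during initialization" behavior fits naturally in a dataclass __post_init__ hook. This is a sketch under that assumption, not GiGL's RunConfig. RunConfigSketch and cuda_available are hypothetical, and the availability check imports PyTorch lazily so the example also runs where torch is not installed.

```python
import dataclasses


def cuda_available() -> bool:
    # Real code would call torch.cuda.is_available(); fall back to False without PyTorch.
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


@dataclasses.dataclass
class RunConfigSketch:  # hypothetical; not GiGL's RunConfig
    should_use_cuda: bool = True

    def __post_init__(self) -> None:
        # Downgrade a CUDA request to CPU when no GPU can actually be used.
        if self.should_use_cuda and not cuda_available():
            self.should_use_cuda = False
```

With this design, a config written for GPU training still runs on a CPU-only machine, and an explicit should_use_cuda=False is never upgraded.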

should_use_cuda: bool = True
class gigl.experimental.knowledge_graph_embedding.lib.config.TrainConfig

Main training configuration that orchestrates all training-related settings.

This configuration combines optimization, data loading, distributed training, checkpointing, and monitoring settings for knowledge graph embedding training.

max_steps

Maximum number of training steps to perform. If None, training continues until early stopping or manual interruption. Defaults to None.

Type:

Optional[int]

early_stopping

Configuration for early stopping based on validation metrics. Defaults to EarlyStoppingConfig() with no patience limit.

Type:

EarlyStoppingConfig
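The interaction between max_steps and early stopping can be simulated in a few lines. This sketch makes several assumptions. Validation runs after every step. Higher metric values are better. The early-stopping setting is a patience count, with None meaning no limit; "patience" is assumed from the "no patience limit" wording rather than taken from EarlyStoppingConfig's actual fields. The finite metric list stands in for a run that would otherwise end by manual interruption.

```python
from typing import Iterable, Optional


def training_stop_step(
    val_metrics: Iterable[float],  # one validation metric per step (assumed), higher is better
    max_steps: Optional[int],
    patience: Optional[int],
) -> int:
    """Return the step at which a loop following these hypothetical rules would stop."""
    best = float("-inf")
    evals_without_improvement = 0
    step = 0
    for step, metric in enumerate(val_metrics, start=1):
        if metric > best:
            best = metric
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
        if patience is not None and evals_without_improvement >= patience:
            break  # early stopping: no improvement for `patience` evaluations
        if max_steps is not None and step >= max_steps:
            break  # hard cap on training steps
    return step


metrics = [0.1, 0.3, 0.2, 0.25, 0.28]
```

With these metrics, patience=2 stops at step 4 (two evaluations without beating 0.3) and max_steps=3 stops at step 3. With both set to None, the loop only ends when the metrics run out, at step 5.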

dataloader

Configuration for data loading (number of workers, memory pinning). Defaults to DataloaderConfig() with standard settings.

Type:

DataloaderConfig

sampling

Configuration for negative sampling strategy during training. Defaults to SamplingConfig() with standard settings.

Type:

SamplingConfig

optimizer

Configuration for separate sparse and dense optimizers. Defaults to OptimizerConfig() with standard settings.

Type:

OptimizerConfig

distributed

Configuration for multi-GPU/multi-process training. Defaults to DistributedConfig() with auto-detected GPU count.

Type:

DistributedConfig

checkpointing

Configuration for saving and loading model checkpoints. Defaults to CheckpointingConfig() with standard settings.

Type:

CheckpointingConfig

logging

Configuration for training progress logging frequency. Defaults to LoggingConfig(), which logs every step.

Type:

LoggingConfig

checkpointing: CheckpointingConfig
dataloader: gigl.experimental.knowledge_graph_embedding.lib.config.dataloader.DataloaderConfig
distributed: DistributedConfig
early_stopping: EarlyStoppingConfig
logging: LoggingConfig
max_steps: int | None = None
optimizer: OptimizerConfig
sampling: gigl.experimental.knowledge_graph_embedding.lib.config.sampling.SamplingConfig
gigl.experimental.knowledge_graph_embedding.lib.config.logger