gigl.experimental.knowledge_graph_embedding.common.torchrec.utils#

Attributes#

logger

Functions#

apply_dense_optimizer(model, optimizer_cls[, ...])

Creates an optimizer for the dense parts of the model.

apply_sparse_optimizer(parameters[, optimizer_cls, ...])

Apply a sparse optimizer to the sparse/EBC parts of a model.

get_sharding_plan(model, batch_size, local_world_size, ...)

Create a sharding plan for the model using the EmbeddingShardingPlanner.

maybe_shard_model(model, device[, sharding_plan])

If in a distributed environment, apply DistributedModelParallel to the model; otherwise, return the model directly.

Module Contents#

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_dense_optimizer(model, optimizer_cls, optimizer_kwargs=dict())[source]#

Creates an optimizer for the dense parts of the model, wrapping it in a KeyedOptimizerWrapper.

Parameters:
  • model (nn.Module) – The model containing dense parameters.

  • optimizer_cls (Type[Optimizer]) – The optimizer class to use for dense parameters.

  • optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer.

Returns:

A wrapped optimizer for dense parameters, or None if no dense parameters are found.

Return type:

Optional[KeyedOptimizerWrapper]
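
Example (a minimal sketch, not from the library's docs): the toy model and Adam hyperparameters below are illustrative assumptions; only the apply_dense_optimizer call reflects the documented signature.

import torch

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_dense_optimizer,
)

# Illustrative stand-in for the model's dense (non-embedding) submodules.
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())

dense_optimizer = apply_dense_optimizer(
    model,
    optimizer_cls=torch.optim.Adam,
    optimizer_kwargs={"lr": 1e-3},
)

if dense_optimizer is not None:
    # The KeyedOptimizerWrapper exposes the usual zero_grad()/step() interface.
    dense_optimizer.zero_grad()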

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_sparse_optimizer(parameters, optimizer_cls=None, optimizer_kwargs=dict())[source]#

Apply a sparse optimizer to the sparse/EBC parts of a model. This optimizer is fused, so it will be applied directly in the backward pass.

This should only be used for sparse parameters.

Parameters:
  • parameters (Iterable[nn.Parameter]) – The sparse parameters to apply the optimizer to.

  • optimizer_cls (Type[Optimizer], optional) – The optimizer class to use. Defaults to RowWiseAdagrad.

  • optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer.

Return type:

None
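
Example (a hedged sketch): the EmbeddingBagCollection below is illustrative; in practice the parameters of the model's own EBC are passed. In the usual TorchRec flow, a fused optimizer is applied before the model is sharded.

import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_sparse_optimizer,
)

# Illustrative EBC; in practice this is the embedding collection inside the model.
ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="entity_embeddings",
            embedding_dim=64,
            num_embeddings=1_000_000,
            feature_names=["entity_id"],
        )
    ],
    device=torch.device("meta"),
)

# Fuses the (default RowWiseAdagrad) optimizer update into the backward pass
# of the sparse embedding parameters.
apply_sparse_optimizer(
    parameters=ebc.parameters(),
    optimizer_kwargs={"lr": 0.01},
)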

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.get_sharding_plan(model, batch_size, local_world_size, world_size, use_cuda=False, storage_reservation_percentage=0.15, qcomm_forward_precision=CommType.FP32, qcomm_backward_precision=CommType.FP32)[source]#

Create a sharding plan for the model using the EmbeddingShardingPlanner.

Parameters:
  • model (torch.nn.Module) – The model to be sharded.

  • batch_size (int) – The batch size for the sharding plan.

  • local_world_size (int) – The number of processes on each node.

  • world_size (int) – The total number of processes across all nodes.

  • use_cuda (bool) – Whether to use CUDA for the sharding plan.

  • storage_reservation_percentage (float) – The percentage of storage to reserve.

  • qcomm_forward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for forward communication (e.g. FP32, FP16).

  • qcomm_backward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for backward communication (e.g. FP32, FP16).

Returns:

A ShardingPlan object representing the sharding plan for the model.

Return type:

torchrec.distributed.types.ShardingPlan
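
Example (a hedged sketch): assumes torch.distributed is already initialized (e.g. via torchrun) and that model is the unsharded module holding the embedding tables; the batch size and precision choices are illustrative.

import os

import torch
import torch.distributed as dist
from torchrec.distributed.fbgemm_qcomm_codec import CommType

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    get_sharding_plan,
)

world_size = dist.get_world_size()
local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", world_size))

plan = get_sharding_plan(
    model=model,  # unsharded model containing the embedding tables
    batch_size=512,
    local_world_size=local_world_size,
    world_size=world_size,
    use_cuda=torch.cuda.is_available(),
    storage_reservation_percentage=0.15,
    # Quantized comms trade precision for all-to-all bandwidth.
    qcomm_forward_precision=CommType.FP16,
    qcomm_backward_precision=CommType.BF16,
)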

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.maybe_shard_model(model, device, sharding_plan=None)[source]#

If in a distributed environment, apply DistributedModelParallel to the model, using an optionally specified ShardingPlan. If not in a distributed environment, return the model directly.

Parameters:
  • model – The model to be wrapped.

  • device (torch.device) – The device to use for the model.

  • sharding_plan (torchrec.distributed.types.ShardingPlan) – An optional ShardingPlan to use for DistributedModelParallel.

Returns:

The model wrapped in DistributedModelParallel if in a distributed environment, otherwise the model itself.
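
Example (a hedged sketch continuing the one above): model and plan are assumed from the get_sharding_plan example; outside a distributed environment the model is returned unchanged.

import torch

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    maybe_shard_model,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wraps the model in DistributedModelParallel when torch.distributed is
# initialized, applying `plan` if provided; otherwise returns the model as-is.
sharded_model = maybe_shard_model(model, device=device, sharding_plan=plan)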

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.logger[source]#