gigl.experimental.knowledge_graph_embedding.common.torchrec.utils#
Attributes#
Functions#
- apply_dense_optimizer – Creates an optimizer for the dense parts of the model.
- apply_sparse_optimizer – Apply a sparse optimizer to the sparse/EBC parts of a model.
- get_sharding_plan – Create a sharding plan for the model using the EmbeddingShardingPlanner.
- maybe_shard_model – If in a distributed environment, apply DistributedModelParallel to the model; otherwise return the model unchanged.
Module Contents#
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_dense_optimizer(model, optimizer_cls, optimizer_kwargs=dict())[source]#
Creates an optimizer for the dense parts of the model, wrapping it in a KeyedOptimizerWrapper.
- Parameters:
model (nn.Module) – The model containing dense parameters.
optimizer_cls (Type[Optimizer]) – The optimizer class to use for dense parameters.
optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer.
- Returns:
A wrapped optimizer for dense parameters, or None if no dense parameters are found.
- Return type:
Optional[KeyedOptimizerWrapper]
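Example (a minimal sketch; the toy nn.Linear model, Adam optimizer, and learning rate are illustrative assumptions, not part of this API):

```python
import torch
from torch import nn

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_dense_optimizer,
)

# Toy model containing only dense parameters (illustrative assumption).
model = nn.Linear(16, 8)

# Wrap the model's dense parameters in a KeyedOptimizerWrapper-backed Adam.
dense_optimizer = apply_dense_optimizer(
    model,
    optimizer_cls=torch.optim.Adam,
    optimizer_kwargs={"lr": 1e-3},
)

if dense_optimizer is not None:
    # The wrapped optimizer is used like any other torch optimizer.
    loss = model(torch.randn(4, 16)).sum()
    loss.backward()
    dense_optimizer.step()
    dense_optimizer.zero_grad()
```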
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_sparse_optimizer(parameters, optimizer_cls=None, optimizer_kwargs=dict())[source]#
Apply a sparse optimizer to the sparse/EBC parts of a model. The optimizer is fused, so updates are applied directly during the backward pass; it should only be used for sparse parameters.
- Parameters:
parameters (Iterable[nn.Parameter]) – The sparse parameters to apply the optimizer to.
optimizer_cls (Type[Optimizer], optional) – The optimizer class to use. Defaults to RowWiseAdagrad.
optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer.
- Return type:
None
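Example (a minimal sketch; the embedding-table config and learning rate are illustrative assumptions, and RowWiseAdagrad is the documented default optimizer class):

```python
import torch
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection
from torchrec.optim.rowwise_adagrad import RowWiseAdagrad

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_sparse_optimizer,
)

# Illustrative embedding table; real configs come from the model definition.
ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="entity",
            embedding_dim=64,
            num_embeddings=10_000,
            feature_names=["entity_id"],
        )
    ],
    device=torch.device("meta"),
)

# Fuse a row-wise Adagrad update into the backward pass of the sparse parameters.
apply_sparse_optimizer(
    parameters=ebc.parameters(),
    optimizer_cls=RowWiseAdagrad,
    optimizer_kwargs={"lr": 0.02},
)
```

Because the optimizer is fused into the backward pass, these parameters need no separate step() call; this is typically applied before the model is wrapped or sharded.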
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.get_sharding_plan(model, batch_size, local_world_size, world_size, use_cuda=False, storage_reservation_percentage=0.15, qcomm_forward_precision=CommType.FP32, qcomm_backward_precision=CommType.FP32)[source]#
Create a sharding plan for the model using the EmbeddingShardingPlanner.
- Returns:
A ShardingPlan object representing the sharding plan for the model.
- Parameters:
model (torch.nn.Module) – The model to be sharded.
batch_size (int) – The batch size for the sharding plan.
local_world_size (int) – The number of ranks on each host.
world_size (int) – The total number of ranks.
use_cuda (bool) – Whether to use CUDA for the sharding plan.
storage_reservation_percentage (float) – The percentage of storage to reserve.
qcomm_forward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for forward-pass communication (e.g. FP32, FP16).
qcomm_backward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for backward-pass communication (e.g. FP32, FP16).
- Return type:
torchrec.distributed.types.ShardingPlan
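Example (a hedged sketch; the torchrun environment variables, batch size, FP16 comm precision, and the build_plan helper are assumptions, not part of this API):

```python
import os

import torch
from torch import nn
from torchrec.distributed.fbgemm_qcomm_codec import CommType
from torchrec.distributed.types import ShardingPlan

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    get_sharding_plan,
)


def build_plan(model: nn.Module, batch_size: int = 512) -> ShardingPlan:
    # World-size information as provided by torchrun (assumed launch method).
    local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))

    return get_sharding_plan(
        model=model,
        batch_size=batch_size,
        local_world_size=local_world_size,
        world_size=world_size,
        use_cuda=torch.cuda.is_available(),
        storage_reservation_percentage=0.15,
        # Quantized comms trade precision for bandwidth; FP32 is the default.
        qcomm_forward_precision=CommType.FP16,
        qcomm_backward_precision=CommType.FP16,
    )
```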
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.maybe_shard_model(model, device, sharding_plan=None)[source]#
If running in a distributed environment, wrap the model in DistributedModelParallel, using an optionally specified ShardingPlan; otherwise, return the model unchanged.
- Returns:
The model wrapped in DistributedModelParallel if in a distributed environment, otherwise the model itself.
- Parameters:
model – The model to be wrapped.
device (torch.device) – The device to use for the model.
sharding_plan (torchrec.distributed.types.ShardingPlan) – An optional ShardingPlan to use for DistributedModelParallel.
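Example (a minimal sketch; the shard_for_training helper and the device-selection logic are assumptions):

```python
from typing import Optional

import torch
from torch import nn
from torchrec.distributed.types import ShardingPlan

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    maybe_shard_model,
)


def shard_for_training(model: nn.Module, plan: Optional[ShardingPlan] = None) -> nn.Module:
    # Pick the local accelerator if available, otherwise fall back to CPU.
    device = (
        torch.device("cuda", torch.cuda.current_device())
        if torch.cuda.is_available()
        else torch.device("cpu")
    )
    # Under torch.distributed this wraps the model in DistributedModelParallel;
    # in a single-process run it returns the model unchanged.
    return maybe_shard_model(model, device=device, sharding_plan=plan)
```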