gigl.experimental.knowledge_graph_embedding.common.torchrec.utils#

Attributes#

logger

Functions#

apply_dense_optimizer(model, optimizer_cls[, ...])

Creates an optimizer for the dense parts of the model.

apply_sparse_optimizer(parameters[, optimizer_cls, ...])

Apply a sparse optimizer to the sparse/EBC parts of a model.

get_sharding_plan(model, batch_size, local_world_size, ...)

Create a sharding plan for the model using the EmbeddingShardingPlanner.

maybe_shard_model(model, device[, sharding_plan])

If in a distributed environment, apply DistributedModelParallel to the model; otherwise, return the model directly.

Module Contents#

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_dense_optimizer(model, optimizer_cls, optimizer_kwargs=dict())[source]#

Creates an optimizer for the dense parts of the model, wrapping it in a KeyedOptimizerWrapper.

Parameters:
  • model (nn.Module) – The model containing dense parameters.

  • optimizer_cls (Type[Optimizer]) – The optimizer class to use for dense parameters.

  • optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer.

Returns:

A wrapped optimizer for dense parameters, or None if no dense parameters are found.

Return type:

Optional[KeyedOptimizerWrapper]
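
Example (a minimal sketch, not from the library's docs): the toy model and Adam hyperparameters below are illustrative assumptions; only the apply_dense_optimizer call reflects the documented signature.

import torch

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_dense_optimizer,
)

# Illustrative stand-in for the model's dense (non-embedding) submodules.
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())

dense_optimizer = apply_dense_optimizer(
    model,
    optimizer_cls=torch.optim.Adam,
    optimizer_kwargs={"lr": 1e-3},
)

if dense_optimizer is not None:
    # The KeyedOptimizerWrapper exposes the usual zero_grad()/step() interface.
    dense_optimizer.zero_grad()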

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_sparse_optimizer(parameters, optimizer_cls=None, optimizer_kwargs=dict())[source]#

Apply a sparse optimizer to the sparse/EBC parts of a model. This optimizer is fused, so it will be applied directly in the backward pass.

This should only be used for sparse parameters.

Parameters:
  • parameters (Iterable[nn.Parameter]) – The sparse parameters to apply the optimizer to.

  • optimizer_cls (Type[Optimizer], optional) – The optimizer class to use. Defaults to RowWiseAdagrad.

  • optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer.

Return type:

None
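
Example (a hedged sketch): the EmbeddingBagCollection below is illustrative; in practice the parameters of the model's own EBC are passed. In the usual TorchRec flow, a fused optimizer is applied before the model is sharded.

import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_sparse_optimizer,
)

# Illustrative EBC; in practice this is the embedding collection inside the model.
ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="entity_embeddings",
            embedding_dim=64,
            num_embeddings=1_000_000,
            feature_names=["entity_id"],
        )
    ],
    device=torch.device("meta"),
)

# Fuses the (default RowWiseAdagrad) optimizer update into the backward pass
# of the sparse embedding parameters.
apply_sparse_optimizer(
    parameters=ebc.parameters(),
    optimizer_kwargs={"lr": 0.01},
)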

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.get_sharding_plan(model, batch_size, local_world_size, world_size, use_cuda=False, storage_reservation_percentage=0.15, qcomm_forward_precision=CommType.FP32, qcomm_backward_precision=CommType.FP32)[source]#

Create a sharding plan for the model using the EmbeddingShardingPlanner.

Parameters:
  • model (torch.nn.Module) – The model to be sharded.

  • batch_size (int) – The batch size for the sharding plan.

  • local_world_size (int) – The number of processes on each node.

  • world_size (int) – The total number of processes across all nodes.

  • use_cuda (bool) – Whether to use CUDA for the sharding plan.

  • storage_reservation_percentage (float) – The percentage of storage to reserve.

  • qcomm_forward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for forward communication (e.g. FP32, FP16).

  • qcomm_backward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for backward communication (e.g. FP32, FP16).

Returns:

A ShardingPlan object representing the sharding plan for the model.

Return type:

torchrec.distributed.types.ShardingPlan
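
Example (a hedged sketch): assumes torch.distributed is already initialized (e.g. via torchrun) and that model is the unsharded module holding the embedding tables; the batch size and precision choices are illustrative.

import os

import torch
import torch.distributed as dist
from torchrec.distributed.fbgemm_qcomm_codec import CommType

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    get_sharding_plan,
)

world_size = dist.get_world_size()
local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", world_size))

plan = get_sharding_plan(
    model=model,  # unsharded model containing the embedding tables
    batch_size=512,
    local_world_size=local_world_size,
    world_size=world_size,
    use_cuda=torch.cuda.is_available(),
    storage_reservation_percentage=0.15,
    # Quantized comms trade precision for all-to-all bandwidth.
    qcomm_forward_precision=CommType.FP16,
    qcomm_backward_precision=CommType.BF16,
)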

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.maybe_shard_model(model, device, sharding_plan=None)[source]#

If in a distributed environment, apply DistributedModelParallel to the model, using an optionally specified ShardingPlan. If not in a distributed environment, return the model directly.

Parameters:
  • model – The model to be wrapped.

  • device (torch.device) – The device to use for the model.

  • sharding_plan (torchrec.distributed.types.ShardingPlan) – An optional ShardingPlan to use for DistributedModelParallel.

Returns:

The model wrapped in DistributedModelParallel if in a distributed environment, otherwise the model itself.
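
Example (a hedged sketch continuing the one above): model and plan are assumed from the get_sharding_plan example; outside a distributed environment the model is returned unchanged.

import torch

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    maybe_shard_model,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wraps the model in DistributedModelParallel when torch.distributed is
# initialized, applying `plan` if provided; otherwise returns the model as-is.
sharded_model = maybe_shard_model(model, device=device, sharding_plan=plan)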

gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.logger[source]#