gigl.experimental.knowledge_graph_embedding.common.torchrec.utils#
Functions#
| Function | Description |
| --- | --- |
| apply_dense_optimizer | Create an optimizer for the dense parts of the model. |
| apply_sparse_optimizer | Apply a fused sparse optimizer to the sparse/EBC parts of a model. |
| get_sharding_plan | Create a sharding plan for the model using the EmbeddingShardingPlanner. |
| maybe_shard_model | If in a distributed environment, apply DistributedModelParallel to the model, using an optionally specified ShardingPlan. |
Module Contents#
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_dense_optimizer(model, optimizer_cls, optimizer_kwargs=dict())[source]#
- Creates an optimizer for the dense parts of the model, wrapping it in a KeyedOptimizerWrapper. See the usage sketch below this entry.
- Parameters:
- model (nn.Module) – The model containing dense parameters. 
- optimizer_cls (Type[Optimizer]) – The optimizer class to use for dense parameters. 
- optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer. 
 
- Returns:
- A wrapped optimizer for dense parameters, or None if no dense parameters are found. 
 
- Return type:
- Optional[KeyedOptimizerWrapper] 
 
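A minimal usage sketch for apply_dense_optimizer, assuming only the signature documented above; the torch.nn.Linear module is a stand-in for the dense portion of a real model:

```python
import torch

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_dense_optimizer,
)

# Stand-in for the dense portion of a real model.
model = torch.nn.Linear(16, 8)

# Wrap the dense parameters in an Adam optimizer (via KeyedOptimizerWrapper).
dense_optimizer = apply_dense_optimizer(
    model=model,
    optimizer_cls=torch.optim.Adam,
    optimizer_kwargs={"lr": 1e-3},
)

if dense_optimizer is not None:  # None if the model has no dense parameters
    dense_optimizer.zero_grad()
```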
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.apply_sparse_optimizer(parameters, optimizer_cls=None, optimizer_kwargs=dict())[source]#
- Applies a fused sparse optimizer to the sparse/EmbeddingBagCollection (EBC) parts of a model. Because the optimizer is fused, updates are applied directly in the backward pass. Use this only for sparse parameters. See the usage sketch below this entry.
- Parameters:
- parameters (Iterable[nn.Parameter]) – The sparse parameters to apply the optimizer to. 
- optimizer_cls (Type[Optimizer], optional) – The optimizer class to use. Defaults to RowWiseAdagrad. 
- optimizer_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the optimizer. 
 
- Return type:
- None 
 
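A hedged sketch for apply_sparse_optimizer: the EmbeddingBagCollection and its table config are illustrative only, and the default RowWiseAdagrad optimizer is used since optimizer_cls is omitted:

```python
import torch
import torchrec

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    apply_sparse_optimizer,
)

# Illustrative sparse module: one embedding table inside an EmbeddingBagCollection.
ebc = torchrec.EmbeddingBagCollection(
    tables=[
        torchrec.EmbeddingBagConfig(
            name="entity_table",
            embedding_dim=64,
            num_embeddings=100_000,
            feature_names=["entity_id"],
        )
    ],
    device=torch.device("meta"),
)

# Fuse the (default RowWiseAdagrad) optimizer into the sparse parameters so
# updates run directly in the backward pass.
apply_sparse_optimizer(
    parameters=ebc.parameters(),
    optimizer_kwargs={"lr": 0.05},
)
```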
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.get_sharding_plan(model, batch_size, local_world_size, world_size, use_cuda=False, storage_reservation_percentage=0.15, qcomm_forward_precision=CommType.FP32, qcomm_backward_precision=CommType.FP32)[source]#
- Create a sharding plan for the model using the EmbeddingShardingPlanner. See the usage sketch below this entry.
- Parameters:
- model (torch.nn.Module) – The model to be sharded. 
- batch_size (int) – The batch size for the sharding plan. 
- local_world_size (int) – The number of ranks on the local node. 
- world_size (int) – The total number of ranks across all nodes. 
- use_cuda (bool) – Whether to use CUDA for the sharding plan. 
- storage_reservation_percentage (float) – The percentage of storage to reserve. 
- qcomm_forward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for forward communication (e.g. FP32, FP16). 
- qcomm_backward_precision (torchrec.distributed.fbgemm_qcomm_codec.CommType) – The precision for backward communication (e.g. FP32, FP16). 

- Returns:
- A ShardingPlan object representing the sharding plan for the model. 
 
- Return type:
- torchrec.distributed.types.ShardingPlan 
 
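A hedged sketch for get_sharding_plan; the embedding table, batch size, and world sizes are placeholders, and the qcomm precisions are left at their FP32 defaults:

```python
import torch
import torchrec

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    get_sharding_plan,
)

# Illustrative model containing a TorchRec embedding module to be sharded.
model = torchrec.EmbeddingBagCollection(
    tables=[
        torchrec.EmbeddingBagConfig(
            name="entity_table",
            embedding_dim=64,
            num_embeddings=100_000,
            feature_names=["entity_id"],
        )
    ],
    device=torch.device("meta"),
)

plan = get_sharding_plan(
    model=model,
    batch_size=512,       # placeholder batch size
    local_world_size=8,   # placeholder: ranks per node
    world_size=16,        # placeholder: total ranks
    use_cuda=True,
    storage_reservation_percentage=0.15,
)
```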
- gigl.experimental.knowledge_graph_embedding.common.torchrec.utils.maybe_shard_model(model, device, sharding_plan=None)[source]#
- If running in a distributed environment, wrap the model in DistributedModelParallel, using an optionally specified ShardingPlan; otherwise, return the model unchanged. See the usage sketch below this entry.
- Parameters:
- model (torch.nn.Module) – The model to be wrapped. 
- device (torch.device) – The device to use for the model. 
- sharding_plan (torchrec.distributed.types.ShardingPlan) – An optional ShardingPlan to use for DistributedModelParallel. 

- Returns:
- The model wrapped in DistributedModelParallel if in a distributed environment, otherwise the model itself. 
 
 
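Continuing the sketch above, a hedged example for maybe_shard_model; model and plan are assumed to be the objects built in the previous snippet, and wrapping only occurs when running in a distributed environment:

```python
import torch

from gigl.experimental.knowledge_graph_embedding.common.torchrec.utils import (
    maybe_shard_model,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wraps the model in DistributedModelParallel when running distributed;
# otherwise returns the model unchanged.
sharded_model = maybe_shard_model(
    model=model,         # assumed: from the previous sketch
    device=device,
    sharding_plan=plan,  # optional; may be omitted to let TorchRec plan
)
```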
