gigl.src.common.models.layers.feature_interaction
Classes
| Class | Description |
| --- | --- |
| DCNCross | Derived from tensorflow_recommenders [implementation](https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/dcn/Cross). |
| DCNv2 | Wraps around DCNCross for multi-layer feature crossing. See documentation for DCNCross for more details. |
Module Contents
- class gigl.src.common.models.layers.feature_interaction.DCNCross(in_dim, projection_dim=None, diag_scale=0.0, use_bias=True, **kwargs)
Bases:
torch.nn.Module
Derived from the tensorflow_recommenders [implementation](https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/dcn/Cross). Cross layer in the Deep & Cross Network that learns explicit feature interactions.
A layer that efficiently creates explicit, bounded-degree feature interactions. The forward method takes two tensors. The first input, x0, is the base layer that contains the original features (usually the embedding layer); the second input, xi, is the output of the previous DCNCross layer in the stack, i.e., the i-th DCNCross layer. For the first DCNCross layer in the stack, xi = x0.
The output is x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi, where .* denotes elementwise multiplication, W is either a full-rank matrix or a low-rank matrix U*V that reduces the computational cost, and diag_scale increases the diagonal of W to improve training stability (especially in the low-rank case).
References:
- [R. Wang et al.](https://arxiv.org/pdf/2008.13535.pdf) See Eq. (1) for the full-rank version and Eq. (2) for the low-rank version.
- [R. Wang et al.](https://arxiv.org/pdf/1708.05123.pdf)
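To make the update rule concrete, here is a small worked example in plain Python for a single 2-dimensional input. The helper name `cross_step` and the numeric values are illustrative assumptions, not part of GiGL; the sketch only follows the full-rank formula stated above.

```python
def cross_step(x0, xi, W, bias, diag_scale=0.0):
    """Return x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi for one example."""
    d = len(x0)
    out = []
    for r in range(d):
        # Row r of the matrix-vector product W * xi.
        projected = sum(W[r][c] * xi[c] for c in range(d))
        # Add bias and the diagonal boost, gate elementwise by x0, then add the residual xi.
        out.append(x0[r] * (projected + bias[r] + diag_scale * xi[r]) + xi[r])
    return out


# Illustrative values (assumed for this example only).
x0 = [1.0, 2.0]
W = [[0.5, 0.0], [1.0, -1.0]]
bias = [0.0, 1.0]

# First layer in a stack: xi = x0.
x1 = cross_step(x0, x0, W, bias)
print(x1)  # [1.5, 2.0]

# Same input with diag_scale = 0.5, which acts like W + 0.5 * I.
x1_scaled = cross_step(x0, x0, W, bias, diag_scale=0.5)
print(x1_scaled)  # [2.0, 4.0]
```

Setting diag_scale to 0.5 gives the same result as adding 0.5 to each diagonal entry of W, which matches the W + diag_scale * I description of the parameter.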
- Parameters:
in_dim (int) – The input feature dimension.
projection_dim (Optional[int]) – Projection dimension used to reduce the computational cost. Defaults to None, in which case a full (in_dim by in_dim) matrix W is used. If set, a low-rank matrix W = U*V is used, where U is of size in_dim by projection_dim and V is of size projection_dim by in_dim. projection_dim must be smaller than in_dim/2 to improve model efficiency, since otherwise U and V together hold at least as many weights as the full matrix. In practice, we’ve observed that projection_dim = in_dim/4 consistently preserved the accuracy of a full-rank version.
diag_scale (float) – A non-negative float used to increase the diagonal of the kernel W by diag_scale, that is, W + diag_scale * I, where I is an identity matrix.
use_bias (bool) – Whether to add a bias term for this layer. If set to False, no bias term will be used.
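The in_dim/2 threshold for projection_dim follows from counting kernel weights. The sketch below does that arithmetic; `cross_weight_count` is a hypothetical helper written for this illustration, and it counts only the entries of W (or of U and V), ignoring the bias.

```python
def cross_weight_count(in_dim, projection_dim=None):
    """Count kernel weights for one cross layer, excluding the bias."""
    if projection_dim is None:
        # Full-rank W is in_dim by in_dim.
        return in_dim * in_dim
    # Low-rank W = U * V: U is in_dim by projection_dim, V is projection_dim by in_dim.
    return 2 * in_dim * projection_dim


in_dim = 128
full_rank = cross_weight_count(in_dim)                # 128 * 128 = 16384
quarter = cross_weight_count(in_dim, in_dim // 4)     # 2 * 128 * 32 = 8192
break_even = cross_weight_count(in_dim, in_dim // 2)  # 2 * 128 * 64 = 16384

print(full_rank, quarter, break_even)  # 16384 8192 16384
```

At projection_dim = in_dim/2 the low-rank factorization holds exactly as many weights as the full matrix, so only smaller values save anything; projection_dim = in_dim/4 halves the kernel size.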
- Input shape:
Two tensors, each of shape (batch_size, in_dim).
- Output shape:
A single (batch_size, in_dim) dimensional output.
- forward(x0, x=None)
Computes the feature cross.
- Parameters:
x0 (torch.Tensor) – The input tensor.
x (Optional[torch.Tensor]) – Optional second input tensor. If provided, the layer computes crosses between x0 and x; if not provided, the layer computes crosses between x0 and itself.
- Returns:
Tensor of crosses.
- Return type:
torch.Tensor
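The forward(x0, x=None) contract can be illustrated with a minimal stand-in module. `TinyCross` below is a hypothetical class written for this example, not GiGL's DCNCross; it assumes the full-rank formula documented above and omits the low-rank projection.

```python
from typing import Optional

import torch
from torch import nn


class TinyCross(nn.Module):
    """Hypothetical stand-in mirroring the documented forward(x0, x=None) contract."""

    def __init__(self, in_dim: int, diag_scale: float = 0.0, use_bias: bool = True):
        super().__init__()
        # nn.Linear holds the full-rank kernel W and the optional bias.
        self.linear = nn.Linear(in_dim, in_dim, bias=use_bias)
        self.diag_scale = diag_scale

    def forward(self, x0: torch.Tensor, x: Optional[torch.Tensor] = None) -> torch.Tensor:
        if x is None:
            # No second input: cross x0 with itself (first layer in a stack).
            x = x0
        # x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi
        return x0 * (self.linear(x) + self.diag_scale * x) + x


torch.manual_seed(0)
first_layer = TinyCross(in_dim=8)
second_layer = TinyCross(in_dim=8)

x0 = torch.randn(4, 8)          # (batch_size, in_dim)
x1 = first_layer(x0)            # x omitted, so x0 is crossed with itself
x2 = second_layer(x0, x1)       # later layers receive x0 and the previous output

print(x1.shape, x2.shape)
```

Every layer keeps the (batch_size, in_dim) shape, which is what allows the output of one cross layer to be fed as x into the next.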
- class gigl.src.common.models.layers.feature_interaction.DCNv2(in_dim, num_layers=1, projection_dim=None, diag_scale=0.0, use_bias=True, **kwargs)
Bases:
torch.nn.Module
Wraps around DCNCross for multi-layer feature crossing. See documentation for DCNCross for more details.
- Parameters:
in_dim (int) – The input feature dimension.
num_layers (int) – Number of feature crossing layers to stack. K layers produce feature interactions of up to order K+1.
projection_dim (Optional[int]) – Projection dimension used to reduce the computational cost. Defaults to None, in which case a full (in_dim by in_dim) matrix W is used. If set, a low-rank matrix W = U*V is used, where U is of size in_dim by projection_dim and V is of size projection_dim by in_dim. projection_dim must be smaller than in_dim/2 to improve model efficiency, since otherwise U and V together hold at least as many weights as the full matrix. In practice, we’ve observed that projection_dim = in_dim/4 consistently preserved the accuracy of a full-rank version.
diag_scale (float) – A non-negative float used to increase the diagonal of the kernel W by diag_scale, that is, W + diag_scale * I, where I is an identity matrix.
use_bias (bool) – Whether to add a bias term for this layer. If set to False, no bias term will be used.
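The relationship between num_layers and interaction order can be checked symbolically in a simplified setting. The sketch below assumes a scalar input t with W = 1, bias = 0, and diag_scale = 0, so each layer computes x_{i+1} = x0 * xi + xi, and tracks the result as polynomial coefficients. The helper names are hypothetical and the setup is a simplification for illustration, not how DCNv2 is implemented.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    """Add two polynomials, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]


def stacked_cross(num_layers):
    """Apply x_{i+1} = x0 * xi + xi repeatedly, starting from xi = x0 = t."""
    x0 = [0, 1]  # the polynomial "t"
    xi = x0
    for _ in range(num_layers):
        # x0 stays fixed; only xi is updated from layer to layer.
        xi = poly_add(poly_mul(x0, xi), xi)
    return xi


for k in (1, 2, 3):
    coeffs = stacked_cross(k)
    print(k, "layers -> degree", len(coeffs) - 1, coeffs)
# 1 layers -> degree 2 [0, 1, 1]
# 2 layers -> degree 3 [0, 1, 2, 1]
# 3 layers -> degree 4 [0, 1, 3, 3, 1]
```

Each additional layer multiplies by x0 once more, which raises the highest polynomial degree by one while the residual term keeps the lower-order terms.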