gigl.src.common.models.layers.feature_interaction

Classes

DCNCross

Derived from tensorflow_recommenders [implementation](https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/dcn/Cross)

DCNv2

Wraps around DCNCross for multi-layer feature crossing. See documentation for DCNCross for more details.

Module Contents

class gigl.src.common.models.layers.feature_interaction.DCNCross(in_dim, projection_dim=None, diag_scale=0.0, use_bias=True, **kwargs)

Bases: torch.nn.Module

Cross layer of a Deep & Cross Network (DCN) that learns explicit feature interactions. Derived from the tensorflow_recommenders [implementation](https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/dcn/Cross).

This layer creates explicit, bounded-degree feature interactions efficiently. The forward method takes two tensors. The first input, x0, is the base layer that contains the original features (usually the embedding layer). The second input, xi, is the output of the previous DCNCross layer in the stack, i.e., the i-th DCNCross layer. For the first DCNCross layer in the stack, x0 = xi; this is also what happens when the second input is omitted.

The output is x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi, where .* denotes elementwise multiplication. W is either a full-rank (in_dim by in_dim) matrix or, to reduce the computational cost, a low-rank product U*V. The diag_scale * xi term is equivalent to replacing W with W + diag_scale * I, which increases the diagonal of W and improves training stability (especially in the low-rank case).
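The update rule above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the gigl source: the class name CrossSketch is hypothetical, W is modeled with nn.Linear (which also supplies the optional bias), and the low-rank case applies V (in_dim to projection_dim) followed by U (projection_dim to in_dim).

```python
from typing import Optional

import torch
from torch import nn


class CrossSketch(nn.Module):
    """Hypothetical sketch (not gigl's DCNCross) of
    x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi."""

    def __init__(
        self,
        in_dim: int,
        projection_dim: Optional[int] = None,
        diag_scale: float = 0.0,
        use_bias: bool = True,
    ) -> None:
        super().__init__()
        if diag_scale < 0:
            # The sketch enforces the documented non-negativity of diag_scale.
            raise ValueError("diag_scale must be non-negative")
        self.diag_scale = diag_scale
        if projection_dim is None:
            # Full-rank W: a single in_dim x in_dim matrix plus optional bias.
            self.dense = nn.Linear(in_dim, in_dim, bias=use_bias)
        else:
            # Low-rank W = U*V: V projects down to projection_dim, U projects back up.
            self.dense = nn.Sequential(
                nn.Linear(in_dim, projection_dim, bias=False),  # V
                nn.Linear(projection_dim, in_dim, bias=use_bias),  # U (+ bias)
            )

    def forward(self, x0: torch.Tensor, x: Optional[torch.Tensor] = None) -> torch.Tensor:
        if x is None:
            # First layer in a stack: cross x0 with itself.
            x = x0
        # W * xi + bias, plus diag_scale * xi (i.e. (W + diag_scale * I) * xi + bias).
        prod = self.dense(x) + self.diag_scale * x
        # Elementwise product with the base features, then the residual connection.
        return x0 * prod + x
```

Calling the layer with only x0 gives the same result as passing x0 twice, matching the x0 = xi convention for the first layer in a stack.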

References:
  • [R. Wang et al.](https://arxiv.org/pdf/2008.13535.pdf). See Eq. (1) for the full-rank version and Eq. (2) for the low-rank version.
  • [R. Wang et al.](https://arxiv.org/pdf/1708.05123.pdf)

Parameters:
  • in_dim (int) – The input feature dimension.

  • projection_dim (Optional[int]) – Projection dimension used to reduce the computational cost. Default is None, in which case a full (in_dim by in_dim) matrix W is used. If set, a low-rank matrix W = U*V is used instead, where U is in_dim by projection_dim and V is projection_dim by in_dim. projection_dim should be smaller than in_dim/2 for this to improve efficiency; otherwise U and V together hold at least as many weights as the full matrix. In practice, we’ve observed that projection_dim = in_dim/4 consistently preserved the accuracy of the full-rank version.

  • diag_scale (float) – A non-negative float used to increase the diagonal of the kernel W by diag_scale, that is, W + diag_scale * I, where I is an identity matrix.

  • use_bias (bool) – Whether to add a bias term for this layer. If set to False, no bias term will be used.
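The in_dim/2 threshold for projection_dim comes from counting the entries of W: the full matrix has in_dim * in_dim of them, while U and V together have 2 * in_dim * projection_dim. A hypothetical helper makes the arithmetic concrete (bias terms excluded):

```python
from typing import Optional


def cross_weight_count(in_dim: int, projection_dim: Optional[int] = None) -> int:
    """Number of entries in W for a full-rank or low-rank cross layer (hypothetical helper)."""
    if projection_dim is None:
        return in_dim * in_dim  # full in_dim x in_dim matrix
    return 2 * in_dim * projection_dim  # U (in_dim x p) plus V (p x in_dim)


for p in (None, 32, 16):
    print(p, cross_weight_count(64, p))
# None 4096
# 32 4096
# 16 2048
```

At projection_dim = in_dim/2 the low-rank form saves nothing; at in_dim/4 it halves the number of weights in W.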

Input shape:

Two tensors, x0 and the optional x, each of shape (batch_size, in_dim).

Output shape:

A single tensor of shape (batch_size, in_dim).


forward(x0, x=None)

Computes the feature cross.

Args:
  • x0: The input tensor.
  • x: Optional second input tensor. If provided, the layer computes crosses between x0 and x; if not provided, it computes crosses between x0 and itself.

Returns: Tensor of crosses.

Parameters:
  • x0 (torch.Tensor)

  • x (Optional[torch.Tensor])

Return type:

torch.Tensor

reset_parameters()

class gigl.src.common.models.layers.feature_interaction.DCNv2(in_dim, num_layers=1, projection_dim=None, diag_scale=0.0, use_bias=True, **kwargs)

Bases: torch.nn.Module

Wraps around DCNCross for multi-layer feature crossing. See documentation for DCNCross for more details.

Parameters:
  • in_dim (int) – The input feature dimension.

  • num_layers (int) – Number of stacked feature crossing layers. K layers produce feature interactions of order up to K+1.

  • projection_dim (Optional[int]) – Projection dimension used to reduce the computational cost. Default is None, in which case a full (in_dim by in_dim) matrix W is used. If set, a low-rank matrix W = U*V is used instead, where U is in_dim by projection_dim and V is projection_dim by in_dim. projection_dim should be smaller than in_dim/2 for this to improve efficiency; otherwise U and V together hold at least as many weights as the full matrix. In practice, we’ve observed that projection_dim = in_dim/4 consistently preserved the accuracy of the full-rank version.

  • diag_scale (float) – A non-negative float used to increase the diagonal of the kernel W by diag_scale, that is, W + diag_scale * I, where I is an identity matrix.

  • use_bias (bool) – Whether to add a bias term to each crossing layer. If set to False, no bias term is used.
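The (K+1)-order bound can be checked in the one-dimensional case, where each cross layer reduces to x_{i+1} = x0 * (w * xi + b) + xi for a scalar weight w and bias b (diag_scale would simply add to w). The sketch below, whose helper names are all hypothetical, represents each layer's output as a polynomial in the input and reports its degree after K stacked layers.

```python
from typing import List

Poly = List[float]  # index k holds the coefficient of x**k


def poly_add(p: Poly, q: Poly) -> Poly:
    n = max(len(p), len(q))
    return [(p[k] if k < len(p) else 0.0) + (q[k] if k < len(q) else 0.0) for k in range(n)]


def poly_mul(p: Poly, q: Poly) -> Poly:
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] += a * c
    return out


def degree(p: Poly) -> int:
    return max(k for k, c in enumerate(p) if c != 0.0)


def stacked_cross_degree(num_layers: int, w: float = 0.5, b: float = 0.1) -> int:
    """Degree of the output polynomial after num_layers 1-D cross layers (hypothetical helper)."""
    x0: Poly = [0.0, 1.0]  # the input feature x itself
    xi = x0  # the first layer crosses x0 with itself
    for _ in range(num_layers):
        inner = poly_add([w * c for c in xi], [b])  # w * xi + b
        xi = poly_add(poly_mul(x0, inner), xi)  # x0 * (w * xi + b) + xi
    return degree(xi)


print([stacked_cross_degree(k) for k in range(4)])  # [1, 2, 3, 4]
```

Each layer multiplies by x0 once, so the degree grows by exactly one per layer; with a nonzero w, K layers reach degree K+1.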


forward(x)
Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, in_dim).

Return type:

torch.Tensor
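The forward pass of DCNv2 is not described further here. Assuming it follows the stacking convention documented for DCNCross (every layer receives the original input as x0 and the previous layer's output as xi), it can be sketched as below. DCNv2Sketch is a hypothetical name; this is an illustration, not the gigl source.

```python
from typing import Optional

import torch
from torch import nn


class DCNv2Sketch(nn.Module):
    """Hypothetical stack of cross layers:
    x_{i+1} = x0 .* (W_i * xi + b_i + diag_scale * xi) + xi."""

    def __init__(
        self,
        in_dim: int,
        num_layers: int = 1,
        projection_dim: Optional[int] = None,
        diag_scale: float = 0.0,
        use_bias: bool = True,
    ) -> None:
        super().__init__()
        self.diag_scale = diag_scale
        layers = []
        for _ in range(num_layers):
            if projection_dim is None:
                # Full-rank W_i for this layer.
                layers.append(nn.Linear(in_dim, in_dim, bias=use_bias))
            else:
                # Low-rank W_i = U_i * V_i for this layer.
                layers.append(
                    nn.Sequential(
                        nn.Linear(in_dim, projection_dim, bias=False),
                        nn.Linear(projection_dim, in_dim, bias=use_bias),
                    )
                )
        self.layers = nn.ModuleList(layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x0 = x  # base features, reused by every layer
        xi = x  # running output; the first layer crosses x0 with itself
        for dense in self.layers:
            xi = x0 * (dense(xi) + self.diag_scale * xi) + xi
        return xi
```

Each layer has its own weights but shares the same x0, which is what lets K layers build interactions of order up to K+1 while keeping the output shape at (batch_size, in_dim).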

reset_parameters()