gigl.transforms.graph_transformer#

Transform HeteroData to Graph Transformer sequence input.

This module provides functionality to convert PyG HeteroData objects (typically batched 2-hop subgraphs) into sequence format suitable for Graph Transformers.

For each anchor node in the batch, the transform extracts its k-hop neighborhood and creates a fixed-length sequence of node features with padding.

Example Usage:
>>> from torch_geometric.data import HeteroData
>>> from gigl.transforms.graph_transformer import heterodata_to_graph_transformer_input
>>>
>>> # Create batched HeteroData (e.g., from NeighborLoader)
>>> # First batch_size nodes in each node type are anchor nodes
>>> import torch
>>> data = HeteroData()
>>> data['user'].x = torch.randn(100, 64)  # 100 users, first N are anchors
>>> data['item'].x = torch.randn(50, 64)   # feature dim must match across node types
>>> data['user', 'buys', 'item'].edge_index = ...
>>>
>>> # Transform to Graph Transformer input
>>> sequences, valid_mask, attention_bias_data = heterodata_to_graph_transformer_input(
...     data=data,
...     batch_size=32,
...     max_seq_len=128,
...     anchor_node_type='user',
... )
>>> # sequences: (batch_size, max_seq_len, feature_dim)
>>> # valid_mask: (batch_size, max_seq_len)
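
The padded sequences and mask plug directly into standard attention modules. A minimal sketch (using stand-in tensors with the shapes documented above, not GiGL's own model code) that feeds the outputs to torch.nn.TransformerEncoderLayer, masking padded positions:

```python
import torch
import torch.nn as nn

batch_size, max_seq_len, feature_dim = 32, 128, 64
# Stand-ins for the transform's outputs (shapes as documented above).
sequences = torch.randn(batch_size, max_seq_len, feature_dim)
valid_mask = torch.rand(batch_size, max_seq_len) > 0.3
valid_mask[:, 0] = True  # the anchor position is always a real node

encoder_layer = nn.TransformerEncoderLayer(
    d_model=feature_dim, nhead=4, batch_first=True
)
# key_padding_mask expects True at positions to IGNORE, so invert valid_mask.
out = encoder_layer(sequences, src_key_padding_mask=~valid_mask)
print(out.shape)  # torch.Size([32, 128, 64])
```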

With Relative Encodings: Relative encodings stored as sparse graph-level attributes can be returned as raw attention-bias features:

>>> from torch_geometric.transforms import Compose
>>> from gigl.transforms.add_positional_encodings import (
...     AddHeteroRandomWalkEncodings,
...     AddHeteroHopDistanceEncoding,
... )
>>>
>>> # First apply PE transforms to the data
>>> pe_transform = Compose([
...     AddHeteroRandomWalkEncodings(walk_length=8),
...     AddHeteroHopDistanceEncoding(h_max=5),
... ])
>>> data = pe_transform(data)
>>>
>>> # Transform to sequences with relative bias features
>>> sequences, valid_mask, attention_bias_data = heterodata_to_graph_transformer_input(
...     data=data,
...     batch_size=32,
...     max_seq_len=128,
...     anchor_node_type='user',
...     anchor_based_attention_bias_attr_names=['hop_distance'],
... )
>>> # sequences: (batch_size, max_seq_len, feature_dim)
>>> # attention_bias_data['anchor_bias']: (batch_size, max_seq_len, 1)
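
How the raw anchor-bias features are consumed is up to the model. One common pattern, sketched here as an assumption rather than GiGL's API, is to project the per-token bias features to one scalar per attention head and add them to the attention logits:

```python
import torch

batch, seq, num_attrs, num_heads = 32, 128, 1, 4
# Stand-in for attention_bias_data['anchor_bias'] (shape documented above).
anchor_bias = torch.randn(batch, seq, num_attrs)

# Hypothetical consumption pattern: one learned scalar bias per head per
# key position, broadcast across all query positions.
proj = torch.nn.Linear(num_attrs, num_heads)
bias = proj(anchor_bias)                   # (batch, seq, num_heads)
bias = bias.permute(0, 2, 1).unsqueeze(2)  # (batch, heads, 1, seq_kv)

attn_logits = torch.randn(batch, num_heads, seq, seq)
attn_logits = attn_logits + bias           # broadcasts over the query dim
print(attn_logits.shape)  # torch.Size([32, 4, 128, 128])
```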

Attributes#

PPR_WEIGHT_FEATURE_NAME

TokenInputData

Classes#

SequenceAuxiliaryData

TypedDict of auxiliary sequence outputs (attention-bias and token-aligned features).

Functions#

heterodata_to_graph_transformer_input(data, ...[, ...])

Transform a HeteroData object to Graph Transformer sequence input.

Module Contents#

class gigl.transforms.graph_transformer.SequenceAuxiliaryData[source]#

Bases: TypedDict

TypedDict of auxiliary sequence outputs returned by heterodata_to_graph_transformer_input. Holds the optional anchor-relative attention-bias features, pairwise attention-bias features, and token-aligned input features produced alongside the node-feature sequences.

anchor_bias: torch.Tensor | None[source]#
pairwise_bias: torch.Tensor | None[source]#
token_input: TokenInputData | None[source]#
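
For illustration, an equivalent TypedDict can be written out directly. SequenceAuxiliaryDataSketch below is a hypothetical stand-in mirroring the documented fields, not the class exported by GiGL:

```python
from typing import Optional, TypedDict

import torch


# Hypothetical mirror of the three documented fields; token_input is
# simplified to a plain dict of name -> tensor here.
class SequenceAuxiliaryDataSketch(TypedDict):
    anchor_bias: Optional[torch.Tensor]
    pairwise_bias: Optional[torch.Tensor]
    token_input: Optional[dict]


aux: SequenceAuxiliaryDataSketch = {
    'anchor_bias': torch.zeros(32, 128, 1),  # (batch, seq, num_anchor_attrs)
    'pairwise_bias': None,
    'token_input': None,
}
print(aux['anchor_bias'].shape)  # torch.Size([32, 128, 1])
```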
gigl.transforms.graph_transformer.heterodata_to_graph_transformer_input(data, batch_size, max_seq_len, anchor_node_type, anchor_node_ids=None, hop_distance=2, sequence_construction_method='khop', include_anchor_first=True, padding_value=0.0, anchor_based_attention_bias_attr_names=None, anchor_based_input_attr_names=None, pairwise_attention_bias_attr_names=None)[source]#

Transform a HeteroData object to Graph Transformer sequence input.

Given a batched HeteroData object where the first batch_size nodes of anchor_node_type are anchor nodes, this function extracts the k-hop neighborhood for each anchor and creates padded sequences.

Uses sparse matrix operations for efficient batched k-hop neighbor extraction.

Parameters:
  • data (torch_geometric.data.HeteroData) – HeteroData object containing node features and edge indices. Expected to have node features as data[node_type].x. All node types must have the same feature dimension.

  • batch_size (int) – Number of anchor nodes (first batch_size nodes of anchor_node_type). Ignored if anchor_node_ids is provided.

  • max_seq_len (int) – Maximum sequence length (neighbors beyond this are truncated).

  • anchor_node_type (torch_geometric.typing.NodeType) – The node type of anchor nodes.

  • anchor_node_ids (Optional[torch.Tensor]) – Optional tensor of local node indices within anchor_node_type to use as anchors. If None, uses first batch_size nodes. (default: None)

  • hop_distance (int) – Number of hops to consider for neighborhood when sequence_construction_method="khop". (default: 2)

  • sequence_construction_method (Literal['khop', 'ppr']) – Strategy used to build per-anchor sequences. "khop" performs the existing k-hop expansion over the sampled graph. "ppr" uses outgoing (anchor_type, "ppr", neighbor_type) edges, sorted by descending PPR weight from edge_attr. (default: "khop")

  • include_anchor_first (bool) – If True, the anchor node is always placed first in its sequence. (default: True)

  • padding_value (float) – Value to use for padding (default: 0.0).

  • anchor_based_attention_bias_attr_names (Optional[list[str]]) – List of anchor-relative feature names used as attention bias. Sparse graph-level attributes are looked up from data, and the reserved name "ppr_weight" resolves to PPR edge weights in PPR sequence mode. Example: ['hop_distance', 'ppr_weight'].

  • anchor_based_input_attr_names (Optional[list[str]]) – List of anchor-relative attribute names returned as token-aligned model-input features. Sparse graph-level attributes are looked up from data, and "ppr_weight" resolves to PPR edge weights in PPR sequence mode. Example: ['hop_distance', 'ppr_weight'].

  • pairwise_attention_bias_attr_names (Optional[list[str]]) – List of pairwise feature names used as attention bias. These must correspond to sparse graph-level attributes on data. Example: ['pairwise_distance'].
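
The "ppr" ordering described for sequence_construction_method can be illustrated with plain tensors; the edge data below is hypothetical and stands in for (anchor_type, "ppr", neighbor_type) edges whose edge_attr holds PPR weights:

```python
import torch

# Hypothetical ('user', 'ppr', 'item') edges: row 0 of edge_index is the
# anchor (src), row 1 the neighbor (dst), and edge_attr the PPR weight.
src = torch.tensor([0, 0, 1, 2])
dst = torch.tensor([3, 1, 2, 5])
ppr_weight = torch.tensor([0.10, 0.40, 0.25, 0.05])

# In "ppr" mode each anchor's sequence follows descending PPR weight,
# so anchor 0's neighbors [3, 1] reorder to [1, 3].
anchor = 0
mask = src == anchor
order = torch.argsort(ppr_weight[mask], descending=True)
print(dst[mask][order].tolist())  # [1, 3]
```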

Returns:

A tuple (sequences, valid_mask, sequence_auxiliary_data), where:

  • sequences: (batch_size, max_seq_len, feature_dim) padded node features taken directly from data[node_type].x in homogeneous order.

  • valid_mask: (batch_size, max_seq_len) bool tensor indicating which sequence positions correspond to real nodes.

  • sequence_auxiliary_data: a SequenceAuxiliaryData dictionary of raw token-aligned and attention-bias features, with "anchor_bias" shaped (batch, seq, num_anchor_attrs) or None, "pairwise_bias" shaped (batch, seq, seq, num_pairwise_attrs) or None, and "token_input" as a dict mapping attribute name to a (batch, seq, 1) tensor, or None.

Return type:

tuple[torch.Tensor, torch.Tensor, SequenceAuxiliaryData]

Raises:
  • ValueError – If node types have different feature dimensions.

  • ValueError – If no node features exist in the data.

gigl.transforms.graph_transformer.PPR_WEIGHT_FEATURE_NAME = 'ppr_weight'[source]#
gigl.transforms.graph_transformer.TokenInputData[source]#