gigl.distributed.dist_ppr_sampler
Attributes
Classes
Personalized PageRank (PPR) based neighbor sampler that inherits from GLT DistNeighborSampler.
Module Contents
- class gigl.distributed.dist_ppr_sampler.DistPPRNeighborSampler(*args, alpha=0.5, eps=0.0001, max_ppr_nodes=50, num_neighbors_per_hop=100000, total_degree_dtype=torch.int32, degree_tensors, **kwargs)
Bases: gigl.distributed.dist_neighbor_sampler.DistNeighborSampler
Personalized PageRank (PPR) based neighbor sampler that inherits from GLT DistNeighborSampler.
Instead of uniform random sampling, this sampler uses Personalized PageRank (PPR) scores to select the most relevant neighbors for each seed node. PPR scores are approximated here using the Forward Push algorithm (Andersen et al., 2006).
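The Forward Push idea can be illustrated with a minimal single-machine sketch. This is not the sampler's distributed implementation: it assumes an unweighted graph stored as a plain adjacency dict, uses the common non-lazy push rule, and absorbs residual mass at nodes with no out-neighbors. The function names forward_push_ppr and top_ppr_nodes are hypothetical and exist only for this example.

```python
from collections import defaultdict, deque
import heapq


def forward_push_ppr(adj, seed, alpha=0.5, eps=1e-4):
    """Approximate PPR scores for `seed` (illustrative sketch only).

    adj: dict mapping node -> list of out-neighbors (assumed format).
    alpha: restart probability; eps: per-degree residual threshold.
    """
    estimate = defaultdict(float)  # approximate PPR score per node
    residual = defaultdict(float)  # probability mass not yet pushed
    residual[seed] = 1.0
    queue = deque([seed])
    queued = {seed}

    while queue:
        u = queue.popleft()
        queued.discard(u)
        neighbors = adj.get(u, ())
        deg = len(neighbors)
        mass = residual[u]
        if deg == 0:
            # Assumption: a node without out-neighbors keeps its residual.
            estimate[u] += mass
            residual[u] = 0.0
            continue
        if mass < eps * deg:
            continue
        # Push: keep an alpha share at u, spread the rest over neighbors.
        estimate[u] += alpha * mass
        residual[u] = 0.0
        share = (1.0 - alpha) * mass / deg
        for v in neighbors:
            residual[v] += share
            deg_v = max(len(adj.get(v, ())), 1)
            if v not in queued and residual[v] >= eps * deg_v:
                queue.append(v)
                queued.add(v)
    return dict(estimate)


def top_ppr_nodes(scores, max_ppr_nodes):
    """Return up to `max_ppr_nodes` (node, score) pairs, highest score first."""
    return heapq.nlargest(max_ppr_nodes, scores.items(), key=lambda kv: kv[1])


# Tiny example graph: node 0 links to 1 and 2, which both link back to 0.
example_adj = {0: [1, 2], 1: [0], 2: [0]}
example_scores = forward_push_ppr(example_adj, seed=0, alpha=0.5, eps=1e-4)
example_top = top_ppr_nodes(example_scores, max_ppr_nodes=2)
```

In this sketch a smaller eps leaves less unpushed residual, so the scores sum closer to 1 at the cost of more push operations, while a larger alpha concentrates more of the score on the seed.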
This sampler supports both homogeneous and heterogeneous graphs. For heterogeneous graphs, the PPR algorithm traverses across all edge types, switching edge types based on the current node type and the configured edge direction.
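One way to picture the edge-type switching is the sketch below. It assumes edge types are (source_type, relation, destination_type) tuples and that an "out" direction follows edges whose source matches the current node type, while "in" follows edges whose destination matches. Both the helper names and that direction convention are assumptions made for illustration, not the library's confirmed behavior.

```python
def traversable_edge_types(edge_types, node_type, edge_dir="out"):
    """Return the edge types a walk may follow from `node_type`.

    Hypothetical helper: the direction convention is an assumption.
    """
    if edge_dir == "out":
        return [et for et in edge_types if et[0] == node_type]
    if edge_dir == "in":
        return [et for et in edge_types if et[2] == node_type]
    raise ValueError(f"edge_dir must be 'in' or 'out', got {edge_dir!r}")


def next_node_type(edge_type, edge_dir="out"):
    """Node type reached after following `edge_type` in the given direction."""
    return edge_type[2] if edge_dir == "out" else edge_type[0]


# Example schema (made up for this sketch).
example_edge_types = [
    ("user", "buys", "item"),
    ("item", "bought_by", "user"),
    ("user", "follows", "user"),
]
from_user = traversable_edge_types(example_edge_types, "user", edge_dir="out")
```

Under these assumptions a push that starts at a "user" node can reach "item" and "user" nodes, and the next push step picks its edge types from whichever node type it landed on.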
The edge_index and edge_attr fields on the output Data/HeteroData objects are populated with PPR seed-to-neighbor relationships (not edges in the original graph). N is the total number of (seed, neighbor) pairs across all seeds in the batch.
- Homogeneous (Data):
  - data.edge_index: [2, N] int64. Row 0 is local seed indices, row 1 is local neighbor indices.
  - data.edge_attr: [N] float. PPR score for each pair.
- Heterogeneous (HeteroData): one PPR edge type per (seed_type, neighbor_type) pair, with "ppr" as the relation:
  - data[(seed_type, "ppr", ntype)].edge_index: same format as above.
  - data[(seed_type, "ppr", ntype)].edge_attr: same format as above.
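The homogeneous layout described above can be sketched with plain Python lists standing in for tensors. The helper build_ppr_edges and its first-seen local index assignment are hypothetical; the sampler's actual node ordering is not specified here.

```python
def build_ppr_edges(ppr_per_seed):
    """Flatten per-seed PPR results into an edge_index / edge_attr layout.

    ppr_per_seed: dict mapping a global seed id to a list of
    (global neighbor id, PPR score) pairs (assumed input format).
    Returns (edge_index, edge_attr, local_ids), where edge_index is a
    pair of lists of length N and edge_attr is a list of length N.
    """
    local_ids = {}  # global id -> local index, assigned in first-seen order

    def to_local(global_id):
        return local_ids.setdefault(global_id, len(local_ids))

    seed_row, neighbor_row, edge_attr = [], [], []
    for seed, neighbors in ppr_per_seed.items():
        seed_local = to_local(seed)
        for neighbor, score in neighbors:
            seed_row.append(seed_local)             # row 0: local seed index
            neighbor_row.append(to_local(neighbor))  # row 1: local neighbor index
            edge_attr.append(score)                  # PPR score of the pair
    return [seed_row, neighbor_row], edge_attr, local_ids


# Two seeds (global ids 10 and 20) with three (seed, neighbor) pairs, so N = 3.
example_index, example_attr, example_ids = build_ppr_edges(
    {10: [(11, 0.3), (12, 0.1)], 20: [(11, 0.2)]}
)
```

For a heterogeneous graph the same flattening would be done once per (seed_type, "ppr", ntype) key, each with its own pair of index rows and score list.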
- Parameters:
alpha (float) – Restart probability (teleport probability back to seed). Higher values keep samples closer to seeds. Typical values: 0.15-0.25; the default is 0.5.
eps (float) – Convergence threshold. Smaller values give more accurate PPR scores but require more computation. Typical values: 1e-4 to 1e-6.
max_ppr_nodes (int) – Maximum number of nodes to return per seed based on PPR scores.
num_neighbors_per_hop (int) – Maximum number of neighbors to fetch per hop.
total_degree_dtype (torch.dtype) – Dtype for precomputed total-degree tensors. Defaults to torch.int32, which supports total degrees up to ~2 billion. Use a larger dtype if nodes have exceptionally high aggregate degrees.
degree_tensors (Union[torch.Tensor, dict[graphlearn_torch.typing.EdgeType, torch.Tensor]])
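The "~2 billion" bound for total_degree_dtype is the signed 32-bit maximum, 2**31 - 1 = 2,147,483,647. A quick check like the sketch below, written in plain Python with a hypothetical helper name, can indicate whether the default is enough for a given graph. With PyTorch available, torch.iinfo(torch.int32).max reports the same limit.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647
INT64_MAX = 2**63 - 1


def pick_total_degree_dtype(total_degrees):
    """Suggest a dtype name wide enough for the largest total degree.

    total_degrees: iterable of per-node total degrees, i.e. degrees
    summed over all edge types (assumed to be precomputed).
    Hypothetical helper; returns a string, not a torch.dtype.
    """
    max_degree = max(total_degrees, default=0)
    if max_degree <= INT32_MAX:
        return "torch.int32"
    if max_degree <= INT64_MAX:
        return "torch.int64"
    raise OverflowError("total degree does not fit in a 64-bit integer")


example_choice = pick_total_degree_dtype([12, 40_000, 3_000_000_000])
```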