gigl.distributed.dist_ppr_sampler

Attributes

`PPR_EDGE_INDEX_METADATA_KEY`
`PPR_WEIGHT_METADATA_KEY`

Classes

DistPPRNeighborSampler

Personalized PageRank (PPR) based distributed neighbor sampler.

Module Contents

class gigl.distributed.dist_ppr_sampler.DistPPRNeighborSampler(*args, alpha=0.5, eps=0.0001, max_ppr_nodes=50, num_neighbors_per_hop=100000, degree_tensors, max_fetch_iterations=None, **kwargs)

Bases: gigl.distributed.base_sampler.BaseDistNeighborSampler

Personalized PageRank (PPR) based distributed neighbor sampler.

Extends BaseDistNeighborSampler (which provides shared input preparation utilities) and overrides _sample_from_nodes with PPR-based neighbor selection.

Instead of uniform random sampling, this sampler uses Personalized PageRank (PPR) scores to select the most relevant neighbors for each seed node. PPR scores are approximated here using the Forward Push algorithm (Andersen et al., 2006).
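
The Forward Push approximation can be illustrated with a minimal single-machine sketch. It shows a common non-lazy variant of the algorithm on an in-memory adjacency dict, not this sampler's distributed implementation; the function name forward_push and the adjacency-dict input are hypothetical.

```python
from collections import defaultdict, deque


def forward_push(adj, seed, alpha=0.5, eps=1e-4):
    """Approximate PPR scores for one seed (illustrative, not GiGL's code).

    adj maps each node to a list of its neighbors. Returns (p, r), where
    p holds the approximate PPR scores and r the leftover residuals.
    """
    p = defaultdict(float)  # settled PPR mass per node
    r = defaultdict(float)  # residual mass still waiting to be pushed
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        neighbors = adj.get(u, [])
        deg = len(neighbors)
        residual = r[u]
        # Skip nodes whose residual is below the degree-scaled threshold.
        if residual < eps * max(deg, 1):
            continue
        r[u] = 0.0
        if deg == 0:
            # Assumption: a node without neighbors keeps all of its mass.
            p[u] += residual
            continue
        # Keep an alpha fraction at u, spread the rest evenly to neighbors.
        p[u] += alpha * residual
        share = (1.0 - alpha) * residual / deg
        for v in neighbors:
            r[v] += share
            if r[v] >= eps * max(len(adj.get(v, [])), 1):
                queue.append(v)
    return dict(p), dict(r)


# Small undirected example graph, stored as symmetric adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
scores, residuals = forward_push(adj, seed=0, alpha=0.5, eps=1e-4)
```

In this variant the settled scores and the residuals always sum to 1, and the loop stops once every residual is below eps times the node's degree, which is why a smaller eps costs more pushes.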

This sampler supports both homogeneous and heterogeneous graphs. For heterogeneous graphs, the PPR algorithm traverses across all edge types, switching edge types based on the current node type and the configured edge direction.

The edge_index and edge_attr fields on the output Data/HeteroData objects are populated with PPR seed-to-neighbor relationships (not edges in the original graph). N is the total number of (seed, neighbor) pairs across all seeds in the batch.

Homogeneous (Data):

data.edge_index: [2, N] int64 — row 0 is local seed indices, row 1 is local neighbor indices.
data.edge_attr: [N] float — PPR score for each pair.
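
To make the [2, N] layout concrete, here is a hedged sketch that flattens per-seed PPR results into the two structures described above. It uses plain Python lists in place of tensors, and the helper name build_ppr_edges, the input shape, and the local-index assignment (seeds first, then neighbors in order of first appearance) are assumptions made for illustration only.

```python
def build_ppr_edges(ppr_per_seed):
    """Flatten {seed: [(neighbor, score), ...]} into edge_index / edge_attr.

    Illustrative only: the real sampler emits torch tensors, and its
    local-index assignment may differ from the one assumed here.
    """
    # Assumed local numbering: seeds first, then previously unseen neighbors.
    local = {}
    for seed in ppr_per_seed:
        local.setdefault(seed, len(local))
    for neighbors in ppr_per_seed.values():
        for nbr, _ in neighbors:
            local.setdefault(nbr, len(local))

    seed_row, nbr_row, edge_attr = [], [], []
    for seed, neighbors in ppr_per_seed.items():
        for nbr, score in neighbors:
            seed_row.append(local[seed])  # row 0: local seed index
            nbr_row.append(local[nbr])    # row 1: local neighbor index
            edge_attr.append(score)       # PPR score for this pair
    return [seed_row, nbr_row], edge_attr


# Two seeds (global ids 10 and 11) with their PPR-selected neighbors.
ppr_per_seed = {10: [(42, 0.3), (7, 0.1)], 11: [(42, 0.5)]}
edge_index, edge_attr = build_ppr_edges(ppr_per_seed)
# edge_index == [[0, 0, 1], [2, 3, 2]], edge_attr == [0.3, 0.1, 0.5]
```

Here N is 3: one column per (seed, neighbor) pair, regardless of whether an edge between the two nodes exists in the original graph.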

Heterogeneous (HeteroData) — one PPR edge type per (seed_type, neighbor_type) pair, with "ppr" as the relation:

data[(seed_type, "ppr", ntype)].edge_index: same format as above.
data[(seed_type, "ppr", ntype)].edge_attr: same format as above.
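
The heterogeneous layout amounts to bucketing the same (seed, neighbor, score) pairs by a (seed_type, "ppr", ntype) key. The sketch below illustrates that grouping with plain dicts and lists; the helper name group_ppr_pairs, the tuple layout of its input, and the "user"/"item" node types are hypothetical and not part of the library.

```python
from collections import defaultdict


def group_ppr_pairs(pairs):
    """Bucket (seed_type, seed_idx, nbr_type, nbr_idx, score) tuples.

    Returns two dicts keyed by (seed_type, "ppr", nbr_type): one with the
    [2, N]-style index rows and one with the matching PPR scores.
    """
    edge_index = defaultdict(lambda: [[], []])
    edge_attr = defaultdict(list)
    for seed_type, seed_idx, nbr_type, nbr_idx, score in pairs:
        key = (seed_type, "ppr", nbr_type)
        edge_index[key][0].append(seed_idx)  # row 0: local seed indices
        edge_index[key][1].append(nbr_idx)   # row 1: local neighbor indices
        edge_attr[key].append(score)
    return dict(edge_index), dict(edge_attr)


# Hypothetical node types and local indices, for illustration only.
pairs = [
    ("user", 0, "user", 1, 0.25),
    ("user", 0, "item", 0, 0.40),
    ("user", 1, "item", 2, 0.10),
]
edge_index, edge_attr = group_ppr_pairs(pairs)
# edge_index[("user", "ppr", "item")] == [[0, 1], [0, 2]]
```

Each bucket has its own N, so a seed type that reaches two neighbor types produces two separate PPR edge types.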

Parameters:

alpha (float) – Restart probability (the probability of teleporting back to the seed). Higher values keep samples closer to seeds. Typical values: 0.15-0.25; note that the default of 0.5 is above this range.
eps (float) – Convergence threshold. Smaller values give more accurate PPR scores but require more computation. Typical values: 1e-4 to 1e-6.
max_ppr_nodes (int) – Maximum number of nodes to return per seed based on PPR scores.
num_neighbors_per_hop (int) – Maximum number of neighbors to fetch per hop.
degree_tensors (Union[torch.Tensor, dict[graphlearn_torch.typing.NodeType, torch.Tensor]]) – Pre-computed total-degree tensors (int32). Homogeneous graphs use a single tensor; heterogeneous graphs use tensors keyed by NodeType. The colocated and graph-store loader paths retrieve these through DistDataset.degree_tensor and move them to shared memory before worker handoff.
max_fetch_iterations (Optional[int])
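
As a rough illustration of the max_ppr_nodes cap, the sketch below keeps only the highest-scoring nodes from a per-seed score dict. The helper name top_k_ppr is hypothetical, and details such as tie-breaking or whether the seed itself is excluded are not specified here and may differ in the actual sampler.

```python
import heapq


def top_k_ppr(scores, max_ppr_nodes=50):
    """Return up to max_ppr_nodes (node, score) pairs, highest score first."""
    return heapq.nlargest(max_ppr_nodes, scores.items(), key=lambda kv: kv[1])


# Approximate PPR scores for one seed (made-up values for illustration).
scores = {1: 0.4, 2: 0.1, 3: 0.3, 4: 0.2}
selected = top_k_ppr(scores, max_ppr_nodes=2)
# selected == [(1, 0.4), (3, 0.3)]
```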

gigl.distributed.dist_ppr_sampler.PPR_EDGE_INDEX_METADATA_KEY = 'ppr_edge_index.'

gigl.distributed.dist_ppr_sampler.PPR_WEIGHT_METADATA_KEY = 'ppr_weight.'