gigl.distributed.sampler_options

Sampler option types for configuring which BaseGiGLSampler subclass to use in distributed loading.

Provides KHopNeighborSamplerOptions for k-hop sampling via DistNeighborSampler, and PPRSamplerOptions for PPR-based sampling via DistPPRNeighborSampler.

Both option types are frozen dataclasses so that they are safe to pickle across RPC boundaries (required for Graph Store mode).
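The frozen-dataclass behavior described above can be sketched as follows. `ExampleSamplerOptions` is a hypothetical stand-in defined only for this illustration, not the GiGL class, and the sketch uses only the standard library.

```python
import dataclasses
import pickle
from dataclasses import dataclass


# Hypothetical stand-in for a frozen options dataclass; not the GiGL class itself.
@dataclass(frozen=True)
class ExampleSamplerOptions:
    alpha: float = 0.5
    max_ppr_nodes: int = 50


options = ExampleSamplerOptions(alpha=0.2)

# Frozen dataclasses reject attribute assignment after construction.
try:
    options.alpha = 0.9  # type: ignore[misc]
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# A pickle round trip yields an equal copy, as needed when options cross a process boundary.
restored = pickle.loads(pickle.dumps(options))

# To change a field, build a new instance instead of mutating in place.
wider = dataclasses.replace(options, max_ppr_nodes=100)
```

Because instances cannot be mutated, `dataclasses.replace` is the usual way to derive a variant of an existing options object.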

Attributes

`SamplerOptions`
`logger`

Classes

`KHopNeighborSamplerOptions`	Sampler options for k-hop neighbor sampling via DistNeighborSampler.
`PPRSamplerOptions`	Sampler options for PPR-based neighbor sampling using DistPPRNeighborSampler.

Functions

resolve_sampler_options(num_neighbors, sampler_options)

Resolve sampler_options from user-provided values.

Module Contents

class gigl.distributed.sampler_options.KHopNeighborSamplerOptions

Sampler options for k-hop neighbor sampling via DistNeighborSampler.

num_neighbors: Fanout per hop, either a flat list (homogeneous) or a dict mapping edge types to per-hop fanout lists (heterogeneous).

num_neighbors: list[int] | dict[graphlearn_torch.typing.EdgeType, list[int]]
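The two accepted shapes of `num_neighbors` can be illustrated with a small sketch. `KHopOptionsSketch` is a hypothetical stand-in that mirrors only the documented field, and the edge type is assumed to be a (source node type, relation, destination node type) tuple; the node and relation names are made up.

```python
from dataclasses import dataclass
from typing import Union

# Assumption: an edge type is a (source type, relation, destination type) tuple.
EdgeType = tuple[str, str, str]


# Hypothetical stand-in mirroring the documented field; not the GiGL class itself.
@dataclass(frozen=True)
class KHopOptionsSketch:
    num_neighbors: Union[list[int], dict[EdgeType, list[int]]]


def max_neighbors_per_seed(fanout: list[int]) -> int:
    """Upper bound on nodes sampled around one seed for a flat per-hop fanout."""
    total, frontier = 0, 1
    for per_node in fanout:
        frontier *= per_node  # each frontier node samples up to `per_node` neighbors
        total += frontier
    return total


# Homogeneous graph: one fanout value per hop.
homogeneous = KHopOptionsSketch(num_neighbors=[10, 5])

# Heterogeneous graph: a per-hop fanout list for each edge type.
heterogeneous = KHopOptionsSketch(
    num_neighbors={
        ("user", "to", "item"): [10, 5],
        ("item", "to", "user"): [10, 5],
    }
)

bound = max_neighbors_per_seed(homogeneous.num_neighbors)
```

With a fanout of `[10, 5]`, hop 1 samples up to 10 neighbors and hop 2 up to 5 for each of those, so at most 60 neighbors are sampled per seed.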

class gigl.distributed.sampler_options.PPRSamplerOptions

Sampler options for PPR-based neighbor sampling using DistPPRNeighborSampler.

Output format: When this sampler is active, each output Data/HeteroData batch contains only PPR edges — no message-passing edges from the original graph are included. For each (seed_type, neighbor_type) pair reachable via PPR walks, the batch will have an edge type (seed_type, "ppr", neighbor_type) with:

edge_index: [2, N] int64 — row 0 is local seed indices, row 1 is local neighbor indices.
edge_attr: [N] float — PPR score for each (seed, neighbor) pair.

For homogeneous graphs these live directly on data.edge_index / data.edge_attr.
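As a rough illustration of the layout described above, the sketch below groups PPR neighbors by seed. Plain Python lists stand in for the tensors, the index and score values are made up, and `neighbors_by_seed` is a hypothetical helper rather than part of GiGL.

```python
# Plain lists stand in for the [2, N] edge_index and [N] edge_attr tensors.
edge_index = [
    [0, 0, 0, 1, 1],  # row 0: local seed indices
    [3, 2, 4, 3, 5],  # row 1: local neighbor indices
]
edge_attr = [0.12, 0.30, 0.05, 0.25, 0.10]  # PPR score per (seed, neighbor) pair


def neighbors_by_seed(edge_index, edge_attr):
    """Group (neighbor, score) pairs by seed, highest score first."""
    grouped = {}
    for seed, neighbor, score in zip(edge_index[0], edge_index[1], edge_attr):
        grouped.setdefault(seed, []).append((neighbor, score))
    for pairs in grouped.values():
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return grouped


ranked = neighbors_by_seed(edge_index, edge_attr)
```

Column `i` of `edge_index` and entry `i` of `edge_attr` describe the same (seed, neighbor) pair, which is why the three sequences can be zipped together.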

alpha: Restart probability (the probability of teleporting back to the seed). Higher values keep samples closer to seeds. Typical values: 0.15-0.25; the default is 0.5.

eps: Convergence threshold for the Forward Push algorithm. Smaller values give more accurate PPR scores but require more computation. Typical values: 1e-6 to 1e-4; the default is 1e-4.

max_ppr_nodes: Maximum number of nodes to return per seed, selected by PPR score. The default is 50.

num_neighbors_per_hop: Maximum number of neighbors fetched per node per edge type during PPR traversal. The default of 1000 is sufficient in practice: high-degree hub nodes pass a diminishing residual to each neighbor, so capping the fetch has little effect on PPR accuracy while keeping per-hop RPC cost bounded. Set a large value to approximate fetching all neighbors.

max_fetch_iterations: Maximum number of iterations that issue RPC neighbor fetches. After this many fetch iterations, subsequent iterations push residuals using only already-cached neighbor lists (no new RPCs). The algorithm still runs to convergence: re-enqueued nodes propagate through cached neighbors at negligible cost. None (default) means no fetch limit.

alpha: float = 0.5

eps: float = 0.0001

max_fetch_iterations: int | None = None

max_ppr_nodes: int = 50

num_neighbors_per_hop: int = 1000
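To make the roles of alpha, eps, and max_ppr_nodes concrete, here is a minimal single-machine sketch of the textbook Forward Push approximation. It is not GiGL's distributed implementation: there are no RPC fetches, neighbor caps, or edge types, and the helper names and toy graph are hypothetical. It assumes every node has at least one neighbor.

```python
from collections import deque


def forward_push(adj, seed, alpha, eps):
    """Textbook Forward Push; returns (ppr_estimates, residuals)."""
    p = {node: 0.0 for node in adj}
    r = {node: 0.0 for node in adj}
    r[seed] = 1.0
    queue = deque([seed])
    queued = {seed}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        mass, r[u] = r[u], 0.0
        p[u] += alpha * mass  # the restart share stays at u
        share = (1.0 - alpha) * mass / len(adj[u])
        for v in adj[u]:
            r[v] += share  # the rest is spread evenly over u's neighbors
            # Re-enqueue v once its residual reaches eps times its degree.
            if v not in queued and r[v] >= eps * len(adj[v]):
                queue.append(v)
                queued.add(v)
    return p, r


def top_ppr_nodes(p, max_ppr_nodes):
    """Keep the highest-scoring nodes, mirroring the max_ppr_nodes cap."""
    ranked = sorted(p.items(), key=lambda item: item[1], reverse=True)
    return [(node, score) for node, score in ranked[:max_ppr_nodes] if score > 0.0]


# Toy undirected graph as adjacency lists (made up for illustration).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
alpha, eps = 0.5, 1e-4
p, r = forward_push(adj, seed=0, alpha=alpha, eps=eps)
top = top_ppr_nodes(p, max_ppr_nodes=3)
```

Each push keeps a fraction alpha of a node's residual as its score, so a larger alpha concentrates score near the seed, while a smaller eps lets smaller residuals keep propagating before the loop stops.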

gigl.distributed.sampler_options.resolve_sampler_options(num_neighbors, sampler_options)

Resolve sampler_options from user-provided values.

If sampler_options is a PPRSamplerOptions, it is returned directly (num_neighbors is unused for PPR). If sampler_options is None, num_neighbors is wrapped in a KHopNeighborSamplerOptions. If sampler_options is a KHopNeighborSamplerOptions, its num_neighbors is validated against the explicit num_neighbors argument.

Parameters:

num_neighbors (Union[list[int], dict[graphlearn_torch.typing.EdgeType, list[int]]]) – Fanout per hop (required for KHop; ignored for PPR).
sampler_options (Optional[SamplerOptions]) – Sampler configuration, or None.

Returns:

The resolved SamplerOptions.

Raises:

ValueError – If KHopNeighborSamplerOptions.num_neighbors conflicts with the explicit num_neighbors.

Return type:

SamplerOptions
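The resolution rules described for this function can be sketched as below. This is a hypothetical re-implementation written from the description, not the GiGL source: `KHopOptionsSketch`, `PPROptionsSketch`, and `resolve_sketch` are stand-ins, and the edge type is assumed to be a tuple of three strings.

```python
from dataclasses import dataclass
from typing import Optional, Union

EdgeType = tuple[str, str, str]  # assumption: (source type, relation, destination type)
NumNeighbors = Union[list[int], dict[EdgeType, list[int]]]


# Hypothetical stand-ins for the two option types.
@dataclass(frozen=True)
class KHopOptionsSketch:
    num_neighbors: NumNeighbors


@dataclass(frozen=True)
class PPROptionsSketch:
    alpha: float = 0.5
    eps: float = 1e-4
    max_ppr_nodes: int = 50


OptionsSketch = Union[KHopOptionsSketch, PPROptionsSketch]


def resolve_sketch(
    num_neighbors: NumNeighbors, sampler_options: Optional[OptionsSketch]
) -> OptionsSketch:
    # PPR options are returned as-is; num_neighbors is unused for PPR.
    if isinstance(sampler_options, PPROptionsSketch):
        return sampler_options
    # No options given: wrap the explicit fanout in k-hop options.
    if sampler_options is None:
        return KHopOptionsSketch(num_neighbors=num_neighbors)
    # K-hop options given: they must agree with the explicit fanout.
    if sampler_options.num_neighbors != num_neighbors:
        raise ValueError(
            f"sampler_options.num_neighbors={sampler_options.num_neighbors} "
            f"conflicts with num_neighbors={num_neighbors}"
        )
    return sampler_options


from_none = resolve_sketch([10, 5], None)
from_ppr = resolve_sketch([10, 5], PPROptionsSketch(alpha=0.2))
from_khop = resolve_sketch([10, 5], KHopOptionsSketch([10, 5]))
```

Passing k-hop options whose fanout differs from the explicit `num_neighbors`, for example `resolve_sketch([5, 5], KHopOptionsSketch([10, 5]))`, raises `ValueError` in this sketch.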

gigl.distributed.sampler_options.SamplerOptions

gigl.distributed.sampler_options.logger