gigl.distributed.sampler_options

Sampler option types for configuring which sampler class to use in distributed loading.

Provides KHopNeighborSamplerOptions for using GiGL’s built-in DistNeighborSampler, and PPRSamplerOptions for PPR-based sampling using DistPPRNeighborSampler.

Both option types are frozen dataclasses, so they are safe to pickle across RPC boundaries (required for Graph Store mode).
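As a minimal sketch of why a frozen dataclass suits this role, the snippet below round-trips an options object through pickle and shows that assignment after construction is rejected. `StandInPPROptions` is a hypothetical stand-in that copies a few documented field names and defaults; it is not the GiGL class.

```python
import pickle
from dataclasses import FrozenInstanceError, dataclass


# Hypothetical stand-in with a few of the documented PPR fields;
# this is NOT gigl.distributed.sampler_options.PPRSamplerOptions.
@dataclass(frozen=True)
class StandInPPROptions:
    alpha: float = 0.5
    eps: float = 1e-4
    max_ppr_nodes: int = 50


options = StandInPPROptions(alpha=0.2)

# Round-trip through pickle, as happens when an object crosses a process boundary.
restored = pickle.loads(pickle.dumps(options))
roundtrip_ok = restored == options

# Frozen instances reject attribute assignment after construction.
try:
    options.alpha = 0.9
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Because the instance cannot change after it is built, the copy received on the other side of the boundary stays equal to the one that was sent.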

Attributes

SamplerOptions

logger

Classes

KHopNeighborSamplerOptions

Default sampler options using GiGL's DistNeighborSampler.

PPRSamplerOptions

Sampler options for PPR-based neighbor sampling using DistPPRNeighborSampler.

Functions

resolve_sampler_options(num_neighbors, sampler_options)

Resolve sampler_options from user-provided values.

Module Contents

class gigl.distributed.sampler_options.KHopNeighborSamplerOptions

Default sampler options using GiGL’s DistNeighborSampler.

num_neighbors

Fanout per hop, either a flat list (homogeneous) or a dict mapping edge types to per-hop fanout lists (heterogeneous).

num_neighbors: list[int] | dict[graphlearn_torch.typing.EdgeType, list[int]]

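The two accepted shapes of num_neighbors can be illustrated with a small sketch. `StandInKHopOptions`, the `num_hops` helper, and the example node and relation names are hypothetical; the sketch also assumes an edge type is a (source node type, relation, destination node type) tuple of strings.

```python
from dataclasses import dataclass
from typing import Union

# Assumption: an edge type is a (source type, relation, destination type) tuple.
EdgeType = tuple[str, str, str]


# Hypothetical stand-in mirroring the documented num_neighbors field;
# this is NOT the GiGL KHopNeighborSamplerOptions class.
@dataclass(frozen=True)
class StandInKHopOptions:
    num_neighbors: Union[list[int], dict[EdgeType, list[int]]]


# Homogeneous: two hops, up to 10 neighbors at hop 1 and 5 at hop 2.
homogeneous = StandInKHopOptions(num_neighbors=[10, 5])

# Heterogeneous: one per-hop fanout list for each edge type.
heterogeneous = StandInKHopOptions(
    num_neighbors={
        ("user", "follows", "user"): [10, 5],
        ("user", "likes", "item"): [20, 10],
    }
)


def num_hops(options: StandInKHopOptions) -> int:
    """Return the number of hops implied by the fanout configuration."""
    fanout = options.num_neighbors
    if isinstance(fanout, dict):
        return max(len(per_hop) for per_hop in fanout.values())
    return len(fanout)
```

In both shapes the list length is the number of hops, and each entry caps how many neighbors are sampled at that hop.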
class gigl.distributed.sampler_options.PPRSamplerOptions

Sampler options for PPR-based neighbor sampling using DistPPRNeighborSampler.

Output format: When this sampler is active, each output Data/HeteroData batch contains only PPR edges — no message-passing edges from the original graph are included. For each (seed_type, neighbor_type) pair reachable via PPR walks, the batch will have an edge type (seed_type, "ppr", neighbor_type) with:

  • edge_index: [2, N] int64 — row 0 is local seed indices, row 1 is local neighbor indices.

  • edge_attr: [N] float — PPR score for each (seed, neighbor) pair.

For homogeneous graphs these live directly on data.edge_index / data.edge_attr.
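To make the layout concrete, the sketch below assembles edge_index and edge_attr from per-seed PPR results. It uses plain Python lists in place of tensors, and the `ppr_results` values are invented for illustration only.

```python
# Hypothetical PPR results: local seed index -> list of (local neighbor index, score).
ppr_results = {
    0: [(0, 0.52), (3, 0.21), (1, 0.09)],
    1: [(1, 0.48), (2, 0.17)],
}

seed_row: list[int] = []
neighbor_row: list[int] = []
edge_attr: list[float] = []

# Emit one PPR edge per (seed, neighbor) pair, keeping the three lists aligned.
for seed, scored_neighbors in ppr_results.items():
    for neighbor, score in scored_neighbors:
        seed_row.append(seed)
        neighbor_row.append(neighbor)
        edge_attr.append(score)

# Shape [2, N]: row 0 holds local seed indices, row 1 holds local neighbor indices.
edge_index = [seed_row, neighbor_row]
```

Column i of edge_index and entry i of edge_attr describe the same (seed, neighbor) pair, so N here is 5.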

alpha

Restart probability (teleport probability back to seed). Higher values keep samples closer to seeds. Typical values: 0.15-0.25.

eps

Convergence threshold for the Forward Push algorithm. Smaller values give more accurate PPR scores but require more computation. Typical values: 1e-4 to 1e-6.

max_ppr_nodes

Maximum number of nodes to return per seed based on PPR scores.

num_neighbors_per_hop

Maximum number of neighbors fetched per node per edge type during PPR traversal. Set large to approximate fetching all neighbors.

total_degree_dtype

Dtype for precomputed total-degree tensors. Defaults to torch.int32, which supports total degrees up to ~2 billion. Use a larger dtype if nodes have exceptionally high aggregate degrees.
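The roles of alpha, eps, and max_ppr_nodes can be seen in a textbook, single-machine version of Forward Push. This is a sketch under stated assumptions, not DistPPRNeighborSampler's implementation: the graph is a plain adjacency dict, num_neighbors_per_hop and total_degree_dtype are not modeled, and the names `forward_push` and `top_ppr_nodes` are hypothetical.

```python
from collections import deque


def forward_push(adjacency, seed, alpha=0.5, eps=1e-4):
    """Approximate PPR scores for one seed with textbook Forward Push."""
    estimate = {}            # approximate PPR score per node
    residual = {seed: 1.0}   # probability mass not yet pushed
    queue = deque([seed])
    queued = {seed}

    while queue:
        node = queue.popleft()
        queued.discard(node)
        neighbors = adjacency.get(node, [])
        mass = residual.get(node, 0.0)
        # Push only while the residual is large relative to the node's degree.
        if mass < eps * max(len(neighbors), 1):
            continue
        residual[node] = 0.0
        # Keep a fraction alpha at the node (the restart share).
        estimate[node] = estimate.get(node, 0.0) + alpha * mass
        if neighbors:
            # Spread the remaining mass evenly over the neighbors.
            share = (1.0 - alpha) * mass / len(neighbors)
            for neighbor in neighbors:
                residual[neighbor] = residual.get(neighbor, 0.0) + share
                if neighbor not in queued:
                    queue.append(neighbor)
                    queued.add(neighbor)
    return estimate


def top_ppr_nodes(scores, max_ppr_nodes):
    """Keep the highest-scoring nodes, mirroring the max_ppr_nodes cap."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_ppr_nodes]


# Small undirected example graph (hypothetical data).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
scores = forward_push(graph, seed=0)
top_nodes = top_ppr_nodes(scores, max_ppr_nodes=2)
```

A larger alpha keeps more mass at each visited node, so scores concentrate near the seed. A smaller eps lets pushes continue on smaller residuals, which tightens the approximation at the cost of more work.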

alpha: float = 0.5
eps: float = 0.0001
max_ppr_nodes: int = 50
num_neighbors_per_hop: int = 100000
total_degree_dtype: torch.dtype = torch.int32

gigl.distributed.sampler_options.resolve_sampler_options(num_neighbors, sampler_options)

Resolve sampler_options from user-provided values.

If sampler_options is a PPRSamplerOptions, returns it directly (num_neighbors is unused for PPR). If sampler_options is None, wraps num_neighbors in a KHopNeighborSamplerOptions. If sampler_options is a KHopNeighborSamplerOptions, validates that its num_neighbors matches the explicit num_neighbors argument and raises ValueError on a mismatch.

Parameters:
  • num_neighbors (Union[list[int], dict[graphlearn_torch.typing.EdgeType, list[int]]]) – Fanout per hop (required for KHop; ignored for PPR).

  • sampler_options (Optional[SamplerOptions]) – Sampler configuration, or None.

Returns:

The resolved SamplerOptions.

Raises:

ValueError – If KHopNeighborSamplerOptions.num_neighbors conflicts with the explicit num_neighbors.

Return type:

SamplerOptions
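The three documented branches can be sketched with stand-in types. Everything named `StandIn...` and `resolve_stand_in` is hypothetical and only mirrors the behavior described above; the sketch also assumes SamplerOptions is a union of the two option classes, which this page does not state.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Assumption: an edge type is a (source type, relation, destination type) tuple.
EdgeType = tuple[str, str, str]
NumNeighbors = Union[list[int], dict[EdgeType, list[int]]]


@dataclass(frozen=True)
class StandInKHopOptions:
    num_neighbors: NumNeighbors


@dataclass(frozen=True)
class StandInPPROptions:
    alpha: float = 0.5
    eps: float = 1e-4
    max_ppr_nodes: int = 50


# Assumption: the real SamplerOptions is a union like this one.
StandInSamplerOptions = Union[StandInKHopOptions, StandInPPROptions]


def resolve_stand_in(
    num_neighbors: NumNeighbors,
    sampler_options: Optional[StandInSamplerOptions],
) -> StandInSamplerOptions:
    """Mirror the documented resolution rules using the stand-in types."""
    # PPR options pass through unchanged; num_neighbors is ignored.
    if isinstance(sampler_options, StandInPPROptions):
        return sampler_options
    # No options given: wrap the explicit fanout in k-hop options.
    if sampler_options is None:
        return StandInKHopOptions(num_neighbors=num_neighbors)
    # K-hop options given: they must agree with the explicit fanout.
    if sampler_options.num_neighbors != num_neighbors:
        raise ValueError(
            f"num_neighbors {num_neighbors} conflicts with "
            f"sampler_options.num_neighbors {sampler_options.num_neighbors}"
        )
    return sampler_options


wrapped = resolve_stand_in([10, 5], None)
ppr = StandInPPROptions(alpha=0.2)
passthrough = resolve_stand_in([10, 5], ppr)

try:
    resolve_stand_in([10, 5], StandInKHopOptions(num_neighbors=[3, 3]))
    conflict_raised = False
except ValueError:
    conflict_raised = True
```

Under these rules a caller that passes only num_neighbors gets k-hop options without constructing them by hand, while a mismatch between the two arguments fails instead of silently picking one.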

gigl.distributed.sampler_options.SamplerOptions

gigl.distributed.sampler_options.logger