Post Processor#
The Post Processor is the final stage in the GiGL pipeline. It runs after inference, performs shared cleanup, and can optionally execute user-defined post-processing logic.
Input#
job_name (AppliedTaskIdentifier): which uniquely identifies an end-to-end task.
task_config_uri (Uri): Path which points to a frozen
GbmlConfigproto yaml file.resource_config_uri (Uri): Path which points to a
GiGLResourceConfigyaml.
What does it do?#
The Post Processor undertakes the following actions:
Reads the frozen
GbmlConfig.When the in-memory subgraph sampling path is enabled (
featureFlags.should_run_glt_backend), automatically unenumerates inference outputs by joining the enumerated embedding or prediction tables with the enumeration mapping tables written by the Data Preprocessor.Instantiates the user-defined post processor class from
postProcessorConfig.postProcessorClsPath, if one is configured.Writes any returned evaluation metrics to the configured post-processor metrics path.
Cleans up temporary assets unless
should_skip_automatic_temp_asset_cleanupis set.
Why unenumeration lives here#
The Data Preprocessor enumerates node IDs into compact integers so later stages can use more efficient internal IDs. For in-memory inference, embeddings and predictions are written for those enumerated IDs first. The Post Processor then restores the original user-facing IDs before downstream systems consume the outputs.
This makes the Post Processor part of the core in-memory subgraph sampling architecture rather than an optional afterthought.
How do I run it?#
Import GiGL
from gigl.src.post_process.post_processor import PostProcessor
from gigl.common import UriFactory
from gigl.src.common.types import AppliedTaskIdentifier
post_processor = PostProcessor()
post_processor.run(
applied_task_identifier=AppliedTaskIdentifier("sample_job_name"),
task_config_uri=UriFactory.create_uri("gs://MY TEMP ASSETS BUCKET/frozen_task_config.yaml"),
resource_config_uri=UriFactory.create_uri("gs://MY TEMP ASSETS BUCKET/resource_config.yaml"),
)
Command Line
python -m \
gigl.src.post_process.post_processor \
--job_name="sample_job_name" \
--task_config_uri="gs://MY TEMP ASSETS BUCKET/frozen_task_config.yaml" \
--resource_config_uri="gs://MY TEMP ASSETS BUCKET/resource_config.yaml"
Output#
The Post Processor may produce:
unenumerated embedding tables,
unenumerated prediction tables,
user-defined post-processed assets,
post-processor evaluation metrics.
Custom Usage#
If postProcessorConfig.postProcessorClsPath is set, GiGL instantiates that class and runs it after shared
unenumeration logic completes. Custom post processors should inherit from
gigl.src.post_process.lib.base_post_processor.BasePostProcessor.
Other#
The shared unenumeration logic is currently tied to the in-memory subgraph sampling path.
If no user-defined post processor is configured, GiGL still runs the shared cleanup and automatic unenumeration steps for in-memory sampling outputs.