gigl.src.inference.v1.lib.utils
Attributes
Classes
UnenumerateAssets | A transform object used to modify one or more PCollections.
Functions
get_inferencer_pipeline_component_for_single_node_type | Gets the Beam pipeline for running the inference Dataflow job.
Module Contents
- class gigl.src.inference.v1.lib.utils.UnenumerateAssets(tagged_output_key)
Bases: apache_beam.PTransform
A transform object used to modify one or more PCollections.
Subclasses must define an expand() method that will be used when the transform is applied to some arguments. Typical usage pattern will be:
input | CustomTransform(…)
The expand() method of the CustomTransform object passed in will be called with input as an argument.
- Parameters:
tagged_output_key (str)
- expand(pcolls)
Unenumerates node IDs by joining two PCollections. The first PCollection should contain the DEFAULT_NODE_ID_FIELD column and either the DEFAULT_PREDICTION_FIELD or the DEFAULT_EMBEDDING_FIELD column. The second PCollection should contain the DEFAULT_ENUMERATED_NODE_ID_FIELD and DEFAULT_ORIGINAL_NODE_ID_FIELD columns. The two PCollections are joined on the values in the DEFAULT_NODE_ID_FIELD and DEFAULT_ENUMERATED_NODE_ID_FIELD columns.
- Parameters:
pcolls (Tuple[apache_beam.pvalue.PCollection, apache_beam.pvalue.PCollection])
- Return type:
apache_beam.pvalue.PCollection
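The join described for expand() can be illustrated outside of Beam with plain Python lists of dict rows. This is only a sketch of the join semantics, not the library's implementation: the column-name strings below are hypothetical stand-ins for the DEFAULT_* constants (their real values are not shown on this page), the helper name unenumerate_rows is invented for illustration, and dropping rows that have no mapping entry is an assumption.

```python
from typing import Any, Dict, List

# Hypothetical stand-ins for the DEFAULT_* field constants (assumed values).
NODE_ID_FIELD = "node_id"
EMBEDDING_FIELD = "emb"
ENUMERATED_NODE_ID_FIELD = "int_id"
ORIGINAL_NODE_ID_FIELD = "orig_id"

Row = Dict[str, Any]


def unenumerate_rows(assets: List[Row], mapping: List[Row]) -> List[Row]:
    """Replace enumerated node IDs in `assets` with their original IDs.

    `assets` plays the role of the first PCollection (node ID plus a
    prediction or embedding); `mapping` plays the role of the second
    (enumerated ID to original ID).
    """
    # Index the mapping rows by enumerated ID so each lookup is O(1).
    enumerated_to_original = {
        row[ENUMERATED_NODE_ID_FIELD]: row[ORIGINAL_NODE_ID_FIELD]
        for row in mapping
    }

    joined: List[Row] = []
    for row in assets:
        enumerated_id = row[NODE_ID_FIELD]
        if enumerated_id not in enumerated_to_original:
            # Assumption: rows without a mapping entry are dropped (inner join).
            continue
        out = dict(row)  # copy so the input rows are left untouched
        out[NODE_ID_FIELD] = enumerated_to_original[enumerated_id]
        joined.append(out)
    return joined


assets = [
    {NODE_ID_FIELD: 0, EMBEDDING_FIELD: [0.1, 0.2]},
    {NODE_ID_FIELD: 1, EMBEDDING_FIELD: [0.3, 0.4]},
    {NODE_ID_FIELD: 7, EMBEDDING_FIELD: [0.5, 0.6]},  # no mapping entry
]
mapping = [
    {ENUMERATED_NODE_ID_FIELD: 0, ORIGINAL_NODE_ID_FIELD: "user_a"},
    {ENUMERATED_NODE_ID_FIELD: 1, ORIGINAL_NODE_ID_FIELD: "user_b"},
]
result = unenumerate_rows(assets, mapping)
```

In an actual Beam pipeline, a keyed join like this is commonly expressed by keying both PCollections on the join column and applying CoGroupByKey; whether UnenumerateAssets does so internally is not stated on this page.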
- gigl.src.inference.v1.lib.utils.get_inferencer_pipeline_component_for_single_node_type(gbml_config_pb_wrapper, inference_blueprint, applied_task_identifier, custom_worker_image_uri, node_type, uri_prefix_list, temp_predictions_gcs_path, temp_embeddings_gcs_path)
Gets the Beam pipeline for running the inference Dataflow job.
- Parameters:
gbml_config_pb_wrapper (gigl.src.common.types.pb_wrappers.gbml_config.GbmlConfigPbWrapper): GBML config wrapper for this inference run
inference_blueprint (gigl.src.inference.v1.lib.base_inference_blueprint.BaseInferenceBlueprint): Blueprint for running and saving inference for GBML pipelines
applied_task_identifier (gigl.src.common.types.AppliedTaskIdentifier): Identifier for the GiGL job
custom_worker_image_uri (Optional[str]): URI of the custom worker image
node_type (gigl.src.common.types.graph_data.NodeType): Node type being inferred
uri_prefix_list (List[gigl.common.Uri]): List of prefixes for running inference for the given node type
temp_predictions_gcs_path (Optional[gigl.common.GcsUri]): GCS URI for writing temporary predictions
temp_embeddings_gcs_path (Optional[gigl.common.GcsUri]): GCS URI for writing temporary embeddings
- Returns:
Dataflow pipeline for running inference
- Return type:
beam.Pipeline