gigl.src.data_preprocessor.lib.data_preprocessor_config

Attributes

logger

Classes

DataPreprocessorConfig

Users should inherit from this and define the relevant specs for their preprocessing job.

Functions

build_ingestion_feature_spec_fn([fixed_string_fields, ...])

Returns a callable, which when called, generates the FeatureSpecDict which lets TFTransform know how to construe input data as tensors.

build_passthrough_transform_preprocessing_fn()

Produces a callable which acts as a pass-through preprocessing_fn for TFT to use. In other words, it simply passes all keys available in the input onwards to the output.

Module Contents

class gigl.src.data_preprocessor.lib.data_preprocessor_config.DataPreprocessorConfig

Bases: abc.ABC

Users should inherit from this and define the relevant specs for their preprocessing job.

abstract get_edges_preprocessing_spec()

Defines the transformations to apply to each edge type, keyed by edge data reference.

Return type:

Dict[gigl.src.data_preprocessor.lib.ingest.reference.EdgeDataReference, gigl.src.data_preprocessor.lib.types.EdgeDataPreprocessingSpec]

abstract get_nodes_preprocessing_spec()

Defines the transformations to apply to each node type, keyed by node data reference.

Return type:

Dict[gigl.src.data_preprocessor.lib.ingest.reference.NodeDataReference, gigl.src.data_preprocessor.lib.types.NodeDataPreprocessingSpec]

prepare_for_pipeline(applied_task_identifier)

Called at the very start of the pipeline, before the enumerator and the data preprocessor run. It does not return anything. Override it to perform any operation needed before the pipeline runs, such as gathering data for node and edge sources.

Parameters:

applied_task_identifier (gigl.src.common.types.AppliedTaskIdentifier)

Return type:

None
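The subclassing pattern described for this class can be sketched as follows. This is a minimal illustration, not GiGL's actual code: `DataPreprocessorConfigSketch` is a hypothetical stand-in that mirrors only the three methods documented above, and plain strings stand in for `NodeDataReference`, `EdgeDataReference`, the preprocessing spec types, and `AppliedTaskIdentifier`, whose constructors are not shown on this page.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DataPreprocessorConfigSketch(ABC):
    """Hypothetical stand-in mirroring the interface documented above."""

    @abstractmethod
    def get_nodes_preprocessing_spec(self) -> Dict[str, str]:
        """Map each node data reference to its preprocessing spec."""

    @abstractmethod
    def get_edges_preprocessing_spec(self) -> Dict[str, str]:
        """Map each edge data reference to its preprocessing spec."""

    def prepare_for_pipeline(self, applied_task_identifier: str) -> None:
        """Optional hook; the default does nothing."""
        return None


class ExampleConfig(DataPreprocessorConfigSketch):
    """Example subclass; all names and values here are illustrative."""

    def __init__(self) -> None:
        self.prepared_for: Optional[str] = None

    def prepare_for_pipeline(self, applied_task_identifier: str) -> None:
        # Setup work (for example, staging source tables) would go here.
        self.prepared_for = applied_task_identifier

    def get_nodes_preprocessing_spec(self) -> Dict[str, str]:
        # Strings stand in for NodeDataReference and NodeDataPreprocessingSpec.
        return {"user_nodes_reference": "user_nodes_spec"}

    def get_edges_preprocessing_spec(self) -> Dict[str, str]:
        # Strings stand in for EdgeDataReference and EdgeDataPreprocessingSpec.
        return {"user_to_user_edges_reference": "user_to_user_edges_spec"}
```

Because the two spec methods are abstract, a subclass that omits either one cannot be instantiated, while `prepare_for_pipeline` may be left at its default.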

gigl.src.data_preprocessor.lib.data_preprocessor_config.build_ingestion_feature_spec_fn(fixed_string_fields=None, fixed_string_field_shapes={}, fixed_float_fields=None, fixed_float_field_shapes={}, fixed_int_fields=None, fixed_int_field_shapes={}, varlen_string_fields=None, varlen_float_fields=None, varlen_int_fields=None)

Returns a callable, which when called, generates the FeatureSpecDict which lets TFTransform know how to construe input data as tensors.

Parameters:
  • fixed_string_fields (Optional[List[str]]) – Fixed-length string features.

  • fixed_string_field_shapes (Dict[str, List[int]]) – Data shape lookup for fixed-length string features.

  • fixed_float_fields (Optional[List[str]]) – Fixed-length float features.

  • fixed_float_field_shapes (Dict[str, List[int]]) – Data shape lookup for fixed-length float features.

  • fixed_int_fields (Optional[List[str]]) – Fixed-length int features.

  • fixed_int_field_shapes (Dict[str, List[int]]) – Data shape lookup for fixed-length int features.

  • varlen_string_fields (Optional[List[str]]) – Variable-length string features.

  • varlen_float_fields (Optional[List[str]]) – Variable-length float features.

  • varlen_int_fields (Optional[List[str]]) – Variable-length int features.

Returns:

A callable that takes no arguments and returns the FeatureSpecDict.

Return type:

Callable[[], gigl.src.data_preprocessor.lib.types.FeatureSpecDict]
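The builder pattern described above, a function that captures field lists and returns a zero-argument callable producing the spec, might look roughly like the sketch below. Everything in it is an assumption made for illustration: `FixedLen` and `VarLen` are hypothetical stand-ins for TensorFlow feature descriptors such as `tf.io.FixedLenFeature` and `tf.io.VarLenFeature`, only three of the nine parameters are covered, and the empty default shape is a guess rather than documented behavior.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple, Union


class FixedLen(NamedTuple):
    """Hypothetical stand-in for a fixed-length feature descriptor."""
    shape: Tuple[int, ...]
    dtype: str


class VarLen(NamedTuple):
    """Hypothetical stand-in for a variable-length feature descriptor."""
    dtype: str


FeatureSpecSketch = Dict[str, Union[FixedLen, VarLen]]


def build_feature_spec_fn_sketch(
    fixed_float_fields: Optional[List[str]] = None,
    fixed_float_field_shapes: Optional[Dict[str, List[int]]] = None,
    varlen_int_fields: Optional[List[str]] = None,
) -> Callable[[], FeatureSpecSketch]:
    # Copy the arguments so later mutation by the caller has no effect.
    fixed = list(fixed_float_fields or [])
    shapes = dict(fixed_float_field_shapes or {})
    varlen = list(varlen_int_fields or [])

    def feature_spec_fn() -> FeatureSpecSketch:
        spec: FeatureSpecSketch = {}
        for name in fixed:
            # Assumption: fields without an explicit shape are scalars.
            spec[name] = FixedLen(tuple(shapes.get(name, [])), "float32")
        for name in varlen:
            spec[name] = VarLen("int64")
        return spec

    return feature_spec_fn


# Illustrative field names; none of these come from GiGL.
spec_fn = build_feature_spec_fn_sketch(
    fixed_float_fields=["age", "embedding"],
    fixed_float_field_shapes={"embedding": [16]},
    varlen_int_fields=["friend_ids"],
)
spec = spec_fn()
```

Returning a callable rather than the spec itself defers construction until the pipeline asks for it, which is presumably why the documented function has the return type `Callable[[], FeatureSpecDict]`.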

gigl.src.data_preprocessor.lib.data_preprocessor_config.build_passthrough_transform_preprocessing_fn()

Produces a callable which acts as a pass-through preprocessing_fn for TFT to use. In other words, it simply passes all keys available in the input onwards to the output.

See https://www.tensorflow.org/tfx/tutorials/transform/census#create_a_tftransform_preprocessing_fn/ for details.

Return type:

Callable[[gigl.src.data_preprocessor.lib.types.TFTensorDict], gigl.src.data_preprocessor.lib.types.TFTensorDict]
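As a rough illustration of what "pass-through" means here, the sketch below builds a preprocessing function that forwards every input key unchanged. It uses a plain dictionary in place of `TFTensorDict`, and the name `build_passthrough_fn_sketch` is hypothetical; whether the real function returns the same dictionary or a copy is not stated on this page.

```python
from typing import Any, Callable, Dict

# Plain-dict stand-in for TFTensorDict (feature name -> tensor).
TensorDictSketch = Dict[str, Any]


def build_passthrough_fn_sketch() -> Callable[[TensorDictSketch], TensorDictSketch]:
    def preprocessing_fn(inputs: TensorDictSketch) -> TensorDictSketch:
        # Forward every key and value as-is; no feature is transformed or dropped.
        return {key: value for key, value in inputs.items()}

    return preprocessing_fn


passthrough_fn = build_passthrough_fn_sketch()
example_inputs = {"node_id": [1, 2, 3], "score": [0.5, 0.25, 1.0]}
example_outputs = passthrough_fn(example_inputs)
```

A function like this is useful when features are already in their final form and the pipeline only needs TFT to read and write them.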

gigl.src.data_preprocessor.lib.data_preprocessor_config.logger