gigl.src.data_preprocessor.lib.data_preprocessor_config#
Attributes#
Classes#
DataPreprocessorConfig | Users should inherit from this and define the relevant specs for their preprocessing job.
Functions#
build_ingestion_feature_spec_fn | Returns a callable which, when called, generates the FeatureSpecDict that tells TFTransform how to interpret input data as tensors.
build_passthrough_transform_preprocessing_fn | Produces a callable which acts as a pass-through preprocessing_fn for TFT to use.
Module Contents#
- class gigl.src.data_preprocessor.lib.data_preprocessor_config.DataPreprocessorConfig[source]#
Bases:
abc.ABC
Users should inherit from this and define the relevant specs for their preprocessing job.
- abstract get_edges_preprocessing_spec()[source]#
Defines the transformation specs for the different edge types.
- abstract get_nodes_preprocessing_spec()[source]#
Defines the transformation specs for the different node types.
- prepare_for_pipeline(applied_task_identifier)[source]#
Called at the very start of the pipeline, before the enumerator and the data preprocessor run. It returns nothing. Override it to perform any setup needed before the pipeline runs, such as gathering data for node and edge sources.
- Parameters:
applied_task_identifier (gigl.src.common.types.AppliedTaskIdentifier)
- Return type:
None
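The subclassing contract described above can be sketched as follows. To keep the example runnable without GiGL installed, it defines a local stand-in base class that mirrors only the three documented methods. The class names, the plain-dictionary return values, the string task identifier, and the `prepared_for` attribute are all hypothetical, since this page does not document the return types of the two spec methods.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class DataPreprocessorConfigSketch(ABC):
    """Local stand-in mirroring the documented interface (NOT the real GiGL class)."""

    def prepare_for_pipeline(self, applied_task_identifier: str) -> None:
        # Optional hook. Assumed here to be a no-op unless a subclass overrides it.
        return None

    @abstractmethod
    def get_nodes_preprocessing_spec(self) -> Dict[str, Any]:
        """Return the transformation specs for each node type (return type assumed)."""

    @abstractmethod
    def get_edges_preprocessing_spec(self) -> Dict[str, Any]:
        """Return the transformation specs for each edge type (return type assumed)."""


class ToyGraphConfig(DataPreprocessorConfigSketch):
    """Hypothetical user config for a graph with one node type and one edge type."""

    def __init__(self) -> None:
        self.prepared_for: Optional[str] = None

    def prepare_for_pipeline(self, applied_task_identifier: str) -> None:
        # A real config might gather or stage node and edge source data here.
        self.prepared_for = applied_task_identifier

    def get_nodes_preprocessing_spec(self) -> Dict[str, Any]:
        return {"user": {"features": ["age"]}}

    def get_edges_preprocessing_spec(self) -> Dict[str, Any]:
        return {"user-follows-user": {"features": ["weight"]}}


config = ToyGraphConfig()
config.prepare_for_pipeline("toy_job")
node_specs = config.get_nodes_preprocessing_spec()
edge_specs = config.get_edges_preprocessing_spec()
```

Because both spec methods are abstract, instantiating the base class directly raises `TypeError`; only `prepare_for_pipeline` is optional to override.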
- gigl.src.data_preprocessor.lib.data_preprocessor_config.build_ingestion_feature_spec_fn(fixed_string_fields=None, fixed_string_field_shapes={}, fixed_float_fields=None, fixed_float_field_shapes={}, fixed_int_fields=None, fixed_int_field_shapes={}, varlen_string_fields=None, varlen_float_fields=None, varlen_int_fields=None)[source]#
Returns a callable which, when called, generates the FeatureSpecDict that tells TFTransform how to interpret input data as tensors.
- Parameters:
fixed_string_fields (Optional[List[str]]) – Fixed-length string features.
fixed_string_field_shapes (Dict[str, List[int]]) – Data shape lookup for fixed-length string features.
fixed_float_fields (Optional[List[str]]) – Fixed-length float features.
fixed_float_field_shapes (Dict[str, List[int]]) – Data shape lookup for fixed-length float features.
fixed_int_fields (Optional[List[str]]) – Fixed-length int features.
fixed_int_field_shapes (Dict[str, List[int]]) – Data shape lookup for fixed-length int features.
varlen_string_fields (Optional[List[str]]) – Variable-length string features.
varlen_float_fields (Optional[List[str]]) – Variable-length float features.
varlen_int_fields (Optional[List[str]]) – Variable-length int features.
- Returns:
A callable that takes no arguments and returns the FeatureSpecDict.
- Return type:
Callable[[], gigl.src.data_preprocessor.lib.types.FeatureSpecDict]
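A rough sketch of the builder pattern described above, under stated assumptions: the function returns a zero-argument closure that maps each field name to a feature spec. So that the example runs without TensorFlow, `FixedLen` and `VarLen` are local stand-ins for `tf.io.FixedLenFeature` and `tf.io.VarLenFeature`. The function name, the reduced parameter set, the dtype strings, and the scalar default shape `[]` for fields missing from the shape lookup are assumptions, not the library's confirmed behavior.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Union


class FixedLen(NamedTuple):
    """Stand-in for tf.io.FixedLenFeature(shape, dtype)."""
    shape: List[int]
    dtype: str


class VarLen(NamedTuple):
    """Stand-in for tf.io.VarLenFeature(dtype)."""
    dtype: str


FeatureSpec = Dict[str, Union[FixedLen, VarLen]]


def build_feature_spec_fn_sketch(
    fixed_float_fields: Optional[List[str]] = None,
    fixed_float_field_shapes: Optional[Dict[str, List[int]]] = None,
    varlen_int_fields: Optional[List[str]] = None,
) -> Callable[[], FeatureSpec]:
    """Hypothetical, reduced version covering only two of the nine parameters."""
    shapes = fixed_float_field_shapes or {}

    def feature_spec_fn() -> FeatureSpec:
        spec: FeatureSpec = {}
        for name in fixed_float_fields or []:
            # Assumption: fields without an explicit shape are treated as scalars.
            spec[name] = FixedLen(shape=shapes.get(name, []), dtype="float32")
        for name in varlen_int_fields or []:
            spec[name] = VarLen(dtype="int64")
        return spec

    return feature_spec_fn


spec_fn = build_feature_spec_fn_sketch(
    fixed_float_fields=["age", "embedding"],
    fixed_float_field_shapes={"embedding": [16]},
    varlen_int_fields=["visited_ids"],
)
spec = spec_fn()
```

Returning a callable rather than the dictionary itself defers construction of the spec until the pipeline actually needs it.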
- gigl.src.data_preprocessor.lib.data_preprocessor_config.build_passthrough_transform_preprocessing_fn()[source]#
Produces a callable which acts as a pass-through preprocessing_fn for TFT to use. In other words, it simply passes all keys available in the input onwards to the output.
See https://www.tensorflow.org/tfx/tutorials/transform/census#create_a_tftransform_preprocessing_fn/ for details.
- Return type:
Callable[[gigl.src.data_preprocessor.lib.types.TFTensorDict], gigl.src.data_preprocessor.lib.types.TFTensorDict]
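The pass-through behavior can be illustrated with a minimal sketch. The builder name below is hypothetical, and plain Python lists stand in for tensors so the example runs without TensorFlow; it shows only the idea that every input key is forwarded to the output unchanged.

```python
from typing import Any, Callable, Dict

TensorDictSketch = Dict[str, Any]  # stand-in for TFTensorDict


def build_passthrough_fn_sketch() -> Callable[[TensorDictSketch], TensorDictSketch]:
    """Hypothetical builder returning a preprocessing_fn that transforms nothing."""

    def preprocessing_fn(inputs: TensorDictSketch) -> TensorDictSketch:
        # Forward every key as-is; the values are not copied or modified.
        return {key: value for key, value in inputs.items()}

    return preprocessing_fn


passthrough = build_passthrough_fn_sketch()
inputs = {"node_id": [1, 2, 3], "age": [0.5, 0.1, 0.9]}
outputs = passthrough(inputs)
```

A pass-through function like this is useful when features are already in their final form and only need to flow through the transform step.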