gigl.common.beam.coders

Classes

| Class | Description |
| --- | --- |
| PassthroughCoder | Used as a dummy coder to just pass through the value without any special processing |
| RecordBatchToTFExampleCoderFn | Encodes a pyarrow.RecordBatch to serialized tf.train.Example(s) |
| RuntimeTFExampleProtoCoderFn | Can be used at runtime to encode messages to tf.Example proto messages |

Module Contents
- class gigl.common.beam.coders.PassthroughCoder[source]

  Bases: apache_beam.coders.Coder

  Used as a dummy coder to just pass through the value without any special processing.

  - encode(value)[source]

    Encodes the given object into a byte string.

    Parameters:
    - value (Any)

    Return type:
    - bytes
 
  - is_deterministic()[source]

    Whether this coder is guaranteed to encode values deterministically.

    A deterministic coder is required for key coders in GroupByKey operations to produce consistent results. For example, the default PickleCoder is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order, so such a coder is in general not suitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

    Returns:
    - Whether the coder is deterministic.

    Return type:
    - bool
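The PickleCoder caveat can be demonstrated concretely: two dicts that compare equal but were built in different insertion orders pickle to different byte strings, so equal keys would not be grouped together.

```python
import pickle

# Equal dicts, different insertion order: pickle serializes entries in
# iteration (insertion) order, so the encodings differ even though the
# values compare equal -- which is why a pickle-based coder is not a
# safe key coder for GroupByKey.
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
same_value = d1 == d2                              # True
same_bytes = pickle.dumps(d1) == pickle.dumps(d2)  # False
```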
 
 
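The passthrough behavior can be sketched as follows. This is a minimal illustration of the identity encode/decode pattern, assuming values are already bytes; it is not gigl's implementation, and a real Beam coder would subclass apache_beam.coders.Coder.

```python
# Minimal sketch of a passthrough coder (an illustration of the pattern,
# not gigl's PassthroughCoder implementation).
class PassthroughCoderSketch:
    def encode(self, value):
        # No serialization work: the value is assumed to already be bytes.
        return value

    def decode(self, encoded):
        # Decoding is likewise the identity.
        return encoded

    def is_deterministic(self):
        # Identity encoding maps equal inputs to identical bytes, so such
        # a coder is safe to use as a key coder in GroupByKey.
        return True


coder = PassthroughCoderSketch()
payload = coder.decode(coder.encode(b"already-serialized payload"))
```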
- class gigl.common.beam.coders.RecordBatchToTFExampleCoderFn[source]

  Bases: apache_beam.DoFn

  Encodes a pyarrow.RecordBatch to serialized tf.train.Example(s).

  - process(element, transformed_metadata, *args, **kwargs)[source]

    Note that transformed_metadata must be passed in as a side input, i.e., as an argument of the process function, rather than to the class init, since it may only materialize (depending on whether it is read from file or built by tft_beam.AnalyzeDataset) after the class is constructed.

    Parameters:
    - element (pa.RecordBatch) – A batch of records, e.g., a batch of transformed features
    - transformed_metadata (tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata) – Contains the schema needed by RecordBatchToExamplesEncoder for encoding

    Yields:
    - bytes – Serialized tf.Example

    Return type:
    - Iterable[bytes]
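The reason transformed_metadata is a process() argument rather than an init argument can be illustrated with a stripped-down DoFn-style class. This is a sketch only; MetadataSketch and EncoderSketch are hypothetical stand-ins, not gigl or tensorflow_transform APIs.

```python
# Stripped-down illustration of the side-input pattern: the DoFn is
# constructed before the metadata exists, so the encoder can only be
# built lazily, inside process(), from the side input.
class RecordBatchEncoderSketch:
    def __init__(self):
        # Deliberately no metadata here -- it has not materialized yet.
        self._encoder = None

    def process(self, element, transformed_metadata):
        if self._encoder is None:
            # Build the encoder from the side input on first use.
            self._encoder = transformed_metadata.make_encoder()
        # One input batch yields many serialized records.
        yield from self._encoder.encode(element)


class EncoderSketch:
    def encode(self, batch):
        # Toy serialization: one bytes record per row in the batch.
        return [repr(row).encode() for row in batch]


class MetadataSketch:
    def make_encoder(self):
        return EncoderSketch()


fn = RecordBatchEncoderSketch()
records = list(fn.process([{"f": 1}, {"f": 2}], MetadataSketch()))
```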
 
 
- class gigl.common.beam.coders.RuntimeTFExampleProtoCoderFn[source]

  Bases: apache_beam.DoFn

  Can be used at runtime to encode messages to tf.Example proto messages.

  - process(element, transformed_metadata, *args, **kwargs)[source]

    Note that transformed_metadata must be passed to process rather than to the class init. The transformed_metadata that gets passed in is a side input, which only materializes as the true transformed metadata when passed as part of process.

    Parameters:
    - element (dict[str, tensorflow_transform.common_types.TensorType]) – A tf.Example instance dict
    - transformed_metadata (tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata) – Used to generate the ExampleProtoCoder

    Yields:
    - tf.Example – Encoded tf.Example

    Return type:
    - Iterable[tensorflow.train.Example]
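Since the metadata only becomes available at process() time, a natural pattern is to construct the coder lazily on the first element and reuse it afterwards. The sketch below shows that build-once pattern with a hypothetical build_coder factory standing in for the real ExampleProtoCoder construction; none of these names are gigl or tensorflow_transform APIs.

```python
# Toy stand-in for a runtime proto-coder DoFn (not the gigl code). The
# hypothetical build_coder factory plays the role of constructing an
# ExampleProtoCoder from the transformed_metadata side input.
class RuntimeProtoCoderSketch:
    def __init__(self, build_coder):
        self._build_coder = build_coder  # hypothetical factory
        self._coder = None
        self.builds = 0  # counts coder constructions

    def process(self, element, transformed_metadata):
        if self._coder is None:
            # The metadata only materializes at process() time, so the
            # coder is constructed here -- once -- and then reused.
            self._coder = self._build_coder(transformed_metadata)
            self.builds += 1
        yield self._coder(element)


def build_coder(metadata):
    # Hypothetical "proto coder": serializes an instance dict to sorted
    # key=value bytes instead of a real tf.train.Example.
    def encode(instance_dict):
        return ";".join(
            f"{k}={instance_dict[k]}" for k in sorted(instance_dict)
        ).encode()
    return encode


fn = RuntimeProtoCoderSketch(build_coder)
batch = [{"b": 2, "a": 1}, {"a": 3}]
encoded = [next(fn.process(d, {"schema": "toy"})) for d in batch]
```

The coder is built exactly once even though process() runs per element, which is the behavior the side-input note above motivates.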
 
 
