gigl.common.beam.coders
Classes
PassthroughCoder | A dummy coder that passes the value through without any special processing
RecordBatchToTFExampleCoderFn | Encode pyarrow.RecordBatch to serialized tf.train.Example(s)
RuntimeTFExampleProtoCoderFn | Can be used at runtime to encode messages as tf.Example protos
Module Contents
- class gigl.common.beam.coders.PassthroughCoder
Bases:
apache_beam.coders.Coder
A dummy coder that passes the value through without any special processing.
- encode(value)
Encodes the given object into a byte string.
- Parameters:
value (Any)
- Return type:
bytes
- is_deterministic()
Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, the default coder, the PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order. Such a coder is in general not suitable as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
- Returns:
Whether coder is deterministic.
- Return type:
bool
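To illustrate why determinism matters for key coders, here is a minimal sketch that uses only the Python standard library rather than Beam itself. It shows two equal dicts that pickle to different bytes because of their insertion order, and a sorted-key JSON encoding that avoids the problem. The helper name `deterministic_encode` is hypothetical and is not part of GiGL or Beam.

```python
import json
import pickle

# Two dicts that compare equal but were built in a different insertion order.
key_a = {"user": 1, "item": 2}
key_b = {"item": 2, "user": 1}

# pickle writes entries in iteration order, so equal keys can yield different bytes.
same_pickle_bytes = pickle.dumps(key_a) == pickle.dumps(key_b)


def deterministic_encode(value: dict) -> bytes:
    """Hypothetical helper: sort keys so that equal dicts encode to equal bytes."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


same_sorted_bytes = deterministic_encode(key_a) == deterministic_encode(key_b)
```

A grouping step that compares encoded keys would treat `key_a` and `key_b` as different keys under the pickle encoding, but as the same key under the sorted encoding.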
- class gigl.common.beam.coders.RecordBatchToTFExampleCoderFn
Bases:
apache_beam.DoFn
Encode pyarrow.RecordBatch to serialized tf.train.Example(s)
- process(element, transformed_metadata, *args, **kwargs)
transformed_metadata must be passed in as a side input, i.e., as an argument of process rather than of the class constructor, because it may only materialize after the class is constructed (depending on whether it is read from file or built by tft_beam.AnalyzeDataset).
- Parameters:
element (pa.RecordBatch) – A batch of records, e.g., a batch of transformed features
transformed_metadata (tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata) – Contains the schema that RecordBatchToExamplesEncoder needs for encoding
- Yields:
bytes – serialized tf.Example
- Return type:
Iterable[bytes]
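The reason for passing transformed_metadata to process can be sketched without Beam. The following is a minimal, hypothetical illustration: the class `EncodeFn` and the list-based "schema" are assumptions made for this example, not GiGL or Beam APIs. It shows a processing object that is constructed before the schema exists and receives it only at call time.

```python
from typing import Dict, Iterable, List, Optional


class EncodeFn:
    """Hypothetical stand-in for a DoFn; it stores no schema of its own."""

    def process(self, element: Dict[str, int], schema: List[str]) -> Iterable[bytes]:
        # The schema arrives with each call, so it only has to exist at processing time.
        row = ",".join(str(element[name]) for name in schema)
        yield row.encode("utf-8")


# The function object is constructed before the schema exists ...
fn = EncodeFn()
schema: Optional[List[str]] = None

# ... and the schema is computed later, e.g. by an analysis step.
schema = ["age", "height"]

encoded = list(fn.process({"height": 180, "age": 30}, schema))
```

In a Beam pipeline, side inputs are typically supplied as extra arguments to `beam.ParDo`, for example wrapped in `beam.pvalue.AsSingleton`. The exact wiring GiGL uses is not shown on this page.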
- class gigl.common.beam.coders.RuntimeTFExampleProtoCoderFn
Bases:
apache_beam.DoFn
Can be used at runtime to encode messages as tf.Example protos
- process(element, transformed_metadata, *args, **kwargs)
transformed_metadata must be passed in as an argument of process rather than of the class constructor. It is a side input, which only materializes as the actual transformed metadata when it is passed to process.
- Parameters:
element (Dict[str, tensorflow_transform.common_types.TensorType]) – TfExample instance dict
transformed_metadata (tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata) – Used to generate the ExampleProtoCoder
- Yields:
tf.Example – Encoded tf.Example
- Return type:
Iterable[tensorflow.train.Example]