gigl.common.beam.coders#

Classes#

PassthroughCoder

A dummy coder that passes values through without any special processing

RecordBatchToTFExampleCoderFn

Encode pyarrow.RecordBatch to serialized tf.train.Example(s)

RuntimeTFExampleProtoCoderFn

Can be used at runtime to encode messages into tf.Example proto messages

Module Contents#

class gigl.common.beam.coders.PassthroughCoder[source]#

Bases: apache_beam.coders.Coder

A dummy coder that passes values through without any special processing

decode(encoded)[source]#

Decodes the given byte string into the corresponding object.

encode(value)[source]#

Encodes the given object into a byte string.

Parameters:

value (Any)

Return type:

bytes

is_deterministic()[source]#

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, note that the default coder, the PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order, so such a coder is not, in general, suitable for use as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:

Whether coder is deterministic.

Return type:

bool
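As a dependency-free sketch (not the actual gigl implementation, which subclasses apache_beam.coders.Coder), a passthrough coder simply returns its input unchanged from both encode and decode:

```python
# Minimal sketch of the PassthroughCoder interface, using only the
# standard library; the real class subclasses apache_beam.coders.Coder.
class PassthroughCoder:
    """No-op coder: values are assumed to already be byte strings."""

    def encode(self, value):
        # Encoding is a pass-through; the value is returned as-is.
        return value

    def decode(self, encoded):
        # Decoding is likewise a no-op.
        return encoded

    def is_deterministic(self):
        # Identical inputs always yield identical encodings, so this
        # coder is safe to use as a key coder in GroupByKey.
        return True
```

Because encode is the identity function, the coder is trivially deterministic, unlike the PickleCoder discussed above.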

class gigl.common.beam.coders.RecordBatchToTFExampleCoderFn[source]#

Bases: apache_beam.DoFn

Encode pyarrow.RecordBatch to serialized tf.train.Example(s)

process(element, transformed_metadata, *args, **kwargs)[source]#

Note that transformed_metadata must be passed in as a side input, i.e., as an argument to the process function, rather than to the class's __init__, since it may only materialize (depending on whether it is read from a file or built by tft_beam.AnalyzeDataset) after the class is constructed.

Parameters:
  • element (pa.RecordBatch) – A batch of records, e.g., a batch of transformed features

  • transformed_metadata (tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata) – Contains the schema needed by RecordBatchToExamplesEncoder for encoding

Yields:

bytes – serialized tf.Example

Return type:

Iterable[bytes]
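The side-input pattern described above can be sketched without any Beam or TFX dependency. In this toy version, `element` is a plain list of dicts standing in for a pyarrow.RecordBatch, the metadata is a dict with a hypothetical "schema" key, and repr-based serialization stands in for tf.train.Example encoding:

```python
# Dependency-free sketch of the side-input pattern: the metadata arrives
# per call to process() (as a Beam side input would), not via __init__,
# because it may only materialize after the DoFn is constructed.
class RecordBatchToTFExampleCoderFnSketch:
    def process(self, element, transformed_metadata):
        # Keep only the columns the (toy) metadata declares, then
        # serialize each row to bytes; the real class delegates this
        # to a RecordBatchToExamplesEncoder built from the schema.
        schema = transformed_metadata["schema"]
        for row in element:
            projected = {k: row[k] for k in schema}
            yield repr(sorted(projected.items())).encode("utf-8")
```

In an actual pipeline the metadata would be supplied at apply time, e.g. via beam.pvalue.AsSingleton, so it can flow in after AnalyzeDataset has run.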

class gigl.common.beam.coders.RuntimeTFExampleProtoCoderFn[source]#

Bases: apache_beam.DoFn

Can be used at runtime to encode messages into tf.Example proto messages

process(element, transformed_metadata, *args, **kwargs)[source]#

Note that transformed_metadata must be passed to process rather than to the class's __init__: it is a side input, and it only materializes as the true transformed metadata when passed in as part of process.

Parameters:
  • element (Dict[str, tensorflow_transform.common_types.TensorType]) – A tf.Example instance dict

  • transformed_metadata (tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata) – Used to generate the ExampleProtoCoder

Yields:

tf.Example – Encoded tf.Example

Return type:

Iterable[tensorflow.train.Example]
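A common consequence of the constraint above is that the coder must be built lazily, on the first call to process, once the side input has materialized. The following dependency-free sketch illustrates that pattern; the lambda is a hypothetical stand-in for building something like tft.coders.ExampleProtoCoder from the metadata's schema:

```python
# Sketch of lazily constructing the coder from the side input on first
# use; the real class builds a tf.Example proto coder instead.
class RuntimeTFExampleProtoCoderFnSketch:
    def __init__(self):
        # The coder cannot be built here: transformed_metadata is a
        # side input that is not yet materialized at construction time.
        self._coder = None

    def process(self, element, transformed_metadata):
        if self._coder is None:
            # Built once from the (toy) metadata, then reused across
            # calls; stands in for ExampleProtoCoder(schema).
            keys = tuple(transformed_metadata["schema"])
            self._coder = lambda d: repr([(k, d[k]) for k in keys])
        yield self._coder(element)
```

Caching the coder on the instance avoids rebuilding it for every element while still deferring construction until the metadata is available.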