PySpark extension types - AWS Glue
Services or capabilities described in AWS documentation might vary by Region. To see the differences applicable to the AWS European Sovereign Cloud Region, see the AWS European Sovereign Cloud User Guide.

PySpark extension types

The types that are used by the AWS Glue PySpark extensions.

DataType

The base class for the other AWS Glue types.

__init__(properties={})
  • properties – Properties of the data type (optional).

typeName(cls)

Returns the type of the AWS Glue type class (that is, the class name with "Type" removed from the end).

  • cls – An AWS Glue class instance derived from DataType.

jsonValue( )

Returns a JSON object that contains the data type and properties of the class:

{ "dataType": typeName, "properties": properties }

AtomicType and simple derivatives

Inherits from and extends the DataType class, and serves as the base class for all the AWS Glue atomic data types.

fromJsonValue(cls, json_value)

Initializes a class instance with values from a JSON object.

  • cls – An AWS Glue type class instance to initialize.

  • json_value – The JSON object to load key-value pairs from.

The following types are simple derivatives of the AtomicType class:

  • BinaryType – Binary data.

  • BooleanType – Boolean values.

  • ByteType – A byte value.

  • DateType – A datetime value.

  • DoubleType – A floating-point double value.

  • IntegerType – An integer value.

  • LongType – A long integer value.

  • NullType – A null value.

  • ShortType – A short integer value.

  • StringType – A text string.

  • TimestampType – A timestamp value (typically in seconds from 1/1/1970).

  • UnknownType – A value of unidentified type.

DecimalType(AtomicType)

Inherits from and extends the AtomicType class to represent a decimal number (a number expressed in decimal digits, as opposed to binary base-2 numbers).

__init__(precision=10, scale=2, properties={})
  • precision – The number of digits in the decimal number (optional; the default is 10).

  • scale – The number of digits to the right of the decimal point (optional; the default is 2).

  • properties – The properties of the decimal number (optional).

EnumType(AtomicType)

Inherits from and extends the AtomicType class to represent an enumeration of valid options.

__init__(options)
  • options – A list of the options being enumerated.

 collection types

ArrayType(DataType)

__init__(elementType=UnknownType(), properties={})
  • elementType – The type of elements in the array (optional; the default is UnknownType).

  • properties – Properties of the array (optional).

ChoiceType(DataType)

__init__(choices=[], properties={})
  • choices – A list of possible choices (optional).

  • properties – Properties of these choices (optional).

add(new_choice)

Adds a new choice to the list of possible choices.

  • new_choice – The choice to add to the list of possible choices.

merge(new_choices)

Merges a list of new choices with the existing list of choices.

  • new_choices – A list of new choices to merge with existing choices.

MapType(DataType)

__init__(valueType=UnknownType, properties={})
  • valueType – The type of values in the map (optional; the default is UnknownType).

  • properties – Properties of the map (optional).

Field(Object)

Creates a field object out of an object that derives from DataType.

__init__(name, dataType, properties={})
  • name – The name to be assigned to the field.

  • dataType – The object to create a field from.

  • properties – Properties of the field (optional).

StructType(DataType)

Defines a data structure (struct).

__init__(fields=[], properties={})
  • fields – A list of the fields (of type Field) to include in the structure (optional).

  • properties – Properties of the structure (optional).

add(field)
  • field – An object of type Field to add to the structure.

hasField(field)

Returns True if this structure has a field of the same name, or False if not.

  • field – A field name, or an object of type Field whose name is used.

getField(field)
  • field – A field name or an object of type Field whose name is used. If the structure has a field of the same name, it is returned.

EntityType(DataType)

__init__(entity, base_type, properties)

This class is not yet implemented.

 other types

DataSource(object)

__init__(j_source, sql_ctx, name)
  • j_source – The data source.

  • sql_ctx – The SQL context.

  • name – The data-source name.

setFormat(format, **options)

getFrame()

Returns a DynamicFrame for the data source.

DataSink(object)

__init__(j_sink, sql_ctx)
  • j_sink – The sink to create.

  • sql_ctx – The SQL context for the data sink.

setFormat(format, **options)

setAccumulableSize(size)
  • size – The accumulable size to set, in bytes.

writeFrame(dynamic_frame, info="")
  • dynamic_frame – The DynamicFrame to write.

  • info – Information about the DynamicFrame (optional).

write(dynamic_frame_or_dfc, info="")

Writes a DynamicFrame or a DynamicFrameCollection.

  • dynamic_frame_or_dfc – Either a DynamicFrame object or a DynamicFrameCollection object to be written.

  • info – Information about the DynamicFrame or DynamicFrames to be written (optional).