tlc.export.exporter

The base class for all Exporters.

Module Contents

Classes

  • Exporter – The base class for all Exporters.

  • RowExporter – Base class for row-by-row exporters where each row maps to one output line.

  • SerializingExporter – Base class for exporters that serialize a table to a single string/file.

API

class Exporter

The base class for all Exporters.

Exporters are used to export tables to various formats, typically after a user is done cleaning their data with 3LC. Subclasses of Exporter should be registered using the register_exporter() decorator, which makes them available for use in Table.export().

There are three patterns for implementing exporters:

  1. Row exporters (simplest): Subclass RowExporter and implement export_row(), which converts a single row to a string. Declare a separator and the framework handles iteration, weight filtering, joining, and writing.

  2. Serializing exporters (single-file output): Subclass SerializingExporter and implement the serialize() method, which returns a string to be written to the output URL.

  3. Free-form exporters (e.g., directory output): Subclass Exporter directly and override the internal export hooks (_do_export and _get_export_impl_method — see the source for the full contract).

Subclasses can also override the can_export() method, which determines whether the exporter can export a given table to a given URL. The default implementation checks the file_extensions class attribute. If neither can_export() nor file_extensions is provided, the exporter will only be used when the format argument is specified explicitly in Table.export().

Subclasses of Exporter must define the class attribute supported_format, a string naming the format that the exporter supports. When the format argument is not specified, Table.export() calls can_export() on every registered exporter to find the compatible ones. If several exporters are compatible, the one with the highest priority is used.
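The selection rule described above can be sketched without 3LC installed. Everything in this block is a hypothetical stand-in: StubExporter and pick_exporter are illustration names, the real can_export() also receives the table, and looking exporters up by supported_format when format is given is an assumption about how Table.export() behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubExporter:
    """Hypothetical stand-in for a registered Exporter subclass."""

    supported_format: str
    file_extensions: frozenset
    priority: int = 0

    def can_export(self, output_url: str) -> bool:
        # Simplified version of the documented default: match on the extension.
        return any(output_url.endswith(ext) for ext in self.file_extensions)


def pick_exporter(exporters, output_url, format=None):
    """Return the exporter to use, or None if nothing is compatible."""
    if format is not None:
        # Assumption: an explicit format selects by supported_format only.
        matches = [e for e in exporters if e.supported_format == format]
    else:
        matches = [e for e in exporters if e.can_export(output_url)]
    # Among compatible exporters, the highest priority wins.
    return max(matches, key=lambda e: e.priority, default=None)


exporters = [
    StubExporter("csv", frozenset({".csv"})),
    StubExporter("fancy_csv", frozenset({".csv"}), priority=10),
    StubExporter("raw", frozenset()),  # no extensions: explicit format only
]
chosen = pick_exporter(exporters, "out.csv")
```

Here chosen is the "fancy_csv" stub, because both CSV stubs are compatible and it has the higher priority. The "raw" stub is only reachable through format="raw". How the real registry orders exporters with equal priority is not stated on this page.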

Variables:
  • supported_format – A string indicating the format that the exporter supports.

  • priority – An integer indicating the priority of the exporter. Used to break ties when multiple exporters are compatible with a given table and URL. The exporter with the highest priority will be used.

  • file_extensions – A frozenset of file extensions (e.g., {".csv"}) that the exporter handles. The base can_export() implementation checks if the output URL’s extension is in this set.

  • force – If True, allows this exporter to override an already-registered format during entry point discovery.

can_export(table: Table, output_url: Url) → bool

Check if the exporter can export the given table to the given output URL.

The default implementation checks if the output URL’s extension is in file_extensions. Subclasses can override this for content-based detection (e.g., checking for annotation columns).

This method is called for all registered exporters when format is not specified in Table.export(), so it should be fast.

Parameters:
  • table – The table to export.

  • output_url – The URL to export to.

Returns:

True if the exporter can export the table to the given URL, False otherwise.
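As a rough illustration of the default extension check, the following sketch works on plain strings rather than Url objects. The function name extension_matches is hypothetical, and lower-casing the extension before the lookup is an assumption; the page does not say whether the real check is case-sensitive.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def extension_matches(output_url: str, file_extensions: frozenset) -> bool:
    """Return True if the URL's file extension is in file_extensions."""
    # Strip any scheme and host so only the path component is inspected.
    path = urlparse(output_url).path
    # Assumption: compare case-insensitively.
    suffix = PurePosixPath(path).suffix.lower()
    return suffix in file_extensions


csv_extensions = frozenset({".csv"})
local_ok = extension_matches("out.csv", csv_extensions)
remote_ok = extension_matches("s3://bucket/data/out.CSV", csv_extensions)
```

The check only looks at the URL string, which keeps it cheap enough to run for every registered exporter. An override that inspects the table's columns should aim for similar cost.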

file_extensions: ClassVar[frozenset[str]] = frozenset(...)

force: ClassVar[bool] = False

priority: int = 0

static remaining_table_rows(table: Table, weight_threshold: float) → Iterator[tlc._core.objects.table.TableRow]

Return an iterator of the remaining rows in the table after filtering out rows with a weight below the given threshold.

Parameters:
  • table – The table to filter.

  • weight_threshold – The weight threshold.

Returns:

An iterator of the remaining rows in the table.
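The filtering behaviour can be pictured with a small generator over plain dictionaries. This is a sketch only: the "weight" key and the fallback weight of 1.0 for rows that have none are assumptions, and remaining_rows is a hypothetical name, not the library's implementation.

```python
from typing import Any, Iterable, Iterator, Mapping

WEIGHT_KEY = "weight"  # hypothetical column name holding the sample weight


def remaining_rows(
    rows: Iterable[Mapping[str, Any]], weight_threshold: float
) -> Iterator[Mapping[str, Any]]:
    """Yield rows whose weight is not below weight_threshold."""
    for row in rows:
        # Assumption: a row without a weight counts as fully weighted (1.0).
        if row.get(WEIGHT_KEY, 1.0) >= weight_threshold:
            yield row


rows = [{"id": 0, "weight": 1.0}, {"id": 1, "weight": 0.2}, {"id": 2}]
kept_ids = [row["id"] for row in remaining_rows(rows, 0.5)]
```

With a threshold of 0.5 the row with weight 0.2 is dropped, so kept_ids is [0, 2]. A threshold of 0.0 keeps every row with a non-negative weight.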

supported_format: str = None

class RowExporter

Bases: tlc.export.exporter.SerializingExporter

Base class for row-by-row exporters where each row maps to one output line.

This is the simplest exporter pattern. Subclasses implement export_row(), which converts a single table row to a string. The framework handles iteration, weight filtering, joining with separator, and writing to the output URL.

Example:

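import json

from tlc.export.exporter import RowExporter

# register_exporter must be imported as well; this page does not state which module provides it.
@register_exporter
class NdjsonExporter(RowExporter):
    supported_format = "ndjson"
    file_extensions = frozenset({".ndjson", ".jsonl"})
    separator = "\n"

    def export_row(self, row, **kwargs):
        # Rows are dict-like; copy into a plain dict so json can serialize them.
        return json.dumps(dict(row))

Variables: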

separator – The string used to join row outputs. Defaults to "\n".

abstract export_row(row: tlc._core.objects.table.TableRow, **kwargs: Any) → str

Convert a single table row to a string.

Parameters:
  • row – A single row from the table (dict-like mapping column names to values).

  • **kwargs – Additional format-specific arguments.

Returns:

The string representation of the row.

separator: str = "\n"

serialize(table: Table, output_url: Url, weight_threshold: float = 0.0, **kwargs: Any) → str

Serialize the table by converting each row and joining with the separator.

Parameters:
  • table – The table to serialize.

  • output_url – The URL to export to (available for path resolution).

  • weight_threshold – The weight threshold for filtering rows.

  • **kwargs – Additional arguments passed to export_row().

Returns:

The serialized table as a string.
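The convert-and-join behaviour can be approximated as below. RowExporterSketch and NdjsonSketch are hypothetical stand-ins that take a list of dictionaries instead of a Table and omit output_url; the "weight" key and its 1.0 fallback are assumptions.

```python
import json
from abc import ABC, abstractmethod


class RowExporterSketch(ABC):
    """Hypothetical stand-in mirroring the documented RowExporter flow."""

    separator = "\n"

    @abstractmethod
    def export_row(self, row, **kwargs) -> str:
        """Convert one row to its string form."""

    def serialize(self, rows, weight_threshold=0.0, **kwargs) -> str:
        # Drop rows below the threshold (assumed "weight" key, default 1.0).
        kept = (r for r in rows if r.get("weight", 1.0) >= weight_threshold)
        # Convert each surviving row, then join with the separator.
        return self.separator.join(self.export_row(r, **kwargs) for r in kept)


class NdjsonSketch(RowExporterSketch):
    def export_row(self, row, **kwargs) -> str:
        return json.dumps(dict(row), sort_keys=True)


rows = [{"id": 0, "weight": 1.0}, {"id": 1, "weight": 0.1}]
filtered = NdjsonSketch().serialize(rows, weight_threshold=0.5)
everything = NdjsonSketch().serialize(rows)
```

Because the base class owns the loop, a subclass only decides how one row looks. The trade-off is that formats needing a header, a footer, or cross-row state do not fit this pattern.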

class SerializingExporter

Bases: tlc.export.exporter.Exporter

Base class for exporters that serialize a table to a single string/file.

Subclasses must implement the serialize() method, which returns a string representation of the table. The string is then written to the output URL.

This is the most common exporter pattern, suitable for formats like JSON, CSV, and COCO.

abstract serialize(table: Table, output_url: Url, weight_threshold: float = 0.0, **kwargs: Any) → str

Serialize a table to a string which can be written to a URL.

Parameters:
  • table – The table to serialize.

  • output_url – The URL to export to (available for path resolution).

  • weight_threshold – The weight threshold for filtering rows.

  • **kwargs – Additional format-specific arguments.

Returns:

The serialized table as a string.
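To show why some formats need the whole table at once, here is a minimal sketch of what a serialize() body might do for CSV, written as a free function over plain dictionaries. The name serialize_csv, the "weight" key and its 1.0 fallback are assumptions for illustration; this is not the exporter shipped with 3LC.

```python
import csv
import io


def serialize_csv(rows, weight_threshold=0.0) -> str:
    """Serialize rows (a list of dicts) to one CSV string."""
    # Assumed weight filter: keep rows at or above the threshold.
    kept = [r for r in rows if r.get("weight", 1.0) >= weight_threshold]
    if not kept:
        return ""
    buffer = io.StringIO()
    # Column order is taken from the first surviving row.
    writer = csv.DictWriter(buffer, fieldnames=list(kept[0]), lineterminator="\n")
    writer.writeheader()  # written once, before any row
    writer.writerows(kept)
    return buffer.getvalue()


rows = [
    {"id": 0, "label": "cat", "weight": 1.0},
    {"id": 1, "label": "dog", "weight": 0.1},
]
csv_text = serialize_csv(rows, weight_threshold=0.5)
```

The header line is produced once for the whole output, which is the kind of table-level concern that makes serialize() a better fit than a per-row method.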