tlc.export.exporter
The base class for all Exporters.
Module Contents
Classes
| Class | Description |
|---|---|
| Exporter | The base class for all Exporters. |
| RowExporter | Base class for row-by-row exporters where each row maps to one output line. |
| SerializingExporter | Base class for exporters that serialize a table to a single string/file. |
API
- class Exporter
The base class for all Exporters.
Exporters are used to export tables to various formats, typically after a user is done cleaning their data with 3LC. Subclasses of Exporter should be registered using the register_exporter() decorator, which makes them available for use in Table.export().
There are three patterns for implementing exporters:
- Row exporters (simplest): Subclass RowExporter and implement export_row(), which converts a single row to a string. Declare a separator and the framework handles iteration, weight filtering, joining, and writing.
- Serializing exporters (single-file output): Subclass SerializingExporter and implement the serialize() method, which returns a string to be written to the output URL.
- Free-form exporters (e.g., directory output): Subclass Exporter directly and override the internal export hooks (_do_export and _get_export_impl_method; see the source for the full contract).
Subclasses can also override the can_export() method, which determines whether the exporter can export a given table to a given URL. The default implementation checks the file_extensions class attribute. If neither can_export() nor file_extensions is provided, the exporter will only be used when the format argument is specified explicitly in Table.export().
Subclasses of Exporter must define the class attribute supported_format, which is a string indicating the format that the exporter supports. When the format argument is not specified, Table.export() calls can_export() on all registered exporters to find compatible ones. If multiple exporters are compatible, the one with the highest priority is used.
- Variables:
supported_format – A string indicating the format that the exporter supports.
priority – An integer indicating the priority of the exporter. Used to break ties when multiple exporters are compatible with a given table and URL. The exporter with the highest priority will be used.
file_extensions – A frozenset of file extensions (e.g., {".csv"}) that the exporter handles. The base can_export() implementation checks if the output URL’s extension is in this set.
force – If True, allows this exporter to override an already-registered format during entry point discovery.
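The selection rules above can be sketched in plain Python. This is a self-contained illustration, not the 3LC implementation: SketchExporter, REGISTRY, and pick_exporter are hypothetical stand-ins, and the assumption that an explicit format bypasses can_export() is one reading of the text above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


class SketchExporter:
    """Hypothetical stand-in for an exporter class; not the real tlc Exporter."""

    supported_format = ""
    priority = 0
    file_extensions = frozenset()

    def can_export(self, table, output_url):
        # Mirror the documented default: match on the URL's file extension.
        suffix = PurePosixPath(urlparse(output_url).path).suffix
        return suffix in self.file_extensions


class CsvSketch(SketchExporter):
    supported_format = "csv"
    priority = 1
    file_extensions = frozenset({".csv"})


class FancyCsvSketch(SketchExporter):
    supported_format = "fancy-csv"
    priority = 5
    file_extensions = frozenset({".csv"})


class ExplicitOnlySketch(SketchExporter):
    # No file_extensions and no can_export() override: never auto-selected.
    supported_format = "custom"


# Hypothetical registry; the real one is populated by register_exporter().
REGISTRY = [CsvSketch(), FancyCsvSketch(), ExplicitOnlySketch()]


def pick_exporter(table, output_url, format=None):
    """Return the exporter to use, or raise ValueError if none fits."""
    if format is not None:
        # Assumption: an explicit format selects by supported_format only.
        matches = [e for e in REGISTRY if e.supported_format == format]
    else:
        matches = [e for e in REGISTRY if e.can_export(table, output_url)]
    if not matches:
        raise ValueError(f"no exporter for {output_url!r}")
    # Ties between compatible exporters go to the highest priority.
    return max(matches, key=lambda e: e.priority)
```

In this sketch, a ".csv" URL matches both CSV stand-ins and resolves to the one with priority 5, while ExplicitOnlySketch is reachable only through format="custom".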
- can_export( ) bool
Check if the exporter can export the given table to the given output URL.
The default implementation checks if the output URL’s extension is in file_extensions. Subclasses can override this for content-based detection (e.g., checking for annotation columns).
This method is called for all registered exporters when format is not specified in Table.export(), so it should be fast.
- Parameters:
table – The table to export.
output_url – The URL to export to.
- Returns:
True if the exporter can export the table to the given URL, False otherwise.
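A content-based override might look like the following minimal sketch. The class is a stand-in rather than a real Exporter subclass, the table is modeled as a plain dict with a hypothetical "columns" entry, and the "bounding_boxes" column name is invented for illustration; how a real Table exposes its columns is not covered here.

```python
class BoxSketchExporter:
    """Hypothetical stand-in exporter; not a real tlc Exporter subclass."""

    file_extensions = frozenset({".json"})

    def can_export(self, table, output_url):
        # Keep the check cheap: it runs for every registered exporter.
        if not output_url.endswith(tuple(self.file_extensions)):
            return False
        # Content-based detection: require an annotation column.
        # "columns" and "bounding_boxes" are assumptions for this sketch.
        return "bounding_boxes" in table["columns"]
```

The extension test runs first so that tables headed for other formats are rejected without inspecting their contents.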
- static remaining_table_rows( ) Iterator[tlc._core.objects.table.TableRow]
Return an iterator of the remaining rows in the table after filtering out rows with a weight below the given threshold.
- Parameters:
table – The table to filter.
weight_threshold – The weight threshold.
- Returns:
An iterator of the remaining rows in the table.
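The filtering behavior described for remaining_table_rows() can be approximated as a generator. This is a sketch under assumptions: rows are plain mappings, the weight lives under a hypothetical "weight" key, and rows without a weight default to 1.0. None of these details are confirmed by the reference above.

```python
from typing import Any, Iterable, Iterator, Mapping


def remaining_rows_sketch(
    rows: Iterable[Mapping[str, Any]],
    weight_threshold: float,
    weight_key: str = "weight",  # hypothetical column name
) -> Iterator[Mapping[str, Any]]:
    """Yield rows whose weight is not below the threshold."""
    for row in rows:
        # Assumption: a row without a weight counts as weight 1.0.
        if row.get(weight_key, 1.0) >= weight_threshold:
            yield row
```

Because only rows below the threshold are dropped, a row whose weight equals the threshold is kept.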
- class RowExporter
Bases: tlc.export.exporter.SerializingExporter
Base class for row-by-row exporters where each row maps to one output line.
This is the simplest exporter pattern. Subclasses implement export_row(), which converts a single table row to a string. The framework handles iteration, weight filtering, joining with separator, and writing to the output URL.
Example:

    @register_exporter
    class NdjsonExporter(RowExporter):
        supported_format = "ndjson"
        file_extensions = frozenset({".ndjson", ".jsonl"})
        separator = "\n"

        def export_row(self, row, **kwargs):
            import json
            return json.dumps(row)

- Variables:
separator – The string used to join row outputs. Defaults to "\n".
- abstract export_row(row: tlc._core.objects.table.TableRow, **kwargs: Any) str
Convert a single table row to a string.
- Parameters:
row – A single row from the table (dict-like mapping column names to values).
**kwargs – Additional format-specific arguments.
- Returns:
The string representation of the row.
- serialize( ) str
Serialize the table by converting each row and joining with the separator.
- Parameters:
table – The table to serialize.
output_url – The URL to export to (available for path resolution).
weight_threshold – The weight threshold for filtering rows.
**kwargs – Additional arguments passed to export_row().
- Returns:
The serialized table as a string.
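The row-by-row serialization described above amounts to a filter followed by a join. The sketch below uses stand-in classes and plain dict rows; the "weight" key, the 1.0 default for rows without a weight, and the default threshold of 0.0 are assumptions, not documented 3LC behavior.

```python
import json


class RowExporterSketch:
    """Hypothetical stand-in for RowExporter; not the real class."""

    separator = "\n"

    def export_row(self, row, **kwargs):
        raise NotImplementedError

    def serialize(self, table, output_url, weight_threshold=0.0, **kwargs):
        # Drop rows whose weight is below the threshold (assumed "weight" key).
        kept = (row for row in table if row.get("weight", 1.0) >= weight_threshold)
        # Convert each remaining row, then join with the separator.
        return self.separator.join(self.export_row(row, **kwargs) for row in kept)


class NdjsonSketch(RowExporterSketch):
    def export_row(self, row, **kwargs):
        return json.dumps(row, sort_keys=True)


rows = [{"id": 1, "weight": 1.0}, {"id": 2, "weight": 0.0}, {"id": 3, "weight": 0.7}]
text = NdjsonSketch().serialize(rows, "out.ndjson", weight_threshold=0.5)
```

Since the rows are joined rather than terminated, the result in this sketch has no trailing separator; whether the real framework appends one is not stated here.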
- class SerializingExporter
Bases: tlc.export.exporter.Exporter
Base class for exporters that serialize a table to a single string/file.
Subclasses must implement the serialize() method, which returns a string representation of the table. The string is then written to the output URL.
This is the most common exporter pattern, suitable for formats like JSON, CSV, and COCO.
- abstract serialize( ) str
Serialize a table to a string which can be written to a URL.
- Parameters:
table – The table to serialize.
output_url – The URL to export to (available for path resolution).
weight_threshold – The weight threshold for filtering rows.
**kwargs – Additional format-specific arguments.
- Returns:
The serialized table as a string.
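A format that needs whole-table context, such as a CSV header row, is a natural fit for serialize(). The following is a minimal sketch with a stand-in class and plain dict rows; the "weight" key, its 1.0 default, and the default threshold of 0.0 are assumptions, and a real implementation would subclass SerializingExporter and be registered with register_exporter().

```python
import csv
import io


class CsvSerializerSketch:
    """Hypothetical stand-in for a SerializingExporter subclass."""

    def serialize(self, table, output_url, weight_threshold=0.0, **kwargs):
        # Keep rows at or above the threshold (assumed "weight" key).
        rows = [r for r in table if r.get("weight", 1.0) >= weight_threshold]
        if not rows:
            return ""
        buffer = io.StringIO()
        # Take the column order from the first remaining row.
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
```

Unlike the row-by-row pattern, the header here depends on the table as a whole, which is why the method receives the table rather than a single row.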