tlc.core.writers.table_writer

Write batches of rows to persistent storage.

Module Contents

Classes

Class          Description
TableWriter    A class for writing batches of rows to persistent storage.

Data

Data                 Description
COLUMN_NAME_REGEX

API

tlc.core.writers.table_writer.COLUMN_NAME_REGEX = None
class tlc.core.writers.table_writer.TableWriter(table_name: str = _3LC_FALLBACK_TABLE_NAME, dataset_name: str = _3LC_FALLBACK_DATASET_NAME, project_name: str = _3LC_FALLBACK_PROJECT_NAME, description: str = '', column_schemas: typing.Mapping[str, tlc.client.sample_type._SampleTypeStructure] | None = None, if_exists: typing.Literal['overwrite', 'rename', 'raise'] = 'rename', *, table_url: tlc.core.url.Url | str | None = None)

A class for writing batches of rows to persistent storage.

This class is primarily used to write structured data to Parquet files. It buffers incoming rows in batches and manages the schemas of the columns.
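To illustrate what managing column schemas can involve, here is a minimal, self-contained sketch that infers a simple type name for each column of a batch and lets explicitly supplied schemas take precedence (much as `column_schemas` overrides inferred schemas). The function `infer_column_types` and the type names it returns are hypothetical; the real class works with `_SampleTypeStructure` schemas, which are far richer than plain strings.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional, Sequence

# Hypothetical mapping from Python value types to simple type names.
_TYPE_NAMES = {bool: "bool", int: "int", float: "float", str: "string"}


def infer_column_types(
    batch: Mapping[str, Sequence[Any]],
    overrides: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Infer one type name per column; explicit overrides win (hypothetical sketch)."""
    overrides = overrides or {}
    schema: dict[str, str] = {}
    for name, values in batch.items():
        # An explicitly supplied schema replaces inference for that column.
        if name in overrides:
            schema[name] = overrides[name]
            continue
        if not values:
            raise ValueError(f"Cannot infer a type for empty column {name!r}")
        # Use the type of the first value and require all values to share it.
        first_type = type(values[0])
        if any(type(v) is not first_type for v in values):
            raise ValueError(f"Column {name!r} mixes value types")
        schema[name] = _TYPE_NAMES.get(first_type, first_type.__name__)
    return schema
```

For the batch used in the class example, `infer_column_types({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})` yields `{"column1": "int", "column2": "string"}`; passing `overrides={"column1": "float"}` changes only the first entry.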

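Example:

from tlc.core.writers.table_writer import TableWriter

# Write to table "My Table" in dataset "My Dataset" of project "My Project"
table_writer = TableWriter(
    project_name="My Project",
    dataset_name="My Dataset",
    table_name="My Table",
)
# Add three rows at once: each column maps to a list of equal length
table_writer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})
# Add one more row: each column maps to a single value
table_writer.add_row({"column1": 4, "column2": "d"})
# Write all buffered rows to disk and get the resulting Table
table = table_writer.finalize()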

Initialize a TableWriter.

Parameters:
  • table_name – The name of the table, defaults to “table”.

  • dataset_name – The name of the dataset, defaults to “default-dataset”.

  • project_name – The name of the project, defaults to “default-project”.

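  • description – A description of the table, defaults to an empty string.

  • column_schemas – Optional schemas that override the column schemas otherwise inferred from the data.

  • if_exists – What to do if a table already exists at the target location: “overwrite”, “rename”, or “raise”. Defaults to “rename”.

  • table_url – An optional URL that explicitly sets where the table is written. Mutually exclusive with table_name, dataset_name, and project_name.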
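The mutual exclusivity of `table_url` with the three name parameters can be sketched as a small validation step. This is an illustrative assumption, not the library's code: `resolve_table_target` and `FALLBACKS` are hypothetical, the sketch uses `None` to mean "not passed" (the real signature instead gives the names fallback defaults), and it makes no attempt to reproduce how the real class builds a URL from the names.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical fallbacks mirroring the documented default names.
FALLBACKS = {
    "project_name": "default-project",
    "dataset_name": "default-dataset",
    "table_name": "table",
}


def resolve_table_target(
    table_name: Optional[str] = None,
    dataset_name: Optional[str] = None,
    project_name: Optional[str] = None,
    table_url: Optional[str] = None,
) -> dict[str, str]:
    """Return either {"table_url": ...} or the three names (hypothetical sketch)."""
    names = {
        "project_name": project_name,
        "dataset_name": dataset_name,
        "table_name": table_name,
    }
    explicit = [key for key, value in names.items() if value is not None]
    if table_url is not None:
        # An explicit URL cannot be combined with any explicit name.
        if explicit:
            raise ValueError(f"table_url cannot be combined with {', '.join(explicit)}")
        return {"table_url": table_url}
    # No URL given: fill in each missing name with its fallback.
    return {key: value if value is not None else FALLBACKS[key] for key, value in names.items()}
```

Using `None` as a "not passed" sentinel is what makes the check possible: with plain string defaults, a function cannot tell an explicit `table_name="table"` apart from the default.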

add_row(table_row: MutableMapping[str, tlc.core.builtins.types.MetricData]) → None

Add a single row to the table being written.

Parameters:

table_row – A dictionary mapping each column name to a single value.
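Conceptually, a row is a batch of length one. The sketch below converts a row dictionary into the column-to-list shape that `add_batch` takes; the helper `row_to_batch` is hypothetical, and whether `add_row` works this way internally is an assumption.

```python
from __future__ import annotations

from typing import Any, Mapping


def row_to_batch(row: Mapping[str, Any]) -> dict[str, list[Any]]:
    """Wrap each value of a single row in a one-element list (hypothetical helper)."""
    return {column: [value] for column, value in row.items()}
```

For example, `row_to_batch({"column1": 4, "column2": "d"})` returns `{"column1": [4], "column2": ["d"]}`, the same shape as the batch passed to `add_batch` in the class example.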

add_batch(table_batch: MutableMapping[str, tlc.core.builtins.types.MetricData]) → None

Add a batch of rows to the buffer for writing.

This method validates the consistency of the batch and appends it to the buffer. When the buffer reaches its maximum size, it is automatically flushed to disk.

Parameters:

table_batch – A dictionary mapping column names to lists of values.

Raises:

ValueError – If the columns in the batch have unequal lengths or do not match the existing columns.
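The two error conditions listed above can be expressed as a standalone check. The sketch below is hypothetical (`validate_batch` and its `existing_columns` argument are not part of the documented API) and encodes one plausible reading of "do not match the existing columns": the batch must contain exactly the columns already seen.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional, Sequence


def validate_batch(
    batch: Mapping[str, Sequence[Any]],
    existing_columns: Optional[set[str]] = None,
) -> int:
    """Return the number of rows in `batch`, or raise ValueError (hypothetical sketch)."""
    lengths = {column: len(values) for column, values in batch.items()}
    # Every column must hold the same number of values.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have unequal lengths: {lengths}")
    # Assumption: a batch must carry exactly the columns seen so far.
    if existing_columns is not None and set(batch) != existing_columns:
        missing = sorted(existing_columns - set(batch))
        extra = sorted(set(batch) - existing_columns)
        raise ValueError(f"Column mismatch: missing={missing}, extra={extra}")
    return next(iter(lengths.values()), 0)
```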

clear() → None

Clear the buffer and reset the internal state.

flush() → tlc.core.objects.table.Table

Deprecated. Flush the buffer and return the written table. Prefer finalize() in new code.

finalize() → tlc.core.objects.table.Table

Write all added batches to disk and return the written table.
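The buffering lifecycle described above (add_batch accumulates batches and flushes automatically when the buffer is full; finalize writes whatever remains and returns the result) can be modeled with an in-memory stand-in. `BufferedColumnWriter`, its `max_buffer_rows` threshold, and the `written_chunks` list it "writes" to are all hypothetical: the real class writes Parquet files and returns a `tlc.core.objects.table.Table`, and its actual buffer size is not documented here.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional, Sequence


class BufferedColumnWriter:
    """In-memory model of a buffered, column-oriented writer (hypothetical)."""

    def __init__(self, max_buffer_rows: int = 3) -> None:
        self.max_buffer_rows = max_buffer_rows
        self._buffer: list[dict[str, list[Any]]] = []
        self._buffered_rows = 0
        self._columns: Optional[set[str]] = None
        # Stands in for the files that a real writer would put on disk.
        self.written_chunks: list[dict[str, list[Any]]] = []

    def add_batch(self, batch: Mapping[str, Sequence[Any]]) -> None:
        lengths = {len(values) for values in batch.values()}
        if len(lengths) > 1:
            raise ValueError("Columns in the batch have unequal lengths")
        if self._columns is None:
            self._columns = set(batch)
        elif set(batch) != self._columns:
            raise ValueError("Batch columns do not match existing columns")
        self._buffer.append({column: list(values) for column, values in batch.items()})
        self._buffered_rows += lengths.pop() if lengths else 0
        # Flush automatically once the buffer reaches its maximum size.
        if self._buffered_rows >= self.max_buffer_rows:
            self._write_buffer()

    def add_row(self, row: Mapping[str, Any]) -> None:
        # A single row is treated as a batch of length one.
        self.add_batch({column: [value] for column, value in row.items()})

    def clear(self) -> None:
        # Discard buffered rows that have not been written yet.
        self._buffer = []
        self._buffered_rows = 0

    def _write_buffer(self) -> None:
        if not self._buffer:
            return
        # Concatenate the buffered batches column by column into one chunk.
        chunk: dict[str, list[Any]] = {column: [] for column in self._buffer[0]}
        for batch in self._buffer:
            for column, values in batch.items():
                chunk[column].extend(values)
        self.written_chunks.append(chunk)
        self.clear()

    def finalize(self) -> dict[str, list[Any]]:
        """Write any remaining rows and return all written data as one table."""
        self._write_buffer()
        table: dict[str, list[Any]] = {}
        for chunk in self.written_chunks:
            for column, values in chunk.items():
                table.setdefault(column, []).extend(values)
        return table
```

Replaying the class example against this model with `max_buffer_rows=3`, the three-row batch triggers an automatic write, the single row stays buffered, and `finalize()` writes it as a second chunk before returning `{"column1": [1, 2, 3, 4], "column2": ["a", "b", "c", "d"]}`.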