tlc.core.metrics_writer.table_writer

Write batches of rows to persistent storage.

Module Contents

Classes

Class          Description
TableWriter    A class for writing batches of rows to persistent storage.

API

class tlc.core.metrics_writer.table_writer.TableWriter(table_name: str = _3LC_FALLBACK_TABLE_NAME, dataset_name: str = _3LC_FALLBACK_DATASET_NAME, project_name: str = _3LC_FALLBACK_PROJECT_NAME, column_schemas: dict[str, tlc.core.schema.Schema] = {}, if_exists: typing.Literal['overwrite', 'rename', 'raise'] = 'rename', *, table_url: tlc.core.url.Url | str | None = None)

A class for writing batches of rows to persistent storage.

This class is primarily used for writing structured data to Parquet files. It supports batching of data and manages the schemas of the columns. The data written can be identified and retrieved using a unique key.

Example:

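from tlc.core.metrics_writer.table_writer import TableWriter

table_writer = TableWriter(
    project_name="My Project",
    dataset_name="My Dataset",
    table_name="My Table",
)
# Add three rows at once: each column maps to a list of values.
table_writer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})
# Add a single row: each column maps to one value.
table_writer.add_row({"column1": 4, "column2": "d"})
# Write all added rows to disk and keep the returned table.
table = table_writer.finalize()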

Initialize a TableWriter.

Parameters:
  • table_name – The name of the table, defaults to “table”.

  • dataset_name – The name of the dataset, defaults to “default-dataset”.

  • project_name – The name of the project, defaults to “default-project”.

  • column_schemas – Optional schemas to override the default inferred column schemas.

  • table_url – An optional URL that explicitly sets the location of the written table. Mutually exclusive with table_name, dataset_name, and project_name.

add_row(table_row: MutableMapping[str, tlc.core.builtins.types.MetricData]) → None

Add a single row to the table being written.
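As the class example shows, add_row takes one value per column while add_batch takes a list of values per column. When rows are already collected as separate dictionaries, it can be convenient to combine them into one column-oriented batch first. The helper below is a minimal sketch of that conversion in plain Python; rows_to_batch is a hypothetical name and is not part of the tlc API.

```python
from typing import Any, Dict, List, Mapping, Sequence


def rows_to_batch(rows: Sequence[Mapping[str, Any]]) -> Dict[str, List[Any]]:
    """Combine row-oriented dicts into one column-oriented batch.

    Hypothetical helper for illustration; not part of tlc.
    """
    if not rows:
        return {}
    columns = list(rows[0].keys())
    batch: Dict[str, List[Any]] = {name: [] for name in columns}
    for row in rows:
        # Every row must provide exactly the same columns as the first row.
        if set(row.keys()) != set(columns):
            raise ValueError(
                f"Row has columns {sorted(row)}, expected {sorted(columns)}"
            )
        for name in columns:
            batch[name].append(row[name])
    return batch


batch = rows_to_batch([
    {"column1": 4, "column2": "d"},
    {"column1": 5, "column2": "e"},
])
```

The resulting dictionary has the same shape as the argument that add_batch expects, so it could be passed in a single call instead of calling add_row once per row.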

add_batch(table_batch: MutableMapping[str, tlc.core.builtins.types.MetricData]) → None

Add a batch of rows to the buffer for writing.

This method validates the consistency of the batch and appends it to the buffer. When the buffer reaches its maximum size, it is automatically flushed to disk.

Parameters:

table_batch – A dictionary mapping column names to lists of values.

Raises:

ValueError – If the columns in the batch have unequal lengths or do not match the existing columns.
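The behavior described for add_batch (consistency checks, buffering, automatic flushing) can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the tlc implementation: ToyBatchBuffer, max_rows, and the flushed list are hypothetical, "mismatch with existing columns" is interpreted as a different set of column names, and the flush step only moves data between lists instead of writing to disk.

```python
from typing import Any, Dict, List, Mapping, Optional, Sequence


class ToyBatchBuffer:
    """Illustrative stand-in for a batch buffer; not the tlc implementation."""

    def __init__(self, max_rows: int = 4) -> None:
        self.max_rows = max_rows  # hypothetical buffer limit, counted in rows
        self.columns: Optional[List[str]] = None
        self.buffer: List[Dict[str, Sequence[Any]]] = []
        self.buffered_rows = 0
        self.flushed: List[Dict[str, Sequence[Any]]] = []

    def add_batch(self, batch: Mapping[str, Sequence[Any]]) -> None:
        # All columns in one batch must contain the same number of values.
        lengths = {len(values) for values in batch.values()}
        if len(lengths) > 1:
            raise ValueError("Columns in the batch have unequal lengths")
        # The first batch fixes the column names; later batches must match.
        if self.columns is None:
            self.columns = list(batch)
        elif set(batch) != set(self.columns):
            raise ValueError("Batch columns do not match the existing columns")
        self.buffer.append(dict(batch))
        self.buffered_rows += lengths.pop() if lengths else 0
        # Flush automatically once the buffer holds max_rows rows or more.
        if self.buffered_rows >= self.max_rows:
            self._flush()

    def _flush(self) -> None:
        self.flushed.extend(self.buffer)  # stands in for a write to disk
        self.buffer.clear()
        self.buffered_rows = 0


buf = ToyBatchBuffer(max_rows=4)
buf.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})  # 3 rows buffered
buf.add_batch({"column1": [4], "column2": ["d"]})  # limit reached, buffer flushed
```

Validating before appending means a rejected batch leaves the buffer unchanged, so the caller can correct the batch and retry.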

clear() → None

Clear the buffer and reset the internal state.

flush() → tlc.core.objects.table.Table

Flush the buffer and return the written table. This method is deprecated.

finalize() → tlc.core.objects.table.Table

Write all added batches to disk and return the written table.

set_override_column_schemas(override_column_schemas: dict[str, tlc.core.schema.Schema]) → None

Set or update the override column schemas.

This method allows setting custom schemas for specific columns, overriding the default inferred schemas.

Parameters:

override_column_schemas – A dictionary of column names and their corresponding custom schemas.

Raises:

ValueError – If called after the first batch has been added.
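The ordering constraint above means that override schemas have to be in place before any data is added. The toy class below sketches that guard in plain Python. It is an assumption-laden illustration rather than the tlc code: ToySchemaHolder is hypothetical, schemas are represented by placeholder strings instead of tlc.core.schema.Schema objects, and "set or update" is assumed to merge new entries into the existing overrides.

```python
from typing import Any, Dict, Mapping


class ToySchemaHolder:
    """Illustrative stand-in showing the call-order guard; not the tlc implementation."""

    def __init__(self) -> None:
        self.override_column_schemas: Dict[str, Any] = {}
        self.batches_added = 0

    def add_batch(self, batch: Mapping[str, Any]) -> None:
        # Only the count matters for this sketch; the data itself is ignored.
        self.batches_added += 1

    def set_override_column_schemas(self, overrides: Mapping[str, Any]) -> None:
        # Overrides are rejected once the first batch has been added.
        if self.batches_added > 0:
            raise ValueError(
                "Override schemas must be set before the first batch is added"
            )
        self.override_column_schemas.update(overrides)


holder = ToySchemaHolder()
holder.set_override_column_schemas({"column1": "schema-placeholder"})  # accepted
holder.add_batch({"column1": [1, 2, 3]})

try:
    holder.set_override_column_schemas({"column2": "schema-placeholder"})
    rejected = False
except ValueError:
    rejected = True  # too late: a batch has already been added
```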