tlc.core.writers.table_writer
Write batches of rows to persistent storage.
Module Contents
Classes
Class | Description |
---|---|
TableWriter | A class for writing batches of rows to persistent storage. |
Data
Data | Description |
---|---|
COLUMN_NAME_REGEX | |
API
- tlc.core.writers.table_writer.COLUMN_NAME_REGEX = compile(...)
- class tlc.core.writers.table_writer.TableWriter(table_name: str = _3LC_FALLBACK_TABLE_NAME, dataset_name: str = _3LC_FALLBACK_DATASET_NAME, project_name: str = _3LC_FALLBACK_PROJECT_NAME, description: str = '', column_schemas: typing.Mapping[str, tlc.client.sample_type._SampleTypeStructure] | None = None, if_exists: typing.Literal['overwrite', 'rename', 'raise'] = 'rename', root_url: tlc.core.url.Url | str | None = None, input_tables: list[tlc.core.url.Url] | None = None, *, table_url: tlc.core.url.Url | str | None = None)
A class for writing batches of rows to persistent storage.
This class is primarily used to write structured data to Parquet files. It supports batched writes and manages the column schemas.
- Example:
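    from tlc.core.writers.table_writer import TableWriter

    # Identify the table by project, dataset, and table name.
    table_writer = TableWriter(
        project_name="My Project",
        dataset_name="My Dataset",
        table_name="My Table",
    )

    # Add three rows at once: each column maps to a list of values.
    table_writer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})

    # Add a single row: each column maps to one value.
    table_writer.add_row({"column1": 4, "column2": "d"})

    # Write everything to disk and get the resulting table.
    table = table_writer.finalize()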
Initialize a TableWriter.
- Parameters:
table_name – The name of the table, defaults to “table”.
dataset_name – The name of the dataset, defaults to “default-dataset”.
project_name – The name of the project, defaults to “default-project”.
description – An optional description of the table.
column_schemas – Optional schemas to override the default inferred column schemas.
if_exists – What to do if the table already exists: one of “overwrite”, “rename”, or “raise”; defaults to “rename”.
table_url – An optional URL that explicitly specifies where the table is written. Mutually exclusive with table_name, dataset_name, and project_name.
root_url – The root URL to write the table to. If not provided, the default root URL is used.
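The mutual exclusivity between table_url and the three name parameters can be pictured with a small standalone sketch. It is not the library's code: `resolve_target` is a hypothetical helper, the `None` defaults, the use of `ValueError`, and the `project/dataset/table` path layout are all assumptions made for illustration, and the fallback names are simply the defaults quoted in the parameter list above.

```python
from typing import Optional


def resolve_target(
    table_name: Optional[str] = None,
    dataset_name: Optional[str] = None,
    project_name: Optional[str] = None,
    table_url: Optional[str] = None,
) -> str:
    """Hypothetical helper: decide where a table would be written."""
    names = {
        "table_name": table_name,
        "dataset_name": dataset_name,
        "project_name": project_name,
    }
    given = [key for key, value in names.items() if value is not None]

    if table_url is not None:
        if given:
            # Assumption: a conflict is reported as a ValueError.
            raise ValueError(
                f"table_url cannot be combined with: {', '.join(given)}"
            )
        return table_url

    # Fall back to the documented default names; the path layout is illustrative only.
    project = project_name or "default-project"
    dataset = dataset_name or "default-dataset"
    table = table_name or "table"
    return f"{project}/{dataset}/{table}"
```

Either the explicit URL or the name-based location is used, never a mix of the two.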
- add_row(table_row: MutableMapping[str, tlc.core.builtins.types.MetricData]) → None
Add a single row to the table being written.
- Parameters:
table_row – A dictionary mapping column names to values.
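One way to relate a row to a batch is to treat the row as a batch of length one. The sketch below illustrates that relationship only; `row_to_batch` is a hypothetical helper, and nothing here claims that the library implements add_row this way.

```python
from typing import Any, Dict, List, Mapping


def row_to_batch(table_row: Mapping[str, Any]) -> Dict[str, List[Any]]:
    """Wrap each value of a single row in a one-element list."""
    return {column: [value] for column, value in table_row.items()}


batch = row_to_batch({"column1": 4, "column2": "d"})
print(batch)  # {'column1': [4], 'column2': ['d']}
```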
- add_batch(table_batch: MutableMapping[str, tlc.core.builtins.types.MetricData]) → None
Add a batch of rows to the buffer for writing.
This method validates the consistency of the batch and appends it to the buffer. When the buffer reaches its maximum size, it is automatically flushed to disk.
- Parameters:
table_batch – A dictionary mapping column names to lists of values.
- Raises:
ValueError – If the columns in the batch have unequal lengths or do not match the existing columns.
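The two consistency checks named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's validation code: `validate_batch` and its `known_columns` argument are hypothetical names, and the exact error messages are invented.

```python
from typing import Any, Mapping, Optional, Sequence, Set


def validate_batch(
    table_batch: Mapping[str, Sequence[Any]],
    known_columns: Optional[Set[str]] = None,
) -> int:
    """Return the number of rows in the batch, or raise ValueError if it is inconsistent."""
    lengths = {column: len(values) for column, values in table_batch.items()}

    # Check 1: every column must contribute the same number of rows.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have unequal lengths: {lengths}")

    # Check 2: the column names must match those seen in earlier batches.
    if known_columns is not None and set(table_batch) != known_columns:
        raise ValueError(
            f"Batch columns {sorted(table_batch)} do not match "
            f"existing columns {sorted(known_columns)}"
        )

    # An empty batch has zero rows.
    return next(iter(lengths.values()), 0)
```

For example, `validate_batch({"column1": [1, 2], "column2": ["a"]})` raises `ValueError` because the two columns differ in length.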
- finalize() → tlc.core.objects.table.Table
Write all added batches to disk and return the written table.
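The interplay between the buffering described for add_batch and the final write in finalize can be sketched with an in-memory stand-in. Everything here is an assumption for illustration: `BufferedRowWriter`, `max_buffered_rows`, and `flushed_chunks` are hypothetical names, the "disk" is a Python list, validation is omitted, and the real finalize returns a Table rather than a list of chunks.

```python
from typing import Any, Dict, List, Mapping, Sequence


class BufferedRowWriter:
    """Hypothetical stand-in for a batch writer; it "writes" to an in-memory list."""

    def __init__(self, max_buffered_rows: int = 4) -> None:
        self.max_buffered_rows = max_buffered_rows  # assumed flush threshold
        self.buffer: Dict[str, List[Any]] = {}
        # Stands in for files written to disk.
        self.flushed_chunks: List[Dict[str, List[Any]]] = []

    def _buffered_rows(self) -> int:
        # All columns have the same length, so inspect any one of them.
        return len(next(iter(self.buffer.values()), []))

    def add_batch(self, table_batch: Mapping[str, Sequence[Any]]) -> None:
        # Append the new values column by column.
        for column, values in table_batch.items():
            self.buffer.setdefault(column, []).extend(values)
        # Flush automatically once the buffer reaches its maximum size.
        if self._buffered_rows() >= self.max_buffered_rows:
            self._flush()

    def _flush(self) -> None:
        if self._buffered_rows() > 0:
            self.flushed_chunks.append(self.buffer)
            self.buffer = {}

    def finalize(self) -> List[Dict[str, List[Any]]]:
        # Write whatever is still buffered, then hand back the result.
        self._flush()
        return self.flushed_chunks


writer = BufferedRowWriter(max_buffered_rows=4)
writer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})  # 3 rows: stays buffered
writer.add_batch({"column1": [4], "column2": ["d"]})  # 4 rows: triggers a flush
writer.add_batch({"column1": [5], "column2": ["e"]})  # 1 row: stays buffered
chunks = writer.finalize()  # flushes the remaining row
print(len(chunks))  # 2
```

The point of the sketch is that rows added after the last automatic flush reach storage only when finalize is called.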