tlc

3LC (Three Lines of Code) Python Package.

3LC is a tool for understanding and improving machine learning models and datasets. The tlc package is the Python entry point: it constructs and reads 3LC tlc.Tables and tlc.Runs, collects per-sample metrics, and serves data to the 3LC Dashboard through the Object Service.

The top-level tlc namespace is the supported public interface. The names listed in tlc.__all__, together with the curated sub-namespaces listed below, form the stable API and follow semantic versioning: breaking changes are reserved for major releases, additions land in minor releases, and patch releases are non-breaking.

Anything underscored is private. tlc._core and any module or attribute whose name starts with _ may move, rename, or be removed at any time without notice. Reach into them only when nothing in the public surface fits, and expect to update on every release.
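As a rough illustration of the naming rule above, the following sketch flags dotted paths that reach into private names. The helper is_private_path is hypothetical and not part of tlc. Treating dunder names such as __all__ as non-private is an assumption made here.

```python
def is_private_path(dotted_path: str) -> bool:
    """Return True if any component of a dotted path starts with an underscore.

    Hypothetical helper, not part of tlc. Dunder names such as ``__all__``
    are treated as non-private, which is an assumption of this sketch.
    """
    for part in dotted_path.split("."):
        is_dunder = len(part) > 4 and part.startswith("__") and part.endswith("__")
        if part.startswith("_") and not is_dunder:
            return True
    return False


print(is_private_path("tlc.Table"))                       # public name
print(is_private_path("tlc._core.writers.table_writer"))  # private module
```

A check like this could run in a linter step to catch accidental imports from private modules before an upgrade.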

Modules and Packages

  • configuration – The 3LC configuration types.

  • constants – Public constants for the tlc package.

  • data_types – Data-bearing types for working with 3LC Table data.

  • export – Exporters for converting 3LC tables into common dataset formats.

  • helpers – Utility helper classes for working with 3LC concepts.

  • integration – Integrations with third party libraries which are optional dependencies.

  • metrics – Core functionality for collecting per-sample metrics with a model on a tlc.Table, writing them to a tlc.Run. This includes a variety of metrics_collectors which can be applied during metrics collection inference passes through tlc.collect_metrics().

  • objects – Base classes for the 3LC object hierarchy.

  • reduction – Dimensionality reduction methods and utilities.

  • sample_types – Built-in and custom sample types for 3LC tables.

  • schemas – Built-in schemas for describing 3LC table columns.

  • url – Public URL API.

Package Contents

Classes

  • MetricsTableWriter – A class for writing metrics tables to runs.

  • Run – Represents a single execution of a specific process or experiment.

  • Schema – A recursive structure which defines the layout of an object.

  • Table – The abstract base class for all Table types.

  • TableView – A map-style view over a tlc.Table that applies a sample-level transform on every read.

  • TableWriter – A class for writing batches of rows to persistent storage.

  • Url – A class which represents a URL.

Functions

  • active_project_name – Return the active project name, if any.

  • active_run – Return the active Run, if any.

  • close – Close a run session.

  • collect_metrics – Collect per-sample metrics with a map-style dataset.

  • init – Initialize a 3LC Run.

  • log – Log output data to the active Run or a specified Run.

  • set_active_run – Set the active Run.

Data

  • config – A lazy alias for the live Configuration singleton. Use this to access and modify the live configuration.

API

class MetricsTableWriter(
*,
run_url: Url | str | None = None,
foreign_table_url: Url | str = '',
schema: dict[str, Schema] | None = None,
stream_name: str = 'default_stream',
)

Bases: tlc._core.writers.table_writer.TableWriter

A class for writing metrics tables to runs.

Calling finalize() writes the metrics table to persistent storage and automatically updates the corresponding Run to reference the newly written table.

If a foreign_table_url is supplied, the written metrics table will also be associated with the given foreign table, indicating that each metric value is associated with a specific row in the foreign table.

For this to work, each added metrics batch must contain a column called example_id. This is the foreign key that links the metrics table to the foreign table. The values of example_id are linear indices into the foreign table, starting from 0. A single metrics table can contain multiple values for the same example_id, and does not need to contain values for all example_ids in the foreign table.

Example:

import tlc
from tlc import MetricsTableWriter

# Assuming an input table of length 8 exists at the URL "input_table_url"

run = tlc.init()

with MetricsTableWriter(
    run_url=run.url,
    foreign_table_url="input_table_url",
) as metrics_writer:
    # First batch of metrics, corresponding to the first 4 rows of the foreign table
    metrics_writer.add_batch({
        "loss": [0.1, 0.2, 0.3, 0.4], "example_id": [0, 1, 2, 3],
    })

    # Second batch of metrics, corresponding to the last 4 rows of the foreign table
    metrics_writer.add_batch({
        "loss": [0.2, 0.4, 0.1, 0.5], "example_id": [4, 5, 6, 7],
    })

# The run is automatically updated with the written metrics table on finalize/exit.
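The role of example_id as a foreign key can also be pictured without the library. The sketch below is a toy model in plain Python, not tlc code: the file names and loss values are made up, and the range check is an assumption of the sketch. It shows one foreign row referenced twice and another not referenced at all, both of which the description above allows.

```python
from collections import defaultdict

# Toy stand-ins: a foreign table with 4 rows and a metrics table that
# references it by linear row index through "example_id".
foreign_rows = ["cat.png", "dog.png", "bird.png", "fish.png"]
metrics = {
    "loss": [0.9, 0.4, 0.7, 0.2],
    "example_id": [0, 2, 2, 3],  # row 2 appears twice, row 1 not at all
}

# Group metric values by the foreign row they point to.
loss_by_row = defaultdict(list)
for example_id, loss in zip(metrics["example_id"], metrics["loss"]):
    if not 0 <= example_id < len(foreign_rows):
        raise IndexError(f"example_id {example_id} is out of range")
    loss_by_row[example_id].append(loss)

for index, name in enumerate(foreign_rows):
    print(index, name, loss_by_row.get(index, []))
```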

Initialize a MetricsTableWriter.

Parameters:
  • run_url – The Url of the run to write metrics for. Will default to the active run if not provided.

  • foreign_table_url – The Url of the dataset to write metrics for.

  • schema – A dictionary of column names to schema overrides. Schemas will be inferred from the data if not provided.

  • stream_name – Display label for the dashboard group this table joins into. Tables sharing a column signature should use the same stream_name — conflicting names within a group are silently dropped in favor of a generic label.

finalize() Table

Write all added batches to persistent storage, update the run, and return the written table.

Returns:

The written metrics table.

Raises:

RuntimeError – If finalize() has already been called on this writer.

get_written_metrics_infos() Sequence[Mapping[str, Any]]

Get the list of written metrics infos.

Returns:

A list of written metrics infos. The returned Urls are relative to the run’s Url.

class Run(
*,
url: Url | None = None,
created: str | None = None,
last_modified: str | None = None,
description: str | None = None,
metrics: list[dict[str, Any]] | None = None,
constants: dict[str, Any] | None = None,
status: float | None = None,
init_parameters: Any = None,
)

Bases: tlc._core.objects.mutable_object.MutableObject

Represents a single execution of a specific process or experiment.

Warning

Do not instantiate this class directly. Use one of the Run.from_* methods or tlc.init() instead.

A Run object encapsulates details about its setup, execution, metadata, and metrics.

Run objects are mutable, allowing for updates to run attributes as they progress or as additional information becomes available.

Create a Run object.

Parameters:
  • url – The URL of the run.

  • created – The creation timestamp.

  • last_modified – The last modified timestamp.

  • description – The description of the run.

  • metrics – A list of metrics captured during this run.

  • constants – Constant values used during this run.

  • status – The status of the run.

  • init_parameters – Parameters used during object initialization.

add_input_table(
input_table: Table | Url | str,
) None

Adds an input table to the run.

This updates the Run object to include the input table in the list of inputs to the Run.

Parameters:

input_table – The input table to add.

add_input_value(
input_value: dict[str, Any],
) None

Adds a value to the inputs of the run.

Parameters:

input_value – The value to add.

add_metrics(
metrics: dict[str, Any],
*,
schema: dict[str, Schema] | None = None,
foreign_table_url: Url | str | None = None,
constants: dict[str, Any] | None = None,
) Sequence[Mapping[str, Any]]

Write the provided metrics to a Table and associate it with the run.

Parameters:
  • metrics – The metrics data (dict of column names to column data) to write.

  • schema – The schemas for the metrics data.

  • foreign_table_url – The URL of the table to associate with the metrics data. If provided, the metrics data will be augmented with extra columns to identify the example ID and the foreign table, if these columns are not already present. If the metrics data does not correspond 1-to-1 with the table, ensure the metrics data includes an “example_id” column.

  • constants – The constants to add to the run.

Returns:

The written table info.

Raises:
  • ValueError – If the number of rows in the metrics data does not match the number of rows in the table, or the foreign_table_url is not a valid URL.

  • FileNotFoundError – If the foreign_table_url cannot be found.

add_metrics_table(
metrics_table: Table | Url,
) None

Add a metrics table to the run.

Parameters:

metrics_table – The metrics table to add.

add_output_value(
output_value: dict[str, Any],
) None

Adds a value to the outputs of the run.

Parameters:

output_value – The value to add.

property bulk_data_url: Url

Returns the URL of the bulk data for this run.

copy(
*,
run_name: str | None = None,
project_name: str | None = None,
root_url: tlc.Url | str | None = None,
if_exists: typing.Literal['raise', 'rename', 'overwrite'] = 'raise',
destination_url: tlc.Url | str | None = None,
) Run

Create a copy of this run.

The copy is performed to:

  1. A URL derived from the given project_name, run_name, and root_url if given

  2. destination_url, if given

  3. A generated URL derived from the run’s URL, if none of the above are given

Parameters:
  • destination_url – The URL to copy the run to.

  • project_name – The name of the project to create the run in.

  • run_name – The name of the run to create.

  • root_url – The root URL to create the run in.

  • if_exists – What to do if the destination URL already exists.

Returns:

The copied run.

static from_names(
*,
project_name: str | None = None,
run_name: str | None = None,
root_url: Url | str | None = None,
) Run

Creates a Run instance from the names specifying the URL of an existing Run.

Parameters:
  • project_name – The name of the project.

  • run_name – The name of the run.

  • root_url – The root url to use instead of the default root url.

Returns:

The Run at the resulting url.

static from_url(
url: Url | str,
) Run

Creates a Run instance from the URL of an existing Run.

Parameters:

url – The URL to the Run object.

Returns:

The Run object.

property metrics_tables: list[Table]

Returns a list of the metrics tables for this run.

property name: str

The name of the run.

reduce_embeddings_by_foreign_table_url(
foreign_table_url: Url | str,
*,
delete_source_tables: bool = True,
**kwargs: Any,
) dict[Url, Url]

Reduces all metrics tables in a Run using a reducer trained on the embeddings in a specified metrics table.

See tlc.reduction.reduce.reduce_embeddings_by_foreign_table_url() for more information.

Parameters:
  • foreign_table_url – The Url of the foreign table to use for reduction.

  • delete_source_tables – If True, the source metrics tables will be deleted after reduction.

  • **kwargs – Additional keyword arguments.

Returns:

A dictionary mapping the original table URLs to the reduced table URLs.

reduce_embeddings_per_dataset(
*,
delete_source_tables: bool = True,
**kwargs: Any,
) dict[Url, Url]

Reduces the embeddings for each dataset in this run.

See tlc.reduction.reduce.reduce_embeddings_per_dataset() for more information.

Parameters:
  • delete_source_tables – If True, the source metrics tables will be deleted after reduction.

  • **kwargs – Additional keyword arguments.

Returns:

A dictionary mapping the original table URLs to the reduced table URLs.

set_description(
description: str,
) None

Set the description of the run.

Parameters:

description – The description to set.

set_parameters(
parameters: dict[str, Any],
) None

Set the parameters of the run.

Parameters:

parameters – The parameters to set.

set_status_cancelled() None

Set the status of the run to cancelled.

set_status_collecting() None

Set the status of the run to collecting.

set_status_completed() None

Set the status of the run to completed.

set_status_empty() None

Set the status of the run to empty.

set_status_paused() None

Set the status of the run to paused.

set_status_post_processing() None

Set the status of the run to post processing.

set_status_running() None

Set the status of the run to running.

update_metrics(
metric_infos: Sequence[Mapping[str, Any]] | None = None,
) None

Add new metrics to the run.

Any metrics that are already present in the run will not be added again.

Parameters:

metric_infos – A list of MetricTableInfo dicts to add to the run.
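The "not added again" behaviour can be sketched as a de-duplicating merge. This is a toy model: the "url" key and file names are hypothetical, and comparing entries by full dict equality is an assumption, not necessarily how tlc identifies an existing entry.

```python
from typing import Any, Mapping, Sequence


def merge_metric_infos(
    existing: Sequence[Mapping[str, Any]],
    new: Sequence[Mapping[str, Any]],
) -> list[dict[str, Any]]:
    """Append entries from ``new`` that are not already in ``existing``.

    Toy model only: entries are compared by full dict equality, which is an
    assumption of this sketch.
    """
    merged = [dict(info) for info in existing]
    for info in new:
        if dict(info) not in merged:
            merged.append(dict(info))
    return merged


# Hypothetical entries; the "url" key is used for illustration only.
current = [{"url": "metrics_0000.parquet"}]
incoming = [{"url": "metrics_0000.parquet"}, {"url": "metrics_0001.parquet"}]
print(merge_metric_infos(current, incoming))
```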

class Schema(
*,
display_name: str = '',
description: str = '',
writable: bool = True,
display_importance: float = 0,
value: ScalarValue | None = None,
values: dict[str, Schema] | None = None,
composite_role: str = '',
display_color: str = '',
swap_group: str = '',
computable: bool = True,
transient: bool = False,
default_visible: bool = True,
size0: DimensionNumericValue | None = None,
size1: DimensionNumericValue | None = None,
size2: DimensionNumericValue | None = None,
size3: DimensionNumericValue | None = None,
size4: DimensionNumericValue | None = None,
size5: DimensionNumericValue | None = None,
metadata: dict[str, Any] | None = None,
default_value: Any | None = None,
array_signature_group: str | None = None,
number_role_u: str | None = None,
number_role_v: str | None = None,
bulk_data_location: str | Url | None = None,
sample_type: str | None = None,
)

A schema is a recursive structure which defines the layout of an object. It defines what elements the object consists of, which must be either

  • Atomic type (with optional metadata, e.g. value range, unit, etc.) OR

  • Composite contents (a list of schemas describing the sub-object)

In addition, it defines HOW MANY of these scalar or composite elements exist, in the form of up to six dimensions, each of which can be described separately and can be of fixed or variable length. The default size of dimensions is 1, describing a scalar value.

Schemas are used for

  • Defining the layout of Objects (as reported by e.g. “MyObject.schema”)

  • In the case of Tables: defining the common layout of all table rows (as reported by e.g. “MyTableObject.schema.values["rows"]”)

In the case where a schema defines a “top-level” object, it will always have a ‘values’ attribute (since it is always a composite object, and does not comprise only a single atomic value).

Initialize a Schema.

A schema is either atomic (has a value) or composite (has values). Exactly one of value or values must be provided.
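The "exactly one of value or values" rule can be pictured with a small stand-in class. ToySchema below is hypothetical and much simpler than tlc.Schema: atomic types are plain strings, and raising ValueError on a violation is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySchema:
    """Simplified stand-in for a schema node; not the tlc.Schema class."""

    value: Optional[str] = None                      # atomic type name, e.g. "float32"
    values: Optional[dict[str, "ToySchema"]] = None  # child schemas by field name

    def __post_init__(self) -> None:
        # Exactly one of value / values must be provided.
        if (self.value is None) == (self.values is None):
            raise ValueError("provide exactly one of 'value' or 'values'")

    def is_atomic(self) -> bool:
        return self.value is not None

    def is_composite(self) -> bool:
        return self.values is not None


# A composite node whose children are atomic nodes.
point = ToySchema(values={"x": ToySchema(value="float32"), "y": ToySchema(value="float32")})
print(point.is_composite(), point.values["x"].is_atomic())
```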

Parameters:
  • display_name – Human-readable name shown in the Dashboard.

  • description – Description of this schema element.

  • writable – Whether the value is editable in the Dashboard.

  • display_importance – Ordering hint for Dashboard column display.

  • value – The atomic scalar type (e.g. Float32Value(), StringValue()). Mutually exclusive with values.

  • values – Mapping of field names to child schemas for composite types. Mutually exclusive with value.

  • composite_role – Semantic role for composite schemas (e.g. "bounding_boxes").

  • display_color – Color hint for Dashboard visualization.

  • swap_group – Group identifier for column swapping in the Dashboard.

  • computable – Whether this column can be recomputed from source data.

  • transient – Whether this column is excluded from serialization.

  • default_visible – Whether this column is visible by default in the Dashboard.

  • size0 – First dimension descriptor.

  • size1 – Second dimension descriptor.

  • size2 – Third dimension descriptor.

  • size3 – Fourth dimension descriptor.

  • size4 – Fifth dimension descriptor.

  • size5 – Sixth dimension descriptor.

  • metadata – Arbitrary key-value metadata attached to this schema element.

  • default_value – Default value for this schema element.

  • array_signature_group – Group identifier for arrays that share the same shape signature.

  • number_role_u – Semantic role for the U component of 2D numeric values.

  • number_role_v – Semantic role for the V component of 2D numeric values.

  • bulk_data_location – URL or path prefix where bulk data files are stored for this column. When set, the TableWriter externalizes column data to files under this location.

  • sample_type – Name of the registered sample type that converts between row form (serialized) and sample form (Python objects). For example, "pil_png", "numpy_array", "segmentation_polygons". The resolved instance is available via the resolved_sample_type property. None means identity (no conversion).

FLAT_ARRAY_SAMPLE_TYPES: ClassVar[frozenset[str]] = frozenset(...)

add_outer_dimension() Schema

Like push_dim, but adds an outer dimension (if possible).

add_sample_weight(
*,
hidden: bool = True,
default_value: float = 1.0,
) None

Adds a sample weight column to the schema.

Parameters:
  • hidden – Whether the column should be hidden

  • default_value – The default value for the sample weight column.

add_sub_schema(
name: str,
schema: Schema,
) None

Adds a Schema as a sub-property within this Schema (i.e. into the ‘values’ collection)

add_sub_value(
name: str,
value: ScalarValue,
*,
writable: bool = True,
computable: bool = True,
) None

Adds a scalar value as a sub-property within this Schema (i.e. into the ‘values’ collection)

consider_override_from(
override_schema: Schema | Mapping[str, object] | None,
) Schema

Selectively overwrite attributes in this schema with non-default ones from override_schema.

Merge semantics — sparse at both column and within-column level:

  • Columns missing from the override are left untouched.

  • Columns present in the override only overwrite attributes that differ from their default, so a partial override touches a handful of fields and inherits the rest from self.

  • Recursion into values / size0 mirrors the same rule per sub-schema.

This is the merge that backs Table.override_table_rows_schema. It assumes the override is structurally honest: fields that make dimensionality claims (size0, composite-vs-scalar) must not contradict self’s — if they do, downstream stages that consume the merged schema may produce incoherent data. The factory-level TableWriter(schema=...) / Table.from_*(schema=...) contract is stricter still (declared columns must be complete); see those APIs for that case.
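The sparse merge rule can be sketched with plain dictionaries. This is a toy model under stated assumptions, not the library's implementation: columns are flat dicts, DEFAULTS is a hypothetical subset of attribute defaults, and recursion into values / size0 is left out.

```python
# Attribute defaults of the toy column schema (hypothetical subset).
DEFAULTS = {"display_name": "", "description": "", "writable": True}


def merge_column(base: dict, override: dict) -> dict:
    """Overwrite attributes of ``base`` only where ``override`` is non-default."""
    merged = dict(base)
    for key, value in override.items():
        if value != DEFAULTS.get(key):
            merged[key] = value
    return merged


def merge_schema(base: dict, override: dict) -> dict:
    """Merge per column; columns missing from ``override`` are left untouched."""
    return {
        name: merge_column(column, override[name]) if name in override else dict(column)
        for name, column in base.items()
    }


base = {
    "image": {"display_name": "Image", "description": "Input image", "writable": False},
    "label": {"display_name": "Label", "description": "", "writable": True},
}
override = {"label": {"display_name": "Class", "description": "", "writable": True}}
merged = merge_schema(base, override)
print(merged["label"]["display_name"], merged["image"]["display_name"])
```

In this toy model an override cannot reset an attribute to its default, because a default value in the override is indistinguishable from "not specified".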

does_object_match(
_object: Any,
) bool

Checks whether a schema matches an example object.

This requires exact 1:1 mapping between attributes in the object and the schema (including recursively). This means no attributes can be missing, nor can there be any additional attributes only present in the object.
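A minimal sketch of such an exact structural match, assuming a toy layout in which composite nodes are nested dicts and any non-dict value marks a leaf. The function matches is hypothetical and does not check leaf types.

```python
from typing import Any


def matches(schema: Any, obj: Any) -> bool:
    """Toy structural check: nested dicts must have identical key sets.

    ``schema`` uses nested dicts for composite nodes and any non-dict value
    as a leaf marker. This layout is an assumption of the sketch.
    """
    if isinstance(schema, dict):
        if not isinstance(obj, dict) or schema.keys() != obj.keys():
            return False
        return all(matches(schema[key], obj[key]) for key in schema)
    # Leaf in the schema: the object must not carry nested attributes here.
    return not isinstance(obj, dict)


schema = {"label": "int", "bbox": {"x": "float", "y": "float"}}
print(matches(schema, {"label": 1, "bbox": {"x": 0.1, "y": 0.2}}))              # True
print(matches(schema, {"label": 1, "bbox": {"x": 0.1}}))                        # False
print(matches(schema, {"label": 1, "bbox": {"x": 0.1, "y": 0.2}, "extra": 0}))  # False
```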

static from_any(
any_object: Any,
) Schema

Returns a Schema object which has been populated from a serialized (possibly sparse) object

static from_json(
json_string: str,
) Schema

Returns a Schema object which has been populated from a JSON string

from_row(
row: Any,
) Any

Convert row form to sample form.

A column either has a real SampleType that owns the entire column value, or no transform (identity). Composite schemas recurse into children only when the column-level transform is identity; sample view of a composite is therefore always a dict.

File-storage columns are loaded via load(), not from_row(), so this method passes through the data unchanged for file-storage transforms.

When the sample type produces a numpy ndarray or torch Tensor and the schema’s value is numeric/bool, the result is cast to the schema’s declared dtype. This undoes the dtype widening that pyarrow’s to_pylist() introduces by materializing narrow scalars as Python ints/floats. Non-array results pass through the cast unchanged.

Parameters:

row – The row data to convert.

Returns:

The data in sample form.

classmethod from_sample(
sample: Any,
*,
all_arrays_are_fixed_size: bool = False,
) Schema

Infer a schema describing the provided Python value.

Parameters:
  • sample – The sample to create a schema from.

  • all_arrays_are_fixed_size – If True, all arrays will be marked as fixed size.

Returns:

The inferred schema.

classmethod from_schema_like(
schema_like: Schema | Mapping[str, Any],
) Schema

Convert a SchemaLike value to a Schema.

Always returns a fresh copy — the input is never mutated or shared.

  • Schema objects are deep-copied.

  • Mappings are treated as {column_name: SchemaLike} and converted recursively.

Parameters:

schema_like – A Schema or a mapping of column names to SchemaLike values.

Returns:

A new Schema (always a copy).

Raises:

TypeError – If keys are not strings or the input type is unsupported.
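The copy-on-convert behaviour described above can be sketched as follows. ToyNode and to_node are hypothetical stand-ins, not tlc classes: the node only holds child nodes, and the conversion mirrors the three documented cases (deep copy, recursive mapping conversion, TypeError).

```python
import copy
from typing import Any, Optional


class ToyNode:
    """Minimal stand-in for a schema: a node with a dict of child nodes."""

    def __init__(self, values: Optional[dict] = None) -> None:
        self.values = values or {}


def to_node(schema_like: Any) -> ToyNode:
    """Convert a node or a nested mapping into a fresh ToyNode tree."""
    if isinstance(schema_like, ToyNode):
        return copy.deepcopy(schema_like)  # the input is never shared
    if isinstance(schema_like, dict):
        if not all(isinstance(key, str) for key in schema_like):
            raise TypeError("column names must be strings")
        return ToyNode({key: to_node(child) for key, child in schema_like.items()})
    raise TypeError(f"unsupported type: {type(schema_like).__name__}")


leaf = ToyNode()
converted = to_node({"bbox": {"x": leaf}})
print(converted.values["bbox"].values["x"] is leaf)  # False
```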

is_atomic() bool

Return whether the schema is atomic, i.e. has a single value.

The opposite of is_composite.

Returns:

Whether the schema is atomic

is_composite() bool

Return whether the schema is composite, i.e. has multiple values.

The opposite of is_atomic.

Returns:

Whether the schema is composite

is_empty() bool

is_fixed_size() bool

Return whether the schema has fixed size.

This requires all dimensions to be fixed size.

is_scalar() bool

Return whether the schema is a scalar value

Sizes must be set in order of increasing dimension, without gaps. A schema with no sizes set is treated as a scalar.

last_dimension() DimensionNumericValue | None

Return the last (outermost) dimension of the Schema

pop_dim() DimensionNumericValue | None

Sets size5 to None and shifts all other dimensions left (size5 becomes size4, size4 becomes size3, etc.).

Returns:

The old size0

push_dim(
dim: DimensionNumericValue | None = None,
) DimensionNumericValue | None

Inserts dim as size0 and shifts all other dimensions right (size0 becomes size1, size1 becomes size2, etc.).

Parameters:

dim – The dimension to insert as size0

Returns:

The old size5
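The shifting done by pop_dim and push_dim can be modelled with a six-element list. ToyDims is a hypothetical stand-in: it uses plain strings where the real methods use DimensionNumericValue objects, and it follows the documented return values (pop_dim returns the old size0, push_dim returns the old size5).

```python
from typing import Optional


class ToyDims:
    """Six dimension slots, size0..size5; a stand-in for the size attributes."""

    def __init__(self) -> None:
        self.sizes: list[Optional[str]] = [None] * 6

    def push_dim(self, dim: Optional[str] = None) -> Optional[str]:
        # Insert at size0, shift everything right, return what fell off size5.
        old_size5 = self.sizes[5]
        self.sizes = [dim] + self.sizes[:5]
        return old_size5

    def pop_dim(self) -> Optional[str]:
        # Remove size0, shift everything left, leave size5 unset.
        old_size0 = self.sizes[0]
        self.sizes = self.sizes[1:] + [None]
        return old_size0


dims = ToyDims()
dims.push_dim("width")
dims.push_dim("height")
print(dims.sizes[:3])  # ['height', 'width', None]
print(dims.pop_dim())  # height
print(dims.sizes[:3])  # ['width', None, None]
```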

property resolved_sample_type: SampleType

Get the resolved SampleType instance for this schema.

Always returns a SampleType, never None. Resolution checks the explicit sample_type name through the legacy name mapping. Returns Identity if no transform is resolved, Hidden for hidden columns.

The result is cached and invalidated automatically when attributes that affect resolution are modified (via __setattr__).

Returns:

The resolved SampleType instance.

set_writable_flag_recursively(
writable: bool,
) None

Sets the writable flag recursively.

Parameters:

writable – Whether the schema is writable

to_json() str

Writes the contents of this schema to a JSON string. Note that

  • Default values are omitted for brevity

  • Schemas might be recursive
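Omitting default values during serialization can be sketched with a dataclass. ToyColumn and to_minimal_json are hypothetical and cover only three attributes. The real to_json() output format is not reproduced here.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ToyColumn:
    """Hypothetical three-attribute column description."""

    display_name: str = ""
    writable: bool = True
    display_importance: float = 0


def to_minimal_json(column: ToyColumn) -> str:
    """Serialize only the attributes that differ from their declared default."""
    minimal = {
        f.name: getattr(column, f.name)
        for f in fields(column)
        if getattr(column, f.name) != f.default
    }
    return json.dumps(minimal, sort_keys=True)


print(to_minimal_json(ToyColumn(display_name="Loss")))  # {"display_name": "Loss"}
print(to_minimal_json(ToyColumn()))                     # {}
```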

to_minimal_dict(
include_all: bool,
) dict[str, Any]

Add a minimal representation of this object to a dictionary for subsequent serialization to JSON

to_row(
sample: Any,
ctx: ExternalizationContext | None = None,
) Any

Convert sample form to row form.

A column either has a real SampleType that owns the conversion, or no transform (identity). Composite schemas recurse into children only when the column-level transform is identity.

External-storage leaves are routed through externalize() when ctx is supplied and the value is sample-form (per accepts()). The result is a URL string (typically absolute; the writer pipeline normalizes it to a table-relative string afterwards). Without ctx, or for values already in row form, the leaf is passed through unchanged — this lets a batch freely mix live samples with pre-externalized URL strings.

Inline transforms are called with (sample) only. When ctx is supplied the caller is operating through the pipeline, so row-form / None values are auto-detected via accepts() and passed through instead of handed to a transform that would otherwise crash.

Parameters:
  • sample – The sample data to convert.

  • ctx – Optional externalization context. Supplied by the write pipeline when externalization should happen inline; omitted by call sites that only want structural conversion.

Returns:

The data in row form (or a URL string for externalized leaves when ctx is supplied; the pipeline relativizes URL leaves afterwards).

validate_row(
row: Any,
path: str = '',
) list[ValidationError]

Validate row-form data against this schema.

Checks structural correctness (composite dict keys, dimension constraints) and leaf value type compatibility against ScalarValue descriptors. SampleType is not involved — this validates the storage representation only.

Parameters:
  • row – The data in row form to validate.

  • path – Dot-separated path prefix for error messages (used during recursion).

Returns:

A list of validation errors (empty if valid).
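Returning a list of errors with dot-separated paths, rather than raising on the first problem, might look like the sketch below. It is a toy model: composite nodes are dicts, leaves are Python types, and errors are plain strings instead of ValidationError objects. The function name toy_validate_row and the message wording are made up for illustration.

```python
from typing import Any


def toy_validate_row(schema: Any, row: Any, path: str = "") -> list[str]:
    """Collect error messages instead of raising on the first problem.

    Toy model: composite nodes are dicts, leaves are Python types.
    """
    if isinstance(schema, dict):
        if not isinstance(row, dict):
            return [f"{path or '<root>'}: expected a dict"]
        errors: list[str] = []
        for key, child in schema.items():
            child_path = f"{path}.{key}" if path else key
            if key not in row:
                errors.append(f"{child_path}: missing")
            else:
                errors.extend(toy_validate_row(child, row[key], child_path))
        return errors
    if not isinstance(row, schema):
        return [f"{path}: expected {schema.__name__}, got {type(row).__name__}"]
    return []


row_schema = {"label": int, "bbox": {"x": float, "y": float}}
print(toy_validate_row(row_schema, {"label": 3, "bbox": {"x": 0.5, "y": "top"}}))
```

An empty list means the row passed, so callers can write `if not errors:` without special-casing.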

validate_sample(
sample: Any,
) list[ValidationError]

Validate a sample (Python object) against this schema before conversion to row form.

Schemas with a real (non-identity) transform delegate to resolved_sample_type.validate_sample() and do not recurse. Identity-typed composite schemas recurse into child schemas.

Parameters:

sample – The Python object in sample form to validate.

Returns:

A list of validation errors (empty if valid).

class Table(
*,
url: Url | None = None,
created: str | None = None,
description: str | None = None,
row_cache_url: Url | None = None,
row_cache_populated: bool | None = None,
override_table_rows_schema: Any = None,
init_parameters: Any = None,
input_tables: list[Url] | None = None,
)

Bases: tlc._core.addressable_object.AddressableObject

The abstract base class for all Table types.

Warning

Do not instantiate this class directly. Use one of the Table.from_* methods instead.

A Table is an object with two specific responsibilities:

  1. Creating table rows on demand (either through the row-based access interface table_rows, or through the sample-based access interface provided by __getitem__).

  2. Creating a schema which describes the type of produced rows (through the rows_schema property)

Both types of produced data are determined by immutable properties defined by each particular Table type.

ALTERNATIVE INTERFACE/CACHING:

A full representation of all table rows can - for performance reasons - also be retrieved through the get_rows_as_binary method.

This method will try to retrieve a cached version of the table rows if

  • row_cache_url is non-empty AND

  • row_cache_populated is True

When this is the case, it is guaranteed that the schema property of the table is fully populated, including the nested ‘rows_schema’ property which defines the layout of all table rows.

When this cached version is NOT defined, however, get_rows_as_binary() needs to iterate over all rows to produce the data.

If row_cache_url is non-empty, the produced binary data will be cached to the specified location. After successful caching, the updated Table object will be written to its backing URL exactly once, now with ‘row_cache_populated’ set to True and with the schema fully updated. Also, the row_count property is guaranteed to be correct at this time.

Whether accessing data from a Table object later refers to this cached version (or produces the data itself) is implementation specific.
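The caching rules above can be condensed into a small decision function. This is a sketch under stated assumptions: CACHE is an in-memory dict standing in for persistent storage, toy_get_rows_as_binary is a hypothetical free function rather than the Table method, and schema or row_count updates are not modelled.

```python
from typing import Callable

# In-memory stand-in for persistent storage, keyed by cache URL.
CACHE: dict[str, bytes] = {}


def toy_get_rows_as_binary(
    row_cache_url: str,
    row_cache_populated: bool,
    produce_rows: Callable[[], bytes],
) -> tuple[bytes, bool]:
    """Return (data, row_cache_populated) following the two conditions above."""
    if row_cache_url and row_cache_populated:
        return CACHE[row_cache_url], True  # cached version is used
    data = produce_rows()                  # iterate over all rows
    if row_cache_url:
        CACHE[row_cache_url] = data        # cache for later reads
        return data, True
    return data, False


calls = []


def produce() -> bytes:
    calls.append(1)  # count how often rows are actually produced
    return b"rows"


data, populated = toy_get_rows_as_binary("cache/rows.bin", False, produce)
data, populated = toy_get_rows_as_binary("cache/rows.bin", populated, produce)
print(len(calls))  # 1
```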

STATE MUTABILITY:

As described above, Tables are constrained in how they are allowed to change state:

  • The data production parameters (“recipe”) of a table are immutable

  • The persisted JSON representation of a Table (e.g. on disk) can take on three different states, and each state can be written only once:

    1. Bare-bones recipe

    2. Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = False)

    3. Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = True)

Parameters:
  • url – The URL of the table.

  • created – The creation time of the table.

  • description – The description of the table.

  • row_cache_url – The URL of the row cache.

  • row_cache_populated – Whether the row cache is populated.

  • override_table_rows_schema – Sparse schema merged onto the table’s computed row schema via consider_override_from(). Unlike the factory-level schema kwarg on TableWriter and Table.from_* (which requires declared columns to be complete), this path tolerates sparseness at both the column and within-column level — it is meant for touching up individual attributes on an existing row schema (display names, value maps, sample-type customizations) without restating structure. The override must remain structurally honest with the underlying data: any size0 / composite-vs-scalar claim it makes is not allowed to contradict what the data actually contains.

  • init_parameters – The initial parameters of the table.

  • input_tables – A list of Table URLs that are considered direct predecessors in this table’s lineage. This parameter serves as an explicit mechanism for tracking table relationships beyond the automatic lineage tracing typically managed by subclasses.

add_column(
column_name: str,
values: list[object] | object,
*,
schema: Schema | None = None,
url: Url | None = None,
) Table

Create a derived table with a column added.

This method creates and returns a new revision of the table with a new column added.

Parameters:
  • column_name – The name of the column to add.

  • values – The values to add to the column. This can be a list of values, or a single value to be added to all rows.

  • schema – The schema of the column to add. If not provided, the schema will be inferred from the values.

  • url – The url to write the new table to. If not provided, the new table will be located next to the current table.

Returns:

A new table with the column added.
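The two accepted forms of values, and the fact that the original table is left unchanged, can be sketched on a plain list of row dicts. toy_add_column is hypothetical. Raising ValueError on a length mismatch is an assumption of the sketch, not documented behaviour.

```python
from typing import Any


def toy_add_column(rows: list[dict[str, Any]], name: str, values: Any) -> list[dict[str, Any]]:
    """Return new rows with ``name`` added; the input rows are not modified."""
    if isinstance(values, list):
        if len(values) != len(rows):
            raise ValueError("expected one value per row")
        column = values
    else:
        column = [values] * len(rows)  # a single value is repeated for every row
    return [{**row, name: value} for row, value in zip(rows, column)]


rows = [{"image": "a.png"}, {"image": "b.png"}]
with_split = toy_add_column(rows, "split", "train")
with_weight = toy_add_column(rows, "weight", [1.0, 0.5])
print(with_split)
print(rows)  # unchanged: the derived rows are a new list
```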

add_value_map_item(
value_path: str,
internal_name: str,
*,
display_name: str = '',
description: str = '',
display_color: str = '',
url: Url | str = '',
value: float | int | None = None,
edited_table_url: Url | str = '',
) Table

Add a value map item for a specified numeric value within the schema of the Table.

Adds a new value map item to the schema of the Table without overwriting existing items.

If the specified value or internal name already exists in the value map, this method will raise an error to prevent overwriting.

For more details on value maps, refer to the documentation for Table.set_value_map.

Parameters:
  • value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • internal_name – The internal name of the value map item. This is the primary identifier of the value map item.

  • display_name – The display name of the value map item.

  • description – The description of the value map item.

  • display_color – The display color of the value map item.

  • url – The url of the value map item.

  • value – The numeric value to add the value map item to. If not provided, the value will be the next available value in the value map (starting from 0).

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map item added.

Raises:

ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name already exists in the value map.
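The no-overwrite rule and the automatic choice of value can be sketched with a dict mapping numeric values to internal names. toy_add_value_map_item is hypothetical. Interpreting "next available value" as the smallest unused non-negative integer is an assumption of this sketch.

```python
from typing import Optional


def toy_add_value_map_item(
    value_map: dict[int, str],
    internal_name: str,
    value: Optional[int] = None,
) -> dict[int, str]:
    """Return a copy of ``value_map`` with one item added, never overwriting.

    Toy model mapping numeric value -> internal name. Picking the smallest
    unused non-negative integer when ``value`` is omitted is an assumption.
    """
    if internal_name in value_map.values():
        raise ValueError(f"internal name {internal_name!r} already exists")
    if value is None:
        value = 0
        while value in value_map:
            value += 1
    elif value in value_map:
        raise ValueError(f"value {value} already exists")
    return {**value_map, value: internal_name}


classes = {0: "cat", 1: "dog"}
classes = toy_add_value_map_item(classes, "bird")
print(classes)  # {0: 'cat', 1: 'dog', 2: 'bird'}
```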

property columns: list[str]

Return a list of column names for this table.

copy(
*,
table_name: str | None = None,
dataset_name: str | None = None,
project_name: str | None = None,
root_url: tlc.Url | str | None = None,
if_exists: typing.Literal['raise', 'rename', 'overwrite'] = 'raise',
destination_url: tlc.Url | str | None = None,
) Table

Create a copy of this table.

The copy is performed to:

  1. A URL derived from the given project_name, dataset_name, table_name, and root_url if given

  2. destination_url, if given

  3. A generated URL derived from the table’s URL, if none of the above are given

Parameters:
  • destination_url – The URL to copy the table to.

  • project_name – The name of the project to copy to.

  • dataset_name – The name of the dataset to copy to.

  • table_name – The name of the table to copy to.

  • root_url – The root URL to copy to.

  • if_exists – The behavior to use if the destination URL already exists.

Returns:

The copied table.

delete_column(
column_name: str,
*,
table_name: str | None = None,
table_url: Url | str = '',
description: str | None = None,
) Table

Create a derived table with a column deleted.

This method creates and returns a new revision of the table with a column deleted.

Parameters:
  • column_name – The name of the column to delete.

  • table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.

  • table_url – The url to write the new table to. If not provided, the new table will be located next to the current table.

  • description – A description of the table. If not provided, a default description will be used.

Returns:

A new table with the column deleted.

delete_columns(
column_names: Sequence[str],
*,
table_name: str | None = None,
table_url: Url | str = '',
description: str | None = None,
) Table

Create a derived table with columns deleted.

This method creates and returns a new revision of the table with the specified columns deleted.

Parameters:
  • column_names – The names of the columns to delete.

  • table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.

  • table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

  • description – A description of the table. If not provided, a default description will be used.

Returns:

A new table with the columns deleted.

delete_row(
index: int,
*,
table_name: str | None = None,
table_url: Url | str = '',
description: str | None = None,
) Table

Delete a row from a Table.

This method creates and returns a new revision of the table with the specified row deleted.

Parameters:
  • index – The index of the row to delete.

  • table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.

  • table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

  • description – A description of the table. If not provided, a default description will be used.

Returns:

A new table with the row deleted.

delete_rows(
indices: Sequence[int],
*,
table_name: str | None = None,
table_url: Url | str = '',
description: str | None = None,
) Table

Delete rows from a Table.

This method creates and returns a new revision of the table with the specified rows deleted.

Parameters:
  • indices – The indices of the rows to delete.

  • table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.

  • table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

  • description – A description of the table. If not provided, a default description will be used.

Returns:

A new table with the rows deleted.
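
Since every delete_* method returns a new revision, edits can be chained. The sketch below assumes `table` is an existing Table; the helper name `prune_table` and the description strings are hypothetical.

```python
def prune_table(table, row_indices, column_names):
    """Return a new revision of `table` without the given rows and columns.

    Each delete_* call returns a new Table, so the calls are chained and
    the original `table` object is never modified.
    """
    without_rows = table.delete_rows(
        list(row_indices), description="Removed flagged rows"
    )
    return without_rows.delete_columns(
        list(column_names), description="Removed unused columns"
    )
```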

delete_value_map(
value_path: str,
*,
edited_table_url: Url | str = '',
) Table

Delete a value map for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the Table with a deleted value map for a specific numeric value.

Parameters:
  • value_path – The path to the value whose value map should be deleted. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map deleted.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.

delete_value_map_item(
value_path: str,
*,
value: float | int | None = None,
internal_name: str = '',
edited_table_url: Url | str = '',
) Table

Delete a value map item for a specified numeric value within the schema of the Table.

Deletes a value map item from the schema of the Table, by numeric value or internal name.

For more details on value maps, refer to the documentation for Table.set_value_map.

Parameters:
  • value_path – The path to the value whose value map item should be deleted. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value – The numeric value of the value map item to delete. If not provided, the value map item will be deleted by internal name.

  • internal_name – The internal name of the value map item to delete. If not provided, the value map item will be deleted by numeric value.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map item deleted.

Raises:

ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name does not exist in the value map.
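
The selection rule described above (delete by value when value is given, otherwise by internal name) can be modelled on a plain dictionary. This is an illustrative sketch, not the library's implementation: the function name `delete_item` and the simplification of a value map to a `{numeric value: internal name}` dict are assumptions.

```python
from typing import Dict, Optional


def delete_item(
    value_map: Dict[float, str],
    *,
    value: Optional[float] = None,
    internal_name: str = "",
) -> Dict[float, str]:
    """Return a copy of `value_map` without the selected item."""
    if value is not None:
        # A numeric value was given, so it selects the item.
        if value not in value_map:
            raise ValueError(f"value {value!r} is not in the value map")
        key = value
    else:
        # No numeric value: fall back to the internal name.
        matches = [k for k, name in value_map.items() if name == internal_name]
        if not matches:
            raise ValueError(f"internal name {internal_name!r} is not in the value map")
        key = matches[0]
    return {k: name for k, name in value_map.items() if k != key}
```

As with the Table method, the input is left untouched and a new mapping is returned.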

ensure_complete_schema() None

Ensure that the table has a complete schema.

ensure_data_production_is_ready() None

Ensure that the table is ready to produce data.

This method is called before any access to the Table’s data is made. It ensures that the Table has performed any necessary data production steps. Normally a Table does not produce data until it is requested, but this method can be called to force data production.

Note that subsequent applications of this method will not change the data, as a Table is immutable.

ensure_dependent_properties() None

Ensure that the table sets row_count as required to reach a fully defined state.

export(
output_url: Url | str | Path,
format: str | None = None,
*,
weight_threshold: float = 0.0,
**kwargs: object,
) None

Export this table to the given output URL.

Writes the table’s rows to output_url in the specified format. Several built-in formats ship with 3LC; additional formats can be added by installing plugin packages or registering a custom Exporter subclass. Run 3lc exporters list or call list_exporters() to see what is available at runtime.

Format inference. If format is omitted, the format is chosen by calling can_export() on every registered exporter and picking the one with the highest priority. A table with bounding-box columns exported to a .json file will therefore pick coco over the generic json exporter. Ties at the highest priority raise ValueError and require an explicit format.
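
The inference rule can be sketched in a few lines. Everything here is a stand-in for illustration: `ExporterStub`, the two-argument `can_export` callable, the priority numbers, and the "bbs" column name are assumptions, not the real exporter interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class ExporterStub:
    """Hypothetical stand-in for a registered exporter."""

    name: str
    priority: int
    can_export: Callable[[Set[str], str], bool]


def infer_format(exporters: Iterable[ExporterStub], columns: Set[str], output_url: str) -> str:
    """Pick the highest-priority exporter that accepts the table and URL."""
    candidates = [e for e in exporters if e.can_export(columns, output_url)]
    if not candidates:
        raise ValueError("no format can be inferred")
    top = max(e.priority for e in candidates)
    best = [e for e in candidates if e.priority == top]
    if len(best) > 1:
        names = ", ".join(sorted(e.name for e in best))
        raise ValueError(f"ambiguous format ({names}); pass format= explicitly")
    return best[0].name


# Two stub exporters that both accept .json output; only "coco" needs boxes.
EXPORTERS = [
    ExporterStub("json", 0, lambda cols, url: url.endswith(".json")),
    ExporterStub("coco", 10, lambda cols, url: url.endswith(".json") and "bbs" in cols),
]
```

With these stubs, a table that has a "bbs" column resolves to "coco" for a .json output, while a table without it falls back to "json".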

Weight filtering. If the table has a weights column (see weights_column_name), rows with weight strictly less than weight_threshold are excluded. Tables without a weights column ignore this parameter.
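
The filtering rule is easy to get off by one, so here is a small model of it on plain dictionaries. The function name `rows_to_export` and the row representation are hypothetical; only the comparison follows the text above.

```python
from typing import Dict, Iterable, List, Optional


def rows_to_export(
    rows: Iterable[Dict[str, object]],
    weight_threshold: float = 0.0,
    weights_column: Optional[str] = None,
) -> List[Dict[str, object]]:
    """Keep rows whose weight is at least `weight_threshold`."""
    if weights_column is None:
        # No weights column: the threshold is ignored.
        return list(rows)
    # Rows with weight strictly less than the threshold are excluded.
    return [row for row in rows if row[weights_column] >= weight_threshold]
```

One consequence of the strict comparison is that, with the default threshold of 0.0, rows whose weight is exactly 0.0 are still exported.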

Format-specific arguments. Each exporter declares its own keyword arguments — e.g. indent and image_folder for COCO, or split for YOLO. Pass them as **kwargs. Unknown kwargs trigger a warning and are dropped. The full list for each exporter is available in the corresponding class docstring under tlc.export.exporters.

Parameters:
  • output_url – The output URL, path, or string. Directory outputs (e.g. YOLO) do not need an extension.

  • format – The export format (e.g. "csv", "coco"). If None, inferred from the table and output_url.

  • weight_threshold – Minimum row weight to include (default 0.0). Ignored if the table has no weights column.

  • **kwargs – Additional format-specific arguments. See the exporter’s class docstring for valid keys.

Raises:

ValueError – If format is specified but no matching exporter is registered; if no format can be inferred; or if the table content is incompatible with the chosen format.

static from_coco(
annotations_file: str | pathlib.Path | tlc.Url,
image_folder: str | pathlib.Path | tlc.Url | None = None,
*,
keep_crowd_annotations: bool = True,
task: typing.Literal[detect, segment, pose] = 'detect',
segmentation_format: typing.Literal[polygons, masks] | None = None,
points: list[float] | None = None,
point_attributes: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None,
lines: list[int] | None = None,
line_attributes: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None,
triangles: list[int] | None = None,
triangle_attributes: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None,
flip_indices: list[int] | None = None,
oks_sigmas: list[float] | None = None,
per_instance_extras: collections.abc.Sequence[str] | collections.abc.Mapping[str, tlc.schemas._schema.Schema] | None = None,
per_image_extras: collections.abc.Sequence[str] | collections.abc.Mapping[str, tlc.schemas._schema.Schema] | None = None,
schema: Schema | Mapping[str, SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a COCO annotations file.

Note:

image_folder is kept positional alongside annotations_file so the common Table.from_coco(file, folder) form remains ergonomic. All other parameters are keyword-only.

Parameters:
  • annotations_file – The url of the COCO annotations file.

  • image_folder – The url of the folder containing the images referenced in the COCO annotations file. If not provided, the image paths in the annotations file are assumed to be either absolute or relative to the annotations file.

  • keep_crowd_annotations – Whether to include annotations with iscrowd=1 in the Table.

  • task – The task of the dataset. Can be either ‘detect’, ‘segment’, or ‘pose’.

  • segmentation_format – The format of the segmentation. Can be either ‘polygons’ or ‘masks’.

  • points – Default keypoint coordinates, used for drawing new instances in the Dashboard. Pose only.

  • point_attributes – Attributes for each keypoint (e.g. name or color). Pose only.

  • lines – Default skeleton topology for pose. Will override the skeleton provided in the annotations file. Pose only.

  • line_attributes – Attributes for each line (e.g. name or color). Pose only.

  • triangles – Triangles for pose.

  • triangle_attributes – Attributes for each triangle (e.g. name or color). Pose only.

  • flip_indices – Flip indices for pose.

  • oks_sigmas – OKS sigmas for pose.

  • per_instance_extras – Annotation-level extra fields to preserve as per-instance metadata. Pass a list of annotation key names to auto-infer schemas from the data, or a dict mapping key names to explicit Schema objects. Values must be present in every annotation.

  • per_image_extras – Image-level extra fields to preserve as top-level table columns. Pass a list of image key names to auto-infer schemas, or a dict mapping key names to explicit Schema objects. Values must be present in every image entry.

  • schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the provided COCO format dataset.
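
A usage sketch for the common detection case. The wrapper name `load_coco_detection_table` and the project, dataset, and table names are placeholders; the call itself uses only parameters from the signature above, and tlc is imported lazily so the snippet can be loaded without 3LC installed.

```python
def load_coco_detection_table(annotations_file, image_folder):
    """Create (or reuse) a detection Table from a COCO annotations file."""
    import tlc  # imported lazily; requires the 3LC package to be installed

    return tlc.Table.from_coco(
        annotations_file,
        image_folder,  # the only other positional parameter
        task="detect",
        keep_crowd_annotations=False,  # leave out iscrowd=1 annotations
        project_name="coco-demo",  # placeholder names
        dataset_name="val",
        table_name="initial",
    )
```

With the default if_exists='reuse', calling the helper a second time with the same names is expected to return the existing table rather than write a new one.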

static from_csv(
csv_file: str | pathlib.Path | tlc.Url,
*,
schema: Schema | Mapping[str, SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a .csv file.

Parameters:
  • csv_file – The url of the .csv file.

  • schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the CSV file.
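
The location arguments shared by the from_* constructors come in two mutually exclusive forms: an explicit table_url, or any combination of root_url, project_name, dataset_name, and table_name. A caller-side pre-flight check might look like the sketch below. The helper `check_table_location` is hypothetical, and how the library itself reports a conflict is not stated here.

```python
def check_table_location(
    *,
    table_url=None,
    root_url=None,
    project_name=None,
    dataset_name=None,
    table_name=None,
):
    """Return which addressing form is in use, rejecting a mix of both."""
    name_based = {
        "root_url": root_url,
        "project_name": project_name,
        "dataset_name": dataset_name,
        "table_name": table_name,
    }
    conflicting = sorted(k for k, v in name_based.items() if v is not None)
    if table_url is not None and conflicting:
        raise ValueError(
            "table_url cannot be combined with: " + ", ".join(conflicting)
        )
    return "table_url" if table_url is not None else "names"
```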

static from_dict(
data: collections.abc.Mapping[str, object],
*,
schema: Schema | Mapping[str, SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a dictionary.

Parameters:
  • data – The dictionary to create the table from.

  • schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table. The schema may be sparse at the column level (omit columns you don’t care about), but every column that is declared must be complete; see TableWriter for the full factory contract.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects. Extra columns are marked with sample_type={"name": "hidden"}.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the dictionary.
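
A usage sketch, under the assumption that `data` is column-oriented (each key is a column name and each value is a list with one entry per row). The helper name `table_from_columns`, the length check, and the project and dataset names are illustrative additions, not part of the API.

```python
def table_from_columns(columns, *, table_name):
    """Validate column lengths, then hand the mapping to Table.from_dict."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Caller-side guard: every column should describe the same rows.
        raise ValueError(f"columns differ in length: {lengths}")

    import tlc  # imported lazily; requires the 3LC package to be installed

    return tlc.Table.from_dict(
        columns,
        project_name="demo",  # placeholder names
        dataset_name="toy",
        table_name=table_name,
    )
```

For example, `table_from_columns({"text": ["a", "b"], "label": [0, 1]}, table_name="initial")` would describe a two-row table with two columns.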

static from_hugging_face_dataset(
hf_dataset: datasets.Dataset,
*,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from an in-memory Hugging Face datasets.Dataset.

This is useful when the dataset has been constructed programmatically, filtered, or loaded locally.

Parameters:
  • hf_dataset – An in-memory datasets.Dataset instance.

  • table_name – The name of the table. If not provided, derived from the dataset’s split or defaults to "data".

  • dataset_name – The name of the dataset. If not provided, derived from hf_dataset.info.dataset_name or defaults to "hf-dataset".

  • project_name – The name of the project. If not provided, derived from hf_dataset.info.dataset_name or defaults to "hf-dataset".

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the in-memory Hugging Face dataset.

static from_hugging_face_hub(
path: str,
name: str | None = None,
split: str = 'train',
*,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a Hugging Face Hub dataset, similar to the datasets.load_dataset function.

Note:

path, name and split are kept positional to mirror the well-known datasets.load_dataset(path, name, split) call. All other parameters are keyword-only.

Parameters:
  • path – Path or name of the dataset to load, same as in datasets.load_dataset.

  • name – Name of the dataset to load, same as in datasets.load_dataset.

  • split – The split to load, same as in datasets.load_dataset.

  • table_name – The name of the table. If not provided, the table_name is set to split.

  • dataset_name – The name of the dataset. If not provided, dataset_name is set to path if name is not provided, or to {path}-{name} if name is provided.

  • project_name – The name of the project. If not provided, project_name is set to hf-{path}.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the Hugging Face Hub dataset.
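
The default names described in the parameter list can be written out as a small function. This is a sketch of the documented rules only; the helper `default_hub_names` is hypothetical, and any sanitizing of characters such as "/" in path is not modelled.

```python
from typing import Dict, Optional


def default_hub_names(path: str, name: Optional[str] = None, split: str = "train") -> Dict[str, str]:
    """Names used when project_name, dataset_name and table_name are omitted."""
    dataset_name = path if name is None else f"{path}-{name}"
    return {
        "project_name": f"hf-{path}",
        "dataset_name": dataset_name,
        "table_name": split,
    }
```

For path="glue", name="mrpc", split="validation", this yields project "hf-glue", dataset "glue-mrpc", and table "validation".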

static from_image_folder(
root: str | pathlib.Path | tlc.Url,
*,
image_column_name: str = 'image',
label_column_name: str = 'label',
include_label_column: bool = True,
extensions: str | collections.abc.Collection[str] | None = None,
label_overrides: dict[str, tlc.schemas._schema.MapElement | str] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from an image folder.

This function can be used to load a folder containing subfolders where each subfolder represents a label, or to recursively load all matching images in a folder structure without labels. This provides similar functionality to torchvision’s ImageFolder dataset, but uses the 3LC URL system for file discovery.

When include_label_column is True, the dataset elements are returned as tuples of a PIL.Image and the integer class label. When include_label_column is False, PIL.Images are returned without labels. In this case, root will be recursively scanned.

Parameters:
  • root – The root directory of the image folder.

  • image_column_name – The name of the column containing the images.

  • label_column_name – The name of the column containing the class labels.

  • include_label_column – Whether to include a column of class labels in the table.

  • extensions – A list of allowed image extensions. If not provided, a default list of image extensions is used.

  • label_overrides – A sparse mapping of class names (the directory names) to new class names. A new class name can be a string with the new class name or a MapElement.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
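
To make the expected folder layout concrete, the sketch below scans a labeled image folder the way torchvision's ImageFolder does: each immediate subfolder is a class, and class indices follow the sorted folder names. Whether 3LC assigns indices in exactly this order is an assumption, and the helper `scan_labeled_folder` and its default extension list are hypothetical.

```python
from pathlib import Path


def scan_labeled_folder(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (class_names, [(image_path, class_index), ...]) for `root`."""
    root = Path(root)
    # Each immediate subfolder is treated as one class.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, class_name in enumerate(classes):
        # Images may sit at any depth below the class folder.
        for path in sorted((root / class_name).rglob("*")):
            if path.is_file() and path.suffix.lower() in extensions:
                samples.append((path, index))
    return classes, samples
```

A layout such as root/cat/b.png and root/dog/a.jpg gives classes ["cat", "dog"]; files with other extensions are skipped.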

static from_names(
*,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: Url | str | None = None,
) Table

Create a table from the names specifying its url.

Parameters:
  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url.

Returns:

The table at the resulting url.

static from_ndjson(
ndjson_file: str | pathlib.Path | tlc.Url,
*,
schema: Schema | Mapping[str, SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from an NDJSON file.

Parameters:
  • ndjson_file – The url of the NDJSON file.

  • schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the NDJSON file.
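
For readers unfamiliar with the format: NDJSON stores one JSON object per line, and each line becomes one row. The sketch below turns such text into columns using only the standard library. It illustrates the file format, not how 3LC parses it; the helper name and the strict same-keys check are assumptions.

```python
import json
from typing import Dict, List


def ndjson_to_columns(text: str) -> Dict[str, List[object]]:
    """Parse NDJSON text into a column-name -> values mapping."""
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not rows:
        return {}
    keys = list(rows[0])
    columns: Dict[str, List[object]] = {key: [] for key in keys}
    for number, row in enumerate(rows, start=1):
        # Require every record to carry the same keys as the first one.
        if set(row) != set(keys):
            raise ValueError(f"record {number} has different keys than record 1")
        for key in keys:
            columns[key].append(row[key])
    return columns
```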

static from_pandas(
df: pandas.DataFrame,
*,
schema: Schema | Mapping[str, SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a Pandas DataFrame.

Parameters:
  • df – The Pandas DataFrame to create the table from.

  • schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the pandas DataFrame.

static from_parquet(
parquet_file: str | pathlib.Path | tlc.Url,
*,
schema: Schema | Mapping[str, SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a Parquet file.

Parameters:
  • parquet_file – The url of the Parquet file.

  • schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the Parquet file.

static from_torch_dataset(
dataset: torch.utils.data.Dataset,
*,
all_arrays_are_fixed_size: bool = False,
schema: Schema | Mapping[str, SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a torch Dataset.

This constructor is designed primarily as a bridge for torchvision DatasetFolder and VisionDataset instances. For those, 3LC preserves the source: images are referenced by their on-disk paths (no copies). Any transform / target_transform / transforms attached to the source dataset is stripped before serialization and is not applied when reading from the Table; reattach it explicitly via Table.with_transform().

For arbitrary torch.utils.data.Dataset subclasses, this method falls back to materializing every sample by calling dataset[i] and serializing the result inline. This works, but it is rarely what you want for a 3LC Table:

  • Images returned as PIL/tensors are stored as bulk data inside the Table, duplicating bytes that already exist on disk or in cloud storage.

  • Tensors and arrays are similarly serialized inline, losing any link to the source they were derived from (a file, a URL, a parquet column).

  • Dataset transforms – especially augmentations – break the assumption that a Table row holds source-shaped data. Tables should hold the closest-to-source representation; transforms belong in the data loading pipeline, applied per-epoch during training.

Prefer one of the following when they fit your data:

Parameters:
  • dataset – The torch Dataset to ingest. Best results when this is a DatasetFolder / VisionDataset whose samples come from files; see the warnings above for the general case.

  • all_arrays_are_fixed_size – Whether all arrays (tuples, lists, etc.) in the dataset are fixed size. This parameter is only used when inferring a schema from a single sample in the dataset when no schema is provided.

  • schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the torch dataset.

Note:

Transforms attached to a VisionDataset are not persisted in the Table’s JSON, and are not reapplied when reading samples from the Table. Reattach them with Table.with_transform to obtain a TableView that applies the transforms on read.

static from_url(
url: Url | str,
) Table

Create a table from a url.

Parameters:

url – The url to create the table from

Returns:

A concrete Table subclass

Raises:
static from_yolo_ndjson(
ndjson_file: str | pathlib.Path | tlc.Url,
image_folder: str | pathlib.Path | tlc.Url | None = None,
*,
split: str = 'train',
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | pathlib.Path | str | None = None,
if_exists: typing.Literal[raise,
reuse,
rename,
overwrite] = 'reuse',
add_weight_column: bool = True,
weight_column_value: float = 1.0,
description: str | None = None,
extra_columns: dict[str,
Schema | Mapping[str,
SchemaLike]] | None = None,
input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
) Table

Create a Table from a YOLO NDJSON file.

The first line is required to contain the ‘class_names’ and ‘task’ keys, and the rest of the lines are required to contain the ‘file’, ‘width’, ‘height’, ‘split’ and ‘annotations’ keys.
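
The required keys can be checked with the standard library alone. The following is a minimal sketch of the file layout described above, not the 3LC parser: load_split_rows is a hypothetical helper, and the sample records use placeholder values (the real shapes of class_names and annotations are not specified here).

```python
import json

HEADER_KEYS = {"class_names", "task"}
ROW_KEYS = {"file", "width", "height", "split", "annotations"}


def load_split_rows(ndjson_text, split="train"):
    """Return (header, rows), keeping only rows whose 'split' equals `split`."""
    lines = [line for line in ndjson_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("NDJSON input is empty")

    # The first line is the header record.
    header = json.loads(lines[0])
    missing = HEADER_KEYS - header.keys()
    if missing:
        raise ValueError(f"header is missing keys: {sorted(missing)}")

    # Every following line is one image record.
    rows = []
    for number, line in enumerate(lines[1:], start=2):
        row = json.loads(line)
        missing = ROW_KEYS - row.keys()
        if missing:
            raise ValueError(f"line {number} is missing keys: {sorted(missing)}")
        if row["split"] == split:
            rows.append(row)
    return header, rows


# Placeholder records: the value shapes are illustrative assumptions only.
sample = "\n".join(
    json.dumps(record)
    for record in [
        {"class_names": ["cat", "dog"], "task": "detect"},
        {"file": "a.jpg", "width": 640, "height": 480, "split": "train", "annotations": []},
        {"file": "b.jpg", "width": 640, "height": 480, "split": "val", "annotations": []},
    ]
)
header, train_rows = load_split_rows(sample, split="train")
print([row["file"] for row in train_rows])
```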

Note:

image_folder is kept positional alongside ndjson_file so the common Table.from_yolo_ndjson(file, folder) form remains ergonomic. All other parameters are keyword-only.

Parameters:
  • ndjson_file – The url of the NDJSON file.

  • image_folder – The folder containing the images, used to handle relative paths. If not provided, relative image paths are made absolute with respect to the NDJSON file directory.

  • split – The split to load from the dataset. Rows with ‘split’ equal to this value will be loaded.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset. Falls back to split if not provided.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table. If not provided, the description is set to the one in the first line of the NDJSON file, or an empty string.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

Returns:

A Table populated from the YOLO NDJSON file.

static from_yolo_url(images_url: str | pathlib.Path | tlc.Url | collections.abc.Iterable[str | pathlib.Path | tlc.Url], *, categories: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None, task: typing.Literal[detect, segment, obb, pose] = 'detect', max_depth: int | None = None, allow_fetch_remote_data: bool = False, project_name: str | None = None, dataset_name: str | None = None, table_name: str | None = None, root_url: tlc.Url | str | None = None, table_url: tlc.Url | pathlib.Path | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, weight_column_value: float = 1.0, description: str | None = None, extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None, input_tables: list[tlc.Url | str | pathlib.Path] | None = None, **kwargs: typing.Any) Table

Create a Table from a YOLO dataset folder or file of images.

When images_url is a folder, label files are resolved from image paths by replacing the last images directory segment with labels and changing the extension to .txt. If the image path contains no images directory, the label file is expected next to the image with the same name and a .txt extension. If an image has no corresponding label file, or the label file is empty, no labels are added for that image.

With the following layout, a folder images_url would be images_url="/root/images":

root/
    images.txt
    images/
        image1.jpg
        image2.jpg
        subfolder/
            image3.jpg
            image4.jpg
    labels/
        image1.txt
        image2.txt
        subfolder/
            image3.txt

In the layout above, image4.jpg has no corresponding labels/subfolder/image4.txt and is included as an unlabeled image.
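
The label lookup rule can be expressed in a few lines. This is a sketch of the rule as documented above, using only pathlib; label_path_for is a hypothetical name and not part of the tlc API, and the real implementation may handle URLs and edge cases differently.

```python
from pathlib import PurePosixPath


def label_path_for(image_path):
    """Map an image path to the label path expected by the documented rule."""
    path = PurePosixPath(image_path)
    parts = list(path.parts)

    # Only directory segments are candidates; the file name itself is excluded.
    dir_parts = parts[:-1]
    if "images" in dir_parts:
        # Replace the LAST "images" directory segment with "labels".
        last = len(dir_parts) - 1 - dir_parts[::-1].index("images")
        parts[last] = "labels"

    # With no "images" segment, the label file sits next to the image.
    return str(PurePosixPath(*parts).with_suffix(".txt"))


print(label_path_for("/root/images/subfolder/image3.jpg"))
print(label_path_for("/data/pics/photo.png"))
```

Whether the resulting label file exists is a separate check; a missing or empty file simply means the image carries no labels.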

When images_url is a file (images_url="/root/images.txt" in the above example), the same layout is expected, but the image URLs are listed in the text file. Relative URLs are made absolute with respect to the directory containing the text file (i.e. the parent of the text file).

The following text file would be valid:

./images/image1.jpg               # Relative -> root/images/image1.jpg
images/image2.jpg                 # Relative -> root/images/image2.jpg
/root/images/subfolder/image3.jpg # Absolute -> root/images/subfolder/image3.jpg
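
The resolution of entries in such a text file can be sketched as follows. This is an illustration of the documented behavior with posixpath, not the library's code; resolve_listed_images is a hypothetical helper that assumes local, forward-slash paths.

```python
import posixpath


def resolve_listed_images(list_file, entries):
    """Make each listed image path absolute relative to the list file's directory."""
    base = posixpath.dirname(list_file)
    resolved = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # skip blank lines
        if posixpath.isabs(entry):
            resolved.append(posixpath.normpath(entry))
        else:
            resolved.append(posixpath.normpath(posixpath.join(base, entry)))
    return resolved


paths = resolve_listed_images(
    "/root/images.txt",
    ["./images/image1.jpg", "images/image2.jpg", "/root/images/subfolder/image3.jpg"],
)
print(paths)
```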

This method can also be used to create a Table with a label column but no labeled instances, by providing images with no corresponding label files.

Parameters:
  • images_url – The location(s) of the folder(s) containing, or file(s) referencing, the images. Can be a single URL or a list of URLs.

  • categories – The categories of the table.

  • task – The task of the dataset. Can be either ‘detect’, ‘segment’, ‘pose’, or ‘obb’.

  • max_depth – The maximum depth to search for images. If None (default), the limit is set to 1 (i.e. only immediate children) for remote input URLs and unlimited for local files.

  • allow_fetch_remote_data – Whether to allow fetching images and label files that live on remote storage. Defaults to False, meaning no remote data can be fetched, and an error is raised if fetching would be required. If True, the remote data is fetched as part of the table creation process. For large datasets, this will lead to two requests for each image, one for the full image and one for the corresponding label file. In such cases it is recommended to download a local copy and create the table from that.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – A list of Table URLs that are considered direct predecessors in this table’s lineage.

  • **kwargs – Additional task-specific keyword arguments. Only applies to the task “pose”.

Returns:

A Table populated from the YOLO dataset folder or file of images.

get_column_as_pyarrow_array(
name: str,
*,
combine_chunks: bool = True,
) Array | ChunkedArray

Return the specified column of the table as a pyarrow array.

To get nested sub-columns, use dot notation. E.g. ‘column.sub_column’. The values in the column will be the row-view of the table. A column which is a PIL image in its sample-view, for instance, will be returned as a column of strings.

Parameters:
  • name – The name of the column to get.

  • combine_chunks – Whether to combine the chunks of the returned column in the case that it is a ChunkedArray. Defaults to True.

Returns:

A pyarrow Array or ChunkedArray containing the specified column.

Raises:

KeyError – If the column does not exist in the table.
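
Dot notation for nested sub-columns behaves like a path lookup into nested mappings. The sketch below shows that idea on a plain dict; get_nested is a hypothetical helper and the row contents are made up, so it illustrates the addressing scheme rather than how 3LC reads Arrow data.

```python
def get_nested(row, path):
    """Resolve a dot-separated path such as 'column.sub_column' in a nested dict."""
    value = row
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            # Mirrors the documented KeyError for a column that does not exist.
            raise KeyError(path)
        value = value[key]
    return value


row = {"image": "images/0001.jpg", "bbs": {"image_width": 640, "image_height": 480}}
print(get_nested(row, "bbs.image_width"))
```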

get_foreign_table_url(
column: str = FOREIGN_TABLE_ID,
) Url | None

Return the input table URL referenced by this table.

This method is intended for tables that reference a single input table. Typically, this would be a metrics table of per-example metrics collected using another table.

If the table contains a column named ‘input_table_id’ with a value map indicating that it references an input table by Url, this method returns the Url of that input table.

Parameters:

column – The name of the column to check for a foreign key.

Returns:

The URL of the foreign table, or None if no input table is found.

get_row_cache_size() int

Returns the size of the row cache in bytes.

get_rows_as_binary(
*,
exclude_bulk_data: bool = False,
) bytes

Return all rows of the table as a binary Parquet buffer, with optional exclusion of bulk data columns.

This method will return the ‘Table-representation’ of the table, which is the most efficient representation, since only references to the input data are stored.

Parameters:

exclude_bulk_data – Whether to exclude bulk data columns from the serialized rows.

Returns:

The rows of the table as a binary Parquet buffer.

get_simple_value_map(
value_path: str,
) dict[int, str] | None

Get the simple value map for a value path, mapping class indices to class names.

Parameters:

value_path – The path to the value to get the value map for. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

Returns:

A simple value map for the value, or None if the value does not exist or does not have a value map.

get_value_map(
value_path: str,
) dict[float, MapElement] | None

Get the value map for a value path.

Parameters:

value_path – The path to the value to get the value map for. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

Returns:

A value map for the value, or None if the value does not exist or does not have a value map.
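
The two return shapes relate to each other roughly as shown below. This is a sketch under assumptions: MapElementStandIn is a stand-in dataclass rather than tlc.schemas.MapElement, and the simple map is assumed to use integer keys and each element's internal_name.

```python
from dataclasses import dataclass


@dataclass
class MapElementStandIn:
    """Stand-in for a value map element; only a subset of fields is modeled."""

    internal_name: str
    display_name: str = ""


def to_simple_value_map(value_map):
    """Collapse a {float: element} value map into {int: internal_name}."""
    return {int(value): element.internal_name for value, element in value_map.items()}


value_map = {
    0.0: MapElementStandIn("cat", display_name="Cat"),
    1.0: MapElementStandIn("dog", display_name="Dog"),
}
print(to_simple_value_map(value_map))
```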

is_all_parquet() bool

Return True if the backing data for this table is all parquet files.

is_descendant_of(
other: Table,
) bool

Return True if this table is a descendant of the provided table.

Parameters:

other – The table to check if this table is a descendant of.

Returns:

True if this table is a descendant of the provided table, False otherwise.
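
Conceptually, a descendant check is a walk through the chain of input tables. The sketch below models lineage as a plain dict from table name to the names of its inputs; this representation, the function name, and the choice that a table is not its own descendant are all assumptions made for illustration.

```python
def is_descendant_of(table, other, inputs):
    """Return True if `other` is reachable from `table` by following input tables."""
    seen = set()
    stack = list(inputs.get(table, []))
    while stack:
        current = stack.pop()
        if current == other:
            return True
        if current in seen:
            continue  # guard against revisiting shared ancestors
        seen.add(current)
        stack.extend(inputs.get(current, []))
    return False


# Hypothetical lineage: "edited" was derived from "initial", "squashed" has no inputs.
lineage = {"edited-2": ["edited-1"], "edited-1": ["initial"], "initial": [], "squashed": []}
print(is_descendant_of("edited-2", "initial", lineage))
```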

static join_tables(tables: collections.abc.Sequence[tlc._core.objects.table.Table] | collections.abc.Sequence[tlc.Url | str | pathlib.Path], *, project_name: str | None = None, dataset_name: str | None = None, table_name: str | None = None, root_url: tlc.Url | str | None = None, table_url: tlc.Url | str | pathlib.Path | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, weight_column_value: float = 1.0, description: str | None = None, extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None, input_tables: list[tlc.Url | str | pathlib.Path] | None = None) Table

Join multiple tables into a single table.

The tables will be joined vertically, meaning that the rows of the resulting table will be the concatenation of the rows of the input tables, in the order they are provided.

The schemas of the tables must be compatible for joining. If the tables have different schemas, 3LC attempts to merge them and raises an error if this is not possible.
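
A vertical join amounts to concatenating rows in input order. The sketch below shows that on lists of row dicts; join_rows_vertically is a hypothetical helper, and it uses a deliberately strict compatibility rule (identical column names) in place of 3LC's schema merging, which is not modeled here.

```python
def join_rows_vertically(tables):
    """Concatenate row lists in the order given; all rows must share the same columns."""
    joined = []
    columns = None
    for index, rows in enumerate(tables):
        for row in rows:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError(f"table {index} has incompatible columns: {sorted(row)}")
            joined.append(dict(row))
    return joined


first = [{"image": "a.jpg", "label": 0}, {"image": "b.jpg", "label": 1}]
second = [{"image": "c.jpg", "label": 0}]
print([row["image"] for row in join_rows_vertically([first, second])])
```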

Parameters:
  • tables – A sequence of Table instances, or URLs of tables, to join.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table.

  • root_url – The root url of the table.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table.

  • weight_column_value – The value to initialize the weight column with if add_weight_column is True.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

latest(
timeout: float = 30.0,
) Table

Return the most recent version of the table.

Uses the lineage index to walk descendants of this table’s URL and returns the newest one. Tables created in this process appear in the index via a fast-path; for tables that may have appeared in external scan sources, latest() waits up to timeout seconds for the next scheduler cycle.

Example:

table_instance = Table()
... # working
latest_table = table_instance.latest()

Parameters:

timeout – Seconds to wait for the next indexing cycle when no descendant is yet visible in this process. 0 returns the in-process fast-path result immediately. Defaults to 30.

Returns:

The latest version of the table.

Raises:

ValueError – If the latest version of the table cannot be found in the dataset or if an error occurs when attempting to create an object from the latest Url.

property name: str

The name of the table.

property pyarrow_schema: Schema | None

Returns a pyarrow schema for this table.

revision(
tag: Literal[latest] | None = None,
table_url: Url | str = '',
table_name: str = '',
) Table

Return a specific revision of the table.

This function retrieves a specific revision of this table. The revision can be specified by tag, table_url, or table_name. If no arguments are provided, the current table is returned.

Parameters:
  • tag – The tag of the revision to return. Currently only ‘latest’ is supported.

  • table_url – The URL of the revision to return.

  • table_name – The name of the revision to return.

property rows_schema: Schema

Returns the schema for all rows of this table.

set_row_cache_url(
row_cache_url: Url | str,
) bool

Assign a new row_cache_url value.

Will set row_cache_populated to False if the cache file has changed.

Parameters:

row_cache_url – The new row_cache_url value.

Returns:

True if the row_cache_url value was changed, False otherwise.

set_value_map(
value_path: str,
value_map: dict[float, Any],
*,
edited_table_url: Url | str = '',
) Table

Set a value map for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the table with an overridden value map for a specific numeric value.

Any item in a Schema of type NumericValue can have a value map. A value map is a mapping from a numeric value to a MapElement, where a MapElement contains metadata about a categorical value such as category names and IDs.

Partial Value Maps

Value maps may be partial, i.e. they may only contain a mapping for a subset of the possible numeric values. The keys may also be floating point values, which can be useful for annotating continuous variables with categorical metadata, such as color or label.

For more fine-grained control over value map editing, see Table.set_value_map_item, Table.add_value_map_item, and Table.delete_value_map_item.
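
The shape of a partial, float-keyed value map can be sketched without 3LC. In the example below, normalize_value_map is a hypothetical helper and plain dicts stand in for MapElement; it only illustrates the idea of converting loose input into a float-keyed mapping, not the conversion 3LC performs.

```python
def normalize_value_map(raw):
    """Convert {number: name-or-dict} into {float: dict with an 'internal_name'}."""
    normalized = {}
    for value, element in raw.items():
        if isinstance(element, str):
            # A bare string is treated as the element's internal name.
            element = {"internal_name": element}
        normalized[float(value)] = dict(element)
    return normalized


# A partial map: only three of the possible numeric values are annotated,
# and one key is a non-integer floating point value.
partial = normalize_value_map(
    {0: "cat", 1: "dog", 0.5: {"internal_name": "uncertain", "description": "between cat and dog"}}
)
print(sorted(partial))
```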

Parameters:
  • value_path – The path to the value to add the value map to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value_map – The value map to set on the value. The value will be converted to a dictionary mapping from floating point values to MapElement if it is not already.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map set.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.

set_value_map_item(
value_path: str,
value: float | int,
internal_name: str,
*,
display_name: str = '',
description: str = '',
display_color: str = '',
url: Url | str = '',
edited_table_url: Url | str = '',
) Table

Set a value map item for a specified numeric value within the schema of the Table, updating the item if the value already has one.

This method creates and returns a new revision of the table with the value map item set on a value in a column.

Example:

table = Table.from_url("cats-and-dogs")
new_table = table.set_value_map_item("label", 0, "cat")
# new_table is a new revision of the table with an updated value map item for the value 0 in the column
assert table.latest() == new_table, "The new table is the latest revision of the table."

To add a new value map item at the next available value in the value map, see Table.add_value_map_item.

To delete a value map item, see Table.delete_value_map_item.

Parameters:
  • value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value – The numeric value to add the value map item to. If the value already exists, the value map item will be updated.

  • internal_name – The internal name of the value map item. This is the primary identifier of the value map item.

  • display_name – The display name of the value map item.

  • description – The description of the value map item.

  • display_color – The display color of the value map item.

  • url – The url of the value map item.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.

should_include_schema_in_json(
schema: Schema,
) bool

Only include the schema in the JSON representation if it is not empty.

squash(
*,
output_url: Url | str | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: Url | str | None = None,
input_tables: list[Table | Url | str] | None = None,
) Table

Create a copy of this table where all lineage is squashed.

A squashed table is a table where all lineage is merged. This is useful for creating a table that is independent of its parent tables. This function creates a new table with the same rows as the original table, but with no lineage. The new table is written to the output_url, or placed in the same project and dataset as this table if no output URL is provided.

Parameters:
  • output_url – The output url for the squashed table. Mutually exclusive with project_name, dataset_name, table_name, and root_url.

  • project_name – The project name to use for the squashed table. If not provided, the project_name of the original table is used.

  • dataset_name – The dataset name to use for the squashed table. If not provided, the dataset_name of the original table is used.

  • table_name – The name of the squashed table. If not provided, a uniquified variant of ‘squashed’ is used.

  • root_url – The root URL to use for the squashed table. If not provided, the root URL of the original table is used.

  • input_tables – Optional list of Tables or URLs to Tables to refer to as the input tables for the squashed table. By default, no tables are referred to as inputs.

Returns:

The squashed table.

table_rows() TableRows

Access the rows of this table as an immutable mapping.

to_pandas() DataFrame

Return a pandas DataFrame for this table.

Returns:

A pandas DataFrame populated from the rows of this table.

Raises:

ImportError – If pandas is not installed. Install it with pip install 3lc[pandas], pip install pandas or similar.

static transform_value(
schema: Schema | None,
item: object,
) object

Transform a single table value according to the schema.

3LC currently only uses pure string representations of datetime values. This helper function is used to convert any timestamps to strings.

Parameters:
  • schema – The schema corresponding to the column of the value.

  • item – The value to transform.

property weights_column_name: str | None

Return the name of the column containing the weights for this table, or None if no such column exists.

with_transform(
transform: Callable[[Any], Any],
) TableView

Return a map-style view that applies transform to each sample on read.

The returned view is not a Table. It implements the MapDataset protocol (__len__, __getitem__) and exposes url, which forwards to this Table. Pass the view directly to tlc.collect_metrics(), or to any torch.utils.data.DataLoader.

Each call returns a fresh TableView instance; two calls with the same transform are not the same Python object. Hoist the view (view = table.with_transform(fn)) when you need a stable reference across calls.

Parameters:

transform – A callable applied to each sample before it is returned. Receives the raw sample produced by this Table and returns the transformed sample. Must be picklable (top-level function or importable callable, not a lambda or local closure) when the view is consumed by a torch.utils.data.DataLoader with num_workers > 0.

Returns:

A TableView over this Table.
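
Whether a transform meets the picklability requirement can be checked up front with the standard pickle module. The helper below is a hypothetical convenience, not part of tlc; math.sqrt stands in for any importable top-level callable.

```python
import math
import pickle


def is_picklable(transform):
    """Return True if `transform` can be pickled, as worker processes may require."""
    try:
        pickle.dumps(transform)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# An importable callable pickles by reference to its module and name.
print(is_picklable(math.sqrt))
# A lambda has no importable name, so pickling it fails.
print(is_picklable(lambda sample: sample))
```

Running this check before building a DataLoader with num_workers > 0 surfaces the problem early instead of at worker start-up.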

write_to_row_cache(
*,
create_url_if_empty: bool = False,
overwrite_if_exists: bool = True,
) None

Cache the table rows to the row cache Url.

If the table is already cached, or the Url of the Table is an API-Url, this method does nothing.

In the case where self.row_cache_url is empty, a new Url will be created and assigned to self.row_cache_url if create_url_if_empty is True, otherwise a ValueError will be raised.

Parameters:
  • create_url_if_empty – Whether to create a new row cache Url if self.row_cache_url is empty.

  • overwrite_if_exists – Whether to overwrite the row cache file if it already exists.

class TableView(
source: Table | TableView,
transform: Callable[[Any], Any],
)

A map-style view over a tlc.Table that applies a sample-level transform on every read.

Implements the MapDataset protocol used by tlc.collect_metrics() and any torch.utils.data.DataLoader. Not itself a Table: it has no schema, no persistence, and no object-registry identity. Its url forwards to the underlying Table so metrics collected through it can be linked back to the source.

Views compose: wrapping a TableView in another TableView chains the transforms. source and url always resolve to the root Table, regardless of chain depth.

Construct via tlc.Table.with_transform() or by chaining tlc.TableView.with_transform().
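
The composition behavior can be illustrated with a small stand-in class. ViewSketch below is not tlc.TableView: it wraps any sequence, models only __len__, __getitem__, source, and with_transform, and is meant to show how chained transforms apply in order while source keeps resolving to the root.

```python
class ViewSketch:
    """Map-style view applying `transform` to each item of `source` on read."""

    def __init__(self, source, transform):
        self._parent = source
        self._transform = transform

    @property
    def source(self):
        # Walk through chained views until the root (non-view) object is reached.
        parent = self._parent
        while isinstance(parent, ViewSketch):
            parent = parent._parent
        return parent

    def __len__(self):
        return len(self._parent)

    def __getitem__(self, index):
        # The parent's transform (if any) runs first, then this view's.
        return self._transform(self._parent[index])

    def with_transform(self, transform):
        return ViewSketch(self, transform)


def double(x):
    return x * 2


def add_one(x):
    return x + 1


rows = [1, 2, 3]  # stands in for the samples of a Table
view = ViewSketch(rows, double).with_transform(add_one)
print(view[0], len(view), view.source is rows)
```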

property dataset_name: str | None

The dataset name of the root Table.

property name: str

The name of the root Table.

property project_name: str | None

The project name of the root Table.

property root_url: Url | None

The root URL of the root Table.

property source: Table

The root Table underlying this view (walking through any chained TableView wrappers).

Useful for sampler construction (e.g. create_sampler(view.source, ...)).

property table_rows: TableRows

The rows of the root Table.

property url: Url

The URL of the root Table. The view itself is not persisted.

property weights_column_name: str | None

The name of the weight column of the root Table.

with_transform(
transform: Callable[[Any], Any],
) TableView

Return a new TableView that applies transform on top of this view’s transform.

Parameters:

transform – A callable applied to each sample after this view’s transform has run. Must be picklable (top-level function or importable callable, not a lambda or local closure) when the view is consumed by a torch.utils.data.DataLoader with num_workers > 0.

Returns:

A TableView chaining transform on top of this view.

class TableWriter(
*,
bulk_data_chunk_size_mb: float = DEFAULT_BULK_DATA_CHUNK_SIZE_MB,
bulk_data_context_key: str = DEFAULT_BULK_DATA_SEQUENCE_ID_COLUMN_NAME,
bulk_data_url: tlc.Url | str | None = None,
schema: Schema | Mapping[str,
SchemaLike] | None = None,
project_name: str | None = None,
dataset_name: str | None = None,
table_name: str | None = None,
root_url: tlc.Url | str | None = None,
table_url: tlc.Url | str | None = None,
if_exists: typing.Literal[overwrite,
rename,
raise] = 'rename',
description: str = '',
input_tables: list[tlc.Url] | None = None,
)

A class for writing batches of rows to persistent storage.

Rows are transformed through the writer pipeline (schema resolution, per-leaf to_row() with externalization context, chunk-pattern packing, URL relativization) and accumulated as PyArrow record batches until finalize() writes them out as a parquet-backed Table.

Example:

table_writer = TableWriter(
    project_name="My Project",
    dataset_name="My Dataset",
    table_name="My Table"
)
table_writer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})
table_writer.add_row({"column1": 4, "column2": "d"})
table = table_writer.finalize()

Initialize a TableWriter.

Parameters:
  • bulk_data_chunk_size_mb – The size of the chunk in MB for chunk-pattern bulk data (default: 50.0 MB).

  • bulk_data_context_key – The column name to use as the context key for chunk-pattern bulk data (default: “sequence_id”).

  • bulk_data_url – Optional base URL for bulk data storage. Both chunk-pattern and file-pattern data are stored under this location. Chunk-pattern data goes to <bulk_data_url>/chunks/<table_stem>/, file-pattern data goes to <bulk_data_url>/samples/<table_stem>/. Defaults to <table_parent>/bulk_data/ (or <table_parent_parent>/bulk_data/ for non-metrics tables). Per-column overrides via Schema.bulk_data_location take precedence for file-pattern columns.

  • schema – Optional schema for the table. Can be a Schema object with values for the columns, or a dict mapping column names to Schema objects. Columns you don’t declare are inferred from the first batch.

  • project_name – The name of the project.

  • dataset_name – The name of the dataset.

  • table_name – The name of the table, defaults to “initial”.

  • root_url – The root URL to write the table to. If not provided, the default root URL is used.

  • table_url – An optional url to manually specify the Url of the written table. Mutually exclusive with project_name, dataset_name, table_name, and root_url.

  • if_exists – The option to use when the table already exists.

  • description – An optional description of the table.

  • input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.

add_batch(
table_batch: MutableMapping[str, Any],
) None

Add a batch of rows to the buffer for writing.

This method validates the consistency of the batch and appends it to the buffer. When the buffer reaches its maximum size, it is automatically flushed to disk.

Parameters:

table_batch – A dictionary mapping column names to lists of values.

Raises:

ValueError – If the columns in the batch have unequal lengths or mismatch with existing columns.
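
The consistency checks described for add_batch can be sketched with a small in-memory buffer. BatchBufferSketch is a hypothetical stand-in, not TableWriter: it performs no schema resolution, flushing, or writing, and treating add_row as a one-row batch is an assumption.

```python
class BatchBufferSketch:
    """Accumulates column-oriented batches after validating their shape."""

    def __init__(self):
        self.columns = None
        self.batches = []

    def add_batch(self, batch):
        # All columns in a batch must have the same number of values.
        lengths = {len(values) for values in batch.values()}
        if len(lengths) > 1:
            raise ValueError("columns in the batch have unequal lengths")
        # Later batches must use the same columns as the first one.
        if self.columns is None:
            self.columns = set(batch)
        elif set(batch) != self.columns:
            raise ValueError("batch columns do not match existing columns")
        self.batches.append({name: list(values) for name, values in batch.items()})

    def add_row(self, row):
        self.add_batch({name: [value] for name, value in row.items()})

    def num_rows(self):
        return sum(len(next(iter(b.values()), [])) for b in self.batches)


buffer = BatchBufferSketch()
buffer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})
buffer.add_row({"column1": 4, "column2": "d"})
print(buffer.num_rows())
```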

add_row(
table_row: MutableMapping[str, Any],
) None

Add a single row to the table being written.

Parameters:

table_row – A dictionary mapping column names to values.

clear() None

Clear the buffer and reset the internal state.

finalize() Table

Write all added batches to disk and return the written table.

get_finalized_table() Table | None

Get the result of the table writing operation.

Returns None if the context manager exited due to an exception or if the result hasn’t been set yet.

Returns:

The written table, or None if not available.

class Url(
value: str | Path | Url | None = None,
scheme: str | None = None,
normalized_path: str | None = None,
query: str | None = None,
)

Bases: abc.ABC

A class which represents a URL.

A URL in 3LC is a combination of a scheme and a path. Many methods in 3LC accept URLs as arguments and/or return URLs. They are also used to refer to tlc.Tables and to cross reference between them. A file URL in 3LC will behave identically on both Posix and Windows systems.

Since a URL in 3LC might contain aliases, and even the scheme might not be determined until aliases are expanded, it is important to note which methods and properties will expand.

The path and scheme properties of the URL will expand aliases.

Example: Scheme is determined from the input string

file_url = Url("/path/to/file")  # Or Url("file:///path/to/file")
file_url.scheme == Scheme.FILE
file_url.path == "/path/to/file"
str(file_url) == "/path/to/file"  # omit file:// scheme

s3_url = Url("s3://bucket/path/to/object")
s3_url.scheme == Scheme.S3
s3_url.path == "bucket/path/to/object"
str(s3_url) == "s3://bucket/path/to/object"  # include s3:// scheme

gcs_url = Url("gs://bucket/path/to/object")
gcs_url.scheme == Scheme.GS
gcs_url.path == "bucket/path/to/object"
str(gcs_url) == "gs://bucket/path/to/object"  # include gs:// scheme

relative_url = Url("path/to/file")
relative_url.scheme == Scheme.RELATIVE
relative_url.path == "path/to/file"
str(relative_url) == "path/to/file"  # omit relative:// scheme

# *Aliases are expanded when the URL is used*
# Assume <SAMPLE_DATA> is **not** registered
alias_url = Url("<SAMPLE_DATA>/data.csv")
alias_url.scheme == Scheme.ALIAS
alias_url.path == "<SAMPLE_DATA>/data.csv"
str(alias_url) == "<SAMPLE_DATA>/data.csv"

# The registry is read-only over a pluggable provider. In standalone tlcurl
# use, install a provider that wraps a dict you own; in tlc use, call
# tlc.url.register_url_alias / tlc.url.unregister_url_alias to mutate the
# configuration store the provider reads from.
aliases: dict[str, str] = {}

class _InMemoryProvider:
    def get_aliases(self) -> dict[str, str]:
        return dict(aliases)

UrlAliasRegistry.set_provider(_InMemoryProvider())

# Register the alias by mutating the dict the provider reads.
aliases["<SAMPLE_DATA>"] = "/path/to/data"
# It will now be expanded when using path and scheme properties
alias_url.scheme == Scheme.FILE
alias_url.path == "/path/to/data/data.csv"
str(alias_url) == "<SAMPLE_DATA>/data.csv"

# Swap to an alternative alias.
aliases["<SAMPLE_DATA>"] = "/alternate/path/to/data"
alias_url.scheme == Scheme.FILE
alias_url.path == "/alternate/path/to/data/data.csv"

del aliases["<SAMPLE_DATA>"]

Terminology:

  • A normalized URL has a scheme, uses single forward slashes as path separators, and does not end with a slash.

  • An expanded URL has aliases expanded, and is normalized.

  • An absolute URL is an expanded URL, which means that it can be used as a stable persisted reference.

    • Relative URLs are converted to absolute URLs based on an “owner” URL, or, if applicable, the current working directory of the process

  • Relative and Api URLs will have “relative://” or “api://” as their scheme but these schemes will be omitted from the stringified representation.
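
The path part of the normalization rule can be sketched as below. This is an illustration only: normalize_path is a hypothetical helper that operates on the path component (not the scheme), and converting backslashes to forward slashes is an assumption based on the statement that file URLs behave identically on Posix and Windows.

```python
import re


def normalize_path(path):
    """Use single forward slashes as separators and drop any trailing slash."""
    path = path.replace("\\", "/")      # assumed: Windows separators become "/"
    path = re.sub(r"/{2,}", "/", path)  # collapse repeated slashes
    if len(path) > 1:
        path = path.rstrip("/")         # keep a lone "/" intact
    return path


print(normalize_path("C:\\data\\\\images\\"))
print(normalize_path("/path//to/dir/"))
```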

Caveats:

  • The URL does not make any network calls or access the file system. It therefore cannot resolve symlinks, and use of these is discouraged in combination with 3LC.

  • There are a few exotic Windows paths that are not supported:

    • The use of a Windows-drive letter without a slash, e.g. C:foo/bar, is not supported. Use C:/foo/bar instead.

Parameters:
  • value – The URL as a string, Path, or Url object. When this argument is passed as a string, it will be normalized and the scheme is deduced from the string contents.

  • scheme – The scheme of the URL, if known.

  • normalized_path – The normalized path of the URL, if known. If both scheme and normalized_path are passed, they will be used directly without any normalization or parsing. It is the responsibility of the caller to ensure that the scheme and normalized_path are valid.

  • query – The query component of the URL (the part after ?), if known. Only meaningful together with scheme and normalized_path; when value is parsed, any query string is split out automatically.

Raises:

ValueError – If the URL is specified with both value and scheme/normalized_path.

static absolute_from_relative(
url: Url,
owner: Url | str | None = None,
) Url

Convert a relative URL to an absolute URL, given an owner URL.

Parameters:
  • url – The relative URL to convert.

  • owner – The owner URL, if necessary for conversion.

static api_url_for_object(
obj: object,
) Url

Get the API URL for an object.

This is the default URL for an object when a persistent URL is not specified. API URLs allow objects to be addressable as long as they are in memory.

Parameters:

obj – The object to get the API URL for.

apply_aliases() Url

Apply all registered aliases to this URL.

Returns:

The URL with aliases applied.

create_sibling(
name: str,
) Url

Create a new Url next to the current Url.

Example:

Url("C:/path/to/file.json").create_sibling("umap.json") == Url("C:/path/to/umap.json")
Url("C:/path/to/dir").create_sibling("other") == Url("C:/path/to/other")
Parameters:

name – The name of the new Url.

Returns:

A new Url next to the current Url.
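For plain paths, the sibling relationship shown in the example corresponds to replacing the last path segment. The sketch below reproduces the two documented results with pathlib; sibling_path is a hypothetical helper, and it ignores schemes, aliases, and query strings, which Url handles itself.

```python
from pathlib import PurePosixPath


def sibling_path(path: str, name: str) -> str:
    """Replace the final segment of a forward-slash path with `name`."""
    return str(PurePosixPath(path).with_name(name))
```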

create_unique() Url

Create a unique version of the Url.

This method will create a unique URL by appending a unique identifier to the URL, if necessary.

Returns:

A unique Url.

classmethod cwd() Url

Get the current working directory as a URL.

Returns:

The current working directory as a URL.

delete() None

Delete the URL.

Raises:

Exception – If the URL cannot be deleted.

escape() str

Double-escape the URL string to handle paths in service endpoints.

Some services require double-escaping to process URLs correctly due to internal un-escaping passes.

Returns:

A double-escaped URL string.
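Double-escaping in general means percent-encoding a string twice, so that one un-escaping pass still leaves a safely encoded value. The sketch below shows the idea with urllib.parse; which characters tlc treats as safe is not stated here, so encoding every reserved character (safe="") is an assumption, and double_escape is a hypothetical helper.

```python
from urllib.parse import quote, unquote


def double_escape(url: str) -> str:
    """Percent-encode `url` twice, encoding every reserved character."""
    once = quote(url, safe="")   # "/" -> "%2F", " " -> "%20"
    return quote(once, safe="")  # "%" -> "%25", so "%2F" -> "%252F"


def double_unescape(escaped: str) -> str:
    """Reverse double_escape by decoding twice."""
    return unquote(unquote(escaped))
```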

exists() bool

Check if the URL exists.

Returns:

True if the URL exists, False otherwise.

Raises:

Exception – If the URL cannot be accessed.

expand_aliases(
*,
allow_unexpanded: bool = True,
) Url

Expand aliases in the URL.

Parameters:

allow_unexpanded – If True, aliases that cannot be expanded will be left in the URL. If False, an exception will be raised if an alias cannot be expanded.

Returns:

The URL with aliases expanded in its scheme and path.
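The effect of allow_unexpanded can be pictured with a plain-string sketch. The alias token syntax follows the <SAMPLE_DATA> example used earlier; the regular expression, the helper name expand_aliases_sketch, and the choice of ValueError are assumptions, since the exception type is not specified above.

```python
import re

# Assumption: alias tokens look like "<NAME>" with letters, digits, underscores.
_ALIAS_RE = re.compile(r"<[A-Za-z0-9_]+>")


def expand_aliases_sketch(
    url: str,
    aliases: dict[str, str],
    *,
    allow_unexpanded: bool = True,
) -> str:
    """Replace every known alias token in `url` with its registered value."""

    def substitute(match: re.Match) -> str:
        token = match.group(0)
        if token in aliases:
            return aliases[token]
        if allow_unexpanded:
            return token  # leave unknown aliases in place
        raise ValueError(f"Cannot expand alias {token!r}")

    return _ALIAS_RE.sub(substitute, url)
```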

property extension: str

Get the extension of the URL.

Example:

Url("example.json").extension == ".json"
Returns:

The extension of the URL.

flush() None

Raise an error to prevent Url being used in place of str, pathlib.Path or file object.

Implemented to ensure that a Url is not used in the place of a str, pathlib.Path or file object in cases where the silent failure would be confusing. Raises a more helpful error message.

static get_normalized(
value: str,
) tuple[str, str]

Get the normalized value of the string representation of a URL.

Parameters:

value – The URL to normalize.

Returns:

A tuple of (scheme, normalized_path).

static get_path_type(
path: str,
) str

Determine if a path, without scheme, is a Windows or Posix path.

static get_scheme(
value: str,
) str

Get the scheme of the string representation of a URL.

Parameters:

value – The URL as a string.

Raises:

ValueError – If the URL scheme is not supported.

Returns:

The scheme of the URL.

is_absolute() bool

Check if the normalized, unexpanded URL is absolute.

Notice that this method does not expand aliases.

Returns:

True if the URL is absolute, False otherwise.

is_descendant_of(
other: Url,
) bool

Check if the URL is a descendant of another URL.

Parameters:

other – The URL to check if the current URL is a descendant of.

Returns:

True if the URL is a descendant of the other URL, False otherwise.

join(
other: Url,
) Url

Join two URLs.

The other URL needs to be a relative URL.

Parameters:

other – The URL to join with the current URL. Required to be relative.

Returns:

A new URL, which is the result of joining the current and other URLs.

Raises:

ValueError – If the other URL is not relative.

static join_url(
scheme: str | None,
path: str,
) str

Join a scheme and a path into a URL.

Parameters:
  • scheme – The scheme.

  • path – The path.

Returns:

The URL with the scheme applied.

make_parents(
*,
exist_ok: bool = False,
) None

Make all parent directories of the URL.

Parameters:

exist_ok – If True, do not raise an exception if the directory already exists.

Raises:

Exception – If the URL cannot be accessed.

property name: str

Get the name of the URL.

Example:

Url("C:/folder/file.txt").name == "file.txt"
Url("C:/folder").name == "folder"
Returns:

The name of the URL.

static normalize_chars(
url: str,
) str

Normalize characters in a URL.

Parameters:

url – The URL to normalize.

Returns:

The normalized URL.

open(
mode: str,
) BufferedReader | TextIOWrapper

Open the URL as a file.

Parameters:

mode – The file mode to use when opening the URL.

Returns:

A file-like object.

Raises:

TypeError – If the URL cannot be opened as a file.

property parent: Url

Get the parent URL of the URL.

Returns:

The parent URL.

property parts: list[str]

Get the parts of the URL (path segments).

Example:

Url("C:/folder/file.txt").parts == ["C:", "folder", "file.txt"]
Url("pxt://db/dir/table?query=1").parts == ["db", "dir", "table"]
Returns:

The parts of the URL.

property path: str

Return the path of the expanded URL.

Accessing this property expands aliases in the URL.

This will return the path without a scheme, so e.g. an S3 URL will return the path without the protocol.

Url("s3://bucket/table.json").path == "/bucket/table.json"
Url("relative://foo/bar").path == "foo/bar"
Url("http://example.com/path?query=1").path == "example.com/path"
property query: str

Get the query string from the URL (everything after ‘?’).

This is useful for URL schemes that use query parameters (e.g., pxt://). For file:// URLs this typically returns an empty string.

Example:

Url("pxt://db/table?pgdata=/path").query == "pgdata=/path"
Url("file:///path/to/file.txt").query == ""
Returns:

The query string without the leading ‘?’, or empty string if none.
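The same split between path and query can be reproduced with the standard library, which may be handy when building or checking query strings before constructing a Url. This sketch uses urllib.parse only and does not involve tlc; query_of is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def query_of(url: str) -> str:
    """Return everything after the first '?', without the '?' itself."""
    return urlsplit(url).query


# parse_qs turns the raw query string into a dict of value lists.
parsed = parse_qs(query_of("pxt://db/table?pgdata=/path"))
```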

read_bytes() bytes

Read the contents of the URL as bytes.

read_text(
*,
encoding: str = 'utf-8',
) str

Read the contents of the URL as text.

Parameters:

encoding – The encoding to use when reading the content, defaults to “utf-8”.

Returns:

The content of the referenced file as text.

static relative_from(
url: Url,
owner: Url | None,
) Url

Transform a URL into relative form taking a given owner URL into account.

Create a URL relative to the given owner URL that is equivalent to the absolute URL. The owner URL can be a parent directory of the absolute URL, but it may also be a directory or file that shares part of the absolute URL’s path. If the absolute URL and owner URL are not compatible, the function raises a ValueError.

If the transformation is not possible, for example if the URL and the owner have different schemes, the function will return the original URL.

Example:

# Owner URL is a directory
absolute_url = Url("s3://bucket/path/to/file.ext")
owner_url = Url("s3://bucket/path")
relative_url = Url.relative_from(absolute_url, owner_url)
str(relative_url) == "to/file.ext"

# Owner URL is a file
absolute_url = Url("s3://bucket/path/to/file2.ext")
owner_url = Url("s3://bucket/path/to/file1.ext")
relative_url = Url.relative_from(absolute_url, owner_url)
str(relative_url) == "../file2.ext"
Raises:

ValueError – If the absolute URL and owner URL are not compatible.
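The two results in the example match what a POSIX-style relative path computation produces when the owner is treated as the starting point. The following standard-library sketch reproduces them and returns the original string when the schemes differ; it is an illustration on plain strings, not the tlc implementation, and relative_from_sketch is a hypothetical name.

```python
import posixpath
from urllib.parse import urlsplit


def relative_from_sketch(url: str, owner: str) -> str:
    """Express `url` relative to `owner` when both share a scheme."""
    target, base = urlsplit(url), urlsplit(owner)
    if target.scheme != base.scheme:
        return url  # transformation not possible: keep the original
    # Treat "bucket/path/..." as an absolute POSIX path for the computation.
    target_path = "/" + target.netloc + target.path
    base_path = "/" + base.netloc + base.path
    return posixpath.relpath(target_path, start=base_path)
```

Because relpath treats the start as a directory, an owner that is a file contributes one ".." step, which is why the second documented example yields "../file2.ext".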

replace(
old: str,
new: str,
) Url

Replace occurrences of a substring in the URL with a new substring.

The intended use case for this method is to e.g., replace a file extension in a URL.

This method textually replaces occurrences of the old substring with the new substring in the path of the URL. Notice that the replacement will happen on the normalized path, which is not necessarily identical to the path passed to the Url constructor when it was first created.

Changing the scheme of the URL is not supported; however, it is possible to replace an alias. If the alias contains the scheme (e.g. url.scheme == ALIAS), the scheme can be changed.

Notice that this method does not expand aliases.

Parameters:
  • old – The substring to be replaced.

  • new – The new substring to replace the old substring.

Returns:

A new URL with the specified substring replaced.

property scheme: str

Return the scheme of the expanded URL.

Accessing this property expands aliases in the URL. If the alias cannot be expanded, Scheme.ALIAS is returned.

To access the scheme of the URL without expanding aliases, use the _scheme member variable.

Returns:

The scheme of the URL.

Raises:

ValueError – If the URL scheme cannot be determined.

static split_url(
value: str,
) tuple[str, str]

Split a URL into a scheme and a path.

Unlike urlparse, this function does not require a scheme to be present in the URL. It will also not parse the drive letter (e.g. C:/) in a Windows URL as part of the URL.
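The difference from the standard library can be seen by feeding a Windows path to urllib.parse, which reads the drive letter as a scheme. The sketch below contrasts that with a splitter that only honors a literal :// separator. split_url_sketch is a hypothetical stand-in, and returning an empty scheme when none is present is an assumption about what to do in that case.

```python
from urllib.parse import urlsplit


def split_url_sketch(value: str) -> tuple[str, str]:
    """Split on '://' only, so 'C:/...' is never read as scheme 'c'."""
    scheme, separator, rest = value.partition("://")
    if separator and scheme.isalnum():
        return scheme, rest
    return "", value  # assumption: empty scheme when none is present


# The standard library treats the drive letter as a (lowercased) scheme.
stdlib_scheme = urlsplit("C:/folder/file.txt").scheme
```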

property stem: str

Get the stem of the URL.

Example:

Url("example.json").stem == "example"
Returns:

The stem of the URL.
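The name, stem, extension, and parts properties follow the same vocabulary as pathlib. For the simple paths used in the examples in this reference, PurePosixPath gives the same answers, as the sketch below shows. Behavior for schemes, aliases, or multi-dot file names may differ in Url, so treat this as an analogy only; describe_path is a hypothetical helper.

```python
from pathlib import PurePosixPath


def describe_path(path: str) -> dict:
    """Return the pathlib counterparts of name, stem, extension and parts."""
    pure = PurePosixPath(path)
    return {
        "name": pure.name,           # last path segment
        "stem": pure.stem,           # name without the final suffix
        "extension": pure.suffix,    # final suffix, including the dot
        "parts": list(pure.parts),   # path segments as a list
    }
```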

to_absolute(
owner: Url | str | None = None,
) Url

Convert a relative URL to an absolute URL.

Parameters:

owner – The owner URL, if necessary for conversion.

Returns:

An absolute URL.

Raises:

NotImplementedError – If the conversion is not supported.

to_minimal_dict(
_: bool = False,
) str

Convert the URL to a minimal, serializable representation.

Returns:

The URL as a str.

to_relative(
owner: Url | str | None = None,
) Url

Relativize a URL, including applying aliases.

Parameters:

owner – The owner URL, if necessary for conversion.

Returns:

A relative URL if possible, otherwise the original URL.

Raises:

NotImplementedError – If the conversion is not supported.

to_relative_with_max_depth(
owner: Url | None,
max_depth: int,
) Url

Relativize this URL with respect to the given owner URL, up to a maximum depth.

If this URL does not share a common prefix with owner within max_depth, it is returned with only aliases applied.

Parameters:
  • owner – The URL to relativize with respect to.

  • max_depth – The maximum depth to relativize up to.

Returns:

The relativized URL.

to_str() str

Convert the URL to a normalized string.

This returns the normalized, un-expanded URL as a string.

Returns:

The URL as a string.

write_bytes(
content: bytes | str,
*,
encoding: str = 'utf-8',
if_exists: typing.Literal['overwrite', 'rename', 'raise'] = 'overwrite',
) None

Write bytes content to a URL.

Parameters:
  • content – The content to write. If a string is provided, it will be encoded using the specified encoding.

  • encoding – The encoding to use when encoding string content to bytes, defaults to “utf-8”.

  • if_exists – The write options to use when writing, can be “overwrite”, “rename”, or “raise”.
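The three if_exists options can be pictured with a local-file sketch. Everything beyond the option names is an assumption: in particular, the numeric-suffix scheme used for "rename" and the FileExistsError raised for "raise" are illustrative choices, not documented tlc behavior, and write_bytes_sketch is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path
from typing import Literal


def write_bytes_sketch(
    target: Path,
    content: bytes | str,
    *,
    encoding: str = "utf-8",
    if_exists: Literal["overwrite", "rename", "raise"] = "overwrite",
) -> Path:
    """Write `content` to `target` and return the path actually written."""
    data = content.encode(encoding) if isinstance(content, str) else content
    if target.exists():
        if if_exists == "raise":
            raise FileExistsError(target)
        if if_exists == "rename":
            # Assumption: use the first free "<stem>_<n><suffix>" name.
            counter = 1
            while (candidate := target.with_name(
                f"{target.stem}_{counter}{target.suffix}"
            )).exists():
                counter += 1
            target = candidate
    target.write_bytes(data)
    return target
```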

write_text(
content: str | bytes | typing.Any,
*,
encoding: str = 'utf-8',
if_exists: typing.Literal['overwrite', 'rename', 'raise'] = 'overwrite',
) None

Write text content to a URL.

Parameters:
  • content – The content to write. If bytes are provided, they will be decoded using the specified encoding. If a non-string type is provided, it will be converted to a string using str().

  • encoding – The encoding to use when decoding bytes content to text, defaults to “utf-8”.

  • if_exists – The write options to use when writing, can be “overwrite”, “rename”, or “raise”.

active_project_name() str | None

Return the active project name, if any.

active_run() Run | None

Return the active Run, if any.

close() None

Close a run session.

Call this at the end of training to make sure that all training data hooks are saved. It blocks execution until all data hooks have been saved.

collect_metrics(
table: MapDataset[Any],
metrics_collectors: tlc.metrics.collectors.metrics_collector_base.MetricsCollectorType,
*,
predictor: Module | Predictor | None = None,
foreign_table_url: Url | str | None = None,
constants: dict[str, Any] | None = None,
constants_schemas: dict[str, Schema] | None = None,
run_url: Url | str | None = None,
collect_aggregates: bool = True,
split: str = '',
exclude_zero_weights: bool = False,
dataloader_args: dict[str, Any] | None = None,
) None

Collect per-sample metrics with a map-style dataset.

  • Writes a single metrics table joined to a foreign Table by row index. The written metrics table will contain any constants contained in the constants argument, as well as any metrics computed by the metrics collectors.

  • Adds the metadata of the metrics table to the metrics property of the Run.

  • Adds the Url of the foreign Table to the Run as an input.

  • Collects aggregate values from the metrics collectors and adds them to the Run.

The dataset’s index i is interpreted as the row index of the foreign Table for the per-sample join. Pass a Table or TableView to derive the foreign URL automatically, or any other MapDataset together with foreign_table_url to declare the join explicitly. The two paths are mutually exclusive: passing foreign_table_url alongside a Table/TableView is rejected.
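The mutual-exclusion rule between the two paths can be summarized as a small decision function. The sketch below is a stand-alone illustration: FakeTable is a hypothetical stand-in for Table/TableView that only carries a url, and resolve_foreign_table_url is not a tlc function.

```python
from __future__ import annotations

from typing import Any


class FakeTable:
    """Hypothetical stand-in for a Table or TableView: it only has a url."""

    def __init__(self, url: str) -> None:
        self.url = url


def resolve_foreign_table_url(dataset: Any, foreign_table_url: str | None = None) -> str:
    """Pick the URL that the per-sample metrics are joined against."""
    if isinstance(dataset, FakeTable):
        if foreign_table_url is not None:
            raise ValueError("foreign_table_url must not be passed with a Table")
        return dataset.url  # derived automatically from the table
    if foreign_table_url is None:
        raise ValueError("foreign_table_url is required for a custom dataset")
    return foreign_table_url
```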

Parameters:
  • table – A map-style dataset (any object with __len__ and __getitem__). A Table or TableView works directly; for a custom dataset, foreign_table_url must be passed so metrics can be linked back to a Table. (Parameter will be renamed to dataset in 3.0.)

  • metrics_collectors – The metrics collectors to use. Can be a single metrics collector, a list of metrics collectors, or a list of callables with the signature Callable[[Any, PredictorOutput], dict[str, Any]].

  • constants – A dictionary of constants to use when collecting metrics.

  • constants_schemas – A dictionary of schemas for the constants. If no schemas are provided, the schemas will be inferred from the constants.

  • run_url – The url of the run to add the metrics to. If not specified, the active run will be used. If no active run is found, a new run will be created.

  • collect_aggregates – Whether to collect aggregate values from the metrics collectors and add them to the Run. This allows an aggregate view to be shown in the Project page of the 3LC Dashboard. Aggregate values are computed for all computable columns in the metrics collectors, and are prefixed with the split name. For example, if a metrics collector defines a computable column called “accuracy”, and the split is “train”, then the aggregate value will be called “train_accuracy_avg”.

  • split – The split of the dataset. This will be prepended to the aggregate metric names.

  • exclude_zero_weights – Whether to exclude samples with zero weights when collecting metrics. Reads weights from the foreign Table; requires foreign_table_url= (or that table is a Table or TableView).

  • foreign_table_url – Url of the Table to link the metrics back to. Required when table is a custom map-style dataset; must NOT be passed when table is itself a Table or TableView (the URL is derived from table.url).

  • dataloader_args – Additional arguments to pass to the dataloader. Samples produced by table (after any transform) must be combinable by the active collate_fn — the default torch.utils.data.default_collate handles tensors, numbers, strings, and dict/list/tuple trees thereof. For heterogeneous samples (e.g. PIL images, variable-length sequences), pass {"collate_fn": <your fn>} here.

Raises:

ValueError – If table is a DataLoader; if foreign_table_url is provided alongside a Table or TableView table; or if table is a custom map-style dataset and foreign_table_url is not provided.
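The aggregate naming described for collect_aggregates ("accuracy" with split "train" becomes "train_accuracy_avg") can be sketched as follows. Only the averaging case from the example is shown; the handling of an empty split and the helper name aggregate_sketch are assumptions, and tlc may compute further aggregates.

```python
from statistics import fmean


def aggregate_sketch(per_sample: dict[str, list[float]], split: str = "") -> dict[str, float]:
    """Average each per-sample column and name it '<split>_<column>_avg'."""
    # Assumption: with an empty split, no prefix is added.
    prefix = f"{split}_" if split else ""
    return {
        f"{prefix}{column}_avg": fmean(values)
        for column, values in per_sample.items()
    }
```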

config: Configuration = None

A lazy alias for the live Configuration singleton. Use this to access and modify the live configuration.

Example:

import tlc
tlc.config.logging.level = "DEBUG"
init(
project_name: str | None = None,
run_name: str | None = None,
*,
description: str | None = None,
parameters: dict[str, typing.Any] | None = None,
if_exists: typing.Literal['reuse', 'overwrite', 'rename', 'raise'] = 'rename',
root_url: tlc.Url | str | None = None,
run_url: tlc.Url | str | None = None,
) Run

Initialize a 3LC Run.

Initializes a 3LC Run object and sets it as the active run for the current session. Starts the 3LC indexing threads.

Note

project_name and run_name are kept positional so the common tlc.init("my-project", "my-run") form remains ergonomic. All other parameters are keyword-only.

Parameters:
  • project_name – Name of the project. If empty, the run will be stored under a default project.

  • run_name – Name of the Run. If empty, a random name will be generated.

  • description – Description of the run.

  • parameters – Parameters of the run.

  • if_exists – How to deal with existing runs. Options are “reuse”, “overwrite”, “rename”, “raise”.

  • root_url – The root url to use. If not provided, the project root url will be used.

  • run_url – Url to the run. Mutually exclusive with run_name, project_name, and root_url.

Returns:

A Run object.

Raises:

ValueError – If run_url is provided together with project_name, root_url, or run_name.

log(
data: dict[str, Any],
run: Run | Url | None = None,
) None

Log output data to the active Run or a specified Run.

If keys ‘epoch’ or ‘iteration’ are present in the data, charts for the logged data will be created against those values in the Runs overview in the Dashboard.

Note

This function is intended for logging output data for a Run as a whole, or aggregated over an epoch or iteration. For logging data for individual samples, refer to the Collect Metrics section in the User Guide.

Parameters:
  • data – The data to log.

  • run – The Run to log the data to. If not provided, the active Run will be used.

Raises:

ValueError – If no Run is provided and there is no active Run.

set_active_run(
run: Run | Url,
) None

Set the active Run.