tlc
3LC (Three Lines of Code) Python Package.
3LC is a tool for understanding and improving machine learning models and datasets. The tlc package is the
Python entry point: it constructs and reads 3LC tlc.Tables and tlc.Runs, collects per-sample
metrics, and serves data to the 3LC Dashboard through the Object Service.
The top-level tlc namespace is the supported public interface. The names listed in tlc.__all__, together
with the curated sub-namespaces listed below, form the stable API and follow semantic versioning: breaking changes are
reserved for major releases, additions land in minor releases, and patch releases are non-breaking.
Anything underscored is private. tlc._core and any module or attribute whose name starts with _ may move, rename, or
be removed at any time without notice. Reach into them only when nothing in the public surface fits, and expect to
update on every release.
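As a rough illustration of the underscore rule above, the following sketch classifies a dotted import path as private when any component starts with an underscore. The helper `is_private_path` is hypothetical and not part of tlc; it only encodes the naming convention described here.

```python
def is_private_path(dotted_path: str) -> bool:
    """Return True if any component of the dotted path starts with an underscore."""
    return any(part.startswith("_") for part in dotted_path.split("."))


# Paths taken from this page: the first is public, the second reaches into tlc._core.
print(is_private_path("tlc.Table"))
print(is_private_path("tlc._core.writers.table_writer.TableWriter"))
```

A check like this could run in a linter or CI step to flag imports that are not covered by the stability guarantee.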
Modules and Packages
Module |
Description |
|---|---|
The 3LC configuration types. |
|
Public constants for the |
|
Data-bearing types for working with 3LC Table data. |
|
Exporters for converting 3LC tables into common dataset formats. |
|
Utility helper classes for working with 3LC concepts. |
|
Integrations with third party libraries which are optional dependencies. |
|
Core functionality for collecting per-sample metrics with a model on a |
|
Base classes for the 3LC object hierarchy. |
|
Dimensionality reduction methods and utilities. |
|
Built-in and custom sample types for 3LC tables. |
|
Built-in schemas for describing 3LC table columns. |
|
Public URL API. |
Package Contents
Classes
Class |
Description |
|---|---|
A class for writing metrics tables to runs. |
|
Represents a single execution of a specific process or experiment. |
|
A schema is a recursive structure which defines the layout of an object. It defines what elements the object consists of, which must be either |
|
The abstract base class for all Table types. |
|
A map-style view over a |
|
A class for writing batches of rows to persistent storage. |
|
A class which represents a URL. |
Functions
Function |
Description |
|---|---|
Return the active project name, if any. |
|
Return the active Run, if any. |
|
Close a run session |
|
Collect per-sample metrics with a map-style dataset. |
|
Initialize a 3LC Run. |
|
Log output data to the active Run or a specified Run. |
|
Set the active Run. |
Data
Data |
Description |
|---|---|
A lazy alias for the live |
API
- class MetricsTableWriter(
- *,
- run_url: Url | str | None = None,
- foreign_table_url: Url | str = '',
- schema: dict[str, Schema] | None = None,
- stream_name: str = 'default_stream',
Bases: tlc._core.writers.table_writer.TableWriter
A class for writing metrics tables to runs.
Calling finalize() writes the metrics table to persistent storage and automatically updates the corresponding Run to reference the newly written table.
If a foreign_table_url is supplied, the written metrics table will also be associated with the given foreign table, indicating that each metric value is associated with a specific row in the foreign table.
For this to work, each added metrics batch must contain a column called example_id. This is the foreign key that links the metrics table to the foreign table. The values of example_id are linear indices into the foreign table, starting from 0. A single metrics table can contain multiple values for the same example_id, and does not need to contain values for all example_ids in the foreign table.
Example:
    import tlc
    from tlc import MetricsTableWriter

    # Assuming an input table of length 8 exists at the url "input_table_url"
    run = tlc.init()

    with MetricsTableWriter(
        run_url=run.url,
        foreign_table_url="input_table_url",
    ) as metrics_writer:
        # First batch of metrics, corresponding to the first 4 rows of the foreign table
        metrics_writer.add_batch({
            "loss": [0.1, 0.2, 0.3, 0.4],
            "example_id": [0, 1, 2, 3],
        })
        # Second batch of metrics, corresponding to the last 4 rows of the foreign table
        metrics_writer.add_batch({
            "loss": [0.2, 0.4, 0.1, 0.5],
            "example_id": [4, 5, 6, 7],
        })

    # The run is automatically updated with the written metrics table on finalize/exit.
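To make the example_id linkage concrete, the sketch below groups metric values by the foreign-table row they refer to, using plain Python dictionaries rather than the tlc API. The helper `group_by_example_id` and the sample numbers are illustrative assumptions.

```python
from collections import defaultdict


def group_by_example_id(batches):
    """Collect the loss values per foreign-table row index across all batches."""
    grouped = defaultdict(list)
    for batch in batches:
        for example_id, loss in zip(batch["example_id"], batch["loss"]):
            grouped[example_id].append(loss)
    return dict(grouped)


batches = [
    {"loss": [0.1, 0.2], "example_id": [0, 1]},
    {"loss": [0.3, 0.4], "example_id": [1, 3]},
]
grouped = group_by_example_id(batches)
# Row 1 receives two values, and row 2 receives none; both cases are allowed.
print(grouped)
```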
Initialize a MetricsTableWriter.
- Parameters:
run_url – The Url of the run to write metrics for. Will default to the active run if not provided.
foreign_table_url – The Url of the dataset to write metrics for.
schema – A dictionary of column names to schema overrides. Schemas will be inferred from the data if not provided.
stream_name – Display label for the dashboard group this table joins into. Tables sharing a column signature should use the same stream_name; conflicting names within a group are silently dropped in favor of a generic label.
- finalize() Table¶
Write all added batches to persistent storage, update the run, and return the written table.
- Returns:
The written metrics table.
- Raises:
RuntimeError – If finalize() has already been called on this writer.
- class Run(
- *,
- url: Url | None = None,
- created: str | None = None,
- last_modified: str | None = None,
- description: str | None = None,
- metrics: list[dict[str, Any]] | None = None,
- constants: dict[str, Any] | None = None,
- status: float | None = None,
- init_parameters: Any = None,
Bases: tlc._core.objects.mutable_object.MutableObject
Represents a single execution of a specific process or experiment.
Warning
Do not instantiate this class directly. Use one of the Run.from_* methods or tlc.init() instead.
A Run object encapsulates details about its setup, execution, metadata, and metrics.
Run objects are mutable, allowing for updates to run attributes as they progress or as additional information becomes available.
Create a Run object.
- Parameters:
url – The URL of the run.
created – The creation timestamp.
last_modified – The last modified timestamp.
description – The description of the run.
metrics – A list of metrics captured during this run.
constants – Constant values used during this run.
status – The status of the run.
init_parameters – Parameters used during object initialization.
- add_input_table( ) None¶
Adds an input table to the run.
This updates the Run object to include the input table in the list of inputs to the Run.
- Parameters:
input_table – The input table to add.
- add_input_value( ) None¶
Adds a value to the inputs of the run.
- Parameters:
input_value – The value to add.
- add_metrics(
- metrics: dict[str, Any],
- *,
- schema: dict[str, Schema] | None = None,
- foreign_table_url: Url | str | None = None,
- constants: dict[str, Any] | None = None,
Write the provided metrics to a Table and associate it with the run.
- Parameters:
metrics – The metrics data (dict of column names to column data) to write.
schema – The schemas for the metrics data.
foreign_table_url – The URL of the table to associate with the metrics data. If provided, the metrics data will be augmented with extra columns to identify the example ID and the foreign table, if these columns are not already present. If the metrics data does not correspond 1-to-1 with the table, ensure the metrics data includes an “example_id” column.
constants – The constants to add to the run.
- Returns:
The written table info.
- Raises:
ValueError – If the number of rows in the metrics data does not match the number of rows in the table, or the foreign_table_url is not a valid URL.
FileNotFoundError – If the foreign_table_url cannot be found.
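The relationship between row counts and the example_id column can be sketched as follows. This is a plain-Python approximation of the documented rule, not the implementation of add_metrics; the helper name `ensure_example_ids` is hypothetical, and the sketch leaves out the column that identifies the foreign table.

```python
def ensure_example_ids(metrics, table_length):
    """Return a copy of metrics that is guaranteed to carry an example_id column."""
    if "example_id" in metrics:
        # Explicit foreign keys: the metrics may cover any subset of the table rows.
        return dict(metrics)
    # Without example_id the metrics must correspond 1-to-1 with the table rows.
    row_counts = {len(column) for column in metrics.values()}
    if row_counts != {table_length}:
        raise ValueError(
            f"metrics have row counts {sorted(row_counts)}, table has {table_length} rows"
        )
    return {**metrics, "example_id": list(range(table_length))}


full = ensure_example_ids({"loss": [0.1, 0.2, 0.3]}, table_length=3)
partial = ensure_example_ids({"loss": [0.9], "example_id": [2]}, table_length=3)
print(full["example_id"], partial["example_id"])
```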
- add_metrics_table( ) None¶
Add a metrics table to the run.
- Parameters:
metrics_table – The metrics table to add.
- add_output_value( ) None¶
Adds a value to the outputs of the run.
- Parameters:
output_value – The value to add.
- copy(
- *,
- run_name: str | None = None,
- project_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- if_exists: typing.Literal['raise', 'rename', 'overwrite'] = 'raise',
- destination_url: tlc.Url | str | None = None,
Create a copy of this run.
The copy is performed to:
A URL derived from the given project_name, run_name, and root_url if given
destination_url, if given
A generated URL derived from the run’s URL, if none of the above are given
- Parameters:
destination_url – The URL to copy the run to.
project_name – The name of the project to create the run in.
run_name – The name of the run to create.
root_url – The root URL to create the run in.
if_exists – What to do if the destination URL already exists.
- Returns:
The copied run.
- static from_names( ) Run¶
Creates a Run instance from the names specifying the URL of an existing Run.
- Parameters:
project_name – The name of the project.
run_name – The name of the run.
root_url – The root url to use instead of the default root url.
- Returns:
The Run at the resulting url.
- static from_url( ) Run¶
Creates a Run instance from the URL of an existing Run.
- Parameters:
url – The URL to the Run object.
- Returns:
The Run object.
- reduce_embeddings_by_foreign_table_url( ) dict[Url, Url]¶
Reduces all metrics tables in a Run using a reducer trained on the embeddings in a specified metrics table.
See tlc.reduction.reduce.reduce_embeddings_by_foreign_table_url() for more information.
- Parameters:
foreign_table_url – The Url of the foreign table to use for reduction.
delete_source_tables – If True, the source metrics tables will be deleted after reduction.
**kwargs – Additional keyword arguments.
- Returns:
A dictionary mapping the original table URLs to the reduced table URLs.
- reduce_embeddings_per_dataset( ) dict[Url, Url]¶
Reduces the embeddings for each dataset in this run.
See tlc.reduction.reduce.reduce_embeddings_per_dataset() for more information.
- Parameters:
delete_source_tables – If True, the source metrics tables will be deleted after reduction.
**kwargs – Additional keyword arguments.
- Returns:
A dictionary mapping the original table URLs to the reduced table URLs.
- set_description(
- description: str,
Set the description of the run.
- Parameters:
description – The description to set.
- class Schema(
- *,
- display_name: str = '',
- description: str = '',
- writable: bool = True,
- display_importance: float = 0,
- value: ScalarValue | None = None,
- values: dict[str, Schema] | None = None,
- composite_role: str = '',
- display_color: str = '',
- swap_group: str = '',
- computable: bool = True,
- transient: bool = False,
- default_visible: bool = True,
- size0: DimensionNumericValue | None = None,
- size1: DimensionNumericValue | None = None,
- size2: DimensionNumericValue | None = None,
- size3: DimensionNumericValue | None = None,
- size4: DimensionNumericValue | None = None,
- size5: DimensionNumericValue | None = None,
- metadata: dict[str, Any] | None = None,
- default_value: Any | None = None,
- array_signature_group: str | None = None,
- number_role_u: str | None = None,
- number_role_v: str | None = None,
- bulk_data_location: str | Url | None = None,
- sample_type: str | None = None,
A schema is a recursive structure which defines the layout of an object. It defines what elements the object consists of, which must be either
Atomic type (with optional metadata, e.g. value range, unit, etc.) OR
Composite contents (a list of schemas describing the sub-object)
In addition, it defines HOW MANY of these scalar or composite elements exist, in the form of up to six dimensions, each of which can be described separately and be of fixed or variable length. The default size of dimensions is 1, describing a scalar value.
Schemas are used for
Defining the layout of Objects (as reported by e.g. “MyObject.schema”)
In the case of Tables: defining the common layout of all table rows (as reported by e.g. “MyTableObject.schema.values["rows"]”)
In the case where a schema defines a “top-level” object, it will always have a ‘values’ attribute (since it is always a composite object, and does not comprise only a single atomic value).
Initialize a Schema.
A schema is either atomic (has a value) or composite (has values). Exactly one of value or values must be provided.
- Parameters:
display_name – Human-readable name shown in the Dashboard.
description – Description of this schema element.
writable – Whether the value is editable in the Dashboard.
display_importance – Ordering hint for Dashboard column display.
value – The atomic scalar type (e.g. Float32Value(), StringValue()). Mutually exclusive with values.
values – Mapping of field names to child schemas for composite types. Mutually exclusive with value.
composite_role – Semantic role for composite schemas (e.g. "bounding_boxes").
display_color – Color hint for Dashboard visualization.
swap_group – Group identifier for column swapping in the Dashboard.
computable – Whether this column can be recomputed from source data.
transient – Whether this column is excluded from serialization.
default_visible – Whether this column is visible by default in the Dashboard.
size0 – First dimension descriptor.
size1 – Second dimension descriptor.
size2 – Third dimension descriptor.
size3 – Fourth dimension descriptor.
size4 – Fifth dimension descriptor.
size5 – Sixth dimension descriptor.
metadata – Arbitrary key-value metadata attached to this schema element.
default_value – Default value for this schema element.
array_signature_group – Group identifier for arrays that share the same shape signature.
number_role_u – Semantic role for the U component of 2D numeric values.
number_role_v – Semantic role for the V component of 2D numeric values.
bulk_data_location – URL or path prefix where bulk data files are stored for this column. When set, the TableWriter externalizes column data to files under this location.
sample_type – Name of the registered sample type that converts between row form (serialized) and sample form (Python objects). For example, "pil_png", "numpy_array", "segmentation_polygons". The resolved instance is available via the resolved_sample_type property. None means identity (no conversion).
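The atomic-versus-composite rule can be pictured with a toy stand-in. The `MiniSchema` class below is a hypothetical simplification that uses strings in place of ScalarValue descriptors; it is not the tlc Schema class and models only the "exactly one of value or values" constraint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniSchema:
    """Toy schema node: atomic (has value) or composite (has values)."""

    value: Optional[str] = None
    values: Optional[dict] = None

    def __post_init__(self):
        # Exactly one of value / values must be provided.
        if (self.value is None) == (self.values is None):
            raise ValueError("provide exactly one of 'value' or 'values'")

    def is_atomic(self) -> bool:
        return self.value is not None

    def is_composite(self) -> bool:
        return self.values is not None


# A composite node whose children are atomic nodes.
box = MiniSchema(values={"x": MiniSchema(value="float32"), "label": MiniSchema(value="string")})
print(box.is_composite(), box.values["x"].is_atomic())
```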
- add_sample_weight( ) None¶
Adds a sample weight column to the schema.
- Parameters:
hidden – Whether the column should be hidden
default_value – The default value for the sample weight column.
- add_sub_schema( ) None¶
Adds a Schema as a sub-property within this Schema (i.e. into the ‘values’ collection)
- add_sub_value(
- name: str,
- value: ScalarValue,
- *,
- writable: bool = True,
- computable: bool = True,
Adds a scalar value as a sub-property within this Schema (i.e. into the ‘values’ collection)
- consider_override_from( ) Schema¶
Selectively overwrite attributes in this schema with non-default ones from override_schema.
Merge semantics are sparse at both the column and within-column level:
Columns missing from the override are left untouched.
Columns present in the override only overwrite attributes that differ from their default, so a partial override touches a handful of fields and inherits the rest from self.
Recursion into values/size0 mirrors the same rule per sub-schema.
This is the merge that backs Table.override_table_rows_schema. It assumes the override is structurally honest: fields that make dimensionality claims (size0, composite-vs-scalar) must not contradict self’s. If they do, downstream stages that consume the merged schema may produce incoherent data. The factory-level TableWriter(schema=...) / Table.from_*(schema=...) contract is stricter still (declared columns must be complete); see those APIs for that case.
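The sparse merge described above can be approximated with nested dictionaries. This is a sketch under stated assumptions, not the tlc implementation: `DEFAULTS` and `apply_override` are hypothetical, only three attributes are modeled, and children that exist only in the override are ignored.

```python
# Hypothetical attribute defaults; the real Schema has many more attributes.
DEFAULTS = {"display_name": "", "description": "", "writable": True}


def apply_override(base, override):
    """Return a merged copy of base in which only non-default override attributes win."""
    merged = dict(base)
    for key, new_value in override.items():
        if key != "values" and new_value != DEFAULTS.get(key):
            merged[key] = new_value
    # Recurse per child; children missing from the override are carried over as-is.
    children = dict(base.get("values", {}))
    for name, child_override in override.get("values", {}).items():
        if name in children:
            children[name] = apply_override(children[name], child_override)
    if children:
        merged["values"] = children
    return merged


base = {"values": {
    "label": {"display_name": "label", "writable": False},
    "score": {"display_name": "score", "writable": True},
}}
override = {"values": {"label": {"display_name": "Class", "writable": True}}}
merged = apply_override(base, override)
print(merged["values"]["label"])
```

Note that the override's `writable=True` equals the default, so it is indistinguishable from "not set" and the base value `False` is kept. That is the practical consequence of overwriting only attributes that differ from their default.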
- does_object_match(
- _object: Any,
Checks whether a schema matches an example object.
This requires exact 1:1 mapping between attributes in the object and the schema (including recursively). This means no attributes can be missing, nor can there be any additional attributes only present in the object.
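The exact 1:1 requirement can be sketched with nested dictionaries standing in for both the schema and the object. The helper `keys_match` is hypothetical and checks structure only; it does not inspect value types the way a real schema would.

```python
def keys_match(schema, obj):
    """Return True if obj has exactly the attributes named by schema, recursively."""
    if not isinstance(schema, dict):
        # Leaf: this sketch checks structure only, so any non-dict value matches.
        return not isinstance(obj, dict)
    if not isinstance(obj, dict) or set(schema) != set(obj):
        return False
    return all(keys_match(schema[key], obj[key]) for key in schema)


schema = {"image": None, "bbox": {"x": None, "y": None}}
exact = keys_match(schema, {"image": "a.png", "bbox": {"x": 1, "y": 2}})
missing = keys_match(schema, {"image": "a.png", "bbox": {"x": 1}})
extra = keys_match(schema, {"image": "a.png", "bbox": {"x": 1, "y": 2}, "note": "n/a"})
print(exact, missing, extra)
```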
- static from_any(
- any_object: Any,
Returns a Schema object which has been populated from a serialized (possibly sparse) object
- static from_json(
- json_string: str,
Returns a Schema object which has been populated from a JSON string
- from_row(
- row: Any,
Convert row form to sample form.
A column either has a real SampleType that owns the entire column value, or no transform (identity). Composite schemas recurse into children only when the column-level transform is identity; the sample view of a composite is therefore always a dict.
File-storage columns are loaded via load(), not from_row(), so this method passes through the data unchanged for file-storage transforms.
When the sample type produces a numpy ndarray or torch Tensor and the schema’s value is numeric/bool, the result is cast to the schema’s declared dtype. This undoes the dtype widening that pyarrow’s to_pylist() introduces by materializing narrow scalars as Python ints/floats. Non-array results pass through the cast unchanged.
- Parameters:
row – The row data to convert.
- Returns:
The data in sample form.
- classmethod from_sample( ) Schema¶
Infer a schema describing the provided Python value.
- Parameters:
sample – The sample to create a schema from.
all_arrays_are_fixed_size – If True, all arrays will be marked as fixed size.
- Returns:
The inferred schema.
- classmethod from_schema_like( ) Schema¶
Convert a SchemaLike value to a Schema.
Always returns a fresh copy — the input is never mutated or shared.
Schema objects are deep-copied.
Mappings are treated as {column_name: SchemaLike} and converted recursively.
- Parameters:
schema_like – A Schema or a mapping of column names to SchemaLike values.
- Returns:
A new Schema (always a copy).
- Raises:
TypeError – If keys are not strings or the input type is unsupported.
- is_atomic() bool¶
Return whether the schema is atomic, i.e. has a single value.
The opposite of is_composite.
- Returns:
Whether the schema is atomic
- is_composite() bool¶
Return whether the schema is composite, i.e. has multiple values.
The opposite of is_atomic.
- Returns:
Whether the schema is composite
- is_fixed_size() bool¶
Return whether the schema has fixed size.
This requires all dimensions to be fixed size.
- is_scalar() bool¶
Return whether the schema is a scalar value
Sizes must be set in increasing dimension order without gaps; a schema with no sizes set is treated as a scalar.
- last_dimension() DimensionNumericValue | None¶
Return the last (outermost) dimension of the Schema
- pop_dim() DimensionNumericValue | None¶
Sets size5 to None and shifts all other dimensions left. (size5 becomes size4, size4 becomes size3, etc.).
- Returns:
The old size0
- push_dim(
- dim: DimensionNumericValue | None = None,
Inserts dim as size0 and shifts all other dimensions right. (size0 becomes size1, size1 becomes size2, etc.).
- Parameters:
dim – The dimension to insert as size0
- Returns:
The old size5
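The dimension shifting done by pop_dim and push_dim can be sketched with a six-element list standing in for size0 through size5. Plain integers replace DimensionNumericValue here, and the functions return new lists instead of mutating a schema; both are simplifications made for illustration.

```python
def push_dim(sizes, dim):
    """Insert dim as size0, shift the rest right, and return (new_sizes, old_size5)."""
    return [dim] + sizes[:-1], sizes[-1]


def pop_dim(sizes):
    """Drop size0, shift the rest left, clear size5, and return (new_sizes, old_size0)."""
    return sizes[1:] + [None], sizes[0]


sizes = [3, None, None, None, None, None]  # size0 .. size5
pushed, dropped = push_dim(sizes, 224)     # 224 becomes size0, 3 moves to size1
popped, first = pop_dim(pushed)            # undoes the push
print(pushed, popped, first)
```

In this sketch the two operations are inverses of each other as long as size5 is unset before the push.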
- property resolved_sample_type: SampleType¶
Get the resolved SampleType instance for this schema.
Always returns a SampleType, never None. Resolution checks the explicit sample_type name through the legacy name mapping. Returns Identity if no transform is resolved, Hidden for hidden columns.
The result is cached and invalidated automatically when attributes that affect resolution are modified (via __setattr__).
- Returns:
The resolved SampleType instance.
- set_writable_flag_recursively(
- writable: bool,
Sets the writable flag recursively.
- Parameters:
writable – Whether the schema is writable
- to_json() str¶
Writes the contents of this schema to a JSON string. Note that
Default values are omitted for brevity
Schemas might be recursive
- to_minimal_dict(
- include_all: bool,
Add a minimal representation of this object to a dictionary for subsequent serialization to JSON
- to_row(
- sample: Any,
- ctx: ExternalizationContext | None = None,
Convert sample form to row form.
A column either has a real SampleType that owns the conversion, or no transform (identity). Composite schemas recurse into children only when the column-level transform is identity.
External-storage leaves are routed through externalize() when ctx is supplied and the value is sample-form (per accepts()). The result is a URL string (typically absolute; the writer pipeline normalizes it to a table-relative string afterwards). Without ctx, or for values already in row form, the leaf is passed through unchanged; this lets a batch freely mix live samples with pre-externalized URL strings.
Inline transforms are called with (sample) only. When ctx is supplied the caller is operating through the pipeline, so row-form / None values are auto-detected via accepts() and passed through instead of handed to a transform that would otherwise crash.
- Parameters:
sample – The sample data to convert.
ctx – Optional externalization context. Supplied by the write pipeline when externalization should happen inline; omitted by call sites that only want structural conversion.
- Returns:
The data in row form (or a URL string for externalized leaves when ctx is supplied; the pipeline relativizes URL leaves afterwards).
- validate_row( ) list[ValidationError]¶
Validate row-form data against this schema.
Checks structural correctness (composite dict keys, dimension constraints) and leaf value type compatibility against ScalarValue descriptors. SampleType is not involved; this validates the storage representation only.
- Parameters:
row – The data in row form to validate.
path – Dot-separated path prefix for error messages (used during recursion).
- Returns:
A list of validation errors (empty if valid).
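The recursive, path-prefixed error collection can be sketched as below. This is not the tlc validator: Python types stand in for ScalarValue descriptors, errors are plain strings rather than ValidationError objects, dimension constraints are skipped, and treating unexpected keys as errors is an assumption.

```python
def validate_row(schema, row, path=""):
    """Return a list of error strings; an empty list means the row is valid."""
    if isinstance(schema, dict):
        if not isinstance(row, dict):
            return [f"{path or '<root>'}: expected a dict, got {type(row).__name__}"]
        errors = []
        for key in sorted(set(schema) | set(row)):
            child_path = f"{path}.{key}" if path else key
            if key not in row:
                errors.append(f"{child_path}: missing")
            elif key not in schema:
                errors.append(f"{child_path}: unexpected")
            else:
                errors.extend(validate_row(schema[key], row[key], child_path))
        return errors
    # Leaf: schema is a Python type standing in for a scalar descriptor.
    if not isinstance(row, schema):
        return [f"{path}: expected {schema.__name__}, got {type(row).__name__}"]
    return []


schema = {"label": int, "bbox": {"x": float, "y": float}}
errors = validate_row(schema, {"label": 1, "bbox": {"x": 0.5, "y": "high"}})
print(errors)
```

Returning a list instead of raising on the first problem lets a caller report every issue in a row at once, with the dot-separated path pointing at the offending leaf.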
- validate_sample(
- sample: Any,
Validate a sample (Python object) against this schema before conversion to row form.
Schemas with a real (non-identity) transform delegate to resolved_sample_type.validate_sample() and do not recurse. Identity-typed composite schemas recurse into child schemas.
- Parameters:
sample – The Python object in sample form to validate.
- Returns:
A list of validation errors (empty if valid).
- class Table(
- *,
- url: Url | None = None,
- created: str | None = None,
- description: str | None = None,
- row_cache_url: Url | None = None,
- row_cache_populated: bool | None = None,
- override_table_rows_schema: Any = None,
- init_parameters: Any = None,
- input_tables: list[Url] | None = None,
Bases: tlc._core.addressable_object.AddressableObject
The abstract base class for all Table types.
Warning
Do not instantiate this class directly. Use one of the Table.from_* methods instead.
A Table is an object with two specific responsibilities:
Creating table rows on demand (either through the row-based access interface table_rows, or through the sample-based access interface provided by __getitem__).
Creating a schema which describes the type of produced rows (through the rows_schema property).
Both types of produced data are determined by immutable properties defined by each particular Table type.
ALTERNATIVE INTERFACE/CACHING:
A full representation of all table rows can, for performance reasons, also be retrieved through the get_rows_as_binary method.
This method will try to retrieve a cached version of the table rows if row_cache_url is non-empty AND row_cache_populated is True.
When this is the case, it is guaranteed that the schema property of the table is fully populated, including the nested ‘rows_schema’ property which defines the layout of all table rows.
When this cached version is NOT defined, however, get_rows_as_binary() needs to iterate over all rows to produce the data.
If row_cache_url is non-empty, the produced binary data will be cached to the specified location. After successful caching, the updated Table object will be written to its backing URL exactly once, now with ‘row_cache_populated’ set to True and with the schema fully updated. Also, the row_count property is guaranteed to be correct at this time.
Whether accessing data from a Table object later refers to this cached version (or produces the data itself) is implementation specific.
STATE MUTABILITY:
As described above, Tables are constrained in how they are allowed to change state:
The data production parameters (“recipe”) of a table are immutable
The persisted JSON representation of a Table (e.g. on disk) can take on three different states, and each state can be written only once:
Bare-bones recipe
Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = False)
Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = True)
- Parameters:
url – The URL of the table.
created – The creation time of the table.
description – The description of the table.
row_cache_url – The URL of the row cache.
row_cache_populated – Whether the row cache is populated.
override_table_rows_schema – Sparse schema merged onto the table’s computed row schema via consider_override_from(). Unlike the factory-level schema kwarg on TableWriter and Table.from_* (which requires declared columns to be complete), this path tolerates sparseness at both the column and within-column level. It is meant for touching up individual attributes on an existing row schema (display names, value maps, sample-type customizations) without restating structure. The override must remain structurally honest with the underlying data: any size0 / composite-vs-scalar claim it makes is not allowed to contradict what the data actually contains.
init_parameters – The initial parameters of the table.
input_tables – A list of Table URLs that are considered direct predecessors in this table’s lineage. This parameter serves as an explicit mechanism for tracking table relationships beyond the automatic lineage tracing typically managed by subclasses.
- add_column(
- column_name: str,
- values: list[object] | object,
- *,
- schema: Schema | None = None,
- url: Url | None = None,
Create a derived table with a column added.
This method creates and returns a new revision of the table with a new column added.
- Parameters:
column_name – The name of the column to add.
values – The values to add to the column. This can be a list of values, or a single value to be added to all rows.
schema – The schema of the column to add. If not provided, the schema will be inferred from the values.
url – The url to write the new table to. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the column added.
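The two accepted shapes of values (a list, or a single value applied to every row) can be sketched on a list of row dictionaries. The function below is a hypothetical stand-in for illustration: it returns new rows rather than a Table revision, and rejecting a list of the wrong length is an assumption rather than documented behavior.

```python
def add_column(rows, column_name, values):
    """Return new rows with column_name added; a single value is broadcast to all rows."""
    if not isinstance(values, list):
        values = [values] * len(rows)
    if len(values) != len(rows):
        # Assumption: a list must supply exactly one value per row.
        raise ValueError(f"expected {len(rows)} values, got {len(values)}")
    return [{**row, column_name: value} for row, value in zip(rows, values)]


rows = [{"image": "a.png"}, {"image": "b.png"}]
with_list = add_column(rows, "label", [0, 1])
with_scalar = add_column(rows, "weight", 1.0)
print(with_list, with_scalar)
```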
- add_value_map_item(
- value_path: str,
- internal_name: str,
- *,
- display_name: str = '',
- description: str = '',
- display_color: str = '',
- url: Url | str = '',
- value: float | int | None = None,
- edited_table_url: Url | str = '',
Add a value map item for a specified numeric value within the schema of the Table.
Adds a new value map item to the schema of the Table without overwriting existing items.
If the specified value or internal name already exists in the value map, this method will raise an error to prevent overwriting.
For more details on value maps, refer to the documentation for Table.set_value_map.
- Parameters:
value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
internal_name – The internal name of the value map item. This is the primary identifier of the value map item.
display_name – The display name of the value map item.
description – The description of the value map item.
display_color – The display color of the value map item.
url – The url of the value map item.
value – The numeric value to add the value map item to. If not provided, the value will be the next available value in the value map (starting from 0).
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map item added.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name already exists in the value map.
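The no-overwrite rule and the default value assignment can be sketched with a dictionary mapping numeric values to internal names. This is an illustration only: the function below is a hypothetical stand-in, and reading "next available value" as the smallest unused integer counting up from 0 is an assumption.

```python
def add_value_map_item(value_map, internal_name, value=None):
    """Return a new value map with the item added; existing entries are never overwritten."""
    if internal_name in value_map.values():
        raise ValueError(f"internal name already in value map: {internal_name!r}")
    if value is None:
        # Assumption: "next available" means the smallest unused integer from 0.
        value = 0
        while value in value_map:
            value += 1
    elif value in value_map:
        raise ValueError(f"value already in value map: {value}")
    return {**value_map, value: internal_name}


classes = {0: "cat", 1: "dog"}
extended = add_value_map_item(classes, "bird")
print(extended)
```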
- copy(
- *,
- table_name: str | None = None,
- dataset_name: str | None = None,
- project_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- if_exists: typing.Literal['raise', 'rename', 'overwrite'] = 'raise',
- destination_url: tlc.Url | str | None = None,
Create a copy of this table.
The copy is performed to:
A URL derived from the given project_name, dataset_name, table_name, and root_url if given
destination_url, if given
A generated URL derived from the table’s URL, if none of the above are given
- Parameters:
destination_url – The URL to copy the table to.
project_name – The name of the project to copy to.
dataset_name – The name of the dataset to copy to.
table_name – The name of the table to copy to.
root_url – The root URL to copy to.
if_exists – The behavior to use if the destination URL already exists.
- Returns:
The copied table.
- delete_column(
- column_name: str,
- *,
- table_name: str | None = None,
- table_url: Url | str = '',
- description: str | None = None,
Create a derived table with a column deleted.
This method creates and returns a new revision of the table with a column deleted.
- Parameters:
column_name – The name of the column to delete.
table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.
table_url – The url to write the new table to. If not provided, the new table will be located next to the current table.
description – A description of the table. If not provided, a default description will be used.
- Returns:
A new table with the column deleted.
- delete_columns(
- column_names: Sequence[str],
- *,
- table_name: str | None = None,
- table_url: Url | str = '',
- description: str | None = None,
Create a derived table with columns deleted.
This method creates and returns a new revision of the table with the specified columns deleted.
- Parameters:
column_names – The names of the columns to delete.
table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.
table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
description – A description of the table. If not provided, a default description will be used.
- Returns:
A new table with the columns deleted.
- delete_row(
- index: int,
- *,
- table_name: str | None = None,
- table_url: Url | str = '',
- description: str | None = None,
Delete a row from a Table.
This method creates and returns a new revision of the table with the specified row deleted.
- Parameters:
index – The index of the row to delete.
table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.
table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
description – A description of the table. If not provided, a default description will be used.
- Returns:
A new table with the row deleted.
- delete_rows(
- indices: Sequence[int],
- *,
- table_name: str | None = None,
- table_url: Url | str = '',
- description: str | None = None,
Delete rows from a Table.
This method creates and returns a new revision of the table with the specified rows deleted.
- Parameters:
indices – The indices of the rows to delete.
table_name – The name of the new table. If not provided and table_url is not provided, a default name will be used.
table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
description – A description of the table. If not provided, a default description will be used.
- Returns:
A new table with the rows deleted.
- delete_value_map( ) Table¶
Delete a value map for a specified numeric value within the schema of the Table.
This method creates and returns a new revision of the Table with a deleted value map for a specific numeric value.
- Parameters:
value_path – The path to the value to delete the value map from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map deleted.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue.
- delete_value_map_item(
- value_path: str,
- *,
- value: float | int | None = None,
- internal_name: str = '',
- edited_table_url: Url | str = '',
Delete a value map item for a specified numeric value within the schema of the Table.
Deletes a value map item from the schema of the Table, by numeric value or internal name.
For more details on value maps, refer to the documentation for Table.set_value_map.
- Parameters:
value_path – The path to the value to delete the value map item from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
value – The numeric value of the value map item to delete. If not provided, the value map item will be deleted by internal name.
internal_name – The internal name of the value map item to delete. If not provided, the value map item will be deleted by numeric value.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map item deleted.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name does not exist in the value map.
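The lookup rule described above (delete by numeric value when one is given, otherwise by internal name, and raise ValueError when nothing matches) can be sketched in plain Python. This is only an illustration: the dict-of-dicts value map and the helper name without_value_map_item are assumptions made for the sketch, not the 3LC data model or API. Like the real method, it returns a new object and leaves the input untouched.

```python
from __future__ import annotations


def without_value_map_item(
    value_map: dict,
    value: float | None = None,
    internal_name: str = "",
) -> dict:
    """Return a copy of value_map without the selected item.

    value_map is a hypothetical stand-in: {numeric value: {"internal_name": str}}.
    """
    if value is not None:
        # Numeric value given: it takes precedence over internal_name.
        if value not in value_map:
            raise ValueError(f"value {value!r} is not in the value map")
        target = value
    else:
        # No numeric value: fall back to matching on the internal name.
        matches = [k for k, item in value_map.items() if item["internal_name"] == internal_name]
        if not matches:
            raise ValueError(f"internal name {internal_name!r} is not in the value map")
        target = matches[0]
    # Build a new mapping; the original is never mutated.
    return {k: item for k, item in value_map.items() if k != target}
```

Calling the helper twice with the same arguments on the same input gives the same result, which mirrors the "new revision" behaviour of the Table methods.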
- ensure_data_production_is_ready() None¶
Ensure that the table is ready to produce data.
This method is called before any access to the Table’s data is made. It is used to ensure that the Table has performed any necessary data production steps. Normally Tables don’t produce data until it is requested, but this method can be called to force data production.
Note that subsequent applications of this method will not change the data, as a Table is immutable.
- ensure_dependent_properties() None¶
Ensure that the table has set row_count, as required to reach a fully defined state.
- export(
- output_url: Url | str | Path,
- format: str | None = None,
- *,
- weight_threshold: float = 0.0,
- **kwargs: object,
Export this table to the given output URL.
Writes the table’s rows to output_url in the specified format. Several built-in formats ship with 3LC; additional formats can be added by installing plugin packages or registering a custom Exporter subclass. Run 3lc exporters list or call list_exporters() to see what is available at runtime.
Format inference. If format is omitted, the format is chosen by calling can_export() on every registered exporter and picking the one with the highest priority. A table with bounding-box columns exported to a .json file will therefore pick coco over the generic json exporter. Ties at the highest priority raise ValueError and require an explicit format.
Weight filtering. If the table has a weights column (see weights_column_name), rows with weight strictly less than weight_threshold are excluded. Tables without a weights column ignore this parameter.
Format-specific arguments. Each exporter declares its own keyword arguments, e.g. indent and image_folder for COCO, or split for YOLO. Pass them as **kwargs. Unknown kwargs trigger a warning and are dropped. The full list for each exporter is available in the corresponding class docstring under tlc.export.exporters.
- Parameters:
output_url – The output URL, path, or string. Directory outputs (e.g. YOLO) do not need an extension.
format – The export format (e.g. "csv", "coco"). If None, inferred from the table and output_url.
weight_threshold – Minimum row weight to include (default 0.0). Ignored if the table has no weights column.
**kwargs – Additional format-specific arguments. See the exporter’s class docstring for valid keys.
- Raises:
ValueError – If format is specified but no matching exporter is registered; if no format can be inferred; or if the table content is incompatible with the chosen format.
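The format inference and weight filtering rules described for export can be sketched in plain Python. The SketchExporter class, the EXPORTERS list, the priorities, and the suffix-based can_export check are all hypothetical stand-ins chosen for illustration; they are not the 3LC Exporter API, and the real can_export() receives different arguments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchExporter:
    """Hypothetical stand-in for a registered exporter (not the tlc class)."""

    name: str
    priority: int
    suffixes: tuple[str, ...]
    needs_boxes: bool = False

    def can_export(self, has_boxes: bool, output_url: str) -> bool:
        # Claim the output if the suffix matches and, when this exporter
        # needs bounding boxes, the table actually has them.
        return output_url.endswith(self.suffixes) and (has_boxes or not self.needs_boxes)


def infer_format(exporters, has_boxes: bool, output_url: str) -> str:
    """Pick the highest-priority capable exporter; ties need an explicit format."""
    capable = [e for e in exporters if e.can_export(has_boxes, output_url)]
    if not capable:
        raise ValueError(f"no exporter can handle {output_url!r}")
    top = max(e.priority for e in capable)
    winners = [e.name for e in capable if e.priority == top]
    if len(winners) > 1:
        raise ValueError(f"ambiguous format, pass format= explicitly: {winners}")
    return winners[0]


def filter_by_weight(rows, weight_threshold: float = 0.0, weights_column=None):
    """Drop rows whose weight is strictly below the threshold."""
    if weights_column is None:
        # No weights column: the threshold is ignored.
        return list(rows)
    return [row for row in rows if row[weights_column] >= weight_threshold]


# Assumed priorities: the specialised exporter outranks the generic one.
EXPORTERS = [
    SketchExporter("json", priority=1, suffixes=(".json",)),
    SketchExporter("coco", priority=2, suffixes=(".json",), needs_boxes=True),
]
```

With these assumed priorities, a table with bounding boxes written to out.json resolves to coco, while the same path for a table without boxes resolves to json.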
- static from_coco(
- annotations_file: str | pathlib.Path | tlc.Url,
- image_folder: str | pathlib.Path | tlc.Url | None = None,
- *,
- keep_crowd_annotations: bool = True,
- task: typing.Literal[detect, segment, pose] = 'detect',
- segmentation_format: typing.Literal[polygons, masks] | None = None,
- points: list[float] | None = None,
- point_attributes: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None,
- lines: list[int] | None = None,
- line_attributes: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None,
- triangles: list[int] | None = None,
- triangle_attributes: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None,
- flip_indices: list[int] | None = None,
- oks_sigmas: list[float] | None = None,
- per_instance_extras: collections.abc.Sequence[str] | collections.abc.Mapping[str, tlc.schemas._schema.Schema] | None = None,
- per_image_extras: collections.abc.Sequence[str] | collections.abc.Mapping[str, tlc.schemas._schema.Schema] | None = None,
- schema: Schema | Mapping[str, SchemaLike] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from a COCO annotations file.
Note: image_folder is kept positional alongside annotations_file so the common Table.from_coco(file, folder) form remains ergonomic. All other parameters are keyword-only.
- Parameters:
annotations_file – The url of the COCO annotations file.
image_folder – The url of the folder containing the images referenced in the COCO annotations file. If not provided, the image paths in the annotations file will be assumed to either be absolute OR relative to the annotations file.
keep_crowd_annotations – Whether to include annotations with iscrowd=1 in the Table.
task – The task of the dataset. Can be either ‘detect’, ‘segment’, or ‘pose’.
segmentation_format – The format of the segmentation. Can be either ‘polygons’ or ‘masks’.
points – Default keypoint coordinates, used for drawing new instances in the Dashboard. Pose only.
point_attributes – Attributes for each keypoint (e.g. name or color). Pose only.
lines – Default skeleton topology for pose. Will override the skeleton provided in the annotations file. Pose only.
line_attributes – Attributes for each line (e.g. name or color). Pose only.
triangles – Triangles for pose.
triangle_attributes – Attributes for each triangle (e.g. name or color). Pose only.
flip_indices – Flip indices for pose.
oks_sigmas – OKS sigmas for pose.
per_instance_extras – Annotation-level extra fields to preserve as per-instance metadata. Pass a list of annotation key names to auto-infer schemas from the data, or a dict mapping key names to explicit Schema objects. Values must be present in every annotation.
per_image_extras – Image-level extra fields to preserve as top-level table columns. Pass a list of image key names to auto-infer schemas, or a dict mapping key names to explicit Schema objects. Values must be present in every image entry.
schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the provided COCO format dataset.
- static from_csv(
- csv_file: str | pathlib.Path | tlc.Url,
- *,
- schema: Schema | Mapping[str, SchemaLike] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from a .csv file.
- Parameters:
csv_file – The url of the .csv file.
schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the CSV file.
- static from_dict(data: collections.abc.Mapping[str, object], *, schema: Schema | Mapping[str, SchemaLike] | None = None, project_name: str | None = None, dataset_name: str | None = None, table_name: str | None = None, root_url: tlc.Url | str | None = None, table_url: tlc.Url | pathlib.Path | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, weight_column_value: float = 1.0, description: str | None = None, extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None, input_tables: list[tlc.Url | str | pathlib.Path] | None = None) Table¶
Create a Table from a dictionary.
- Parameters:
data – The dictionary to create the table from.
schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table. A column-sparse schema (omitting columns you don’t care about) is fine, but declared columns must be complete; see TableWriter for the full factory contract.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects. Extra columns are marked with sample_type={"name": "hidden"}.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the dictionary.
- static from_hugging_face_dataset(
- hf_dataset: datasets.Dataset,
- *,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from an in-memory Hugging Face datasets.Dataset.
This is useful when the dataset has been constructed programmatically, filtered, or loaded locally.
- Parameters:
hf_dataset – An in-memory datasets.Dataset instance.
table_name – The name of the table. If not provided, derived from the dataset’s split or defaults to "data".
dataset_name – The name of the dataset. If not provided, derived from hf_dataset.info.dataset_name or defaults to "hf-dataset".
project_name – The name of the project. If not provided, derived from hf_dataset.info.dataset_name or defaults to "hf-dataset".
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the in-memory Hugging Face dataset.
- static from_hugging_face_hub(
- path: str,
- name: str | None = None,
- split: str = 'train',
- *,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from a Hugging Face Hub dataset, similar to the datasets.load_dataset function.
Note: path, name and split are kept positional to mirror the well-known datasets.load_dataset(path, name, split) call. All other parameters are keyword-only.
- Parameters:
path – Path or name of the dataset to load, same as in datasets.load_dataset.
name – Name of the dataset configuration to load, same as in datasets.load_dataset.
split – The split to load, same as in datasets.load_dataset.
table_name – The name of the table. If not provided, table_name is set to split.
dataset_name – The name of the dataset. If not provided, dataset_name is set to path if name is not provided, or to {path}-{name} if name is provided.
project_name – The name of the project. If not provided, project_name is set to hf-{path}.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the Hugging Face Hub dataset.
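The documented fallbacks for table_name, dataset_name and project_name can be written out as a small function. This is a sketch of the naming rules as stated in the parameter list only; default_hub_names is a hypothetical helper, not part of tlc, and it does not model any sanitising 3LC may apply to a path such as one containing a slash.

```python
from __future__ import annotations


def default_hub_names(path: str, name: str | None = None, split: str = "train") -> dict[str, str]:
    """Mirror the documented fallbacks used when the three names are not provided."""
    return {
        # table_name falls back to the split.
        "table_name": split,
        # dataset_name is the path, or "{path}-{name}" when a name is given.
        "dataset_name": path if name is None else f"{path}-{name}",
        # project_name is the path prefixed with "hf-".
        "project_name": f"hf-{path}",
    }
```

For example, default_hub_names("glue", "mrpc", "validation") yields a table named validation in dataset glue-mrpc under project hf-glue.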
- static from_image_folder(
- root: str | pathlib.Path | tlc.Url,
- *,
- image_column_name: str = 'image',
- label_column_name: str = 'label',
- include_label_column: bool = True,
- extensions: str | collections.abc.Collection[str] | None = None,
- label_overrides: dict[str, tlc.schemas._schema.MapElement | str] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from an image folder.
This function can be used to load a folder containing subfolders where each subfolder represents a label, or to recursively load all matching images in a folder structure without labels. This provides similar functionality to torchvision’s ImageFolder dataset, but uses the 3LC URL system for file discovery.
When include_label_column is True, the dataset elements are returned as tuples of a PIL.Image and the integer class label. When include_label_column is False, PIL.Images are returned without labels. In this case, root will be recursively scanned.
- Parameters:
root – The root directory of the image folder.
image_column_name – The name of the column containing the images.
label_column_name – The name of the column containing the class labels.
include_label_column – Whether to include a column of class labels in the table.
extensions – A list of allowed image extensions. If not provided, a default list of image extensions is used.
label_overrides – A sparse mapping of class names (the directory names) to new class names. A new class name can be a string with the new class name or a MapElement.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
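The two scanning modes described for from_image_folder can be illustrated with the standard library alone. This sketch is not the 3LC implementation: scan_image_folder and DEFAULT_EXTENSIONS are hypothetical names, the extension list is an assumption, and assigning labels by sorted directory name follows torchvision's ImageFolder convention rather than anything stated here about 3LC.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: a small illustrative list, not 3LC's actual default extensions.
DEFAULT_EXTENSIONS = (".jpg", ".jpeg", ".png")


def scan_image_folder(root, include_label_column: bool = True, extensions=DEFAULT_EXTENSIONS):
    """Return (rows, class_names) for a class-folder layout.

    With labels, rows are (path, label_index) tuples; without labels, rows
    are plain paths found by scanning root recursively, and class_names is [].
    """
    root = Path(root)
    allowed = tuple(ext.lower() for ext in extensions)

    def is_image(path: Path) -> bool:
        return path.is_file() and path.suffix.lower() in allowed

    if not include_label_column:
        # No labels: every matching image under root, at any depth.
        return sorted(p for p in root.rglob("*") if is_image(p)), []

    # Each immediate subfolder is one class; labels follow sorted folder names.
    class_names = sorted(d.name for d in root.iterdir() if d.is_dir())
    rows = []
    for label, class_name in enumerate(class_names):
        for path in sorted((root / class_name).rglob("*")):
            if is_image(path):
                rows.append((path, label))
    return rows, class_names
```

Only paths are collected, which matches the idea of a Table that references images on disk instead of copying them.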
- static from_names(
- *,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: Url | str | None = None,
Create a table from the names specifying its url.
- Parameters:
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url.
- Returns:
The table at the resulting url.
- static from_ndjson(
- ndjson_file: str | pathlib.Path | tlc.Url,
- *,
- schema: Schema | Mapping[str, SchemaLike] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from an NDJSON file.
- Parameters:
ndjson_file – The url of the NDJSON file.
schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the NDJSON file.
- static from_pandas(
- df: pandas.DataFrame,
- *,
- schema: Schema | Mapping[str, SchemaLike] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from a Pandas DataFrame.
- Parameters:
df – The Pandas DataFrame to create the table from.
schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the pandas DataFrame.
- static from_parquet(
- parquet_file: str | pathlib.Path | tlc.Url,
- *,
- schema: Schema | Mapping[str, SchemaLike] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from a Parquet file.
- Parameters:
parquet_file – The url of the Parquet file.
schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the Parquet file.
- static from_torch_dataset(
- dataset: torch.utils.data.Dataset,
- *,
- all_arrays_are_fixed_size: bool = False,
- schema: Schema | Mapping[str, SchemaLike] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from a torch Dataset.
This constructor is designed primarily as a bridge for torchvision DatasetFolder and VisionDataset instances. For those, 3LC preserves the source: images are referenced by their on-disk paths (no copies). Any transform/target_transform/transforms attached to the source dataset is stripped before serialization and is not applied when reading from the Table; reattach it explicitly via Table.with_transform().
For arbitrary torch.utils.data.Dataset subclasses, this method falls back to materializing every sample by calling dataset[i] and serializing the result inline. This works, but it is rarely what you want for a 3LC Table:
- Images returned as PIL/tensors are stored as bulk data inside the Table, duplicating bytes that already exist on disk or in cloud storage.
- Tensors and arrays are similarly serialized inline, losing any link to the source they were derived from (a file, a URL, a parquet column).
- Dataset transforms, especially augmentations, break the assumption that a Table row holds source-shaped data. Tables should hold the closest-to-source representation; transforms belong in the data loading pipeline, applied per-epoch during training.
Prefer one of the following when they fit your data:
- Table.from_image_folder for class-folder image layouts.
- Table.from_coco, Table.from_yolo_url, Table.from_yolo_ndjson for detection/segmentation annotations.
- Table.from_hugging_face_hub / Table.from_hugging_face_dataset for HF datasets.
- Table.from_dict, Table.from_pandas, Table.from_csv, Table.from_parquet when you can supply URL or path columns directly; these keep the Table a thin reference over the source.
- TableWriter for custom ingestion where you need explicit control over the schema and rows.
- Parameters:
dataset – The torch Dataset to ingest. Best results when this is a DatasetFolder/VisionDataset whose samples come from files; see the warnings above for the general case.
all_arrays_are_fixed_size – Whether all arrays (tuples, lists, etc.) in the dataset are fixed size. This parameter is only used when inferring a schema from a single sample in the dataset when no schema is provided.
schema – The schema of the table. Can be a Schema object, a dict mapping column names to schemas, or a tuple of schemas for positional columns. If not provided, the schema will be inferred from the first sample in the table.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the torch dataset.
Note: Transforms attached to a VisionDataset are not persisted in the Table’s JSON, and are not reapplied when reading samples from the Table. Reattach them with Table.with_transform to obtain a TableView that applies the transforms on read.
- static from_url( ) Table¶
Create a table from a url.
- Parameters:
url – The url to create the table from
- Returns:
A concrete Table subclass
- Raises:
ValueError – If the url does not point to a table.
FileNotFoundError – If the url cannot be found.
- static from_yolo_ndjson(
- ndjson_file: str | pathlib.Path | tlc.Url,
- image_folder: str | pathlib.Path | tlc.Url | None = None,
- *,
- split: str = 'train',
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | pathlib.Path | str | None = None,
- if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse',
- add_weight_column: bool = True,
- weight_column_value: float = 1.0,
- description: str | None = None,
- extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None,
- input_tables: list[tlc.Url | str | pathlib.Path] | None = None,
Create a Table from a YOLO NDJSON file.
The first line is required to contain the ‘class_names’ and ‘task’ keys, and the rest of the lines are required to contain the ‘file’, ‘width’, ‘height’, ‘split’ and ‘annotations’ keys.
Note: image_folder is kept positional alongside ndjson_file so the common Table.from_yolo_ndjson(file, folder) form remains ergonomic. All other parameters are keyword-only.
- Parameters:
ndjson_file – The url of the NDJSON file.
image_folder – The folder containing the images, used to handle relative paths. If not provided, relative image paths are made absolute with respect to the NDJSON file directory.
split – The split to load from the dataset. Rows with ‘split’ equal to this value will be loaded.
project_name – The name of the project.
dataset_name – The name of the dataset. Falls back to split if not provided.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table. If not provided, the description is set to the one in the first line of the NDJSON file, or an empty string.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
- Returns:
A Table populated from the YOLO NDJSON file.
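The file contract described for from_yolo_ndjson (a header line with class_names and task, then one record per image, filtered by split) can be checked with a few lines of standard-library Python. This is a sketch of the documented key requirements only; read_yolo_ndjson is a hypothetical helper, and the shape of class_names and annotations is deliberately left unvalidated because it is not specified here.

```python
import json

HEADER_KEYS = ("class_names", "task")
RECORD_KEYS = ("file", "width", "height", "split", "annotations")


def read_yolo_ndjson(text: str, split: str = "train"):
    """Validate the documented keys and return (header, rows for the given split)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty NDJSON input")

    # The first line carries dataset-level metadata.
    header = json.loads(lines[0])
    for key in HEADER_KEYS:
        if key not in header:
            raise ValueError(f"header line is missing {key!r}")

    rows = []
    # Record numbers count non-blank lines, starting at 2 after the header.
    for number, line in enumerate(lines[1:], start=2):
        record = json.loads(line)
        missing = [key for key in RECORD_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {number} is missing {missing}")
        # Only rows whose 'split' equals the requested split are loaded.
        if record["split"] == split:
            rows.append(record)
    return header, rows
```

Running a check like this before ingestion gives a clearer error than a failure halfway through table creation.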
- static from_yolo_url(images_url: str | pathlib.Path | tlc.Url | collections.abc.Iterable[str | pathlib.Path | tlc.Url], *, categories: str | Sequence[str] | Sequence[dict[str, str]] | Sequence[tlc.schemas.MapElement] | dict[float, str] | dict[int, str] | dict[float, tlc.schemas.MapElement] | dict[int, tlc.schemas.MapElement] | None = None, task: typing.Literal[detect, segment, obb, pose] = 'detect', max_depth: int | None = None, allow_fetch_remote_data: bool = False, project_name: str | None = None, dataset_name: str | None = None, table_name: str | None = None, root_url: tlc.Url | str | None = None, table_url: tlc.Url | pathlib.Path | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, weight_column_value: float = 1.0, description: str | None = None, extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None, input_tables: list[tlc.Url | str | pathlib.Path] | None = None, **kwargs: typing.Any) Table¶
Create a Table from a YOLO dataset folder or file of images.
When images_url is a folder, label files are resolved from image paths by replacing the last images directory segment with labels and changing the extension to .txt. If the image path contains no images directory, the label file is expected next to the image with the same name and a .txt extension. If an image has no corresponding label file, or the label file is empty, no labels are added for that image.
With the following layout, a folder images_url would be images_url="/root/images":
    root/
        images.txt
        images/
            image1.jpg
            image2.jpg
            subfolder/
                image3.jpg
                image4.jpg
        labels/
            image1.txt
            image2.txt
            subfolder/
                image3.txt
In the layout above, image4.jpg has no corresponding labels/subfolder/image4.txt and is included as an unlabeled image.
When images_url is a file (images_url="/root/images.txt" in the above example), the same layout is expected, but the image URLs are listed in the text file. Relative URLs are made absolute with respect to the directory containing the text file (i.e. the parent of the text file).
The following text file would be valid:
    ./images/image1.jpg                  # Relative -> root/images/image1.jpg
    images/image2.jpg                    # Relative -> root/images/image2.jpg
    /root/images/subfolder/image3.jpg    # Absolute -> root/images/subfolder/image3.jpg
This method can also be used to create a Table with a label column but no labeled instances, by providing images with no corresponding label files.
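The label lookup and text-file resolution rules described above can be sketched in plain Python. This is a minimal illustration built on pathlib and posixpath, not the 3LC implementation; the helper names label_path_for and resolve_listed_image are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath
from pathlib import PurePosixPath


def label_path_for(image_path: str) -> str:
    """Map an image path to its expected label path (hypothetical helper)."""
    parts = list(PurePosixPath(image_path).parts)
    # Replace the LAST "images" directory segment, if any, with "labels".
    # The final part is the file name, so it is skipped.
    for index in range(len(parts) - 2, -1, -1):
        if parts[index] == "images":
            parts[index] = "labels"
            break
    # With no "images" segment, the label sits next to the image.
    return str(PurePosixPath(*parts).with_suffix(".txt"))


def resolve_listed_image(entry: str, list_file: str) -> str:
    """Resolve one entry of an images text file against the file's parent."""
    if posixpath.isabs(entry):
        return posixpath.normpath(entry)
    parent = posixpath.dirname(list_file)
    return posixpath.normpath(posixpath.join(parent, entry))


label = label_path_for("/root/images/subfolder/image3.jpg")
listed = resolve_listed_image("./images/image1.jpg", "/root/images.txt")
```

Whether the resolved label file actually exists is a separate check; per the description above, a missing or empty label file simply yields an unlabeled image.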
- Parameters:
images_url – The location(s) of the folder(s) containing, or file(s) referencing, the images. Can be a single URL or a list of URLs.
categories – The categories of the table.
task – The task of the dataset. One of ‘detect’, ‘segment’, ‘pose’, or ‘obb’.
max_depth – The maximum depth to search for images. If None (default), the limit is set to 1 (i.e. only immediate children) for remote input URLs and unlimited for local files.
allow_fetch_remote_data – Whether to allow fetching remote images and label files when the data is on remote storage. Defaults to False, meaning no remote data can be fetched, and an error is raised if fetching would be required. If True, the remote data is fetched as part of the table creation process. For large datasets, this leads to two requests for each image, one for the full image and one for the corresponding label file. In such cases it is recommended to download a local copy and create the table from that.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – A list of Table URLs that are considered direct predecessors in this table’s lineage.
**kwargs – Additional task-specific keyword arguments. Only applies to the task “pose”.
- Returns:
A Table populated from the YOLO dataset folder or file of images.
- get_column_as_pyarrow_array( ) Array | ChunkedArray¶
Return the specified column of the table as a pyarrow array.
To get nested sub-columns, use dot notation. E.g. ‘column.sub_column’. The values in the column will be the row-view of the table. A column which is a PIL image in its sample-view, for instance, will be returned as a column of strings.
- Parameters:
name – The name of the column to get.
combine_chunks – Whether to combine the chunks of the returned column in the case that it is a ChunkedArray. Defaults to True.
- Returns:
A pyarrow Array or ChunkedArray containing the specified column.
- Raises:
KeyError – If the column does not exist in the table.
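The dot-notation lookup can be pictured with nested dictionaries. The sketch below is plain Python over row dictionaries, not the pyarrow-backed implementation; get_nested_column and the sample rows are hypothetical.

```python
from typing import Any, Mapping


def get_nested_column(rows: list[Mapping[str, Any]], name: str) -> list[Any]:
    """Collect one (possibly nested) column from row dictionaries."""
    keys = name.split(".")
    values = []
    for row in rows:
        value: Any = row
        for key in keys:
            # A missing key at any level raises KeyError, mirroring the
            # documented behavior for a column that does not exist.
            value = value[key]
        values.append(value)
    return values


rows = [
    {"image": "a.jpg", "bbs": {"label": 0}},
    {"image": "b.jpg", "bbs": {"label": 1}},
]
labels = get_nested_column(rows, "bbs.label")
```

The image values here are strings rather than decoded images, which matches the note above that the row-view of a column is returned.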
- get_foreign_table_url(
- column: str = FOREIGN_TABLE_ID,
Return the input table URL referenced by this table.
This method is intended for tables that reference a single input table. Typically, this would be a metrics table of per-example metrics collected using another table.
If the table contains a column named ‘input_table_id’ with a value map indicating that it references an input table by Url, this method returns the Url of that input table.
- Parameters:
column – The name of the column to check for a foreign key.
- Returns:
The URL of the foreign table, or None if no input table is found.
- get_rows_as_binary(
- *,
- exclude_bulk_data: bool = False,
Return all rows of the table as a binary Parquet buffer, with optional exclusion of bulk data columns.
This method will return the ‘Table-representation’ of the table, which is the most efficient representation, since only references to the input data are stored.
- Parameters:
exclude_bulk_data – Whether to exclude bulk data columns from the serialized rows.
- Returns:
The rows of the table as a binary Parquet buffer.
- get_simple_value_map(
- value_path: str,
Get the simple value map for a value path, mapping class indices to class names.
- Parameters:
value_path – The path to the value to get the value map for. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
- Returns:
A simple value map for the value, or None if the value does not exist or does not have a value map.
- get_value_map(
- value_path: str,
Get the value map for a value path.
- Parameters:
value_path – The path to the value to get the value map for. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
- Returns:
A value map for the value, or None if the value does not exist or does not have a value map.
- is_descendant_of(
- other: Table,
Return True if this table is a descendant of the provided table.
- Parameters:
other – The table to check if this table is a descendant of.
- Returns:
True if this table is a descendant of the provided table, False otherwise.
- static join_tables(tables: collections.abc.Sequence[tlc._core.objects.table.Table] | collections.abc.Sequence[tlc.Url | str | pathlib.Path], *, project_name: str | None = None, dataset_name: str | None = None, table_name: str | None = None, root_url: tlc.Url | str | None = None, table_url: tlc.Url | str | pathlib.Path | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, weight_column_value: float = 1.0, description: str | None = None, extra_columns: dict[str, Schema | Mapping[str, SchemaLike]] | None = None, input_tables: list[tlc.Url | str | pathlib.Path] | None = None) Table¶
Join multiple tables into a single table.
The tables will be joined vertically, meaning that the rows of the resulting table will be the concatenation of the rows of the input tables, in the order they are provided.
The schemas of the tables must be compatible for joining. If the tables have different schemas, a merge of the schemas is attempted, and an error is raised if this is not possible.
- Parameters:
tables – A list of Table instances to join.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table.
root_url – The root url of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table.
weight_column_value – The value to initialize the weight column with if add_weight_column is True.
description – A description of the table.
extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are Schema objects.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
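The vertical join described above amounts to concatenating rows in input order. The following sketch uses lists of row dictionaries in place of Tables; join_rows is a hypothetical name, and the strict column check is a simplification (the real method first attempts to merge differing schemas).

```python
from typing import Any

Row = dict[str, Any]


def join_rows(tables: list[list[Row]]) -> list[Row]:
    """Vertically concatenate row lists, preserving input order."""
    joined = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                # Stand-in for an incompatible schema. No schema merge is
                # attempted in this sketch.
                raise ValueError(f"Incompatible columns: {sorted(row)}")
            joined.append(row)
    return joined


first = [{"x": 1}]
second = [{"x": 2}, {"x": 3}]
joined = join_rows([first, second])
```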
- latest(
- timeout: float = 30.0,
Return the most recent version of the table.
Uses the lineage index to walk descendants of this table’s URL and returns the newest one. Tables created in this process appear in the index via a fast-path; for tables that may have appeared in external scan sources, latest() waits up to timeout seconds for the next scheduler cycle.
Example:
    table_instance = Table()
    ...  # working
    latest_table = table_instance.latest()
- Parameters:
timeout – Seconds to wait for the next indexing cycle when no descendant is yet visible in this process. 0 returns the in-process fast-path result immediately. Defaults to 30.
- Returns:
The latest version of the table.
- Raises:
ValueError – If the latest version of the table cannot be found in the dataset or if an error occurs when attempting to create an object from the latest Url.
- revision( ) Table¶
Return a specific revision of the table.
This function retrieves a specific revision of this table. The revision can be specified by tag, table_url, or table_name. If no arguments are provided, the current table is returned.
- Parameters:
tag – The tag of the revision to return. Currently only ‘latest’ is supported.
table_url – The URL of the revision to return.
table_name – The name of the revision to return.
- set_row_cache_url( ) bool¶
Assign a new row_cache_url value.
Will set row_cache_populated to False if the cache file has changed.
- Parameters:
row_cache_url – The new row_cache_url value.
- Returns:
True if the row_cache_url value was changed, False otherwise.
- set_value_map( ) Table¶
Set a value map for a specified numeric value within the schema of the Table.
This method creates and returns a new revision of the table with an overridden value map for a specific numeric value.
Any item in a Schema of type NumericValue can have a value map. A value map is a mapping from a numeric value to a MapElement, where a MapElement contains metadata about a categorical value such as category names and IDs.
Partial Value Maps
Value maps may be partial, i.e. they may only contain a mapping for a subset of the possible numeric values. The keys may also be floating point values, which can be useful for annotating continuous variables with categorical metadata, such as color or label.
For more fine-grained control over value map editing, see Table.set_value_map_item, Table.add_value_map_item, and Table.delete_value_map_item.
- Parameters:
value_path – The path to the value to add the value map to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
value_map – The value map to set on the value. The value will be converted to a dictionary mapping from floating point values to MapElement if it is not already.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map set.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue.
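The conversion to a dictionary keyed by floating point values, and the idea of a partial value map, can be sketched without 3LC. MapElementSketch is a hypothetical stand-in for MapElement, normalize_value_map is a hypothetical helper, and keying a plain list of names by position is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapElementSketch:
    """Hypothetical stand-in for MapElement: metadata for one category."""

    internal_name: str
    display_name: str = ""


def normalize_value_map(value_map):
    """Coerce names or {number: name} into {float: MapElementSketch}."""
    if isinstance(value_map, dict):
        items = value_map.items()
    else:
        # Assumption: a plain sequence of names is keyed by position.
        items = enumerate(value_map)
    normalized = {}
    for key, element in items:
        if not isinstance(element, MapElementSketch):
            element = MapElementSketch(internal_name=str(element))
        normalized[float(key)] = element
    return normalized


# A partial map: only two values carry metadata, and one key is a
# non-integer float.
partial = normalize_value_map({0: "cat", 0.5: "uncertain"})
from_names = normalize_value_map(["cat", "dog"])
```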
- set_value_map_item(
- value_path: str,
- value: float | int,
- internal_name: str,
- *,
- display_name: str = '',
- description: str = '',
- display_color: str = '',
- url: Url | str = '',
- edited_table_url: Url | str = '',
Update an existing value map item for a specified numeric value within the schema of the Table.
This method creates and returns a new revision of the table with a value map item added to a value in a column.
Example:
    table = Table.from_url("cats-and-dogs")
    new_table = table.set_value_map_item("label", 0, "cat")
    # new_table is now a new revision of the table with an updated value map item added to the value 0 in the column
    assert table.latest() == new_table, "The new table is the latest revision of the table."
To add a new value map item at the next available value in the value map, see Table.add_value_map_item.
To delete a value map item, see Table.delete_value_map_item.
- Parameters:
value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
value – The numeric value to add the value map item to. If the value already exists, the value map item will be updated.
internal_name – The internal name of the value map item. This is the primary identifier of the value map item.
display_name – The display name of the value map item.
description – The description of the value map item.
display_color – The display color of the value map item.
url – The url of the value map item.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue.
- should_include_schema_in_json(
- schema: Schema,
Only include the schema in the JSON representation if it is not empty.
- squash(
- *,
- output_url: Url | str | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: Url | str | None = None,
- input_tables: list[Table | Url | str] | None = None,
Create a copy of this table where all lineage is squashed.
A squashed table is a table where all lineage is merged. This is useful for creating a table that is independent of its parent tables. This function creates a new table with the same rows as the original table, but with no lineage. The new table is written to the output_url, or placed in the same project and dataset as this table if no output URL is provided.
- Parameters:
output_url – The output url for the squashed table. Mutually exclusive with project_name, dataset_name, table_name, and root_url.
project_name – The project name to use for the squashed table. If not provided, the project_name of the original table is used.
dataset_name – The dataset name to use for the squashed table. If not provided, the dataset_name of the original table is used.
table_name – The name of the squashed table. If not provided, a uniquified variant of ‘squashed’ is used.
root_url – The root URL to use for the squashed table. If not provided, the root URL of the original table is used.
input_tables – Optional list of Tables or URLs to Tables to refer to as the input tables for the squashed table. By default, no tables are referred to as inputs.
- Returns:
The squashed table.
- to_pandas() DataFrame¶
Return a pandas DataFrame for this table.
- Returns:
A pandas DataFrame populated from the rows of this table.
- Raises:
ImportError – If pandas is not installed. Install it with pip install 3lc[pandas], pip install pandas, or similar.
- static transform_value( ) object¶
Transform a single table value according to the schema.
3LC currently only uses pure string representations of datetime values. This helper function is used to convert any timestamps to strings.
- Parameters:
schema – The schema corresponding to the column of the value.
item – The value to transform.
- property weights_column_name: str | None¶
Return the name of the column containing the weights for this table, or None if no such column exists.
- with_transform( ) TableView¶
Return a map-style view that applies transform to each sample on read.
The returned view is not a Table. It implements the MapDataset protocol (__len__, __getitem__) and exposes url, which forwards to this Table. Pass the view directly to tlc.collect_metrics(), or to any torch.utils.data.DataLoader.
Each call returns a fresh TableView instance; two calls with the same transform are not the same Python object. Hoist the view (view = table.with_transform(fn)) when you need a stable reference across calls.
- Parameters:
transform – A callable applied to each sample before it is returned. Receives the raw sample produced by this Table and returns the transformed sample. Must be picklable (top-level function or importable callable, not a lambda or local closure) when the view is consumed by a torch.utils.data.DataLoader with num_workers > 0.
- Returns:
A TableView over this Table.
- write_to_row_cache( ) None¶
Cache the table rows to the row cache Url.
If the table is already cached, or the Url of the Table is an API-Url, this method does nothing.
In the case where self.row_cache_url is empty, a new Url will be created and assigned to self.row_cache_url if create_url_if_empty is True, otherwise a ValueError will be raised.
- Parameters:
create_url_if_empty – Whether to create a new row cache Url if self.row_cache_url is empty.
overwrite_if_exists – Whether to overwrite the row cache file if it already exists.
- class TableView( )¶
A map-style view over a tlc.Table that applies a sample-level transform on every read.
Implements the MapDataset protocol used by tlc.collect_metrics() and any torch.utils.data.DataLoader. Not itself a Table: it has no schema, no persistence, and no object-registry identity. Its url forwards to the underlying Table so metrics collected through it can be linked back to the source.
Views compose: wrapping a TableView in another TableView chains the transforms. source and url always resolve to the root Table, regardless of chain depth.
Construct via tlc.Table.with_transform() or by chaining tlc.TableView.with_transform().
- property source: Table¶
The root Table underlying this view (walking through any chained TableView wrappers).
Useful for sampler construction (e.g. create_sampler(view.source, ...)).
- with_transform( ) TableView¶
Return a new TableView that applies transform on top of this view’s transform.
- Parameters:
transform – A callable applied to each sample after this view’s transform has run. Must be picklable (top-level function or importable callable, not a lambda or local closure) when the view is consumed by a torch.utils.data.DataLoader with num_workers > 0.
- Returns:
A TableView chaining transform on top of this view.
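The chaining behavior described for TableView can be modeled with a small map-style wrapper. ViewSketch below is a hypothetical class over a plain list, not tlc.TableView; it illustrates the order in which chained transforms run, the fresh-object behavior of each with_transform call, and how source walks back to the root.

```python
from typing import Any, Callable


class ViewSketch:
    """Hypothetical map-style view; not the tlc.TableView class."""

    def __init__(self, parent: Any, transform: Callable[[Any], Any]) -> None:
        self._parent = parent
        self._transform = transform

    def __len__(self) -> int:
        return len(self._parent)

    def __getitem__(self, index: int) -> Any:
        # The parent (a root sequence or another view) is read first, so
        # earlier transforms run before this one.
        return self._transform(self._parent[index])

    @property
    def source(self) -> Any:
        # Walk through chained views until the root container is reached.
        parent = self._parent
        while isinstance(parent, ViewSketch):
            parent = parent._parent
        return parent

    def with_transform(self, transform: Callable[[Any], Any]) -> "ViewSketch":
        # Each call builds a fresh view layered on top of this one.
        return ViewSketch(self, transform)


def double(sample: int) -> int:
    return sample * 2


def add_one(sample: int) -> int:
    return sample + 1


root = [1, 2, 3]
view = ViewSketch(root, double).with_transform(add_one)
samples = [view[i] for i in range(len(view))]
```

The transforms are module-level functions rather than lambdas, in line with the picklability requirement stated above for multi-worker data loading.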
- class TableWriter(
- *,
- bulk_data_chunk_size_mb: float = DEFAULT_BULK_DATA_CHUNK_SIZE_MB,
- bulk_data_context_key: str = DEFAULT_BULK_DATA_SEQUENCE_ID_COLUMN_NAME,
- bulk_data_url: tlc.Url | str | None = None,
- schema: Schema | Mapping[str, SchemaLike] | None = None,
- project_name: str | None = None,
- dataset_name: str | None = None,
- table_name: str | None = None,
- root_url: tlc.Url | str | None = None,
- table_url: tlc.Url | str | None = None,
- if_exists: typing.Literal[overwrite, rename, raise] = 'rename',
- description: str = '',
- input_tables: list[tlc.Url] | None = None,
A class for writing batches of rows to persistent storage.
Rows are transformed through the writer pipeline (schema resolution, per-leaf to_row() with externalization context, chunk-pattern packing, URL relativization) and accumulated as PyArrow record batches until finalize() writes them out as a parquet-backed Table.
Example:
    table_writer = TableWriter(
        project_name="My Project",
        dataset_name="My Dataset",
        table_name="My Table"
    )
    table_writer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})
    table_writer.add_row({"column1": 4, "column2": "d"})
    table = table_writer.finalize()
Initialize a TableWriter.
- Parameters:
bulk_data_chunk_size_mb – The size of the chunk in MB for chunk-pattern bulk data (default: 50.0 MB).
bulk_data_context_key – The column name to use as the context key for chunk-pattern bulk data (default: “sequence_id”).
bulk_data_url – Optional base URL for bulk data storage. Both chunk-pattern and file-pattern data are stored under this location. Chunk-pattern data goes to <bulk_data_url>/chunks/<table_stem>/, file-pattern data goes to <bulk_data_url>/samples/<table_stem>/. Defaults to <table_parent>/bulk_data/ (or <table_parent_parent>/bulk_data/ for non-metrics tables). Per-column overrides via Schema.bulk_data_location take precedence for file-pattern columns.
schema – Optional schema for the table. Can be a Schema object with values for the columns, or a dict mapping column names to Schema objects. Columns you don’t declare are inferred from the first batch.
project_name – The name of the project.
dataset_name – The name of the dataset.
table_name – The name of the table, defaults to “initial”.
root_url – The root URL to write the table to. If not provided, the default root URL is used.
table_url – An optional url to manually specify the Url of the written table. Mutually exclusive with project_name, dataset_name, table_name, and root_url.
if_exists – The option to use when the table already exists.
description – An optional description of the table.
input_tables – Optional list of Tables or URLs to record as input tables for lineage tracking.
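The two storage locations described for bulk_data_url follow a fixed pattern, which can be written out as string formatting. This is only an illustration of the documented layout; bulk_data_dirs, its dictionary keys, and the example URL are hypothetical.

```python
def bulk_data_dirs(bulk_data_url: str, table_stem: str) -> dict[str, str]:
    """Build the two storage locations described for bulk_data_url."""
    base = bulk_data_url.rstrip("/")
    return {
        # <bulk_data_url>/chunks/<table_stem>/
        "chunk_pattern": f"{base}/chunks/{table_stem}/",
        # <bulk_data_url>/samples/<table_stem>/
        "file_pattern": f"{base}/samples/{table_stem}/",
    }


dirs = bulk_data_dirs("s3://bucket/project/bulk_data/", "initial")
```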
- add_batch(
- table_batch: MutableMapping[str, Any],
Add a batch of rows to the buffer for writing.
This method validates the consistency of the batch and appends it to the buffer. When the buffer reaches its maximum size, it is automatically flushed to disk.
- Parameters:
table_batch – A dictionary mapping column names to lists of values.
- Raises:
ValueError – If the columns in the batch have unequal lengths or mismatch with existing columns.
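The consistency checks named above (equal column lengths, columns matching earlier batches) can be sketched with an in-memory buffer. BatchBufferSketch is a hypothetical class, not tlc.TableWriter: it does no schema resolution, flushing, or Parquet writing, and treating a row as a batch of length one is an assumption.

```python
from typing import Any, Mapping, Sequence


class BatchBufferSketch:
    """Hypothetical in-memory buffer; not the tlc.TableWriter class."""

    def __init__(self) -> None:
        self.columns: dict[str, list[Any]] = {}

    def add_batch(self, batch: Mapping[str, Sequence[Any]]) -> None:
        lengths = {len(values) for values in batch.values()}
        if len(lengths) > 1:
            raise ValueError("Columns in the batch have unequal lengths.")
        if self.columns and set(batch) != set(self.columns):
            raise ValueError("Batch columns do not match existing columns.")
        for name, values in batch.items():
            self.columns.setdefault(name, []).extend(values)

    def add_row(self, row: Mapping[str, Any]) -> None:
        # A row is treated as a batch of length one.
        self.add_batch({name: [value] for name, value in row.items()})


buffer = BatchBufferSketch()
buffer.add_batch({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})
buffer.add_row({"column1": 4, "column2": "d"})
```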
- add_row(
- table_row: MutableMapping[str, Any],
Add a single row to the table being written.
- Parameters:
table_row – A dictionary mapping column names to values.
- class Url(
- value: str | Path | Url | None = None,
- scheme: str | None = None,
- normalized_path: str | None = None,
- query: str | None = None,
Bases: abc.ABC
A class which represents a URL.
A URL in 3LC is a combination of a scheme and a path. Many methods in 3LC accept URLs as arguments and/or return URLs. They are also used to refer to tlc.Tables and to cross reference between them. A file URL in 3LC will behave identically on both Posix and Windows systems.
Since a URL in 3LC might contain aliases, and even the scheme might not be determined until aliases are expanded, it is important to note which methods and properties will expand.
The path and scheme properties of the URL will expand aliases.
Example: Scheme is determined from the input string
    file_url = Url("/path/to/file")  # Or Url("file:///path/to/file")
    file_url.scheme == Scheme.FILE
    file_url.path == "/path/to/file"
    str(file_url) == "/path/to/file"  # omit file:// scheme

    s3_url = Url("s3://bucket/path/to/object")
    s3_url.scheme == Scheme.S3
    s3_url.path == "bucket/path/to/object"
    str(s3_url) == "s3://bucket/path/to/object"  # include s3:// scheme

    gcs_url = Url("gs://bucket/path/to/object")
    gcs_url.scheme == Scheme.GS
    gcs_url.path == "bucket/path/to/object"
    str(gcs_url) == "gs://bucket/path/to/object"  # include gs:// scheme

    relative_url = Url("path/to/file")
    relative_url.scheme == Scheme.RELATIVE
    relative_url.path == "path/to/file"
    str(relative_url) == "path/to/file"  # omit relative:// scheme

    # *Aliases are expanded when the URL is used*
    # Assume <SAMPLE_DATA> is **not** registered
    alias_url = Url("<SAMPLE_DATA>/data.csv")
    alias_url.scheme == Scheme.ALIAS
    alias_url.path == "<SAMPLE_DATA>/data.csv"
    str(alias_url) == "<SAMPLE_DATA>/data.csv"

    # The registry is read-only over a pluggable provider. In standalone tlcurl
    # use, install a provider that wraps a dict you own; in tlc use, call
    # tlc.url.register_url_alias / tlc.url.unregister_url_alias to mutate the
    # configuration store the provider reads from.
    aliases: dict[str, str] = {}

    class _InMemoryProvider:
        def get_aliases(self) -> dict[str, str]:
            return dict(aliases)

    UrlAliasRegistry.set_provider(_InMemoryProvider())

    # Register the alias by mutating the dict the provider reads.
    aliases["<SAMPLE_DATA>"] = "/path/to/data"

    # It will now be expanded when using path and scheme properties
    alias_url.scheme == Scheme.FILE
    alias_url.path == "/path/to/data/data.csv"
    str(alias_url) == "<SAMPLE_DATA>/data.csv"

    # Swap to an alternative alias.
    aliases["<SAMPLE_DATA>"] = "/alternate/path/to/data"
    alias_url.scheme == Scheme.FILE
    alias_url.path == "/alternate/path/to/data/data.csv"

    del aliases["<SAMPLE_DATA>"]
- Terminology:
A normalized URL has a scheme, uses single forward slashes as path separators, and does not end with a slash.
An expanded URL has aliases expanded, and is normalized.
An absolute URL is an expanded URL, which means that it can be used as a stable persisted reference.
Relative URLs are converted to absolute URLs based on an “owner” URL, or, if applicable, the current working directory of the process.
Relative and Api URLs will have “relative://” or “api://” as their scheme, but these schemes will be omitted from the stringified representation.
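The slash rules in the definition of a normalized URL can be sketched with a couple of regular expressions. normalize_sketch is a hypothetical helper that covers only the separator and trailing-slash rules; it does not deduce a scheme and is not the normalization 3LC performs.

```python
import re


def normalize_sketch(value: str) -> str:
    """Apply the slash rules from the terminology list to a URL string."""
    # Use forward slashes only.
    value = value.replace("\\", "/")
    # Keep a leading "scheme://" intact while collapsing other repeats.
    match = re.match(r"^([A-Za-z][A-Za-z0-9+.-]*://)(.*)$", value)
    prefix, rest = (match.group(1), match.group(2)) if match else ("", value)
    rest = re.sub(r"/{2,}", "/", rest)
    # Drop a trailing slash, but keep a bare root "/".
    if len(rest) > 1:
        rest = rest.rstrip("/")
    return prefix + rest


windows_style = normalize_sketch("C:\\data\\images\\")
bucket_style = normalize_sketch("s3://bucket//path/to/")
```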
- Caveats:
The URL does not make any network calls or access the file system. It therefore cannot resolve symlinks, and use of these is discouraged in combination with 3LC.
There are a few exotic Windows paths that are not supported:
The use of a Windows drive letter without a slash, e.g. C:foo/bar, is not supported. Use C:/foo/bar instead.
- Parameters:
value – The URL as a string, Path, or Url object. When this argument is passed as a string, it will be normalized and the scheme is deduced from the string contents.
scheme – The scheme of the URL, if known.
normalized_path – The normalized path of the URL, if known. If both scheme and normalized_path are passed, they will be used directly without any normalization or parsing. It is the responsibility of the caller to ensure that the scheme and normalized_path are valid.
query – The query component of the URL (the part after ?), if known. Only meaningful together with scheme and normalized_path; when value is parsed, any query string is split out automatically.
- Raises:
ValueError – If the URL is specified with both value and scheme/path.
- static absolute_from_relative( ) Url¶
Convert a relative URL to an absolute URL, given an owner URL.
- Parameters:
url – The relative URL to convert.
owner – The owner URL, if necessary for conversion.
- static api_url_for_object(
- obj: object,
Get the API URL for an object.
This is the default URL for an object when a persistent URL is not specified. API URLs allow objects to be addressable as long as they are in memory.
- Parameters:
obj – The object to get the API URL for.
- apply_aliases() Url¶
Apply all registered aliases to this URL.
- Returns:
The URL with aliases applied.
- create_sibling(
- name: str,
Create a new Url next to the current Url.
Example:
    Url("C:/path/to/file.json").create_sibling("umap.json") == Url("C:/path/to/umap.json")
    Url("C:/path/to/dir").create_sibling("other") == Url("C:/path/to/other")
- Parameters:
name – The name of the new Url.
- Returns:
A new Url next to the current Url.
- create_unique() Url¶
Create a unique version of the Url.
This method will create a unique URL by appending a unique identifier to the URL, if necessary.
- Returns:
A unique Url.
- classmethod cwd() Url¶
Get the current working directory as a URL.
- Returns:
The current working directory as a URL.
- escape() str¶
Double-escape the URL string to handle paths in service endpoints.
Some services require double-escaping to process URLs correctly due to internal un-escaping passes.
- Returns:
A double-escaped URL string.
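Double-escaping can be illustrated with the standard library: percent-encode once, then encode the result again so that a single un-escaping pass still leaves a validly encoded string. This is a sketch of the general idea using urllib.parse, not the exact escaping rules of Url.escape(); double_escape is a hypothetical helper and the choice of safe="" is an assumption.

```python
from urllib.parse import quote, unquote


def double_escape(url: str) -> str:
    """Percent-encode twice so one un-escaping pass leaves a valid encoding."""
    # safe="" also encodes "/" and ":". Which characters 3LC leaves
    # unescaped is not stated in this reference, so this is an assumption.
    return quote(quote(url, safe=""), safe="")


original = "s3://bucket/my file.json"
escaped = double_escape(original)
once_unescaped = unquote(escaped)
```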
- exists() bool¶
Check if the URL exists.
- Returns:
True if the URL exists, False otherwise.
- Raises:
Exception – If the URL cannot be accessed.
- expand_aliases(
- *,
- allow_unexpanded: bool = True,
Expand aliases in the URL.
- Parameters:
allow_unexpanded – If True, aliases that cannot be expanded will be left in the URL. If False, an exception will be raised if an alias cannot be expanded.
- Returns:
The scheme and path of the URL with aliases expanded.
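The effect of allow_unexpanded can be sketched with a dictionary of aliases and a regular expression. expand_aliases_sketch is a hypothetical helper that works on plain strings and returns a string rather than a scheme and path; the alias token pattern and the use of ValueError are assumptions, since the exception type is not named above.

```python
import re

# Assumption: an alias token is any "<...>" run without slashes.
ALIAS_PATTERN = re.compile(r"<[^<>/]+>")


def expand_aliases_sketch(
    url: str, aliases: dict[str, str], *, allow_unexpanded: bool = True
) -> str:
    """Replace registered <ALIAS> tokens in a URL string."""

    def substitute(match: re.Match) -> str:
        token = match.group(0)
        if token in aliases:
            return aliases[token]
        if allow_unexpanded:
            # Leave unknown aliases in place.
            return token
        raise ValueError(f"Alias {token} is not registered.")

    return ALIAS_PATTERN.sub(substitute, url)


registered = {"<SAMPLE_DATA>": "/path/to/data"}
expanded = expand_aliases_sketch("<SAMPLE_DATA>/data.csv", registered)
untouched = expand_aliases_sketch("<OTHER>/data.csv", registered)
```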
- property extension: str¶
Get the extension of the URL.
Example:
Url("example.json").extension == ".json"
- Returns:
The extension of the URL.
- flush() None¶
Raise an error to prevent Url being used in place of str, pathlib.Path or file object.
Implemented to ensure that a Url is not used in the place of a str, pathlib.Path or file object in cases where the silent failure would be confusing. Raises a more helpful error message.
- static get_normalized(
- value: str,
Get the normalized value of the string representation of a URL.
- Parameters:
value – The URL to normalize.
- Returns:
A tuple of (scheme, normalized_path).
- static get_path_type(
- path: str,
Determine if a path, without scheme, is a Windows or Posix path.
- static get_scheme(
- value: str,
Get the scheme of the string representation of a URL.
- Parameters:
value – The URL as a string.
- Raises:
ValueError – If the URL scheme is not supported.
- Returns:
The scheme of the URL.
- is_absolute() bool¶
Check if the normalized, unexpanded URL is absolute.
Notice that this method does not expand aliases.
- Returns:
True if the URL is absolute, False otherwise.
- is_descendant_of(
- other: Url,
Check if the URL is a descendant of another URL.
- Parameters:
other – The URL to check if the current URL is a descendant of.
- Returns:
True if the URL is a descendant of the other URL, False otherwise.
- join(
- other: Url,
Join two URLs.
The other URL needs to be a relative URL.
- Parameters:
other – The URL to join with the current URL. Required to be relative.
- Returns:
A new URL, which is the result of joining the current and other URLs.
- Raises:
ValueError – If the other URL is not relative.
- static join_url( ) str¶
Join a scheme and a path into a URL.
- Parameters:
scheme – The scheme.
path – The path.
- Returns:
The URL with the scheme applied.
- make_parents(
- *,
- exist_ok: bool = False,
Make all parent directories of the URL.
- Parameters:
exist_ok – If True, do not raise an exception if the directory already exists.
- Raises:
Exception – If the URL cannot be accessed.
- property name: str¶
Get the name of the URL.
Example:
    Url("C:/folder/file.txt").name == "file.txt"
    Url("C:/folder").name == "folder"
- Returns:
The name of the URL.
- static normalize_chars(
- url: str,
Normalize characters in a URL.
- Parameters:
url – The URL to normalize.
- Returns:
The normalized URL.
- open(
- mode: str,
Open the URL as a file.
- Parameters:
mode – The file mode to use when opening the URL.
- Returns:
A file-like object.
- Raises:
TypeError – If the URL cannot be opened as a file.
- property parts: list[str]¶
Get the parts of the URL (path segments).
Example:
    Url("C:/folder/file.txt").parts == ["C:", "folder", "file.txt"]
    Url("pxt://db/dir/table?query=1").parts == ["db", "dir", "table"]
- Returns:
The parts of the URL.
- property path: str¶
Return the path of the expanded URL.
Accessing this property will expand aliases in the URL.
This will return the path without a scheme, so e.g. an S3 URL will return the path without the protocol.
    Url("s3://bucket/table.json").path == "/bucket/table.json"
    Url("relative://foo/bar").path == "foo/bar"
    Url("http://example.com/path?query=1").path == "example.com/path"
- property query: str¶
Get the query string from the URL (everything after ‘?’).
This is useful for URL schemes that use query parameters (e.g., pxt://). For file:// URLs this typically returns an empty string.
Example:
    Url("pxt://db/table?pgdata=/path").query == "pgdata=/path"
    Url("file:///path/to/file.txt").query == ""
- Returns:
The query string without the leading ‘?’, or empty string if none.
- read_text(
- *,
- encoding: str = 'utf-8',
Read the contents of the URL as text.
- Parameters:
encoding – The encoding to use when reading the content, defaults to “utf-8”.
- Returns:
The content of the referenced file as text.
- static relative_from( ) Url¶
Transform a URL into relative form taking a given owner URL into account.
Create a URL relative to the given owner URL that is equivalent to the absolute URL. The owner URL can be a parent directory of the absolute URL, but it may also be a directory or file that shares part of the absolute URL’s path. If the absolute URL and owner URL are not compatible, the function will raise a ValueError.
If the transformation is not possible, for example if the URL and the owner have different schemes, the function will return the original URL.
Example:
    # Owner URL is a directory
    absolute_url = "s3://bucket/path/to/file.ext"
    owner_url = "s3://bucket/path"
    relative_url = Url.relative_from_absolute(absolute_url, owner_url)
    str(relative_url) == "to/file.ext"

    # Owner URL is a file
    absolute_url = "s3://bucket/path/to/file2.ext"
    owner_url = "s3://bucket/path/to/file1.ext"
    relative_url = Url.relative_from_absolute(absolute_url, owner_url)
    assert str(relative_url) == "../file2.ext"
- Raises:
ValueError – If the absolute URL and owner URL are not compatible
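The two cases in the example (owner as a directory, owner as a file) follow the usual relative-path rules. The sketch below reproduces them with the standard library only. It is not the tlc implementation: relative_from_sketch is a hypothetical name, and treating a different scheme or bucket as "not possible" is an assumption based on the description above.

```python
import posixpath
from urllib.parse import urlsplit


def relative_from_sketch(absolute_url: str, owner_url: str) -> str:
    """Hypothetical stand-in for illustration; not the tlc implementation."""
    absolute = urlsplit(absolute_url)
    owner = urlsplit(owner_url)
    # Assumption: with a different scheme or bucket no relative form exists,
    # so the original URL is handed back unchanged.
    if (absolute.scheme, absolute.netloc) != (owner.scheme, owner.netloc):
        return absolute_url
    # relpath treats `start` as a directory, so a file owner yields "../".
    return posixpath.relpath(absolute.path, start=owner.path)


from_directory = relative_from_sketch("s3://bucket/path/to/file.ext", "s3://bucket/path")
from_file = relative_from_sketch(
    "s3://bucket/path/to/file2.ext", "s3://bucket/path/to/file1.ext"
)
```

Here from_directory is "to/file.ext" and from_file is "../file2.ext", matching the documented example.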
- replace( ) Url¶
Replace occurrences of a substring in the URL with a new substring.
The intended use case for this method is to e.g., replace a file extension in a URL.
This method textually replaces occurrences of the old substring with the new substring in the path of the URL. Note that the replacement happens on the normalized path, which is not necessarily identical to the path passed to the Url constructor when it was first created.
Changing the scheme of the URL is not supported; however, it is possible to replace an alias. If the alias contains the scheme (e.g. url.scheme == ALIAS), the scheme can be changed.
Notice that this method does not expand aliases.
- Parameters:
old – The substring to be replaced.
new – The new substring to replace the old substring.
- Returns:
A new URL with the specified substring replaced.
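Because the replacement is textual, every occurrence of the old substring is affected, not just the file extension. The sketch below uses plain Python strings to show the pitfall. It assumes that replace matches substrings the way str.replace does, which the description suggests but does not state outright; the path is made up for illustration.

```python
# Illustrative path; no tlc objects are involved.
path = "bucket/json/table.json"

# Replacing the bare word also rewrites the directory name.
too_broad = path.replace("json", "csv")

# Including the dot limits the match to the extension in this path.
extension_only = path.replace(".json", ".csv")
```

Choosing an old substring that is as specific as possible keeps the change limited to the intended part of the path.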
- property scheme: str¶
Return the scheme of the expanded URL.
Calling this method will expand aliases in the URL. If the alias cannot be expanded, it will return Scheme.ALIAS.
To access the scheme of the URL without expanding aliases, use the _scheme member variable.
- Returns:
The scheme of the URL.
- Raises:
ValueError – If the url scheme cannot be determined.
- static split_url(
- value: str,
Split a URL into a scheme and a path.
Unlike urlparse, this function does not require a scheme to be present in the URL. It also does not treat the drive letter (e.g. C:/) of a Windows path as the scheme of the URL.
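The drive-letter problem is easy to reproduce with the standard library. The sketch below contrasts urlsplit with a hypothetical helper that only accepts a scheme when it is followed by "://". The helper name, the regular expression, and the returned tuple shape are all assumptions for illustration; the actual return value of split_url is not shown here.

```python
import re
from urllib.parse import urlsplit

# Assumption: a scheme counts only when it is followed by "://".
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://")


def split_scheme_and_path(value: str) -> tuple[str, str]:
    """Hypothetical helper; returns ("", value) when no scheme is present."""
    match = _SCHEME_RE.match(value)
    if match is None:
        return "", value
    return match.group(1), value[match.end():]


# urlsplit reads the drive letter of a Windows path as a scheme.
stdlib_scheme = urlsplit("C:/data/table.json").scheme

# The helper leaves the Windows path intact and reports no scheme.
windows_split = split_scheme_and_path("C:/data/table.json")
s3_split = split_scheme_and_path("s3://bucket/table.json")
```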
- property stem: str¶
Get the stem of the URL.
Example:
Url("example.json").stem == "example"
- Returns:
The stem of the URL.
- to_absolute( ) Url¶
Convert a relative URL to an absolute URL.
- Parameters:
owner – The owner URL, if necessary for conversion.
- Returns:
An absolute URL.
- Raises:
NotImplementedError – If the conversion is not supported.
- to_minimal_dict(
- _: bool = False,
Convert the URL to a minimal, serializable representation.
- Returns:
The URL as a str.
- to_relative( ) Url¶
Relativize a URL, including applying aliases.
- Parameters:
owner – The owner URL, if necessary for conversion.
- Returns:
A relative URL if possible, otherwise the original URL.
- Raises:
NotImplementedError – If the conversion is not supported.
- to_relative_with_max_depth( ) Url¶
Relativize the given URL with respect to the given owner URL, up to a maximum depth.
If url does not have a common prefix with owner up to max_depth, url is returned with only aliases.
- Parameters:
url – The URL to relativize.
owner – The URL to relativize with respect to.
max_depth – The maximum depth to relativize up to.
- Returns:
The relativized URL.
- to_str() str¶
Convert the URL to a normalized string.
This returns the normalized, un-expanded URL as a string.
- Returns:
The URL as a string.
- write_bytes(
- content: bytes | str,
- *,
- encoding: str = 'utf-8',
- if_exists: typing.Literal[overwrite, rename, raise] = 'overwrite',
Write bytes content to a URL.
- Parameters:
content – The content to write. If a string is provided, it will be encoded using the specified encoding.
encoding – The encoding to use when encoding string content to bytes, defaults to “utf-8”.
if_exists – The write options to use when writing, can be “overwrite”, “rename”, or “raise”.
- write_text(
- content: str | bytes | typing.Any,
- *,
- encoding: str = 'utf-8',
- if_exists: typing.Literal[overwrite, rename, raise] = 'overwrite',
Write text content to a URL.
- Parameters:
content – The content to write. If bytes are provided, they will be decoded using the specified encoding. If a non-string type is provided, it will be converted to a string using str().
encoding – The encoding to use when decoding bytes content to text, defaults to “utf-8”.
if_exists – The write options to use when writing, can be “overwrite”, “rename”, or “raise”.
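The three if_exists options shared by write_bytes and write_text can be pictured with a small local-file sketch. This is not how tlc implements them: write_with_policy is a hypothetical helper, and the numbered-suffix naming used for "rename" is an assumption, since the renaming scheme is not described here.

```python
import tempfile
from pathlib import Path


def write_with_policy(path: Path, content: str, if_exists: str = "overwrite") -> Path:
    """Hypothetical local-file helper; returns the path actually written."""
    if if_exists not in ("overwrite", "rename", "raise"):
        raise ValueError(f"Unknown if_exists value: {if_exists!r}")
    if path.exists():
        if if_exists == "raise":
            raise FileExistsError(path)
        if if_exists == "rename":
            # Assumption: pick the first free "<stem>_<n><suffix>" name.
            counter = 1
            candidate = path
            while candidate.exists():
                candidate = path.with_name(f"{path.stem}_{counter:04d}{path.suffix}")
                counter += 1
            path = candidate
    path.write_text(content, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"
    first = write_with_policy(target, "a")
    second = write_with_policy(target, "b", if_exists="rename")
    # The original file keeps its content because the second write was renamed.
    results = (first.name, second.name, target.read_text(encoding="utf-8"))
```

Returning the path that was written lets the caller find the file when "rename" picks a new name.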
- close() None¶
Close a run session.
Call this at the end of training to make sure all training data hooks are saved. The call blocks until all data hooks have been saved.
- collect_metrics(
- table: MapDataset[Any],
- metrics_collectors: tlc.metrics.collectors.metrics_collector_base.MetricsCollectorType,
- *,
- predictor: Module | Predictor | None = None,
- foreign_table_url: Url | str | None = None,
- constants: dict[str, Any] | None = None,
- constants_schemas: dict[str, Schema] | None = None,
- run_url: Url | str | None = None,
- collect_aggregates: bool = True,
- split: str = '',
- exclude_zero_weights: bool = False,
- dataloader_args: dict[str, Any] | None = None,
Collect per-sample metrics with a map-style dataset.
Writes a single metrics table joined to a foreign Table by row index. The written metrics table will contain any constants contained in the constants argument, as well as any metrics computed by the metrics collectors.
Adds the metadata of the metrics table to the metrics property of the Run.
Adds the Url of the foreign Table to the Run as an input.
Collects aggregate values from the metrics collectors and adds them to the Run.
The dataset’s index i is interpreted as the row index of the foreign Table for the per-sample join. Pass a Table or TableView to derive the foreign URL automatically, or any other MapDataset together with foreign_table_url to declare the join explicitly. The two paths are mutually exclusive: passing foreign_table_url alongside a Table/TableView is rejected.
- Parameters:
table – A map-style dataset (any object with __len__ and __getitem__). A Table or TableView works directly; for a custom dataset, foreign_table_url must be passed so metrics can be linked back to a Table. (Parameter will be renamed to dataset in 3.0.)
metrics_collectors – A list of metrics collectors to use. Can be a single metrics collector, a list of metrics collectors, or a list of callables with the signature Callable[[Any, PredictorOutput], dict[str, Any]].
constants – A dictionary of constants to use when collecting metrics.
constants_schemas – A dictionary of schemas for the constants. If no schemas are provided, the schemas will be inferred from the constants.
run_url – The url of the run to add the metrics to. If not specified, the active run will be used. If no active run is found, a new run will be created.
collect_aggregates – Whether to collect aggregate values from the metrics collectors and add them to the Run. This allows an aggregate view to be shown in the Project page of the 3LC Dashboard. Aggregate values are computed for all computable columns in the metrics collectors, and are prefixed with the split name. For example, if a metrics collector defines a computable column called “accuracy”, and the split is “train”, then the aggregate value will be called “train_accuracy_avg”.
split – The split of the dataset. This will be prepended to the aggregate metric names.
exclude_zero_weights – Whether to exclude samples with zero weights when collecting metrics. Reads weights from the foreign Table; requires foreign_table_url= (or that table is a Table or TableView).
foreign_table_url – Url of the Table to link the metrics back to. Required when table is a custom map-style dataset; must NOT be passed when table is itself a Table or TableView (the URL is derived from table.url).
dataloader_args – Additional arguments to pass to the dataloader. Samples produced by table (after any transform) must be combinable by the active collate_fn. The default torch.utils.data.default_collate handles tensors, numbers, strings, and dict/list/tuple trees thereof. For heterogeneous samples (e.g. PIL images, variable-length sequences), pass {"collate_fn": <your fn>} here.
- Raises:
ValueError – If table is a DataLoader; if foreign_table_url is provided alongside a Table or TableView table; or if table is a custom map-style dataset and foreign_table_url is not provided.
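For samples that the default collation cannot stack, such as variable-length sequences, a custom collate_fn can be supplied through dataloader_args. The sketch below shows one possible function in pure Python. It assumes each sample is a dict with the same keys; collate_as_lists and the sample fields are hypothetical names, and whatever consumes the batch (predictor and metrics collectors) would need to accept lists rather than stacked tensors.

```python
from typing import Any


def collate_as_lists(batch: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group each field of the samples into a plain list, without stacking."""
    return {key: [sample[key] for sample in batch] for key in batch[0]}


# Shape of the argument described for dataloader_args.
dataloader_args = {"collate_fn": collate_as_lists}

# Two samples whose "tokens" differ in length, which default stacking rejects.
batch = collate_as_lists(
    [
        {"tokens": [1, 2, 3], "label": 0},
        {"tokens": [4], "label": 1},
    ]
)
```

A collate function receives the list of samples for one batch and returns whatever structure the downstream code expects, which is the only contract relied on here.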
- config: Configuration = None¶
A lazy alias for the live Configuration singleton. Use this to access and modify the live configuration.
Example:
import tlc
tlc.config.logging.level = "DEBUG"
- init(
- project_name: str | None = None,
- run_name: str | None = None,
- *,
- description: str | None = None,
- parameters: dict[str, typing.Any] | None = None,
- if_exists: typing.Literal[reuse, overwrite, rename, raise] = 'rename',
- root_url: tlc.Url | str | None = None,
- run_url: tlc.Url | str | None = None,
Initialize a 3LC Run.
Initializes a 3LC Run object and sets it as the active run for the current session. Starts the 3LC indexing threads.
Note
project_name and run_name are kept positional so the common tlc.init("my-project", "my-run") form remains ergonomic. All other parameters are keyword-only.
- Parameters:
project_name – Name of the project. If empty, the run will be stored under a default project.
run_name – Name of the Run. If empty, a random name will be generated.
description – Description of the run.
parameters – Parameters of the run.
if_exists – How to deal with existing runs. Options are “reuse”, “overwrite”, “rename”, “raise”.
root_url – The root url to use. If not provided, the project root url will be used.
run_url – Url to the run. Mutually exclusive with run_name, project_name, and root_url.
- Returns:
A Run object.
- Raises:
ValueError – If run_url is provided together with project_name, root_url, or run_name.
- log( ) None¶
Log output data to the active Run or a specified Run.
If keys ‘epoch’ or ‘iteration’ are present in the data, charts for the logged data will be created against those values in the Runs overview in the Dashboard.
Note
This function is intended for logging output data for a Run as a whole, or aggregated over an epoch or iteration. For logging data for individual samples, refer to the Collect Metrics section in the User Guide.
- Parameters:
data – The data to log.
run – The Run to log the data to. If not provided, the active Run will be used.
- Raises:
ValueError – If no Run is provided and there is no active Run.