tlc.core.objects.table#

The abstract base class for all Table types.

Module Contents#

Classes#

Class

Description

ImmutableDict

An immutable access interface to a nested dictionary representing a TableRow.

TableRows

An immutable access interface to the rows of a Table object.

Table

The abstract base class for all Table types.

Functions#

Function

Description

sort_tables_chronologically

Sort a list of tables chronologically.

squash_table

Create a copy of this table where all lineage is squashed.

Data#

Data

Description

TableRow

Generic type for a row of a table.

API#

tlc.core.objects.table.TableRow = None#

Generic type for a row of a table.

class tlc.core.objects.table.ImmutableDict(*args: Any, **kwargs: Any)#

Bases: typing.Dict[str, object]

An immutable access interface to a nested dictionary representing a TableRow.

This class is used to make access to table rows immutable, and to provide a consistent interface for accessing nested column data.

copy() dict[str, object]#

Return a deep copy of the dict as a standard mutable dict.

class tlc.core.objects.table.TableRows(table: tlc.core.objects.table.Table)#

An immutable access interface to the rows of a Table object.

class tlc.core.objects.table.Table(url: tlc.core.url.Url | None = None, created: str | None = None, description: str | None = None, row_cache_url: tlc.core.url.Url | None = None, row_cache_populated: bool | None = None, override_table_rows_schema: Any = None, init_parameters: Any = None)#

Bases: tlc.core.object.Object

The abstract base class for all Table types.

A Table is an object with two specific responsibilities:

  1. Creating table rows on demand (either through the row-based access interface table_rows, or through the sample-based access interface provided by __getitem__).

  2. Creating a schema which describes the type of produced rows (through the rows_schema property)

Both types of produced data are determined by immutable properties defined by each particular Table type.

ALTERNATIVE INTERFACE/CACHING:

A full representation of all table rows can, for performance reasons, also be retrieved through the get_rows_as_binary method.

This method will try to retrieve a cached version of the table rows if

  • row_cache_url is non-empty AND

  • row_cache_populated is True

When this is the case, it is guaranteed that the schema property of the table is fully populated, including the nested ‘rows_schema’ property which defines the layout of all table rows.

When this cached version is NOT defined, however, get_rows_as_binary() needs to iterate over all rows to produce the data.

If row_cache_url is non-empty, the produced binary data will be cached to the specified location. After successful caching, the updated Table object will be written to its backing URL exactly once, now with ‘row_cache_populated’ set to True and with the schema fully updated. Also, the row_count property is guaranteed to be correct at this time.

Whether accessing data from a Table object later refers to this cached version (or produces the data itself) is implementation specific.
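
The cache-use condition above can be restated as a small predicate. This is a pure-Python sketch of the documented rule, not part of the tlc API:

```python
def should_use_row_cache(row_cache_url: str, row_cache_populated: bool) -> bool:
    """Mirror the documented condition: a cached version of the table rows
    is used only when row_cache_url is non-empty AND row_cache_populated
    is True; otherwise get_rows_as_binary() must iterate over all rows."""
    return bool(row_cache_url) and row_cache_populated
```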

STATE MUTABILITY:

As described above, Tables are constrained in how they are allowed to change state:

  • The data production parameters (“recipe”) of a table are immutable

  • The persisted JSON representation of a Table (e.g. on disk) can take on three different states, and each state can be written only once:

    1. Bare-bones recipe

    2. Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = False)

    3. Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = True)

Parameters:
  • url – The URL of the table.

  • created – The creation time of the table.

  • description – The description of the table.

  • row_cache_url – The URL of the row cache.

  • row_cache_populated – Whether the row cache is populated.

  • override_table_rows_schema – The schema to override the table rows schema.

  • init_parameters – The initial parameters of the table.

ensure_complete_schema() None#

Ensure that the table has a complete schema.

copy(table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, rename, overwrite] = 'raise', *, destination_url: tlc.core.url.Url | None = None) tlc.core.objects.table.Table#

Create a copy of this table.

The copy is performed to:

  1. A URL derived from the given table_name, dataset_name, project_name, and root_url if given

  2. destination_url, if given

  3. A generated URL derived from the table's URL, if none of the above are given

Parameters:
  • table_name – The name of the table to copy to.

  • dataset_name – The name of the dataset to copy to.

  • project_name – The name of the project to copy to.

  • root_url – The root URL to copy to.

  • if_exists – The behavior to use if the destination URL already exists.

  • destination_url – The URL to copy the table to.

Returns:

The copied table.
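
The destination priority described above can be sketched as follows (a hypothetical helper, not part of the tlc API):

```python
def resolve_copy_destination(names_url, destination_url, generated_url):
    # Priority documented for Table.copy: a URL derived from the given
    # names wins, then an explicit destination_url, then a URL generated
    # from the source table's own URL.
    if names_url is not None:
        return names_url
    if destination_url is not None:
        return destination_url
    return generated_url
```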

ensure_dependent_properties() None#

Ensure that the table has set row_count, as required to reach a fully defined state.

ensure_data_production_is_ready() None#

A method that ensures the table is ready to produce data.

This method is called before any access to the Table's data is made. It is used to ensure that the Table has performed any necessary data production steps. Normally, Tables don't produce data until it is requested, but this method can be called to force data production.

Note that subsequent applications of this method will not change the data, as a Table is immutable.

property collecting_metrics: bool#

Getter for collecting_metrics.

collection_mode() Iterator[None]#

Enable metrics-collection mode on the Table.

When metrics-collection mode is enabled, only maps defined by calls to map_collect_metrics() are applied to the table rows.

property row_schema: tlc.core.schema.Schema#

Returns the schema for a single row of this table.

property rows_schema: tlc.core.schema.Schema#

Returns the schema for all rows of this table.

property table_rows: tlc.core.objects.table.TableRows#

Access the rows of this table as an immutable mapping.

property name: str#

The name of the table.

get_row_cache_size() int#

Returns the size of the row cache in bytes.

set_row_cache_url(row_cache_url: tlc.core.url.Url | str) bool#

Assign a new row_cache_url value.

Will set row_cache_populated to False if the cache file has changed.

Parameters:

row_cache_url – The new row_cache_url value.

Returns:

True if the row_cache_url value was changed, False otherwise.

static transform_value(schema: tlc.core.schema.Schema | None, item: object) object#

Transform a single table value according to the schema.

By default, any numpy arrays are converted to lists.

Parameters:
  • schema – The schema corresponding to the column of the value.

  • item – The value to transform.
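
The documented default behavior can be sketched in a few lines (assuming numpy is installed; the real method also consults the column schema):

```python
import numpy as np

def transform_value_default(item: object) -> object:
    # Default behavior per the docs: numpy arrays are converted to
    # plain lists; other values pass through unchanged.
    if isinstance(item, np.ndarray):
        return item.tolist()
    return item
```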

is_all_parquet() bool#

Return True if the backing data for this table is all parquet files.

write_to_row_cache(create_url_if_empty: bool = False, overwrite_if_exists: bool = True) None#

Cache the table rows to the row cache Url.

If the table is already cached, or the Url of the Table is an API-Url, this method does nothing.

In the case where self.row_cache_url is empty, a new Url will be created and assigned to self.row_cache_url if create_url_if_empty is True, otherwise a ValueError will be raised.

Parameters:
  • create_url_if_empty – Whether to create a new row cache Url if self.row_cache_url is empty.

  • overwrite_if_exists – Whether to overwrite the row cache file if it already exists.

get_rows_as_binary(exclude_bulk_data: bool = False) bytes#

Return all rows of the table as a binary Parquet buffer.

This method will return the ‘Table-representation’ of the table, which is the most efficient representation, since only references to the input data are stored.

Parameters:

exclude_bulk_data – Whether to exclude bulk data columns from the serialized rows. This filter only applies to Tables that are fully cached on disk.

Returns:

The rows of the table as a binary Parquet buffer.

should_include_schema_in_json(schema: tlc.core.schema.Schema) bool#

Only include the schema in the JSON representation if it is not empty.

latest(use_new_columns: bool = True, wait_for_rescan: bool = True, timeout: float = 0) tlc.core.objects.table.Table#

Return the most recent version of the table, as indexed by the TableIndexingTable indexing mechanism.

This function retrieves the latest version of this table that has been indexed or exists in the ObjectRegistry. If desired, it is possible to wait for the next indexing run to complete by setting wait_for_rescan to True together with a timeout in seconds.

Example:

table_instance = Table()
... # working
latest_table = table_instance.latest()

Parameters:
  • use_new_columns – If new columns have been added to the latest revision of the Table, whether to include these values in the sample-view of the Table. Defaults to True.

  • wait_for_rescan – Whether to wait for a rescan of the TableIndexingTable (lineage) to complete before resolving the latest revision. Defaults to True.

  • timeout – The timeout in seconds to wait for the next indexing run to complete. Defaults to 0 seconds, meaning the wait for indexing is unbounded.

Returns:

The latest version of the table.

Raises:

ValueError – If the latest version of the table cannot be found in the dataset or if an error occurs when attempting to create an object from the latest Url.

revision(tag: Literal[latest] | None = None, table_url: tlc.core.url.Url | str = '', table_name: str = '') tlc.core.objects.table.Table#

Return a specific revision of the table.

This function retrieves a specific revision of this table. The revision can be specified by tag, table_url, or table_name. If no arguments are provided, the current table is returned.

Parameters:
  • tag – The tag of the revision to return. Currently only ‘latest’ is supported.

  • table_url – The URL of the revision to return.

  • table_name – The name of the revision to return.

squash(output_url: tlc.core.url.Url, dataset_name: str | None = None, project_name: str | None = None) tlc.core.objects.table.Table#

Create a copy of this table where all lineage is squashed.

A squashed table is a table where all lineage is merged. This is useful for creating a table that is independent of its parent tables. This function creates a new table with the same rows as the original table, but with no lineage. The new table is written to the output url.

Example:

table = Table()
... # working
squashed_table = table.squash(Url("s3://bucket/path/to/table"), dataset_name="my_dataset_v2")

Parameters:
  • output_url – The output url for the squashed table.

  • dataset_name – The dataset name to use for the squashed table. If not provided, the dataset_name of the original table is used.

  • project_name – The project name to use for the squashed table. If not provided, the project_name of the original table is used.

Returns:

The squashed table.

property pyarrow_schema: pyarrow.Schema | None#

Returns a pyarrow schema for this table.

property columns: list[str]#

Return a list of column names for this table.

property bulk_data_url: tlc.core.url.Url#

Return the bulk data url for this table.

The bulk data url is the url to the folder containing any bulk data for this table. The root of the bulk data url can be overridden by setting the TLC_BULK_DATA_URL environment variable.

to_pandas() pandas.DataFrame#

Return a pandas DataFrame for this table.

Returns:

A pandas DataFrame populated from the rows of this table.

add_column(column_name: str, values: list[object] | object, schema: tlc.core.schema.Schema | None = None, url: tlc.core.url.Url | None = None) tlc.core.objects.table.Table#

Create a derived table with a column added.

This method creates and returns a new revision of the table with a new column added.

Parameters:
  • column_name – The name of the column to add.

  • values – The values to add to the column. This can be a list of values, or a single value to be added to all rows.

  • schema – The schema of the column to add. If not provided, the schema will be inferred from the values.

  • url – The url to write the new table to. If not provided, the new table will be located next to the current table.

Returns:

A new table with the column added.
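
The broadcasting of the values argument can be sketched as follows (a hypothetical helper, not tlc API; the length check is an assumption):

```python
def broadcast_column_values(values, row_count):
    # Per the docs: a list supplies one value per row, while a single
    # (non-list) value is repeated for every row.
    if isinstance(values, list):
        if len(values) != row_count:
            raise ValueError("Column length must match the table's row count")
        return values
    return [values] * row_count
```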

set_value_map(value_path: str, value_map: dict[float, Any], *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Set a value map for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the table with an overridden value map for a specific numeric value.

Any item in a Schema of type NumericValue can have a value map. A value map is a mapping from a numeric value to a MapElement, where a MapElement contains metadata about a categorical value such as category names and IDs.

Partial Value Maps

Value maps may be partial, i.e. they may contain mappings for only a subset of the possible numeric values. The mapped numeric values may also be floating point values, which can be useful for annotating continuous variables with categorical metadata, such as color or label.

For more fine-grained control over value map editing, see Table.set_value_map_item, Table.add_value_map_item, and Table.delete_value_map_item.

Parameters:
  • value_path – The path to the value to add the value map to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value_map – The value map to set on the value. It will be converted to a dictionary mapping floating point values to MapElements if it is not one already.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map set.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.
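
The accepted argument shape can be illustrated with a plain dictionary (the column name and categories here are hypothetical; per the docs, simple values are converted to MapElements):

```python
# A (possibly partial) value map: numeric values -> categorical metadata.
# Non-integer keys are allowed, e.g. for annotating continuous values.
value_map = {0.0: "cat", 1.0: "dog", 2.5: "other"}

# Hypothetical call producing a new table revision:
# new_table = table.set_value_map("label", value_map)
```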

delete_value_map(value_path: str, *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Delete a value map for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the Table with a deleted value map for a specific numeric value.

Parameters:
  • value_path – The path to the value to delete the value map from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map deleted.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.

set_value_map_item(value_path: str, value: float | int, internal_name: str, display_name: str = '', description: str = '', display_color: str = '', url: tlc.core.url.Url | str = '', *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Update an existing value map item for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the table with a value map item added to a value in a column.

Example:

table = Table.from_url("cats-and-dogs")
new_table = table.set_value_map_item("label", 0, "cat")
# new_table is now a new revision of the table with an updated value map item for the value 0 in the column
assert table.latest() == new_table, "The new table is the latest revision of the table."

To add a new value map item at the next available value in the value map, see Table.add_value_map_item.

To delete a value map item, see Table.delete_value_map_item.

Parameters:
  • value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value – The numeric value to add the value map item to. If the value already exists, the value map item will be updated.

  • internal_name – The internal name of the value map item. This is the primary identifier of the value map item.

  • display_name – The display name of the value map item.

  • description – The description of the value map item.

  • display_color – The display color of the value map item.

  • url – The url of the value map item.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.

add_value_map_item(value_path: str, internal_name: str, display_name: str = '', description: str = '', display_color: str = '', url: tlc.core.url.Url | str = '', *, value: float | int | None = None, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Add a value map item for a specified numeric value within the schema of the Table.

Adds a new value map item to the schema of the Table without overwriting existing items.

If the specified value or internal name already exists in the value map, this method will raise an error to prevent overwriting.

For more details on value maps, refer to the documentation for Table.set_value_map.

Parameters:
  • value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • internal_name – The internal name of the value map item. This is the primary identifier of the value map item.

  • display_name – The display name of the value map item.

  • description – The description of the value map item.

  • display_color – The display color of the value map item.

  • url – The url of the value map item.

  • value – The numeric value to add the value map item to. If not provided, the value will be the next available value in the value map (starting from 0).

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map item added.

Raises:

ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name already exists in the value map.
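
One plausible reading of the "next available value (starting from 0)" default, as a pure-Python sketch (not the actual tlc implementation):

```python
def next_available_value(value_map: dict) -> float:
    # Walk upward from 0 until a value is free in the (possibly
    # partial) value map.
    v = 0.0
    while v in value_map:
        v += 1.0
    return v
```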

delete_value_map_item(value_path: str, *, value: float | int | None = None, internal_name: str = '', edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Delete a value map item for a specified numeric value within the schema of the Table.

Deletes a value map item from the schema of the Table, by numeric value or internal name.

For more details on value maps, refer to the documentation for Table.set_value_map.

Parameters:
  • value_path – The path to the value to delete the value map item from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value – The numeric value of the value map item to delete. If not provided, the value map item will be deleted by internal name.

  • internal_name – The internal name of the value map item to delete. If not provided, the value map item will be deleted by numeric value.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map item deleted.

Raises:

ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name does not exist in the value map.

get_value_map(value_path: str) dict[float, tlc.core.schema.MapElement] | None#

Get the value map for a value path.

Parameters:

value_path – The path to the value to get the value map for. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

Returns:

A value map for the value, or None if the value does not exist or does not have a value map.
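
The dot-separated value_path convention can be sketched against a nested mapping (a hypothetical structure; the real lookup walks Schema objects):

```python
def resolve_value_path(root: dict, value_path: str):
    # "label" names a column; a dotted path such as "bbs.label" names a
    # sub-value inside a composite column. Returns None when the path
    # does not exist, mirroring get_value_map's documented behavior.
    node = root
    for part in value_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node
```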

export(output_url: tlc.core.url.Url | str | pathlib.Path, format: str | None = None, weight_threshold: float = 0.0, **kwargs: object) None#

Export this table to the given output url.

Parameters:
  • output_url – The output url to export to.

  • format – The format to export to. If not provided, the format will be inferred from the table and the output url.

  • weight_threshold – The weight threshold to use for exporting. If the table has a weights column, rows with a weight below this threshold will be excluded from the export.

  • kwargs – Additional arguments to pass to the exporter. Which arguments are valid depends on the format. See the documentation for the subclasses of Exporter for more information.

is_descendant_of(other: tlc.core.objects.table.Table) bool#

Return True if this table is a descendent of the provided table.

Parameters:

other – The table to check if this table is a descendant of.

Returns:

True if this table is a descendant of the provided table, False otherwise.

get_foreign_table_url(column: str = FOREIGN_TABLE_ID) tlc.core.url.Url | None#

Return the input table URL referenced by this table.

This method is intended for tables that reference a single input table. Typically, this would be a metrics table of per-example metrics collected using another table.

If the table contains a column named 'input_table_id' with a value map indicating that it references an input table by Url, this method returns the Url of that input table.

Parameters:

column – The name of the column to check for a foreign key.

Returns:

The URL of the foreign table, or None if no input table is found.

property weights_column_name: str | None#

Return the name of the column containing the weights for this table, or None if no such column exists.

create_sampler(exclude_zero_weights: bool = True, weighted: bool = True, shuffle: bool = True) torch.utils.data.sampler.Sampler[int]#

Returns a sampler based on the weights column of the table. The type and behavior of the returned Sampler depend on the values of the argument flags.

Parameters:
  • exclude_zero_weights – If True, rows with a weight of zero will be excluded from the sampler. This is useful for adjusting the length of the sampler, and thus the length of an epoch when using a PyTorch DataLoader, to the number of non-zero-weighted rows in the table.

  • weighted – If True, the sampler will use sample weights (beyond the exclusion of zero-weighted rows) to ensure that the distribution of the sampled rows matches the distribution of the weights. When weighted is set to True, you are no longer guaranteed that every row in the table will be sampled in a single epoch, even if all weights are equal.

  • shuffle – If False, the valid indices will be returned in sequential order. A value of False is mutually exclusive with the weighted flag.

Returns:

A Sampler based on the weights column of the table.
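
The effect of the flags can be illustrated with plain Python (a conceptual sketch using the standard library, not the actual torch Sampler returned by this method):

```python
import random

weights = [0.0, 2.0, 1.0, 1.0]

# exclude_zero_weights: zero-weighted rows are dropped, shortening an
# epoch to the number of non-zero-weighted rows.
valid = [i for i, w in enumerate(weights) if w > 0]

# weighted: indices are drawn in proportion to their weights, so a row
# may be sampled more than once (or not at all) within an epoch.
rng = random.Random(0)
epoch = rng.choices(valid, weights=[weights[i] for i in valid], k=len(valid))
```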

map(func: Callable[[Any], object]) tlc.core.objects.table.Table#

Add a function to the list of functions to be applied to each sample in the table before it is returned by the __getitem__ method when not doing metrics collection.

Parameters:

func – The function to apply to each sample when not doing metrics collection.

Returns:

The table with the function added to the list of functions to apply to each sample when not doing metrics collection.

map_collect_metrics(func: Callable[[Any], object]) tlc.core.objects.table.Table#

Add a function to the list of functions to be applied to each sample in the table before it is returned by the __getitem__ method when doing metrics collection. If this list is empty, the map functions will be used instead.

Parameters:

func – The function to apply to each sample when doing metrics collection.

Returns:

The table with the function added to the list of functions to apply to each sample when doing metrics collection.
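
The interaction between the two map lists can be sketched as follows (a hypothetical helper mirroring the documented fallback, not tlc internals):

```python
def apply_maps(sample, maps, metrics_maps, collecting_metrics):
    # During metrics collection the collect-metrics maps apply; if none
    # were registered, the regular maps are used instead (per the docs).
    funcs = metrics_maps if (collecting_metrics and metrics_maps) else maps
    for func in funcs:
        sample = func(sample)
    return sample
```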

clear_maps() None#

Clear any maps added to the table.

static from_url(url: tlc.core.url.Url | str) tlc.core.objects.table.Table#

Create a table from a url.

Parameters:

url – The url to create the table from

Returns:

A concrete Table subclass

Raises:

static from_names(table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None) tlc.core.objects.table.Table#

Create a table from the names specifying its url.

Parameters:
  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url.

Returns:

The table at the resulting url.

static from_torch_dataset(dataset: torch.utils.data.Dataset, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, all_arrays_are_fixed_size: bool = False, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromTorchDataset#

Create a Table from a Torch Dataset.

Parameters:
  • dataset – The Torch Dataset to create the table from.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • all_arrays_are_fixed_size – Whether all arrays (tuples, lists, etc.) in the dataset are fixed size. This parameter is only used when generating a SampleType from a single sample in the dataset when no structure is provided.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromTorchDataset instance.

static from_pandas(df: pandas.DataFrame, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromPandas#

Create a Table from a Pandas DataFrame.

Parameters:
  • df – The Pandas DataFrame to create the table from.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromPandas instance.

static from_dict(data: typing.Mapping[str, object], structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromPydict#

Create a Table from a dictionary.

Parameters:
  • data – The dictionary to create the table from.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromPydict instance.

static from_csv(csv_file: str | pathlib.Path | tlc.core.url.Url, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromCsv#

Create a Table from a .csv file.

Parameters:
  • csv_file – The url of the .csv file.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromCsv instance.
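As a sketch of typical usage, the snippet below first writes a small .csv file with the standard library and then loads it. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level as `tlc.Table`; the file name and column names are illustrative.

```python
import csv

# Write a minimal .csv file to load (stdlib only).
with open("animals.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "legs"])
    writer.writerow(["cat", 4])
    writer.writerow(["hen", 2])

# Hypothetical usage -- assumes `tlc` is installed and configured.
try:
    import tlc

    table = tlc.Table.from_csv(
        "animals.csv",
        table_name="animals",
        if_exists="overwrite",  # replace any table already at the resolved url
    )
except ImportError:
    pass
```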

static from_coco(annotations_file: str | pathlib.Path | tlc.core.url.Url, image_folder: str | pathlib.Path | tlc.core.url.Url | None = None, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromCoco#

Create a Table from a COCO annotations file.

Parameters:
  • annotations_file – The url of the COCO annotations file.

  • image_folder – The url of the folder containing the images referenced in the COCO annotations file. If not provided, the image paths in the annotations file are assumed to be either absolute or relative to the annotations file.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromCoco instance.
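The sketch below builds a minimal COCO annotations file (the standard `images` / `annotations` / `categories` layout) with the standard library, then shows a hedged call. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level; file names and the image folder are illustrative.

```python
import json
from pathlib import Path

# A minimal COCO annotations file: images, annotations, and categories.
annotations = {
    "images": [{"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [10, 20, 100, 80], "area": 8000, "iscrowd": 0}
    ],
    "categories": [{"id": 1, "name": "cat"}],
}
Path("annotations.json").write_text(json.dumps(annotations))

# Hypothetical usage -- assumes `tlc` is installed; image_folder resolves the
# relative file_name entries above.
try:
    import tlc

    table = tlc.Table.from_coco("annotations.json", image_folder="images/")
except ImportError:
    pass
```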

static from_parquet(parquet_file: str | pathlib.Path | tlc.core.url.Url, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromParquet#

Create a Table from a Parquet file.

Parameters:
  • parquet_file – The url of the Parquet file.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromParquet instance.

static from_yolo(dataset_yaml_file: str | pathlib.Path | tlc.core.url.Url, split: str = 'train', structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromYolo#

Create a Table from a YOLO annotations file.

Parameters:
  • dataset_yaml_file – The url of the YOLO dataset .yaml file.

  • split – The split to load, as named in the dataset .yaml file. Defaults to 'train'.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromYolo instance.
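The sketch below writes a minimal YOLO dataset .yaml file (split image folders plus class names) with the standard library, then shows a hedged call. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level; paths and class names are illustrative.

```python
from pathlib import Path

# A minimal YOLO dataset .yaml file: split image folders plus class names.
Path("dataset.yaml").write_text(
    "path: .\n"
    "train: images/train\n"
    "val: images/val\n"
    "names:\n"
    "  0: cat\n"
    "  1: dog\n"
)

# Hypothetical usage -- assumes `tlc` is installed and the image folders exist.
try:
    import tlc

    train_table = tlc.Table.from_yolo("dataset.yaml", split="train")
except ImportError:
    pass
```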

static from_hugging_face(path: str, name: str | None = None, split: str = 'train', table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.integration.hugging_face.TableFromHuggingFace#

Create a Table from a Hugging Face Hub dataset, similar to the datasets.load_dataset function.

Parameters:
  • path – Path or name of the dataset to load, same as in datasets.load_dataset.

  • name – Name of the dataset to load, same as in datasets.load_dataset.

  • split – The split to load, same as in datasets.load_dataset.

  • table_name – The name of the table. If not provided, the table_name is set to split.

  • dataset_name – The name of the dataset. If not provided, dataset_name is set to path if name is not provided, or to {path}-{name} if name is provided.

  • project_name – The name of the project. If not provided, project_name is set to hf-{path}.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromHuggingFace instance.

static from_image_folder(root: str | pathlib.Path | tlc.core.url.Url, image_column_name: str = 'image', label_column_name: str = 'label', include_label_column: bool = True, extensions: str | tuple[str, ...] | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.table.Table#

Create a Table from an image folder.

This function can be used to load a folder containing subfolders where each subfolder represents a label, or to recursively load all matching images in a folder structure without labels. It extends the functionality of torchvision.datasets.ImageFolder.

When include_label_column is True, the dataset elements are returned as tuples of a PIL.Image and the integer class label. When include_label_column is False, PIL.Images are returned without labels. In this case, root will be recursively scanned.

Parameters:
  • root – The root directory of the image folder.

  • image_column_name – The name of the column containing the images.

  • label_column_name – The name of the column containing the class labels.

  • include_label_column – Whether to include a column of class labels in the table.

  • extensions – A list of allowed image extensions. If not provided, a default list of image extensions is used.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A Table instance.
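The sketch below builds the expected layout (one subfolder per class label, files as placeholders for real images) with the standard library, then shows a hedged call. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level.

```python
import tempfile
from pathlib import Path

# Build the expected layout: one subfolder per class label.
root = Path(tempfile.mkdtemp())
(root / "cat").mkdir()
(root / "dog").mkdir()
(root / "cat" / "cat_001.jpg").write_bytes(b"")  # placeholder files; real
(root / "dog" / "dog_001.jpg").write_bytes(b"")  # images would go here

# Hypothetical usage -- assumes `tlc` is installed and the files are real images.
try:
    import tlc

    table = tlc.Table.from_image_folder(root, table_name="pets")
except ImportError:
    pass
```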

tlc.core.objects.table.sort_tables_chronologically(tables: list[tlc.core.objects.table.Table], reverse: bool = False) list[tlc.core.objects.table.Table]#

Sort a list of tables chronologically.

Parameters:

tables – A list of tables to sort chronologically.

Returns:

A list of tables sorted chronologically.
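The behavior can be sketched with plain `sorted`. The sort key is an assumption: the snippet presumes tables are ordered by their `created` timestamp (an ISO-8601 string accepted by the Table constructor), and `FakeTable` is an illustrative stand-in, not the real class:

```python
from dataclasses import dataclass


# Illustrative stand-in for Table; assumes the sort key is the `created`
# timestamp, which the Table constructor accepts as an ISO-8601 string.
@dataclass
class FakeTable:
    created: str


tables = [FakeTable("2024-03-01T12:00:00"), FakeTable("2024-01-15T08:30:00")]
ordered = sorted(tables, key=lambda t: t.created)                       # oldest first
newest_first = sorted(tables, key=lambda t: t.created, reverse=True)    # reverse=True
```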

tlc.core.objects.table.squash_table(table: tlc.core.objects.table.Table | tlc.core.url.Url, output_url: tlc.core.url.Url) tlc.core.objects.table.Table#

Create a copy of this table where all lineage is squashed.

Example:

table_instance = Table()
...  # work with the table, building up lineage
squashed_table = squash_table(table_instance, Url("s3://bucket/path/to/table"))

Parameters:
  • table – The table to squash.

  • output_url – The output url for the squashed table.

Returns:

The squashed table.