tlc.core.objects.table#

The abstract base class for all Table types.

Module Contents#

Classes#

Class

Description

ImmutableDict

An immutable access interface to a nested dictionary representing a TableRow.

TableRows

An immutable access interface to the rows of a Table object.

Table

The abstract base class for all Table types.

Functions#

Function

Description

sort_tables_chronologically

Sort a list of tables chronologically.

squash_table

Create a copy of this table where all lineage is squashed.

Data#

Data

Description

TableRow

Generic type for a row of a table.

API#

tlc.core.objects.table.TableRow = None#

Generic type for a row of a table.

class tlc.core.objects.table.ImmutableDict(*args: Any, **kwargs: Any)#

Bases: typing.Dict[str, object]

An immutable access interface to a nested dictionary representing a TableRow.

This class is used to make access to table rows immutable, and to provide a consistent interface for accessing nested column data.

copy() dict[str, object]#

Return a deep copy of the dict as a standard mutable dict.

class tlc.core.objects.table.TableRows(table: tlc.core.objects.table.Table)#

An immutable access interface to the rows of a Table object.

class tlc.core.objects.table.Table(url: tlc.core.url.Url | None = None, created: str | None = None, description: str | None = None, row_cache_url: tlc.core.url.Url | None = None, row_cache_populated: bool | None = None, override_table_rows_schema: Any = None, init_parameters: Any = None)#

Bases: tlc.core.object.Object

The abstract base class for all Table types.

A Table is an object with two specific responsibilities:

  1. Creating table rows on demand (either through the row-based access interface table_rows, or through the sample-based access interface provided by __getitem__).

  2. Creating a schema which describes the type of produced rows (through the rows_schema property)

Both types of produced data are determined by immutable properties defined by each particular Table type.

ALTERNATIVE INTERFACE/CACHING:

A full representation of all table rows can, for performance reasons, also be retrieved through the get_rows_as_binary method.

This method will try to retrieve a cached version of the table rows if

  • row_cache_url is non-empty AND

  • row_cache_populated is True

When this is the case, it is guaranteed that the schema property of the table is fully populated, including the nested ‘rows_schema’ property which defines the layout of all table rows.

When this cached version is NOT defined, however, get_rows_as_binary() needs to iterate over all rows to produce the data.

If row_cache_url is non-empty, the produced binary data will be cached to the specified location. After successful caching, the updated Table object will be written to its backing URL exactly once, now with ‘row_cache_populated’ set to True and with the schema fully updated. Also, the row_count property is guaranteed to be correct at this time.

Whether accessing data from a Table object later refers to this cached version (or produces the data itself) is implementation specific.
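
The cache-use condition above can be restated as a small predicate. This is a pure-Python sketch of the documented rule, not part of the tlc API:

```python
def should_use_row_cache(row_cache_url: str, row_cache_populated: bool) -> bool:
    """Mirror the documented condition: a cached version of the table rows
    is used only when row_cache_url is non-empty AND row_cache_populated
    is True; otherwise get_rows_as_binary() must iterate over all rows."""
    return bool(row_cache_url) and row_cache_populated
```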

STATE MUTABILITY:

As described above, Tables are constrained in how they are allowed to change state:

  • The data production parameters (“recipe”) of a table are immutable

  • The persisted JSON representation of a Table (e.g. on disk) can take on three different states, and each state can be written only once:

    1. Bare-bones recipe

    2. Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = False)

    3. Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = True)

Parameters:
  • url – The URL of the table.

  • created – The creation time of the table.

  • description – The description of the table.

  • row_cache_url – The URL of the row cache.

  • row_cache_populated – Whether the row cache is populated.

  • override_table_rows_schema – The schema to override the table rows schema.

  • init_parameters – The initial parameters of the table.

ensure_complete_schema() None#

Ensure that the table has a complete schema.

copy(table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, rename, overwrite] = 'raise', *, destination_url: tlc.core.url.Url | None = None) tlc.core.objects.table.Table#

Create a copy of this table.

The copy is performed to:

  1. A URL derived from the given table_name, dataset_name, project_name, and root_url if given

  2. destination_url, if given

  3. A generated URL derived from the table's URL, if none of the above are given

Parameters:
  • table_name – The name of the table to copy to.

  • dataset_name – The name of the dataset to copy to.

  • project_name – The name of the project to copy to.

  • root_url – The root URL to copy to.

  • if_exists – The behavior to use if the destination URL already exists.

  • destination_url – The URL to copy the table to.

Returns:

The copied table.
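
The destination priority described above can be sketched as follows (a hypothetical helper, not part of the tlc API):

```python
def resolve_copy_destination(names_url, destination_url, generated_url):
    # Priority documented for Table.copy: a URL derived from the given
    # names wins, then an explicit destination_url, then a URL generated
    # from the source table's own URL.
    if names_url is not None:
        return names_url
    if destination_url is not None:
        return destination_url
    return generated_url
```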

ensure_dependent_properties() None#

Ensure that the table has set row_count, as required to reach a fully defined state.

ensure_data_production_is_ready() None#

A method that ensures the table is ready to produce data.

This method is called before any access to the Table's data is made. It is used to ensure that the Table has performed any necessary data production steps. Normally, Tables don't produce data until it is requested, but this method can be called to force data production.

Note that subsequent applications of this method will not change the data, as a Table is immutable.

property collecting_metrics: bool#

Getter for collecting_metrics.

collection_mode() Iterator[None]#

Enable metrics-collection mode on the Table.

When metrics-collection mode is enabled, only maps defined by calls to map_collect_metrics() are applied to the table rows.

property row_schema: tlc.core.schema.Schema#

Returns the schema for a single row of this table.

property rows_schema: tlc.core.schema.Schema#

Returns the schema for all rows of this table.

property table_rows: tlc.core.objects.table.TableRows#

Access the rows of this table as an immutable mapping.

property name: str#

The name of the table.

get_row_cache_size() int#

Returns the size of the row cache in bytes.

set_row_cache_url(row_cache_url: tlc.core.url.Url | str) bool#

Assign a new row_cache_url value.

Will set row_cache_populated to False if the cache file has changed.

Parameters:

row_cache_url – The new row_cache_url value.

Returns:

True if the row_cache_url value was changed, False otherwise.

static transform_value(schema: tlc.core.schema.Schema | None, item: object) object#

Transform a single table value according to the schema.

By default, any numpy arrays are converted to lists.

Parameters:
  • schema – The schema corresponding to the column of the value.

  • item – The value to transform.
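
The documented default behavior can be sketched in a few lines (assuming numpy is installed; the real method also consults the column schema):

```python
import numpy as np

def transform_value_default(item: object) -> object:
    # Default behavior per the docs: numpy arrays are converted to
    # plain lists; other values pass through unchanged.
    if isinstance(item, np.ndarray):
        return item.tolist()
    return item
```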

is_all_parquet() bool#

Return True if the backing data for this table is all parquet files.

write_to_row_cache(create_url_if_empty: bool = False, overwrite_if_exists: bool = True) None#

Cache the table rows to the row cache Url.

If the table is already cached, or the Url of the Table is an API-Url, this method does nothing.

In the case where self.row_cache_url is empty, a new Url will be created and assigned to self.row_cache_url if create_url_if_empty is True, otherwise a ValueError will be raised.

Parameters:
  • create_url_if_empty – Whether to create a new row cache Url if self.row_cache_url is empty.

  • overwrite_if_exists – Whether to overwrite the row cache file if it already exists.

get_rows_as_binary(exclude_bulk_data: bool = False) bytes#

Return all rows of the table as a binary Parquet buffer.

This method will return the ‘Table-representation’ of the table, which is the most efficient representation, since only references to the input data are stored.

Parameters:

exclude_bulk_data – Whether to exclude bulk data columns from the serialized rows. This filter only applies to Tables that are fully cached on disk.

Returns:

The rows of the table as a binary Parquet buffer.

should_include_schema_in_json(schema: tlc.core.schema.Schema) bool#

Only include the schema in the JSON representation if it is not empty.

latest(use_new_columns: bool = True, wait_for_rescan: bool = True, timeout: float = 0) tlc.core.objects.table.Table#

Return the most recent version of the table, as indexed by the TableIndexingTable indexing mechanism.

This function retrieves the latest version of this table that has been indexed or exists in the ObjectRegistry. If desired, it is possible to wait for the next indexing run to complete by setting wait_for_rescan to True together with a timeout in seconds.

Example:

table_instance = Table()
... # working
latest_table = table_instance.latest()

Parameters:
  • use_new_columns – If new columns have been added to the latest revision of the Table, whether to include these values in the sample-view of the Table. Defaults to True.

  • wait_for_rescan – Whether to wait for a rescan of the TableIndexingTable (lineage) to complete before resolving the latest revision. Defaults to True.

  • timeout – The timeout in seconds to wait for the next indexing run to complete. Defaults to 0 seconds, meaning the wait for indexing is unbounded.

Returns:

The latest version of the table.

Raises:

ValueError – If the latest version of the table cannot be found in the dataset or if an error occurs when attempting to create an object from the latest Url.

revision(tag: Literal[latest] | None = None, table_url: tlc.core.url.Url | str = '', table_name: str = '') tlc.core.objects.table.Table#

Return a specific revision of the table.

This function retrieves a specific revision of this table. The revision can be specified by tag, table_url, or table_name. If no arguments are provided, the current table is returned.

Parameters:
  • tag – The tag of the revision to return. Currently only ‘latest’ is supported.

  • table_url – The URL of the revision to return.

  • table_name – The name of the revision to return.

squash(output_url: tlc.core.url.Url, dataset_name: str | None = None, project_name: str | None = None) tlc.core.objects.table.Table#

Create a copy of this table where all lineage is squashed.

A squashed table is a table where all lineage is merged. This is useful for creating a table that is independent of its parent tables. This function creates a new table with the same rows as the original table, but with no lineage. The new table is written to the output url.

Example:

table = Table()
... # working
squashed_table = table.squash(Url("s3://bucket/path/to/table"), dataset_name="my_dataset_v2")

Parameters:
  • output_url – The output url for the squashed table.

  • dataset_name – The dataset name to use for the squashed table. If not provided, the dataset_name of the original table is used.

  • project_name – The project name to use for the squashed table. If not provided, the project_name of the original table is used.

Returns:

The squashed table.

property pyarrow_schema: pyarrow.Schema | None#

Returns a pyarrow schema for this table.

property columns: list[str]#

Return a list of column names for this table.

property bulk_data_url: tlc.core.url.Url#

Return the bulk data url for this table.

The bulk data url is the url to the folder containing any bulk data for this table. The root of the bulk data url can be overridden by setting the TLC_BULK_DATA_URL environment variable.

to_pandas() pandas.DataFrame#

Return a pandas DataFrame for this table.

Returns:

A pandas DataFrame populated from the rows of this table.

add_column(column_name: str, values: list[object] | object, schema: tlc.core.schema.Schema | None = None, url: tlc.core.url.Url | None = None) tlc.core.objects.table.Table#

Create a derived table with a column added.

This method creates and returns a new revision of the table with a new column added.

Parameters:
  • column_name – The name of the column to add.

  • values – The values to add to the column. This can be a list of values, or a single value to be added to all rows.

  • schema – The schema of the column to add. If not provided, the schema will be inferred from the values.

  • url – The url to write the new table to. If not provided, the new table will be located next to the current table.

Returns:

A new table with the column added.
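
The broadcasting of the values argument can be sketched as follows (a hypothetical helper, not tlc API; the length check is an assumption):

```python
def broadcast_column_values(values, row_count):
    # Per the docs: a list supplies one value per row, while a single
    # (non-list) value is repeated for every row.
    if isinstance(values, list):
        if len(values) != row_count:
            raise ValueError("Column length must match the table's row count")
        return values
    return [values] * row_count
```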

set_value_map(value_path: str, value_map: dict[float, Any], *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Set a value map for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the table with an overridden value map for a specific numeric value.

Any item in a Schema of type NumericValue can have a value map. A value map is a mapping from a numeric value to a MapElement, where a MapElement contains metadata about a categorical value such as category names and IDs.

Partial Value Maps

Value maps may be partial, i.e. they may contain mappings for only a subset of the possible numeric values. The mapped numeric values may also be floating point values, which can be useful for annotating continuous variables with categorical metadata, such as color or label.

For more fine-grained control over value map editing, see Table.set_value_map_item, Table.add_value_map_item, and Table.delete_value_map_item.

Parameters:
  • value_path – The path to the value to add the value map to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value_map – The value map to set on the value. It will be converted to a dictionary mapping floating point values to MapElements if it is not one already.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map set.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.
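
The accepted argument shape can be illustrated with a plain dictionary (the column name and categories here are hypothetical; per the docs, simple values are converted to MapElements):

```python
# A (possibly partial) value map: numeric values -> categorical metadata.
# Non-integer keys are allowed, e.g. for annotating continuous values.
value_map = {0.0: "cat", 1.0: "dog", 2.5: "other"}

# Hypothetical call producing a new table revision:
# new_table = table.set_value_map("label", value_map)
```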

delete_value_map(value_path: str, *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Delete a value map for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the Table with a deleted value map for a specific numeric value.

Parameters:
  • value_path – The path to the value to delete the value map from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map deleted.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.

set_value_map_item(value_path: str, value: float | int, internal_name: str, display_name: str = '', description: str = '', display_color: str = '', url: tlc.core.url.Url | str = '', *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Update an existing value map item for a specified numeric value within the schema of the Table.

This method creates and returns a new revision of the table with a value map item added to a value in a column.

Example:

table = Table.from_url("cats-and-dogs")
new_table = table.set_value_map_item("label", 0, "cat")
# new_table is now a new revision of the table with an updated value map item for the value 0 in the column
assert table.latest() == new_table, "The new table is the latest revision of the table."

To add a new value map item at the next available value in the value map, see Table.add_value_map_item.

To delete a value map item, see Table.delete_value_map_item.

Parameters:
  • value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value – The numeric value to add the value map item to. If the value already exists, the value map item will be updated.

  • internal_name – The internal name of the value map item. This is the primary identifier of the value map item.

  • display_name – The display name of the value map item.

  • description – The description of the value map item.

  • display_color – The display color of the value map item.

  • url – The url of the value map item.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Raises:

ValueError – If the value path does not exist or is not a NumericValue.

add_value_map_item(value_path: str, internal_name: str, display_name: str = '', description: str = '', display_color: str = '', url: tlc.core.url.Url | str = '', *, value: float | int | None = None, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Add a value map item for a specified numeric value within the schema of the Table.

Adds a new value map item to the schema of the Table without overwriting existing items.

If the specified value or internal name already exists in the value map, this method will raise an error to prevent overwriting.

For more details on value maps, refer to the documentation for Table.set_value_map.

Parameters:
  • value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • internal_name – The internal name of the value map item. This is the primary identifier of the value map item.

  • display_name – The display name of the value map item.

  • description – The description of the value map item.

  • display_color – The display color of the value map item.

  • url – The url of the value map item.

  • value – The numeric value to add the value map item to. If not provided, the value will be the next available value in the value map (starting from 0).

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map item added.

Raises:

ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name already exists in the value map.
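
One plausible reading of the "next available value (starting from 0)" default, as a pure-Python sketch (not the actual tlc implementation):

```python
def next_available_value(value_map: dict) -> float:
    # Walk upward from 0 until a value is free in the (possibly
    # partial) value map.
    v = 0.0
    while v in value_map:
        v += 1.0
    return v
```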

delete_value_map_item(value_path: str, *, value: float | int | None = None, internal_name: str = '', edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table#

Delete a value map item for a specified numeric value within the schema of the Table.

Deletes a value map item from the schema of the Table, by numeric value or internal name.

For more details on value maps, refer to the documentation for Table.set_value_map.

Parameters:
  • value_path – The path to the value to delete the value map item from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

  • value – The numeric value of the value map item to delete. If not provided, the value map item will be deleted by internal name.

  • internal_name – The internal name of the value map item to delete. If not provided, the value map item will be deleted by numeric value.

  • edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.

Returns:

A new table with the value map item deleted.

Raises:

ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name does not exist in the value map.

get_value_map(value_path: str) dict[float, tlc.core.schema.MapElement] | None#

Get the value map for a value path.

Parameters:

value_path – The path to the value to get the value map for. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.

Returns:

A value map for the value, or None if the value does not exist or does not have a value map.
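
The dot-separated value_path convention can be sketched against a nested mapping (a hypothetical structure; the real lookup walks Schema objects):

```python
def resolve_value_path(root: dict, value_path: str):
    # "label" names a column; a dotted path such as "bbs.label" names a
    # sub-value inside a composite column. Returns None when the path
    # does not exist, mirroring get_value_map's documented behavior.
    node = root
    for part in value_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node
```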

export(output_url: tlc.core.url.Url | str | pathlib.Path, format: str | None = None, weight_threshold: float = 0.0, **kwargs: object) None#

Export this table to the given output url.

Parameters:
  • output_url – The output url to export to.

  • format – The format to export to. If not provided, the format will be inferred from the table and the output url.

  • weight_threshold – The weight threshold to use for exporting. If the table has a weights column, rows with a weight below this threshold will be excluded from the export.

  • kwargs – Additional arguments to pass to the exporter. Which arguments are valid depends on the format. See the documentation for the subclasses of Exporter for more information.

is_descendant_of(other: tlc.core.objects.table.Table) bool#

Return True if this table is a descendent of the provided table.

Parameters:

other – The table to check if this table is a descendant of.

Returns:

True if this table is a descendant of the provided table, False otherwise.

get_foreign_table_url(column: str = FOREIGN_TABLE_ID) tlc.core.url.Url | None#

Return the input table URL referenced by this table.

This method is intended for tables that reference a single input table. Typically, this would be a metrics table of per-example metrics collected using another table.

If the table contains a column named 'input_table_id' with a value map indicating that it references an input table by Url, this method returns the Url of that input table.

Parameters:

column – The name of the column to check for a foreign key.

Returns:

The URL of the foreign table, or None if no input table is found.

property weights_column_name: str | None#

Return the name of the column containing the weights for this table, or None if no such column exists.

create_sampler(exclude_zero_weights: bool = True, weighted: bool = True, shuffle: bool = True) torch.utils.data.sampler.Sampler[int]#

Returns a sampler based on the weights column of the table. The type and behavior of the returned Sampler depend on the values of the argument flags.

Parameters:
  • exclude_zero_weights – If True, rows with a weight of zero will be excluded from the sampler. This is useful for adjusting the length of the sampler, and thus the length of an epoch when using a PyTorch DataLoader, to the number of non-zero-weighted rows in the table.

  • weighted – If True, the sampler will use sample weights (beyond the exclusion of zero-weighted rows) to ensure that the distribution of the sampled rows matches the distribution of the weights. When weighted is set to True, you are no longer guaranteed that every row in the table will be sampled in a single epoch, even if all weights are equal.

  • shuffle – If False, the valid indices will be returned in sequential order. A value of False is mutually exclusive with the weighted flag.

Returns:

A Sampler based on the weights column of the table.
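
The effect of the flags can be illustrated with plain Python (a conceptual sketch using the standard library, not the actual torch Sampler returned by this method):

```python
import random

weights = [0.0, 2.0, 1.0, 1.0]

# exclude_zero_weights: zero-weighted rows are dropped, shortening an
# epoch to the number of non-zero-weighted rows.
valid = [i for i, w in enumerate(weights) if w > 0]

# weighted: indices are drawn in proportion to their weights, so a row
# may be sampled more than once (or not at all) within an epoch.
rng = random.Random(0)
epoch = rng.choices(valid, weights=[weights[i] for i in valid], k=len(valid))
```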

map(func: Callable[[Any], object]) tlc.core.objects.table.Table#

Add a function to the list of functions to be applied to each sample in the table before it is returned by the __getitem__ method when not doing metrics collection.

Parameters:

func – The function to apply to each sample when not doing metrics collection.

Returns:

The table with the function added to the list of functions to apply to each sample when not doing metrics collection.

map_collect_metrics(func: Callable[[Any], object]) tlc.core.objects.table.Table#

Add a function to the list of functions to be applied to each sample in the table before it is returned by the __getitem__ method when doing metrics collection. If this list is empty, the map functions will be used instead.

Parameters:

func – The function to apply to each sample when doing metrics collection.

Returns:

The table with the function added to the list of functions to apply to each sample when doing metrics collection.
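
The interaction between the two map lists can be sketched as follows (a hypothetical helper mirroring the documented fallback, not tlc internals):

```python
def apply_maps(sample, maps, metrics_maps, collecting_metrics):
    # During metrics collection the collect-metrics maps apply; if none
    # were registered, the regular maps are used instead (per the docs).
    funcs = metrics_maps if (collecting_metrics and metrics_maps) else maps
    for func in funcs:
        sample = func(sample)
    return sample
```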

clear_maps() None#

Clear any maps added to the table.

static from_url(url: tlc.core.url.Url | str) tlc.core.objects.table.Table#

Create a table from a url.

Parameters:

url – The url to create the table from

Returns:

A concrete Table subclass

Raises:

static from_names(table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None) tlc.core.objects.table.Table#

Create a table from the names specifying its url.

Parameters:
  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url.

Returns:

The table at the resulting url.

static from_torch_dataset(dataset: torch.utils.data.Dataset, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, all_arrays_are_fixed_size: bool = False, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromTorchDataset#

Create a Table from a Torch Dataset.

Parameters:
  • dataset – The Torch Dataset to create the table from.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • all_arrays_are_fixed_size – Whether all arrays (tuples, lists, etc.) in the dataset are fixed size. This parameter is only used when generating a SampleType from a single sample in the dataset when no structure is provided.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromTorchDataset instance.

static from_pandas(df: pandas.DataFrame, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromPandas#

Create a Table from a Pandas DataFrame.

Parameters:
  • df – The Pandas DataFrame to create the table from.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromPandas instance.

static from_dict(data: typing.Mapping[str, object], structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromPydict#

Create a Table from a dictionary.

Parameters:
  • data – The dictionary to create the table from.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromPydict instance.

static from_csv(csv_file: str | pathlib.Path | tlc.core.url.Url, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromCsv#

Create a Table from a .csv file.

Parameters:
  • csv_file – The url of the .csv file.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromCsv instance.
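As a sketch of typical usage, the snippet below first writes a small .csv file with the standard library and then loads it. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level as `tlc.Table`; the file name and column names are illustrative.

```python
import csv

# Write a minimal .csv file to load (stdlib only).
with open("animals.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "legs"])
    writer.writerow(["cat", 4])
    writer.writerow(["hen", 2])

# Hypothetical usage -- assumes `tlc` is installed and configured.
try:
    import tlc

    table = tlc.Table.from_csv(
        "animals.csv",
        table_name="animals",
        if_exists="overwrite",  # replace any table already at the resolved url
    )
except ImportError:
    pass
```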

static from_coco(annotations_file: str | pathlib.Path | tlc.core.url.Url, image_folder: str | pathlib.Path | tlc.core.url.Url | None = None, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromCoco#

Create a Table from a COCO annotations file.

Parameters:
  • annotations_file – The url of the COCO annotations file.

  • image_folder – The url of the folder containing the images referenced in the COCO annotations file. If not provided, the image paths in the annotations file are assumed to be either absolute or relative to the annotations file.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromCoco instance.
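The sketch below builds a minimal COCO annotations file (the standard `images` / `annotations` / `categories` layout) with the standard library, then shows a hedged call. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level; file names and the image folder are illustrative.

```python
import json
from pathlib import Path

# A minimal COCO annotations file: images, annotations, and categories.
annotations = {
    "images": [{"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [10, 20, 100, 80], "area": 8000, "iscrowd": 0}
    ],
    "categories": [{"id": 1, "name": "cat"}],
}
Path("annotations.json").write_text(json.dumps(annotations))

# Hypothetical usage -- assumes `tlc` is installed; image_folder resolves the
# relative file_name entries above.
try:
    import tlc

    table = tlc.Table.from_coco("annotations.json", image_folder="images/")
except ImportError:
    pass
```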

static from_parquet(parquet_file: str | pathlib.Path | tlc.core.url.Url, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromParquet#

Create a Table from a Parquet file.

Parameters:
  • parquet_file – The url of the Parquet file.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromParquet instance.

static from_yolo(dataset_yaml_file: str | pathlib.Path | tlc.core.url.Url, split: str = 'train', structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromYolo#

Create a Table from a YOLO annotations file.

Parameters:
  • dataset_yaml_file – The url of the YOLO dataset .yaml file.

  • split – The split to load, as named in the dataset .yaml file. Defaults to 'train'.

  • structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromYolo instance.
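The sketch below writes a minimal YOLO dataset .yaml file (split image folders plus class names) with the standard library, then shows a hedged call. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level; paths and class names are illustrative.

```python
from pathlib import Path

# A minimal YOLO dataset .yaml file: split image folders plus class names.
Path("dataset.yaml").write_text(
    "path: .\n"
    "train: images/train\n"
    "val: images/val\n"
    "names:\n"
    "  0: cat\n"
    "  1: dog\n"
)

# Hypothetical usage -- assumes `tlc` is installed and the image folders exist.
try:
    import tlc

    train_table = tlc.Table.from_yolo("dataset.yaml", split="train")
except ImportError:
    pass
```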

static from_hugging_face(path: str, name: str | None = None, split: str = 'train', table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.integration.hugging_face.TableFromHuggingFace#

Create a Table from a Hugging Face Hub dataset, similar to the datasets.load_dataset function.

Parameters:
  • path – Path or name of the dataset to load, same as in datasets.load_dataset.

  • name – Name of the dataset to load, same as in datasets.load_dataset.

  • split – The split to load, same as in datasets.load_dataset.

  • table_name – The name of the table. If not provided, the table_name is set to split.

  • dataset_name – The name of the dataset. If not provided, dataset_name is set to path if name is not provided, or to {path}-{name} if name is provided.

  • project_name – The name of the project. If not provided, project_name is set to hf-{path}.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A TableFromHuggingFace instance.

static from_image_folder(root: str | pathlib.Path | tlc.core.url.Url, image_column_name: str = 'image', label_column_name: str = 'label', include_label_column: bool = True, extensions: str | tuple[str, ...] | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, extra_columns: dict[str, tlc.client.sample_type._SampleTypeStructure] | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.table.Table#

Create a Table from an image folder.

This function can be used to load a folder containing subfolders where each subfolder represents a label, or to recursively load all matching images in a folder structure without labels. It extends the functionality of torchvision.datasets.ImageFolder.

When include_label_column is True, the dataset elements are returned as tuples of a PIL.Image and the integer class label. When include_label_column is False, PIL.Images are returned without labels. In this case, root will be recursively scanned.

Parameters:
  • root – The root directory of the image folder.

  • image_column_name – The name of the column containing the images.

  • label_column_name – The name of the column containing the class labels.

  • include_label_column – Whether to include a column of class labels in the table.

  • extensions – A list of allowed image extensions. If not provided, a default list of image extensions is used.

  • table_name – The name of the table.

  • dataset_name – The name of the dataset.

  • project_name – The name of the project.

  • root_url – The root url of the table.

  • if_exists – What to do if the table already exists at the provided url.

  • add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.

  • description – A description of the table.

  • extra_columns – A dictionary of extra columns to add to the table. The keys are the column names, and the values are the structures of the columns. These can typically be expressed as Schemas, ScalarValues, or SampleTypes.

  • table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}

Returns:

A Table instance.
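The sketch below builds the expected layout (one subfolder per class label, files as placeholders for real images) with the standard library, then shows a hedged call. The `tlc` call is guarded and assumes the package is installed and exposes `Table` at the top level.

```python
import tempfile
from pathlib import Path

# Build the expected layout: one subfolder per class label.
root = Path(tempfile.mkdtemp())
(root / "cat").mkdir()
(root / "dog").mkdir()
(root / "cat" / "cat_001.jpg").write_bytes(b"")  # placeholder files; real
(root / "dog" / "dog_001.jpg").write_bytes(b"")  # images would go here

# Hypothetical usage -- assumes `tlc` is installed and the files are real images.
try:
    import tlc

    table = tlc.Table.from_image_folder(root, table_name="pets")
except ImportError:
    pass
```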

tlc.core.objects.table.sort_tables_chronologically(tables: list[tlc.core.objects.table.Table], reverse: bool = False) list[tlc.core.objects.table.Table]#

Sort a list of tables chronologically.

Parameters:

tables – A list of tables to sort chronologically.

Returns:

A list of tables sorted chronologically.
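The behavior can be sketched with plain `sorted`. The sort key is an assumption: the snippet presumes tables are ordered by their `created` timestamp (an ISO-8601 string accepted by the Table constructor), and `FakeTable` is an illustrative stand-in, not the real class:

```python
from dataclasses import dataclass


# Illustrative stand-in for Table; assumes the sort key is the `created`
# timestamp, which the Table constructor accepts as an ISO-8601 string.
@dataclass
class FakeTable:
    created: str


tables = [FakeTable("2024-03-01T12:00:00"), FakeTable("2024-01-15T08:30:00")]
ordered = sorted(tables, key=lambda t: t.created)                       # oldest first
newest_first = sorted(tables, key=lambda t: t.created, reverse=True)    # reverse=True
```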

tlc.core.objects.table.squash_table(table: tlc.core.objects.table.Table | tlc.core.url.Url, output_url: tlc.core.url.Url) tlc.core.objects.table.Table#

Create a copy of this table where all lineage is squashed.

Example:

table_instance = Table()
...  # work with the table, building up lineage
squashed_table = squash_table(table_instance, Url("s3://bucket/path/to/table"))

Parameters:
  • table – The table to squash.

  • output_url – The output url for the squashed table.

Returns:

The squashed table.