tlc.core.objects.table#
The abstract base class for all Table types.
Module Contents#
Classes#
Class | Description |
---|---|
ImmutableDict | An immutable access interface to a nested dictionary representing a TableRow. |
TableRows | An immutable access interface to the rows of a Table object. |
Table | The abstract base class for all Table types. |
Functions#
Function | Description |
---|---|
| Sort a list of tables chronologically. |
| Create a copy of this table where all lineage is squashed. |
Data#
Data | Description |
---|---|
TableRow | Generic type for a row of a table. |
API#
- tlc.core.objects.table.TableRow = None#
Generic type for a row of a table.
- class tlc.core.objects.table.ImmutableDict(*args: Any, **kwargs: Any)#
Bases: typing.Dict[str, object]
An immutable access interface to a nested dictionary representing a TableRow.
This class is used to make access to table rows immutable, and to provide a consistent interface for accessing nested column data.
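As an illustration of what immutable access to a nested row can look like, the following is a minimal stand-alone sketch. It is not the tlc implementation: `ReadOnlyRow`, the choice of `TypeError`, and the wrap-on-access strategy are all assumptions made for this example.

```python
from typing import Any, Dict


class ReadOnlyRow(Dict[str, object]):
    """Hypothetical read-only view of a nested row dictionary (not tlc code)."""

    def __getitem__(self, key: str) -> object:
        value = super().__getitem__(key)
        # Wrap nested dictionaries on access so that they are read-only as well.
        # Note: dict.get() bypasses this method, a known limit of this sketch.
        return ReadOnlyRow(value) if isinstance(value, dict) else value

    def _blocked(self, *args: Any, **kwargs: Any) -> None:
        # Assumed error type; the source does not say what is raised.
        raise TypeError("rows are read-only")

    # Every mutating method is routed to the same failing helper.
    __setitem__ = __delitem__ = _blocked
    update = pop = popitem = clear = setdefault = _blocked


row = ReadOnlyRow({"image": "a.png", "bb": {"x": 1}})
nested_x = row["bb"]["x"]  # nested column data is read with plain indexing
```

Reads behave like a normal dictionary, while any attempt to assign into the row or into a nested value raises.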
- class tlc.core.objects.table.TableRows(table: tlc.core.objects.table.Table)#
An immutable access interface to the rows of a Table object.
- class tlc.core.objects.table.Table(url: tlc.core.url.Url | None = None, created: str | None = None, description: str | None = None, row_cache_url: tlc.core.url.Url | None = None, row_cache_populated: bool | None = None, override_table_rows_schema: Any = None, init_parameters: Any = None)#
Bases: tlc.core.object.Object
The abstract base class for all Table types.
A Table is an object with two specific responsibilities:
- Creating table rows on demand (either through the row-based access interface table_rows, or through the sample-based access interface provided by __getitem__).
- Creating a schema which describes the type of produced rows (through the rows_schema property).
Both types of produced data are determined by immutable properties defined by each particular Table type.
ALTERNATIVE INTERFACE/CACHING:
A full representation of all table rows can, for performance reasons, also be retrieved through the get_rows_as_binary method.
This method will try to retrieve a cached version of the table rows if row_cache_url is non-empty AND row_cache_populated is True. When this is the case, it is guaranteed that the schema property of the table is fully populated, including the nested ‘rows_schema’ property which defines the layout of all table rows.
When this cached version is NOT defined, however, get_rows_as_binary() needs to iterate over all rows to produce the data.
If row_cache_url is non-empty, the produced binary data will be cached to the specified location. After successful caching, the updated Table object will be written to its backing URL exactly once, now with ‘row_cache_populated’ set to True and with the schema fully updated. Also, the row_count property is guaranteed to be correct at this time.
Whether accessing data from a Table object later refers to this cached version (or produces the data itself) is implementation specific.
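The caching rules described here can be summarized in a small stand-alone sketch. It does not import tlc: `ToyTable`, `_STORAGE`, `produce_rows` and `production_runs` are hypothetical names, and an in-memory dictionary stands in for the location that row_cache_url points to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the storage behind row_cache_url.
_STORAGE: Dict[str, bytes] = {}


@dataclass
class ToyTable:
    """Toy model of the caching rules in the text (not the tlc Table class)."""

    produce_rows: Callable[[], List[bytes]]  # assumed row producer
    row_cache_url: str = ""
    row_cache_populated: bool = False
    production_runs: int = field(default=0, init=False)

    def get_rows_as_binary(self) -> bytes:
        # Use the cache only if the URL is non-empty AND the cache is populated.
        if self.row_cache_url and self.row_cache_populated:
            return _STORAGE[self.row_cache_url]
        # Otherwise all rows have to be iterated to produce the data.
        self.production_runs += 1
        data = b"".join(self.produce_rows())
        # With a non-empty cache URL the produced data is stored for later calls.
        if self.row_cache_url:
            _STORAGE[self.row_cache_url] = data
            self.row_cache_populated = True
        return data


table = ToyTable(lambda: [b"row0;", b"row1;"], row_cache_url="cache/rows.bin")
first = table.get_rows_as_binary()   # produces the rows and fills the cache
second = table.get_rows_as_binary()  # served from the cache
```

In this toy model the second call does not run the producer again, which mirrors the performance motivation stated in the text.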
STATE MUTABILITY:
As described above, Tables are constrained in how they are allowed to change state:
- The data production parameters (“recipe”) of a table are immutable.
- The persisted JSON representation of a Table (e.g. on disk) can take on three different states, and each state can be written only once:
  - Bare-bones recipe
  - Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = False)
  - Bare-bones recipe + full schema + ‘row_count’ (‘row_cache_populated’ = True)
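The write-once rule for the persisted states can be pictured with a tiny bookkeeping sketch. `PersistedState` and `WriteOnceLog` are hypothetical names, and raising `RuntimeError` is an assumption; the source does not say whether or how tlc enforces the rule at runtime.

```python
from enum import Enum
from typing import Set


class PersistedState(Enum):
    """The three persisted states listed in the text."""

    RECIPE = "bare-bones recipe"
    RECIPE_SCHEMA = "recipe + full schema + row_count (row_cache_populated = False)"
    RECIPE_SCHEMA_CACHED = "recipe + full schema + row_count (row_cache_populated = True)"


class WriteOnceLog:
    """Hypothetical helper: each state may be written a single time."""

    def __init__(self) -> None:
        self._written: Set[PersistedState] = set()

    def write(self, state: PersistedState) -> None:
        if state in self._written:
            # Assumed failure mode for a repeated write.
            raise RuntimeError(f"state already written: {state.value}")
        self._written.add(state)

    def has_written(self, state: PersistedState) -> bool:
        return state in self._written


log = WriteOnceLog()
log.write(PersistedState.RECIPE)
log.write(PersistedState.RECIPE_SCHEMA_CACHED)
```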
- Parameters:
url – The URL of the table.
created – The creation time of the table.
description – The description of the table.
row_cache_url – The URL of the row cache.
row_cache_populated – Whether the row cache is populated.
override_table_rows_schema – The schema to override the table rows schema.
init_parameters – The initial parameters of the table.
- copy(table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, rename, overwrite] = 'raise', *, destination_url: tlc.core.url.Url | None = None) tlc.core.objects.table.Table #
Create a copy of this table.
The copy is performed to:
- A URL derived from the given table_name, dataset_name, project_name, and root_url, if given
- destination_url, if given
- A generated URL derived from the table’s URL, if none of the above are given
- Parameters:
table_name – The name of the table to copy to.
dataset_name – The name of the dataset to copy to.
project_name – The name of the project to copy to.
root_url – The root URL to copy to.
if_exists – The behavior to use if the destination URL already exists.
destination_url – The URL to copy the table to.
- Returns:
The copied table.
- ensure_dependent_properties() None #
Ensure that the table sets row_count as required to reach a fully defined state.
- ensure_data_production_is_ready() None #
A method that ensures that the table is ready to produce data.
This method is called before any access to the Table’s data is made. It is used to ensure that the Table has performed any necessary data production steps. Normally Tables don’t produce data until it is requested, but this method can be called to force data production.
Note that subsequent applications of this method will not change the data, as a Table is immutable.
- collection_mode() Iterator[None] #
Enable metrics-collection mode on the Table.
When metrics-collection mode is enabled, only maps defined by calls to map_collect_metrics() are applied to the table rows.
- property row_schema: tlc.core.schema.Schema#
Returns the schema for a single row of this table.
- property rows_schema: tlc.core.schema.Schema#
Returns the schema for all rows of this table.
- property table_rows: tlc.core.objects.table.TableRows#
Access the rows of this table as an immutable mapping.
- set_row_cache_url(row_cache_url: tlc.core.url.Url | str) bool #
Assign a new row_cache_url value.
Will set row_cache_populated to False if the cache file has changed.
- Parameters:
row_cache_url – The new row_cache_url value.
- Returns:
True if the row_cache_url value was changed, False otherwise.
- static transform_value(schema: tlc.core.schema.Schema | None, item: object) object #
Transform a single table value according to the schema.
By default, any numpy arrays are converted to lists.
- Parameters:
schema – The schema corresponding to the column of the value.
item – The value to transform.
- write_to_row_cache(create_url_if_empty: bool = False, overwrite_if_exists: bool = True) None #
Cache the table rows to the row cache Url.
If the table is already cached, or the Url of the Table is an API-Url, this method does nothing.
In the case where self.row_cache_url is empty, a new Url will be created and assigned to self.row_cache_url if create_url_if_empty is True, otherwise a ValueError will be raised.
- Parameters:
create_url_if_empty – Whether to create a new row cache Url if self.row_cache_url is empty.
overwrite_if_exists – Whether to overwrite the row cache file if it already exists.
- get_rows_as_binary(exclude_bulk_data: bool = False) bytes #
Return all rows of the table as a binary Parquet buffer.
This method will return the ‘Table-representation’ of the table, which is the most efficient representation, since only references to the input data are stored.
- Parameters:
exclude_bulk_data – Whether to exclude bulk data columns from the serialized rows. This filter only applies to Tables that are fully cached on disk.
- Returns:
The rows of the table as a binary Parquet buffer.
- should_include_schema_in_json(schema: tlc.core.schema.Schema) bool #
Only include the schema in the JSON representation if it is not empty.
- latest(use_new_columns: bool = True, wait_for_rescan: bool = True, timeout: float = 0) tlc.core.objects.table.Table #
Return the most recent version of the table, as indexed by the TableIndexingTable indexing mechanism.
This function retrieves the latest version of this table that has been indexed or exists in the ObjectRegistry. If desired it is possible to wait for the next indexing run to complete by setting wait_for_rescan to True together with a timeout in seconds.
- Example:
table_instance = Table()
... # working
latest_table = table_instance.latest()
- Parameters:
use_new_columns – If new columns have been added to the latest revision of the Table, whether to include these values in the sample-view of the Table. Defaults to True.
wait_for_rescan – Whether to wait for the TableIndexingTable (lineage) to be rescanned before trying to resolve the latest revision. Defaults to True.
timeout – The timeout in seconds to wait for the next indexing run to complete. Defaults to 0 seconds, meaning that indexing can run forever.
- Returns:
The latest version of the table.
- Raises:
ValueError – If the latest version of the table cannot be found in the dataset or if an error occurs when attempting to create an object from the latest Url.
- squash(output_url: tlc.core.url.Url, dataset_name: str | None = None, project_name: str | None = None) tlc.core.objects.table.Table #
Create a copy of this table where all lineage is squashed.
A squashed table is a table where all lineage is merged. This is useful for creating a table that is independent of its parent tables. This function creates a new table with the same rows as the original table, but with no lineage. The new table is written to the output url.
- Example:
table = Table()
... # working
squashed_table = table.squash(Url("s3://bucket/path/to/table"), dataset_name="my_dataset_v2")
- Parameters:
table – The table to squash.
output_url – The output url for the squashed table.
dataset_name – The dataset name to use for the squashed table. If not provided, the dataset_name of the original table is used.
project_name – The project name to use for the squashed table. If not provided, the project_name of the original table is used.
- Returns:
The squashed table.
- property pyarrow_schema: pyarrow.Schema | None#
Returns a pyarrow schema for this table
- property bulk_data_url: tlc.core.url.Url#
Return the bulk data url for this table.
The bulk data url is the url to the folder containing any bulk data for this table. The root of the bulk data url can be overridden by setting the TLC_BULK_DATA_URL environment variable.
- static from_url(url: tlc.core.url.Url | str) tlc.core.objects.table.Table #
Create a table from a url.
- Parameters:
url – The url to create the table from
- Returns:
A concrete Table subclass
- Raises:
ValueError – If the url does not point to a table.
FileNotFoundError – If the url cannot be found.
- static from_names(table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None) tlc.core.objects.table.Table #
Create a table from the names specifying its url.
- Parameters:
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url.
- Returns:
The table at the resulting url.
- to_pandas() pandas.DataFrame #
Return a pandas DataFrame for this table.
- Returns:
A pandas DataFrame populated from the rows of this table.
- add_column(column_name: str, values: list[object] | object, schema: tlc.core.schema.Schema | None = None, url: tlc.core.url.Url | None = None) tlc.core.objects.table.Table #
Create a derived table with a column added.
This method creates and returns a new revision of the table with a new column added.
- Parameters:
column_name – The name of the column to add.
values – The values to add to the column. This can be a list of values, or a single value to be added to all rows.
schema – The schema of the column to add. If not provided, the schema will be inferred from the values.
url – The url to write the new table to. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the column added.
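The two accepted shapes of `values` (one value per row, or a single value for all rows) can be sketched on plain dictionaries. `add_column_sketch` is a hypothetical helper that does not call tlc, and the length check is an assumption about reasonable behaviour, not documented behaviour.

```python
from typing import Dict, List


def add_column_sketch(
    rows: List[Dict[str, object]], column_name: str, values: object
) -> List[Dict[str, object]]:
    """Return new rows with the column added; the input rows are left untouched."""
    if isinstance(values, list):
        # A list is read as one value per row (assumed to require equal length).
        if len(values) != len(rows):
            raise ValueError("expected exactly one value per row")
        per_row = values
    else:
        # A single value is repeated for every row.
        per_row = [values] * len(rows)
    return [{**row, column_name: value} for row, value in zip(rows, per_row)]


rows = [{"id": 0}, {"id": 1}]
with_flag = add_column_sketch(rows, "reviewed", False)
with_scores = add_column_sketch(rows, "score", [0.1, 0.9])
```

Like the method it illustrates, the sketch derives a new object instead of editing the existing one, which matches the immutability rules stated for Tables.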
- set_value_map(value_path: str, value_map: dict[float, Any], *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table #
Set a value map for a specified numeric value within the schema of the Table.
Sets a value map for a value within the schema of the Table, returning a new table revision with the applied value map.
This method creates and returns a new revision of the table with an overridden value map for a specific numeric value.
Any item in a Schema of type NumericValue can have a value map. A value map is a mapping from a numeric value to a MapElement, where a MapElement contains metadata about a categorical value such as category names and IDs.
Partial Value Maps
Value maps may be partial, i.e. they may only contain a mapping for a subset of the possible numeric values. The keys may also be floating point values, which can be useful for annotating continuous variables with categorical metadata, such as color or label.
For more fine-grained control over value map editing, see Table.set_value_map_item, Table.add_value_map_item, and Table.delete_value_map_item.
- Parameters:
value_path – The path to the value to add the value map to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
value_map – The value map to set on the value. The value will be converted to a dictionary mapping from floating point values to MapElement if it is not already.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map set.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue.
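To make the shape of a partial, float-keyed value map concrete, here is a stand-alone sketch. `ToyMapElement` is a hypothetical stand-in for tlc.core.schema.MapElement whose fields are borrowed from the set_value_map_item parameters; the temperature values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyMapElement:
    """Stand-in for MapElement; the field set is an assumption."""

    internal_name: str
    display_name: str = ""
    description: str = ""


# A partial value map: only three of the possible numeric values are annotated,
# and the keys are floating point values.
temperature_map: Dict[float, ToyMapElement] = {
    0.0: ToyMapElement("freezing", display_name="Freezing point"),
    36.6: ToyMapElement("body_temperature"),
    100.0: ToyMapElement("boiling", display_name="Boiling point"),
}


def label_for(value: float) -> Optional[str]:
    """Return the internal name for a value, or None if it is not in the map."""
    element = temperature_map.get(value)
    return element.internal_name if element is not None else None


known = label_for(36.6)
unknown = label_for(50.0)  # not covered by the partial map
```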
- delete_value_map(value_path: str, *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table #
Delete a value map for a specified numeric value within the schema of the Table.
This method creates and returns a new revision of the Table with a deleted value map for a specific numeric value.
- Parameters:
value_path – The path to the value to delete the value map from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map deleted.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue.
- set_value_map_item(value_path: str, value: float | int, internal_name: str, display_name: str = '', description: str = '', display_color: str = '', url: tlc.core.url.Url | str = '', *, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table #
Update an existing value map item for a specified numeric value within the schema of the Table.
This method creates and returns a new revision of the table with a value map item added to a value in a column.
- Example:
table = Table.from_url("cats-and-dogs")
new_table = table.set_value_map_item("label", 0, "cat")
# new_table is now a new revision of the table with an updated value map item added to the value 0 in the column
assert table.latest() == new_table, "The new table is the latest revision of the table."
To add a new value map item at the next available value in the value map, see Table.add_value_map_item.
To delete a value map item, see Table.delete_value_map_item.
- Parameters:
value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
value – The numeric value to add the value map item to. If the value already exists, the value map item will be updated.
internal_name – The internal name of the value map item. This is the primary identifier of the value map item.
display_name – The display name of the value map item.
description – The description of the value map item.
display_color – The display color of the value map item.
url – The url of the value map item.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue.
- add_value_map_item(value_path: str, internal_name: str, display_name: str = '', description: str = '', display_color: str = '', url: tlc.core.url.Url | str = '', *, value: float | int | None = None, edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table #
Add a value map item for a specified numeric value within the schema of the Table.
Adds a new value map item to the schema of the Table without overwriting existing items.
If the specified value or internal name already exists in the value map, this method will raise an error to prevent overwriting.
For more details on value maps, refer to the documentation for Table.set_value_map.
- Parameters:
value_path – The path to the value to add the value map item to. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
internal_name – The internal name of the value map item. This is the primary identifier of the value map item.
display_name – The display name of the value map item.
description – The description of the value map item.
display_color – The display color of the value map item.
url – The url of the value map item.
value – The numeric value to add the value map item to. If not provided, the value will be the next available value in the value map (starting from 0).
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map item added.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name already exists in the value map.
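The no-overwrite rule and the default value choice can be sketched on a plain dictionary. `add_item_sketch` is a hypothetical helper, the map is simplified to value → internal name, and reading "next available value (starting from 0)" as the smallest unused non-negative integer is an assumption; the library may pick the value differently.

```python
from typing import Dict, Optional


def add_item_sketch(
    value_map: Dict[float, str], internal_name: str, value: Optional[float] = None
) -> Dict[float, str]:
    """Return a new map with the item added; existing items are never overwritten."""
    if internal_name in value_map.values():
        raise ValueError(f"internal name already exists: {internal_name!r}")
    if value is None:
        # Assumed reading of "next available value": count up from 0.
        value = 0
        while value in value_map:
            value += 1
    elif value in value_map:
        raise ValueError(f"value already exists: {value!r}")
    return {**value_map, value: internal_name}


labels = {0: "cat", 1: "dog"}
with_mouse = add_item_sketch(labels, "mouse")           # lands on value 2
with_bird = add_item_sketch(labels, "bird", value=10)   # explicit value
```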
- delete_value_map_item(value_path: str, *, value: float | int | None = None, internal_name: str = '', edited_table_url: tlc.core.url.Url | str = '') tlc.core.objects.table.Table #
Delete a value map item for a specified numeric value within the schema of the Table.
Deletes a value map item from the schema of the Table, by numeric value or internal name.
For more details on value maps, refer to the documentation for Table.set_value_map.
- Parameters:
value_path – The path to the value to delete the value map item from. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
value – The numeric value of the value map item to delete. If not provided, the value map item will be deleted by internal name.
internal_name – The internal name of the value map item to delete. If not provided, the value map item will be deleted by numeric value.
edited_table_url – The url of the edited table. If not provided, the new table will be located next to the current table.
- Returns:
A new table with the value map item deleted.
- Raises:
ValueError – If the value path does not exist or is not a NumericValue, or if the value or internal name does not exist in the value map.
- get_value_map(value_path: str) dict[float, tlc.core.schema.MapElement] | None #
Get the value map for a value path.
- Parameters:
value_path – The path to the value to get the value map for. Can be the name of a column, or a dot-separated path to a sub-value in a composite column.
- Returns:
A value map for the value, or None if the value does not exist or does not have a value map.
- get_value_map_for_column(column_name: str) dict[int, str] | None #
Get a value map for a column.
Convenience method for reading value maps from the schema of a table. For scalar columns, this method will return the value map for the column, if it exists. For composite columns, this method will return the first value map found in the column schema. This will usually be the desired value map, but for a composite column with multiple sub-schemas with value maps, this method will return the first one found while traversing the schema. For full control, the table’s schema property should be traversed manually, i.e. table.schema.values["rows"].values["column"]....
A value map must be a mapping from int to be considered a valid candidate. The string value of the value map will be either the internal_name, display_name, url, or description of the MapElement found in the schema, depending on what is present.
- Example:
from tlc.core.objects.tables.from_url import TableFromCoco
table = TableFromCoco(input_url="./annotations.json", image_folder_url="./images")
value_map = table.get_value_map_for_column("bbs")
# value_map = {1: "cat", 2: "dog", 3: "mouse"}
- Parameters:
column_name – The name of the column to get the value map for (Note: this argument does not currently support dot-separated paths).
- Returns:
A value map for the column, or None if the column does not have a value map.
- Raises:
KeyError – If the column does not exist.
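The reduction from a full value map to a plain int-to-string mapping might look roughly like the sketch below. `simplify_value_map` is a hypothetical helper operating on plain dictionaries, and the lookup order of the four fields is an assumption; the text only says the result depends on what is present.

```python
from typing import Dict, Mapping, Optional

# Assumed lookup order; the text lists these four fields without a priority.
_NAME_FIELDS = ("internal_name", "display_name", "url", "description")


def simplify_value_map(
    value_map: Mapping[float, Mapping[str, str]]
) -> Optional[Dict[int, str]]:
    """Reduce a value map to {int: str}, or return None if it is not int-keyed."""
    if not value_map or not all(isinstance(key, int) for key in value_map):
        return None
    simplified: Dict[int, str] = {}
    for key, element in value_map.items():
        # Use the first non-empty field as the string for this value.
        simplified[int(key)] = next(
            (element[name] for name in _NAME_FIELDS if element.get(name)), ""
        )
    return simplified


labels = simplify_value_map(
    {1: {"internal_name": "cat"}, 2: {"internal_name": "", "display_name": "Dog"}}
)
not_a_candidate = simplify_value_map({0.5: {"internal_name": "half"}})
```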
- export(output_url: tlc.core.url.Url | str | pathlib.Path, format: str | None = None, weight_threshold: float = 0.0, **kwargs: object) None #
Export this table to the given output url.
- Parameters:
output_url – The output url to export to.
format – The format to export to. If not provided, the format will be inferred from the table and the output url.
weight_threshold – The weight threshold to use for exporting. If the table has a weights column, rows with a weight below this threshold will be excluded from the export.
kwargs – Additional arguments to pass to the exporter. Which arguments are valid depends on the format. See the documentation for the subclasses of Exporter for more information.
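The effect of weight_threshold can be illustrated with a plain list of rows. This sketch does not call an exporter; `rows_to_export` and the column name "weight" are hypothetical and only show which rows the documented rule would drop.

```python
from typing import Dict, List


def rows_to_export(
    rows: List[Dict[str, float]],
    weight_threshold: float = 0.0,
    weights_column: str = "weight",  # hypothetical column name
) -> List[Dict[str, float]]:
    """Keep the rows whose weight is not below the threshold."""
    return [row for row in rows if row[weights_column] >= weight_threshold]


rows = [
    {"id": 0, "weight": 1.0},
    {"id": 1, "weight": 0.0},
    {"id": 2, "weight": 0.4},
]
kept_default = rows_to_export(rows)                     # threshold 0.0
kept_half = rows_to_export(rows, weight_threshold=0.5)  # stricter threshold
```

With the default threshold of 0.0, a row with weight 0.0 is not below the threshold and is therefore kept in this sketch.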
- is_descendant_of(other: tlc.core.objects.table.Table) bool #
Return True if this table is a descendant of the provided table.
- Parameters:
other – The table to check if this table is a descendant of.
- Returns:
True if this table is a descendant of the provided table, False otherwise.
- get_foreign_table_url(column: str = FOREIGN_TABLE_ID) tlc.core.url.Url | None #
Return the input table URL referenced by this table.
This method is intended for tables that reference a single input table. Typically, this would be a metrics table of per-example metrics collected using another table.
If the table contains a column named ‘input_table_id’ with a value map indicating that it references an input table by Url, this method returns the Url of that input table.
- Parameters:
column – The name of the column to check for a foreign key.
- Returns:
The URL of the foreign table, or None if no input table is found.
- property weights_column_name: str | None#
Return the name of the column containing the weights for this table, or None if no such column exists.
- create_sampler(exclude_zero_weights: bool = True, weighted: bool = True) torch.utils.data.sampler.Sampler[int] #
Returns a sampler based on the weights column of the table. The type and behavior of the returned Sampler also depends on the values of the argument flags. The sampler is always shuffled.
- Parameters:
exclude_zero_weights – If True, rows with a weight of zero will be excluded from the sampler. This is useful for adjusting the length of the sampler, and thus the length of an epoch when using a PyTorch DataLoader, to the number of non-zero weighted rows in the table.
weighted – If True, the sampler will use sample weights (beyond the exclusion of zero-weighted rows) to ensure that the distribution of the sampled rows matches the distribution of the weights. When weighted is set to True, you are no longer guaranteed that every row in the table will be sampled in a single epoch, even if all weights are equal.
- Returns:
A Sampler based on the weights column of the table.
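The interplay of the two flags can be sketched with the standard library alone. This is not the torch-based sampler the method returns: `sample_epoch` is a hypothetical function, the fixed seed exists only to make the example repeatable, and drawing with replacement is an assumed mechanism that is consistent with the note that a weighted epoch may miss some rows.

```python
import random
from typing import List, Sequence


def sample_epoch(
    weights: Sequence[float],
    exclude_zero_weights: bool = True,
    weighted: bool = True,
    seed: int = 0,
) -> List[int]:
    """Return the row indices visited in one (always shuffled) epoch."""
    rng = random.Random(seed)
    candidates = [
        index for index, weight in enumerate(weights)
        if weight > 0 or not exclude_zero_weights
    ]
    if weighted:
        # Draw with replacement, proportionally to the weights, so some rows
        # may repeat while others are skipped within the same epoch.
        # (random.choices needs at least one positive weight.)
        candidate_weights = [weights[index] for index in candidates]
        return rng.choices(candidates, weights=candidate_weights, k=len(candidates))
    # Unweighted: every candidate row appears exactly once, in random order.
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    return shuffled


weights = [1.0, 0.0, 2.0, 1.0]
unweighted_epoch = sample_epoch(weights, weighted=False)
weighted_epoch = sample_epoch(weights, weighted=True)
full_length_epoch = sample_epoch(weights, exclude_zero_weights=False, weighted=False)
```

The epoch length follows the number of candidate rows, which is why excluding zero-weighted rows shortens an epoch.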
- map(func: Callable[[Any], object]) tlc.core.objects.table.Table #
Add a function to the list of functions to be applied to each sample in the table before it is returned by the __getitem__ method when not doing metrics collection.
- Parameters:
func – The function to apply to each sample when not doing metrics collection.
- Returns:
The table with the function added to the list of functions to apply to each sample when not doing metrics collection.
- map_collect_metrics(func: Callable[[Any], object]) tlc.core.objects.table.Table #
Add a function to the list of functions to be applied to each sample in the table before it is returned by the __getitem__ method when doing metrics collection. If this list is empty, the map functions will be used instead.
- Parameters:
func – The function to apply to each sample when doing metrics collection.
- Returns:
The table with the function added to the list of functions to apply to each sample when doing metrics collection.
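How the two function lists interact with metrics-collection mode can be modelled in a few lines. `ToyMappedTable` is a hypothetical stand-in, not the tlc Table; treating collection_mode() as a context manager is an assumption based on its `Iterator[None]` return annotation.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List, Sequence


class ToyMappedTable:
    """Toy model of map / map_collect_metrics (not the tlc Table class)."""

    def __init__(self, samples: Sequence[Any]) -> None:
        self._samples = list(samples)
        self._maps: List[Callable[[Any], Any]] = []
        self._metrics_maps: List[Callable[[Any], Any]] = []
        self._collecting = False

    def map(self, func: Callable[[Any], Any]) -> "ToyMappedTable":
        self._maps.append(func)
        return self

    def map_collect_metrics(self, func: Callable[[Any], Any]) -> "ToyMappedTable":
        self._metrics_maps.append(func)
        return self

    @contextmanager
    def collection_mode(self) -> Iterator[None]:
        self._collecting = True
        try:
            yield
        finally:
            self._collecting = False

    def __getitem__(self, index: int) -> Any:
        sample = self._samples[index]
        # In collection mode the metrics maps win, unless that list is empty.
        funcs = self._maps
        if self._collecting and self._metrics_maps:
            funcs = self._metrics_maps
        for func in funcs:
            sample = func(sample)
        return sample


table = ToyMappedTable([1, 2, 3]).map(lambda x: x * 10).map_collect_metrics(lambda x: -x)
normal_sample = table[1]
with table.collection_mode():
    metrics_sample = table[1]
```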
- static from_torch_dataset(dataset: torch.utils.data.Dataset, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, all_arrays_are_fixed_size: bool = False, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromTorchDataset #
Create a Table from a Torch Dataset.
- Parameters:
dataset – The Torch Dataset to create the table from.
structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
all_arrays_are_fixed_size – Whether all arrays (tuples, lists, etc.) in the dataset are fixed size. This parameter is only used when generating a SampleType from a single sample in the dataset when no structure is provided.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromTorchDataset instance.
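The mutual exclusivity between table_url and the name-based arguments can be expressed as a small check. `check_table_location` is a hypothetical helper, and raising `ValueError` is an assumption; the source only states that the two ways of addressing a table cannot be combined.

```python
from typing import Optional


def check_table_location(
    table_url: Optional[str] = None,
    root_url: Optional[str] = None,
    project_name: Optional[str] = None,
    dataset_name: Optional[str] = None,
    table_name: Optional[str] = None,
) -> str:
    """Report which addressing style is in use, rejecting a mix of both."""
    name_arguments = {
        "root_url": root_url,
        "project_name": project_name,
        "dataset_name": dataset_name,
        "table_name": table_name,
    }
    given = [name for name, value in name_arguments.items() if value is not None]
    if table_url is not None and given:
        # Assumed failure mode for combining the two styles.
        raise ValueError(f"table_url cannot be combined with: {', '.join(given)}")
    return "explicit table_url" if table_url is not None else "url derived from names"


by_url = check_table_location(table_url="./my_tables/train")
by_names = check_table_location(project_name="demo", table_name="train")
```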
- static from_pandas(df: pandas.DataFrame, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromPandas #
Create a Table from a Pandas DataFrame.
- Parameters:
df – The Pandas DataFrame to create the table from.
structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromPandas instance.
- static from_dict(data: typing.Mapping[str, dict[str, object] | list[object]], structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_python_object.TableFromPydict #
Create a Table from a dictionary.
- Parameters:
data – The dictionary to create the table from.
structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromPydict instance.
- static from_csv(csv_file: str | pathlib.Path | tlc.core.url.Url, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromCsv #
Create a Table from a .csv file.
- Parameters:
csv_file – The url of the .csv file.
structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromCsv instance.
- static from_coco(annotations_file: str | pathlib.Path | tlc.core.url.Url, image_folder: str | pathlib.Path | tlc.core.url.Url | None = None, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromCoco #
Create a Table from a COCO annotations file.
- Parameters:
annotations_file – The url of the COCO annotations file.
image_folder – The url of the folder containing the images referenced in the COCO annotations file. If not provided, the image paths in the annotations file will be assumed to either be absolute OR relative to the annotations file.
structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromCoco instance.
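The table_url parameter is documented as mutually exclusive with {root_url, project_name, dataset_name, table_name}. The following is a minimal sketch of that rule, not the library's own validation: check_url_arguments is a hypothetical helper, and the ValueError and its message are assumptions, since this page does not say how the library reports a conflict.

```python
from __future__ import annotations

from typing import Optional


def check_url_arguments(
    table_url: Optional[str] = None,
    root_url: Optional[str] = None,
    project_name: Optional[str] = None,
    dataset_name: Optional[str] = None,
    table_name: Optional[str] = None,
) -> None:
    """Hypothetical helper: reject table_url combined with any name-based argument."""
    name_based = {
        "root_url": root_url,
        "project_name": project_name,
        "dataset_name": dataset_name,
        "table_name": table_name,
    }
    # Collect the name-based arguments that were actually supplied.
    conflicting = [key for key, value in name_based.items() if value is not None]
    if table_url is not None and conflicting:
        # Assumption: the real library may raise a different error type.
        raise ValueError(
            "table_url cannot be combined with: " + ", ".join(conflicting)
        )
```

In other words, a caller either passes a complete table_url, or lets the Url be built from root_url, project_name, dataset_name and table_name.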
- static from_parquet(parquet_file: str | pathlib.Path | tlc.core.url.Url, structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromParquet #
Create a Table from a Parquet file.
- Parameters:
parquet_file – The url of the Parquet file.
structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromParquet instance.
- static from_yolo(dataset_yaml_file: str | pathlib.Path | tlc.core.url.Url, split: str = 'train', structure: tlc.client.sample_type._SampleTypeStructure | None = None, table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.core.objects.tables.from_url.TableFromYolo #
Create a Table from a YOLO annotations file.
- Parameters:
dataset_yaml_file – The url of the YOLO dataset .yaml file.
split – The split of the dataset to load. Defaults to 'train'.
structure – The structure of a single sample in the table. This is used to infer the schema of the table, and perform any necessary conversions between the row representation and the sample representation of the data. If not provided, the structure will be inferred from the first sample in the table.
table_name – The name of the table.
dataset_name – The name of the dataset.
project_name – The name of the project.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromYolo instance.
- static from_hugging_face(path: str, name: str | None = None, split: str = 'train', table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root_url: tlc.core.url.Url | str | None = None, if_exists: typing.Literal[raise, reuse, rename, overwrite] = 'reuse', add_weight_column: bool = True, description: str | None = None, *, table_url: tlc.core.url.Url | pathlib.Path | str | None = None) tlc.integration.hugging_face.TableFromHuggingFace #
Create a Table from a Hugging Face Hub dataset, similar to the datasets.load_dataset function.
- Parameters:
path – Path or name of the dataset to load, same as in datasets.load_dataset.
name – Name of the dataset to load, same as in datasets.load_dataset.
split – The split to load, same as in datasets.load_dataset.
table_name – The name of the table. If not provided, table_name is set to split.
dataset_name – The name of the dataset. If not provided, dataset_name is set to path if name is not provided, or to {path}-{name} if name is provided.
project_name – The name of the project. If not provided, project_name is set to hf-{path}.
root_url – The root url of the table.
if_exists – What to do if the table already exists at the provided url.
add_weight_column – Whether to add a column of sampling weights to the table, all initialized to 1.0.
description – A description of the table.
table_url – A custom Url for the table, mutually exclusive with {root_url, project_name, dataset_name, table_name}
- Returns:
A TableFromHuggingFace instance.
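The default naming rules for from_hugging_face can be summarized in a few lines. This is a sketch of the documented behavior only: default_names is a hypothetical helper that does not exist in the library, and it does not load any data.

```python
from __future__ import annotations

from typing import Optional, Tuple


def default_names(
    path: str,
    name: Optional[str] = None,
    split: str = "train",
    table_name: Optional[str] = None,
    dataset_name: Optional[str] = None,
    project_name: Optional[str] = None,
) -> Tuple[str, str, str]:
    """Hypothetical helper: return (project_name, dataset_name, table_name)."""
    if table_name is None:
        # table_name falls back to the split.
        table_name = split
    if dataset_name is None:
        # dataset_name is path, or {path}-{name} when name is given.
        dataset_name = path if name is None else f"{path}-{name}"
    if project_name is None:
        # project_name is hf-{path}.
        project_name = f"hf-{path}"
    return project_name, dataset_name, table_name


print(default_names("glue", "mrpc"))
print(default_names("imdb", split="test"))
```

Explicitly passed names always take precedence over these defaults.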
- tlc.core.objects.table.sort_tables_chronologically(tables: list[tlc.core.objects.table.Table], reverse: bool = False) list[tlc.core.objects.table.Table] #
Sort a list of tables chronologically.
- Parameters:
tables – A list of tables to sort chronologically.
reverse – Whether to sort in reverse chronological order.
- Returns:
A list of tables sorted chronologically.
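This page does not state which attribute sort_tables_chronologically compares. The sketch below assumes the ordering follows a creation timestamp like the created string accepted by the Table constructor, and that the string is ISO 8601. StandInTable and sort_chronologically_sketch are hypothetical stand-ins so the example runs without the library.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class StandInTable:
    """Hypothetical stand-in for a Table; only the fields used here."""

    name: str
    created: str  # Assumed to be an ISO 8601 timestamp.


def sort_chronologically_sketch(
    tables: List[StandInTable], reverse: bool = False
) -> List[StandInTable]:
    """Return a new list ordered by creation time (oldest first by default)."""
    return sorted(
        tables,
        key=lambda table: datetime.fromisoformat(table.created),
        reverse=reverse,
    )


tables = [
    StandInTable("second", "2024-02-01T00:00:00+00:00"),
    StandInTable("third", "2024-03-01T00:00:00+00:00"),
    StandInTable("first", "2024-01-01T00:00:00+00:00"),
]
oldest_first = sort_chronologically_sketch(tables)
newest_first = sort_chronologically_sketch(tables, reverse=True)
```

Like the documented function, the sketch returns a list and takes a reverse flag. Whether the library sorts in place or returns a copy is not stated here.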
- tlc.core.objects.table.squash_table(table: tlc.core.objects.table.Table | tlc.core.url.Url, output_url: tlc.core.url.Url) tlc.core.objects.table.Table #
Create a copy of this table where all lineage is squashed.
- Example:
table_instance = Table()
... # working
squashed_table = squash_table(table_instance, Url("s3://bucket/path/to/table"))
- Parameters:
table – The table to squash.
output_url – The output url for the squashed table.
- Returns:
The squashed table.