tlc.client.reduce.reduce
Functions for dimensionality reduction of embeddings.
Module Contents
Functions
Function | Description |
---|---|
create_reducer | Create a reduction method object. |
reduce_embeddings_multiple_parameters | Reduce embeddings using multiple reducer parameter sets. |
reduce_embeddings | Reduce all embeddings columns in the input tables. |
reduce_embeddings_per_dataset | Reduce embeddings for a stream of tables. |
reduce_embeddings_by_foreign_table_url | Reduce embeddings using a single reducer across all tables. |
reduce_embeddings_with_producer_consumer | Reduce embeddings for a producer table and a list of consumer tables. |
API
- tlc.client.reduce.reduce.create_reducer(method: str, reducer_args: tlc.client.reduce.reduction_method.ReducerArgs | None = None) -> tlc.client.reduce.reduction_method.ReductionMethod
Create a reduction method object.
- Parameters:
method – The reduction method to use.
reducer_args – Arguments specific to the reduction method, e.g. UMapTableArgs.
- Returns:
A reduction method object.
- tlc.client.reduce.reduce.reduce_embeddings_multiple_parameters(table: tlc.core.objects.table.Table, method: str = 'umap', delete_source_tables: bool = False, parameter_sets: list[dict[str, Any]] = []) -> tlc.core.url.Url
Reduce embeddings using multiple reducer parameter sets.
This function adds one dimensionality-reduced column for each parameter set in parameter_sets.
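The parameter_sets argument is a list of dictionaries, one per reduction to run. A minimal sketch of how such a list might be assembled from a grid of candidate values follows. The helper build_parameter_sets is hypothetical and not part of tlc, and the key names n_neighbors and min_dist are assumptions borrowed from umap-learn; check UMapTableArgs for the keys that are actually accepted.

```python
from itertools import product
from typing import Any


def build_parameter_sets(grid: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Expand a grid of candidate values into one dict per combination.

    Hypothetical helper for illustration; not part of tlc.
    """
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Key names are assumptions borrowed from umap-learn; verify against UMapTableArgs.
parameter_sets = build_parameter_sets(
    {"n_neighbors": [5, 15, 50], "min_dist": [0.0, 0.1]}
)

print(len(parameter_sets))  # 6
print(parameter_sets[0])    # {'n_neighbors': 5, 'min_dist': 0.0}

# With an existing tlc Table bound to `table`, the list would then be passed as:
# url = reduce_embeddings_multiple_parameters(
#     table, method="umap", parameter_sets=parameter_sets
# )
```

With this grid, the call would be expected to add six reduced columns, one per combination.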
- tlc.client.reduce.reduce.reduce_embeddings(tables: list[tlc.core.objects.table.Table], method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) -> dict[tlc.core.url.Url, tlc.core.url.Url]
Reduce all embeddings columns in the input tables.
The reduction method is fit and applied to each table independently.
- Parameters:
tables – A list of tables to reduce.
method – The reduction method to use.
delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.
kwargs – Arguments specific to the reduction method; see e.g. UMapTableArgs for valid keyword arguments.
- Returns:
A dictionary mapping the URLs of the input tables to the URLs of the reduced tables.
Warning
Enabling the delete_source_tables option will disrupt the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, they will be irrecoverable.
Note
The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
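Because each table gets its own fit, the reduced coordinates of different tables do not share a common frame. The sketch below illustrates this with a toy reducer that centers on the mean and keeps the first two dimensions. CenteringReducer and reduce_independently are hypothetical stand-ins written in plain Python; they are not tlc classes and do not reproduce UMAP.

```python
Embedding = list[float]


class CenteringReducer:
    """Toy stand-in for a real reducer such as UMAP (not part of tlc)."""

    def __init__(self, n_components: int = 2) -> None:
        self.n_components = n_components
        self.mean: list[float] = []

    def fit(self, rows: list[Embedding]) -> "CenteringReducer":
        # Learn the per-dimension mean of the rows this reducer is fit on.
        dims = len(rows[0])
        self.mean = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        return self

    def transform(self, rows: list[Embedding]) -> list[Embedding]:
        # Subtract the learned mean and keep the first n_components dimensions.
        return [[r[d] - self.mean[d] for d in range(self.n_components)] for r in rows]


def reduce_independently(tables: dict[str, list[Embedding]]) -> dict[str, list[Embedding]]:
    # A fresh reducer is fit on, and applied to, each table separately.
    return {name: CenteringReducer().fit(rows).transform(rows) for name, rows in tables.items()}


tables = {
    "train": [[0.0, 0.0, 1.0], [2.0, 2.0, 1.0]],
    "val": [[10.0, 10.0, 1.0], [12.0, 12.0, 1.0]],
}
reduced = reduce_independently(tables)
print(reduced["train"])  # [[-1.0, -1.0], [1.0, 1.0]]
print(reduced["val"])    # [[-1.0, -1.0], [1.0, 1.0]]
```

The two tables sit far apart in the original space but land on identical coordinates after independent fits, which is why a shared fit is preferable when tables need to be compared.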
- tlc.client.reduce.reduce.reduce_embeddings_per_dataset(tables: list[tlc.core.objects.table.Table], method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) -> dict[tlc.core.url.Url, tlc.core.url.Url]
Reduce embeddings for a stream of tables.
Fits a reduction method on the most recent table in each stream and applies that reduction to all earlier tables in the same stream. A stream is a sequence of tables with the same example table ID, which means they originate from the same dataset.
Tables with no example table ID will be ignored.
- Parameters:
tables – A list of tables to reduce.
method – The reduction method to use.
delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.
kwargs – Arguments specific to the reduction method; see e.g. UMapTableArgs for valid keyword arguments.
- Returns:
A dictionary mapping the URLs of the input tables to the URLs of the reduced tables.
Warning
Enabling the delete_source_tables option will disrupt the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, they will be irrecoverable.
Note
The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
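The stream grouping described for reduce_embeddings_per_dataset can be sketched in plain Python. TableInfo, its written_at ordering key, and plan_streams are hypothetical names introduced for illustration; how tlc actually determines the most recent table is not stated here, so the integer timestamp is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableInfo:
    """Hypothetical stand-in for a metrics table; not a tlc class."""

    name: str
    example_table_id: Optional[str]
    written_at: int  # assumed ordering key; larger means more recent


def plan_streams(tables: list[TableInfo]) -> dict[str, tuple[str, list[str]]]:
    """Map each stream ID to (table to fit on, earlier tables to apply to)."""
    streams: dict[str, list[TableInfo]] = defaultdict(list)
    for table in tables:
        if table.example_table_id is None:
            continue  # tables with no example table ID are ignored
        streams[table.example_table_id].append(table)

    plan = {}
    for stream_id, members in streams.items():
        ordered = sorted(members, key=lambda t: t.written_at)
        plan[stream_id] = (ordered[-1].name, [t.name for t in ordered[:-1]])
    return plan


tables = [
    TableInfo("train_epoch_0", "train", 0),
    TableInfo("val_epoch_0", "val", 1),
    TableInfo("train_epoch_1", "train", 2),
    TableInfo("val_epoch_1", "val", 3),
    TableInfo("scratch", None, 4),
]
plan = plan_streams(tables)
print(plan["train"])  # ('train_epoch_1', ['train_epoch_0'])
print(plan["val"])    # ('val_epoch_1', ['val_epoch_0'])
```

Each stream gets its own fit, so the train and val streams here would end up with separate reduction models.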
- tlc.client.reduce.reduce.reduce_embeddings_by_example_table_url(tables: list[tlc.core.objects.table.Table], example_table_url: tlc.core.url.Url, method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) -> dict[tlc.core.url.Url, tlc.core.url.Url]
- tlc.client.reduce.reduce.reduce_embeddings_by_foreign_table_url(tables: list[tlc.core.objects.table.Table], foreign_table_url: tlc.core.url.Url, method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) -> dict[tlc.core.url.Url, tlc.core.url.Url]
Reduce embeddings using a single reducer across all tables.
The reduction method is fit on the most recently written table in the stream of tables defined by foreign_table_url, and applied to all other tables.
For example, this function can be used to train a UMAP model on the embeddings collected from the validation set during the final epoch, and then apply that model to the embeddings collected from the training set and validation set during all epochs.
- Parameters:
tables – A list of tables to reduce.
method – The reduction method to use.
foreign_table_url – Identifies which stream of metrics tables to use for fitting a reduction model. Must be an absolute URL after expanding aliases.
delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.
kwargs – Arguments specific to the reduction method; see e.g. UMapTableArgs for valid keyword arguments.
- Returns:
A dictionary mapping the URLs of the input tables to the URLs of the reduced tables.
- Raises:
ValueError – If foreign_table_url does not identify a stream of tables.
Warning
Enabling the delete_source_tables option will disrupt the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, they will be irrecoverable.
Note
The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
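The selection step for reduce_embeddings_by_foreign_table_url can be sketched as follows: pick the most recently written table in the chosen stream, fail with ValueError when the URL matches no tables, and treat every other table as a target for the fitted model. TableInfo, written_at, pick_fit_table, and the example paths are hypothetical and serve only to illustrate the documented behavior.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    """Hypothetical stand-in for a metrics table; not a tlc class."""

    name: str
    foreign_table_url: str
    written_at: int  # assumed ordering key; larger means more recent


def pick_fit_table(tables: list[TableInfo], foreign_table_url: str) -> tuple[str, list[str]]:
    """Return (table to fit on, all other tables the model is applied to)."""
    stream = [t for t in tables if t.foreign_table_url == foreign_table_url]
    if not stream:
        raise ValueError(f"{foreign_table_url!r} does not identify a stream of tables")
    fit_table = max(stream, key=lambda t: t.written_at)
    others = [t.name for t in tables if t is not fit_table]
    return fit_table.name, others


tables = [
    TableInfo("train_epoch_0", "/data/tables/train", 0),
    TableInfo("val_epoch_0", "/data/tables/val", 1),
    TableInfo("train_epoch_1", "/data/tables/train", 2),
    TableInfo("val_epoch_1", "/data/tables/val", 3),
]
fit_name, apply_names = pick_fit_table(tables, "/data/tables/val")
print(fit_name)     # val_epoch_1
print(apply_names)  # ['train_epoch_0', 'val_epoch_0', 'train_epoch_1']
```

This mirrors the example in the description: the model is fit on the final-epoch validation table and then applied to the training and validation tables from every epoch.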
- tlc.client.reduce.reduce.reduce_embeddings_with_producer_consumer(producer: tlc.core.objects.table.Table, consumers: list[tlc.core.objects.table.Table], method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) -> dict[tlc.core.url.Url, tlc.core.url.Url]
Reduce embeddings for a producer table and a list of consumer tables.
The reduction method is fit on the producer table, and then applied to the consumer tables.
- Parameters:
producer – The table to fit the reduction method on.
consumers – The tables to apply the reduction method to.
method – The reduction method to use.
delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.
kwargs – Arguments specific to the reduction method; see e.g. UMapTableArgs for valid keyword arguments.
- Returns:
A dictionary mapping the URLs of the consumer tables to the URLs of the reduced tables.
Warning
Enabling the delete_source_tables option will disrupt the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, they will be irrecoverable.
Note
The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
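The producer/consumer pattern amounts to fitting once and transforming many times. The sketch below shows it with a toy mean-centering reduction in plain Python. fit_centering and reduce_with_producer_consumer_sketch are hypothetical illustrations, not the tlc implementation, and the toy reduction stands in for a real method such as UMAP.

```python
from typing import Callable

Embedding = list[float]
Transform = Callable[[list[Embedding]], list[Embedding]]


def fit_centering(rows: list[Embedding], n_components: int = 2) -> Transform:
    """Fit a toy reduction on `rows` and return a reusable transform."""
    mean = [sum(r[d] for r in rows) / len(rows) for d in range(n_components)]

    def transform(other_rows: list[Embedding]) -> list[Embedding]:
        # Apply the mean learned from the producer to any other table.
        return [[r[d] - mean[d] for d in range(n_components)] for r in other_rows]

    return transform


def reduce_with_producer_consumer_sketch(
    producer: list[Embedding], consumers: dict[str, list[Embedding]]
) -> dict[str, list[Embedding]]:
    transform = fit_centering(producer)  # fit once, on the producer only
    return {name: transform(rows) for name, rows in consumers.items()}


producer = [[0.0, 0.0, 1.0], [2.0, 2.0, 1.0]]
consumers = {
    "train_epoch_0": [[1.0, 1.0, 5.0], [3.0, 3.0, 5.0]],
    "val_epoch_0": [[10.0, 10.0, 1.0]],
}
reduced = reduce_with_producer_consumer_sketch(producer, consumers)
print(reduced["train_epoch_0"])  # [[0.0, 0.0], [2.0, 2.0]]
print(reduced["val_epoch_0"])    # [[9.0, 9.0]]
```

Because every consumer is transformed with the same fitted model, their reduced coordinates share one frame and can be compared directly.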