tlc.client.reduce.reduce

Functions for dimensionality reduction of embeddings.

Module Contents

Functions

  • create_reducer – Create a reduction method object.

  • reduce_embeddings – Reduce all embeddings columns in the input table(s).

  • reduce_embeddings_multiple_parameters – Reduce embeddings using multiple reducer parameter sets.

  • reduce_embeddings_per_dataset – Reduce embeddings for a stream of tables.

  • reduce_embeddings_by_foreign_table_url – Reduce embeddings using a single reducer across all tables.

  • reduce_embeddings_with_producer_consumer – Reduce embeddings for a producer table and a list of consumer tables.

API

tlc.client.reduce.reduce.create_reducer(method: str, reducer_args: tlc.client.reduce.reduction_method.ReducerArgs | None = None) → tlc.client.reduce.reduction_method.ReductionMethod

Create a reduction method object.

Parameters:
  • method – The reduction method to use.

  • reducer_args – Arguments specific to the reduction method, e.g. UMapTableArgs.

Returns:

A reduction method object.

tlc.client.reduce.reduce.reduce_embeddings(tables: tlc.core.objects.table.Table | list[tlc.core.objects.table.Table], method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) → tlc.core.objects.table.Table | dict[tlc.core.url.Url, tlc.core.url.Url]

Reduce all embeddings columns in the input table(s).

The reduction method is fit and applied to each table independently.

Parameters:
  • tables – A Table (or a list of tables) to reduce.

  • method – The reduction method to use.

  • delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.

  • kwargs – Arguments specific to the reduction method, see e.g. UMapTableArgs for valid keyword arguments.

Returns:

A single reduced table if the input is a single table, or a dictionary mapping the URLs of the input tables to the URLs of the reduced tables.

Warning

The tables argument will be renamed to table in the next major release, and passing a list of tables will be deprecated.

Warning

Enabling the delete_source_tables option breaks the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, the reduced tables cannot be recovered.

Note

The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
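Because the return value is either a single table or a dictionary of URLs, calling code may want to normalise it before use. The following is a minimal sketch of one way to do that. `reduced_table_urls` is a hypothetical helper, not part of tlc, and it assumes that a returned table exposes its location through a `url` attribute, which this page does not confirm.

```python
from typing import Any


def reduced_table_urls(result: Any) -> list:
    """Return the URLs of the reduced tables for either return shape.

    Hypothetical helper, not part of tlc. Assumes that a single returned
    table exposes a `url` attribute.
    """
    if isinstance(result, dict):
        # List input: the dict maps input-table URL -> reduced-table URL.
        return list(result.values())
    # Single-table input: the reduced table itself is returned.
    return [result.url]


# Intended use (requires 3LC and a table that contains embedding columns):
#   from tlc.client.reduce.reduce import reduce_embeddings
#   result = reduce_embeddings(table, method="umap")
#   urls = reduced_table_urls(result)
```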

tlc.client.reduce.reduce.reduce_embeddings_multiple_parameters(table: tlc.core.objects.table.Table | tlc.core.url.Url, method: str = 'umap', delete_source_tables: bool = False, parameter_sets: list[dict[str, Any]] = []) → tlc.core.url.Url

Reduce embeddings using multiple reducer parameter sets.

This function adds one dimensionality-reduced column for each parameter set in parameter_sets.
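As an illustration of what parameter_sets might look like, the sketch below expands a small grid of candidate values into one dictionary per combination. `build_parameter_sets` is a hypothetical helper. `n_neighbors` and `min_dist` are UMAP hyperparameter names, but whether this function accepts them as reducer arguments is an assumption; check UMapTableArgs for the valid keys.

```python
from itertools import product
from typing import Any


def build_parameter_sets(grid: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Expand a grid of candidate values into one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Assumed keys: UMAP hyperparameter names, not confirmed by this page.
parameter_sets = build_parameter_sets(
    {"n_neighbors": [5, 15], "min_dist": [0.1, 0.5]}
)

# Intended use (requires 3LC):
#   from tlc.client.reduce.reduce import reduce_embeddings_multiple_parameters
#   reduced_url = reduce_embeddings_multiple_parameters(
#       table, method="umap", parameter_sets=parameter_sets
#   )
```

With two values per key, the grid yields four parameter sets, so four reduced columns would be expected.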

tlc.client.reduce.reduce.reduce_embeddings_per_dataset(tables: list[tlc.core.objects.table.Table], method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) → dict[tlc.core.url.Url, tlc.core.url.Url]

Reduce embeddings for a stream of tables.

Fits a reduction method on the most recent table from each stream and applies the reduction to all earlier tables in that stream. A stream is a sequence of tables that share the same example table ID, which means they originate from the same dataset.

Tables with no example table ID will be ignored.

Parameters:
  • tables – A list of tables to reduce.

  • method – The reduction method to use.

  • delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.

  • kwargs – Arguments specific to the reduction method, see e.g. UMapTableArgs for valid keyword arguments.

Returns:

A dictionary mapping the URLs of the input tables to the URLs of the reduced tables.

Warning

Enabling the delete_source_tables option breaks the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, the reduced tables cannot be recovered.

Note

The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
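The stream rule described for this function can be sketched in plain Python. This is an illustration of the grouping logic only, not the library's implementation: `TableInfo` is a hypothetical stand-in for a table, and using a `written_at` timestamp to decide which table is most recent is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical stand-in for a table; not a tlc class."""

    url: str
    example_table_id: Optional[str]
    written_at: int  # any sortable timestamp (assumed ordering key)


def plan_per_dataset(tables: list[TableInfo]) -> dict[str, list[str]]:
    """Map each fit-table URL to the URLs of the earlier tables in its stream."""
    streams: dict[str, list[TableInfo]] = defaultdict(list)
    for table in tables:
        if table.example_table_id is None:
            continue  # tables with no example table ID are ignored
        streams[table.example_table_id].append(table)

    plan: dict[str, list[str]] = {}
    for stream in streams.values():
        ordered = sorted(stream, key=lambda t: t.written_at)
        *earlier, newest = ordered
        # Fit on the most recent table, apply to all earlier ones.
        plan[newest.url] = [t.url for t in earlier]
    return plan
```

Each key of the returned plan is the table a reducer would be fit on, and each value lists the tables that reducer would then be applied to.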

tlc.client.reduce.reduce.reduce_embeddings_by_foreign_table_url(tables: list[tlc.core.objects.table.Table], foreign_table_url: tlc.core.url.Url, method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) → dict[tlc.core.url.Url, tlc.core.url.Url]

Reduce embeddings using a single reducer across all tables.

The reduction method is fit on the most recently written table in the stream of tables defined by foreign_table_url, and applied to all other tables.

For example, this function can be used to train a UMAP model on the embeddings collected from the validation set during the final epoch, and then apply that model to the embeddings collected from the training set and validation set during all epochs.

Parameters:
  • tables – A list of tables to reduce.

  • method – The reduction method to use.

  • foreign_table_url – Identifies which stream of metrics tables to use for fitting a reduction model. Must be an absolute URL after expanding aliases.

  • delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.

  • kwargs – Arguments specific to the reduction method, see e.g. UMapTableArgs for valid keyword arguments.

Returns:

A dictionary mapping the URLs of the input tables to the URLs of the reduced tables.

Raises:

ValueError – If foreign_table_url does not identify a stream of tables.

Warning

Enabling the delete_source_tables option breaks the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, the reduced tables cannot be recovered.

Note

The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
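The selection rule for the fit table, including the ValueError case, might look like the sketch below. It only illustrates the documented behaviour and is not the library's code: `MetricsTableInfo` and `split_fit_and_apply` are hypothetical names, and the `written_at` field used to find the most recently written table is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsTableInfo:
    """Hypothetical stand-in for a metrics table; not a tlc class."""

    url: str
    foreign_table_url: str
    written_at: int  # any sortable timestamp (assumed ordering key)


def split_fit_and_apply(tables, foreign_table_url):
    """Return (fit_table, other_tables) for the given foreign table URL."""
    stream = [t for t in tables if t.foreign_table_url == foreign_table_url]
    if not stream:
        raise ValueError(f"{foreign_table_url!r} does not identify a stream of tables")
    # Fit on the most recently written table in the selected stream ...
    fit_table = max(stream, key=lambda t: t.written_at)
    # ... and apply the fitted reducer to every other table, in any stream.
    others = [t for t in tables if t is not fit_table]
    return fit_table, others
```

With training and validation tables from two epochs and the validation set as the foreign table, the fit table is the validation table from the last epoch, and the remaining three tables receive the fitted reduction.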

tlc.client.reduce.reduce.reduce_embeddings_with_producer_consumer(producer: tlc.core.objects.table.Table, consumers: list[tlc.core.objects.table.Table], method: str = 'umap', delete_source_tables: bool = False, **kwargs: Any) → dict[tlc.core.url.Url, tlc.core.url.Url]

Reduce embeddings for a producer table and a list of consumer tables.

The reduction method is fit on the producer table, and then applied to the consumer tables.

Parameters:
  • producer – The table to fit the reduction method on.

  • consumers – The tables to apply the reduction method to.

  • method – The reduction method to use.

  • delete_source_tables – Specifies whether to delete the source tables after performing the reduction. Enabling this option can help minimize disk-space usage.

  • kwargs – Arguments specific to the reduction method, see e.g. UMapTableArgs for valid keyword arguments.

Returns:

A dictionary mapping the URLs of the consumer tables to the URLs of the reduced tables.

Warning

Enabling the delete_source_tables option breaks the lineage of the reduced tables. If the cache files for these reduced tables are subsequently deleted, the reduced tables cannot be recovered.

Note

The delete_source_tables option should not be enabled if further dimensionality reductions on the same input tables are anticipated.
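To make the fit-once, apply-many pattern concrete, here is a toy sketch in plain Python. It is not how tlc or UMAP works internally: `MeanCenterTruncate` is a deliberately trivial stand-in reducer, `toy_producer_consumer` is a hypothetical function, and plain lists of floats stand in for embedding columns.

```python
from typing import Sequence

Vector = Sequence[float]


class MeanCenterTruncate:
    """Toy reducer: subtract the fitted mean, keep the first n_components dims.

    A stand-in for a real method such as UMAP; not part of tlc.
    """

    def __init__(self, n_components: int = 2) -> None:
        self.n_components = n_components
        self.mean: list[float] = []

    def fit(self, rows: Sequence[Vector]) -> "MeanCenterTruncate":
        dims = len(rows[0])
        self.mean = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        return self

    def transform(self, rows: Sequence[Vector]) -> list[list[float]]:
        k = self.n_components
        return [[r[d] - self.mean[d] for d in range(k)] for r in rows]


def toy_producer_consumer(
    producer: Sequence[Vector],
    consumers: dict[str, Sequence[Vector]],
    n_components: int = 2,
) -> dict[str, list[list[float]]]:
    """Fit on the producer only, then transform every consumer with that fit."""
    reducer = MeanCenterTruncate(n_components).fit(producer)
    return {name: reducer.transform(rows) for name, rows in consumers.items()}
```

Because the consumers are transformed with statistics learned from the producer alone, their reduced embeddings share one coordinate system, which is what makes them comparable across tables.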