tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector

Collect embeddings from PyTorch models.

See this page for more information on how to use this module.

Module Contents

Classes

EmbeddingsMetricsCollector – Metrics collector that prepares hidden layer activations for storage.

API

class tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector.EmbeddingsMetricsCollector(layers: Sequence[int], reshape_strategy: dict[int, Literal["mean", "flatten", "avg_pool_1_1", "avg_pool_2_2", "avg_pool_3_3"]] | dict[int, Callable[[torch.Tensor], torch.Tensor]] | None = None)

Bases: tlc.client.torch.metrics.metrics_collectors.metrics_collector_base.MetricsCollector

Metrics collector that prepares hidden layer activations for storage.

Assumes that the provided predictor_output contains a dictionary of hidden layers, where the keys are the layer indices and the values are the activations of the layer.

Returns metrics batches with a column named “embeddings_{layer}” for each layer provided.

The activations of intermediate modules can have arbitrary shape, and in order to write them to a Table, they must be reshaped to 1D arrays (flattened).

The collector ensures that each layer's activations are flattened according to reshape_strategy[layer].
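To see why the choice of strategy matters, the following minimal sketch computes the per-sample embedding length that each strategy would produce for a given activation shape. It is plain Python shape arithmetic, not 3LC code: the helper name embedding_length is hypothetical, and the sketch assumes activations shaped (batch, channels, *spatial), that “mean” averages over the spatial dimensions while keeping channels, and that “avg_pool_k_k” pools each channel to a k × k grid.

```python
from math import prod


def embedding_length(activation_shape, strategy):
    """Per-sample length of the 1D embedding built from one activation.

    activation_shape is (batch, channels, *spatial). How each strategy is
    interpreted here is an assumption made for illustration only.
    """
    _batch, channels, *spatial = activation_shape
    if strategy == "flatten":
        # Keep every value, so the length depends on the spatial size.
        return channels * prod(spatial)
    if strategy == "mean":
        # Assumed: average over the spatial dimensions, keep the channels.
        return channels
    if strategy.startswith("avg_pool_"):
        # Assumed: pool each channel down to a fixed k1 x k2 grid.
        k1, k2 = (int(part) for part in strategy.split("_")[2:])
        return channels * k1 * k2
    raise ValueError(f"unknown strategy: {strategy}")


# Two batches whose inputs had different spatial sizes.
small = (8, 64, 7, 7)
large = (8, 64, 14, 14)
for strategy in ("flatten", "mean", "avg_pool_2_2"):
    print(strategy, embedding_length(small, strategy), embedding_length(large, strategy))
```

Under these assumptions, only “flatten” yields a length that changes with the spatial size, which is consistent with the requirement that inputs have the same shape across batches when that strategy is used.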

Create a new embeddings metrics collector.

Parameters:
  • layers – The layers to collect embeddings from. All layers must be present in the hidden layers returned by the Predictor. In practice this means that the Predictor used during metrics collection must be created with the layers argument set to a superset of the layers provided here.

  • reshape_strategy – The reshaping strategy to use for each layer. Hidden layer activations can have arbitrary shapes and must be reshaped to 1D arrays before they can be written to a Table. Each value can be one of: “mean”, which takes the mean across all non-first dimensions (excluding the batch dimension); “flatten”, which flattens all dimensions after the batch dimension; or “avg_pool_1_1”, “avg_pool_2_2”, or “avg_pool_3_3”, which use average pooling with the given output size to ensure consistent shapes. The “flatten” strategy requires the inputs to the model to have the same shape across batches; otherwise, use “mean” or one of the average pooling strategies. A callable that performs the flattening can also be provided. (Default: “mean” for all layers)
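A construction call based on the signature above might look like the sketch below. The layer indices and the strategy chosen for each one are hypothetical; in practice they would need to match layers that the Predictor was created with. The import is placed inside the function only so the snippet can be read and loaded without 3LC installed.

```python
from typing import Any

# Hypothetical layer indices; they must be among the Predictor's layers.
LAYERS = [0, 2, 4]

# One named strategy per layer, taken from the options listed above.
RESHAPE_STRATEGY = {0: "mean", 2: "avg_pool_2_2", 4: "flatten"}


def build_collector() -> Any:
    """Create a collector for LAYERS using the strategies above."""
    # Lazy import: requires the tlc package to be installed when called.
    from tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector import (
        EmbeddingsMetricsCollector,
    )

    return EmbeddingsMetricsCollector(
        layers=LAYERS,
        reshape_strategy=RESHAPE_STRATEGY,
    )
```

Omitting reshape_strategy altogether falls back to the documented default of “mean” for all layers.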

compute_metrics(_1: tlc.core.builtins.types.SampleData, predictor_output: tlc.client.torch.metrics.predictor.PredictorOutput) → dict[str, tlc.core.builtins.types.MetricData]

Collect and flatten hidden layer activations from model outputs.

Parameters:

predictor_output – The outputs from a Predictor.

Returns:

A dictionary mapping column names to batches of flattened embeddings.
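The shape of the returned dictionary can be illustrated with a small stand-alone sketch. It is not the library's implementation: the helper embedding_columns is hypothetical, nested lists stand in for already-flattened activation batches, and only the “embeddings_{layer}” naming and the requirement that every requested layer be present are taken from the description above.

```python
def embedding_columns(hidden_layers, layers):
    """Map each requested layer index to an "embeddings_{layer}" column.

    hidden_layers: dict of layer index -> batch of flattened activations
    layers: the layer indices to collect
    """
    # Every requested layer must be among the layers the predictor returned.
    missing = [layer for layer in layers if layer not in hidden_layers]
    if missing:
        raise KeyError(f"layers not returned by the predictor: {missing}")
    return {f"embeddings_{layer}": hidden_layers[layer] for layer in layers}


# A batch of two samples; the predictor returned layers 0, 2 and 4.
hidden = {
    0: [[0.1, 0.2], [0.3, 0.4]],
    2: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    4: [[9.0], [8.0]],
}
columns = embedding_columns(hidden, layers=[0, 2])
print(sorted(columns))
```

Requesting a subset of the predictor's layers works, while requesting a layer it did not return raises an error in this sketch.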

property column_schemas: dict[str, tlc.core.schema.Schema]