tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector

Collect embeddings from PyTorch models.

Module Contents

Classes

Class                         Description
EmbeddingsMetricsCollector    Metrics collector that prepares hidden layer activations for storage.

API

class tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector.EmbeddingsMetricsCollector(layers: list[int], reshape_strategy: dict[int, str] | dict[int, Callable[[torch.Tensor], torch.Tensor]] | None = None)

Bases: tlc.client.torch.metrics.metrics_collectors.metrics_collector_base.MetricsCollector

Metrics collector that prepares hidden layer activations for storage.

Assumes that the provided predictor_output contains a dictionary of hidden layers, where the keys are the layer indices and the values are the activations of the layer.

Returns metrics batches with a column named “embeddings_{layer}” for each layer provided.

The activations of intermediate modules can have arbitrary shape, and in order to write them to a Table, they must be reshaped to 1D arrays (flattened).

The collector flattens each layer's activations according to reshape_strategy[layer].
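The behavior described above can be sketched in plain PyTorch. This is only an illustration under stated assumptions, not the tlc implementation: flatten_hidden_layers is a hypothetical helper, the example activations are made up, and the sketch applies a flatten-style reshape to every layer.

```python
import torch


def flatten_hidden_layers(
    hidden_layers: dict[int, torch.Tensor], layers: list[int]
) -> dict[str, torch.Tensor]:
    """Hypothetical helper (not part of tlc): one (batch, features) tensor per layer."""
    columns = {}
    for layer in layers:
        # Raises KeyError if the requested layer was not captured.
        activation = hidden_layers[layer]
        # Keep the batch dimension and collapse every dimension after it.
        columns[f"embeddings_{layer}"] = torch.flatten(activation, start_dim=1)
    return columns


# Made-up activations: a conv feature map and two dense layer outputs.
hidden_layers = {
    0: torch.zeros(4, 8, 5, 5),
    2: torch.zeros(4, 16),
    3: torch.zeros(4, 10),
}

# Request a subset of the captured layers.
columns = flatten_hidden_layers(hidden_layers, layers=[0, 2])
shapes = {name: tuple(tensor.shape) for name, tensor in columns.items()}
print(shapes)  # {'embeddings_0': (4, 200), 'embeddings_2': (4, 16)}
```

Each value holds one flat row per sample, which is the form a table column needs.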

Create a new embeddings metrics collector.

Parameters:
  • layers – The layers to collect embeddings from. All layers must be present in the hidden layers returned by the Predictor. In practice this means that the Predictor used during metrics collection must be created with the layers argument set to a superset of the layers provided here.

  • reshape_strategy – The reshaping strategy to use for each layer, given as a mapping from layer index to strategy. A strategy can be “mean”, which takes the mean across all non-first dimensions (excluding the batch dimension), or “flatten”, which flattens all dimensions after the batch dimension. The “flatten” strategy assumes that inputs have the same shape across batches; if they do not, use “mean” instead. A strategy can also be a callable that takes a torch.Tensor and returns the reshaped torch.Tensor.
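To illustrate the callable form of reshape_strategy and the shape caveat for “flatten”, here is a minimal sketch in plain PyTorch. The function max_pool_reshape and the input shapes are hypothetical choices for this example; the sketch does not reproduce how tlc implements “mean” or “flatten” internally.

```python
import torch


def max_pool_reshape(activation: torch.Tensor) -> torch.Tensor:
    """Hypothetical reshape callable: (batch, channels, *spatial) -> (batch, channels)."""
    if activation.dim() < 3:
        # Already (batch, features); nothing to pool.
        return activation
    # Merge all spatial dimensions, then take the maximum over them.
    return activation.flatten(start_dim=2).amax(dim=-1)


# Two batches whose spatial size differs (made-up shapes).
small = torch.rand(2, 3, 4, 4)
large = torch.rand(2, 3, 6, 6)

# Flattening after the batch dimension gives a width that depends on input size.
flat_widths = (
    small.flatten(start_dim=1).shape[1],
    large.flatten(start_dim=1).shape[1],
)
# Pooling over the spatial dimensions gives the same width for both batches.
pooled_widths = (
    max_pool_reshape(small).shape[1],
    max_pool_reshape(large).shape[1],
)
print(flat_widths)    # (48, 108)
print(pooled_widths)  # (3, 3)
```

Going by the constructor signature, a callable like this would presumably be passed as reshape_strategy={layer_index: max_pool_reshape}. The differing flattened widths show why “flatten” needs a consistent input shape across batches.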

compute_metrics(_1: tlc.core.builtins.types.SampleData, predictor_output: tlc.client.torch.metrics.predictor.PredictorOutput) -> dict[str, tlc.core.builtins.types.MetricData]

Collect and flatten hidden layer activations from model outputs.

Parameters:
  • predictor_output – The outputs from a Predictor.

Returns:
  A dictionary mapping column names to batches of flattened embeddings.

property column_schemas: dict[str, tlc.core.schema.Schema]