tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector

Extract embeddings from PyTorch models.

Module Contents

Classes

EmbeddingsMetricsCollector – Metrics collector that prepares NN-embeddings for storage.

API

class tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector.EmbeddingsMetricsCollector(model: torch.nn.Module, layers: list[int], reshape_strategy: dict[int, str] | dict[int, Callable[[torch.Tensor], torch.Tensor]] | None = None)

Bases: tlc.client.torch.metrics.metrics_collectors.metrics_collector_base.MetricsCollector

Metrics collector that prepares NN-embeddings for storage.

Returns metrics batches with a column named “embeddings_{layer}” for each layer listed in layers. The outputs of intermediate model modules can have arbitrary shapes, but they must be reshaped to 1D arrays (flattened) before they can be written to a table.

The output of each layer is flattened according to reshape_strategy[layer].
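The sketch below illustrates why flattening is needed, using plain PyTorch only. It is not the collector's implementation: the toy model, the `captured` dictionary, and the `save_output` hook are hypothetical names chosen for illustration. A forward hook records the output of an intermediate module, which is 4D, and the result is then flattened to one 1D vector per sample.

```python
import torch
from torch import nn

# Hypothetical toy model; any nn.Module with intermediate layers would do.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)

captured = {}


def save_output(module, inputs, output):
    # Forward hooks receive (module, inputs, output); keep a detached copy.
    captured["pool"] = output.detach()


# Hook the pooling layer, run one batch, then remove the hook again.
handle = model[2].register_forward_hook(save_output)
with torch.no_grad():
    model(torch.randn(5, 3, 16, 16))
handle.remove()

raw = captured["pool"]           # shape (5, 8, 4, 4): not writable as a 1D row
flat = raw.flatten(start_dim=1)  # shape (5, 128): one 1D embedding per sample
```

Each row of `flat` corresponds to one sample in the batch, which is the layout a per-sample table column requires.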

Create a new embeddings metrics collector.

Parameters:
  • model – The model to collect embeddings from.

  • layers – The layers to collect embeddings from.

  • reshape_strategy – The reshaping strategy to use for each layer, keyed by layer. Each value can be “mean”, which takes the mean across all non-first dimensions (excluding the batch dimension), or “flatten”, which flattens all dimensions after the batch dimension. A callable that takes a tensor and returns the flattened tensor can be supplied instead.
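The three kinds of strategy can be sketched in plain PyTorch as follows. This is an illustration under stated assumptions, not the library's code: `reshape_embedding` is a hypothetical helper, and the “mean” branch reflects one reading of the description above, namely that the batch and channel dimensions are kept and all remaining dimensions are averaged.

```python
from typing import Callable, Union

import torch

Strategy = Union[str, Callable[[torch.Tensor], torch.Tensor]]


def reshape_embedding(t: torch.Tensor, strategy: Strategy) -> torch.Tensor:
    """Hypothetical helper that reduces a (batch, ...) tensor to (batch, features)."""
    if callable(strategy):
        # A user-supplied callable is responsible for the flattening itself.
        return strategy(t)
    if strategy == "flatten":
        # Keep the batch dimension, merge everything after it.
        return t.flatten(start_dim=1)
    if strategy == "mean":
        # ASSUMPTION: keep batch and channel dimensions, average the rest.
        if t.dim() <= 2:
            return t
        return t.mean(dim=tuple(range(2, t.dim())))
    raise ValueError(f"Unknown reshape strategy: {strategy!r}")


x = torch.arange(2 * 3 * 4 * 4, dtype=torch.float32).reshape(2, 3, 4, 4)

flat = reshape_embedding(x, "flatten")                     # shape (2, 48)
mean = reshape_embedding(x, "mean")                        # shape (2, 3)
peak = reshape_embedding(x, lambda t: t.amax(dim=(2, 3)))  # shape (2, 3)
```

“flatten” preserves every activation at the cost of wide columns, while a reducing strategy such as the mean or the custom max-pooling callable trades spatial detail for a much smaller embedding.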

compute_metrics(_1: tlc.core.builtins.types.SampleData, _2: tlc.core.builtins.types.SampleData | None = None, hook_outputs: dict[int, torch.Tensor] | None = None) → dict[str, tlc.core.builtins.types.MetricData]

Collect large NN-embeddings from PyTorch models.

Parameters:
  • hook_outputs – The outputs from the model hooks, as a dictionary mapping each layer index to the tensor captured for that layer.

Returns:

A dictionary mapping column names to batches of flattened embeddings.
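The mapping from hook_outputs to the returned dictionary can be pictured with the sketch below. It is a simplified illustration, not the method's implementation: `to_columns` is a hypothetical function, it always applies the “flatten” behaviour, and it returns plain tensors where the real method returns MetricData values. Only the “embeddings_{layer}” naming scheme comes from the class description.

```python
import torch


def to_columns(hook_outputs: dict[int, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Hypothetical helper: name each layer's output and flatten it per sample."""
    return {
        f"embeddings_{layer}": output.flatten(start_dim=1)
        for layer, output in hook_outputs.items()
    }


# Stand-in hook outputs for two layers and a batch of four samples.
hook_outputs = {
    2: torch.zeros(4, 8, 2, 2),  # e.g. a convolutional feature map
    5: torch.zeros(4, 16),       # e.g. the output of a linear layer
}

columns = to_columns(hook_outputs)
for name, batch in columns.items():
    print(name, tuple(batch.shape))
```

Every value keeps the batch size as its first dimension, so each row lines up with one input sample.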

property column_schemas: dict[str, tlc.core.schema.Schema]