tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector
Collect embeddings from PyTorch models.
Module Contents
Classes
| Class | Description |
|---|---|
| EmbeddingsMetricsCollector | Metrics collector that prepares hidden layer activations for storage. |
API
- class tlc.client.torch.metrics.metrics_collectors.embeddings_metrics_collector.EmbeddingsMetricsCollector(layers: list[int], reshape_strategy: dict[int, str] | dict[int, Callable[[torch.Tensor], torch.Tensor]] | None = None)
Bases: tlc.client.torch.metrics.metrics_collectors.metrics_collector_base.MetricsCollector
Metrics collector that prepares hidden layer activations for storage.
Assumes that the provided `predictor_output` contains a dictionary of hidden layers, where the keys are the layer indices and the values are the activations of each layer.
Returns metrics batches with a column named “embeddings_{layer}” for each layer provided.
The activations of intermediate modules can have arbitrary shape, and in order to write them to a Table, they must be reshaped to 1D arrays (flattened).
Each layer is flattened according to `reshape_strategy[layer]`.
Create a new embeddings metrics collector.
- Parameters:
  - layers – The layers to collect embeddings from. All layers must be present in the hidden layers returned by the `Predictor`. In practice, this means that the `Predictor` used during metrics collection must be created with the `layers` argument set to a superset of the layers provided here.
  - reshape_strategy – The reshaping strategy to use for each layer. Can be either “mean”, which takes the mean across all non-first dimensions (excluding batch dimension), or “flatten”, which flattens all dimensions after the batch dimension. The “flatten” strategy assumes that your inputs have the same shape across batches; if they do not, use the “mean” strategy instead. Can also be a callable that performs the flattening.
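To make the reshaping strategies concrete, here is a minimal sketch in plain PyTorch. It is not the library's implementation. In particular, it assumes that “mean” keeps the first dimension after the batch dimension (for example, channels) and averages over the rest; the helper names `mean_strategy`, `flatten_strategy`, and `max_pool_strategy` are hypothetical.

```python
import torch


def mean_strategy(activations: torch.Tensor) -> torch.Tensor:
    """Assumed reading of "mean": average over every dimension after the first two."""
    if activations.dim() <= 2:
        # Already (batch, features); there is nothing left to average.
        return activations.reshape(activations.shape[0], -1)
    reduce_dims = tuple(range(2, activations.dim()))
    return activations.mean(dim=reduce_dims)


def flatten_strategy(activations: torch.Tensor) -> torch.Tensor:
    """Flatten every dimension after the batch dimension."""
    return activations.flatten(start_dim=1)


def max_pool_strategy(activations: torch.Tensor) -> torch.Tensor:
    """Hypothetical custom callable: take the max over the spatial dimensions."""
    return activations.flatten(start_dim=2).amax(dim=2)


# A fake convolutional activation: batch of 4, 8 channels, 5x5 spatial map.
activations = torch.randn(4, 8, 5, 5)

mean_embedding = mean_strategy(activations)      # one row of 8 values per sample
flat_embedding = flatten_strategy(activations)   # one row of 8 * 5 * 5 values per sample
max_embedding = max_pool_strategy(activations)   # one row of 8 values per sample
```

Note that the width of the flattened embedding depends on the spatial size of the activation, which is why “flatten” requires inputs of the same shape across batches. A callable such as `max_pool_strategy` matches the `Callable[[torch.Tensor], torch.Tensor]` type in the constructor signature, so it could plausibly be passed as, for example, `reshape_strategy={2: max_pool_strategy}`.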
- compute_metrics(_1: tlc.core.builtins.types.SampleData, predictor_output: tlc.client.torch.metrics.predictor.PredictorOutput) → dict[str, tlc.core.builtins.types.MetricData]
Collect and flatten hidden layer activations from model outputs.
- Parameters:
  - predictor_output – The outputs from a `Predictor`.
- Returns:
  A dictionary mapping column names to batches of flattened embeddings.
- property column_schemas: dict[str, tlc.core.schema.Schema]
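The collection step described for `compute_metrics` can be illustrated with a standalone sketch: given a dictionary of hidden-layer activations keyed by layer index, produce one flattened column per requested layer, named “embeddings_{layer}”. The function `collect_embeddings` and its arguments are hypothetical stand-ins, not part of the library; the real method receives a `PredictorOutput` rather than a plain dictionary.

```python
from typing import Callable

import torch


def collect_embeddings(
    hidden_layers: dict[int, torch.Tensor],
    layers: list[int],
    reshape: Callable[[torch.Tensor], torch.Tensor],
) -> dict[str, torch.Tensor]:
    """Hypothetical stand-in for the collection step, not the library's code."""
    # Every requested layer must be among the hidden layers that were captured.
    missing = [layer for layer in layers if layer not in hidden_layers]
    if missing:
        raise KeyError(f"Layers not present in hidden layers: {missing}")
    # One column per requested layer, each reshaped to (batch, features).
    return {f"embeddings_{layer}": reshape(hidden_layers[layer]) for layer in layers}


# Fake hidden-layer activations keyed by layer index, batch size 2.
hidden_layers = {
    0: torch.randn(2, 16, 8, 8),
    3: torch.randn(2, 32, 4, 4),
    5: torch.randn(2, 10),
}

# Request a subset of the captured layers and flatten each one.
batch = collect_embeddings(
    hidden_layers,
    layers=[3, 5],
    reshape=lambda t: t.flatten(start_dim=1),
)
```

The check for missing layers mirrors the requirement that the layers passed to the collector be a subset of the layers the `Predictor` was created with.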