Collect Metrics

One of the key features of 3LC is the ability to collect fine-grained metrics from input Tables. This guide outlines how to collect and accumulate per-sample metrics to analyze your datasets efficiently.

Overview

tlc uses the concept of a metrics collector, which is a mechanism that defines how samples and predictions are combined to produce metrics. In practice, it is as simple as a function that takes a batch of samples and a batch of predictions and returns a dictionary of metrics. Schema information can optionally be provided to the metrics collector to allow customization of the metrics.

The tlc Python package includes a set of built-in collectors for common use-cases, but it is also easy to create custom collectors to fit specific needs.

When collecting per-sample metrics using an ML model, it is often desirable to run just a single inference pass through each dataset for metric collection. This is achieved by calling the collect_metrics function, where you specify the model, the Table, and the metrics collectors you wish to use. This collection interface provides a high-level abstraction that handles the orchestration of the entire process.
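
To make the single-pass idea concrete, here is a minimal sketch of the kind of orchestration described above: the predictor runs once per batch, and its output is shared by every collector. This is not tlc's implementation; `collect_in_single_pass`, the toy predictor, and the two toy collectors are hypothetical stand-ins written in plain Python so the sketch runs without tlc or torch.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

# Hypothetical type aliases for this sketch; they are not part of tlc.
Batch = list[Any]
Collector = Callable[[Batch, list[Any]], dict[str, list[Any]]]


def collect_in_single_pass(
    batches: Iterable[Batch],
    predictor: Callable[[Batch], list[Any]],
    collectors: list[Collector],
) -> dict[str, list[Any]]:
    """Run the predictor once per batch and feed its output to every collector."""
    columns: dict[str, list[Any]] = defaultdict(list)
    for batch in batches:
        predictor_output = predictor(batch)  # the only inference call for this batch
        for collector in collectors:
            for name, values in collector(batch, predictor_output).items():
                if len(values) != len(batch):
                    raise ValueError(f"{name!r} must have one value per sample")
                columns[name].extend(values)
    return dict(columns)


# Toy stand-ins: samples are (value, label) pairs; the "model" thresholds the value.
def toy_predictor(batch):
    return [1 if value > 0.5 else 0 for value, _ in batch]


def correctness(batch, predictions):
    return {"correct": [int(p == label) for (_, label), p in zip(batch, predictions)]}


def margin(batch, predictions):
    return {"margin": [abs(value - 0.5) for value, _ in batch]}


batches = [[(0.9, 1), (0.2, 1)], [(0.4, 0)]]
metrics = collect_in_single_pass(batches, toy_predictor, [correctness, margin])
print(metrics["correct"])  # [1, 0, 1]
```

Sharing one predictor call between all collectors is what keeps the cost at a single inference pass, no matter how many metrics are collected.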

Some sample transformations, such as augmentations, should be active during training but are unwanted during metrics collection. For these cases, we provide the alternative map method Table.map_collect_metrics, which lets you map the samples in the Table before they are passed to the metrics collector.
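
The difference between a training-time mapping and a metrics-time mapping can be sketched as follows. The two functions and the sample layout (a list of pixel values plus a label) are hypothetical and chosen only for illustration. The commented-out call at the end assumes that `map_collect_metrics` accepts a per-sample function, which should be checked against the API reference.

```python
import random


def train_map(sample):
    """Training-time mapping: randomly mirrors the pixel row (an augmentation)."""
    pixels, label = sample
    if random.random() < 0.5:
        pixels = pixels[::-1]
    return pixels, label


def metrics_map(sample):
    """Metrics-time mapping: same output structure, but deterministic."""
    pixels, label = sample
    return list(pixels), label


sample = ([0.1, 0.5, 0.9], 2)

# The metrics-time mapping yields the same result on every call,
# so per-sample metrics are comparable between collection passes.
assert all(metrics_map(sample) == metrics_map(sample) for _ in range(10))

# With a real Table, the deterministic function is the one intended for
# metrics collection (assumed call shape, not verified here):
# table = table.map_collect_metrics(metrics_map)
```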

"""Pseudo-code example of collecting per-sample metrics from a Table and a model."""
import tlc
import torch

table: tlc.Table = ...
model: torch.nn.Module = ...

def metrics_collector(batch, predictor_output):
    """Example of a metrics collector function.
    
    batch: A batch of samples from the Table, optionally mapped according to `map_collect_metrics`
        and collated by a torch.utils.data.DataLoader.
    predictor_output: The output of the model for the batch.
    """
    return {
        "accuracy": [...]
    }

# The following command orchestrates a full inference pass through the Table,
# collecting metrics using the provided metrics collector(s) and updating the active Run accordingly.
tlc.collect_metrics(table, metrics_collector, model)

For more details on how to control the data flow and customize the inference and metrics collection process, see the Predictor and MetricsCollector classes.
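
As a hedged illustration of what the `metrics_collector` placeholder in the pseudo-code above might compute, the sketch below returns several per-sample metrics for a classification batch. It assumes the batch is an `(inputs, labels)` pair and that the predictor output is one list of raw class scores per sample. Plain Python lists are used so the sketch runs without torch; with tensors, the same quantities would normally come from tensor operations. The function name and the metric names are illustrative, not part of tlc.

```python
import math


def classification_metrics_collector(batch, predictor_output):
    """Return per-sample metrics for a classification batch.

    batch: an (inputs, labels) pair; only the labels are used here.
    predictor_output: one list of raw class scores (logits) per sample.
    """
    _, labels = batch
    predicted, confidence, loss = [], [], []
    for logits, label in zip(predictor_output, labels):
        # Numerically stable softmax: subtract the max logit before exponentiating.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        predicted.append(best)
        confidence.append(probs[best])
        loss.append(-math.log(probs[label]))  # per-sample cross-entropy
    return {
        "predicted": predicted,
        "confidence": confidence,
        "loss": loss,
        "accuracy": [float(p == y) for p, y in zip(predicted, labels)],
    }


batch = (["img0", "img1"], [0, 1])
logits = [[2.0, 0.0], [3.0, 1.0]]
metrics = classification_metrics_collector(batch, logits)
print(metrics["accuracy"])  # [1.0, 0.0]
```

Every value in the returned dictionary is a list with one entry per sample in the batch, which is what makes the metrics per-sample rather than aggregate.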

If collecting metrics in a single pass isn’t necessary for your workflow, or if you want to add metrics to a Run using a more direct approach, the Run.add_metrics_data function provides a straightforward alternative.
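
When metrics are computed outside a collection pass, they often start out as one record per sample. The sketch below turns such records into equal-length columns, a shape that is convenient to hand to a metrics-writing call. The helper `build_metrics_columns` and the column names are hypothetical, and the commented-out `add_metrics_data` call is an assumed usage whose exact arguments should be taken from the API reference.

```python
def build_metrics_columns(rows):
    """Turn a list of per-sample dicts into a dict of equal-length columns."""
    if not rows:
        return {}
    names = list(rows[0])
    columns = {name: [] for name in names}
    for index, row in enumerate(rows):
        # Reject rows with missing or extra keys so all columns stay aligned.
        if set(row) != set(names):
            raise ValueError(
                f"row {index} has keys {sorted(row)}, expected {sorted(names)}"
            )
        for name in names:
            columns[name].append(row[name])
    return columns


rows = [
    {"sample_index": 0, "loss": 0.12},
    {"sample_index": 1, "loss": 2.13},
]
columns = build_metrics_columns(rows)
print(columns)  # {'sample_index': [0, 1], 'loss': [0.12, 2.13]}

# Assumed usage with an active Run (call shape not verified in this guide):
# run.add_metrics_data(columns)
```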

Metrics Collectors

The metrics_collectors module provides a variety of pre-defined metrics collectors, including:

To create your own metrics collectors, you have two options:

Examples

The Example Notebooks section offers several demonstrations of supported workflows:

  • MNIST Notebook: Demonstrates a custom metrics collector for classification metrics.

  • CIFAR10 Notebook: Uses a standard metrics collector for multi-class classification. Also shows usage of the EmbeddingsMetricsCollector for capturing hidden layer activations and dimensionality reduction via UMAP.

  • Hugging Face IMDB Notebook: Introduces a custom metrics collection method that works with the Hugging Face Trainer class.

  • Hugging Face fine-tuning Notebook: Fine-tunes a Hugging Face model and collects metrics using our TLCTrainer class.

  • Hugging Face CIFAR 100 Notebook: Uses a Hugging Face dataset and computes 2D embeddings.

  • Detectron2 Balloons: Trains an object detection model and gathers bounding box metrics with detectron2.

  • Detectron2 COCO128: Executes inference and gathers bounding box metrics using detectron2.

  • Per Bounding Box Metrics: Describes metric collection for individual bounding boxes in images.

  • Per Bounding Box Embeddings: Covers embedding collection for bounding boxes and uses UMAP for dimensionality reduction.

  • Bounding Box Classifier: Details an advanced workflow where a model is trained to classify bounding boxes in an image, which can be used in conjunction with an object detection model to find bounding boxes of special interest.

  • PyTorch Lightning SegFormer: Demonstrates how to use a custom metrics collector for collecting predicted masks from a semantic segmentation model.