Collect Metrics#

One of the key features of 3LC is the ability to collect fine-grained metrics from input Tables. This guide outlines how to collect and accumulate per-sample metrics to analyze your datasets efficiently.


tlc uses the concept of a metrics collector, which is a mechanism that uses data and model output to produce metrics. This can be as simple as a function that takes a batch of samples and a batch of predictions and returns a dictionary of metrics. Information about the schema of the metrics can optionally be provided to the metrics collector to allow customization of the metrics.
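
As a minimal sketch of the "simple function" form described above, the following collector takes a batch and the model output and returns one value per sample for each metric. It uses plain Python lists in place of tensors so that it runs without tlc or torch. The batch layout (a dict with a "label" key), the function name, and the metric names are assumptions made for illustration, not something tlc prescribes.

```python
def classification_metrics(batch, predictions):
    """Return per-sample metrics for one batch.

    batch: dict with a "label" list (hypothetical layout).
    predictions: one list of class probabilities per sample.
    """
    # Index of the highest probability is the predicted class.
    predicted = [max(range(len(p)), key=p.__getitem__) for p in predictions]
    confidence = [max(p) for p in predictions]
    correct = [int(y_hat == y) for y_hat, y in zip(predicted, batch["label"])]
    # Every list has exactly one entry per sample in the batch.
    return {"predicted": predicted, "confidence": confidence, "correct": correct}


batch = {"label": [0, 1, 1]}
predictions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
metrics = classification_metrics(batch, predictions)
print(metrics)
```

The important property is that each returned list has the same length as the batch, so every value can be traced back to a single sample.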

The tlc Python package includes a set of built-in collectors for common use-cases, but it is also easy to create custom collectors to fit specific needs.

When collecting per-sample metrics with an ML model, it is often desirable to run only a single inference pass over each dataset. The collect_metrics function does this: you specify the model, the Table, and the metrics collectors to use, and it handles the orchestration of calling the model and passing its output to the collectors.

Some transforms, such as augmentations, should be active during training but are not wanted during metrics collection. For this case we provide the alternative map method Table.map_collect_metrics, which lets you map the samples in the Table before they are passed to the metrics collector.

"""Pseudo-code example of collecting per-sample metrics from a Table and a model."""
import tlc
import torch

table: tlc.Table = ...
model: torch.nn.Module = ...

def metrics_collector(batch, predictor_output):
    """Example of a metrics collector function.

    batch: A batch of samples from the Table, optionally mapped according to `map_collect_metrics`
        and collated by a torch.utils.data.DataLoader.
    predictor_output: The output of the model for the batch.
    """
    # Return one list per metric, with one value per sample in the batch.
    return {
        "accuracy": [...]
    }

# The following command orchestrates a full inference pass through the Table,
# collecting metrics using the provided metrics collector(s) and updating the active Run accordingly.
tlc.collect_metrics(table, metrics_collector, model)
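
Conceptually, a single-pass collection can be pictured as the loop below. This is an illustrative stand-in written in plain Python, not how tlc.collect_metrics is implemented: the function collect_in_single_pass, its parameters, and the toy predictor and collectors are all hypothetical. It shows the model being called once per batch, with the same output handed to every collector, and an optional map applied to samples only for metrics collection.

```python
def collect_in_single_pass(samples, predictor, collectors, batch_size=2, sample_map=None):
    """Run one inference pass and feed each batch to every collector."""
    accumulated = {}
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        if sample_map is not None:
            # Map used only for metrics collection (e.g. without augmentations).
            batch = [sample_map(sample) for sample in batch]
        output = predictor(batch)  # the model runs once per batch
        for collector in collectors:
            # Each collector sees the same batch and the same model output.
            for name, values in collector(batch, output).items():
                accumulated.setdefault(name, []).extend(values)
    return accumulated


def predictor(batch):
    """Toy model: predicts 2 * x for each (x, target) sample."""
    return [2 * x for x, _ in batch]


def abs_error(batch, output):
    return {"abs_error": [abs(o - target) for o, (_, target) in zip(output, batch)]}


def prediction(batch, output):
    return {"prediction": list(output)}


samples = [(1, 2), (2, 5), (3, 6)]
result = collect_in_single_pass(samples, predictor, [abs_error, prediction])
print(result)
```

Because the collectors share one model output per batch, adding another collector does not add another inference pass.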

For more details on how to control the data flow and customize the inference and metrics collection process, see classes Predictor and MetricsCollector.

If collecting metrics in a single pass isn’t necessary for your workflow, or if you want to add metrics to a Run using a more direct approach, the Run.add_metrics_data function provides a straightforward alternative.

Metrics Collectors#

The metrics_collectors module provides a variety of pre-defined metrics collectors, including:

To create your own metrics collectors, you have two options:

Log On-the-Fly Metrics#

In some cases, you may want to log arbitrary metrics on-the-fly during training or inference. There are two main ways of doing this:

  1. Use tlc.log() to log a single dictionary of key-value pairs to the current active Run. This is suitable for simple scalar values such as loss or learning rate. Subsequent calls to tlc.log() with the same dictionary keys will extend the metrics over time. Metrics added this way are charted automatically on the Project page, against “epoch”, “iteration”, or “step” if one of these keys is provided, and against an internal time axis otherwise.

  2. Use Run.add_metrics_data() to log a dictionary of column names to column value lists as a separate metrics table on the Run. This is suitable for metrics that should be viewable as a table, or when overriding the schema of the metrics data is necessary.
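
The two approaches differ mainly in the shape of the data: the first takes one dictionary of scalars per call, while the second takes a single dictionary whose values are lists. If values were gathered one row at a time but are wanted in the column-oriented form, a small helper can convert them. The sketch below is plain Python and not part of tlc; the name rows_to_columns is hypothetical.

```python
def rows_to_columns(rows):
    """Convert a list of per-step dicts into a dict of column lists.

    All rows must share the same keys; a mismatch raises ValueError.
    """
    if not rows:
        return {}
    keys = list(rows[0])
    columns = {key: [] for key in keys}
    for row in rows:
        if set(row) != set(keys):
            raise ValueError(f"Row keys {sorted(row)} do not match {sorted(keys)}")
        for key in keys:
            columns[key].append(row[key])
    return columns


rows = [
    {"epoch": 0, "loss": 0.5},
    {"epoch": 1, "loss": 0.4},
    {"epoch": 2, "loss": 0.3},
]
columns = rows_to_columns(rows)
print(columns)
```

Rejecting rows with mismatched keys keeps every column the same length, which a tabular view of the metrics depends on.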

"""Pseudo-code example of logging on-the-fly metrics."""
import tlc

run: tlc.Run = ... # Get or create an active Run.
table: tlc.Table = ... # Get or create a Table.

# Log dictionaries of key-value pairs using "epoch" as the time axis:
tlc.log({"epoch": 0, "loss": 0.5, "accuracy": 0.9})
tlc.log({"epoch": 1, "loss": 0.4, "accuracy": 0.91})
tlc.log({"epoch": 10, "loss": 0.3, "accuracy": 0.92})

# Log lists of metrics as a separate metrics table on the Run:
metrics = {"loss": [0.5, 0.4, 0.3], "accuracy": [0.9, 0.91, 0.92]}
run.add_metrics_data(metrics)

# Log predictions with a custom schema, and associate samples with the input table:
# (Assumes that `table` contains exactly 3 samples.)
metrics = {"predicted_label": [0, 1, 0]}
run.add_metrics_data(
    metrics,
    override_column_schemas={"predicted_label": tlc.CategoricalLabel("label", ["cat", "dog"])},
    input_table_url=table.url,  # links the metrics rows to the input table
)

# Log metrics for individual samples using `example_id` to associate metrics with samples:
metrics = {
    "example_id": [0, 0, 1, 1, 2, 2],
    "iou": [0.9, 0.8, 0.6, 0.5, 0.3, 0.2],
}
run.add_metrics_data(metrics, input_table_url=table.url)

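In the last case above, several metric rows share the same example_id, so each sample in the input table has more than one row of metrics (for instance, one per bounding box). The plain-Python sketch below does not use tlc; it only illustrates how such column-oriented data groups back into per-sample values. The helper name group_by_example_id is hypothetical.

```python
from collections import defaultdict


def group_by_example_id(metrics, column):
    """Group the values of `column` by the matching `example_id` entry."""
    ids = metrics["example_id"]
    values = metrics[column]
    if len(ids) != len(values):
        raise ValueError("example_id and metric column must have equal length")
    grouped = defaultdict(list)
    for example_id, value in zip(ids, values):
        grouped[example_id].append(value)
    return dict(grouped)


metrics = {
    "example_id": [0, 0, 1, 1, 2, 2],
    "iou": [0.9, 0.8, 0.6, 0.5, 0.3, 0.2],
}
per_sample = group_by_example_id(metrics, "iou")
# Example aggregate: the lowest IoU found in each sample.
worst_iou = {example_id: min(values) for example_id, values in per_sample.items()}
print(per_sample)
print(worst_iou)
```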


The Example Notebooks section offers several demonstrations of supported workflows:

  • MNIST Notebook: Demonstrates a custom metrics collector for classification metrics.

  • CIFAR10 Notebook: Uses a standard metrics collector for multi-class classification. Also shows usage of the EmbeddingsMetricsCollector for capturing hidden layer activations and dimensionality reduction via UMAP.

  • Hugging Face IMDB Notebook: Introduces a custom metrics collection method that works with the Hugging Face Trainer class.

  • Hugging Face fine-tuning Notebook: Fine-tunes a Hugging Face model and collects metrics using our TLCTrainer class.

  • Hugging Face CIFAR 100 Notebook: Utilizes a Hugging Face dataset and computes 2D embeddings.

  • Detectron2 Balloons: Trains an object detection model and gathers bounding box metrics with detectron2.

  • Detectron2 COCO128: Executes inference and gathers bounding box metrics using detectron2.

  • Per Bounding Box Metrics: Describes metric collection for individual bounding boxes in images.

  • Per Bounding Box Embeddings: Covers embedding collection for bounding boxes and uses UMAP for dimensionality reduction.

  • Bounding Box Classifier: Details an advanced workflow where a model is trained to classify bounding boxes in an image, which can be used in conjunction with an object detection model to find bounding boxes of special interest.

  • PyTorch Lightning SegFormer: Demonstrates how to use a custom metrics collector for collecting predicted masks from a semantic segmentation model.