Collect Metrics
One of the key features of 3LC is the ability to collect fine-grained metrics from input Tables. This guide outlines how to collect and accumulate per-sample metrics to analyze your datasets efficiently.
Overview
tlc uses the concept of a metrics collector: a mechanism that uses data and model output to produce metrics.
This can be as simple as a function that takes a batch of samples and a batch of predictions and returns a dictionary of
metrics. Information about the schema of the metrics can optionally be provided to the metrics collector to allow
customization of the metrics.
The tlc Python package includes a set of built-in collectors for common use cases, and it is also easy to create
custom collectors to fit specific needs.
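As a minimal sketch of the "function that returns a dictionary of metrics" idea, the plain-Python example below computes three per-sample values for a classification batch. It does not use tlc at all; the function name `classification_metrics`, the metric names, and the assumed batch layout (integer labels and per-class logit lists) are illustrative assumptions, not part of the 3LC API.

```python
"""Sketch of a per-sample metrics function (plain Python, no tlc dependency)."""
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_metrics(sample_batch, prediction_batch):
    """Return one value per sample for each metric.

    sample_batch: sequence of integer ground-truth labels (assumed layout).
    prediction_batch: sequence of per-class logit lists (assumed layout).
    """
    predicted, confidence, correct = [], [], []
    for label, logits in zip(sample_batch, prediction_batch):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        predicted.append(best)
        confidence.append(probs[best])
        correct.append(best == label)
    return {"predicted": predicted, "confidence": confidence, "correct": correct}


metrics = classification_metrics([0, 1], [[2.0, 0.0], [1.5, 0.5]])
print(metrics["predicted"])  # [0, 0]
print(metrics["correct"])    # [True, False]
```

Note that every list in the returned dictionary has one entry per sample in the batch, which is what makes the metrics per-sample rather than aggregate.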
When collecting per-sample metrics using an ML model, it is often desirable to run just a single inference pass through
each dataset for metric collection. This is achieved by calling the collect_metrics function, where you specify the
model, the Table, and the metrics collectors you want to use. This collection interface provides a high-level
abstraction that handles orchestration of calling the model and passing the output to the collection functions.
Some transformations, such as augmentations, should be active during training but are not wanted during metrics
collection. For such cases, we provide the alternative map method Table.map_collect_metrics, which allows you to map
the samples in the Table before they are passed to the metrics collector.
"""Pseudo-code example of collecting per-sample metrics from a Table and a model."""
import tlc
import torch

table: tlc.Table = ...
model: torch.nn.Module = ...


def metrics_collector(batch, predictor_output):
    """Example of a metrics collector function.

    batch: A batch of samples from the Table, optionally mapped according to `map_collect_metrics`
        and collated by a torch.utils.data.DataLoader.
    predictor_output: The output of the model for the batch.
    """
    return {
        "accuracy": [...]
    }


# The following command orchestrates a full inference pass through the Table,
# collecting metrics using the provided metrics collector(s) and updating the active Run accordingly.
tlc.collect_metrics(table, metrics_collector, model)
For more details on how to control the data flow and customize the inference and metrics collection process, see the
Predictor and MetricsCollector classes.
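To make the single-pass idea concrete, here is a conceptual sketch in plain Python. It is not how tlc implements collect_metrics; the function `collect_in_one_pass`, its parameters (including the `map_fn` stand-in for a collection-time sample mapping), and the toy model are all hypothetical. The point it illustrates is that the model is called once per batch and its output is shared by every collector.

```python
"""Conceptual sketch of single-pass metrics collection (not the tlc implementation)."""


def collect_in_one_pass(samples, collectors, predict, batch_size=2, map_fn=None):
    """Run `predict` once per batch and feed its output to every collector.

    All parameter names here are hypothetical. `map_fn` stands in for a
    per-sample transform applied only during metrics collection.
    """
    if map_fn is not None:
        samples = [map_fn(sample) for sample in samples]

    accumulated = {}
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        predictions = predict(batch)  # the only inference call for this batch
        for collector in collectors:
            for name, values in collector(batch, predictions).items():
                accumulated.setdefault(name, []).extend(values)
    return accumulated


# Toy stand-ins: each sample is an (input, target) pair and the "model" doubles its input.
def toy_predict(batch):
    return [2 * x for x, _ in batch]


def absolute_error(batch, predictions):
    return {"abs_error": [abs(p - y) for (_, y), p in zip(batch, predictions)]}


def is_exact(batch, predictions):
    return {"exact": [p == y for (_, y), p in zip(batch, predictions)]}


data = [(1, 2), (2, 5), (3, 6)]
result = collect_in_one_pass(data, [absolute_error, is_exact], toy_predict)
print(result)  # {'abs_error': [0, 1, 0], 'exact': [True, False, True]}
```

Adding a second collector costs no extra inference: both `absolute_error` and `is_exact` reuse the same predictions for each batch.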
If collecting metrics in a single pass isn’t necessary for your workflow, or if you want to add metrics to a Run using
a more direct approach, the Run.add_metrics method provides a straightforward alternative.
Metrics Collectors
The metrics_collectors module provides a variety of pre-defined metrics collectors.
To create your own metrics collectors, you have two options:
Subclass MetricsCollector.
Use the FunctionalMetricsCollector and provide a function with the signature metrics_fn(sample_batch, prediction_batch).
Log On-the-Fly Metrics
In some cases, you may want to log arbitrary metrics on-the-fly during training or inference. There are two main ways of doing this:
Use tlc.log() to log a single dictionary of key-value pairs to the currently active Run. This is suitable for simple scalar values such as loss or learning rate. Subsequent calls to tlc.log() with the same dictionary keys will extend the metrics over time. Metrics added this way are automatically charted on the Project page, using an internal time axis, or one of “epoch”, “iteration”, or “step”, if provided.
Use Run.add_metrics() to log a dictionary mapping column names to lists of column values as a separate metrics table on the Run. This is suitable for metrics that should be viewable as a table, or when overriding the schema of the metrics data is necessary.
"""Pseudo-code example of logging on-the-fly metrics."""
import tlc

run: tlc.Run = ...  # Get or create an active Run.
table: tlc.Table = ...  # Get or create a Table.

# Log dictionaries of key-value pairs using "epoch" as the time axis:
tlc.log({"epoch": 0, "loss": 0.5, "accuracy": 0.9})
tlc.log({"epoch": 1, "loss": 0.4, "accuracy": 0.91})
...
tlc.log({"epoch": 10, "loss": 0.3, "accuracy": 0.92})

# Log lists of metrics as a separate metrics table on the Run:
metrics = {"loss": [0.5, 0.4, 0.3], "accuracy": [0.9, 0.91, 0.92]}
run.add_metrics(metrics)

# Log predictions with a custom schema, and associate samples with the input table:
# (Assumes that `table` contains exactly 3 samples.)
metrics = {"predicted_label": [0, 1, 0]}
run.add_metrics(
    metrics,
    foreign_table_url=table.url,
    column_schemas={"predicted_label": tlc.CategoricalLabel("label", ["cat", "dog"])},
)

# Log metrics for individual samples using `example_id` to associate metrics with samples:
metrics = {
    "example_id": [0, 0, 1, 1, 2, 2],
    "iou": [0.9, 0.8, 0.6, 0.5, 0.3, 0.2],
}
run.add_metrics(
    metrics,
    foreign_table_url=table.url,
)
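The dictionaries passed to Run.add_metrics() above are column-oriented, while values gathered in a training loop are often row-oriented or nested per sample. The sketch below shows two small plain-Python helpers for reshaping such data. The helper names `rows_to_columns` and `flatten_per_sample` are hypothetical and not part of tlc; the tlc call itself is only mentioned in a comment.

```python
"""Plain-Python helpers for shaping metrics; the tlc call is left as a comment."""


def rows_to_columns(rows):
    """Turn a list of per-step dicts into a dict of equally long column lists."""
    if not rows:
        return {}
    columns = {key: [] for key in rows[0]}
    for row in rows:
        if row.keys() != columns.keys():
            raise ValueError("every row must have the same keys")
        for key, value in row.items():
            columns[key].append(value)
    return columns


def flatten_per_sample(values_per_sample, name):
    """Flatten a list of per-sample value lists, recording each value's sample index."""
    example_ids, flat = [], []
    for example_id, values in enumerate(values_per_sample):
        example_ids.extend([example_id] * len(values))
        flat.extend(values)
    return {"example_id": example_ids, name: flat}


history = [{"loss": 0.5, "accuracy": 0.9}, {"loss": 0.4, "accuracy": 0.91}]
print(rows_to_columns(history))
# {'loss': [0.5, 0.4], 'accuracy': [0.9, 0.91]}

ious = [[0.9, 0.8], [0.6, 0.5], [0.3, 0.2]]
print(flatten_per_sample(ious, "iou"))
# {'example_id': [0, 0, 1, 1, 2, 2], 'iou': [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]}
# Either dictionary has the column-oriented shape used with run.add_metrics(...) above.
```

The second helper reproduces the `example_id` layout from the last example: a sample with several values (for instance, one IoU per bounding box) contributes several rows that share the same `example_id`.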
Examples
The Example Notebooks section offers several demonstrations of supported workflows:
MNIST Notebook: Demonstrates a custom metrics collector for classification metrics.
CIFAR10 Notebook: Uses a standard metrics collector for multi-class classification. Also shows usage of the EmbeddingsMetricsCollector for capturing hidden layer activations and dimensionality reduction via UMAP.
Hugging Face IMDB Notebook: Introduces a custom metrics collection method that works with the Hugging Face Trainer class.
Hugging Face fine-tuning Notebook: Fine-tunes a Hugging Face model and collects metrics using our TLCTrainer class.
Hugging Face CIFAR 100 Notebook: Utilizes a Hugging Face dataset and computes 2D embeddings.
Detectron2 Balloons: Trains an object detection model and gathers bounding box metrics with detectron2.
Detectron2 COCO128: Executes inference and gathers bounding box metrics using detectron2.
Per Bounding Box Metrics: Describes metric collection for individual bounding boxes in images.
PyTorch Lightning SegFormer: Demonstrates how to use a custom metrics collector for collecting predicted masks from a semantic segmentation model.