Integrating 3LC with Detectron2#

This document describes how to integrate 3LC into projects that use Detectron2, a popular computer vision library built on top of PyTorch that provides a modular and extensible framework for object detection and segmentation. 3LC provides several convenience methods that make the integration with existing Detectron2 projects straightforward.

For a complete working example of using 3LC with Detectron2, see the 3LC Detectron2 Balloons Example and 3LC Detectron2 COCO128 Example notebooks.

Note

In order to use the Detectron2 integration, the detectron2 package must be installed in the Python environment. See the Detectron2 documentation for instructions on how to install or build an appropriate version, depending on your platform, CUDA version, and PyTorch version. We strongly recommend using a CUDA-enabled version of Detectron2 if you have a compatible GPU available.

Because no single pre-built Detectron2 package is appropriate for all 3LC users, the tlc package does not list detectron2 as a formal dependency that gets installed along with 3LC. Instead, it is up to the end-user to make sure an appropriate version of detectron2 is installed in the Python environment prior to using the 3LC Detectron2 integration. If detectron2 is not installed, the tlc.integration.detectron2 module will not be available.

Registering Datasets#

In order to make a dataset available to both 3LC and Detectron2, it needs to be registered in the global DatasetCatalog. The convenience method register_coco_instances does this for datasets in the COCO format.

from tlc.integration.detectron2 import register_coco_instances

register_coco_instances(
    name="my-dataset",
    metadata={},
    json_file="path/to/my/dataset.json",
    image_root="path/to/my/dataset/images",
)

The first time this method is called, a Table is created internally based on the specific combination of JSON annotations file and image folder. Subsequent calls with the same arguments resolve to the latest available revision descending from the initial 3LC Table.

Note

Ensure that image_root / json_file["images"][i]["file_name"] resolves to the full path of an image for each image in the dataset. If the file_name fields in the annotations file are absolute paths, leave image_root empty.
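As a rough sketch of how this resolution works (plain Python, independent of Detectron2 and 3LC), each file_name entry is joined onto image_root unless it is already an absolute path:

```python
import os

def resolve_image_path(image_root: str, file_name: str) -> str:
    # os.path.join ignores image_root when file_name is absolute,
    # which is why image_root should be left empty in that case.
    return os.path.join(image_root, file_name)

# Relative entry: joined onto image_root.
print(resolve_image_path("path/to/my/dataset/images", "0001.jpg"))

# Absolute entry: image_root left empty, path used as-is.
print(resolve_image_path("", "/data/images/0001.jpg"))
```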

This method will also register the dataset in the global MetadataCatalog and DatasetCatalog. This is required for Detectron2 to be able to find the dataset. The metadata catalog will be populated with the following fields:

| Metadata Entry | Description |
| --- | --- |
| json_file | Path to the JSON file containing the annotations |
| image_root | Root directory for all the images in the dataset |
| initial_tlc_table_url | URL of the initial 3LC Table |
| latest_tlc_table_url | URL of the latest revision descending from the initial Table |
| dataset_size | Total number of samples in the dataset |
| thing_classes | List of class names as specified in the annotations file |
| thing_dataset_id_to_contiguous_id | Mapping from dataset class IDs to the contiguous IDs used for model training |
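To illustrate the thing_dataset_id_to_contiguous_id entry, here is a minimal, hypothetical sketch of how such a mapping is typically built from (possibly non-contiguous) COCO category IDs. This is illustrative only, not 3LC's actual implementation:

```python
def build_contiguous_id_mapping(category_ids):
    """Map possibly non-contiguous dataset category IDs to the
    contiguous 0-based IDs expected during model training."""
    return {dataset_id: idx for idx, dataset_id in enumerate(sorted(category_ids))}

# COCO category IDs are often non-contiguous (e.g. 1, 3, 7):
mapping = build_contiguous_id_mapping([1, 3, 7])
print(mapping)  # {1: 0, 3: 1, 7: 2}
```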

To access the metadata and dataset dicts, use the following code:

from detectron2.data import DatasetCatalog, MetadataCatalog

metadata = MetadataCatalog.get("my-dataset")
dataset_dicts = DatasetCatalog.get("my-dataset")

For more information on the Detectron2 dataset format, see the Use Custom Datasets tutorial and the detectron2.data API documentation.

Training a Model#

3LC supports multiple methods for integrating with the process of training a Detectron2 model.

When writing a custom training loop from scratch, you can collect metrics just like you would for any other PyTorch model; see the metrics collection user guide for details.

When using the Trainer abstraction, you can use hooks to configure the 3LC integration. Both the DefaultTrainer and the SimpleTrainer can be used with 3LC. With the DefaultTrainer, 3LC has access to the Detectron2 config object through the trainer; with the SimpleTrainer, the config object must be passed to the hook constructors. 3LC reads the Detectron2 config to determine whether the desired metrics collection scheme is valid and to compute derived metrics such as the number of iterations per epoch. If any required fields are missing, an error is raised.
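As an illustration of one such derived metric (not necessarily the integration's exact formula), the number of iterations per epoch can be estimated from the dataset size and the batch size found in the Detectron2 config (cfg.SOLVER.IMS_PER_BATCH):

```python
import math

def iterations_per_epoch(dataset_size: int, ims_per_batch: int) -> int:
    # One epoch is one full pass over the dataset. With drop-last
    # batching semantics this would be floor division instead.
    return math.ceil(dataset_size / ims_per_batch)

print(iterations_per_epoch(1000, 16))  # 63
```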

See the Training tutorial for more details on implementing model training.

Collecting Metrics#

It is possible to collect any metrics that can be computed from the model inputs and outputs. See the metrics collection guide for details on how to implement custom metrics collectors.

We also provide the DetectronMetricsCollectionHook which will collect any metrics added to the Detectron2 EventStorage by the trainer or the model.

The following code shows how to set up bounding box and Detectron2 metrics collection when using the Trainer abstraction.

from detectron2.data import MetadataCatalog

from tlc import BoundingBoxMetricsCollector
from tlc.integration.detectron2 import DetectronMetricsCollectionHook, MetricsCollectionHook

metrics_collector = BoundingBoxMetricsCollector(
    model=trainer.model,
    classes=MetadataCatalog.get(TRAIN_DATASET_NAME).thing_classes,
    label_mapping=MetadataCatalog.get(TRAIN_DATASET_NAME).thing_dataset_id_to_contiguous_id,
    iou_threshold=0.5,
    compute_derived_metrics=True,
)

trainer.register_hooks(
    [
        MetricsCollectionHook(
            dataset_name=TRAIN_DATASET_NAME,
            metrics_collectors=[metrics_collector],
            collection_start_iteration=250,
            collection_frequency=250,
        ),
        MetricsCollectionHook(
            dataset_name=TEST_DATASET_NAME,
            metrics_collectors=[metrics_collector],
            collect_metrics_after_train=True,
        ),
        DetectronMetricsCollectionHook(
            run_url=session.run_url,
            collection_frequency=5,
        ),
    ]
)

This code will collect metrics for the training dataset every 250 iterations, starting at iteration 250, and for the test dataset after training is complete. In addition, Detectron2 built-in metrics will be collected every 5 iterations.
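Assuming collection happens at the start iteration and then every collection_frequency iterations thereafter, the training-set hook's schedule can be sketched as follows (the exact boundary behavior is an assumption for illustration):

```python
def collection_iterations(start: int, frequency: int, max_iter: int):
    """Iterations at which metrics are collected, assuming collection
    begins at `start` and repeats every `frequency` iterations."""
    return list(range(start, max_iter + 1, frequency))

# With collection_start_iteration=250 and collection_frequency=250
# over a 1000-iteration run:
print(collection_iterations(250, 250, 1000))  # [250, 500, 750, 1000]
```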