Integrating 3LC with Detectron2#

This document describes how to integrate 3LC into projects that use Detectron2, a popular computer vision library built on top of PyTorch that provides a modular and extensible framework for object detection and segmentation. 3LC provides several convenience methods that make the integration with existing Detectron2 projects straightforward.

For a complete working example of using 3LC with Detectron2, see the 3LC Detectron2 Balloons Example and 3LC Detectron2 COCO128 Example notebooks.

Note

In order to use the Detectron2 integration, the detectron2 package must be installed in the Python environment. See the Detectron2 documentation for instructions on how to install or build an appropriate version, depending on your platform, CUDA version, and PyTorch version. We strongly recommend using a CUDA-enabled version of Detectron2 if you have a compatible GPU available.

Because no single pre-built Detectron2 package is appropriate for all 3LC users, the tlc package does not list detectron2 as a formal dependency that gets installed along with 3LC. Instead, it is up to the end-user to make sure an appropriate version of detectron2 is installed in the Python environment prior to using the 3LC Detectron2 integration. If detectron2 is not installed, the tlc.integration.detectron2 module will not be available.

Registering Datasets#

In order to make a dataset available to both 3LC and Detectron2, it needs to be registered in the global DatasetCatalog. The convenience method register_coco_instances does this for datasets in the COCO format.

from tlc.integration.detectron2 import register_coco_instances

register_coco_instances(
    name="my-dataset",
    metadata={},
    json_file="path/to/my/dataset.json",
    image_root="path/to/my/dataset/images",
)

The first time this method is called, a Table is created internally based on the specific combination of JSON annotations file and image folder. Subsequent calls with the same arguments resolve to the latest available revision descending from the initial 3LC Table.

Note

Ensure that image_root / json_file["images"][i]["file_name"] resolves to the full path of an image for each image in the dataset. If the file_name fields in the annotations file are absolute paths, leave image_root empty.
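As a rough sketch of how this resolution works (plain Python, independent of Detectron2 and 3LC), each file_name entry is joined onto image_root unless it is already an absolute path:

```python
import os

def resolve_image_path(image_root: str, file_name: str) -> str:
    # os.path.join ignores image_root when file_name is absolute,
    # which is why image_root should be left empty in that case.
    return os.path.join(image_root, file_name)

# Relative entry: joined onto image_root.
print(resolve_image_path("path/to/my/dataset/images", "0001.jpg"))

# Absolute entry: image_root left empty, path used as-is.
print(resolve_image_path("", "/data/images/0001.jpg"))
```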

This method will also register the dataset in the global MetadataCatalog and DatasetCatalog. This is required for Detectron2 to be able to find the dataset. The metadata catalog will be populated with the following fields:

| Metadata Entry | Description |
| --- | --- |
| json_file | Path to the JSON file containing the annotations |
| image_root | Root directory for all the images in the dataset |
| initial_tlc_table_url | URL of the initial 3LC Table |
| latest_tlc_table_url | URL of the latest revision descending from the initial Table |
| dataset_size | Total number of samples in the dataset |
| thing_classes | List of class names as specified in the annotations file |
| thing_dataset_id_to_contiguous_id | Mapping from dataset class IDs to the contiguous IDs used for model training |
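To illustrate the thing_dataset_id_to_contiguous_id entry, here is a minimal, hypothetical sketch of how such a mapping is typically built from (possibly non-contiguous) COCO category IDs. This is illustrative only, not 3LC's actual implementation:

```python
def build_contiguous_id_mapping(category_ids):
    """Map possibly non-contiguous dataset category IDs to the
    contiguous 0-based IDs expected during model training."""
    return {dataset_id: idx for idx, dataset_id in enumerate(sorted(category_ids))}

# COCO category IDs are often non-contiguous (e.g. 1, 3, 7):
mapping = build_contiguous_id_mapping([1, 3, 7])
print(mapping)  # {1: 0, 3: 1, 7: 2}
```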

To access the metadata and dataset dicts, use the following code:

from detectron2.data import DatasetCatalog, MetadataCatalog

metadata = MetadataCatalog.get("my-dataset")
dataset_dicts = DatasetCatalog.get("my-dataset")

For more information on the Detectron2 dataset format, see the Use Custom Datasets tutorial and the detectron2.data API documentation.

Training a Model#

3LC supports multiple methods for integrating with the process of training a Detectron2 model.

When writing a custom training loop from scratch, you can collect metrics just like you would for any other PyTorch model; see the metrics collection user guide for details.

When using the Trainer abstraction, you can use hooks to configure the 3LC integration. Both the DefaultTrainer and the SimpleTrainer can be used with 3LC. With the DefaultTrainer, 3LC has access to the Detectron2 config object through the trainer; with the SimpleTrainer, the config object must be passed to the hook constructors. 3LC reads the Detectron2 config to determine whether the desired metrics collection scheme is valid and to compute derived metrics such as the number of iterations per epoch. If any required fields are missing, an error is raised.
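As an illustration of one such derived metric (not necessarily the integration's exact formula), the number of iterations per epoch can be estimated from the dataset size and the batch size found in the Detectron2 config (cfg.SOLVER.IMS_PER_BATCH):

```python
import math

def iterations_per_epoch(dataset_size: int, ims_per_batch: int) -> int:
    # One epoch is one full pass over the dataset. With drop-last
    # batching semantics this would be floor division instead.
    return math.ceil(dataset_size / ims_per_batch)

print(iterations_per_epoch(1000, 16))  # 63
```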

See the Training tutorial for more details on implementing model training.

Collecting Metrics#

It is possible to collect any metrics that can be computed from the model inputs and outputs. See the metrics collection guide for details on how to implement custom metrics collectors.

We also provide the DetectronMetricsCollectionHook which will collect any metrics added to the Detectron2 EventStorage by the trainer or the model.

The following code shows how to set up bounding box and Detectron2 metrics collection when using the Trainer abstraction.

from detectron2.data import MetadataCatalog

from tlc import BoundingBoxMetricsCollector
from tlc.integration.detectron2 import DetectronMetricsCollectionHook, MetricsCollectionHook

metrics_collector = BoundingBoxMetricsCollector(
    model=trainer.model,
    classes=MetadataCatalog.get(TRAIN_DATASET_NAME).thing_classes,
    label_mapping=MetadataCatalog.get(TRAIN_DATASET_NAME).thing_dataset_id_to_contiguous_id,
    iou_threshold=0.5,
    compute_derived_metrics=True,
)

trainer.register_hooks(
    [
        MetricsCollectionHook(
            dataset_name=TRAIN_DATASET_NAME,
            metrics_collectors=[metrics_collector],
            collection_start_iteration=250,
            collection_frequency=250,
        ),
        MetricsCollectionHook(
            dataset_name=TEST_DATASET_NAME,
            metrics_collectors=[metrics_collector],
            collect_metrics_after_train=True,
        ),
        DetectronMetricsCollectionHook(
            run_url=session.run_url,
            collection_frequency=5,
        ),
    ]
)

This code will collect metrics for the training dataset every 250 iterations, starting at iteration 250, and for the test dataset after training is complete. In addition, Detectron2 built-in metrics will be collected every 5 iterations.
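Assuming collection happens at the start iteration and then every collection_frequency iterations thereafter, the training-set hook's schedule can be sketched as follows (the exact boundary behavior is an assumption for illustration):

```python
def collection_iterations(start: int, frequency: int, max_iter: int):
    """Iterations at which metrics are collected, assuming collection
    begins at `start` and repeats every `frequency` iterations."""
    return list(range(start, max_iter + 1, frequency))

# With collection_start_iteration=250 and collection_frequency=250
# over a 1000-iteration run:
print(collection_iterations(250, 250, 1000))  # [250, 500, 750, 1000]
```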