Integrating 3LC with Detectron2#
This document describes how to integrate 3LC into projects that use Detectron2, a popular computer vision library built on top of PyTorch that provides a modular and extensible framework for object detection and segmentation. 3LC provides several ways to integrate with your existing Detectron2 projects.
For a complete working example of using 3LC with Detectron2, see the 3LC Detectron2 Balloons Example and 3LC Detectron2 COCO128 Example notebooks.
Note
In order to use the Detectron2 integration, the detectron2 package must be installed in the Python environment. See the Detectron2 documentation for instructions on how to install or build an appropriate version, depending on your platform, CUDA version, and PyTorch version. We strongly recommend using a CUDA-enabled version of Detectron2 if you have a compatible GPU available.
Because no single pre-built Detectron2 package is appropriate for all 3LC users, the tlc package does not list detectron2 as a formal dependency that gets installed when 3LC is installed. Instead, it is up to the end user to make sure an appropriate version of detectron2 is installed in the Python environment prior to using the 3LC Detectron2 integration. If detectron2 is not installed, the tlc.integration.detectron2 module will not be available.
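Since the integration module is only available when detectron2 is installed, a script can check for the package up front and fail with a clear message. The following is a minimal sketch using only the standard library; the helper name detectron2_available is hypothetical and not part of 3LC or Detectron2.

```python
import importlib.util


def detectron2_available() -> bool:
    """Return True if a top-level detectron2 package can be located.

    This only checks that the package is discoverable; it does not import it
    or verify that the build matches the local CUDA and PyTorch versions.
    """
    return importlib.util.find_spec("detectron2") is not None
```

Calling detectron2_available() before importing tlc.integration.detectron2 lets a training script print installation guidance instead of surfacing an import error later.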
Registering Datasets#
In order to make a dataset available to both 3LC and Detectron2, it needs to be registered in the global MetadataCatalog. The convenience method register_coco_instances does this for datasets in the COCO format.
from tlc.integration.detectron2 import register_coco_instances

register_coco_instances(
    name="my-dataset",
    metadata={},
    json_file="path/to/my/dataset.json",
    image_root="path/to/my/dataset/images",
)
The first time this method is called, a Table will be created internally, based on the specific combination of JSON annotations file and image folder. Subsequent calls with the same signature will resolve to the latest available revision descending from the initial 3LC Table.
Note
Ensure that image_root / json_file["images"][i]["file_name"] resolves to the full path to an image for each image in the dataset. If the file_name fields in the annotations file are absolute paths, leave image_root empty.
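The path rule in the note above can be checked before registering a dataset. The sketch below assumes a COCO-style annotations file stored on the local file system (it does not handle S3 or other remote URLs); the helper name find_missing_images is hypothetical and not part of 3LC.

```python
import json
from pathlib import Path


def find_missing_images(json_file: str, image_root: str = "") -> list[str]:
    """Return resolved image paths from a COCO file that do not exist on disk."""
    with open(json_file, encoding="utf-8") as f:
        coco = json.load(f)

    missing = []
    for image in coco["images"]:
        # Joining with an absolute file_name discards the left-hand side,
        # so an empty image_root works when file_name is already absolute.
        path = Path(image_root) / image["file_name"]
        if not path.is_file():
            missing.append(str(path))
    return missing
```

An empty return value means every image entry resolves to an existing file for the given image_root.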
This method will also register the dataset in the global MetadataCatalog and DatasetCatalog. This is required for Detectron2 to be able to find the dataset. The metadata catalog will be populated with the following fields (items added by 3LC in bold):
| Metadata Entry | Description |
|---|---|
| json_file | Path to the JSON file containing the annotations |
| image_root | Root directory for all the images in the dataset |
| thing_classes | List of class IDs as specified in the annotations file |
| thing_dataset_id_to_contiguous_id | Mapping between class IDs and contiguous IDs used for model training |
| initial_tlc_table_url | URL of the initial 3LC Table |
| latest_tlc_table_url | URL of the latest revision of the initial table |
| dataset_size | The total number of samples in the dataset |
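To illustrate what the thing_dataset_id_to_contiguous_id entry holds, the sketch below builds such a mapping from the categories list of a COCO annotations file. It assumes contiguous IDs are assigned in ascending order of the original category ID; the helper name build_contiguous_id_mapping is hypothetical, and the exact construction used by 3LC is not shown in this document.

```python
def build_contiguous_id_mapping(categories: list[dict]) -> dict[int, int]:
    """Map possibly sparse COCO category IDs to contiguous IDs 0..N-1.

    Assumption: contiguous IDs follow ascending order of the original IDs.
    """
    ordered_ids = sorted(category["id"] for category in categories)
    return {dataset_id: index for index, dataset_id in enumerate(ordered_ids)}


# Example with sparse category IDs, as is common in COCO-style files.
example_categories = [
    {"id": 7, "name": "truck"},
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"},
]
example_mapping = build_contiguous_id_mapping(example_categories)
```

A model with N output classes predicts indices in the range 0 to N-1, which is why sparse dataset IDs need to be remapped for training and mapped back when interpreting predictions.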
To access the metadata and dataset dicts, use the following code:
from detectron2.data import DatasetCatalog, MetadataCatalog

metadata = MetadataCatalog.get("my-dataset")
dataset_dicts = DatasetCatalog.get("my-dataset")
For more information on the Detectron2 dataset format, see the Use Custom Datasets tutorial and the detectron2.data API documentation.
Training a Model#
3LC supports multiple methods for integrating with the process of training a Detectron2 model.
When writing a custom training loop from scratch, you can collect metrics just as you would for any other PyTorch model; see the metrics collection user guide for details.
When using the Trainer abstraction, you can use hooks to configure the 3LC integration. It is possible to use both the DefaultTrainer and the SimpleTrainer with 3LC. When using the DefaultTrainer, 3LC will have access to the Detectron2 config object through the trainer. When using the SimpleTrainer, the config object must be passed to the hook constructors. 3LC will read the Detectron2 config to determine if the desired metrics collection scheme is valid and to determine derived metrics such as the number of iterations per epoch. If any required fields are missing, an error will be raised.
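As an illustration of a derived metric, the number of iterations per epoch can be estimated from the dataset size and the total batch size (cfg.SOLVER.IMS_PER_BATCH in a Detectron2 config). This is only a sketch: the helper name iterations_per_epoch is hypothetical, and rounding up is an assumption here, not a documented detail of how 3LC derives the value.

```python
import math


def iterations_per_epoch(dataset_size: int, ims_per_batch: int) -> int:
    """Estimate how many iterations are needed to see every sample once.

    Assumption: a final partial batch counts as a full iteration (round up).
    """
    if dataset_size <= 0 or ims_per_batch <= 0:
        raise ValueError("dataset_size and ims_per_batch must be positive")
    return math.ceil(dataset_size / ims_per_batch)
```

For example, dataset_size could be read from the dataset_size metadata entry described earlier and ims_per_batch from the solver section of the config.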
See the Detectron2 training tutorial for more details on implementing model training.
Training on S3
When training directly with image files on S3, the S3PathHandler must be registered with Detectron2. This can be done by calling the following at the start of the training script:
from detectron2.utils.file_io import PathManager
from iopath.common.s3 import S3PathHandler
PathManager.register_handler(S3PathHandler())
3LC automatically registers the S3PathHandler when the Detectron2 integration is used, so as long as the training script runs import tlc, you do not need to register the handler manually.
Note: If using several data loading processes, ensure that the handler is registered in all processes.
Collecting Metrics#
It is possible to collect any metrics that can be computed from the model inputs and outputs. See the metrics collection guide for details on how to implement custom metrics collectors.
We also provide the DetectronMetricsCollectionHook, which will collect any metrics added to the Detectron2 EventStorage by the trainer or the model.
The following code shows how to set up bounding box and Detectron2 metrics collection when using the Trainer abstraction.
from detectron2.data import MetadataCatalog
from tlc import BoundingBoxMetricsCollector
from tlc.integration.detectron2 import DetectronMetricsCollectionHook, MetricsCollectionHook

# trainer, session, TRAIN_DATASET_NAME and TEST_DATASET_NAME are defined elsewhere in the training script.
train_metadata = MetadataCatalog.get(TRAIN_DATASET_NAME)

metrics_collector = BoundingBoxMetricsCollector(
    model=trainer.model,
    classes=train_metadata.thing_classes,
    label_mapping=train_metadata.thing_dataset_id_to_contiguous_id,
    iou_threshold=0.5,
    compute_derived_metrics=True,
)

trainer.register_hooks(
    [
        # Collect on the training dataset every 250 iterations, starting at iteration 250.
        MetricsCollectionHook(
            dataset_name=TRAIN_DATASET_NAME,
            metrics_collectors=[metrics_collector],
            collection_start_iteration=250,
            collection_frequency=250,
        ),
        # Collect on the test dataset once, after training is complete.
        MetricsCollectionHook(
            dataset_name=TEST_DATASET_NAME,
            metrics_collectors=[metrics_collector],
            collect_metrics_after_train=True,
        ),
        # Collect the metrics Detectron2 writes to its EventStorage every 5 iterations.
        DetectronMetricsCollectionHook(
            run_url=session.run_url,
            collection_frequency=5,
        ),
    ]
)
This code will collect metrics for the training dataset every 250 iterations, starting at iteration 250, and for the test dataset after training is complete. In addition, Detectron2 built-in metrics will be collected every 5 iterations.
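To reason about how often metrics are collected for a given training length, the schedule described above can be enumerated in plain Python. This sketch assumes iterations are counted from 1 and that the last iteration is included when it falls on the schedule; the hook's exact boundary behavior is not specified here, and the helper name collection_iterations is hypothetical.

```python
def collection_iterations(max_iter: int, start: int, frequency: int) -> list[int]:
    """List the iterations at which periodic collection would run.

    Assumption: collection runs at start, start + frequency, and so on,
    up to and including max_iter.
    """
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return list(range(start, max_iter + 1, frequency))


# Schedule for a 1000-iteration run with the training-set settings above.
train_schedule = collection_iterations(max_iter=1000, start=250, frequency=250)
```

Under these assumptions, a 1000-iteration run collects training-set metrics four times, which helps when estimating the overhead that metrics collection adds to training.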