Register Datasets#
The first step in integrating your training code with 3LC is to register your datasets. This can be accomplished in a number of ways, depending on the format of your data and the framework you are integrating with.
Importer Tables#
The most direct way of registering a dataset is to manually create a Table object using one of the “importer” table types.
Importers are currently available for CSV files, Parquet files, COCO-format datasets, pandas DataFrames, and Python dictionaries.
import tlc
## Assuming data.csv is in the same directory as this notebook
csv_table = tlc.Table.from_csv("./data.csv", table_name="my-csv-table")
## Assuming data.parquet is in the same directory as this notebook
parquet_table = tlc.Table.from_parquet("./data.parquet", table_name="my-parquet-table")
## Assuming annotations.json and images/ are in the same directory as this notebook
coco_table = tlc.Table.from_coco(
    annotations_file="./annotations.json",
    image_folder="./images",
    table_name="my-coco-table",
)
## Assuming df is a pandas DataFrame
df_table = tlc.Table.from_pandas(df, table_name="my-pandas-table")
## Assuming data is a dictionary
dict_table = tlc.Table.from_dict(data, table_name="my-dict-table")
The above code creates Table objects for each of the input types. All the Table.from_* methods provide a set of common parameters for controlling the destination URL, schema and sample-view information, the behavior if the table already exists, configuration of default columns, and more.
See 3LC Project Structure for more information on how to control the URL of the created Table object.
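The importer example assumes that inputs such as data.csv and data already exist. The following is a minimal sketch of how they might be prepared using only the Python standard library. The column names, the values, and the assumption that data maps column names to equal-length lists are illustrative and not taken from the 3LC documentation. The tlc calls are left as comments so that the sketch runs without 3LC installed.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column-oriented data: each key is a column name and each
# value is a list with one entry per row (an assumption, see above).
data = {
    "id": [0, 1, 2],
    "label": ["cat", "dog", "cat"],
}

# All columns must have the same length to form a rectangular table.
row_count = len(next(iter(data.values())))
assert all(len(column) == row_count for column in data.values())

# Write the same data as a CSV file with a header row.
csv_path = Path(tempfile.mkdtemp()) / "data.csv"
with csv_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(data.keys())
    writer.writerows(zip(*data.values()))

# With 3LC installed, the inputs could then be registered as in the
# importer example:
# import tlc
# dict_table = tlc.Table.from_dict(data, table_name="my-dict-table")
# csv_table = tlc.Table.from_csv(str(csv_path), table_name="my-csv-table")
```

Checking that all columns have the same length before registering helps catch malformed input early, independent of how the importer itself validates data.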
From PyTorch Dataset#
To register a PyTorch Dataset as a Table, call Table.from_torch_dataset.
Under the hood, this creates a TableFromTorchDataset, which is a subclass of Table.
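As a rough illustration, a PyTorch map-style dataset is an object that implements __len__ and __getitem__. The SquaresDataset class below is hypothetical and deliberately avoids importing torch so that it runs anywhere; in real code it would normally subclass torch.utils.data.Dataset. The commented call passes only the table_name parameter seen in the importer examples; any other arguments it may accept or require are not covered here.

```python
class SquaresDataset:
    """Hypothetical map-style dataset yielding (x, x squared) pairs."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __len__(self) -> int:
        # Number of samples in the dataset.
        return self.size

    def __getitem__(self, index: int) -> tuple[int, int]:
        # Raise IndexError for out-of-range indices, as sequences do.
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index, index * index


dataset = SquaresDataset(4)
samples = [dataset[i] for i in range(len(dataset))]

# With 3LC and PyTorch installed, registration might look like this
# (assumed usage, not verified against the 3LC API reference):
# import tlc
# table = tlc.Table.from_torch_dataset(dataset, table_name="my-torch-table")
```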
YOLO Format#
For details on how to register datasets in the YOLO format, see the YOLOv5 or YOLOv8 integration documentation.
COCO Format with Detectron2#
When integrating with the detectron2 framework, the tlc Python package provides a drop-in replacement for the register_coco_instances function from detectron2. See the detectron2 integration documentation for more details.
Hugging Face Datasets#
To use datasets from Hugging Face 🤗 Datasets, tlc provides an alternative to the datasets.load_dataset function: Table.from_hugging_face.
See the Hugging Face integration documentation for more details.