Create Table from PyTorch Dataset

Convert a PyTorch Dataset into a 3LC Table using the built-in conversion method.

Many ML practitioners already have PyTorch datasets for their projects. Converting them to 3LC Tables allows you to leverage 3LC’s data analysis and visualization capabilities while maintaining compatibility with your existing PyTorch training pipelines.

We use tlc.Table.from_torch_dataset() to convert any map-style PyTorch Dataset to a 3LC Table. The method iterates through the dataset and converts each sample to 3LC’s format. It can either infer the schema from the first sample or use a schema you provide.

Note that this method requires a map-style dataset (one that supports indexing and has a known length), not an iterable-style one. It is not suitable for stochastic or infinite datasets, because creating the table requires one complete pass over the dataset.
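
The map-style requirement can be illustrated with a minimal sketch. The names SquaresDataset, is_map_style, and endless_stream are hypothetical and are not part of 3LC or PyTorch. The toy class deliberately avoids importing torch so that it runs anywhere; in a real project you would normally subclass torch.utils.data.Dataset, which follows the same __getitem__/__len__ protocol.

```python
class SquaresDataset:
    """Toy map-style dataset (hypothetical): sample i is the pair (i, i * i)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # A known, finite length is what makes a complete pass possible.
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index, index * index


def is_map_style(dataset):
    """Hypothetical helper: True if the object supports len() and indexing."""
    return hasattr(dataset, "__len__") and hasattr(dataset, "__getitem__")


def endless_stream():
    """Stand-in for an iterable-style, infinite dataset."""
    i = 0
    while True:
        yield i, i * i
        i += 1


squares = SquaresDataset(4)

# One complete pass over a map-style dataset: index 0 .. len - 1.
samples = [squares[i] for i in range(len(squares))]

print(is_map_style(squares))           # the toy dataset qualifies
print(is_map_style(endless_stream()))  # a generator has neither len() nor indexing
```

A check like this is only a rough guard under the assumptions above. The point is that a table can only be built from a dataset whose samples can be enumerated from start to finish.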

Project setup

[ ]:

TMP_PATH = "../../transient_data"  # A folder to store temporary data (the downloaded CIFAR-10 archive)

Install dependencies

[ ]:

%pip install -q 3lc

Imports

[ ]:

import tlc
from torchvision.datasets import CIFAR10

Create Table

[ ]:

# Download CIFAR-10 once; the same archive contains both the train and test splits.
train_dataset = CIFAR10(TMP_PATH, train=True, download=True)
val_dataset = CIFAR10(TMP_PATH, train=False)

# The "structure" of the table is a representation of an individual sample in the dataset.
# Here, we define the structure of the table to be a tuple containing an image and a label.
structure = (tlc.PILImage("Image"), tlc.CategoricalLabel("Label", classes=train_dataset.classes))

train_table = tlc.Table.from_torch_dataset(
    train_dataset,
    structure=structure,
    project_name="3LC Tutorials - CIFAR-10",
    dataset_name="CIFAR-10-train",
    table_name="initial",
)

val_table = tlc.Table.from_torch_dataset(
    val_dataset,
    structure=structure,
    project_name="3LC Tutorials - CIFAR-10",
    dataset_name="CIFAR-10-val",
    table_name="initial",
)