Sample Transforms

In most training scripts, some kind of transform is applied to the data before it is consumed by the model — for example, in computer vision, augmentations and conversion to tensors. 3LC supports this through Table.with_transform(), which returns a lightweight TableView wrapping the source Table.

Provide a Python function that takes a full sample and returns the transformed sample. The transform is applied on-the-fly when __getitem__ is called on the view; the underlying Table, its Sample View, and the data on disk are never modified.
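The lazy behaviour described above can be illustrated without 3LC. The following is a minimal sketch, not 3LC's implementation: SketchView and to_upper are hypothetical names, and a plain Python list stands in for the Table.

```python
from typing import Any, Callable, Sequence


class SketchView:
    """Hypothetical stand-in for a transformed view (not the 3LC TableView)."""

    def __init__(self, source: Sequence[Any], transform: Callable[[Any], Any]) -> None:
        self.source = source
        self.transform = transform

    def __len__(self) -> int:
        return len(self.source)

    def __getitem__(self, index: int) -> Any:
        # The transform runs here, at access time, on a single sample.
        return self.transform(self.source[index])


def to_upper(sample: dict) -> dict:
    # Return a new dict instead of mutating the incoming sample.
    return {"text": sample["text"].upper()}


rows = [{"text": "cat"}, {"text": "dog"}]
view = SketchView(rows, to_upper)

print(view[0])  # {'text': 'CAT'}
print(rows[0])  # {'text': 'cat'}; the source rows are untouched
```

Because nothing is precomputed or cached in this sketch, a random transform would yield a different result on every access, which matches how the augmentation example on this page is used.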

Each call to Table.with_transform() returns a fresh TableView instance — two calls with the same callable do not produce the same Python object. Hoist the view (view = table.with_transform(fn)) when you need a stable reference across calls.

If your view will be consumed by a torch.utils.data.DataLoader with num_workers > 0, the transform must be picklable: a top-level function or importable callable, not a lambda or local closure.
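Whether a given callable meets this requirement can be checked up front with the standard pickle module. The sketch below is independent of 3LC and PyTorch; is_picklable, halve_value, and make_scaler are hypothetical helper names used only for illustration.

```python
import pickle
from typing import Any, Callable


def is_picklable(fn: Callable[..., Any]) -> bool:
    """Return True if pickle can serialize `fn`."""
    try:
        pickle.dumps(fn)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def halve_value(sample: dict) -> dict:
    # Module-level function: pickle stores only its module and qualified name.
    return {"value": sample["value"] / 2}


def make_scaler(factor: float) -> Callable[[dict], dict]:
    # The returned function is a local closure and cannot be looked up by name.
    def scale(sample: dict) -> dict:
        return {"value": sample["value"] * factor}

    return scale


print("top-level function:", is_picklable(halve_value))
print("lambda:", is_picklable(lambda sample: sample))
print("local closure:", is_picklable(make_scaler(2.0)))
```

When a transform needs parameters, functools.partial applied to a top-level function, or an instance of a top-level class that defines __call__, are common picklable alternatives to a closure.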

Example

The following simple example shows how Table.with_transform() can be used to apply torchvision transforms to the data.

[1]:

from PIL import Image

import tlc

table = tlc.Table.from_torch_dataset(
    dataset=[{"image": Image.open("3lc-logo.png")}],
    schema={"image": tlc.schemas.ImageSchema()},
    project_name="With Transform Project",
    if_exists="overwrite",
)

# The Sample view of the Table presents the PIL Image directly
table[0]["image"]

../../_images/user-guide_tables_transforms_1_0.png

[2]:

from torchvision import transforms

# ToTensor comes first, so the random transforms below operate on tensors
train_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomRotation(degrees=45),
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.5, 1.5), shear=10),
    transforms.ToPILImage(),  # Back to PIL for visualization
])

def train_transform(sample):
    # Return a new sample dict; the sample stored in the Table is not modified
    return {"image": train_transforms(sample["image"])}

train_view = table.with_transform(train_transform)

def plot_augmented_samples(images):
    """Paste the given PIL images side by side into a single image."""
    widths, heights = zip(*(im.size for im in images))
    total_width = sum(widths)
    max_height = max(heights)

    concat_img = Image.new("RGB", (total_width, max_height))
    x_offset = 0
    for im in images:
        concat_img.paste(im, (x_offset, 0))
        x_offset += im.size[0]

    return concat_img

plot_augmented_samples([train_view[0]["image"] for _ in range(10)])

[2]:

../../_images/user-guide_tables_transforms_2_0.png

Different transforms for training and metrics collection

It is common to use heavier augmentation during training and lighter (or no) augmentation during metrics collection. Build one view per pipeline and feed each to its own consumer — the underlying Table is shared, so the data is loaded only once.

Pass the metrics-collection view directly to tlc.collect_metrics(); it accepts any MapDataset (anything implementing __len__ and __getitem__) and uses the underlying Table.url to associate the collected metrics with the source Table.
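The "anything implementing __len__ and __getitem__" requirement is a structural one, and it can be expressed in plain Python with a typing.Protocol. The sketch below is an illustration only: MapDatasetLike and Squares are hypothetical names, and this is not the type that 3LC itself defines or checks against.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MapDatasetLike(Protocol):
    """Hypothetical structural type: anything with __len__ and __getitem__."""

    def __len__(self) -> int: ...

    def __getitem__(self, index: int) -> Any: ...


class Squares:
    """A tiny map-style dataset that computes its samples on access."""

    def __len__(self) -> int:
        return 4

    def __getitem__(self, index: int) -> dict:
        return {"value": index * index}


print(isinstance(Squares(), MapDatasetLike))        # True
print(isinstance([1, 2, 3], MapDatasetLike))        # True: lists qualify too
print(isinstance(iter([1, 2, 3]), MapDatasetLike))  # False: no __len__ or __getitem__
```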

[3]:

# No random augmentation here: metrics are collected on the unaugmented image
collection_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.ToPILImage(),  # Back to PIL for visualization
])

def collection_transform(sample):
    return {"image": collection_transforms(sample["image"])}

metrics_view = table.with_transform(collection_transform)

plot_augmented_samples([metrics_view[0]["image"] for _ in range(10)])

[3]:

../../_images/user-guide_tables_transforms_4_0.png

Composing transforms

Transforms compose by chaining .with_transform() calls. Each call wraps the previous view, so transforms are applied in the order they were attached: the inner transform runs first, the outer transform last.

[4]:

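import torch

def to_tensor(sample):
    # PIL image -> float tensor of shape (C, H, W)
    return {"image": transforms.ToTensor()(sample["image"])}

def add_noise(sample):
    # Expects a tensor, so it must be attached after to_tensor
    return {"image": sample["image"] + 0.05 * torch.randn_like(sample["image"])}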

noisy_tensor_view = table.with_transform(to_tensor).with_transform(add_noise)
noisy_tensor_view[0]["image"].shape

[4]:

torch.Size([3, 89, 83])
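Attachment order matters whenever two transforms do not commute. As a plain-Python sketch of the rule, the snippet below models table.with_transform(a).with_transform(b) as b(a(sample)); apply_in_attachment_order, add_one, and double are hypothetical names and no 3LC code is involved.

```python
from functools import reduce
from typing import Any, Callable

Transform = Callable[[Any], Any]


def apply_in_attachment_order(sample: Any, *transforms: Transform) -> Any:
    """Apply transforms left to right: the first one attached runs first."""
    return reduce(lambda current, fn: fn(current), transforms, sample)


def add_one(sample: dict) -> dict:
    return {"value": sample["value"] + 1}


def double(sample: dict) -> dict:
    return {"value": sample["value"] * 2}


sample = {"value": 3}
print(apply_in_attachment_order(sample, add_one, double))  # {'value': 8}
print(apply_in_attachment_order(sample, double, add_one))  # {'value': 7}
```

The same reasoning applies to the cell above: add_noise calls torch.randn_like, which requires a tensor input, so it has to be attached after to_tensor.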

View identity

A TableView is not a Table: it has no schema, no persistence, and no object-registry identity of its own. Its url and source always resolve to the underlying Table, regardless of how many transforms have been chained on top. Use view.source (e.g. when constructing a sampler) to access the root Table.

[5]:

assert noisy_tensor_view.source is table
assert noisy_tensor_view.url == table.url
len(noisy_tensor_view) == len(table)

[5]:

True