tlc.sample_types

Built-in and custom sample types for 3LC tables.

Sample types describe how a Python value (an image, a tensor, a list of bounding boxes, …) is converted to and from the row form a tlc.Table stores. They are the counterpart of tlc.schemas: schemas describe the on-disk shape of a column; sample types describe the user-facing Python value for that column.

The core protocol lives in tlc.sample_types.SampleType (and its subclass tlc.sample_types.ExternalSampleType for values stored outside the row). Register custom sample types via tlc.sample_types.register_sample_type().

Package Contents

Classes

Class – Description
BoundingBoxes2DSampleType – SampleType for 2D bounding box instances.
BoundingBoxes3DSampleType – SampleType for 3D bounding box instances.
EncodedSample – A pre-encoded binary sample with its file extension.
ExternalNumpyArraySampleType – External transform for numpy arrays stored as .npy files.
ExternalSampleType – Base class for sample types that store data externally.
ExternalTorchTensorSampleType – External transform for torch tensors stored as .npy files (falls back to .pt on read).
ExternalizationContext – Context passed to externalize().
Geometry2DSampleType – SampleType for generic 2D geometry instances.
Geometry3DSampleType – SampleType for generic 3D geometry instances.
Hidden – SampleType for hidden columns that should be excluded from sample view.
Identity – Pass-through transform that performs no conversion.
JpegImageSampleType – PIL.Image stored as JPEG. See PILImageSampleType for the contract.
Keypoints2DSampleType – SampleType for 2D keypoint instances.
LargeBytes – SampleType for large binary data stored as external files.
NumpyArraySampleType – Inline transform for numpy arrays.
OrientedBoundingBoxes2DSampleType – SampleType for 2D oriented bounding box instances.
OrientedBoundingBoxes3DSampleType – SampleType for 3D oriented bounding box instances.
PILImageSampleType – SampleType for PIL.Image stored as external PNG files.
Path – SampleType that converts URL paths to absolute paths.
SampleType – Base class for inline sample types.
SampleTypeInfo – Structured metadata for a registered sample type.
SampleTypeRegistry – Registry for sample types.
SegmentationMasksSampleType – SampleType for mask-based instance segmentation and RLE storage.
SegmentationPolygonsSampleType – SampleType for polygon-based instance segmentation, stored as RLE on the wire.
TorchTensorSampleType – Inline transform for torch tensors.
ValidationError – A validation error or warning found during data validation.
WebpImageSampleType – PIL.Image stored as WEBP. See PILImageSampleType for the contract.

Functions

Function – Description
get_sample_types – List all registered sample type names.
register_sample_type – Decorator to register a SampleType subclass by name.

Data

API

class BoundingBoxes2DSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for 2D bounding box instances.

Converts between tlc.data_types.bounding_boxes.BoundingBoxes2D dataclass (sample form) and the hierarchical dict row format used by 3LC Tables.

Inline:

  • to_row(): BoundingBoxes2D -> dict with instances/additional_data

  • from_row(): dict with instances/additional_data -> BoundingBoxes2D

accepts(
value: Any,
) bool

Check if the value is a BoundingBoxes2D dataclass instance.

Parameters:

value – The value to check.

Returns:

True if the value is a BoundingBoxes2D dataclass instance.

from_row(
data: Any,
) Any

Create a BoundingBoxes2D from a 3LC Table row dict.

Parameters:

data – A dictionary representing a single row from a 3LC Table with bounding box data.

Returns:

A BoundingBoxes2D object.

Raises:

ValueError – If the row does not contain instances or if array lengths are inconsistent.

to_row(
sample: Any,
) Mapping[str, Any]

Convert a BoundingBoxes2D to the internal 3LC Table row format.

Parameters:

sample – A BoundingBoxes2D object.

Returns:

Wire-format dict with instances and instances_additional_data keys.

validate_sample(
sample: Any,
) list[ValidationError]

Validate a BoundingBoxes2D sample.

Checks that the sample has the expected type, array shapes, and consistent instance counts.

Parameters:

sample – The sample to validate.

Returns:

A list of validation errors.

class BoundingBoxes3DSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for 3D bounding box instances.

Converts between tlc.data_types.bounding_boxes.BoundingBoxes3D dataclass (sample form) and the hierarchical dict row format used by 3LC Tables.

Inline:

  • to_row(): BoundingBoxes3D -> dict with instances/additional_data

  • from_row(): dict with instances/additional_data -> BoundingBoxes3D

accepts(
value: Any,
) bool

Check if the value is a BoundingBoxes3D dataclass instance.

Parameters:

value – The value to check.

Returns:

True if the value is a BoundingBoxes3D dataclass instance.

from_row(
data: Any,
) Any

Create a BoundingBoxes3D from a 3LC Table row dict.

Parameters:

data – A dictionary representing a single row from a 3LC Table with bounding box data.

Returns:

A BoundingBoxes3D object.

Raises:

ValueError – If the row does not contain instances or if array lengths are inconsistent.

to_row(
sample: Any,
) Mapping[str, Any]

Convert a BoundingBoxes3D to the internal 3LC Table row format.

Parameters:

sample – A BoundingBoxes3D object.

Returns:

Wire-format dict with instances and instances_additional_data keys.

class EncodedSample

A pre-encoded binary sample with its file extension.

Use this to hand pre-encoded bytes to the write pipeline without round-tripping through a Python decode/re-encode cycle. Common cases:

  • Hugging Face datasets with datasets.Image(decode=False) yield {"bytes": b"...", "path": "image.jpg"}; an adapter can wrap that as EncodedSample(bytes=..., extension=".jpg").

  • Any source that produces format-known bytes (scraped, user-supplied, etc.).

An ExternalSampleType receiving an EncodedSample writes the bytes verbatim to the allocated URL, preserving the original format.
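As a rough illustration of the Hugging Face case above, the following sketch wraps a {"bytes", "path"} record as a bytes-plus-extension pair without decoding it. EncodedSampleSketch and wrap_hf_image are hypothetical names defined here so the snippet runs without 3LC installed; with the real library an adapter would construct EncodedSample(bytes=..., extension=...) instead. Falling back to ".bin" when the path has no suffix is an assumption of this sketch, not documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedSampleSketch:
    """Hypothetical stand-in mirroring the two documented fields of EncodedSample."""

    bytes: bytes
    extension: str  # includes the leading dot, e.g. ".jpg"


def wrap_hf_image(record: dict) -> EncodedSampleSketch:
    """Wrap a datasets.Image(decode=False)-style record without decoding it."""
    # os.path.splitext keeps the leading dot, matching the documented convention.
    _, ext = os.path.splitext(record.get("path") or "")
    # Assumption: fall back to ".bin" when the record carries no usable suffix.
    return EncodedSampleSketch(bytes=record["bytes"], extension=ext.lower() or ".bin")


encoded = wrap_hf_image({"bytes": b"\xff\xd8\xff\xe0", "path": "image.JPG"})
print(encoded.extension)  # .jpg
```

The payload is carried through untouched; only the extension is derived from the path.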

bytes: bytes = None

The encoded byte payload.

extension: str = None

File extension including the leading dot (e.g. ".png", ".jpg").

class ExternalNumpyArraySampleType

Bases: tlc.sample_types._sample_type.ExternalSampleType

External transform for numpy arrays stored as .npy files.

accepts(
value: Any,
) bool

file_extension = ".npy"

load(
url: Url,
) ndarray

save(
sample: ndarray,
url: Url,
) None

validate_sample(
sample: Any,
) list[ValidationError]

class ExternalSampleType

Bases: tlc.sample_types._sample_type.SampleType

Base class for sample types that store data externally.

Subclass this when data should be stored outside the table row — as individual files, objects, or resources accessed via tlc.Url. This covers local files, S3 objects, GCS blobs, or any custom URL adapter target.

Override save() and load(), and set file_extension to the appropriate suffix (e.g., ".npy", ".png").

Use the register_sample_type() decorator to register custom types.

accepts(
value: Any,
) bool

Check whether value is a live sample that should be externalized.

External sample types should override this method. The write pipeline uses accepts() to distinguish sample-form values (live Python objects to be externalized via save()) from row-form values (pre-externalized URL strings, returned unchanged). A subclass that leaves the default will treat every value as row-form and silently skip externalization — almost never what you want.

A typical implementation is an isinstance() check against the native sample type, optionally also accepting EncodedSample and/or raw bytes when the default externalize() pre-encoded-bytes fast paths should apply.

Parameters:

value – The value to check.

Returns:

True if value is in sample form and should be externalized.
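A minimal sketch of the isinstance()-style check described above, written in plain Python so it runs without 3LC. FloatListSampleTypeSketch and its list-of-numbers sample form are hypothetical; a real subclass would derive from ExternalSampleType and test against its own native type.

```python
from typing import Any


class FloatListSampleTypeSketch:
    """Hypothetical external sample type whose sample form is a list of numbers."""

    def accepts(self, value: Any) -> bool:
        # Raw bytes are accepted so a pre-encoded fast path could apply.
        if isinstance(value, bytes):
            return True
        # A live sample that still has to be externalized.
        if isinstance(value, list):
            return all(isinstance(v, (int, float)) for v in value)
        # Anything else, notably an already-externalized URL string, is row form.
        return False


st = FloatListSampleTypeSketch()
print(st.accepts([0.5, 1.5]))         # True
print(st.accepts("images/0000.npy"))  # False
```

Returning False for strings is what lets pre-externalized URL values pass through unchanged.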

externalize(
sample: Any,
ctx: ExternalizationContext,
) str

Externalize a sample to a file and return the backing URL.

This is where the write pipeline enters an external sample type. The default implementation handles three cases in order:

  1. Pre-encoded bytes (EncodedSample or raw bytes) — written verbatim to a new URL, skipping any decode/encode round-trip. EncodedSample supplies its own extension; raw bytes use file_extension.

  2. Backing file known (via source_url()) — the existing URL is returned as-is; no copy is made.

  3. Encode from scratch — a new URL is allocated and save() is called.

The returned URL string may be absolute; the writer pipeline normalizes every URL leaf (including ExternalSampleType leaves) to a table-relative string as a final step. Subclasses need not relativize themselves.

Subclasses rarely need to override this. The standard extension points are save(), load(), accepts(), and optionally source_url(). Override externalize() itself only when you need unusual storage logic (multiple files per sample, content-addressed naming, conditional writes).

Parameters:
  • sample – Value to externalize. May be a native Python object (e.g., PIL.Image), raw bytes, or EncodedSample.

  • ctx – The externalization context carrying the URL allocator, table URL, schema path, and relativization depth.

Returns:

URL string pointing at the externalized file. Typically absolute; the pipeline relativizes it later.
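The three-case order described for the default externalize() can be sketched as follows. This is an illustration only, not the library's code: ExternalSketch, EncodedSketch, the in-memory store, and the mem:// URLs are all hypothetical, and the URL allocator is folded into the class instead of arriving through an ExternalizationContext.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, Optional


@dataclass
class EncodedSketch:
    """Hypothetical stand-in for EncodedSample."""

    payload: bytes
    extension: str


class ExternalSketch:
    """Hypothetical stand-in for an external sample type, backed by a dict."""

    file_extension = ".txt"

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}  # url -> bytes, replaces real storage
        self._next = 0

    def allocate_url(self, extension: str) -> str:
        url = f"mem://column/{self._next}{extension}"
        self._next += 1
        return url

    def source_url(self, sample: Any) -> Optional[str]:
        return getattr(sample, "backing_url", None)

    def save(self, sample: Any, url: str) -> None:
        self.store[url] = repr(sample).encode("utf-8")

    def externalize(self, sample: Any) -> str:
        # 1. Pre-encoded bytes: written verbatim, no decode/encode round trip.
        if isinstance(sample, EncodedSketch):
            url = self.allocate_url(sample.extension)
            self.store[url] = sample.payload
            return url
        if isinstance(sample, bytes):
            url = self.allocate_url(self.file_extension)
            self.store[url] = sample
            return url
        # 2. Backing file known: return the existing URL, copy nothing.
        existing = self.source_url(sample)
        if existing is not None:
            return existing
        # 3. Encode from scratch into a newly allocated URL.
        url = self.allocate_url(self.file_extension)
        self.save(sample, url)
        return url


st = ExternalSketch()
url_encoded = st.externalize(EncodedSketch(b"\xff\xd8\xff", ".jpg"))
url_raw = st.externalize(b"raw")
url_backed = st.externalize(SimpleNamespace(backing_url="file:///data/a.txt"))
url_fresh = st.externalize([1, 2, 3])
print(url_encoded, url_raw, url_backed, url_fresh)
```

Only the first and third cases allocate a URL; the file-backed sample consumes no counter value and writes nothing.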

file_extension: str = <Multiline-String>

File extension for externally stored data (e.g., ".npy", ".png").

abstract load(
url: Url,
) Any

Load a sample from an external URL.

The implementation should read bytes via url.read_bytes() and deserialize to a Python object.

Parameters:

url – The source URL to read from (works with any storage backend).

Returns:

The Python object in sample form.

abstract save(
sample: Any,
url: Url,
) None

Write a sample to an external URL.

The implementation should serialize the sample and write bytes via url.write_bytes().

Parameters:
  • sample – The Python object in sample form.

  • url – The target URL to write to (works with local files, S3, GCS, etc.).

source_url(
sample: Any,
) str | None

Return the URL of an existing file backing this sample, if any.

Optional optimization hook. When the sample is already backed by a file on disk (or any URL-addressable location), overriding this to return that URL lets externalize() reference the existing file instead of calling save() to write a copy.

Typical use cases: PIL images loaded from disk expose filename and _tlc_url attributes; tensors loaded from .npy files can carry their source path; any sample produced by load() can stash the URL it was loaded from.

Parameters:

sample – The Python object in sample form.

Returns:

A URL or path string for an existing file that already holds this sample’s content, or None if no backing file exists (the default, which causes externalize to call save).

class ExternalTorchTensorSampleType

Bases: tlc.sample_types._sample_type.ExternalSampleType

External transform for torch tensors stored as .npy files (falls back to .pt on read).

accepts(
value: Any,
) bool

file_extension = ".npy"

load(
url: Url,
) Tensor

save(
sample: Tensor | ndarray,
url: Url,
) None

validate_sample(
sample: Any,
) list[ValidationError]

class ExternalizationContext

Context passed to externalize().

Carries the minimal state a sample type needs to externalize one value: the schema path this leaf sits at, and a URL allocator for new files. URL relativization is the writer pipeline’s job — sample types just return an absolute URL.

Variables:

schema_path – Tuple path to the leaf in the schema tree. E.g. ("image",) for a top-level column, or ("instances", "mask") for a column nested inside a composite.

allocate_url(
extension: str,
) Url

Allocate the next file URL for the column at this leaf path.

Files are organized per-leaf-path: nested paths are joined with dots to form a flat subdirectory name (e.g., instances.mask). Each path has its own counter, so URLs are of the form <bulk_data_url>/<joined-path>/<counter><ext>.

Parameters:

extension – File extension including the leading dot (e.g. ".png").

Returns:

Absolute URL for the next file.

descend(
segment: str,
) ExternalizationContext

Return a new context whose schema path has segment appended.

Used when recursively descending into a composite schema: each child leaf sees a context whose schema_path reflects its position in the tree, so allocated URLs land in the right per-leaf subdirectory.

Parameters:

segment – The child key to append to the current path.

Returns:

A new context with the extended path; the underlying URL allocator is shared by reference.
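The per-leaf URL layout and the shared allocator can be pictured with the sketch below. UrlAllocatorSketch and ContextSketch are hypothetical stand-ins; the six-digit zero-padded counter starting at 0 and the plain-string return value (the real method returns a Url) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class UrlAllocatorSketch:
    """Hypothetical allocator keeping one counter per leaf path."""

    bulk_data_url: str
    counters: Dict[str, int] = field(default_factory=dict)

    def next_url(self, schema_path: Tuple[str, ...], extension: str) -> str:
        joined = ".".join(schema_path)  # ("instances", "mask") -> "instances.mask"
        index = self.counters.get(joined, 0)
        self.counters[joined] = index + 1
        # Assumption: counter width and starting value are not specified by the docs.
        return f"{self.bulk_data_url}/{joined}/{index:06d}{extension}"


@dataclass(frozen=True)
class ContextSketch:
    """Hypothetical stand-in for ExternalizationContext."""

    allocator: UrlAllocatorSketch
    schema_path: Tuple[str, ...] = ()

    def allocate_url(self, extension: str) -> str:
        return self.allocator.next_url(self.schema_path, extension)

    def descend(self, segment: str) -> "ContextSketch":
        # New path, same allocator object (shared by reference).
        return ContextSketch(self.allocator, self.schema_path + (segment,))


root = ContextSketch(UrlAllocatorSketch("s3://bucket/table/bulk_data"))
image_ctx = root.descend("image")
mask_ctx = root.descend("instances").descend("mask")

url_a = image_ctx.allocate_url(".png")
url_b = mask_ctx.allocate_url(".png")
url_c = image_ctx.allocate_url(".png")
print(url_a)  # s3://bucket/table/bulk_data/image/000000.png
print(url_b)  # s3://bucket/table/bulk_data/instances.mask/000000.png
print(url_c)  # s3://bucket/table/bulk_data/image/000001.png
```

Because every derived context shares one allocator, counters stay consistent no matter which context requests the next URL.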

schema_path: tuple[str, ...] = None

class Geometry2DSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for generic 2D geometry instances.

Converts between tlc.data_types.geometries.Geometry2D dataclass (sample form) and the hierarchical dict row format used by 3LC Tables.

Also accepts any tlc.data_types.Geometry2DBase subclass during schema inference via accepts().

accepts(
value: Any,
) bool

Check if the value is a Geometry2DBase subclass instance.

Parameters:

value – The value to check.

Returns:

True if the value is a Geometry2DBase instance.

from_row(
data: Any,
) Any

Create a Geometry2D from a 3LC Table row dict.

Parameters:

data – A dictionary representing a single row from a 3LC Table with geometry data.

Returns:

A Geometry2D object.

to_row(
sample: Any,
) dict[str, Any]

Convert a Geometry2D to the internal 3LC Table row format.

Parameters:

sample – A Geometry2D object.

Returns:

A dictionary with the structure expected by 3LC Tables.

class Geometry3DSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for generic 3D geometry instances.

Converts between tlc.data_types.geometries.Geometry3D dataclass (sample form) and the hierarchical dict row format used by 3LC Tables.

Also accepts any tlc.data_types.Geometry3DBase subclass during schema inference via accepts().

accepts(
value: Any,
) bool

Check if the value is a Geometry3DBase subclass instance.

Parameters:

value – The value to check.

Returns:

True if the value is a Geometry3DBase instance.

from_row(
data: Any,
) Any

Create a Geometry3D from a 3LC Table row dict.

Parameters:

data – A dictionary representing a single row from a 3LC Table with geometry data.

Returns:

A Geometry3D object.

to_row(
sample: Any,
) dict[str, Any]

Convert a Geometry3D to the internal 3LC Table row format.

Parameters:

sample – A Geometry3D object.

Returns:

A dictionary with the structure expected by 3LC Tables.

class Hidden

Bases: tlc.sample_types._sample_type.Identity

SampleType for hidden columns that should be excluded from sample view.

Hidden columns are present in row view but absent from sample view. to_row() and from_row() are inherited from Identity (pass-through) but should never be called in practice since hidden columns are filtered out before transform application.
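A small sketch of how a flag like is_included_in_sample could drive the row-view versus sample-view split. IdentitySketch, HiddenSketch, and sample_view are hypothetical and only mimic the documented behavior; they are not the library's filtering code.

```python
class IdentitySketch:
    """Hypothetical pass-through type; included in sample view."""

    is_included_in_sample = True

    def from_row(self, data):
        return data


class HiddenSketch(IdentitySketch):
    """Hypothetical hidden type; present in rows, absent from samples."""

    is_included_in_sample = False


def sample_view(row, column_types):
    # Hidden columns are dropped before any from_row() call happens.
    return {
        name: column_types[name].from_row(value)
        for name, value in row.items()
        if column_types[name].is_included_in_sample
    }


column_types = {"label": IdentitySketch(), "weight": HiddenSketch()}
row = {"label": 3, "weight": 0.25}
print(sample_view(row, column_types))  # {'label': 3}
```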

is_included_in_sample = False

class Identity

Bases: tlc.sample_types._sample_type.SampleType

Pass-through transform that performs no conversion.

Used as the default when a schema has no explicit transform configured. to_row() and from_row() return the value unchanged.

class JpegImageSampleType

Bases: tlc.sample_types._image.PILImageSampleType

PIL.Image stored as JPEG. See PILImageSampleType for the contract.

file_extension = ".jpeg"

class Keypoints2DSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for 2D keypoint instances.

Converts between tlc.data_types.keypoints.Keypoints2D dataclass (sample form) and the hierarchical dict row format used by 3LC Tables.

Inline:

  • to_row(): Keypoints2D -> dict with instances/additional_data

  • from_row(): dict with instances/additional_data -> Keypoints2D

accepts(
value: Any,
) bool

Check if the value is a Keypoints2D dataclass instance.

Parameters:

value – The value to check.

Returns:

True if the value is a Keypoints2D dataclass instance.

from_row(
data: Any,
) Any

Create a Keypoints2D from a 3LC Table row dict.

Parameters:

data – A dictionary representing a single row from a 3LC Table with keypoint data.

Returns:

A Keypoints2D object.

Raises:

ValueError – If the row does not contain instances or if array lengths are inconsistent.

to_row(
sample: Any,
) Mapping[str, Any]

Convert a Keypoints2D to the internal 3LC Table row format.

Parameters:

sample – A Keypoints2D object.

Returns:

Wire-format dict with instances and instances_additional_data keys.

Raises:

ValueError – If both visibility and confidence arrays are present (only one is supported).

LEGACY_SAMPLE_TYPE_MAPPING: dict[str, dict[str, Any]] = None

class LargeBytes

Bases: tlc.sample_types._sample_type.ExternalSampleType

SampleType for large binary data stored as external files.

accepts(
value: Any,
) bool

Check if the value is bytes or EncodedSample.

EncodedSample is accepted so the default externalize() path can honor an alternative extension supplied by the caller.

Parameters:

value – The value to check.

Returns:

True if the value is bytes or EncodedSample.

file_extension = ".bin"

from_row(
data: bytes,
) bytes

Pass through bytes.

Parameters:

data – The bytes data.

Returns:

The same bytes data.

load(
url: Url,
) bytes

Read bytes from a URL.

Parameters:

url – The source URL.

Returns:

The file contents as bytes.

save(
data: bytes,
url: Url,
) None

Write bytes to a URL.

Parameters:
  • data – The bytes data.

  • url – The target URL.

class NumpyArraySampleType

Bases: tlc.sample_types._sample_type.SampleType

Inline transform for numpy arrays.

The sample type is shape-blind: it returns the ndarray as-is. The wrapping Schema decides whether to flatten (zero-copy reshape(-1)) or convert to a nested Python list, based on its own declared dims and column type.

accepts(
value: Any,
) bool

from_row(
data: Any,
) ndarray

to_row(
sample: ndarray,
) Any

validate_sample(
sample: Any,
) list[ValidationError]

class OrientedBoundingBoxes2DSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for 2D oriented bounding box instances.

Converts between tlc.data_types.obb.OrientedBoundingBoxes2D dataclass (sample form) and the hierarchical dict row format used by 3LC Tables.

Inline:

  • to_row(): OrientedBoundingBoxes2D -> dict with instances/additional_data

  • from_row(): dict with instances/additional_data -> OrientedBoundingBoxes2D

accepts(
value: Any,
) bool

Check if the value is an OrientedBoundingBoxes2D dataclass instance.

Parameters:

value – The value to check.

Returns:

True if the value is an OrientedBoundingBoxes2D dataclass instance.

from_row(
data: Any,
) Any

Create an OrientedBoundingBoxes2D from a 3LC Table row dict.

Parameters:

data – A dictionary representing a single row from a 3LC Table with OBB data.

Returns:

An OrientedBoundingBoxes2D object.

Raises:

ValueError – If the row does not contain instances.

to_row(
sample: Any,
) Mapping[str, Any]

Convert an OrientedBoundingBoxes2D to the internal 3LC Table row format.

Parameters:

sample – An OrientedBoundingBoxes2D object.

Returns:

Wire-format dict with instances and instances_additional_data keys.

class OrientedBoundingBoxes3DSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for 3D oriented bounding box instances.

Converts between tlc.data_types.obb.OrientedBoundingBoxes3D dataclass (sample form) and the hierarchical dict row format used by 3LC Tables.

Inline:

  • to_row(): OrientedBoundingBoxes3D -> dict with instances/additional_data

  • from_row(): dict with instances/additional_data -> OrientedBoundingBoxes3D

accepts(
value: Any,
) bool

Check if the value is an OrientedBoundingBoxes3D dataclass instance.

Parameters:

value – The value to check.

Returns:

True if the value is an OrientedBoundingBoxes3D dataclass instance.

from_row(
data: Any,
) Any

Create an OrientedBoundingBoxes3D from a 3LC Table row dict.

Parameters:

data – A dictionary representing a single row from a 3LC Table with 3D OBB data.

Returns:

An OrientedBoundingBoxes3D object.

Raises:

ValueError – If the row does not contain instances.

to_row(
sample: Any,
) Mapping[str, Any]

Convert an OrientedBoundingBoxes3D to the internal 3LC Table row format.

Parameters:

sample – An OrientedBoundingBoxes3D object.

Returns:

Wire-format dict with instances and instances_additional_data keys.

class PILImageSampleType

Bases: tlc.sample_types._sample_type.ExternalSampleType

SampleType for PIL.Image stored as external PNG files.

Subclasses override file_extension and _save_format to write other formats (JPEG, WEBP). Read behavior is shared — PIL sniffs the format from the file’s magic bytes — so any PIL.Image-decoding subclass can read files written by any other.

The configured format is an encoder, not a re-encoder. It only governs how a PIL.Image constructed in memory is written to disk for the first time. Inputs already backed by a file (PIL images with filename/_tlc_url, raw bytes, EncodedSample, URL strings) pass through verbatim — see externalize().

  • save(): PIL.Image -> write to URL (with backed-file optimization)

  • load(): URL -> PIL.Image
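To see why reads do not depend on the file extension, the sketch below identifies PNG, JPEG, and WEBP payloads from their leading bytes. It is a hand-written illustration of format sniffing using the well-known file signatures, not Pillow's implementation; sniff_image_format is a hypothetical helper.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the container format from the first bytes of an encoded image."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # 8-byte PNG signature
        return "PNG"
    if data.startswith(b"\xff\xd8\xff"):  # JPEG start-of-image marker
        return "JPEG"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # RIFF container, WEBP form
        return "WEBP"
    return None


# A JPEG payload is recognized as JPEG even if it was stored under a ".png" name.
jpeg_header = b"\xff\xd8\xff\xe0" + b"\x00" * 16
print(sniff_image_format(jpeg_header))  # JPEG
```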

accepts(
value: Any,
) bool

Check if the value is a PIL.Image, bytes, or EncodedSample.

Pre-encoded bytes are accepted as sample form because externalize() writes them verbatim, avoiding a decode/re-encode round-trip (e.g. for Hugging Face datasets loaded with datasets.Image(decode=False)).

Parameters:

value – The value to check.

Returns:

True if the value is a PIL.Image, bytes, or EncodedSample.

file_extension: str = ".png"

from_row(
data: bytes,
) Image

Convert bytes back to a PIL.Image.

Called for inline storage only. For file-based storage, load() is used instead and from_row() is never called.

Parameters:

data – Image bytes to deserialize.

Returns:

The PIL.Image object.

load(
url: Url,
) Image

Load a PIL.Image from a URL.

Parameters:

url – The source URL.

Returns:

The PIL.Image object with _tlc_url attribute set to the URL string.

save(
image: Image,
url: Url,
) None

Encode an in-memory PIL.Image to url in this transform’s format.

Called by externalize() only when the image has no backing file (source_url returned None). File-backed images are referenced verbatim by the pipeline and never reach this method.

Parameters:
  • image – The PIL.Image to encode.

  • url – The target URL.

Raises:

ValueError – If the image mode is incompatible with the configured format.

source_url(
image: Image,
) str | None

Return the URL of the image’s backing file, if any.

Checks the _tlc_url attribute (set by load()) and the filename attribute (set by PIL.Image.open). Returns the URL regardless of file extension — a column declared with sample_type="pil_png" does not force a re-encode of a file-backed JPEG; the row simply stores the existing .jpg URL and no new file is written. The configured format governs encoding only when an image has to be created from scratch (no backing file).

Parameters:

image – The PIL.Image to check.

Returns:

The URL or path string of the backing file, or None for in-memory images that must be encoded via save.
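A sketch of the attribute lookup described above, using plain objects instead of PIL images so that it runs without Pillow. source_url_sketch is hypothetical; checking _tlc_url before filename, and treating an empty or missing attribute as "no backing file", are assumptions of this sketch.

```python
from types import SimpleNamespace
from typing import Any, Optional


def source_url_sketch(image: Any) -> Optional[str]:
    """Return the backing URL of an image-like object, or None if in-memory."""
    # Assumption: prefer the URL stashed by load(), then the filename attribute.
    for attr in ("_tlc_url", "filename"):
        value = getattr(image, attr, None)
        if value:  # empty or missing means no backing file in this sketch
            return str(value)
    return None


loaded = SimpleNamespace(_tlc_url="s3://bucket/images/0001.jpg", filename="")
opened = SimpleNamespace(filename="/data/cat.jpg")
in_memory = SimpleNamespace()

print(source_url_sketch(loaded))     # s3://bucket/images/0001.jpg
print(source_url_sketch(opened))     # /data/cat.jpg
print(source_url_sketch(in_memory))  # None
```

The extension of the returned URL is never inspected, which matches the documented rule that a file-backed JPEG is referenced as-is even in a PNG column.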

to_row(
image: Image,
) bytes

Encode a PIL.Image sample to bytes in this transform’s format.

Not used by the write pipeline — pipeline writes go through save() via externalize(). This method exists for direct callers that want the encoded bytes without touching the filesystem (round-trip tests, inline fallback).

Parameters:

image – The PIL.Image to encode.

Returns:

The image serialized as bytes in this transform’s format.

Raises:

ValueError – If the image mode is incompatible with the configured format.

validate_sample(
sample: Any,
) list[ValidationError]

Validate a PIL.Image sample.

Parameters:

sample – The sample to validate.

Returns:

A list of validation errors.

class Path

Bases: tlc.sample_types._sample_type.SampleType

SampleType that converts URL paths to absolute paths.

This is an identity transform for the data itself: to_row() and from_row() return the path unchanged. An earlier implementation made URLs absolute in sample_from_row; that step is now handled by the Table when reading.

accepts(
value: Any,
) bool

Check if the value is a path string.

Parameters:

value – The value to check.

Returns:

True if the value is a string.

from_row(
data: str,
) str

Pass through the path (absolutization handled by Table).

Parameters:

data – The URL/path string.

Returns:

The same path string.

to_row(
path: str,
) str

Pass through the path.

Parameters:

path – The path string.

Returns:

The same path string.

class SampleType

Bases: abc.ABC

Base class for inline sample types.

A sample type converts between sample form (Python objects like PIL.Image, numpy arrays, or dataclasses) and row form (serializable data stored in tables).

This base class is for inline storage — data is stored directly in the table row. Override to_row() and from_row() for custom conversion logic. The default implementation is identity (returns the value unchanged).

For types that store data externally (images, large arrays), subclass ExternalSampleType instead.

Use the register_sample_type() decorator to register custom types.
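The to_row()/from_row() contract for an inline type can be sketched as below. InlineSampleTypeSketch is a hypothetical stand-in for the SampleType base class (identity by default), and ComplexSampleTypeSketch is an invented example type; a real implementation would subclass SampleType and be registered by name.

```python
from typing import Any, List


class InlineSampleTypeSketch:
    """Hypothetical stand-in for the SampleType base: identity by default."""

    def to_row(self, sample: Any) -> Any:
        return sample

    def from_row(self, data: Any) -> Any:
        return data


class ComplexSampleTypeSketch(InlineSampleTypeSketch):
    """Stores a Python complex number inline as a [real, imag] list."""

    def to_row(self, sample: complex) -> List[float]:
        # Row form must be serializable, so split into plain floats.
        return [sample.real, sample.imag]

    def from_row(self, data: List[float]) -> complex:
        return complex(data[0], data[1])


st = ComplexSampleTypeSketch()
row_value = st.to_row(3 + 4j)
print(row_value)               # [3.0, 4.0]
print(st.from_row(row_value))  # (3+4j)
```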

accepts(
value: Any,
) bool

Check if this transform accepts the given value as a sample.

Used by the write pipeline to distinguish sample-form values (live Python objects to convert) from already-row-form values (passed through unchanged).

Inline sample types can usually leave the default (returns False), because the pipeline trusts the schema and calls to_row() directly. External sample types must override accepts(); see ExternalSampleType.accepts().

Parameters:

value – The value to check.

Returns:

True if this transform can convert the value from sample form.

from_row(
data: Any,
) Any

Convert row data back to sample form.

The default implementation returns the data unchanged (identity). Override for custom inline conversion.

Parameters:

data – The data in row form.

Returns:

The Python object in sample form.

is_included_in_sample = True

Whether columns with this transform should be included in sample view.

to_row(
sample: Any,
) Any

Convert a sample to row form for storage.

The default implementation returns the sample unchanged (identity). Override for custom inline conversion.

Parameters:

sample – The Python object in sample form.

Returns:

The data in row form (e.g., nested list, dict, bytes).

validate_row(
row: Any,
) list[ValidationError]

Validate a row dict after serialization or before deserialization.

Override in subclasses to check that the row has the expected structure, types, and constraints.

Parameters:

row – The data in row form.

Returns:

A list of validation errors (empty if valid).

validate_sample(
sample: Any,
) list[ValidationError]

Validate a sample object before serialization.

Override in subclasses to check that the sample has the expected structure, types, and constraints before to_row() is called.

Parameters:

sample – The Python object in sample form.

Returns:

A list of validation errors (empty if valid).
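A sketch of the "return a list, empty if valid" convention. IssueSketch is a hypothetical stand-in for ValidationError, whose actual fields are not shown in this reference, and UnitIntervalSampleTypeSketch is an invented example type.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class IssueSketch:
    """Hypothetical stand-in for ValidationError; the real fields may differ."""

    message: str


class UnitIntervalSampleTypeSketch:
    """Hypothetical sample type for numbers constrained to [0, 1]."""

    def validate_sample(self, sample: Any) -> List[IssueSketch]:
        issues: List[IssueSketch] = []
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(sample, bool) or not isinstance(sample, (int, float)):
            issues.append(IssueSketch(f"expected a number, got {type(sample).__name__}"))
        elif not 0.0 <= sample <= 1.0:
            issues.append(IssueSketch(f"value {sample} is outside [0, 1]"))
        return issues  # an empty list means the sample is valid


st = UnitIntervalSampleTypeSketch()
print(st.validate_sample(0.3))       # []
print(len(st.validate_sample(1.7)))  # 1
```

Collecting issues in a list instead of raising lets a caller report every problem in a sample at once.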

class SampleTypeInfo

Bases: typing_extensions.TypedDict

Structured metadata for a registered sample type.

class_name: str = None

module: str = None

name: str = None

source: tlc.sample_types._sample_type.SampleTypeSource = None

class SampleTypeRegistry

Registry for sample types.

Sample types are registered by name and can be retrieved by that name. The name is used in Schema.resolved_sample_type to specify which sample type to use.

Note: This registry is not thread-safe. Built-in sample types are registered at import time, which is safe. If registering sample types at runtime from multiple threads, external synchronization is required.

classmethod discover_entrypoint_sample_types() list[type[SampleType]]

Discover and register sample types from installed entry points.

Scans for entry points in the tlc.sample_types group. Each entry point should reference a SampleType subclass. The entry point name becomes the registration name.

By default, names that are already registered are skipped (built-ins take priority). Set the class attribute force = True on the sample type class to replace an already-registered name.

Returns:

List of successfully loaded sample type classes.
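Entry point discovery with skip-unless-forced semantics might look roughly like the sketch below. importlib.metadata.entry_points is the standard-library call; iter_group, discover_sketch, FakeEntryPoint, and the plain-dict registry are hypothetical, and the real method may order its checks differently.

```python
from importlib.metadata import entry_points
from typing import Dict, Iterable, List


def iter_group(group: str) -> list:
    """Return the installed entry points for a group across Python versions."""
    try:
        return list(entry_points(group=group))  # Python 3.10+
    except TypeError:
        return list(entry_points().get(group, []))  # Python 3.8 / 3.9


def discover_sketch(registry: Dict[str, type], eps: Iterable) -> List[type]:
    loaded = []
    for ep in eps:
        cls = ep.load()
        # Already-registered names win unless the class sets force = True.
        if ep.name in registry and not getattr(cls, "force", False):
            continue
        registry[ep.name] = cls
        loaded.append(cls)
    return loaded


class FakeEntryPoint:
    """Hypothetical test double exposing the .name / .load() interface."""

    def __init__(self, name: str, cls: type) -> None:
        self.name = name
        self._cls = cls

    def load(self) -> type:
        return self._cls


class Builtin: ...
class Plugin: ...
class ForcedPlugin:
    force = True


registry = {"pil_png": Builtin}
loaded = discover_sketch(
    registry,
    [FakeEntryPoint("pil_png", Plugin), FakeEntryPoint("my_type", Plugin)],
)
print([c.__name__ for c in loaded])  # ['Plugin']
```

A package would typically expose such a class through an entry point declared in the tlc.sample_types group of its packaging metadata; the entry point name then becomes the registration name.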

classmethod get(
name: str,
) SampleType

Get a sample type instance by name.

All built-in sample types are parameter-free; per-row metadata travels on the sample-form value (e.g. SegmentationPolygons.relative), and per-column encoding choices are baked into the variant name (e.g. "pil_jpeg").

Parameters:

name – The registered name of the sample type.

Returns:

An instance of the sample type.

Raises:

KeyError – If no sample type is registered with the given name.

classmethod get_class(
name: str,
) type[SampleType]

Get the sample type class by name without instantiating it.

Parameters:

name – The registered name of the sample type.

Returns:

The sample type class.

Raises:

KeyError – If no sample type is registered with the given name.

classmethod get_registered_sample_types() dict[str, type[SampleType]]

Get all registered sample types.

Returns:

A copy of the name-to-class mapping.

classmethod has(
name: str,
) bool

Check if a sample type is registered with the given name.

Parameters:

name – The name to check.

Returns:

True if a sample type is registered with the name.

classmethod list_sample_type_names() list[str]

List the names of all registered sample types.

Triggers entry point discovery if it has not yet run.

Returns:

Sorted list of registered sample type names.

classmethod list_sample_types() list[SampleTypeInfo]

List all registered sample types with structured metadata.

Triggers entry point discovery if it has not yet run.

Returns:

One SampleTypeInfo per registered sample type, sorted by name.

classmethod load_sample_types_from_config(
config: list[dict[str, Any]],
*,
force: bool = False,
) list[type[SampleType]]

Load sample types from a configuration list.

Each entry should be a dictionary with:

  • module: The fully qualified module name.

  • class: The class name within that module.

  • name (optional): Registration name. Defaults to the entry point style lowercase class name if omitted.

  • force (optional): Override an already-registered name for this entry.

Parameters:
  • config – List of sample type configuration dictionaries.

  • force – Default force value. Individual entries can override with their own force key.

Returns:

List of successfully loaded sample type classes.
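The configuration shape can be illustrated with the sketch below, which resolves module/class pairs with importlib. load_from_config_sketch and the plain-dict registry are hypothetical; standard-library classes stand in for real SampleType subclasses, the lowercase-class-name default is one reading of the naming rule above, and import failures are not handled here.

```python
import importlib
from typing import Any, Dict, List


def load_from_config_sketch(
    registry: Dict[str, type],
    config: List[Dict[str, Any]],
    *,
    force: bool = False,
) -> List[type]:
    loaded = []
    for entry in config:
        module = importlib.import_module(entry["module"])
        cls = getattr(module, entry["class"])
        # Assumption: the default name is the lowercase class name.
        name = entry.get("name") or cls.__name__.lower()
        # A per-entry "force" key overrides the call-level default.
        if name in registry and not entry.get("force", force):
            continue
        registry[name] = cls
        loaded.append(cls)
    return loaded


config = [
    {"module": "collections", "class": "OrderedDict", "name": "ordered"},
    {"module": "fractions", "class": "Fraction"},
]
registry: Dict[str, type] = {}
load_from_config_sketch(registry, config)
print(sorted(registry))  # ['fraction', 'ordered']
```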

classmethod register_sample_type(
name: str,
sample_type_cls: type[SampleType],
*,
source: tlc.sample_types._sample_type.SampleTypeSource = 'runtime',
force: bool = False,
) None

Register a sample type class under the given name.

Parameters:
  • name – The name to register the sample type under.

  • sample_type_cls – The SampleType subclass to register.

  • source – Where this registration originates from.

  • force – If True, replace an existing registration. A warning is logged when overriding a built-in sample type.

Raises:

ValueError – If name is already registered with a different class and force is False.

classmethod unregister_sample_type(
name: str,
) bool

Unregister the sample type registered under name.

Parameters:

name – The name to unregister.

Returns:

True if the name was found and removed, False otherwise.
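The register/unregister semantics above can be summarized in a simplified sketch. RegistrySketch and register_sketch are hypothetical stand-ins that ignore the source argument, logging, and entry point discovery; the decorator's signature here is an assumption and may differ from the real register_sample_type().

```python
from typing import Dict


class RegistrySketch:
    """Hypothetical, simplified stand-in for SampleTypeRegistry."""

    _types: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str, sample_type_cls: type, *, force: bool = False) -> None:
        existing = cls._types.get(name)
        # Re-registering the same class is harmless; a different class needs force.
        if existing is not None and existing is not sample_type_cls and not force:
            raise ValueError(f"{name!r} is already registered to {existing.__name__}")
        cls._types[name] = sample_type_cls

    @classmethod
    def unregister(cls, name: str) -> bool:
        return cls._types.pop(name, None) is not None

    @classmethod
    def get(cls, name: str):
        return cls._types[name]()  # raises KeyError for unknown names


def register_sketch(name: str, *, force: bool = False):
    """Decorator form, analogous in spirit to register_sample_type()."""

    def decorator(sample_type_cls: type) -> type:
        RegistrySketch.register(name, sample_type_cls, force=force)
        return sample_type_cls

    return decorator


@register_sketch("celsius")
class Celsius:
    def to_row(self, sample):
        return float(sample)


class OtherCelsius:
    pass


try:
    RegistrySketch.register("celsius", OtherCelsius)
    conflict = False
except ValueError:
    conflict = True

print(conflict)                                       # True
print(type(RegistrySketch.get("celsius")).__name__)   # Celsius
```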

SampleTypeSource = None

class SegmentationMasksSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for mask-based instance segmentation and RLE storage.

Sample form: SegmentationMasks dataclass. Storable form: dict with keys {image_height, image_width, instance_properties, rles}.

Also accepts legacy dict input (with keys image_height, image_width, instance_properties, masks) for backward compatibility.

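Example::

import numpy as np
from tlc.sample_types import SegmentationMasksSampleType

# The transform is the sample type; SegmentationMasks is the sample-form dataclass.
transform = SegmentationMasksSampleType()

sample = SegmentationMasks(
    image_height=480,
    image_width=640,
    masks=np.zeros((480, 640, 2), dtype=np.uint8),
    labels=np.array([1, 2]),
)

# Round trip: sample form -> row form (RLE) -> sample form.
storable = transform.to_row(sample)
restored = transform.from_row(storable)

accepts(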
value: Any,
) bool

Check if the value is a mask-based instance segmentation.

Parameters:

value – The value to check.

Returns:

True if value is a SegmentationMasks dataclass or a legacy dict with the expected keys.

from_row(
data: Mapping[str, Any],
) SegmentationMasks

Convert RLE storage format back to SegmentationMasks.

Parameters:

data – Dict with image_height, image_width, instance_properties, rles.

Returns:

SegmentationMasks dataclass.

to_row(
sample: SegmentationMasks | dict[str, Any],
) Mapping[str, Any]

Convert mask-based instance segmentation to RLE storage format.

Parameters:

sample – SegmentationMasks dataclass or legacy dict.

Returns:

Wire-format dict with image dimensions, instance properties, and RLEs.
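
For readers unfamiliar with run-length encoding, here is a minimal generic sketch on a flattened binary mask. The exact RLE layout that 3LC writes (pixel ordering, count encoding) is not specified in this section, so rle_encode and rle_decode are illustrative helpers and not the library's format.

```python
def rle_encode(bits: list[int]) -> list[int]:
    """Run lengths of alternating 0s and 1s, starting with the count of 0s."""
    counts: list[int] = []
    current = 0
    run = 0
    for bit in bits:
        if bit == current:
            run += 1
        else:
            counts.append(run)
            current = bit
            run = 1
    counts.append(run)
    return counts


def rle_decode(counts: list[int]) -> list[int]:
    """Inverse of rle_encode: expand run lengths back into bits."""
    bits: list[int] = []
    value = 0
    for run in counts:
        bits.extend([value] * run)
        value = 1 - value
    return bits


flat_mask = [0, 0, 1, 1, 1, 0, 1]
counts = rle_encode(flat_mask)
round_trip = rle_decode(counts)
```

A mask that starts with a 1 gets a leading zero-length run, which keeps the alternation unambiguous.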

class SegmentationPolygonsSampleType

Bases: tlc.sample_types._sample_type.SampleType

SampleType for polygon-based instance segmentation, stored as RLE on the wire.

Sample form: SegmentationPolygons dataclass. Storable form: dict with keys {image_height, image_width, instance_properties, rles}.

The coordinate convention is carried by sample.relative (True = coordinates in [0, 1], False = pixel coordinates). to_row() reads this flag and scales relative polygons up to pixel space before RLE-encoding. from_row() always returns absolute pixel coordinates, because RLE decoding operates in pixel space. When the column’s value type declares polygons_are_relative=True, tlc.Schema.from_row() honors it and applies to_relative() on the way out; code that calls this sample type’s from_row() directly (transform.from_row(...)) must do that conversion itself if it needs relative coordinates.

Also accepts a legacy dict input (with keys image_height, image_width, instance_properties, polygons), which is assumed to be in pixel coordinates.

Example::

sample = SegmentationPolygons(
    image_height=480,
    image_width=640,
    polygons=[[10.0, 20.0, 30.0, 40.0, 50.0, 60.0]],
    labels=np.array([1]),
)

transform = SegmentationPolygonsSampleType()
storable = transform.to_row(sample)
restored = transform.from_row(storable)         # relative=False
relative = restored.to_relative()               # relative=True
accepts(
value: Any,
) bool

Check if the value is a polygon-based instance segmentation.

Parameters:

value – The value to check.

Returns:

True if value is a SegmentationPolygons dataclass or a legacy dict with the expected keys.

from_row(
data: Mapping[str, Any],
) SegmentationPolygons

Convert RLE storage format back to SegmentationPolygons in absolute pixel coords.

Always returns relative=False. The schema-driven path (tlc.Schema.from_row()) applies to_relative() on top when the column’s value type declares polygons_are_relative=True.

Parameters:

data – Dict with image_height, image_width, instance_properties, rles.

Returns:

SegmentationPolygons dataclass with relative=False.
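
The relative versus pixel convention can be sketched as a pair of scaling helpers. The sketch assumes a flat [x0, y0, x1, y1, ...] polygon layout, with x scaled by image width and y by image height. relative_to_pixels and pixels_to_relative are hypothetical helpers, not part of tlc.

```python
def relative_to_pixels(polygon: list[float], width: int, height: int) -> list[float]:
    """Scale a flat [x0, y0, x1, y1, ...] polygon from [0, 1] to pixel space."""
    # Even indices hold x values (scaled by width), odd indices hold y values.
    return [v * (width if i % 2 == 0 else height) for i, v in enumerate(polygon)]


def pixels_to_relative(polygon: list[float], width: int, height: int) -> list[float]:
    """Inverse scaling: pixel coordinates back to the [0, 1] range."""
    return [v / (width if i % 2 == 0 else height) for i, v in enumerate(polygon)]


relative_polygon = [0.25, 0.5, 0.5, 0.5, 0.5, 0.75]
pixel_polygon = relative_to_pixels(relative_polygon, width=640, height=480)
back_to_relative = pixels_to_relative(pixel_polygon, width=640, height=480)
```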

to_row(
sample: SegmentationPolygons | Mapping[str, Any],
) Mapping[str, Any]

Convert polygon-based instance segmentation to RLE storage format.

Parameters:

sample – A SegmentationPolygons dataclass, or a legacy dict (assumed to be in pixel coordinates).

Returns:

Wire-format dict with image dimensions, instance properties, and RLEs.

class TorchTensorSampleType

Bases: tlc.sample_types._sample_type.SampleType

Inline transform for torch tensors.

Tensors are converted to a 1D numpy view (reshape(-1) for multi-dim, passthrough for 1D). The wrapping Schema reshapes back to its declared dims on read, so the sample type itself is shape-blind.
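
The flatten-then-reshape division of labour can be sketched with NumPy alone. A NumPy array stands in for the torch tensor here, and declared_dims is a hypothetical stand-in for the dims a schema would declare.

```python
import numpy as np

declared_dims = (2, 3)  # hypothetical shape declared by the wrapping schema
tensor_like = np.arange(6, dtype=np.float32).reshape(declared_dims)

# What a shape-blind sample type would hand over: a 1D view of the data.
flat = tensor_like.reshape(-1)

# What the schema would do on read: restore the declared dims.
restored = flat.reshape(declared_dims)
```

Because the flat form carries no shape information, the declared dims are the only place the original shape is recorded.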

accepts(
value: Any,
) bool
from_row(
data: Any,
) Tensor
to_row(
sample: Tensor,
) Any
validate_sample(
sample: Any,
) list[ValidationError]
class ValidationError

A validation error or warning found during data validation.

Variables:
  • path – Dot-separated path to the problematic field (e.g., "instances.0.bbs_2d"). Empty string for errors at the root level.

  • message – Human-readable description of the issue.

  • severity – "error" for hard failures, "warning" for non-critical issues.

message: str = None
path: str = None
severity: str = 'error'
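
A caller would typically separate hard failures from warnings. The sketch below uses ToyValidationError, a hypothetical dataclass that mirrors the three documented fields. The real class's constructor signature is not shown in this section, so keyword arguments are used throughout and the example messages are invented.

```python
from dataclasses import dataclass


@dataclass
class ToyValidationError:
    """Hypothetical mirror of the documented fields."""

    path: str
    message: str
    severity: str = "error"


def has_hard_failures(issues: list[ToyValidationError]) -> bool:
    """True if at least one issue has severity "error"."""
    return any(issue.severity == "error" for issue in issues)


issues = [
    ToyValidationError(path="instances.0.bbs_2d", message="box extends past the image"),
    ToyValidationError(path="", message="optional field missing", severity="warning"),
]

blocking = [issue for issue in issues if issue.severity == "error"]
warnings_only = has_hard_failures(issues[1:])
```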
class WebpImageSampleType

Bases: tlc.sample_types._image.PILImageSampleType

PIL.Image stored as WEBP. See PILImageSampleType for the contract.

file_extension = '.webp'
get_sample_types() list[str]

List all registered sample type names.

Returns:

Sorted list of registered sample type names.

register_sample_type(
name: str,
*,
force: bool = False,
) Callable[[type[SampleType]], type[SampleType]]

Decorator to register a SampleType subclass by name.

Parameters:
  • name – The sample type name used in schema sample_type configuration.

  • force – If True, replace an existing registration for name. A warning is logged when overriding a built-in sample type.

Returns:

A decorator that registers the class and returns it unchanged.

Example::

import tlc

@tlc.sample_types.register_sample_type("pil_image")
class PILImage(tlc.sample_types.SampleType):
    def to_row(self, sample):
        ...

    def from_row(self, data):
        ...