Tabular

Tabular columns hold simple data: numbers, strings, booleans, and lists thereof. Use the convenience schemas to describe them — each one sets the right data type and is recognized by the Dashboard.

Scalars

import tlc

schema = {
    "score": tlc.schemas.Float32Schema(),
    "count": tlc.schemas.Int32Schema(),
    "is_valid": tlc.schemas.BoolSchema(),
    "name": tlc.schemas.StringSchema(),
}

Arrays (the `shape` parameter)

All scalar schemas accept a shape parameter to describe arrays:

schema = {
    # Fixed-size list of 10 floats
    "features": tlc.schemas.Float32Schema(shape=10),

    # Variable-length list of integers
    "token_ids": tlc.schemas.Int32Schema(shape=(-1,)),

    # Variable 2D array (e.g. variable number of rows, 3 columns)
    "points": tlc.schemas.Float32Schema(shape=(-1, 3)),

    # Fixed 4x4 matrix
    "transform": tlc.schemas.Float32Schema(shape=(4, 4)),
}

Use -1 for any dimension whose size can vary (similar to how NumPy uses -1 for an unspecified dimension). shape=10 is shorthand for shape=(10,).
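
The shape rules above can be sketched in plain Python. This is only an illustration of the convention, not how tlc validates data: normalize_shape and matches_shape are hypothetical helpers, and no tlc calls are used.

```python
from typing import Any, Tuple, Union

Shape = Union[int, Tuple[int, ...]]


def normalize_shape(shape: Shape) -> Tuple[int, ...]:
    """Turn the int shorthand (e.g. 10) into its tuple form (10,)."""
    if isinstance(shape, int):
        return (shape,)
    return tuple(shape)


def matches_shape(value: Any, shape: Shape) -> bool:
    """Check a nested list against a shape; -1 accepts any length."""
    dims = normalize_shape(shape)
    if not dims:
        # No dimensions left, so the value must be a scalar, not a list
        return not isinstance(value, (list, tuple))
    if not isinstance(value, (list, tuple)):
        return False
    if dims[0] != -1 and len(value) != dims[0]:
        return False
    # Every element must match the remaining dimensions
    return all(matches_shape(item, dims[1:]) for item in value)


print(normalize_shape(10))                                          # (10,)
print(matches_shape([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], (-1, 3)))   # True
print(matches_shape([[0.0, 1.0]], (-1, 3)))                         # False
```

In this sketch a shape of (-1, 3) accepts any number of rows, as long as each row holds exactly three values.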

Tip

Scalar arrays vs NumPy/Torch arrays: Float32Schema(shape=...) stores Python lists and returns them as-is in sample view. To get numpy.ndarray or torch.Tensor objects in sample view, pass sample_type="numpy_array" or sample_type="torch_tensor". For file-backed storage of large arrays, use ExternalNumpyArraySchema or ExternalTorchTensorSchema instead. See Embeddings for details.
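
To make the list-versus-array distinction concrete, the sketch below uses plain NumPy only (no tlc calls). It shows a list value and the float32 array a caller would otherwise have to build by hand; the variable names are illustrative.

```python
import numpy as np

# A value as a list-backed column would hand it back: a plain Python list
features = [0.5, 1.5, 2.5]

# Manual conversion to a float32 NumPy array
features_array = np.asarray(features, dtype=np.float32)

print(type(features).__name__)        # list
print(type(features_array).__name__)  # ndarray
print(features_array.dtype)           # float32
```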

Categorical

Categorical columns map integer values to named classes. This is the most common label representation in ML. See Categorical for the full schema, Dashboard editing, and prediction assignment workflows.

schema = {
    "label": tlc.schemas.CategoricalLabelSchema(classes=["cat", "dog", "bird"]),
}

The classes parameter accepts multiple formats:

# List of names (0-indexed)
tlc.schemas.CategoricalLabelSchema(classes=["cat", "dog"])

# Dict mapping indices to names
tlc.schemas.CategoricalLabelSchema(classes={0: "cat", 1: "dog"})

# Single string (one class)
tlc.schemas.CategoricalLabelSchema(classes="binary")
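
One way to picture the three accepted formats is as different spellings of the same index-to-name mapping. The sketch below is a hypothetical helper in plain Python, not tlc's implementation. In particular, mapping a single string to index 0 is an assumption made for illustration.

```python
from typing import Dict, List, Union

Classes = Union[str, List[str], Dict[int, str]]


def normalize_classes(classes: Classes) -> Dict[int, str]:
    """Return an index-to-name mapping for any of the three input formats."""
    if isinstance(classes, str):
        # Assumption: a lone string is a single class at index 0
        return {0: classes}
    if isinstance(classes, dict):
        return {int(index): name for index, name in classes.items()}
    # A list is 0-indexed in order
    return dict(enumerate(classes))


print(normalize_classes(["cat", "dog"]))        # {0: 'cat', 1: 'dog'}
print(normalize_classes({0: "cat", 1: "dog"}))  # {0: 'cat', 1: 'dog'}
print(normalize_classes("binary"))              # {0: 'binary'}
```

With such a mapping, the integer stored in a row (for example 1) resolves to its class name ("dog").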