SampleType and structure

When creating a Table using any of the Table.from_<format> methods, you can optionally pass a structure argument. To understand how and when to use this argument, it helps to first look at what 3LC needs in order to visualize your data properly in the Dashboard.

Schema

As outlined in Objects, URLs and Schemas, 3LC uses a schema to understand the content of your data. If a Table contains paths to image files, for instance, the schema could specify that these strings are in fact paths to images, and the Dashboard would then be able to display these images rather than just the strings.

Sample vs Row

Table objects are, as the name suggests, tabular: they are made up of some number of rows, which all share the same named columns. In Python terms, a Table row is a dictionary whose keys are the column names. The data your code works with, however, is rarely shaped this way. This is why we distinguish between the Sample view and the Row view of the same data. The Sample view is how your data appears in your code, be it a list of bounding boxes or a tuple of NumPy arrays and PIL Images. The Row view is how your data is actually stored in the Table, with anonymous fields given names and large bulk data like images replaced with URLs. Samples must be converted to rows before they can be visualized and edited in the Dashboard.
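The difference between the two views can be sketched in plain Python. The helper names, the column names, and the dictionary standing in for image storage are all hypothetical and not part of the 3LC API. The sketch only illustrates the two steps described above: naming anonymous fields and replacing bulk data with a URL.

```python
# Illustrative sketch only: these helpers are hypothetical, not 3LC functions.
from typing import Any, Callable

# Names given to the otherwise anonymous fields of an (image, label) tuple.
COLUMN_NAMES = ("image", "label")


def sample_to_row(sample: tuple, image_url: str) -> dict:
    """Sample view -> Row view: name the fields, store a URL instead of pixels."""
    _image, label = sample
    return dict(zip(COLUMN_NAMES, (image_url, label)))


def row_to_sample(row: dict, load_image: Callable[[str], Any]) -> tuple:
    """Row view -> Sample view: load the image back from its URL."""
    return (load_image(row["image"]), row["label"])


# Stand-ins for real pixel data and for image storage keyed by URL.
pixels = b"\x00\x01\x02"
image_store = {"images/0000.png": pixels}

row = sample_to_row((pixels, 1), image_url="images/0000.png")
sample = row_to_sample(row, load_image=image_store.__getitem__)

print(row)
print(sample == (pixels, 1))
```

The round trip returns the original tuple, which is the property that makes it possible to edit rows and still feed samples to your code.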

SampleType

A SampleType is used both to define the schema of a Table and to convert samples to rows and back. SampleType objects are structured like trees: the leaves describe the individual values that make up your samples, like images or integers, and the internal nodes describe how these values are combined to form a model-ready sample. If our samples are tuples of images and integer labels, and we want those integer labels to be displayed according to a certain mapping in the Dashboard, we would define a SampleType like this:

import tlc

# Internal node: a tuple combining an image leaf and a label leaf.
sample_type = tlc.HorizontalTuple(
    name="sample",
    children=[
        tlc.PILImage("image"),
        tlc.CategoricalLabel("label", classes=["cat", "dog"]),
    ],
)

If our samples instead consisted of lists of two such tuples, each with its own set of label classes, our sample type would look like this:

# Internal node: a list holding two (image, label) tuples.
sample_type = tlc.HorizontalList(
    name="sample",
    children=[
        tlc.HorizontalTuple(
            name="tuple_1",
            children=[
                tlc.PILImage("image"),
                tlc.CategoricalLabel("label", classes=["cat", "dog"]),
            ],
        ),
        tlc.HorizontalTuple(
            name="tuple_2",
            children=[
                tlc.PILImage("image"),
                tlc.CategoricalLabel("label", classes=["bird", "fish"]),
            ],
        ),
    ],
)
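To make the tree idea concrete, the following sketch walks a tree description and a sample in parallel to produce a nested row. It models the tree with plain dictionaries rather than tlc classes. The `to_row` helper, the `image_label_tuple` helper, and the nested-dictionary row layout are assumptions made for illustration, not a description of how 3LC stores rows internally.

```python
# Illustrative sketch only: plain dicts stand in for SampleType nodes.


def to_row(node: dict, value):
    """Name every value in a sample by walking the tree alongside it."""
    children = node.get("children")
    if children is None:
        # Leaf: the value is kept as-is and named by its parent.
        return value
    if len(children) != len(value):
        raise ValueError(
            f"{node['name']!r} expects {len(children)} values, got {len(value)}"
        )
    # Internal node: pair each child description with the matching value.
    return {child["name"]: to_row(child, item) for child, item in zip(children, value)}


def image_label_tuple(name: str) -> dict:
    """Describe a tuple node with an image leaf and a label leaf."""
    return {"name": name, "children": [{"name": "image"}, {"name": "label"}]}


# Mirrors the list-of-two-tuples layout from the example above.
tree = {
    "name": "sample",
    "children": [image_label_tuple("tuple_1"), image_label_tuple("tuple_2")],
}

sample = [("cat.png", 0), ("fish.png", 1)]
row = to_row(tree, sample)
print(row)
```

Internal nodes contribute the names and nesting, while leaves contribute the values. A sample whose shape does not match the tree is rejected rather than silently truncated.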

structure

The full definition of a SampleType is often quite verbose. For this reason, most functions that accept a SampleType also accept a structure argument: a more concise definition of a SampleType that is easier to read and write. As the examples show, a plain Python tuple or list takes the place of the corresponding container node, and only the leaves are spelled out. The two SampleTypes defined above could be written as follows:

structure_1 = (tlc.PILImage("image"), tlc.CategoricalLabel("label", classes=["cat", "dog"]))

structure_2 = [
    (tlc.PILImage("image"), tlc.CategoricalLabel("label", classes=["cat", "dog"])),
    (tlc.PILImage("image"), tlc.CategoricalLabel("label", classes=["bird", "fish"]))
]
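One way to picture the relationship between a structure and a full SampleType is as an expansion step that turns nested tuples and lists into explicit, named container nodes. The sketch below uses hypothetical `Leaf`, `TupleNode`, and `ListNode` classes and a hypothetical `expand` function. It is not the tlc implementation, and the positional naming of anonymous containers (`sample_0`, `sample_1`) is an assumption chosen only to keep the example self-contained.

```python
# Illustrative sketch only: hypothetical classes, not the tlc implementation.
from dataclasses import dataclass


@dataclass
class Leaf:
    name: str


@dataclass
class TupleNode:
    name: str
    children: list


@dataclass
class ListNode:
    name: str
    children: list


def expand(structure, name: str = "sample"):
    """Turn nested tuples/lists of leaves into an explicit, named tree."""
    if isinstance(structure, Leaf):
        return structure
    if isinstance(structure, tuple):
        node_type = TupleNode
    elif isinstance(structure, list):
        node_type = ListNode
    else:
        raise TypeError(f"unsupported structure element: {structure!r}")
    # Assumption: anonymous containers get positional names such as "sample_0".
    children = [expand(child, f"{name}_{i}") for i, child in enumerate(structure)]
    return node_type(name=name, children=children)


concise = [
    (Leaf("image"), Leaf("label")),
    (Leaf("image"), Leaf("label")),
]
tree = expand(concise)
print(tree)
```

The concise form carries the same shape information as the verbose one. What it leaves out is the names of the internal nodes, which any expansion step has to supply.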