SampleType and structure#
When creating a Table using any of the Table.from_<format> methods, you have the option to provide a structure argument. To understand how and when to use this argument, let’s first take a step back and understand what 3LC needs to properly visualize your data in the Dashboard.
Schema#
As outlined in Objects, URLs and Schemas, 3LC uses a schema to understand the content of your data. If a Table contains paths to image files, for instance, the schema could specify that these strings are in fact paths to images, and the Dashboard would then be able to display these images rather than just the strings.
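The role a schema plays can be sketched in plain Python, without 3LC. Everything in this block is hypothetical: `toy_schema`, `render`, and the `"image_path"` tag are made-up names, not part of the 3LC API. The sketch only shows why extra information about a column lets a viewer show an image instead of a raw string.

```python
# Illustrative only: a toy "schema", not 3LC's own schema objects.
rows = [{"image": "images/0000.png", "caption": "a cat"}]  # hypothetical data

# Hypothetical schema: records what each string column actually means.
toy_schema = {"image": "image_path", "caption": "text"}


def render(column, value, schema):
    """Pick a display form for a cell based on the schema entry for its column."""
    if schema.get(column) == "image_path":
        return f"<show image at {value}>"
    return value  # plain strings are shown as-is


rendered = {col: render(col, val, toy_schema) for col, val in rows[0].items()}
print(rendered)
```

Without the schema entry, both columns would be indistinguishable strings and would be displayed the same way.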
Sample vs Row#
Table objects are, as the name would suggest, tabular. This means that they are made up of some number of rows, which all share the same named columns. In Python terms, a Table row is a dictionary, where the keys are the column names.
In the real world, however, data is not always represented this way. This is why we distinguish between the Sample view of the data and the Row view of the same data. The Sample view is how your data appears in your code, be it a list of bounding boxes or a tuple of NumPy arrays and PIL Images. The Row view is how your data is actually stored in the Table, with anonymous fields given names and large bulk data like images replaced with URLs. To be visualized in the Dashboard and edited, samples must be converted to rows.
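The difference between the two views can be illustrated with a minimal plain-Python sketch that does not use 3LC. The function `sample_to_row`, the column names, and the URL `images/0000.png` are all assumptions made for illustration; the only point is that anonymous tuple fields gain names and bulk data is swapped for a URL.

```python
# Illustrative only: plain Python, not the 3LC API.
# A "sample" as it might appear in training code: (image pixels, integer label).
# A tiny 2x2 nested list stands in for a PIL Image or NumPy array.
sample = ([[0, 255], [255, 0]], 1)


def sample_to_row(sample, image_url):
    """Name the anonymous tuple fields and replace bulk image data with a URL."""
    _pixels, label = sample  # the pixels would live at image_url, not in the row
    return {"image": image_url, "label": label}


row = sample_to_row(sample, image_url="images/0000.png")  # hypothetical URL
print(row)
```

The resulting dictionary has one key per column, which is the shape a tabular store needs.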
SampleType#
A SampleType is used both to define the schema of a Table and to convert samples to rows and back. SampleType objects are structured like trees, where the leaves describe the values that make up your samples, like images or integers, and the internal nodes describe how these values are combined to form a model-ready sample. If our samples are tuples of images and integer labels, where we want those integer labels to be displayed according to a certain mapping in the Dashboard, we would define a SampleType like this:
import tlc

sample_type = tlc.HorizontalTuple(
    name="sample",
    children=[
        tlc.PILImage("image"),
        tlc.CategoricalLabel("label", classes=["cat", "dog"]),
    ],
)
If our samples consisted of lists of two such tuples, our sample type would look like this:
sample_type = tlc.HorizontalList(
    name="sample",
    children=[
        tlc.HorizontalTuple(
            name="tuple_1",
            children=[
                tlc.PILImage("image"),
                tlc.CategoricalLabel("label", classes=["cat", "dog"]),
            ],
        ),
        tlc.HorizontalTuple(
            name="tuple_2",
            children=[
                tlc.PILImage("image"),
                tlc.CategoricalLabel("label", classes=["bird", "fish"]),
            ],
        ),
    ],
)
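The tree idea, with leaves for values and internal nodes for how values are combined, can be modelled in a few lines of plain Python. This is a toy sketch and not how 3LC implements sample types: the classes `Leaf` and `TupleNode` and their methods `to_row` and `to_sample` are hypothetical names. The leaves here pass values through unchanged, whereas a real image leaf would presumably also handle storing the image and producing a URL.

```python
# Illustrative only: a toy model of the tree idea, not 3LC's implementation.
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """A single value in a sample, such as an image or an integer label."""
    name: str

    def to_row(self, value):
        return value  # toy leaf: stored as-is

    def to_sample(self, row_value):
        return row_value


@dataclass
class TupleNode:
    """An internal node that combines its children's values into a tuple."""
    name: str
    children: list = field(default_factory=list)

    def to_row(self, sample):
        # Tuple positions are anonymous, so each one takes its child's name.
        return {c.name: c.to_row(v) for c, v in zip(self.children, sample)}

    def to_sample(self, row):
        # Rebuild the tuple in the order the children were declared.
        return tuple(c.to_sample(row[c.name]) for c in self.children)


toy_type = TupleNode("sample", [Leaf("image"), Leaf("label")])
row = toy_type.to_row(("cat.png", 0))
print(row)
print(toy_type.to_sample(row))
```

Because the same tree drives both directions, converting a sample to a row and back returns the original tuple.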
structure#
The full definition of a SampleType is often quite verbose. This is why most functions that accept a SampleType also accept a structure argument. This argument is a more concise definition of a SampleType that is easier to read and write. The two SampleTypes defined above could be written as follows:
structure_1 = (tlc.PILImage("image"), tlc.CategoricalLabel("label", classes=["cat", "dog"]))
structure_2 = [
    (tlc.PILImage("image"), tlc.CategoricalLabel("label", classes=["cat", "dog"])),
    (tlc.PILImage("image"), tlc.CategoricalLabel("label", classes=["bird", "fish"])),
]
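To see why a nested tuple or list carries enough information to stand in for a full tree, the following plain-Python sketch walks such a structure and prints the tree it implies. It does not use 3LC and is not how 3LC expands a structure: `describe` is a hypothetical helper, strings stand in for the leaf sample types, and the generated node names such as `sample_0` are an assumption that may differ from whatever names 3LC assigns.

```python
# Illustrative only: plain Python, not how 3LC expands a structure.
def describe(structure, name="sample", depth=0):
    """Return indented lines showing the tree implied by a nested structure."""
    indent = "  " * depth
    if isinstance(structure, tuple):
        lines = [f"{indent}tuple {name!r}"]
    elif isinstance(structure, list):
        lines = [f"{indent}list {name!r}"]
    else:
        # Anything that is not a tuple or list is treated as a leaf.
        return [f"{indent}leaf {structure!r}"]
    for i, child in enumerate(structure):
        # Hypothetical naming scheme for the unnamed internal nodes.
        lines += describe(child, name=f"{name}_{i}", depth=depth + 1)
    return lines


toy_structure = [("image", "label"), ("image", "label")]
tree_lines = describe(toy_structure)
print("\n".join(tree_lines))
```

The container types supply the internal nodes and the elements supply the leaves, so only the names of the internal nodes have to be filled in.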