Glossary

Project

Project is the top-level directory in the 3LC ecosystem, representing a complete machine learning project. It is the primary organizational unit, encompassing all associated Runs, Datasets, and other data elements.

Dataset

A Dataset, in the context of 3LC, is a collection of Tables (with any associated bulk data) that are related to each other in some way.

Table

A table is a collection of samples belonging to a dataset. It is the fundamental data structure in 3LC. Tables are used as a means to translate and share data between the Dashboard and the Python package. They are also used to store metrics and other metadata. Tables are further described in the User Guide and the Python class tlc.Table.

Schema

A Schema describes an attribute of an object. This entails not only describing the data type of the attribute, but also the dimensionality, size, display parameters, and more. Schemas are further described in the User Guide and tlc.Schema in the Python API Documentation.

Sample

A sample is a single data instance or observation from a dataset. During training, a machine learning algorithm learns from the patterns present in the samples to create a model that can generalize to unseen data. In 3LC, samples are accessed by indexing a tlc.Table (via __getitem__), e.g. table[0] for the first sample. In Python, the Sample View is the user-facing data representation of a Row.

Row

In 3LC, a row is the internal representation of a sample, and it is the row data that is visualized in the 3LC Dashboard. Whereas samples can hold arbitrary Python data, row data must be serializable. For example, an image in 3LC is always a string in the row representation, while its sample representation could be a string, a numpy array, or a PIL image holding the pixel values after the image has been opened. If no SampleType is defined, the row and sample views are the same. To access the rows of a Table in Python, use Table.table_rows.
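The row/sample distinction can be illustrated without the 3LC package. The following is a minimal sketch in plain Python, not the tlc API: the helper row_to_sample and the file img_0.bin are hypothetical, and reading raw bytes stands in for decoding an image.

```python
import json
import os
import tempfile


def row_to_sample(row):
    """Hypothetical conversion: load the file that the row refers to."""
    with open(row["image"], "rb") as f:
        pixels = list(f.read())  # stand-in for decoding a real image
    return {"image": pixels, "label": row["label"]}


# Write a tiny stand-in "image" file containing four raw byte values.
tmp_dir = tempfile.mkdtemp()
image_path = os.path.join(tmp_dir, "img_0.bin")
with open(image_path, "wb") as f:
    f.write(bytes([0, 64, 128, 255]))

# Row view: the image is a string (a path), so the row can be serialized.
row = {"image": image_path, "label": 1}
serialized = json.dumps(row)

# Sample view: the same data after loading, now holding pixel values.
sample = row_to_sample(json.loads(serialized))
```

Here the row survives a round trip through JSON because it only holds a path, while the sample holds the loaded values that training code would consume.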

Weight

The weight of a sample refers to a numerical value assigned to that sample to indicate its relative importance or contribution during the training phase. Weights are described further in the User Guide.
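One common way a per-sample weight can influence training is as a factor in a weighted mean of per-sample losses. The sketch below assumes that interpretation; it is plain Python rather than 3LC code, and weighted_mean_loss is a hypothetical helper.

```python
def weighted_mean_loss(losses, weights):
    """Average per-sample losses, scaling each by its sample weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one sample must have a non-zero weight")
    weighted_sum = sum(loss * weight for loss, weight in zip(losses, weights))
    return weighted_sum / total_weight


losses = [0.2, 1.0, 3.0]
weights = [1.0, 2.0, 0.0]  # the third sample is effectively disabled

loss = weighted_mean_loss(losses, weights)
```

A weight of 0 removes a sample's contribution entirely, and a weight of 2 counts it twice as heavily as a sample with weight 1.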

Lineage

The lineage of a Table is the set of all tables that have been derived from it or that it has been derived from. This includes tables created by applying operations to the table, such as filtering, joining, or reducing, as well as tables that were used as input to create it.
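Read as a graph problem, the lineage is the union of a table's ancestors and descendants. The sketch below illustrates that reading with a plain dictionary. The table names and the derived_from mapping are hypothetical, and this is not how 3LC stores or queries lineage.

```python
from collections import deque

# Hypothetical tables: each name maps to the tables it was derived from.
derived_from = {
    "initial": [],
    "filtered": ["initial"],
    "edited": ["filtered"],
    "other": [],
    "joined": ["edited", "other"],
}


def reachable(start, edges):
    """Return every node reachable from start by following edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def lineage(table):
    """Ancestors (inputs) plus descendants (derived tables) of a table."""
    derived_to = {name: [] for name in derived_from}
    for child, parents in derived_from.items():
        for parent in parents:
            derived_to[parent].append(child)
    return reachable(table, derived_from) | reachable(table, derived_to)


filtered_lineage = lineage("filtered")
```

For "filtered", the result contains "initial" (its input) together with "edited" and "joined" (derived from it), but not "other", which is related only through "joined".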

Run

A Run is an instance of an experiment or model training session within a project. Runs are stored under the runs directory in the project folder, each containing its own data, metrics, and results. Runs are described further in the User Guide and tlc.Run in the Python API Documentation.

Metric

Metrics are quantitative measures used to evaluate the performance and effectiveness of a machine learning model.

In 3LC, metrics are collected at a user-specified interval, and they are plotted against each other in the Dashboard. To learn more, see Collecting Metrics in the User Guide.

Embeddings

Embeddings are intermediate activations of a layer of your model. Embeddings can be very useful to extract and visualize, as they show you how your model has learned to interpret your data. They highlight which data points the model considers similar, which can drive the decisions you make about the dataset. 3LC provides utilities to extract the embeddings, reduce their dimensionality, and interact with them through charts in the Dashboard. To learn more about embeddings in 3LC, check out Embeddings in the User Guide.

Revision

A dataset revision has had one or more modifications applied, so that its appearance during training differs from the original. A revision is itself a Table, defined as the result of applying a set of changes to one or more input Tables. This way the initial training data is neither duplicated nor modified, and the changes are applied in-memory when the revised data is accessed. Examples of modifications are:

  • Adding or removing labels

  • Changing bounding boxes

  • Adjusting weights, including disabling samples
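The idea of a revision as a set of changes applied in-memory over unmodified input data can be sketched as follows. This is a plain-Python illustration under assumed data structures. The names base_rows, edits, and revised_row are hypothetical and do not reflect the 3LC storage format.

```python
# Input table: never modified or copied to disk by the revision.
base_rows = [
    {"image": "a.png", "label": "cat", "weight": 1.0},
    {"image": "b.png", "label": "dog", "weight": 1.0},
    {"image": "c.png", "label": "cat", "weight": 1.0},
]

# Sparse edits: row index -> only the columns that changed.
edits = {
    1: {"label": "cat"},  # relabel the second row
    2: {"weight": 0.0},   # disable the third row
}


def revised_row(index):
    """Overlay the edits for one row on top of the input row when accessed."""
    return {**base_rows[index], **edits.get(index, {})}


revised = [revised_row(i) for i in range(len(base_rows))]
```

Only the two edited cells are stored, and base_rows still holds the original values after the revised view has been read.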

Bulk Data

Bulk Data is any input data required by a Run or Table that is stored by reference in the serialized form of the object. For example, a dataset may contain a table with a column of file paths to images. The images themselves are considered bulk data, external to the Table, and stored separately. Learn more about Bulk Data in the User Guide.

Alias

Aliasing is a mechanism to abstract the storage locations of 3LC Objects and their bulk data. Concretely, aliasing is a way of replacing the leading part of a URL with a shorter, shareable name. This is useful when sharing objects between different projects or users, or when moving objects and bulk data between different storage locations. Learn more about aliases in the User Guide.
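Replacing the leading part of a URL with a name amounts to a prefix substitution. The sketch below shows this in plain Python. The alias name <DATA>, the bucket URL, and the functions expand and shorten are all hypothetical, so consult the User Guide for how aliases are actually defined and written in 3LC.

```python
# Hypothetical alias table: alias name -> storage location it stands for.
aliases = {"<DATA>": "s3://example-bucket/datasets"}


def expand(url):
    """Replace a leading alias name with the location it stands for."""
    for name, location in aliases.items():
        if url == name or url.startswith(name + "/"):
            return location + url[len(name):]
    return url


def shorten(url):
    """Replace a leading storage location with its alias name."""
    for name, location in aliases.items():
        if url == location or url.startswith(location + "/"):
            return name + url[len(location):]
    return url


full_url = expand("<DATA>/images/0001.png")
short_url = shorten(full_url)
```

Moving the data then only requires changing the location stored for the alias, because objects that refer to <DATA>/images/0001.png stay valid.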

Dashboard

The 3LC Dashboard is a web application that visualizes the samples and metrics from a Run. Read more about the Dashboard in the dedicated Dashboard Documentation.

Object Service

The Object Service is an HTTP server that serves Runs and Tables to the Dashboard, and creates new Tables when edits are committed. It is designed to run in your infrastructure, for example on your laptop. To learn more, see the Python Package Documentation.

Commit

A commit can be created in the Dashboard from the pending edits menu in the top right. When a commit is created, a new sparse Revision Table is created, containing only the changes and a reference to the Table to which they apply. To learn more, see Revisions in the User Guide.

CLI

The Command Line Interface (3LC CLI) is a unified tool for launching the Object Service, exporting data, and configuring 3LC. It also allows such tasks to be automated from scripts. See the 3LC CLI Manual for more details.

Editable Column

Editable columns are columns whose data can be modified in the 3LC Dashboard, and are highlighted with a slightly different background color. Typically the dataset columns will be editable, while metrics data is not editable.

Virtual Column

Virtual columns are columns whose values are computed based on one or more other columns. They are not persisted when closing the Dashboard. They can be computed by selecting one or more columns, right-clicking one of them, and selecting the desired operation to apply. Virtual columns can themselves be used to produce new virtual columns. To learn more, see Virtual Columns in the Dashboard Documentation.
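Conceptually, a virtual column is a derived value computed per row from existing columns without changing the stored data. A minimal plain-Python sketch of that idea follows. The column names and the helper with_virtual_column are hypothetical, and in 3LC these operations are performed in the Dashboard rather than in code like this.

```python
# Stored rows: these are never modified by the derived columns below.
rows = [
    {"width": 4, "height": 3},
    {"width": 10, "height": 2},
]


def with_virtual_column(rows, name, func):
    """Return new rows with an extra column computed from each row."""
    return [{**row, name: func(row)} for row in rows]


# A virtual column computed from two existing columns.
with_area = with_virtual_column(rows, "area", lambda r: r["width"] * r["height"])

# A virtual column computed from another virtual column.
with_flag = with_virtual_column(with_area, "is_large", lambda r: r["area"] > 15)
```

The original rows keep only width and height, which mirrors the fact that virtual columns are not persisted.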

Composite/Array Column

Some columns hold a set of values for each row in the table, such as embeddings and bounding boxes. These columns behave specially when used in charts and usually have their own specific operations.

Reduced Table

A reduced table is a view of the data where the rows have been reduced based on the value of one or more columns. Reduced tables can be created by right-clicking a column and selecting Create Reduced Table. To learn more, see Reduced Tables in the Dashboard Documentation.
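Reducing rows by the value of a column resembles a group-by with aggregation. The sketch below assumes that reading and uses count and mean as example aggregates. The source does not state which aggregates the Dashboard computes, and reduce_by is a hypothetical helper.

```python
from collections import defaultdict

rows = [
    {"label": "cat", "loss": 0.2},
    {"label": "dog", "loss": 0.8},
    {"label": "cat", "loss": 0.4},
]


def reduce_by(rows, key, value):
    """Collapse rows sharing the same key into one row with aggregates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [
        {key: group, "count": len(values), "mean_" + value: sum(values) / len(values)}
        for group, values in groups.items()
    ]


reduced = reduce_by(rows, "label", "loss")
```

The three input rows collapse into two, one per distinct label, each carrying the number of rows it summarizes and their mean loss.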

Charts

A chart is a visualization in the Charts Panel. Charts can be created by selecting one or more columns, then either pressing 2 or 3, or right-clicking and selecting Create 2D/3D chart. To learn more, see Charts in the Dashboard Documentation.

Filters

Filters can be applied to any table by using the Filters Panel on the left. Several filters can be applied at the same time. To learn more, see Filters in the Dashboard Documentation.