Glossary

Table

A table is a collection of samples belonging to a dataset. It is the fundamental data structure in 3LC. Tables are used as a means to translate and share data between the Dashboard and the Python package. They are also used to store metrics and other metadata. Tables are further described in Tables.

Metric

Metrics are quantitative measures used to evaluate the performance and effectiveness of a machine learning model. In 3LC, metrics are collected at a user-specified interval, and they are plotted against each other in the Dashboard.

Sample

A sample is a single data instance or observation from a dataset. During training, a machine learning algorithm learns from the patterns present in the samples to create a model that can generalize to unseen data.

Weight

The weight of a sample refers to a numerical value assigned to that sample to indicate its relative importance or contribution during the training phase.
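As an illustration of how sample weights can enter training, the sketch below computes a weighted mean of per-sample losses in plain Python. The function name `weighted_mean_loss` and the numbers are hypothetical; this is not part of the 3LC API, only one common way weights are used.

```python
def weighted_mean_loss(losses, weights):
    """Average per-sample losses, scaling each one by its sample weight."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one sample must have a non-zero weight")
    return sum(loss * w for loss, w in zip(losses, weights)) / total_weight


# Hypothetical per-sample losses; the third sample is disabled with weight 0.
losses = [0.5, 1.5, 4.0]
weights = [1.0, 1.0, 0.0]
mean_loss = weighted_mean_loss(losses, weights)
```

With a weight of 0 the third sample contributes nothing to the result, which is how disabling a sample through its weight can be understood.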

Schema

A Schema describes an attribute of an object. This entails not only describing the data type of the attribute, but also the dimensionality, size, display parameters, and more. Schemas are further described in Objects, URLs, and Schemas.

Run

A Run is an instance of an experiment or model training session within a project. Runs are stored under the runs directory in the project folder, each containing its own data, metrics, and results.

Training Data

Training data is a set of samples, usually labelled, that is used to train a machine learning model. It consists of input data and (optionally) corresponding output labels or target values.

Revision

A dataset revision is a version of a dataset to which one or more modifications have been applied, so that its appearance during training differs from the original. Examples of modifications are:

  • Adding or removing labels

  • Changing bounding boxes

  • Adjusting weights, including disabling samples

It should be noted that a revised dataset does not duplicate or modify the initial training data, which are safely stored in their original locations. 3LC stores the modifications in a lightweight format and applies them in-memory when the revised dataset is accessed.
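The idea of storing only the modifications and applying them in memory can be sketched as follows. This is a minimal illustration in plain Python, not 3LC's actual storage format; the row layout, the `edits` mapping, and `apply_revision` are all hypothetical.

```python
from copy import deepcopy

# Hypothetical original rows; in practice these stay untouched in storage.
original_rows = [
    {"image": "a.png", "label": "cat", "weight": 1.0},
    {"image": "b.png", "label": "dog", "weight": 1.0},
    {"image": "c.png", "label": "cat", "weight": 1.0},
]

# A revision stored as sparse edits: row index -> changed columns only.
edits = {
    1: {"label": "cat"},   # relabel row 1
    2: {"weight": 0.0},    # disable row 2
}


def apply_revision(rows, edits):
    """Return a revised in-memory copy of rows; the input is not modified."""
    revised = deepcopy(rows)
    for index, changes in edits.items():
        revised[index].update(changes)
    return revised


revised_rows = apply_revision(original_rows, edits)
```

Only the two changed cells are recorded in `edits`, and `original_rows` is left exactly as it was.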

Lineage

The lineage of a Table is the set of all tables that have been derived from it or that it has been derived from. This includes tables created by applying operations to the table, such as filtering, joining, or reducing, as well as tables that were used as input to create the table.
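To make the definition concrete, the sketch below computes a lineage from a record of which tables each table was derived from. The table names and the `parents` mapping are hypothetical, and this is not how 3LC itself tracks lineage; it only illustrates the "ancestors plus descendants" reading of the definition.

```python
# Hypothetical derivation records: table name -> tables it was derived from.
parents = {
    "initial": [],
    "relabeled": ["initial"],
    "filtered": ["relabeled"],
    "other": [],
}


def ancestors(table, parents):
    """All tables that `table` was directly or indirectly derived from."""
    found = set()
    stack = list(parents.get(table, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def descendants(table, parents):
    """All tables directly or indirectly derived from `table`."""
    return {name for name in parents if table in ancestors(name, parents)}


def lineage(table, parents):
    """Union of the ancestors and descendants of `table`."""
    return ancestors(table, parents) | descendants(table, parents)


relabeled_lineage = lineage("relabeled", parents)
```

Here `relabeled_lineage` contains both the table that "relabeled" came from and the table later derived from it, while the unrelated table "other" is excluded.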

Dataset

A Dataset, in the context of 3LC, is a collection of tables (with any associated bulk data) that are related to each other in some way.

Bulk Data

Bulk Data is any input data required by a Run or Table that is stored by reference in the serialized form of the object. For example, a dataset may contain a table with a column of file paths to images. The images themselves are considered bulk data, external to the Table, and stored separately.

Alias

Aliasing is a mechanism for abstracting the storage locations of 3LC Objects and their bulk data. Concretely, aliasing is a way of replacing the leading part of a URL with a shorter, shareable name. This is useful when sharing objects between different projects or users, or when moving objects and bulk data between different storage locations.
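The prefix replacement can be pictured with a small sketch. The alias name `$DATA_ROOT`, the bucket URL, and both helper functions are hypothetical and do not reflect 3LC's actual alias syntax or API.

```python
# Hypothetical alias table: alias name -> storage location it stands for.
aliases = {
    "$DATA_ROOT": "s3://team-bucket/datasets",
}


def apply_alias(url, aliases):
    """Replace a leading storage location with its alias, if one matches."""
    for name, location in aliases.items():
        if url.startswith(location):
            return name + url[len(location):]
    return url


def expand_alias(url, aliases):
    """Replace a leading alias with the storage location it stands for."""
    for name, location in aliases.items():
        if url.startswith(name):
            return location + url[len(name):]
    return url


shared = apply_alias("s3://team-bucket/datasets/cats/0001.png", aliases)
resolved = expand_alias(shared, aliases)

# A different user can map the same alias to their own storage location.
local = expand_alias(shared, {"$DATA_ROOT": "/home/user/datasets"})
```

Because only the alias is stored in the shared URL, moving the data means updating one alias definition instead of rewriting every URL.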

Dashboard

The 3LC Dashboard is a web application that visualizes the samples and metrics from a Run.

Object Service

The 3LC Object Service is an HTTP service that lets notebooks and Python scripts communicate with the 3LC Dashboard. The Object Service is designed to run locally.

3LC CLI

The 3LC Command Line Interface (3LC CLI) is a unified tool to manage launching of the Object Service, data export, and configuration of 3LC. It also allows for such tasks to be automated from scripts. See the 3LC CLI Manual for more details.

Editable Column

Editable columns are columns whose data can be modified in the 3LC Dashboard, and are highlighted with a slightly different background color. Typically the dataset columns will be editable, while metrics data is not editable.

Virtual Column

Virtual columns are columns whose values are computed from one or more other columns. They are not persisted when closing the Dashboard. They can be created by selecting one or more columns, right-clicking one of them, and selecting the desired operation to apply. Virtual columns can themselves be used to produce new virtual columns.

Composite/Array Column

Some columns hold a set of values for each row in the table, such as embeddings and bounding boxes. These have special behavior when used in charts, and usually have their own specific operations.

Reduced Table

A reduced table is a view of the data where the rows have been reduced based on the value of one or more columns. Reduced tables can be created by right-clicking a column and selecting Create Reduced Table.
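One way to picture a reduction is as a group-by: rows that share a value in the chosen column collapse into a single summary row. The sketch below assumes that reading and uses plain Python; the column names, the choice of count and mean as aggregates, and `reduce_rows` are hypothetical and not taken from the Dashboard.

```python
from collections import defaultdict

# Hypothetical rows with a label column and a per-sample loss metric.
rows = [
    {"label": "cat", "loss": 0.2},
    {"label": "dog", "loss": 0.6},
    {"label": "cat", "loss": 0.4},
]


def reduce_rows(rows, key, value):
    """Collapse rows sharing the same `key` into one row with a count and mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [
        {key: group, "count": len(values), f"mean_{value}": sum(values) / len(values)}
        for group, values in groups.items()
    ]


reduced = reduce_rows(rows, key="label", value="loss")
```

The three input rows become two rows, one per distinct label, each carrying the number of samples and their mean loss.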

Charts

A chart is a visualization in the Charts Panel. Charts can be created by selecting one or more columns and then either pressing 2 or 3, or right-clicking and selecting Create 2D/3D chart.

Filters

Filters can be applied to any table by using the Filters Panel on the left. Several filters can be applied at the same time.

Project

A Project is the top-level directory in the 3LC ecosystem, representing a complete machine learning project. It is the primary organizational unit, encompassing all associated Runs, Datasets, and other data elements.