Glossary
- Project
A Project is the top-level directory in the 3LC ecosystem, representing a complete machine learning project. It is the primary organizational unit and encompasses all associated Runs, Datasets, and other data elements.
- Dataset
A Dataset, in the context of 3LC, is a collection of Tables that are related to each other in some way, together with any associated bulk data.
- Table
A Table is a collection of samples belonging to a Dataset. It is the fundamental data structure in 3LC. Tables are the means by which data is translated and shared between the Dashboard and the Python package, and they are also used to store metrics and other metadata. Tables are further described in the User Guide and in the documentation for the Python class tlc.Table.
- Schema
A Schema describes an attribute of an object. This means describing not only the attribute's data type but also its dimensionality, size, display parameters, and more. Schemas are further described in the User Guide and under tlc.Schema in the Python API Documentation.
- Sample
A sample is a single data instance or observation from a dataset. During training, a machine learning algorithm learns from the patterns present in the samples to create a model that can generalize to unseen data. In 3LC, samples are accessed by calling __getitem__ on a tlc.Table, for example table[0] for the first sample. In Python, the Sample View is the user-facing data representation of a Row.
- Row
In 3LC, a row is the internal representation of a sample, and it is the data visualized in the 3LC Dashboard. Whereas samples can hold arbitrary Python data, row data must be serializable. For example, an image is always a string in the row representation. In the sample representation it can be that same string, or a NumPy array or PIL image containing the pixel values once the image has been opened. If no SampleType is defined, the row and sample views are the same. To access the rows of a Table in Python, use Table.table_rows.
- Weight
The weight of a sample is a numerical value assigned to it that indicates its relative importance or contribution during training. Weights are described further in the User Guide.
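The following Python sketch is for illustration only. It shows one common way weights can influence training: samples are drawn with probability proportional to their weight. The rows, the column names, and the weighted_sample_indices helper are hypothetical and not part of the 3LC API. The sketch also assumes that a weight of 0 disables a sample.

```python
import random
from collections import Counter

# Hypothetical rows (not the 3LC API): plain dicts with a "weight" column.
rows = [
    {"image": "images/cat_001.jpg", "label": "cat", "weight": 1.0},
    {"image": "images/dog_001.jpg", "label": "dog", "weight": 2.0},
    {"image": "images/dog_002.jpg", "label": "dog", "weight": 0.0},  # disabled
]


def weighted_sample_indices(rows, num_samples, seed=0):
    """Draw row indices with replacement, proportionally to each row's weight.

    Assumption: rows with weight 0 are disabled and are never drawn.
    """
    indices = [i for i, row in enumerate(rows) if row["weight"] > 0]
    weights = [rows[i]["weight"] for i in indices]
    rng = random.Random(seed)
    return rng.choices(indices, weights=weights, k=num_samples)


counts = Counter(weighted_sample_indices(rows, num_samples=3000))
print(counts)  # row 1 appears about twice as often as row 0; row 2 never appears
```

Sampling with replacement is only one way to use weights. Scaling each sample's loss term by its weight is another common approach.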
- Lineage
The lineage of a Table is the set of all tables that have been derived from it or from which it has been derived. This includes tables created by applying operations to the table, such as filtering, joining, or reducing, as well as the tables that were used as input to create it.
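The definition above amounts to a reachability question on a graph of tables. The sketch below is a hypothetical illustration, not 3LC code. Each table lists the tables it was created from. The lineage is the union of everything reachable upward (through inputs) and downward (through derived tables).

```python
from collections import deque

# Hypothetical derivation graph: each table maps to the tables it was
# created from (its inputs). Names are illustrative, not 3LC identifiers.
inputs = {
    "train": [],
    "val": [],
    "train_filtered": ["train"],
    "train_val_joined": ["train_filtered", "val"],
    "train_reduced": ["train_val_joined"],
    "unrelated": [],
}


def lineage(table, inputs):
    """Return every table derived from `table` or that `table` was derived from."""
    # Build the reverse map: table -> tables created from it.
    outputs = {name: [] for name in inputs}
    for name, parents in inputs.items():
        for parent in parents:
            outputs[parent].append(name)

    def reachable(start, edges):
        # Breadth-first search following `edges` transitively.
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    ancestors = reachable(table, inputs)     # tables it was derived from
    descendants = reachable(table, outputs)  # tables derived from it
    return ancestors | descendants


print(sorted(lineage("train_filtered", inputs)))
# ['train', 'train_reduced', 'train_val_joined']
```

Here val is not in the lineage of train_filtered. The two are only combined downstream, so neither was derived from the other.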
- Run
A Run is an instance of an experiment or model training session within a Project. Runs are stored under the runs directory in the project folder, and each Run contains its own data, metrics, and results. Runs are described further in the User Guide and under tlc.Run in the Python API Documentation.
- Metric
Metrics are quantitative measures used to evaluate the performance and effectiveness of a machine learning model.
In 3LC, metrics are collected at a user-specified interval and can be plotted against each other in the Dashboard. To learn more, see Collecting Metrics in the User Guide.
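Here is a minimal sketch of collecting per-sample metrics at a user-specified interval during training. The train function, its parameters, and the toy training step are hypothetical placeholders for your own code. This is not 3LC's metrics collection API, which is described in the User Guide.

```python
# Hypothetical sketch (not the 3LC API): collect one metrics row per sample
# every `interval` epochs. train_one_epoch and compute_loss stand in for
# your own training and evaluation code.


def collect_metrics(samples, compute_loss, epoch):
    """Return one metrics row per sample for the given epoch."""
    return [
        {"sample_index": i, "epoch": epoch, "loss": compute_loss(sample)}
        for i, sample in enumerate(samples)
    ]


def train(samples, num_epochs, interval, train_one_epoch, compute_loss):
    collected = []
    for epoch in range(num_epochs):
        train_one_epoch(samples)
        if epoch % interval == 0:  # the user-specified collection interval
            collected.extend(collect_metrics(samples, compute_loss, epoch))
    return collected


# Toy stand-ins: "training" halves a scale factor that the loss depends on.
state = {"scale": 1.0}


def train_one_epoch(samples):
    state["scale"] *= 0.5


def compute_loss(sample):
    return sample * state["scale"]


metrics = train([1.0, 2.0, 3.0], num_epochs=5, interval=2,
                train_one_epoch=train_one_epoch, compute_loss=compute_loss)
print(sorted({row["epoch"] for row in metrics}))  # [0, 2, 4]
```

Each collected row pairs a sample with an epoch and a metric value. That shape is what makes it possible to plot one metric against another, for example loss against epoch.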
- Embeddings
Embeddings are the intermediate activations of a layer in your model. They can be very useful to extract and visualize, because they show how your model has learned to interpret your data. In particular, they highlight which data points the model considers similar, which can inform the decisions you make about the dataset. 3LC provides utilities to extract embeddings, reduce their dimensionality, and interact with them through charts in the Dashboard. To learn more about embeddings in 3LC, see Embeddings in the User Guide.
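The sketch below illustrates how embeddings reveal which data points a model considers similar: it finds each sample's nearest neighbour by cosine similarity. The embedding vectors are made up. Cosine similarity is a common choice assumed here, not necessarily what 3LC uses, and dimensionality reduction for charting is not shown.

```python
import math

# Hypothetical embeddings: one vector per sample, e.g. activations taken
# from the penultimate layer of a model. Values are made up for illustration.
embeddings = {
    "cat_001": [0.9, 0.1, 0.0],
    "cat_002": [0.8, 0.2, 0.1],
    "dog_001": [0.1, 0.9, 0.2],
    "dog_002": [0.0, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(name, embeddings):
    """Return the other sample whose embedding has the highest cosine similarity."""
    query = embeddings[name]
    others = (other for other in embeddings if other != name)
    return max(others, key=lambda other: cosine_similarity(query, embeddings[other]))


print(most_similar("cat_001", embeddings))  # cat_002
print(most_similar("dog_002", embeddings))  # dog_001
```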
- Revision
A dataset revision is a version of a dataset to which one or more modifications have been applied, so that it appears differently during training than it did originally. A revision is itself a Table, defined as the result of applying a set of changes to one or more input Tables. This way, the original training data is neither duplicated nor modified, and the changes are applied in memory when the revised data is accessed. Examples of modifications include:
Adding or removing labels
Changing bounding boxes
Adjusting weights, including disabling samples
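The following hypothetical sketch (not the 3LC implementation) shows why a revision does not need to copy the original data. It stores only the changed cells plus a reference to its input table, and it overlays those changes whenever a row is read.

```python
# Hypothetical sketch of a sparse revision (not the 3LC implementation):
# the input table is left untouched, and the revision stores only the
# changed cells, keyed by (row index, column name).
input_table = [
    {"image": "images/0001.jpg", "label": "cat", "weight": 1.0},
    {"image": "images/0002.jpg", "label": "dog", "weight": 1.0},
    {"image": "images/0003.jpg", "label": "cat", "weight": 1.0},
]

revision = {
    "input": input_table,  # a reference, not a copy
    "changes": {
        (1, "label"): "cat",  # relabel a sample
        (2, "weight"): 0.0,   # disable a sample
    },
}


def revised_row(revision, index):
    """Return row `index` with the revision's changes applied in memory."""
    row = dict(revision["input"][index])  # copy, so the input is never modified
    for (row_index, column), value in revision["changes"].items():
        if row_index == index:
            row[column] = value
    return row


print([revised_row(revision, i) for i in range(len(input_table))])
print(input_table[1]["label"])  # dog: the original data is unchanged
```

Because the changes are keyed by row and column, the revision stays small even when its input table is large.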
- Bulk Data
Bulk Data is any input data required by a Run or Table that is stored by reference in the serialized form of the object. For example, a dataset may contain a table with a column of file paths to images. The images themselves are considered bulk data, external to the Table, and stored separately. Learn more about Bulk Data in the User Guide.
- Alias
Aliasing is a mechanism for abstracting the storage locations of 3LC objects and their bulk data. Concretely, aliasing replaces the leading part of a URL with a shorter, shareable name. This is useful when sharing objects between different projects or users, or when moving objects and bulk data between storage locations. Learn more about aliases in the User Guide.
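Here is a minimal sketch of the prefix substitution described above. The <DATA_ROOT> placeholder syntax, the bucket name, and the helper functions are assumptions for illustration, not 3LC's documented alias format or API.

```python
# Hypothetical alias registry (not 3LC's API): alias -> concrete URL prefix.
# The <NAME> placeholder syntax and the bucket name are illustrative.
aliases = {"<DATA_ROOT>": "s3://my-company-bucket/datasets/cats-and-dogs"}


def apply_aliases(url, aliases):
    """Replace a known leading prefix with its alias, e.g. before storing a URL."""
    for alias, prefix in aliases.items():
        # Require a path boundary so ".../cats-and-dogs-v2" is not matched.
        if url == prefix or url.startswith(prefix + "/"):
            return alias + url[len(prefix):]
    return url


def expand_aliases(url, aliases):
    """Replace a leading alias with its concrete prefix, e.g. before reading data."""
    for alias, prefix in aliases.items():
        if url.startswith(alias):
            return prefix + url[len(alias):]
    return url


original = "s3://my-company-bucket/datasets/cats-and-dogs/images/0001.jpg"
stored = apply_aliases(original, aliases)
print(stored)  # <DATA_ROOT>/images/0001.jpg

# A colleague with a local copy of the data only changes the alias mapping.
local_aliases = {"<DATA_ROOT>": "/home/alice/data/cats-and-dogs"}
print(expand_aliases(stored, local_aliases))
# /home/alice/data/cats-and-dogs/images/0001.jpg
```

Only the alias mapping differs between machines; the stored URL stays the same.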
- Dashboard
The 3LC Dashboard is a web application that visualizes the samples and metrics from a Run. Read more about the Dashboard in the dedicated Dashboard Documentation.
- Object Service
The Object Service is an HTTP server that serves Runs and Tables to the Dashboard and creates new Tables when edits are committed. It is designed to run in your own infrastructure, for example on your laptop. To learn more, see the Python Package Documentation.
- Commit
A commit can be created in the Dashboard from the pending edits menu in the top right. Creating a commit creates a new sparse Revision Table that contains only the changes, plus a reference to the Table those changes apply to. To learn more, see Revisions in the User Guide.
- CLI
The 3LC Command Line Interface (CLI) is a unified tool for launching the Object Service, exporting data, and configuring 3LC. It also allows these tasks to be automated from scripts. See the 3LC CLI Manual for more details.
- Editable Column
Editable columns are columns whose data can be modified in the 3LC Dashboard; they are highlighted with a slightly different background color. Typically, dataset columns are editable, while metrics columns are not.
- Virtual Column
Virtual columns are columns whose values are computed from one or more other columns. They are not persisted when the Dashboard is closed. To create one, select one or more columns, right-click one of them, and then select the operation to apply. Virtual columns can themselves be used as inputs to new virtual columns. To learn more, see Virtual Columns in the Dashboard Documentation.
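Conceptually, a virtual column is a function of other columns that is evaluated on demand. The sketch below is a hypothetical illustration in plain Python, not how the Dashboard implements virtual columns. It also shows one virtual column being built from another.

```python
# Hypothetical sketch: a virtual column is computed from other columns on
# demand and never written back to the stored rows.
rows = [
    {"width": 640, "height": 480},
    {"width": 1920, "height": 1080},
]

# Each virtual column maps a name to (input column names, operation).
virtual_columns = {
    "area": (["width", "height"], lambda w, h: w * h),
    "aspect_ratio": (["width", "height"], lambda w, h: w / h),
}


def get_value(row, column):
    """Read a stored column, or compute a virtual one from its inputs."""
    if column in row:
        return row[column]
    input_names, operation = virtual_columns[column]
    # Inputs are resolved through get_value, so a virtual column can be
    # built from other virtual columns.
    return operation(*(get_value(row, name) for name in input_names))


# A virtual column derived from another virtual column.
virtual_columns["area_megapixels"] = (["area"], lambda a: a / 1_000_000)

print([get_value(row, "area") for row in rows])  # [307200, 2073600]
print([get_value(row, "area_megapixels") for row in rows])
print(rows[0])  # {'width': 640, 'height': 480}: nothing was persisted
```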
- Composite/Array Column
Some columns hold a set of values for each row in the table, such as embeddings and bounding boxes. These columns behave differently when used in charts and usually support operations specific to them.
- Reduced Table
A reduced table is a view of the data in which the rows have been reduced based on the values of one or more columns. Reduced tables can be created by right-clicking a column and selecting Create Reduced Table. To learn more, see Reduced Tables in the Dashboard Documentation.
- Charts
A chart is a visualization in the Charts Panel. Charts can be created by selecting one or more columns and then either pressing 2 or 3, or right-clicking and selecting Create 2D/3D chart. To learn more, see Charts in the Dashboard Documentation.
- Filters
Filters can be applied to any table by using the Filters Panel on the left. Several filters can be applied at the same time. To learn more, see Filters in the Dashboard Documentation.
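The following hypothetical sketch shows one way several filters can be combined. It assumes that filters active at the same time are combined with AND semantics, so a row is shown only if it passes all of them. The rows and predicates are made up.

```python
# Hypothetical sketch: each filter is a predicate on a row, and a row is
# kept only if it passes every active filter (assumed AND semantics).
rows = [
    {"label": "cat", "confidence": 0.95},
    {"label": "dog", "confidence": 0.40},
    {"label": "cat", "confidence": 0.30},
    {"label": "cat", "confidence": 0.55},
]

filters = [
    lambda row: row["label"] == "cat",    # e.g. a value filter on "label"
    lambda row: row["confidence"] < 0.6,  # e.g. a range filter on "confidence"
]


def apply_filters(rows, filters):
    """Return the indices of rows that satisfy all filters."""
    return [i for i, row in enumerate(rows) if all(f(row) for f in filters)]


print(apply_filters(rows, filters))  # [2, 3]
```

With no active filters, all() over an empty sequence is True, so every row is kept.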