Data Layout
The 3LC ecosystem uses a hierarchical structure to organize the data it creates, storing objects like Runs
and Tables in a folder-based layout, grouped into Projects and
Datasets. The tlc Python package and 3LC Dashboard provide ways to create, modify and retrieve these
objects. When creating an object we therefore specify a project_name, plus a run_name for a Run, or a
dataset_name and table_name for a Table.
projects
├── index.3lc.json
└── <project_name>
    ├── index.3lc.json
    ├── default_aliases.3lc.yaml
    ├── datasets
    │   ├── <dataset_1>
    │   │   ├── tables
    │   │   │   ├── <table_1>
    │   │   │   │   ├── object.3lc.json
    │   │   │   │   └── row_cache.parquet
    │   │   │   ├── <table_2>
    │   │   │   ...
    │   │   └── bulk_data
    │   ├── <dataset_2>
    │   │   ├── tables
    │   │   └── bulk_data
    │   ...
    └── runs
        ├── <run_1>
        │   ├── object.3lc.json
        │   └── metric_0000
        │       └── object.3lc.json
        ├── <run_2>
        ...
...
The root of the 3LC project folder structure is the projects directory. Each project is a subdirectory of projects
and contains all the data and metadata associated with a specific machine learning project.
Within each project, there can be any number of datasets. Each dataset holds some number of tables, each of which
corresponds to a revision of that dataset.
A run is the 3LC object used to store hyperparameters, sample metrics and any other data related to a process that
produces some kind of output, often a training run.
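The mapping from names to folders can be sketched with plain path arithmetic. This is only an illustration of the layout shown above, not the tlc API: the helper names table_dir and run_dir are hypothetical, and the root folder name is passed in rather than assumed.

```python
from pathlib import Path


def table_dir(root, project_name, dataset_name, table_name):
    """Folder that holds a Table under the layout described above (hypothetical helper)."""
    return Path(root) / project_name / "datasets" / dataset_name / "tables" / table_name


def run_dir(root, project_name, run_name):
    """Folder that holds a Run under the layout described above (hypothetical helper)."""
    return Path(root) / project_name / "runs" / run_name


# Example names are placeholders, chosen to mirror the tree above.
print(table_dir("projects", "my_project", "dataset_1", "table_1").as_posix())
print(run_dir("projects", "my_project", "run_1").as_posix())
```

In practice the tlc package and the Dashboard resolve these locations for you; the sketch only makes the naming scheme explicit.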
The object.3lc.json files
3LC objects are always represented as folder locations. Internally, the 3LC API uses a file called object.3lc.json to
store the serialized object. This is similar to the index.html file used to represent a folder in a web server. The
object.3lc.json file is always located in the folder of the object it represents, and the location of the object
is the path to the folder containing the object.3lc.json file. The file is automatically created and managed by the
3LC API, and users should not need to interact with it directly.
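Because an object's location is the folder that contains its object.3lc.json file, the objects under a root can be listed with a simple recursive search. The following is a minimal sketch using only the standard library; find_object_locations is a hypothetical helper, not part of the tlc package, and it reads nothing from the files themselves.

```python
from pathlib import Path

OBJECT_FILE = "object.3lc.json"


def find_object_locations(root):
    """Return the sorted folders under root that directly contain object.3lc.json."""
    return sorted(path.parent for path in Path(root).rglob(OBJECT_FILE))
```

Called on a projects folder, this would return one entry per Table, Run, and any nested object folder such as metric_0000.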
The index.3lc.json files
The primary job of the Object Service, outside of communicating with the Dashboard, is indexing 3LC objects in the
project locations it is configured to scan. To avoid recursing through and opening every file in every project to pick
up changes, any 3LC code that produces or edits a 3LC object will touch the index.3lc.json file for that project. When
the Object Service encounters this changed index.3lc.json, it will reindex this location and reflect the changes in
the data it sends to the Dashboard. To learn more about the indexing system, see the in-depth documentation of the
Object Service. The index.3lc.json files are automatically created and managed by the 3LC API, and
users should not need to interact with them directly.
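The touch-and-rescan idea can be illustrated with file modification times. This sketch assumes that "touching" means bumping the file's modification time and that a scanner remembers the last timestamp it saw; the actual mechanism used by the Object Service may differ, and both function names are hypothetical.

```python
from pathlib import Path

INDEX_FILE = "index.3lc.json"


def touch_index(project_dir):
    """Create the index file if it is missing and bump its modification time."""
    index = Path(project_dir) / INDEX_FILE
    index.touch(exist_ok=True)
    return index


def index_changed(project_dir, last_seen_mtime_ns):
    """True when the index file is newer than the timestamp of the last scan."""
    index = Path(project_dir) / INDEX_FILE
    return index.exists() and index.stat().st_mtime_ns > last_seen_mtime_ns
```

A scanner built this way checks one file per project instead of walking every object folder, which is the cost saving the paragraph above describes.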
bulk_data directories
While 3LC aims to avoid copying your data, in some cases it needs to be serialized because it is in-memory and not
backed by a file on persistent storage. For example, when recording predicted semantic segmentation masks, the
predictions are stored on disk in PNG files. In these cases, 3LC will store this data under bulk_data.
row_cache.parquet
3LC supports caching the row data of any given Table, which is useful when it is expensive to repeatedly
produce the data. An example is when you have made many revisions to your data, so that many tables reference one
another and must call into each other to build the current Table. In the folder structure, within a Table, you will
therefore sometimes see a file row_cache.parquet, which can be loaded into memory quickly.
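To see which tables in a dataset currently have a row cache, it is enough to check for the file in each table folder. This is a small sketch over the folder layout only; tables_with_row_cache is a hypothetical helper and does not open or validate the Parquet file.

```python
from pathlib import Path

ROW_CACHE_FILE = "row_cache.parquet"


def tables_with_row_cache(dataset_dir):
    """Map each table folder name in a dataset to whether a row cache file exists."""
    tables_root = Path(dataset_dir) / "tables"
    if not tables_root.is_dir():
        return {}
    return {
        table.name: (table / ROW_CACHE_FILE).is_file()
        for table in sorted(tables_root.iterdir())
        if table.is_dir()
    }
```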
default_aliases.3lc.yaml
Any local configuration of 3LC can define a set of aliases. In addition, it is possible to define project default aliases, which apply as a fallback for anyone scanning the project who has not defined the alias themselves. This is useful in settings where multiple people work on a project, and the referenced data (such as images) are available in a shared location referenced by the default alias. Read more about default aliases in the document on Sharing, and see the Deployment Examples for concrete examples.
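The fallback behavior can be sketched as a two-level lookup in which locally defined aliases win over project defaults. The sketch below makes several assumptions: aliases are given as plain dictionaries rather than read from default_aliases.3lc.yaml, the `<NAME>` token syntax and example paths are illustrative only, and expand_alias is a hypothetical helper rather than a tlc function.

```python
from collections import ChainMap


def expand_alias(url, local_aliases, project_defaults):
    """Replace a leading alias token, preferring local aliases over project defaults."""
    # ChainMap looks in local_aliases first and falls back to project_defaults.
    aliases = ChainMap(local_aliases, project_defaults)
    for name, target in aliases.items():
        if url.startswith(name):
            return target + url[len(name):]
    return url  # No known alias at the start of the URL: leave it unchanged.


# Illustrative values only (assumed token syntax and locations).
local = {"<DATA>": "/mnt/local"}
defaults = {"<DATA>": "s3://shared/data", "<MODELS>": "s3://shared/models"}
print(expand_alias("<DATA>/img.png", local, defaults))
print(expand_alias("<MODELS>/m.pt", local, defaults))
```

A user with no local definition for an alias would get the shared location from the project defaults, which is the multi-user scenario described above.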