3LC Customer Managed Architecture#

3LC (3 Lines of Code) is a tool for understanding and improving machine learning (ML) models and datasets. Powerful visualization tools, collection and analysis of custom metrics, and seamless editing of your dataset can all be unlocked with just 3 Lines of Code.

This document describes the software architecture of the 3LC system in the customer managed enterprise deployment, whether hosted in a private cloud or on-prem. The intended audience is IT administrators tasked with deploying 3LC in such a customer managed environment. It is not intended as an end-user guide for using the system; for that, see the 3LC Documentation.

There is also a public SaaS version of 3LC where components are loaded over the public internet. The public SaaS system is beyond the scope of this document.

Introduction#

3LC is designed to integrate into an existing machine learning workflow, where data collection, data labelling, model training, and deployment are already in place.

3LC hooks into an already existing Python script or notebook for machine learning, and turns model training into an iterative and interactive process where a data scientist can analyze and modify the training data. The end result is an improved model and dataset that will later be deployed to production, with increased performance and/or accuracy.


Context#

In practice, a data scientist accesses the 3LC system while also having access to an ML training script. Both the training script and 3LC need access to the same storage backend, which stores the initial training data, metrics collected during training, and any revisions of the training data created with 3LC.

3LC provides a lightweight system for creating revisions to training data. The source data remains unchanged; 3LC stores sparse revisions on top of it, similar to how source control systems such as Git layer changes on top of a base.
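The revision model can be illustrated with a small sketch. This is a conceptual illustration only, not the actual 3LC data format: a revision is modeled as a sparse set of edits layered over an immutable base dataset, and resolving a row walks the revision chain from newest to oldest.

```python
# Conceptual sketch of sparse revisions over an immutable base dataset.
# This is NOT the real 3LC data format; all names here are illustrative.

base = [
    {"image": "cat_01.png", "label": "cat"},
    {"image": "dog_07.png", "label": "cat"},  # mislabeled row
    {"image": "dog_12.png", "label": "dog"},
]

# A revision records only the rows that changed (keyed by row index).
revision_1 = {1: {"image": "dog_07.png", "label": "dog"}}  # fix the label


def resolve(base, revisions, index):
    """Return the current value of a row, checking newest revisions first."""
    for rev in reversed(revisions):
        if index in rev:
            return rev[index]
    return base[index]


# The base data is never modified; edits live only in the revision layers.
assert resolve(base, [revision_1], 1)["label"] == "dog"
assert resolve(base, [revision_1], 0)["label"] == "cat"
assert base[1]["label"] == "cat"  # source data unchanged
```

Because a revision only stores the rows it touches, many revisions can coexist cheaply on top of the same large source dataset.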

../../_images/context.png

Context Diagram for 3LC#

Since 3LC is designed to be deployed into existing ML workflows, it is flexible in how it is deployed. The system can run on end-user workstations, on on-prem (GPU) nodes, and on cloud infrastructure.

In the customer managed deployment, no communication occurs outside the enterprise network, other than a license check at startup of the 3LC Object Service.

Building Blocks#

The 3LC system consists of four building blocks:

  1. The tlc Python module, which is imported into a training script. It hooks into machine learning frameworks and provides functionality to capture metrics during training, as well as support for the 3LC data format used for creating revised training datasets.

  2. The 3LC Dashboard which is a web-application that lets the user analyze training data and results, as well as make revisions to the training data. The web-application runs in the user’s web browser.

  3. The 3LC Object Service which allows the 3LC Dashboard to access the training data and metrics. It needs to run on a system that has access to the same storage backend as the training.

  4. The 3LC Dashboard Service which serves the 3LC Dashboard to the user’s browser. Once the Dashboard has been served, this service does not play a further role when using 3LC.
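The interplay between these components can be sketched as follows. The snippet below is a minimal illustration of the integration pattern described above, assuming a generic training loop; the class and function names are hypothetical and do not reflect the actual tlc API.

```python
# Hypothetical sketch of how a metrics-capturing hook wraps a training loop.
# The names below are illustrative; consult the 3LC Documentation for the
# actual tlc API.

class MetricsCollector:
    """Collects per-epoch metrics so they can be written to the storage
    backend shared between the training script and the 3LC Object Service."""

    def __init__(self):
        self.rows = []

    def log(self, epoch, **metrics):
        self.rows.append({"epoch": epoch, **metrics})


def train(epochs, collector):
    loss = 1.0
    for epoch in range(epochs):
        loss *= 0.5  # stand-in for a real optimization step
        collector.log(epoch, loss=loss)
    return loss


collector = MetricsCollector()
train(3, collector)

# In a real deployment the collected rows would be persisted to the shared
# storage backend, where the Object Service reads them for the Dashboard.
assert len(collector.rows) == 3
```

The key point is the shared storage backend: the training script writes metrics and data revisions there, and the Object Service reads them back for display in the Dashboard.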

All of these components are made available by installing the 3lc-enterprise Python package, as described in the Quickstart guide.

The figure below illustrates a possible deployment scenario when deploying on a GPU node. Here, the 3LC Object Service and 3LC Dashboard Service both run on the GPU node and the user loads the 3LC Dashboard in a local browser. This requires that the relevant ports (by default 5015 and 8000) be open for access from the user’s workstation.
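A quick way to verify that these ports are reachable from a workstation is a small socket probe. This is a generic connectivity check, not part of 3LC, and the hostname gpu-node is a placeholder for the actual GPU node.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the
    timeout, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholder hostname): check the default Object Service (5015)
# and Dashboard Service (8000) ports from the user's workstation.
# print(port_open("gpu-node", 5015), port_open("gpu-node", 8000))
```

If either check fails, the firewall rules between the workstation and the GPU node are the first thing to inspect.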

Since the 3LC Dashboard provides interactive 2D- and 3D-plots, it is recommended that users access the 3LC Dashboard through a local web browser to get the highest performance, instead of using remote desktop technologies to run a browser on the GPU node.

../../_images/container.png