Frequently Asked Questions#
What is 3LC?#
3LC is a unique platform revolutionizing machine learning by delivering granular, real-time insights into model training and data interactions. Seamlessly integrating with leading ML frameworks like PyTorch, 3LC equips data scientists with unprecedented control over data diagnosis and correction.
Unique Features:
Detailed per-sample, per-epoch metric recording to illuminate how the model interacts with and learns from the data.
Real-time, interactive visual diagnosis and correction of data issues, enhancing data quality and model precision.
Innovative data management solution seamlessly merging original data with modifications, simplifying version control and avoiding data duplication and data movement.
Business Value:
Superior model performance: With enhanced data quality and insight into model-data interactions, models trained with 3LC are more accurate and reliable.
Enhanced efficiency: Real-time visual diagnosis and correction streamline the ML process, saving valuable time and reducing operational costs.
Better decision-making: Granular insight into the training process equips teams with data-driven evidence to optimize models.
3LC transforms machine learning from a traditionally opaque, “black-box” process into an open, interactive experience. It offers data scientists unparalleled control and deep insight into their models and the key model-data interactions, positioning 3LC as a groundbreaking solution in the ML landscape. By improving the fitness and correctness of datasets and enhancing the efficiency of model training, 3LC provides a substantial competitive edge in today’s data-driven business environment.
Is it spelled 3LC or tlc?#
The company and product are both called 3LC. The lowercase tlc name is only used when it is required to avoid starting with a number, such as in the naming of a Python module or an environment variable.
Why do I have to install PyTorch separately?#
If you install 3LC in an environment without PyTorch and attempt to import the tlc package, you will get an exception stating that torch and torchvision need to be installed:
3LC requires torch and torchvision to be installed. A suitable version can be found at https://pytorch.org/
The reason 3LC does not depend on torch (and thus doesn’t automatically install it) is that PyTorch is available in several versions targeting different compute platforms. Most users will likely have already installed their preferred version of torch, and we do not want to modify their environment.
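Before importing tlc, it can be handy to check whether torch and torchvision are present in the active environment. The snippet below is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and not part of 3LC.

```python
import importlib.util


def missing_packages(names=("torch", "torchvision")):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    # Install a build that matches your compute platform, see https://pytorch.org/
    print("Not installed:", ", ".join(missing))
else:
    print("torch and torchvision are both installed.")
```

The check only looks the packages up without importing them, so it does not modify the environment or trigger the exception itself.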
How can I export data from a 3LC Table?#
After successfully using 3LC to modify a dataset, you might want to export the modified data out of a 3LC Table and into a common format such as CSV or COCO. This can be achieved either by using the 3LC CLI or through the Python API.
Using the 3LC CLI
In a terminal (with the tlc Python package installed), the command line tool can be invoked as follows:
$ 3lc export path/to/table.json <output-path>
The output format will be deduced from the extension of <output-path> and the contents of the table, but can also be explicitly specified using the --format option.
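If the export needs to run from a script, the same CLI command can be launched with Python's subprocess module. This is a sketch only: the helper names build_export_command and export_table are hypothetical, and it assumes that --format takes the format name as its value and may follow the positional arguments.

```python
import subprocess


def build_export_command(table_path, output_path, fmt=None):
    """Build the argument list for `3lc export`; `fmt` is passed to --format when given."""
    command = ["3lc", "export", table_path, output_path]
    if fmt is not None:
        # Assumption: --format accepts the format name as its value and
        # may be placed after the positional arguments.
        command += ["--format", fmt]
    return command


def export_table(table_path, output_path, fmt=None):
    """Run the export; raises CalledProcessError if the CLI exits with a non-zero status."""
    subprocess.run(build_export_command(table_path, output_path, fmt), check=True)


# Show the command that would be executed, without running it.
print(build_export_command("path/to/table.json", "path/to/output.csv"))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before anything is executed.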
Using the Table.export method
The Table.export method provides a simple interface for exporting a Table directly from a Python notebook.
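import tlc

table = tlc.Table.from_url(input_url)  # input_url: URL of an existing Table
table.export(output_path)  # output_path: destination file, e.g. ending in .csv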
The export method will deduce the output format from the extension of output_path and the contents of the table.
How do I run the Dashboard and the Object Service on different machines?#
Security
The current release has had little focus on security in the Object Service, which is why its default listening host is 127.0.0.1. Opening the Object Service to the internet has security implications and should be done with care.
To ensure seamless integration between your Dashboard app and the Object Service, it is crucial to properly configure the network settings. By default, the Dashboard app expects the Object Service to be accessible at http://localhost:5051. However, when hosting these on separate servers, you will need to specify the correct address for the Object Service to the Dashboard.
Starting the Dashboard: If your Dashboard is hosted on a server (e.g., 192.168.1.2), you can start it with the following command:
3lc-dashboard --host 192.168.1.2 --port 80
When accessed from a browser, the Dashboard will attempt to connect to the Object Service on localhost:5051 of the browsing host by default.
Configuring the Object Service Location: To direct the Dashboard to an Object Service hosted on a different machine (e.g., 172.16.5.6:8080), use the --object-service option:
3lc-dashboard --host 192.168.1.2 --port 80 --object-service http://172.16.5.6:8080
With this configuration, a browser on host 10.10.3.4 viewing http://192.168.1.2 will reach out to the Object Service at http://172.16.5.6:8080.
Setting the Host for the Object Service: Ensure that the Object Service is configured to receive traffic on the correct network interface. This could be a specific IP address or a wildcard IP to accept requests on all interfaces:
3lc --host 0.0.0.0 --port 8080 service
By following these steps, you can host your Dashboard and Object Service on separate machines and ensure that they can communicate with each other.
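When the Dashboard cannot reach the Object Service, a quick first check is whether the configured address accepts TCP connections at all. The following is a minimal sketch using only the Python standard library; the helper name can_connect is hypothetical, and a successful connection only shows that something is listening on that port, not that it is the Object Service.

```python
import socket
from urllib.parse import urlparse


def can_connect(url, timeout=3.0):
    """Return True if a TCP connection to the host and port in `url` succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Because the browser is what connects to the Object Service, the check is most informative when run on the browsing machine, for example as can_connect("http://172.16.5.6:8080") with the addresses used above.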
What is the difference between .map and .map_collect_metrics?#
When collecting metrics with 3LC, you most likely want to use un-augmented data. When using .map on your Tables, transforms which are necessary to make the data compatible with your model are sometimes bundled together with transforms for augmenting your data. .map_collect_metrics lets you specify only the transforms you want 3LC to use when performing metrics collection.
When collecting metrics with a Table which has not had .map_collect_metrics called on it, 3LC will use the same transforms as those used when .map was called on the Table.
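The fallback rule described above can be illustrated with a small toy model. This is not 3LC code: ToyTable, training_sample, and metrics_sample are hypothetical names, and the class only models which transform is applied in which situation. A list reversal stands in for an augmentation so that the example stays deterministic.

```python
class ToyTable:
    """Toy stand-in for a Table; it only models which transform is used where."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._map_fn = None
        self._metrics_fn = None

    def map(self, fn):
        self._map_fn = fn
        return self

    def map_collect_metrics(self, fn):
        self._metrics_fn = fn
        return self

    def training_sample(self, index):
        fn = self._map_fn or (lambda row: row)
        return fn(self.rows[index])

    def metrics_sample(self, index):
        # Fall back to the .map transform when no metrics-specific transform was set.
        fn = self._metrics_fn or self._map_fn or (lambda row: row)
        return fn(self.rows[index])


def convert(row):
    """Transform needed for model compatibility: scale values to [0, 1]."""
    return [value / 255 for value in row]


def augment_then_convert(row):
    """Augmentation (here a flip) bundled with the required conversion."""
    return convert(list(reversed(row)))


only_map = ToyTable([[0, 51, 255]]).map(augment_then_convert)
both = ToyTable([[0, 51, 255]]).map(augment_then_convert).map_collect_metrics(convert)

print(only_map.metrics_sample(0))  # same transform as training, flip included
print(both.metrics_sample(0))      # conversion only, no flip
```

In the first toy table the flip leaks into metrics collection; in the second, metrics are computed on converted but un-augmented data.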
How do I create a 3LC Table from a Pandas DataFrame?#
The Table.from_pandas method can be used to create a 3LC Table from a Pandas DataFrame. If you want the Table to be saved and appear in the 3LC Dashboard, you need to specify a url for the Table.
import pandas as pd
import tlc
df = pd.read_csv("path/to/data.csv")
table = tlc.Table.from_pandas(df, url="path/to/table.json")
Why are my image transforms not applied in the Dashboard?#
If you have created a 3LC Table using Table.from_torch_dataset on a TorchVision VisionDataset which has transforms applied to it, you might notice that the transforms are not applied when viewing the data in the Dashboard. Because the transforms of a VisionDataset might contain augmentations, or conversion from a PIL.Image.Image to a torch.Tensor, 3LC needs to persist the untransformed samples. These are the images which are shown in the Dashboard.
The transforms will still be applied as expected when getting samples from the Table object in your code, and will still be applied when training your model. If you want to see the transformed images in the Dashboard, you can explicitly transform the samples in the __getitem__ method of your Dataset class, instead of using the transforms argument of the VisionDataset class, before calling Table.from_torch_dataset. If you are non-deterministically augmenting your samples, or converting PIL images to tensors, these transforms still need to be added through the transforms argument, and not in the __getitem__ method.
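The pattern of applying a deterministic transform inside __getitem__ can be sketched as a thin wrapper around an existing dataset. The example below is a plain-Python illustration under stated assumptions: the class name DeterministicallyTransformed and the helper center_crop are hypothetical, samples are assumed to be (image, label) pairs, and nested lists of pixel values stand in for PIL images so that the code runs without torch installed. In real code the class would subclass torch.utils.data.Dataset before being passed to Table.from_torch_dataset.

```python
class DeterministicallyTransformed:
    """Wrap a map-style dataset and apply a deterministic transform in __getitem__."""

    def __init__(self, base, transform):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        image, label = self.base[index]
        # The transform runs here, so the samples returned by the dataset
        # are already transformed.
        return self.transform(image), label


def center_crop(image, size):
    """Crop a size x size window from the middle of a list of pixel rows."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


# A single 4x4 "image" with label 0, standing in for a real image dataset.
base = [([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]], 0)]
dataset = DeterministicallyTransformed(base, lambda image: center_crop(image, 2))
print(dataset[0])
```

Only deterministic transforms that keep the sample an image belong in such a wrapper; random augmentations and PIL-to-tensor conversion stay in the transforms argument, as described above.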