Import

Tables can be thought of as recipes for generating the rows of a dataset. Some Tables refer directly to input data and declare the schema of that data; these are commonly referred to as importer tables.

Other Tables are the result of applying an operation to one or more input Tables, such as filtering out rows, applying edits, or joining two tables together. These tables are referred to as procedural tables, and we will learn more about them in Revisions.

This page serves as an index into the different import methods, with simple examples, caveats and links to the API documentation for more details.

Importer Table.from_* parameters

The importer Table.from_* methods share a largely identical interface, apart from the parameters specific to each import format. The parameters common to all the methods are described here:

  • project_name: The project name should be something describing the model and/or dataset you are working on.

  • dataset_name: A descriptive name of a group of samples in your project, such as the split of the dataset.

  • table_name: The revision of your dataset. Often "initial" is a good value for new Tables.

  • root_url: Override for the configured project root url, to save the Table in a different location.

  • if_exists: What to do if a table with the same root url, project, dataset and table name already exists. The default is reuse, which returns the existing Table. It is also possible to use raise, which is useful when you expect no Table to exist; rename, which adds a suffix like _0000 and creates a new Table; or overwrite, which deletes any existing Table and creates a new one.

  • add_weight_column: Whether to add a column for sample weights, only visible in the Dashboard.

  • weight_column_value: The value to assign each weight to, if a weight column is to be added.

  • description: A description for the Table, which will be shown in the DESCRIPTION column in the Dashboard. You can think of this as a commit message.

  • extra_columns: The structure of any extra columns to add to the Table.

  • input_tables: URLs to any existing Tables to declare as inputs. These will get arrows to the Table being created in the LINEAGE column in the Dashboard.

  • table_url: Instead of providing a project_name, dataset_name, table_name and (optionally) a root_url, it is possible to provide table_url, which is mutually exclusive with them and writes the Table to a completely custom location.
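To make the if_exists options above concrete, here is a rough pure-Python sketch of the documented semantics. This is not 3LC source code; it only mirrors the behavior described in the bullet list:

```python
# Illustrative sketch of the if_exists dispatch described above (NOT tlc code).

def resolve_if_exists(existing_names, name, if_exists="reuse"):
    """Return the table name to use, given the set of names that already exist."""
    if name not in existing_names:
        return name
    if if_exists == "reuse":
        return name  # the existing Table would simply be returned
    if if_exists == "raise":
        raise ValueError(f"Table '{name}' already exists")
    if if_exists == "rename":
        # Find the first free numeric suffix: initial_0000, initial_0001, ...
        index = 0
        while f"{name}_{index:04d}" in existing_names:
            index += 1
        return f"{name}_{index:04d}"
    if if_exists == "overwrite":
        return name  # the existing Table would be deleted first
    raise ValueError(f"Unknown if_exists value: {if_exists}")

print(resolve_if_exists({"initial"}, "initial", "rename"))  # initial_0000
```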

If your dataset is a Python dictionary of lists with column values (often called "struct of arrays"), use tlc.Table.from_dict(). Specify a structure to tell 3LC how to interpret the data.

import tlc

data = {"image": ["/path/to/image0.png", "/path/to/image1.png", ...]}

table = tlc.Table.from_dict(
    data=data,
    structure=...,
    project_name="Images From Dict Project",
    dataset_name="train",
    table_name="initial",
)

If your dataset is an iterable of Rows or data in the Sample view (often called "array of structs"), for example a Torch Dataset, use tlc.Table.from_torch_dataset(). Specify a structure to tell 3LC how to interpret the data.

import tlc

# dataset is an existing torch.utils.data.Dataset
table = tlc.Table.from_torch_dataset(
    dataset=dataset,
    structure=...,
    project_name="Images From Iterable Project",
    dataset_name="train",
    table_name="initial",
)
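The two layouts differ only in orientation, and converting between them with plain Python can help you decide which importer fits your data. A small illustrative sketch (the column names are made up):

```python
# "Struct of arrays": one list per column, as accepted by Table.from_dict.
columns = {"image": ["/path/a.png", "/path/b.png"], "label": [0, 1]}

# Convert to "array of structs": one dict per row, the shape an iterable
# dataset typically yields.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
print(rows[0])  # {'image': '/path/a.png', 'label': 0}

# And back again.
round_tripped = {key: [row[key] for row in rows] for key in columns}
assert round_tripped == columns
```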

Image Folder datasets are structured as image files in one directory per category, as defined by the commonly used torchvision.datasets.ImageFolder and shown in the following example:

root/
  ├── dog/
  │  ├── image1.jpg
  │  ├── image2.jpg
  │  └── ...
  └── cat/
     ├── image1.jpg
     ├── image2.jpg
     └── ...

Use the method tlc.Table.from_image_folder() to create a tlc.Table from such an image folder dataset.

import tlc

table = tlc.Table.from_image_folder(
    root="root",
    project_name="Image Folder Dataset",
    dataset_name="train",
    table_name="initial",
)
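To illustrate how this layout encodes labels, the category names can be read straight from the subdirectory names using only the standard library. The tiny directory tree below is created just for the demonstration:

```python
import tempfile
from pathlib import Path

# Build a minimal image-folder layout in a temporary directory.
root = Path(tempfile.mkdtemp())
for category in ("cat", "dog"):
    (root / category).mkdir()
    (root / category / "image1.jpg").touch()

# ImageFolder-style loaders derive the class list from the subdirectory names.
class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
print(class_names)  # ['cat', 'dog']
```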

For datasets in the COCO format, use tlc.Table.from_coco() and provide your task:

import tlc

table = tlc.Table.from_coco(
    annotations_file="path/to/annotations.json",
    image_folder="/path/to/images",
    project_name="My COCO Project",
    dataset_name="train",
    table_name="initial",
    task="detect",
)

The resulting tlc.Table has a column named image that references the images, and a column with ground truth labels whose name, associated Schema and data depend on which task is provided. See Computer Vision Columns for more details.
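For reference, a COCO annotations file is a single JSON document with images, annotations and categories sections. A minimal detection-style example, with illustrative values, can be built with the standard json module:

```python
import json

# Minimal COCO-style detection annotations; all values are illustrative.
coco = {
    "images": [
        {"id": 0, "file_name": "image0.png", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 1,
            "bbox": [10, 20, 100, 50],  # [x, y, width, height]
            "area": 5000,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 1, "name": "cat"}],
}

annotations_json = json.dumps(coco)
print(json.loads(annotations_json)["categories"][0]["name"])  # cat
```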

For datasets in the YOLO format, use tlc.Table.from_yolo() and provide your task. We recommend creating one tlc.Table for each split, in a loop like this:

import tlc

for split in ("train", "val", "test"):
    table = tlc.Table.from_yolo(
        dataset_yaml_file="path/to/dataset.yaml",
        project_name="My YOLO Project",
        dataset_name=split,
        table_name="initial",
        task="detect",
    )

The resulting tlc.Table has a column named image that references the images, and a column with ground truth labels whose name, associated Schema and data depend on which task is provided. See Computer Vision Columns for more details.

Use the method tlc.Table.from_hugging_face() to create a tlc.Table from a dataset available through the Hugging Face datasets package. The Hugging Face Dataset is downloaded, and the Features of the Dataset are mapped to a corresponding tlc.Schema.

import tlc

table = tlc.Table.from_hugging_face(
    path="beans",
    split="train",
    project_name="Beans Project",
    dataset_name="train",
    table_name="initial",
)

If your data is in a CSV (comma-separated values) file, use tlc.Table.from_csv().

import tlc

table = tlc.Table.from_csv(
    csv_file="path/to/file.csv",
    project_name="My CSV Project",
    dataset_name="split",
    table_name="initial",
)
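A CSV file for this importer is a plain text file with a header row followed by data rows. A file like the one referenced above (contents are illustrative) could be produced and inspected with the standard csv module:

```python
import csv
import io

# Write an illustrative CSV with a header row and two data rows.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "grade"])
writer.writerow(["Sam Pullweight", 9])
writer.writerow(["Max Epoch", 8])

# Read it back to confirm the structure; note that csv yields strings.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows[0])  # ['name', 'grade']
```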

If your data is in a pandas.DataFrame, use tlc.Table.from_pandas(). Provide a structure to indicate what each row contains.

import tlc
import pandas as pd

df = pd.DataFrame(
    data={
        "name": ["Sam Pullweight", "Max Epoch", "Minnie Epoch"], 
        "grade": [9, 8, 10],
    }
)

table = tlc.Table.from_pandas(
    df=df,
    structure={"name": tlc.String, "grade": tlc.Int},
    project_name="My Pandas Project",
    dataset_name="train",
    table_name="initial",
)

If your data is in an Apache Parquet file, use tlc.Table.from_parquet().

import tlc

table = tlc.Table.from_parquet(
    parquet_file="path/to/file.parquet",
    project_name="My Parquet Project",
    dataset_name="train",
    table_name="initial",
)

NDJSON (Newline-Delimited JSON) is a format where each line of a file contains a JSON object. For example, the following is a valid NDJSON file:

{"name": "Sam Pullweight", "grade": 9}
{"name": "Max Epoch", "grade": 8}
{"name": "Minnie Epoch", "grade": 10}

To create a Table from such a file, use tlc.Table.from_ndjson().

import tlc

table = tlc.Table.from_ndjson(
    ndjson_file="path/to/file.ndjson",
    project_name="My NDJSON Project",
    dataset_name="split",
    table_name="initial",
)

YOLO NDJSON is an alternative way to define datasets for Ultralytics YOLO models. The format stores metadata and labels in a single file, with a defined structure.

The following example shows the format as 3LC expects it. The first line is reserved for metadata, and the subsequent lines correspond to images and their labels.

{"task": "detect", "class_names": {"0": "cat", "1": "dog"}, "description": "Cats and Dogs"}
{"file": "image0.png", "width": 1280, "height": 920, "split": "train", "annotations": ["..."],}
{"file": "image1.png", "width": 640, "height": 480, "split": "train", "annotations": ["..."],}
{"file": "image2.png", "width": 1280, "height": 920, "split": "val", "annotations": ["..."],}
"..."

The format of the annotations depends on the task; refer to the Ultralytics documentation for details.

To create a Table from a YOLO NDJSON file, use tlc.Table.from_yolo_ndjson(). We recommend creating one tlc.Table for each split, in a loop like this:

import tlc

for split in ("train", "val", "test"):
    table = tlc.Table.from_yolo_ndjson(
        ndjson_file="path/to/yolo.ndjson",
        image_folder="path/to/image/folder",
        split=split,
        project_name="My YOLO Project from NDJSON",
        dataset_name=split,
        table_name="initial",
    )

The image_folder parameter only needs to be used when the "file" paths in the NDJSON file are relative to some directory other than the one containing the NDJSON file.

If the JSON object in the first line of the NDJSON file contains a field "description", it will be used for the Table unless a description is provided to tlc.Table.from_yolo_ndjson().
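As an illustration of why one Table per split is a natural fit, the records in a YOLO NDJSON file can be grouped by their "split" field with the standard json module. The file content below mirrors the example above, with annotations elided:

```python
import json

# Illustrative YOLO NDJSON content; the first line is reserved for metadata.
yolo_ndjson = """\
{"task": "detect", "class_names": {"0": "cat", "1": "dog"}, "description": "Cats and Dogs"}
{"file": "image0.png", "width": 1280, "height": 920, "split": "train", "annotations": []}
{"file": "image1.png", "width": 640, "height": 480, "split": "train", "annotations": []}
{"file": "image2.png", "width": 1280, "height": 920, "split": "val", "annotations": []}
"""

lines = yolo_ndjson.splitlines()
metadata = json.loads(lines[0])
records = [json.loads(line) for line in lines[1:]]

# Group the image records by split.
by_split = {}
for record in records:
    by_split.setdefault(record["split"], []).append(record)

print(metadata["description"])  # Cats and Dogs
print(sorted(by_split))         # ['train', 'val']
```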

A tlc.TableWriter is a way of producing a tlc.Table from rows or batches of Python data. Internally, it is used to write metrics.

import tlc

batches = [
    {
        "my_float": [0.0, 1.0, 2.0],
    },
    {
        "my_float": [3.0, 4.0, 5.0],
    }
]

table_writer = tlc.TableWriter(
    column_schemas={"my_float": tlc.Float32Schema()},
    project_name="Data from Table Writer Project",
    dataset_name="train",
    table_name="initial",
)

for batch in batches:
    table_writer.add_batch(batch)

table = table_writer.finalize()
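add_batch takes column-oriented batches like the ones shown above. If your data arrives as individual row dicts instead, a small helper (illustrative, not part of the tlc API) can group them into such batches before they are added:

```python
def rows_to_batches(rows, batch_size):
    """Group row dicts into column-oriented batches of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {key: [row[key] for row in chunk] for key in chunk[0]}

# Six rows become two batches of three, matching the batches shown above.
rows = [{"my_float": float(i)} for i in range(6)]
batches = list(rows_to_batches(rows, batch_size=3))
print(batches[0])  # {'my_float': [0.0, 1.0, 2.0]}
```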