Import
Tables can be thought of as recipes for generating the rows of a dataset. Some Tables refer to
input data and declare the schema of that data; these are commonly referred to as importer tables.
Other Tables are the result of applying an operation to one or more input Tables, such as filtering away rows,
applying edits or joining two tables together. These Tables are referred to as procedural tables, and we will learn
more about them in Revisions.
This page serves as an index into the different import methods, with simple examples, caveats and links to the API documentation for more details.
Importer Table.from_* parameters
The importer Table.from_* methods all share a largely identical interface, apart from the parameters specific to
each import format. The parameters common to every method are described here:
- project_name: The project name should be something describing the model and/or dataset you are working on.
- dataset_name: A descriptive name for a group of samples in your project, such as the split of the dataset.
- table_name: The revision of your dataset. Often "initial" is a good value for new Tables.
- root_url: Override for the configured project root URL, to save the Table in a different location.
- if_exists: What to do if a Table with the same root URL, project, dataset and table name already exists. The default is "reuse", which means the existing Table is returned. It is also possible to use "raise", which is useful when you expect no Table to exist; "rename", which adds a suffix like _0000 and creates a new Table; or "overwrite", which deletes any existing Table and creates a new one.
- add_weight_column: Whether to add a column for sample weights, only visible in the Dashboard.
- weight_column_value: The value to assign each weight, if a weight column is to be added.
- description: A description for the Table, shown in the DESCRIPTION column in the Dashboard. You can think of this as a commit message.
- extra_columns: The structure of any extra columns to add to the Table.
- input_tables: URLs of any existing Tables to declare as inputs. These are shown as arrows pointing to the new Table in the LINEAGE column in the Dashboard.
- table_url: Instead of providing a project_name, dataset_name, table_name and (optionally) a root_url, it is possible to provide the mutually exclusive table_url, which writes the Table to a completely custom location.
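The four if_exists modes can be illustrated with a small plain-Python sketch. This mimics the documented behavior conceptually; it is not tlc's actual implementation, and resolve_table_name is a hypothetical helper invented for this illustration:

```python
# Conceptual sketch of the four if_exists modes described above.
# Not tlc's actual implementation; resolve_table_name is hypothetical.
def resolve_table_name(existing: set[str], name: str, if_exists: str) -> str:
    if name not in existing:
        return name
    if if_exists == "reuse":
        return name  # the existing Table would be returned as-is
    if if_exists == "raise":
        raise FileExistsError(f"Table {name!r} already exists")
    if if_exists == "rename":
        suffix = 0
        while f"{name}_{suffix:04d}" in existing:
            suffix += 1
        return f"{name}_{suffix:04d}"  # e.g. "initial_0000"
    if if_exists == "overwrite":
        existing.discard(name)  # the existing Table would be deleted
        return name
    raise ValueError(f"Unknown if_exists mode: {if_exists!r}")

tables = {"initial", "initial_0000"}
print(resolve_table_name(tables, "initial", "rename"))  # -> initial_0001
```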
If your dataset is a Python dictionary of Python lists with column values (often called "struct of arrays"), use
tlc.Table.from_dict(). Specify a structure to tell 3LC how to interpret the data.
import tlc

data = {"image": ["/path/to/image0.png", "/path/to/image1.png", ...]}

table = tlc.Table.from_dict(
    data=data,
    structure=...,
    project_name="Images From Dict Project",
    dataset_name="train",
    table_name="initial",
)
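If your data starts out as a list of per-row dictionaries ("array of structs") but you want to use from_dict, a small plain-Python reshape produces the dict-of-lists layout it expects. The column names here are placeholders:

```python
# Reshape an "array of structs" (list of row dicts) into the
# "struct of arrays" (dict of column lists) layout used by from_dict.
rows = [
    {"image": "/path/to/image0.png", "label": 0},
    {"image": "/path/to/image1.png", "label": 1},
]

data = {key: [row[key] for row in rows] for key in rows[0]}
print(data)
# -> {'image': ['/path/to/image0.png', '/path/to/image1.png'], 'label': [0, 1]}
```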
If your dataset is an iterable of Rows or data in the Sample view (often called "array of structs"), for example a
Torch Dataset, use tlc.Table.from_torch_dataset(). Specify a structure to tell 3LC how to interpret the data.
import tlc

table = tlc.Table.from_torch_dataset(
    dataset=data,
    structure=...,
    project_name="Images From Iterable Project",
    dataset_name="train",
    table_name="initial",
)
Image Folder datasets are structured as image files in one directory per category, as defined by the commonly used
torchvision.datasets.ImageFolder and shown in the following example:
root/
├── dog/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── cat/
    ├── image1.jpg
    ├── image2.jpg
    └── ...
Use the method Table.from_image_folder() to create a
tlc.Table from such an image folder dataset.
import tlc

table = tlc.Table.from_image_folder(
    root="root",
    project_name="Image Folder Dataset",
    dataset_name="train",
    table_name="initial",
)
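The core idea of this layout is that each immediate subdirectory of the root is one class, and every image inside it belongs to that class. The following plain-Python sketch illustrates that mapping with pathlib; it is an illustration of the layout only, not how from_image_folder works internally:

```python
import pathlib
import tempfile

# Build a tiny ImageFolder-style layout and derive labels from directory
# names. Illustration only; from_image_folder handles this for you.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for category in ("cat", "dog"):
        (root / category).mkdir()
        (root / category / "image1.jpg").touch()
        (root / category / "image2.jpg").touch()

    samples = sorted(
        (str(path), path.parent.name) for path in root.glob("*/*.jpg")
    )
    labels = [label for _, label in samples]
    print(labels)  # -> ['cat', 'cat', 'dog', 'dog']
```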
For datasets in the COCO format, use tlc.Table.from_coco() and
provide your task:
import tlc

table = tlc.Table.from_coco(
    annotations_file="path/to/annotations.json",
    image_folder="/path/to/images",
    project_name="My COCO Project",
    dataset_name="train",
    table_name="initial",
    task="detect",
)
The resulting tlc.Table has a column named image that references the images, and a column with ground truth labels
whose name, associated Schema and data depend on which task is provided. See Computer Vision Columns for more details.
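For reference, the annotations_file follows the standard COCO layout: top-level "images", "categories" and "annotations" lists, with detection boxes given as [x, y, width, height]. The following is a minimal hand-written sketch of such a file (all ids, names and values are placeholders):

```python
import json

# A minimal COCO-style detection annotations file. Values are placeholders.
coco = {
    "images": [
        {"id": 0, "file_name": "image0.png", "width": 1280, "height": 920},
    ],
    "categories": [
        {"id": 0, "name": "cat"},
        {"id": 1, "name": "dog"},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [100, 200, 50, 80]},
    ],
}

with open("annotations.json", "w") as f:  # pass this path as annotations_file
    json.dump(coco, f)
```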
For datasets in the YOLO format, use tlc.Table.from_yolo() and
provide your task. We recommend creating one tlc.Table for each split, in a
loop like this:
import tlc

for split in ("train", "val", "test"):
    table = tlc.Table.from_yolo(
        dataset_yaml_file="path/to/dataset.yaml",
        project_name="My YOLO Project",
        dataset_name=split,
        table_name="initial",
        task="detect",
    )
The resulting tlc.Table has a column named image that references the images, and a column with ground truth labels
whose name, associated Schema and data depend on which task is provided. See Computer Vision Columns for more details.
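The dataset_yaml_file parameter points to a standard Ultralytics dataset definition. As a rough sketch (the paths and class names below are placeholders, not anything 3LC requires), a detect-task file typically looks like:

```yaml
path: /path/to/dataset   # dataset root
train: images/train      # train images, relative to path
val: images/val
test: images/test
names:
  0: cat
  1: dog
```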
Use the method tlc.Table.from_hugging_face() to create a
tlc.Table from a dataset available through the Hugging Face datasets
package. The Hugging Face Dataset is downloaded, and the
Features of the Dataset are mapped to a corresponding
tlc.Schema.
import tlc

table = tlc.Table.from_hugging_face(
    path="beans",
    split="train",
    project_name="Beans Project",
    dataset_name="train",
    table_name="initial",
)
If your data is in a CSV (comma-separated values) file, use tlc.Table.from_csv().
import tlc

table = tlc.Table.from_csv(
    csv_file="path/to/file.csv",
    project_name="My CSV Project",
    dataset_name="split",
    table_name="initial",
)
If your data is in a pandas.DataFrame, use tlc.Table.from_pandas().
Provide a structure to indicate what each row contains.
import tlc
import pandas as pd

df = pd.DataFrame(
    data={
        "name": ["Sam Pullweight", "Max Epoch", "Minnie Epoch"],
        "grade": [9, 8, 10],
    }
)

table = tlc.Table.from_pandas(
    df=df,
    structure={"name": tlc.String, "grade": tlc.Int},
    project_name="My Pandas Project",
    dataset_name="train",
    table_name="initial",
)
If your data is in an Apache Parquet file, use tlc.Table.from_parquet().
import tlc

table = tlc.Table.from_parquet(
    parquet_file="path/to/file.parquet",
    project_name="My Parquet Project",
    dataset_name="train",
    table_name="initial",
)
NDJSON (Newline-Delimited JSON) is a format where each line of a file contains a JSON object. For example, the following is a valid NDJSON file:
{"name": "Sam Pullweight", "grade": 9}
{"name": "Max Epoch", "grade": 8}
{"name": "Minnie Epoch", "grade": 10}
To create a Table from such a file, use tlc.Table.from_ndjson().
import tlc

table = tlc.Table.from_ndjson(
    ndjson_file="path/to/file.ndjson",
    project_name="My NDJSON Project",
    dataset_name="split",
    table_name="initial",
)
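NDJSON files like the one above can be produced and inspected with Python's standard json module, writing one json.dumps result per line. A small round-trip sketch (the file name is a placeholder):

```python
import json

rows = [
    {"name": "Sam Pullweight", "grade": 9},
    {"name": "Max Epoch", "grade": 8},
    {"name": "Minnie Epoch", "grade": 10},
]

# Write one JSON object per line.
with open("grades.ndjson", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read it back line by line.
with open("grades.ndjson") as f:
    loaded = [json.loads(line) for line in f]

print(loaded == rows)  # -> True
```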
YOLO NDJSON is an alternative way to define datasets for Ultralytics YOLO models. The format stores metadata and labels in a single file, with a defined structure.
The following example is how the format is expected in 3LC. The first line is reserved for metadata, and the subsequent lines correspond to images and their labels.
{"task": "detect", "class_names": {"0": "cat", "1": "dog"}, "description": "Cats and Dogs"}
{"file": "image0.png", "width": 1280, "height": 920, "split": "train", "annotations": ["..."],}
{"file": "image1.png", "width": 640, "height": 480, "split": "train", "annotations": ["..."],}
{"file": "image2.png", "width": 1280, "height": 920, "split": "val", "annotations": ["..."],}
"..."
The format of the annotations depends on the task; refer to the Ultralytics Documentation for details.
To create a Table from a YOLO NDJSON file, use tlc.Table.from_yolo_ndjson().
We recommend creating one tlc.Table for each split, in a loop like this:
import tlc

for split in ("train", "val", "test"):
    table = tlc.Table.from_yolo_ndjson(
        ndjson_file="path/to/yolo.ndjson",
        image_folder="path/to/image/folder",
        split=split,
        project_name="My YOLO Project from NDJSON",
        dataset_name=split,
        table_name="initial",
    )
The image_folder parameter only needs to be used when the "file" paths in the NDJSON file are relative to some
directory other than the one containing the NDJSON file.
If the JSON object in the first line of the NDJSON file contains a field "description", it will be used for the
Table unless a description is provided to tlc.Table.from_yolo_ndjson().
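The metadata-first layout described above can also be read with plain json: parse the first line as metadata and each remaining line as one image, then filter by split. This is an illustration of the file layout only; tlc.Table.from_yolo_ndjson() handles all of this for you:

```python
import json

# Sketch of reading the YOLO NDJSON layout: line 1 is metadata, each
# remaining line describes one image. Values are placeholders.
lines = [
    '{"task": "detect", "class_names": {"0": "cat", "1": "dog"}}',
    '{"file": "image0.png", "split": "train", "annotations": []}',
    '{"file": "image1.png", "split": "train", "annotations": []}',
    '{"file": "image2.png", "split": "val", "annotations": []}',
]

metadata = json.loads(lines[0])
images = [json.loads(line) for line in lines[1:]]
train = [img["file"] for img in images if img["split"] == "train"]
print(train)  # -> ['image0.png', 'image1.png']
```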
A tlc.TableWriter is a way of producing a
tlc.Table from rows or batches of Python data. Internally, it is used to
write metrics.
import tlc

batches = [
    {
        "my_float": [0.0, 1.0, 2.0],
    },
    {
        "my_float": [3.0, 4.0, 5.0],
    },
]

table_writer = tlc.TableWriter(
    column_schemas={"my_float": tlc.Float32Schema()},
    project_name="Data from Table Writer Project",
    dataset_name="train",
    table_name="initial",
)

for batch in batches:
    table_writer.add_batch(batch)

table = table_writer.finalize()
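Conceptually, add_batch() accumulates each batch column-wise until finalize() is called. The following plain-Python sketch shows that accumulation idea; it mimics the behavior in spirit and is not tlc's actual implementation:

```python
# Column-wise batch accumulation, similar in spirit to what
# TableWriter.add_batch() does. Not tlc's actual implementation.
batches = [
    {"my_float": [0.0, 1.0, 2.0]},
    {"my_float": [3.0, 4.0, 5.0]},
]

columns: dict[str, list[float]] = {}
for batch in batches:
    for name, values in batch.items():
        columns.setdefault(name, []).extend(values)

print(columns["my_float"])  # -> [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```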