Ingest PandaSet autonomous driving dataset

This notebook shows how to load 3D point clouds, 3D oriented bounding boxes, and per-point semantic segmentation labels from the PandaSet dataset into a 3LC Table.

Tables containing large 3D geometries use the bulk data pattern for storing data. For details on the ingestion process, see the accompanying loading script, load_pandaset.py.


Running this notebook requires the PandaSet DevKit, which is installed from GitHub in the setup cell below.

The dataset can be downloaded from HuggingFace. If you have already downloaded pandaset.zip, ensure the dataset root below points to the unzipped pandaset directory.

If not, the notebook will download pandaset.zip and unzip it into the dataset root directory. This requires authentication with HuggingFace, for example by setting the HF_TOKEN environment variable.
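
If HF_TOKEN is not already set, the optional cell below is one way to authenticate interactively. It is a minimal sketch using the login helper from huggingface_hub; skip it if your environment is already authenticated.

[ ]:
import os

if "HF_TOKEN" not in os.environ:
    # Prompts for a HuggingFace access token and stores it for later calls
    from huggingface_hub import login

    login()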

⚠️ Storage requirements

The unzipped dataset is ~42 GB, and ingesting all sequences into 3LC will require another 50 GB of disk space. Ensure you have enough free space before running the notebook.
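
As a quick sanity check before downloading, the optional cell below uses the standard library to report free space on the current drive. The 100 GB threshold is an assumption chosen to cover the dataset plus the ingested data with some headroom; adjust it, and the checked path, to match where DATA_PATH and DOWNLOAD_PATH actually live.

[ ]:
import shutil

# Free space on the filesystem containing the current working directory
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk space: {free_gb:.0f} GB")
assert free_gb > 100, "Less than 100 GB free; the full download and ingest may not fit"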

Project Setup

[ ]:
PROJECT_NAME = "3LC Tutorials - Pandaset"
DATASET_NAME = "pandaset"
TABLE_NAME = "pandaset"
DATA_PATH = "../../../../data"  # Destination for bulk data written during ingestion
DOWNLOAD_PATH = "../../../../transient_data"  # Where pandaset.zip is downloaded and unzipped
MAX_FRAMES = None  # Maximum frames to ingest per sequence (None = all)
MAX_SEQUENCES = None  # Maximum number of sequences to ingest (None = all)
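
For a quick trial run before committing to the full ingest, you can cap the amount of data processed. The values below are illustrative; any small positive integers work.

[ ]:
# Optional: uncomment to ingest a small subset for a quick trial run
# MAX_SEQUENCES = 2
# MAX_FRAMES = 10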
[ ]:
%pip install -q "pandaset @ git+https://github.com/scaleapi/pandaset-devkit.git@master#subdirectory=python"
%pip install -q 3lc
%pip install -q huggingface-hub

Imports

[ ]:
from pathlib import Path

from load_pandaset import load_pandaset

Prepare Dataset

[ ]:
DATASET_ROOT = Path(DOWNLOAD_PATH) / "pandaset"
ZIP_PATH = DATASET_ROOT.parent / "pandaset.zip"

if not DATASET_ROOT.exists():
    import zipfile

    from huggingface_hub import hf_hub_download

    print("Downloading dataset from HuggingFace")
    hf_hub_download(
        repo_id="georghess/pandaset",
        repo_type="dataset",
        filename="pandaset.zip",
        local_dir=DATASET_ROOT.parent.absolute().as_posix(),
    )

    # Extract into the parent directory, which produces DATASET_ROOT
    with zipfile.ZipFile(ZIP_PATH, "r") as zip_ref:
        zip_ref.extractall(DATASET_ROOT.parent)

    # Remove the pandaset.zip file after extraction
    ZIP_PATH.unlink(missing_ok=True)
else:
    print(f"Dataset root {DATASET_ROOT} already exists")
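
PandaSet stores each sequence as a numbered directory under the dataset root. The optional cell below is a small sanity check that the extraction produced the expected layout.

[ ]:
# List the per-sequence directories under the dataset root
sequence_dirs = sorted(p.name for p in DATASET_ROOT.iterdir() if p.is_dir())
print(f"Found {len(sequence_dirs)} sequences, e.g. {sequence_dirs[:5]}")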

Create Table

[ ]:
table = load_pandaset(
    dataset_root=DATASET_ROOT,
    table_name=TABLE_NAME,
    dataset_name=DATASET_NAME,
    project_name=PROJECT_NAME,
    data_path=DATA_PATH,
    max_frames=MAX_FRAMES,
    max_sequences=MAX_SEQUENCES,
)
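
Once ingestion completes, a quick inspection confirms the result. This sketch assumes the returned object behaves like a standard 3LC Table, supporting len() and exposing its url; from here, the table can be opened in the 3LC Dashboard.

[ ]:
# Basic sanity check on the ingested Table (assumes standard 3LC Table behavior)
print(f"Created table with {len(table)} rows")
print(f"Table URL: {table.url}")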