URLs

3LC objects are identified by URLs, which are represented by the tlc.Url class in the Python API. URLs can refer to files on disk, or objects on cloud storage such as S3. An object’s URL is generally the location that it was read from and/or may be written to.

Schemes

3LC supports file paths and various cloud storage locations, each with a different scheme. A URL’s scheme is the part of the URL before the first colon (:). The following cards summarize the available schemes, how to configure credentials, and how to install any necessary dependencies.

File file://

URLs representing file paths may refer to data on a local disk, a mapped network drive, etc. URLs with no scheme are interpreted as file path URLs.
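As a rough illustration of the rule above, the following sketch extracts the scheme from a URL string and falls back to file when none is present. It is not how tlc.Url is implemented: the helper name scheme_of is hypothetical, and matching on "://" rather than a bare ":" is a simplifying assumption that keeps Windows drive letters such as C:\data from being read as schemes.

```python
def scheme_of(url: str) -> str:
    """Return the scheme of a URL string, or "file" if it has none.

    Hypothetical helper for illustration only; not part of the tlc API.
    """
    scheme, separator, _rest = url.partition("://")
    # No "://" means a plain file path (assumption: this also covers "C:\\data").
    return scheme.lower() if separator else "file"


print(scheme_of("s3://my-bucket/data/table"))  # s3
print(scheme_of("/home/user/data/table"))      # file
```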

Credentials

To read from or write to the file system, 3LC requires access to the underlying file system and the necessary permissions for the relevant files and directories. Make sure the user or process running 3LC has appropriate read and/or write permissions for the paths you intend to use (including referenced bulk data), or file operations may fail.
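One way to check permissions before pointing 3LC at a path is a small pre-flight test with the standard library. The sketch below uses only os and tempfile and does not involve tlc; the helper name check_access is hypothetical. Note that os.access is only a hint: an operation can still fail later, for example if permissions change or a network drive disconnects.

```python
import os
import tempfile


def check_access(path: str) -> dict[str, bool]:
    """Report whether the current process can read and write `path`."""
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


# A freshly created temporary directory should be readable and writable.
with tempfile.TemporaryDirectory() as tmp_dir:
    print(check_access(tmp_dir))
```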

Amazon S3 s3://

Amazon S3 URLs refer to data stored in S3 buckets.

Credentials

The tlc package generally uses the boto3 credentials order when accessing data stored on S3. In particular, this means that AWS environment variables take precedence, then the shared credential file (~/.aws/credentials), then the AWS config file (~/.aws/config), then the instance metadata service if running on an Amazon EC2 instance that has an IAM role configured.
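To see which of these sources is likely to be picked up on a given machine, a simplified check can walk the same order. This sketch only tests for presence, not validity, and ignores other sources that boto3 supports; it does not call boto3 or tlc. The function name likely_aws_credential_source and its parameters are hypothetical, while AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard AWS variable names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def likely_aws_credential_source(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> str:
    """Guess which source in the order described above would be used first.

    Simplified sketch: checks presence only, never the contents.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    if (home / ".aws" / "credentials").is_file():
        return "shared credential file"
    if (home / ".aws" / "config").is_file():
        return "config file"
    return "instance metadata service (if on EC2 with an IAM role)"


print(likely_aws_credential_source())
```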

Google Cloud Storage gcs://

Google Cloud Storage (GCS) URLs refer to data stored in GCS buckets.

Credentials

The tlc package generally uses Google’s application default credentials order when accessing data stored on GCS. In particular, this means that the GOOGLE_APPLICATION_CREDENTIALS environment variable takes precedence, then the gcloud application default credentials, then the instance metadata service if running on a Google Compute Engine (GCE) instance with an attached service account.

Installation

GCS support is not enabled by default and may be enabled by installing the 3lc[gcs] extra.

Azure Blob Storage abfs://

Azure Blob Storage URLs refer to data stored in Azure Blob containers.

Credentials

The tlc package supports access to Azure Blob Storage using AZURE_STORAGE environment variables. Common variations include:

  • AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY

  • AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_SAS_TOKEN

  • AZURE_STORAGE_CONNECTION_STRING
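A quick way to confirm that one of these combinations is complete is to inspect the environment before starting 3LC. The sketch below is a plain standard-library check, not part of tlc; the function name azure_credential_variant is hypothetical, and the order of the checks is arbitrary rather than a statement about which variation takes precedence.

```python
import os
from typing import Mapping, Optional


def azure_credential_variant(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a label for the first complete variable combination, or None."""
    env = os.environ if env is None else env
    has_account = bool(env.get("AZURE_STORAGE_ACCOUNT_NAME"))

    # Check order is arbitrary; it does not reflect any documented precedence.
    if env.get("AZURE_STORAGE_CONNECTION_STRING"):
        return "connection string"
    if has_account and env.get("AZURE_STORAGE_ACCOUNT_KEY"):
        return "account name + account key"
    if has_account and env.get("AZURE_STORAGE_SAS_TOKEN"):
        return "account name + SAS token"
    return None


print(azure_credential_variant())
```

A result of None suggests that none of the listed combinations is fully set, for example an account name without a key or SAS token.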

Installation

Azure Blob Storage support is not enabled by default and may be enabled by installing the 3lc[abfs] extra.

Cloud credential configuration across multiple processes using tlc

3LC workflows often run multiple processes that each use the tlc Python package independently, such as a training notebook and the 3LC Object Service. For these processes to interoperate correctly on cloud storage URLs, their cloud credentials must be configured compatibly. For example, if credentials are supplied through environment variables, the same variables should generally be set for every process.
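One way to keep environment-based credentials consistent is to start the second process with a copy of the first process's environment. The sketch below demonstrates this with a stand-in child process that simply echoes a variable; in practice the child might be a training script or a service. The key value is a placeholder for demonstration, and nothing here is specific to tlc.

```python
import os
import subprocess
import sys

# Copy the parent's environment so the child sees the same credential variables.
child_env = dict(os.environ)
# Placeholder value for demonstration only; never hard-code real secrets.
child_env.setdefault("AWS_ACCESS_KEY_ID", "EXAMPLE-KEY-ID")

# Stand-in child process that prints the variable it received.
child_code = "import os; print(os.environ['AWS_ACCESS_KEY_ID'])"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
print("child sees same key id:", child_value == child_env["AWS_ACCESS_KEY_ID"])
```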

Object URLs

The tlc Python package provides a standard set of method arguments used for creating or retrieving objects by URLs. These are summarized in the table below:

| Parameter | Description |
| --- | --- |
| table_name/run_name | The name of the object, corresponding to the last part of the URL. |
| dataset_name | The dataset name to use. Defaults to default-dataset. |
| project_name | The project name to use. Defaults to default-project. |
| root_url | The project root URL to use. Defaults to the PROJECT_ROOT_URL configuration variable. |
| if_exists | How to handle the case where the object already exists. Typical values are "overwrite", "reuse", "rename", and "raise". |
| table_url/run_url | A fully-qualified custom URL to the object, disregarding the project folder structure. |
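To make the relationship between these parameters concrete, the sketch below assembles a table location using the layout documented in the table-creation example on this page (<project_root>/<project_name>/datasets/<dataset_name>/tables/<table_name>). It is a string-level illustration only: the function name sketch_table_url and the placeholder root are hypothetical, and real code should use tlc.Url.create_table_url instead.

```python
def sketch_table_url(
    table_name: str,
    dataset_name: str = "default-dataset",
    project_name: str = "default-project",
    root_url: str = "/path/to/project-root",  # hypothetical placeholder root
) -> str:
    """Join the parameters in the documented project folder layout."""
    parts = [
        root_url.rstrip("/"),
        project_name,
        "datasets",
        dataset_name,
        "tables",
        table_name,
    ]
    return "/".join(parts)


print(sketch_table_url("my-table"))
# /path/to/project-root/default-project/datasets/default-dataset/tables/my-table
```

Passing table_url or run_url bypasses this layout entirely, which is why those parameters are described as disregarding the project folder structure.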

Examples

Create a URL to a table or run:

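import tlc

# Placeholder names for illustration; substitute your own.
table_name, dataset_name, project_name = "my-table", "my-dataset", "my-project"
run_name = "my-run"

table_url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
run_url = tlc.Url.create_run_url(run_name, project_name)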

Create or retrieve a table from some input data:

import tlc

data = {
    "column_1": [1, 2, 3],
    "column_2": ["a", "b", "c"]
}
table = tlc.Table.from_dict(data, table_name, dataset_name, project_name, if_exists="reuse")
# table is now a Table object with a URL of the form 
# <project_root>/<project_name>/datasets/<dataset_name>/tables/<table_name>

Common URL manipulation:

# Some examples of common URL manipulation
import tlc

# Create a unique URL from an existing one
url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
unique_url = url.create_unique() # If a file or folder exists at the URL, appends a unique suffix
assert not unique_url.exists() # The URL is guaranteed to be unique

# Create a URL next to an existing one
url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
next_to_url = url.create_sibling("new_table") # Creates a URL next to the existing one, with the name "new_table"
# Any object created at next_to_url will be in the same folder as the original object,
# and thereby belong to the same project and dataset (if applicable).
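The behavior of create_unique() and create_sibling() can be pictured with ordinary file paths. The following standard-library sketch mimics the idea for local paths only; it does not use tlc, the helper names unique_path and sibling_path are hypothetical, and the _<n> suffix format is an assumption, since the suffix that 3LC actually appends is not specified here.

```python
import tempfile
from pathlib import Path


def unique_path(path: Path) -> Path:
    """Return `path` if nothing exists there, else the first free `<name>_<n>`.

    The `_<n>` suffix format is an assumption made for this sketch.
    """
    if not path.exists():
        return path
    counter = 1
    while True:
        candidate = path.with_name(f"{path.name}_{counter}")
        if not candidate.exists():
            return candidate
        counter += 1


def sibling_path(path: Path, name: str) -> Path:
    """Return a path in the same folder as `path`, with a different last part."""
    return path.with_name(name)


with tempfile.TemporaryDirectory() as tmp_dir:
    table = Path(tmp_dir) / "tables" / "my_table"
    table.mkdir(parents=True)
    print(unique_path(table).name)                # my_table_1
    print(sibling_path(table, "new_table").name)  # new_table
```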

Custom URL Adapters

3LC tables can reference external data — images, point clouds, audio files — stored in systems that 3LC doesn’t natively support. Custom URL adapters let you teach 3LC how to read (and optionally write) these sources, so the data never needs to be copied. The adapter resolves a custom URL scheme into bytes on the fly whenever 3LC accesses a row.

The most common use case is a read-only external data adapter: you implement three methods and 3LC handles the rest.

Creating an Adapter

Subclass UrlAdapter and implement the required methods:

from tlc import Url
from tlc.url import UrlAdapter


class HttpImageUrlAdapter(UrlAdapter):
    """Read-only adapter that fetches images over HTTP(S)."""

    def schemes(self) -> list[str]:
        return ["img-http", "img-https"]

    def read_binary_content_from_url(self, url: Url) -> bytes:
        from urllib.request import urlopen

        real_url = f"{url.scheme.removeprefix('img-')}://{url.path}"
        with urlopen(real_url) as response:
            return response.read()

    def exists(self, url: Url) -> bool:
        from urllib.request import Request, urlopen

        real_url = f"{url.scheme.removeprefix('img-')}://{url.path}"
        req = Request(real_url, method="HEAD")
        try:
            with urlopen(req) as response:
                return response.status == 200
        except Exception:
            return False

That’s it — three methods and your adapter is functional.
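The scheme translation at the heart of the adapter above can be tried in isolation. This sketch repeats the same string manipulation without importing tlc; the helper name to_real_url is hypothetical, and it assumes, as the adapter code does, that the path part of the URL carries the host and the rest of the address.

```python
def to_real_url(scheme: str, path: str, prefix: str = "img-") -> str:
    """Map a custom scheme such as "img-https" back to a plain HTTP(S) URL."""
    if not scheme.startswith(prefix):
        raise ValueError(f"unexpected scheme {scheme!r}; expected prefix {prefix!r}")
    # Drop the custom prefix and rebuild the URL the remote server understands.
    return f"{scheme[len(prefix):]}://{path}"


print(to_real_url("img-https", "picsum.photos/id/10/400/300"))
# https://picsum.photos/id/10/400/300
```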

Complete examples

  • http-image-url-adapter — Minimal read-only adapter that fetches images over HTTP(S).

  • kitti-virtual-table — Virtual table over KITTI LiDAR point clouds, with on-the-fly de-interleaving.

  • nifti-virtual-table — Virtual table over NIfTI brain MRI volumes, with per-slice PNG rendering.

Writable adapters

To support writes (e.g., for a custom storage backend), additionally override write_binary_content_to_url() and is_writable(). See the UrlAdapter API reference for the full list of optional methods.

Registering an Adapter

Entry Points (recommended)

The recommended approach is to declare a Python entry point in your package’s pyproject.toml. Installing the package is all that’s needed — 3LC discovers and registers the adapter automatically.

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "http-image-url-adapter"
version = "0.1.0"
dependencies = ["3lc"]

[project.entry-points."tlc.url_adapters"]
img-http = "http_image_url_adapter.adapter:HttpImageUrlAdapter"

[tool.hatch.build.targets.wheel]
packages = ["src/http_image_url_adapter"]

Decorator Registration

For quick prototyping or in-process use, apply the register_url_adapter() decorator:

from tlc.url import UrlAdapter, register_url_adapter

@register_url_adapter
class MyAdapter(UrlAdapter):
    def schemes(self) -> list[str]:
        return ["myscheme"]
    ...

Config-based Registration

For deployment-controlled setups, adapters can be loaded from the 3LC config file:

url_adapters:
  - module: my_package
    class: MyAdapter
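A module/class pair like the one above is typically resolved with a dynamic import. How 3LC does this internally is not shown here, so the following is only a generic sketch of the technique; load_class is a hypothetical helper, and a standard-library class stands in for my_package.MyAdapter so the example runs anywhere.

```python
import importlib


def load_class(module_name: str, class_name: str) -> type:
    """Import `module_name` and return the class named `class_name` from it."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise TypeError(f"{module_name}.{class_name} is not a class")
    return cls


# Stand-in for a config entry; a real one would name your adapter's module and class.
config_entry = {"module": "collections", "class": "OrderedDict"}
loaded = load_class(config_entry["module"], config_entry["class"])
print(loaded.__name__)  # OrderedDict
```

A practical consequence of this pattern is that the module must be importable by the process reading the config, so the package containing the adapter needs to be installed in that environment.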

Using Your Adapter

Once registered, URLs with your scheme work throughout 3LC:

from tlc import Url

url = Url("img-https://picsum.photos/id/10/400/300")
image_bytes = url.read()  # fetches the JPEG via your adapter

Listing Registered Adapters

import tlc

for info in tlc.url.list_url_adapters():
    print(f"{info['scheme']:12} {info['adapter_class']:30} {info['source']}")

Overriding Built-in Adapters

To replace a built-in adapter (e.g., to customize S3 behavior), set the force class attribute:

from tlc.url import UrlAdapter, register_url_adapter

@register_url_adapter
class MyCustomS3Adapter(UrlAdapter):
    force = True

    def schemes(self) -> list[str]:
        return ["s3"]
    ...

A warning is logged when a built-in scheme is overridden.

API Reference