URLs
3LC objects are identified by URLs, which are represented by the tlc.Url class in the Python
API. URLs can refer to files on disk, or objects on cloud storage such as S3. An object’s URL is generally the location
that it was read from and/or may be written to.
Schemes
3LC supports file paths and various cloud storage locations, each with a different scheme. A URL scheme is the first
part of the URL, up to the first :. The following cards summarize the available schemes, how to configure credentials
and how to install necessary dependencies.
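To make the "everything before the first :" rule concrete, here is a minimal sketch. `scheme_of` is a hypothetical helper written for illustration only; it is not how tlc.Url parses URLs internally. It also applies the File card's rule that strings with no scheme are file paths, and it assumes that a single-letter prefix such as `C:` is a Windows drive letter rather than a scheme.

```python
def scheme_of(url: str) -> str:
    """Return the scheme of a URL string, treating scheme-less strings as file paths.

    Illustrative only; not the tlc package's own parser.
    """
    head, sep, _ = url.partition(":")
    # No ":" at all, or a ":" that appears after a path separator, means no scheme
    if not sep or "/" in head or "\\" in head:
        return "file"
    # Assume single letters such as "C" in "C:\\data" are Windows drive letters
    if len(head) == 1:
        return "file"
    # Schemes are case-insensitive, so normalize to lowercase
    return head.lower()


for example in ["s3://bucket/data.parquet", "gcs://bucket/img.png", "/home/user/data", "C:\\data\\x"]:
    print(f"{example!r:30} -> {scheme_of(example)}")
```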
File
file://
URLs representing file paths may refer to data on a local disk, a mapped network drive, etc. URLs with no scheme are interpreted as file path URLs.
Credentials
To read from or write to the file system, 3LC requires access to the underlying file system and the necessary permissions for the relevant files and directories. Make sure the user or process running 3LC has appropriate read and/or write permissions for the paths you intend to use (including referenced bulk data), or file operations may fail.
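Before pointing 3LC at a path, you can do a quick permission sanity check with the standard library. The sketch below uses a hypothetical helper, `check_access`; it is not part of tlc. Note that `os.access` checks against the real user ID and may not reflect ACLs or network-drive semantics, so treat the result as a hint rather than a guarantee.

```python
from __future__ import annotations

import os
from pathlib import Path


def check_access(path: str | os.PathLike[str], need_write: bool = False) -> list[str]:
    """Return a list of permission problems for `path` (empty if none were found)."""
    p = Path(path)
    problems: list[str] = []
    if p.exists():
        if not os.access(p, os.R_OK):
            problems.append(f"cannot read {p}")
        if need_write and not os.access(p, os.W_OK):
            problems.append(f"cannot write {p}")
    elif need_write:
        # A new file is created inside its parent directory, so check that instead
        parent = p.parent
        if not os.access(parent, os.W_OK | os.X_OK):
            problems.append(f"cannot create files in {parent}")
    else:
        problems.append(f"{p} does not exist")
    return problems
```

For example, `check_access("/mnt/share/images", need_write=False)` (a placeholder path) returns an empty list when the directory is readable.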
Amazon S3
s3://
Amazon S3 URLs refer to data stored in S3 buckets.
Credentials
The tlc package generally uses the boto3 credentials order when accessing data stored on S3. In particular, this means that AWS environment variables take precedence, then the shared credential file (~/.aws/credentials), then the AWS config file (~/.aws/config), then the instance metadata service if running on an Amazon EC2 instance that has an IAM role configured.
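When credentials are not picked up as expected, it can help to check which of the sources above is likely in effect. This is a rough diagnostic sketch, assuming only the four sources named above; `likely_aws_credential_source` is a hypothetical helper, and boto3's real provider chain has additional steps that it does not model.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def likely_aws_credential_source(
    environ: Mapping[str, str] | None = None, home: Path | None = None
) -> str:
    """Roughly guess which source in the order above would supply AWS credentials.

    Simplified diagnostic: boto3's real provider chain has more steps (for example
    SSO, assume-role and container providers), and a file's presence does not
    guarantee it holds credentials for the active profile.
    """
    env = os.environ if environ is None else environ
    home = Path.home() if home is None else home
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # boto3 also honours these variables when locating the shared files
    creds_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", home / ".aws" / "credentials")).expanduser()
    config_file = Path(env.get("AWS_CONFIG_FILE", home / ".aws" / "config")).expanduser()
    if creds_file.is_file():
        return f"shared credentials file ({creds_file})"
    if config_file.is_file():
        return f"config file ({config_file})"
    return "instance metadata service (EC2 with IAM role) or no credentials"


print(likely_aws_credential_source())
```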
Google Cloud Storage
gcs://
Google Cloud Storage (GCS) URLs refer to data stored in GCS buckets.
Credentials
The tlc package generally uses Google’s application default credentials order when accessing data stored on GCS. In particular, this means that the GOOGLE_APPLICATION_CREDENTIALS environment variable takes precedence, then the gcloud application default credentials, then the instance metadata service if running on a Google Compute Engine (GCE) instance with an attached service account.
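A similar check for GCS looks for the environment variable first and then for the file written by `gcloud auth application-default login`. This is a simplified sketch: the helper names are hypothetical, and the file locations are the usual gcloud defaults (overridable with CLOUDSDK_CONFIG), which your installation may not match.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping
from pathlib import Path


def gcloud_adc_path(environ: Mapping[str, str] | None = None) -> Path:
    """Usual location of the gcloud application default credentials file."""
    env = os.environ if environ is None else environ
    if env.get("CLOUDSDK_CONFIG"):
        config_dir = Path(env["CLOUDSDK_CONFIG"])
    elif sys.platform == "win32":
        config_dir = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        config_dir = Path.home() / ".config" / "gcloud"
    return config_dir / "application_default_credentials.json"


def likely_gcs_credential_source(environ: Mapping[str, str] | None = None) -> str:
    """Simplified guess at which step of the order above would supply credentials."""
    env = os.environ if environ is None else environ
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "GOOGLE_APPLICATION_CREDENTIALS"
    if gcloud_adc_path(env).is_file():
        return "gcloud application default credentials"
    return "metadata service (GCE with attached service account) or no credentials"
```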
Installation
GCS support is not enabled by default and may be enabled by installing the 3lc[gcs] extra.
Azure Blob Storage
abfs://
Azure Blob Storage URLs refer to data stored in Azure Blob Storage containers.
Credentials
The tlc package supports access to Azure Blob Storage using AZURE_STORAGE_* environment variables. Common combinations include:
AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY
AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_SAS_TOKEN
AZURE_STORAGE_CONNECTION_STRING
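A quick way to see which of these combinations is fully set in the current environment is sketched below. `configured_azure_variants` is a hypothetical helper; it reports every complete combination rather than picking one, because the precedence tlc applies when several are set is not described here.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# The combinations listed above; each counts only if every variable in it is set
AZURE_VARIANTS: dict[str, tuple[str, ...]] = {
    "account key": ("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY"),
    "SAS token": ("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_SAS_TOKEN"),
    "connection string": ("AZURE_STORAGE_CONNECTION_STRING",),
}


def configured_azure_variants(environ: Mapping[str, str] | None = None) -> list[str]:
    """Return the names of the variable combinations that are fully set."""
    env = os.environ if environ is None else environ
    return [
        name
        for name, variables in AZURE_VARIANTS.items()
        if all(env.get(var) for var in variables)
    ]


print(configured_azure_variants() or "no complete AZURE_STORAGE_* combination set")
```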
Installation
Azure Blob Storage support is not enabled by default and may be enabled by installing the 3lc[abfs] extra.
Cloud credential configuration across multiple processes using tlc
It is common with 3LC to run multiple processes that each use the tlc Python package independently, such as a training notebook and the 3LC Object Service. For these processes to interoperate correctly when handling cloud storage URLs, their cloud credentials must be configured in a compatible way. For example, if cloud credentials are configured via environment variables, the same variables should typically be set in each process.
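When one Python process launches another, passing an explicit environment is one way to keep credentials identical across both. The sketch below is standard-library only: `credential_env` and `launch_with_credentials` are hypothetical helpers, and the prefix list is an assumption based on the variables mentioned in this section.

```python
from __future__ import annotations

import os
import subprocess
from collections.abc import Mapping, Sequence

# Prefixes for the credential mechanisms described above (an assumption;
# extend this if you configure credentials in other ways)
CREDENTIAL_PREFIXES = ("AWS_", "GOOGLE_APPLICATION_CREDENTIALS", "AZURE_STORAGE_")


def credential_env(source: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return only the cloud credential variables from `source` (default: os.environ)."""
    src = os.environ if source is None else source
    return {k: v for k, v in src.items() if k.startswith(CREDENTIAL_PREFIXES)}


def launch_with_credentials(
    cmd: Sequence[str], credentials: Mapping[str, str]
) -> subprocess.CompletedProcess[str]:
    """Run `cmd` with the current environment plus `credentials` (which take priority)."""
    env = {**os.environ, **credentials}
    return subprocess.run(list(cmd), env=env, capture_output=True, text=True, check=True)
```

A child started from Python already inherits `os.environ` by default; the explicit `credentials` mapping matters when the values come from elsewhere, such as a secrets store. Printing `credential_env()` in each process is also a simple way to compare their configurations.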
Object URLs
The tlc Python package provides a standard set of method arguments used for creating or retrieving objects by URLs.
These are summarized in the table below:
| Parameter | Description |
|---|---|
| table_name, run_name | The name of the object, corresponding to the last part of the URL. |
| dataset_name | The dataset name to use. Defaults to |
| project_name | The project name to use. Defaults to |
| | The project root URL to use. Defaults to the |
| if_exists | How to handle the case where the object already exists. Typical values are “overwrite”, “reuse”, “rename”, and “raise”. |
| | A fully-qualified custom URL to the object, disregarding the project folder structure. |
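The if_exists options can be read as a small decision procedure. The sketch below applies them to a local path. `resolve_target` and its `_0001`-style rename suffix are hypothetical illustrations rather than the tlc implementation, whose exact behavior (particularly how "rename" chooses a new name) may differ.

```python
from __future__ import annotations

from pathlib import Path


def resolve_target(path: Path, if_exists: str) -> tuple[Path, bool]:
    """Return (target_path, should_write) for a hypothetical local object write.

    Illustrative semantics only; not the tlc package's implementation.
    """
    if not path.exists():
        return path, True  # nothing to conflict with
    if if_exists == "overwrite":
        return path, True  # write over the existing object
    if if_exists == "reuse":
        return path, False  # keep and return the existing object
    if if_exists == "rename":
        # Append an increasing numeric suffix until an unused name is found
        n = 1
        while (candidate := path.with_name(f"{path.name}_{n:04d}")).exists():
            n += 1
        return candidate, True
    if if_exists == "raise":
        raise FileExistsError(f"{path} already exists")
    raise ValueError(f"unknown if_exists option: {if_exists!r}")
```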
Examples
Create a URL to a table or run:
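import tlc
# Example names (hypothetical); substitute your own
table_name, dataset_name, project_name, run_name = "my_table", "my_dataset", "my_project", "my_run"
table_url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
run_url = tlc.Url.create_run_url(run_name, project_name)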
Create or retrieve a table from some input data:
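import tlc
# Example names (hypothetical); substitute your own
table_name, dataset_name, project_name = "my_table", "my_dataset", "my_project"
data = {
    "column_1": [1, 2, 3],
    "column_2": ["a", "b", "c"],
}
table = tlc.Table.from_dict(
    data,
    table_name=table_name,
    dataset_name=dataset_name,
    project_name=project_name,
    if_exists="reuse",
)
# table is now a Table object with a URL of the form
# <project_root>/<project_name>/datasets/<dataset_name>/tables/<table_name>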
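To reason about where a table will land (for example, when browsing a bucket), the layout in the comment above can be reproduced with plain string joins. `table_url_layout` is a hypothetical helper for illustration only; in code, prefer tlc.Url.create_table_url as shown earlier, which also handles the project root for you.

```python
import posixpath


def table_url_layout(project_root: str, project_name: str, dataset_name: str, table_name: str) -> str:
    """Join the components in the order given by the layout comment above."""
    # posixpath.join uses "/" separators, which suits both POSIX paths and s3://-style URLs
    return posixpath.join(project_root, project_name, "datasets", dataset_name, "tables", table_name)


print(table_url_layout("s3://bucket/3lc", "my_project", "my_dataset", "my_table"))
```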
Common URL manipulation:
# Some examples of common URL manipulation
import tlc
# Example names (hypothetical); substitute your own
table_name, dataset_name, project_name = "my_table", "my_dataset", "my_project"
# Create a unique URL from an existing one
url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
unique_url = url.create_unique()  # If a file or folder exists at the URL, appends a unique suffix
assert not unique_url.exists()  # Nothing exists at unique_url at the time of the call
# Create a URL next to an existing one
url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
next_to_url = url.create_sibling("new_table")  # Same parent folder as url, with the name "new_table"
# Any object created at next_to_url will be in the same folder as the original object,
# and thereby belong to the same project and dataset (if applicable).
Custom URL Adapters
3LC tables can reference external data — images, point clouds, audio files — stored in systems that 3LC doesn’t natively support. Custom URL adapters let you teach 3LC how to read (and optionally write) these sources, so the data never needs to be copied. The adapter resolves a custom URL scheme into bytes on the fly whenever 3LC accesses a row.
The most common use case is a read-only external data adapter: you implement three methods and 3LC handles the rest.
Creating an Adapter
Subclass UrlAdapter and implement the required methods:
from urllib.request import Request, urlopen

from tlc import Url
from tlc.url import UrlAdapter


class HttpImageUrlAdapter(UrlAdapter):
    """Read-only adapter that fetches images over HTTP(S)."""

    def schemes(self) -> list[str]:
        # Custom schemes handled by this adapter
        return ["img-http", "img-https"]

    def read_binary_content_from_url(self, url: Url) -> bytes:
        # Map img-http(s)://... to the underlying http(s)://... URL
        real_url = f"{url.scheme.removeprefix('img-')}://{url.path}"
        with urlopen(real_url) as response:
            return response.read()

    def exists(self, url: Url) -> bool:
        real_url = f"{url.scheme.removeprefix('img-')}://{url.path}"
        try:
            # A HEAD request checks for the resource without downloading its body
            with urlopen(Request(real_url, method="HEAD")) as response:
                return response.status == 200
        except Exception:
            # Network errors, HTTP errors (e.g. 404) and malformed URLs all mean "not found"
            return False
That’s it — three methods and your adapter is functional.
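Because the adapter's logic is plain urllib code, it can be exercised without 3LC. This sketch factors it into two standalone functions so they can be unit-tested; `to_real_url` and `http_exists` are hypothetical names, not part of tlc, and the `timeout` parameter is an addition that the adapter above does not use.

```python
import urllib.request


def to_real_url(custom_url: str) -> str:
    """Map e.g. 'img-https://host/a.png' to 'https://host/a.png'."""
    scheme, sep, rest = custom_url.partition("://")
    if not sep or not scheme.startswith("img-"):
        raise ValueError(f"not an img-http(s) URL: {custom_url!r}")
    return f"{scheme.removeprefix('img-')}://{rest}"


def http_exists(real_url: str, timeout: float = 10.0) -> bool:
    """HEAD-request check mirroring the adapter's exists() method."""
    try:
        # Request() raises ValueError for malformed URLs, so build it inside the try
        req = urllib.request.Request(real_url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        return False
```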
Complete examples
http-image-url-adapter — Minimal read-only adapter that fetches images over HTTP(S).
kitti-virtual-table — Virtual table over KITTI LiDAR point clouds, with on-the-fly de-interleaving.
nifti-virtual-table — Virtual table over NIfTI brain MRI volumes, with per-slice PNG rendering.
Writable adapters
To support writes (e.g., for a custom storage backend), additionally override write_binary_content_to_url() and is_writable(). See the UrlAdapter API reference for the full list of optional methods.
Registering an Adapter
Entry Points (recommended)
The recommended approach is to declare a Python entry point in your package’s pyproject.toml.
Installing the package is all that’s needed — 3LC discovers and registers the adapter automatically.
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "http-image-url-adapter"
version = "0.1.0"
dependencies = ["3lc"]
[project.entry-points."tlc.url_adapters"]
img-http = "http_image_url_adapter.adapter:HttpImageUrlAdapter"
[tool.hatch.build.targets.wheel]
packages = ["src/http_image_url_adapter"]
Decorator Registration
For quick prototyping or in-process use, apply the register_url_adapter() decorator:
from tlc.url import UrlAdapter, register_url_adapter


@register_url_adapter
class MyAdapter(UrlAdapter):
    def schemes(self) -> list[str]:
        return ["myscheme"]

    ...  # implement read_binary_content_from_url() and exists() here
Config-based Registration
For deployment-controlled setups, adapters can be loaded from the 3LC config file:
url_adapters:
  - module: my_package
    class: MyAdapter
Using Your Adapter
Once registered, URLs with your scheme work throughout 3LC:
Listing Registered Adapters
import tlc

for info in tlc.url.list_url_adapters():
    print(f"{info['scheme']:12} {info['adapter_class']:30} {info['source']}")
Overriding Built-in Adapters
To replace a built-in adapter (e.g., to customize S3 behavior), set the force class attribute:
from tlc.url import UrlAdapter, register_url_adapter


@register_url_adapter
class MyCustomS3Adapter(UrlAdapter):
    force = True  # replace the built-in adapter for this scheme

    def schemes(self) -> list[str]:
        return ["s3"]

    ...
A warning is logged when a built-in scheme is overridden.
API Reference
UrlAdapter: Base adapter class
UrlAdapterDirEntry: Directory entry returned by list_dir() / stat()
IfExistsOption: Write-semantics option for write_* methods
register_url_adapter(): Decorator to register a custom adapter