URLs¶

3LC objects are identified by URLs, which are represented by the tlc.Url class in the Python API. URLs can refer to files on disk or to objects in cloud storage such as S3. An object’s URL is generally the location from which it was read and/or to which it may be written.

Schemes¶

3LC supports file paths and various cloud storage locations, each with its own scheme. A URL scheme is the first part of the URL, up to the first :. The following cards summarize the available schemes, how to configure credentials, and how to install any necessary dependencies.

File file://

URLs representing file paths may refer to data on a local disk, a mapped network drive, etc. URLs with no scheme are interpreted as file path URLs.
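
As a rough sketch, a file-path URL might be constructed as follows. This assumes the tlc.Url constructor accepts a plain path or URL string; the path used here is purely illustrative:

import tlc

# Illustrative path only; assumes tlc.Url can be constructed from a plain string
plain_path_url = tlc.Url("/data/images/cat.jpg")             # no scheme: interpreted as a file path
explicit_file_url = tlc.Url("file:///data/images/cat.jpg")   # same location, with an explicit file:// scheme

print(plain_path_url.exists())  # True if the file exists on the local file system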

Credentials

Reading from and writing to the file system requires that the user or process running 3LC has the necessary permissions for the relevant files and directories. Make sure it has appropriate read and/or write access to the paths you intend to use (including any referenced bulk data), or file operations may fail.

Amazon S3 s3://

Amazon S3 URLs refer to data stored in S3 buckets.

Credentials

The tlc package generally follows the standard boto3 credential resolution order when accessing data stored on S3. In particular, this means that AWS environment variables take precedence, followed by the shared credentials file (~/.aws/credentials), then the AWS config file (~/.aws/config), and finally the instance metadata service when running on an Amazon EC2 instance with an IAM role configured.
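
For example, a minimal way to satisfy the first step of this order is to set the standard AWS environment variables before running any process that uses tlc. The values below are placeholders; the shared credentials file, the config file, or an EC2 instance role work just as well:

import os

# Placeholder credentials for illustration only
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # region of the bucket being accessed
# Processes that use tlc and access s3:// URLs will pick these up through boto3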

Google Cloud Storage gcs://

Google Cloud Storage (GCS) URLs refer to data stored in GCS buckets.

Credentials

The tlc package generally follows Google’s Application Default Credentials (ADC) resolution order when accessing data stored on GCS. In particular, this means that the GOOGLE_APPLICATION_CREDENTIALS environment variable takes precedence, followed by the gcloud application default credentials, and finally the instance metadata service when running on a Google Compute Engine (GCE) instance with an attached service account.
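
For example, a service account key can be supplied through the GOOGLE_APPLICATION_CREDENTIALS variable; the path below is a placeholder:

import os

# Placeholder path to a service account key file (JSON)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
# Processes that use tlc and access GCS URLs will pick this up through the ADC lookup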

Installation

GCS support is not included by default; enable it by installing the 3lc[gcs] extra (for example, pip install "3lc[gcs]").

Azure Blob Storage abfs://

Azure Blob Storage URLs refer to data stored in Azure Blob Storage containers.

Credentials

The tlc package supports access to Azure Blob Storage using AZURE_STORAGE_* environment variables. Common combinations are listed below, followed by a configuration sketch:

  • AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY

  • AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_SAS_TOKEN

  • AZURE_STORAGE_CONNECTION_STRING
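
As a sketch, the first combination above could be configured as follows; the values are placeholders, and a SAS token or connection string can be used instead:

import os

# Placeholder values; use whichever AZURE_STORAGE_* combination fits your setup
os.environ["AZURE_STORAGE_ACCOUNT_NAME"] = "mystorageaccount"
os.environ["AZURE_STORAGE_ACCOUNT_KEY"] = "<account-key>"
# Processes that use tlc and access abfs:// URLs will pick these up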

Installation

Azure Blob Storage support is not included by default; enable it by installing the 3lc[abfs] extra (for example, pip install "3lc[abfs]").

Cloud credential configuration across multiple processes using tlc

It is common to run multiple processes that each use the tlc Python package independently, such as a training notebook and the 3LC Object Service. For these processes to interoperate correctly with respect to cloud storage URLs, their cloud credentials must be configured in a compatible way. For example, if credentials are supplied via environment variables, the same environment variables should typically be set for each process.

Object URLs¶

The tlc Python package provides a standard set of method arguments used for creating or retrieving objects by URL. These are summarized below:

  • table_name / run_name: The name of the object, corresponding to the last part of the URL.

  • dataset_name: The dataset name to use. Defaults to default-dataset.

  • project_name: The project name to use. Defaults to default-project.

  • root_url: The project root URL to use. Defaults to the PROJECT_ROOT_URL configuration variable.

  • if_exists: How to handle the case where the object already exists. Typical values are “overwrite”, “reuse”, “rename”, and “raise”.

  • table_url / run_url: A fully-qualified custom URL for the object, disregarding the project folder structure.

Examples¶

Create a URL to a table or run:

import tlc

# Placeholder names for illustration
table_name, dataset_name, project_name, run_name = "my-table", "my-dataset", "my-project", "my-run"

table_url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
run_url = tlc.Url.create_run_url(run_name, project_name)

Create or retrieve a table from some input data:

import tlc

# Placeholder names for illustration
table_name, dataset_name, project_name = "my-table", "my-dataset", "my-project"

data = {
    "column_1": [1, 2, 3],
    "column_2": ["a", "b", "c"],
}
table = tlc.Table.from_dict(data, table_name, dataset_name, project_name, if_exists="reuse")
# table is now a Table object with a URL of the form
# <project_root>/<project_name>/datasets/<dataset_name>/tables/<table_name>

Common URL manipulation:

# Some examples of common URL manipulation
import tlc

# Placeholder names for illustration
table_name, dataset_name, project_name = "my-table", "my-dataset", "my-project"

# Create a unique URL from an existing one
url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
unique_url = url.create_unique()  # If a file or folder exists at the URL, appends a unique suffix
assert not unique_url.exists()  # The URL is guaranteed to be unique

# Create a URL next to an existing one
url = tlc.Url.create_table_url(table_name, dataset_name, project_name)
next_to_url = url.create_sibling("new_table")  # Creates a URL next to the existing one, with the name "new_table"
# Any object created at next_to_url will be in the same folder as the original object,
# and thereby belong to the same project and dataset (if applicable).