URL Details#

3LC objects are identified by URLs, which are represented by the Url class in the tlc Python API. An object’s URL is generally the location that it was read from and/or may be written to.

Supported URL Schemes#

Supported URLs include file paths and cloud storage locations on Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage.

File Paths#

URLs representing file paths may refer to data on a local disk, a mapped network drive, etc. They may start with the file:// scheme, but URLs with no scheme are also interpreted as file path URLs.
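
As a rough illustration of the rule above, the sketch below uses only the Python standard library to decide whether a string would count as a file path URL. The helper name `is_file_path_url` is hypothetical and is not part of the tlc API. Treating a one-letter scheme as a Windows drive letter is an assumption of this sketch, not documented 3LC behavior.

```python
from urllib.parse import urlsplit


def is_file_path_url(url: str) -> bool:
    """Return True if `url` has no scheme or uses the file:// scheme."""
    scheme = urlsplit(url).scheme
    # Assumption: a one-letter "scheme" is a Windows drive letter such as C:
    # and is therefore treated as a plain file path.
    return scheme in ("", "file") or len(scheme) == 1


print(is_file_path_url("/data/tables/train"))           # no scheme
print(is_file_path_url("file:///data/tables/train"))    # explicit file scheme
print(is_file_path_url("s3://my-bucket/tables/train"))  # cloud storage scheme
```

The first two calls print `True` and the last prints `False`.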

Amazon S3#

Amazon S3 URLs refer to data stored in S3 buckets, and they begin with the s3:// scheme.

The tlc package generally uses the boto3 credentials order when accessing data stored on S3. In particular, this means that AWS environment variables take precedence, then the shared credential file (~/.aws/credentials), then the AWS config file (~/.aws/config), then the instance metadata service if running on an Amazon EC2 instance that has an IAM role configured.
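
To see which of these sources is likely to be picked up on a given machine, a small standard-library check can walk the same order. This is only a sketch: the function name `first_aws_credential_source` is hypothetical, it checks only whether the files exist (not whether they contain credentials), and it ignores other sources that boto3 may consult. boto3 itself remains the authority on the actual resolution.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def first_aws_credential_source(
    env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> str:
    """Name the first credential source, in the order described above, that looks present."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    # 1. AWS environment variables take precedence.
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        return "environment variables"
    # 2. Shared credential file (~/.aws/credentials).
    if (home / ".aws" / "credentials").is_file():
        return "shared credential file"
    # 3. AWS config file (~/.aws/config).
    if (home / ".aws" / "config").is_file():
        return "AWS config file"
    # 4. Nothing local was found; only the instance metadata service remains.
    return "instance metadata service (EC2 with an IAM role only)"


print(first_aws_credential_source())
```

Running this in the same environment as a training notebook or the 3LC Object Service gives a quick hint about which source that process would probably use.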

Google Cloud Storage (GCS)#

Note

GCS support is not enabled by default. To enable it, install the 3lc[gcs] extra.

Google Cloud Storage (GCS) URLs refer to data stored in GCS buckets, and they begin with the gs:// scheme.

The tlc package generally uses Google’s application default credentials order when accessing data stored on GCS. In particular, this means that the GOOGLE_APPLICATION_CREDENTIALS environment variable takes precedence, then the gcloud application default credentials, then the instance metadata service if running on a Google Compute Engine (GCE) instance with an attached service account.
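
The order above can be approximated with a short standard-library check. This is a sketch under stated assumptions: the function name `first_gcs_credential_source` is hypothetical, and the default file location used here is the one the gcloud CLI writes on Linux and macOS. The Google authentication libraries perform the real resolution.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def first_gcs_credential_source(
    env: Optional[Mapping[str, str]] = None, adc_file: Optional[Path] = None
) -> str:
    """Name the first credential source, in the order described above, that looks present."""
    env = os.environ if env is None else env
    if adc_file is None:
        # Assumed default location on Linux and macOS; Windows keeps this
        # file under the gcloud folder in %APPDATA% instead.
        adc_file = (
            Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
        )
    # 1. The GOOGLE_APPLICATION_CREDENTIALS variable takes precedence.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "GOOGLE_APPLICATION_CREDENTIALS"
    # 2. Credentials stored by the gcloud CLI.
    if Path(adc_file).is_file():
        return "gcloud application default credentials"
    # 3. Nothing local was found; only the instance metadata service remains.
    return "instance metadata service (GCE with a service account only)"


print(first_gcs_credential_source())
```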

Azure Blob Storage#

Note

Azure Blob Storage support is not enabled by default. To enable it, install the 3lc[abfs] extra.

Azure Blob storage URLs refer to data stored in Azure Blob containers, and they begin with the abfs:// scheme.

The tlc package supports access to Azure Blob storage using environment variables whose names begin with AZURE_STORAGE_. Common variations include:

  • AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY

  • AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_SAS_TOKEN

  • AZURE_STORAGE_CONNECTION_STRING
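
The variations listed above can be checked with a few lines of standard-library Python. The sketch below reports every variation whose variables are all set and non-empty. The names `AZURE_VARIATIONS` and `configured_azure_variations` are hypothetical, and no precedence between variations is implied, since none is stated here.

```python
import os
from typing import List, Mapping, Optional

# The three variable combinations listed above, keyed by a descriptive label.
AZURE_VARIATIONS = {
    "account key": ("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY"),
    "SAS token": ("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_SAS_TOKEN"),
    "connection string": ("AZURE_STORAGE_CONNECTION_STRING",),
}


def configured_azure_variations(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the labels of all variations whose variables are set and non-empty."""
    env = os.environ if env is None else env
    return [
        label
        for label, names in AZURE_VARIATIONS.items()
        if all(env.get(name) for name in names)
    ]


print(configured_azure_variations())
```

An empty list means that none of the listed combinations is fully set in the current environment.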

Cloud credential configuration across multiple processes

It is common with 3LC to run multiple processes that each use the tlc Python package independently, such as a training notebook and the 3LC Object Service. For those processes to interoperate correctly on cloud storage URLs, their cloud credentials must be configured in a compatible way. For example, if cloud credentials are configured via environment variables, the same environment variables should most likely be set for each process.
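
One way to keep environment-based credentials consistent is to start the second process with an explicit copy of the relevant variables. The following is a minimal sketch using only the standard library. The helper names and the prefix list are assumptions made for illustration, and the child command is a stand-in that prints one variable rather than a real 3LC process.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Sequence

# Assumed prefixes covering the credential variables mentioned on this page.
CREDENTIAL_PREFIXES = ("AWS_", "GOOGLE_APPLICATION_CREDENTIALS", "AZURE_STORAGE_")


def credential_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the variables that look like cloud credential settings."""
    return {k: v for k, v in env.items() if k.startswith(CREDENTIAL_PREFIXES)}


def run_with_credentials(
    command: Sequence[str], credentials: Mapping[str, str]
) -> subprocess.CompletedProcess:
    """Run `command` with the current environment plus the given credential variables."""
    child_env = {**os.environ, **credential_env(credentials)}
    return subprocess.run(
        command, env=child_env, check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    shared = {"AZURE_STORAGE_ACCOUNT_NAME": "example-account", "UNRELATED": "ignored"}
    # Stand-in child process that reports the variable it received.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['AZURE_STORAGE_ACCOUNT_NAME'])",
    ]
    print(run_with_credentials(child, shared).stdout.strip())
```

Child processes inherit the parent's environment by default, so the explicit copy matters mainly when the two processes are started from different shells or by a service manager.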