URL Details#
3LC objects are identified by URLs, which are represented by the Url class in the tlc Python API. An object’s URL is generally the location that it was read from and/or may be written to.
Supported URL Schemes#
Supported URLs include file paths and cloud storage locations on Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage.
File Paths#
URLs representing file paths may refer to data on a local disk, a mapped network drive, etc. They may start with the file:// scheme, but URLs with no scheme are also interpreted as file path URLs.
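As a rough illustration of the scheme rules on this page, the following sketch classifies a URL string by its scheme using only the Python standard library. The helper `storage_kind` is hypothetical and is not part of the tlc API. Treating a single-letter scheme as a Windows drive letter is an assumption made for this example, not a documented 3LC rule.

```python
from urllib.parse import urlparse


def storage_kind(url: str) -> str:
    """Hypothetical helper (not part of tlc): guess the storage kind from the scheme."""
    scheme = urlparse(url).scheme.lower()
    # No scheme means a plain file path. A single-letter "scheme" is assumed
    # to be a Windows drive letter such as "C:", so it is also a file path.
    if scheme in ("", "file") or len(scheme) == 1:
        return "file"
    # Schemes named in this document; anything else is reported as unknown.
    return {"s3": "s3", "gs": "gcs", "abfs": "azure"}.get(scheme, "unknown")


if __name__ == "__main__":
    for example in ("data/run.json", "file:///tmp/run.json", "s3://my-bucket/run.json"):
        print(example, "->", storage_kind(example))
```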
Amazon S3#
Amazon S3 URLs refer to data stored in S3 buckets, and they begin with the s3:// scheme.
The tlc package generally uses the boto3 credentials order when accessing data stored on S3. In particular, this means that AWS environment variables take precedence, then the shared credential file (~/.aws/credentials), then the AWS config file (~/.aws/config), then the instance metadata service if running on an Amazon EC2 instance that has an IAM role configured.
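To check which of these sources is likely to apply on a given machine, a small diagnostic along the following lines may help. It is a simplified sketch, not how boto3 or tlc resolves credentials: the function name is hypothetical, it only checks that the files exist (not that they contain credentials), and it ignores other boto3 mechanisms such as alternate file locations or assumed roles.

```python
import os
from pathlib import Path


def likely_aws_credential_source(env=None, home=None) -> str:
    """Hypothetical diagnostic: report the first source from the order above that is present."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # 1. AWS environment variables take precedence.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # 2. Shared credential file.
    if (home / ".aws" / "credentials").is_file():
        return "shared credential file"
    # 3. AWS config file.
    if (home / ".aws" / "config").is_file():
        return "AWS config file"
    # 4. Nothing found locally; only the instance metadata service remains.
    return "instance metadata service (EC2 with an IAM role only)"


if __name__ == "__main__":
    print(likely_aws_credential_source())
```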
Google Cloud Storage (GCS)#
Note
GCS support is not enabled by default but may be enabled by installing the tlc[gcs] extra.
Google Cloud Storage (GCS) URLs refer to data stored in GCS buckets, and they begin with the gs:// scheme.
The tlc package generally uses Google’s application default credentials order when accessing data stored on GCS. In particular, this means that the GOOGLE_APPLICATION_CREDENTIALS environment variable takes precedence, then the gcloud application default credentials, then the instance metadata service if running on a Google Compute Engine (GCE) instance with an attached service account.
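The same kind of check can be sketched for GCS. The function below is hypothetical and only approximates the order described above. It assumes the gcloud application default credentials file lives at ~/.config/gcloud/application_default_credentials.json, which is the usual location on Linux and macOS; Windows uses a different directory.

```python
import os
from pathlib import Path


def likely_gcs_credential_source(env=None, home=None) -> str:
    """Hypothetical diagnostic: report the first GCS credential source that is present."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # 1. The environment variable takes precedence.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "GOOGLE_APPLICATION_CREDENTIALS"
    # 2. Credentials created by gcloud (Linux/macOS location assumed).
    adc_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    if adc_file.is_file():
        return "gcloud application default credentials"
    # 3. Nothing found locally; only the instance metadata service remains.
    return "instance metadata service (GCE with a service account only)"


if __name__ == "__main__":
    print(likely_gcs_credential_source())
```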
Azure Blob Storage#
Note
Azure Blob Storage support is not enabled by default but may be enabled by installing the 3lc[abfs] extra.
Azure Blob Storage URLs refer to data stored in Azure Blob containers, and they begin with the abfs:// scheme.
The tlc package supports access to Azure Blob Storage using AZURE_STORAGE_* environment variables. Common variations include:
- AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY
- AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_SAS_TOKEN
- AZURE_STORAGE_CONNECTION_STRING
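A quick way to see which of these variations is fully set in the current process is sketched below. The helper name and the labels it returns are hypothetical. It reports every complete variation rather than picking one, because this page does not state a precedence between them.

```python
import os

# The variations listed above, keyed by a descriptive label chosen for this example.
AZURE_VARIATIONS = {
    "account name and key": ("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY"),
    "account name and SAS token": ("AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_SAS_TOKEN"),
    "connection string": ("AZURE_STORAGE_CONNECTION_STRING",),
}


def configured_azure_variations(env=None) -> list:
    """Hypothetical helper: list the variations whose variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        label
        for label, names in AZURE_VARIATIONS.items()
        if all(env.get(name) for name in names)
    ]


if __name__ == "__main__":
    print(configured_azure_variations() or "no complete Azure variation found")
```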
Cloud credential configuration across multiple processes#
It is common with 3LC to run multiple processes that each use the tlc Python package independently, such as a training notebook and the 3LC Object Service. For those components in different processes to interoperate correctly with respect to cloud storage URLs, it is important to configure their cloud credentials in a compatible way. For example, if cloud credentials are configured via environment variables, it is likely that the same environment variables should be set for each process.
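One way to keep environment-based credentials consistent is to define them once and pass the same values to every process that is launched. The sketch below shows the idea with the standard library only. The helper `launch_with_credentials` is hypothetical, the child command is a stand-in that merely prints one variable, and the credential value is a placeholder; none of this is a 3LC command or API.

```python
import os
import subprocess
import sys


def launch_with_credentials(command, credentials):
    """Hypothetical helper: run a command with shared credential variables added to its environment."""
    # Start from the current environment and overlay the shared settings,
    # so every process launched this way sees the same credential variables.
    child_env = {**os.environ, **credentials}
    return subprocess.run(
        command, env=child_env, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    shared = {"AZURE_STORAGE_ACCOUNT_NAME": "demo-account"}  # placeholder value
    # Stand-in child process that echoes the variable it received.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['AZURE_STORAGE_ACCOUNT_NAME'])",
    ]
    print(launch_with_credentials(child, shared).stdout.strip())
```

Processes started separately, for example from different terminals, do not share an environment automatically, so the same variables would need to be set in each shell or service definition.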