tlc.core.url
The URL module provides functionality for working with URLs within the 3LC framework.
In 3LC, part of a URL can be replaced with an alias token, which is expanded to the full path only when the URL is used.
This is useful for sharing notebooks and data between different environments and users, where identical data might be checked out on different paths on different machines.
Module Contents
Classes
Class | Description |
---|---|
AliasPrecedence | An enumeration of the precedence levels for URL aliases. |
UrlAliasRegistry | Maintains a list of currently registered UrlAliases. |
Scheme | An enumeration of URL schemes. |
Url | A class which represents a URL. |
Data
Data | Description |
---|---|
Options | |
API
- tlc.core.url.Options = None
- class tlc.core.url.AliasPrecedence
Bases: enum.IntEnum
An enumeration of the precedence levels for URL aliases.
AliasPrecedence is used to decide which alias applies when several aliases with the same token are registered with different precedence levels. The algorithm considers aliases in ascending order of precedence value, so an alias with a lower value takes precedence over an alias with a higher value. The precedence matching algorithm simply picks the first eligible alias and discards any other aliases with the same token.
- PRIMARY = 1
The highest alias precedence.
This is the default precedence level. It is used for aliases specified in the input configuration file as well as for programmatically set aliases (unless a different precedence level is explicitly specified).
- SECONDARY = 2
A secondary alias precedence level.
This level is considered when no higher precedence alias (like PRIMARY) is available. It is typically used for aliases discovered in data and serves as a fallback mechanism that can be overridden by an alias with higher precedence if needed.
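To make the selection rule concrete, here is a minimal standalone sketch that assumes nothing about 3LC internals. `Precedence` is a local stand-in for AliasPrecedence with the same two values, and `resolve_aliases` with its list-of-tuples input is a hypothetical helper invented for illustration, not part of tlc.core.url.

```python
from enum import IntEnum


class Precedence(IntEnum):
    """Local stand-in for AliasPrecedence: a lower value wins."""
    PRIMARY = 1
    SECONDARY = 2


def resolve_aliases(registrations: list[tuple[str, str, Precedence]]) -> dict[str, str]:
    """Return one path per token, preferring the lowest precedence value."""
    resolved: dict[str, str] = {}
    # Consider registrations in ascending precedence order; sorted() is stable,
    # so registrations with equal precedence keep their input order.
    for token, path, _precedence in sorted(registrations, key=lambda r: r[2]):
        # The first eligible alias for a token wins; later ones are discarded.
        resolved.setdefault(token, path)
    return resolved


aliases = resolve_aliases([
    ("<DATA>", "s3://bucket/data", Precedence.SECONDARY),  # e.g. discovered in data
    ("<DATA>", "/mnt/local/data", Precedence.PRIMARY),     # e.g. from the config file
    ("<RUNS>", "/mnt/runs", Precedence.SECONDARY),         # no PRIMARY override
])
print(aliases)  # {'<DATA>': '/mnt/local/data', '<RUNS>': '/mnt/runs'}
```

The SECONDARY entry for `<DATA>` acts only as a fallback: it is discarded because a PRIMARY registration exists, while `<RUNS>` keeps its SECONDARY path since nothing overrides it.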
- class tlc.core.url.UrlAliasRegistry
Bases: abc.ABC
Maintains a list of currently registered UrlAliases.
This registry allows for the management of URL aliases, which are shorthand tokens that map to full URL paths. By using this registry, applications can handle URLs more easily by converting long, cumbersome paths into shorter, manageable tokens. The class provides methods for registering, unregistering, applying, and expanding aliases.
- Example:
```python
from tlc.core.url import UrlAliasRegistry

# Get the singleton registry instance
registry = UrlAliasRegistry.instance()

# Register aliases
registry.register_url_alias("<HOME>", "http://www.example.com/home")
registry.register_url_alias("<PROFILE>", "http://www.example.com/profile")

# Apply aliases to a URL string
aliased_str = registry.apply_aliases("http://www.example.com/home/page")
# aliased_str will be "<HOME>/page"

# Expand aliases in a URL string
original_str = registry.expand_aliases("<HOME>/page")
# original_str will be "http://www.example.com/home/page"
```
Initialize a new UrlAliasRegistry instance.
Initializes the internal structures for maintaining URL aliases.
- url_alias_registry_instance: tlc.core.url.UrlAliasRegistry | None = None
- static normalize_path(input: str) → str
Normalize a path string so that it conforms to the registry standard.
- Parameters:
input – The string to normalize.
- Returns:
The normalized string.
- register_url_alias(token: str, path: str, force: bool = False, precedence: tlc.core.url.AliasPrecedence = AliasPrecedence.PRIMARY) → None
Register and validate an alias for a URL.
- Parameters:
token – The token of the alias.
path – The path that the alias refers to.
force – If True, force the registration of the alias even if it is already registered.
precedence – The precedence level of the alias. Defaults to AliasPrecedence.PRIMARY.
- Raises:
ValueError – If the token or path is invalid or if the alias already exists and would be modified.
- unregister_url_alias(token: str, precedence: tlc.core.url.AliasPrecedence = AliasPrecedence.PRIMARY) → str
Unregister an alias.
- Parameters:
token – The token of the alias.
precedence – The precedence level of the alias to unregister. Defaults to AliasPrecedence.PRIMARY.
- Returns:
The path that the alias referred to.
- Raises:
KeyError – If the alias is not registered.
- apply_aliases(input: str) → str
Apply registered URL aliases to a string, e.g. C:/Tmp/my_data => <TMP_PATH>/my_data when <TMP_PATH> is registered for C:/Tmp.
Replaces the part of the string that matches a registered alias path with the corresponding alias token. A string can contain only a single alias token, and aliases may not be nested. If several aliases match, the alias with the longest path is used.
- Parameters:
input – The string to modify.
- Returns:
The modified string with aliases applied.
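The longest-path rule can be sketched as a standalone function. This is an illustration under stated assumptions, not the 3LC implementation: the hypothetical `apply_aliases` function takes a plain token-to-path dict, and it only matches an alias path on a path-segment boundary (so C:/Tmp does not match C:/Tmpfoo), which the documentation above does not state.

```python
def apply_aliases(value: str, aliases: dict[str, str]) -> str:
    """Replace the longest matching alias path prefix with its token."""
    best_token, best_path = None, ""
    for token, path in aliases.items():
        # Assumption: match only on a segment boundary, not a raw string prefix.
        is_match = value == path or value.startswith(path + "/")
        if is_match and len(path) > len(best_path):
            best_token, best_path = token, path
    if best_token is None:
        return value  # no registered path matches
    # Exactly one token is applied; aliases are never nested.
    return best_token + value[len(best_path):]


aliases = {"<TMP_PATH>": "C:/Tmp", "<MY_DATA>": "C:/Tmp/my_data"}
print(apply_aliases("C:/Tmp/my_data/train.csv", aliases))  # <MY_DATA>/train.csv
print(apply_aliases("C:/Tmp/other.csv", aliases))          # <TMP_PATH>/other.csv
```

Both aliases match the first path, but <MY_DATA> wins because its path is longer.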
- expand_aliases(input: str, allow_unexpanded: bool = True) → str
Expand the alias, if any, in a string by substituting the token with its registered path.
- Parameters:
input – The string to modify.
allow_unexpanded – If True, aliases that cannot be expanded are left in the string. If False, an exception is raised if an alias cannot be expanded.
- Returns:
The modified string with aliases expanded.
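A minimal sketch of expansion with the allow_unexpanded switch follows. The token syntax (a name in angle brackets, as in the examples on this page), the plain dict of aliases, the hypothetical `expand_aliases` function itself, and the choice of ValueError for an unexpandable alias are all assumptions; the documentation only says that an exception is raised.

```python
import re

# Assumed token syntax: a name in angle brackets, such as <SAMPLE_DATA>.
_TOKEN_RE = re.compile(r"<[A-Za-z0-9_]+>")


def expand_aliases(value: str, aliases: dict[str, str], allow_unexpanded: bool = True) -> str:
    """Substitute the (single) alias token in `value` with its registered path."""
    match = _TOKEN_RE.search(value)
    if match is None:
        return value  # no token: nothing to expand
    token = match.group(0)
    if token not in aliases:
        if allow_unexpanded:
            return value  # leave the unknown token in place
        raise ValueError(f"Alias {token} is not registered")
    # Strings hold at most one token, so a single substitution is enough.
    return value[: match.start()] + aliases[token] + value[match.end():]


aliases = {"<SAMPLE_DATA>": "/path/to/data"}
print(expand_aliases("<SAMPLE_DATA>/data.csv", aliases))  # /path/to/data/data.csv
print(expand_aliases("<OTHER>/data.csv", aliases))        # <OTHER>/data.csv
```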
- print_url_aliases(line_prefix: str = '') → None
Print all registered URL aliases.
Prints each alias and its corresponding path, with each line prefixed by line_prefix.
- Parameters:
line_prefix – A string prefix to prepend to each printed line.
- static instance() → tlc.core.url.UrlAliasRegistry
Get the singleton instance of the UrlAliasRegistry.
- Returns:
The singleton instance of the UrlAliasRegistry.
- class tlc.core.url.Scheme
Bases: enum.Enum
An enumeration of URL schemes.
This enum is used to represent the scheme of a URL. The scheme is the part of the URL before the first colon (:). To get the string representation of a scheme, use the value property of the enum member, e.g. Scheme.FILE.value == "file" and Scheme.HTTP.value == "http".
- FILE = 'file'
The file scheme.
This is used by URLs that point to files on the local file system.
- HTTP = 'http'
The HTTP scheme.
This is used by URLs that point to files on a web server.
- HTTPS = 'https'
The HTTPS scheme.
This is used by URLs that point to files on a web server when using a secure network connection.
- S3 = 's3'
The S3 scheme.
This is used by URLs that point to files on an S3 bucket.
- GS = 'gs'
The GS scheme.
This is used by URLs that point to files on a Google Cloud Storage bucket.
- ABFS = 'abfs'
The ABFS scheme.
This is used by URLs that point to files in Azure Data Lake Storage (Azure Blob Storage).
- API = 'api'
The API scheme.
This scheme is used by URLs that point to 3LC API endpoints.
- RELATIVE = 'relative'
The relative URL scheme.
A relative URL is a URL that does not have a scheme. Relative URLs can be combined with another URL in order to refer to a location relative to the other URL.
- ALIAS = 'alias'
Scheme for the URL alias.
This scheme is used by URLs that are aliases for other URLs when the scheme cannot be detected. This is useful if an alias might point to, e.g., either a file or an S3 bucket. In this case, the scheme of the URL is determined when the alias is expanded. See the documentation for the Url class for more information on alias expansion.
- class tlc.core.url.Url(value: str | pathlib.Path | tlc.core.url.Url | None = None, scheme: tlc.core.url.Scheme | None = None, normalized_path: str | None = None)
Bases: abc.ABC
A class which represents a URL.
A URL in 3LC is a combination of a scheme and a path. Many methods in 3LC accept URLs as arguments and/or return URLs. They are also used to refer to Tables and to cross reference between them. A file URL in 3LC will behave identically on both Posix and Windows systems.
Since a URL in 3LC might contain aliases, and even the scheme might not be determined until aliases are expanded, it is important to note which methods and properties expand aliases.
The path and scheme properties of the URL expand aliases; str() and to_str() do not.
- Examples:
```python
from tlc.core.url import Scheme, Url, UrlAliasRegistry

# The scheme is determined from the input string
file_url = Url("/path/to/file")  # Or Url("file:///path/to/file")
file_url.scheme == Scheme.FILE
file_url.path == "/path/to/file"
str(file_url) == "/path/to/file"  # omits the file:// scheme

s3_url = Url("s3://bucket/path/to/object")
s3_url.scheme == Scheme.S3
s3_url.path == "bucket/path/to/object"
str(s3_url) == "s3://bucket/path/to/object"  # includes the s3:// scheme

gcs_url = Url("gs://bucket/path/to/object")
gcs_url.scheme == Scheme.GS
gcs_url.path == "bucket/path/to/object"
str(gcs_url) == "gs://bucket/path/to/object"  # includes the gs:// scheme

relative_url = Url("path/to/file")
relative_url.scheme == Scheme.RELATIVE
relative_url.path == "path/to/file"
str(relative_url) == "path/to/file"  # omits the relative:// scheme

# Aliases are expanded when the URL is used
# Assume <SAMPLE_DATA> is NOT registered
alias_url = Url("<SAMPLE_DATA>/data.csv")
alias_url.scheme == Scheme.ALIAS
alias_url.path == "<SAMPLE_DATA>/data.csv"
str(alias_url) == "<SAMPLE_DATA>/data.csv"

# Register the alias
UrlAliasRegistry.instance().register_url_alias(token="<SAMPLE_DATA>", path="/path/to/data")

# It is now expanded by the path and scheme properties
alias_url.scheme == Scheme.FILE
alias_url.path == "/path/to/data/data.csv"
str(alias_url) == "<SAMPLE_DATA>/data.csv"

# Register an alternative path for the alias
UrlAliasRegistry.instance().unregister_url_alias(token="<SAMPLE_DATA>")
UrlAliasRegistry.instance().register_url_alias(token="<SAMPLE_DATA>", path="/alternate/path/to/data")
alias_url.scheme == Scheme.FILE
alias_url.path == "/alternate/path/to/data/data.csv"
UrlAliasRegistry.instance().unregister_url_alias(token="<SAMPLE_DATA>")
```
- Terminology:
A normalized URL has a scheme, uses single forward slashes as path separator, and does not end with a slash.
An expanded URL has aliases expanded and is normalized.
An absolute URL is an expanded URL, which means that it can be used as a stable, persisted reference.
Relative URLs are converted to absolute URLs based on an “owner” URL or, if applicable, the current working directory of the process.
Relative and API URLs have “relative://” or “api://” as their scheme, but these schemes are omitted from the stringified representation.
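The separator and trailing-slash rules from the terminology list can be sketched on the path part of a URL, after the scheme has been split off (collapsing slashes on the full string would break s3://). This hypothetical `normalize_path` helper covers only those two rules and keeps bare roots such as / and C:/ intact; the actual Url normalization may do more.

```python
import re


def normalize_path(path: str) -> str:
    """Apply the separator and trailing-slash rules of a normalized URL."""
    # Forward slashes are the only separator, so convert Windows backslashes.
    path = path.replace("\\", "/")
    # Collapse runs of slashes into a single slash.
    path = re.sub(r"/{2,}", "/", path)
    # Drop a trailing slash, but keep a bare root such as "/" or "C:/".
    if path.endswith("/") and path != "/" and not re.fullmatch(r"[A-Za-z]:/", path):
        path = path[:-1]
    return path


print(normalize_path("C:\\Users\\me\\data\\"))  # C:/Users/me/data
print(normalize_path("bucket//path///to/"))     # bucket/path/to
```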
- Caveats:
The Url class does not make any network calls or access the file system. It therefore cannot resolve symlinks, and using symlinks in combination with 3LC is discouraged.
There are a few exotic Windows paths that are not supported:
A Windows drive letter without a slash, e.g. C:foo/bar, is not supported. Use C:/foo/bar instead.
- Parameters:
value – The URL as a string, Path, or Url object. When this argument is passed as a string, it will be normalized and the scheme is deduced from the string contents.
scheme – The scheme of the URL, if known.
normalized_path – The normalized path of the URL, if known. If both scheme and normalized_path are passed, they will be used directly without any normalization or parsing. It is the responsibility of the caller to ensure that the scheme and normalized_path are valid.
- Raises:
ValueError – If the URL is specified with both value and scheme/normalized_path.
- property scheme: tlc.core.url.Scheme
Return the scheme of the expanded URL.
Accessing this property expands aliases in the URL. If the alias cannot be expanded, Scheme.ALIAS is returned.
To access the scheme of the URL without expanding aliases, use the _scheme member variable.
- Returns:
The scheme of the URL.
- Raises:
ValueError – If the url scheme cannot be determined.
- property path: str
Return the path of the expanded URL.
Accessing this property expands aliases in the URL.
The path is returned without a scheme, so e.g. an S3 URL returns the path without the protocol:
```python
Url("s3://bucket/table.json").path == "/bucket/table.json"
Url("relative://foo/bar").path == "foo/bar"
```
- static absolute_from_relative(url: tlc.core.url.Url, owner: tlc.core.url.Url | str | None = None) → tlc.core.url.Url
Convert a relative URL to an absolute URL, given an owner URL.
- Parameters:
url – The relative URL to convert.
owner – The owner URL, if necessary for conversion.
- static relative_from(url: tlc.core.url.Url, owner: tlc.core.url.Url | None) → tlc.core.url.Url
Transform a URL into relative form, taking a given owner URL into account.
Create a URL relative to the given owner URL that is equivalent to the absolute URL. The owner URL can be a parent directory of the absolute URL, but it may also be a directory or file that shares part of the absolute URL’s path. If the absolute URL and owner URL are not compatible, the function raises a ValueError.
If the transformation is not possible, for example if the URL and the owner have different schemes, the function returns the original URL.
- Example:
```python
from tlc.core.url import Url

# Owner URL is a directory
absolute_url = Url("s3://bucket/path/to/file.ext")
owner_url = Url("s3://bucket/path")
relative_url = Url.relative_from(absolute_url, owner_url)
assert str(relative_url) == "to/file.ext"

# Owner URL is a file
absolute_url = Url("s3://bucket/path/to/file2.ext")
owner_url = Url("s3://bucket/path/to/file1.ext")
relative_url = Url.relative_from(absolute_url, owner_url)
assert str(relative_url) == "../file2.ext"
```
- Raises:
ValueError – If the absolute URL and owner URL are not compatible.
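Both directions of the owner-relative conversion can be sketched with Python's posixpath module. Using the owner path as the base as-is, whether it names a file or a directory, reproduces the results of the relative_from example above. The helper names and the string-based interface are hypothetical, and the sketch only covers the documented scheme-mismatch case (returning the URL unchanged), not the ValueError case.

```python
import posixpath


def _split(url: str) -> tuple[str, str]:
    """Split "scheme://path" into (scheme, path); scheme is "" if absent."""
    scheme, sep, path = url.partition("://")
    return (scheme, path) if sep else ("", url)


def relative_from(url: str, owner: str) -> str:
    """Express `url` relative to `owner`, or return `url` unchanged."""
    scheme, path = _split(url)
    owner_scheme, owner_path = _split(owner)
    if scheme != owner_scheme:
        return url  # different schemes: no relative form exists
    # The owner path is the base as-is, so a sibling file yields "../file2.ext".
    return posixpath.relpath(path, owner_path)


def absolute_from_relative(relative: str, owner: str) -> str:
    """Inverse operation: resolve `relative` against `owner`."""
    owner_scheme, owner_path = _split(owner)
    joined = posixpath.normpath(posixpath.join(owner_path, relative))
    return f"{owner_scheme}://{joined}" if owner_scheme else joined


rel = relative_from("s3://bucket/path/to/file2.ext", "s3://bucket/path/to/file1.ext")
print(rel)                                                          # ../file2.ext
print(absolute_from_relative(rel, "s3://bucket/path/to/file1.ext"))  # s3://bucket/path/to/file2.ext
```

Treating a file owner like a directory is what makes the round trip work: joining "../file2.ext" onto ".../file1.ext" and normalizing lands back on the sibling.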
- expand_aliases(allow_unexpanded: bool = True) → tlc.core.url.Url
Expand aliases in the URL.
- Parameters:
allow_unexpanded – If True, aliases that cannot be expanded are left in the URL. If False, an exception is raised if an alias cannot be expanded.
- Returns:
A URL whose scheme and path have aliases expanded.
- apply_aliases() → tlc.core.url.Url
Apply all registered aliases to this URL.
- Returns:
The URL with aliases applied.
- is_absolute() → bool
Check if the normalized, unexpanded URL is absolute.
Note that this method does not expand aliases.
- Returns:
True if the URL is absolute, False otherwise.
- to_relative(owner: tlc.core.url.Url | str | None = None) → tlc.core.url.Url
Relativize a URL, including applying aliases.
- Parameters:
owner – The owner URL, if necessary for conversion.
- Returns:
A relative URL if possible, otherwise the original URL.
- Raises:
NotImplementedError – If the conversion is not supported.
- to_relative_with_max_depth(owner: tlc.core.url.Url, max_depth: int) → tlc.core.url.Url
Relativize the given URL with respect to the given owner URL, up to a maximum depth.
If url does not have a common prefix with owner up to max_depth, url is returned with only aliases applied.
- Parameters:
url – The URL to relativize.
owner – The URL to relativize with respect to.
max_depth – The maximum depth to relativize up to.
- Returns:
The relativized URL.
- to_absolute(owner: tlc.core.url.Url | str | None = None) → tlc.core.url.Url
Convert a relative URL to an absolute URL.
- Parameters:
owner – The owner URL, if necessary for conversion.
- Returns:
An absolute URL.
- Raises:
NotImplementedError – If the conversion is not supported.
- escape() → str
Double-escape the URL string to handle paths in service endpoints.
Some services require double-escaping to process URLs correctly due to internal un-escaping passes.
- Returns:
A double-escaped URL string.
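One way to produce a double-escaped string is to percent-encode twice with urllib.parse.quote, so that a service which un-escapes once still receives an escaped value. Whether Url.escape uses exactly these rules (for example safe="", which also encodes / and :) is an assumption, and `double_escape` is a hypothetical name.

```python
from urllib.parse import quote, unquote


def double_escape(url: str) -> str:
    """Percent-encode `url` twice so one un-escaping pass still leaves it escaped."""
    # safe="" also encodes "/" and ":", so the URL can travel as one path segment.
    return quote(quote(url, safe=""), safe="")


escaped = double_escape("s3://bucket/my data.json")
print(escaped)           # s3%253A%252F%252Fbucket%252Fmy%2520data.json
print(unquote(escaped))  # s3%3A%2F%2Fbucket%2Fmy%20data.json
```

After one un-escaping pass (as a service might perform internally), the string is still singly escaped and safe to use as a path segment.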
- replace(old: str, new: str) → tlc.core.url.Url
Replace occurrences of a substring in the URL with a new substring.
The intended use case for this method is, e.g., replacing a file extension in a URL.
This method textually replaces occurrences of the old substring with the new substring in the path of the URL. Note that the replacement happens on the normalized path, which is not necessarily identical to the path passed to the Url constructor when it was first created.
Changing the scheme of the URL is not supported; however, it is possible to replace an alias. If the alias contains the scheme (e.g. url.scheme == ALIAS), the scheme can be changed.
Note that this method does not expand aliases.
- Parameters:
old – The substring to be replaced.
new – The new substring to replace the old substring.
- Returns:
A new URL with the specified substring replaced.
- join(other: tlc.core.url.Url) → tlc.core.url.Url
Join two URLs.
The other URL must be a relative URL.
- Parameters:
other – The URL to join with the current URL. Required to be relative.
- Returns:
A new URL, which is the result of joining the current and other URLs.
- Raises:
ValueError – If the other URL is not relative.
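The relative-only rule for join can be sketched on plain strings. The hypothetical `is_relative` check treats a URL as relative when it has no scheme, no leading slash and no Windows drive letter; it works on the stringified form, where the relative:// scheme is omitted, and it is an assumption rather than the documented check.

```python
import posixpath
import re


def is_relative(url: str) -> bool:
    """Relative here means: no scheme, no leading slash, no drive letter."""
    return "://" not in url and not url.startswith("/") and not re.match(r"[A-Za-z]:/", url)


def join(base: str, other: str) -> str:
    """Append the relative URL `other` to `base`."""
    if not is_relative(other):
        raise ValueError(f"Cannot join a non-relative URL: {other!r}")
    return posixpath.join(base, other)


print(join("s3://bucket/path", "to/file.ext"))  # s3://bucket/path/to/file.ext
```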
- create_unique(require_writable: bool = False) → tlc.core.url.Url
Create a unique and possibly writable version of the Url.
This method will create a unique URL by appending a unique identifier to the URL, if necessary. If the resulting URL is not writable, it will try to create a fallback URL in the PROJECT_ROOT_URL location.
The fallback mechanism is currently implemented for:
- Table-URLs in the form of /datasets/ /tables/
- Returns:
A unique Url (which is writable if so requested).
- create_sibling(name: str) → tlc.core.url.Url
Create a new Url next to the current Url.
- Example:
```python
Url("C:/path/to/file.json").create_sibling("umap.json") == Url("C:/path/to/umap.json")
Url("C:/path/to/dir").create_sibling("other") == Url("C:/path/to/other")
```
- Parameters:
name – The name of the new Url.
- Returns:
A new Url next to the current Url.
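Assuming a normalized path (no trailing slash), creating a sibling amounts to swapping the last path segment, which reproduces both documented examples. `create_sibling` here is a hypothetical string-based stand-in, not the Url method.

```python
import posixpath


def create_sibling(path: str, name: str) -> str:
    """Replace the last segment of a normalized path with `name`."""
    return posixpath.join(posixpath.dirname(path), name)


print(create_sibling("C:/path/to/file.json", "umap.json"))  # C:/path/to/umap.json
print(create_sibling("C:/path/to/dir", "other"))            # C:/path/to/other
```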
- to_str() → str
Convert the URL to a normalized string.
This returns the normalized, un-expanded URL as a string.
- Returns:
The URL as a string.
- static get_path_type(path: str) → str
Determine if a path, without scheme, is a Windows or Posix path.
- static normalize_chars(url: str) → str
Normalize characters in a URL.
- Parameters:
url – The URL to normalize.
- Returns:
The normalized URL.
- static get_normalized(value: str) → tuple[tlc.core.url.Scheme, str]
Get the normalized value of the string representation of a URL.
- static split_url(value: str) → tuple[str, str]
Split a URL into a scheme and a path.
Unlike urlparse, this function does not require a scheme to be present in the URL. It also does not parse the drive letter (e.g. C:/) of a Windows path as a scheme.
- static join_url(scheme: tlc.core.url.Scheme | None, path: str) → str
Join a scheme and a path into a URL.
- Parameters:
scheme – The scheme.
path – The path.
- Returns:
The URL with scheme applied.
- static get_scheme(value: str) → tlc.core.url.Scheme
Get the scheme of the string representation of a URL.
- Parameters:
value – The URL as a string.
- Raises:
ValueError – If the URL scheme is not supported.
- Returns:
The scheme of the URL.
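The splitting and scheme-detection helpers above can be approximated with a regular expression. In this sketch a scheme must be at least two characters long, which is how it avoids reading a drive letter such as C: as a scheme. The rules for alias, file and relative URLs are inferred from the examples on this page and are assumptions, and the hypothetical functions return the lowercase strings that match the documented Scheme values instead of the enum itself.

```python
import re

# Assumption: a scheme has at least two characters, so "C:" never matches.
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]+)://(.*)$")
_DRIVE_RE = re.compile(r"^[A-Za-z]:/")
_KNOWN = {"file", "http", "https", "s3", "gs", "abfs", "api", "relative", "alias"}


def split_url(value: str) -> tuple[str, str]:
    """Split into (scheme, path); scheme is "" when none is present."""
    match = _SCHEME_RE.match(value)
    if match is None:
        return "", value
    return match.group(1).lower(), match.group(2)


def get_scheme(value: str) -> str:
    """Deduce a scheme name from a URL string (hypothetical rules)."""
    scheme, path = split_url(value)
    if scheme:
        if scheme not in _KNOWN:
            raise ValueError(f"Unsupported URL scheme: {scheme}")
        return scheme
    if path.startswith("<"):
        return "alias"  # unexpanded alias token, e.g. <SAMPLE_DATA>/data.csv
    if path.startswith("/") or _DRIVE_RE.match(path):
        return "file"  # POSIX absolute path or Windows drive path
    return "relative"


print(split_url("C:/foo/bar"))          # ('', 'C:/foo/bar')
print(get_scheme("s3://bucket/object"))  # s3
```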
- open(mode: str) → io.BufferedReader | io.TextIOWrapper
Open the URL as a file.
- Parameters:
mode – The file mode to use when opening the URL.
- Returns:
A file-like object.
- Raises:
TypeError – If the URL cannot be opened as a file.
- read(mode: str = 'b') → str | bytes
Read the contents of the URL.
- Parameters:
mode – The mode to use when reading.
- Returns:
The contents of the URL, as bytes or str depending on mode.
- write(content: str | bytes, mode: str = 'b', if_exists: typing.Literal['overwrite', 'rename', 'raise'] = 'overwrite') → None
Write data to a URL.
- Parameters:
content – The content to write.
mode – The mode to use when writing.
if_exists – What to do if the URL already exists: “overwrite”, “rename”, or “raise”.
- exists() → bool
Check if the URL exists.
- Returns:
True if the URL exists, False otherwise.
- Raises:
Exception – If the URL cannot be accessed.
- make_parents(exist_ok: bool = False) → None
Make all parent directories of the URL.
- Parameters:
exist_ok – If True, do not raise an exception if the directory already exists.
- Raises:
Exception – If the URL cannot be accessed.
- property parent: tlc.core.url.Url
Get the parent URL of the URL.
- Returns:
The parent URL.
- property name: str
Get the name of the URL.
- Example:
```python
Url("C:/folder/file.txt").name == "file.txt"
Url("C:/folder").name == "folder"
```
- Returns:
The name of the URL.
- property stem: str
Get the stem of the URL.
- Example:
Url("example.json").stem == "example"
- Returns:
The stem of the URL.
- property extension: str
Get the extension of the URL.
- Example:
Url("example.json").extension == ".json"
- Returns:
The extension of the URL.
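For a normalized path, the name, stem and extension properties line up with the corresponding attributes of pathlib.PurePosixPath in the documented examples. The helper names below are hypothetical. How Url.extension treats multi-part suffixes such as .tar.gz is not documented here, whereas PurePosixPath.suffix returns only the last one.

```python
from pathlib import PurePosixPath


def url_name(path: str) -> str:
    """Last path segment, e.g. "file.txt"."""
    return PurePosixPath(path).name


def url_stem(path: str) -> str:
    """Last path segment without its final suffix."""
    return PurePosixPath(path).stem


def url_extension(path: str) -> str:
    """Final suffix including the dot, or "" if there is none."""
    return PurePosixPath(path).suffix


print(url_name("C:/folder/file.txt"), url_name("C:/folder"))  # file.txt folder
print(url_stem("example.json"), url_extension("example.json"))  # example .json
```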
- static api_url_for_object(obj: object) → tlc.core.url.Url
Get the API URL for an object.
This is the default URL for an object when a persistent URL is not specified. API URLs allow objects to be addressable as long as they are in memory.
- Parameters:
obj – The object to get the API URL for.
- classmethod create_table_url(table_name: str | None = None, dataset_name: str | None = None, project_name: str | None = None, root: str | tlc.core.url.Url | None = None) → tlc.core.url.Url
Create a URL for a Table conforming to the 3LC project folder layout.
- Parameters:
table_name – The table name to use. If not provided, the default table name will be used.
dataset_name – The dataset name to use. If not provided, the default dataset name will be used.
project_name – The project name to use. If not provided, the current active project will be used.
root – The root URL to use. If not provided, the project root URL will be used.
- Returns:
A Url for a table with the specified names.
- classmethod create_run_url(run_name: str | None = None, project_name: str | None = None, root: str | tlc.core.url.Url | None = None) → tlc.core.url.Url
Create a URL for a run conforming to the 3LC project folder layout.
- Parameters:
run_name – The name of the run. If not provided, the default run name will be used.
project_name – The name of the project. If not provided, the active project will be used.
root – The root URL of the project. If not provided, the project root URL will be used.
- Returns:
A URL for a run with the specified names.
- to_minimal_dict(_: bool = False) → str
Convert the URL to a minimal, serializable representation.
- Returns:
The URL as a str.
- is_descendant_of(other: tlc.core.url.Url) → bool
Check if the URL is a descendant of another URL.
- Parameters:
other – The URL to check if the current URL is a descendant of.
- Returns:
True if the URL is a descendant of the other URL, False otherwise.
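A descendant check on normalized, expanded URL strings can be sketched as a prefix test on a path-segment boundary, so that s3://bucket/ab is not considered to be inside s3://bucket/a. Because the full strings include the scheme, URLs with different schemes never match. Whether a URL counts as its own descendant is not documented; this hypothetical sketch assumes it does not.

```python
def is_descendant_of(url: str, other: str) -> bool:
    """True if `url` lies strictly below `other` (same scheme and path prefix)."""
    if url == other:
        return False  # assumption: a URL is not its own descendant
    # Require a "/" right after the prefix so "s3://bucket/ab" is not below "s3://bucket/a".
    return url.startswith(other.rstrip("/") + "/")


print(is_descendant_of("s3://bucket/a/b.json", "s3://bucket/a"))   # True
print(is_descendant_of("s3://bucket/ab/c.json", "s3://bucket/a"))  # False
```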
- is_dataset_table_url() → bool
Check if the URL is a standard dataset table URL.
- Returns:
True if the URL is a canonical table URL, False otherwise.