Objects, URLs, and Schemas#
This page gives a brief introduction to the concepts of Objects, URLs, and Schemas in the context of the tlc Python package.
Objects#
Object is the base class of all objects in the tlc Python package. An Object is a simple container for keys and values. All objects are required to have the attributes type, schema, and created.
In practice, you will never need to create an Object directly. Instead, you will create instances of the two main subclasses: Table and Run. However, understanding the common functionality provided by Object makes it easier to work with the tlc package.
The main functionality provided by Object is the ability to serialize an object to JSON, and to construct objects from JSON. The JSON representation of an object is guaranteed to be sufficient to recreate the object, thereby providing a simple way to store and retrieve objects from persistent storage. We can think of the JSON representation of an object as a “recipe” for creating the object.
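The "recipe" idea can be illustrated without tlc itself. The following is a minimal standard-library sketch, not the tlc implementation: `MiniObject` and its `to_json()`/`from_json()` methods are hypothetical names, and only the three required attributes named above are modelled.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class MiniObject:
    """Hypothetical stand-in for a serializable object (not tlc.Object)."""

    type: str = "MiniObject"
    schema: dict = field(default_factory=dict)
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # The JSON string holds everything needed to rebuild the object.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MiniObject":
        # Recreate the object from its JSON "recipe".
        return cls(**json.loads(text))


original = MiniObject(schema={"values": {}})
recipe = original.to_json()
restored = MiniObject.from_json(recipe)
```

Because every attribute is part of the JSON text, `restored` compares equal to `original`; that property is what makes the JSON form usable as a storage format.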
URLs#
Instances of type Object can be serialized to JSON and stored in various locations. The specific location of an object is defined by a Url, which could represent a local file path, or a path to some remote object storage.
Let’s proceed to create an Object and save it to a local file path.
import tlc

# Create a relative filepath Url
url = tlc.Url("./my_object")
# Create an object with the given Url and write it to that Url
my_object = tlc.Object(url=url)
my_object.write_to_url()
Serialized objects can be read back into memory using Object.from_url().
my_object_restored = tlc.Object.from_url(url)
Inspired by the pathlib module, Url provides a simple and intuitive way to work with URLs:
# Create a relative filepath Url:
url = tlc.Url("./my_object")
# Create an absolute filepath Url:
url = tlc.Url("/path/to/my_object")
# Create an S3 Url:
url = tlc.Url("s3://bucket/my_object")
To create a Url for an object located within the 3LC project structure, the methods Url.create_table_url() and Url.create_run_url() can be used.
my_table_url = tlc.Url.create_table_url("my-table", "my-dataset", "my-project")
my_run_url = tlc.Url.create_run_url("my-run", "my-project")
The Url class provides a number of useful methods for working with URLs, such as exists(), join(), to_absolute(), etc. See the Url API documentation for more details.
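Since Url is described as inspired by pathlib, the pathlib operations that loosely correspond to exists(), join(), and to_absolute() give a feel for the style. This sketch uses only `pathlib` from the standard library; it does not call tlc, and the exact signatures and behavior of the Url methods are not shown here and may differ.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Joining path components (pathlib spells this with "/" or joinpath()).
    target = base.joinpath("my_object")
    existed_before = target.exists()  # nothing has been written yet
    target.write_text("{}")
    existed_after = target.exists()   # the file is now present

# Turning a relative path into an absolute one, based on the working directory.
relative = Path("my_object")
absolute = relative.resolve()
```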
Schemas#
All tlc Objects are described by schemas. A Schema is a tree-like structure that describes the layout of an object.
A schema describes all serializable attributes of an object. Minimally, this includes describing the data type. In addition, the dimensionality, size, display name, number role, and more can be described by the schema.
An object attribute can be either composite or atomic. This is signalled by the presence of either the value attribute or the values attribute. If the value attribute is present, then the attribute is atomic, and the data is described by the ScalarValue subclass stored in the value attribute of the schema. If the values attribute is present, then the attribute is composite, and the schema can be recursed by following the sub-attributes defined in the values attribute of the schema, which is of type dict[str, Schema].
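The atomic/composite distinction amounts to a recursive walk over a tree. The following is a minimal sketch of that walk using a hypothetical `MiniSchema` dataclass rather than tlc.Schema; the string type names stand in for ScalarValue subclasses, and `iter_atomic()` is an illustrative helper, not part of tlc.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class MiniSchema:
    """Hypothetical stand-in for a schema node (not tlc.Schema)."""

    value: Optional[str] = None                       # set on atomic nodes
    values: Optional[Dict[str, "MiniSchema"]] = None  # set on composite nodes


def iter_atomic(schema: MiniSchema, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted path, value type) for every atomic leaf in the tree."""
    if schema.values is not None:
        # Composite: recurse into each named sub-attribute.
        for name, child in schema.values.items():
            path = f"{prefix}.{name}" if prefix else name
            yield from iter_atomic(child, path)
    elif schema.value is not None:
        # Atomic: the value describes the data stored at this path.
        yield prefix, schema.value


tree = MiniSchema(values={
    "id": MiniSchema(value="int32"),
    "bbox": MiniSchema(values={
        "x": MiniSchema(value="float32"),
        "y": MiniSchema(value="float32"),
    }),
})
leaves = list(iter_atomic(tree))
# leaves == [("id", "int32"), ("bbox.x", "float32"), ("bbox.y", "float32")]
```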
tlc does not have a separate notion of list- or array-schemas. Instead, the dimensionality and shape of an attribute are described by the size0, size1, …, size5 attributes of Schema, which are of type DimensionNumericValue.
Enough theory, let’s look at some examples!
# Examples of creating schemas for object attributes
# A schema describing an integer value between 0 and 100.
int_schema = tlc.Schema(
    display_name="Integer Value Schema",
    description="This is an example schema",
    writable=False,  # The value described by this schema is not writable
    value=tlc.Int32Value(value_min=0, value_max=100),
)
# A schema describing a variable-sized array of floats.
float_array_schema = tlc.Schema(
    display_name="Float Array Schema",
    description="This is an example schema",
    writable=True,  # The value described by this schema is writable
    value=tlc.Float32Value(unit="ms"),
    size0=tlc.DimensionNumericValue(value_min=1, value_max=10),
)
# Create a composite schema from the atomic schemas
composite_schema = tlc.Schema(
    values={"float_array": float_array_schema, "int_value": int_schema},
)