tlc.core.objects.tables.from_tables.joined_table

A procedural table in which multiple input tables are joined into a single table.

Module Contents

Classes

Class | Description
JoinedTable | A procedural table in which multiple input tables are joined into a single table.

API

class tlc.core.objects.tables.from_tables.joined_table.JoinedTable(url: tlc.core.url.Url | None = None, created: str | None = None, row_cache_url: tlc.core.url.Url | None = None, row_cache_populated: bool | None = None, input_table_urls: Sequence[tlc.core.url.Url | tlc.core.objects.table.Table] | None = None, init_parameters: Any = None)

Bases: tlc.core.objects.table.Table

A procedural table in which multiple input tables are joined into a single table.

This table combines its input tables procedurally and provides tools for managing the collective data and schemas of the joined tables.

Example:

from tlc.core.objects.tables import JoinedTable
from tlc.core.url import Url

first_table = ...
second_table = ...

joined_table_url = Url("/path/to/joined_table.json")
joined_table = JoinedTable(url=joined_table_url, input_table_urls=[first_table, second_table])
assert len(joined_table) == len(first_table) + len(second_table)
# The joined table has now been created, but it is not yet persisted.

joined_table.write_to_url()
# The joined table is now persisted to the given URL.

The tables being joined must have the same columns, but a given column may have a different schema in each table. When the schemas differ, they are joined according to the following rules:

  • If the schemas are atomic (i.e. they have a value), the schemas must be compatible (i.e. have the same type). If the values have different value maps, a new joined value map will be created.

  • If the schemas are not atomic (i.e. they have sub-schemas), the schemas will be joined recursively.

  • If any of the schemas are incompatible, a ValueError will be raised.
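
The rules above can be illustrated with a minimal sketch. It is not 3LC's implementation: schemas are modeled here as plain dictionaries, and the function name join_schemas, the keys "type", "value_map" and "children", and the conflict handling for overlapping value-map keys are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any

# Hypothetical dict-based stand-in for a schema (not 3LC's Schema class):
#   atomic:     {"type": "int32", "value_map": {0: "cat"}}   (value_map is optional)
#   non-atomic: {"children": {"column_name": <schema>, ...}}


def join_schemas(a: dict[str, Any], b: dict[str, Any]) -> dict[str, Any]:
    """Join two schemas following the three rules listed above."""
    a_atomic, b_atomic = "type" in a, "type" in b
    if a_atomic != b_atomic:
        raise ValueError("cannot join an atomic schema with a non-atomic one")

    if a_atomic:
        # Rule 1: atomic schemas must have the same type.
        if a["type"] != b["type"]:
            raise ValueError(f"incompatible types: {a['type']} vs {b['type']}")
        joined: dict[str, Any] = {"type": a["type"]}
        if "value_map" in a or "value_map" in b:
            # Differing value maps are merged into a new one. On a key clash
            # the entry from `a` wins, which is an arbitrary choice for this sketch.
            joined["value_map"] = {**b.get("value_map", {}), **a.get("value_map", {})}
        return joined

    # Rule 2: non-atomic schemas are joined recursively, child by child.
    if a["children"].keys() != b["children"].keys():
        raise ValueError("sub-schemas must have the same keys")
    return {
        "children": {
            key: join_schemas(a["children"][key], b["children"][key])
            for key in a["children"]
        }
    }


# Two label columns with the same type but different value maps.
label_a = {"type": "int32", "value_map": {0: "cat", 1: "dog"}}
label_b = {"type": "int32", "value_map": {1: "dog", 2: "bird"}}
joined = join_schemas(label_a, label_b)
```

In this sketch the joined value map contains the entries of both inputs, while a type mismatch anywhere in the schema tree surfaces as a ValueError (rule 3).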

Parameters:
  • url – The URL where the table should be persisted.

  • created – The creation timestamp for the table.

  • dataset_name – The name of the dataset the table belongs to.

  • row_cache_url – The URL for caching rows.

  • row_cache_populated – Flag indicating if the row cache is populated.

  • input_table_urls – A list of URLs or table references for the tables to be joined.

  • init_parameters – Parameters for initializing the table from JSON.

is_all_parquet() -> bool

Check whether all leaf tables are stored as Parquet.

flatten_input_files() -> list[tlc.core.url.Url]

Collect the input_url property of all TableFromParquet leaf tables.
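
One way to picture what these two methods describe is a walk over a tree of tables whose leaves are the non-joined inputs. The sketch below uses hypothetical stand-in classes (ParquetLeaf, OtherLeaf, Joined) rather than the real 3LC types, and it assumes that joined tables can be nested and that the walk descends into them; neither assumption is confirmed by this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ParquetLeaf:
    """Hypothetical stand-in for a TableFromParquet leaf table."""
    input_url: str


@dataclass
class OtherLeaf:
    """Hypothetical leaf table that is not backed by a Parquet file."""
    name: str


@dataclass
class Joined:
    """Hypothetical stand-in for a joined table with (possibly nested) inputs."""
    inputs: list = field(default_factory=list)


def leaves(table) -> Iterator[object]:
    """Yield every leaf table, descending depth-first into nested joins."""
    if isinstance(table, Joined):
        for child in table.inputs:
            yield from leaves(child)
    else:
        yield table


def is_all_parquet(table) -> bool:
    """True if every leaf table is Parquet-backed."""
    return all(isinstance(leaf, ParquetLeaf) for leaf in leaves(table))


def flatten_input_files(table) -> list[str]:
    """Collect input_url from every Parquet-backed leaf, in traversal order."""
    return [leaf.input_url for leaf in leaves(table) if isinstance(leaf, ParquetLeaf)]


nested = Joined([ParquetLeaf("a.parquet"), Joined([ParquetLeaf("b.parquet"), ParquetLeaf("c.parquet")])])
mixed = Joined([ParquetLeaf("a.parquet"), OtherLeaf("in_memory")])
files = flatten_input_files(nested)
```

Under these assumptions, nested reports all-Parquet and yields its three file URLs in order, while mixed does not report all-Parquet because one of its leaves has no Parquet file behind it.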