tlc.core.helpers.bulk_data_helper

Module Contents

Classes

  • BinaryChunkWriter – Writes numpy arrays to a binary chunk file and tracks offsets.

  • BulkDataAccessor – Helper class for accessing bulk data rows from a Table.

  • BulkDataHelper – Helper class for working with bulk data.

  • BulkDataRowProcessor – Processes rows to extract bulk data arrays and replace with URL references.

Data

API

class BinaryChunkWriter(
file_path: Path,
table_url: Url,
max_size_bytes: float,
)

Writes numpy arrays to a binary chunk file and tracks offsets.

This class writes raw binary data to a file and tracks the current write position so that offset-based URLs can be generated. The chunk file serves as an intermediate format that can later be virtualized.

Parameters:
  • file_path – Path to the binary chunk file to write

  • table_url – URL of the parent table for relative path calculation

  • max_size_bytes – Maximum size in bytes before rotation is suggested

close() → None

Close the file handle if open.

should_rotate() → bool

Check if the chunk file has reached its size limit.

Returns:

True if the current position exceeds the max size

write_array(
array: ndarray,
) → str

Write a numpy array to the chunk file and return a URL reference.

The array is flattened and written as raw bytes. Returns a URL string in the format: ‘relative/path/file.raw:offset-length’

Parameters:

array – Numpy array to write

Returns:

URL string with offset and length
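The offset bookkeeping described for this class can be illustrated with a minimal, standard-library-only sketch. `SketchChunkWriter` is a hypothetical stand-in, not the library's implementation: it accepts raw `bytes` (what `ndarray.tobytes()` would produce for a flattened array), and it assumes that offset and length are measured in bytes and that rotation is suggested once the write position reaches the limit.

```python
import io


class SketchChunkWriter:
    """Hypothetical stand-in for a chunk writer; not the tlc implementation."""

    def __init__(self, relative_path: str, max_size_bytes: float) -> None:
        self.relative_path = relative_path
        self.max_size_bytes = max_size_bytes
        self.position = 0  # current write offset in bytes
        self._buffer = io.BytesIO()  # in-memory "file" so the sketch has no side effects

    def write_bytes(self, payload: bytes) -> str:
        """Append payload and return a 'path:offset-length' reference."""
        offset = self.position
        self._buffer.write(payload)
        self.position += len(payload)
        return f"{self.relative_path}:{offset}-{len(payload)}"

    def should_rotate(self) -> bool:
        # Assumption: rotation is suggested once the limit is reached.
        return self.position >= self.max_size_bytes


writer = SketchChunkWriter("bulk_data/chunk_0.raw", max_size_bytes=16)
first = writer.write_bytes(b"\x00" * 12)
second = writer.write_bytes(b"\x01" * 8)
print(first)   # bulk_data/chunk_0.raw:0-12
print(second)  # bulk_data/chunk_0.raw:12-8
print(writer.should_rotate())  # True: 20 bytes written against a 16-byte limit
```

Because each reference carries its own offset and length, a reader can seek directly to one array without scanning the rest of the chunk.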

class BulkDataAccessor(
table: Table,
)

Helper class for accessing bulk data rows from a Table.

Initialize the BulkDataAccessor.

Parameters:

table – The Table to access bulk data rows from.

class BulkDataHelper

Helper class for working with bulk data.

static get_bulk_data_property_url(
property_name: str,
) → str

Get the bulk data property URL for a given property name.

Parameters:

property_name – The name of the property to get the bulk data property URL for.

Returns:

The bulk data property URL.

Example:

BulkDataHelper.get_bulk_data_property_url('vertices_3d')
# 'vertices_3d_binary_property_url'

BulkDataHelper.get_bulk_data_property_url('sensors_2d.instances.vertices_2d_additional_data.range')
# 'sensors_2d.instances.vertices_2d_additional_data.range_binary_property_url'

static get_bulk_data_url(
base_path: Url,
start: int,
length: int,
) → str

Get the bulk data URL for a given base path, start, and length.

Parameters:
  • base_path – The base path to the bulk data file (can be absolute or relative).

  • start – The start offset of the data in the file.

  • length – The length of the data in the file.

Returns:

The bulk data URL.
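Judging from the format quoted for `write_array` and the separator constants listed at the end of this module, the returned string plausibly has the shape `<base_path>:<start>-<length>`. The sketch below builds and parses such a string using plain `str` values; `format_bulk_url` and `parse_bulk_url` are hypothetical helpers for illustration and are not part of the module.

```python
URL_SEPARATOR = ":"  # mirrors CODEBOOK_URL_SEPARATOR
START_AND_LENGTH_SEPARATOR = "-"  # mirrors CODEBOOKRAW_START_AND_LENGTH_SEPARATOR


def format_bulk_url(base_path: str, start: int, length: int) -> str:
    """Assumed layout: <base_path>:<start>-<length>."""
    return f"{base_path}{URL_SEPARATOR}{start}{START_AND_LENGTH_SEPARATOR}{length}"


def parse_bulk_url(url: str) -> tuple[str, int, int]:
    """Inverse of format_bulk_url."""
    # Split from the right so colons inside base_path (e.g. "s3://...") survive.
    base_path, _, span = url.rpartition(URL_SEPARATOR)
    start, _, length = span.partition(START_AND_LENGTH_SEPARATOR)
    return base_path, int(start), int(length)


url = format_bulk_url("../../bulk_data/chunk_0.raw", 128, 64)
print(url)  # ../../bulk_data/chunk_0.raw:128-64
print(parse_bulk_url(url))  # ('../../bulk_data/chunk_0.raw', 128, 64)
```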

class BulkDataRowProcessor(
table_url: Url,
paths: Sequence[str] | None = None,
context_key: str | None = None,
chunk_size_mb: float = DEFAULT_CHUNK_SIZE_MB,
bulk_data_url: Url | Path | str | None = None,
)

Processes rows to extract bulk data arrays and replace with URL references.

This class uses a unified configuration that can span multiple columns. All configured leaf arrays are written into a shared chunk per context (e.g., per sequence), which tends to be more efficient when writing row-by-row.

Parameters:
  • table_url – The URL of the target table (used to compute relative URLs)

  • paths – Sequence of full leaf paths to store as bulk data. All paths share one chunk file per context in <table_url>/../../bulk_data.

  • context_key – Optional row key used to group chunks (e.g., sequence_id)

  • chunk_size_mb – Maximum chunk file size in megabytes
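One way to picture "a shared chunk per context" is a small table that tracks, for each context value, the current chunk file and write position. The following is a rough sketch only: `SketchContextChunks`, the file naming scheme, and the megabyte conversion factor are all assumptions made for illustration, not details taken from the library.

```python
from collections import defaultdict

BYTES_PER_MB = 1024 * 1024  # assumption: the library may use 10**6 instead


class SketchContextChunks:
    """Hypothetical bookkeeping: one growing chunk per context, rotated when full."""

    def __init__(self, chunk_size_mb: float = 50.0) -> None:
        self.max_bytes = chunk_size_mb * BYTES_PER_MB
        self.chunk_index: dict[str, int] = defaultdict(int)
        self.position: dict[str, int] = defaultdict(int)

    def reserve(self, context: str, n_bytes: int) -> str:
        """Reserve n_bytes in the context's current chunk and return a reference."""
        if self.position[context] >= self.max_bytes:
            # Current chunk is full: start a new file for this context.
            self.chunk_index[context] += 1
            self.position[context] = 0
        start = self.position[context]
        self.position[context] += n_bytes
        # The file naming below is made up for illustration.
        return f"{context}_{self.chunk_index[context]}.raw:{start}-{n_bytes}"


chunks = SketchContextChunks(chunk_size_mb=1.0)
print(chunks.reserve("seq_a", 700_000))  # seq_a_0.raw:0-700000
print(chunks.reserve("seq_b", 100))      # seq_b_0.raw:0-100
print(chunks.reserve("seq_a", 700_000))  # seq_a_0.raw:700000-700000
print(chunks.reserve("seq_a", 8))        # seq_a_1.raw:0-8
```

Grouping by context keeps arrays that belong together (for example, all frames of one sequence) adjacent on disk, which is the efficiency argument made above for row-by-row writing.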

close_all() → None

Close all active chunk writers.

process_batch(
batch: MutableMapping[str, list[Any]],
) → MutableMapping[str, list[Any]]

process_row(
row: dict[str, Any],
) → dict[str, Any]

Process a row to extract arrays and replace with URL references.

For each configured full path:

  • Traverses the row structure (lists are handled automatically)

  • Writes found arrays to the shared chunk file for the row’s context

  • Replaces the leaf with a sibling <leaf>_binary_property_url string or list of strings

Parameters:

row – Input row dictionary with actual numpy arrays

Returns:

Modified row with arrays replaced by URL references
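The traversal and replacement steps listed above can be sketched in plain Python. This is an illustrative approximation, not the library's code: `replace_leaf` and `make_writer` are hypothetical helpers, `bytes` values stand in for numpy arrays, paths are assumed to be dot-separated, and the row is modified in place.

```python
from typing import Any, Callable

PROPERTY_SUFFIX = "_binary_property_url"  # mirrors CODEBOOK_PROPERTY_SUFFIX


def replace_leaf(node: Any, parts: list[str], write: Callable[[bytes], str]) -> None:
    """Walk `parts` through nested dicts/lists and swap the leaf for a URL sibling."""
    if isinstance(node, list):
        # Lists are traversed transparently: apply the same path to every element.
        for item in node:
            replace_leaf(item, parts, write)
        return
    if not isinstance(node, dict) or parts[0] not in node:
        return  # path not present in this branch; nothing to do
    key = parts[0]
    if len(parts) > 1:
        replace_leaf(node[key], parts[1:], write)
        return
    value = node.pop(key)
    if isinstance(value, list):
        node[key + PROPERTY_SUFFIX] = [write(v) for v in value]
    else:
        node[key + PROPERTY_SUFFIX] = write(value)


def make_writer(path: str) -> Callable[[bytes], str]:
    """Return a fake writer that only tracks offsets (no file I/O)."""
    position = 0

    def write(payload: bytes) -> str:
        nonlocal position
        url = f"{path}:{position}-{len(payload)}"
        position += len(payload)
        return url

    return write


row = {
    "id": 7,
    "instances": [
        {"vertices_3d": b"\x00" * 24},
        {"vertices_3d": b"\x00" * 12},
    ],
}
write = make_writer("../../bulk_data/chunk_0.raw")
replace_leaf(row, "instances.vertices_3d".split("."), write)
print(row["instances"][0])  # {'vertices_3d_binary_property_url': '../../bulk_data/chunk_0.raw:0-24'}
print(row["instances"][1])  # {'vertices_3d_binary_property_url': '../../bulk_data/chunk_0.raw:24-12'}
```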

CODEBOOKRAW_FILE_EXTENSION = '.raw'
CODEBOOKRAW_START_AND_LENGTH_SEPARATOR = '-'
CODEBOOKRAW_STRING_ROLE = 'URL/raw'
CODEBOOK_PROPERTY_SUFFIX = '_binary_property_url'
CODEBOOK_URL_SEPARATOR = ':'
DEFAULT_CHUNK_SIZE_MB = 50.0