tlc.core.helpers.bulk_data_helper

Module Contents

Classes

  • BinaryChunkWriter – Writes numpy arrays to a binary chunk file and tracks offsets.

  • BulkDataAccessor – Helper class for accessing bulk data rows from a Table.

  • BulkDataHelper – Helper class for working with bulk data.

  • BulkDataRowProcessor – Processes rows to extract bulk data arrays and replace them with URL references.

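Data

  • CODEBOOKRAW_FILE_EXTENSION

  • CODEBOOKRAW_START_AND_LENGTH_SEPARATOR

  • CODEBOOKRAW_STRING_ROLE

  • CODEBOOK_PROPERTY_SUFFIX

  • CODEBOOK_URL_SEPARATOR

  • DEFAULT_CHUNK_SIZE_MB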

API

class BinaryChunkWriter(
file_path: Path,
table_url: Url,
max_size_bytes: float,
)

Writes numpy arrays to a binary chunk file and tracks offsets.

This class writes raw binary data to a file and tracks the current byte position, so that each written array can be referenced by an offset-based URL. The chunk file serves as an intermediate format that can later be virtualized.

Parameters:
  • file_path – Path to the binary chunk file to write

  • table_url – URL of the parent table for relative path calculation

  • max_size_bytes – Maximum size in bytes before rotation is suggested

close() -> None

Close the file handle if open.

should_rotate() -> bool

Check if the chunk file has reached its size limit.

Returns:

True if the current position exceeds the max size
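The rotation check compares the writer's running byte position against max_size_bytes. The sketch below models that bookkeeping under two assumptions: the position advances by each array's nbytes, and "exceeds" means strictly greater than. The function name is hypothetical and only mirrors the documented method.

```python
import numpy as np


def should_rotate(position: int, max_size_bytes: float) -> bool:
    """Hypothetical stand-in for BinaryChunkWriter.should_rotate()."""
    # Assumption: rotation is suggested once the position is strictly past the limit.
    return position > max_size_bytes


# Simulate three writes of 1000 float32 values (4000 bytes each) against a 10 kB limit.
max_size_bytes = 10_000
position = 0
history = []
for _ in range(3):
    array = np.zeros(1000, dtype=np.float32)
    position += array.nbytes  # the chunk file grows by the raw byte count
    history.append(should_rotate(position, max_size_bytes))

print(history)  # [False, False, True]
```

Because the check runs after a write, a chunk file can end up somewhat larger than the limit. The limit is a rotation hint, not a hard cap.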

write_array(
array: ndarray,
) -> str

Write a numpy array to the chunk file and return a URL reference.

The array is flattened and written as raw bytes. Returns a URL string of the form ‘relative/path/file.raw:offset-length’, where the path is relative to the table URL.

Parameters:

array – Numpy array to write

Returns:

URL string with offset and length
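The sketch below shows one way write_array could behave. MiniChunkWriter is a hypothetical, simplified class and not the tlc implementation. It assumes three things: the relative path is computed with os.path.relpath from the table URL, offset and length are byte counts, and the chunk lives in <table_url>/../../bulk_data.

```python
import os
import tempfile

import numpy as np


class MiniChunkWriter:
    """Hypothetical, simplified stand-in for BinaryChunkWriter."""

    def __init__(self, file_path: str, table_url: str, max_size_bytes: float) -> None:
        self.file_path = file_path
        self.table_url = table_url
        self.max_size_bytes = max_size_bytes
        self.position = 0  # bytes written so far = offset of the next array
        self._fh = None

    def write_array(self, array: np.ndarray) -> str:
        if self._fh is None:
            os.makedirs(os.path.dirname(self.file_path), exist_ok=True)
            self._fh = open(self.file_path, "wb")
        data = array.tobytes()  # raw bytes in C (row-major) order, i.e. flattened
        offset = self.position
        self._fh.write(data)
        self.position += len(data)
        # Assumption: the path is made relative to the table URL.
        rel = os.path.relpath(self.file_path, start=self.table_url).replace(os.sep, "/")
        return f"{rel}:{offset}-{len(data)}"

    def should_rotate(self) -> bool:
        return self.position > self.max_size_bytes

    def close(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None


root = tempfile.mkdtemp()
table_url = os.path.join(root, "project", "tables", "my_table")
chunk_path = os.path.join(root, "project", "bulk_data", "chunk_0.raw")
writer = MiniChunkWriter(chunk_path, table_url, max_size_bytes=1e6)
url_a = writer.write_array(np.arange(4, dtype=np.float32).reshape(2, 2))
url_b = writer.write_array(np.ones(3, dtype=np.float64))
writer.close()
print(url_a)  # ../../bulk_data/chunk_0.raw:0-16
print(url_b)  # ../../bulk_data/chunk_0.raw:16-24
```

Appending arrays back to back and returning offsets keeps many small arrays in one file instead of creating one file per array.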

class BulkDataAccessor(
table: Table,
)

Helper class for accessing bulk data rows from a Table.

Initialize the BulkDataAccessor.

Parameters:

table – The Table to access bulk data rows from.
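This reference does not document BulkDataAccessor's methods. At the byte level, reading a bulk array back means seeking to the URL's offset and reading length bytes. The sketch below does exactly that. read_bulk_range is a hypothetical helper, and it assumes the dtype and shape are known from elsewhere, since a raw byte range carries no such metadata.

```python
import os
import tempfile

import numpy as np


def read_bulk_range(path: str, offset: int, length: int, dtype, shape) -> np.ndarray:
    """Hypothetical helper: load `length` bytes at `offset` and view them as an array."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read(length)
    if len(data) != length:
        raise ValueError(f"expected {length} bytes, got {len(data)}")
    # Raw bytes carry no dtype/shape information, so the caller must supply both.
    return np.frombuffer(data, dtype=dtype).reshape(shape)


# Write two arrays back to back, as a chunk writer would.
chunk_path = os.path.join(tempfile.mkdtemp(), "chunk_0.raw")
first = np.arange(4, dtype=np.float32).reshape(2, 2)
second = np.array([7, 8, 9], dtype=np.int16)
with open(chunk_path, "wb") as fh:
    fh.write(first.tobytes())   # bytes 0-15
    fh.write(second.tobytes())  # bytes 16-21

restored = read_bulk_range(chunk_path, 16, 6, np.int16, (3,))
print(restored)  # [7 8 9]
```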

class BulkDataHelper

Helper class for working with bulk data.

static get_bulk_data_property_url(
property_name: str,
) -> str

Get the bulk data property URL for a given property name.

Parameters:

property_name – The name of the property to get the bulk data property URL for.

Returns:

The bulk data property URL.

Example:

BulkDataHelper.get_bulk_data_property_url('vertices_3d')
# 'vertices_3d_binary_property_url'

BulkDataHelper.get_bulk_data_property_url('sensors_2d.instances.vertices_2d_additional_data.range')
# 'sensors_2d.instances.vertices_2d_additional_data.range_binary_property_url'
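Both documented examples are consistent with appending CODEBOOK_PROPERTY_SUFFIX to the full, possibly dotted, property name. The sketch below reproduces that behavior. It assumes no other transformation is applied.

```python
CODEBOOK_PROPERTY_SUFFIX = "_binary_property_url"


def get_bulk_data_property_url(property_name: str) -> str:
    """Sketch: append the module's suffix to the (possibly dotted) property name."""
    return property_name + CODEBOOK_PROPERTY_SUFFIX


print(get_bulk_data_property_url("vertices_3d"))
# vertices_3d_binary_property_url
```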

class BulkDataRowProcessor(
table_url: Url,
paths: Sequence[str] | None = None,
context_key: str | None = None,
chunk_size_mb: float = DEFAULT_CHUNK_SIZE_MB,
)

Processes rows to extract bulk data arrays and replace them with URL references.

This class uses a single configuration that can span multiple columns. All configured leaf arrays are written into one shared chunk file per context (e.g., per sequence), which tends to be more efficient when writing row by row.

Parameters:
  • table_url – The URL of the target table (used to compute relative URLs)

  • paths – Sequence of full leaf paths to store as bulk data. All paths share one chunk file per context in <table_url>/../../bulk_data.

  • context_key – Optional row key used to group chunks (e.g., sequence_id)

  • chunk_size_mb – Maximum chunk file size in megabytes

close_all() None

Close all active chunk writers.

process_batch(
batch: MutableMapping[str, list[Any]],
) -> MutableMapping[str, list[Any]]

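process_batch has no docstring. Its signature takes and returns a column-oriented batch (column name to list of values). One plausible reading, assumed here and not confirmed by the source, is that it applies the row-wise logic to every row. The sketch transposes columns to rows and back, and assumes every processed row ends up with the same keys.

```python
from typing import Any, Callable, MutableMapping


def process_batch(
    batch: MutableMapping[str, list[Any]],
    process_row: Callable[[dict[str, Any]], dict[str, Any]],
) -> MutableMapping[str, list[Any]]:
    """Hypothetical: apply a row-wise function to a column-oriented batch."""
    num_rows = len(next(iter(batch.values()), []))
    rows = [{key: values[i] for key, values in batch.items()} for i in range(num_rows)]
    processed = [process_row(row) for row in rows]
    # Rebuild columns; keys may change (e.g. a leaf replaced by <leaf>_binary_property_url).
    # Assumes every processed row has the same keys, otherwise columns would misalign.
    columns: dict[str, list[Any]] = {}
    for row in processed:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def fake_process_row(row: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for BulkDataRowProcessor.process_row: swap the array for a URL.
    out = dict(row)
    out["vertices_binary_property_url"] = f"chunk.raw:{row['id'] * 8}-8"
    del out["vertices"]
    return out


batch = {"id": [0, 1], "vertices": [[1.0], [2.0]]}
print(process_batch(batch, fake_process_row))
# {'id': [0, 1], 'vertices_binary_property_url': ['chunk.raw:0-8', 'chunk.raw:8-8']}
```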
process_row(
row: dict[str, Any],
) -> dict[str, Any]

Process a row to extract arrays and replace them with URL references.

For each configured full path:

  • Traverses the row structure (lists are handled automatically)

  • Writes found arrays to the shared chunk file for the row’s context

  • Replaces the leaf with a sibling key <leaf>_binary_property_url whose value is a URL string or a list of URL strings

Parameters:

row – Input row dictionary with actual numpy arrays

Returns:

Modified row with arrays replaced by URL references
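The traversal described above can be sketched as a recursive walk over a dotted path. replace_leaf and FakeWriter are hypothetical. The writer only tracks offsets and writes nothing. This sketch mutates the row in place for brevity, and it assumes that a leaf holding a list of arrays yields a list of URLs.

```python
from typing import Any, Callable

import numpy as np

SUFFIX = "_binary_property_url"


def replace_leaf(node: Any, parts: list[str], write: Callable[[np.ndarray], str]) -> None:
    """Walk `parts` through dicts (and lists), swapping the leaf array for a URL."""
    if isinstance(node, list):
        # Lists are traversed element-wise, so each element gets its own URL.
        for item in node:
            replace_leaf(item, parts, write)
        return
    if not isinstance(node, dict) or parts[0] not in node:
        return  # path absent in this row
    key, rest = parts[0], parts[1:]
    if rest:
        replace_leaf(node[key], rest, write)
        return
    value = node.pop(key)  # remove the leaf ...
    if isinstance(value, list):
        node[key + SUFFIX] = [write(np.asarray(v)) for v in value]
    else:
        node[key + SUFFIX] = write(np.asarray(value))  # ... and add the sibling URL key


class FakeWriter:
    """Stand-in for a chunk writer: only tracks offsets, writes nothing."""

    def __init__(self) -> None:
        self.offset = 0

    def write(self, array: np.ndarray) -> str:
        url = f"chunk.raw:{self.offset}-{array.nbytes}"
        self.offset += array.nbytes
        return url


row = {
    "sensors_2d": {
        "instances": [
            {"vertices_2d": np.zeros((2, 2), dtype=np.float32)},
            {"vertices_2d": np.zeros((1, 2), dtype=np.float32)},
        ]
    }
}
writer = FakeWriter()
replace_leaf(row, "sensors_2d.instances.vertices_2d".split("."), writer.write)
print(row["sensors_2d"]["instances"])
# [{'vertices_2d_binary_property_url': 'chunk.raw:0-16'}, {'vertices_2d_binary_property_url': 'chunk.raw:16-8'}]
```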

CODEBOOKRAW_FILE_EXTENSION = '.raw'
CODEBOOKRAW_START_AND_LENGTH_SEPARATOR = '-'
CODEBOOKRAW_STRING_ROLE = 'URL/raw'
CODEBOOK_PROPERTY_SUFFIX = '_binary_property_url'
CODEBOOK_URL_SEPARATOR = ':'
DEFAULT_CHUNK_SIZE_MB = 50.0
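The separator constants correspond to the ‘relative/path/file.raw:offset-length’ URL format described for write_array. The sketch below composes and splits such URLs with them. make_raw_url and split_raw_url are hypothetical helpers. Splitting on the last separator is a defensive choice made here, not documented behavior.

```python
CODEBOOKRAW_FILE_EXTENSION = ".raw"
CODEBOOKRAW_START_AND_LENGTH_SEPARATOR = "-"
CODEBOOK_URL_SEPARATOR = ":"


def make_raw_url(rel_path: str, offset: int, length: int) -> str:
    # Hypothetical helper composing '<path>.raw:<offset>-<length>'.
    if not rel_path.endswith(CODEBOOKRAW_FILE_EXTENSION):
        raise ValueError(f"expected a {CODEBOOKRAW_FILE_EXTENSION} file: {rel_path}")
    return (
        f"{rel_path}{CODEBOOK_URL_SEPARATOR}"
        f"{offset}{CODEBOOKRAW_START_AND_LENGTH_SEPARATOR}{length}"
    )


def split_raw_url(url: str) -> tuple[str, int, int]:
    # Split on the last ':' so a ':' earlier in the path does not break parsing.
    path, sep, span = url.rpartition(CODEBOOK_URL_SEPARATOR)
    if not sep:
        raise ValueError(f"no offset/length in {url!r}")
    start, length = span.split(CODEBOOKRAW_START_AND_LENGTH_SEPARATOR)
    return path, int(start), int(length)


url = make_raw_url("../../bulk_data/seq_7_0000.raw", 1024, 48)
print(url)                 # ../../bulk_data/seq_7_0000.raw:1024-48
print(split_raw_url(url))  # ('../../bulk_data/seq_7_0000.raw', 1024, 48)
```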