`tlc.core.helpers.bulk_data_helper`

Module Contents

Classes

Class	Description
`BinaryChunkWriter`	Writes numpy arrays to a binary chunk file and tracks offsets.
`BulkDataAccessor`	Helper class for accessing bulk data rows from a Table.
`BulkDataHelper`	Helper class for working with bulk data.
`BulkDataRowProcessor`	Processes rows to extract bulk data arrays and replace with URL references.

Data

Data	Description
`CODEBOOKRAW_FILE_EXTENSION`
`CODEBOOKRAW_START_AND_LENGTH_SEPARATOR`
`CODEBOOKRAW_STRING_ROLE`
`CODEBOOK_PROPERTY_SUFFIX`
`CODEBOOK_URL_SEPARATOR`
`DEFAULT_CHUNK_SIZE_MB`

API

class BinaryChunkWriter(file_path: Path, table_url: Url, max_size_bytes: float)

Writes numpy arrays to a binary chunk file and tracks offsets.

This class manages writing of raw binary data to a file, tracking the current position for offset-based URL generation. It will be used as an intermediate format that can later be virtualized.

Parameters:

file_path – Path to the binary chunk file to write
table_url – URL of the parent table for relative path calculation
max_size_bytes – Maximum size in bytes before rotation is suggested

close() → None

Close the file handle if open.

should_rotate() → bool

Check if the chunk file has reached its size limit.

Returns: True if the current position exceeds the max size

write_array(array: ndarray) → str

Write a numpy array to the chunk file and return a URL reference.

The array is flattened and written as raw bytes. Returns a URL string in the format 'relative/path/file.raw:offset-length'.

Parameters: array – Numpy array to write
Returns: URL string with offset and length
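The offset-tracking idea described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `SketchChunkWriter`, the in-memory stream, and the file name `bulk_data/chunk_0.raw` are hypothetical, and the sketch assumes that offset and length are byte counts, which the documentation does not state.

```python
import io

import numpy as np


class SketchChunkWriter:
    """Illustrative stand-in for a chunk writer (not the real BinaryChunkWriter)."""

    def __init__(self, stream, relative_path, max_size_bytes):
        self._stream = stream
        self._relative_path = relative_path
        self._max_size_bytes = max_size_bytes
        self._position = 0  # number of bytes written so far

    def write_array(self, array):
        # Flatten the array and write it as raw bytes.
        data = np.ascontiguousarray(array).ravel().tobytes()
        offset = self._position
        self._stream.write(data)
        self._position += len(data)
        # Assumption: offset and length are expressed in bytes.
        return f"{self._relative_path}:{offset}-{len(data)}"

    def should_rotate(self):
        # True once the current position exceeds the size limit.
        return self._position > self._max_size_bytes


buffer = io.BytesIO()
writer = SketchChunkWriter(buffer, "bulk_data/chunk_0.raw", max_size_bytes=16)

first = writer.write_array(np.arange(3, dtype=np.float32))
rotate_after_first = writer.should_rotate()
second = writer.write_array(np.ones((2, 2), dtype=np.float32))

# Read the second array back from its reference.
span = second.rpartition(":")[2]
offset, length = (int(part) for part in span.split("-"))
count = length // np.dtype(np.float32).itemsize
restored = np.frombuffer(
    buffer.getvalue(), dtype=np.float32, count=count, offset=offset
).reshape(2, 2)
```

In this sketch the reference carries only a location, so the reader has to supply the dtype and shape from elsewhere.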

class BulkDataAccessor(table: Table)

Helper class for accessing bulk data rows from a Table.

Initialize the BulkDataAccessor.

Parameters: table – The Table to access bulk data rows from.

class BulkDataHelper

Helper class for working with bulk data.

static get_bulk_data_property_url(property_name: str) → str

Get the bulk data property URL for a given property name.

Parameters: property_name – The name of the property to get the bulk data property URL for.
Returns: The bulk data property URL.
Example:

BulkDataHelper.get_bulk_data_property_url('vertices_3d')
# 'vertices_3d_binary_property_url'

BulkDataHelper.get_bulk_data_property_url('sensors_2d.instances.vertices_2d_additional_data.range')
# 'sensors_2d.instances.vertices_2d_additional_data.range_binary_property_url'
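Both examples are consistent with appending the `CODEBOOK_PROPERTY_SUFFIX` value to the property name. A minimal stand-in that reproduces them is shown below; `sketch_bulk_data_property_url` is a hypothetical name, and whether the real method is implemented this way is an assumption.

```python
# Value copied from this module's Data listing.
CODEBOOK_PROPERTY_SUFFIX = "_binary_property_url"


def sketch_bulk_data_property_url(property_name: str) -> str:
    """Hypothetical stand-in: append the suffix to the last path segment."""
    # Assumption: the suffix is appended to the full dotted name unchanged.
    return property_name + CODEBOOK_PROPERTY_SUFFIX


simple = sketch_bulk_data_property_url("vertices_3d")
nested = sketch_bulk_data_property_url(
    "sensors_2d.instances.vertices_2d_additional_data.range"
)
```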

class BulkDataRowProcessor(table_url: Url, paths: Sequence[str] | None = None, context_key: str | None = None, chunk_size_mb: float = DEFAULT_CHUNK_SIZE_MB)

Processes rows to extract bulk data arrays and replace with URL references.

This class uses a unified configuration that can span multiple columns. All configured leaf arrays are written into a shared chunk per context (e.g., per sequence), which tends to be more efficient when writing row-by-row.

Parameters:

table_url – The URL of the target table (used to compute relative URLs)
paths – Sequence of full leaf paths to store as bulk data. All paths share one chunk file per context in <table_url>/../../bulk_data.
context_key – Optional row key used to group chunks (e.g., sequence_id)
chunk_size_mb – Maximum chunk file size in megabytes

close_all() → None

Close all active chunk writers.

process_batch(batch: MutableMapping[str, list[Any]]) → MutableMapping[str, list[Any]]

process_row(row: dict[str, Any]) → dict[str, Any]

Process a row to extract arrays and replace with URL references.

For each configured full path:

Traverses the row structure (lists are handled automatically)
Writes found arrays to the shared chunk file for the row’s context
Replaces the leaf with a sibling <leaf>_binary_property_url string or list of strings

Parameters: row – Input row dictionary with actual numpy arrays
Returns: Modified row with arrays replaced by URL references
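The traversal steps listed above can be sketched with the standard library alone. Everything here is illustrative: `replace_leaf` and `make_fake_writer` are hypothetical helpers, `bytes` objects stand in for numpy arrays, and the choice to remove the original leaf key is an assumption about what "replaces" means.

```python
from typing import Any, Callable

SUFFIX = "_binary_property_url"  # value of CODEBOOK_PROPERTY_SUFFIX


def make_fake_writer(path: str) -> Callable[[bytes], str]:
    """Return a writer that only tracks offsets and builds references."""
    position = 0

    def write(payload: bytes) -> str:
        nonlocal position
        url = f"{path}:{position}-{len(payload)}"
        position += len(payload)
        return url

    return write


def replace_leaf(node: Any, keys: list[str], write: Callable[[bytes], str]) -> None:
    """Follow `keys` into `node`, descending into every element of any list."""
    if isinstance(node, list):
        for item in node:
            replace_leaf(item, keys, write)
        return
    if not isinstance(node, dict) or keys[0] not in node:
        return  # path is absent in this part of the row
    head, rest = keys[0], keys[1:]
    if rest:
        replace_leaf(node[head], rest, write)
        return
    value = node.pop(head)  # assumption: the original leaf is removed
    if isinstance(value, list):
        node[head + SUFFIX] = [write(item) for item in value]
    else:
        node[head + SUFFIX] = write(value)


row = {
    "id": 7,
    "vertices_3d": b"\x00" * 12,
    "sensors": [
        {"name": "a", "range": b"\x01" * 4},
        {"name": "b", "range": b"\x02" * 8},
    ],
}

# One shared writer, so all configured paths land in the same chunk.
write = make_fake_writer("bulk_data/chunk_0.raw")
for path in ("vertices_3d", "sensors.range"):
    replace_leaf(row, path.split("."), write)
```

Sharing one writer across both paths mirrors the "shared chunk per context" behavior described for the class: offsets keep increasing across columns instead of restarting per column.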

CODEBOOKRAW_FILE_EXTENSION = '.raw'

CODEBOOKRAW_START_AND_LENGTH_SEPARATOR = '-'

CODEBOOKRAW_STRING_ROLE = 'URL/raw'

CODEBOOK_PROPERTY_SUFFIX = '_binary_property_url'

CODEBOOK_URL_SEPARATOR = ':'

DEFAULT_CHUNK_SIZE_MB = 50.0
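As an illustration of how the two separator constants fit the 'relative/path/file.raw:offset-length' format, the sketch below splits a reference back into its parts. `parse_raw_reference` is a hypothetical helper, not part of the module, and the sample reference is made up.

```python
# Values copied from the constants listed above.
CODEBOOK_URL_SEPARATOR = ":"
CODEBOOKRAW_START_AND_LENGTH_SEPARATOR = "-"


def parse_raw_reference(reference: str) -> tuple[str, int, int]:
    """Split 'path:offset-length' into (path, offset, length)."""
    # Split on the last ':' so a scheme such as 's3://' in the path survives.
    path, _, span = reference.rpartition(CODEBOOK_URL_SEPARATOR)
    start, _, length = span.partition(CODEBOOKRAW_START_AND_LENGTH_SEPARATOR)
    return path, int(start), int(length)


parsed = parse_raw_reference("relative/path/file.raw:128-4096")
parsed_with_scheme = parse_raw_reference("s3://bucket/data/file.raw:0-12")
```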
