tlc.core.helpers.bulk_data_helper
Module Contents
Classes
| Class | Description |
|---|---|
| BinaryChunkWriter | Writes numpy arrays to a binary chunk file and tracks offsets. |
| BulkDataAccessor | Helper class for accessing bulk data rows from a Table. |
| BulkDataHelper | Helper class for working with bulk data. |
| BulkDataRowProcessor | Processes rows to extract bulk data arrays and replace with URL references. |
Data
API
- class BinaryChunkWriter( )
Writes numpy arrays to a binary chunk file and tracks offsets.
This class writes raw binary data to a file and tracks the current write position so that offset-based URLs can be generated. The chunk file is intended as an intermediate format that can later be virtualized.
- Parameters:
file_path – Path to the binary chunk file to write
table_url – URL of the parent table for relative path calculation
max_size_bytes – Maximum size in bytes before rotation is suggested
- should_rotate() → bool
Check if the chunk file has reached its size limit.
- Returns:
True if the current position exceeds the max size
- write_array(
- array: ndarray,
Write a numpy array to the chunk file and return a URL reference.
The array is flattened and written as raw bytes. Returns a URL string in the format: ‘relative/path/file.raw:offset-length’
- Parameters:
array – Numpy array to write
- Returns:
URL string with offset and length
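The offset-tracking idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the tlc implementation: `SketchChunkWriter`, `write_bytes`, and `read_reference` are hypothetical names, the payload is plain `bytes` (with numpy it could come from `array.ravel().tobytes()`), and both the rotation rule and the use of the bare file name as the relative path are assumptions.

```python
import os
import tempfile


class SketchChunkWriter:
    """Hypothetical stand-in, not the tlc BinaryChunkWriter."""

    def __init__(self, file_path: str, max_size_bytes: int) -> None:
        self.file_path = file_path
        self.max_size_bytes = max_size_bytes
        self.position = 0  # next write offset, in bytes
        open(file_path, "wb").close()  # start with an empty chunk file

    def should_rotate(self) -> bool:
        # Assumption: rotation is suggested once the position passes the limit.
        return self.position > self.max_size_bytes

    def write_bytes(self, payload: bytes) -> str:
        # Append the payload and return a 'file.raw:offset-length' reference.
        offset = self.position
        with open(self.file_path, "ab") as f:
            f.write(payload)
        self.position += len(payload)
        return f"{os.path.basename(self.file_path)}:{offset}-{len(payload)}"


def read_reference(directory: str, reference: str) -> bytes:
    """Parse 'file.raw:offset-length' and return that byte range."""
    name, _, span = reference.rpartition(":")
    offset, _, length = span.partition("-")
    with open(os.path.join(directory, name), "rb") as f:
        f.seek(int(offset))
        return f.read(int(length))


tmp_dir = tempfile.mkdtemp()
writer = SketchChunkWriter(os.path.join(tmp_dir, "chunk_0.raw"), max_size_bytes=8)
first = writer.write_bytes(b"\x01\x02\x03\x04")
rotate_after_first = writer.should_rotate()
second = writer.write_bytes(b"\x05\x06\x07\x08\x09")
rotate_after_second = writer.should_rotate()
```

Because each reference carries both the start offset and the length, a reader can seek directly to one array without scanning the rest of the chunk file.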
- class BulkDataAccessor(
- table: Table,
Helper class for accessing bulk data rows from a Table.
Initialize the BulkDataAccessor.
- Parameters:
table – The Table to access bulk data rows from.
- class BulkDataHelper
Helper class for working with bulk data.
- static get_bulk_data_property_url(
- property_name: str,
Get the bulk data property URL for a given property name.
- Parameters:
property_name – The name of the property to get the bulk data property URL for.
- Returns:
The bulk data property URL.
- Example:
BulkDataHelper.get_bulk_data_property_url('vertices_3d') # 'vertices_3d_binary_property_url'
BulkDataHelper.get_bulk_data_property_url('sensors_2d.instances.vertices_2d_additional_data.range') # 'sensors_2d.instances.vertices_2d_additional_data.range_binary_property_url'
- class BulkDataRowProcessor(
- table_url: Url,
- paths: Sequence[str] | None = None,
- context_key: str | None = None,
- chunk_size_mb: float = DEFAULT_CHUNK_SIZE_MB,
Processes rows to extract bulk data arrays and replace with URL references.
This class uses a unified configuration that can span multiple columns. All configured leaf arrays are written into a shared chunk per context (e.g., per sequence), which tends to be more efficient when writing row-by-row.
- Parameters:
table_url – The URL of the target table (used to compute relative URLs)
paths – Sequence of full leaf paths to store as bulk data. All paths share one chunk file per context in <table_url>/../../bulk_data.
context_key – Optional row key used to group chunks (e.g., sequence_id)
chunk_size_mb – Maximum chunk file size in megabytes
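To make the per-context grouping concrete, here is a rough sketch of bookkeeping that keeps one running offset per context and starts a new chunk once a size limit is passed. Everything in it is an assumption for illustration: the `ContextChunks` class, the `reserve` method, the `<context>_<index>.raw` file naming, the "default" context for rows without a key, and the conversion of megabytes to bytes using 1024 * 1024.

```python
from collections import defaultdict
from typing import Any, Mapping, Optional

DEFAULT_CHUNK_SIZE_MB = 50.0  # value documented for this module


class ContextChunks:
    """Hypothetical bookkeeping: one running offset per context."""

    def __init__(self, context_key: Optional[str],
                 chunk_size_mb: float = DEFAULT_CHUNK_SIZE_MB) -> None:
        self.context_key = context_key
        # Assumption: 1 MB is treated as 1024 * 1024 bytes.
        self.max_size_bytes = int(chunk_size_mb * 1024 * 1024)
        self.positions = defaultdict(int)    # bytes written to the current chunk
        self.chunk_index = defaultdict(int)  # how many rotations so far

    def reserve(self, row: Mapping[str, Any], n_bytes: int) -> tuple:
        """Return (chunk file name, offset) for the next n_bytes of this row."""
        context = "default"
        if self.context_key is not None:
            context = str(row.get(self.context_key, "default"))
        if self.positions[context] > self.max_size_bytes:
            # Current chunk is over the limit: move on to a fresh one.
            self.chunk_index[context] += 1
            self.positions[context] = 0
        offset = self.positions[context]
        self.positions[context] += n_bytes
        return f"{context}_{self.chunk_index[context]}.raw", offset


chunks = ContextChunks("sequence_id", chunk_size_mb=1e-5)  # tiny limit, about 10 bytes
slots = [
    chunks.reserve({"sequence_id": "a"}, 8),
    chunks.reserve({"sequence_id": "b"}, 8),
    chunks.reserve({"sequence_id": "a"}, 8),
    chunks.reserve({"sequence_id": "a"}, 8),
]
```

Rows from sequence "a" fill their own chunk independently of sequence "b", so consecutive rows of one sequence land next to each other on disk.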
- process_batch(
- batch: MutableMapping[str, list[Any]],
- process_row( ) → dict[str, Any]
Process a row to extract arrays and replace with URL references.
For each configured full path:
Traverses the row structure (lists are handled automatically)
Writes found arrays to the shared chunk file for the row’s context
Replaces the leaf with a sibling <leaf>_binary_property_url string or list of strings
- Parameters:
row – Input row dictionary with actual numpy arrays
- Returns:
Modified row with arrays replaced by URL references
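The traversal steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: paths are taken to be dot-separated, the row is modified in place, `replace_leaf`, `process_row_sketch`, and `make_fake_writer` are hypothetical helpers, and the fake writer only fabricates `name:offset-length` strings instead of writing a chunk file. Plain `bytes` values stand in for numpy arrays.

```python
from typing import Any, Callable

SUFFIX = "_binary_property_url"  # the documented CODEBOOK_PROPERTY_SUFFIX value


def replace_leaf(node: Any, keys: list, write: Callable[[Any], str]) -> None:
    """Walk `keys` through nested dicts/lists and swap the leaf for a URL."""
    if isinstance(node, list):
        # Lists are traversed transparently: apply the same path to each item.
        for item in node:
            replace_leaf(item, keys, write)
        return
    if not isinstance(node, dict) or keys[0] not in node:
        return  # path not present in this row; nothing to do
    head, rest = keys[0], keys[1:]
    if rest:
        replace_leaf(node[head], rest, write)
        return
    value = node.pop(head)
    # A list leaf yields a list of URLs; a single value yields one URL.
    if isinstance(value, list):
        node[head + SUFFIX] = [write(v) for v in value]
    else:
        node[head + SUFFIX] = write(value)


def process_row_sketch(row: dict, paths: list, write: Callable[[Any], str]) -> dict:
    for path in paths:
        replace_leaf(row, path.split("."), write)  # assumption: dotted paths
    return row


def make_fake_writer(name: str) -> Callable[[bytes], str]:
    """Hypothetical writer that only tracks offsets and returns references."""
    position = 0

    def write(payload: bytes) -> str:
        nonlocal position
        url = f"{name}:{position}-{len(payload)}"
        position += len(payload)
        return url

    return write


row = {"id": 7, "instances": [{"vertices": b"abcd"}, {"vertices": b"ef"}]}
result = process_row_sketch(row, ["instances.vertices"], make_fake_writer("chunk.raw"))
```

In this sketch the `vertices` key disappears from each instance and a `vertices_binary_property_url` key takes its place, while unrelated fields such as `id` are left untouched.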
- CODEBOOKRAW_FILE_EXTENSION = .raw
- CODEBOOKRAW_START_AND_LENGTH_SEPARATOR = -
- CODEBOOKRAW_STRING_ROLE = URL/raw
- CODEBOOK_PROPERTY_SUFFIX = _binary_property_url
- CODEBOOK_URL_SEPARATOR = :
- DEFAULT_CHUNK_SIZE_MB = 50.0