Bulk Data in 3LC

Bulk data refers to data that is too large to be included in the serialized version of a 3LC Table. Instead, the data is stored in separate files on disk, and only references to it are stored in the Table. When a Table is viewed in the 3LC Dashboard, only the required data is requested from the Object Service, which reduces loading times and memory usage.

External Data Pattern

The external data pattern will be familiar to 3LC users working with images, as the same pattern is used to store and load images on-demand. External bulk data is an extension of this pattern to general binary data.
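
To make the pattern concrete, here is a minimal, library-independent sketch: the binary payload goes into a separate file, and the row keeps only a small reference to it. The helper names (`write_blob`, `read_blob`) and the reference layout (`path`, `offset`, `length`) are hypothetical and chosen for illustration; they are not 3LC's actual on-disk format or API.

```python
from __future__ import annotations

import struct
import tempfile
from pathlib import Path


def write_blob(bulk_file: Path, values: list[float]) -> dict:
    """Append float32 values to bulk_file and return a small reference to them."""
    payload = struct.pack(f"<{len(values)}f", *values)
    # The new payload starts where the file currently ends.
    offset = bulk_file.stat().st_size if bulk_file.exists() else 0
    with bulk_file.open("ab") as f:
        f.write(payload)
    return {"path": str(bulk_file), "offset": offset, "length": len(payload)}


def read_blob(ref: dict) -> list[float]:
    """Load only the bytes that the reference points at."""
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        payload = f.read(ref["length"])
    return list(struct.unpack(f"<{len(payload) // 4}f", payload))


with tempfile.TemporaryDirectory() as tmp:
    bulk_file = Path(tmp) / "bulk.bin"

    # The "table" holds lightweight references, not the data itself.
    rows = [
        {"id": 0, "points": write_blob(bulk_file, [0.0, 0.5, 1.0])},
        {"id": 1, "points": write_blob(bulk_file, [2.0, 2.5, 3.0, 3.5, 4.0, 4.5])},
    ]

    # Only the second row's payload is read from disk, on demand.
    second = read_blob(rows[1]["points"])
```

Because each row carries only a path, an offset, and a length, the table itself stays small no matter how large the referenced payloads are.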

Experimental Feature

Bulk data geometry is an experimental feature. APIs and usage patterns may change in the future.

Example

The following example shows how to store a single point cloud in a 3LC Table using bulk data.

import numpy as np

import tlc
from tlc.core.helpers.bulk_data_helper import BulkDataAccessor

# Define the bounds of the point cloud
bounds = (-100, -100, -100, 100, 100, 100)

# Create a point cloud
points = np.random.rand(1000, 3).astype(np.float32)

# Define a schema for the column containing the point cloud
schema = tlc.Geometry3DSchema(
    include_3d_vertices=True,
    is_bulk_data=True,
)

# Create a Geometry3DInstances object for the point cloud
instances = tlc.Geometry3DInstances.create_empty(*bounds)
instances.add_instance(points)

# Create a TableWriter for writing the point cloud
writer = tlc.TableWriter(
    table_name="point_cloud",
    project_name="point_cloud_project",
    column_schemas={"points": schema},
)

# Write the point cloud to disk
writer.add_row({"points": instances.to_row()})
table = writer.finalize()

# The location of the bulk data is given by the Table's bulk_data_url property.
print(table.bulk_data_url)
# Url('relative://../../bulk_data/samples/point_cloud')

# The Table can now be loaded and visualized in the 3LC Dashboard.
# To access the point cloud data in Python, use a BulkDataAccessor
accessor = BulkDataAccessor(table)
row = accessor[0]
points_reloaded = tlc.Geometry3DInstances.from_row(row["points"])

# The vertices are read back as a flattened array of floats,
# so flatten the original points before comparing.
np.testing.assert_array_equal(points.reshape(-1), points_reloaded.vertices[0])
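
Since the vertices come back flattened, downstream code will often want the `(N, 3)` shape again. The sketch below uses only NumPy and a stand-in array named `flat_vertices` (a hypothetical name) in place of the values read from the Table. It assumes the flat layout is `x0, y0, z0, x1, y1, z1, ...`, which is consistent with the comparison against `points.reshape(-1)` in the example.

```python
import numpy as np

# Stand-in for the original point cloud: 1000 points with x, y, z coordinates.
points = np.random.rand(1000, 3).astype(np.float32)

# Stand-in for the flattened vertex array read back from bulk data.
flat_vertices = points.reshape(-1)

# Regroup the flat array into one row per point.
# The -1 lets NumPy infer the number of points from the array length.
points_restored = flat_vertices.reshape(-1, 3)
```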

Limitations

Bulk data is currently limited to geometric data: vertices, lines, and triangles. When a bulk data Table is created, the data is cached on disk in a binary format. Writing bulk data Tables directly to object storage is not supported. However, a bulk data Table can be copied to a remote location and is accessible from there without additional configuration, other than configuring the Object Service to scan the new location.

Storage Size

Ingesting bulk data requires additional storage for the cached binary data, so take the size of the data into account before ingesting it. Monitor the size of the cached data and manage bulk data folders carefully to avoid running out of disk space.
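
One simple way to monitor the cached data is to sum the file sizes under a bulk data folder. The following is a sketch using only the Python standard library. The helper `directory_size_bytes`, the folder layout, and the budget value are assumptions for illustration; point `root` at wherever your bulk data is actually stored.

```python
import tempfile
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Return the total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


# Demonstration on a temporary folder standing in for a bulk data folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "samples").mkdir()
    (root / "samples" / "a.bin").write_bytes(b"\0" * 1024)
    (root / "samples" / "b.bin").write_bytes(b"\0" * 2048)

    total = directory_size_bytes(root)

    # Hypothetical budget: flag the folder once it exceeds 2 KiB.
    budget_bytes = 2 * 1024
    over_budget = total > budget_bytes

    print(f"{total} bytes used, over budget: {over_budget}")
```

A check like this can run periodically or before ingesting new data, so that old bulk data folders are cleaned up before the disk fills.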