3LC Python Package Version 2.6


Enhancements and Fixes

  • [12974] Catch mistaken use of collect_metrics with a PyTorch Dataset or DataLoader and give more helpful error messages

  • [12956] Fixed a bug where training on a Table after deleting rows would result in an error

  • [13005] Fixed a bug with indexing Run objects that could cause updates during training not to be picked up

  • [12971] Fixed an issue where the new “$..” syntax for aliases could fail to expand correctly in some cases


Enhancements and Fixes

  • [12899] Fixed an issue where TableWriter state could be set incorrectly if an exception occurred during setup

  • [12948] Fixed a crash when serializing large tensors for Tables

  • [12949] Hide large tensors in the Dashboard by default for now since they are only represented by a URL to where they are stored

  • [12953] Fixed a bug in reduce_embeddings_multiple_parameters



  • [12697] Added support for Torch tensors to SampleType

  • [12748] Made it possible for arbitrarily large tensors and numpy arrays to work with 3LC Table

    • This is done by providing two distinct SampleType classes for each tensor type, a Small and a Large variant

    • A SmallNumpyArray behaves like the class previously named NumpyArray: the array is converted into a list of lists (of lists, etc.) and stored in the rows of the table by value. This quickly becomes infeasible for even moderately large arrays.

    • The new LargeNumpyArray serializes the array to a file in the bulk_data directory of the table and places a reference to this file in the row of the table. When the sample view of that element is requested, the array is loaded back into memory from disk. The values in these arrays are not visible or editable in the Dashboard, but inspecting individual values in arrays with more than 1000 elements would probably not be very useful anyway.

  • [11203] Made it possible to delete rows altogether with an EditedTable, which can then be used to run training with those rows excluded from the dataset

  • [12740] Added Table.revision method that can take a tag, table_url, or table_name and return the relevant table

  • [12663] Made it possible to define aliases needed for a project within the project structure itself. This is useful in general, and in particular it will allow us to add public examples without requiring a new release of the Python package.

  • [12875] Made it possible to create a Table from a folder of images using Table.from_image_folder
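The by-value versus by-reference storage described for the Small and Large array types above can be illustrated with a minimal, standard-library-only sketch. It does not use the 3LC API: the helper names (`store_small`, `store_large`, `load_large`), the JSON file format, and the file naming scheme are hypothetical and chosen only for illustration. Only the `bulk_data` directory name comes from the notes above.

```python
import json
import tempfile
from pathlib import Path


def store_small(row: dict, column: str, array: list) -> None:
    """Store the nested list directly in the row (by value)."""
    row[column] = array


def store_large(row: dict, column: str, array: list, table_dir: Path, index: int) -> None:
    """Write the array to a file under bulk_data and keep only a reference in the row."""
    bulk_dir = table_dir / "bulk_data"
    bulk_dir.mkdir(parents=True, exist_ok=True)
    path = bulk_dir / f"{column}_{index}.json"  # hypothetical naming scheme
    path.write_text(json.dumps(array))
    # The row holds a path relative to the table directory, not the data itself.
    row[column] = path.relative_to(table_dir).as_posix()


def load_large(row: dict, column: str, table_dir: Path) -> list:
    """Resolve the stored reference and load the array back into memory."""
    return json.loads((table_dir / row[column]).read_text())


table_dir = Path(tempfile.mkdtemp())
small_row, large_row = {}, {}
data = [[1.0, 2.0], [3.0, 4.0]]

store_small(small_row, "embedding", data)
store_large(large_row, "embedding", data, table_dir, index=0)
restored = load_large(large_row, "embedding", table_dir)
```

In this sketch the small row grows with the size of the array, while the large row always holds a short reference string. That trade-off is presumably why the Large variants scale to arbitrarily large data at the cost of not showing values inline.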

Enhancements and Fixes

  • [12313] Cache and re-use references to DataLoaders for metrics collection so that they do not have to be recreated for each worker thread, which caused a significant performance hit when using num_workers > 0 on Windows

  • [12528] Added an extra_columns argument to all Table.from_X methods that can be used to create schema for additional columns at table creation time

  • [12608] Made it possible to convert an inferred parquet schema to a 3LC schema when it contains a list of structs

  • [12291] Added average pooling flattening strategies for the embeddings metrics collector

  • [12603] Allow str argument (in addition to Url) as foreign_table_url in Run.reduce_embeddings_by_foreign_table_url

  • [12607] Do not add IOU to schema when compute_derived_metrics is False for bounding-box metrics collector

  • [12695] Make sure we don’t update modified time when checking if a directory is writable

  • [12698] When creating a new Table using the high-level from_X methods, if the specified table_url already exists, but that directory does not contain an object.3lc.json file, go ahead and delete the directory since it represents a malformed table that was never actually successfully created.

  • [12553] Infer device in example notebooks to support users on Mac (using mps) or without a GPU

  • [12692] Create a Run’s bulk-data folder on creation to avoid timing issues

  • [12735] Added shuffle argument to Table.create_sampler

  • [12774] Support YOLO YAML files without a path key in TableFromYolo

  • [12783] Made Url initialization raise an exception on bad input

  • [12791] Lifted everything in tlc.client.utils up into the tlc namespace to make referencing contained types more convenient in e.g. the YOLO integration

  • [12776] Deprecated use of NumpyInt and NumpyFloat as SampleType, since a plain Int or Float works just as well

  • [12746] Fixed removal of columns that are deleted by override schema

  • [12665] Disable command palette for Object Service TUI since it is not intended to be used

  • [12837] Made it possible to reduce columns without the number role “nn_embedding”

  • [12837] Changed the default n_components for UMAP embeddings reduction to 2 since that is the default in the source umap package and it makes it consistent with pacmap

  • [12876] Make it possible to pass a single table to tlc.reduce_embeddings, and have it return a single table
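The cleanup rule from [12698] above (delete an existing table directory that lacks an object.3lc.json file) can be sketched with the standard library alone. This is an illustration of the described behavior under stated assumptions, not the package's actual implementation: the function name `prepare_table_dir` and the directory names are hypothetical, and only the `object.3lc.json` file name is taken from the note.

```python
import shutil
import tempfile
from pathlib import Path

MARKER = "object.3lc.json"  # file name taken from the release note


def prepare_table_dir(table_dir: Path) -> bool:
    """Delete table_dir if it exists but lacks the marker file.

    Returns True if a malformed directory was removed, False otherwise.
    """
    if table_dir.is_dir() and not (table_dir / MARKER).exists():
        shutil.rmtree(table_dir)
        return True
    return False


root = Path(tempfile.mkdtemp())

# A directory left behind by a table creation that never completed.
malformed = root / "malformed_table"
malformed.mkdir()
(malformed / "leftover.tmp").write_text("partial write")

# A directory that contains the marker file and must be left alone.
valid = root / "valid_table"
valid.mkdir()
(valid / MARKER).write_text("{}")

removed_malformed = prepare_table_dir(malformed)
removed_valid = prepare_table_dir(valid)
```

The check is deliberately conservative: a directory is removed only when the marker file is missing, so a successfully created table is never touched.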

Known Issues

  • The tlc Python package does not detect, handle, or support NaN (Not-a-Number) values in tlc.Table, and their presence may lead to unpredictable behavior or inconsistencies within the system.
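Because NaN values are not detected by the package, one possible precaution is to screen rows before writing them to a table. The following is a minimal sketch that assumes rows are plain dictionaries of Python scalars. The helper `find_nan_cells` is hypothetical and not part of the tlc API.

```python
import math


def find_nan_cells(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row_index, column_name) for every float NaN found in rows."""
    return [
        (i, key)
        for i, row in enumerate(rows)
        for key, value in row.items()
        if isinstance(value, float) and math.isnan(value)
    ]


rows = [
    {"loss": 0.25, "label": 1},
    {"loss": float("nan"), "label": 0},
]
nan_cells = find_nan_cells(rows)
```

Any reported cells could then be dropped or replaced before the data is handed to a table. Nested values such as lists or arrays would need a recursive variant of this check.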