Weighted Table Subset Selection¶
This notebook demonstrates how to apply zero weights to a subset of table rows for selective data processing.

This technique is particularly useful in active learning and data labeling workflows, where only a subset of rows should be used for training or considered for labeling in each iteration.
Specifically, this example selects a class-balanced coreset from a dataset and sets the weight of every non-coreset row to zero. The selection strategy can be swapped for other approaches, such as random sampling, uncertainty-based sampling, or other model-driven selection criteria.
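As an illustration of swapping in a different selection strategy, here is a minimal sketch of uncertainty-based sampling that ranks rows by the entropy of their predicted class probabilities. It is not part of `tlc_tools`: the helper names `entropy` and `most_uncertain_indices` and the toy predictions are hypothetical, and the sketch assumes per-row softmax outputs are already available from some model.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain_indices(predictions, k):
    """Return (selected, rest): the k rows with the highest entropy, and all other rows."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return sorted(ranked[:k]), sorted(ranked[k:])


# Made-up softmax outputs for four rows over three classes.
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

selected, rest = most_uncertain_indices(predictions, k=2)
print(selected, rest)
```

The two returned lists play the same role as a coreset/non-coreset pair of index lists, so the weighting step can stay unchanged when the strategy changes.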
Install dependencies¶
[ ]:
%pip install 3lc
%pip install git+https://github.com/3lc-ai/3lc-examples.git
Imports¶
[ ]:
import tlc
from tlc_tools.split import get_balanced_coreset_indices, set_value_in_column_to_fixed_value
Project setup¶
[ ]:
PROJECT_NAME = "3LC Tutorials - CIFAR-10"
DATASET_NAME = "CIFAR-10-train"
TABLE_NAME = "initial"
Load input table¶
This assumes CIFAR-10-train has been created by running the notebook create-table-from-torch.ipynb.
[ ]:
table = tlc.Table.from_names(TABLE_NAME, DATASET_NAME, PROJECT_NAME)
Compute coreset¶
[ ]:
# This function ensures the coreset is exactly balanced in terms of the split_by column.
# The size parameter is the fraction of the minority class that should be included in the coreset.
coreset_indices, non_coreset_indices = get_balanced_coreset_indices(
    table,
    size=0.01,  # CIFAR-10-train has 5000 samples per class, so 0.01 yields 50 samples per class (500 in total)
    split_by="Label",
    random_seed=42,
)
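To make the arithmetic behind `size` concrete, the sketch below reimplements the idea described in the comments using only the standard library. It is an assumption-laden illustration, not the actual implementation of `get_balanced_coreset_indices`: the function name `balanced_subset_indices` is hypothetical, and the labels are a synthetic stand-in with the same class counts as CIFAR-10-train.

```python
import random
from collections import defaultdict


def balanced_subset_indices(labels, fraction, seed=0):
    """Pick the same number of rows from every class.

    The per-class count is `fraction` of the smallest class (at least 1).
    Returns (selected, remaining) as sorted lists of row indices.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    smallest = min(len(rows) for rows in by_class.values())
    per_class = max(1, round(fraction * smallest))

    rng = random.Random(seed)
    selected = []
    for label in sorted(by_class):
        selected.extend(rng.sample(by_class[label], per_class))

    selected_set = set(selected)
    remaining = [i for i in range(len(labels)) if i not in selected_set]
    return sorted(selected), remaining


# Synthetic labels: 10 classes with 5000 rows each.
labels = [c for c in range(10) for _ in range(5000)]
selected, remaining = balanced_subset_indices(labels, fraction=0.01, seed=42)
print(len(selected), len(remaining))  # 500 49500
```

With a fraction of 0.01 and 5000 rows in the smallest class, each class contributes 50 rows, giving 500 selected rows and 49500 remaining rows.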
Weight non-coreset rows to 0¶
[ ]:
coreset_table = set_value_in_column_to_fixed_value(
    table,
    "weight",
    non_coreset_indices,
    0.0,
)
[ ]:
coreset_table
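The effect of a zero weight can be sketched with a weighted draw from the standard library. This is only an illustration under the assumption that a downstream consumer samples rows in proportion to the weight column; how 3LC itself uses the weights is not shown in this notebook, and the index values below are made up.

```python
import random

num_rows = 10
coreset_indices = [1, 4, 9]  # hypothetical selection
non_coreset_indices = [i for i in range(num_rows) if i not in coreset_indices]

# Every row starts with weight 1.0; non-coreset rows are set to 0.0.
weights = [1.0] * num_rows
for i in non_coreset_indices:
    weights[i] = 0.0

# Draw row indices in proportion to their weights.
rng = random.Random(0)
drawn = rng.choices(range(num_rows), weights=weights, k=1000)

# Only coreset rows can appear, because all other rows have zero weight.
print(sorted(set(drawn)))
```

All rows remain in the table with their original indices; only the chance of being drawn changes.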
Remove non-coreset samples¶
[ ]:
from tlc_tools.split import keep_indices

subset = keep_indices(
    table,
    coreset_indices,
    table_name="balanced-subset",
    table_description="Keep only a size 500 coreset",
)