Sample Weights

Not all samples are created equal: some are more important than others. When training a model, you might want some samples to appear more often, some less often, and some not at all. This is where sample weights come in.

When to use sample weights

Sample weights can be useful in a variety of scenarios. For example, you might want to:

  • Balance your dataset by giving more weight to underrepresented classes or properties

  • Focus on samples that are more difficult for your model

  • Ignore samples that are irrelevant to your current task

  • Start training with a small labelled subset of your data and gradually label and include more samples as the model improves
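
As an illustration of the first scenario, the following minimal sketch computes one weight per sample that is inversely proportional to the frequency of its class. The helper inverse_frequency_weights is hypothetical and not part of 3LC; it only shows how such values could be derived before they are entered as weights.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per sample, inversely proportional to its class frequency.

    Hypothetical helper for illustration only; it is not a 3LC function.
    """
    counts = Counter(labels)
    num_classes = len(counts)
    total = len(labels)
    # Scale so that every class receives the same total weight (total / num_classes)
    # and the weights sum to the number of samples.
    return [total / (num_classes * counts[label]) for label in labels]


labels = ["cat", "cat", "cat", "dog"]
weights = inverse_frequency_weights(labels)
```

With this scaling, the single "dog" sample carries as much total weight as the three "cat" samples combined, so a weight-aware sampler would draw both classes about equally often.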

Modifying weights in the 3LC dashboard

When viewing a Table or Run in the 3LC dashboard, you will see a column labelled “weight”, with its value set to 1 for all rows by default. Increasing this value tells 3LC that the sample should appear more often during training, and decreasing it tells 3LC that the sample should appear less often. Setting the value to 0 tells 3LC to ignore the sample during training entirely. Like all other value modifications in the dashboard, weights can be updated on selections of multiple rows at once.

Using weights during training

Having edited the weights of a Table in the 3LC dashboard, you probably want to try them out in a new Run. The exact syntax for enabling sample weights during training depends on the library or integration you are using. If you are using a PyTorch DataLoader, for example, you can use your Table to create a Sampler object and pass it to the DataLoader, as shown in the following example:

from torch.utils.data import DataLoader
import tlc

dataset = ...

# A table with .latest() will use the latest revision of the table, 
# which includes the weights you set in the 3LC dashboard
table = tlc.Table.from_torch_dataset(dataset, table_name="example_table").latest()

# Pass the output of table.create_sampler() to the sampler argument of the DataLoader
dataloader = DataLoader(dataset, sampler=table.create_sampler())

The create_sampler method returns a Sampler object that can be used to sample from the dataset with the weights you set in the 3LC dashboard. This method can also take additional arguments to control the behavior of the sampler.
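
To make the effect of the weights concrete, here is a minimal, library-independent sketch of weighted sampling using only the Python standard library. It does not show how create_sampler is implemented; the function sample_indices and the choice to sample with replacement are assumptions made for illustration.

```python
import random


def sample_indices(weights, num_samples, seed=None):
    """Draw sample indices with replacement, with probability proportional to weight.

    Hypothetical helper for illustration only; it is not a 3LC function.
    """
    rng = random.Random(seed)
    indices = range(len(weights))
    return rng.choices(indices, weights=weights, k=num_samples)


# Index 1 has twice the default weight, index 2 is excluded with a weight of 0.
weights = [1.0, 2.0, 0.0, 1.0]
drawn = sample_indices(weights, num_samples=10_000, seed=0)
counts = [drawn.count(i) for i in range(len(weights))]
```

In this sketch the sample with weight 2 is drawn roughly twice as often as those with weight 1, and the sample with weight 0 is never drawn.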

Using weights during metrics collection

When collecting metrics with 3LC, metrics are collected for all samples by default, regardless of their weight. As with training, the syntax for using sample weights during metrics collection depends on the library or integration you are using. If you are using the collect_metrics function in your code, you can use the exclude_zero_weights argument to exclude samples with a weight of 0 from the metrics collection. When this argument is set, these samples will not appear in the generated Run in the dashboard.
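
The filtering behavior described above can be sketched in plain Python as follows. The function collect_per_sample_metrics is a hypothetical stand-in, not the 3LC collect_metrics function; it only borrows the exclude_zero_weights name to show which rows would be dropped.

```python
def collect_per_sample_metrics(samples, weights, metric_fn, exclude_zero_weights=False):
    """Return (index, metric) pairs, optionally skipping samples with a weight of 0.

    Hypothetical helper for illustration only; it is not a 3LC function.
    """
    results = []
    for index, (sample, weight) in enumerate(zip(samples, weights)):
        if exclude_zero_weights and weight == 0:
            continue  # zero-weight samples produce no row at all
        results.append((index, metric_fn(sample)))
    return results


samples = ["a", "bbb", "cc"]
weights = [1.0, 0.0, 2.0]

# Default: every sample gets a metric, regardless of its weight.
all_rows = collect_per_sample_metrics(samples, weights, len)
# With the flag set, the zero-weight sample at index 1 is left out.
kept_rows = collect_per_sample_metrics(samples, weights, len, exclude_zero_weights=True)
```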