Sample Weights#
Not all samples are created equal: some are more important than others. When training a model, you might want some samples to appear more often, some less often, and some not at all. This is where sample weights come in.
When to use sample weights#
Sample weights can be useful in a variety of scenarios. For example, you might want to:
Balance your dataset by giving more weight to underrepresented classes or properties
Focus on samples that are more difficult for your model
Ignore samples that are irrelevant to your current task
Start training with a small labelled subset of your data and gradually label and include more samples as the model improves
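As an illustration of the first scenario, one common way to balance a dataset is to weight each sample by the inverse of its class frequency. The sketch below is illustrative only: the inverse_frequency_weights helper and the label list are hypothetical and not part of 3LC.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per sample, proportional to 1 / (class count).

    Hypothetical helper for illustration; not part of the 3LC API.
    Weights are scaled so that the most common class gets weight 1.0.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return [largest / counts[label] for label in labels]


labels = ["cat", "cat", "cat", "cat", "dog", "dog", "bird"]
weights = inverse_frequency_weights(labels)
# Every class now carries the same total weight (4.0 per class here).
```

Weights computed this way could then be entered in the weight column, so that rare classes are seen about as often as common ones.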
Modifying weights in the 3LC dashboard#
When viewing a Table or Run in the 3LC dashboard, you will see a column labelled “weight”, with its value set to 1 for
all rows by default. Increasing this value tells 3LC that you want to see this sample more often, and decreasing it
tells 3LC that you want to see it less often. Setting the value to 0 tells 3LC to ignore the sample entirely during
training. Like all other value modifications in the Dashboard, weights can be updated on selections of multiple rows at
once.
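To make the effect of these values concrete, the sketch below converts weights into relative sampling probabilities. It assumes that a sample's chance of being drawn is proportional to its weight; the exact scheme 3LC uses is not spelled out here, so treat this as an illustration. The sampling_probabilities helper is hypothetical.

```python
def sampling_probabilities(weights):
    """Normalize non-negative weights so they sum to 1.

    Hypothetical helper; assumes draw probability is proportional to weight.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


# Default weight 1, one sample boosted to 2, one sample disabled with 0.
weights = [1, 2, 0, 1]
probabilities = sampling_probabilities(weights)
```

Under this assumption, the sample with weight 2 is drawn twice as often as a sample with weight 1, and the sample with weight 0 is never drawn.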
Using weights during training#
Having edited the weights of a Table in the 3LC dashboard, you probably want to try them out in a new Run. The exact
syntax for enabling sample weights during training depends on the library or integration you are using. If you are
using a PyTorch DataLoader, for example, you can use your Table to create a Sampler object and pass it to the
DataLoader, as shown in the following example:
```python
from torch.utils.data import DataLoader
import tlc

dataset = ...

# A table with .latest() will use the latest revision of the table,
# which includes the weights you set in the 3LC dashboard
table = tlc.Table.from_torch_dataset(dataset, table_name="example_table").latest()

# Pass the output of table.create_sampler() to the sampler argument of the DataLoader
dataloader = DataLoader(dataset, sampler=table.create_sampler())
```
The create_sampler method returns a Sampler object that can be used to sample from the dataset with the weights you set
in the 3LC dashboard. This method can also take additional arguments to control the behavior of the sampler.
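For intuition about what such an object does, here is a minimal pure-Python sketch of a weighted index sampler. It is not the 3LC implementation: the WeightedIndexSampler class, its arguments, and the choice to draw with replacement are all assumptions made for illustration. At its core, a sampler is an iterable of dataset indices with a length, which is the shape this class follows.

```python
import random


class WeightedIndexSampler:
    """Yield dataset indices with probability proportional to their weight.

    Hypothetical class for illustration; not part of 3LC or PyTorch.
    """

    def __init__(self, weights, num_samples=None, seed=None):
        self.weights = list(weights)
        # By default, draw as many indices as there are samples.
        self.num_samples = len(self.weights) if num_samples is None else num_samples
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = range(len(self.weights))
        # Draw with replacement; indices with weight 0 are never selected.
        return iter(self.rng.choices(indices, weights=self.weights, k=self.num_samples))

    def __len__(self):
        return self.num_samples


sampler = WeightedIndexSampler([1, 2, 0, 1], num_samples=1000, seed=0)
drawn = list(sampler)
```

In real training code you would rely on the object returned by create_sampler rather than a hand-written class like this one.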
Using weights during metrics collection#
When collecting metrics with 3LC, metrics will be collected for all samples by default, regardless of their weight. As
with training, the syntax for using sample weights during metrics collection depends on the library or integration you
are using. If you are using the collect_metrics function in your code, you can use the exclude_zero_weights argument to
exclude samples with a weight of 0 from the metrics collection. If set, these samples will not appear in the generated
Run in the Dashboard.
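The effect of excluding zero-weight samples can be sketched in plain Python. This does not call 3LC: the accuracy helper and the row layout are hypothetical, and the exclude_zero_weights parameter on this helper is named only to mirror the argument described above.

```python
def accuracy(rows, exclude_zero_weights=False):
    """Compute accuracy over rows, optionally skipping rows with weight 0.

    Hypothetical helper for illustration; not the 3LC collect_metrics function.
    """
    if exclude_zero_weights:
        rows = [row for row in rows if row["weight"] != 0]
    if not rows:
        raise ValueError("no rows left to evaluate")
    correct = sum(row["label"] == row["prediction"] for row in rows)
    return correct / len(rows)


rows = [
    {"label": 0, "prediction": 0, "weight": 1.0},
    {"label": 1, "prediction": 1, "weight": 2.0},
    {"label": 1, "prediction": 0, "weight": 0.0},  # skipped when excluded
    {"label": 0, "prediction": 0, "weight": 1.0},
]
all_rows_accuracy = accuracy(rows)
filtered_accuracy = accuracy(rows, exclude_zero_weights=True)
```

Here the only misclassified row has weight 0, so excluding it changes the reported accuracy, which shows why the two settings can produce different numbers for the same model.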