tlc.client.utils

Module Contents

Classes

  • SubsetSequentialSampler – Samples elements sequentially from a given list of indices.

  • RangeSampler – Samples elements sequentially from a range.

  • RepeatByWeightSampler – Repeats elements based on their weight.

Functions

  • bytes2str

  • str2bytes

  • take

  • batched_iterator

  • without_transforms – Ensures that, if the dataset is a Torchvision dataset, its transforms are temporarily removed.

  • relativize_with_max_depth – Relativize the given URL with respect to the given owner URL, up to a maximum depth.

  • standardized_transforms – Create a new transforms function which takes the whole sample as its only argument, rather than destructuring it.

  • get_column_from_pyarrow_table – Return the specified column of the table as a pyarrow array.

API

tlc.client.utils.bytes2str(obj: bytes) → str
tlc.client.utils.str2bytes(s: str) → bytes
tlc.client.utils.take(iterator: collections.abc.Iterator, batch_size: int) → list
tlc.client.utils.batched_iterator(iterator: collections.abc.Iterable, batch_size: int) → collections.abc.Iterator[list]
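
Neither take nor batched_iterator carries a description on this page. Going by the signatures alone, one plausible reading is that take pulls up to batch_size items from an iterator and batched_iterator yields successive lists of that size. The sketch below is built on that assumption with itertools.islice; it is not the library's implementation, and the names take_sketch and batched_iterator_sketch are hypothetical.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def take_sketch(iterator: Iterator, batch_size: int) -> list:
    """Pull up to batch_size items from the iterator (assumed behavior)."""
    return list(islice(iterator, batch_size))


def batched_iterator_sketch(iterable: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of up to batch_size items until the input is exhausted."""
    iterator = iter(iterable)
    # An empty list is falsy, so the loop stops once nothing is left to take.
    while batch := take_sketch(iterator, batch_size):
        yield batch


batches = list(batched_iterator_sketch(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Under this reading the final batch may be shorter than batch_size; whether the real function pads or drops a short tail is not stated here.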
class tlc.client.utils.SubsetSequentialSampler(indices: list[int])

Bases: torch.utils.data.sampler.Sampler[int]

Samples elements sequentially from a given list of indices.

class tlc.client.utils.RangeSampler(end: int, start: int = 0, step: int = 1)

Bases: torch.utils.data.sampler.Sampler[int]

Samples elements sequentially from a range.
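
The two sequential samplers can be pictured with plain-Python stand-ins. This is a sketch of the documented behavior only: the real classes subclass torch.utils.data.sampler.Sampler, and the assumption that RangeSampler follows the semantics of the built-in range(start, end, step) is inferred from its signature. The class names ending in Sketch are hypothetical.

```python
from collections.abc import Iterator


class SubsetSequentialSamplerSketch:
    """Stand-in: yields the given indices in the order they were supplied."""

    def __init__(self, indices: list[int]) -> None:
        self.indices = indices

    def __iter__(self) -> Iterator[int]:
        return iter(self.indices)

    def __len__(self) -> int:
        return len(self.indices)


class RangeSamplerSketch:
    """Stand-in: assumes the same semantics as range(start, end, step)."""

    def __init__(self, end: int, start: int = 0, step: int = 1) -> None:
        self._range = range(start, end, step)

    def __iter__(self) -> Iterator[int]:
        return iter(self._range)

    def __len__(self) -> int:
        return len(self._range)


subset_order = list(SubsetSequentialSamplerSketch([4, 0, 2]))
range_order = list(RangeSamplerSketch(end=10, start=2, step=3))
print(subset_order)  # [4, 0, 2]
print(range_order)   # [2, 5, 8]
```

With PyTorch, a sampler of this kind is normally passed to a DataLoader through its sampler argument, which fixes the order in which dataset indices are visited.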

class tlc.client.utils.RepeatByWeightSampler(weights: list[float], shuffle: bool = True)

Bases: torch.utils.data.sampler.Sampler[int]

Repeats elements based on their weight.

tlc.client.utils.without_transforms(dataset: torch.utils.data.Dataset) → Generator[Callable | None, None, None]

Ensures that, if the dataset is a Torchvision dataset, its transforms are temporarily removed.

Parameters:

dataset – The dataset to temporarily remove transforms from.
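
The Generator return type suggests this function is meant to be used as a context manager. The sketch below illustrates that pattern under stated assumptions: it looks for the attribute names transform, target_transform and transforms (the ones torchvision's VisionDataset uses), sets them to None inside the with block, and restores them afterwards. FakeDataset and without_transforms_sketch are hypothetical stand-ins, and what the real function yields may differ.

```python
from contextlib import contextmanager

# Attribute names assumed from torchvision's VisionDataset.
_TRANSFORM_ATTRS = ("transform", "target_transform", "transforms")


@contextmanager
def without_transforms_sketch(dataset):
    """Temporarily set any transform attributes on the dataset to None."""
    saved = {
        name: getattr(dataset, name)
        for name in _TRANSFORM_ATTRS
        if hasattr(dataset, name)
    }
    for name in saved:
        setattr(dataset, name, None)
    try:
        yield dataset
    finally:
        # Restore the originals even if the body of the with block raised.
        for name, value in saved.items():
            setattr(dataset, name, value)


class FakeDataset:
    """Hypothetical dataset that applies self.transform when one is set."""

    def __init__(self):
        self.data = [1, 2, 3]
        self.transform = lambda x: x * 10

    def __getitem__(self, index):
        item = self.data[index]
        return self.transform(item) if self.transform is not None else item


dataset = FakeDataset()
with without_transforms_sketch(dataset):
    raw = dataset[1]
transformed = dataset[1]
print(raw, transformed)  # 2 20
```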

tlc.client.utils.relativize_with_max_depth(url: tlc.core.url.Url, owner: tlc.core.url.Url, max_depth: int) → tlc.core.url.Url

Relativize the given URL with respect to the given owner URL, up to a maximum depth.

Deprecated: Use Url.to_relative_with_max_depth instead.

tlc.client.utils.standardized_transforms(transforms: Callable[..., Any]) → Callable[[Any], Any]

Create a new transforms function which takes the whole sample as its only argument, rather than destructuring it.

Parameters:

transforms – The transforms function to standardize.

Returns:

The standardized transforms function.
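
Consider a transforms function written as transforms(image, label), which expects the sample to be unpacked before the call. Standardizing wraps it so that callers can pass the sample whole. The sketch below assumes that tuple samples are unpacked into positional arguments and any other sample is passed through unchanged; how the library decides when to destructure is not stated on this page, and standardized_transforms_sketch and scale_and_tag are hypothetical names.

```python
from typing import Any, Callable


def standardized_transforms_sketch(
    transforms: Callable[..., Any],
) -> Callable[[Any], Any]:
    """Wrap transforms so it accepts the whole sample as one argument."""

    def wrapper(sample: Any) -> Any:
        # Assumption: tuple samples are destructured, everything else is not.
        if isinstance(sample, tuple):
            return transforms(*sample)
        return transforms(sample)

    return wrapper


def scale_and_tag(image, label):
    """Example transforms function that expects a destructured sample."""
    return [pixel / 255 for pixel in image], f"class_{label}"


standardized = standardized_transforms_sketch(scale_and_tag)
result = standardized(([0, 255], 3))
print(result)  # ([0.0, 1.0], 'class_3')
```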

tlc.client.utils.get_column_from_pyarrow_table(table: pyarrow.Table, name: str, combine_chunks: bool = True) → pyarrow.Array | pyarrow.ChunkedArray

Return the specified column of the table as a pyarrow array (or chunked array).

To get nested sub-columns, use dot notation. E.g. ‘column.sub_column’. The values in the column will be the row-view of the table. A column which is a PIL image in its sample-view, for instance, will be returned as a column of strings.

Parameters:
  • table – The table to get the column from.

  • name – The name of the column to get.

  • combine_chunks – Whether to combine the chunks of the returned column in the case that it is a ChunkedArray. Defaults to True.

Returns:

A pyarrow array containing the specified column.

Raises:

KeyError – If the column does not exist in the table.