tlc.integration.hugging_face.table_from_hugging_face_dataset¶

A Table object for representing an in-memory Hugging Face dataset.

Module Contents¶

Classes¶

Class

Description

TableFromHuggingFaceDataset

A Table object for representing an in-memory Hugging Face dataset.

API¶

class TableFromHuggingFaceDataset(
hf_dataset: Dataset | None = None,
url: Url | None = None,
created: str | None = None,
description: str | None = None,
row_cache_url: Url | None = None,
row_cache_populated: bool | None = None,
override_table_rows_schema: Any = None,
init_parameters: Any = None,
input_tables: list[Url] | None = None,
)¶

Bases: tlc.integration.hugging_face.table_from_hugging_face_base._TableFromHuggingFaceBase

A Table object for representing an in-memory Hugging Face dataset.

The TableFromHuggingFaceDataset class wraps a pre-loaded datasets.Dataset instance. This is useful when the dataset has been constructed programmatically, filtered, or loaded locally.

Example:

import datasets
import tlc

hf_dataset = datasets.load_dataset("imdb", split="train")
table = tlc.Table.from_hugging_face_dataset(hf_dataset, table_name="imdb-train")
Parameters:

hf_dataset – An in-memory datasets.Dataset instance.

Returns:

An instance of the TableFromHuggingFaceDataset class.

Parameters:
  • url – The URL of the table.

  • created – The creation time of the table.

  • description – The description of the table.

  • row_cache_url – The URL of the row cache.

  • row_cache_populated – Whether the row cache is populated.

  • override_table_rows_schema – The schema to override the table rows schema.

  • init_parameters – The initial parameters of the table.

  • input_tables – A list of Table URLs that are considered direct predecessors in this table’s lineage. This parameter serves as an explicit mechanism for tracking table relationships beyond the automatic lineage tracing typically managed by subclasses.

property hf_dataset: Dataset¶