tlc.integration.hugging_face.table_from_hugging_face#

A Table object for representing a Hugging Face dataset.

Module Contents#

Classes#

Class

Description

TableFromHuggingFace

A Table object for representing a Hugging Face dataset.

API#

class tlc.integration.hugging_face.table_from_hugging_face.TableFromHuggingFace(hugging_face_path: str | None = None, hugging_face_name: str | None = None, hugging_face_split: str | None = None, url: tlc.core.url.Url | None = None, created: str | None = None, description: str | None = None, row_cache_url: tlc.core.url.Url | None = None, row_cache_populated: bool | None = None, override_table_rows_schema: Any = None, init_parameters: Any = None, input_tables: list[tlc.core.url.Url] | None = None)#

Bases: tlc.core.objects.tables.in_memory_rows_table._InMemoryRowsTable

A Table object for representing a Hugging Face dataset.

The TableFromHuggingFace class is an interface between 3LC and the Hugging Face datasets library. For datasets with multiple subsets, use hugging_face_name to specify the subset. Use hugging_face_split to specify the desired split.

Example:

table = TableFromHuggingFace(
    hugging_face_path="glue",
    hugging_face_name="mrpc",
    hugging_face_split="train",
)
print(table.table_rows[0])
Parameters:
  • hugging_face_path – The path to the Hugging Face dataset.

  • hugging_face_name – Name or configuration of the subset. Optional.

  • hugging_face_split – The split to use. Optional, defaults to train.

Returns:

An instance of the TableFromHuggingFace class.

Raises:

ValueError if the Hugging Face dataset is not provided.

Parameters:
  • url – The URL of the table.

  • created – The creation time of the table.

  • description – The description of the table.

  • row_cache_url – The URL of the row cache.

  • row_cache_populated – Whether the row cache is populated.

  • override_table_rows_schema – The schema to override the table rows schema.

  • init_parameters – The initial parameters of the table.

  • input_tables – A list of Table URLs that are considered direct predecessors in this table’s lineage. This parameter serves as an explicit mechanism for tracking table relationships beyond the automatic lineage tracing typically managed by subclasses.

property hf_dataset: datasets.Dataset#
property sample_type: tlc.client.sample_type.SampleType#