Row Cache

3LC supports caching the row data of any given Table. The serialization format is an implementation detail; Parquet is currently the default. The location of the cache is configurable, but the row_cache_url is typically row_cache.parquet, relative to the Table directory.

For any Table, when sample or row data is requested and a row cache exists, the cached data is read efficiently and returned. Otherwise, the data production pipeline is invoked, which could involve many Tables referencing each other.
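The lookup order described above can be illustrated with a minimal, library-independent sketch. It is not 3LC's implementation: the function names, the JSON file standing in for the Parquet cache, and the `expensive_pipeline` producer are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real cache file (3LC defaults to Parquet).
CACHE_NAME = "row_cache.json"


def load_rows(table_dir, produce_rows):
    """Return cached rows if a cache file exists, else run the producer."""
    cache_path = Path(table_dir) / CACHE_NAME
    if cache_path.exists():
        # Fast path: read the previously persisted rows.
        return json.loads(cache_path.read_text())
    # Slow path: invoke the (possibly expensive) production pipeline.
    return list(produce_rows())


def write_row_cache(table_dir, rows):
    """Persist rows so later requests can skip the production pipeline."""
    (Path(table_dir) / CACHE_NAME).write_text(json.dumps(rows))


calls = []  # records how many times the pipeline actually ran


def expensive_pipeline():
    calls.append(1)
    return [{"id": i, "label": i % 2} for i in range(3)]


with tempfile.TemporaryDirectory() as tmp:
    first = load_rows(tmp, expensive_pipeline)   # no cache yet: pipeline runs
    write_row_cache(tmp, first)                  # persist the rows
    second = load_rows(tmp, expensive_pipeline)  # cache hit: pipeline skipped
```

In this sketch the pipeline runs only for the first request; once the cache file has been written, later requests are served from disk.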

When to cache data

When data production is expensive, it can be a good idea to cache your data so that it loads faster and doesn’t have to be produced again each time a new process requests it. Consider caching row data when:

  • Input format is expensive to parse (e.g. Table.from_yolo())

  • Lineage has grown long, so producing rows means traversing many Tables that reference each other

  • Input data is transient, such as an in-memory Python list. The row cache can then serve as a means to persist the data.

How to create a row cache for a Table

Currently, all importer Table.from_* methods create a row cache by default. For Tables without a row cache, one may be created using:

table.write_to_row_cache(create_url_if_empty=True)
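As a hedged sketch, the call above could be wrapped in a small helper that only writes a cache when none is present. This assumes the table object exposes a `row_cache_url` attribute that is empty when no cache exists; `ensure_row_cache` and `FakeTable` are hypothetical names used only for illustration, and `FakeTable` is a stand-in, not the 3LC Table class.

```python
def ensure_row_cache(table):
    """Write a row cache unless the table already reports one.

    Assumption: `table.row_cache_url` is empty when no cache exists.
    Returns True if a cache was written, False otherwise.
    """
    if table.row_cache_url:
        return False
    table.write_to_row_cache(create_url_if_empty=True)
    return True


class FakeTable:
    """Hypothetical stand-in that mimics only the two members used above."""

    def __init__(self):
        self.row_cache_url = ""

    def write_to_row_cache(self, create_url_if_empty=False):
        # Mimic the default location mentioned in the text.
        if not self.row_cache_url and create_url_if_empty:
            self.row_cache_url = "row_cache.parquet"


demo_table = FakeTable()
wrote_cache = ensure_row_cache(demo_table)
```

Whether an empty URL is the right signal for a missing cache should be checked against the 3LC API reference before relying on this pattern.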

In the Dashboard

In the Dashboard, you can view the row cache url column in the Tables pane for a given Project to see if a Table has a persisted row cache. Notice that the initial Tables have row caches, while the sparse revision EditedTables do not.