Write a Table Directly from Row Data#
In this sample, we will write a tlc.Table directly by adding rows from a JSON file one by one.
We will use a tlc.TableWriter instance to build up the table, and then write it to a file.
The written table will be of type tlc.TableFromParquet, and the table data will be backed by a Parquet file.
Project Setup#
[2]:
PROJECT_NAME = "Table Writer Examples"
DATASET_NAME = "Mammoth"
TABLE_NAME = "mammoth-10k"
TLC_PUBLIC_EXAMPLES_DEVELOPER_MODE = True
INSTALL_DEPENDENCIES = False
[4]:
%%capture
if INSTALL_DEPENDENCIES:
    %pip --quiet install torch --index-url https://download.pytorch.org/whl/cu118
    %pip --quiet install torchvision --index-url https://download.pytorch.org/whl/cu118
    %pip --quiet install tlc
Download Source Data#
We will use the 3D mammoth data as an example. This is a toy dataset commonly used in the dimensionality reduction literature.
The original data can be found in the PaCMAP GitHub repository.
[7]:
import requests
response = requests.get("https://raw.githubusercontent.com/YingfanWang/PaCMAP/master/data/mammoth_3d.json")
input_data = response.json()
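Before writing rows, it can help to confirm that the downloaded JSON really is a list of 3-element numeric rows. The following is a minimal sketch, not part of the original sample: `validate_points` is a hypothetical helper, and the small `sample` list stands in for `input_data` so the snippet runs without network access. In the download cell itself, calling `response.raise_for_status()` before `response.json()` would surface HTTP errors early.

```python
from numbers import Real


def validate_points(data, dim=3):
    """Return the row count if `data` is a non-empty list of `dim`-length numeric rows."""
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty list of points")
    for index, point in enumerate(data):
        if not isinstance(point, (list, tuple)) or len(point) != dim:
            raise ValueError(f"row {index} is not a sequence of {dim} values")
        # bool is a subclass of int, so exclude it explicitly
        if not all(isinstance(v, Real) and not isinstance(v, bool) for v in point):
            raise ValueError(f"row {index} contains a non-numeric value")
    return len(data)


# Hypothetical stand-in for the downloaded `input_data`
sample = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(validate_points(sample))
```

Running the same check on `input_data` should report the 10,000 rows shown in the next cell's output.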
[8]:
print(type(input_data))
print(len(input_data))
print(len(input_data[0]))
<class 'list'>
10000
3
Writing the Table#
We construct a TableWriter, which will determine the URL and the schema of the table we want to write. In our case, the table will contain a single column of 3-vectors.
[9]:
import tlc
column_name = "points"
table_writer = tlc.TableWriter(
    dataset_name=DATASET_NAME,
    project_name=PROJECT_NAME,
    table_name=TABLE_NAME,
    description="A table containing 10,000 3D points of a mammoth",
    if_exists="overwrite",
    column_schemas={column_name: tlc.FloatVector3Schema("3D Points")},
)
[10]:
# Next, we add the data to the table row by row.
for point in input_data:
    table_writer.add_row({column_name: point})
# Finally, we finalize the table writer to ensure that all data is written to disk.
table = table_writer.finalize()
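Adding rows one at a time is the simplest approach. For larger inputs, grouping rows into column-oriented batches may reduce per-call overhead. The sketch below only shows the grouping step in plain Python; `rows_to_batch` and `batch_size` are hypothetical names, and the commented-out `add_batch` call is an assumption about the writer's API that should be checked against the tlc documentation before use.

```python
def rows_to_batch(points, column_name="points", batch_size=1000):
    """Yield dicts mapping the column name to a list of up to `batch_size` rows."""
    for start in range(0, len(points), batch_size):
        yield {column_name: points[start:start + batch_size]}


# Hypothetical stand-in for `input_data`: 2,500 identical points
batches = list(rows_to_batch([[0.0, 0.0, 0.0]] * 2500))
print([len(batch["points"]) for batch in batches])  # [1000, 1000, 500]

# Assumed usage with a writer that accepts column-oriented batches:
# for batch in rows_to_batch(input_data, column_name):
#     table_writer.add_batch(batch)
```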
[11]:
# Inspect the first row of the table:
table[0]
[11]:
{'points': [430.82598876953125, 106.86399841308594, 24.492000579833984]}
Next Steps#
The written table can now be viewed in the 3LC Dashboard.
Some ideas for further exploration:
Visualize the data in a scatter plot
Apply dimensionality reduction to the data
Segment the data into clusters
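As one possible starting point for the clustering idea, here is a minimal pure-Python sketch of Lloyd's k-means algorithm. It is not part of the original sample and is not tuned for 10,000 points (a library implementation would be faster); the `kmeans` function, its parameters, and the `toy` data are all illustrative assumptions.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, labels


# Two well-separated groups of 3D points as a toy stand-in for `input_data`
toy = [[0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [10.0, 10.0, 10.0], [10.1, 9.9, 10.3]]
centroids, labels = kmeans(toy, k=2)
print(labels)
```

The resulting labels could then be written as an additional column, given a suitable integer schema for it.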