
Compare dimensionality reduction methods

This notebook demonstrates how to perform dimensionality reduction on a column in a tlc.Table using two different algorithms, UMAP and PaCMAP.


The Table used in this notebook contains a single column of 3D points, which we reduce to 2D points. Reducing from 3 to 2 dimensions is not a typical use case, but it makes the effects of different dimensionality reduction algorithms easy to visualize and compare.

To run this notebook, you must also have run:

Install dependencies

[ ]:
%pip install "3lc[umap,pacmap]"
[ ]:
import tlc

# Load the table from the previous example. It contains a single column containing the 3D points.
table = tlc.Table.from_names(table_name="mammoth-10k", dataset_name="Mammoth", project_name="3LC Tutorials - Mammoth")

table.columns
[ ]:
umap_params_1 = {
    "n_components": 2,  # Project the data to 2 dimensions
    "n_neighbors": 15,  # Local connectivity, fewer neighbors create more local clusters
    "min_dist": 0.1,  # Minimum distance between points in the embedding space, preserves more local structure
    "metric": "euclidean",  # Use Euclidean distance to measure similarity
    "retain_source_embedding_column": True,
    "source_embedding_column": "points",
}

reduced_umap_1 = tlc.reduce_embeddings(table, method="umap", **umap_params_1)

umap_params_2 = {
    "n_components": 2,  # Project the data to 2 dimensions
    "n_neighbors": 50,  # Local connectivity, more neighbors create more global structure
    "min_dist": 0.5,  # Minimum distance between points in the embedding space, allows more spread out embedding
    "metric": "manhattan",  # Use Manhattan distance to measure similarity
    "retain_source_embedding_column": True,
    "source_embedding_column": "points",
}

reduced_umap_2 = tlc.reduce_embeddings(table, method="umap", **umap_params_2)
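The two UMAP runs above differ, among other things, in their "metric" setting. As a rough illustration of what that choice changes, the sketch below computes both distances for a pair of 3D points using only the Python standard library. The helper names `euclidean` and `manhattan` are hypothetical and are not part of 3LC or umap.

```python
import math


def euclidean(p, q):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Two example 3D points (made up for illustration)
a = (0.0, 0.0, 0.0)
b = (1.0, 2.0, 2.0)

print(euclidean(a, b))  # 3.0
print(manhattan(a, b))  # 5.0
```

The same pair of points is farther apart under the Manhattan metric, so the neighbor graph that a reduction builds from the data can differ between the two settings.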
[ ]:
pacmap_param_1 = {
    "n_components": 2,  # Project the data to 2 dimensions
    "n_neighbors": 10,  # Number of neighbors to consider, fewer neighbors emphasize local structure
    "MN_ratio": 0.5,  # Ratio of mid-near pairs, balancing between local and global structure
    "FP_ratio": 2.0,  # Ratio of far pairs, emphasizing the global structure more
    "retain_source_embedding_column": True,
    "source_embedding_column": "points",
}

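# Note: the input here is reduced_umap_2, not the original table; "points" is still the source column.
reduced_pacmap_1 = tlc.reduce_embeddings(reduced_umap_2, method="pacmap", **pacmap_param_1)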

pacmap_param_2 = {
    "n_components": 2,  # Project the data to 2 dimensions
    "n_neighbors": 30,  # Number of neighbors to consider, more neighbors emphasize global structure
    "MN_ratio": 1.0,  # Ratio of mid-near pairs, equal balance between local and global structure
    "FP_ratio": 1.0,  # Ratio of far pairs, standard emphasis on global structure
    "retain_source_embedding_column": True,
    "source_embedding_column": "points",
}

# Note: the input here is reduced_pacmap_1, the output of the previous PaCMAP run.
reduced_pacmap_2 = tlc.reduce_embeddings(reduced_pacmap_1, method="pacmap", **pacmap_param_2)
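Beyond inspecting the reductions visually, one possible way to put a number on the comparison is to check how many of each point's nearest neighbors in 3D are still its nearest neighbors in 2D. The sketch below is a minimal brute-force version on toy data. `knn_indices` and `knn_overlap` are hypothetical helpers, not part of 3LC, and extracting the point arrays from the reduced tables is not shown here.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(set(order[:k]))
    return result


def knn_overlap(high_dim, low_dim, k=2):
    """Mean fraction of each point's k nearest neighbors shared between the two spaces."""
    high = knn_indices(high_dim, k)
    low = knn_indices(low_dim, k)
    shared = sum(len(h & l) for h, l in zip(high, low))
    return shared / (k * len(high_dim))


# Toy data: six points on a line in 3D.
points_3d = [(float(i), float(i), float(i)) for i in range(6)]
# A 2D layout that keeps the original ordering along the line.
good_2d = [(float(i), 0.0) for i in range(6)]
# A 2D layout that scrambles the ordering.
bad_2d = [(float(x), 0.0) for x in (0, 3, 1, 4, 2, 5)]

print(knn_overlap(points_3d, good_2d))  # 1.0, every neighborhood is preserved
print(knn_overlap(points_3d, bad_2d))   # lower, neighborhoods are broken up
```

A score of 1.0 means every local neighborhood survived the reduction. The brute-force search is quadratic in the number of points, so for a table with thousands of rows a spatial index or a random sample of points would be a more practical choice.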