How to create a virtual column

Virtual columns are metrics derived from existing columns in the Dashboard. They are called “virtual” columns because they do not persist after closing the Dashboard. Along with collected metrics, virtual columns help you gain insights into your data and models.

Virtual column operations can be viewed as built-in Dashboard algorithms for data manipulation. The Dashboard provides a wide range of them; for the full list, see the Operations page.

Creating a virtual column

To derive a virtual column, select one or more columns, right-click one of the selected columns, hover over Derive virtual column, and select an operation.

A virtual column derived from a single input column is placed next to that column. A virtual column derived from multiple input columns is placed at the far right of the table. In the screenshot below, the virtual column Lossrank is derived from the single column Loss, and the virtual column RMSE/MAE is derived by applying the divide operation to the two columns RMSE and MAE.

Properties of virtual columns

Virtual columns have the following properties:

  • Virtual columns can be derived from one or more input columns.

  • Virtual columns can themselves be used as input columns to derive further virtual columns.

  • The available operations depend on the type of data in the input columns. For instance, the False Positives (FP) operation is only available when the BBs and BBspredicted columns are selected.

  • Some operations, e.g., divide and group, are order-dependent, i.e., they depend on the selection order of the input columns.

  • Global operations, e.g., rank and occurrence, calculate their results in the context of all rows, whereas local operations, e.g., abs and sum, calculate their results row by row.

  • Virtual columns are dynamic, that is, the values of virtual columns will be automatically re-calculated if the values of the input columns change.

  • Virtual columns are always calculated on the basis of the full table and are not affected by filters. If you want to create a virtual column on a portion of the data, you can create a subset table and derive the virtual column there.

  • Some operations can be chained so that no intermediate virtual column needs to be created. Chained operations are indicated by an arrow next to the operation’s name. For instance, the group operation can often be chained with subsequent operations, as shown in the figure below. The cumulative loss over all epochs for each sample can be calculated by selecting sum under group. The derived virtual column is named Lossby(Example_id, Foreign table)_sum.
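The properties above can be illustrated outside the Dashboard with a small, plain-Python sketch. It is only an analogy, not the Dashboard's implementation: the table contents, the Error column, the variable names, and the ranking convention (rank 0 for the smallest value) are all assumptions made for illustration, as is the choice to give every row of a sample the grouped sum.

```python
from collections import defaultdict

# Hypothetical metrics table: one row per (sample, epoch).
rows = [
    {"Example_id": 0, "Epoch": 0, "Loss": 0.9, "Error": -0.3},
    {"Example_id": 1, "Epoch": 0, "Loss": 0.4, "Error": 0.2},
    {"Example_id": 0, "Epoch": 1, "Loss": 0.5, "Error": -0.1},
    {"Example_id": 1, "Epoch": 1, "Loss": 0.1, "Error": 0.05},
]

# Local operation (abs): each value depends only on its own row.
abs_error = [abs(r["Error"]) for r in rows]

# Global operation (rank): each value depends on every row in the table.
# Assumption: rank 0 goes to the smallest loss; the Dashboard may differ.
order = sorted(range(len(rows)), key=lambda i: rows[i]["Loss"])
loss_rank = [0] * len(rows)
for rank, i in enumerate(order):
    loss_rank[i] = rank

# Order-dependent operation (divide): swapping the inputs changes the result.
loss_over_error = [r["Loss"] / r["Error"] for r in rows]
error_over_loss = [r["Error"] / r["Loss"] for r in rows]

# Chained group -> sum: total loss over all epochs for each sample,
# assigned here to every row that belongs to that sample.
totals = defaultdict(float)
for r in rows:
    totals[r["Example_id"]] += r["Loss"]
loss_by_example_sum = [totals[r["Example_id"]] for r in rows]
```

Changing any Loss value and re-running the sketch changes the rank and the grouped sum, which mirrors the dynamic recalculation described above.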

Use cases

Below are a few examples that illustrate how virtual columns can support your data analysis.

Distribution of aspect ratios

If the images in your dataset have a wide range of aspect ratios, you can analyze the distribution of aspect ratios with the following steps.

Steps:

  1. Derive an aspect ratio column Height/Width: Select the Height and Width columns, then select the Divide operation.

  2. Derive a ranking column Height/Widthrank: Select the Height/Width column, then select the Rank operation.

  3. Create a 2D chart of Height/Width vs Height/Widthrank: Select the two columns and press 2.
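The same computation can be sketched in plain Python to show what the chart in step 3 plots. This is an illustrative analogy only: the image sizes are made up, the variable names are hypothetical, and the sketch assumes that the first selected column (Height) is the numerator and that rank 0 is the smallest ratio.

```python
# Hypothetical image sizes in pixels.
images = [
    {"Height": 480, "Width": 640},
    {"Height": 1080, "Width": 1920},
    {"Height": 512, "Width": 512},
    {"Height": 1600, "Width": 900},
]

# Step 1 (Divide): Height / Width for every image.
# Assumption: the first selected column is the numerator.
ratios = [img["Height"] / img["Width"] for img in images]

# Step 2 (Rank): position of each ratio among all ratios.
# Assumption: rank 0 is the smallest ratio.
order = sorted(range(len(ratios)), key=lambda i: ratios[i])
ranks = [0] * len(ratios)
for rank, i in enumerate(order):
    ranks[i] = rank

# Step 3: the (rank, ratio) pairs are the points of the 2D chart.
# Sorted by rank, they trace a monotonically increasing curve whose
# shape shows how the aspect ratios are distributed.
points = sorted(zip(ranks, ratios))
```

A long flat stretch in such a curve means many images share nearly the same aspect ratio, while steep ends point to a few extreme outliers.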

Forgetting events

A forgetting event (FE) is defined as a correct prediction in one epoch followed by an incorrect prediction in the next epoch. FE-prone samples are often considered hard to learn. We can find them with the following steps. Note: metrics from at least five epochs need to be collected to carry out this workflow.

Steps:

  1. Derive a column holding the accuracy difference between each epoch and the previous one: Select Accuracy, then select the From previous operation. This virtual column is named Accuracy_from_prev.

  2. Derive a chained Group|Min column: Select Accuracy_from_prev, Example_id, and Foreign table in this order, then select the chained Group|Min operation. The derived column is named (Accuracy_from_prevby-(Example_id, Foreign table))_min.

  3. Filter (Accuracy_from_prevby-(Example_id, Foreign table))_min to -1. The resulting samples are those with forgetting events.
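The logic behind these steps can be sketched in plain Python. This is an analogy rather than the Dashboard's implementation: the accuracy values and variable names are hypothetical, and the sketch assumes that From previous compares each sample with its own value in the preceding epoch.

```python
# Hypothetical per-epoch accuracy (1 = correct, 0 = incorrect) per sample.
accuracy = {
    "a": [0, 1, 1, 1, 1],  # learned once, never forgotten
    "b": [1, 0, 1, 0, 1],  # forgotten twice
    "c": [0, 0, 0, 1, 1],  # learned late, never forgotten
}

# Step 1 (From previous): difference between each epoch and the one before.
# Assumption: the difference is taken within each sample's own history.
from_prev = {
    sample: [curr - prev for prev, curr in zip(accs, accs[1:])]
    for sample, accs in accuracy.items()
}

# Step 2 (Group|Min): smallest difference per sample.
min_from_prev = {sample: min(diffs) for sample, diffs in from_prev.items()}

# Step 3 (filter to -1): a difference of -1 means the prediction went from
# correct (1) to incorrect (0), i.e. at least one forgetting event occurred.
forgetting_samples = [s for s, m in min_from_prev.items() if m == -1]
```

Taking the minimum works because the difference of two 0/1 values can only be -1, 0, or 1, so a minimum of -1 is reached exactly when a correct prediction is followed by an incorrect one.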

Embeddings travel distances

We can use this workflow to explore the samples that travel long distances in the embedding space. The embeddings travel distance of a sample is the sum of the Euclidean distances between its embeddings in consecutive epochs. A long travel distance can indicate a hard sample. This workflow requires embeddings collected over multiple epochs. Note that this workflow is also available in how to use workflows.

Steps:

  1. Derive a chained From previous|Length column on the embeddings: Select the Embeddings column, then select the chained From previous|Length operation. This virtual column is named Embeddings_from_prev_length.

  2. Derive a chained Group|Sum column on the Length virtual column: Select Embeddings_from_prev_length, Example_id, and Foreign table in this order, then select the chained Group|Sum operation. The derived column is named (Embeddings_from_prev_lengthby-(Example_id, Foreign table))_sum.

  3. Filter (Embeddings_from_prev_lengthby-(Example_id, Foreign table))_sum to the mid-to-high value range. The resulting samples are those with long embeddings travel distances.
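As a rough illustration of what these steps compute, the following plain-Python sketch sums the Euclidean step lengths per sample. The 2D embeddings, the variable names, and the threshold are hypothetical, and the sketch assumes that From previous compares each sample with its own embedding in the preceding epoch.

```python
import math

# Hypothetical 2D embeddings per sample, one vector per epoch.
embeddings = {
    "a": [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)],
    "b": [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0)],
    "c": [(0.0, 0.0), (6.0, 8.0), (0.0, 0.0)],
}

# Step 1 (From previous|Length): Euclidean length of each epoch-to-epoch move.
step_lengths = {
    sample: [math.dist(prev, curr) for prev, curr in zip(vecs, vecs[1:])]
    for sample, vecs in embeddings.items()
}

# Step 2 (Group|Sum): total travel distance per sample.
travel = {sample: sum(lengths) for sample, lengths in step_lengths.items()}

# Step 3 (filter): keep samples above a cut-off.
# The threshold is an arbitrary value chosen for this toy data.
threshold = 6.0
long_travellers = [s for s, d in travel.items() if d >= threshold]

# Samples ordered from longest to shortest travel distance.
ranked = sorted(travel, key=travel.get, reverse=True)
```

Sample "c" ends where it started yet has the longest travel distance, which shows that the metric measures the path taken across epochs rather than the net displacement.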