Virtual Columns

Virtual columns are metrics derived from existing columns in the Dashboard by applying an operation to one or more columns. Filtering and sorting on virtual columns can give you deeper insight into your model and training set, and can help you find issues with your data, your model, or your overall approach to the problem. The operations available depend on the type of data in the selected columns. Operations fall into two main categories: global operations and local operations. For a full list of operations, see Operations.

Creating a virtual column

To derive a virtual column, select one or more columns, right-click one of the selected columns, hover over Derive virtual column, and select an operation.

The virtual column is placed next to the input column if there is only one input column, or at the far right of the table if there are multiple input columns. In the screenshot below, the virtual column Lossrank is derived from the single column Loss, and the virtual column RMSE/MAE is derived by applying the Divide operation to the two columns RMSE and MAE.

Properties of virtual columns

Virtual columns have the following properties:

  • Virtual columns can be derived from one or more input columns.

  • Virtual columns can themselves be used as input columns for further virtual columns.

  • The available operations depend on the type of data in the input columns. For instance, the False Positives (FP) operation is only available when the BBs and BBspredicted columns are selected.

  • Some operations, e.g., divide and group, are order-dependent, i.e., they depend on the selection order of the input columns.

  • Global operations, e.g., rank and occurrence, compute each result from the context of all rows, whereas local operations, e.g., abs and sum, compute each result from a single row.

  • Virtual columns are dynamic, that is, the values of virtual columns will be automatically re-calculated if the values of the input columns change.

  • Virtual columns are always calculated on the basis of the full table and are not affected by filters. If you want to create a virtual column on a portion of the data, you can create a subset table and derive the virtual column there.

  • Some operations can be chained together so that an intermediate virtual column does not need to be created. Chained operations are indicated by an arrow next to the operation’s name. For instance, the group operation can often be chained with subsequent operations, as shown in the figure below. A cumulative loss over all epochs for each sample can be calculated by selecting sum under group. The derived virtual column is named Lossby(Example_id, Foreign table)_sum.
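
The difference between local and global operations, and the chained group and sum example, can be sketched in plain Python. This is only an illustration of the semantics described in the list above, not the Dashboard's implementation: the sample values, the function names, the tie handling in the rank (ties keep row order, rank 0 is the smallest value), and the choice to write each group total back onto every row of the group are all assumptions.

```python
from collections import defaultdict

# Hypothetical per-epoch rows: one row per (example, epoch) pair.
example_ids = [0, 1, 0, 1]
residual = [-0.5, 0.25, -1.0, 0.0]
loss = [1.0, 0.25, 0.5, 0.125]


def local_abs(values):
    """Local operation: each result depends only on its own row."""
    return [abs(v) for v in values]


def global_rank(values):
    """Global operation: a row's rank depends on every other row.

    Assumption: rank 0 is the smallest value and ties keep row order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, row_index in enumerate(order):
        ranks[row_index] = rank
    return ranks


def group_sum(keys, values):
    """Chained group + sum: total `values` over rows sharing a key.

    Assumption: the group total is repeated on every row of the group.
    """
    totals = defaultdict(float)
    for key, value in zip(keys, values):
        totals[key] += value
    return [totals[key] for key in keys]


abs_residual = local_abs(residual)
loss_rank = global_rank(loss)
loss_by_example_sum = group_sum(example_ids, loss)

print(abs_residual)
print(loss_rank)
print(loss_by_example_sum)
```

Changing a single loss value would alter only one entry of a local result, but could shift every entry of the rank, which is why global operations need the full column.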

Use cases

Below are a few examples illustrating how virtual columns can be used in data analysis.

Distribution of aspect ratios

If the images in your dataset have a wide range of aspect ratios, you can analyze the distribution of aspect ratios with the following steps.

Steps:

  1. Derive an aspect ratio column Height/Width: Select the Height and Width columns, then select the Divide operation.

  2. Derive a ranking column Height/Widthrank: Select the Height/Width column, then select the Rank operation.

  3. Create a 2D chart of Height/Width vs Height/Widthrank: Select the two columns and press 2.
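
The steps above can be mimicked outside the Dashboard with a short Python sketch. The image sizes are made up, and the rank convention (0 for the smallest ratio, ties in row order) is an assumption rather than the documented behaviour of the Rank operation.

```python
# Hypothetical image sizes in pixels.
heights = [480, 1080, 600, 512]
widths = [640, 1920, 600, 256]

# Step 1: Divide is order-dependent: first column divided by the second.
aspect = [h / w for h, w in zip(heights, widths)]

# Step 2: Rank is global: the position of each row when sorted by value.
order = sorted(range(len(aspect)), key=lambda i: aspect[i])
rank = [0] * len(aspect)
for position, row_index in enumerate(order):
    rank[row_index] = position

# Step 3: the (rank, aspect) pairs are the points a 2D chart would plot.
points = sorted(zip(rank, aspect))

for r, a in points:
    print(f"rank {r}: height/width = {a}")
```

Plotting the ratio against its rank yields a monotonically increasing curve, so flat stretches indicate many images with the same aspect ratio and steep stretches indicate outliers.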

Forgetting events

A forgetting event (FE) is defined as a correct prediction in one epoch followed by an incorrect prediction in the next epoch. FE-prone samples are often considered hard to learn. We can find FE-prone samples with the following steps. Note: metrics from at least five epochs need to be collected to conduct this workflow.

Steps:

  1. Derive a column of the accuracy difference between an epoch and the previous one: Select Accuracy, then select the From previous operation. This virtual column is named Accuracy_from_prev.

  2. Derive a chained Group|Min column: Select Accuracy_from_prev, Example_id, and Foreign table in this order, then select the chained Group|Min operation. This derived column is named (Accuracy_from_prevby-(Example_id, Foreign table))_min.

  3. Filter (Accuracy_from_prevby-(Example_id, Foreign table))_min to -1. The resulting samples are those with forgetting events.
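
The logic behind these steps can be sketched in Python. The data is hypothetical and uses three epochs for brevity; accuracy is encoded as 1 for a correct prediction and 0 for an incorrect one. The sketch assumes that the difference is taken between consecutive epochs of the same example and that the first epoch, which has no predecessor, contributes no difference.

```python
from collections import defaultdict

# Hypothetical rows: (example_id, epoch, accuracy).
rows = [
    ("a", 0, 0), ("b", 0, 1), ("c", 0, 0),
    ("a", 1, 1), ("b", 1, 1), ("c", 1, 1),
    ("a", 2, 0), ("b", 2, 1), ("c", 2, 1),
]

# Collect each example's accuracy history in epoch order.
history = defaultdict(list)
for example_id, epoch, accuracy in sorted(rows, key=lambda r: r[1]):
    history[example_id].append(accuracy)

# Step 1: "from previous" = current accuracy minus previous accuracy.
from_prev = {
    example_id: [curr - prev for prev, curr in zip(acc, acc[1:])]
    for example_id, acc in history.items()
}

# Step 2: group by example and take the minimum difference.
min_from_prev = {ex: min(diffs) for ex, diffs in from_prev.items()}

# Step 3: a minimum of -1 means a correct prediction (1) was followed
# by an incorrect one (0) at least once.
forgetting = sorted(ex for ex, m in min_from_prev.items() if m == -1)

print(forgetting)
```

Only example "a" goes from correct to incorrect, so it is the only one whose minimum difference reaches -1.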

Embeddings travel distances

We can use this workflow to explore the samples that travel long distances in the embedding space. The embeddings travel distance of a sample is the sum of the Euclidean distances between its embeddings in consecutive epochs. A long travel distance can indicate a hard sample. This workflow requires embeddings to be collected for multiple epochs. Note that this workflow is also available in how to use workflows.

Steps:

  1. Derive a chained From previous|Length column on the embeddings: select the Embeddings column, then select the chained From previous|Length operation. This virtual column is named Embeddings_from_prev_length.

  2. Derive a chained Group|Sum column on the Length virtual column: select Embeddings_from_prev_length, Example_id, and Foreign table in this order, then select the chained Group|Sum operation. The derived column is named (Embeddings_from_prev_lengthby-(Example_id, Foreign table))_sum.

  3. Filter (Embeddings_from_prev_lengthby-(Example_id, Foreign table))_sum to the mid-to-high value range. The resulting samples are those with long embeddings travel distances.
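
As a rough illustration of what these steps compute, the travel distance can be reproduced in a few lines of Python. The 2D embeddings and the threshold of 4.0 are hypothetical; real embeddings would have more dimensions, and a suitable filter range depends on the data.

```python
import math

# Hypothetical 2D embeddings per example, one entry per epoch.
embeddings = {
    "a": [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)],
    "b": [(1.0, 1.0), (1.0, 2.0), (1.0, 4.0)],
}


def travel_distance(points):
    """Sum of Euclidean distances between consecutive epochs.

    Combines steps 1 and 2: "from previous" + "length" gives the
    per-epoch step, and the group + sum adds the steps per example.
    """
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


distances = {ex: travel_distance(pts) for ex, pts in embeddings.items()}

# Step 3: keep the examples in the upper part of the value range.
threshold = 4.0  # hypothetical cut-off
long_travellers = sorted(ex for ex, d in distances.items() if d >= threshold)

print(distances)
print(long_travellers)
```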

Operations

The following sections describe all the available operations for creating virtual columns.

Global operations

Global operations need the full context of every row of their input columns to produce a result for any individual row. The Occurrence operation, which counts the number of times each value occurs in one or more columns, is an example of a global operation. A complete list of the available global operations, along with their input requirements, can be seen below.

Operation

Input Type

Output type

Description

Traversal index

Any

Number

Traversal index maximizing the walk within the input column coordinate space

Bin

Number

Number

The input value, discretized to a set number of bins

Cluster by threshold

Any

Number

Groups data points by specified distance threshold.

Rank

Any

Number

The rank of each row, as sorted by the input value(s)

Group

Any

Same as input

Group index shared by all rows with the same value(s) in the input column(s)

Nearest neighbor

Numbers

Number

Distance to closest neighbor within the input column coordinate space

Nearest neighbors

Numbers

Numbers

Links to closest neighbors within the input column coordinate space

Group-neighbor links

Any

Numbers

Links to Table rows within same group

Zip

Number

Number

Joins (‘zips’) multiple columns into a single, component column.

Link distance

Numbers

Numbers

The ‘jump’ distance of each link.

Category

Any

Number

The category of each input value (i.e. their index within a combined value map).

Derivative

Numbers

Numbers

For a numeric input column evolving over time, this operation returns the difference between the next value and the previous one, divided by two (and handling first and last values appropriately).

Deviation

Numbers

Numbers

For a numeric input column evolving over time, this operation returns the difference between a value and the average between the next and previous one.

From previous

Numbers

Numbers

For a numeric input column evolving over time, this operation returns the difference between this value and the previous one.

To next

Numbers

Numbers

For a numeric input column evolving over time, this operation returns the difference between the next value and the current one.

Aggregate

Numbers

Numbers

For a numeric input column evolving over time, this operation reports the aggregated sum at each epoch/iteration.

Occurrence

Any

Number

Number of rows with the same value(s) in the input column(s)

Primary element

Any

Boolean

Whether an element is first in a group or not

Normalize

Number

Number

Normalizes the column so that its sum is 1.

Semantic matches (before)

String

Number

How many candidates were originally matched between the wide- and the narrow-search scenarios.

Semantic matches (after)

String

Number

How many candidates are finally matched between the wide- and the narrow-search scenarios.

Chunk-to-question links

String

Number

Links to Questions referring to this Article chunk.

Question-to-chunk links

String

Number

Links to article chunks referenced by this question.

Chunk relation

String

Number

Relation between this Table row and the current selection

Embedding distances (before)

String

Number

The original distances from the question to the top-K candidates (in ND-embedding space).

Embedding distances (after)

String

Number

The final distances from the question to the top-K candidates (in ND-embedding space).

Semantic similarities (before)

String

Number

The original semantic similarities between the question and the top-K candidates from embedding space.

Semantic similarities (after)

String

Number

The final semantic similarities between the question and the top-K candidates from embedding space.

In foreign table

Number

Boolean

Whether the row is present in a foreign table

Run constants

Number

Any

The entire ‘constants’ structure of the referenced Run

Foreign table row

Number

Any

The entire row of a foreign table, as referenced by a foreign key in the input table row

Foreign table row edited

Number

Boolean

Whether the foreign table row has any pending edits

Index

None

Number

Row index within table

Filtered index

None

Number

Row index within filtered table

Random

None

Number

Random value [0..1]
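
The time-series operations in the table above (From previous, To next, Derivative, Deviation) and the Occurrence operation can be sketched in Python as follows. The formulas for interior rows follow the descriptions in the table; how the first and last rows are handled is not specified there, so the boundary behaviour below (zeros, or one-sided differences for the derivative) is an assumption, as are the function names.

```python
from collections import Counter


def from_previous(values):
    """This value minus the previous one. Assumption: first row is 0.0."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return [0.0] + diffs if values else []


def to_next(values):
    """Next value minus this one. Assumption: last row is 0.0."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return diffs + [0.0] if values else []


def derivative(values):
    """(next - previous) / 2 for interior rows.

    Assumption: one-sided differences at the first and last rows.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    result = [values[1] - values[0]]
    result += [(values[i + 1] - values[i - 1]) / 2 for i in range(1, n - 1)]
    result.append(values[-1] - values[-2])
    return result


def deviation(values):
    """Value minus the average of its neighbours.

    Assumption: the first and last rows are 0.0.
    """
    result = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        result[i] = values[i] - (values[i - 1] + values[i + 1]) / 2
    return result


def occurrence(values):
    """Number of rows sharing each row's value."""
    counts = Counter(values)
    return [counts[v] for v in values]


loss = [4.0, 2.0, 3.0, 1.0]
print(from_previous(loss))
print(to_next(loss))
print(derivative(loss))
print(deviation(loss))
print(occurrence(["cat", "dog", "cat"]))
```

Every function needs neighbouring rows or the whole column to produce one output value, which is what makes these operations global.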

Local operations

Local operations only need the context of a single row to produce a result. We can further subdivide local operations into three categories: Unary operations, Order-independent operations, and Order-dependent operations.

Unary operations

Unary operations take a single column as input, and produce a result for each row in that column. The Character count operation, which returns the length of a string, is an example of a unary operation. A complete list of the available unary operations, along with their input requirements, can be seen below.

Operation

Input Type

Output type

Description

Parse string

String

Number

The input string parsed as a number. In the case of multiple numeric values being present, the last one is reported.

PickInner[…]

Any

Any

A single element picked from dimension 0 of an array.

Pick[…]

Any

Any

A single element picked from the outermost dimension of an array.

PickProperty[…]

Any

Any

A child property picked from a composite property.

PickUnrolledProperty[…]

Any

Any

A child property picked by unrolling a composite property.

Count

String

Number

The number of elements within the input array.

Element index

Numbers

Numbers

The index of each array element.

Reverse element index

Numbers

Numbers

The reverse index of each array element (i.e. 0 refers to the last element in the array).

Pick random

Any

Any

A random element picked from the input array (if any)

Normalize

Numbers

Numbers

The input vector in normalized form

Abs

Number

Number

The absolute value of the numeric input value

Character count

String

Number

The number of characters in the input string (including whitespace).

Log

Number

Number

The log value of the numeric input value

Non-zero

Numbers

Boolean

Whether the numeric input value is non-zero

Not

Any

Boolean

The boolean ‘not’ of the input value(s)

Sign

Number

Number

The sign (i.e. -1, 0, or 1) of the input value(s).

Inverse

Number

Number

The inverse (i.e. 1/x) of the input value(s).

Raw

Any

Same as input

The raw value of the input property, i.e. with value maps and/or string roles removed from the schema (recursive when required)

Word count

String

Number

The number of words in the input string

Filename

String

String

The file name from the input URL

Zero

Number

Boolean

Whether the numeric input value is zero

Positive

Number

Boolean

Whether the numeric input value is above zero

Negative

Number

Boolean

Whether the numeric input value is below zero

In range

Number

Boolean

Whether the numeric input values are within a user specified range

In set

Number

Boolean

Whether the categorical input values are part of the user specified set

Value selected

Any

Boolean

Whether the value is shared with one or more selected Table rows

Url

Number

Number

The URL defined by each input value

Angle

Number

Number

The angle of the input 2D vector

* A

Numbers

Numbers

An input value multiplied by a constant number.

+ A

Numbers

Numbers

An input value with an added constant value.

^ A

Numbers

Numbers

An input value raised to a constant power.

Exp

Numbers

Numbers

A constant value raised to a power defined by the input value.

==

Numbers

Boolean

Whether an input value equals a constant number.

>

Numbers

Boolean

Whether an input value is greater than a constant number.

>=

Numbers

Boolean

Whether an input value is greater than or equal to a constant number.

<

Numbers

Boolean

Whether an input value is less than a constant number.

<=

Numbers

Boolean

Whether an input value is less than or equal to a constant number.

Token length

String

Number

The number of characters in each input token.

Overlap ratio

List elements

Numbers

Quotient stating how much one bounding box is overlapped by others within an image

Unique overlap ratio

List elements

Numbers

Quotient stating how much one bounding box is uniquely overlapped by others within an image

Rectangles (absolute)

List elements

Rectangles

The rectangle geometry of the bounding box list (in absolute min/max pixel coordinates)

Rectangles (relative)

List elements

Rectangles

The rectangle geometry of the bounding box list (in min/max coordinates relative to the reference image size)

Area

Rectangles

Numbers

The area of each rectangle within the input list

Aspect

Rectangles

Numbers

The aspect (i.e. width divided by height) of each rectangle within the input list

Width

Rectangles

Numbers

The width of each rectangle within the input list

Height

Rectangles

Numbers

The height of each rectangle within the input list

Non-maximum suppression (NMS)

List elements

List elements

A bounding box list where non-maximum-suppression has been performed (with a user-specified IoU threshold)

Angle

Geo

Number

The rotation angle of each bounding box.

Distance to origin

Geo

Number

The distance to the origin (0, 0) from each bounding box.

Area

Geo

Number

The area of each bounding box.

Aspect

Geo

Number

The aspect ratio (width divided by height) of each bounding box.

Width

Geo

Number

The width of each bounding box.

Height

Geo

Number

The height of each bounding box.

Center (X)

Geo

Number

The center x-position of each bounding box.

Center (Y)

Geo

Number

The center y-position of each bounding box.

Volume

Geo

Number

The volume of each 3D bounding box.

Aspect

Geo

Number

The aspect ratio of each 3D bounding box (i.e. longest edge / shortest edge).

Week since epoch

Datetime string

Number

For input datetime string, returns the week since epoch.

Hour of day

Datetime string

Number

For input datetime string, returns the hour [0..23].

Day of week

Datetime string

Number

For input datetime string, returns the day of week.

Milliseconds since epoch

Datetime string

Number

For an input datetime string, returns the number of milliseconds since epoch.

ConstantBool

None

Boolean

A constant boolean value

ConstantInt

None

Number

A constant integer value

ConstantFloat

None

Number

A constant floating point value

ConstantString

None

String

A constant string value

Line length

Geo

Numbers

The length of each line within the input geometry

Triangle area

Geo

Numbers

The area of each triangle within the input geometry
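
A few of the unary operations listed above can be approximated in Python to make their per-row nature concrete. This is a sketch under assumptions: the regular expression used to find numbers, the whitespace-based word splitting, and the function names are illustrative choices, not the Dashboard's actual parsing rules.

```python
import posixpath
import re
from urllib.parse import urlparse


def parse_string(text):
    """Last numeric value found in the string, or None if there is none."""
    matches = re.findall(r"[-+]?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def character_count(text):
    """Number of characters, including whitespace."""
    return len(text)


def word_count(text):
    """Number of whitespace-separated words (assumed definition)."""
    return len(text.split())


def sign(x):
    """-1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def inverse(x):
    """1/x; raises ZeroDivisionError for x == 0."""
    return 1 / x


def filename(url):
    """File name component of the URL path."""
    return posixpath.basename(urlparse(url).path)


print(parse_string("epoch 3 of 10"))
print(character_count("hard sample"), word_count("hard sample"))
print(sign(-2.5), sign(0), sign(3))
print(inverse(4.0))
print(filename("https://example.com/images/cat_001.png?size=large"))
```

Each function takes the value of one row and returns the result for that row, with no reference to any other row.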

Order-independent operations

Order-independent operations take two or more columns as input, and produce a result for each row in those columns. The order of the input columns does not matter. The Sum operation, which adds two or more columns together, is an example of an order-independent operation. A complete list of the available order-independent operations, along with their input requirements, can be seen below.

Operation

Input Type

Output type

Description

Sum

Numbers

Number

The sum of all numeric input values

Average

Numbers

Number

The arithmetic average of the numeric input values.

Common

Any

Any

The common value within the input array (if any)

Length

Numbers

Number

The length of the input vector (i.e. the square root of the sum of squared values).

Equal

Any

Boolean

Whether all input values are equal.

Not equal

Any

Boolean

Whether some of the input values differ.

Max

Numbers

Number

The largest numeric value of all input values

Min

Numbers

Number

The smallest numeric value of all input values

Multiply

Numbers

Number

The product of all numeric input values.

Range

Numbers

Number

The absolute range (i.e. ‘max value - min value’) across all numeric input values.

Unique

Any

Any

A list of all unique input values (i.e. with duplicates removed)

Sorted

Numbers

Numbers

The input vector, sorted in ascending order

Median

Numbers

Number

The median of the input vector
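
The following Python sketch applies several of the order-independent operations above to one row and checks that reversing the input columns leaves the results unchanged. The input values and the function name are hypothetical. The values are chosen so that floating-point results compare exactly; with arbitrary floats, a sum may differ in its last bits depending on evaluation order.

```python
import math
import statistics


def order_independent(values):
    """Apply a handful of order-independent operations to one row."""
    return {
        "sum": sum(values),
        "average": sum(values) / len(values),
        # Length: square root of the sum of squared values.
        "length": math.sqrt(sum(v * v for v in values)),
        "max": max(values),
        "min": min(values),
        # Range: max value minus min value.
        "range": max(values) - min(values),
        "median": statistics.median(values),
        "sorted": sorted(values),
    }


# One row taken from three hypothetical numeric columns.
forward = order_independent([3.0, 4.0, 0.0])
backward = order_independent([0.0, 4.0, 3.0])

print(forward)
print(forward == backward)
```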

Order-dependent operations

Order-dependent operations take two or more columns as input, and produce a result for each row in those columns. The order of the input columns affects the results of these operations. The Subtract operation, which subtracts one or more columns from another, is an example of an order-dependent operation. A complete list of the available order-dependent operations, along with their input requirements, can be seen below.

Operation

Input Type

Output type

Description

Divide

Numbers

Number

The first value divided by the second.

To string

Any

String

The input value(s) converted to a string. In the case of multiple input columns, values are comma separated. In the case of nested input columns, the values are represented as JSON.

Hash

Any

Number

A numeric hash value calculated from all input values (including scalars, nested inputs and arrays).

Subtract

Numbers

Number

The first numeric input value minus all following ones.

Delta

Numbers

Number

The difference between one column and the next (i.e. ‘next value - this value’).

Delta angle

Numbers

Number

The difference between the two input angles, full-circle symmetry taken into account.

Delta angle (two-way)

Numbers

Number

The difference between the two input angles, half-circle symmetry taken into account.

Delta angle (four-way)

Numbers

Number

The difference between the two input angles, quad-circle symmetry taken into account.

Matched prediction (TP)

List elements

Boolean

Whether each predicted element is a True Positive, i.e. has a matching ground truth instance

Unmatched prediction (FP)

List elements

Boolean

Whether each prediction is a False Positive, i.e. lacks a matching ground truth instance.

Unmatched ground truth (FN)

List elements

Boolean

Whether each ground truth instance is a False Negative, i.e. has no matching prediction

Matched ground truth (TP)

List elements

Boolean

Whether each ground truth instance is a True Positive, i.e. has a matching prediction

_matched_prediction

List elements

Numbers

The for all matched predictions (or 0.0 for unmatched ones).

_matched_ground_truth

List elements

Numbers

The for all matched ground truth instances (or 0.0 for unmatched ones).

Per-point OKS_matched_ground_truth

List elements

Numbers

The OKS for all points within matched ground truth instances (or NaN for unmatched ones).

Per-point OKS_matched_prediction

List elements

Numbers

The OKS for all points within matched predictions (or NaN for unmatched ones).

Precision

List elements

Numbers

The Precision score calculated between the predicted and ground truth instances (considering the and confidence thresholds).

Recall

List elements

Numbers

The Recall score calculated between the predicted and ground truth instances (considering the and confidence thresholds).

F1

List elements

Numbers

The F1 score calculated between the predicted and ground truth instances (considering the and confidence thresholds).

Matched value

List elements

Any

The matched value for each ground truth or predicted instance (considering the and confidence thresholds).

Greater than or equal

Numbers

Boolean

Whether each value is greater than or equal to the next.

Greater than

Numbers

Boolean

Whether each value is greater than the next.

Less than or equal

Numbers

Boolean

Whether each value is less than or equal to the next.

Less than

Numbers

Boolean

Whether each value is less than the next.

Divide (element-wise)

Numbers

Number

The first value divided by the second. (element-wise)

Subtract (element-wise)

Numbers

Number

The first numeric input value minus all following ones. (element-wise)

Delta angle (element-wise)

Numbers

Number

The difference between the two input angles, full-circle symmetry taken into account. (element-wise)

Delta angle (two-way) (element-wise)

Numbers

Number

The difference between the two input angles, half-circle symmetry taken into account. (element-wise)

Delta angle (four-way) (element-wise)

Numbers

Number

The difference between the two input angles, quad-circle symmetry taken into account. (element-wise)

Sum (element-wise)

Numbers

Number

The sum of all numeric input values (element-wise)

Average (element-wise)

Numbers

Number

The arithmetic average of the numeric input values. (element-wise)

Length (element-wise)

Numbers

Number

The length of the input vector (i.e. the square root of the sum of squared values). (element-wise)

Equal (element-wise)

Any

Boolean

Whether all input values are equal. (element-wise)

Not equal (element-wise)

Any

Boolean

Whether some of the input values differ. (element-wise)

Greater than or equal (element-wise)

Numbers

Boolean

Whether each value is greater than or equal to the next. (element-wise)

Greater than (element-wise)

Numbers

Boolean

Whether each value is greater than the next. (element-wise)

Less than or equal (element-wise)

Numbers

Boolean

Whether each value is less than or equal to the next. (element-wise)

Less than (element-wise)

Numbers

Boolean

Whether each value is less than the next. (element-wise)

Max (element-wise)

Numbers

Number

The largest numeric value of all input values (element-wise)

Min (element-wise)

Numbers

Number

The smallest numeric value of all input values (element-wise)

Multiply (element-wise)

Numbers

Number

The product of all numeric input values. (element-wise)

Normalize (element-wise)

Numbers

Numbers

The input vector in normalized form (element-wise)

Abs (element-wise)

Number

Number

The absolute value of the numeric input value (element-wise)

Character count (element-wise)

String

Number

The number of characters in the input string (including whitespace). (element-wise)

Log (element-wise)

Number

Number

The log value of the numeric input value (element-wise)

Non-zero (element-wise)

Numbers

Boolean

Whether the numeric input value is non-zero (element-wise)

Not (element-wise)

Any

Boolean

The boolean ‘not’ of the input value(s) (element-wise)

Sign (element-wise)

Number

Number

The sign (i.e. -1, 0, or 1) of the input value(s). (element-wise)

Inverse (element-wise)

Number

Number

The inverse (i.e. 1/x) of the input value(s). (element-wise)

Zero (element-wise)

Number

Boolean

Whether the numeric input value is zero (element-wise)

Url (element-wise)

Number

Number

The URL defined by each input value (element-wise)

Angle (element-wise)

Number

Number

The angle of the input 2D vector (element-wise)
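
To show why column selection order matters for order-dependent operations, here is a minimal Python sketch of Subtract, Divide, and the three Delta angle variants. Several details are assumptions not stated in the table: angles are taken to be in degrees, the difference is computed as second minus first, and the result is wrapped into a half-open interval centred on zero whose width is 360, 180, or 90 degrees for full-circle, half-circle, and quad-circle symmetry respectively.

```python
def subtract(values):
    """First value minus all following ones."""
    first, *rest = values
    return first - sum(rest)


def divide(first, second):
    """First value divided by the second."""
    return first / second


def delta_angle(a, b, period=360.0):
    """Signed difference b - a, wrapped into [-period/2, period/2).

    Assumed periods: 360 for full-circle symmetry, 180 for two-way
    (half-circle) symmetry, 90 for four-way (quad-circle) symmetry.
    """
    half = period / 2
    return (b - a + half) % period - half


# Swapping the input order changes the result.
print(subtract([10.0, 3.0, 2.0]), subtract([3.0, 10.0, 2.0]))
print(divide(3.0, 4.0), divide(4.0, 3.0))

# 350 and 10 degrees are only 20 degrees apart on a full circle.
print(delta_angle(350.0, 10.0), delta_angle(10.0, 350.0))

# With half-circle symmetry, 10 and 170 degrees are 20 degrees apart.
print(delta_angle(10.0, 170.0, period=180.0))

# With quad-circle symmetry, 0 and 80 degrees are 10 degrees apart.
print(delta_angle(0.0, 80.0, period=90.0))
```

Wrapping with a shorter period treats directions that differ by a half or quarter turn as equivalent, which suits undirected lines or axis-aligned shapes.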