t-Stochastic Neighbor Embedding

Use t-Stochastic Neighbor Embedding (tSNE) to embed a multivariate data set in two dimensions in a way that maintains its underlying structure, so that similar events end up close together.

Creates new “channels” named {name}_1 and {name}_2, where name is the Name attribute.
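As a rough illustration of what the operation produces, the sketch below embeds a few channels with scikit-learn's `TSNE` and stores the result in two new columns. The channel names, the `name` value, and the use of scikit-learn are assumptions made for the example; they are not a description of how this module is implemented.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Hypothetical data: 200 events measured in three made-up channels.
rng = np.random.default_rng(0)
channels = ["FSC_A", "SSC_A", "FITC_A"]
events = pd.DataFrame(rng.normal(size=(200, 3)), columns=channels)

name = "tSNE"  # stands in for the Name attribute

# Embed the selected channels into two dimensions.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
    events[channels].to_numpy()
)

# Store the two embedding coordinates as new "channels".
events[f"{name}_1"] = embedding[:, 0]
events[f"{name}_2"] = embedding[:, 1]
```

After this runs, `events` has the three original columns plus `tSNE_1` and `tSNE_2`.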

The same decomposition may not be appropriate for different subsets of the data set. If this is the case, you can use the By attribute to specify metadata by which to aggregate the data before estimating (and applying) a model. The tSNE parameters, such as the distance metric, are the same for every subset.

Name: The operation name; determines the name of the new columns.

Channels: The channels to apply the decomposition to.

Scale: Re-scale the data in the specified channels before fitting.

Metric: How to measure “distance” between events. Euclidean distance makes sense if the number of dimensions (i.e., channels) is small, but as the number of dimensions increases, consider cosine or one of the other metrics.
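To make the difference between metrics concrete, here is a small sketch comparing Euclidean and cosine distance on two hypothetical events whose channels have the same proportions but different overall intensity. The helper functions and the values are illustrative only.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Two made-up events: same channel proportions, one twice as bright.
dim = [1.0, 2.0, 3.0]
bright = [2.0, 4.0, 6.0]

d_euclid = euclidean(dim, bright)        # sqrt(14), about 3.74
d_cosine = cosine_distance(dim, bright)  # effectively zero
```

Euclidean distance treats the two events as well separated, while cosine distance treats them as nearly identical because it depends only on direction, not magnitude.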

Perplexity: The balance between the local and global structure of the data. Larger datasets benefit from higher perplexity, but be warned – runtime scales linearly with perplexity!
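One way to build intuition for this setting: in the tSNE literature, perplexity is defined as 2 raised to the Shannon entropy of a point's neighbor distribution, so it can be read as an effective number of neighbors. The sketch below computes that quantity for two hypothetical neighbor distributions; the `perplexity` helper is written for this example only.

```python
import math

def perplexity(probabilities):
    """Return 2 ** H(P), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2.0 ** entropy

# Eight equally weighted neighbors: effective neighbor count is 8.
uniform = [1 / 8] * 8

# One dominant neighbor: effective neighbor count is between 1 and 2.
peaked = [0.93] + [0.01] * 7

perp_uniform = perplexity(uniform)
perp_peaked = perplexity(peaked)
```

A higher perplexity therefore asks each point to pay attention to more neighbors, which shifts the embedding toward global structure.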

Sample: The proportion of the data set to use for training. Defaults to 1% of the data set to keep runtime manageable.
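For a sense of scale, the following sketch draws a 1% training subset from a hypothetical data set of 50,000 events. Uniform random sampling without replacement is an assumption here; this page does not say how the subset is actually chosen.

```python
import numpy as np

rng = np.random.default_rng(0)

n_events = 50_000  # hypothetical data set size
sample = 0.01      # stands in for the Sample attribute

# Number of events used for training (at least one).
n_train = max(1, int(round(n_events * sample)))

# Assumed strategy: pick distinct event indices uniformly at random.
train_idx = rng.choice(n_events, size=n_train, replace=False)
```

Here `n_train` is 500, so the model would be fit on 500 of the 50,000 events.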

By: A list of metadata attributes by which to aggregate the data before estimating the model. For example, if the experiment has two pieces of metadata, Time and Dox, setting By to ["Time", "Dox"] will fit the model separately to each subset of the data with a unique combination of Time and Dox.
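The Time and Dox example can be sketched with pandas and scikit-learn as follows. This is a minimal illustration under assumed channel names and synthetic data, not the module's own code: it groups the events by every unique combination of the By columns and fits a separate embedding, with identical parameters, to each group.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Hypothetical experiment: 240 events, two Time values x two Dox values.
rng = np.random.default_rng(0)
n = 240
events = pd.DataFrame({
    "FSC_A": rng.normal(size=n),
    "SSC_A": rng.normal(size=n),
    "FITC_A": rng.normal(size=n),
    "Time": np.repeat([0, 24], n // 2),
    "Dox": np.tile(np.repeat([0.0, 1.0], n // 4), 2),
})

channels = ["FSC_A", "SSC_A", "FITC_A"]
by = ["Time", "Dox"]  # stands in for the By attribute
name = "tSNE"         # stands in for the Name attribute

events[f"{name}_1"] = np.nan
events[f"{name}_2"] = np.nan

# Fit one embedding per unique (Time, Dox) combination.
for key, subset in events.groupby(by):
    model = TSNE(n_components=2, perplexity=15, metric="euclidean", random_state=0)
    emb = model.fit_transform(subset[channels].to_numpy())
    events.loc[subset.index, f"{name}_1"] = emb[:, 0]
    events.loc[subset.index, f"{name}_2"] = emb[:, 1]

n_models = events.groupby(by).ngroups
```

With two values each for Time and Dox, four separate embeddings are fit. Because each one is estimated independently, coordinates from different subsets are not directly comparable.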