cytoflow.views.mst#

Plots a minimum spanning tree of a statistic. Particularly useful for visualizing the results of clustering operations such as KMeansOp and SOMOp.

MSTView – plots the minimum spanning tree.

class cytoflow.views.mst.MSTView[source]#

Bases: HasStrictTraits

A view that creates a minimum spanning tree view of a statistic.

Set statistic to the name of the statistic to plot; set feature to the name of that statistic’s feature you’d like to analyze. Then, set locations to another statistic whose features are the locations (in any number of dimensions) of the nodes in the tree – usually these are cluster centroids from KMeansOp or SOMOp (see the example below). The view computes a minimum-spanning tree containing the nodes and lays it out in two dimensions.
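As a rough illustration of the tree computation described above, the sketch below builds a minimum spanning tree over a handful of node locations with SciPy. It is not taken from the MSTView source: the helper name mst_edges and the centroid values are hypothetical, and the view's own layout step is not shown.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_edges(locations, metric="euclidean"):
    """Return MST edges as sorted (i, j) index pairs.

    `locations` is an (n_nodes, n_dims) array. Hypothetical helper,
    not part of cytoflow.
    """
    # Dense pairwise distance matrix between the node locations.
    distances = squareform(pdist(np.asarray(locations, dtype=float), metric=metric))
    # minimum_spanning_tree treats zero entries as "no edge", so coincident
    # nodes would need special handling in real code.
    tree = minimum_spanning_tree(distances).tocoo()
    return sorted((int(min(i, j)), int(max(i, j))) for i, j in zip(tree.row, tree.col))


# Made-up centroids: four clusters measured in three channels.
centroids = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [5.0, 5.0, 5.0]])
edges = mst_edges(centroids)
print(edges)
```

A tree over n nodes always has n - 1 edges, so the four centroids here are joined by three edges.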

There are three different ways of plotting the value at each location in the tree:

  • Setting style to heat (the default) will produce an MST with a circle at each vertex; the color of each circle reflects the value of feature. (In this scenario, variable is ignored.)

  • Setting style to pie will draw a pie plot at each location. If variable is set, then the values of variable are used as the categories of the pie, and the arc length of each slice of pie is related to the intensity of the value of feature. If variable is unset, then feature is ignored and the features of the statistic are used as the categories.

  • Setting style to petal will draw a “petal plot” at each location. If variable is set, then the values of variable are used as the categories, but unlike a pie plot, the arc width of each slice is equal. Instead, the radius of the pie slice scales with the square root of the intensity, so that the relationship between area and intensity remains the same. If variable is unset, then feature is ignored and the features of the statistic are used as the categories.

Warning

If style is pie or petal, then negative data will be clipped to 0!
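The geometry of the pie and petal styles can be sketched with a few lines of NumPy. This is an illustration only: the function names are hypothetical, and normalizing to the sum (pie) or to the largest value (petal) is an assumption rather than a description of the view's internals. Negative values are clipped to 0, as the warning above describes.

```python
import numpy as np


def pie_fractions(values):
    """Fraction of the full circle given to each slice (hypothetical helper)."""
    values = np.clip(np.asarray(values, dtype=float), 0.0, None)  # clip negatives to 0
    total = values.sum()
    return values / total if total > 0 else np.zeros_like(values)


def petal_radii(values, max_radius=1.0):
    """Radius of each equal-width petal (hypothetical helper).

    The radius grows with the square root of the value, so the petal's
    area (proportional to radius squared) stays proportional to the value.
    """
    values = np.clip(np.asarray(values, dtype=float), 0.0, None)  # clip negatives to 0
    peak = values.max()
    if peak == 0:
        return np.zeros_like(values)
    return max_radius * np.sqrt(values / peak)


fractions = pie_fractions([4.0, 1.0, -2.0])
radii = petal_radii([4.0, 1.0, -2.0])
print(fractions, radii)
```

In this sketch a value four times larger gets a petal with twice the radius, and therefore four times the area.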

Optionally, you can set size_function to scale the circles (or pies or petals) by a function computed on Experiment.data. (Often set to len to scale by the number of events in each cluster.)

Note

If you’d like to select events based on this view (by drawing a polygon around the nodes of the tree), you can do that with SOMOp.

statistic#

The statistic to plot. Must be a key in Experiment.statistics.

Type:

Str

locations#

A statistic whose levels are the same as statistic and whose features are the dimensions of the locations of each node to plot.

Type:

Str

Note

If style is heat, then the levels of statistic must be the same as the levels of locations. If style is pie or petal, the levels of statistic must be the levels of locations plus variable.

locations_level#

Which level in the locations statistic is different at each location? The values of the others must be specified in the plot_name parameter of plot. Optional if there is only one level in locations.

Type:

Str

locations_features#

Which features in locations to use. By default, use all of them.

Type:

List(Str)

Warning

The KMeansOp statistic is mostly locations, but it also has a Proportion feature. You likely don’t want to use it as a location for laying out the minimum spanning tree!

variable#

The variable used for plotting pie and petal plots. Must be left empty for a heatmap.

Type:

Str

feature#

The column in the statistic to plot (often a channel name).

Type:

Str

style#

What kind of plot to make?

Type:

Enum(heat, pie, petal) (default = heat)

scale#

For a heat map, how should the color of feature be scaled before plotting? For pie and petal maps, how should the input data be normalized to [0,1] before plotting?

Type:

{‘linear’, ‘log’, ‘logicle’}
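To make the idea of normalizing to [0,1] concrete, here is a minimal sketch of min-max normalization on a linear and a log scale. It is an assumption-laden illustration, not the view's implementation: the normalize helper is hypothetical, and the logicle scale is left out because it needs cytoflow's own scale machinery.

```python
import numpy as np


def normalize(values, scale="linear"):
    """Map values onto [0, 1] after an optional log transform (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    if scale == "log":
        values = np.log10(values)  # requires strictly positive input
    elif scale != "linear":
        raise ValueError(f"unsupported scale in this sketch: {scale!r}")
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)  # all values equal: nothing to spread out
    return (values - lo) / (hi - lo)


linear = normalize([1.0, 10.0, 100.0])
logged = normalize([1.0, 10.0, 100.0], scale="log")
print(linear, logged)
```

On a linear scale the middle value lands close to 0, while on a log scale it lands halfway, which is why the choice of scale matters for data spanning several decades.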

size_function#

If set, separate the Experiment into subsets by levels of locations, compute a function on them, and scale the size of each tree node by those values. The callable should take a single pandas.DataFrame argument and return a positive float or a value that can be cast to float (such as int). Of particular use is len, which will scale the cells by the number of events in each subset.

Type:

Callable (default: None)
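The contract for size_function can be illustrated with plain pandas: split a table of events by cluster, call the function on each subset, and cast the result to float. The events table, the node_sizes helper and the column names below are hypothetical stand-ins for Experiment.data and are not cytoflow code.

```python
import pandas as pd

# Made-up events: a cluster label and one channel per event.
events = pd.DataFrame({"KMeans": [0, 0, 0, 1, 1, 2],
                       "Y2-A": [10.0, 12.0, 11.0, 200.0, 220.0, 50.0]})


def node_sizes(data, by, size_function):
    """Apply size_function to each subset of `data` grouped by `by` (hypothetical helper)."""
    return {level: float(size_function(subset)) for level, subset in data.groupby(by)}


# len counts the events (rows) in each subset.
counts = node_sizes(events, "KMeans", len)

# Any callable that takes a DataFrame and returns a positive number works too.
means = node_sizes(events, "KMeans", lambda df: df["Y2-A"].mean())
print(counts, means)
```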

metric#

What metric should be used to compute distance in the tree? Must be one of braycurtis, canberra, chebyshev, cityblock, correlation, cosine, dice, euclidean, hamming, jaccard, jensenshannon, kulczynski1, mahalanobis, matching, minkowski, rogerstanimoto, russellrao, seuclidean, sokalmichener, sokalsneath, sqeuclidean, yule. Suggestion: use euclidean for small numbers of dimensions (location features) and cosine for larger numbers.

Type:

Str (default: euclidean)
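The difference between the two suggested metrics can be seen on a small example using scipy.spatial.distance.pdist, which accepts metric names such as euclidean and cosine. The location values are made up, and whether MSTView calls pdist internally is an assumption here.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Made-up locations: rows 0 and 1 point in the same direction but differ in magnitude.
locations = np.array([[1.0, 1.0],
                      [10.0, 10.0],
                      [1.0, 0.0]])

# pdist returns condensed distances in the order (0,1), (0,2), (1,2).
euclidean = pdist(locations, metric="euclidean")
cosine = pdist(locations, metric="cosine")
print(euclidean, cosine)
```

Euclidean distance puts rows 0 and 1 far apart, whereas cosine distance treats them as identical because it only compares direction.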

subset#

An expression that specifies the subset of the statistic to plot. Passed unmodified to pandas.DataFrame.query.

Type:

str
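Because the expression goes to pandas.DataFrame.query, it can refer to index level names directly. The sketch below assumes the statistic is a DataFrame indexed by its levels (here KMeans and Dox) with one column per feature; the table and its values are made up.

```python
import pandas as pd

# Made-up statistic: levels in the index, one feature column.
index = pd.MultiIndex.from_product([[0, 1], [1.0, 10.0]], names=["KMeans", "Dox"])
stat = pd.DataFrame({"Y2-A": [120, 80, 40, 160]}, index=index)

# The same string you would assign to subset.
high_dox = stat.query("Dox == 10.0")
print(high_dox)
```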

Note

MSTView is not a subclass of BaseView or any of its descendants. It implements the IView interface, but it does not use seaborn.FacetGrid for laying out its plots.

Examples

Make a little data set.

>>> import cytoflow as flow
>>> import_op = flow.ImportOp()
>>> import_op.tubes = [flow.Tube(file = "Plate01/RFP_Well_A3.fcs",
...                              conditions = {'Dox' : 10.0}),
...                    flow.Tube(file = "Plate01/CFP_Well_A4.fcs",
...                              conditions = {'Dox' : 1.0})]
>>> import_op.conditions = {'Dox' : 'float'}
>>> ex = import_op.apply()

Compute some KMeans clusters

>>> km = flow.KMeansOp(name = "KMeans",
...                    channels = ["V2-A", "Y2-A", "B1-A"],
...                    scale = {"V2-A" : "logicle",
...                             "Y2-A" : "logicle",
...                             "B1-A" : "logicle"},
...                    num_clusters = 20)
>>> km.estimate(ex)
>>> ex2 = km.apply(ex)

Add a statistic

>>> ex3 = flow.ChannelStatisticOp(name = "ByDox",
...                               channel = "Y2-A",
...                               by = ["KMeans", "Dox"],
...                               function = len).apply(ex2)

Plot the minimum spanning tree

>>> flow.MSTView(statistic = "ByDox",
...              locations = "KMeans",
...              locations_features = ["V2-A", "Y2-A", "B1-A"],
...              feature = "Y2-A",
...              variable = "Dox",
...              style = "pie").plot(ex3)

plot(experiment, plot_name=None, **kwargs)[source]#

Plot a chart of a variable’s values against a statistic.

Parameters:
  • experiment (Experiment) – The Experiment to plot using this view.

  • plot_name (str) – If this IView can make multiple plots, plot_name is the name of the plot to make. Must be one of the values retrieved from enum_plots.

  • title (str) – Set the plot title

  • legend (bool) – Plot a legend or color bar? Defaults to True.

  • legendlabel (str) – Set the label for the color bar or legend

  • palette (palette name) – Colors to use for the different levels of the hue variable. Should be something that can be interpreted by seaborn.color_palette. If plotting a heat map, this should be a continuous color map (‘viridis’ is the default.) Otherwise, choose either a discrete color map (‘deep’ is the default) or a continuous color map from which equi-spaced colors will be drawn.

  • radius (float) – The radius of the circle or pie plots, on a scale from 0 to 1.

  • All other parameters are passed to the matplotlib.patches.Circle or matplotlib.patches.Wedge constructors (i.e., they should be patch attributes).

enum_plots(experiment)[source]#

Enumerate the named plots we can make from this set of statistics.

Returns:

An iterator across the possible plot names.

Return type:

iterator