cytoflow.operations.kmeans

Use k-means clustering to cluster events in any number of dimensions. kmeans has three classes:

KMeansOp – the IOperation to perform the clustering.

KMeans1DView – a diagnostic view of the clustering (1D, using a histogram)

KMeans2DView – a diagnostic view of the clustering (2D, using a scatterplot)

class cytoflow.operations.kmeans.KMeansOp[source]

Bases: traits.has_traits.HasStrictTraits

Use a K-means clustering algorithm to cluster events.

Call estimate to compute the cluster centroids.

Calling apply creates a new categorical metadata variable named name, with possible values {name}_1 … {name}_n, where n is the number of clusters, specified with num_clusters.

The same model may not be appropriate for different subsets of the data set. If this is the case, you can use the by attribute to specify metadata by which to aggregate the data before estimating (and applying) a model. The number of clusters is the same across each subset, though.
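As a rough illustration of the naming scheme described above, the following sketch builds the category values that apply would be expected to produce for a given name and num_clusters. The helper cluster_labels is hypothetical and is not part of cytoflow.

```python
def cluster_labels(name, num_clusters):
    """Return the expected category values: <name>_1 ... <name>_n."""
    return ["{}_{}".format(name, i) for i in range(1, num_clusters + 1)]


labels = cluster_labels("KMeans", 3)
print(labels)
```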

name

The operation name; determines the name of the new metadata column

Type

Str

channels

The channels to apply the clustering algorithm to.

Type

List(Str)

scale

Re-scale the data in the specified channels before fitting. If a channel is in channels but not in scale, the current package-wide default (set with set_default_scale) is used.

Note

Sometimes you may see events labeled {name}_None – this results from events for which the selected scale is invalid. For example, if an event has a negative measurement in a channel and that channel’s scale is set to “log”, this event will be set to {name}_None.

Type

Dict(Str : {“linear”, “logicle”, “log”})
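The note above can be sketched in plain Python. This is only an illustration of why some events end up labeled {name}_None, not cytoflow's implementation: the helpers scale_log and label_events and the example centers are assumptions made for this sketch, and it treats any value that is not strictly positive as invalid on a log scale.

```python
import math


def scale_log(value):
    """Return log10(value), or None when the log scale is undefined (value <= 0)."""
    return math.log10(value) if value > 0 else None


def label_events(name, values, centers):
    """Label each value with its nearest center on the log scale, or '<name>_None'."""
    labels = []
    for v in values:
        s = scale_log(v)
        if s is None:
            # The scale is invalid for this event, so it cannot be clustered.
            labels.append(name + "_None")
            continue
        nearest = min(range(len(centers)), key=lambda i: abs(s - centers[i]))
        labels.append("{}_{}".format(name, nearest + 1))
    return labels


# Hypothetical 1D centers, already expressed on the log10 scale.
labels = label_events("KMeans", [120.0, -3.5, 0.0, 4500.0], centers=[2.0, 3.5])
print(labels)
```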

num_clusters

How many clusters to fit to the data. Must be a positive integer.

Type

Int (default = 2)
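For readers unfamiliar with the algorithm, here is a minimal pure-Python sketch of standard k-means (Lloyd's algorithm) for a fixed number of clusters. It is meant only to show what num_clusters controls; cytoflow's own implementation is not shown here and may differ. The function kmeans and its iterations and seed parameters are assumptions of this sketch.

```python
import random


def kmeans(points, num_clusters, iterations=20, seed=0):
    """Run Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, num_clusters)  # initial centers: random data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assignment = [
            min(range(num_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(num_clusters):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
centers, assignment = kmeans(points, num_clusters=2)
print(centers, assignment)
```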

by

A list of metadata attributes by which to aggregate the data before estimating the model. For example, if the experiment has two pieces of metadata, Time and Dox, setting by to ["Time", "Dox"] will fit the model separately to each subset of the data with a unique combination of Time and Dox.

Type

List(Str)
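The grouping that by describes can be sketched as follows. This is a plain-Python illustration under the assumption that events are represented as dictionaries; group_events is a hypothetical helper, not a cytoflow function. Each resulting group would then get its own model.

```python
from collections import defaultdict


def group_events(events, by):
    """Group events by the unique combinations of the conditions listed in `by`."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[condition] for condition in by)
        groups[key].append(event)
    return dict(groups)


events = [
    {"Time": 1, "Dox": 1.0, "V2-A": 10.0},
    {"Time": 1, "Dox": 10.0, "V2-A": 250.0},
    {"Time": 2, "Dox": 1.0, "V2-A": 12.0},
    {"Time": 1, "Dox": 1.0, "V2-A": 14.0},
]
groups = group_events(events, by=["Time", "Dox"])
for key, members in sorted(groups.items()):
    print(key, len(members))
```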

Examples

Make a little data set.

>>> import cytoflow as flow
>>> import_op = flow.ImportOp()
>>> import_op.tubes = [flow.Tube(file = "Plate01/RFP_Well_A3.fcs",
...                              conditions = {'Dox' : 10.0}),
...                    flow.Tube(file = "Plate01/CFP_Well_A4.fcs",
...                              conditions = {'Dox' : 1.0})]
>>> import_op.conditions = {'Dox' : 'float'}
>>> ex = import_op.apply()

Create and parameterize the operation.

>>> km_op = flow.KMeansOp(name = 'KMeans',
...                       channels = ['V2-A', 'Y2-A'],
...                       scale = {'V2-A' : 'log',
...                                'Y2-A' : 'log'},
...                       num_clusters = 2)

Estimate the clusters

>>> km_op.estimate(ex)

Plot a diagnostic view

>>> km_op.default_view().plot(ex)
(Figure: diagnostic plot of the estimated clusters.)

Apply the clustering

>>> ex2 = km_op.apply(ex)

Plot a diagnostic view with the event assignments

>>> km_op.default_view().plot(ex2)
(Figure: diagnostic plot with the event assignments.)

estimate(experiment, subset=None)[source]

Estimate the k-means clusters

Parameters
  • experiment (Experiment) – The Experiment to use to estimate the k-means clusters

  • subset (str (default = None)) – A Python expression that specifies a subset of the data in experiment to use to parameterize the operation.

apply(experiment)[source]

Apply the KMeans clustering to the data.

Returns

A new Experiment with one additional entry in Experiment.conditions named name, of type category. The new category has values {name}_1, {name}_2, etc. to indicate which k-means cluster an event is a member of.

The new Experiment also has one new statistic called centers, which is a list of tuples encoding the centroids of each k-means cluster.

Return type

Experiment

default_view(**kwargs)[source]

Returns a diagnostic plot of the k-means clustering.

Returns

An IView; call its plot method (for example, KMeans1DView.plot) to see the diagnostic plot.

Return type

IView
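Since the module provides both a 1D and a 2D diagnostic view, one plausible way to pick between them is by the number of channels. The sketch below illustrates that idea only; the function choose_view_name and the rule it encodes are assumptions made for illustration, not a statement of what default_view actually does.

```python
def choose_view_name(channels):
    """Pick a diagnostic view class name from the number of channels (assumed rule)."""
    if len(channels) == 1:
        return "KMeans1DView"  # one channel: histogram
    if len(channels) == 2:
        return "KMeans2DView"  # two channels: scatterplot
    # Assumption: no ready-made diagnostic view for other channel counts.
    raise ValueError("expected one or two channels, got {}".format(len(channels)))


print(choose_view_name(["V2-A"]))
print(choose_view_name(["V2-A", "Y2-A"]))
```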

class cytoflow.operations.kmeans.KMeans1DView[source]

Bases: cytoflow.operations.base_op_views.By1DView, cytoflow.operations.base_op_views.AnnotatingView, cytoflow.views.histogram.HistogramView

A diagnostic view for KMeansOp (1D, using a histogram)

op

The op whose parameters we’re viewing.

Type

Instance(KMeansOp)

facets

A read-only list of the conditions used to facet this view.

Type

List(Str)

by

A read-only list of the conditions used to group this view’s data before plotting.

Type

List(Str)

channel

The channel this view is viewing. If you created the view using default_view, this is already set.

Type

Str

scale

The way to scale the x axes. If you created the view using default_view, this may be already set.

Type

{‘linear’, ‘log’, ‘logicle’}

subset

An expression that specifies the subset of the data to plot. Passed unmodified to pandas.DataFrame.query.

Type

str

xfacet

Set to one of the Experiment.conditions in the Experiment, and a new column of subplots will be added for every unique value of that condition.

Type

String

yfacet

Set to one of the Experiment.conditions in the Experiment, and a new row of subplots will be added for every unique value of that condition.

Type

String

huefacet

Set to one of the Experiment.conditions in the Experiment, and a new color will be added to the plot for every unique value of that condition.

Type

String

huescale

How should the color scale for huefacet be scaled?

Type

{‘linear’, ‘log’, ‘logicle’}

plot(experiment, **kwargs)[source]

Plot the diagnostic histogram.

Parameters
  • experiment (Experiment) – The Experiment to plot using this view.

  • title (str) – Set the plot title

  • xlabel (str) – Set the X axis label

  • ylabel (str) – Set the Y axis label

  • huelabel (str) – Set the label for the hue facet (in the legend)

  • legend (bool) – Plot a legend for the color or hue facet? Defaults to True.

  • sharex (bool) – If there are multiple subplots, should they share X axes? Defaults to True.

  • sharey (bool) – If there are multiple subplots, should they share Y axes? Defaults to True.

  • row_order (list) – Override the row facet value order with the given list. If a value is not given in the ordering, it is not plotted. Defaults to a “natural ordering” of all the values.

  • col_order (list) – Override the column facet value order with the given list. If a value is not given in the ordering, it is not plotted. Defaults to a “natural ordering” of all the values.

  • hue_order (list) – Override the hue facet value order with the given list. If a value is not given in the ordering, it is not plotted. Defaults to a “natural ordering” of all the values.

  • height (float) – The height of each row in inches. Default = 3.0

  • aspect (float) – The aspect ratio of each subplot. Default = 1.5

  • col_wrap (int) – If xfacet is set and yfacet is not set, you can “wrap” the subplots around so that they form a multi-row grid by setting this to the number of columns you want.

  • sns_style ({“darkgrid”, “whitegrid”, “dark”, “white”, “ticks”}) – Which seaborn style to apply to the plot? Default is whitegrid.

  • sns_context ({“paper”, “notebook”, “talk”, “poster”}) – Which seaborn context to use? Controls the scaling of plot elements such as tick labels and the legend. Default is talk.

  • palette (palette name, list, or dict) – Colors to use for the different levels of the hue variable. Should be something that can be interpreted by seaborn.color_palette, or a dictionary mapping hue levels to matplotlib colors.

  • despine (Bool) – Remove the top and right axes from the plot? Default is True.

  • min_quantile (float (>0.0 and <1.0, default = 0.001)) – Clip data that is less than this quantile.

  • max_quantile (float (>0.0 and <1.0, default = 1.00)) – Clip data that is greater than this quantile.

  • lim ((float, float)) – Set the range of the plot’s data axis.

  • orientation ({‘vertical’, ‘horizontal’})

  • num_bins (int) – The number of bins to plot in the histogram. Clipped to [100, 1000]

  • histtype ({‘stepfilled’, ‘step’, ‘bar’}) – The type of histogram to draw. stepfilled is the default, which is a line plot with a color filled under the curve.

  • density (bool) – If True, re-scale the histogram to form a probability density function, so the area under the histogram is 1.

  • linewidth (float) – The width of the histogram line (in points)

  • linestyle ([‘-’ | ‘--’ | ‘-.’ | ‘:’ | “None”]) – The style of the line to plot

  • alpha (float (default = 0.5)) – The alpha blending value, between 0 (transparent) and 1 (opaque).

  • color (matplotlib color) – The color to plot the annotations. Overrides the default color cycle.

  • plot_name (Str) – If this IView can make multiple plots, plot_name is the name of the plot to make. Must be one of the values retrieved from enum_plots.
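Two of the parameters above involve simple numeric rules: num_bins is clipped to [100, 1000], and min_quantile and max_quantile bound the data that is shown. The sketch below illustrates both in plain Python. The helper names are hypothetical, and the quantile helper uses a simple nearest-rank rule and interprets "clip" as computing plot limits, which is one reading of the description rather than a confirmed detail of cytoflow.

```python
def clamp_num_bins(num_bins, lower=100, upper=1000):
    """Clamp the requested bin count to the range [lower, upper]."""
    return max(lower, min(upper, num_bins))


def quantile_limits(values, min_quantile=0.001, max_quantile=1.0):
    """Return (low, high) data limits at the given quantiles (nearest-rank rule)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[round(min_quantile * last)]
    high = ordered[round(max_quantile * last)]
    return low, high


print(clamp_num_bins(50), clamp_num_bins(250), clamp_num_bins(5000))
limits = quantile_limits(list(range(11)), min_quantile=0.1, max_quantile=0.9)
print(limits)
```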

class cytoflow.operations.kmeans.KMeans2DView[source]

Bases: cytoflow.operations.base_op_views.By2DView, cytoflow.operations.base_op_views.AnnotatingView, cytoflow.views.scatterplot.ScatterplotView

A diagnostic view for KMeansOp (2D, using a scatterplot).

op

The op whose parameters we’re viewing.

Type

Instance(KMeansOp)

facets

A read-only list of the conditions used to facet this view.

Type

List(Str)

by

A read-only list of the conditions used to group this view’s data before plotting.

Type

List(Str)

xchannel

The channel to use for this view’s X axis. If you created the view using default_view, this is already set.

Type

Str

ychannel

The channel to use for this view’s Y axis. If you created the view using default_view, this is already set.

Type

Str

xscale

The way to scale the x axis. If you created the view using default_view, this may be already set.

Type

{‘linear’, ‘log’, ‘logicle’}

yscale

The way to scale the y axis. If you created the view using default_view, this may be already set.

Type

{‘linear’, ‘log’, ‘logicle’}

subset

An expression that specifies the subset of the data to plot. Passed unmodified to pandas.DataFrame.query.

Type

str

xfacet

Set to one of the Experiment.conditions in the Experiment, and a new column of subplots will be added for every unique value of that condition.

Type

String

yfacet

Set to one of the Experiment.conditions in the Experiment, and a new row of subplots will be added for every unique value of that condition.

Type

String

huefacet

Set to one of the Experiment.conditions in the Experiment, and a new color will be added to the plot for every unique value of that condition.

Type

String

huescale

How should the color scale for huefacet be scaled?

Type

{‘linear’, ‘log’, ‘logicle’}

plot(experiment, **kwargs)[source]

Plot the diagnostic scatterplot.

Parameters
  • experiment (Experiment) – The Experiment to plot using this view.

  • title (str) – Set the plot title

  • xlabel (str) – Set the X axis label

  • ylabel (str) – Set the Y axis label

  • huelabel (str) – Set the label for the hue facet (in the legend)

  • legend (bool) – Plot a legend for the color or hue facet? Defaults to True.

  • sharex (bool) – If there are multiple subplots, should they share X axes? Defaults to True.

  • sharey (bool) – If there are multiple subplots, should they share Y axes? Defaults to True.

  • row_order (list) – Override the row facet value order with the given list. If a value is not given in the ordering, it is not plotted. Defaults to a “natural ordering” of all the values.

  • col_order (list) – Override the column facet value order with the given list. If a value is not given in the ordering, it is not plotted. Defaults to a “natural ordering” of all the values.

  • hue_order (list) – Override the hue facet value order with the given list. If a value is not given in the ordering, it is not plotted. Defaults to a “natural ordering” of all the values.

  • height (float) – The height of each row in inches. Default = 3.0

  • aspect (float) – The aspect ratio of each subplot. Default = 1.5

  • col_wrap (int) – If xfacet is set and yfacet is not set, you can “wrap” the subplots around so that they form a multi-row grid by setting this to the number of columns you want.

  • sns_style ({“darkgrid”, “whitegrid”, “dark”, “white”, “ticks”}) – Which seaborn style to apply to the plot? Default is whitegrid.

  • sns_context ({“paper”, “notebook”, “talk”, “poster”}) – Which seaborn context to use? Controls the scaling of plot elements such as tick labels and the legend. Default is talk.

  • palette (palette name, list, or dict) – Colors to use for the different levels of the hue variable. Should be something that can be interpreted by seaborn.color_palette, or a dictionary mapping hue levels to matplotlib colors.

  • despine (Bool) – Remove the top and right axes from the plot? Default is True.

  • min_quantile (float (>0.0 and <1.0, default = 0.001)) – Clip data that is less than this quantile.

  • max_quantile (float (>0.0 and <1.0, default = 1.00)) – Clip data that is greater than this quantile.

  • xlim ((float, float)) – Set the range of the plot’s X axis.

  • ylim ((float, float)) – Set the range of the plot’s Y axis.

  • alpha (float (default = 0.25)) – The alpha blending value, between 0 (transparent) and 1 (opaque).

  • s (int (default = 2)) – The marker size, in points^2.

  • marker (a matplotlib marker style, usually a string) – Specifies the glyph to draw for each point on the scatterplot. See matplotlib.markers for examples. Default: ‘o’

  • color (matplotlib color) – The color to plot the annotations. Overrides the default color cycle.

  • plot_name (Str) – If this IView can make multiple plots, plot_name is the name of the plot to make. Must be one of the values retrieved from enum_plots.
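The col_wrap parameter above turns a single row of subplots into a multi-row grid. As a small sketch of the arithmetic involved, the hypothetical helper below computes the grid shape for a given number of xfacet values, assuming the grid always has col_wrap columns and as many rows as are needed.

```python
import math


def wrapped_grid_shape(num_facet_values, col_wrap):
    """Return (rows, columns) for subplots wrapped after `col_wrap` columns."""
    rows = math.ceil(num_facet_values / col_wrap)
    return rows, col_wrap


shape = wrapped_grid_shape(7, col_wrap=3)
print(shape)
```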