Writing new cytoflow modules

Creating a new module in cytoflow ranges from easy (for simple things) to quite involved. I like to think that cytoflow follows the Perl philosophy of making the easy jobs easy and the hard jobs possible.

With that in mind, let’s look at the process of creating a new module, progressing from easy to involved.

Basics

All the APIs (both public and internal) are built using Traits. For operations and views in the cytoflow package, basic working knowledge of traits is sufficient. For GUI work, trait notification is used extensively.

The GUI wrappers also use TraitsUI because it makes wrapping traits with UI elements easy. Have a look at views, handlers, and of course the trait editors.

Finally, there are some principles that I expect new modules contributed to this codebase to follow:

  • Check for pathological errors and fail early. I really dislike the tendency of a number of libraries to fail with cryptic errors. (I’m looking at you, pandas.) Check for obvious errors and raise a CytoflowOpError or CytoflowViewError. If the problem is non-fatal, warn with CytoflowOpWarning or CytoflowViewWarning. The GUI knows how to handle these gracefully.
  • Separate experimental data from module state. Some workflows require estimating parameters with one data set, then applying the operation to another. Make sure your module supports them.
  • Estimate slow but apply fast. The GUI re-runs modules’ apply() methods automatically when parameters change. That means that the apply() method must run very quickly.
  • Write tests. I hate writing unit tests, but they are indispensable for catching bugs. Even if a view’s tests are just smoke tests (“It plots something and doesn’t crash”), that’s better than nothing.
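
As a rough illustration of the "fail early" principle, here is a minimal sketch of up-front validation. The two classes below are local stand-ins so the snippet runs without cytoflow installed, and check_threshold_inputs() is a hypothetical helper, not part of cytoflow.

```python
import warnings


# Stand-ins so this sketch runs on its own; a real module would use the
# exception and warning classes that cytoflow provides under these names.
class CytoflowOpError(Exception):
    pass


class CytoflowOpWarning(UserWarning):
    pass


def check_threshold_inputs(channels, channel, threshold):
    """Hypothetical helper: validate inputs before doing any real work."""
    if not channel:
        raise CytoflowOpError("Must specify a channel")
    if channel not in channels:
        raise CytoflowOpError(f"Channel {channel!r} is not in the experiment")
    if threshold <= 0:
        # Suspicious but not fatal: warn and carry on.
        warnings.warn("Threshold is not positive", CytoflowOpWarning)
```

The point is that the caller sees a message naming the bad parameter, instead of a KeyError raised somewhere deep inside a data frame lookup.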

New operations

The base operation API is fairly simple:

  • id - a required traits.Constant containing the UID of the operation

  • friendly_id - a required traits.Constant containing a human-readable name

  • apply() - takes an Experiment and returns a new Experiment with the operation applied. apply() should clone() the old experiment, then modify and return the clone. Don’t forget to add the operation to the new Experiment’s history. A good example of a simple operation is RatioOp.

  • estimate() - You may also wish to estimate the operation’s parameters from a data set. Crucially, this may not be the data set you are eventually applying the operation to. If your operation relies on estimating parameters, implement the estimate() function. This may involve selecting a subset of the data in the Experiment, or it may involve loading in an additional FCS file. A good example of the former is KMeansOp; a good example of the latter is AutofluorescenceOp.

    You may also find that you wish to estimate different parameter sets for different sub-populations (as encoded in the Experiment’s conditions.) By convention, the conditions that you want to estimate different parameters for are passed using a trait named by, which takes a list of conditions and groups the data by unique combinations of those conditions’ values before estimating a parameter set for each. Look at KMeansOp for an example of this behavior.

  • default_view() - for some operations, you may want to provide a default view. This view may just be a base view parameterized in a particular way (like the HistogramView that is the default view of BinningOp), or it may be a visualization of the parameters estimated by the estimate() function (like the default view of AutofluorescenceOp.) In many cases, the view returned by this function is linked back to the operation that produced it.
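
To make the shape of this API concrete, here is a toy sketch of the estimate()/apply() split, including the by convention. It deliberately avoids traits and pandas so it runs on its own: ToyExperiment and MeanCenterOp are hypothetical stand-ins, and a real operation would declare id and friendly_id as traits.Constant and work on a real Experiment.

```python
import copy
from statistics import mean


class ToyExperiment:
    """Hypothetical stand-in for Experiment: a list of per-event dicts."""

    def __init__(self, events):
        self.events = events
        self.history = []

    def clone(self):
        return copy.deepcopy(self)


class MeanCenterOp:
    """Hypothetical operation: subtract a per-group mean from one channel."""

    # In cytoflow these two would be traits.Constant attributes.
    id = "example.operations.mean_center"
    friendly_id = "Mean Center"

    def __init__(self, channel, by=()):
        self.channel = channel
        self.by = list(by)   # conditions to group by before estimating
        self._means = {}     # module state, kept apart from the data

    def _group_key(self, event):
        return tuple(event[condition] for condition in self.by)

    def estimate(self, experiment):
        # The slow part: one parameter set per unique combination of `by` values.
        groups = {}
        for event in experiment.events:
            groups.setdefault(self._group_key(event), []).append(event[self.channel])
        self._means = {key: mean(values) for key, values in groups.items()}

    def apply(self, experiment):
        # The fast part: clone, modify the clone, record the op in its history.
        if not self._means:
            raise RuntimeError("Call estimate() before apply()")
        new_experiment = experiment.clone()
        for event in new_experiment.events:
            key = self._group_key(event)
            if key not in self._means:
                raise RuntimeError(f"No parameters were estimated for group {key}")
            event[self.channel] -= self._means[key]
        new_experiment.history.append(self)
        return new_experiment


# Estimate on one data set, then apply to a different one.
controls = ToyExperiment([{"Dox": 0, "FITC-A": 10.0},
                          {"Dox": 0, "FITC-A": 20.0},
                          {"Dox": 1, "FITC-A": 100.0}])
samples = ToyExperiment([{"Dox": 0, "FITC-A": 25.0},
                         {"Dox": 1, "FITC-A": 130.0}])
op = MeanCenterOp("FITC-A", by=["Dox"])
op.estimate(controls)
centered = op.apply(samples)
```

Note that samples is left untouched: apply() only ever modifies the clone it returns.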

New views

The base view API is very simple:

  • id - a required traits.Constant containing the UID of the view
  • friendly_id - a required traits.Constant containing a human-readable name
  • plot() - plots an Experiment.

As I wrote more views, however, I noticed a significant amount of code duplication, which led to bugs and lost time. So, I refactored the view code to use a short hierarchy of classes for particular types of views. You can take advantage of this functionality when writing a new module, or you can simply derive your new view from traits.HasTraits and implement the simple API above.

The view base classes are:

  • BaseView – implements a view with row, column and hue facets. After setting up the facet grid, it calls the derived class’s _grid_plot() to actually do the plotting. plot() also has parameters to set the plot style, legend, axis labels, etc.
  • BaseDataView – implements a view that plots an Experiment’s data (as opposed to a statistic.) Includes functionality for subsetting the data before plotting, and determining axis limits and scales.
  • Base1DView – implements a 1-dimensional data view. See HistogramView for an example.
  • Base2DView – implements a 2-dimensional data view. See ScatterplotView for an example.
  • BaseNDView – implements an N-dimensional data view. See RadvizView for an example.
  • BaseStatisticsView – implements a view that plots a statistic from an Experiment (as opposed to the underlying data.) These views have a “primary” variable, and can be subset as well.
  • Base1DStatisticsView – implements a view that plots one dimension of a statistic. See BarChartView for an example.
  • Base2DStatisticsView – implements a view that plots two dimensions of a statistic. See Stats2DView for an example.
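
The division of labor between plot() and _grid_plot() is essentially the template-method pattern. The sketch below shows that pattern only; SketchBaseView and SketchHistogramView are hypothetical stand-ins, they facet on a single condition, and they return bin counts rather than drawing anything so that the snippet runs without matplotlib.

```python
class SketchBaseView:
    """Hypothetical stand-in for a faceting base class."""

    def __init__(self, xfacet=None):
        self.xfacet = xfacet

    def plot(self, events, **kwargs):
        # The base class owns the faceting: split events into one panel per
        # value of the facet condition (or a single panel if there is none).
        panels = {}
        for event in events:
            key = event[self.xfacet] if self.xfacet else None
            panels.setdefault(key, []).append(event)
        # ...then hands each panel to the derived class.
        return {key: self._grid_plot(panel, **kwargs)
                for key, panel in panels.items()}

    def _grid_plot(self, events, **kwargs):
        raise NotImplementedError("Derived views implement _grid_plot()")


class SketchHistogramView(SketchBaseView):
    """Hypothetical 1-D view: only knows how to fill in one panel."""

    def __init__(self, channel, xfacet=None):
        super().__init__(xfacet=xfacet)
        self.channel = channel

    def _grid_plot(self, events, bin_width=10, **kwargs):
        counts = {}
        for event in events:
            bin_start = int(event[self.channel] // bin_width) * bin_width
            counts[bin_start] = counts.get(bin_start, 0) + 1
        return counts


events = [{"Dox": 0, "FITC-A": 12.0},
          {"Dox": 0, "FITC-A": 17.0},
          {"Dox": 1, "FITC-A": 31.0}]
view = SketchHistogramView(channel="FITC-A", xfacet="Dox")
panels = view.plot(events, bin_width=10)
```

A new derived view written this way never touches the faceting logic, which is where most of the duplicated code used to live.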

New GUI operations

Wrapping an operation for the GUI sometimes feels like it requires more work than writing the operation in the first place. A new operation requires at least five things:

  • A plugin class implementing IOperationPlugin. It should also derive from PluginHelpMixin, which adds support for online help.
  • A class derived from the underlying cytoflow operation. The derived operation should:
    • Inherit from PluginOpMixin to add support for various GUI event-handling bits
    • Override attributes in the underlying cytoflow class to add metadata that tells the GUI how to react to changes. (See the PluginOpMixin docstring for details.)
    • Override the handler_factory attribute to be a callable that returns an OpHandlerMixin instance.
    • Provide an implementation of get_notebook_code(), to support exporting to a Jupyter notebook.
    • If the module has an estimate() method, then implement clear_estimate() to clear those parameters.
    • If the module has a default_view() method, it should be overridden to return a GUI-enabled view class (see below.)
    • Optionally, override should_apply() and should_clear_estimate() to only do expensive operations when necessary.
  • A handler class that defines the default traits.View and provides supporting logic. This class should be derived from OpHandlerMixin and traits.Controller.
  • Serialization logic. cytoflow uses camel for sane YAML serialization; a dumper and loader for the class must save and load the operation’s parameters.
  • Tests. Because of cytoflowgui’s split between processes, testing GUI logic for modules can be kind of a synchronization nightmare. This is by design – because the same synchronization issues are present when running the software. See the cytoflowgui/tests directory for (many) examples.
  • (Optionally) default view implementations. If the operation has a default view, you should wrap it as well (in the operation plugin module.) See the next section for details.
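
As a small illustration of the clear_estimate() requirement, here is a sketch of a GUI-side subclass that resets only the estimated state and leaves user-set parameters alone. Both classes are hypothetical stand-ins that run without cytoflow; a real wrapper would also inherit from PluginOpMixin and provide the other pieces listed above.

```python
class MedianCutoffOp:
    """Hypothetical stand-in for an underlying cytoflow operation."""

    def __init__(self, channel):
        self.channel = channel   # set by the user
        self._cutoff = None      # filled in by estimate()

    def estimate(self, values):
        ordered = sorted(values)
        self._cutoff = ordered[len(ordered) // 2]


class MedianCutoffWorkflowOp(MedianCutoffOp):
    """GUI-side subclass; a real one would also inherit PluginOpMixin."""

    def clear_estimate(self):
        # Reset only what estimate() produced, not what the user configured.
        self._cutoff = None


op = MedianCutoffWorkflowOp("FITC-A")
op.estimate([3.0, 1.0, 2.0])
cutoff_after_estimate = op._cutoff
op.clear_estimate()
```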

New GUI views

A new GUI view requires at least six things:

  • A plugin class implementing IViewPlugin. It should also derive from PluginHelpMixin, which adds support for online help.
  • A class derived from the underlying cytoflow view. The derived view should:
    • Inherit from PluginViewMixin to add support for various GUI event-handling bits
    • Override attributes in the underlying cytoflow class to add metadata that tells the GUI how to react to changes. (See the PluginViewMixin docstring for details.)
    • Override the handler_factory attribute to be a callable that returns a ViewHandlerMixin instance.
    • Provide an implementation of get_notebook_code(), to support exporting to a Jupyter notebook.
    • Override the plot_params attribute with an instance of an object containing plot parameters (see below).
    • Optionally, override should_plot() to only plot when necessary.
    • Optionally, override plot_wi() to change whether plot() is called on the current WorkflowItem’s result or the previous one’s.
  • A handler class that defines the default traits.View and provides supporting logic. This class should be derived from ViewHandlerMixin and traits.Controller.
  • Serialization logic. cytoflow uses camel for sane YAML serialization; a dumper and loader for the class must save and load the view’s parameters.
  • Plot parameters. The parameters to a view’s plot() method are stored in an object that derives from BasePlotParams or one of its descendants. Choose data types that are appropriate for the view, and include a default view. Set it as the class type for the view’s plot_params attribute. Don’t forget to write serialization code for it as well!
  • Tests. Because of cytoflowgui’s split between processes, testing GUI logic for modules can be kind of a synchronization nightmare. This is by design – because the same synchronization issues are present when running the software. See the cytoflowgui/tests directory for (many) examples. In the case of a view, most of these are “smoke tests”, testing that the view doesn’t crash with various sets of parameters.
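
The idea behind a plot-parameters object can be sketched with a plain dataclass: one object holds the keyword arguments for plot(), and the same object is what gets saved and restored. Everything here is a hypothetical stand-in. A real implementation would derive from BasePlotParams, use traits for the fields, and register a camel dumper and loader instead of the dict round-trip shown.

```python
from dataclasses import asdict, dataclass


@dataclass
class SketchHistogramPlotParams:
    """Hypothetical parameter object; field names are illustrative only."""
    num_bins: int = 50
    alpha: float = 0.5
    orientation: str = "vertical"


def dump_params(params):
    # Stand-in for a serialization dumper: reduce to plain data.
    return asdict(params)


def load_params(data):
    # Stand-in for the matching loader.
    return SketchHistogramPlotParams(**data)


def sketch_plot(events, num_bins, alpha, orientation):
    # Stand-in for a view's plot(): just reports what it was asked to do.
    return f"{len(events)} events, {num_bins} bins, alpha={alpha}, {orientation}"


params = SketchHistogramPlotParams(num_bins=20)
restored = load_params(dump_params(params))
summary = sketch_plot([1, 2, 3], **asdict(restored))
```

A smoke test for a view can then loop over a handful of parameter objects and check only that plotting completes.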