Writing new cytoflow modules

Creating a new module in cytoflow ranges from easy (for simple things) to quite involved. I like to think that cytoflow follows the Perl philosophy of making the easy jobs easy and the hard jobs possible.

With that in mind, let’s look at the process of creating a new module, progressing from easy to involved.

Basics

All the APIs (both public and internal) are built using Traits. For operations and views in the cytoflow package, a basic working knowledge of traits is sufficient. GUI work also requires an understanding of trait notification, which the GUI uses extensively.

The GUI wrappers also use TraitsUI because it makes wrapping traits with UI elements easy. Have a look at the TraitsUI documentation for views, handlers and, of course, the trait editors.

Finally, there are some principles that I expect new modules contributed to this codebase to follow:

  • Check for pathological errors and fail early. I really dislike the tendency of a number of libraries to fail with cryptic errors. (I’m looking at you, pandas.) Check for obvious errors and raise a CytoflowOpError or CytoflowViewError. If the problem is non-fatal, warn with CytoflowOpWarning or CytoflowViewWarning. The GUI knows how to handle these gracefully.

  • Separate experimental data from module state. There are workflows that require estimating parameters with one data set, then applying those operations to another. Make sure your module supports them.

  • Estimate slow but apply fast. The GUI re-runs a module’s apply() method automatically whenever its parameters change, so apply() must run very quickly. Put expensive computation in estimate() instead.

  • Write tests. I hate writing unit tests, but they are indispensable for catching bugs. Even if a view’s tests are just smoke tests (“It plots something and doesn’t crash”), that’s better than nothing.
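To make these principles concrete, here is a minimal, library-free sketch. It deliberately avoids traits and the real cytoflow classes: `ThresholdOp`, `OpError`, `OpWarning` and the list-of-dicts "experiment" are hypothetical stand-ins for an operation class, CytoflowOpError, CytoflowOpWarning and an Experiment. It shows fail-early validation, estimated parameters stored on the operation rather than on the data, and a slow estimate() paired with a fast apply().

```python
import statistics
import warnings


class OpError(Exception):
    """Hypothetical stand-in for cytoflow's CytoflowOpError."""


class OpWarning(UserWarning):
    """Hypothetical stand-in for cytoflow's CytoflowOpWarning."""


class ThresholdOp:
    """Hypothetical operation: flags events above an estimated threshold."""

    def __init__(self, name, channel):
        self.name = name
        self.channel = channel
        self._threshold = None  # estimated state lives on the op, not on the data

    def _check(self, events):
        # Fail early with a readable message instead of a cryptic KeyError later.
        if not self.name:
            raise OpError("name is not set")
        if not events:
            raise OpError("experiment has no events")
        if self.channel not in events[0]:
            raise OpError(f"channel {self.channel!r} is not in the experiment")

    def estimate(self, events):
        # The slow part: may be expensive, since it need not run on every change.
        self._check(events)
        values = [e[self.channel] for e in events]
        if len(values) < 10:
            # Non-fatal problem: warn rather than fail.
            warnings.warn("few events; the threshold may be unreliable", OpWarning)
        self._threshold = statistics.mean(values) + 2 * statistics.pstdev(values)

    def apply(self, events):
        # The fast part: one comparison per event, cheap enough to re-run often.
        self._check(events)
        if self._threshold is None:
            raise OpError("call estimate() before apply()")
        # Build new event dicts instead of mutating the caller's data.
        return [{**e, self.name: e[self.channel] > self._threshold} for e in events]


# Estimate on one data set (e.g. controls), then apply to a different one.
controls = [{"FITC-A": float(v)} for v in range(1, 21)]
samples = [{"FITC-A": 5.0}, {"FITC-A": 50.0}]
op = ThresholdOp(name="high", channel="FITC-A")
op.estimate(controls)
result = op.apply(samples)
```

Because the threshold is stored on `op` rather than in `controls`, the same estimate can be applied to any number of other data sets, and the assertions one would write against `result` are exactly the kind of cheap test the last principle asks for.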

New operations

The base operation API is fairly simple:

  • id - a required traits.Constant containing the UID of the operation

  • friendly_id - a required traits.Constant containing a human-readable name

  • apply() - takes an Experiment and returns a new Experiment with the operation applied. apply() should clone() the old experiment, then modify and return the clone. Don’t forget to add the operation to the new Experiment’s history. A good example of a simple operation is RatioOp.

    Note

    Be aware of the deep parameter for clone()! It defaults to True; only set it to False if you are only adding columns to the Experiment.

    Note

    The resulting Experiment must have a pandas.RangeIndex for its index; several modules rely on this! If you add or remove events from the Experiment, call pandas.DataFrame.reset_index on Experiment.data (with drop=True, so the old index is discarded rather than added as a column) to restore a contiguous RangeIndex.

  • estimate() - You may also wish to estimate the operation’s parameters from a data set. Crucially, this might not be the data set you are eventually applying the operation to. If your operation relies on estimating parameters, implement the estimate() function. This may involve selecting a subset of the data in the Experiment, or it may involve loading an additional FCS file. A good example of the former is KMeansOp; a good example of the latter is AutofluorescenceOp.

    You may also find that you wish to estimate different parameter sets for different sub-populations (as encoded in the Experiment’s conditions). By convention, the conditions that you want to estimate different parameters for are passed using a trait named by, which takes a list of conditions and groups the data by unique combinations of those conditions’ values before estimating a parameter set for each. Look at KMeansOp for an example of this behavior.

  • default_view() - for some operations, you may want to provide a default view. This view may just be a base view parameterized in a particular way (like the HistogramView that is the default view of BinningOp), or it may be a visualization of the parameters estimated by the estimate() function (like the default view of AutofluorescenceOp.) In many cases, the view returned by this function is linked back to the operation that produced it.
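The apply() contract above (clone, modify the clone, record history, keep a contiguous 0..n-1 index) and the by convention for estimate() can be sketched without cytoflow itself. `ToyExperiment`, `RatioLikeOp`, `DropBelowOp` and `GroupMeanOp` are hypothetical stand-ins; a dict of column lists plays the role of the pandas DataFrame, and an explicit `index` list plays the role of its RangeIndex. The real Experiment.clone() and history handling may differ in detail.

```python
import copy


class ToyExperiment:
    """Hypothetical stand-in for cytoflow.Experiment: columns, index, history."""

    def __init__(self, data, index=None, history=None):
        self.data = data  # column name -> list of values (stands in for a DataFrame)
        n = len(next(iter(data.values()))) if data else 0
        self.index = list(range(n)) if index is None else index
        self.history = [] if history is None else history

    def clone(self, deep=True):
        # History is always copied. Column lists are copied only if deep=True;
        # a shallow clone shares them, so it is only safe for adding columns.
        data = {k: list(v) for k, v in self.data.items()} if deep else dict(self.data)
        return ToyExperiment(data, list(self.index), copy.deepcopy(self.history))


class RatioLikeOp:
    """Hypothetical op that only adds a column, so a shallow clone suffices."""

    def __init__(self, name, numerator, denominator):
        self.name, self.numerator, self.denominator = name, numerator, denominator

    def apply(self, experiment):
        new = experiment.clone(deep=False)
        num = experiment.data[self.numerator]
        den = experiment.data[self.denominator]
        new.data[self.name] = [n / d for n, d in zip(num, den)]
        new.history.append(copy.copy(self))  # record a snapshot of this op
        return new


class DropBelowOp:
    """Hypothetical op that removes events, so it uses the default deep clone."""

    def __init__(self, channel, low):
        self.channel, self.low = channel, low

    def apply(self, experiment):
        new = experiment.clone()
        keep = [i for i, v in enumerate(new.data[self.channel]) if v >= self.low]
        new.data = {k: [v[i] for i in keep] for k, v in new.data.items()}
        # Filtering leaves gaps in the original labels; renumber them 0..n-1,
        # which is what reset_index(drop=True) does for a DataFrame.
        new.index = list(range(len(keep)))
        new.history.append(copy.copy(self))
        return new


class GroupMeanOp:
    """Hypothetical op whose estimate() fits one parameter set per `by` group."""

    def __init__(self, channel, by=()):
        self.channel, self.by = channel, list(by)
        self.means = {}  # estimated state: group key -> mean

    def estimate(self, experiment):
        groups = {}
        for i in range(len(experiment.index)):
            # One group per unique combination of the `by` conditions' values.
            key = tuple(experiment.data[c][i] for c in self.by)
            groups.setdefault(key, []).append(experiment.data[self.channel][i])
        self.means = {k: sum(v) / len(v) for k, v in groups.items()}


ex = ToyExperiment({"Dox": [1, 1, 10, 10],
                    "FITC-A": [2.0, 4.0, 20.0, 40.0],
                    "PE-A": [1.0, 2.0, 4.0, 8.0]})
ratio = RatioLikeOp("FITC/PE", "FITC-A", "PE-A").apply(ex)
gated = DropBelowOp("FITC-A", 4.0).apply(ratio)
by_dox = GroupMeanOp("FITC-A", by=["Dox"])
by_dox.estimate(ex)
```

Note that the input experiment `ex` is never modified: each apply() returns a new object whose history lists every operation that produced it, and the estimated `means` live on the operation, so they could be applied to a different experiment.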

New views

The base view API is very simple:

  • id - a required traits.Constant containing the UID of the view

  • friendly_id - a required traits.Constant containing a human-readable name

  • plot() - takes an Experiment and plots it.

As I wrote more views, however, I noticed a significant amount of code duplication, which led to bugs and lost time. So, I refactored the view code to use a short hierarchy of classes for particular types of views. You can take advantage of this functionality when writing a new module, or you can simply derive your new view from traits.HasTraits and implement the simple API above.
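The simple API above is small enough to implement directly. The sketch below is hypothetical and dependency-free: plain class attributes play the role of the traits.Constant id and friendly_id, the "experiment" is a dict of column lists, a stand-in `ViewError` replaces CytoflowViewError, and plot() renders a text histogram instead of calling matplotlib. The id string is made up.

```python
class ViewError(Exception):
    """Hypothetical stand-in for cytoflow's CytoflowViewError."""


class TextHistogramView:
    # Class attributes standing in for traits.Constant(...) traits.
    id = "edu.example.views.text_histogram"  # hypothetical UID
    friendly_id = "Text histogram"

    def __init__(self, channel, bins=4):
        self.channel = channel
        self.bins = bins

    def plot(self, data):
        """Render one '#' per event in each bin and return the lines."""
        if self.channel not in data:
            # Fail early, as the principles above ask.
            raise ViewError(f"channel {self.channel!r} is not in the experiment")
        values = data[self.channel]
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.bins or 1.0  # avoid zero width for constant data
        counts = [0] * self.bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            counts[min(int((v - lo) / width), self.bins - 1)] += 1
        return [f"{lo + i * width:8.2f} | {'#' * c}" for i, c in enumerate(counts)]


lines = TextHistogramView("FITC-A", bins=2).plot({"FITC-A": [0.0, 1.0, 1.0, 3.0, 4.0]})
print("\n".join(lines))
```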

The view base classes are:

  • BaseView – implements a view with row, column and hue facets. After setting up the facet grid, it calls the derived class’s _grid_plot() to actually do the plotting. plot() also has parameters to set the plot style, legend, axis labels, etc.

  • BaseDataView – implements a view that plots an Experiment’s data (as opposed to a statistic.) Includes functionality for subsetting the data before plotting, and determining axis limits and scales.

  • Base1DView – implements a 1-dimensional data view. See HistogramView for an example.

  • Base2DView – implements a 2-dimensional data view. See ScatterplotView for an example.

  • BaseNDView – implements an N-dimensional data view. See RadvizView for an example.

  • BaseStatisticsView – implements a view that plots a statistic from an Experiment (as opposed to the underlying data.) These views have a “primary” variable, and can be subset as well.

  • Base1DStatisticsView – implements a view that plots one dimension of a statistic. See BarChartView for an example.

  • Base2DStatisticsView – implements a view that plots two dimensions of a statistic. See Stats2DView for an example.
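The base classes follow a template-method pattern: the base class does the shared work (validation and faceting) and the concrete view only fills in the drawing step. The sketch below mimics that shape without traits, pandas or seaborn. `ToyBaseView`, `ToyBase1DView`, `MeanView` and the dict-based "facet grid" are hypothetical simplifications, not the real BaseView or Base1DView code, and "drawing" here just returns one number per facet.

```python
from collections import defaultdict


class ToyBaseView:
    """Hypothetical analogue of BaseView: facets by row/column, delegates drawing."""

    def __init__(self, xfacet=None, yfacet=None):
        self.xfacet, self.yfacet = xfacet, yfacet

    def plot(self, events):
        for facet in (self.xfacet, self.yfacet):
            if facet and any(facet not in e for e in events):
                raise ValueError(f"facet {facet!r} is not a condition")
        grid = defaultdict(list)  # (row value, column value) -> events
        for e in events:
            grid[(e.get(self.yfacet), e.get(self.xfacet))].append(e)
        return self._grid_plot(dict(grid))

    def _grid_plot(self, grid):
        raise NotImplementedError("subclasses draw the facets")


class ToyBase1DView(ToyBaseView):
    """Hypothetical analogue of Base1DView: adds and validates one channel."""

    def __init__(self, channel, **facets):
        super().__init__(**facets)
        self.channel = channel

    def plot(self, events):
        if not events or self.channel not in events[0]:
            raise ValueError(f"channel {self.channel!r} not found")
        return super().plot(events)


class MeanView(ToyBase1DView):
    """A concrete view only implements _grid_plot."""

    def _grid_plot(self, grid):
        return {key: sum(e[self.channel] for e in facet) / len(facet)
                for key, facet in grid.items()}


events = [{"Dox": 1, "FITC-A": 2.0},
          {"Dox": 1, "FITC-A": 4.0},
          {"Dox": 10, "FITC-A": 30.0}]
means = MeanView("FITC-A", xfacet="Dox").plot(events)
```

The payoff of this design is that a new view class like `MeanView` inherits the validation and faceting for free, which is the code duplication the hierarchy was introduced to remove.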

New GUI operations

Wrapping an operation for the GUI sometimes feels like it requires more work than writing the operation in the first place. A new operation requires at least five things:

  • A class derived from the underlying cytoflow operation. The derived operation should be placed in a module in cytoflowgui.workflow.operations, and it should:

    • Inherit from WorkflowOperation to add support for various GUI event-handling bits (as well as the underlying cytoflow class, if appropriate)

    • Override attributes in the underlying cytoflow class to add metadata that tells the GUI how to react to changes. (See the IWorkflowOperation docstring for details.)

    • Provide an implementation of get_notebook_code(), to support exporting to Jupyter notebook.

    • If the module has an estimate() method, then implement clear_estimate() to clear those parameters.

    • If the module has a default_view() method, it should be overridden to return a GUI-enabled view class (see below.)

    • Optionally, override should_apply() and should_clear_estimate() to only do expensive operations when necessary.

  • Serialization logic. cytoflow uses camel for sane YAML serialization; a dumper and loader for the class must save and load the operation’s parameters. These should also go in cytoflowgui.workflow.operations.

  • A handler class that defines the default traits.View and provides supporting logic. This class should be derived from OpHandler and should be placed in cytoflowgui.op_plugins.

  • A plugin class derived from envisage.plugin.Plugin and implementing IOperationPlugin. It should also derive from cytoflowgui.op_plugins.op_plugin_base.PluginHelpMixin, which adds support for online help.

  • Tests. Because of cytoflowgui’s split between processes, testing GUI logic for modules can be kind of a synchronization nightmare. This is by design – because the same synchronization issues are present when running the software. See the cytoflowgui/tests directory for (many) examples.

  • (Optionally) default view implementations. If the operation has a default view, you should wrap it as well (in the operation plugin module.) See the next section for details.
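The "add metadata that tells the GUI how to react" step amounts to tagging each parameter with what a change to it invalidates. The sketch below is a hypothetical, traits-free illustration: `ToyWorkflowOp`, the `PARAMS` table, the metadata keys `estimate` and `apply`, and the `MyOp` class name in the generated notebook code are all assumptions, and the real metadata names are documented in the IWorkflowOperation docstring. A change either clears estimated state (as clear_estimate() would) or only requires a re-apply, and an unchanged value triggers no work at all, in the spirit of should_apply(). get_notebook_code() emits an equivalent notebook cell.

```python
class ToyWorkflowOp:
    """Hypothetical GUI-wrapped operation.

    PARAMS mimics per-trait metadata (assumed names): which parameters
    invalidate the estimate and which only require a re-apply.
    """

    PARAMS = {
        "channel": {"estimate": True},
        "num_clusters": {"estimate": True},
        "name": {"apply": True},
    }

    def __init__(self, **params):
        unknown = set(params) - set(self.PARAMS)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        self.params = params
        self.centers = None  # estimated state, cleared by clear_estimate()

    def set_param(self, key, value):
        """Update one parameter; return the action the workflow should take."""
        if key not in self.PARAMS:
            raise KeyError(key)
        if self.params.get(key) == value:
            return None  # unchanged: skip the expensive work entirely
        self.params[key] = value
        if self.PARAMS[key].get("estimate"):
            self.clear_estimate()
            return "clear_estimate"
        return "apply"

    def clear_estimate(self):
        self.centers = None

    def get_notebook_code(self, idx):
        """Emit a notebook cell that reproduces this step (MyOp is hypothetical)."""
        args = ", ".join(f"{k} = {v!r}" for k, v in sorted(self.params.items()))
        return (f"op_{idx} = MyOp({args})\n"
                f"ex_{idx} = op_{idx}.apply(ex_{idx - 1})")


op = ToyWorkflowOp(name="km", channel="FITC-A", num_clusters=2)
op.centers = [1.0, 5.0]  # pretend estimate() has already run
first = op.set_param("name", "km2")        # only needs a re-apply
second = op.set_param("num_clusters", 3)   # invalidates the estimate
third = op.set_param("num_clusters", 3)    # no change, no work
code = op.get_notebook_code(2)
```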

New GUI views

A new view requires at least five things:

  • A class derived from the underlying cytoflow view. The derived view should be placed in a module in cytoflowgui.workflow.views, and it should:

    • Inherit from WorkflowView or one of its children to add support for various GUI event-handling bits

    • Override attributes in the underlying cytoflow class to add metadata that tells the GUI how to react to changes. (See the IWorkflowView docstring for details.)

    • Provide an implementation of get_notebook_code(), to support exporting to Jupyter notebook.

    • Optionally, override should_plot() to only plot when necessary.

  • Serialization logic. cytoflow uses camel for sane YAML serialization; a dumper and loader for the class must save and load the view’s parameters. These should also go in cytoflowgui.workflow.views.

  • A handler class that defines the default traits.View and provides supporting logic. This class should be derived from ViewHandler and should be placed in cytoflowgui.view_plugins.

  • A plugin class derived from envisage.plugin.Plugin and implementing IViewPlugin. It should also derive from cytoflowgui.view_plugins.view_plugin_base.PluginHelpMixin, which adds support for online help.

  • Plot parameters. The parameters to a view’s plot() method are stored in an object that derives from BasePlotParams or one of its descendants. Choose data types that are appropriate for the view, and include a default view named view_params_view in the handler class. Don’t forget to write serialization code for it as well!

  • Tests. Because of cytoflowgui’s split between processes, testing GUI logic for modules can be kind of a synchronization nightmare. This is by design – because the same synchronization issues are present when running the software. See the cytoflowgui/tests directory for (many) examples. In the case of a view, most of these are “smoke tests”, testing that the view doesn’t crash with various sets of parameters.
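The serialization requirement (which applies to operations as well as views) boils down to one dumper and one loader per class, including the plot-params object. The camel library registers dumpers per class and loaders per (tag, version) pair; the sketch below imitates that shape with a tiny hypothetical `ToyRegistry` and JSON, so it runs without camel or PyYAML. `ToyHistogramView`, `HistogramPlotParams` and the tag names are made up, and the real cytoflowgui registry and field names will differ.

```python
import json
from dataclasses import asdict, dataclass, field


class ToyRegistry:
    """Hypothetical stand-in for camel's CamelRegistry: dumpers keyed by class,
    loaders keyed by (tag, version)."""

    def __init__(self):
        self.dumpers = {}  # class -> (tag, version, dump function)
        self.loaders = {}  # (tag, version) -> load function

    def dumper(self, cls, tag, version):
        def register(fn):
            self.dumpers[cls] = (tag, version, fn)
            return fn
        return register

    def loader(self, tag, version):
        def register(fn):
            self.loaders[(tag, version)] = fn
            return fn
        return register

    def to_tagged(self, obj):
        tag, version, fn = self.dumpers[type(obj)]
        return {"tag": tag, "version": version, "data": fn(obj)}

    def from_tagged(self, doc):
        load = self.loaders[(doc["tag"], doc["version"])]
        return load(doc["data"], doc["version"])

    def dumps(self, obj):
        return json.dumps(self.to_tagged(obj))

    def loads(self, text):
        return self.from_tagged(json.loads(text))


registry = ToyRegistry()


@dataclass
class HistogramPlotParams:
    """Hypothetical plot parameters (the arguments a view's plot() would take)."""
    title: str = ""
    num_bins: int = 50


@dataclass
class ToyHistogramView:
    """Hypothetical GUI view: its own parameters plus a plot-params object."""
    channel: str = ""
    scale: str = "linear"
    plot_params: HistogramPlotParams = field(default_factory=HistogramPlotParams)


@registry.dumper(HistogramPlotParams, "histogram-params", version=1)
def _dump_params(params):
    return asdict(params)


@registry.loader("histogram-params", version=1)
def _load_params(data, version):
    return HistogramPlotParams(**data)


@registry.dumper(ToyHistogramView, "histogram-view", version=1)
def _dump_view(view):
    # Save parameters only; the nested params object uses its own dumper.
    return {"channel": view.channel, "scale": view.scale,
            "plot_params": registry.to_tagged(view.plot_params)}


@registry.loader("histogram-view", version=1)
def _load_view(data, version):
    return ToyHistogramView(channel=data["channel"], scale=data["scale"],
                            plot_params=registry.from_tagged(data["plot_params"]))


view = ToyHistogramView("FITC-A", "log", HistogramPlotParams("Dox 10", 100))
restored = registry.loads(registry.dumps(view))
```

Keying loaders on the version is what makes format changes manageable: a later `version=2` loader can sit alongside the `version=1` one, so files saved by older releases still load. A round-trip like the last line is also a natural smoke test for this code.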

Note

Why the split between the classes in cytoflowgui.op_plugins, cytoflowgui.workflow.operations, cytoflowgui.view_plugins, and cytoflowgui.workflow.views? It’s because cytoflow runs in two processes: one handles the GUI and the other operates on the workflow. If you load a module containing UI bits, it starts an event loop, even if you don’t explicitly create a QGuiApplication. That’s why older versions of Cytoflow had two icons in the task bar when running on a Mac. You know how sometimes you go to fix a “little” bug and end up re-writing the whole program? This was one of those times…