cytoflow.experiment
Defines Experiment, cytoflow's main data structure.

Experiment – manages the data and metadata for a flow experiment.
- class cytoflow.experiment.Experiment

  Bases: traits.has_traits.HasStrictTraits

  An Experiment manages all the data and metadata for a flow experiment.

  An Experiment is the central data structure in cytoflow: it wraps a pandas.DataFrame containing all the data from a flow experiment. Each row in the table is an event. Each column is either a measurement from one of the detectors (or a "derived" measurement such as a transformed value or a ratio), or a piece of metadata associated with that event: which tube it came from, what the experimental conditions for that tube were, gate membership, etc. The Experiment object lets you:

  - Add additional metadata to define subpopulations.
  - Get events that match a particular metadata signature.

  Additionally, the Experiment object manages channel- and experiment-level metadata in the metadata attribute, which is a dictionary. This allows the rest of the modules in cytoflow to track and enforce other constraints that are important in doing quantitative flow cytometry: for example, every tube must be collected with the same channel parameters (such as PMT voltage).

  Note

  Experiment is not responsible for enforcing the constraints; ImportOp and the other modules are.

- data

  All the events and metadata represented by this experiment. Each event is a row; each column is either a measured channel (e.g. a fluorescence measurement), a derived channel (e.g. the ratio between two channels), or a piece of metadata. Metadata can be either experimental conditions (e.g. induction level, timepoint) or added by operations (e.g. gate membership).

  - Type
    pandas.DataFrame
- metadata

  Each column in data has an entry in metadata whose key is the column name and whose value is a dict of column-specific metadata. Metadata is added by operations, and is occasionally useful if modules are expected to work together. See the individual operations' documentation for a list of the metadata that each operation adds. The only "required" metadata is type, which can be channel (if the column is a measured channel, or derived from one) or condition (if the column is an experimental condition, gate membership, etc.)

  - Type
    Dict(Str : Dict(Str : Any))
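A minimal sketch of the shape described above, using plain Python dicts rather than a real Experiment; the "voltage" entry is a hypothetical example of operation-added metadata, not a documented key:

```python
# Hypothetical sketch of the metadata attribute's shape.  Keys are column
# names from `data`; each value is a dict of column-specific metadata, and
# the only required entry is "type".
metadata = {
    "FSC-A": {"type": "channel", "voltage": 250},  # a measured channel
    "Dox":   {"type": "condition"},                # an experimental condition
}

# Downstream modules can branch on the required "type" entry:
channels = [c for c, m in metadata.items() if m["type"] == "channel"]
conditions = [c for c, m in metadata.items() if m["type"] == "condition"]
```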
- history

  The IOperation operations that have been applied to the raw data to produce this Experiment.

  - Type
    List(IOperation)
- statistics

  The statistics and parameters computed by models that were fit to the data. The key is an (Str, Str) tuple, where the first Str is the name of the operation that supplied the statistic, and the second Str is the name of the statistic. The value is a multi-indexed pandas.Series: each level of the index is a facet, and each combination of indices is a subset for which the statistic was computed. The values of the series are the values of the computed parameters or statistics for each subset.

  - Type
    Dict((Str, Str) : pandas.Series)
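A minimal sketch of that structure using pandas directly; the operation name "MeanOp", the statistic name "mean", and the facet names are hypothetical examples, not real cytoflow identifiers:

```python
import pandas as pd

# Hypothetical sketch of the statistics attribute: the key is an
# (operation name, statistic name) tuple; the value is a pandas.Series
# whose MultiIndex levels are the facets.
idx = pd.MultiIndex.from_product([[1.0, 10.0], ["B", "T"]],
                                 names=["Dox", "Strain"])
statistics = {
    ("MeanOp", "mean"): pd.Series([100.0, 110.0, 95.0, 105.0], index=idx)
}

# Look up the statistic for one subset (Dox == 10.0, Strain == "B"):
value = statistics[("MeanOp", "mean")].loc[(10.0, "B")]
```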
- channels

  The channels that this experiment tracks (read-only).

  - Type
    List(String)
- conditions

  The experimental conditions and analysis groups (gate membership, etc.) that this experiment tracks. The key is the name of the condition, and the value is a pandas.Series of that condition's possible values.

  - Type
    Dict(String : pandas.Series)
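A minimal sketch of that mapping (toy condition names and values, not taken from a real experiment):

```python
import pandas as pd

# Hypothetical sketch of the conditions attribute: each key is a condition
# name; each value is a Series of that condition's possible values.
conditions = {
    "Dox":    pd.Series([1.0, 10.0]),
    "Strain": pd.Series(["BL21", "Top10G"], dtype="category"),
}

dox_levels = list(conditions["Dox"])
```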
Notes

The OOP programmer in me desperately wanted to subclass pandas.DataFrame, add some flow-specific stuff, and move on with my life. (I may still, with something like https://github.com/dalejung/pandas-composition). A few things get in the way of directly subclassing pandas.DataFrame:

First, to enable some of the delicious syntactic sugar for accessing its contents, pandas.DataFrame redefines __getattribute__ and __setattr__, and making it recognize (and maintain across copies) additional attributes is an unsupported (non-public) API feature and introduces other subclassing weirdness.

Second, many of the operations (like appending!) don't happen in place; they return copies instead. It's cleaner to simply manage that copying ourselves instead of making the client deal with it. We can pretend to operate on the data in place.

To maintain the ease of use, we'll override __getitem__ and pass it to the wrapped pandas.DataFrame. We'll do the same with some of the more useful pieces of the pandas.DataFrame API (like query); and of course, you can always get the data frame itself with Experiment.data.

Examples
>>> import cytoflow as flow
>>> tube1 = flow.Tube(file = 'cytoflow/tests/data/Plate01/RFP_Well_A3.fcs',
...                   conditions = {"Dox" : 10.0})
>>> tube2 = flow.Tube(file = 'cytoflow/tests/data/Plate01/CFP_Well_A4.fcs',
...                   conditions = {"Dox" : 1.0})
>>>
>>> import_op = flow.ImportOp(conditions = {"Dox" : "float"},
...                           tubes = [tube1, tube2])
>>>
>>> ex = import_op.apply()
>>> ex.data.shape
(20000, 17)
>>> ex.data.groupby(['Dox']).size()
Dox
1     10000
10    10000
dtype: int64
- subset(conditions, values)

  Returns a subset of this experiment including only the events where each condition in conditions equals the corresponding value in values.

  - Parameters
    - conditions (Str or List(Str)) – A condition or list of conditions
    - values (Any or Tuple(Any)) – The value(s) of the condition(s)

  - Returns
    A new Experiment containing only the events specified in conditions and values.

  - Return type
    Experiment

  Note

  This is a wrapper around pandas.DataFrame.groupby and pandas.core.groupby.GroupBy.get_group. That means you can pass other things in conditions – see the pandas.DataFrame.groupby documentation for details.
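Since the Note says subset wraps groupby/get_group, the underlying mechanics can be sketched with pandas alone (toy event table, not a cytoflow call):

```python
import pandas as pd

# Toy event table: two "tubes" of events with different Dox conditions.
df = pd.DataFrame({"Dox":   [1.0, 1.0, 10.0, 10.0],
                   "FSC_A": [210.0, 340.0, 180.0, 275.0]})

# subset("Dox", 10.0) boils down to something like:
sub = df.groupby("Dox").get_group(10.0)   # only events where Dox == 10.0
```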
- query(expr, **kwargs)

  Return an experiment whose data is a subset of this one where expr evaluates to True.

  This method "sanitizes" column names first, replacing characters that are not valid in a Python identifier with an underscore (_). So the column name a column becomes a_column, and can be queried with a_column == True or similar.

  - Parameters
    - expr (string) – The expression to pass to pandas.DataFrame.query. Must be a valid Python expression, something you could pass to eval.
    - **kwargs (dict) – Other named parameters to pass to pandas.DataFrame.query.

  - Returns
    A new Experiment, a clone of this one with the data returned by pandas.DataFrame.query.

  - Return type
    Experiment
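The sanitizing step described above can be sketched with pandas alone; the regex used here is an assumption about how "not valid in a Python identifier" is implemented, not cytoflow's actual code:

```python
import re
import pandas as pd

# Toy frame with a column name that isn't a valid Python identifier.
df = pd.DataFrame({"a column": [True, False, True], "x": [1, 2, 3]})

# Sanitize: replace non-identifier characters with "_", so the name can
# appear directly in the query expression.
sanitized = df.rename(columns=lambda c: re.sub(r"\W", "_", c))
result = sanitized.query("a_column == True")
```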
- clone(deep=True)

  Create a copy of this Experiment. metadata, statistics and history are deep copies; whether or not data is a deep copy depends on the value of the deep parameter.

  Warning

  The intent is that deep is set to False by operations that are only adding columns to the underlying pandas.DataFrame. This will improve memory performance. However, the resulting Experiment CANNOT BE MODIFIED IN PLACE, because doing so will affect the other Experiments that are clones of the one being modified.
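A cytoflow-free sketch of the hazard the Warning describes, with plain Python objects standing in for the wrapped DataFrame: a shallow clone shares its data with the original, so in-place edits are visible through both.

```python
import copy

# Stand-ins: `data` plays the role of the wrapped pandas.DataFrame.
data = [1.0, 2.0, 3.0]
original = {"data": data, "history": ["ImportOp"]}

# A "deep=False" clone: history is deep-copied, but data is shared.
shallow_clone = {"data": original["data"],
                 "history": copy.deepcopy(original["history"])}

shallow_clone["data"].append(4.0)   # in-place modification of shared data...
# ...is visible through the original too -- hence the Warning.
```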
- add_condition(name, dtype, data=None)

  Add a new column of per-event metadata to this Experiment.

  Note

  add_condition operates in place.

  There are two places to call add_condition:

  - As you're setting up a new Experiment, call add_condition with data set to None to specify the conditions the new events will have.
  - If you compute some new per-event metadata on an existing Experiment, call add_condition to add it.

  - Parameters
    - name (String) – The name of the new column in data. Must be a valid Python identifier: it must start with [A-Za-z_] and contain only the characters [A-Za-z0-9_].
    - dtype (String) – The type of the new column in data. Must be a string that pandas.Series recognizes as a dtype: common types are category, float, int, and bool.
    - data (pandas.Series (default = None)) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a pandas.Series of type dtype. If None, will add an empty column to the Experiment ... but the Experiment must be empty to do so!

  - Raises
    CytoflowError – If the pandas.Series passed in data isn't the same length as data, or isn't convertible to type dtype.
Examples
>>> import cytoflow as flow
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
- add_channel(name, data=None)

  Add a new column of per-event data (as opposed to metadata) to this Experiment: i.e., something that was measured per cell, or derived from per-cell measurements.

  Note

  add_channel operates in place.

  - Parameters
    - name (String) – The name of the new column to be added to data.
    - data (pandas.Series) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a dtype of float64. If None, will add an empty column to the Experiment ... but the Experiment must be empty to do so!

  - Raises
    CytoflowError – If the pandas.Series passed in data isn't the same length as data, or isn't convertible to a dtype of float64.
Examples
>>> ex.add_channel("FSC_over_2", ex.data["FSC-A"] / 2.0)
- add_events(data, conditions)

  Add new events to this Experiment. add_events operates in place, modifying the Experiment object that it's called on.

  Each new event in data is appended to data, and its per-event metadata columns will be set with the values specified in conditions. Thus, it is particularly useful for adding tubes of data to new experiments, before additional per-event metadata is added by gates, etc.

  Note

  Every column in data must be accounted for. Each column of type channel must appear in data; each column of metadata must have a key : value pair in conditions.

  - Parameters
    - tube (pandas.DataFrame) – A single tube or well's worth of data. Must be a DataFrame with the same columns as channels.
    - conditions (Dict(Str, Any)) – A dictionary of the tube's metadata. The keys must match conditions, and the values must be coercible to the relevant numpy dtype.

  - Raises
    CytoflowError – add_events pukes if:
    - there are columns in data that aren't channels in the experiment, or vice versa;
    - there are keys in conditions that aren't conditions in the experiment, or vice versa;
    - there is metadata specified in conditions that can't be converted to the corresponding metadata dtype.
Examples
>>> import cytoflow as flow
>>> import fcsparser
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
>>> _, tube1 = fcsparser.parse('CFP_Well_A4.fcs')
>>> _, tube2 = fcsparser.parse('RFP_Well_A3.fcs')
>>> ex.add_events(tube1, {"Time" : 1, "Strain" : "BL21"})
>>> ex.add_events(tube2, {"Time" : 1, "Strain" : "Top10G"})