cytoflow.experiment

class cytoflow.experiment.Experiment

Bases: traits.has_traits.HasStrictTraits

An Experiment manages all the data and metadata for a flow experiment.

An Experiment is the central data structure in cytoflow: it wraps a pandas.DataFrame containing all the data from a flow experiment. Each row in the table is an event. Each column is either a measurement from one of the detectors (or a “derived” measurement such as a transformed value or a ratio), or a piece of metadata associated with that event: which tube it came from, what the experimental conditions for that tube were, gate membership, etc. The Experiment object lets you:

Add additional metadata to define subpopulations

Get events that match a particular metadata signature.

Additionally, the Experiment object manages channel- and experiment-level metadata in the metadata attribute, which is a dictionary. This allows the rest of the cytoflow package to track and enforce other constraints that are important in doing quantitative flow cytometry: for example, every tube must be collected with the same channel parameters (such as PMT voltage).

Note

Experiment is not responsible for enforcing the constraints; ImportOp and the other modules are.

data

All the events and metadata represented by this experiment. Each event is a row; each column is either a measured channel (e.g. a fluorescence measurement), a derived channel (e.g. the ratio between two channels), or a piece of metadata. Metadata can be either experimental conditions (e.g. induction level, timepoint) or added by operations (e.g. gate membership).

Type:	pandas.DataFrame

metadata

Each column in data has an entry in metadata whose key is the column name and whose value is a dict of column-specific metadata. Metadata is added by operations, and is occasionally useful if modules are expected to work together. See individual operations’ documentation for a list of the metadata that operation adds. The only “required” metadata is type, which can be channel (if the column is a measured channel, or derived from one) or condition (if the column is an experimental condition, gate membership, etc.)

Warning

There may also be experiment-wide entries in metadata that are not columns in data!

Type:	Dict(Str : Dict(Str : Any))
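
As a rough illustration of the layout described above, the sketch below builds a metadata-style dictionary by hand and picks out columns by their type entry. The column names, the voltage key, the acquisition entry, and the columns_of_type helper are all made up for this example; none of them are part of cytoflow.

```python
# Hand-built stand-in for Experiment.metadata; every name here is hypothetical.
metadata = {
    "FSC_A": {"type": "channel"},
    "Y2_A": {"type": "channel", "voltage": 450},   # "voltage" is an invented key
    "Dox": {"type": "condition"},
    "Threshold": {"type": "condition"},
    "acquisition": {"operator": "someone"},        # experiment-wide entry, not a column
}


def columns_of_type(metadata, kind):
    """Return the keys whose entry carries the given "type" value."""
    return [name for name, entry in metadata.items()
            if isinstance(entry, dict) and entry.get("type") == kind]


channel_columns = columns_of_type(metadata, "channel")
condition_columns = columns_of_type(metadata, "condition")
```

Entries without a type key, like the experiment-wide one here, are simply skipped by the filter.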

history

The IOperation operations that have been applied to the raw data to result in this Experiment.

Type:	List(IOperation)

statistics

The statistics and parameters computed by models that were fit to the data. The key is an (Str, Str) tuple, where the first Str is the name of the operation that supplied the statistic, and the second Str is the name of the statistic. The value is a multi-indexed pandas.Series: each level of the index is a facet, and each combination of indices is a subset for which the statistic was computed. The values of the series, of course, are the values of the computed parameters or statistics for each subset.

Type:	Dict((Str, Str) : pandas.Series)
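
To make the shape of one statistics entry concrete, here is a minimal sketch built directly with pandas. The operation name "ByDox", the statistic name "geom_mean", the facet names, and all of the numbers are invented for illustration; a real entry would be produced by an operation, not written by hand.

```python
import pandas as pd

# Each index level is a facet; each combination of levels is one subset.
index = pd.MultiIndex.from_product(
    [[1.0, 10.0], ["BL21", "Top10G"]],
    names=["Dox", "Strain"],
)

# Made-up values, one per subset.
geom_mean = pd.Series([120.0, 950.0, 135.0, 4200.0], index=index)

# The key is an (operation name, statistic name) tuple; both are hypothetical.
statistics = {("ByDox", "geom_mean"): geom_mean}

stat = statistics[("ByDox", "geom_mean")]
one_subset = stat.loc[(10.0, "Top10G")]      # value for a single subset
dox_10 = stat.xs(10.0, level="Dox")          # all subsets with Dox == 10.0
```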

channels

The channels that this experiment tracks (read-only).

Type:	List(String)

conditions

The experimental conditions and analysis groups (gate membership, etc) that this experiment tracks. The key is the name of the condition, and the value is a pandas.Series with that condition’s possible values.

Type:	Dict(String : pandas.Series)

Notes

The OOP programmer in me desperately wanted to subclass pandas.DataFrame, add some flow-specific stuff, and move on with my life. (I may still, with something like https://github.com/dalejung/pandas-composition). A few things get in the way of directly subclassing pandas.DataFrame:

First, to enable some of the delicious syntactic sugar for accessing its contents, pandas.DataFrame redefines __getattr__() and __setattr__(). Making it recognize (and maintain across copies) additional attributes relies on an unsupported (non-public) API feature and introduces other subclassing weirdness.

Second, many of the operations (like appending!) don’t happen in-place; they return copies instead. It’s cleaner to simply manage that copying ourselves instead of making the client deal with it. We can pretend to operate on the data in-place.

To maintain the ease of use, we’ll override __getitem__() and pass it to the wrapped pandas.DataFrame. We’ll do the same with some of the more useful DataFrame API pieces (like query()); and of course, you can just get the data frame itself with Experiment.data.
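
The wrap-and-delegate approach described in these notes can be sketched in a few lines. TableWrapper below is a hypothetical stand-in written for this example, not the Experiment class itself, and the column names and values are made up.

```python
import pandas as pd


class TableWrapper:
    """Hypothetical class that wraps a DataFrame instead of subclassing it."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        # Forward indexing straight to the wrapped frame.
        return self.data[key]

    def query(self, expr, **kwargs):
        # DataFrame.query() returns a new frame; hand it back in a new wrapper
        # so the caller never has to manage the copy.
        return TableWrapper(self.data.query(expr, **kwargs))


wrapped = TableWrapper(pd.DataFrame({
    "FSC_A": [100.0, 250.0, 400.0],
    "Dox": [1.0, 10.0, 10.0],
}))
fsc = wrapped["FSC_A"]
high_dox = wrapped.query("Dox == 10.0")
```

The original wrapper is left untouched by query(); only the returned wrapper holds the filtered rows.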

Examples

>>> import cytoflow as flow
>>> tube1 = flow.Tube(file = 'cytoflow/tests/data/Plate01/RFP_Well_A3.fcs',
...                   conditions = {"Dox" : 10.0})
>>> tube2 = flow.Tube(file='cytoflow/tests/data/Plate01/CFP_Well_A4.fcs',
...                   conditions = {"Dox" : 1.0})
>>>
>>> import_op = flow.ImportOp(conditions = {"Dox" : "float"},
...                           tubes = [tube1, tube2])
>>>
>>> ex = import_op.apply()
>>> ex.data.shape
(20000, 17)
>>> ex.data.groupby(['Dox']).size()
Dox
1      10000
10     10000
dtype: int64

subset(conditions, values)

Returns a subset of this experiment including only the events where each condition in conditions equals the corresponding value in values.

Parameters:

conditions (Str or Tuple(Str)) – A condition or tuple of conditions.
values (Any or Tuple(Any)) – The value(s) of the condition(s).

Returns:	A new Experiment containing only the events specified in conditions and values.
Return type:	Experiment
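
The selection that subset() describes can be approximated on a plain pandas.DataFrame with a boolean mask. This is only a sketch of the semantics, not cytoflow's implementation; select_events and the sample data are hypothetical.

```python
import pandas as pd

# Made-up events with two condition columns.
events = pd.DataFrame({
    "FSC_A": [100.0, 250.0, 400.0, 325.0],
    "Dox": [1.0, 10.0, 10.0, 1.0],
    "Strain": ["BL21", "BL21", "Top10G", "Top10G"],
})


def select_events(frame, conditions, values):
    """Keep the rows where every named condition equals its paired value."""
    if isinstance(conditions, str):
        # Accept a single condition/value pair as well as tuples.
        conditions, values = (conditions,), (values,)
    mask = pd.Series(True, index=frame.index)
    for name, value in zip(conditions, values):
        mask &= frame[name] == value
    return frame[mask]


dox_10 = select_events(events, "Dox", 10.0)
dox_10_top10g = select_events(events, ("Dox", "Strain"), (10.0, "Top10G"))
```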

query(expr, **kwargs)

Return an experiment whose data is a subset of this one where expr evaluates to True.

This method “sanitizes” column names first, replacing each character that is not valid in a Python identifier with an underscore _. So the column name “a column” becomes a_column, and can be queried with a_column == True or similar.

Parameters:

expr (string) – The expression to pass to pandas.DataFrame.query(). Must be a valid Python expression, something you could pass to eval().
**kwargs (dict) – Other named parameters to pass to pandas.DataFrame.query().

Returns:	A new Experiment, a clone of this one with the data returned by pandas.DataFrame.query()
Return type:	Experiment
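
The sanitizing step can be pictured with a small regular-expression helper. The sanitize function below is a hypothetical sketch of the rule as stated above; the handling of a leading digit is an assumption, and cytoflow's own routine may differ in detail.

```python
import re


def sanitize(name):
    """Replace characters that cannot appear in a Python identifier with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Assumption: a leading digit is also invalid in that position, so it is
    # replaced as well.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned[1:]
    return cleaned


spaced = sanitize("a column")
dashed = sanitize("FSC-A")
```

With names sanitized this way, an expression such as FSC_A > 1000 can refer to a column originally called FSC-A.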

clone()

Create a copy of this Experiment. metadata and statistics are deep copies; history is a shallow copy; and …..
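
The difference between the deep and shallow copies mentioned here can be seen with the standard copy module. The dictionaries and lists below are hypothetical stand-ins for an experiment's attributes, not real cytoflow objects.

```python
import copy

# Hypothetical stand-ins for an experiment's attributes.
metadata = {"Dox": {"type": "condition"}}
history = [{"name": "Threshold"}]          # stands in for a list of operations

metadata_copy = copy.deepcopy(metadata)    # nested dicts are duplicated
history_copy = list(history)               # new list, same operation objects

# Changing the copies does not disturb the originals...
metadata_copy["Dox"]["type"] = "edited"
history_copy.append({"name": "Gate2"})

# ...but the shallow copy still shares its existing elements.
shares_first_operation = history_copy[0] is history[0]
```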

add_condition(name, dtype, data=None)

Add a new column of per-event metadata to this Experiment.

Note

add_condition() operates in place.

There are two places to call add_condition.

As you’re setting up a new Experiment, call add_condition() with data set to None to specify the conditions the new events will have.

If you compute some new per-event metadata on an existing Experiment, call add_condition() to add it.

Parameters:

name (String) – The name of the new column in data. Must be a valid Python identifier: must start with [A-Za-z_] and contain only the characters [A-Za-z0-9_].
dtype (String) – The type of the new column in data. Must be a string that pandas.Series recognizes as a dtype: common types are category, float, int, and bool.
data (pandas.Series (default = None)) – The pandas.Series to add to Experiment.data. Must be the same length as Experiment.data, and it must be convertible to a pandas.Series of type dtype. If None, will add an empty column to the Experiment … but the Experiment must be empty to do so!

Raises:

CytoflowError – If the pandas.Series passed in data isn’t the same length as Experiment.data, or isn’t convertible to type dtype.

Examples

>>> import cytoflow as flow
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
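
The naming rule given for the name parameter can be checked ahead of time with a regular expression. The helper below is a hypothetical convenience for this example, not a cytoflow function; it encodes only the character rule stated in the parameter description.

```python
import re

# Starts with [A-Za-z_], then only [A-Za-z0-9_].
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_condition_name(name):
    """Return True if name follows the identifier rule described above."""
    return _NAME_PATTERN.fullmatch(name) is not None


checks = {name: is_valid_condition_name(name)
          for name in ["Time", "Dox_uM", "2nd_dose", "Dox (uM)"]}
```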

add_channel(name, data=None)

Add a new column of per-event data (as opposed to metadata) to this Experiment: i.e., something that was measured per cell, or derived from per-cell measurements.

Note

add_channel() operates in place.

Parameters:

name (String) – The name of the new column to be added to data.
data (pandas.Series (default = None)) – The pandas.Series to add to Experiment.data. Must be the same length as Experiment.data, and it must be convertible to a dtype of float64. If None, will add an empty column to the Experiment … but the Experiment must be empty to do so!

Raises:

CytoflowError – If the pandas.Series passed in data isn’t the same length as Experiment.data, or isn’t convertible to a dtype of float64.

Examples

>>> ex.add_channel("FSC_over_2", ex.data["FSC-A"] / 2.0)
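
Whether a series meets the float64 requirement can be tested with plain pandas before handing it to add_channel(). The as_float64 helper is a hypothetical sketch and says nothing about how cytoflow performs the conversion internally.

```python
import pandas as pd


def as_float64(series):
    """Return series converted to float64, or None if pandas cannot convert it."""
    try:
        return series.astype("float64")
    except (ValueError, TypeError):
        return None


converted = as_float64(pd.Series([1, 2, 3]))          # integers convert cleanly
rejected = as_float64(pd.Series(["low", "high"]))     # text labels do not
```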

add_events(data, conditions)

Add new events to this Experiment.

Each new event in data is appended to Experiment.data, and its per-event metadata columns are set to the values specified in conditions. This makes it particularly useful for adding tubes of data to new experiments, before additional per-event metadata is added by gates, etc.

Note

Every column in Experiment.data must be accounted for. Each column of type channel must appear in the data argument; each column of metadata must have a key:value pair in conditions.

Parameters:

data (pandas.DataFrame) – A single tube or well’s worth of data. Must be a DataFrame with the same columns as channels.
conditions (Dict(Str, Any)) – A dictionary of the tube’s metadata. The keys must match conditions, and the values must be coercible to the relevant numpy dtype.

Raises:

CytoflowError – add_events() pukes if:

there are columns in data that aren’t channels in the experiment, or vice versa.
there are keys in conditions that aren’t conditions in the experiment, or vice versa.
there is metadata specified in conditions that can’t be converted to the corresponding metadata dtype.

Examples

>>> import cytoflow as flow
>>> import fcsparser
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
>>> _, tube1 = fcsparser.parse('CFP_Well_A4.fcs')  # parse() returns (metadata, data)
>>> _, tube2 = fcsparser.parse('RFP_Well_A3.fcs')
>>> ex.add_events(tube1, {"Time" : 1, "Strain" : "BL21"})
>>> ex.add_events(tube2, {"Time" : 1, "Strain" : "Top10G"})
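
The first two failure cases listed under Raises amount to set comparisons, which can be sketched without cytoflow at all. check_new_events and the sample names are hypothetical; the dtype conversion check from the third case is left out of this sketch.

```python
def check_new_events(columns, condition_values, channels, conditions):
    """Collect mismatches between a new tube and what the experiment expects."""
    problems = []
    if set(columns) != set(channels):
        # Names present on only one side, in either direction.
        mismatched = sorted(set(columns) ^ set(channels))
        problems.append("columns and channels differ: {}".format(mismatched))
    if set(condition_values) != set(conditions):
        mismatched = sorted(set(condition_values) ^ set(conditions))
        problems.append("condition keys differ: {}".format(mismatched))
    return problems


matching = check_new_events(
    ["FSC_A", "Y2_A"], {"Time": 1, "Strain": "BL21"},
    channels=["FSC_A", "Y2_A"], conditions=["Time", "Strain"])
mismatching = check_new_events(
    ["FSC_A"], {"Time": 1},
    channels=["FSC_A", "Y2_A"], conditions=["Time", "Strain"])
```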