cytoflow.experiment
Defines Experiment, cytoflow's main data structure.

Experiment – manages the data and metadata for a flow experiment.
- class cytoflow.experiment.Experiment

  Bases: traits.has_traits.HasStrictTraits

  An Experiment manages all the data and metadata for a flow experiment.

  An Experiment is the central data structure in cytoflow: it wraps a pandas.DataFrame containing all the data from a flow experiment. Each row in the table is an event. Each column is either a measurement from one of the detectors (or a "derived" measurement such as a transformed value or a ratio), or a piece of metadata associated with that event: which tube it came from, what the experimental conditions for that tube were, gate membership, etc. The Experiment object lets you:

  - Add additional metadata to define subpopulations.
  - Get events that match a particular metadata signature.

  Additionally, the Experiment object manages channel- and experiment-level metadata in the metadata attribute, which is a dictionary. This allows the rest of the modules in cytoflow to track and enforce other constraints that are important in doing quantitative flow cytometry: for example, every tube must be collected with the same channel parameters (such as PMT voltage).

  Note

  Experiment is not responsible for enforcing the constraints; ImportOp and the other modules are.

- data

  All the events and metadata represented by this experiment. Each event is a row; each column is either a measured channel (e.g. a fluorescence measurement), a derived channel (e.g. the ratio between two channels), or a piece of metadata. Metadata can be either experimental conditions (e.g. induction level, timepoint) or added by operations (e.g. gate membership).

  - Type
    pandas.DataFrame
- metadata

  Each column in data has an entry in metadata whose key is the column name and whose value is a dict of column-specific metadata. Metadata is added by operations, and is occasionally useful if modules are expected to work together. See the individual operations' documentation for a list of the metadata that each operation adds. The only "required" metadata is type, which can be channel (if the column is a measured channel, or derived from one) or condition (if the column is an experimental condition, gate membership, etc.)

  - Type
    Dict(Str : Dict(Str : Any))
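A minimal sketch of the shape described above, using plain Python dicts rather than a real Experiment; the "voltage" entry is a hypothetical example of operation-added metadata, not a documented key:

```python
# Hypothetical sketch of the metadata attribute's shape.  Keys are column
# names from `data`; each value is a dict of column-specific metadata, and
# the only required entry is "type".
metadata = {
    "FSC-A": {"type": "channel", "voltage": 250},  # a measured channel
    "Dox":   {"type": "condition"},                # an experimental condition
}

# Downstream modules can branch on the required "type" entry:
channels = [c for c, m in metadata.items() if m["type"] == "channel"]
conditions = [c for c, m in metadata.items() if m["type"] == "condition"]
```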
- history

  The IOperation operations that have been applied to the raw data to produce this Experiment.

  - Type
    List(IOperation)
- statistics

  The statistics and parameters computed by models that were fit to the data. The key is an (Str, Str) tuple, where the first Str is the name of the operation that supplied the statistic, and the second Str is the name of the statistic. The value is a multi-indexed pandas.Series: each level of the index is a facet, and each combination of indices is a subset for which the statistic was computed. The values of the series are the values of the computed parameters or statistics for each subset.

  - Type
    Dict((Str, Str) : pandas.Series)
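A minimal sketch of that structure using pandas directly; the operation name "MeanOp", the statistic name "mean", and the facet names are hypothetical examples, not real cytoflow identifiers:

```python
import pandas as pd

# Hypothetical sketch of the statistics attribute: the key is an
# (operation name, statistic name) tuple; the value is a pandas.Series
# whose MultiIndex levels are the facets.
idx = pd.MultiIndex.from_product([[1.0, 10.0], ["B", "T"]],
                                 names=["Dox", "Strain"])
statistics = {
    ("MeanOp", "mean"): pd.Series([100.0, 110.0, 95.0, 105.0], index=idx)
}

# Look up the statistic for one subset (Dox == 10.0, Strain == "B"):
value = statistics[("MeanOp", "mean")].loc[(10.0, "B")]
```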
- channels

  The channels that this experiment tracks (read-only).

  - Type
    List(String)
- conditions

  The experimental conditions and analysis groups (gate membership, etc.) that this experiment tracks. The key is the name of the condition, and the value is a pandas.Series of that condition's possible values.

  - Type
    Dict(String : pandas.Series)
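A minimal sketch of that mapping (toy condition names and values, not taken from a real experiment):

```python
import pandas as pd

# Hypothetical sketch of the conditions attribute: each key is a condition
# name; each value is a Series of that condition's possible values.
conditions = {
    "Dox":    pd.Series([1.0, 10.0]),
    "Strain": pd.Series(["BL21", "Top10G"], dtype="category"),
}

dox_levels = list(conditions["Dox"])
```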
Notes

The OOP programmer in me desperately wanted to subclass pandas.DataFrame, add some flow-specific stuff, and move on with my life. (I may still, with something like https://github.com/dalejung/pandas-composition). A few things get in the way of directly subclassing pandas.DataFrame:

First, to enable some of the delicious syntactic sugar for accessing its contents, pandas.DataFrame redefines __getattribute__ and __setattr__, and making it recognize (and maintain across copies) additional attributes is an unsupported (non-public) API feature and introduces other subclassing weirdness.

Second, many of the operations (like appending!) don't happen in place; they return copies instead. It's cleaner to simply manage that copying ourselves instead of making the client deal with it. We can pretend to operate on the data in place.

To maintain the ease of use, we'll override __getitem__ and pass it to the wrapped pandas.DataFrame. We'll do the same with some of the more useful pieces of the pandas.DataFrame API (like query); and of course, you can always get the data frame itself with Experiment.data.

Examples
>>> import cytoflow as flow
>>> tube1 = flow.Tube(file = 'cytoflow/tests/data/Plate01/RFP_Well_A3.fcs',
...                   conditions = {"Dox" : 10.0})
>>> tube2 = flow.Tube(file = 'cytoflow/tests/data/Plate01/CFP_Well_A4.fcs',
...                   conditions = {"Dox" : 1.0})
>>>
>>> import_op = flow.ImportOp(conditions = {"Dox" : "float"},
...                           tubes = [tube1, tube2])
>>>
>>> ex = import_op.apply()
>>> ex.data.shape
(20000, 17)
>>> ex.data.groupby(['Dox']).size()
Dox
1     10000
10    10000
dtype: int64
- subset(conditions, values)

  Returns a subset of this experiment including only the events where each condition in conditions equals the corresponding value in values.

  - Parameters
    - conditions (Str or List(Str)) – A condition or list of conditions
    - values (Any or Tuple(Any)) – The value(s) of the condition(s)

  - Returns
    A new Experiment containing only the events specified in conditions and values.

  - Return type
    Experiment

  Note

  This is a wrapper around pandas.DataFrame.groupby and pandas.core.groupby.GroupBy.get_group. That means you can pass other things in conditions – see the pandas.DataFrame.groupby documentation for details.
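Since the Note says subset wraps groupby/get_group, the underlying mechanics can be sketched with pandas alone (toy event table, not a cytoflow call):

```python
import pandas as pd

# Toy event table: two "tubes" of events with different Dox conditions.
df = pd.DataFrame({"Dox":   [1.0, 1.0, 10.0, 10.0],
                   "FSC_A": [210.0, 340.0, 180.0, 275.0]})

# subset("Dox", 10.0) boils down to something like:
sub = df.groupby("Dox").get_group(10.0)   # only events where Dox == 10.0
```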
- query(expr, **kwargs)

  Return an experiment whose data is a subset of this one where expr evaluates to True.

  This method "sanitizes" column names first, replacing characters that are not valid in a Python identifier with an underscore (_). So the column name a column becomes a_column, and can be queried with a_column == True or similar.

  - Parameters
    - expr (string) – The expression to pass to pandas.DataFrame.query. Must be a valid Python expression, something you could pass to eval.
    - **kwargs (dict) – Other named parameters to pass to pandas.DataFrame.query.

  - Returns
    A new Experiment, a clone of this one with the data returned by pandas.DataFrame.query.

  - Return type
    Experiment
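The sanitizing step described above can be sketched with pandas alone; the regex used here is an assumption about how "not valid in a Python identifier" is implemented, not cytoflow's actual code:

```python
import re
import pandas as pd

# Toy frame with a column name that isn't a valid Python identifier.
df = pd.DataFrame({"a column": [True, False, True], "x": [1, 2, 3]})

# Sanitize: replace non-identifier characters with "_", so the name can
# appear directly in the query expression.
sanitized = df.rename(columns=lambda c: re.sub(r"\W", "_", c))
result = sanitized.query("a_column == True")
```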
- clone(deep=True)

  Create a copy of this Experiment. metadata, statistics and history are deep copies; whether or not data is a deep copy depends on the value of the deep parameter.

  Warning

  The intent is that deep is set to False by operations that are only adding columns to the underlying pandas.DataFrame. This will improve memory performance. However, the resulting Experiment CANNOT BE MODIFIED IN PLACE, because doing so will affect the other Experiments that are clones of the one being modified.
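A cytoflow-free sketch of the hazard the Warning describes, with plain Python objects standing in for the wrapped DataFrame: a shallow clone shares its data with the original, so in-place edits are visible through both.

```python
import copy

# Stand-ins: `data` plays the role of the wrapped pandas.DataFrame.
data = [1.0, 2.0, 3.0]
original = {"data": data, "history": ["ImportOp"]}

# A "deep=False" clone: history is deep-copied, but data is shared.
shallow_clone = {"data": original["data"],
                 "history": copy.deepcopy(original["history"])}

shallow_clone["data"].append(4.0)   # in-place modification of shared data...
# ...is visible through the original too -- hence the Warning.
```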
- add_condition(name, dtype, data=None)

  Add a new column of per-event metadata to this Experiment.

  Note

  add_condition operates in place.

  There are two places to call add_condition:

  - As you're setting up a new Experiment, call add_condition with data set to None to specify the conditions the new events will have.
  - If you compute some new per-event metadata on an existing Experiment, call add_condition to add it.

  - Parameters
    - name (String) – The name of the new column in data. Must be a valid Python identifier: it must start with [A-Za-z_] and contain only the characters [A-Za-z0-9_].
    - dtype (String) – The type of the new column in data. Must be a string that pandas.Series recognizes as a dtype: common types are category, float, int, and bool.
    - data (pandas.Series (default = None)) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a pandas.Series of type dtype. If None, will add an empty column to the Experiment ... but the Experiment must be empty to do so!

  - Raises
    CytoflowError – If the pandas.Series passed in data isn't the same length as data, or isn't convertible to type dtype.
Examples
>>> import cytoflow as flow
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
- add_channel(name, data=None)

  Add a new column of per-event data (as opposed to metadata) to this Experiment: i.e., something that was measured per cell, or derived from per-cell measurements.

  Note

  add_channel operates in place.

  - Parameters
    - name (String) – The name of the new column to be added to data.
    - data (pandas.Series) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a dtype of float64. If None, will add an empty column to the Experiment ... but the Experiment must be empty to do so!

  - Raises
    CytoflowError – If the pandas.Series passed in data isn't the same length as data, or isn't convertible to a dtype of float64.
Examples
>>> ex.add_channel("FSC_over_2", ex.data["FSC-A"] / 2.0)
- add_events(data, conditions)

  Add new events to this Experiment. add_events operates in place, modifying the Experiment object that it's called on.

  Each new event in data is appended to data, and its per-event metadata columns will be set with the values specified in conditions. Thus, it is particularly useful for adding tubes of data to new experiments, before additional per-event metadata is added by gates, etc.

  Note

  Every column in data must be accounted for. Each column of type channel must appear in data; each column of metadata must have a key : value pair in conditions.

  - Parameters
    - tube (pandas.DataFrame) – A single tube or well's worth of data. Must be a DataFrame with the same columns as channels.
    - conditions (Dict(Str, Any)) – A dictionary of the tube's metadata. The keys must match conditions, and the values must be coercible to the relevant numpy dtype.

  - Raises
    CytoflowError – add_events pukes if:
    - there are columns in data that aren't channels in the experiment, or vice versa;
    - there are keys in conditions that aren't conditions in the experiment, or vice versa;
    - there is metadata specified in conditions that can't be converted to the corresponding metadata dtype.
Examples
>>> import cytoflow as flow
>>> import fcsparser
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
>>> _, tube1 = fcsparser.parse('CFP_Well_A4.fcs')
>>> _, tube2 = fcsparser.parse('RFP_Well_A3.fcs')
>>> ex.add_events(tube1, {"Time" : 1, "Strain" : "BL21"})
>>> ex.add_events(tube2, {"Time" : 1, "Strain" : "Top10G"})