cytoflow.experiment

class cytoflow.experiment.Experiment

Bases: traits.has_traits.HasStrictTraits
An Experiment manages all the data and metadata for a flow experiment.

An Experiment is the central data structure in cytoflow: it wraps a pandas.DataFrame containing all the data from a flow experiment. Each row in the table is an event. Each column is either a measurement from one of the detectors (or a “derived” measurement such as a transformed value or a ratio), or a piece of metadata associated with that event: which tube it came from, what the experimental conditions for that tube were, gate membership, etc. The Experiment object lets you:

- Add additional metadata to define subpopulations
- Get events that match a particular metadata signature.
Additionally, the Experiment object manages channel- and experiment-level metadata in the metadata attribute, which is a dictionary. This allows the rest of the cytoflow package to track and enforce other constraints that are important in doing quantitative flow cytometry: for example, every tube must be collected with the same channel parameters (such as PMT voltage).

Note: Experiment is not responsible for enforcing these constraints; ImportOp and the other modules are.
data

All the events and metadata represented by this experiment. Each event is a row; each column is either a measured channel (e.g. a fluorescence measurement), a derived channel (e.g. the ratio between two channels), or a piece of metadata. Metadata can be either experimental conditions (e.g. induction level, timepoint) or added by operations (e.g. gate membership).

Type: pandas.DataFrame
metadata

Each column in data has an entry in metadata whose key is the column name and whose value is a dict of column-specific metadata. Metadata is added by operations, and is occasionally useful if modules are expected to work together. See individual operations’ documentation for a list of the metadata that operation adds. The only “required” metadata is type, which can be channel (if the column is a measured channel, or derived from one) or condition (if the column is an experimental condition, gate membership, etc.)

Type: Dict(Str : Dict(Str : Any))
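To make the shape of metadata concrete, here is a minimal sketch of such a dictionary and of one plausible way to recover the channel and condition names from each column's type entry. The column names, the extra "range" key, and the helper columns_of_type() are hypothetical illustrations, not part of cytoflow's API.

```python
# Hypothetical metadata: two measured channels, one derived channel,
# and two conditions. Only "type" is required; "range" is illustrative.
metadata = {
    "FSC-A": {"type": "channel", "range": 262144},
    "FITC-A": {"type": "channel", "range": 262144},
    "FITC_ratio": {"type": "channel"},   # derived from measured channels
    "Dox": {"type": "condition"},
    "Morpho": {"type": "condition"},     # e.g. gate membership
}


def columns_of_type(metadata, kind):
    """Return the column names whose metadata "type" equals kind."""
    return [name for name, meta in metadata.items() if meta["type"] == kind]


channels = columns_of_type(metadata, "channel")
condition_names = columns_of_type(metadata, "condition")
print(channels)         # ['FSC-A', 'FITC-A', 'FITC_ratio']
print(condition_names)  # ['Dox', 'Morpho']
```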
history

The IOperation operations that have been applied to the raw data to result in this Experiment.

Type: List(IOperation)
statistics

The statistics and parameters computed by models that were fit to the data. The key is a (Str, Str) tuple, where the first Str is the name of the operation that supplied the statistic, and the second Str is the name of the statistic. The value is a multi-indexed pandas.Series: each level of the index is a facet, and each combination of indices is a subset for which the statistic was computed. The values of the series are the values of the computed parameters or statistics for each subset.

Type: Dict((Str, Str) : pandas.Series)
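As a rough illustration of the statistics layout, the sketch below uses plain pandas (not cytoflow) to compute a per-subset mean. Grouping by two facets produces the kind of multi-indexed pandas.Series described above. The toy data and the key ("MeanByDoxStrain", "mean") are made up for the example.

```python
import pandas as pd

# Toy per-event data: one channel and two conditions (hypothetical values).
data = pd.DataFrame({
    "FITC-A": [100.0, 120.0, 300.0, 340.0, 50.0, 70.0],
    "Dox": [1.0, 1.0, 10.0, 10.0, 1.0, 10.0],
    "Strain": ["A", "A", "A", "A", "B", "B"],
})

statistics = {}

# Grouping by two facets yields a Series with a two-level MultiIndex
# (Dox, Strain); each index entry identifies one subset of the events.
stat = data.groupby(["Dox", "Strain"])["FITC-A"].mean()
statistics[("MeanByDoxStrain", "mean")] = stat  # hypothetical op/stat names

s = statistics[("MeanByDoxStrain", "mean")]
print(list(s.index.names))  # ['Dox', 'Strain']
print(s.loc[(10.0, "A")])   # 320.0
```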
channels

The channels that this experiment tracks (read-only).

Type: List(String)
conditions

The experimental conditions and analysis groups (gate membership, etc.) that this experiment tracks. The key is the name of the condition, and the value is a pandas.Series with that condition’s possible values.

Type: Dict(String : pandas.Series)
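A minimal pandas sketch of what the conditions mapping holds, assuming the condition column names are already known (for example, from metadata). Whether cytoflow builds the mapping exactly this way is an assumption, and the data values are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    "FITC-A": [100.0, 120.0, 300.0, 340.0],
    "Dox": [1.0, 1.0, 10.0, 10.0],
    "Morpho": [True, False, True, True],  # e.g. gate membership
})
condition_names = ["Dox", "Morpho"]  # assumed known from metadata

# One Series of distinct values per condition, in order of first appearance.
conditions = {name: pd.Series(data[name].unique()) for name in condition_names}
print(conditions["Dox"].tolist())     # [1.0, 10.0]
print(conditions["Morpho"].tolist())  # [True, False]
```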
Notes

The OOP programmer in me desperately wanted to subclass pandas.DataFrame, add some flow-specific stuff, and move on with my life. (I may still, with something like https://github.com/dalejung/pandas-composition.) A few things get in the way of directly subclassing pandas.DataFrame:

- First, to enable some of the delicious syntactic sugar for accessing its contents, pandas.DataFrame redefines __getattr__() and __setattr__(), and making it recognize (and maintain across copies) additional attributes is an unsupported (non-public) API feature and introduces other subclassing weirdness.
- Second, many of the operations (like appending!) don’t happen in-place; they return copies instead. It’s cleaner to simply manage that copying ourselves instead of making the client deal with it. We can pretend to operate on the data in-place.
To maintain the ease of use, we’ll override __getitem__() and pass it to the wrapped pandas.DataFrame. We’ll do the same with some of the more useful DataFrame API pieces (like query()); and of course, you can just get the data frame itself with Experiment.data.

Examples
>>> import cytoflow as flow
>>> tube1 = flow.Tube(file = 'cytoflow/tests/data/Plate01/RFP_Well_A3.fcs',
...                   conditions = {"Dox" : 10.0})
>>> tube2 = flow.Tube(file='cytoflow/tests/data/Plate01/CFP_Well_A4.fcs',
...                   conditions = {"Dox" : 1.0})
>>>
>>> import_op = flow.ImportOp(conditions = {"Dox" : "float"},
...                           tubes = [tube1, tube2])
>>>
>>> ex = import_op.apply()
>>> ex.data.shape
(20000, 17)
>>> ex.data.groupby(['Dox']).size()
Dox
1     10000
10    10000
dtype: int64
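The composition approach described in the Notes can be sketched with a small wrapper class. FrameWrapper and append_rows() are hypothetical names used only for illustration: they show delegating __getitem__() and query() to a wrapped pandas.DataFrame and rebinding the attribute after a copy-returning operation, not how Experiment is actually implemented.

```python
import pandas as pd


class FrameWrapper:
    """Hypothetical illustration of composition over subclassing.

    Holds a pandas.DataFrame in self.data instead of inheriting from it.
    """

    def __init__(self, data=None):
        self.data = data if data is not None else pd.DataFrame()

    def __getitem__(self, key):
        # Delegate column access to the wrapped DataFrame.
        return self.data[key]

    def query(self, expr, **kwargs):
        # DataFrame.query returns a new frame; wrap it in a new object
        # rather than mutating this one.
        return FrameWrapper(self.data.query(expr, **kwargs))

    def append_rows(self, rows):
        # pd.concat returns a copy; rebinding self.data lets callers
        # pretend the append happened in place.
        self.data = pd.concat([self.data, rows], ignore_index=True)


w = FrameWrapper(pd.DataFrame({"FSC-A": [1.0, 2.0], "Dox": [1.0, 10.0]}))
w.append_rows(pd.DataFrame({"FSC-A": [3.0], "Dox": [10.0]}))
print(len(w.data))                               # 3
print(w["FSC-A"].tolist())                       # [1.0, 2.0, 3.0]
print(w.query("Dox == 10.0")["FSC-A"].tolist())  # [2.0, 3.0]
```

Because query() returns a new wrapper rather than mutating self.data, the object it was called on keeps all of its events.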
subset(conditions, values)

Returns a subset of this experiment including only the events where each condition in conditions equals the corresponding value in values.

Parameters:
- conditions (Str or Tuple(Str)) – A condition or list of conditions
- values (Any or Tuple(Any)) – The value(s) of the condition(s)

Returns: A new Experiment containing only the events specified in conditions and values.

Return type: Experiment
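The equality matching that subset() describes can be approximated with a boolean mask in plain pandas. subset_frame() below is a hypothetical helper, not part of cytoflow; it operates on a bare DataFrame and returns a filtered copy rather than a new Experiment.

```python
import pandas as pd


def subset_frame(df, conditions, values):
    """Hypothetical helper: keep rows where each condition equals its value.

    Accepts a single condition/value pair or parallel tuples.
    """
    if isinstance(conditions, str):
        conditions, values = (conditions,), (values,)
    if len(conditions) != len(values):
        raise ValueError("conditions and values must have the same length")
    mask = pd.Series(True, index=df.index)
    for name, value in zip(conditions, values):
        mask &= df[name] == value  # every condition must match
    return df[mask]


df = pd.DataFrame({"Dox": [1.0, 10.0, 10.0], "Strain": ["A", "A", "B"]})
print(len(subset_frame(df, "Dox", 10.0)))                     # 2
print(len(subset_frame(df, ("Dox", "Strain"), (10.0, "B"))))  # 1
```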
query(expr, **kwargs)

Return an experiment whose data is a subset of this one where expr evaluates to True.

This method “sanitizes” column names first, replacing characters that are not valid in a Python identifier with an underscore _. So, the column name a column becomes a_column, and can be queried with a_column == True or such.

Parameters:
- expr (string) – The expression to pass to pandas.DataFrame.query(). Must be a valid Python expression, something you could pass to eval().
- **kwargs (dict) – Other named parameters to pass to pandas.DataFrame.query().

Returns: A new Experiment, a clone of this one with the data returned by pandas.DataFrame.query()

Return type: Experiment
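A sketch of the sanitize-then-query idea, assuming "sanitizing" means replacing every character outside [A-Za-z0-9_] with an underscore. query_sanitized() is hypothetical; restoring the original column names on the result is a choice made by this sketch, not something the description above guarantees.

```python
import re

import pandas as pd


def sanitize(name):
    # Replace every character that is not a letter, digit, or underscore.
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def query_sanitized(df, expr, **kwargs):
    """Hypothetical helper: query on sanitized names, then restore them."""
    renamed = {col: sanitize(col) for col in df.columns}
    result = df.rename(columns=renamed).query(expr, **kwargs)
    return result.rename(columns={new: old for old, new in renamed.items()})


df = pd.DataFrame({"a column": [True, False, True], "FSC-A": [1.0, 2.0, 3.0]})
hits = query_sanitized(df, "a_column == True and FSC_A > 1.5")
print(hits["FSC-A"].tolist())  # [3.0]
```

This simple version does not handle two columns that sanitize to the same name, or names that start with a digit. Recent pandas versions can also refer to such columns directly by quoting them in backticks inside query().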
clone()

Create a copy of this Experiment. metadata and statistics are deep copies; history is a shallow copy; and …..
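What deep versus shallow means for clone() can be seen with Python's copy module. The Op class and the dictionaries below are hypothetical stand-ins for an IOperation and for the metadata attribute.

```python
import copy


class Op:
    """Hypothetical stand-in for an IOperation."""

    def __init__(self, name):
        self.name = name


metadata = {"FSC-A": {"type": "channel"}}
history = [Op("import")]

meta_copy = copy.deepcopy(metadata)  # deep: nested dicts are new objects
hist_copy = copy.copy(history)       # shallow: new list, same Op objects

meta_copy["FSC-A"]["type"] = "changed"
hist_copy.append(Op("gate"))

print(metadata["FSC-A"]["type"])     # channel  (original untouched)
print(len(history), len(hist_copy))  # 1 2      (lists are independent)
print(hist_copy[0] is history[0])    # True     (elements are shared)
```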
add_condition(name, dtype, data=None)

Add a new column of per-event metadata to this Experiment.

Note: add_condition() operates in place.

There are two places to call add_condition():

- As you’re setting up a new Experiment, call add_condition() with data set to None to specify the conditions the new events will have.
- If you compute some new per-event metadata on an existing Experiment, call add_condition() to add it.
Parameters:
- name (String) – The name of the new column in data. Must be a valid Python identifier: must start with [A-Za-z_] and contain only the characters [A-Za-z0-9_].
- dtype (String) – The type of the new column in data. Must be a string that pandas.Series recognizes as a dtype: common types are category, float, int, and bool.
- data (pandas.Series (default = None)) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a pandas.Series of type dtype. If None, will add an empty column to the Experiment… but the Experiment must be empty to do so!
Raises: CytoflowError – If the pandas.Series passed in data isn’t the same length as data, or isn’t convertible to type dtype.

Examples

>>> import cytoflow as flow
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
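The constraints listed above (identifier-style names, equal lengths, conversion to dtype, and empty columns only on an empty frame) can be sketched against a bare pandas.DataFrame. add_condition_column() is a hypothetical helper that raises ValueError where the real method raises CytoflowError; it is not cytoflow's implementation.

```python
import re

import pandas as pd

# A valid Python identifier, per the rule stated for the name parameter.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def add_condition_column(df, name, dtype, data=None):
    """Hypothetical sketch: add a condition column to df in place."""
    if not IDENTIFIER.match(name):
        raise ValueError(f"{name!r} is not a valid Python identifier")
    if data is None:
        # An empty column only makes sense when there are no events yet.
        if len(df) > 0:
            raise ValueError("data=None is only allowed on an empty frame")
        df[name] = pd.Series(dtype=dtype)
        return
    if len(data) != len(df):
        raise ValueError("data must be the same length as the frame")
    # astype() raises if the values can't be converted to dtype.
    col = pd.Series(list(data)).astype(dtype)
    col.index = df.index  # align by position, not by any original index
    df[name] = col


frame = pd.DataFrame({"FSC-A": [1.0, 2.0, 3.0]})
add_condition_column(frame, "Strain", "category", ["A", "B", "A"])
print(frame["Strain"].dtype)  # category
```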
add_channel(name, data=None)

Add a new column of per-event data (as opposed to metadata) to this Experiment: i.e., something that was measured per cell, or derived from per-cell measurements.

Note: add_channel() operates in place.

Parameters:
- name (String) – The name of the new column to be added to data.
- data (pandas.Series) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a dtype of float64. If None, will add an empty column to the Experiment… but the Experiment must be empty to do so!
Raises: CytoflowError – If the pandas.Series passed in data isn’t the same length as data, or isn’t convertible to a dtype float64.

Examples

>>> ex.add_channel("FSC_over_2", ex.data["FSC-A"] / 2.0)
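Similarly, a hypothetical add_channel_column() sketch shows the length check and the float64 coercion on a bare pandas.DataFrame. It omits the data=None case and raises ValueError instead of CytoflowError.

```python
import pandas as pd


def add_channel_column(df, name, data):
    """Hypothetical sketch: add a per-event measurement to df in place."""
    if len(data) != len(df):
        raise ValueError("data must be the same length as the frame")
    # astype("float64") raises ValueError if a value can't be converted.
    df[name] = pd.Series(list(data)).astype("float64").to_numpy()


frame = pd.DataFrame({"FSC-A": [2.0, 4.0, 6.0]})
add_channel_column(frame, "FSC_over_2", frame["FSC-A"] / 2.0)
print(frame["FSC_over_2"].tolist())  # [1.0, 2.0, 3.0]
```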
add_events(data, conditions)

Add new events to this Experiment.

Each new event in the data parameter is appended to the data attribute, and its per-event metadata columns will be set with the values specified in conditions. Thus, it is particularly useful for adding tubes of data to new experiments, before additional per-event metadata is added by gates, etc.

Note: Every column in data must be accounted for. Each column of type channel must appear in data; each column of metadata must have a key:value pair in conditions.
Parameters:
- data (pandas.DataFrame) – A single tube or well’s worth of data. Must be a DataFrame with the same columns as channels.
- conditions (Dict(Str, Any)) – A dictionary of the tube’s metadata. The keys must match conditions, and the values must be coercible to the relevant numpy dtype.
Raises: CytoflowError – add_events() pukes if:

- there are columns in data that aren’t channels in the experiment, or vice versa.
- there are keys in conditions that aren’t conditions in the experiment, or vice versa.
- there is metadata specified in conditions that can’t be converted to the corresponding metadata dtype.
Examples

>>> import cytoflow as flow
>>> import fcsparser
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
>>> _, tube1 = fcsparser.parse('CFP_Well_A4.fcs')  # parse() returns (meta, data)
>>> _, tube2 = fcsparser.parse('RFP_Well_A3.fcs')
>>> for channel in tube1.columns:  # every channel must exist before add_events()
...     ex.add_channel(channel)
>>> ex.add_events(tube1, {"Time" : 1, "Strain" : "BL21"})
>>> ex.add_events(tube2, {"Time" : 1, "Strain" : "Top10G"})
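The checks add_events() performs can be sketched in plain pandas. add_events_frame() is hypothetical: the caller passes the channel and condition names explicitly, it returns a new concatenated frame instead of modifying anything in place, and it skips the dtype coercion of condition values (categorical columns in particular would need extra handling).

```python
import pandas as pd


def add_events_frame(existing, new_events, conditions, channels, condition_names):
    """Hypothetical sketch: validate a tube of events and append it."""
    if set(new_events.columns) != set(channels):
        raise ValueError("new events must have exactly the experiment's channels")
    if set(conditions) != set(condition_names):
        raise ValueError("conditions must name exactly the experiment's conditions")
    new = new_events.copy()
    for name, value in conditions.items():
        new[name] = value  # broadcast the tube-level value to every event
    # Reorder to match the existing columns, then append as a new copy.
    return pd.concat([existing, new[list(existing.columns)]], ignore_index=True)


channels = ["FSC-A", "FITC-A"]
condition_names = ["Time", "Strain"]
existing = pd.DataFrame(
    {"FSC-A": [1.0], "FITC-A": [10.0], "Time": [1.0], "Strain": ["BL21"]}
)
tube = pd.DataFrame({"FSC-A": [2.0, 3.0], "FITC-A": [20.0, 30.0]})
combined = add_events_frame(
    existing, tube, {"Time": 1.0, "Strain": "Top10G"}, channels, condition_names
)
print(len(combined))                # 3
print(combined["Strain"].tolist())  # ['BL21', 'Top10G', 'Top10G']
```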