cytoflow.experiment
Defines Experiment, cytoflow’s main data structure.
Experiment – manages the data and metadata for a flow experiment.
- class cytoflow.experiment.Experiment
Bases: traits.has_traits.HasStrictTraits
An Experiment manages all the data and metadata for a flow experiment.
An Experiment is the central data structure in cytoflow: it wraps a pandas.DataFrame containing all the data from a flow experiment. Each row in the table is an event. Each column is either a measurement from one of the detectors (or a “derived” measurement such as a transformed value or a ratio), or a piece of metadata associated with that event: which tube it came from, what the experimental conditions for that tube were, gate membership, etc. The Experiment object lets you:
- Add additional metadata to define subpopulations.
- Get events that match a particular metadata signature.
Additionally, the Experiment object manages channel- and experiment-level metadata in the metadata attribute, which is a dictionary. This allows the rest of the modules in cytoflow to track and enforce other constraints that are important in doing quantitative flow cytometry: for example, every tube must be collected with the same channel parameters (such as PMT voltage).
Note
Experiment is not responsible for enforcing these constraints; ImportOp and the other modules are.
- data
All the events and metadata represented by this experiment. Each event is a row; each column is either a measured channel (e.g. a fluorescence measurement), a derived channel (e.g. the ratio between two channels), or a piece of metadata. Metadata can be either experimental conditions (e.g. induction level, timepoint) or added by operations (e.g. gate membership).
- Type
pandas.DataFrame
- metadata
Each column in data has an entry in metadata whose key is the column name and whose value is a dict of column-specific metadata. Metadata is added by operations, and is occasionally useful if modules are expected to work together. See individual operations’ documentation for a list of the metadata that each operation adds. The only “required” metadata is type, which can be channel (if the column is a measured channel, or derived from one) or condition (if the column is an experimental condition, gate membership, etc.).
- Type
Dict(Str : Dict(Str : Any))
- history
The IOperation operations that have been applied to the raw data to result in this Experiment.
- Type
List(IOperation)
- statistics
The statistics and parameters computed by models that were fit to the data. The key is a (Str, Str) tuple, where the first Str is the name of the operation that supplied the statistic, and the second Str is the name of the statistic. The value is a multi-indexed pandas.Series: each level of the index is a facet, and each combination of indices is a subset for which the statistic was computed. The values of the series are the values of the computed parameters or statistics for each subset.
- Type
Dict((Str, Str) : pandas.Series)
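To make the shape of statistics concrete, here is a small sketch built directly with pandas. The operation name "MeanFSC", the facets and the numbers are made up for illustration. Only two features follow the description above: the (operation name, statistic name) key and the multi-indexed Series value.

```python
import pandas as pd

# Made-up facets: every combination of Dox and Strain is one subset
index = pd.MultiIndex.from_tuples(
    [(1.0, "BL21"), (1.0, "Top10G"), (10.0, "BL21"), (10.0, "Top10G")],
    names=["Dox", "Strain"],
)

# Key: (name of the operation, name of the statistic); value: one number per subset.
# "MeanFSC" is a hypothetical operation name and the values are illustrative only.
statistics = {
    ("MeanFSC", "mean"): pd.Series([110.0, 95.0, 870.0, 910.0], index=index),
}

stat = statistics[("MeanFSC", "mean")]

# The value for a single subset (Dox == 10.0, Strain == "BL21")
one_subset = stat.loc[(10.0, "BL21")]

# All subsets with Dox == 10.0; the Dox level is dropped from the result's index
dox_10 = stat.xs(10.0, level="Dox")

print(one_subset)
print(dox_10)
```

Selecting with .loc on a full tuple picks out exactly one subset. Using .xs on a single level slices across all the remaining facets.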
- channels
The channels that this experiment tracks (read-only).
- Type
List(String)
- conditions
The experimental conditions and analysis groups (gate membership, etc.) that this experiment tracks. The key is the name of the condition, and the value is a pandas.Series with that condition’s possible values.
- Type
Dict(String : pandas.Series)
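Every column carries a type entry in metadata, so the channel names and condition names can in principle be read straight off that dictionary. The sketch below does this in plain Python. The column names and values are hypothetical. It illustrates the documented relationship between metadata, channels and conditions, not cytoflow's actual implementation of those read-only properties.

```python
# Hypothetical per-column metadata: only the required "type" key is shown
metadata = {
    "FSC-A": {"type": "channel"},       # measured channel
    "Y2-A": {"type": "channel"},        # measured channel
    "FSC_over_2": {"type": "channel"},  # derived channel
    "Dox": {"type": "condition"},       # experimental condition
    "Debris": {"type": "condition"},    # gate membership is also a condition
}

# Hypothetical per-event values for the condition columns
data = {
    "Dox": [1.0, 1.0, 10.0, 10.0],
    "Debris": [False, True, False, False],
}

channels = [name for name, meta in metadata.items() if meta["type"] == "channel"]
condition_names = [name for name, meta in metadata.items() if meta["type"] == "condition"]

# Map each condition to its possible values. cytoflow stores a pandas.Series
# here; a sorted list keeps this sketch free of dependencies.
conditions = {name: sorted(set(data[name])) for name in condition_names}

print(channels)
print(conditions)
```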
Notes
The OOP programmer in me desperately wanted to subclass pandas.DataFrame, add some flow-specific stuff, and move on with my life. (I may still, with something like https://github.com/dalejung/pandas-composition). A few things get in the way of directly subclassing pandas.DataFrame:
First, to enable some of the delicious syntactic sugar for accessing its contents, pandas.DataFrame redefines __getattr__ and __setattr__. Making it recognize (and maintain across copies) additional attributes relies on an unsupported (non-public) API feature and introduces other subclassing weirdness.
Second, many of the operations (like appending!) don’t happen in-place; they return copies instead. It’s cleaner to simply manage that copying ourselves instead of making the client deal with it. We can pretend to operate on the data in-place.
To maintain the ease of use, we’ll override __getitem__ and pass it to the wrapped pandas.DataFrame. We’ll do the same with some of the more useful pandas.DataFrame API pieces (like query); and of course, you can just get the data frame itself with Experiment.data.
Examples
>>> import cytoflow as flow
>>> tube1 = flow.Tube(file = 'cytoflow/tests/data/Plate01/RFP_Well_A3.fcs',
...                   conditions = {"Dox" : 10.0})
>>> tube2 = flow.Tube(file='cytoflow/tests/data/Plate01/CFP_Well_A4.fcs',
...                   conditions = {"Dox" : 1.0})
>>>
>>> import_op = flow.ImportOp(conditions = {"Dox" : "float"},
...                           tubes = [tube1, tube2])
>>>
>>> ex = import_op.apply()
>>> ex.data.shape
(20000, 17)
>>> ex.data.groupby(['Dox']).size()
Dox
1     10000
10    10000
dtype: int64
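The Notes above describe a composition design: wrap a pandas.DataFrame, delegate __getitem__ and query to it, and handle pandas' copy-returning operations internally. The sketch below is a minimal version of that idea. FrameWrapper and add_column are hypothetical names, and the class is not cytoflow's Experiment, which also involves the traits machinery, metadata and history.

```python
import pandas as pd


class FrameWrapper:
    """Hypothetical minimal wrapper: composition instead of subclassing DataFrame."""

    def __init__(self, data: pd.DataFrame):
        self.data = data

    def __getitem__(self, key):
        # Delegate column access to the wrapped DataFrame
        return self.data[key]

    def query(self, expr, **kwargs):
        # DataFrame.query returns a new frame; wrap it in a new object
        return FrameWrapper(self.data.query(expr, **kwargs))

    def add_column(self, name, values):
        # DataFrame.assign returns a copy; rebinding self.data lets callers
        # treat this as an in-place operation
        self.data = self.data.assign(**{name: values})


w = FrameWrapper(pd.DataFrame({"Dox": [1.0, 10.0, 10.0], "FSC_A": [100.0, 250.0, 300.0]}))
high = w.query("Dox == 10.0")
w.add_column("FSC_half", w["FSC_A"] / 2.0)
```

The client only ever holds the wrapper. Whether pandas copies the data under the hood becomes the wrapper's concern, not the caller's.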
- subset(conditions, values)
Returns a subset of this experiment including only the events where each condition in conditions equals the corresponding value in values.
- Parameters
conditions (Str or List(Str)) – A condition or list of conditions
values (Any or Tuple(Any)) – The value(s) of the condition(s)
- Returns
A new Experiment containing only the events specified in conditions and values.
- Return type
Experiment
Note
This is a wrapper around pandas.DataFrame.groupby and pandas.core.groupby.GroupBy.get_group. That means you can pass other things in conditions; see the pandas.DataFrame.groupby documentation for details.
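Because subset wraps groupby and get_group, its selection logic can be previewed with plain pandas. The frame below is made up. Going by the signature above, the corresponding Experiment calls would presumably be ex.subset("Dox", 10.0) and ex.subset(["Dox", "Strain"], (10.0, "BL21")).

```python
import pandas as pd

# Made-up events: two conditions and one channel
df = pd.DataFrame({
    "Dox": [1.0, 1.0, 10.0, 10.0],
    "Strain": ["BL21", "Top10G", "BL21", "Top10G"],
    "FSC_A": [100.0, 120.0, 250.0, 300.0],
})

# One condition: a scalar key and a scalar value
one = df.groupby("Dox").get_group(10.0)

# Several conditions: a list of keys and a tuple of values in the same order
both = df.groupby(["Dox", "Strain"]).get_group((10.0, "BL21"))
```

The order of values in the tuple must match the order of the keys passed to groupby.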
- query(expr, **kwargs)
Return an experiment whose data is a subset of this one where expr evaluates to True.
This method “sanitizes” column names first, replacing characters that are not valid in a Python identifier with an underscore (_). So the column name a column becomes a_column, and can be queried with an expression such as a_column == True.
- Parameters
expr (string) – The expression to pass to pandas.DataFrame.query. Must be a valid Python expression, something you could pass to eval.
**kwargs (dict) – Other named parameters to pass to pandas.DataFrame.query.
- Returns
A new Experiment, a clone of this one with the data returned by pandas.DataFrame.query.
- Return type
Experiment
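The sanitizing step can be sketched with a regular expression and DataFrame.rename. The helper sanitize_column_name is hypothetical and uses an ASCII-only rule. cytoflow's exact rule may differ, for example in how it treats non-ASCII letters or a leading digit. Plain pandas can also reach such columns without renaming, by backtick-quoting them inside the expression (for example `a column` == True).

```python
import re

import pandas as pd


def sanitize_column_name(name: str) -> str:
    """Hypothetical helper: replace every character outside [A-Za-z0-9_] with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


# Hypothetical data whose column names are not valid identifiers
df = pd.DataFrame({
    "a column": [True, False, True],
    "FSC-A": [100.0, 250.0, 300.0],
})

# DataFrame.rename accepts a function that is applied to each column label
clean = df.rename(columns=sanitize_column_name)
result = clean.query("a_column == True")
print(list(clean.columns))  # ['a_column', 'FSC_A']
```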
- clone(deep=True)
Create a copy of this Experiment. metadata, statistics and history are deep copies; whether or not data is a deep copy depends on the value of the deep parameter.
Warning
The intent is that deep is set to False by operations that are only adding columns to the underlying pandas.DataFrame. This reduces memory use. However, the resulting Experiment CANNOT BE MODIFIED IN-PLACE, because doing so will affect the other Experiments that are clones of the one being modified.
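The warning comes down to shared underlying data. The sketch below uses a plain-Python analogy, a dict of lists standing in for a DataFrame, to show why adding columns to a shallow copy is safe while editing existing values is not. It is not pandas code. pandas' own copy semantics vary by version (newer releases with copy-on-write may not share data this way), so treat it as illustrative only.

```python
import copy

# A dict of lists stands in for a DataFrame (an analogy, not pandas)
original = {"FSC_A": [100.0, 250.0]}

shallow = dict(original)           # new container, shared column objects
deep = copy.deepcopy(original)     # new container and new column objects

# Adding a column to the shallow copy leaves the original untouched
shallow["FSC_half"] = [50.0, 125.0]

# Editing the deep copy in place leaves the original untouched
deep["FSC_A"][0] = -1.0

# Editing a shared column of the shallow copy in place changes the original too
shallow["FSC_A"][0] = 0.0
```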
- add_condition(name, dtype, data=None)
Add a new column of per-event metadata to this Experiment.
Note
add_condition operates in place.
There are two places to call add_condition:
- As you’re setting up a new Experiment, call add_condition with data set to None to specify the conditions the new events will have.
- If you compute some new per-event metadata on an existing Experiment, call add_condition to add it.
- Parameters
name (String) – The name of the new column in data. Must be a valid Python identifier: must start with [A-Za-z_] and contain only the characters [A-Za-z0-9_].
dtype (String) – The type of the new column in data. Must be a string that pandas.Series recognizes as a dtype: common types are category, float, int, and bool.
data (pandas.Series (default = None)) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a pandas.Series of type dtype. If None, will add an empty column to the Experiment… but the Experiment must be empty to do so!
- Raises
CytoflowError – If the pandas.Series passed in data isn’t the same length as data, or isn’t convertible to type dtype.
Examples
>>> import cytoflow as flow
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
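The constraints listed for add_condition can be checked before calling it. Below is a hypothetical helper, validate_new_condition, that is not part of cytoflow. It raises ValueError as a stand-in for CytoflowError and takes the experiment's current number of events as n_events.

```python
import re

import pandas as pd

# The naming rule stated above: one of [A-Za-z_], then any of [A-Za-z0-9_]
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_new_condition(name, dtype, data, n_events):
    """Hypothetical pre-flight check; raises ValueError where cytoflow raises CytoflowError."""
    if _IDENTIFIER.fullmatch(name) is None:
        raise ValueError(f"{name!r} is not a valid condition name")
    if data is None:
        # An empty column is only allowed on an empty experiment
        if n_events != 0:
            raise ValueError("data=None requires an empty experiment")
        return pd.Series([], dtype=dtype)
    if len(data) != n_events:
        raise ValueError("data must have one value per event")
    # astype raises (e.g. ValueError) if the values can't be converted to dtype
    return pd.Series(data).astype(dtype)


strain = validate_new_condition("Strain", "category", ["BL21", "Top10G"], 2)
```

An explicit ASCII pattern is used rather than str.isidentifier(), because isidentifier() also accepts non-ASCII letters, which the stated rule excludes.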
- add_channel(name, data=None)
Add a new column of per-event data (as opposed to metadata) to this Experiment, i.e. something that was measured per cell, or derived from per-cell measurements.
Note
add_channel operates in place.
- Parameters
name (String) – The name of the new column to be added to data.
data (pandas.Series) – The pandas.Series to add to data. Must be the same length as data, and it must be convertible to a dtype of float64. If None, will add an empty column to the Experiment… but the Experiment must be empty to do so!
- Raises
CytoflowError – If the pandas.Series passed in data isn’t the same length as data, or isn’t convertible to a dtype of float64.
Examples
>>> ex.add_channel("FSC_over_2", ex.data["FSC-A"] / 2.0)
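The two documented constraints on a new channel are the length match and float64 convertibility. The sketch below checks them with a hypothetical helper, validate_new_channel, that is not part of cytoflow. It raises ValueError as a stand-in for CytoflowError.

```python
import pandas as pd


def validate_new_channel(data, n_events):
    """Hypothetical pre-flight check mirroring add_channel's documented constraints."""
    if len(data) != n_events:
        raise ValueError("data must have one value per event")
    # Integer values are widened to float64; non-numeric values make astype raise ValueError
    return pd.Series(data).astype("float64")


# Integer-valued input becomes a float64 channel, so halving it is exact
half = validate_new_channel([3, 5, 8], 3) / 2.0
print(half.tolist())  # [1.5, 2.5, 4.0]
```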
- add_events(data, conditions)
Add new events to this Experiment. add_events operates in place, modifying the Experiment object that it’s called on.
Each new event in the data parameter is appended to Experiment.data, and its per-event metadata columns are set to the values specified in conditions. This makes it particularly useful for adding tubes of data to new experiments, before additional per-event metadata is added by gates, etc.
Note
Every column in Experiment.data must be accounted for. Each column of type channel must appear in data; each column of metadata must have a key:value pair in conditions.
- Parameters
data (pandas.DataFrame) – A single tube or well’s worth of data. Must be a DataFrame with the same columns as channels.
conditions (Dict(Str, Any)) – A dictionary of the tube’s metadata. The keys must match conditions, and the values must be coercible to the relevant numpy dtype.
- Raises
CytoflowError – Raised by add_events if:
- there are columns in data that aren’t channels in the experiment, or vice versa;
- there are keys in conditions that aren’t conditions in the experiment, or vice versa;
- there is metadata specified in conditions that can’t be converted to the corresponding metadata dtype.
Examples
>>> import cytoflow as flow
>>> import fcsparser
>>> ex = flow.Experiment()
>>> ex.add_condition("Time", "float")
>>> ex.add_condition("Strain", "category")
>>> # fcsparser.parse returns a (metadata, data) tuple
>>> _, tube1 = fcsparser.parse('CFP_Well_A4.fcs')
>>> _, tube2 = fcsparser.parse('RFP_Well_A3.fcs')
>>> # every column of the new events must already be a channel of ex
>>> for channel in tube1.columns:
...     ex.add_channel(channel)
>>> ex.add_events(tube1, {"Time" : 1, "Strain" : "BL21"})
>>> ex.add_events(tube2, {"Time" : 1, "Strain" : "Top10G"})
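The first two failure cases listed under Raises are set comparisons in both directions ("or vice versa"). The sketch below checks them with a hypothetical helper, check_new_events, that is not part of cytoflow. It returns a list of problems instead of raising, and it skips the third rule (dtype conversion of condition values).

```python
def check_new_events(columns, conditions, exp_channels, exp_conditions):
    """Hypothetical pre-flight check for the channel and condition rules of add_events.

    Returns a list of human-readable problems; an empty list means the tube fits.
    """
    problems = []
    # Symmetric difference catches both extra and missing names
    mismatch = set(columns) ^ set(exp_channels)
    if mismatch:
        problems.append(f"channel mismatch: {sorted(mismatch)}")
    mismatch = set(conditions) ^ set(exp_conditions)
    if mismatch:
        problems.append(f"condition mismatch: {sorted(mismatch)}")
    return problems


channels = ["FSC-A", "Y2-A"]
condition_names = ["Time", "Strain"]

ok = check_new_events(["FSC-A", "Y2-A"], {"Time": 1, "Strain": "BL21"}, channels, condition_names)
bad = check_new_events(["FSC-A"], {"Time": 1}, channels, condition_names)
print(ok)   # []
print(bad)  # ["channel mismatch: ['Y2-A']", "condition mismatch: ['Strain']"]
```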