# cytoflow.utility.util_functions

Useful utility functions

cytoflow.utility.util_functions.iqr(a)

Calculate the inter-quartile range for an array of numbers.

Parameters: a (array_like) – The array of numbers to compute the IQR for.

Returns: The IQR of the data.

Return type: float
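
The inter-quartile range is the distance between the 75th and 25th percentiles. The block below is a minimal sketch of that definition using `numpy.percentile`; the name `iqr_sketch` is hypothetical and this is not necessarily how `iqr()` is implemented.

```python
import numpy as np


def iqr_sketch(a):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    a = np.asarray(a, dtype=float)
    q75, q25 = np.percentile(a, [75, 25])
    return float(q75 - q25)


# Nine evenly spaced values: the quartiles fall on 3 and 7.
print(iqr_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```
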
cytoflow.utility.util_functions.num_hist_bins(a)

Calculate the number of histogram bins using the Freedman-Diaconis rule.

Parameters: a (array_like) – The data to make a histogram of.

Returns: The number of bins in the histogram.

Return type: int
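
The Freedman-Diaconis rule sets the bin width to `2 * IQR / n**(1/3)` and divides the data range by that width. The following is a sketch of the rule, not the library's code: the name `num_hist_bins_sketch` is hypothetical, and falling back to a single bin when the IQR is zero is an assumption made here to avoid dividing by zero.

```python
import numpy as np


def num_hist_bins_sketch(a):
    """Number of histogram bins from the Freedman-Diaconis rule."""
    a = np.asarray(a, dtype=float)
    q75, q25 = np.percentile(a, [75, 25])
    # Bin width: 2 * IQR / cube root of the number of observations.
    width = 2.0 * (q75 - q25) / np.cbrt(a.size)
    if width == 0:
        # Assumption of this sketch: degenerate data gets a single bin.
        return 1
    return int(np.ceil((a.max() - a.min()) / width))


print(num_hist_bins_sketch(range(1, 11)))
```
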
cytoflow.utility.util_functions.geom_mean(a)

Compute the geometric mean for an “arbitrary” data set, i.e., one that contains zeros and negative numbers.

Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.

Returns: The geometric mean of the input array.

Notes

The traditional geometric mean cannot be computed on a mixture of positive and negative numbers. The approach here, validated rigorously in the cited paper, is to compute the geometric mean of the absolute values of the negative numbers separately, and then take a weighted arithmetic mean of that and the geometric mean of the positive numbers. Zero values are discarded, on the assumption that in this context few or no observations have a value of exactly 0.
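
The approach in these notes can be sketched as below. This is an illustration under stated assumptions rather than the library's implementation: the name `geom_mean_sketch` is hypothetical, the weights are taken to be each group's share of the non-zero observations, and the negative group is assumed to enter the weighted mean with a negative sign.

```python
import numpy as np


def geom_mean_sketch(a):
    """Geometric mean of data that may contain zeros and negatives."""
    a = np.asarray(a, dtype=float)
    a = a[a != 0]                 # discard exact zeros
    if a.size == 0:
        return float("nan")
    pos = a[a > 0]
    neg = np.abs(a[a < 0])
    # Geometric mean of each group via the mean of logs; 0.0 if the group is empty.
    pos_gm = np.exp(np.log(pos).mean()) if pos.size else 0.0
    neg_gm = np.exp(np.log(neg).mean()) if neg.size else 0.0
    # Assumption: weights are group sizes, negative group carries a minus sign.
    return float((pos.size * pos_gm - neg.size * neg_gm) / a.size)


print(geom_mean_sketch([2, 8, -4, 0]))
```

With only positive inputs the sketch reduces to the ordinary geometric mean.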

References

Elsayed A. E. Habib. Geometric mean for negative and zero values. International Journal of Research and Reviews in Applied Sciences 11:419 (2012). http://www.arpapress.com/Volumes/Vol11Issue3/IJRRAS_11_3_08.pdf
cytoflow.utility.util_functions.geom_sd(a)

Compute the geometric standard deviation for an “arbitrary” data set, i.e., one that contains zeros and negative numbers. Since we’re in log space, this gives a dimensionless scaling factor, not a measure. If you want traditional “error bars”, don’t plot [geom_mean - geom_sd, geom_mean + geom_sd]; rather, plot [geom_mean / geom_sd, geom_mean * geom_sd].

Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.

Returns: The geometric standard deviation of the distribution.

Notes

As with geom_mean(), non-positive numbers pose a problem. The approach here, though less rigorously validated than the one above, is to replace negative numbers with their absolute value plus 2 * geometric mean, then proceed as described on the Wikipedia page for the geometric standard deviation.
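
To illustrate why the geometric standard deviation is applied multiplicatively, here is a minimal sketch restricted to strictly positive data. The name `geom_sd_range_sketch` is hypothetical, the population form of the standard deviation (`ddof=0`) is assumed, and the substitution for non-positive values described above is deliberately left out.

```python
import numpy as np


def geom_sd_range_sketch(a):
    """Return (geom_mean / geom_sd, geom_mean * geom_sd) for positive data."""
    a = np.asarray(a, dtype=float)
    if np.any(a <= 0):
        raise ValueError("this sketch only handles strictly positive values")
    logs = np.log(a)
    gm = np.exp(logs.mean())    # geometric mean
    gsd = np.exp(logs.std())    # geometric sd, population form (ddof=0)
    return float(gm / gsd), float(gm * gsd)


low, high = geom_sd_range_sketch([1.0, 10.0, 100.0])
print(low, high)
```

The interval is symmetric around the geometric mean on a log axis, which is why the bounds are formed by division and multiplication rather than subtraction and addition.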

cytoflow.utility.util_functions.geom_sd_range(a)

A convenience function to compute [geom_mean / geom_sd, geom_mean * geom_sd].

Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.

Returns: A tuple, with (geom_mean / geom_sd, geom_mean * geom_sd).
cytoflow.utility.util_functions.geom_sem(a)

Compute the geometric standard error of the mean for an “arbitrary” data set, i.e., one that contains zeros and negative numbers.

Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.

Returns: The geometric standard error of the mean of the distribution.

Notes

As with geom_mean(), non-positive numbers pose a problem. The approach here, though less rigorously validated than the one used for geom_mean(), is to replace negative numbers with their absolute value plus 2 * geometric mean. The geometric SEM is computed as in Norris (1940), listed under References.

References

Nilan Norris. The Standard Errors of the Geometric and Harmonic Means and Their Application to Index Numbers. The Annals of Mathematical Statistics, Vol. 11, No. 4 (Dec., 1940), pp. 445-448. http://www.jstor.org/stable/2235723?seq=1#page_scan_tab_contents

cytoflow.utility.util_functions.geom_sem_range(a)

A convenience function to compute [geom_mean / geom_sem, geom_mean * geom_sem].

Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.

Returns: A tuple, with (geom_mean / geom_sem, geom_mean * geom_sem).
cytoflow.utility.util_functions.cartesian(arrays, out=None)

Generate a cartesian product of input arrays.

Parameters: arrays (list of array-like) – 1-D arrays to form the cartesian product of; out (ndarray) – Array to place the cartesian product in.

Returns: out – 2-D array of shape (M, len(arrays)) containing cartesian products formed of input arrays.

Return type: ndarray

Examples

>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])


References

Originally from http://stackoverflow.com/a/1235363/4755587
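
For small inputs, the same rows in the same order can be produced with the standard library. The sketch below uses `itertools.product`; the name `cartesian_sketch` is hypothetical, and this is a swapped-in technique for illustration rather than the implementation from the Stack Overflow answer.

```python
import itertools

import numpy as np


def cartesian_sketch(arrays):
    """Cartesian product of 1-D sequences as a 2-D array, last input varying fastest."""
    return np.array(list(itertools.product(*arrays)))


result = cartesian_sketch(([1, 2, 3], [4, 5], [6, 7]))
print(result.shape)
```
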

cytoflow.utility.util_functions.sanitize_identifier(name)

Makes name a valid Python identifier by replacing all unsafe characters with ‘_’.
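
One way to get this behavior is with two regular-expression substitutions. This is a sketch under assumptions: the name `sanitize_identifier_sketch` is hypothetical, "safe" is taken to mean ASCII letters, digits and underscores, and a leading digit is also replaced because an identifier cannot start with one.

```python
import re


def sanitize_identifier_sketch(name):
    """Replace characters that cannot appear in a Python identifier with '_'."""
    # Anything outside ASCII letters, digits and underscore becomes '_'.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # A leading digit is not allowed, so replace it as well.
    return re.sub(r"^[0-9]", "_", cleaned)


print(sanitize_identifier_sketch("FITC-A"))
print(sanitize_identifier_sketch("2nd channel"))
```
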

cytoflow.utility.util_functions.random_string(n)

Makes a random string of ASCII digits and lowercase letters of length n.
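
A minimal sketch of such a generator, using only the standard library. The name `random_string_sketch` is hypothetical, and `random.choice` is an assumption about the source of randomness; it is not suitable for security-sensitive tokens.

```python
import random
import string


def random_string_sketch(n):
    """Random string of length n drawn from lowercase ASCII letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(n))


print(random_string_sketch(8))
```
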

cytoflow.utility.util_functions.is_numeric(s)

Determine if a pandas.Series or numpy.ndarray is numeric from its dtype.
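
A dtype-based check can be sketched with the NumPy `dtype.kind` code. The name `is_numeric_sketch` is hypothetical, and treating only integer, unsigned, float and complex kinds as numeric (so booleans are excluded) is an assumption of this sketch.

```python
import numpy as np


def is_numeric_sketch(s):
    """True if the array's dtype is integer, unsigned, float or complex."""
    # dtype.kind is a one-letter code: i, u, f, c cover the numeric kinds used here.
    return s.dtype.kind in "iufc"


print(is_numeric_sketch(np.array([1.5, 2.5])))
print(is_numeric_sketch(np.array(["a", "b"])))
```
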

cytoflow.utility.util_functions.cov2corr(covariance)

Compute the correlation matrix from the covariance matrix.
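
Each correlation is the covariance divided by the product of the two standard deviations, which are the square roots of the diagonal. A minimal sketch of that conversion follows; the name `cov2corr_sketch` is hypothetical and the input is assumed to be a square matrix with a strictly positive diagonal.

```python
import numpy as np


def cov2corr_sketch(covariance):
    """Convert a covariance matrix to a correlation matrix."""
    cov = np.asarray(covariance, dtype=float)
    sd = np.sqrt(np.diag(cov))        # standard deviation of each variable
    return cov / np.outer(sd, sd)     # divide element (i, j) by sd[i] * sd[j]


cov = np.array([[4.0, 2.0],
                [2.0, 9.0]])
print(cov2corr_sketch(cov))
```
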