# cytoflow.utility.util_functions

Useful (mostly numeric) utility functions.

`iqr` – calculate the interquartile range for an array of numbers

`num_hist_bins` – calculate the number of histogram bins using the Freedman-Diaconis rule

`geom_mean` – compute the geometric mean

`geom_sd` – compute the geometric standard deviation

`geom_sd_range` – compute [geom_mean / geom_sd, geom_mean * geom_sd]

`geom_sem` – compute the geometric standard error of the mean

`geom_sem_range` – compute [geom_mean / geom_sem, geom_mean * geom_sem]

`cartesian` – generate the cartesian product of input arrays

`sanitize_identifier` – make a string a valid Python identifier by replacing all non-safe characters with ‘_’

`random_string` – make a random string of ASCII digits and lowercase letters

`is_numeric` – determine if a `pandas.Series` or `numpy.ndarray` is numeric from its dtype

`cov2corr` – compute the correlation matrix from the covariance matrix

cytoflow.utility.util_functions.iqr(a)

Calculate the inter-quartile range for an array of numbers.

Parameters

a (array_like) – The array of numbers to compute the IQR for.

Returns

The IQR of the data.

Return type

float
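
As a rough illustration of the quantity being computed, the sketch below takes the IQR as the 75th percentile minus the 25th, interpolating linearly between order statistics. The names `percentile` and `iqr_sketch` are hypothetical, and the interpolation rule is an assumption; the cytoflow implementation may differ in detail.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile() requires at least one value")
    pos = (len(data) - 1) * q / 100.0  # fractional index into the sorted data
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def iqr_sketch(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return percentile(values, 75) - percentile(values, 25)


print(iqr_sketch([1, 2, 3, 4, 5]))  # 2.0
```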

cytoflow.utility.util_functions.num_hist_bins(a)

Calculate the number of histogram bins using the Freedman-Diaconis rule.

Parameters

a (array_like) – The data to make a histogram of.

Returns

The number of bins in the histogram

Return type

int
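
For orientation, the Freedman-Diaconis rule sets the bin width to `2 * IQR / n^(1/3)` and divides the data range by that width. The sketch below is a dependency-free illustration of the rule, not the cytoflow code; the name `num_hist_bins_sketch` and the single-bin fallback for degenerate input are assumptions.

```python
import math


def num_hist_bins_sketch(values):
    """Freedman-Diaconis bin count: ceil(range / (2 * IQR / n**(1/3)))."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        return 1  # assumption: one bin when there is nothing to spread out

    def percentile(q):
        # Linear interpolation between order statistics.
        pos = (n - 1) * q / 100.0
        lower = math.floor(pos)
        upper = min(lower + 1, n - 1)
        return data[lower] + (data[upper] - data[lower]) * (pos - lower)

    iqr = percentile(75) - percentile(25)
    if iqr == 0:
        return 1  # assumption: fall back to one bin when the IQR is zero
    width = 2.0 * iqr / n ** (1.0 / 3.0)
    return max(1, math.ceil((data[-1] - data[0]) / width))


print(num_hist_bins_sketch(range(1, 101)))  # 5
```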

cytoflow.utility.util_functions.geom_mean(a)

Compute the geometric mean for an “arbitrary” data set, i.e. one that contains zeros and negative numbers.

Parameters

a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray

Returns

The geometric mean of the input array.

Notes

The traditional geometric mean cannot be computed on a mixture of positive and negative numbers. The approach here, validated rigorously in the cited paper, is to compute the geometric mean of the absolute values of the negative numbers separately, and then take a weighted arithmetic mean of that and the geometric mean of the positive numbers. Zero values are discarded, on the assumption that in this context few or no observations have a value of exactly 0.
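
The procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the cytoflow source: the weights are taken to be each group's share of all observations, the negative group enters with a minus sign, and the names `_gmean` and `geom_mean_sketch` are hypothetical.

```python
import math


def _gmean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geom_mean_sketch(values):
    """Signed geometric mean of data that may contain zeros and negatives."""
    values = list(values)
    if not values:
        raise ValueError("geom_mean_sketch() requires at least one value")
    pos = [v for v in values if v > 0]
    neg = [-v for v in values if v < 0]  # absolute values of the negatives
    # Zeros fall into neither group, so they contribute nothing to the result.
    n = len(values)  # assumption: weights are shares of all observations
    pos_term = _gmean(pos) * len(pos) / n if pos else 0.0
    neg_term = _gmean(neg) * len(neg) / n if neg else 0.0
    # Assumption: the negative group's mean is subtracted.
    return pos_term - neg_term


print(round(geom_mean_sketch([1, 10, 100]), 6))  # 10.0
```

With `[2, 8, 4, -4]` the positive group has geometric mean 4 and weight 3/4, the negative group has geometric mean 4 and weight 1/4, so the sketch returns approximately 2.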

References

Elsayed A. E. Habib, “Geometric mean for negative and zero values”, International Journal of Research and Reviews in Applied Sciences 11:419 (2012). http://www.arpapress.com/Volumes/Vol11Issue3/IJRRAS_11_3_08.pdf

cytoflow.utility.util_functions.geom_sd(a)

Compute the geometric standard deviation for an “arbitrary” data set, i.e. one that contains zeros and negative numbers. Because the computation is done in log space, the result is a dimensionless scaling factor, not a measure in the units of the data. If you want traditional “error bars”, don’t plot `[geom_mean - geom_sd, geom_mean + geom_sd]`; rather, plot `[geom_mean / geom_sd, geom_mean * geom_sd]`.

Parameters

a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray

Returns

The geometric standard deviation of the distribution.

Notes

As with `geom_mean`, non-positive numbers pose a problem. The approach here, though less rigorously validated than the one used for `geom_mean`, is to replace negative numbers with their absolute value plus 2 * geometric mean, and then proceed as described on the Wikipedia page for the geometric standard deviation.
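
A minimal sketch of that procedure, together with the multiplicative error bars recommended above, might look like this. It is an assumption-laden illustration rather than the cytoflow code: the log-space standard deviation uses divisor `n`, zeros are not handled, and all function names are hypothetical.

```python
import math


def _gmean(values):
    """Geometric mean of strictly positive values (0.0 for an empty group)."""
    if not values:
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def _signed_gmean(values):
    """Weighted signed geometric mean, as described in the geom_mean notes."""
    pos = [v for v in values if v > 0]
    neg = [-v for v in values if v < 0]
    return (_gmean(pos) * len(pos) - _gmean(neg) * len(neg)) / len(values)


def geom_sd_sketch(values):
    """Geometric SD: exp of the standard deviation of the log-values."""
    values = list(values)
    u = _signed_gmean(values)
    # Replace each negative value with |value| + 2 * geometric mean.
    shifted = [abs(v) + 2 * u if v < 0 else v for v in values]
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    # Assumption: population standard deviation (divisor n) in log space.
    sd_log = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / len(logs))
    return math.exp(sd_log)


def geom_sd_range_sketch(values):
    """Multiplicative error bars: (geom_mean / geom_sd, geom_mean * geom_sd)."""
    values = list(values)
    u = _signed_gmean(values)
    sd = geom_sd_sketch(values)
    return (u / sd, u * sd)
```

For `[1, 100]` the geometric mean and the geometric SD both come out at about 10, so the range is roughly `(1, 100)`: symmetric around the mean on a log axis, which is why the bars are formed by dividing and multiplying rather than subtracting and adding.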

cytoflow.utility.util_functions.geom_sd_range(a)

A convenience function to compute [geom_mean / geom_sd, geom_mean * geom_sd].

Parameters

a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray

Returns

A tuple, `(geom_mean / geom_sd, geom_mean * geom_sd)`.

cytoflow.utility.util_functions.geom_sem(a)

Compute the geometric standard error of the mean for an “arbitrary” data set, i.e. one that contains zeros and negative numbers.

Parameters

a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray

Returns

The geometric standard error of the mean of the distribution.

Notes

As with `geom_mean`, non-positive numbers pose a problem. The approach here, though less rigorously validated than the one used for `geom_mean`, is to replace negative numbers with their absolute value plus 2 * geometric mean. The geometric SEM is then computed as in Norris (1940), cited below.
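
One plausible reading, consistent with the multiplicative range `[geom_mean / geom_sem, geom_mean * geom_sem]`, is that the geometric SEM is the exponential of the standard error of the log-values. The sketch below assumes exactly that, uses the sample standard deviation (divisor `n - 1`), and accepts strictly positive input only; the replacement of negative values is omitted. The name `geom_sem_sketch` is hypothetical.

```python
import math


def geom_sem_sketch(values):
    """exp(standard error of the log-values); strictly positive input only."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    if n < 2:
        raise ValueError("geom_sem_sketch() requires at least two values")
    mean_log = sum(logs) / n
    # Assumption: sample variance (divisor n - 1) of the log-values.
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    return math.exp(math.sqrt(var_log / n))
```

Like the geometric standard deviation, the result is a dimensionless factor of at least 1, so it is applied by dividing and multiplying the geometric mean.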

References

Nilan Norris, “The Standard Errors of the Geometric and Harmonic Means and Their Application to Index Numbers”, The Annals of Mathematical Statistics, Vol. 11, No. 4 (Dec., 1940), pp. 445-448. http://www.jstor.org/stable/2235723?seq=1#page_scan_tab_contents

cytoflow.utility.util_functions.geom_sem_range(a)

A convenience function to compute [geom_mean / geom_sem, geom_mean * geom_sem].

Parameters

a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray

Returns

A tuple, `(geom_mean / geom_sem, geom_mean * geom_sem)`.

cytoflow.utility.util_functions.cartesian(arrays, out=None)

Generate a cartesian product of input arrays.

Parameters
• arrays (list of array-like) – 1-D arrays to form the cartesian product of.

• out (ndarray) – Array to place the cartesian product in.

Returns

out – 2-D array of shape (M, len(arrays)) containing cartesian products formed of input arrays.

Return type

ndarray

Examples

```
>>> from cytoflow.utility.util_functions import cartesian
>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])
```
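
For comparison, the same rows in the same order can be produced with the standard library alone. This is only a cross-check of the expected output, not how `cartesian` is implemented; it returns nested lists rather than an ndarray, and the name `cartesian_sketch` is hypothetical.

```python
from itertools import product


def cartesian_sketch(arrays):
    """Return the cartesian product of 1-D sequences as a list of rows."""
    # product() varies the last input fastest, matching the row order above.
    return [list(row) for row in product(*arrays)]


rows = cartesian_sketch(([1, 2, 3], [4, 5], [6, 7]))
print(len(rows))          # 12
print(rows[0], rows[-1])  # [1, 4, 6] [3, 5, 7]
```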

References

Originally from http://stackoverflow.com/a/1235363/4755587

cytoflow.utility.util_functions.sanitize_identifier(name)

Makes `name` a valid Python identifier by replacing all non-safe characters with ‘_’.
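
A minimal sketch of this kind of replacement is shown below. It assumes that “safe” means ASCII letters, digits, and the underscore, and it does not deal with names that begin with a digit; the name `sanitize_identifier_sketch` is hypothetical and the cytoflow function may behave differently.

```python
import re


def sanitize_identifier_sketch(name):
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    # Assumption: only ASCII letters, digits, and "_" count as safe.
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(sanitize_identifier_sketch("FITC-A"))     # FITC_A
print(sanitize_identifier_sketch("PE Tx Red"))  # PE_Tx_Red
```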

cytoflow.utility.util_functions.random_string(n)

Makes a random string of ascii digits and lowercase letters of length `n`
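
One way to build such a string with the standard library is sketched here. The name `random_string_sketch` is hypothetical, and the choice of `random.choices` is an assumption about how this could be done, not a description of the cytoflow code. The `random` module is not suitable for secrets.

```python
import random
import string

# Alphabet described above: ASCII digits plus lowercase letters.
ALPHABET = string.digits + string.ascii_lowercase


def random_string_sketch(n):
    """Return a random string of length n drawn from ALPHABET."""
    return "".join(random.choices(ALPHABET, k=n))


print(len(random_string_sketch(8)))  # 8
```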

cytoflow.utility.util_functions.is_numeric(s)

Determine if a `pandas.Series` or `numpy.ndarray` is numeric from its dtype.

cytoflow.utility.util_functions.cov2corr(covariance)

Compute the correlation matrix from the covariance matrix.
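
The conversion divides each covariance by the product of the two corresponding standard deviations, `corr[i][j] = cov[i][j] / (sd[i] * sd[j])`, where `sd[i]` is the square root of the i-th diagonal entry. The sketch below illustrates this on nested lists; `cov2corr_sketch` is a hypothetical name, and the cytoflow function presumably operates on NumPy arrays.

```python
import math


def cov2corr_sketch(cov):
    """Convert a square covariance matrix (nested lists) to a correlation matrix."""
    size = len(cov)
    # Standard deviations are the square roots of the diagonal variances.
    sd = [math.sqrt(cov[i][i]) for i in range(size)]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(size)]
        for i in range(size)
    ]


corr = cov2corr_sketch([[4.0, 2.0], [2.0, 9.0]])
print(corr[0][0], corr[1][1])  # 1.0 1.0
```

In this example the standard deviations are 2 and 3, so the off-diagonal correlation is 2 / (2 * 3), or about 0.333.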