cytoflow.utility.util_functions
Useful utility functions
-
cytoflow.utility.util_functions.iqr(a)
Calculate the inter-quartile range for an array of numbers.
Parameters: a (array_like) – The array of numbers to compute the IQR for.
Returns: The IQR of the data.
Return type: float
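The inter-quartile range is the distance between the 25th and 75th percentiles. The following is a minimal sketch of that definition using NumPy; `iqr_sketch` is a hypothetical name for illustration and is not claimed to match the cytoflow implementation.

```python
import numpy as np

def iqr_sketch(a):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    a = np.asarray(a, dtype=float)
    # np.percentile interpolates linearly between data points by default.
    q75, q25 = np.percentile(a, [75, 25])
    return float(q75 - q25)

print(iqr_sketch([1, 2, 3, 4, 5]))  # 2.0
```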
-
cytoflow.utility.util_functions.num_hist_bins(a)
Calculate the number of histogram bins using the Freedman-Diaconis rule.
From http://stats.stackexchange.com/questions/798/
Parameters: a (array_like) – The data to make a histogram of.
Returns: The number of bins in the histogram.
Return type: int
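The Freedman-Diaconis rule sets the bin width to 2 * IQR * n^(-1/3) and divides the data range by that width. A minimal sketch under those assumptions follows; `num_hist_bins_sketch` is a hypothetical name, and the single-bin fallback for a zero IQR is an assumption made here, not documented behavior.

```python
import numpy as np

def num_hist_bins_sketch(a):
    """Number of bins from the Freedman-Diaconis bin width."""
    a = np.asarray(a, dtype=float)
    q75, q25 = np.percentile(a, [75, 25])
    # Bin width: 2 * IQR * n^(-1/3)
    width = 2.0 * (q75 - q25) * a.size ** (-1.0 / 3.0)
    if width == 0:
        # Assumption: degenerate data (zero IQR) falls back to one bin.
        return 1
    return int(np.ceil((a.max() - a.min()) / width))

print(num_hist_bins_sketch([0, 1, 2, 3, 4, 5, 6, 10]))  # 3
```

In the example, the IQR is 3.5 and n is 8, so the bin width is about 3.5 and the range of 10 is covered by 3 bins.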
-
cytoflow.utility.util_functions.geom_mean(a)
Compute the geometric mean for an “arbitrary” data set, i.e. one that contains zeros and negative numbers.
Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.
Returns: The geometric mean of the input array.
Notes
The traditional geometric mean cannot be computed on a mixture of positive and negative numbers. The approach here, validated rigorously in the cited paper [1], is to compute the geometric mean of the absolute values of the negative numbers separately, and then take a weighted arithmetic mean of that and the geometric mean of the positive numbers. Values of exactly 0 are discarded, on the assumption that in this context there are few or no such observations.
References
[1] Elsayed A. E. Habib, “Geometric mean for negative and zero values”, International Journal of Research and Reviews in Applied Sciences 11:419 (2012). http://www.arpapress.com/Volumes/Vol11Issue3/IJRRAS_11_3_08.pdf
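One plausible reading of the approach described in the notes can be sketched as below. Several details are assumptions made for illustration rather than facts about cytoflow: the name `geom_mean_sketch`, weighting each group by its share of the nonzero values, and giving the negative group's geometric mean a negative sign in the weighted mean.

```python
import numpy as np

def geom_mean_sketch(a):
    """Weighted combination of the geometric means of positive and negative values."""
    a = np.asarray(a, dtype=float)
    pos = a[a > 0]
    neg = np.abs(a[a < 0])        # work with magnitudes of the negative values
    n = pos.size + neg.size       # zeros are discarded (assumption: also from the weights)
    if n == 0:
        return float("nan")
    # Geometric mean computed in log space: exp(mean(log(x)))
    gm_pos = np.exp(np.mean(np.log(pos))) if pos.size else 0.0
    gm_neg = np.exp(np.mean(np.log(neg))) if neg.size else 0.0
    # Assumption: the negative group contributes with a negative sign.
    return float((pos.size * gm_pos - neg.size * gm_neg) / n)

print(geom_mean_sketch([2, 8]))      # approximately 4
print(geom_mean_sketch([2, 8, 0]))   # approximately 4, the zero is dropped
```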
-
cytoflow.utility.util_functions.geom_sd(a)
Compute the geometric standard deviation for an “arbitrary” data set, i.e. one that contains zeros and negative numbers. Since we’re in log space, this gives a dimensionless scaling factor, not a measure. If you want traditional “error bars”, don’t plot [geom_mean - geom_sd, geom_mean + geom_sd]; rather, plot [geom_mean / geom_sd, geom_mean * geom_sd].
Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.
Returns: The geometric standard deviation of the distribution.
Notes
As with geom_mean(), non-positive numbers pose a problem. The approach here, though less rigorously validated than the one above, is to replace negative numbers with their absolute value plus 2 * geometric mean, then go about our business as per the Wikipedia page for geometric sd [1].
References
[1] https://en.wikipedia.org/wiki/Geometric_standard_deviation
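To illustrate why the geometric standard deviation is applied multiplicatively, here is a small sketch for strictly positive data only. It does not reproduce the handling of negative numbers described above; `geom_stats_positive` is a hypothetical helper, and the use of the population standard deviation (`ddof=0`) is an assumption.

```python
import numpy as np

def geom_stats_positive(a):
    """Geometric mean and geometric SD of strictly positive data."""
    logs = np.log(np.asarray(a, dtype=float))
    gm = np.exp(logs.mean())
    gsd = np.exp(logs.std())      # population SD of the logs (ddof=0, an assumption)
    return float(gm), float(gsd)

gm, gsd = geom_stats_positive([1.0, 10.0, 100.0])

# Multiplicative "error bars": symmetric around gm on a log axis.
low, high = gm / gsd, gm * gsd
print(gm, gsd, (low, high))
```

Because the factor is dimensionless, the interval satisfies `low * high == gm ** 2` (up to rounding) and `low` stays positive, which is what a log-scaled plot needs.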
-
cytoflow.utility.util_functions.geom_sd_range(a)
A convenience function to compute [geom_mean / geom_sd, geom_mean * geom_sd].
Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.
Returns: A tuple, with (geom_mean / geom_sd, geom_mean * geom_sd).
-
cytoflow.utility.util_functions.geom_sem(a)
Compute the geometric standard error of the mean for an “arbitrary” data set, i.e. one that contains zeros and negative numbers.
Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.
Returns: The geometric standard error of the mean of the distribution.
Notes
As with geom_mean(), non-positive numbers pose a problem. The approach here, though less rigorously validated than the one above, is to replace negative numbers with their absolute value plus 2 * geometric mean. The geometric SEM is computed as in [1].
References
[1] Nilan Norris, “The Standard Errors of the Geometric and Harmonic Means and Their Application to Index Numbers”, The Annals of Mathematical Statistics, Vol. 11, No. 4 (Dec., 1940), pp. 445-448. http://www.jstor.org/stable/2235723?seq=1#page_scan_tab_contents
-
cytoflow.utility.util_functions.geom_sem_range(a)
A convenience function to compute [geom_mean / geom_sem, geom_mean * geom_sem].
Parameters: a (array-like) – A numpy.ndarray, or something that can be converted to an ndarray.
Returns: A tuple, with (geom_mean / geom_sem, geom_mean * geom_sem).
-
cytoflow.utility.util_functions.cartesian(arrays, out=None)
Generate a cartesian product of input arrays.
Parameters: - arrays (list of array-like) – 1-D arrays to form the cartesian product of.
- out (ndarray) – Array to place the cartesian product in.
Returns: out – 2-D array of shape (M, len(arrays)) containing cartesian products formed of input arrays.
Return type: ndarray
Examples
>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])
References
Originally from http://stackoverflow.com/a/1235363/4755587
-
cytoflow.utility.util_functions.sanitize_identifier(name)
Make name a valid Python identifier by replacing all unsafe characters with ‘_’.
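A common way to do this kind of replacement is a single regular expression. The sketch below is an assumption about how such a function could work, not a copy of the cytoflow code; `sanitize_identifier_sketch` is a hypothetical name. It does not guard against Python keywords such as `class`.

```python
import re

def sanitize_identifier_sketch(name):
    """Replace non-word characters with '_' and prefix '_' to a leading digit."""
    # \W matches any character that is not a letter, digit, or underscore;
    # ^(?=\d) matches the empty position before a leading digit.
    return re.sub(r"\W|^(?=\d)", "_", name)

print(sanitize_identifier_sketch("FITC-A"))        # FITC_A
print(sanitize_identifier_sketch("2nd channel"))   # _2nd_channel
```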
-
cytoflow.utility.util_functions.random_string(n)
Make a random string of ASCII digits and lowercase letters of length n.
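A string like this can be drawn from the standard library alone. The following is a minimal sketch; `random_string_sketch` is a hypothetical name, and the choice of `random.choices` is an assumption rather than the documented implementation. It is not suitable for security-sensitive tokens.

```python
import random
import string

def random_string_sketch(n):
    """Random string of length n drawn from lowercase letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    # random.choices samples with replacement, so characters may repeat.
    return "".join(random.choices(alphabet, k=n))

print(random_string_sketch(8))
```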
-
cytoflow.utility.util_functions.is_numeric(s)
Determine if a pandas.Series or numpy.ndarray is numeric from its dtype.
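A dtype-based check can be sketched with NumPy's type hierarchy. This is an illustration under assumptions: `is_numeric_sketch` is a hypothetical name, booleans are treated as non-numeric here, and pandas extension dtypes (for example categoricals) are not handled.

```python
import numpy as np

def is_numeric_sketch(s):
    """True if the object's dtype is an integer, float, or complex type."""
    # Works for anything exposing a NumPy dtype, such as an ndarray
    # or a pandas.Series backed by a NumPy dtype.
    return bool(np.issubdtype(s.dtype, np.number))

print(is_numeric_sketch(np.array([1, 2, 3])))     # True
print(is_numeric_sketch(np.array(["a", "b"])))    # False
```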
-
cytoflow.utility.util_functions.cov2corr(covariance)
Compute the correlation matrix from the covariance matrix.
From https://github.com/AndreaCensi/procgraph/blob/master/src/procgraph_statistics/cov2corr.py
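The conversion divides each covariance entry by the product of the two corresponding standard deviations, which are the square roots of the diagonal. A minimal sketch of that formula follows; `cov2corr_sketch` is a hypothetical name and is not claimed to mirror the linked source line for line.

```python
import numpy as np

def cov2corr_sketch(covariance):
    """Correlation matrix: cov[i, j] / (sigma[i] * sigma[j])."""
    covariance = np.asarray(covariance, dtype=float)
    sigma = np.sqrt(np.diag(covariance))   # per-variable standard deviations
    return covariance / np.outer(sigma, sigma)

cov = [[4.0, 2.0],
       [2.0, 9.0]]
print(cov2corr_sketch(cov))
```

For this input the standard deviations are 2 and 3, so the off-diagonal correlation is 2 / (2 * 3), roughly 0.333, and the diagonal is 1.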