cytoflow.utility.minisom#
Copied from JustGlowing/minisom (GitHub commit c67f4e4), MIT licensed, by Giuseppe Vettigli.
- class cytoflow.utility.minisom.MiniSom(x, y, input_len, sigma=1, learning_rate=0.5, decay_function='asymptotic_decay', neighborhood_function='gaussian', topology='rectangular', activation_distance='euclidean', random_seed=None, sigma_decay_function='asymptotic_decay')[source]#
Bases: object
- Y_HEX_CONV_FACTOR = np.float64(0.8660254037844387)#
- get_euclidean_coordinates()[source]#
Returns the positions of the neurons on a Euclidean plane that reflects the chosen topology, as two meshgrids xx and yy. The neuron with map coordinates (1, 4) has coordinates (xx[1, 4], yy[1, 4]) in the Euclidean plane.
Only useful if the topology chosen is not rectangular.
- convert_map_to_euclidean(xy)[source]#
Converts map coordinates into Euclidean coordinates that reflect the chosen topology.
Only useful if the topology chosen is not rectangular.
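To illustrate what the two coordinate methods above describe, here is a minimal pure-Python sketch of a hexagonal layout. It assumes that odd rows are shifted by half a cell and that row spacing is scaled by sqrt(3)/2, which matches the value of Y_HEX_CONV_FACTOR. The helper names `hex_to_euclidean` and `dist`, and the choice of which rows are shifted, are illustrative assumptions and not the library's implementation.

```python
import math

# sqrt(3)/2: vertical spacing between rows of a unit hexagonal lattice
Y_HEX_CONV_FACTOR = math.sqrt(3) / 2

def hex_to_euclidean(i, j):
    """Map grid coordinates (i, j) to a point on the plane (illustrative)."""
    # Assumption: odd rows are shifted left by half a cell.
    x = i - 0.5 if j % 2 == 1 else float(i)
    y = j * Y_HEX_CONV_FACTOR
    return x, y

def dist(p, q):
    """Euclidean distance between two points on the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

center = hex_to_euclidean(2, 2)
# The six grid cells that surround (2, 2) under the shift convention above.
neighbours = [hex_to_euclidean(i, j)
              for i, j in [(1, 2), (3, 2), (2, 1), (3, 1), (2, 3), (3, 3)]]
neighbour_distances = [dist(center, n) for n in neighbours]
```

Under these assumptions every neuron has six equidistant neighbours, which is why grid indices alone do not give correct distances on a hexagonal map.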
- update(x, win, t, max_iteration)[source]#
Updates the weights of the neurons.
- Parameters:
x (np.array) – Current pattern to learn.
win (tuple) – Position of the winning neuron for x (array or tuple).
t (int) – Current iteration (or epoch) index, which sets how far sigma and the learning rate have decayed.
max_iteration (int) –
- If use_epochs is True:
Number of epochs the SOM will be trained for
- If use_epochs is False:
Maximum number of iterations (one iteration per sample).
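The weight update can be sketched in pure Python as follows. This is a simplified illustration, not the class's code: the weights are stored in a dictionary keyed by grid position, the neighborhood is Gaussian, and the decay formula in `asymptotic_decay` is an assumption about what the default 'asymptotic_decay' option computes.

```python
import math

def asymptotic_decay(value, t, max_iteration):
    # Assumed schedule: value shrinks smoothly as t approaches max_iteration.
    return value / (1 + t / (max_iteration / 2))

def update(weights, x, win, t, max_iteration, sigma=1.0, learning_rate=0.5):
    """Move every neuron toward x, weighted by its grid distance to win."""
    eta = asymptotic_decay(learning_rate, t, max_iteration)
    sig = asymptotic_decay(sigma, t, max_iteration)
    for (i, j), w in weights.items():
        d2 = (i - win[0]) ** 2 + (j - win[1]) ** 2
        h = math.exp(-d2 / (2 * sig ** 2))  # Gaussian neighborhood
        weights[(i, j)] = [wk + eta * h * (xk - wk) for wk, xk in zip(w, x)]

# 3x3 map with all weights at the origin, one update toward (1, 1).
weights = {(i, j): [0.0, 0.0] for i in range(3) for j in range(3)}
update(weights, x=[1.0, 1.0], win=(1, 1), t=0, max_iteration=100)
```

The winning neuron moves the most, and the step shrinks with grid distance from the winner and with increasing t.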
- quantization(data)[source]#
Assigns a code book (weights vector of the winning neuron) to each sample in data.
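As a rough sketch of the idea, assuming Euclidean distance and weights stored in a dictionary keyed by grid position (the helper names `winner` and `quantization` below are illustrative stand-ins, not the class's methods):

```python
import math

def winner(weights, sample):
    """Return the grid position of the neuron closest to sample."""
    return min(weights, key=lambda pos: math.dist(weights[pos], sample))

def quantization(weights, data):
    """Replace each sample with the weight vector of its winning neuron."""
    return [weights[winner(weights, s)] for s in data]

weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
codes = quantization(weights, [[0.1, 0.2], [0.9, 0.8]])
```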
- random_weights_init(data)[source]#
Initializes the weights of the SOM picking random samples from data.
- pca_weights_init(data)[source]#
Initializes the weights to span the first two principal components.
This initialization doesn’t depend on random processes and makes the training process converge faster.
It is strongly recommended to normalize the data before initializing the weights and to use the same normalization for the training data.
- train(data, num_iteration, random_order=False, verbose=False, use_epochs=False, fixed_points=None, save_quant_history=False)[source]#
Trains the SOM.
- Parameters:
data (np.array or list) – Data matrix.
num_iteration (int) – If use_epochs is False, the weights will be updated num_iteration times. Otherwise they will be updated len(data)*num_iteration times.
random_order (bool (default=False)) – If True, samples are picked in random order. Otherwise the samples are picked sequentially.
verbose (bool (default=False)) – If True the status of the training will be printed each time the weights are updated.
use_epochs (bool (default=False)) – If True the SOM will be trained for num_iteration epochs. In one epoch the weights are updated len(data) times and the learning rate is constant throughout a single epoch.
fixed_points (dict (default=None)) – A dictionary k : (c_1, c_2), that will force the training algorithm to use the neuron with coordinates (c_1, c_2) as winner for the sample k instead of the best matching unit.
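The relationship between num_iteration and use_epochs can be made concrete with a small sketch that lists which sample is visited at each weight update when random_order is False. The function `build_iteration_indexes` is a hypothetical helper written for this illustration.

```python
def build_iteration_indexes(data_len, num_iteration, use_epochs=False):
    """Return the sample index visited at each weight update (sequential order)."""
    if use_epochs:
        # One epoch visits every sample once; repeat for num_iteration epochs.
        return list(range(data_len)) * num_iteration
    # Otherwise cycle through the data until num_iteration updates are done.
    return [k % data_len for k in range(num_iteration)]

plain = build_iteration_indexes(data_len=3, num_iteration=5)
epochs = build_iteration_indexes(data_len=3, num_iteration=5, use_epochs=True)
```

With three samples and num_iteration=5, the first call yields 5 updates and the second yields 15.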
- train_random(data, num_iteration, verbose=False)[source]#
Trains the SOM picking samples at random from data.
- Parameters:
data (np.array or list) – Data matrix.
num_iteration (int) – Maximum number of iterations (one iteration per sample).
verbose (bool (default=False)) – If True the status of the training will be printed each time the weights are updated.
- train_batch(data, num_iteration, verbose=False)[source]#
Trains the SOM using all the vectors in data sequentially.
- Parameters:
data (np.array or list) – Data matrix.
num_iteration (int) – Maximum number of iterations (one iteration per sample).
verbose (bool (default=False)) – If True the status of the training will be printed each time the weights are updated.
- distance_map(scaling='sum')[source]#
Returns the distance map of the weights. If scaling is ‘sum’ (default), each cell is the normalised sum of the distances between a neuron and its neighbours. Note that this method uses the euclidean distance.
- Parameters:
scaling (string (default=’sum’)) – If set to ‘mean’, each cell is normalized by the average of the distances to its neighbours. If set to ‘sum’, the normalization uses the sum of those distances.
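A minimal sketch of a distance map for a rectangular grid, under several stated assumptions: weights are stored in a dictionary keyed by grid position, each neuron has up to eight neighbours, and the final normalization divides by the largest cell value. The sketch also assumes the map has at least two neurons and that the weights are not all identical.

```python
import math

def distance_map(weights, rows, cols, scaling="sum"):
    """Distance map for a rectangular grid stored as {(i, j): weight vector}."""
    raw = {}
    for i in range(rows):
        for j in range(cols):
            # Distances to the (up to eight) neighbours that exist on the grid.
            dists = [
                math.dist(weights[(i, j)], weights[(i + di, j + dj)])
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0) and (i + di, j + dj) in weights
            ]
            total = sum(dists)
            raw[(i, j)] = total if scaling == "sum" else total / len(dists)
    top = max(raw.values())
    return {pos: v / top for pos, v in raw.items()}  # scale into [0, 1]

weights = {(0, 0): [0.0], (0, 1): [0.0], (0, 2): [5.0]}
umatrix_sum = distance_map(weights, rows=1, cols=3)
umatrix_mean = distance_map(weights, rows=1, cols=3, scaling="mean")
```

Cells with high values sit on boundaries between dissimilar regions of the map, so the result is often plotted as a heat map to look for clusters.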
- activation_response(data)[source]#
Returns a matrix where element i,j is the number of times that neuron i,j was the winner.
- quantization_error(data)[source]#
Returns the quantization error computed as the average distance between each input sample and its best matching unit.
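For illustration, the same quantity computed in pure Python, assuming Euclidean distance and weights stored in a dictionary keyed by grid position (a sketch, not the class's code):

```python
import math

def quantization_error(weights, data):
    """Mean Euclidean distance from each sample to its closest weight vector."""
    nearest = [min(math.dist(w, s) for w in weights.values()) for s in data]
    return sum(nearest) / len(data)

weights = {(0, 0): [0.0, 0.0], (0, 1): [3.0, 4.0]}
# First sample sits on a neuron (distance 0); second is 1 away from [3, 4].
err = quantization_error(weights, [[0.0, 0.0], [3.0, 3.0]])
```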
- distortion_measure(data)[source]#
Returns the distortion measure computed as sum_i sum_c ( neighborhood(c, sigma) * || d_i - w_c ||^2 ).
- topographic_error(data)[source]#
Returns the topographic error computed by finding the best-matching and second-best-matching neuron in the map for each input and then evaluating the positions.
A sample for which these two nodes are not adjacent counts as an error. The topographic error is the total number of errors divided by the total number of samples.
If the topographic error is 0, no error occurred. If 1, the topology was not preserved for any of the samples.
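The computation can be sketched for a rectangular map as follows. The adjacency rule used here (two neurons are adjacent when they differ by at most one step along each grid axis) is an assumption made for the illustration, and the weights are stored in a dictionary keyed by grid position.

```python
import math

def topographic_error(weights, data):
    """Fraction of samples whose two closest neurons are not grid neighbours."""
    errors = 0
    for s in data:
        # Sort neuron positions by the distance of their weights to the sample.
        ranked = sorted(weights, key=lambda p: math.dist(weights[p], s))
        first, second = ranked[:2]
        # Assumption: adjacency means at most one step along each grid axis.
        if max(abs(first[0] - second[0]), abs(first[1] - second[1])) > 1:
            errors += 1
    return errors / len(data)

# A 1x3 map that is "folded": the two outer neurons are close in input space.
weights = {(0, 0): [0.0], (0, 1): [10.0], (0, 2): [1.0]}
err = topographic_error(weights, [[0.2], [9.0]])
```

Here the sample 0.2 is closest to the neurons at (0, 0) and (0, 2), which are not adjacent, so it counts as an error; the sample 9.0 does not.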