Introduction to Self Organising Maps

Self Organising Maps, or SOMs for short, were invented by the Finnish professor Teuvo Kohonen in the 1980s and are therefore sometimes called Kohonen maps. They are a form of neural network, but are unusual in that they do not require labelled data to train (and are therefore an unsupervised learning tool). They are used to find clusters in data with high dimensionality (lots of parameters) and to reduce that dimensionality, normally to two dimensions, whilst preserving the topological relationships of the input. SOMs are normally represented as two dimensional grids, though they could have one, three or more dimensions. Two tends to make most sense, as SOMs are often used to graphically represent data clusters.

The SOM Algorithm

SOMs are deceptively simple. A grid of neurons is created, with each neuron in the grid connected to the input via an (initially random) weight vector. Think of this as every neuron containing a vector of dimension equal to the input data dimensionality. Each element of each vector is initially set to a random value in the range of the input data.
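
As a minimal sketch of that initialisation (assuming NumPy, with illustrative grid and input dimensions):

```python
import numpy as np

# Illustrative sizes: a 10x10 grid of neurons over 5-dimensional input data.
grid_rows, grid_cols, input_dim = 10, 10, 5

# Placeholder for real observations; shape (n_samples, input_dim).
data = np.random.rand(1000, input_dim)

# Each neuron holds a weight vector, initialised randomly within the
# per-dimension range of the input data.
lo, hi = data.min(axis=0), data.max(axis=0)
weights = lo + np.random.rand(grid_rows, grid_cols, input_dim) * (hi - lo)
```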

Training of the SOM follows these steps (a sketch of the full loop appears after the list):

  1. A random input element is selected and presented to the SOM.
  2. The best matching unit (BMU) is selected by examining the weights to find the closest match to the input vector. This is normally the smallest Euclidean distance between a neuron's weight vector and the input vector, measured in the input space.
  3. A neighbourhood is calculated. This is a matrix of the same dimensions as the SOM itself. Each cell has a weight based upon its distance from the BMU: the value at the BMU is 1 and the weights fall off with distance. A Gaussian function is normally used to calculate the neighbourhood, but other methods can also be used.
  4. The neighbourhood is then scaled by the learning rate, with each element being reduced proportionately.
  5. The BMU and all cells in the neighbourhood have their weights adjusted to more closely match the input vector.
  6. This process is repeated for every element in the observed data set. This represents a single epoch.
  7. After each epoch, the neighbourhood radius and the learning rate are both reduced asymptotically. In this way the adjustments made in subsequent epochs shrink and the SOM gradually approaches being fully trained.
  8. Normally many epochs must be run, and the error for each element can be calculated by finding its BMU and taking the Euclidean distance between the weights at the BMU and the input vector.
  9. Once the SOM has been trained, you can get the BMU for each input vector and a density map showing how many values have mapped to each location in the SOM.
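
Putting those steps together, a hedged sketch of the training loop might look like the following (the function and parameter names are illustrative, and the exponential decay schedules are one common choice among many):

```python
import numpy as np

def train_som(data, weights, epochs=100, lr0=0.5, sigma0=None):
    """One common form of the SOM training loop (illustrative, not tuned)."""
    rows, cols, _ = weights.shape
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Grid coordinates, used to measure each neuron's distance to the BMU.
    yy, xx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    for epoch in range(epochs):
        # Step 7: decay the learning rate and neighbourhood radius each epoch.
        lr = lr0 * np.exp(-epoch / epochs)
        sigma = sigma0 * np.exp(-epoch / epochs)
        for x in data[np.random.permutation(len(data))]:  # steps 1 and 6
            # Step 2: the BMU is the neuron whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Step 3: Gaussian neighbourhood, equal to 1 at the BMU itself.
            grid_dist_sq = (yy - bmu[0]) ** 2 + (xx - bmu[1]) ** 2
            theta = np.exp(-grid_dist_sq / (2.0 * sigma ** 2))
            # Steps 4-5: scale by the learning rate and pull weights towards x.
            weights += lr * theta[..., np.newaxis] * (x - weights)
    return weights
```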

Through this process the SOM is effectively mapped onto the input data in a lower dimension. The SOM is however topologically ordered. By this I mean that the spatial location of a neuron in the grid corresponds to a particular domain or set of features in the input data.

Typically the SOM has a number of configurable meta parameters: parameters which control the learning process. The learning rate, or the amount by which each weight is adjusted for the BMU and its close neighbours, is typically configurable, as are the size of the SOM itself and the number of epochs. The initial size of the BMU neighbourhood is also often configurable, as is the rate at which it shrinks during the learning process.
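
Gathering those meta parameters into one place, a sketch might look like this (the names and defaults are illustrative, not taken from any particular library):

```python
from dataclasses import dataclass

@dataclass
class SOMConfig:
    grid_rows: int = 10         # size of the SOM grid
    grid_cols: int = 10
    epochs: int = 100           # number of passes over the data set
    initial_lr: float = 0.5     # learning rate applied at the BMU early in training
    initial_sigma: float = 5.0  # initial radius of the BMU neighbourhood
    lr_decay: float = 0.01      # how quickly the learning rate shrinks
    sigma_decay: float = 0.01   # how quickly the neighbourhood shrinks
```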

Once the learning process is complete, the BMU for any input vector gives that input's location within the SOM, and the number of inputs matching a cell's weights represents the density of matches for that cell. It is thus easy to print or display the SOM graphically in a manner which imparts good insights into the nature of the input data.
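
A short sketch of that final lookup, assuming the NumPy representation used earlier (function names are illustrative):

```python
import numpy as np

def bmu_of(x, weights):
    """Grid coordinates of the best matching unit for one input vector."""
    dists = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)

def density_map(data, weights):
    """Count how many input vectors map to each cell of the trained SOM."""
    counts = np.zeros(weights.shape[:2], dtype=int)
    for x in data:
        counts[bmu_of(x, weights)] += 1
    return counts  # e.g. pass to matplotlib's imshow() to visualise
```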

[Figure: Self Organising Map of security anomalies for a 40,000 person organisation]

Another way of thinking about SOMs is as vector fields. A vector field can be visualised as assigning a vector to individual points within an n-dimensional space. Training a SOM then amounts to adjusting the vector at each point of the plane to match the input data at that point. Where a SOM is trained on a non-categorical data set, the resulting vector field should be continuous, which may give additional analysis options, such as integrating the vector field along a curve, also called determining its line integral.
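
For reference, the line integral of a vector field F along a curve C, parameterised by r(t) for t in [a, b], is the standard definition:

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt$$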

What training the SOM is actually doing is estimating the Euclidean distance of each input vector (observation) to every other one (for normalised vectors, Euclidean distance and the dot product are monotonically related, as shown below). We could do this directly, but the number of calculations required would make this unworkable for large data sets.
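
To see why those two measures agree for normalised vectors, expand the squared distance:

$$\lVert \mathbf{a} - \mathbf{b} \rVert^2 = \lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - 2\,\mathbf{a} \cdot \mathbf{b}$$

When the norms are fixed, minimising the Euclidean distance is the same as maximising the dot product.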

For an example of what you can use a SOM for in Information Security, read my post about building your own deep learning solution: https://infosecml.com/index.php/2020/11/01/so-you-want-to-build-your-own-deep-learning-solution/

