Bayes Theorem – Getting a better understanding of Probability

All Probabilities Are Conditional

Even when statisticians tell you that the probability of some event is A, what they are actually saying is that, given all of the background information, the probability is A. Something as simple as rolling a six-sided die gives you a 1/6 probability of any individual number, but that is conditional on the regularity of the die's shape, how it is weighted, how it is rolled and so on. We tend to take this background context for granted and forget that it is there. That is normally fine for rolls of dice, but it can be problematic when we are dealing with probabilities where the context is less easily ignored. Thomas Bayes was an 18th-century minister and statistician who developed a way of taking the known context of a probability into account when calculating the probability itself.

To understand Bayes, we need to define a few simple terms.

The probability of A is written as P(A), which we call the marginal probability, or the prior probability in Bayes theorem. That is, the probability of an event irrespective of any external context.

In addition, the joint probability, the probability of two simultaneous events such as throwing two sixes, is written P(A, B). We say that it is the probability of A and B.

The probability of an event given some context is written P(A|B), the probability of A given B. We call this the conditional probability. One very important thing to note is that P(A|B) is not the same as P(B|A). The probability that your house is on fire given that you see smoke is not the same as the probability that you see smoke given that your house is on fire. This is one of the most common mistakes made in probability, even, it would seem, by trained statisticians.
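To make that asymmetry concrete, here is a minimal sketch in Python using made-up counts (the numbers are purely illustrative, not real fire statistics):

# Hypothetical counts over 10,000 observed days
smoke_days = 500          # days on which you see smoke (cooking, neighbours, ...)
fire_days = 10            # days on which the house is actually on fire
fire_and_smoke_days = 9   # days with both fire and visible smoke

p_fire_given_smoke = fire_and_smoke_days / smoke_days   # P(fire|smoke) = 0.018
p_smoke_given_fire = fire_and_smoke_days / fire_days    # P(smoke|fire) = 0.9

print(p_fire_given_smoke, p_smoke_given_fire)

With these made-up numbers the two conditional probabilities differ by a factor of fifty, which is exactly the kind of gap Bayes theorem helps you reason about.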

The conditional probability is given by:

P(A|B) = P(B|A) * P(A) / P(B) – This is Bayes theorem

The Probability of (A given B) is the probability of (B given A) times the probability of A divided by the probability of B.
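Bayes theorem is also easy to check directly in code. Here is a minimal sketch in Python; the function name is just illustrative:

def bayes(p_b_given_a, p_a, p_b):
    # P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Example: the probability a die shows a six given that it shows an even number.
# P(even|six) = 1, P(six) = 1/6, P(even) = 1/2, so P(six|even) should be 1/3.
print(bayes(1.0, 1 / 6, 1 / 2))   # 0.333...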

By now you are probably wondering how this is useful and relevant to information security, so let’s take a look at a simple, though somewhat contrived, example.

Let us suppose you are trying to improve detection of bad behaviour by accounts in your environment, either because they have been compromised or because a user is a threat. You decide to look at failed remote logins as an indicator and, looking back through years of logs, you discover that multiple failed remote logins on Linux systems might be an indicator of a compromised account.

Looking further you notice:

0.5% of accounts have some sort of compromise or are being misused by employees, which means that 99.5% are not compromised.

Where an account is compromised, 80% of the time that is detectable through multiple failed remote logins, and therefore 20% of those compromised accounts cannot be detected in that way.

11% of accounts which are not compromised also generate these multiple failed logins. It is tempting to read that as failed logins correctly indicating a compromised account 89% of the time, but all it really tells us is that 89% of non-compromised accounts do not generate them.

So if we see multiple failed logins, what is the probability that the account is compromised? We can use Bayes to get a more accurate picture of this:

If we see multiple failed logins for an account it could be a real result or it could be a false positive. The chance that it is a true positive result is given by 0.5% * 80% = 0.004.

The chance of a false positive is 99.5% * 11% = 0.10945.

Using Bayes theorem

Bayes tells us that the probability of a hypothesis being correct given the evidence, P(H|E), is the probability of seeing the evidence given the hypothesis, P(E|H), multiplied by your initial probability of the hypothesis being correct (the prior probability, P(H)), divided by the probability of seeing the evidence, P(E).

The chance this is a compromised account given the multiple failed logins is given by:

P(H|E) = Chance of it being a compromised account given detections of multiple failed logins

P(E|H) = Chance of failed logins given that it is a compromised account = 80%

P(H) = Chance of it being a compromised account = 0.5%

P(not H) = Chance of it not being a compromised account = 99.5%

P(E|not H) = Chance of seeing failed logins from a non-compromised account = 11%

So the probability of seeing the evidence is given by:

P(E) = P(E|H)P(H) + P(E|not H)P(not H) = (80% * 0.5%) + (11% * 99.5%) = 0.4% + 10.945% = 11.345%

Where E represents evidence and H is the hypothesis.

And plugging in the numbers we get:

P(H|E) = P(E|H)P(H) / P(E) = (80% * 0.5%) / 11.345% = 0.4% / 11.345% ≈ 3.5%

Which gives us a chance of roughly 3.5% that, when we see multiple failed remote logins, the account attempting them is in some way compromised. Was that the number you were expecting? Of course, what you now do with that probability is up to you.
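If you want to check the arithmetic, here is a minimal sketch of the whole calculation in Python; the variable names are just illustrative and the rates are the ones assumed above:

p_compromised = 0.005              # P(H): prior probability an account is compromised
p_clean = 1 - p_compromised        # P(not H)
p_fails_given_compromised = 0.80   # P(E|H): multiple failed logins given compromise
p_fails_given_clean = 0.11         # P(E|not H): multiple failed logins from a clean account

# Total probability of seeing the evidence, P(E)
p_fails = (p_fails_given_compromised * p_compromised
           + p_fails_given_clean * p_clean)

# Bayes theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_compromised_given_fails = p_fails_given_compromised * p_compromised / p_fails

print(f"P(E)   = {p_fails:.5f}")                     # 0.11345
print(f"P(H|E) = {p_compromised_given_fails:.4f}")   # roughly 0.035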

In machine learning you will probably hear about Naive Bayes, a set of supervised learning algorithms which use Bayes theorem with a “naive” assumption of conditional independence, that is, that the features are treated as independent of one another. You may also encounter Bayesian networks, where probabilities are linked together, with conditional probabilities flowing back through the network.
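As a rough illustration of how Naive Bayes looks in practice, here is a minimal sketch using scikit-learn's BernoulliNB on a handful of made-up binary features; the data, feature meanings and labels are entirely illustrative:

from sklearn.naive_bayes import BernoulliNB

# Made-up training data: each row is [multiple_failed_logins, off_hours_access]
X = [
    [1, 1],
    [1, 0],
    [0, 0],
    [0, 1],
    [0, 0],
    [1, 1],
]
y = [1, 1, 0, 0, 0, 1]   # 1 = compromised account, 0 = not compromised

model = BernoulliNB()
model.fit(X, y)

# Estimated probability of compromise for an account with failed logins
# but no off-hours access
print(model.predict_proba([[1, 0]]))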

Bayes theorem on its own is a good way to ensure you are thinking properly about probabilities and not ignoring the conditions around the probabilities you are measuring and estimating.
