One area of project management that stumps a lot of people is how we come up with the probability and impact data for quantitative analysis. This is something that is not discussed with any depth in the PMBoK, or even PMI’s Practice Standard for Project Risk Management. In fact, it is summed up succinctly in two paragraphs under Data Gathering and Representation techniques as basically “collect the data through interviews” and then “create a probability distribution”. For the record, I am definitely not criticizing PMI for this approach, as entire books are written, and fields of study based, upon what is described in those two paragraphs. Also, just a friendly reminder this is ONE way to prepare estimates for probability and impact.

If you’ve never heard of Bayes, you’re in for a treat. If I had to sum up the work of Thomas Bayes in one sentence, I would say that it allows you to make inferences with incomplete data. His work has evolved into fields of study from Game Theory to Statistics. Right now, I want to concentrate solely on Bayesian Statistics.

We can all agree that the initial period of a project life-cycle is when uncertainty is highest. This uncertainty is inherent in any project that is stood up, and it invariably decreases as planning is conducted. Project Risk is negatively correlated with Planning. This is not to say that planning can eliminate all risk, because that is impossible – but we can reduce uncertainty through concerted planning.

Risk management should begin at the start of a project. When I would find myself assigned to a project, one of the first things I sought to identify is what I needed to look into. *What uncertainties are out there that I must address? *Of course, the identification is the easy part! Relative estimation through qualitative risk analysis is the next step, and can be a fun exercise by ranking risk using animal names. Personally, I like to use chickens, horses and elephants. But what about when we got to quantitative analysis? Now we are no longer comparing one risk against another, but trying to determine numeric values for probability and impact.

Quantitative analysis can be especially difficult to do with any degree of accuracy if your organization has no historical experience in this type of work, or if the solution’s technological maturity is lacking. How can we make estimates about uncertainty, when we’re so uncertain about that with which we are uncertain? Management Reserves, per the PMBoK, are set aside for unidentified and unforeseeable risks – so it’s too late, we’ve already identified it, we own it, and it would be irresponsible to not plan for it.

Complexity and technical risk are not new challenges during quantitative analysis. I have read many papers on the topic, but I’m quite fond of a RAND Corp Working Paper by Lionel Galway which addresses the level of uncertainty inherent in complex projects, stating:

One argument against quantitative methods in advanced-technology projects is that there simply is not enough information with which to make judgments about time and cost. There may not even be enough information to specify the tasks and components.

I’m inclined to agree with Lionel that it is very difficult to make judgments with any degree of certainty when we’re lacking solid information. Risk data quality assessments are something called for by the PMBoK to test the veracity of the data we use. So how can we move forward?

Scott Ferson gives us a road map in a great article about Bayesian Methods in Risk Assessment. If you’d like to see the math side of this, please check out the article – I’m staying strictly conceptual. In this article, he used a scenario that described these concepts quite well: a bag of marbles. You have a cloth bag full of marbles. Well, you think it’s just marbles in there – but you don’t know and you can’t peek inside. Ferson is kind enough to tell us that there are five colors inside, including red – so we know red is possible but we don’t know if there is equal representation for all five colors. If we pull out one marble, what are the odds it will be red? This scenario has incomplete data, just like what most project teams have at the beginning of a project.

This comic does a great job introducing the two schools of thought for statistics that we’ll examine, and pretty quickly you will see why I am a fan of the Bayesian approach. This is not to totally discount frequentist probability, as I use it on a regular basis while conducting Six Sigma initiatives; however, it just does not work for our bag of marbles.

Determining a frequentist probability would require first establishing a key population parameter, its size: how many marbles are in the bag? Next, we would calculate a sample size based upon: population size, desired confidence level, precision level and the fact that we are working with discrete data. If the P value is too high, we can increase the sample size to increase of confidence that the results are not by chance. Based upon the sample, we can make statistical inferences about the population, and eventually we could establish the probability of drawing a red marble.

Just a couple problems here… I told you that you have a bag of marbles, but don’t forget that you’re not allowed to peek. You just have to tell me what the odds are of you drawing a red marble. But you don’t know the population size and you cannot draw a sample. The first marble you will see will be the one for which you were supposed to determine probability. Lacking any data makes frequentist probability calculations an impossibility, and having incomplete data severely inhibits its effectiveness. So let’s look at another method. Since it was developed to deal with incomplete data, Bayesian statistics allows us to approach everything very differently.

Bayesian statistics becomes more accurate as more information becomes available. The first marble will have the least accurate estimate, with the estimates getting better with every subsequent drawing. A simplified version of the formula becomes (n+1)/(N+c); where n=the number of red marbles we’ve seen so far, N=total number of marbles sampled, and c=number of colors possible. So for the first marble, the probability is calculated as (0+1)/(0+5)=0.20. While it may or may not be correct, it is a starting point. For every marble sampled and returned to the bag, the formula will change and the accuracy of future estimates will improve. Glickman and Vandyk expand upon this usage by expanding into the application of multilevel models with the use of Monte Carlo analysis.

I can’t ignore the very human aspects of Bayesian statistics, which was captured well by Haas et al. as they described three pillars of Bayesian Risk Management: Softcore BRM, Bayesian Due Diligence, and Hardcore BRM. Softcore relies upon subjective interpretation of uncertainty, think about consulting your subject matter experts. Hardcore leans on mathematical approaches and statistical inference, while the Due Diligence mitigates the triplet of opacity by ensuring that facts do not override the expertise of authoritative and learned people.

The reality is that the majority of people working on projects use software to determine their quantitative estimates for specific risks that have been identified. However, I’m not a fan of answering someone’s question by stating “don’t worry, there’s software for that!” While you may never have to calculate risks in an analogue manner, it is worth knowing that if you are dealing with unfamiliar risk a Bayesian approach makes more sense than a frequentist approach. If you have historical information available and a well-defined population, by all means, collect a sample. But keep in mind, as I often tell people when I work Six Sigma, “don’t make the data fit the test, select a test that fits the data”.