There are two worlds of risk assessment: qualitative and quantitative. Here are some thoughts on how to improve the tools and techniques you use for qualitative risk assessment.
By definition, qualitative risk assessment is about differentiating between risks and ranking them, not about actually measuring them. I’ve often seen organisations get a bit carried away in their qualitative approach, veering into a world of quasi-quantification which can lead to false confidence in a questionable outcome. In my view that is to be avoided. If you really need to quantify risks, do it properly. I’ll leave that for a separate blog.
However, you do want your qualitative thinking to be as objective and evidence-based as possible, and that is what motivates my thoughts here.
Don’t sit on the fence
Almost everyone assesses risks for both likelihood and impact, or frequency and severity; call them what you will. (If you’re not doing this you might want to think about it.) Since we are working qualitatively, both assessments will be made against a scale, e.g. a simple ‘High, Medium or Low’, or a more sophisticated ‘Very Low, Low, Moderate, High, Severe’. If you present folks with a scale that has an odd number of increments they can pick the middle option, and often do. They have ‘sat on the fence’.
In my earlier career, when introducing a risk framework where there had been none before, I thought to keep things simple initially (walk before you run etc.) and went for H, M, L. At the end of our first crank of the risk assessment handle I had 3000 risks, from 11 countries, 70% of which were medium! Not much help with differentiation, ranking, and knowing where to focus. So don’t let your risk assessors sit on the fence, and save them from splinters where they really don’t want them. Simply remove the middle option by using an assessment scale with an even number of increments.
How precise is ‘precise enough’?
If an odd numbered scale presents a fence to sit on, the probability of people taking up their perch is greatest when the scale has few increments, H,M,L being the classic. So how granular does the scale need to be? Before tackling the question we should recognise there are typically three scales: impact, likelihood and exposure. Exposure is the combination of impact and likelihood.
Obviously enough, high likelihood combined with high impact results in high exposure. Similarly low likelihood and impact leads to low exposure. Other combinations sit somewhere in the middle (I know you know this). Typically each risk’s likelihood and impact is plotted on a risk matrix and each cell or intersection is given a ‘traffic light’ colour, representing exposure. Red is bad, green is good, amber is not so bad (or not so good). (I know you know this too).
Back to the question. We have to consider the granularity of all three scales. Back to my fledgling risk framework, for the next crank of the handle we went with a four increment scale for likelihood and impact, pulling the fence from under those comfortably balanced upon it. That worked and we got a better spread of assessment data, and so a better potential for differentiation.
However, we also needed to move from a three-way (red, amber, green) traffic light on our risk matrix to a four-way light (red, amber, yellow, green). Otherwise, in combining likelihood with impact we would have lost the differentiation we had gained, and would still have seen 70% of our risks with amber exposure. By also changing our traffic light we got: red 5%, amber 25%, yellow 40%, green 30%, or something like that. Now we had some credible differentiation. Job 1: focus on the reds. Concentrating on 1 in 20 risks is pretty focussed. Job 2: now also turn your attention to the ambers. 1 in 4 risks is still reasonably focussed.
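To make the four-by-four matrix and four-way traffic light concrete, here is a minimal sketch. The level names, the colour assigned to each cell and the function names are my own illustrative assumptions, not a standard; your matrix should reflect your own risk appetite. Note that the cells are looked up, not calculated, so no arithmetic is being done on the ratings.

```python
from collections import Counter

# Hypothetical four-point scale, shared by likelihood and impact (no middle option).
LEVELS = ["Low", "Moderate", "High", "Very High"]
BANDS = ("red", "amber", "yellow", "green")

# Illustrative matrix: rows are likelihood, columns are impact, both Low -> Very High.
MATRIX = [
    ["green",  "green",  "yellow", "yellow"],
    ["green",  "yellow", "yellow", "amber"],
    ["yellow", "yellow", "amber",  "red"],
    ["yellow", "amber",  "red",    "red"],
]

def exposure(likelihood: str, impact: str) -> str:
    """Look up the traffic-light colour for one likelihood/impact pair."""
    return MATRIX[LEVELS.index(likelihood)][LEVELS.index(impact)]

def band_shares(assessments):
    """Return the share of risks in each colour band, to check the spread."""
    counts = Counter(exposure(l, i) for l, i in assessments)
    total = len(assessments)
    return {band: (counts[band] / total if total else 0.0) for band in BANDS}
```

Running band_shares over a whole risk register is a quick way to see whether one colour is swallowing most of your risks, as amber did in my first attempt.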
Should I have been tempted to go further, to reach for greater precision? In my consulting days I saw organisations running with a 9 x 9 risk matrix. That’s a grid of 81 different risk assessment outcomes. Each outcome is only about 1% (1 in 81) away from its neighbour. Can you really defend that degree of differentiation? I say that’s spurious precision. And what’s the point if you’re then going to overlay a three-, four- or even five-way traffic light scheme upon it?
In my view 3 is too coarse, but 4 is workable. 9 is silly in a qualitative regime. If you really do need that degree of precision you should quantify your risks, and properly. That isn’t easy, so you had better really need it. Ask any bank or insurer who is operating a risk based capital model.
Spurious precision - of a different kind
In the search for precision some are tempted to weave some numerical thinking into their qualitative world. ‘If the impact of the risk is greater than £1 million then you must rate the impact as “High”; greater than £500,000 then it’s “Medium”, etc.’ This can be dangerous in two ways.
Firstly, in my consulting experience I seldom found any consistency around reliable, recorded rationale for why the assessor held the impact to be £X. Sometimes there was a one-liner. The risk is ‘potential for regulatory censure’ and someone in our sector was once fined £2 million. That’s OK (just) but for most risks, nothing. Just a gut feel. So how reliable are these numbers, and the assessment outcomes they prompt?
The second danger is succumbing to the temptation to do sums with these numbers. ‘Let’s add them up, average them, take the worst case number etc.’ Very dangerous in my view, because people will start to believe the answers, and stop thinking. Our goal is to improve risk thinking. This is an example of the quasi-quantification I mentioned at the outset. I say quantify risks properly where you need to, and elsewhere keep things strictly qualitative.
Where’s your evidence?
The lack of quantification rationale discussed above is an example of ‘missing evidence’, but let’s take a broader look.
It’s common to make two assessments of risk: what’s the raw exposure, in the absence of any attempt to mitigate the risk (the Inherent exposure), and what’s the actual exposure given the mitigating measures in place (the Residual exposure). The mitigating measures are known as controls. So, a risk assessor might say:
Inherent likelihood = High, residual likelihood = Medium.
Inherent impact = Medium, residual impact = Low.
Is their residual assessment objective? If it’s not, we may as well pack up and go home. We can test objectivity by seeking evidence. The first source of evidence is the controls. They too are assessed, for their operational effectiveness. And they come in two flavours. Preventative controls are designed and operated to reduce the likelihood of the risk occurring (or ‘crystallizing’ in posh circles). Corrective controls aim to reduce the impact of the risk, should it occur.
Countless times I have seen risk assessors claim the residual likelihood is lower than the inherent, but there’s not a preventative control in sight. So, there’s no evidence for that assessment being reliably objective. Similarly, a claim for a lowered impact in the absence of corrective controls does not pass the test. Even if corrective controls are in place, you can’t take credit for them if they have been assessed as operationally ineffective. They must be present and operating effectively.
I have described a fundamental bit of risk thinking here. If your risk assessors don’t deploy this logic, or the data is not available to enable this thinking, then you are still on your risk assessment journey. You are not there yet. There are tell-tale signs, e.g. no distinction between preventative and corrective controls. Even when the data is there you still can’t assume the risk assessor is doing this thinking, so simply oblige them to provide an assessment rationale, where they must declare their thinking, and read it. You need this evidence.
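The evidence test described in this section could be sketched along the following lines. The class names, field names and wording of the messages are hypothetical, chosen purely for illustration. The check only points out where a residual assessment lacks supporting controls or a rationale; it does not make the assessment.

```python
from dataclasses import dataclass, field

# Ordering of the simple H, M, L scale used in the example above.
ORDER = {"Low": 0, "Medium": 1, "High": 2}

@dataclass
class Control:
    name: str
    kind: str        # "preventative" or "corrective"
    effective: bool  # assessed as operating effectively

@dataclass
class Assessment:
    inherent_likelihood: str
    residual_likelihood: str
    inherent_impact: str
    residual_impact: str
    controls: list = field(default_factory=list)
    rationale: str = ""

def evidence_gaps(a: Assessment) -> list:
    """List the claims in an assessment that have no supporting evidence."""
    def has_effective(kind: str) -> bool:
        return any(c.kind == kind and c.effective for c in a.controls)

    gaps = []
    if (ORDER[a.residual_likelihood] < ORDER[a.inherent_likelihood]
            and not has_effective("preventative")):
        gaps.append("likelihood lowered without an effective preventative control")
    if (ORDER[a.residual_impact] < ORDER[a.inherent_impact]
            and not has_effective("corrective")):
        gaps.append("impact lowered without an effective corrective control")
    if not a.rationale.strip():
        gaps.append("no assessment rationale recorded")
    return gaps
```

An empty list does not prove the assessment is right. It only means the minimum evidence is present, and someone still has to read the rationale.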
Beware the algorithm!
People love the blighters. I’m cautious, particularly when an algorithm is used in lieu of some essential thinking. Algorithms can’t think. Let’s take the bit of essential thinking described above as an example. I have often seen algorithms used to look at the controls in place against a given risk and deduce an overall control effectiveness score for that risk. They typically turn control assessments into numbers (wholly effective = 3, partially effective = 2, ineffective = 1) and then add them up, average them, take the worst case, or whatever. Hey presto: the control effectiveness is seven! Emboldened, the algorithm developers go further.
Turn the inherent exposure assessment into a number, and do some kind of maths to combine that with the control effectiveness score to deduce the residual exposure. And then believe the answer, and no longer require the risk assessor to do the thinking. Hogwash, I say. The algorithm designers respond with further complexity. ‘We recognise that not all controls are equal, so we will use a weighted average to compute the overall control effectiveness’. Do the thinking, write it down, make an informed judgement.
Praise the algorithm!
Well, make your mind up, I hear you say. Algorithms are very useful in drawing the risk assessor’s attention to something they should consider. An example I have used is to flag a control as red, or notable in some less alarming way if you prefer, if the control assessor has given the control anything but an entirely clean bill of health. Controls are typically assessed both for their design and their operational effectiveness. So the simple algorithm says if both assessments are good the control is green; anything else, the control is red.
This is useful to the risk assessor in making their residual risk assessment. As we said earlier, control evidence is key. At the very least, in considering the controls, the risk assessor should look at the red ones, understand what is happening, and decide (which requires thinking) how that may affect their risk assessment. The red flag on a control might simply indicate that the design could be improved, perhaps from an efficiency perspective, but the control is working ok. This would not affect the risk assessment. On the other hand, red may indicate that the control is wholly ineffective in operation. That certainly would affect the risk assessment, and prompt action.
So, algorithms can guide attention, or tell you where to look, but they can’t compute the answer. The answer requires thought.
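The attention-guiding rule from this section is simple enough to sketch in a few lines. The function names and the tuple layout are assumptions made for illustration. The output is a list of controls to look at, not a control effectiveness score.

```python
def control_flag(design_ok: bool, operating_ok: bool) -> str:
    """Green only if both assessments are clean; anything else is red."""
    return "green" if design_ok and operating_ok else "red"

def controls_to_review(controls):
    """Return the names of red-flagged controls for the risk assessor to examine.

    `controls` is an iterable of (name, design_ok, operating_ok) tuples.
    """
    return [name for name, design_ok, operating_ok in controls
            if control_flag(design_ok, operating_ok) == "red"]
```

What the assessor does with that list, and whether a red flag changes the risk assessment at all, remains a matter of judgement.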
Keep it real
This could be thought of as a continuation of ‘Where’s your evidence?’ above. There I argued the importance of control effectiveness in making objective, evidence-based risk assessments. However, that control data we considered is really a view of what we expect to happen. Given our preventative controls, how often do we think this risk might actually crystallise? Given our corrective controls, what do we think the impact of this risk will be, should it arise?
There’s another whole dimension of evidence to consider. It is called reality: what is actually happening, regardless of our expectations. So what dimensions of reality do we track and make available to risk assessors to influence their risk assessments?
Firstly, loss data, or risk event data as some call it. This is a record of ‘things going wrong’ which result in a tangible impact, or nearly did (the so-called near miss). Another way of looking at this data is to regard it as a record of when risks crystallised, and what happened when they did. You can hardly assess the likelihood and impact of a risk as low if it has crystallised three times in the last quarter with an aggregate loss of £800K. If that is what is actually happening you need to take that real-world evidence on board, adjust your assessment, and take action.
Secondly, Key Risk Indicators (KRIs). These are bits of management information which can better inform you about a risk. If you were considering the risk of mis-selling insurance contracts to retail customers, and the KRI which tracks how many customers cancelled their newly bought policy within the 14 day cooling off period has doubled in the last quarter, you should take note. You may find that the spike follows a revision of the telesales script, and your company may well be ‘strong arming’ customers to buy. Whilst they couldn’t resist signing up on the call, they retrospectively regret their decision. This sort of metric may well influence risk assessments. In this example it certainly should.
The message here is that several forms of evidence should be considered when assessing risk and you need to take on board real world data. I would say that if you are only looking at controls, and not yet Risk Events and KRIs then, again, you are still on your risk assessment journey. Take the next step!
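As a rough sketch of how real-world data might be put in front of a risk assessor, the following compares an assessment with the quarter's risk events and one KRI. The names, the record layout and the 'at least doubled' trigger are illustrative assumptions drawn from the examples above, not recommended thresholds.

```python
from dataclasses import dataclass

@dataclass
class RiskEvent:
    risk_id: str
    loss_gbp: float
    near_miss: bool = False

def reality_prompts(risk_id, residual_likelihood, residual_impact,
                    events, kri_previous, kri_current):
    """Return prompts where reality appears to contradict the assessment."""
    prompts = []
    # Actual losses recorded against this risk in the period (near misses excluded).
    actual = [e for e in events if e.risk_id == risk_id and not e.near_miss]
    if actual and "Low" in (residual_likelihood, residual_impact):
        total = sum(e.loss_gbp for e in actual)
        prompts.append(
            f"{len(actual)} event(s) this quarter, aggregate loss GBP {total:,.0f}, "
            "yet likelihood or impact is assessed as Low"
        )
    # Illustrative trigger: the KRI has at least doubled since the previous quarter.
    if kri_previous > 0 and kri_current >= 2 * kri_previous:
        prompts.append("KRI has at least doubled since the previous quarter")
    return prompts
```

As with the control flags, the prompts tell the assessor where to look. Deciding what the events and the KRI movement mean for the assessment still requires thought.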
This is my last hard-earned piece of risk assessment wisdom, or my last opinionated rant as you may see it! Once, when taking up a new CRO role, the Board’s risk committee was meeting on my first day. I was invited to observe and witnessed an animated bun fight over whether ‘people risk’ was red. The combatants were looking at the ‘risk’ from different perspectives. They were both right, given their perspective. The problem is that ‘people risk’ is not an individual risk, it’s a multifaceted subject area within which several risks may crystallise. Each combatant was thinking of a different, actual risk within the broad canvas of ‘issues arising from human behaviour’.
There was a subsequent dispute over the status of conduct risk. Perhaps even more so, this is a multifaceted subject area, not an individual risk but a number of risks, relating to areas as varied as financial promotions, value for money, dealing with vulnerable customers and others.
The single people and conduct risk traffic lights were motivated by the desire to simplify Board reporting, by presenting a ‘rolled up’ or ‘aggregate’ position rather than a mass of detail. In my example, was the Board adequately informed? No. Were they confused? Yes.
There were two problems at play. Firstly they had not identified the individual, underlying risks, e.g. the ‘risk of not treating vulnerable customers properly’ within the subject area of conduct risk. So essentially they had no real risk assessment data to hand. Secondly, even if they had a number of factors in mind, there was no reliable aggregation or roll-up technique. They simply attached a judgemental traffic light to what was actually a risk category, and not a risk.
I later put my observations to the committee, and argued that you can really only reliably aggregate risks numerically. They decided that where they had ‘numbers’ they would aggregate, and where they didn’t, they wouldn’t. They first took aim at credit risk. They had valued credit risk in each trading division and wanted to add them up. The problem is the values were not reliable, as they were not the result of proper probability of default analysis, but the result of simple but flawed arithmetic. Even if they had been properly computed individually, you still can't simply add them up. That would result in an overstatement of credit risk. My point is that it takes reliable data and sophisticated analysis to quantitatively aggregate risks. If you really have the need, do it, but do it properly. Only then can you have a single and reliable traffic light. The alternative is to let go of the aggregation holy grail, break down the, say, conduct risk category into its constituent actual risks, and track and report on them. Pseudo-aggregation is misleading, perhaps dangerously so.
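To illustrate why simply adding up individual risk values overstates the total, here is a small sketch using a textbook variance-covariance style aggregation with a single assumed correlation between every pair of divisions. This is my simplification for illustration only: it is not a proper credit risk model, the function names are hypothetical, and the figures are invented.

```python
import math

def simple_sum(values):
    """Naive aggregation: just add the individual risk values."""
    return sum(values)

def aggregate_with_correlation(values, rho):
    """Aggregate values assuming one pairwise correlation `rho` between all of them.

    Computes sqrt(sum over i, j of corr(i, j) * x_i * x_j), where corr is 1 on the
    diagonal and `rho` elsewhere. With rho = 1 this equals the simple sum.
    """
    total = 0.0
    for i, x in enumerate(values):
        for j, y in enumerate(values):
            total += (1.0 if i == j else rho) * x * y
    return math.sqrt(total)

# Invented divisional credit risk values, in GBP millions.
divisions = [10.0, 8.0, 6.0]
naive = simple_sum(divisions)
diversified = aggregate_with_correlation(divisions, rho=0.3)
```

Unless every division's losses move in perfect lockstep, the diversified figure comes out below the naive sum. Choosing a defensible correlation is, of course, exactly the kind of reliable data and sophisticated analysis I am arguing you need.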
These are not black and white do’s and don’ts. Depending on where you sit on the risk maturity scale, and on how sophisticated you want or need to be, at some time these ideas may need to come into play. No distinction between preventive and corrective controls? Now may be the time to add that. No Risk Events and KRIs? Perhaps you should now adopt them. A desire to abandon pseudo-aggregation and move to risk quantification? Take a deep breath and go for it.
Here at Decision Focus we have a highly flexible risk solution that’s far from one size fits all. We can very quickly tailor it to what you need now by drawing on over twenty individual ‘capsules of capability’. When it’s time for you to move on, we simply bring in further capsules to keep pace with your ambition.