KRI Loss Events and estimating Operational risk capital – an example


We use a simple illustrative example to show how Key Risk Indicators (KRI) and Loss Events can be used to estimate operational risk capital associated with a given risk.

RCSA results – computer failure: Our RCSA exercise has identified a number of Key Risk Indicators (KRIs). One of the key KRIs is the number of computer failures per month. Given the abundance of logs and monitoring systems, we also have the data for such failures.

KRI and its relation to loss: A relationship, or map, between a KRI and its impact on business lines and their operations needs to be developed so that we understand the impact and how it could create a risk for the company.

The number of computer failures per month now needs to be quantified in dollar terms. Our RCSA exercise also indicates that the primary issue with computer failure is unsuccessful transactions that are unable to post, commit or close because the underlying computer component has failed. This ranges from a transaction failing on the ATM network while a customer is attempting to execute a function, to a treasury system transaction failing to post and causing issues at day end or during central bank reporting because the failed transaction was not identified, rectified or corrected in time.

To understand the relationship between this specific KRI and a given scenario, we can use conditional probability to estimate the number of unsuccessful transactions that occurred because of a computer failure and were not identified, rectified or corrected in time.

Loss data: Using the frequency and severity estimates derived from the loss data in our system logs, we can drill down to specific transactions and the dollar amount of loss that has occurred for the given KRI.

Capital estimation model: The loss data can be collated for all KRIs, grouped by business line, and then aggregated to estimate capital for operational risk. A minimal sketch of this collation step is shown below.
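The snippet below groups a handful of hypothetical KRI-linked loss records by business line in plain Python. The record layout, business line names and amounts are illustrative assumptions, not data from the example; in practice the records would come from the bank's loss event database.

```python
# Minimal sketch: collate KRI-linked loss events and aggregate by business line.
# The record layout and figures below are hypothetical illustrations, not bank data.
from collections import defaultdict

loss_events = [
    # (business_line, kri, loss_usd)
    ("Treasury",       "Server Outage",        5600.0),
    ("Retail Banking", "ATM Network Failure",  1200.0),
    ("Treasury",       "Server Outage",        8000.0),
]

losses_by_business_line = defaultdict(float)
for business_line, kri, loss in loss_events:
    losses_by_business_line[business_line] += loss

for line, total in losses_by_business_line.items():
    print(f"{line}: total KRI-linked loss = {total:,.0f} USD")
```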

KRI Loss Event Capital Estimation case study

Goodman Bank has set up a new Operational Risk Management department. The department's first task is to identify the processes, practices and activities that could give rise to risk, and to find ways of assessing and controlling that risk. To do this, a Risk Control Self Assessment (RCSA) is used.

Our operational risk management team identifies 'Server Outage' as one of the KRIs that could lead to a potential loss. The head of the ERM department needs to develop a model that explains how this KRI affects the operational risk capital estimate.

We start by obtaining data on server outages from the IT department. We are able to find a list with daily server outage details for the last 100 months, along with the duration of downtime for each outage.

Server Outage
Date  Outage (0/1)  Downtime Duration (HH:MM)
11-Jan-12 0 00:00
12-Jan-12 0 01:00
13-Jan-12 1 00:30
14-Jan-12 0 00:00
15-Jan-12 1 01:30
16-Jan-12 0 00:00
17-Jan-12 0 00:00
18-Jan-12 1 02:00
19-Jan-12 0 00:00
20-Jan-12 1 01:00
21-Jan-12 0 00:00
22-Jan-12 1 00:45
23-Jan-12 0 00:00
24-Jan-12 1 00:15
25-Jan-12 1 00:25
26-Jan-12 0 00:00
27-Jan-12 0 00:00
28-Jan-12 0 00:00
29-Jan-12 1 00:05
30-Jan-12 1 02:00
31-Jan-12 1 01:00

Table 1: Operational risk – server outage record with downtime duration

Relation between KRI and loss event

We understand that a server outage can lead to a risk event, but we do not find a direct relation to the loss itself. Keeping this in mind, we look for an event that is associated with this KRI and that can lead to a loss.

We look into different processes across the RCSA entities and find that failed (incomplete) transactions that are not caught, identified, reported and fixed in time are a possible event that can be related to server outages. To confirm our initial assessment, we look for daily data on failed (incomplete) transactions from the treasury trading and sales department. The reason for using the treasury department's data set and servers is that, unlike core banking and ATM network failures, we have a closed system within which the impact of a failed transaction can be easily quantified in dollar terms.

Failed Transactions
Date  Failed Transaction (0/1)  Number of Failed Transactions
11-Jan-12 0 0
12-Jan-12 1 11
13-Jan-12 0 0
14-Jan-12 1 18
15-Jan-12 1 5
16-Jan-12 0 0
17-Jan-12 1 8
18-Jan-12 0 0
19-Jan-12 1 8
20-Jan-12 0 0
21-Jan-12 0 0
22-Jan-12 1 12
23-Jan-12 0 0
24-Jan-12 1 13
25-Jan-12 1 14
26-Jan-12 0 0
27-Jan-12 0 0
28-Jan-12 1 16
29-Jan-12 1 9
30-Jan-12 0 0
31-Jan-12 0 0

Table 2: Failed Transactions

In this table, 0 and 1 denote whether failed transactions occurred on a given day. To develop a relationship between failed transactions and server outages, we need to estimate the probability of failed transactions occurring given that there is a server failure or outage.

To calculate the conditional probability, we build another table that records both failed transactions and server outages over a span of sixty days. A further column, 'joint failure', is added; it takes the value 1 when there is both a failed transaction and a server outage on the same day, and 0 otherwise. The table is given below.

With the help of this table we can compute the probability of a transaction failure given that there is a server outage or failure.

The total number of possible outcomes: 60 (i.e. the number of days)

Number of days on which there were failed transactions (FT): 35

Number of days on which there were server failures/outages (SF): 28

Number of days on which both FT and SF occurred: 17

The probabilities of these events are found by dividing each count by the total number of possible outcomes, giving 0.583, 0.467 and 0.283 for FT, SF and the joint event (FT and SF) respectively.

Date  Failed Transaction  Server Outage  Joint Failure
11-Jan-12 0 0 0
12-Jan-12 1 0 0
13-Jan-12 0 1 0
14-Jan-12 1 0 0
15-Jan-12 1 1 1
16-Jan-12 0 0 0
17-Jan-12 1 0 0
18-Jan-12 0 1 0
19-Jan-12 1 0 0
20-Jan-12 0 1 0
21-Jan-12 0 0 0
22-Jan-12 1 1 1
23-Jan-12 0 0 0
24-Jan-12 1 1 1
25-Jan-12 1 1 1
26-Jan-12 0 0 0
27-Jan-12 0 0 0
28-Jan-12 1 0 0
29-Jan-12 1 1 1
30-Jan-12 0 1 0
31-Jan-12 1 1 1
01-Feb-12 0 1 0
02-Feb-12 0 1 0
03-Feb-12 1 0 0
04-Feb-12 1 0 0
05-Feb-12 1 1 1
06-Feb-12 1 0 0
07-Feb-12 1 1 1
08-Feb-12 1 0 0
09-Feb-12 1 0 0
10-Feb-12 0 0 0
11-Feb-12 0 0 0
12-Feb-12 1 0 0
13-Feb-12 0 1 0
14-Feb-12 1 0 0
15-Feb-12 1 1 1
16-Feb-12 0 0 0
17-Feb-12 1 0 0
18-Feb-12 0 1 0
19-Feb-12 1 0 0
20-Feb-12 0 1 0
21-Feb-12 1 0 0
22-Feb-12 1 1 1
23-Feb-12 0 0 0
24-Feb-12 1 1 1
25-Feb-12 1 1 1
26-Feb-12 0 0 0
27-Feb-12 1 0 0
28-Feb-12 0 1 0
29-Feb-12 1 1 1
01-Mar-12 0 0 0
02-Mar-12 1 1 1
03-Mar-12 0 1 0
04-Mar-12 1 1 1
05-Mar-12 0 0 0
06-Mar-12 1 0 0
07-Mar-12 1 1 1
08-Mar-12 0 0 0
09-Mar-12 1 1 1
10-Mar-12 1 0 0

Table 3: Failed transactions and server outages used to estimate the conditional probability

Once the marginal probabilities for all three events are known, we can find the probability of a transaction failure given that there was a server failure/outage. This is given by the conditional probability formula:

$$P(FT \mid SF) = \frac{P(FT \cap SF)}{P(SF)}$$

Substituting the values estimated above for SF and FT:

$$P(FT \mid SF) = \frac{0.283}{0.467} \approx 0.61$$

This shows that, on a day with a server outage or failure, there is roughly a 61% probability of transaction failures – a strong link between the KRI and the loss event.
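The same calculation can be reproduced directly from the daily 0/1 indicator series. The sketch below is a minimal Python version; the function name and the short demo series are illustrative assumptions, while the 17/28 figure comes from the 60-day sample in Table 3.

```python
# Minimal sketch: estimate P(failed transaction | server outage) from daily 0/1 flags.
# ft[i] = 1 if a failed transaction occurred on day i, sf[i] = 1 if a server outage occurred.
# In practice these lists would be loaded from Table 3.

def conditional_probability(ft, sf):
    """Return P(FT | SF) estimated from paired daily indicator series."""
    if len(ft) != len(sf):
        raise ValueError("Indicator series must cover the same days")
    days = len(ft)
    p_sf = sum(sf) / days                                # marginal P(SF)
    p_joint = sum(f & s for f, s in zip(ft, sf)) / days  # joint P(FT and SF)
    return p_joint / p_sf

# Tiny illustrative series (not the 60-day sample from Table 3):
ft_demo = [1, 0, 1, 1, 0, 1]
sf_demo = [1, 0, 0, 1, 1, 1]
print(round(conditional_probability(ft_demo, sf_demo), 2))  # 0.75 on the demo data

# With the counts from the 60-day sample (35 FT days, 28 SF days, 17 joint days),
# the same formula gives 17/28, i.e. roughly 0.61.
```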

Risk Scenario and Loss Data

Now that we have established a link, we need to quantify the losses in dollar terms.

After some searching, we find daily loss data associated with failed transactions. Within the treasury system, failed transactions lead to losses in many forms. Two such instances are highlighted below:

1) A failed limit update leads to a counterparty limit breach that is caught at day end and delays the day-end process. The delay triggers a vendor support call, charged at the vendor's out-of-office overtime rate, plus extended operating time for the treasury desk and the IT support group.

2) A failed transaction post leads to incorrect execution of a market transaction that has to be rolled back and re-executed at market rates different from those committed to the counterparty, leaving the bank with a mark-to-market loss on both legs of the transaction (the roll-back as well as the re-execution).

The table of loss data collected from the treasury system follows:

Failed Transactions
Date  Failed Transaction (0/1)  Number of Failed Transactions  Loss ($)
11-Jan-12 0 0 0
12-Jan-12 1 11 5600
13-Jan-12 0 0 0
14-Jan-12 1 18 8000
15-Jan-12 1 5  3000
16-Jan-12 0 0 0
17-Jan-12 1 8 4500
18-Jan-12 0 0  0
19-Jan-12 1 8 4300
20-Jan-12 0 0 0
21-Jan-12 0 0 0
22-Jan-12 1 12 6200
23-Jan-12 0 0 0
24-Jan-12 1 13 7300
25-Jan-12 1 14 7300
26-Jan-12 0 0 0
27-Jan-12 0 0 0
28-Jan-12 1 16 7700
29-Jan-12 1 9 4800
30-Jan-12 0 0 0
31-Jan-12 0 0 0

Table 4: Loss Data for Failed Transactions

It can be seen that the loss amount varies from day to day and depends on the number (frequency) of failed transactions. Based on the conditional probability found earlier, we can estimate the average value of losses due to server outages over a one-month period. This gives us an estimate of the loss amount attributable to server outages/failures.

Following the same approach, we build a monthly loss data table for the last 100 months from the daily data, as sketched below.
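The sketch below shows one way to perform that roll-up, following the article's approach of attributing a share of each month's observed loss to the server-outage KRI by scaling with the conditional probability estimated above. The data layout and the few rows of Table 4 reproduced here are purely illustrative.

```python
# Minimal sketch: roll daily loss records up to months and attribute a share of each
# month's loss to the server-outage KRI by scaling with P(FT | SF) found above.
from collections import defaultdict
from datetime import datetime

P_FT_GIVEN_SF = 17 / 28  # conditional probability estimated from Table 3

daily_losses = [
    # (date, loss_usd) -- a few rows of Table 4 for illustration only
    ("12-Jan-12", 5600), ("14-Jan-12", 8000), ("15-Jan-12", 3000),
    ("17-Jan-12", 4500), ("19-Jan-12", 4300), ("22-Jan-12", 6200),
]

monthly_loss = defaultdict(float)
for date_str, loss in daily_losses:
    month_key = datetime.strptime(date_str, "%d-%b-%y").strftime("%Y-%m")
    monthly_loss[month_key] += loss

# Loss attributed to the server-outage KRI, per the scaling used in this example.
outage_attributed = {m: total * P_FT_GIVEN_SF for m, total in monthly_loss.items()}
print(outage_attributed)
```

Running the same roll-up over all 100 months of daily data produces the monthly loss series used for capital estimation in the next section.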

Capital Estimate and Observations

The capital estimate can now be produced from the loss data we have, using the loss distribution approach.

A histogram of the loss data acquired over the 100 months is given below.


Figure 1: Loss data for the 100-month period, represented by a histogram

From the histogram it appears that the data follow a normal distribution. To investigate this further, and to examine the cumulative distribution of the data set, we can construct a normal probability plot, which can be produced through regression analysis in Excel. The probability plot is given below.


Figure 2: Normal Probability Plot for the loss data

From the probability plot it can be seen that, even though there are some outliers, it is safe to assume that a normal distribution is a good fit for our data set.
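Given the normal fit, a capital figure can be read off as a high quantile of the fitted loss distribution. The sketch below assumes the 99.9% confidence level commonly associated with Basel II operational risk capital, since the example does not state a level, and uses a randomly generated placeholder series in place of the bank's 100 monthly loss figures.

```python
# Minimal sketch: fit a normal distribution to 100 monthly loss figures, check the fit
# with a normal probability plot (analogous to Figure 2), and read off a high quantile
# as the operational risk capital estimate. The 99.9% level is an assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
monthly_losses = rng.normal(loc=120_000, scale=30_000, size=100)  # placeholder series

mu, sigma = monthly_losses.mean(), monthly_losses.std(ddof=1)

# Goodness-of-fit check: r close to 1 suggests the normal assumption is reasonable.
(_, _), (_slope, _intercept, r) = stats.probplot(monthly_losses, dist="norm")

capital_estimate = stats.norm.ppf(0.999, loc=mu, scale=sigma)
print(f"fit: mu={mu:,.0f}, sigma={sigma:,.0f}, probplot r={r:.3f}")
print(f"99.9% capital estimate: {capital_estimate:,.0f} USD")
```

Some treatments report unexpected loss (the quantile minus the mean) rather than the full quantile; either convention can be applied to the fitted parameters.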

