
KRIs, Loss Events and estimating operational risk capital – an example

We use a simple illustrative example to show how Key Risk Indicators (KRIs) and Loss Events can be used to estimate the operational risk capital associated with a given risk.

RCSA results – computer failure: Our RCSA exercise has identified a number of Key Risk Indicators (KRIs). One of the key KRIs is the number of computer failures per month. Given the abundance of logs and monitoring systems, we also have historical data for such failures.

KRI and its relation to loss: A relationship or map between a KRI and its impact on business lines and their operations needs to be developed, so as to understand how this impact could create a risk for the company.

The number of computer failures per month now needs to be quantified in dollar terms. Our RCSA exercise also indicates that the primary issue with computer failure is unsuccessful transactions that are unable to post, commit or close because the underlying computer component has failed. This ranges from a transaction failing on the ATM network while a customer is attempting to execute a function, to a treasury system transaction failing to post and leading to issues at day end or during central bank reporting because the failed transaction was not identified, rectified or corrected in time.

To understand the relationship between this specific KRI and a given scenario, we can use conditional probability to estimate the number of unsuccessful transactions that occurred because of a computer failure and were not identified, rectified or corrected in time.

Loss data: Based on the frequency and severity estimates derived from the loss data in our system logs, we can drill down to the specific transactions and the dollar amount of loss incurred for that given KRI.

Capital estimation model: The loss data can be collated for all KRIs and grouped by business line; the aggregated data can then be used to estimate capital for operational risk.

KRI Loss Event Capital Estimation case study

Goodman Bank has set up a new Operational Risk management department. The department's first task is to identify processes, practices and activities which could lead to risk, and to find ways of assessing and controlling that risk. For this purpose, Risk Control Self Assessment (RCSA) is used.

Our op risk management team identifies 'Server Outage' as one of the KRIs which could lead to potential loss. The head of the ERM department needs to develop a model which explains how this KRI affects the operational risk capital estimate.

We start by obtaining data on server outages from the IT department. We find a list with daily server outage details for the last 100 months, along with the duration of downtime for the server.

Server Outage

Date        Outage   Downtime Duration (HH:MM)
11-Jan-12   0        00:00
12-Jan-12   0        01:00
13-Jan-12   1        00:30
14-Jan-12   0        00:00
15-Jan-12   1        01:30
16-Jan-12   0        00:00
17-Jan-12   0        00:00
18-Jan-12   1        02:00
19-Jan-12   0        00:00
20-Jan-12   1        01:00
21-Jan-12   0        00:00
22-Jan-12   1        00:45
23-Jan-12   0        00:00
24-Jan-12   1        00:15
25-Jan-12   1        00:25
26-Jan-12   0        00:00
27-Jan-12   0        00:00
28-Jan-12   0        00:00
29-Jan-12   1        00:05
30-Jan-12   1        02:00
31-Jan-12   1        01:00

Table 1: Operational risk – server outage record along with downtime duration
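
This aggregation step is easy to automate. The sketch below is a minimal illustration, assuming Python with pandas is available; the file name server_outage_log.csv and the column names date, outage and downtime are hypothetical. It rolls the daily log up into the monthly KRI.

```python
import pandas as pd

# Hypothetical CSV export of the IT department's daily outage log.
# Assumed columns: date, outage (0/1 flag), downtime ("HH:MM" text).
log = pd.read_csv("server_outage_log.csv", parse_dates=["date"])

# Convert the "HH:MM" downtime text into minutes.
hhmm = log["downtime"].str.split(":", expand=True).astype(int)
log["downtime_min"] = hhmm[0] * 60 + hhmm[1]

# Roll the daily log up into the monthly KRI:
# outage days per month and total downtime per month.
monthly = log.groupby(log["date"].dt.to_period("M")).agg(
    outage_days=("outage", "sum"),
    downtime_min=("downtime_min", "sum"),
)
print(monthly)
```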

Relation between KRI and loss event

We understand that a server outage can lead to a risk event, but we do not find a direct relation with loss itself. Keeping this in mind, we look for an event which can be associated with the KRI we have identified and which can lead to a loss.

We look into different processes across the RCSA entities and find that failed (incomplete) transactions that are not caught, identified, reported and fixed in time are a possible event which can be related to server outage. To confirm our initial assessment, we look for daily data on failed (incomplete) transactions from the treasury trading and sales department. The reason for using the treasury department's data set and servers is that, unlike core banking and ATM network failures, we have a closed system within which the impact of a failed transaction can be easily quantified in dollar terms.

Failed Transactions

Date        Failed Transactions   Number
11-Jan-12   0                     0
12-Jan-12   1                     11
13-Jan-12   0                     0
14-Jan-12   1                     18
15-Jan-12   1                     5
16-Jan-12   0                     0
17-Jan-12   1                     8
18-Jan-12   0                     0
19-Jan-12   1                     8
20-Jan-12   0                     0
21-Jan-12   0                     0
22-Jan-12   1                     12
23-Jan-12   0                     0
24-Jan-12   1                     13
25-Jan-12   1                     14
26-Jan-12   0                     0
27-Jan-12   0                     0
28-Jan-12   1                     16
29-Jan-12   1                     9
30-Jan-12   0                     0
31-Jan-12   0                     0

Table 2: Failed Transactions

In this table, 0 and 1 denote the occurrence of failed transactions on a given day. To develop a relationship between failed transactions and server outages, we need to determine the probability of failed transactions occurring given that there is a server failure or outage.

To calculate the conditional probability, we build another table which lists both failed transactions and server outages over a span of sixty days. Another column, called the 'joint failure' column, takes the value 1 when there is a failed transaction as well as a server outage, and the value 0 otherwise. The table is given below.

With the help of this table we can compute the probability of a transaction failure given that there is a server outage or failure.

The total number of possible outcomes: 60 (i.e. the number of days)

Number of days with failed transactions (FT): 35

Number of days with server failures/outages (SF): 28

Number of days when both FT and SF occurred: 17

The probabilities of these events are found by dividing each count by the total number of possible outcomes, which gives 0.58, 0.47 and 0.28 for FT, SF, and FT and SF combined, respectively.

Date        Failed Transactions   Outage   Joint
11-Jan-12   0                     0        0
12-Jan-12   1                     0        0
13-Jan-12   0                     1        0
14-Jan-12   1                     0        0
15-Jan-12   1                     1        1
16-Jan-12   0                     0        0
17-Jan-12   1                     0        0
18-Jan-12   0                     1        0
19-Jan-12   1                     0        0
20-Jan-12   0                     1        0
21-Jan-12   0                     0        0
22-Jan-12   1                     1        1
23-Jan-12   0                     0        0
24-Jan-12   1                     1        1
25-Jan-12   1                     1        1
26-Jan-12   0                     0        0
27-Jan-12   0                     0        0
28-Jan-12   1                     0        0
29-Jan-12   1                     1        1
30-Jan-12   0                     1        0
31-Jan-12   1                     1        1
01-Feb-12   0                     1        0
02-Feb-12   0                     1        0
03-Feb-12   1                     0        0
04-Feb-12   1                     0        0
05-Feb-12   1                     1        1
06-Feb-12   1                     0        0
07-Feb-12   1                     1        1
08-Feb-12   1                     0        0
09-Feb-12   1                     0        0
10-Feb-12   0                     0        0
11-Feb-12   0                     0        0
12-Feb-12   1                     0        0
13-Feb-12   0                     1        0
14-Feb-12   1                     0        0
15-Feb-12   1                     1        1
16-Feb-12   0                     0        0
17-Feb-12   1                     0        0
18-Feb-12   0                     1        0
19-Feb-12   1                     0        0
20-Feb-12   0                     1        0
21-Feb-12   1                     0        0
22-Feb-12   1                     1        1
23-Feb-12   0                     0        0
24-Feb-12   1                     1        1
25-Feb-12   1                     1        1
26-Feb-12   0                     0        0
27-Feb-12   1                     0        0
28-Feb-12   0                     1        0
29-Feb-12   1                     1        1
01-Mar-12   0                     0        0
02-Mar-12   1                     1        1
03-Mar-12   0                     1        0
04-Mar-12   1                     1        1
05-Mar-12   0                     0        0
06-Mar-12   1                     0        0
07-Mar-12   1                     1        1
08-Mar-12   0                     0        0
09-Mar-12   1                     1        1
10-Mar-12   1                     0        0

Table 3: Transaction failures and server outages (11-Jan-12 to 10-Mar-12) used for estimating the conditional probability

Once the marginal probabilities for all three events are known, we can find the probability of a transaction failure given that there was a server failure/outage. This is given by the conditional probability formula:

P(FT | SF) = P(FT ∩ SF) / P(SF)

Writing this in terms of SF and FT and calculating:

P(FT | SF) = (17/60) / (28/60) = 17/28 ≈ 0.61

That is, given a server outage or failure on a given day, there is a 61 percent probability of transaction failures occurring that day.
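
The same calculation is simple to reproduce in code. Below is a minimal Python sketch, assuming the 0/1 indicator columns of Table 3 are held in two lists:

```python
def p_ft_given_sf(ft, sf):
    """P(failed transaction | server outage) from daily 0/1 indicators.

    ft, sf: the Failed Transactions and Outage columns of Table 3.
    """
    days = len(ft)                           # 60 in our sample
    joint = [f & s for f, s in zip(ft, sf)]  # the 'joint failure' column
    p_sf = sum(sf) / days                    # 28/60, about 0.47
    p_joint = sum(joint) / days              # 17/60, about 0.28
    return p_joint / p_sf                    # 17/28, about 0.61
```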

Risk Scenario and Loss Data

Now that we have established a link, we need to quantify the losses in dollar terms.

After some digging, we find the loss data associated with failed transactions on a daily basis. Within the treasury system, failed transactions take many forms of loss. Two such instances are highlighted below:

1) A failed limit update leads to a counterparty limit breach which is caught at day end and delays the day-end process. The delay leads to a vendor support call, charged at the vendor's out-of-office overtime rate, as well as extended operating time for the treasury desk and the IT support group.

2) A failed transaction post leads to incorrect execution of a market transaction that has to be rolled back and re-executed at market rates different from those committed to a counterparty, leaving the bank with the mark-to-market loss on both ends of the transaction (roll back as well as re-execution).

The table of loss data collected from the treasury system follows:

Failed Transactions

Date        Failed Transactions   Number   Loss Data ($)
11-Jan-12   0                     0        0
12-Jan-12   1                     11       5600
13-Jan-12   0                     0        0
14-Jan-12   1                     18       8000
15-Jan-12   1                     5        3000
16-Jan-12   0                     0        0
17-Jan-12   1                     8        4500
18-Jan-12   0                     0        0
19-Jan-12   1                     8        4300
20-Jan-12   0                     0        0
21-Jan-12   0                     0        0
22-Jan-12   1                     12       6200
23-Jan-12   0                     0        0
24-Jan-12   1                     13       7300
25-Jan-12   1                     14       7300
26-Jan-12   0                     0        0
27-Jan-12   0                     0        0
28-Jan-12   1                     16       7700
29-Jan-12   1                     9        4800
30-Jan-12   0                     0        0
31-Jan-12   0                     0        0

Table 4: Loss Data for Failed Transactions

It can be seen that the loss data varies on a daily basis and its value depends on the number (frequency) of failed transactions. Based on the conditional probability found earlier, we can estimate the average value of losses due to server outages over a one-month period. This gives us an estimate of the loss amount that is attributable to server outages/failures.
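
A sketch of this attribution step is shown below, assuming the daily losses of Table 4 and the daily outage indicator are available as Python lists. The convention of attributing to outages only the losses booked on outage days is our assumption; scaling the total monthly loss by the conditional relationship is an alternative.

```python
def outage_attributable_loss(daily_loss, outage):
    """Estimate the month's loss attributable to server outages.

    daily_loss: dollar loss per day from failed transactions (Table 4).
    outage:     0/1 server outage indicator per day (Table 1).

    Convention (an assumption): keep only losses booked on outage
    days. An alternative is to scale the total monthly loss by the
    share of failed-transaction days that coincided with an outage
    (17/35 in our sample).
    """
    return sum(loss for loss, s in zip(daily_loss, outage) if s == 1)
```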

Following the same approach, we build a monthly loss data table for the last 100 months based on the daily data.

Capital Estimate and Observations

Capital estimation can now be carried out from the loss data we have, using the loss distribution approach.

A histogram of the loss data acquired for the 100 months is given below.


Figure 1: Loss data for the 100-month period represented by a histogram

We can also construct the cumulative distribution function (CDF) for the data set. From the histogram it can be seen that the data appear to follow a normal distribution. To investigate this further, we can create a normal probability plot, which can be produced through regression analysis in Excel. The probability plot is given below.


Figure 2: Normal Probability Plot for the loss data

From the probability plot it can be seen that, even though there are some outliers, it is safe to assume that the normal distribution is a good fit for our data set.
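
Under this normality assumption, the capital figure can be sketched as a high quantile of the fitted distribution of monthly losses. The snippet below assumes Python with NumPy and SciPy; the file name monthly_losses.csv is hypothetical, and the 99.9 percent confidence level is a common regulatory choice rather than a figure from the case study. It fits the normal distribution, reproduces the probability plot check, and derives a capital estimate.

```python
import numpy as np
from scipy import stats

# The 100 monthly aggregate loss observations, in dollars.
monthly_losses = np.loadtxt("monthly_losses.csv")  # hypothetical file

# Fit a normal distribution to the monthly losses.
mu, sigma = stats.norm.fit(monthly_losses)

# Normal probability plot check (the analogue of the Excel regression
# plot): r close to 1 indicates the normal distribution fits well.
(osm, osr), (slope, intercept, r) = stats.probplot(monthly_losses, dist="norm")
print(f"probability plot R^2 = {r ** 2:.3f}")

# Capital at the 99.9 percent confidence level: the quantile of the
# fitted distribution less the expected (mean) loss.
var_999 = stats.norm.ppf(0.999, loc=mu, scale=sigma)
capital = var_999 - mu
print(f"99.9% loss quantile = {var_999:,.0f}; capital estimate = {capital:,.0f}")
```

The resulting figure is the operational risk capital estimate attributable to this KRI; repeating the exercise across all KRIs and business lines and aggregating gives the bank-wide estimate described earlier.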
