When Visa holds its post-incident review of Friday’s outage, which caused widespread ‘chaos’ for its customers across Europe, the company’s Board, CIO, CISO and others should congratulate their technology and InfoSec teams for a brilliantly executed disaster recovery plan.
When planning to mitigate technology risk, there are two generic risk tactics to be deployed and two main considerations to be made.
Tactic 1 – Reduce the Probability of a technology failure
In all fairness to Visa, most customers can’t recollect the last time its systems suffered a critical outage such as last Friday’s incident. In essence, Visa has successfully reduced the likelihood of a critical outage to roughly once every 15 years or so.
Tactic 2 – Reduce the Impact if or when the IT failure occurs
Once a risk materialises despite all best efforts, the next tactic is to minimise both the short-term and long-term impact on stakeholders. In Visa’s case, there are clearly millions of stakeholders. Were those stakeholders ‘inconvenienced’? Absolutely yes. Were the majority of customers harmed in the short or long term? I’d say it could have been far worse for all of us.
With the cybercrime industry worth a staggering $1.5tn per annum and growing, most organisations should by now be aware that an ongoing technology outage markedly raises the prospects of cybercriminals exploiting any ensuing vulnerabilities in processes, people or systems.
When natural disasters strike, looters emerge. During TSB’s outage, fraudsters and cybercriminals were quick to exploit the security vulnerabilities that came to light, even as TSB’s IT teams worked hard to recover from the original systems failure. Here’s where Visa deserves kudos for its well-prepared and well-executed disaster recovery plan.
The two considerations to be made when planning for the disaster recovery of technology systems are:
Do we ‘fail safe’, i.e. do we let users who are still able to use the system get on with doing so whilst we try to resolve the problem for the rest of the system’s users?
Do we ‘fail secure’, i.e. do we shut off the system to everyone until we’re able to figure out what’s going on, fix things, and then give users access to the systems in a phased and controlled manner?
Due to the heightened risk of cybercrime and fraud against users and firms during a technology outage, a CIO/CISO may choose to err on the side of caution and fail secure: shut off all access and accept short-term inconvenience for stakeholders, rather than expose those same stakeholders to a prolonged period of fraud and financial loss. That’s what Visa did well.
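For readers who think in code, the difference between the two choices can be sketched in a few lines of Python. This is only an illustration of the two policies as described above, not a representation of how Visa's systems work; the names `FailureMode` and `allow_transaction`, and the two boolean inputs, are hypothetical.

```python
from enum import Enum


class FailureMode(Enum):
    """The two disaster recovery stances described above (names are illustrative)."""
    FAIL_SAFE = "fail_safe"      # unaffected users carry on during the outage
    FAIL_SECURE = "fail_secure"  # everyone is shut out until the fault is understood


def allow_transaction(system_degraded: bool, user_affected: bool,
                      mode: FailureMode) -> bool:
    """Decide whether a single user's transaction may proceed (hypothetical policy)."""
    if not system_degraded:
        return True  # normal operation: nothing to decide
    if mode is FailureMode.FAIL_SECURE:
        return False  # outage in progress: block all users, affected or not
    # FAIL_SAFE: only users untouched by the fault keep working
    return not user_affected


# During an outage, an unaffected user is served under fail safe
# but blocked under fail secure.
print(allow_transaction(True, False, FailureMode.FAIL_SAFE))
print(allow_transaction(True, False, FailureMode.FAIL_SECURE))
```

The only input on which the two policies disagree is the unaffected user during an outage, and that is precisely the trade-off: fail safe keeps that user served, fail secure accepts their inconvenience to close the window for fraud.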
We could all learn risk management lessons from the Visa outage. As I always say, at work and in life, we’re all risk managers.