
Embedding operational resilience in change management

Paul Anthony

When it comes to operational resilience, firms are often at their most vulnerable during times of change. Barbara Moldenhauer looks at how to manage change programmes to embed operational resilience requirements.

In the financial sector, operational resilience aims to restore services after an outage within pre-agreed tolerance levels, to reduce the potential for financial harm to consumers or the wider economy. The regulation applies to PRA- and FCA-regulated firms, and financial market infrastructure firms, and was prompted by high-profile service outages, predominantly caused by teething problems during IT upgrades. But as the saying goes, change is the only constant, so it’s essential that operational resilience is fully embedded into change management processes and procedures.

Most discretionary change programmes are transformative by nature, for example upgrading legacy platforms with new technologies or improving digital capabilities. These projects are often long overdue and reflect fundamental shifts in how the firm operates, making the business more robust and sustainable in the long term. This makes discretionary change programmes inherently higher risk, as they tend to affect the same critical functions that support key services under operational resilience.

Role of the project team

To embed operational resilience, firms must factor the FCA, PRA and Bank of England’s requirements into the change programme, from inception to go-live. At the initiation stages, senior management should identify how the project will affect critical functions and the potential impact on key services. This includes making sure the operational resilience thresholds can be upheld while the project is underway and once it’s live. These points should be an integral part of the go/no go decision criteria.
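As a rough illustration of how resilience thresholds could feed into go/no go criteria, the sketch below compares estimated recovery times against each service's agreed tolerance. The class, field names, services and figures are all hypothetical, chosen for illustration only, and a real assessment would weigh far more than a single number per service.

```python
from dataclasses import dataclass


@dataclass
class ServiceAssessment:
    """Hypothetical record of one key service's resilience position."""
    service: str
    tolerance_minutes: int        # agreed maximum outage duration
    recovery_during_project: int  # estimated recovery time while the project is underway
    recovery_after_go_live: int   # estimated recovery time once the change is live


def resilience_breaches(assessments):
    """Return the services whose worst-case estimated recovery exceeds tolerance."""
    return [
        a.service
        for a in assessments
        if max(a.recovery_during_project, a.recovery_after_go_live) > a.tolerance_minutes
    ]


def resilience_go(assessments):
    """Treat resilience as one go/no go criterion: 'go' only if nothing breaches."""
    return not resilience_breaches(assessments)


# Illustrative figures only, not drawn from any real firm.
assessments = [
    ServiceAssessment("payments", 120, 90, 60),
    ServiceAssessment("online banking", 240, 300, 180),
]
print(resilience_breaches(assessments))  # ['online banking']
print(resilience_go(assessments))        # False
```

Here the second service would breach its tolerance while the project is underway, even though it improves after go-live, which is the kind of finding that should surface at initiation and not at the final decision gate.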

Change programme delivery is generally separate from business as usual, and most interaction focuses on technology change management, service transition and training for the new systems. But embedding operational resilience relies on earlier interaction with business as usual teams and third parties, to make sure the project team fully understands the impact on critical functions and that the firm can continue to meet end-to-end resilience criteria.

Firms should also work with third parties on the new platform, to establish disaster recovery plans and potential substitutions to stay within the agreed thresholds in the event of an outage.

Role of non-functional testing

The success or failure of a project hangs on how it performs when it’s live. As the last hurdle to cross, testing can be make or break, and weak testing can undermine an otherwise well designed and well managed project.

Non-functional testing aims to assess the impact of a new platform or software, looking at anything that does not directly impact its functionality. It generally assesses the technological elements, looking for impacts on scalability, security or reliability, among other things. Testing should also explicitly look at the ability to stay within operational resilience thresholds, and capture lessons learnt for other projects in the change portfolio.

The scope of non-functional testing often includes, but is not limited to, testing phases such as:

  • load, soak and stress tests – to review the performance of the new architecture under heavy usage and sustained activity, and find the limit for impaired output
  • penetration testing – to identify and begin to address security vulnerabilities
  • disaster recovery testing over key elements of the architecture – to help meet recovery time objectives and tolerance levels
  • testing to ascertain and demonstrate compliance with the Disability Discrimination Act.
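To make two of these checks concrete, the sketch below finds the load level at which output first becomes impaired in a stress test ramp, and checks a disaster recovery test result against a recovery time objective. The names, limits and measurements are assumptions made up for illustration; real thresholds would come from the firm's own tolerance levels and test tooling.

```python
from dataclasses import dataclass


@dataclass
class LoadStep:
    """One step of a hypothetical stress test ramp."""
    requests_per_second: int
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def breaking_point(steps, max_latency_ms, max_error_rate):
    """Return the lowest load at which output is impaired, or None if it never is."""
    for step in sorted(steps, key=lambda s: s.requests_per_second):
        if step.p95_latency_ms > max_latency_ms or step.error_rate > max_error_rate:
            return step.requests_per_second
    return None


def within_recovery_objective(measured_minutes, objective_minutes):
    """True if a disaster recovery test restored service within the objective."""
    return measured_minutes <= objective_minutes


# Illustrative measurements only, not real test results.
results = [
    LoadStep(100, 180.0, 0.00),
    LoadStep(200, 240.0, 0.00),
    LoadStep(400, 650.0, 0.01),
    LoadStep(800, 2100.0, 0.12),
]
print(breaking_point(results, max_latency_ms=500.0, max_error_rate=0.02))  # 400
print(within_recovery_objective(95, 120))                                  # True
```

Recording the breaking point alongside expected peak demand shows how much headroom the new architecture has before a key service risks breaching its tolerance.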

Getting non-functional testing right

When looking at approaches to non-functional testing, it’s also important to consider the availability and suitability of test environments. This requires careful management and oversight, making sure engineers have adequate time for robust testing with access to properly configured test environments, as close to live scenario conditions as possible.

Not leaving enough time for testing is also a common pitfall, with time often cut short to make up for earlier delivery delays. Robust testing will always identify bugs, and it is better to find them before the go-live date to reduce the potential for disruption or a service outage. Too little time can also sell the team short and put them under further time pressure to make the fixes, which can lead to more problems down the line.

Embedding operational resilience – what next?

Programme management relies on good communication and collaboration – and embedding operational resilience is really no different.

The business as usual teams can give a better understanding of the critical functions and their interdependencies, which will inform the project team's strategic approach, activities and testing.

Building contingency plans for new elements of architecture, as they are under development, will help firms stay within their operational resilience thresholds.

A mature testing model will include a variety of approaches to check the product can stay within the agreed threshold, including stress tests to establish the breaking point.

Finally, while operational resilience is a regulatory requirement, applying the thought process more broadly can improve business continuity processes and help firms succeed in the long term.

Written by Barbara Moldenhauer.

Contact Paul Anthony for further information.
