Cloud resiliency - why it matters

Marcel Hylkema, Solution Architect at Group 2000

Date April 20, 20224
Author Marcel Hylkema

Deploying services in the cloud has become common practice. Both public and private clouds are in use, each with its own characteristics regarding scalability, security, privacy, reliability, and cost. This blog focuses on the resiliency of cloud deployments and how to take it into account when planning a deployment.

Resiliency
Resiliency is “the capacity to recover quickly from difficulties; toughness” or “the ability of a substance or object to spring back into shape; elasticity” (Oxford Languages). Translated to a service, this means recovery from, for example, system failures, network failures, and application failures. All of these failures degrade the availability and usability of the services, may cause loss of revenue, and may lead to non-compliance with service level agreements and government regulations. Recovery costs should also be taken into account. Keep in mind that your data, and the security of that data, is your most precious asset. Hardware can be replaced relatively easily and applications are typically recoverable, but data that has been lost or tampered with cannot be replaced.

Cloud resiliency
When thinking about cloud deployments, it is easy to assume that the provided resources are reliable and will always be available, especially as cloud environments are typically provided by reputable companies or departments. Yet, although the availability figures of components and services are high (typically considerably higher than for regular hardware deployments), deployed services are not protected against common outages by default. Appropriate measures have to be taken.

Resilient deployments take both availability and disaster recovery into account. Availability is the fraction of time that your services are available. It is expressed as a percentage of the form 99.x%, which corresponds to a maximum amount of outage time per period. Disaster recovery means that the service can be restored after a major disaster. Hardware must be replaced, software re-installed, and service data restored to recover your services, so it is crucial that software backups and service data are still available after a disaster has happened.
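To make the relation between an availability percentage and outage time concrete, here is a minimal sketch. The function name and the example targets are illustrative assumptions, not figures from any particular cloud provider or service level agreement.

```python
# Illustrative only: converts an availability target into the
# maximum outage time it allows per period.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum outage time (in hours) per period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Example targets chosen for illustration.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} hours per year")
```

Under these assumptions, 99.9% availability still allows roughly 8.76 hours of outage per year, and each additional "nine" reduces that budget by a factor of ten.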

Possible causes of outages include virtual machine switch-over, host failures, maintenance windows, loss of stored data, network failures, and power outages. Disasters include, for example, a fire at a cloud provider's premises or natural disasters such as earthquakes or floods that affect a whole region. Other risks are security issues such as unauthorized access, denial-of-service attacks, viruses, and ransomware, and even planned software updates that cause more downtime than anticipated.

Resiliency strategy
Cloud service providers provide us with numerous options to prevent serious consequences of outages and hence limit the resulting costs. To develop a strategy that makes the services more resilient, it is necessary to follow a structured path:

First, identify and quantify the risks and their possible consequences. Decide which risks must be mitigated and which can be accepted, in line with your business objectives. Also take into account that not all risks are known and that an outage or disaster can arrive in an unexpected way.

Then look at the available mitigation options and select the ones that best fit the deployed services, the business objectives, and the operational practices. Cloud service providers offer quite a few options to mitigate the various risks and keep your services running. Some options fit better than others or are more costly, so good knowledge of the available options is necessary. Include a disaster recovery plan as a last resort.

Implement and test the chosen resiliency strategies. Simulate various scenarios and make sure that the selected mitigation options work as expected. Document the strategy extensively and keep this up to date.

Finally, evaluate the strategies and keep them up to date. Cloud deployments and risks may change, so it is crucial to keep the resiliency strategy in step with the actual deployment and current insights, and to test the implemented strategy regularly.
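The first step of this path, quantifying risks and deciding which to mitigate and which to accept, can be sketched as a simple expected-loss ranking. This is only one possible approach. The risk names, probabilities, costs, and acceptance threshold below are all made-up assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float  # estimated chance of occurring in a year (0..1)
    impact_cost: float         # estimated cost per occurrence

    @property
    def expected_annual_loss(self) -> float:
        return self.annual_probability * self.impact_cost


def split_risks(risks, acceptance_threshold):
    """Rank risks by expected annual loss and split them into
    those to mitigate and those that may be accepted."""
    ranked = sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)
    mitigate = [r for r in ranked if r.expected_annual_loss > acceptance_threshold]
    accept = [r for r in ranked if r.expected_annual_loss <= acceptance_threshold]
    return mitigate, accept


if __name__ == "__main__":
    # Hypothetical estimates, not measured data.
    risks = [
        Risk("host failure", 0.5, 20_000),
        Risk("regional disaster", 0.01, 2_000_000),
        Risk("maintenance overrun", 0.8, 2_000),
    ]
    mitigate, accept = split_risks(risks, acceptance_threshold=5_000)
    print("mitigate:", [r.name for r in mitigate])
    print("accept:", [r.name for r in accept])
```

A ranking like this only covers the risks that were identified, so it complements rather than replaces a disaster recovery plan for the unexpected cases.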

Conclusion
Cloud resiliency starts with knowing your business goals and then developing a strategy that delivers the required resiliency. Keep a solid disaster recovery program at hand as your insurance policy and keep your data safe. The technical architecture determines the daily operation of the services, and the applied resiliency strategy determines their potential for continuity.

If you would like to better understand how we help our LIMA Data Retention or LIMA Lawful Intercept customers move to the cloud without compromising on the resiliency of the solution, please feel free to contact Group 2000. Or leave your information in the contact form, and our experts will call or email you back.
