Ransomware continues its steep rise. Regardless of the underlying reason, the frequency and intensity of natural disasters seem to be rising as well. Not long ago, there was no such thing as a public safety power shutoff. Now they are commonplace in a fire season that is approaching year-round. Insider threats, both accidental and intentional, are a constant concern, especially when reduced staffing makes separation of duties and two-person changes difficult or impossible.
The information systems that the university depends on to carry out its mission must be resilient. The data they hold must be protected from natural and man-made disasters. A central feature of all disaster recovery is backups.
In modern cloud environments, backups may consist of the replication of data across geographic zones. Properly built, a system deployed across geographic zones can be highly available and resilient to natural disasters. Such a system may still be susceptible to insider threats and cyberattacks, however, because replication can propagate a malicious or accidental change to every copy. The need for backups has not passed.
UCOP Policy
UC has a policy governing disaster recovery. You can find a copy of the IS-12 policy on the UCOP security website. UCSB will be rolling out this policy beginning in fiscal year 2022.
Until the rollout, you must have a backup plan for all data classified as P3 or above and for all systems classified as A3 or above. Remember, you must impose recovery requirements on third-party service providers unless you manage and back up the authoritative data yourself. These recovery requirements must be written into the statement of work, included as part of a documented feature set, or included in contractual language.
Recovery Time Objective (RTO)
The policy requirements notwithstanding, there are things you should be incorporating into your backup plans now. First, the backup strategy should take into account the Recovery Time Objective (RTO). That figure is the target for how quickly you must restore the operation of a system when you have to recover it from backup. For natural disasters, this time should include sourcing and configuring backup infrastructure. Provisioning may be as simple as creating a new VM in a cluster or a cloud service. New hardware will take longer.
Regardless, operational requirements should drive the RTO. For example, teaching systems must be available to support the educational mission. How long can critical resources like Gauchospace be down? How long can GOLD be down during registration?
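One way to sanity-check an RTO is to add up the time each recovery phase is expected to take and compare the total against the objective. The sketch below does only that arithmetic. The phase names, durations, and the 8-hour RTO are hypothetical placeholders, not measurements from any campus system.

```python
from datetime import timedelta


def estimated_recovery_time(phases):
    """Add up the expected duration of every recovery phase."""
    return sum(phases.values(), timedelta())


def meets_rto(phases, rto):
    """Return True when the estimated recovery time fits within the RTO."""
    return estimated_recovery_time(phases) <= rto


# Hypothetical figures for illustration only.
phases = {
    "provision replacement VM": timedelta(hours=1),
    "retrieve and transfer backup": timedelta(hours=5),
    "restore and verify data": timedelta(hours=3),
}
rto = timedelta(hours=8)

print(estimated_recovery_time(phases))  # 9:00:00
print(meets_rto(phases, rto))           # False
```

In this made-up case the plan misses the objective by an hour, which would argue for either faster retrieval or a longer agreed RTO.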
Recovery Point Objective (RPO)
The second consideration is Recovery Point Objective (RPO). Simply put, this answers the question of how much data you’re willing to lose if a disaster occurs. In its simplest form, if you take backups every 24 hours, the effective RPO is 24 hours; you can lose up to 24 hours’ worth of work if you have to restore from backups. How often you take backups should be driven by the RPO.
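The relationship between backup frequency and RPO can be written down directly. The following is a minimal sketch that assumes backups complete instantly at fixed intervals; the function names, timestamps, and the 4-hour RPO are illustrative assumptions.

```python
from datetime import datetime, timedelta


def data_lost(last_backup, failure_time):
    """Work lost is everything written since the last good backup."""
    return failure_time - last_backup


def schedule_meets_rpo(backup_interval, rpo):
    """With periodic backups, the worst-case loss is one full interval."""
    return backup_interval <= rpo


# Hypothetical values for illustration only.
rpo = timedelta(hours=4)
last_backup = datetime(2021, 6, 1, 2, 0)
failure = datetime(2021, 6, 1, 17, 30)

print(data_lost(last_backup, failure))               # 15:30:00
print(schedule_meets_rpo(timedelta(hours=24), rpo))  # False
print(schedule_meets_rpo(timedelta(hours=4), rpo))   # True
```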
Backup Security: Availability
Backups themselves can suffer security problems. The first problem is availability. Where are the backups kept, and can you get to them? If housed in AWS Glacier, can the time it takes to recover and transmit them over the network meet your RTO?
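A rough estimate of whether archived backups can be retrieved in time combines the storage tier's retrieval delay with the network transfer time. The sketch below assumes a sustained transfer rate, which real networks rarely deliver, and every number in it (backup size, bandwidth, retrieval delay, RTO) is a hypothetical input rather than a published figure for any service.

```python
def transfer_hours(size_gb, bandwidth_mbps):
    """Hours needed to move size_gb gigabytes at a sustained rate in megabits/s."""
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / bandwidth_mbps / 3600


def retrieval_fits_rto(size_gb, bandwidth_mbps, retrieval_delay_hours, rto_hours):
    """True when retrieval delay plus transfer time fits within the RTO."""
    total = retrieval_delay_hours + transfer_hours(size_gb, bandwidth_mbps)
    return total <= rto_hours


# Hypothetical: 2 TB backup, 1 Gbps link, 4-hour archive retrieval, 8-hour RTO.
print(retrieval_fits_rto(size_gb=2000, bandwidth_mbps=1000,
                         retrieval_delay_hours=4, rto_hours=8))  # False
```

Note that this covers only getting the data back on site; restoring and verifying it takes additional time.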
Backup Security: Integrity
The second issue that plagues IT teams is integrity. Everybody makes backups, but not everybody tests them. A backup that can’t be read or is corrupted prevents recovery. Backups, whether to the cloud or to a device, must be tested periodically. You must also test backup recovery whenever the backup process or technology changes, no matter how small the change.
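One simple form of restore test is to record a checksum when the backup is taken and compare it with the checksum of the restored file. The sketch below uses SHA-256 from Python's standard library. The file names are hypothetical, and a file copy stands in for whatever backup and restore tooling you actually use.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_is_intact(recorded_digest, restored_path):
    """Compare the digest recorded at backup time with the restored file."""
    return sha256_of(restored_path) == recorded_digest


# Demonstration with throwaway files; the copy stands in for a real restore.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "records.db"
    source.write_bytes(b"example records\n" * 1000)
    recorded = sha256_of(source)

    restored = Path(workdir) / "records.restored.db"
    shutil.copyfile(source, restored)
    good_restore = restore_is_intact(recorded, restored)

    # Simulate corruption by dropping the final byte.
    restored.write_bytes(restored.read_bytes()[:-1])
    bad_restore = restore_is_intact(recorded, restored)

print(good_restore, bad_restore)  # True False
```

A matching checksum shows the bytes survived the round trip. It does not show that the application can actually start from the restored data, so a full recovery drill is still needed.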
Backup Security: Confidentiality
The third issue is confidentiality. IS-3 policy requires that all data classified as P3 and above be encrypted in transit and at rest. For backups, that means that the backup must be encrypted or reside on encrypted media. Many managed backup solutions, such as Rubrik and Cohesity, provide this capability. Many tape solutions provide encryption capabilities. For small backups, encrypted removable drives are available. Of course, most cloud solutions offer encryption in transit and at rest. Make sure that whatever solution you select includes encryption capability.
Be cautious of one thing, however: encryption requires the proper management of encryption keys. If encryption keys are lost or corrupted, you have an availability problem on your hands. If you can’t read a backup, it isn’t a backup. Test your key management along with your ability to read the media and transfer data to the destination.
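As one possible key-management check, you could record a fingerprint of the encryption key alongside each backup and, during recovery drills, confirm that the key held in escrow still matches it. This is a sketch under stated assumptions, not a feature of any particular backup product: the manifest layout, the backup identifier, and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets


def key_fingerprint(key):
    """SHA-256 fingerprint of the key; safe to store next to the backup."""
    return hashlib.sha256(key).hexdigest()


def escrowed_key_matches(escrowed_key, recorded_fingerprint):
    """True when the escrowed key is the one the backup was encrypted with."""
    return hmac.compare_digest(key_fingerprint(escrowed_key), recorded_fingerprint)


# Hypothetical 256-bit key and backup manifest, for illustration only.
backup_key = secrets.token_bytes(32)
manifest = {
    "backup_id": "example-0001",
    "key_fingerprint": key_fingerprint(backup_key),
}

escrow_copy = backup_key                # what the key escrow should return
wrong_key = secrets.token_bytes(32)     # e.g. a key rotated without updating escrow

print(escrowed_key_matches(escrow_copy, manifest["key_fingerprint"]))  # True
print(escrowed_key_matches(wrong_key, manifest["key_fingerprint"]))    # False
```

A fingerprint match only shows that you hold the right key. It complements, and does not replace, actually decrypting and reading a test restore.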