Tag Archives: disaster recovery

Backup is No Joke

Today is World Backup Day, a reminder to everyone of how important it is to back up your data. Why today? What better day than the day before April Fools’ Day to remember to be prepared for anything? You don’t want to be the fool who didn’t have a solid backup plan.

But what is a backup? Backing up business-critical data is more complex than many people realize, which may be why backup and disaster recovery plans fall apart in the hour of need. Let’s start with the basic definition: a backup is a second copy of your data that you keep in case your primary data is lost or corrupted. Pretty simple. Unfortunately, that basic concept is not nearly enough to implement an effective backup strategy. You need some additional considerations.

  1. Location – Where is your backup data stored? Is it on the same physical machine as your primary data? Is it in the same building? The closer your backup is to the primary data, the more chance your backup will suffer the same fate as your primary data. The best option is to have your backup offsite, physically removed from localized events that might cause data loss.
  2. Recovery Point Objective – If you needed to recover from your backup, how much recent data would you lose? Was your last backup taken an hour ago, a day ago, or a week ago? How much potential revenue could be lost along with the data you can’t recover? Taking backups as frequently as possible is the best way to prevent data loss.
  3. Recovery Time Objective – How long will it take to recover your data? If you are taking backups every hour but it takes you several hours or longer to recover from a backup, was the hourly backup effective? Recovery time is as important as recovery point. Have a plan for rapid recovery.
  4. System Backup – For a long time, backups only captured user and application data. Recovery was painful because the OS and applications needed to be rebuilt before restoring the data. These days, entire servers are usually backed up, which speeds up recovery.
  5. Multiple Points in Time – Early on, many learned the hard way that keeping one backup is not enough. Multiple backups from different points in time were required for a number of reasons. Sometimes backups failed, sometimes data needed to be recovered from further back in time, and for some businesses, backups need to be kept for years for compliance. The more backups, the more points in time that data can be recovered from.
  6. Backup Storage – One of the greatest challenges to backup over the decades has been storage. Keeping multiple copies of your data quickly starts consuming multiples of storage space. It just isn’t economical to require 10x or more of the storage of your primary data for backup. Incremental backups, compression, and deduplication have helped but backups still take lots of space. Calculating the storage requirements for your backup needs is essential.
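To make the storage and recovery time considerations above concrete, here is a minimal back-of-the-envelope sketch. The field names, the daily change rate, the data reduction ratio, and the restore throughput are all illustrative assumptions, not figures from any particular backup product.

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    """Hypothetical inputs for a rough sizing exercise."""
    primary_gb: float            # size of the primary data set
    daily_change_rate: float     # assumed fraction of data that changes per day
    retained_fulls: int          # number of full backups kept
    retained_incrementals: int   # number of daily incrementals kept
    reduction_ratio: float       # assumed compression/dedup ratio (2.0 means 2:1)


def estimated_backup_storage_gb(plan: BackupPlan) -> float:
    """Fulls plus incrementals, divided by the assumed data reduction ratio."""
    fulls = plan.retained_fulls * plan.primary_gb
    incrementals = (plan.retained_incrementals
                    * plan.primary_gb
                    * plan.daily_change_rate)
    return (fulls + incrementals) / plan.reduction_ratio


def restore_time_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Time to copy the data back at a sustained throughput (1 GB = 1024 MB)."""
    return data_gb * 1024 / throughput_mb_per_s / 3600


plan = BackupPlan(primary_gb=1000, daily_change_rate=0.05,
                  retained_fulls=4, retained_incrementals=30,
                  reduction_ratio=2.0)
print(f"Backup storage: {estimated_backup_storage_gb(plan):.0f} GB")
print(f"Restore time:   {restore_time_hours(plan.primary_gb, 200):.1f} hours")
```

With these assumed numbers, 1 TB of primary data needs roughly 2.75 TB of backup storage, and a full restore at 200 MB/s takes well over an hour. That is the kind of result to compare against your recovery time objective before a disaster, not during one.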

Are snapshots backups? Sort of, but not really. Snapshots do provide recovery capabilities within a local system, but generally go down with the ship in any kind of real disaster. That being said, many backup solutions are designed around snapshots and use snapshots to create a real backup by copying the snapshot to an offsite location. These replicated snapshots are indeed backups that can be used for recovery just like any other form of backup.

Over the decades, there have been a variety of hardware, software, and service-based solutions to tackle backup and recovery. Within the last decade, there has been an increasing movement to include backup and recovery capabilities within operating systems, virtualization solutions, and storage solutions. This movement of turning backup into a feature rather than a secondary solution has only been gaining momentum.

With the hyperconvergence movement, where virtualization, servers, storage, and management are brought together into a single appliance-based solution, backup and disaster recovery are being included as well. Vendors like Scale Computing are providing all of the backup and disaster recovery capabilities you need. Scale Computing even offers their own cloud-based DRaaS as an option.

So today, on the eve of April Fools’ Day, let’s remember that backup is no joke. Businesses rely on data, and it is our job as IT professionals to protect against the loss of that data with backup. Take some time to review your backup plans and find out if you need to be doing more to prevent the next data loss event lurking around the corner.

Behind the Scenes: Architecting HC3

Like any other solution vendor, at Scale Computing we are often asked what makes our solution unique. In answer to that query, let’s talk about some of the technical foundation and internal architecture of HC3 and our approach to hyperconvergence.

The Whole Enchilada

With HC3, we own the entire software stack which includes storage, virtualization, backup/DR, and management. Owning the stack is important because it means we have no technology barriers based on access to other vendor technologies to develop the solution. This allows us to build the storage system, hypervisor, backup/DR tools, and management tools that work together in the best way possible.

Storage

At the heart of HC3 is our SCRIBE storage management system. This is a complete storage system developed and built in house specifically for use in HC3. Using a storage striping model similar to RAID 10, SCRIBE stripes storage across every disk of every node in a cluster. All storage in the cluster is always part of a single cluster-wide storage pool, requiring no manual configuration. New storage added to the cluster is automatically added to the storage pool. The only aspect of storage that the administrator manages is creation of virtual disks for VMs.
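For readers who want a mental model of RAID 10-style striping across a cluster, here is a toy sketch. It is not SCRIBE's actual placement algorithm, which this post does not describe. It only illustrates the general idea of spreading blocks over every disk while keeping each block's second copy on a different node. The function name and the node and disk labels are hypothetical.

```python
def place_blocks(num_blocks, disks):
    """Assign each block a primary disk and a mirror disk on another node.

    `disks` is a list of (node_name, disk_index) tuples. Blocks are striped
    round-robin over all disks; the mirror is the next disk in the list that
    lives on a different node, so one node failure never loses both copies.
    """
    if len({node for node, _ in disks}) < 2:
        raise ValueError("mirroring across nodes needs at least two nodes")

    placements = []
    n = len(disks)
    for block in range(num_blocks):
        primary = disks[block % n]
        # Walk forward until we find a disk that is not on the primary's node.
        mirror = next(
            disks[(block + offset) % n]
            for offset in range(1, n)
            if disks[(block + offset) % n][0] != primary[0]
        )
        placements.append((primary, mirror))
    return placements


# A hypothetical three-node cluster with two disks per node.
cluster_disks = [(f"node{n}", d) for n in range(1, 4) for d in range(2)]
for block, (primary, mirror) in enumerate(place_blocks(6, cluster_disks)):
    print(f"block {block}: primary={primary} mirror={mirror}")
```

A real storage engine would also balance mirror copies evenly and rebalance when disks are added, which this toy does not attempt.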

The ease of use of HC3 storage is not even the best part. What is really worth talking about is how the virtual disks for VMs on HC3 access storage blocks from SCRIBE as if they were direct-attached storage on a physical server, with no layered storage protocols. There is no iSCSI, no NFS, no SMB or CIFS, no VMFS, or any other protocol or file system. There is also no need in SCRIBE for any virtual storage appliance (VSA) VMs, which are notorious resource hogs. The file system laid down by the guest OS in the VM is the only file system in the stack because SCRIBE is not a file system; SCRIBE is a block engine. The absence of the storage protocols that would exist between VMs and virtual disks in other virtualization systems means the I/O paths in HC3 are greatly simplified and thus more efficient.

Had we not owned both the storage and the hypervisor, which meant creating our own SCRIBE storage management system, no existing storage layer would have allowed us to achieve this level of efficient integration with the hypervisor.

Hypervisor

Luckily we did not need to completely reinvent virtualization, but were able to base our own HyperCore hypervisor on industry-trusted, open-source KVM. Having complete control over our KVM-based hypervisor not only allowed us to tightly embed the storage with the hypervisor, but also allowed us to implement our own set of hypervisor features to complete the solution.

One of the ways we were able to improve upon existing standard virtualization features was through our thin cloning capability. We were able to take the advantages of linked cloning, a common feature of other hypervisors, and eliminate the disadvantages of the parent/child dependency. Our thin clones are just as efficient as linked clones but are not vulnerable to issues of dependency with parent VMs.
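One generic way to share blocks between clones without a parent/child chain is reference counting with copy-on-write. The sketch below illustrates that general technique only. It is an assumption for illustration and not a description of how HyperCore implements thin clones, and the `BlockStore` and `VirtualDisk` classes are hypothetical.

```python
class BlockStore:
    """Shared pool of blocks, each with a count of the disks referencing it."""

    def __init__(self):
        self.blocks = {}      # block_id -> data
        self.refcount = {}    # block_id -> number of referencing disks
        self._next_id = 0

    def put(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id

    def incref(self, block_id):
        self.refcount[block_id] += 1

    def decref(self, block_id):
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            # No disk references this block any more, so free it.
            del self.blocks[block_id], self.refcount[block_id]


class VirtualDisk:
    def __init__(self, store, block_map=None):
        self.store = store
        self.block_map = dict(block_map or {})  # logical offset -> block_id

    def write(self, offset, data):
        # Copy-on-write: never modify a block in place, it may be shared.
        old = self.block_map.get(offset)
        self.block_map[offset] = self.store.put(data)
        if old is not None:
            self.store.decref(old)

    def read(self, offset):
        return self.store.blocks[self.block_map[offset]]

    def clone(self):
        # The clone references the same blocks; no data is copied.
        for block_id in self.block_map.values():
            self.store.incref(block_id)
        return VirtualDisk(self.store, self.block_map)

    def delete(self):
        for block_id in self.block_map.values():
            self.store.decref(block_id)
        self.block_map.clear()


store = BlockStore()
template = VirtualDisk(store)
template.write(0, b"os image")
clone = template.clone()
template.delete()        # the clone does not depend on the original disk
print(clone.read(0))
```

Because each disk holds its own references to shared blocks, deleting the original leaves the clone intact. That is the property a linked clone, which reads through its parent, does not have.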

Ownership of the hypervisor allows us to continue to develop new, more advanced virtualization features and gives us complete control over management and security of the solution. One of the most significant ways hypervisor ownership has benefited our HC3 customers is in our ability to build in backup and disaster recovery features.

Backup/DR

Even more important than our storage efficiency and development ease, our ownership of the hypervisor and storage allows us to implement a variety of backup and replication capabilities to provide a comprehensive disaster recovery solution built into HC3. Efficient, snapshot-based backup and replication is native to all HC3 VMs and allows us to provide our own hosted DRaaS solution for HC3 customers without requiring any additional software.

Our snapshot-based backup/replication comes with a simple, yet very flexible, scheduling mechanism for intervals as small as every 5 minutes. This provides a very low RPO for DR. We were also able to leverage our thin cloning technology to provide quick and easy failover with an equally efficient change-only restore and failback. We are finding more and more of our customers looking to HC3 to replace their legacy third-party backup and DR solutions.
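The idea behind change-only replication and restore can be sketched in a few lines: compare two snapshots and ship only the blocks that differ. This is a simplified illustration under my own assumptions (snapshots modeled as plain dictionaries of offset to data, with hypothetical function names), not the HC3 replication code.

```python
def changed_blocks(previous, current):
    """Return (changed, removed) between two snapshots.

    Each snapshot is modeled as a dict mapping block offset -> block data.
    `changed` holds blocks that are new or different in `current`;
    `removed` lists offsets present in `previous` but gone from `current`.
    """
    changed = {off: data for off, data in current.items()
               if previous.get(off) != data}
    removed = [off for off in previous if off not in current]
    return changed, removed


def apply_changes(target, changed, removed):
    """Bring a replica up to date using only the differences."""
    target.update(changed)
    for off in removed:
        target.pop(off, None)


# The replica already holds the previous snapshot.
snap_1 = {0: b"boot", 1: b"data-v1", 2: b"logs"}
snap_2 = {0: b"boot", 1: b"data-v2", 3: b"new"}
replica = dict(snap_1)

changed, removed = changed_blocks(snap_1, snap_2)
apply_changes(replica, changed, removed)
print(f"sent {len(changed)} of {len(snap_2)} blocks; in sync: {replica == snap_2}")
```

The same comparison works in the other direction for failback: only the blocks that changed while running at the DR site need to travel back to the primary.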

Management

By owning the storage, hypervisor, and backup/DR software, HC3 is able to have a single, unified, web-based management interface for the entire stack. All day-to-day management tasks can be performed from this single interface. The only other interface ever needed is a command line accessed directly on each node for initial cluster configuration during deployment.

The ownership and integration of the entire stack allows for a simple view of both physical and virtual objects within an HC3 system and at-a-glance monitoring. Real-time statistics for disk utilization, CPU utilization, RAM utilization, and IOPS allow administrators to quickly identify resource related issues as they are occurring. Setting up backups and replication and performing failover and failback is also built right into the interface.

Summary

Ownership of the entire software stack from the storage to the hypervisor to the features and management allows Scale Computing to fully focus on efficiency and ease of use. We would not be able to have the same levels of streamlined efficiency, automation, and simplicity by trying to integrate third party solutions.

The simplicity, scalability, and availability of HC3 happen because our talented development team has the freedom to reimagine how infrastructure should be done, avoiding inefficiencies found in other vendor solutions that have been dragged along from pre-virtualization technology.

How Important is DR Planning?

Disaster Recovery (DR) is a crucial part of IT architecture but it is often misunderstood, clumsily deployed, and then neglected. It is often unclear whether the implemented DR tools and plan will actually meet SLAs when needed. Unfortunately it often isn’t until a disaster has occurred that an organization realizes that their DR strategy has failed them. Even when organizations are able to successfully muddle through a disaster event, they often discover they never planned for failback to their primary datacenter environment.

Proper planning can ensure success and eliminate uncertainty, beginning before implementation and then enabling continued testing and validation of the DR strategy, all the way through disaster events. Planning DR involves much more than just identifying workloads to protect and defining backup schedules. A good DR strategy includes tasks such as capacity planning, identifying workload dependencies, defining workload protection methodology and prioritization, defining recovery runbooks, planning user connectivity, defining testing methodologies and testing schedules, and defining a failback plan.
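Workload dependencies and prioritization translate naturally into a recovery order for a runbook. As a minimal sketch, assuming a hypothetical set of workloads and priorities, Python's standard `graphlib` module (Python 3.9 or later) can produce an order in which nothing starts before the services it depends on:

```python
from graphlib import TopologicalSorter


def recovery_order(dependencies, priority):
    """Order workloads so dependencies come up first.

    `dependencies` maps each workload to the set of workloads it needs.
    `priority` maps workload -> number (lower recovers first) and is used
    to order workloads that become ready in the same wave.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready(), key=lambda w: priority.get(w, 99))
        order.extend(wave)
        sorter.done(*wave)
    return order


# Hypothetical workloads: each entry lists what must be running first.
deps = {
    "web": {"app"},
    "app": {"db", "dns"},
    "db": {"dns"},
    "file": {"dns"},
}
prio = {"dns": 1, "db": 1, "app": 2, "web": 2, "file": 3}
print(recovery_order(deps, prio))
```

Writing the dependencies down in this form also has a side benefit: a circular dependency shows up as an error during planning rather than as a stalled recovery.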

At Scale Computing, we take DR seriously and build DR capabilities such as backup, replication, failover, and failback into our HC3 hyperconverged infrastructure. In addition to providing the tools you need in our solution, we also offer our DR Planning Service to help you be completely successful in planning, implementing, and maintaining your DR strategy.

Our DR Planning Service, performed by our expert ScaleCare support engineers, provides a complete disaster recovery run-book as an end-to-end DR plan for your business needs. Whether you have already decided to implement DR to your own DR site or utilize our ScaleCare Remote Recovery Service in our hosted datacenter, our engineers can help you with all aspects of the DR strategy.

The service also includes the following components:

  • Setup and configuration of clusters for replication
  • Completion of Disaster Recovery Run-Book (disaster recovery plan)
  • Best-practice review
  • Failover and failback demonstration
  • Assistance in facilitating a DR test

You can view a recording of our recent webinar on DR planning here.

Please let us know how we can help you with DR planning on your HC3 system by contacting ScaleCare support at 877-SCALE-59 or support@scalecomputing.com.

3-Node Minimum? Not So Fast

For a long time, when you purchased HC3, you were told there was a 3 node minimum. This minimum of 3 nodes is what is required to create a resilient, highly available cluster. HC3 architecture, based on this 3 node cluster design, prevents data loss even in the event of a whole node failure. Despite these compelling reasons to require 3 nodes, Scale Computing last week announced a new single node appliance configuration.  Why now?

Recent product updates have enhanced the replication and disaster recovery capabilities of HC3 to make a single node appliance a compelling solution in several scenarios. One such scenario is the distributed enterprise. Organizations with multiple remote or branch offices may not have the infrastructure requirements to warrant a 3 node cluster. Instead, they can benefit from a single node appliance as a right-sized solution for the infrastructure.


In a remote or branch office, a single node can run a number of workloads and easily be managed remotely from a central office. In spite of the lack of clustered, local high availability, single nodes can easily be replicated for DR back to an HC3 cluster at the central office, giving them a high level of protection. Deploying single nodes in this way offers an infrastructure solution for distributed enterprise that is both simple and affordable.

Another compelling scenario where the single node makes perfect sense is as a DR target for an HC3 cluster. Built-in replication can be configured quickly and without extra software to a single HC3 node located locally or remotely. While you will likely want the local high availability and data protection a 3-node cluster provides for primary production, a single node may suffice for a DR strategy where you only need to fail over your most critical VMs to continue operations temporarily. This use of a single node appliance is both cost-effective and provides a high level of protection for your business.


Finally, although a single node has no clustered high availability, for very small environments the single node appliance can be deployed with a second appliance as a DR target to provide a level of data protection and availability that is acceptable for many small businesses. The ease of deployment, ease of management, and DR capabilities of a full-blown HC3 cluster are the same reasons to love the single node appliance for HC3.

Find out more about the single node appliance configuration (or as I like to call it, the SNAC-size HC3) in our press release and solution brief.


Disaster Recovery and Backup Strategies for the SMB

When infrastructure (server or storage) fails in a traditional, physical environment, there is typically downtime while a complex and lengthy recovery from backups is carried out. In most cases, this requires time to obtain and set up identical replacement hardware, then additional time to recover the operating system, applications, and data from the backups.