Tag Archives: disaster recovery

Behind the Scenes: Architecting HC3

Like any other solution vendor, at Scale Computing we are often asked what makes our solution unique. In answer to that query, let’s talk about some of the technical foundation and internal architecture of HC3 and our approach to hyperconvergence.

The Whole Enchilada

With HC3, we own the entire software stack which includes storage, virtualization, backup/DR, and management. Owning the stack is important because it means we have no technology barriers based on access to other vendor technologies to develop the solution. This allows us to build the storage system, hypervisor, backup/DR tools, and management tools that work together in the best way possible.

Storage

At the heart of HC3 is our SCRIBE storage management system. This is a complete storage system developed and built in house specifically for use in HC3. Using a storage striping model similar to RAID 10, SCRIBE stripes storage across every disk of every node in a cluster. All storage in the cluster is always part of a single cluster-wide storage pool, requiring no manual configuration. New storage added to the cluster is automatically added to the storage pool. The only aspect of storage that the administrator manages is creation of virtual disks for VMs.

The ease of use of HC3 storage is not even the best part. What is really worth talking about is how the virtual disks for VMs on HC3 are accessing storage blocks from SCRIBE as if it were direct attached storage to be consumed on a physical server–with no layered storage protocols. There is no iSCSI, no NFS, no SMB or CIFS, no VMFS, or any other protocol or file system. There is also no need in SCRIBE for any virtual storage appliance (VSA) VMs that are notorious resource hogs. The file system laid down by the guest OS in the VM is the only file system in the stack because SCRIBE is not a file system; SCRIBE is a block engine. The absence of these storage protocols that would exist between VMs and virtual disks in other virtualization systems means the I/O paths in HC3 are greatly simplified and thus more efficient.

Without our ownership of both the storage and hypervisor by creating our own SCRIBE storage management system there is no storage layer that would have allowed us to achieve this level of efficient integration with the hypervisor.

Hypervisor

Luckily we did not need to completely reinvent virtualization, but were able to base our own HyperCore hypervisor on industry-trusted, open-source KVM. Having complete control over our KVM-based hypervisor not only allowed us to tightly embed the storage with the hypervisor, but also allowed us to implement our own set of hypervisor features to complete the solution.

One of the ways we were able to improve upon existing standard virtualization features was through our thin cloning capability. We were able to take the advantages of linked cloning which was a common feature of virtualization in other hypervisors, but eliminate the disadvantages of the parent/child dependency. Our thin clones are just as efficient as linked clones but are not vulnerable to issues of dependency with parent VMs.

Ownership of the hypervisor allows us to continue to develop new, more advanced virtualization features as well as giving us complete control over management and security of the solution. One of the most beneficial ways hypervisor ownership has benefited our HC3 customers is in our ability to build in backup and disaster recovery features.

Backup/DR

Even more important than our storage efficiency and development ease, our ownership of the hypervisor and storage allows us to implement a variety of backup and replication capabilities to provide a comprehensive disaster recovery solution built into HC3. Efficient, snapshot-based backup and replication is native to all HC3 VMs and allows us to provide our own hosted DRaaS solution for HC3 customers without requiring any additional software.

Our snapshot-based backup/replication comes with a simple, yet very flexible, scheduling mechanism for intervals as small as every 5 minutes. This provides a very low RPO for DR. We were also able to leverage our thin cloning technology to provide quick and easy failover with an equally efficient change-only restore and failback. We are finding more and more of our customers looking to HC3 to replace their legacy third-party backup and DR solutions.

Management

By owning the storage, hypervisor, and backup/DR software, HC3 is able to have a single, unified, web-based management interface for the entire stack. All day-to-day management tasks can be performed from this single interface. The only other interface ever needed is a command line accessed directly on each node for initial cluster configuration during deployment.

The ownership and integration of the entire stack allows for a simple view of both physical and virtual objects within an HC3 system and at-a-glance monitoring. Real-time statistics for disk utilization, CPU utilization, RAM utilization, and IOPS allow administrators to quickly identify resource related issues as they are occurring. Setting up backups and replication and performing failover and failback is also built right into the interface.

Summary

Ownership of the entire software stack from the storage to the hypervisor to the features and management allows Scale Computing to fully focus on efficiency and ease of use. We would not be able to have the same levels of streamlined efficiency, automation, and simplicity by trying to integrate third party solutions.

The simplicity, scalability, and availability of HC3 happen because our talented development team has the freedom to reimagine how infrastructure should be done, avoiding inefficiencies found in other vendor solutions that have been dragged along from pre-virtualization technology.

How Important is DR Planning?

Disaster Recovery (DR) is a crucial part of IT architecture but it is often misunderstood, clumsily deployed, and then neglected. It is often unclear whether the implemented DR tools and plan will actually meet SLAs when needed. Unfortunately it often isn’t until a disaster has occurred that an organization realizes that their DR strategy has failed them. Even when organizations are able to successfully muddle through a disaster event, they often discover they never planned for failback to their primary datacenter environment.

Proper planning can ensure success and eliminate uncertainty, beginning before implementation and then enabling continued testing and validation of the DR strategy, all the way through disaster events. Planning DR involves much more than just identifying workloads to protect and defining backup schedules. A good DR strategy include tasks such as capacity planning, identifying workload dependencies, defining workload protection methodology and prioritization, defining recovery runbooks, planning user connectivity, defining testing methodologies and testing schedules, and defining a failback plan.

At Scale Computing, we take DR seriously and build in DR capabilities such as backup, replication, failover, and failback to our HC3 hyperconverged infrastructure.  In addition to providing the tools you need in our solution, we also offer our DR Planning Service to help you be completely successful in planning, implementing, and maintaining your DR strategy.

Our DR Planning Service, performed by our expert ScaleCare support engineers, provides a complete disaster recovery run-book as an end-to-end DR plan for your business needs. Whether you have already decided to implement DR to your own DR site or utilize our ScareCare Remote Recovery Service in our hosted datacenter, our engineers can help you with all aspects of the DR strategy.

The service also includes the following components:

  • Setup and configuration of clusters for replication
  • Completion of Disaster Recovery Run-Book (disaster recovery plan)
  • Best-practice review
  • Failover and failback demonstration
  • Assistance in facilitating a DR test

You can view a recording of our recent webinar on DR planning here.

Please let us know how we can help you with DR planning on your HC3 system by contacting ScaleCare support at 877-SCALE-59 or support@scalecomputing.com.

3-Node Minimum? Not So Fast

For a long time, when you purchased HC3, you were told there was a 3 node minimum. This minimum of 3 nodes is what is required to create a resilient, highly available cluster. HC3 architecture, based on this 3 node cluster design, prevents data loss even in the event of a whole node failure. Despite these compelling reasons to require 3 nodes, Scale Computing last week announced a new single node appliance configuration.  Why now?

Recent product updates have enhanced the replication and disaster recovery capabilities of HC3 to make a single node appliance a compelling solution in several scenarios. One such scenario is the distributed enterprise. Organizations with multiple remote or branch offices may not have the infrastructure requirements to warrant a 3 node cluster. Instead, they can benefit from a single node appliance as a right-sized solution for the infrastructure.

Screen Shot 2016-07-18 at 2.06.52 PM

In a remote or branch office, a single node can run a number of workloads and easily be managed remotely from a central office. In spite of the lack of clustered, local high availability, single nodes can easily be replicated for DR back to an HC3 cluster at the central office, giving them a high level of protection. Deploying single nodes in this way offers an infrastructure solution for distributed enterprise that is both simple and affordable.

Another compelling scenario where the single node makes perfect sense is as a DR target for an HC3 cluster. Built-in replication can be configured quickly and without extra software to a single HC3 node located locally or remotely. While you will likely want the local high available and data protection a 3-node cluster provides for primary production, a single node may suffice for a DR strategy where you only need to failover your most critical VMs to continue operations temporarily. This use of a single node appliance is both cost effective and provides a high level of protection for your business.

Replication

Finally, although a single node has no clustered high availability, for very small environments the single node appliance can be deployed with a second appliance as a DR target to give an acceptable level of data loss and availability for many small businesses. The ease of deployment, ease of management, and DR capabilities of a full blown HC3 cluster are the same reasons to love the single node appliance for HC3.

Find out more about the single node appliance configuration (or as I like to call it, the SNAC-size HC3) in our press release and solution brief.

Screenshot 2016-07-13 09.34.07

Disaster Recovery and Backup Strategies for the SMB

When infrastructure (server or storage) fails in a traditional, physical environment, there is typically resulting downtime while a complex and lengthy recovery from backups is reconstituted.  In most cases, this requires time obtaining and setting up identical replacement hardware, then additional time to recover the operating system, applications and data from the backups. Continue reading