Let’s Talk About Cloud: Part 1
In a series of short posts, Dave Sloan, Chief Technology Officer in Microsoft’s Worldwide Public Sector team, addresses key issues in understanding the cloud and shares his perspectives from discussions with governments around the world. In this series, he outlines some of the differentiators that make genuine cloud so powerful, and draws a clear distinction between the hyperscale cloud and everything else.
This is the first of three articles on this topic. In his upcoming posts, Dave will cover considerations including cost, elasticity, and sustainability.
What makes a cloud?
In my conversations with government officials and agencies around the world, I’ve observed that we can be far too presumptuous when having discussions about cloud. Our customers have been lectured for many years about the advisability and necessity of adopting cloud, but in too many cases decision makers have not been given the tools, education, and awareness to recognize what distinguishes a hyperscale cloud. This can lead to ill-advised initiatives with disappointing outcomes around so-called “private clouds”. The cloud should not be an arbitrary title bestowed on an on-premise data center with some virtualization software installed. Cloud is a designation that should be earned through transformative characteristics.
Managed Operations
Over time, on-premise IT management has required a limited pool of system administration talent to be spread across a broad set of individual data centers. Heterogeneous ecosystems of tools and processes yield inconsistent results in terms of monitoring and maintenance, and enterprises and agencies are forced either to build IT management skills outside their core competencies or to face information technology systems that repeatedly disappoint and may fail at precisely the wrong times. A lack of standardization leads, by definition, to a lack of automation and optimization.
Choosing the hyperscale cloud outsources this competency to the standardized, certified processes, trained workforce, and automated tooling of a global provider with an extensive track record, one whose entire business model rests on the success and steadiness of its management.
Reliability and availability
Reliability and availability fall into the category of “non-functional requirements”… things that nobody asks for but everyone needs. Public Sector is no different – both citizens and the public sector workforce expect government services to be up when they need them. Downtime has societal and financial implications.
On-premise systems too often attempt to engineer availability in after the fact, which can result in late additions of expensive and hard-to-manage failover and replication schemes.
Hyperscale cloud delivers high availability by default. Individual data centers incorporate close instrumentation and predictive modeling of equipment failure across a fleet of IT assets. This allows for accurate prediction of failure in drives, cables, and chips, so that spare capacity is on hand and components can be replaced or refreshed before they fail. Automatic shifting of workloads ensures seamless operation when these failures inevitably occur. These capabilities are layered over redundant networking and power, together with best-in-industry automated, standardized procedures that minimize human error. And that’s just in one data center! Zonal and multi-region architectures add geographic diversity, allowing for availability even during catastrophic failures, natural disasters, or times of conflict.
This reliability is not just a good idea; in the hyperscale cloud, it’s a contractual commitment. For critical applications where downtime can translate into lost opportunities or even lost lives, this makes hyperscale cloud the clear choice.
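To see why layering independent zones matters, here is a minimal back-of-the-envelope sketch. The 99.9% figure is purely illustrative, and the calculation assumes the zones fail independently; it is not drawn from any actual SLA.

```python
# Back-of-the-envelope arithmetic: how redundant zones compound availability.
# The 99.9% figure is hypothetical and assumes the zones fail independently;
# it is not drawn from any actual SLA.

HOURS_PER_YEAR = 8766  # average year length in hours

def downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR

single_zone = 0.999                       # one deployment at 99.9% availability
dual_zone = 1 - (1 - single_zone) ** 2    # both independent zones must fail at once

print(f"Single zone: {single_zone:.3%} available, ~{downtime_hours(single_zone):.1f} hours down per year")
print(f"Two zones:   {dual_zone:.5%} available, ~{downtime_hours(dual_zone) * 60:.1f} minutes down per year")
```

Under these assumptions, a single deployment sees almost nine hours of downtime a year, while two independent zones reduce that to well under a minute – which is why redundancy, not heroics, is what makes contractual availability commitments possible.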
Scalability
One of the most challenging parts of designing systems is attempting to forecast utilization so that infrastructure can be properly sized. Conservative usage estimates artificially limit the growth and success of applications, tying any increase in capacity to another procurement and configuration cycle for hardware. Inflated usage estimates create wasteful budgets for wildly overprovisioned systems that go unused or are dramatically underutilized. These usage estimates are notoriously inaccurate, especially for new systems. But even accurate estimates still suffer from the need to provision to the high watermark of utilization. For example, if you have a spiky or highly seasonal workload – something like a school registration system, a vaccine drive, or systems around tax filing deadlines – then even if you properly provision your system for the day on which it is busiest, you’re still saddled with the cost and maintenance of an underutilized system for the other 364 days of the year.
Everything I’ve described so far is a challenge of on-prem systems. In the hyperscale cloud, customers are granted access to a shared pool of interchangeable hardware that lets them operate as if they had infinite and instantaneous supply. This means that systems can be provisioned rapidly in anticipation of, or in response to evidence of, increasing need, and deprovisioned as that need subsides. This can be done manually for foreseeable, scheduled events, or automatically in reaction to unexpected spikes, as the sketch below illustrates. This not only reduces the unreliable guesswork that goes into systems planning, it also increases the resilience of the system, allowing for easy adjustments as initial estimates are proven wrong. It prevents the commitment of resources based on crystal-ball gazing, so that the cost curve tightly hugs the utilization curve, both in terms of new allocations and opportunities for savings.
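As a minimal illustration of what “automatically in reaction to unexpected spikes” can look like, the sketch below implements a simple rule-based scaling loop. The thresholds, instance limits, and CPU readings are hypothetical placeholders, not any cloud provider’s actual autoscaling API.

```python
# A simplified sketch of rule-based autoscaling: add capacity when load rises,
# release it as demand subsides. Thresholds, instance limits, and CPU readings
# are hypothetical placeholders, not any cloud provider's actual API.

MIN_INSTANCES, MAX_INSTANCES = 2, 20
SCALE_OUT_CPU, SCALE_IN_CPU = 70.0, 30.0   # average CPU %, illustrative thresholds

def desired_instances(current: int, avg_cpu_percent: float) -> int:
    """Decide the next instance count from the current average CPU load."""
    if avg_cpu_percent > SCALE_OUT_CPU:
        return min(current + 1, MAX_INSTANCES)   # scale out under pressure
    if avg_cpu_percent < SCALE_IN_CPU:
        return max(current - 1, MIN_INSTANCES)   # scale in (and stop paying) when idle
    return current

# Example: a registration-day spike followed by a quiet afternoon.
instances = MIN_INSTANCES
for cpu in [45, 82, 91, 88, 76, 40, 22, 18]:
    instances = desired_instances(instances, cpu)
    print(f"avg CPU {cpu:>3}% -> run {instances} instances")
```

Real platforms offer far richer policies, but the principle is the same: capacity follows the observed demand curve instead of a forecast made months in advance.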
Scalability allows system architects in the hyperscale cloud to dramatically shorten development cycles, release solutions faster, avoid artificial limitations on successful applications, take the guesswork out of planning, build more resilient applications, and save money.
Compliance
Certification for cybersecurity standards is essential. Given today’s threat environment, deploying a compliant environment has never been more critical. Unfortunately, it has also never been more costly. Certification schemes are proliferating at both the international and national levels. Even organizations with significant budgets struggle to justify the initial, continuing, and potentially perpetual investment required to bring in experienced auditors, interact with them efficiently, and certify compliance with all the relevant standards. This applies even to organizations with strong governance that yields standardized configurations for homogeneous sets of hardware and software architecture… not a common description of an on-premise data center! The problem multiplies with one-off customized builds, a proliferation of frameworks, custom code, and patchwork documentation. Too often, the entire certification process takes longer and costs more than intended, and yields a long list of risks to be mitigated or accepted just to get it over the finish line.
Hyperscale cloud adoption changes the compliance game. By dedicating professionals to liaise with auditors, absorbing the cost into the cloud platform, standardizing the evidence and interactions around the controls, and adhering to a predictable and transparent cycle, cloud becomes the easy button for compliance. This is why the Microsoft Cloud has more than 100 certifications that customers can take advantage of simply by onboarding onto the cloud platform. Client certification will likely still be necessary for the thin outer layer of customized software assets – a small percentage of overall IT holdings – but by certifying a single common platform at the global level, maintaining a rich library of audited responses, and publishing certification results publicly, hyperscale cloud adoption ensures that those processes are taken care of automatically and perpetually, shifting the cost and complexity to the cloud service provider, where it belongs.
Agility
Those of us who grew up designing on-premise systems remember (without fondness) the arduous preparation needed before an application could even be piloted – long stretches of requirements collection, hardware procurement, data center configuration, software installation, and troubleshooting, all of which were necessary before an experiment could be run. You had to pick your experiment carefully, because the time and money invested in running it even once were significant. Small tweaks could be accommodated, but truly alternative architectures weren’t viable to run.
Cloud creates agile environments where alternative architectures are available in minutes, and can be thrown away minutes later. Alternative tools are pre-installed and ready for use with zero configuration on the part of the developer, and can be integrated and discarded just as easily. Pay-as-you-go pricing means that no significant bills are run up by short-term explorations, and multiple explorations can be run in parallel. By reducing the cost of failure to virtually zero, a culture of experimentation and positive risk-taking can be fostered in an organization, increasing responsiveness to requirements and allowing for more thorough exploration of hypotheses, ultimately yielding superior solutions.
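A rough sketch of that pay-as-you-go arithmetic is below. The hourly rate and experiment sizes are entirely hypothetical, chosen only to show how short-lived, parallel experiments accrue cost; they are not real prices.

```python
# Illustrative pay-as-you-go arithmetic for short-lived, parallel experiments.
# The hourly rate and experiment sizes are hypothetical, not real pricing.

HOURLY_RATE = 0.50  # assumed cost per VM-hour (hypothetical)

experiments = [
    {"name": "architecture A", "vms": 4, "hours": 6},
    {"name": "architecture B", "vms": 8, "hours": 3},
    {"name": "architecture C", "vms": 2, "hours": 12},
]

total = 0.0
for e in experiments:
    cost = e["vms"] * e["hours"] * HOURLY_RATE
    total += cost
    print(f'{e["name"]}: {e["vms"]} VMs x {e["hours"]} h = ${cost:.2f}')

print(f"All three experiments, run and discarded in a single day: ${total:.2f}")
```

Even with these made-up numbers, the point stands: trying three architectures side by side costs a rounding error compared with procuring hardware to run even one of them on-premise.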
Redundancy and resilience
Cloud has completely changed the game in terms of expectations around the resilience of architectures. On-premise data centers are still focused on backup policies, restore points, and mean time to recovery. In the meantime, the proliferation of hyperscale data centers, the broad adoption of availability zones, the ease of multi-region deployments, and the adoption of at-scale storage solutions that keep many copies of critical data across multiple failure domains have created a standard that on-premise simply cannot meet, regardless of investment level. This means that cloud has become the default and best option for data sets that must be preserved at all costs and applications that must remain resilient under any circumstances.
About the Center of Expertise
Microsoft’s Public Sector Center of Expertise brings together thought leadership and research relating to digital transformation in the public sector. The Center of Expertise highlights the efforts and success stories of public servants around the globe, while fostering a community of decision makers with a variety of resources from podcasts and webinars to white papers and new research. Join us as we discover and share the learnings and achievements of public sector communities.