Infrastructure as Code (IaC) is a process that automates the provisioning and management of cloud resources. IaC software takes some input scripts describing the desired state and then communicates with the cloud vendor(s), typically through APIs, to make the reality match that desired state.
This article will cover the important aspects of IaC, starting with how it came to life (i.e., which problems it solved), followed by its benefits, and finally how to integrate IaC into your organization.
Once upon a time, when a business wanted to run software, its only option was to order some physical equipment and internet access from a network provider. These were on-site data centers, where companies had to order servers and networking equipment weeks or even months in advance based on anticipated traffic and then manually provision them on-site. This required a physical location with cooling systems and countless hours to perform installations and maintenance operations.
But then, public data centers came along that could manage the servers of other businesses.
Operating data centers became a viable business on its own, with great advantages for clients:
The advent of virtualization brought yet another evolution: the cloud. In a public (or private) cloud, the physical equipment is located in the data center of the cloud vendor, which still requires manual handling. Virtual servers became available to businesses through web interfaces, allowing them to provision servers and other resources in a matter of seconds (or minutes for the largest resources). At this stage, although virtualization allowed for very fast provisioning, most operations were still manual.
A final development came with the advent of IaC concepts and tools. Once the cloud was accessible through an API, the provisioning and management of resources could be handled by scripts and automated tools rather than humans. So now, once the physical equipment is installed and connected (still manual operations), everything else can be automated, including the provisioning of all virtual hardware resources.
The ability to programmatically access public clouds allowed for the rise of IaC. Before the advent of IaC, system engineers had to manually go through web interfaces to provision and configure resources. With IaC, the provisioning and configuration of resources are described in scripts, which are read by tools that communicate with the public cloud API to make sure reality matches the desired state.
As mentioned above, IaC tools use input from scripts; these scripts are written by humans and describe a desired state for the given cloud resources. The tools communicate with the cloud vendor through its APIs to create, update, or delete resources so that the reality matches the desired state described in the input scripts. Compared to manual provisioning and configuration, IaC offers a single source of truth (the input scripts), which removes most of the room for human error.
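The create, update, and delete logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular IaC tool is implemented: the `plan` and `apply` functions and the resource dictionaries are hypothetical, and a real tool would read the actual state from the cloud vendor's API rather than from an in-memory dictionary.

```python
def plan(desired, actual):
    """Compare desired and actual state and return the actions needed.

    Both arguments map a resource name to its configuration (a dict).
    This is a hypothetical in-memory model, not a real cloud API.
    """
    actions = []
    for name, config in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != config:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired, actual):
    """Return the new actual state after carrying out the plan."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:  # "create" and "update" both set the desired configuration
            new_state[name] = dict(desired[name])
    return new_state


# Hypothetical example: what the scripts ask for vs. what currently exists.
desired = {
    "web-server": {"type": "vm", "size": "small"},
    "database": {"type": "db", "size": "medium"},
}
actual = {
    "web-server": {"type": "vm", "size": "large"},
    "old-cache": {"type": "cache", "size": "small"},
}

print(plan(desired, actual))
# [('create', 'database'), ('update', 'web-server'), ('delete', 'old-cache')]
```

In this sketch, calling `plan(desired, apply(desired, actual))` returns an empty list: once reality matches the desired state, running the tool again has nothing left to do.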
Running an IaC script is a repeatable operation: given the same inputs, it produces the same result every time. This can help in many ways, for example to:
The IaC scripts can be saved in a git repository, giving you a history of your infrastructure. As an added bonus, since the scripts are just text, it is possible to compare versions to see what has been added, changed, or removed.
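As a rough illustration of comparing two versions of a script, the sketch below uses Python's standard `difflib` module. The resource syntax inside the two strings is made up for the example and does not belong to any real IaC tool; in practice, git would produce a comparable diff directly from the repository history.

```python
import difflib

# Two versions of a hypothetical IaC script (invented syntax).
old_version = """\
resource web-server
  size = small
  count = 2
""".splitlines()

new_version = """\
resource web-server
  size = medium
  count = 2
resource database
  engine = postgres
""".splitlines()

# unified_diff yields header lines, then lines prefixed with
# "-" (removed), "+" (added), or " " (unchanged context).
diff = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="infra (previous)",
        tofile="infra (current)",
        lineterm="",
    )
)
print("\n".join(diff))
```

The output marks the changed server size as one removed and one added line, and the new database resource as added lines.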
Another benefit is that IaC allows a junior sysadmin or non-technical person to create an entire workload without technical knowledge. If you configure your cloud account properly, you can even allow a user with limited permissions to create such a workload through IaC tools, even if the user does not have the rights to create the resources directly. You can also leverage additional tooling and templates to enforce security policies earlier, limiting the chance of security leaks and misconfigurations by whoever instantiates the IaC stack.
Besides exact repeatability, one of the best advantages of IaC over manual operations is that it is scalable. Indeed, you only need to write the IaC scripts once, and workloads can then be instantiated as many times as you want, nearly instantaneously. Finally, by spending more time crafting the right permissions into your IaC scripts, you can avoid a typical downside of manual work: granting too many permissions to roles and resources.
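To make the "write once, instantiate many times" idea concrete, here is a minimal Python sketch. The `make_workload` function, the resource names, and the permission strings are all hypothetical; real IaC tools express the same idea through their own template or module mechanisms.

```python
def make_workload(environment, instance_count=2):
    """Return the desired state for one copy of a hypothetical web workload."""
    bucket = f"{environment}-assets"
    return {
        f"{environment}-web": {
            "type": "vm",
            "count": instance_count,
            # Narrow permission: read-only access to this environment's
            # bucket, rather than a broad grant added by hand.
            "permissions": [f"storage:read:{bucket}"],
        },
        bucket: {"type": "bucket", "public": False},
    }


# The same definition is instantiated once per environment.
environments = {name: make_workload(name) for name in ("dev", "staging", "prod")}

for name, resources in environments.items():
    print(name, sorted(resources))
```

Because the permissions are written into the template once, every copy of the workload inherits the same narrow grants.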
Typically, you would want to automate some operations you are currently performing manually. So the first step is for you to document the manual steps required to build the infrastructure needed for your workload. These are the steps you will automate through IaC.
You then need to choose an IaC tool. This should not be a difficult choice, as there are only a few, and all three major cloud providers have their own: Amazon Web Services offers CloudFormation, Microsoft Azure offers Azure Resource Manager, and Google Cloud Platform offers Google Cloud Deployment Manager. The most well-known vendor-independent option is Terraform, which supports not only the three cloud vendors mentioned above but also many more.
Next, you will need to write some scripts for your IaC tool of choice to reproduce the manual steps you documented. It is usually a good idea to test these as you go along. In other words: Write a bit of IaC code, deploy it, test it, and when you are satisfied it looks good, move on to the next bit of code. If you write everything in one go, major flaws can arise in your code that you will only discover after many hours of work, meaning you might have to rewrite a significant portion of your scripts.
In addition, the topic of Shift Left continues to trend. This essentially means you start testing as early as possible and focus on preventing problems (as opposed to detecting and solving them after they occur). The idea is that overall quality and security will improve as a result.
Ideally, this Shift Left should leverage automation as much as possible. Indeed, there are various tools available to automate certain aspects of writing IaC scripts, like security and compliance. These tools scan the code before any deployment to reduce the incidence of problems such as misconfigurations, overly permissive settings, and known vulnerabilities. Some use cases in regard to this are available here for your perusal.
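A pre-deployment scan of this kind can be sketched as follows. This is a simplified illustration, not the behavior of any specific scanning tool: the `scan` function, the three rules, and the resource format are assumptions made for the example.

```python
def scan(resources):
    """Return human-readable findings for a desired-state dictionary.

    The three rules below are hypothetical examples of a misconfiguration,
    an overly permissive setting, and a missing security control.
    """
    findings = []
    for name, config in sorted(resources.items()):
        if config.get("public") is True:
            findings.append(f"{name}: resource is publicly accessible")
        for permission in config.get("permissions", []):
            if "*" in permission:
                findings.append(f"{name}: wildcard permission '{permission}'")
        if config.get("type") == "db" and not config.get("encrypted", False):
            findings.append(f"{name}: database is not encrypted")
    return findings


# Hypothetical desired state, checked before anything is deployed.
resources = {
    "assets": {"type": "bucket", "public": True},
    "web": {"type": "vm", "permissions": ["storage:*"]},
    "orders": {"type": "db", "encrypted": True},
}

findings = scan(resources)
for finding in findings:
    print(finding)
```

A deployment pipeline could run such a check on every change and refuse to deploy while `findings` is non-empty, so that problems are caught before they reach production.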
In order to properly use IaC, the person writing the IaC scripts must have a deep knowledge of the cloud platform being used. Therefore, it is advisable to ensure that the most critical parts of your IaC work are done by senior DevOps engineers.
It is usually a good idea to have a team of senior DevOps engineers (or at least one) in charge of leading your IaC effort. This team will be able to focus on best practices and security and will thus provide a blueprint for more junior engineers to follow. It will also be able to write generic modules that can be reused across IaC scripts within your organization, providing pre-vetted building blocks readily available to more junior engineers.
If tight security is important (and it most likely is), this team can be responsible for vetting publicly available modules and software as well. It is quite easy to find some IaC modules online; Terraform even has an official repository of these. However, such publicly available modules might not conform to the security standards applied within your organization or project. It is therefore important to ensure only vetted modules are used by your IaC team.
Additionally, it would be a good idea for your SecOps team to work alongside your DevOps team. Such cooperation will allow DevOps processes to be optimized in terms of security early in the project. Errors detected after a production deployment can be very costly, especially in terms of customer relationships. Ensuring high quality early in the process will go a long way toward avoiding any such disaster.
Although quite a recent development, IaC should nowadays be part and parcel of the provisioning strategy of any organization requiring cloud resources and should at a minimum be evaluated for inclusion within your teams. Whatever the size of your organization, you most probably want IaC to manage at least part of your workloads.