4Matt Technology

Terraform for cloud cost governance

If you've come to this HashiCorp Terraform article, chances are you have DevOps (or cloud architecture) responsibilities and you're wondering, “Why should I care about cloud cost optimization? That's not my job; I'm not in governance or FinOps, right?” Wrong!

Cloud engineers and architects are becoming the new IT finance managers, as governance and finance teams lose some of their direct control over on-demand, consumption-based cloud infrastructure.

So the question many of our customers ask us about cloud governance is: what processes, team model and technologies can I use to support the digital transformation that the cloud has accelerated?

Cloud Center of Excellence and Terraform

If you're interested in defining Cloud Center of Excellence (CCoE) processes, or simply defining roles and responsibilities for cloud governance and implementing a cost optimization process with HashiCorp Terraform Cloud, you're in the right place. Let's get started!

This guide-format article will provide:

  • A RASCI model assigning responsibilities to the team that manages and operates cloud costs;
  • A preview of how cloud cost management can be done within a Terraform Provisioning Workflow;
  • Planning recommendations for cloud cost management and predictability (forecast) with Terraform;
  • An introduction to using Terraform's cost estimating capabilities;
  • Instructions on how to integrate cost optimization tools from the AWS, Azure and GCP (Google Cloud Platform) cloud vendors, as well as third-party tools, with Terraform workflows;
  • Sentinel policy-as-code examples that automatically block overspending using cost rules, instance types and tags.

Most of the features we cite in this article are part of Terraform's paid functionality, such as Cost Estimation and Governance & Policy, but the core cost optimization use case can be achieved with the open source version alone.

A survey of unnecessary cloud spending

With the shift to consumption-based costs for infrastructure and operations, that is, using cloud service providers (CSPs), you pay for what you use, but you also pay for what you provision and don't use.

If you don't have an ongoing governance and optimization process, there's a high chance that you are overspending in the cloud and paying for resources you don't need.

A recent Densify survey of cloud spending found that:

  • 45% of organizations reported they were over budget for their cloud spend;
  • More than 55% of respondents are using complex manual processes or are simply not implementing actions and changes to optimize their cloud resources;
  • 34.15% of respondents believe they can save up to 25% of their cloud spending and 14.51% believe they can save up to 50%. Most striking, 27.46% said: “I don't know”.

Why Cloud Engineers and Architects are Becoming the Financial Controllers of Cloud Costs

When moving to the cloud, most organizations have been thinking about basic governance models where a team, sometimes referred to as the Cloud Center of Excellence, focuses on areas such as strategy, architecture, operations, and cost.

Most of these teams combine technical experts in IT and cloud management with staff from traditional IT finance. However, the finance members are primarily charged with cost planning, migration cost forecasting, and software contract optimization.

Under financial pressure, they tend to say, "We need to control costs, savings, forecasts for upcoming periods, etc.", but they have no direct control over how those costs are incurred.

In practice, it is cloud engineers and architects who directly manage the infrastructure and its costs and, riskier still, cloud brokers who “take care” of those costs.

As a result, we see a paradigm shift in IT and financial governance, where:

  • Cloud engineers and architects are no longer just responsible for operations, but also for costs;
  • Cloud engineers and architects now have the Cloud Management Platform (CMP) tools and capabilities to directly automate and manage cost controls;
  • Cost planning and forecasting for workloads running in the cloud are not easily understood by finance teams;
  • Traditional forms of financial budgeting and on-prem hardware demand planning (such as contract-based budgeting and Capex purchases) neither explain nor control the complexity of costs in consumption-based (i.e., cloud) models.

To focus effort and secure quick wins, IT finance needs to address the two main areas of cost reduction:

  • Pre-provisioning: Limited governance and control in the resource provisioning phase.
  • Post-provisioning: Limited governance and control in applying infrastructure changes to reduce costs.

In the following sections, we define best practices for the people, processes and technologies (platform) involved in managing cloud financial practices with Terraform.

To simplify things, we'll assume your organization already has a cloud governance structure, the Cloud Center of Excellence (CCoE), which is responsible for overall cloud governance.

Within this team, there are four main roles:

  • IT Management;
  • Finance (FinOps);
  • Cloud Engineering (composed of DevOps and Infrastructure and Operations);
  • Security (SecOps).

This model can be used as a baseline of roles and actions for this Cloud Governance team.

This is a model that we use, along with similar models like Software Asset Management (SAM), to define Cloud Center of Excellence roles and responsibilities for many organizations.

Planning, Optimization and Governance

How can cloud engineers and architects use Terraform at every level of the cloud cost management process to deliver value and minimize additional work?

To get started, the visualization below illustrates Terraform's role in the cloud cost management lifecycle. Start at the top with the Planning phase:

[Figure: Terraform Planning]

We now summarize the steps:

  • Start by identifying the workloads that are moving to the cloud;
  • Create the Terraform configuration;
  • Run terraform plan to generate the cost estimate;
  • Run terraform apply to provision the resources;
  • Once provisioned, the workloads run and the CMP tools provide optimization recommendations;
  • Integrate a vendor's optimization recommendations into Terraform and/or your CI/CD pipeline;
  • Investigate and analyze the optimization recommendations and implement Sentinel policies for cost and security controls;
  • Update the Terraform configuration and run plan and apply again;
  • Newly optimized and compliant resources are now provisioned.
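The plan, review and apply steps above can be sketched as a small Python driver around the Terraform CLI. This is a minimal sketch, not an official integration: it assumes the `terraform` binary is on the PATH, and the function names and the `approve` hook (where a cost estimate, optimization or policy review would plug in) are hypothetical. `terraform show -json` converts the saved plan into machine-readable JSON for that review.

```python
import json
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout as text.
Runner = Callable[[Sequence[str]], str]


def subprocess_runner(argv: Sequence[str]) -> str:
    """Run a command for real; raise CalledProcessError on a non-zero exit."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def plan_review_apply(workdir: str,
                      approve: Callable[[dict], bool],
                      run: Runner = subprocess_runner) -> bool:
    """Plan, pass the JSON plan to a review hook, and apply only if approved.

    Returns True if the saved plan was applied, False if the review rejected it.
    """
    tf = ["terraform", f"-chdir={workdir}"]
    # Save the plan so that exactly what was reviewed is what gets applied.
    run(tf + ["plan", "-input=false", "-out=tfplan"])
    plan = json.loads(run(tf + ["show", "-json", "tfplan"]))
    if not approve(plan):
        return False
    # Applying a saved plan file does not prompt for confirmation.
    run(tf + ["apply", "-input=false", "tfplan"])
    return True
```

Injecting `run` keeps the flow testable without calling Terraform; in a pipeline you would keep the default `subprocess_runner`.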

Planning — Pre-Migration and Ongoing Cost Forecasting

Cloud migrations require a multi-point assessment to determine if it makes sense to move an application/workload to the cloud. The primary factors for the assessment are:

  • Architecture and rightsizing of the current environment;
  • Business Case;
  • Estimated cost for the change;
  • Ongoing budgeted/anticipated utilization costs for the next 1-3 years on average.

Since cloud engineers and architects are now taking on some of these responsibilities, it makes sense to use engineering tools to address them.

Terraform helps cloud engineers and architects take on these new responsibilities with Cost Estimation, which calculates the infrastructure cost of each provisioning run based on the actual execution plan.

By using Terraform configuration files as the standard definition of how an application/workload's cost is estimated, you can use the Terraform Cloud & Enterprise APIs to deliver estimated cost data to finance automatically, or use Terraform's user interface to give finance direct access to costs.

Doing this can eliminate many cost forecasting processes that were never optimized.

Planning recommendations:

  • Use Terraform configuration files as the default definition for cloud planning and cost forecasting in AWS, Microsoft Azure and GCP (Google Cloud Platform), and expose this information through the Terraform API or through role-based access controls in the Terraform user interface, giving finance a standard view of the automated provisioning workflow.

Note: Many organizations perform financial planning in Excel, Google Sheets, and Cloud Management Platform (CMP) tools. To use Terraform cost data in these systems, we recommend extracting it with Terraform's Cost Estimates API.

  • Use Terraform modules as standard units of defined infrastructure for cloud costing and demand planning.
    • Example: Define a standard set of modules for a standard Java application, so that module A + B + C = $X per month. We plan to move 5 Java apps this year. This can be a quick methodology to assess the potential costs of running applications before defining the actual Terraform configuration files.
  • Use Terraform to understand application/workload cost growth over time, i.e., cloud expansion and/or migration costs.
  • Align your Terraform organization, workspace and resource naming conventions with the structure of your budget/forecast process.
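The module-based estimate in the Java example reduces to simple arithmetic. In this sketch the per-module prices are placeholders (assumptions), not real pricing; in practice they would come from Terraform cost estimates of each standard module. The module names and the optional compound `monthly_growth` rate, used for a 1-3 year projection, are likewise hypothetical.

```python
from typing import Dict, List

# Placeholder monthly costs (USD) per standard module; replace with the
# Terraform cost estimate of each module in your own environment.
MODULE_MONTHLY_COST: Dict[str, float] = {
    "module_a_compute": 120.0,
    "module_b_database": 80.0,
    "module_c_storage": 20.0,
}

# A standard Java application is assumed to be module A + B + C.
JAVA_APP_MODULES: List[str] = ["module_a_compute", "module_b_database", "module_c_storage"]


def app_monthly_cost(modules: List[str]) -> float:
    """Monthly cost of one application built from standard modules."""
    return sum(MODULE_MONTHLY_COST[m] for m in modules)


def forecast(apps: int, modules: List[str], months: int,
             monthly_growth: float = 0.0) -> float:
    """Total cost of `apps` identical applications over `months`.

    Month m (starting at 0) costs base * (1 + monthly_growth) ** m.
    """
    base = apps * app_monthly_cost(modules)
    return sum(base * (1 + monthly_growth) ** m for m in range(months))
```

With these placeholder numbers, one app costs US$220 per month, and five apps cost US$13,200 over 12 months with no growth.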

There is a guide to getting started with Terraform Cost Estimation. Once it is enabled, whenever a Terraform plan is executed, Terraform calls the AWS, Microsoft Azure and/or GCP cost estimation APIs to present the estimated cost of that plan, which you can use in your financial workflow. You can also export this estimate report as JSON.

Cost Estimating Example using Terraform

[Figure: Terraform Cost Estimation]

Terraform Cost Estimating JSON API Load Example

[Figure: Terraform cost estimate JSON example]
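To consume cost estimates programmatically, the sketch below parses a cost estimate document from the Terraform Cloud/Enterprise API. It assumes the JSON:API shape of the `cost-estimates` endpoint (`/api/v2/cost-estimates/<id>`) with `prior-monthly-cost`, `proposed-monthly-cost` and `delta-monthly-cost` attributes; verify the field names against the current API reference before relying on them. The dataclass and function names are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CostEstimate:
    estimate_id: str
    status: str
    prior_monthly: Decimal
    proposed_monthly: Decimal
    delta_monthly: Decimal


def parse_cost_estimate(doc: dict) -> CostEstimate:
    """Convert a JSON:API cost-estimate document into typed values.

    Decimal(str(...)) accepts costs sent as strings or numbers and avoids
    binary float rounding in financial reports.
    """
    data = doc["data"]
    attrs = data["attributes"]
    return CostEstimate(
        estimate_id=data["id"],
        status=attrs["status"],
        prior_monthly=Decimal(str(attrs["prior-monthly-cost"])),
        proposed_monthly=Decimal(str(attrs["proposed-monthly-cost"])),
        delta_monthly=Decimal(str(attrs["delta-monthly-cost"])),
    )


def fetch_cost_estimate(estimate_id: str, token: str,
                        base_url: str = "https://app.terraform.io") -> CostEstimate:
    """Fetch one cost estimate over HTTPS (endpoint assumed, see lead-in)."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/cost-estimates/{estimate_id}",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_cost_estimate(json.load(resp))
```

Keeping parsing separate from fetching lets you test the parser on exported JSON without network access or an API token.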

Be aware that Terraform Cost Estimation reports costs per workspace.

If you need a consolidated view across multiple workspaces, you will need to combine the Terraform Cost Estimates API with a reporting tool of your choice, such as Microsoft Power BI.
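For a cross-workspace view, per-workspace estimates can be flattened into a CSV that Power BI or a similar tool can import. This is a minimal sketch, assuming you have already collected (workspace name, proposed monthly cost) pairs, for example from the Cost Estimates API; the `<application>-<environment>` workspace naming convention is hypothetical.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple

Estimate = Tuple[str, Decimal]  # (workspace name, proposed monthly cost in USD)


def split_workspace(name: str) -> Tuple[str, str]:
    """Split a hypothetical '<application>-<environment>' workspace name."""
    app, sep, env = name.rpartition("-")
    if not sep:
        return name, "unknown"
    return app, env


def totals_by_environment(estimates: List[Estimate]) -> Dict[str, Decimal]:
    """Sum proposed monthly costs per environment across all workspaces."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for workspace, monthly in estimates:
        totals[split_workspace(workspace)[1]] += monthly
    return dict(totals)


def to_csv(estimates: List[Estimate]) -> str:
    """One row per workspace, in a flat shape BI tools can import."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["workspace", "application", "environment", "proposed_monthly_cost"])
    for workspace, monthly in estimates:
        app, env = split_workspace(workspace)
        writer.writerow([workspace, app, env, str(monthly)])
    return buf.getvalue()
```

A flat, one-row-per-workspace file lets the BI tool do the grouping, so the same export serves per-application, per-environment and total views.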

Useful tools for visualizing costs across multiple clouds

HashiCorp has built a simple open source tool that provides that higher-level, cross-workspace view. The tool is called Tint; see this blog post and creator Peyton Casper's GitHub repository to learn how to use it.

[Figure: Terraform Dashboard]

If you have complex reporting requirements, or you already have an enterprise reporting product (e.g., Microsoft Power BI, Tableau), HashiCorp Terraform's cost estimate data will also work with these solutions.

Optimization — Operationalizing and realizing continuous cost savings

Cloud Optimization is the ongoing practice of evaluating the cost-benefit ratio of current use of cloud infrastructure.

Cloud vendors (Amazon AWS, Microsoft Azure, GCP, Oracle Cloud, etc.) and third-party tools provide optimization recommendations, but organizations often don't take advantage of them because of an automation gap.

Cloud engineers and architects are not always involved with these platforms and optimization topics, which leaves them with no insight into the cost impact of their work.

Even in cases where they are involved with these CMP platforms, there is often a high level of manual intervention required to reduce costs.

Automating optimization insights along with the Provisioning Workflow

It is safe to say that the main CSPs and the vast majority of CMP tools, such as VMware CloudHealth and Snow Software (Embotics), allow you to export optimization recommendations via an API or another method (references: AWS, Azure, GCP).

In this guide, we'll focus on the most basic approach to automating optimization data ingestion, with data coming directly from CSPs or from third parties, such as Densify, that maintain a Terraform module. We will use Densify in the examples that follow.

Many HashiCorp users also build their own Terraform providers for similar processes.

[Figure: Terraform Densify]

The concepts and code can be used as a template for your own deployments.

Each vendor provides a different set of recommendations, but they all provide compute insights, so let's focus on those.

Any insight you receive (e.g., compute, storage, databases) can be consumed using the patterns below.

Basic patterns for consuming optimization recommendations with Terraform

To establish a mechanism for Terraform to access optimization recommendations, we present some common patterns:

  • Manual workflow: review optimization recommendations in the vendor's portal and update the HashiCorp Terraform files by hand. With no automation this is not ideal, but it can be the starting point of an optimization feedback loop.
  • File workflow: create a mechanism that imports optimization recommendations into a local repository through a scheduled process (usually daily).
    • For example, Densify customers use a script to export recommendations into a densify.auto.tfvars file, which is downloaded and stored in a locally accessible repository. Terraform's lookup function is then used to find the specific optimization updates that have been defined as variables.
  • API workflow: create a mechanism that pulls optimization recommendations directly from the vendor into an accessible data repository, and use Terraform's http data source to import the dataset.
  • Ticketing workflow: similar to the file and API workflows, but with an intermediate step in which optimization recommendations first go to an IT operations and service desk system such as ServiceNow or Jira. These systems have built-in workflow and approval logic: a flag is set on approved changes and passed as a variable to be consumed later in the process.
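The file and ticketing workflows can be combined in a short scheduled script: keep only approved recommendations and write them where Terraform loads them automatically. This sketch does not reproduce Densify's export script; the record layout, the `approved` flag and the `optimization.auto.tfvars.json` file name are hypothetical. It writes JSON because Terraform also auto-loads files ending in `.auto.tfvars.json`, which avoids hand-generating HCL.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Mapping


def approved_recommendations(records: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Keep only recommendations flagged as approved (ticketing workflow).

    Each record is assumed to look like
    {"resource_id": "...", "recommended_type": "...", "approved": True}.
    """
    return {
        str(r["resource_id"]): str(r["recommended_type"])
        for r in records
        if r.get("approved") is True
    }


def write_auto_tfvars(recs: Dict[str, str], directory: Path,
                      filename: str = "optimization.auto.tfvars.json") -> Path:
    """Write recs as a variable file Terraform loads without -var-file.

    Files ending in .auto.tfvars.json in the root module directory are
    loaded automatically.
    """
    path = directory / filename
    payload = {"new_recommendations": recs}
    path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")
    return path
```

Committing the generated file through your VCS, rather than writing it straight into a live working directory, keeps each optimization change reviewable.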

Code Optimization: HashiCorp Terraform Code Update Examples

In any of these cases, especially if automation processes are implemented, it is important to keep key resource data as variables.

Optimization insight tools provide a size recommendation for resources or services (i.e., compute, DB, storage, etc.). In this example we use compute resources, but the same approach applies to the other resource types.

At a minimum, we recommend that you have three variables defined to perform the optimization update in Terraform with some basic logic: new_recommendations, current_fallback and resource_unique_id.
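The basic logic tying these three variables together is "use the recommendation for this resource if there is one, otherwise keep the current size". In Terraform this is typically written with the built-in `lookup(map, key, default)` function, for example `lookup(var.new_recommendations, var.resource_unique_id, var.current_fallback)`; the exact wiring in the Densify module may differ. The Python sketch below states the same decision so it can be unit-tested; the function name is hypothetical, and it adds a guard against empty values that plain `lookup` does not have.

```python
from typing import Mapping


def resolve_size(new_recommendations: Mapping[str, str],
                 current_fallback: str,
                 resource_unique_id: str) -> str:
    """Return the recommended size for a resource, or keep the current size.

    Same idea as Terraform's lookup(map, key, default), plus a guard so an
    empty recommendation never changes the resource.
    """
    recommended = new_recommendations.get(resource_unique_id)
    return recommended if recommended else current_fallback
```

Falling back to the current size means a missing or failed recommendation feed leaves the infrastructure unchanged instead of breaking the plan.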

[Figure: Terraform Policies]

As mentioned, we use Densify here. You can find the Densify Terraform module in the Terraform Registry and the densify-dev repository on GitHub.

On Terraform's blog you will find several more code examples; we recommend reviewing them.

Governance — Ensuring Future Cost Savings with HashiCorp Terraform

The last and critical component of the cloud cost management lifecycle is defining policies that stop cost overruns and provide reporting for cloud operations.

In Terraform, you can automate these controls and reports with Sentinel, a policy-as-code framework built into the paid editions of HashiCorp Terraform for governance and policy (Sentinel is also used in other HashiCorp products).

Cost Compliance as Code and Sentinel Policy as Code

Sentinel includes a domain-specific language (DSL) to write policy definitions that evaluate any and all data defined in a Terraform file.

You can use Sentinel to ensure that your provisioned resources are: secure, tagged, and within usage and cost restrictions.

For costs, Terraform customers implement policies primarily around three areas (but remember, you're not limited to these three):

Cost control areas:

  • The amount — Control the amount of expenses;
  • Size provisioned — Control resource size/use and rightsizing;
  • Lifetime — Control the lifetime (TTL) of the resource.

In all three areas, you can apply policy controls at the level of Terraform workspaces (for example, per application/workload or per environment: Production, Development and Staging), and use tag policies to optimize resources and avoid unnecessary expense.
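Lifetime controls usually pair a policy that requires a TTL on new resources with a scheduled job that finds resources past their TTL. The sketch below covers only the second part, as a language-neutral illustration in Python rather than Sentinel. The `created_at` (ISO 8601 with a UTC offset) and `ttl_hours` tag names are hypothetical conventions.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def is_expired(tags: Mapping[str, str], now: Optional[datetime] = None) -> bool:
    """True if a resource has outlived its TTL.

    Assumes hypothetical tags: created_at (ISO 8601 including an offset,
    e.g. "2024-01-01T00:00:00+00:00") and ttl_hours. Resources without
    these tags are treated as not expiring; a stricter policy might
    reject them at plan time instead.
    """
    if "ttl_hours" not in tags or "created_at" not in tags:
        return False
    created = datetime.fromisoformat(tags["created_at"])
    expires = created + timedelta(hours=float(tags["ttl_hours"]))
    return (now or datetime.now(timezone.utc)) >= expires
```

Passing `now` explicitly makes the check deterministic in tests and lets a cleanup job evaluate a whole inventory against a single timestamp.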

The following is an example of Sentinel policy output when running terraform plan. Let's focus on three policies:

  1. aws-global/limit-cost-by-workspace-type
  2. aws-compute-nonprod/restrict-ec2-instance-type
  3. aws-global/enforce-mandatory-tags

[Figure: Terraform Policy Check]
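Sentinel policies are written in Sentinel's own language. As a language-neutral illustration, the sketch below reproduces the kind of decision logic behind an instance-type restriction and a mandatory-tags check in Python, evaluated against the JSON plan that `terraform show -json` produces (its `resource_changes` list). It is not the policy code from the figure: the allow-list, tag names and function names are hypothetical.

```python
from typing import Iterable, List, Set

# Hypothetical policy parameters; real values belong in your policy set.
ALLOWED_NONPROD_TYPES: Set[str] = {"t3.micro", "t3.small", "t3.medium"}
MANDATORY_TAGS: Set[str] = {"owner", "cost_center", "environment"}


def _planned(plan: dict, resource_type: str) -> Iterable[dict]:
    """Yield resource changes of a given type that create or update something."""
    for rc in plan.get("resource_changes", []):
        actions = set(rc.get("change", {}).get("actions", []))
        if rc.get("type") == resource_type and actions & {"create", "update"}:
            yield rc


def restrict_ec2_instance_type(plan: dict) -> List[str]:
    """Violations of the non-production instance-type allow-list."""
    violations = []
    for rc in _planned(plan, "aws_instance"):
        itype = (rc["change"].get("after") or {}).get("instance_type")
        if itype not in ALLOWED_NONPROD_TYPES:
            violations.append(f"{rc['address']}: instance type {itype} not allowed")
    return violations


def enforce_mandatory_tags(plan: dict, resource_type: str = "aws_instance") -> List[str]:
    """Violations where a planned resource lacks a mandatory tag."""
    violations = []
    for rc in _planned(plan, resource_type):
        tags = (rc["change"].get("after") or {}).get("tags") or {}
        missing = sorted(MANDATORY_TAGS - set(tags))
        if missing:
            violations.append(f"{rc['address']}: missing tags {', '.join(missing)}")
    return violations
```

Deletions are skipped because a resource being destroyed cannot add cost or lack tags going forward; replacements still appear, since their action list includes `create`.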

Sentinel has three enforcement levels: Advisory, Soft-Mandatory and Hard-Mandatory (see the provided link for definitions). The enforcement level dictates the workflow and how policy violations are resolved.
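How the three enforcement levels affect a run can be summarized as a small decision function. This is our simplified model of the documented behavior, not Terraform Cloud code; the enum and function names are hypothetical.

```python
from enum import Enum


class EnforcementLevel(Enum):
    ADVISORY = "advisory"
    SOFT_MANDATORY = "soft-mandatory"
    HARD_MANDATORY = "hard-mandatory"


def run_may_continue(level: EnforcementLevel, policy_passed: bool,
                     overridden: bool = False) -> bool:
    """Whether a run can proceed after a policy check.

    advisory: failures are reported but never block the run.
    soft-mandatory: failures block unless an authorized user overrides.
    hard-mandatory: failures always block; overrides are not possible.
    """
    if policy_passed or level is EnforcementLevel.ADVISORY:
        return True
    if level is EnforcementLevel.SOFT_MANDATORY:
        return overridden
    return False
```

Cost limits are often set to soft-mandatory so that a justified exception does not require editing the policy itself.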

If you are using Terraform Cloud for Business or Terraform Enterprise, users can work with the Terraform UI, CLI or API to fully integrate policy workflow control into their CI/CD pipelines, and with VCS systems such as GitLab, GitHub and Bitbucket for policy creation and management.

Cost Governance Code Examples Using Sentinel in Terraform

The aws-global/limit-cost-by-workspace-type policy, defined for this workspace (policies can be defined per workspace or globally), applies monthly spending limits and an enforcement level.

The excerpts at the link show the cost limits (US$200 for development, US$500 for QA, and so on).

The enforcement level is soft-mandatory, which means administrators can override a policy failure when there is a legitimate reason, while most users are prevented from spending beyond the limit.
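The decision inside a limit-by-workspace-type policy can be sketched in Python as a language-neutral illustration (the real policy is Sentinel code). Only the development (US$200) and QA (US$500) limits come from the excerpt; the other workspace types are elided there and are deliberately not filled in. The name-suffix convention for workspace types, the choice to compare the proposed monthly cost, and the function names are assumptions.

```python
from decimal import Decimal
from typing import Dict, Optional

# Monthly limits (USD) from the excerpt; other workspace types ("and so on")
# are elided in the source and intentionally left out here.
MONTHLY_LIMITS: Dict[str, Decimal] = {
    "dev": Decimal("200"),
    "qa": Decimal("500"),
}


def workspace_type(workspace_name: str) -> str:
    """Hypothetical convention: the type is the suffix after the last '-'."""
    return workspace_name.rsplit("-", 1)[-1].lower()


def within_cost_limit(workspace_name: str,
                      proposed_monthly_cost: Decimal) -> Optional[bool]:
    """True/False against the type's limit, or None if no limit is defined."""
    limit = MONTHLY_LIMITS.get(workspace_type(workspace_name))
    if limit is None:
        return None
    return proposed_monthly_cost <= limit
```

Returning None for unknown types makes the gap visible to the caller, which can then decide whether an unlisted workspace type should pass or fail.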

The path to cloud cost management

As organizations increasingly use cloud infrastructure, the DevOps philosophy can no longer be ignored. As silos between developers and operators break down, silos can also break down between finance and engineering/architecture.

Cloud engineers and architects are much more empowered to deploy the infrastructure they need right away. This means more responsibility to control these costs themselves. Technologies like Terraform and Sentinel give engineering the automated, finance-monitored workflows they need to manage costs and recover unused resources — all within the tool most of them already use.

This helps organizations avoid a complicated cross-platform approach, as well as the chaos and waste of shadow IT.

Source: https://www.hashicorp.com/blog/a-guide-to-cloud-cost-optimization-with-hashicorp-terraform
