Terraform Best Practices for Enterprise Cloud Infrastructure
After years of working with Terraform across multiple enterprise environments, I've learned that while Infrastructure as Code (IaC) is powerful, it requires discipline and best practices to maintain at scale. Here are the key practices that have served me well in production environments.
1. Structure Your Code for Scale
Directory Organization
The foundation of maintainable Terraform code starts with proper organization:
terraform/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
├── modules/
│   ├── networking/
│   ├── compute/
│   └── storage/
├── shared/
│   ├── variables.tf
│   └── outputs.tf
└── policies/
    └── security/
This structure separates concerns and makes it easy for team members to understand the codebase at a glance.
Use Modules Religiously
Every reusable piece of infrastructure should be a module. This isn't just about code reuse; it's about creating reliable, tested building blocks:
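module "azure_vm" {
  # The path is relative to the directory containing this file; when called
  # from environments/<env>/ it would be "../../modules/compute/virtual-machine".
  source = "../modules/compute/virtual-machine"

  vm_name        = var.vm_name
  resource_group = var.resource_group
  subnet_id      = var.subnet_id
  vm_size        = var.vm_size
  admin_username = var.admin_username
  tags           = local.common_tags
}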
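The module call references local.common_tags, which is not defined anywhere in this post. A minimal sketch of what such a locals block might look like, assuming two hypothetical input variables (environment and owner) and illustrative tag keys:

```hcl
# Hypothetical inputs; the variable names and tag keys are illustrative only.
variable "environment" {
  description = "Deployment environment, e.g. dev, staging, or prod."
  type        = string
}

variable "owner" {
  description = "Team or person responsible for these resources."
  type        = string
}

locals {
  # Tags applied to every resource so cost and ownership can be traced.
  common_tags = {
    environment = var.environment
    owner       = var.owner
    managed_by  = "terraform"
  }
}
```

Defining the tags once and passing them into every module keeps tagging consistent without repeating the map in each call.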
2. State Management Strategy
Remote State is Non-Negotiable
Never store Terraform state locally in production. Use a remote backend that supports state locking; the azurerm backend locks state through Azure Blob Storage leases:
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "terraformstateaccount"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
  }
}
Environment Isolation
Each environment should have its own state file. This prevents accidental changes from affecting the wrong environment and allows for independent evolution.
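One way to get a state file per environment is to give each environment directory its own backend block with a distinct key. The sketch below assumes a hypothetical environments/dev/backend.tf and reuses the storage names from the production example:

```hcl
# environments/dev/backend.tf (hypothetical file name)
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "terraformstateaccount"
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate" # one state blob per environment
  }
}
```

Staging and prod would follow the same pattern with their own key. For stricter isolation, each environment could also use a separate storage account or subscription, so that access to dev state never implies access to prod state.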
3. Security First
Sensitive Data Handling
Never hardcode secrets in your Terraform files. Use Azure Key Vault or similar services:
data "azurerm_key_vault_secret" "db_password" {
  name         = "database-password"
  key_vault_id = var.key_vault_id
}

# azurerm_sql_server is deprecated in azurerm 3.x; azurerm_mssql_server is its replacement.
resource "azurerm_mssql_server" "main" {
  name                = var.sql_server_name
  resource_group_name = var.resource_group_name
  location            = var.location
  version             = "12.0"
  administrator_login = var.admin_username

  # The secret value is still written to the state file, so protect the state backend too.
  administrator_login_password = data.azurerm_key_vault_secret.db_password.value
}
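When a secret has to be passed in as a variable rather than read from Key Vault, marking it sensitive keeps it out of CLI output. A minimal sketch, assuming a hypothetical admin_password variable:

```hcl
# Hypothetical variable; the value would be supplied at runtime, for example
# through a TF_VAR_admin_password environment variable, never committed to Git.
variable "admin_password" {
  description = "Administrator password for the database server."
  type        = string
  sensitive   = true # redacted in plan and apply output
}
```

Note that sensitive only affects what Terraform prints. The value is still stored in the state file, which is another reason to restrict access to the remote backend.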
Least Privilege Access
Configure your Terraform service principal with minimal required permissions. Use custom roles when Azure built-in roles are too broad.
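As a rough sketch of what a custom role could look like, the example below defines a role limited to virtual machines and network interfaces and assigns it to the Terraform service principal. The role name, the list of actions, and the terraform_sp_object_id variable are all assumptions for illustration; the actions a real pipeline needs depend on what it deploys.

```hcl
# Omit this data source if it is already declared in the same module.
data "azurerm_subscription" "current" {}

# Hypothetical input: object ID of the service principal that runs Terraform.
variable "terraform_sp_object_id" {
  description = "Object ID of the Terraform service principal."
  type        = string
}

resource "azurerm_role_definition" "terraform_vm_deployer" {
  name        = "terraform-vm-deployer" # illustrative name
  scope       = data.azurerm_subscription.current.id
  description = "Allows managing virtual machines and network interfaces only."

  permissions {
    # Illustrative action list; adjust to the resources actually deployed.
    actions = [
      "Microsoft.Compute/virtualMachines/*",
      "Microsoft.Network/networkInterfaces/*",
      "Microsoft.Resources/subscriptions/resourceGroups/read",
    ]
    not_actions = []
  }

  assignable_scopes = [data.azurerm_subscription.current.id]
}

resource "azurerm_role_assignment" "terraform" {
  scope              = data.azurerm_subscription.current.id
  role_definition_id = azurerm_role_definition.terraform_vm_deployer.role_definition_resource_id
  principal_id       = var.terraform_sp_object_id
}
```

Scoping the assignment to a resource group instead of the whole subscription would narrow access further.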
4. Version Control and CI/CD
Terraform Versioning
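Constrain your Terraform and provider versions so that upgrades are deliberate. Note that ">= 1.0" only sets a minimum Terraform version, and "~> 3.0" allows any 3.x provider release; commit the generated .terraform.lock.hcl file to pin the exact provider versions your team uses: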
terraform {
  required_version = ">= 1.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}
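The version constraints only select which provider is installed; the provider still needs a configuration block. For azurerm, a minimal one looks like this, assuming credentials come from the environment (for example an Azure CLI login or ARM_* environment variables):

```hcl
provider "azurerm" {
  # The azurerm provider requires a features block, even when it is empty.
  features {}
}
```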
Automated Testing
Implement automated testing with tools like Terratest:
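package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestTerraformAzureExample(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../examples/azure-vm",
		Vars: map[string]interface{}{
			"resource_group_name": "test-rg",
			"vm_name":             "test-vm",
		},
	}

	// Destroy the test resources when the test finishes, even if it fails.
	defer terraform.Destroy(t, terraformOptions)
	// Run terraform init and terraform apply; any error fails the test.
	terraform.InitAndApply(t, terraformOptions)
}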
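Terratest is written in Go. As an alternative that stays in Terraform's own language, Terraform 1.6 and later include a native terraform test command. The sketch below is not the Terratest approach shown in this post; it assumes a hypothetical tests/vm_size.tftest.hcl file inside a module that declares a vm_size variable with a validation rule, and that any other required variables are supplied elsewhere.

```hcl
# tests/vm_size.tftest.hcl (hypothetical file name; requires Terraform 1.6+)

run "rejects_unlisted_vm_size" {
  # Plan only, so no infrastructure is created by this test.
  command = plan

  variables {
    vm_size = "Standard_NC24" # deliberately not on the approved list
  }

  # The run passes only if the vm_size validation rule fails.
  expect_failures = [
    var.vm_size,
  ]
}
```

Running terraform test from the module directory executes every run block in the tests directory. A check like this is much cheaper than a full apply, so it can run on every pull request while Terratest covers end-to-end deployment.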
5. Documentation and Standards
Self-Documenting Code
Use meaningful variable names and descriptions:
variable "vm_size" {
  description = "The size of the Virtual Machine. Must be one of the approved Azure VM sizes."
  type        = string
  default     = "Standard_B2s"

  validation {
    condition = contains([
      "Standard_B1s", "Standard_B2s", "Standard_D2s_v3"
    ], var.vm_size)
    error_message = "The vm_size must be one of: Standard_B1s, Standard_B2s, Standard_D2s_v3."
  }
}
README Templates
Every module should have a comprehensive README with:
- Purpose and use cases
- Input variables
- Output values
- Usage examples
- Requirements and dependencies
6. Performance and Cost Optimization
Use Data Sources Wisely
Fetch existing resources rather than recreating them:
data "azurerm_subnet" "existing" {
  name                 = "existing-subnet"
  virtual_network_name = "existing-vnet"
  resource_group_name  = "existing-rg"
}
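Once the subnet is looked up, its attributes can be referenced like those of any managed resource. A short sketch, assuming an illustrative network interface name and the location and resource_group_name variables used earlier in this post:

```hcl
resource "azurerm_network_interface" "app" {
  name                = "app-nic" # illustrative name
  location            = var.location
  resource_group_name = var.resource_group_name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = data.azurerm_subnet.existing.id # ID from the data source
    private_ip_address_allocation = "Dynamic"
  }
}
```

Terraform reads the subnet on every plan but never modifies or destroys it, which makes data sources a safe way to attach to networking owned by another team.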
Implement Cost Controls
Use Azure Policy and Terraform to enforce cost controls:
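data "azurerm_subscription" "current" {}

# azurerm_policy_assignment was removed in azurerm 3.0; for subscription scope,
# azurerm_subscription_policy_assignment replaces it.
resource "azurerm_subscription_policy_assignment" "vm_size_restriction" {
  name                 = "restrict-vm-sizes"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/cccc23c7-8427-4f53-ad12-b6a63eb452b3"

  parameters = jsonencode({
    listOfAllowedSKUs = {
      value = ["Standard_B1s", "Standard_B2s", "Standard_D2s_v3"]
    }
  })
}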
Common Pitfalls to Avoid
- Circular Dependencies: Design your modules to avoid circular references
- Over-Engineering: Start simple and add complexity only when needed
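- Ignoring State: Always plan before applying, especially in production, so that drift between the state file and the real infrastructure surfaces before any change is made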
- Provider Confusion: Be explicit about which provider version you're using
Conclusion
Terraform is incredibly powerful, but with great power comes great responsibility. These practices have helped me manage infrastructure across multiple Azure subscriptions and environments while maintaining security, reliability, and team productivity.
The key is to start with good practices from day one. It's much harder to refactor poor Terraform code than it is to write it correctly from the beginning.
What are your favorite Terraform best practices? Have you encountered any of these challenges in your infrastructure projects? Let me know in the comments or reach out on LinkedIn.